[jira] [Commented] (IMPALA-5633) Bloom filters underestimate false positive probability

2020-08-25 Thread Tim Armstrong (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-5633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17184916#comment-17184916
 ] 

Tim Armstrong commented on IMPALA-5633:
---

Ok, thanks for confirming. I thought about it a bit and it seems like there are 
probably a lot of reasons why the estimate might be biased high. I'll try to 
find some time to test it out on those TPC-DS queries.

> Bloom filters underestimate false positive probability
> --
>
> Key: IMPALA-5633
> URL: https://issues.apache.org/jira/browse/IMPALA-5633
> Project: IMPALA
>  Issue Type: Bug
>  Components: Perf Investigation
>Reporter: Jim Apple
>Assignee: Jim Apple
>Priority: Minor
>
> Block Bloom filters have a higher false positive rate than standard Bloom
> filters, due to the uneven distribution of keys between buckets. We should
> change the code to match the theory, using an approximation from the paper
> that introduced block Bloom filters, "Cache-, Hash- and Space-Efficient Bloom
> Filters" by Putze et al.
> For a false positive probability of 1%, this would increase filter size by
> about 10% and decrease the filter false positive probability by 50%. However,
> this effect is obscured by the coarse granularity of filter sizes, which are
> constrained to be a power of two in bytes. Loosening that restriction is
> potential future work.
> See 
> https://github.com/apache/kudu/commit/d1190c2b06a6eef91b21fd4a0b5fb76534b4e9f9
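The sketch below is a minimal Python illustration of why the current formula underestimates. It assumes Impala's bucket layout: 256-bit buckets made of eight 32-bit words, with each key setting one bit per word. That layout is consistent with the `BUCKET_WORDS` formula quoted from bloom-filter.cc later in this thread.

- `naive_fpp` is that quoted formula.
- `block_fpp` is a sketch in the spirit of Putze et al. It models the number of keys landing in a bucket as Poisson with mean ndv * 256 / space and averages the per-bucket false positive rate over that distribution.

All names are illustrative. They are not Impala's or Kudu's API.

```python
import math

BUCKET_WORDS = 8                        # assumed: 32-bit words per bucket, one bit set per word per key
WORD_BITS = 32
BUCKET_BITS = BUCKET_WORDS * WORD_BITS  # 256-bit buckets


def naive_fpp(ndv: int, space_bits: int) -> float:
    """Formula quoted from bloom-filter.cc: treats every bucket as evenly loaded."""
    return (1.0 - math.exp(-BUCKET_WORDS * ndv / space_bits)) ** BUCKET_WORDS


def bucket_fpp(keys_in_bucket: int) -> float:
    """False positive rate of one bucket holding exactly `keys_in_bucket` keys."""
    bit_set = 1.0 - (1.0 - 1.0 / WORD_BITS) ** keys_in_bucket
    return bit_set ** BUCKET_WORDS  # a probe must hit a set bit in all 8 words


def block_fpp(ndv: int, space_bits: int) -> float:
    """Average bucket_fpp over a Poisson-distributed bucket load (Putze-style estimate)."""
    lam = ndv * BUCKET_BITS / space_bits  # expected keys per bucket
    if lam == 0:
        return 0.0
    upper = int(lam + 12 * math.sqrt(lam) + 20)  # the Poisson tail beyond this is negligible
    total = 0.0
    for i in range(upper + 1):
        log_pmf = -lam + i * math.log(lam) - math.lgamma(i + 1)  # log of Poisson(i; lam)
        total += math.exp(log_pmf) * bucket_fpp(i)
    return total


ndv, space_bits = 10_000, 8 * 16 * 1024  # 10,000 keys in a 16 KB filter
print(naive_fpp(ndv, space_bits), block_fpp(ndv, space_bits))
```

Over the bucket loads that matter for low-FPP filters, the per-bucket rate is convex in the number of keys. Averaging over uneven loads therefore gives a higher estimate than plugging in the mean load, which is the underestimate this issue describes.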



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-5633) Bloom filters underestimate false positive probability

2020-08-25 Thread Jim Apple (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-5633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17184908#comment-17184908
 ] 

Jim Apple commented on IMPALA-5633:
---

I think it's just a mistake. IIRC, [~mmokhtar] and I had a brief discussion 
about it a few years ago and he thought it was an error as well.

At the time, I wrote a patch to change it to 1% and ran some perf tests, which
were inconclusive, likely because I did not carefully pick tests that would
demonstrate a difference, so the patch went into my backlog and has yet to escape.

I'd be delighted to +2 a patch that changed it and demonstrated the perf 
impact. It might be good to combine it with a patch to allow non-power-of-two 
sizes, which can be done without a modulo via

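{code:c}
#include <stdint.h>

/* Map a 64-bit hash onto [0, num_buckets) without a modulo: take the high
   64 bits of the 128-bit product hash * num_buckets.
   Requires the unsigned __int128 extension (GCC/Clang). */
uint64_t libfilter_block_index(uint64_t hash, uint64_t num_buckets) {
  return (uint64_t)((((unsigned __int128)hash) * ((unsigned __int128)num_buckets)) >> 64);
}
{code}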
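The C function above depends on the compiler's 128-bit integer extension. The same multiply-and-shift range reduction (sometimes called Lemire's "fastrange") can be sketched in Python, whose integers are arbitrary precision. The sketch shows that any bucket count works, not just powers of two. The name `block_index` is illustrative only.

```python
MASK64 = (1 << 64) - 1


def block_index(hash64: int, num_buckets: int) -> int:
    """Map a 64-bit hash onto [0, num_buckets) with a multiply and a shift, no modulo.

    Mirrors the C version: (hash * num_buckets) >> 64 on a 128-bit product.
    """
    if not 0 <= hash64 <= MASK64:
        raise ValueError("hash must fit in 64 bits")
    if num_buckets <= 0:
        raise ValueError("num_buckets must be positive")
    return (hash64 * num_buckets) >> 64


# Each bucket receives a contiguous slice of roughly 2**64 / num_buckets hash values,
# so a non-power-of-two bucket count such as 3 is fine.
for h in (0, 1 << 62, 1 << 63, MASK64):
    print(h, block_index(h, 3))  # buckets 0, 0, 1, 2
```

The result depends mainly on the high bits of the hash, so this mapping assumes a hash function whose high bits are well mixed.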








[jira] [Updated] (IMPALA-5633) Bloom filters underestimate false positive probability

2020-08-25 Thread Jim Apple (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-5633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jim Apple updated IMPALA-5633:
--
Description: 
Block Bloom filters have a higher false positive rate than standard Bloom
filters, due to the uneven distribution of keys between buckets. We should
change the code to match the theory, using an approximation from the paper that
introduced block Bloom filters, "Cache-, Hash- and Space-Efficient Bloom
Filters" by Putze et al.

For a false positive probability of 1%, this would increase filter size by
about 10% and decrease the filter false positive probability by 50%. However,
this effect is obscured by the coarse granularity of filter sizes, which are
constrained to be a power of two in bytes. Loosening that restriction is
potential future work.

See 
https://github.com/apache/kudu/commit/d1190c2b06a6eef91b21fd4a0b5fb76534b4e9f9

  was:
{{bloom-filter.cc}} says:

{noformat}
fpp = (1 - exp(-BUCKET_WORDS * ndv/space))^BUCKET_WORDS

where space is in bits.
{noformat}

This is true only when discounting the false positive rate induced by hash
collisions. Using {{w}}-bit hashes with {{n}} distinct values gives a false
positive rate of

{noformat}
n / exp2(w) + (1.0 - n / exp2(w)) * ((1 - exp(-BUCKET_WORDS * ndv/space))^BUCKET_WORDS)
{noformat}

This starts to become a factor as {{n}} approaches {{1 << w}}. It also suggests 
using bitmaps rather than Bloom filters for large {{n}}, since the false 
positive rate for a bitmap is

{noformat}
n / exp2(w) + (1.0 - n / exp2(w)) * (1 - exp(-ndv/space))
{noformat}

This is lower than the current BF false positive rate for high {{n}} and low 
relative space (aka, high false positive probability).

Summary: Bloom filters underestimate false positive probability  (was: 
Bloom filters underestimate false positive probability for high NDV)







[jira] [Commented] (IMPALA-10101) Impala branch cdh5-2.9.0_5.12.1 bin/boostrip_build failed to download python requirements

2020-08-25 Thread Kevin Yu (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-10101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17184857#comment-17184857
 ] 

Kevin Yu commented on IMPALA-10101:
---

Thanks again, Tim.

> Impala branch cdh5-2.9.0_5.12.1 bin/boostrip_build failed to download python 
> requirements
> -
>
> Key: IMPALA-10101
> URL: https://issues.apache.org/jira/browse/IMPALA-10101
> Project: IMPALA
>  Issue Type: Bug
>  Components: Infrastructure
>Affects Versions: Impala 2.9.0
> Environment: ubuntu 1604 x86_64
>Reporter: Kevin Yu
>Priority: Major
>
> Reproduce steps:






[jira] [Comment Edited] (IMPALA-10101) Impala branch cdh5-2.9.0_5.12.1 bin/boostrip_build failed to download python requirements

2020-08-25 Thread Kevin Yu (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-10101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17183752#comment-17183752
 ] 

Kevin Yu edited comment on IMPALA-10101 at 8/26/20, 2:20 AM:
-

Thanks, Tim, for the reply. I have to look into the Python infra scripts to
figure out how to build older versions of Cloudera Impala branches.


was (Author: keens312):
This Tim for the reply.  I have to look into the python infra scripts to figure 
it out how to build older versions of cloudera impala branches.







[jira] [Resolved] (IMPALA-9544) Replace Intel's SSE instructions with ARM's NEON instructions

2020-08-25 Thread zhaorenhai (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-9544?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhaorenhai resolved IMPALA-9544.

Resolution: Fixed

> Replace Intel's SSE instructions with ARM's NEON instructions
> -
>
> Key: IMPALA-9544
> URL: https://issues.apache.org/jira/browse/IMPALA-9544
> Project: IMPALA
>  Issue Type: Sub-task
>Reporter: zhaorenhai
>Assignee: zhaorenhai
>Priority: Major
>
> Replace Intel's SSE instructions with ARM's NEON instructions






[jira] [Resolved] (IMPALA-10088) DeadLock while run unifiedbetests on aarch64 platform

2020-08-25 Thread zhaorenhai (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhaorenhai resolved IMPALA-10088.
-
  Assignee: zhaorenhai
Resolution: Fixed

> DeadLock while run unifiedbetests on aarch64 platform
> -
>
> Key: IMPALA-10088
> URL: https://issues.apache.org/jira/browse/IMPALA-10088
> Project: IMPALA
>  Issue Type: Sub-task
>Reporter: zhaorenhai
>Assignee: zhaorenhai
>Priority: Major
>
> When running unifiedbetests and impalad on the aarch64 platform, a deadlock
> occurs while tcmalloc is initializing.
> The stack trace is as follows:
>  
> {code:java}
> (gdb) bt
> #0  0x83099544 in __GI___nanosleep (requested_time=0xffc71698, 
> remaining=0x0) at ../sysdeps/unix/sysv/linux/nanosleep.c:28
> #1  0x054cf144 in base::internal::SpinLockDelay (w=0x77385b0 
> , value=2, loop=727956) at 
> /home/impala/impala/be/src/gutil/spinlock_linux-inl.h:86
> #2  0x05529800 in SpinLock::SlowLock() ()
> #3  0x055fb5c4 in tcmalloc::ThreadCache::InitModule() ()
> #4  0x05743374 in tc_calloc ()
> #5  0x81c737f4 in _dlerror_run (operate=operate@entry=0x81c73158 
> , args=0xffc717d8, args@entry=0xffc717f8) at dlerror.c:140
> #6  0x81c731f0 in __dlsym (handle=, name= out>) at dlsym.c:70
> #7  0x0310ee04 in (anonymous namespace)::dlsym_or_die (sym=0x606b260 
> "dlopen") at /home/impala/impala/be/src/kudu/util/debug/unwind_safeness.cc:74
> #8  0x0310ef1c in (anonymous namespace)::InitIfNecessary () at 
> /home/impala/impala/be/src/kudu/util/debug/unwind_safeness.cc:100
> #9  0x0310f0b4 in dl_iterate_phdr (callback=0x81620d18 
> <_Unwind_IteratePhdrCallback>, data=0xffc71900) at 
> /home/impala/impala/be/src/kudu/util/debug/unwind_safeness.cc:158
> #10 0x816215b4 in _Unwind_Find_FDE (pc=0x8161f98f 
> <_Unwind_Backtrace+79>, bases=bases@entry=0xffc72438) at 
> ../../../gcc-7.5.0/libgcc/unwind-dw2-fde-dip.c:469
> #11 0x8161dfdc in uw_frame_state_for 
> (context=context@entry=0xffc72110, fs=fs@entry=0xffc719f0) at 
> ../../../gcc-7.5.0/libgcc/unwind-dw2.c:1249
> #12 0x8161ef3c in uw_init_context_1 
> (context=context@entry=0xffc72110, outer_cfa=0xffc72b50, 
> outer_cfa@entry=0xffc72be0, outer_ra=0x55298d8 
> )
>     at ../../../gcc-7.5.0/libgcc/unwind-dw2.c:1578
> #13 0x8161f990 in _Unwind_Backtrace (trace=0x5529a48 
> , 
> trace_argument=0xffc72b68) at ../../../gcc-7.5.0/libgcc/unwind.inc:283
> #14 0x055298d8 in GetStackTrace_libgcc(void**, int, int) ()
> #15 0x05529db4 in GetStackTrace(void**, int, int) ()
> #16 0x055f891c in tcmalloc::PageHeap::GrowHeap(unsigned long) ()
> {code}
> I think this is the same issue as
> [https://github.com/gperftools/gperftools/issues/1184],
> because the deadlock happens whether I build gperftools with or without
> libunwind.
>  
> Kudu also has the same issue:
> https://issues.apache.org/jira/browse/KUDU-3072
> I think the solution in the following link is not correct:
> [https://gerrit.cloudera.org/#/c/15420/]
> On aarch64, the method of getting a stack trace is not the same as on ARM.
> I think the correct way to get a stack trace should look like this:
> [https://github.com/abseil/abseil-cpp/blob/master/absl/debugging/internal/stacktrace_aarch64-inl.inc]
> or simply use libunwind or gcc.
>  
> However, I think gperftools may not be the root cause of this issue, because
> both gperftools and libunwind now fully support aarch64 (with libunwind or
> gcc).
> Maybe this Kudu commit has a bug?
> [https://github.com/apache/kudu/commit/b621f9c1a3949dc31ca4836b0767b2840fa73f29]
> On x86, gperftools does not use libunwind or libgcc to get stack traces, so
> the issue does not happen there.
> I tried changing:
> {code:c}
> #if !defined(THREAD_SANITIZER) && !defined(__APPLE__)
> #define HOOK_DL_ITERATE_PHDR 1
> #endif
> {code}
> to:
> {code:c}
> #if !defined(THREAD_SANITIZER) && !defined(__APPLE__) && !defined(__aarch64__)
> #define HOOK_DL_ITERATE_PHDR 1
> #endif
> {code}
> and the deadlock no longer happens.
>  
> [~tarmstr...@cloudera.com] [~tlipcon] [~adar]
> What do you think about this issue? How should we fix it? Any suggestions?
>  
>  
>  






[jira] [Reopened] (IMPALA-10088) DeadLock while run unifiedbetests on aarch64 platform

2020-08-25 Thread zhaorenhai (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhaorenhai reopened IMPALA-10088:
-







[jira] [Resolved] (IMPALA-10089) Fix large code model issue of llvm on aarch64

2020-08-25 Thread zhaorenhai (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhaorenhai resolved IMPALA-10089.
-
Resolution: Fixed

> Fix large code model issue of llvm on aarch64
> -
>
> Key: IMPALA-10089
> URL: https://issues.apache.org/jira/browse/IMPALA-10089
> Project: IMPALA
>  Issue Type: Sub-task
>Reporter: zhaorenhai
>Assignee: zhaorenhai
>Priority: Major
>
> This issue references the following LLVM review:
> [https://reviews.llvm.org/D27629]
> Without this fix, the process core dumps when loading Impala test data.






[jira] [Resolved] (IMPALA-7782) discrepancy in results with a subquery containing an agg that produces an empty set

2020-08-25 Thread Tim Armstrong (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-7782.
---
Fix Version/s: Impala 4.0
   Resolution: Fixed

> discrepancy in results with a subquery containing an agg that produces an 
> empty set
> ---
>
> Key: IMPALA-7782
> URL: https://issues.apache.org/jira/browse/IMPALA-7782
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Affects Versions: Impala 2.12.0, Impala 3.1.0
>Reporter: Michael Brown
>Assignee: Tim Armstrong
>Priority: Major
>  Labels: correctness, query_generator, ramp-up
> Fix For: Impala 4.0
>
>
> A discrepancy exists between Impala and Postgres when a subquery contains an 
> agg and results in an empty set, yet the WHERE clause looking at the subquery 
> should produce a "True" condition.
> Example queries include:
> {noformat}
> USE functional;
> SELECT id
> FROM alltypestiny
> WHERE -1 NOT IN (SELECT COUNT(id) FROM alltypestiny HAVING false);
> SELECT id
> FROM alltypestiny
> WHERE NULL NOT IN (SELECT COUNT(id) FROM alltypestiny HAVING false);
> SELECT id
> FROM alltypestiny
> WHERE (SELECT COUNT(id) FROM alltypestiny HAVING false) IS NULL;
> {noformat}
> These queries do not produce any rows in Impala. In Postgres, the queries 
> produce all 8 rows for the functional.alltypestiny id column.
> Thinking maybe there were Impala and Postgres differences with {{NOT IN}} 
> behavior, I also tried this:
> {noformat}
> USE functional;
> SELECT id
> FROM alltypestiny
> WHERE -1 NOT IN (SELECT 1 FROM alltypestiny WHERE bool_col IS NULL);
> {noformat}
> This subquery also produces an empty set just like the subquery in the 
> problematic queries at the top, but unlike those queries, this full query 
> returns the same results in Impala and Postgres (all 8 rows for the 
> functional.alltypestiny id column).
> For anyone interested in this bug, you can migrate data into postgres in a 
> dev environment using
> {noformat}
> tests/comparison/data_generator.py --use-postgresql --migrate-table-names 
> alltypestiny --db-name functional migrate
> {noformat}
> This is in 2.12 at least, so it's not a 3.1 regression.






[jira] [Updated] (IMPALA-7782) discrepancy in results with a subquery containing an agg that produces an empty set

2020-08-25 Thread Tim Armstrong (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated IMPALA-7782:
--
Priority: Blocker  (was: Major)

> discrepancy in results with a subquery containing an agg that produces an 
> empty set
> ---
>
> Key: IMPALA-7782
> URL: https://issues.apache.org/jira/browse/IMPALA-7782
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Affects Versions: Impala 2.12.0, Impala 3.1.0
>Reporter: Michael Brown
>Assignee: Tim Armstrong
>Priority: Blocker
>  Labels: correctness, query_generator, ramp-up
> Fix For: Impala 4.0
>
>
> A discrepancy exists between Impala and Postgres when a subquery contains an 
> agg and results in an empty set, yet the WHERE clause looking at the subquery 
> should produce a "True" condition.
> Example queries include:
> {noformat}
> USE functional;
> SELECT id
> FROM alltypestiny
> WHERE -1 NOT IN (SELECT COUNT(id) FROM alltypestiny HAVING false);
> SELECT id
> FROM alltypestiny
> WHERE NULL NOT IN (SELECT COUNT(id) FROM alltypestiny HAVING false);
> SELECT id
> FROM alltypestiny
> WHERE (SELECT COUNT(id) FROM alltypestiny HAVING false) IS NULL;
> {noformat}
> These queries do not produce any rows in Impala. In Postgres, the queries 
> produce all 8 rows for the functional.alltypestiny id column.
> Thinking maybe there were Impala and Postgres differences with {{NOT IN}} 
> behavior, I also tried this:
> {noformat}
> USE functional;
> SELECT id
> FROM alltypestiny
> WHERE -1 NOT IN (SELECT 1 FROM alltypestiny WHERE bool_col IS NULL);
> {noformat}
> This subquery also produces an empty set just like the subquery in the 
> problematic queries at the top, but unlike those queries, this full query 
> returns the same results in Impala and Postgres (all 8 rows for the 
> functional.alltypestiny id column).
> For anyone interested in this bug, you can migrate data into postgres in a 
> dev environment using
> {noformat}
> tests/comparison/data_generator.py --use-postgresql --migrate-table-names 
> alltypestiny --db-name functional migrate
> {noformat}
> This is in 2.12 at least, so it's not a 3.1 regression.






[jira] [Commented] (IMPALA-5633) Bloom filters underestimate false positive probability for high NDV

2020-08-25 Thread Tim Armstrong (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-5633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17184795#comment-17184795
 ] 

Tim Armstrong commented on IMPALA-5633:
---

[~jbapple] I know you're looking at some related things in Kudu now. Do you
have any idea why max_filter_error_rate might be set to 0.75? That seems
awfully high.

On some queries (TPC-DS Q84, for example), it seems to be too aggressive: it
chooses filters that are too small and end up less effective than desired,
e.g. a 4 KB filter for household_demographics.hd_demo_sk, which has
7200 distinct values.
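Here is a rough arithmetic check of that example. It assumes, as a hypothetical simplification, that the filter size is the smallest power of two in bytes whose FPP under the bloom-filter.cc formula is at most the error-rate limit. It ignores the configured minimum and maximum filter sizes that Impala also applies. With 7200 distinct values:

- 2 KB gives about 0.79 under that formula, above the 0.75 limit.
- 4 KB gives about 0.22, so 4 KB is the size chosen.
- A 1% limit would push the filter to 16 KB.

`smallest_filter_bytes` is an illustrative name, not Impala's code.

```python
import math

BUCKET_WORDS = 8


def naive_fpp(ndv: int, space_bits: int) -> float:
    """FPP formula quoted from bloom-filter.cc (space in bits)."""
    return (1.0 - math.exp(-BUCKET_WORDS * ndv / space_bits)) ** BUCKET_WORDS


def smallest_filter_bytes(ndv: int, max_fpp: float, min_bytes: int = 32) -> int:
    """Hypothetical sizing rule: smallest power-of-two byte size with naive FPP <= max_fpp."""
    if not 0.0 < max_fpp < 1.0:
        raise ValueError("max_fpp must be in (0, 1)")
    size = min_bytes  # 32 bytes = one 256-bit bucket
    while naive_fpp(ndv, 8 * size) > max_fpp:
        size *= 2
    return size


for target in (0.75, 0.01):
    size = smallest_filter_bytes(7200, target)
    print(f"max_fpp={target}: {size} bytes, naive fpp={naive_fpp(7200, 8 * size):.4f}")
```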

> Bloom filters underestimate false positive probability for high NDV
> ---
>
> Key: IMPALA-5633
> URL: https://issues.apache.org/jira/browse/IMPALA-5633
> Project: IMPALA
>  Issue Type: Bug
>  Components: Perf Investigation
>Reporter: Jim Apple
>Assignee: Jim Apple
>Priority: Minor
>
> {{bloom-filter.cc}} says:
> {noformat}
> fpp = (1 - exp(-BUCKET_WORDS * ndv/space))^BUCKET_WORDS
> where space is in bits.
> {noformat}
> This is true only when discounting the false positive rate induced by hash
> collisions. Using {{w}}-bit hashes with {{n}} distinct values gives a false
> positive rate of
> {noformat}
> n / exp2(w) + (1.0 - n / exp2(w)) * ((1 - exp(-BUCKET_WORDS * ndv/space))^BUCKET_WORDS)
> {noformat}
> {noformat}
> This starts to become a factor as {{n}} approaches {{1 << w}}. It also 
> suggests using bitmaps rather than Bloom filters for large {{n}}, since the 
> false positive rate for a bitmap is
> {noformat}
> n / exp2(w) + (1.0 - n / exp2(w)) * (1 - exp(-ndv/space))
> {noformat}
> This is lower than the current BF false positive rate for high {{n}} and low 
> relative space (aka, high false positive probability).
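The two formulas above can be evaluated side by side. Following the description, this sketch treats {{n}} and {{ndv}} as the same quantity. It takes n / 2^w as the approximate chance that a probe's hash collides with one of the n stored hashes. The function names are illustrative only.

```python
import math

BUCKET_WORDS = 8


def collision_rate(n: int, hash_bits: int) -> float:
    """Approximate chance that a probe's w-bit hash equals one of n distinct stored hashes."""
    return n / 2.0 ** hash_bits


def bloom_fpp(n: int, space_bits: int, hash_bits: int) -> float:
    """Bloom filter FPP including hash collisions, per the formula above."""
    c = collision_rate(n, hash_bits)
    return c + (1.0 - c) * (1.0 - math.exp(-BUCKET_WORDS * n / space_bits)) ** BUCKET_WORDS


def bitmap_fpp(n: int, space_bits: int, hash_bits: int) -> float:
    """Bitmap (one bit per key) FPP including hash collisions, per the formula above."""
    c = collision_rate(n, hash_bits)
    return c + (1.0 - c) * (1.0 - math.exp(-n / space_bits))


# One bit of space per key: the bitmap wins. 32 bits per key: the Bloom filter wins.
for n, space in ((1 << 20, 1 << 20), (1 << 20, 1 << 25)):
    print(n, space, bloom_fpp(n, space, 32), bitmap_fpp(n, space, 32))
```

As {{n}} approaches 2^w, the collision term alone dominates. For example, n = 2^31 with 32-bit hashes already gives a floor of 0.5.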






[jira] [Commented] (IMPALA-7782) discrepancy in results with a subquery containing an agg that produces an empty set

2020-08-25 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-7782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17184785#comment-17184785
 ] 

ASF subversion and git services commented on IMPALA-7782:
-

Commit e133d1838ab05e75007fef24e2ce1b6f18113c8d in impala's branch 
refs/heads/master from Tim Armstrong
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=e133d18 ]

IMPALA-7782: fix constant NOT IN subqueries that can return 0 rows

The bug was that the statement rewriter converted NOT IN
predicates to != predicates when the subquery could
be an empty set. This was invalid, because NOT IN ()
is true, but != () is false.

Testing:
Added targeted planner and end-to-end tests.

Ran exhaustive tests.

Change-Id: I66c726f0f66ce2f609e6ba44057191f5929a67fc
Reviewed-on: http://gerrit.cloudera.org:8080/16338
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 
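The semantics the fix relies on can be checked with Python's built-in sqlite3 module. Using SQLite as a stand-in for Postgres is an assumption: SQLite follows the same standard rules here.

- NOT IN over an empty set is TRUE, even for a NULL operand.
- A scalar subquery that returns no rows evaluates to NULL, so `!=` yields NULL and the row is filtered out.

The table `t` with ids 0 to 7 is a hypothetical stand-in for functional.alltypestiny. The subquery uses `GROUP BY id HAVING 0` because older SQLite versions reject HAVING without GROUP BY.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(8)])

# An aggregating subquery that is guaranteed to return zero rows.
EMPTY = "(SELECT COUNT(id) FROM t GROUP BY id HAVING 0)"


def rows_where(predicate: str) -> int:
    """Count rows of t for which `predicate` is TRUE; FALSE and NULL both filter the row."""
    return conn.execute(f"SELECT COUNT(*) FROM t WHERE {predicate}").fetchone()[0]


not_in = rows_where(f"-1 NOT IN {EMPTY}")         # NOT IN over an empty set: TRUE
null_not_in = rows_where(f"NULL NOT IN {EMPTY}")  # still TRUE with a NULL operand
not_equal = rows_where(f"-1 != {EMPTY}")          # empty scalar subquery -> NULL -> filtered
is_null = rows_where(f"{EMPTY} IS NULL")          # the empty scalar subquery is NULL
print(not_in, null_not_in, not_equal, is_null)    # 8 8 0 8
```

The first, second and fourth predicates correspond to the three example queries in the issue. The third shows why rewriting NOT IN to != drops every row.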


> discrepancy in results with a subquery containing an agg that produces an 
> empty set
> ---
>
> Key: IMPALA-7782
> URL: https://issues.apache.org/jira/browse/IMPALA-7782
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Affects Versions: Impala 2.12.0, Impala 3.1.0
>Reporter: Michael Brown
>Assignee: Tim Armstrong
>Priority: Major
>  Labels: correctness, query_generator, ramp-up
>
> A discrepancy exists between Impala and Postgres when a subquery contains an 
> agg and results in an empty set, yet the WHERE clause looking at the subquery 
> should produce a "True" condition.
> Example queries include:
> {noformat}
> USE functional;
> SELECT id
> FROM alltypestiny
> WHERE -1 NOT IN (SELECT COUNT(id) FROM alltypestiny HAVING false);
> SELECT id
> FROM alltypestiny
> WHERE NULL NOT IN (SELECT COUNT(id) FROM alltypestiny HAVING false);
> SELECT id
> FROM alltypestiny
> WHERE (SELECT COUNT(id) FROM alltypestiny HAVING false) IS NULL;
> {noformat}
> These queries do not produce any rows in Impala. In Postgres, the queries 
> produce all 8 rows for the functional.alltypestiny id column.
> Thinking maybe there were Impala and Postgres differences with {{NOT IN}} 
> behavior, I also tried this:
> {noformat}
> USE functional;
> SELECT id
> FROM alltypestiny
> WHERE -1 NOT IN (SELECT 1 FROM alltypestiny WHERE bool_col IS NULL);
> {noformat}
> This subquery also produces an empty set just like the subquery in the 
> problematic queries at the top, but unlike those queries, this full query 
> returns the same results in Impala and Postgres (all 8 rows for the 
> functional.alltypestiny id column).
> For anyone interested in this bug, you can migrate data into postgres in a 
> dev environment using
> {noformat}
> tests/comparison/data_generator.py --use-postgresql --migrate-table-names 
> alltypestiny --db-name functional migrate
> {noformat}
> This is in 2.12 at least, so it's not a 3.1 regression.






[jira] [Resolved] (IMPALA-8547) get_json_object fails to get value for numeric key

2020-08-25 Thread Sahil Takiar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar resolved IMPALA-8547.
--
Fix Version/s: Impala 4.0
   Resolution: Fixed

> get_json_object fails to get value for numeric key
> --
>
> Key: IMPALA-8547
> URL: https://issues.apache.org/jira/browse/IMPALA-8547
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 3.1.0
>Reporter: Eugene Zimichev
>Assignee: Eugene Zimichev
>Priority: Minor
>  Labels: built-in-function
> Fix For: Impala 4.0
>
>
> {code:java}
> select get_json_object('{"1": 5}', '$.1');
> {code}
> returns error:
>  
> {code:java}
> "Expected key at position 2"
> {code}
>  
> I guess it's caused by the use of FindEndOfIdentifier, which expects the
> first character of a key to be a letter.
> Hive's version of get_json_object handles this case correctly.
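The following is a hypothetical, much-simplified Python sketch of a key scanner that accepts a leading digit. It does not reproduce Impala's FindEndOfIdentifier. It only handles dotted object paths (no arrays or wildcards), and its key rule (letters, digits, underscore) is an assumption chosen for illustration, not Impala's or Hive's documented grammar.

```python
import json


def find_end_of_key(path: str, start: int) -> int:
    """Return the index just past a key starting at `start`.

    Hypothetical rule: a key is a run of letters, digits or '_', so it may start with a digit.
    """
    end = start
    while end < len(path) and (path[end].isalnum() or path[end] == "_"):
        end += 1
    return end


def get_simple_json_object(doc: str, path: str):
    """Evaluate a path of the form $.k1.k2... against a JSON document; None if missing."""
    if not path.startswith("$"):
        raise ValueError("path must start with '$'")
    value = json.loads(doc)
    pos = 1
    while pos < len(path):
        if path[pos] != ".":
            raise ValueError(f"Expected '.' at position {pos}")
        end = find_end_of_key(path, pos + 1)
        if end == pos + 1:
            raise ValueError(f"Expected key at position {pos + 1}")
        key = path[pos + 1:end]
        if not isinstance(value, dict) or key not in value:
            return None
        value = value[key]
        pos = end
    return value


print(get_simple_json_object('{"1": 5}', '$.1'))  # 5
```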






[jira] [Resolved] (IMPALA-9957) Impalad crashes when serializing large rows in aggregation spilling

2020-08-25 Thread Quanlong Huang (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-9957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Quanlong Huang resolved IMPALA-9957.

Fix Version/s: Impala 4.0
   Resolution: Fixed

> Impalad crashes when serializing large rows in aggregation spilling
> ---
>
> Key: IMPALA-9957
> URL: https://issues.apache.org/jira/browse/IMPALA-9957
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Quanlong Huang
>Assignee: Quanlong Huang
>Priority: Critical
> Fix For: Impala 4.0
>
>
> Queries to reproduce the crash using the testdata:
> {code:sql}
> create table bigstrs stored as parquet as
>   select *, repeat(uuid(), cast(random() * 10 as int)) as bigstr
>   from functional.alltypes;
> set MAX_ROW_SIZE=3.5MB;
> set MEM_LIMIT=4GB;
> create table my_str_group stored as parquet as
>   select group_concat(string_col) as ss, bigstr
>   from bigstrs group by bigstr;
> {code}
> The last query (1) has large rows, (2) needs spilling in aggregation, and (3) 
> aggregates with functions that need serialization (e.g. group_concat, 
> appx_median, min(string), etc.). When all three conditions hold, it triggers 
> this bug.
>  The crash stacktraces differ between build modes. Crash stacktrace in a 
> RELEASE build with codegen enabled:
> {code:java}
> Thread 316 (crashed)
>  0  impalad!impala::HashTable::Close() [hash-table.cc : 512 + 0x0]
>  1  impalad!impala::GroupingAggregator::Partition::Spill(bool) 
> [grouping-aggregator-partition.cc : 180 + 0x9]
>  2  impalad!impala::GroupingAggregator::SpillPartition(bool) 
> [grouping-aggregator.cc : 904 + 0x10]
>  3  0x7f5fba83db3c
>  4  impalad!impala::GroupingAggregator::AddBatch(impala::RuntimeState*, 
> impala::RowBatch*) [grouping-aggregator.cc : 437 + 0x2]
>  5  impalad!impala::AggregationNode::Open(impala::RuntimeState*) 
> [aggregation-node.cc : 70 + 0x6]
>  6  libstdc++.so.6.0.24 + 0x120b28
>  7  
> impalad!apache::hive::service::cli::thrift::TColumnValue::printTo(std::ostream&)
>  const [converter_lexical_streams.hpp : 161 + 0x8]
>  8  impalad!impala::FragmentInstanceState::Open() [fragment-instance-state.cc 
> : 396 + 0x11]
>  9  impalad!tc_newarray + 0x171
> {code}
> Crash stacktrace in RELEASE build with codegen disabled (set 
> DISABLE_CODEGEN=true):
> {code:java}
> Thread 320 (crashed)
>  0  impalad!impala::HashTable::Close() [hash-table.cc : 512 + 0x0]
>  1  impalad!impala::GroupingAggregator::Partition::Spill(bool) 
> [grouping-aggregator-partition.cc : 180 + 0x9]
>  2  impalad!impala::GroupingAggregator::SpillPartition(bool) 
> [grouping-aggregator.cc : 904 + 0x10]
>  3  impalad!impala::Status 
> impala::GroupingAggregator::AddBatchImpl(impala::RowBatch*, 
> impala::TPrefetchMode::type, impala::HashTableCtx*) 
> [grouping-aggregator-ir.cc : 148 + 0x11]
>  4  impalad!impala::GroupingAggregator::AddBatch(impala::RuntimeState*, 
> impala::RowBatch*) [grouping-aggregator.cc : 439 + 0x5]
>  5  impalad!impala::AggregationNode::Open(impala::RuntimeState*) 
> [aggregation-node.cc : 70 + 0x6]
>  6  impalad!impala::FragmentInstanceState::Open() [fragment-instance-state.cc 
> : 396 + 0x11]
>  7  impalad!impala::FragmentInstanceState::Exec() [fragment-instance-state.cc 
> : 97 + 0x12]
>  8  impalad!impala::QueryState::ExecFInstance(impala::FragmentInstanceState*) 
> [query-state.cc : 815 + 0x19]
>  9  impalad!impala::Thread::SuperviseThread(std::__cxx11::basic_string std::char_traits, std::allocator > const&, 
> std::__cxx11::basic_string, std::allocator 
> > const&, boost::function, impala::ThreadDebugInfo const*, 
> impala::Promise*) [function_template.hpp : 770 
> + 0x7]
> 10  impalad!boost::detail::thread_data (*)(std::__cxx11::basic_string, 
> std::allocator > const&, std::__cxx11::basic_string std::char_traits, std::allocator > const&, boost::function ()>, impala::ThreadDebugInfo const*, impala::Promise (impala::PromiseMode)0>*), 
> boost::_bi::list5 std::char_traits, std::allocator > >, 
> boost::_bi::value, 
> std::allocator > >, boost::_bi::value >, 
> boost::_bi::value, 
> boost::_bi::value*> > > 
> >::run() [bind.hpp : 531 + 0xc]
> 11  impalad!thread_proxy + 0x72
> 12  libpthread-2.23.so + 0x76ba
> 13  libc-2.23.so + 0x1074dd
> {code}
> In a DEBUG build with codegen disabled, the crash happens a bit earlier, at a 
> DCHECK:
> {code:java}
> F0715 20:29:24.389505 16868 grouping-aggregator-partition.cc:125] 
> 1d4b40df02e6ad76:433ed5740003] Check failed: !status.ok() Stream was 
> unpinned - AddRow() only fails on error
> *** Check failure stack trace: ***
> @  0x513f31c  google::LogMessage::Fail()
> @  0x5140c0c  google::LogMessage::SendToLog()
> @  0x513ec7a  google::LogMessage::Flush()
> @  0x5142878  google::LogMessageFatal::~LogMessageFatal()
> @  0x28b2ca7  
> 

[jira] [Resolved] (IMPALA-9955) Internal error for a query with large rows and spilling

2020-08-25 Thread Quanlong Huang (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-9955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Quanlong Huang resolved IMPALA-9955.

Fix Version/s: Impala 4.0
   Resolution: Fixed

> Internal error for a query with large rows and spilling
> ---
>
> Key: IMPALA-9955
> URL: https://issues.apache.org/jira/browse/IMPALA-9955
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Quanlong Huang
>Assignee: Quanlong Huang
>Priority: Major
> Fix For: Impala 4.0
>
> Attachments: impalad.INFO, impalad_node1.INFO, impalad_node2.INFO
>
>
> Encountered a query failure due to an internal error:
> {code:java}
> create table bigstrs stored as parquet as select *, repeat(uuid(), 
> cast(random() * 10 as int)) as bigstr from functional.alltypes;
> set MAX_ROW_SIZE=3.5MB;
> set MEM_LIMIT=4GB;
> set DISABLE_CODEGEN=true;
> create table my_cnt stored as parquet as select count(*) as cnt, bigstr from 
> bigstrs group by bigstr;
> {code}
> The error is
> {code:java}
> ERROR: Internal error: couldn't pin large page of 4194304 bytes, client only 
> had 2097152 bytes of unused reservation:
>  0xcf9dae0 internal state: { 
> 0xbdf6ac0 name: GroupingAggregator id=3 ptr=0xcf9d900 write_status:  buffers 
> allocated 2097152 num_pages: 2094 pinned_bytes: 41943040 
> dirty_unpinned_bytes: 0 in_flight_write_bytes: 0 reservation: 
> {: reservation_limit 9223372036854775807 reservation 
> 46137344 used_reservation 44040192 child_reservations 0 parent:
> : reservation_limit 9223372036854775807 reservation 
> 46137344 used_reservation 0 child_reservations 46137344 parent:
> : reservation_limit 9223372036854775807 reservation 
> 46137344 used_reservation 0 child_reservations 46137344 parent:
> : reservation_limit 3435973836 reservation 46137344 
> used_reservation 0 child_reservations 46137344 parent:
> : reservation_limit 6647046144 reservation 46137344 
> used_reservation 0 child_reservations 46137344 parent:
> NULL}
>   12 pinned pages:  0xc9160a0 len: 2097152 pin_count: 1 
> buf:  0xc916118 client: 0xcf9dae0/0xbdf6ac0 data: 
> 0x1320 len: 2097152
>  0xc919d40 len: 4194304 pin_count: 1 buf: 
>  0xc919db8 client: 0xcf9dae0/0xbdf6ac0 data: 
> 0x12460 len: 4194304
>  0xd42aaa0 len: 4194304 pin_count: 1 buf: 
>  0xd42ab18 client: 0xcf9dae0/0xbdf6ac0 data: 
> 0x12b20 len: 4194304
>  0xd42b900 len: 4194304 pin_count: 1 buf: 
>  0xd42b978 client: 0xcf9dae0/0xbdf6ac0 data: 
> 0x132a0 len: 4194304
>  0xd42d3e0 len: 2097152 pin_count: 1 buf: 
>  0xd42d458 client: 0xcf9dae0/0xbdf6ac0 data: 
> 0xc6a0 len: 2097152
>  0xd42dd40 len: 4194304 pin_count: 1 buf: 
>  0xd42ddb8 client: 0xcf9dae0/0xbdf6ac0 data: 
> 0x132e0 len: 4194304
>  0xd42de80 len: 4194304 pin_count: 1 buf: 
>  0xd42def8 client: 0xcf9dae0/0xbdf6ac0 data: 
> 0x137c0 len: 4194304
>  0x12d48320 len: 4194304 pin_count: 1 buf: 
>  0x12d48398 client: 0xcf9dae0/0xbdf6ac0 data: 
> 0x102c0 len: 4194304
>  0x12d483c0 len: 4194304 pin_count: 1 buf: 
>  0x12d48438 client: 0xcf9dae0/0xbdf6ac0 data: 
> 0x108a0 len: 4194304
>  0x12d48780 len: 4194304 pin_count: 1 buf: 
>  0x12d487f8 client: 0xcf9dae0/0xbdf6ac0 data: 
> 0x108e0 len: 4194304
>  0x12d492c0 len: 2097152 pin_count: 1 buf: 
>  0x12d49338 client: 0xcf9dae0/0xbdf6ac0 data: 
> 0x12760 len: 2097152
>  0x12d4a9e0 len: 2097152 pin_count: 1 buf: 
>  0x12d4aa58 client: 0xcf9dae0/0xbdf6ac0 data: 
> 0x12d20 len: 2097152
>   0 dirty unpinned pages: 
>   0 in flight write pages: }
> {code}
> Found the stacktrace in the log:
> {code}
> @  0x1c9dfbe  impala::Status::Status()
> @  0x1ca5a78  impala::Status::Status()
> @  0x2bfe4ec  impala::BufferedTupleStream::NextReadPage()
> @  0x2c04b72  impala::BufferedTupleStream::GetNextInternal<>()
> @  0x2c029e6  impala::BufferedTupleStream::GetNextInternal<>()
> @  0x2bffd19  impala::BufferedTupleStream::GetNext()
> @  0x28aa43f  impala::GroupingAggregator::ProcessStream<>()
> @  0x28a2e17  impala::GroupingAggregator::BuildSpilledPartition()
> @  0x28a2401  impala::GroupingAggregator::NextPartition()
> @  0x289df5a  impala::GroupingAggregator::GetRowsFromPartition()
> @  0x289db20  impala::GroupingAggregator::GetNext()
> @  0x28dbfc7  impala::AggregationNode::GetNext()
> @  0x2259dfc  impala::FragmentInstanceState::ExecInternal()
> @  0x22564a0  impala::FragmentInstanceState::Exec()
> @  0x22801ed  impala::QueryState::ExecFInstance()
> @  0x227e5ef  
> _ZZN6impala10QueryState15StartFInstancesEvENKUlvE_clEv
> @  0x2281d8e  
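Reading the figures in the IMPALA-9955 error dump together shows why the pin fails: the client's reservation is 44 MiB, 42 MiB of it is in use, and the page to pin is 4 MiB. The sketch below only re-derives that arithmetic from numbers printed in the dump; it does not model Impala's reservation code, and treating the extra 2 MiB of used reservation as the "buffers allocated 2097152" write buffer is an assumption.

```python
# Figures copied from the IMPALA-9955 error dump; no Impala code is modelled here.
MIB = 1024 * 1024

reservation = 46137344        # "reservation 46137344"
used_reservation = 44040192   # "used_reservation 44040192"
large_page = 4194304          # "couldn't pin large page of 4194304 bytes"

# The 12 pinned pages listed in the dump, in order: four 2 MiB and eight 4 MiB pages.
pinned_pages = [2 * MIB, 4 * MIB, 4 * MIB, 4 * MIB, 2 * MIB, 4 * MIB,
                4 * MIB, 4 * MIB, 4 * MIB, 4 * MIB, 2 * MIB, 2 * MIB]
pinned_bytes = sum(pinned_pages)          # compare with "pinned_bytes: 41943040"

# Assumption: the remaining 2 MiB of used reservation is the write buffer
# reported as "buffers allocated 2097152".
write_buffer = 2097152

unused = reservation - used_reservation   # reservation still free for new pins

print(f"pinned pages: {len(pinned_pages)}, pinned_bytes: {pinned_bytes}")
print(f"used = pinned + write buffer: {pinned_bytes + write_buffer == used_reservation}")
print(f"unused reservation: {unused} bytes ({unused // MIB} MiB)")
print(f"can pin a {large_page // MIB} MiB page: {unused >= large_page}")
```

The unused reservation (2 MiB) matches the "client only had 2097152 bytes of unused reservation" part of the message, which is half of the 4 MiB large page.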
> 

[jira] [Commented] (IMPALA-8547) get_json_object fails to get value for numeric key

2020-08-25 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-8547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17184765#comment-17184765
 ] 

ASF subversion and git services commented on IMPALA-8547:
-

Commit adf2c464aed5d35c976c1439982e0a927f76b609 in impala's branch 
refs/heads/master from Eugene Zimichev
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=adf2c46 ]

IMPALA-8547: get_json_object fails to get value for numeric key

Allows numeric keys for JSON objects in get_json_object. This patch
makes Impala consistent with Hive and Postgres behavior for
get_json_object.

Queries such as "select get_json_object('{"1": 5}', '$.1');"
would fail before this patch. Now the query will return '5'.
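To illustrate the behavior change, here is a minimal Python sketch of a `$.key` path walker. `find_end_of_key` and `get_json_object` are hypothetical helpers, not Impala's C++ code: the key scanner accepts a digit in the first position, so `$.1` resolves to the `"1"` member, whereas a scanner that insisted on a leading letter would fail at position 2, as in the original report. The sketch only handles dotted object keys (no arrays or wildcards) and returns a Python value rather than a STRING.

```python
import json
from typing import Any, Optional

def find_end_of_key(path: str, start: int) -> int:
    """Return the index just past a path key that begins at `start`.

    Hypothetical helper: letters, digits and '_' are all accepted, including
    in the first position, so the key in '$.1' is "1".
    """
    end = start
    while end < len(path) and (path[end].isalnum() or path[end] == "_"):
        end += 1
    if end == start:
        # Error text mimics the one quoted in the report.
        raise ValueError(f"Expected key at position {start}")
    return end

def get_json_object(doc: str, path: str) -> Optional[Any]:
    """Follow a '$.k1.k2' path through nested JSON objects (no arrays)."""
    if not path.startswith("$"):
        raise ValueError("path must start with '$'")
    value: Any = json.loads(doc)
    pos = 1
    while pos < len(path):
        if path[pos] != ".":
            raise ValueError(f"Expected '.' at position {pos}")
        end = find_end_of_key(path, pos + 1)
        key = path[pos + 1:end]
        if not isinstance(value, dict) or key not in value:
            return None  # missing key: NULL-like result
        value = value[key]
        pos = end
    return value

print(get_json_object('{"1": 5}', '$.1'))  # 5
```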

Testing:
* Added tests to expr-test

Change-Id: I7df037ccf2c79da0ba86a46df1dd28ab0e9a45f4
Reviewed-on: http://gerrit.cloudera.org:8080/14905
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 


> get_json_object fails to get value for numeric key
> --
>
> Key: IMPALA-8547
> URL: https://issues.apache.org/jira/browse/IMPALA-8547
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 3.1.0
>Reporter: Eugene Zimichev
>Assignee: Eugene Zimichev
>Priority: Minor
>  Labels: built-in-function
>
> {code:java}
> select get_json_object('{"1": 5}', '$.1');
> {code}
> returns error:
>  
> {code:java}
> "Expected key at position 2"
> {code}
>  
> I guess it's caused by the use of FindEndOfIdentifier, which expects the 
> first symbol of the key to be a letter.
> Hive's version of get_json_object works fine in this case.






[jira] [Assigned] (IMPALA-7876) COMPUTE STATS TABLESAMPLE is not updating number of estimated rows

2020-08-25 Thread Tim Armstrong (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong reassigned IMPALA-7876:
-

Assignee: Tim Armstrong

> COMPUTE STATS TABLESAMPLE is not updating number of estimated rows
> --
>
> Key: IMPALA-7876
> URL: https://issues.apache.org/jira/browse/IMPALA-7876
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Affects Versions: Impala 3.0
>Reporter: Andre Araujo
>Assignee: Tim Armstrong
>Priority: Critical
>
> Running the command below seems to have no impact on the #rows stats.
> {code}
> [host:21000] default> COMPUTE STATS wide TABLESAMPLE SYSTEM(5);
> Query: COMPUTE STATS wide TABLESAMPLE SYSTEM(100)
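> +-------------------------------------------+
> | summary                                   |
> +-------------------------------------------+
> | Updated 1 partition(s) and 103 column(s). |
> +-------------------------------------------+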
> WARNINGS: Ignoring TABLESAMPLE because the effective sampling rate is 100%.
> The minimum sample size is COMPUTE_STATS_MIN_SAMPLE_SIZE=1.00GB and the table 
> size 20.35GB
> Fetched 1 row(s) in 43.67s
> [host:21000] default> show table stats wide;
> Query: show table stats wide
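> +-------+--------------+--------+---------+--------------+-------------------+---------+-------------------+-------------------------------------+
> | #Rows | Extrap #Rows | #Files | Size    | Bytes Cached | Cache Replication | Format  | Incremental stats | Location                            |
> +-------+--------------+--------+---------+--------------+-------------------+---------+-------------------+-------------------------------------+
> | 0     | -1           | 84     | 20.35GB | NOT CACHED   | NOT CACHED        | PARQUET | false             | hdfs://ns1/user/hive/warehouse/wide |
> +-------+--------------+--------+---------+--------------+-------------------+---------+-------------------+-------------------------------------+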
> Fetched 1 row(s) in 0.01s
> {code}






[jira] [Resolved] (IMPALA-10103) Jquery upgrade to 3.5.1

2020-08-25 Thread Tim Armstrong (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-10103.

Fix Version/s: Impala 4.0
   Resolution: Fixed

> Jquery upgrade to 3.5.1
> ---
>
> Key: IMPALA-10103
> URL: https://issues.apache.org/jira/browse/IMPALA-10103
> Project: IMPALA
>  Issue Type: Task
>Reporter: Tim Armstrong
>Assignee: Tim Armstrong
>Priority: Major
> Fix For: Impala 4.0
>
>







[jira] [Commented] (IMPALA-9962) Implement ds_kll_quantiles() function

2020-08-25 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-9962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17184750#comment-17184750
 ] 

ASF subversion and git services commented on IMPALA-9962:
-

Commit 41065845e927acef5a0ff95ef8fb32b2f86272d8 in impala's branch 
refs/heads/master from Gabor Kaszab
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=4106584 ]

IMPALA-9962: Implement ds_kll_quantiles_as_string() function

This function is very similar to ds_kll_quantile() but this one can
receive any number of rank parameters and returns a comma separated
string that holds the results for all of the given ranks.
For more details about ds_kll_quantile() see IMPALA-9959.
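As a rough illustration of the output shape only, the sketch below computes quantiles for several ranks over a plain Python list and joins them into one comma-separated string. It is not a KLL sketch: the values are exact, the nearest-rank rule (index = floor(rank * n), clamped to the last element) is an assumption, `quantiles_as_string` is a hypothetical name, and Impala's exact number formatting may differ.

```python
import math
from typing import Iterable, List

def quantiles_as_string(values: Iterable[float], *ranks: float) -> str:
    """Return one quantile per rank, joined into a comma-separated string.

    Exact stand-in for ds_kll_quantiles_as_string(), using a simple
    nearest-rank rule instead of an approximate KLL sketch.
    """
    data: List[float] = sorted(values)
    if not data:
        raise ValueError("no input values")
    if not ranks:
        raise ValueError("at least one rank is required")
    results = []
    for rank in ranks:
        if not 0.0 <= rank <= 1.0:
            raise ValueError(f"rank {rank} is outside [0, 1]")
        idx = min(math.floor(rank * len(data)), len(data) - 1)
        results.append(str(float(data[idx])))
    return ",".join(results)

# Made-up input where every value is 1, queried with the ranks from the Hive example.
print(quantiles_as_string([1, 1, 1, 1], 0, 0.1, 0.5, 1))  # 1.0,1.0,1.0,1.0
```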

Note: ds_kll_quantiles() should return an Array of floats as the result,
but that has to wait for complex type support. Until then, we
provide ds_kll_quantiles_as_string(), which can be deprecated once we
have array support. The tracking Jira for returning complex types from
functions is IMPALA-9520.

Change-Id: I76f6039977f4e14ded89a3ee4bc4e6ff855f5e7f
Reviewed-on: http://gerrit.cloudera.org:8080/16324
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 


> Implement ds_kll_quantiles() function
> -
>
> Key: IMPALA-9962
> URL: https://issues.apache.org/jira/browse/IMPALA-9962
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Reporter: Gabor Kaszab
>Assignee: Gabor Kaszab
>Priority: Major
>
> Requirements for ds_kll_quantiles()
>  - Receives a serialized KLL sketch in BINARY type (in Impala it can be 
> STRING as long as we don't have BINARY) as first parameter.
>  - Receives one or more double values to represent the quantile points.
>  - In Hive the return type is an array of doubles. However, Impala can't 
> return complex types from functions at this point so we have to find some 
> alternative approaches to implement this function.
>  ** One option is to return one column per quantile point.
>  ** Another approach is to create a comma separated string from the results 
> of this function and return that string instead of an array.
> Hive example:
> {code:java}
> select ds_kll_quantiles(ds_kll_sketch(cast(int_col as float)), 0, 0.1, 0.5, 
> 1) from table_name
> +--------------------+
> | _c0                |
> +--------------------+
> | [1.0,1.0,1.0,1.0]  |
> +--------------------+
> {code}






[jira] [Commented] (IMPALA-9520) Allow returning complex types from UDFs

2020-08-25 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-9520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17184752#comment-17184752
 ] 

ASF subversion and git services commented on IMPALA-9520:
-

Commit 41065845e927acef5a0ff95ef8fb32b2f86272d8 in impala's branch 
refs/heads/master from Gabor Kaszab
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=4106584 ]

IMPALA-9962: Implement ds_kll_quantiles_as_string() function

This function is very similar to ds_kll_quantile() but this one can
receive any number of rank parameters and returns a comma separated
string that holds the results for all of the given ranks.
For more details about ds_kll_quantile() see IMPALA-9959.

Note: ds_kll_quantiles() should return an Array of floats as the result,
but that has to wait for complex type support. Until then, we
provide ds_kll_quantiles_as_string(), which can be deprecated once we
have array support. The tracking Jira for returning complex types from
functions is IMPALA-9520.

Change-Id: I76f6039977f4e14ded89a3ee4bc4e6ff855f5e7f
Reviewed-on: http://gerrit.cloudera.org:8080/16324
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 


> Allow returning complex types from UDFs
> ---
>
> Key: IMPALA-9520
> URL: https://issues.apache.org/jira/browse/IMPALA-9520
> Project: IMPALA
>  Issue Type: New Feature
>  Components: Backend, Frontend
>Reporter: Gabor Kaszab
>Priority: Major
>  Labels: complextype
>







[jira] [Commented] (IMPALA-9959) Implement ds_kll_sketch() and ds_kll_quantile() functions

2020-08-25 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-9959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17184751#comment-17184751
 ] 

ASF subversion and git services commented on IMPALA-9959:
-

Commit 41065845e927acef5a0ff95ef8fb32b2f86272d8 in impala's branch 
refs/heads/master from Gabor Kaszab
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=4106584 ]

IMPALA-9962: Implement ds_kll_quantiles_as_string() function

This function is very similar to ds_kll_quantile() but this one can
receive any number of rank parameters and returns a comma separated
string that holds the results for all of the given ranks.
For more details about ds_kll_quantile() see IMPALA-9959.

Note: ds_kll_quantiles() should return an Array of floats as the result,
but that has to wait for complex type support. Until then, we
provide ds_kll_quantiles_as_string(), which can be deprecated once we
have array support. The tracking Jira for returning complex types from
functions is IMPALA-9520.

Change-Id: I76f6039977f4e14ded89a3ee4bc4e6ff855f5e7f
Reviewed-on: http://gerrit.cloudera.org:8080/16324
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 


> Implement ds_kll_sketch() and ds_kll_quantile() functions
> -
>
> Key: IMPALA-9959
> URL: https://issues.apache.org/jira/browse/IMPALA-9959
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend, Frontend
>Reporter: Gabor Kaszab
>Assignee: Gabor Kaszab
>Priority: Major
> Fix For: Impala 4.0
>
>
> 1:
>  STRING ds_kll_sketch(float)
>  Accepts float as parameter and returns a DataSketches KLL sketch in string 
> type (or binary once that work is submitted).
> 2:
>  FLOAT (or DOUBLE?) ds_kll_quantile(KLL sketch, double)
>  Accepts two parameters: a KLL sketch created by ds_kll_sketch() and a double 
> in [0, 1] to represent the quantile.
> At this point I'm not sure about the return type: it's either a float or a 
> double, which is subject to further investigation.
> Example:
> {code:java}
> select ds_kll_quantile(ds_kll_sketch(cast(int_col as float)), 1) from 
> table_name;
> +------+
> | _c0  |
> +------+
> | 1.0  |
> +------+
> {code}
> Some further examples found here:
>  [https://datasketches.apache.org/docs/Quantiles/QuantilesCppExample.html]
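The two-step shape described above (an aggregate that returns a serialized sketch as a STRING, and a scalar function that takes that string plus a rank in [0, 1]) can be mocked up as below. This is only a toy: `toy_kll_sketch` and `toy_kll_quantile` are hypothetical names, the "sketch" keeps every value instead of compacting like a real KLL sketch, JSON stands in for the DataSketches binary serialization format, and the nearest-rank lookup is an assumption rather than the library's rank semantics.

```python
import json
import math
from typing import Iterable

def toy_kll_sketch(values: Iterable[float]) -> str:
    """Hypothetical stand-in for ds_kll_sketch(): keeps all values, returns a string."""
    return json.dumps(sorted(float(v) for v in values))

def toy_kll_quantile(sketch: str, rank: float) -> float:
    """Hypothetical stand-in for ds_kll_quantile(): deserialize, then look up a rank."""
    if not 0.0 <= rank <= 1.0:
        raise ValueError(f"rank {rank} is outside [0, 1]")
    data = json.loads(sketch)
    if not data:
        raise ValueError("empty sketch")
    idx = min(math.floor(rank * len(data)), len(data) - 1)
    return data[idx]

sketch = toy_kll_sketch([5, 1, 3, 2, 4])
print(toy_kll_quantile(sketch, 0))    # 1.0
print(toy_kll_quantile(sketch, 0.5))  # 3.0
print(toy_kll_quantile(sketch, 1))    # 5.0
```

Passing the sketch around as an opaque string is what lets the aggregate and the scalar function be composed in one query, as in the example above.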






[jira] [Commented] (IMPALA-9957) Impalad crashes when serializing large rows in aggregation spilling

2020-08-25 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-9957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17184749#comment-17184749
 ] 

ASF subversion and git services commented on IMPALA-9957:
-

Commit e0a6e942b28909baa0f56e21e3d33adfb5eb19b7 in impala's branch 
refs/heads/master from stiga-huang
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=e0a6e94 ]

IMPALA-9955,IMPALA-9957: Fix not enough reservation for large pages in 
GroupingAggregator

The minimum requirement for a spillable operator is ((min_buffers - 2) *
default_buffer_size) + 2 * max_row_size. In the min reservation, we only
reserve space for two large pages, one for reading, the other for
writing. However, to make the non-streaming GroupingAggregator work
correctly, we have to manage these extra reservations carefully, so that
it won't run out of the min reservation when it actually needs to spill a
large page, or when it actually needs to read a large page.
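The minimum-reservation formula can be checked with a few lines of arithmetic. The function just evaluates the expression as written in the commit message; the inputs are hypothetical (min_buffers = 4 is invented for illustration, while the 2 MiB default buffer and 4 MiB large page merely echo the page sizes seen in the IMPALA-9955 dump).

```python
MIB = 1024 * 1024

def min_spillable_reservation(min_buffers: int, default_buffer_size: int,
                              max_row_size: int) -> int:
    """Evaluate ((min_buffers - 2) * default_buffer_size) + 2 * max_row_size.

    Two of the minimum buffers are sized for a max-size row instead:
    one for reading a large page and one for writing one.
    """
    if min_buffers < 2:
        raise ValueError("min_buffers must be at least 2")
    return (min_buffers - 2) * default_buffer_size + 2 * max_row_size

# Hypothetical inputs; min_buffers=4 is invented for illustration.
total = min_spillable_reservation(4, 2 * MIB, 4 * MIB)
print(total, total // MIB)  # 12582912 12
```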

Specifically, the large write page reservation is managed as follows,
depending on whether needs_serialize is true or false:
- If the aggregator needs to serialize the intermediate results when
  spilling a partition, we have to save a large page worth of
  reservation for the serialize stream, in case it needs to write large
  rows. This space can be restored once all the partitions are spilled,
  since the serialize stream is not needed again until we build/repartition
  a spilled partition and thus have pinned partitions again. If the large
  write page reservation is used, we save it back whenever possible
  after we spill or close a partition.
- If the aggregator doesn't need the serialize stream at all, we can
  restore the large write page reservation whenever we fail to add a
  large row, before spilling any partitions, and reclaim it whenever
  possible after we spill or close a partition.
A special case is when we are processing a large row and it's the last
row in building/repartitioning a spilled partition: the large write page
reservation can be restored for it regardless of whether we need the
serialize stream, because partitions will be read out after this, so
there is no need to spill.

For the large read page reservation, it's transferred to the spilled
BufferedTupleStream that we are reading in building/repartitioning a
spilled partition. The stream will restore some of it when reading a
large page, and reclaim it when the output row batch is reset. Since
the stream is read in attach_on_read mode, the large page is attached
to the row batch's buffers and only gets freed when the row
batch is reset.
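The save/restore bookkeeping described above can be pictured with a toy model. `ToyReservation` and its methods are hypothetical and much simpler than Impala's actual reservation and buffer-pool classes. The point is only that a large page worth of reservation is set aside so ordinary pins cannot consume it, and is handed back right before a large page must be pinned; without the saved amount, the final pin would fail in the same way as the IMPALA-9955 error.

```python
class ToyReservation:
    """Hypothetical model of one client's reservation, in bytes."""

    def __init__(self, total: int) -> None:
        self.total = total
        self.used = 0
        self.saved = 0  # set aside for a future large page

    def unused(self) -> int:
        return self.total - self.used - self.saved

    def save(self, nbytes: int) -> None:
        if self.unused() < nbytes:
            raise RuntimeError("not enough unused reservation to save")
        self.saved += nbytes

    def restore(self, nbytes: int) -> None:
        if self.saved < nbytes:
            raise RuntimeError("cannot restore more than was saved")
        self.saved -= nbytes

    def pin(self, nbytes: int) -> None:
        if self.unused() < nbytes:
            raise RuntimeError(f"couldn't pin page of {nbytes} bytes, "
                               f"only {self.unused()} bytes of unused reservation")
        self.used += nbytes


MIB = 1024 * 1024
LARGE_PAGE = 4 * MIB

res = ToyReservation(total=10 * MIB)
res.save(LARGE_PAGE)          # keep a large page worth in reserve
for _ in range(3):            # ordinary 2 MiB pages cannot use the saved part
    res.pin(2 * MIB)
try:
    res.pin(2 * MIB)          # assumption: in the real operator, roughly where a partition would spill
except RuntimeError as err:
    print("blocked:", err)
res.restore(LARGE_PAGE)       # hand it back just before writing a large row
res.pin(LARGE_PAGE)
print("unused after large pin:", res.unused())  # 0
```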

Tests:
- Add tests in test_spilling_large_rows (test_spilling.py) with
  different row sizes to reproduce the issue.
- One test in test_spilling_no_debug_action becomes flaky after this
  patch. Revise the query to make the udf allocate larger strings so it
  can consistently pass.
- Run CORE tests.

Change-Id: I3d9c3a2e7f0da60071b920dec979729e86459775
Reviewed-on: http://gerrit.cloudera.org:8080/16240
Tested-by: Impala Public Jenkins 
Reviewed-by: Tim Armstrong 


> Impalad crashes when serializing large rows in aggregation spilling
> ---
>
> Key: IMPALA-9957
> URL: https://issues.apache.org/jira/browse/IMPALA-9957
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Quanlong Huang
>Assignee: Quanlong Huang
>Priority: Critical
>
> Queries to reproduce the crash using the testdata:
> {code:sql}
> create table bigstrs stored as parquet as
>   select *, repeat(uuid(), cast(random() * 10 as int)) as bigstr
>   from functional.alltypes;
> set MAX_ROW_SIZE=3.5MB;
> set MEM_LIMIT=4GB;
> create table my_str_group stored as parquet as
>   select group_concat(string_col) as ss, bigstr
>   from bigstrs group by bigstr;
> {code}
> The last query (1) has large rows, (2) needs spilling in aggregation, and (3) 
> aggregates with functions that need serialization (e.g. group_concat, 
> appx_median, min(string), etc.). When all three conditions hold, it triggers 
> this bug.
>  The crash stacktraces differ between build modes. Crash stacktrace in a 
> RELEASE build with codegen enabled:
> {code:java}
> Thread 316 (crashed)
>  0  impalad!impala::HashTable::Close() [hash-table.cc : 512 + 0x0]
>  1  impalad!impala::GroupingAggregator::Partition::Spill(bool) 
> [grouping-aggregator-partition.cc : 180 + 0x9]
>  2  impalad!impala::GroupingAggregator::SpillPartition(bool) 
> [grouping-aggregator.cc : 904 + 0x10]
>  3  0x7f5fba83db3c
>  4  impalad!impala::GroupingAggregator::AddBatch(impala::RuntimeState*, 
> impala::RowBatch*) [grouping-aggregator.cc : 437 + 0x2]
>  5  impalad!impala::AggregationNode::Open(impala::RuntimeState*) 
> [aggregation-node.cc : 70 + 0x6]
>  6  libstdc++.so.6.0.24 + 0x120b28
>  7  
> impalad!apache::hive::service::cli::thrift::TColumnValue::printTo(std::ostream&)
>  

[jira] [Commented] (IMPALA-7779) Parquet Scanner can write binary data into profile

2020-08-25 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-7779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17184746#comment-17184746
 ] 

ASF subversion and git services commented on IMPALA-7779:
-

Commit 2ebf554dfdb0dc9055ef95c8f2ec4fad51f1e657 in impala's branch 
refs/heads/master from Qifan Chen
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=2ebf554 ]

IMPALA-7779 Parquet Scanner can write binary data into profile

This fix addresses a limitation where an ill-formatted Parquet version
string was not sanitized before appearing in an error message or
impalad.INFO. With the fix, any such string is
converted to a hex string first. The hex string is a sequence of
four hex digit groups separated by spaces and each group is one or
two hex digits, such as "6c 65 2e a".
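The format described above (each byte printed as lowercase hex without zero padding, with groups separated by spaces) can be reproduced in a couple of lines. This is a Python sketch of the output format only, not the C++ change itself; `to_hex_groups` is a hypothetical name, and the sample bytes were chosen to reproduce the "6c 65 2e a" example.

```python
def to_hex_groups(raw: bytes) -> str:
    """Render each byte as one or two lowercase hex digits, separated by spaces."""
    return " ".join(format(b, "x") for b in raw)

# b"le.\n" is 0x6c 0x65 0x2e 0x0a; the last byte prints as "a" (no zero padding).
print(to_hex_groups(b"le.\n"))  # 6c 65 2e a
```

Because every byte maps to printable hex digits, the resulting string is safe to embed in a profile or log regardless of the original bytes.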

Testing:
 Ran "core" tests successfully.

Change-Id: I281d6fa7cb2f88f04588110943e3e768678b9cf1
Reviewed-on: http://gerrit.cloudera.org:8080/16331
Tested-by: Impala Public Jenkins 
Reviewed-by: Sahil Takiar 


> Parquet Scanner can write binary data into profile
> --
>
> Key: IMPALA-7779
> URL: https://issues.apache.org/jira/browse/IMPALA-7779
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 3.1.0
>Reporter: Lars Volker
>Assignee: Qifan Chen
>Priority: Major
>  Labels: profile
>
> In 
> [hdfs-parquet-scanner.cc:1224|https://github.com/apache/impala/blob/master/be/src/exec/hdfs-parquet-scanner.cc#L1224]
>  we log an invalid file version string. Whatever 4 bytes that pointer 
> points to will end up in the profile. These can be non-ASCII characters, thus 
> potentially breaking tools that parse the profiles and expect their content 
> to be plain text. We should either remove the bytes from the message, or 
> escape them as hex.






[jira] [Commented] (IMPALA-9955) Internal error for a query with large rows and spilling

2020-08-25 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-9955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17184747#comment-17184747
 ] 

ASF subversion and git services commented on IMPALA-9955:
-

Commit e0a6e942b28909baa0f56e21e3d33adfb5eb19b7 in impala's branch 
refs/heads/master from stiga-huang
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=e0a6e94 ]

IMPALA-9955,IMPALA-9957: Fix not enough reservation for large pages in 
GroupingAggregator

The minimum requirement for a spillable operator is ((min_buffers - 2) *
default_buffer_size) + 2 * max_row_size. In the min reservation, we only
reserve space for two large pages, one for reading, the other for
writing. However, to make the non-streaming GroupingAggregator work
correctly, we have to manage these extra reservations carefully, so that
it won't run out of the min reservation when it actually needs to spill a
large page, or when it actually needs to read a large page.

Specifically, the large write page reservation is managed as follows,
depending on whether needs_serialize is true or false:
- If the aggregator needs to serialize the intermediate results when
  spilling a partition, we have to save a large page worth of
  reservation for the serialize stream, in case it needs to write large
  rows. This space can be restored once all the partitions are spilled,
  since the serialize stream is not needed again until we build/repartition
  a spilled partition and thus have pinned partitions again. If the large
  write page reservation is used, we save it back whenever possible
  after we spill or close a partition.
- If the aggregator doesn't need the serialize stream at all, we can
  restore the large write page reservation whenever we fail to add a
  large row, before spilling any partitions, and reclaim it whenever
  possible after we spill or close a partition.
A special case is when we are processing a large row and it's the last
row in building/repartitioning a spilled partition: the large write page
reservation can be restored for it regardless of whether we need the
serialize stream, because partitions will be read out after this, so
there is no need to spill.

For the large read page reservation, it's transferred to the spilled
BufferedTupleStream that we are reading in building/repartitioning a
spilled partition. The stream will restore some of it when reading a
large page, and reclaim it when the output row batch is reset. Since
the stream is read in attach_on_read mode, the large page is attached
to the row batch's buffers and only gets freed when the row
batch is reset.

Tests:
- Add tests in test_spilling_large_rows (test_spilling.py) with
  different row sizes to reproduce the issue.
- One test in test_spilling_no_debug_action becomes flaky after this
  patch. Revise the query to make the udf allocate larger strings so it
  can consistently pass.
- Run CORE tests.

Change-Id: I3d9c3a2e7f0da60071b920dec979729e86459775
Reviewed-on: http://gerrit.cloudera.org:8080/16240
Tested-by: Impala Public Jenkins 
Reviewed-by: Tim Armstrong 


> Internal error for a query with large rows and spilling
> ---
>
> Key: IMPALA-9955
> URL: https://issues.apache.org/jira/browse/IMPALA-9955
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Quanlong Huang
>Assignee: Quanlong Huang
>Priority: Major
> Attachments: impalad.INFO, impalad_node1.INFO, impalad_node2.INFO
>
>
> Encountered a query failure due to an internal error:
> {code:java}
> create table bigstrs stored as parquet as select *, repeat(uuid(), 
> cast(random() * 10 as int)) as bigstr from functional.alltypes;
> set MAX_ROW_SIZE=3.5MB;
> set MEM_LIMIT=4GB;
> set DISABLE_CODEGEN=true;
> create table my_cnt stored as parquet as select count(*) as cnt, bigstr from 
> bigstrs group by bigstr;
> {code}
> The error is
> {code:java}
> ERROR: Internal error: couldn't pin large page of 4194304 bytes, client only 
> had 2097152 bytes of unused reservation:
>  0xcf9dae0 internal state: { 
> 0xbdf6ac0 name: GroupingAggregator id=3 ptr=0xcf9d900 write_status:  buffers 
> allocated 2097152 num_pages: 2094 pinned_bytes: 41943040 
> dirty_unpinned_bytes: 0 in_flight_write_bytes: 0 reservation: 
> {: reservation_limit 9223372036854775807 reservation 
> 46137344 used_reservation 44040192 child_reservations 0 parent:
> : reservation_limit 9223372036854775807 reservation 
> 46137344 used_reservation 0 child_reservations 46137344 parent:
> : reservation_limit 9223372036854775807 reservation 
> 46137344 used_reservation 0 child_reservations 46137344 parent:
> : reservation_limit 3435973836 reservation 46137344 
> used_reservation 0 child_reservations 46137344 parent:
> : reservation_limit 6647046144 reservation 46137344 
> used_reservation 0 child_reservations 46137344 parent:
> NULL}
>   12 pinned pages:  0xc9160a0 

[jira] [Commented] (IMPALA-10103) Jquery upgrade to 3.5.1

2020-08-25 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-10103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17184753#comment-17184753
 ] 

ASF subversion and git services commented on IMPALA-10103:
--

Commit b46ea7664c3b38e12ea6a06e7d342273d132fbf7 in impala's branch 
refs/heads/master from Tim Armstrong
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=b46ea76 ]

IMPALA-10103: upgrade jquery to 3.5.1

Testing:
Manually clicked through most of the web UI pages
and interacted with data tables, etc.

Change-Id: Icf0445163a6bf15c56de0c6ca10798e09e0a4fcb
Reviewed-on: http://gerrit.cloudera.org:8080/16355
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 


> Jquery upgrade to 3.5.1
> ---
>
> Key: IMPALA-10103
> URL: https://issues.apache.org/jira/browse/IMPALA-10103
> Project: IMPALA
>  Issue Type: Task
>Reporter: Tim Armstrong
>Assignee: Tim Armstrong
>Priority: Major
> Fix For: Impala 4.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-7876) COMPUTE STATS TABLESAMPLE is not updating number of estimated rows

2020-08-25 Thread Vincent Tran (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-7876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17184733#comment-17184733
 ] 

Vincent Tran commented on IMPALA-7876:
--

This should reproduce the issue on master.
{noformat}
CREATE TABLE default.one_gram_p (ngram STRING, match_count INT, volume_count INT)
PARTITIONED BY (year STRING) STORED AS TEXTFILE
TBLPROPERTIES ('impala.enable.stats.extrapolation'='true');
insert into one_gram_p partition(year) values('abc',122,2564,'2010'), 
('abc',122,2564,'2010'), ('abc',122,2564,'2010'), ('abc',122,2564,'2010'), 
('abc',122,2564,'2010'), ('abc',122,2564,'2010'), ('abc',122,2564,'2010'), 
('abc',122,2564,'2010'), ('abc',122,2564,'2010'), ('abc',122,2564,'2010'), 
('abc',122,2564,'2010'), ('abc',122,2564,'2010'), ('abc',122,2564,'2010'), 
('abc',122,2564,'2010'), ('abc',122,2564,'2010'), ('abc',122,2564,'2010'), 
('abc',122,2564,'2010'), ('abc',122,2564,'2010'), ('abc',122,2564,'2010'), 
('abc',122,2564,'2010'), ('abc',122,2564,'2010'), ('abc',122,2564,'2010'), 
('abc',122,2564,'2010'), ('abc',122,2564,'2010'), ('abc',122,2564,'2010'), 
('abc',122,2564,'2010'), ('abc',122,2564,'2010'), ('abc',122,2564,'2010'), 
('abc',122,2564,'2010'), ('abc',122,2564,'2010');
set compute_stats_min_sample_size=1B;
compute stats one_gram_p tablesample system(50);
show table stats one_gram_p;{noformat}

> COMPUTE STATS TABLESAMPLE is not updating number of estimated rows
> --
>
> Key: IMPALA-7876
> URL: https://issues.apache.org/jira/browse/IMPALA-7876
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Affects Versions: Impala 3.0
>Reporter: Andre Araujo
>Priority: Critical
>
> Running the command below seems to have no impact on the #rows stats.
> {code}
> [host:21000] default> COMPUTE STATS wide TABLESAMPLE SYSTEM(5);
> Query: COMPUTE STATS wide TABLESAMPLE SYSTEM(100)
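> +-------------------------------------------+
> | summary                                   |
> +-------------------------------------------+
> | Updated 1 partition(s) and 103 column(s). |
> +-------------------------------------------+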
> WARNINGS: Ignoring TABLESAMPLE because the effective sampling rate is 100%.
> The minimum sample size is COMPUTE_STATS_MIN_SAMPLE_SIZE=1.00GB and the table 
> size 20.35GB
> Fetched 1 row(s) in 43.67s
> [host:21000] default> show table stats wide;
> Query: show table stats wide
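> +-------+--------------+--------+---------+--------------+-------------------+---------+-------------------+-------------------------------------+
> | #Rows | Extrap #Rows | #Files | Size    | Bytes Cached | Cache Replication | Format  | Incremental stats | Location                            |
> +-------+--------------+--------+---------+--------------+-------------------+---------+-------------------+-------------------------------------+
> | 0     | -1           | 84     | 20.35GB | NOT CACHED   | NOT CACHED        | PARQUET | false             | hdfs://ns1/user/hive/warehouse/wide |
> +-------+--------------+--------+---------+--------------+-------------------+---------+-------------------+-------------------------------------+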
> Fetched 1 row(s) in 0.01s
> {code}
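The warning above suggests the requested TABLESAMPLE percentage is raised to satisfy COMPUTE_STATS_MIN_SAMPLE_SIZE. A minimal sketch of that adjustment, assuming the effective rate is the larger of the requested rate and min_sample_size / table_size, capped at 100%, and that TABLESAMPLE is ignored when it reaches 100%. Both the formula and the function names are assumptions for illustration, not Impala's frontend code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical sketch (assumed logic): raise the requested sampling fraction so
// the sample covers at least min_sample_bytes, then cap it at 1.0 (100%).
double EffectiveSampleFraction(double requested_fraction, int64_t min_sample_bytes,
                               int64_t total_bytes) {
  assert(requested_fraction >= 0.0);
  if (total_bytes <= 0) return 1.0;  // Nothing to sample from: scan everything.
  double min_fraction = static_cast<double>(min_sample_bytes) / total_bytes;
  return std::min(1.0, std::max(requested_fraction, min_fraction));
}

// Under the same assumption, TABLESAMPLE is dropped when the effective fraction is 100%.
bool IgnoresTableSample(double requested_fraction, int64_t min_sample_bytes,
                        int64_t total_bytes) {
  return EffectiveSampleFraction(requested_fraction, min_sample_bytes, total_bytes) >= 1.0;
}
```

Under this assumption, SYSTEM(5) on the 20.35GB table would keep a 5% rate (5% of 20.35GB already exceeds 1GB), while the SYSTEM(100) echoed in the "Query:" line is 100% and would trigger the warning.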






[jira] [Reopened] (IMPALA-7876) COMPUTE STATS TABLESAMPLE is not updating number of estimated rows

2020-08-25 Thread Vincent Tran (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vincent Tran reopened IMPALA-7876:
--




[jira] [Commented] (IMPALA-10060) Postgres JDBC driver should be upgraded to 42.2.14

2020-08-25 Thread Tim Armstrong (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-10060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17184715#comment-17184715
 ] 

Tim Armstrong commented on IMPALA-10060:


Posted it here and running precommits - https://gerrit.cloudera.org/#/c/16362/

> Postgres JDBC driver should be upgraded to 42.2.14
> --
>
> Key: IMPALA-10060
> URL: https://issues.apache.org/jira/browse/IMPALA-10060
> Project: IMPALA
>  Issue Type: Task
>Reporter: Kevin Risden
>Assignee: Kevin Risden
>Priority: Major
> Attachments: IMPALA-10060.patch
>
>
> Impala currently uses Postgres driver version 42.2.5 which isn't up to date 
> and has a CVE associated with it. It would be good to upgrade to 42.2.14 
> which is the latest as of June 2020.
> https://mvnrepository.com/artifact/org.postgresql/postgresql






[jira] [Commented] (IMPALA-7876) COMPUTE STATS TABLESAMPLE is not updating number of estimated rows

2020-08-25 Thread Vincent Tran (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-7876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17184701#comment-17184701
 ] 

Vincent Tran commented on IMPALA-7876:
--

 
{noformat}
===Without sampling===
[:21000] default> compute stats one_gram_p1;
Query: compute stats one_gram_p1
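+-----------------------------------------+
| summary                                 |
+-----------------------------------------+
| Updated 1 partition(s) and 3 column(s). |
+-----------------------------------------+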
Fetched 1 row(s) in 1.51s
[:21000] default> show table stats one_gram_p1;
Query: show table stats one_gram_p1
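+-------+----------+--------------+--------+----------+--------------+-------------------+--------+-------------------+--------------------------------------------------------+
| year  | #Rows    | Extrap #Rows | #Files | Size     | Bytes Cached | Cache Replication | Format | Incremental stats | Location                                               |
+-------+----------+--------------+--------+----------+--------------+-------------------+--------+-------------------+--------------------------------------------------------+
| 2000  | -1       | 19013482     | 3      | 289.07MB | NOT CACHED   | NOT CACHED        | TEXT   | false             | hdfs://:8020/user/hive/warehouse/one_gram_p1/year=2000 |
| Total | 19013482 | 19013482     | 3      | 289.07MB | 0B           |                   |        |                   |                                                        |
+-------+----------+--------------+--------+----------+--------------+-------------------+--------+-------------------+--------------------------------------------------------+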
Fetched 2 row(s) in 0.01s

===With sampling===
[:21000] default> set compute_stats_min_sample_size=1MB;
COMPUTE_STATS_MIN_SAMPLE_SIZE set to 1MB
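[:21000] default> compute stats one_gram_p1 tablesample system(10);
Query: compute stats one_gram_p1 tablesample system(10)
+-----------------------------------------+
| summary                                 |
+-----------------------------------------+
| Updated 1 partition(s) and 3 column(s). |
+-----------------------------------------+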
Fetched 1 row(s) in 1.72s
[:21000] default> show table stats one_gram_p1;
Query: show table stats one_gram_p1
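+-------+-------+--------------+--------+----------+--------------+-------------------+--------+-------------------+--------------------------------------------------------+
| year  | #Rows | Extrap #Rows | #Files | Size     | Bytes Cached | Cache Replication | Format | Incremental stats | Location                                               |
+-------+-------+--------------+--------+----------+--------------+-------------------+--------+-------------------+--------------------------------------------------------+
| 2000  | -1    | -1           | 3      | 289.07MB | NOT CACHED   | NOT CACHED        | TEXT   | false             | hdfs://:8020/user/hive/warehouse/one_gram_p1/year=2000 |
| Total | 0     | -1           | 3      | 289.07MB | 0B           |                   |        |                   |                                                        |
+-------+-------+--------------+--------+----------+--------------+-------------------+--------+-------------------+--------------------------------------------------------+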
Fetched 2 row(s) in 0.01s
{noformat}
 


[jira] [Commented] (IMPALA-7876) COMPUTE STATS TABLESAMPLE is not updating number of estimated rows

2020-08-25 Thread Vincent Tran (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-7876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17184697#comment-17184697
 ] 

Vincent Tran commented on IMPALA-7876:
--

I can reproduce this on ~3.2.0. I think this may be unrelated to the width of 
the table.

The specs for my table are below:

 
{noformat}
default> show create table one_gram_p;
Query: show create table one_gram_p
CREATE TABLE default.one_gram_p ( 
 ngram STRING, 
 match_count INT, 
 volume_count INT 
 ) 
 PARTITIONED BY ( 
 year STRING 
 ) 
 STORED AS TEXTFILE 
 LOCATION 'hdfs:user/hive/warehouse/one_gram_p' 
 TBLPROPERTIES ('DO_NOT_UPDATE_STATS'='true', 'STATS_GENERATED'='TASK', 
'impala.enable.stats.extrapolation'='true', 
'impala.lastComputeStatsTime'='1598383227', 'numRows'='1430731493', 
'totalSize'='22081529047')
{noformat}
 

I need to check against the master branch next.
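The TBLPROPERTIES above carry the table-level numRows and totalSize. One plausible reading of the "Extrap #Rows" column, offered only as an assumption: the table-level numRows is scaled by the share of totalSize that a partition's files occupy, and the result is unknown (-1) when numRows is missing. ExtrapolateNumRows is a hypothetical name and this is not Impala's frontend implementation.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical sketch (assumed formula): extrapolate a row count for
// file_bytes of data from the table-level numRows / totalSize ratio.
// Returns -1 ("unknown") when the table-level stats are missing.
int64_t ExtrapolateNumRows(int64_t table_num_rows, int64_t table_total_size,
                           int64_t file_bytes) {
  assert(file_bytes >= 0);
  if (table_num_rows < 0 || table_total_size <= 0) return -1;
  double rows_per_byte = static_cast<double>(table_num_rows) / table_total_size;
  return static_cast<int64_t>(std::llround(rows_per_byte * file_bytes));
}
```

With numRows=1430731493 and totalSize=22081529047, data covering all 22081529047 bytes would extrapolate back to 1430731493 rows; with numRows unset, the sketch yields -1, the same value the "Extrap #Rows" column shows after the sampled COMPUTE STATS.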




[jira] [Commented] (IMPALA-5564) Return a profile for queries during planning

2020-08-25 Thread Sahil Takiar (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-5564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17184620#comment-17184620
 ] 

Sahil Takiar commented on IMPALA-5564:
--

A WIP patch for this was posted here: https://gerrit.cloudera.org/#/c/8434/

> Return a profile for queries during planning
> 
>
> Key: IMPALA-5564
> URL: https://issues.apache.org/jira/browse/IMPALA-5564
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Backend
>Affects Versions: Impala 2.10.0
>Reporter: Lars Volker
>Priority: Major
>  Labels: supportability
>
> During planning we currently don't return a profile from the debug webpages. 
> It would be nice to do so, to allow various monitoring tools to retrieve 
> information about queries during their planning phase.
> This could be a minimal version of the profiles with information about the 
> current state of planning, e.g. that the FE is currently waiting for metadata 
> to be loaded.






[jira] [Commented] (IMPALA-10101) Impala branch cdh5-2.9.0_5.12.1 bin/boostrip_build failed to download python requirements

2020-08-25 Thread Tim Armstrong (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-10101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17184168#comment-17184168
 ] 

Tim Armstrong commented on IMPALA-10101:


[~keens312] I looked at some of our branches and I think you probably need 
IMPALA-6682, IMPALA-6690, IMPALA-6695, IMPALA-6697, IMPALA-6731, IMPALA-6752, 
IMPALA-6863. I can't guarantee that list is complete, but it would cover 
the bulk of it.

> Impala branch cdh5-2.9.0_5.12.1 bin/boostrip_build failed to download python 
> requirements
> -
>
> Key: IMPALA-10101
> URL: https://issues.apache.org/jira/browse/IMPALA-10101
> Project: IMPALA
>  Issue Type: Bug
>  Components: Infrastructure
>Affects Versions: Impala 2.9.0
> Environment: ubuntu 1604 x86_64
>Reporter: Kevin Yu
>Priority: Major
>
> Reproduce steps:






[jira] [Commented] (IMPALA-10104) multiple if funtion and multiple-agg cause impalad crashed

2020-08-25 Thread Tim Armstrong (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-10104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17184153#comment-17184153
 ] 

Tim Armstrong commented on IMPALA-10104:


This is most likely IMPALA-8969, which was still present in CDH6.3.1.

I couldn't reproduce on master with a similar query.

{noformat}
SELECT max(date_string_col) as datekey ,
if(`string_col` like '%a','A',
if(`string_col` like '%b', 'B',
if(`string_col` like '%c', 'C',
if(`string_col` like '%d', 'D', 'E')))) test,
sum(cast(double_col AS FLOAT)) / (id/ 1000) AS ecpm,
id AS e_num,
count(DISTINCT timestamp_col) AS r_num,
sum(float_col) AS d_num,
sum(double_col) AS c_num,
count(DISTINCT tinyint_col) AS uv,
sum(bigint_col) /count(DISTINCT bigint_col) AS d_rate,
sum(int_col) /count(DISTINCT int_col) AS c_rate,
sum(cast(double_col AS FLOAT)) AS money
FROM alltypes
WHERE date_string_col = '01/13/09'
GROUP BY 2, 4;
{noformat}

> multiple if funtion and multiple-agg cause  impalad crashed
> ---
>
> Key: IMPALA-10104
> URL: https://issues.apache.org/jira/browse/IMPALA-10104
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend, Frontend
>Affects Versions: Impala 3.2.0
> Environment: CDH6.3.1
> jdk 1.8.0_131
>Reporter: lxc
>Priority: Major
>
> sql:
> SELECT max(datekey) as datekey ,
>  if(`exp` like '%a','A',
>  if(`exp` like '%b', 'B',
>  if(`exp` like '%c', 'C',
>  if(`exp` like '%d', 'D', 'E')))) test,
>  sum(cast(money AS FLOAT)) / (count(*)/ 1000) AS ecpm,
>  count(*) AS e_num,
>  count(DISTINCT aa) AS r_num,
>  sum(isd) AS d_num,
>  sum(isc) AS c_num,
>  count(DISTINCT bb) AS uv,
>  sum(isd) /count(DISTINCT aa) AS d_rate,
>  sum(isc) /count(DISTINCT aa) AS c_rate,
>  sum(cast(money AS FLOAT)) AS money
>  FROM tableA
>  WHERE datekey = '20200812'
>  GROUP BY test;
> The core dump info:
>  
> Program terminated with signal 6, Aborted.
>  #0 0x7f5498deb1f7 in raise () from /lib64/libc.so.6
>  Missing separate debuginfos, use: debuginfo-install 
> cyrus-sasl-gssapi-2.1.26-23.el7.x86_64 cyrus-sasl-lib-2.1.26-23.el7.x86_64 
> cyrus-sasl-plain-2.1.26-23.el7.x86_64 glibc-2.17-196.el7.x86_64 
> keyutils-libs-1.5.8-3.el7.x86_64 krb5-libs-1.15.1-8.el7.x86_64 
> libcom_err-1.42.9-10.el7.x86_64 libdb-5.3.21-20.el7.x86_64 
> libselinux-2.5-11.el7.x86_64 nss-softokn-freebl-3.28.3-6.el7.x86_64 
> openssl-libs-1.0.2k-8.el7.x86_64 pcre-8.32-17.el7.x86_64 
> zlib-1.2.7-17.el7.x86_64
>  (gdb) bt
>  #0 0x7f5498deb1f7 in raise () from /lib64/libc.so.6
>  #1 0x7f5498dec8e8 in abort () from /lib64/libc.so.6
>  #2 0x7f549bd7d3b5 in os::abort(bool) () from 
> /opt/jdk/jre/lib/amd64/server/libjvm.so
>  #3 0x7f549bf1f673 in VMError::report_and_die() () from 
> /opt/jdk/jre/lib/amd64/server/libjvm.so
>  #4 0x7f549bd828bf in JVM_handle_linux_signal () from 
> /opt/jdk/jre/lib/amd64/server/libjvm.so
>  #5 0x7f549bd78e13 in signalHandler(int, siginfo*, void*) () from 
> /opt/jdk/jre/lib/amd64/server/libjvm.so
>  #6 
>  #7 0x011084ff in impala_udf::FunctionContext::Free(unsigned char*) ()
>  #8 0x00b92d3f in 
> impala::AggregateFunctions::StringValSerializeOrFinalize(impala_udf::FunctionContext*,
>  impala_udf::StringVal const&) ()
>  #9 0x013207ed in 
> impala::AggFnEvaluator::SerializeOrFinalize(impala::Tuple*, 
> impala::SlotDescriptor const&, impala::Tuple*, void*) ()
>  #10 0x7f54321eb82f in ?? ()
>  #11 0x7f535bf19e00 in ?? ()
>  #12 0x0003 in ?? ()
>  #13 0x0400e360a8c5 in ?? ()
>  #14 0x0001 in ?? ()
>  #15 0x7f4fd302f1e0 in ?? ()
>  #16 0x435d70c0 in ?? ()
>  #17 0x7f535bf19f00 in ?? ()
>  #18 0x7f532b536000 in ?? ()
>  #19 0x12effc08 in ?? ()
>  #20 0x1e39d5d0 in ?? ()
>  #21 0x0400 in ?? ()
>  #22 0x in ?? ()
>  






[jira] [Commented] (IMPALA-10060) Postgres JDBC driver should be upgraded to 42.2.14

2020-08-25 Thread Kevin Risden (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-10060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17184066#comment-17184066
 ] 

Kevin Risden commented on IMPALA-10060:
---

I attached a patch [^IMPALA-10060.patch] instead of using the Gerrit setup, 
because of the excessive permissions that Gerrit requests.

The patch was generated with:

{code:java}
git format-patch -1 HEAD --stdout > ~/Downloads/IMPALA-10060.patch
{code}





[jira] [Updated] (IMPALA-10060) Postgres JDBC driver should be upgraded to 42.2.14

2020-08-25 Thread Kevin Risden (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Risden updated IMPALA-10060:
--
Attachment: IMPALA-10060.patch




[jira] [Commented] (IMPALA-9952) Parquet with lz4 ColumnIndex filter error

2020-08-25 Thread Jira


[ 
https://issues.apache.org/jira/browse/IMPALA-9952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17184011#comment-17184011
 ] 

Zoltán Borók-Nagy commented on IMPALA-9952:
---

[~guojingfeng] Could you tell me the schema of your table? Is the data sorted 
by any of the columns? What kind of queries hit this bug?

Also, were you able to reproduce this bug on an obfuscated data set that can be 
shared?

> Parquet with lz4 ColumnIndex filter error
> -
>
> Key: IMPALA-9952
> URL: https://issues.apache.org/jira/browse/IMPALA-9952
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 3.4.0
>Reporter: guojingfeng
>Priority: Major
>
> When reading a Parquet file with the LZ4 compression codec, we encountered the 
> following error:
> {code:java}
> I0714 16:11:48.307806 1075820 runtime-state.cc:207] 
> 8c43203adb2d4fc8:0478df9b018b] Error from query 
> 8c43203adb2d4fc8:0478df9b: Invalid offset index in Parquet file 
> hdfs://path/4844de7af4545a39-e8ebc7da005f_2015704758_data.0.parq.
> I0714 16:11:48.834901 1075838 status.cc:126] 
> 8c43203adb2d4fc8:0478df9b02c0] Invalid offset index in Parquet file 
> hdfs://path/4844de7af4545a39-e8ebc7da005f_2015704758_data.0.parq.
> @   0xbf4ef9
> @  0x1748c41
> @  0x174e170
> @  0x1750e58
> @  0x17519f0
> @  0x1748559
> @  0x1510b41
> @  0x1512c8f
> @  0x137488a
> @  0x1375759
> @  0x1b48a19
> @ 0x7f34509f5e24
> @ 0x7f344d5ed35c
> I0714 16:11:48.835763 1075838 runtime-state.cc:207] 
> 8c43203adb2d4fc8:0478df9b02c0] Error from query 
> 8c43203adb2d4fc8:0478df9b: Invalid offset index in Parquet file 
> hdfs://path/4844de7af4545a39-e8ebc7da005f_2015704758_data.0.parq.
> I0714 16:11:48.893784 1075820 status.cc:126] 
> 8c43203adb2d4fc8:0478df9b018b] Top level rows aren't in sync during page 
> filtering in file 
> hdfs://path/4844de7af4545a39-e8ebc7da005f_2015704758_data.0.parq.
> @   0xbf4ef9
> @  0x1749104
> @  0x17494cc
> @  0x1751aee
> @  0x1748559
> @  0x1510b41
> @  0x1512c8f
> @  0x137488a
> @  0x1375759
> @  0x1b48a19
> @ 0x7f34509f5e24
> @ 0x7f344d5ed35c
> {code}
>  Corresponding source code:
> {code:java}
> Status HdfsParquetScanner::CheckPageFiltering() {
>   if (candidate_ranges_.empty() || scalar_readers_.empty()) return Status::OK();
>   int64_t current_row = scalar_readers_[0]->LastProcessedRow();
>   for (int i = 1; i < scalar_readers_.size(); ++i) {
>     if (current_row != scalar_readers_[i]->LastProcessedRow()) {
>       DCHECK(false);
>       return Status(Substitute(
>           "Top level rows aren't in sync during page filtering in file $0.",
>           filename()));
>     }
>   }
>   return Status::OK();
> }
> {code}






[jira] [Created] (IMPALA-10108) Implement ds_kll_stringify function

2020-08-25 Thread Gabor Kaszab (Jira)
Gabor Kaszab created IMPALA-10108:
-

 Summary: Implement ds_kll_stringify function
 Key: IMPALA-10108
 URL: https://issues.apache.org/jira/browse/IMPALA-10108
 Project: IMPALA
  Issue Type: New Feature
  Components: Backend
Reporter: Gabor Kaszab


ds_kll_stringify() receives a string that is a serialized Apache DataSketches 
sketch and returns its stringified format by invoking the related function on 
the sketch's interface.






[jira] [Updated] (IMPALA-10107) Implement HLL functions to have full compatibility with Hive

2020-08-25 Thread Gabor Kaszab (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gabor Kaszab updated IMPALA-10107:
--
Priority: Minor  (was: Major)

> Implement HLL functions to have full compatibility with Hive
> 
>
> Key: IMPALA-10107
> URL: https://issues.apache.org/jira/browse/IMPALA-10107
> Project: IMPALA
>  Issue Type: New Feature
>  Components: Backend
>Reporter: Gabor Kaszab
>Priority: Minor
>
> ds_hll_estimate_bounds
> ds_hll_stringify
> ds_hll_union_f
> For parameters and expected behaviour check Hive.






[jira] [Created] (IMPALA-10107) Implement HLL functions to have full compatibility with Hive

2020-08-25 Thread Gabor Kaszab (Jira)
Gabor Kaszab created IMPALA-10107:
-

 Summary: Implement HLL functions to have full compatibility with 
Hive
 Key: IMPALA-10107
 URL: https://issues.apache.org/jira/browse/IMPALA-10107
 Project: IMPALA
  Issue Type: New Feature
  Components: Backend
Reporter: Gabor Kaszab


ds_hll_estimate_bounds
ds_hll_stringify
ds_hll_union_f

For parameters and expected behaviour check Hive.






[jira] [Created] (IMPALA-10106) Update DataSketches

2020-08-25 Thread Adam Tamas (Jira)
Adam Tamas created IMPALA-10106:
---

 Summary: Update DataSketches
 Key: IMPALA-10106
 URL: https://issues.apache.org/jira/browse/IMPALA-10106
 Project: IMPALA
  Issue Type: Improvement
Reporter: Adam Tamas


Update the external DataSketches files for HLL/KLL to version 2.1.x






[jira] [Assigned] (IMPALA-10106) Update DataSketches

2020-08-25 Thread Adam Tamas (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam Tamas reassigned IMPALA-10106:
---

Assignee: Adam Tamas

> Update DataSketches
> ---
>
> Key: IMPALA-10106
> URL: https://issues.apache.org/jira/browse/IMPALA-10106
> Project: IMPALA
>  Issue Type: Improvement
>Reporter: Adam Tamas
>Assignee: Adam Tamas
>Priority: Minor
>
> Update the external DataSketches files for HLL/KLL to version 2.1.x






[jira] [Updated] (IMPALA-10105) Implement KLL functions that return array of doubles

2020-08-25 Thread Gabor Kaszab (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gabor Kaszab updated IMPALA-10105:
--
Summary: Implement KLL functions that return array of doubles  (was: 
Rewrite KLL functions to return array of doubles)

> Implement KLL functions that return array of doubles
> 
>
> Key: IMPALA-10105
> URL: https://issues.apache.org/jira/browse/IMPALA-10105
> Project: IMPALA
>  Issue Type: New Feature
>  Components: Backend
>Reporter: Gabor Kaszab
>Priority: Major
>
> There are functions that were originally meant to return Array, but since 
> Impala doesn't support returning complex types, these functions were 
> implemented to return a comma-separated string.
> To avoid breaking compatibility, these functions got an "_as_string" suffix:
> - ds_kll_quantiles_as_string()
> - ds_kll_pmf_as_string()
> - ds_kll_cdf_as_string()
> This ticket is to implement the versions of these functions that actually 
> return an array of doubles.
> - ds_kll_quantiles()
> - ds_kll_pmf()
> - ds_kll_cdf()
> This is the Jira that tracks support for complex types as return types:
> https://issues.apache.org/jira/browse/IMPALA-9520






[jira] [Created] (IMPALA-10105) Rewrite KLL functions to return array of doubles

2020-08-25 Thread Gabor Kaszab (Jira)
Gabor Kaszab created IMPALA-10105:
-

 Summary: Rewrite KLL functions to return array of doubles
 Key: IMPALA-10105
 URL: https://issues.apache.org/jira/browse/IMPALA-10105
 Project: IMPALA
  Issue Type: New Feature
  Components: Backend
Reporter: Gabor Kaszab


There are functions that were originally meant to return Array, but since 
Impala doesn't support returning complex types, these functions were 
implemented to return a comma-separated string.

To avoid breaking compatibility, these functions got an "_as_string" suffix:
- ds_kll_quantiles_as_string()
- ds_kll_pmf_as_string()
- ds_kll_cdf_as_string()

This ticket is to implement the versions of these functions that actually return 
an array of doubles.
- ds_kll_quantiles()
- ds_kll_pmf()
- ds_kll_cdf()

This is the Jira that tracks support for complex types as return types:
https://issues.apache.org/jira/browse/IMPALA-9520
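Until the array-returning versions exist, a client consuming the "_as_string" variants has to split the result itself. A minimal sketch, assuming the output is plain decimal numbers separated by commas with no surrounding brackets (that format is an assumption; check the function's actual output). ParseDoubleList is a hypothetical client-side helper, not part of Impala.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: turn a comma-separated list such as "0.0,0.4,1.0"
// (assumed output format of the *_as_string functions) into doubles.
std::vector<double> ParseDoubleList(const std::string& csv) {
  std::vector<double> values;
  std::stringstream stream(csv);
  std::string token;
  while (std::getline(stream, token, ',')) {
    // std::stod skips leading whitespace and throws on malformed input.
    if (!token.empty()) values.push_back(std::stod(token));
  }
  return values;
}
```

For example, ParseDoubleList("0.0,0.4,1.0") yields three values, and an empty string yields an empty vector.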






[jira] [Updated] (IMPALA-10020) Implement ds_kll_cdf() function

2020-08-25 Thread Gabor Kaszab (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gabor Kaszab updated IMPALA-10020:
--
Description: 
Requirements for ds_kll_cdf() (Cumulative Distribution Function):
 - Receives a serialized KLL sketch in BINARY type (in Impala it can be STRING 
as long as we don't have BINARY) as first parameter.
 - Receives one or more float values to create ranges from the sketched data.
 - In Hive the return type is an array of doubles. However, Impala can't return 
complex types from functions at this point so we have to find some alternative 
approaches to implement this function. Follow whatever solution came up in 
https://issues.apache.org/jira/browse/IMPALA-9962

An example:
{code:java}
select ds_kll_cdf(sketch_col, 1, 2, 3, 4) from sketches_table;
{code}
This will generate the following ranges: (-inf,1), (-inf,2), (-inf,3), 
(-inf,4), (-inf,+inf)
In Hive, the result would be an array of 5 doubles, one for each of the 5 
ranges, where each number is the probability in [0,1] that an item falls into 
that range; in other words, the fraction of items belonging to that range.
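The range rule above (N split points give N+1 cumulative ranges, the last one unbounded) can be sketched in Python. This only illustrates the expected output shape; cdf_ranges is a hypothetical helper, not part of Impala or Hive:

```python
from typing import List, Sequence


def cdf_ranges(split_points: Sequence[float]) -> List[str]:
    """Describe the cumulative ranges a CDF over these split points covers:
    one open range (-inf, p) per split point (values strictly less than p),
    plus a final (-inf, +inf) range covering every item."""
    ranges = [f"(-inf,{p:g})" for p in split_points]
    ranges.append("(-inf,+inf)")
    return ranges


print(cdf_ranges([1, 2, 3, 4]))
# ['(-inf,1)', '(-inf,2)', '(-inf,3)', '(-inf,4)', '(-inf,+inf)']
```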

For example, given a sketch built from the input values 1,2,3,4,5:
{code:java}
select ds_kll_cdf(f, 1, 3, 4, 5, 10) from kll_sketches;
+----------------------------+
| _c0                        |
+----------------------------+
| [0.0,0.4,0.6,0.8,1.0,1.0]  |
+----------------------------+
{code}
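To make the expected numbers concrete, here is a minimal Python sketch that computes the same fractions exactly from the raw input values instead of from a KLL sketch (a real KLL sketch only approximates them). It assumes the "strictly less than the split point" reading that the example output implies; exact_cdf is a hypothetical helper:

```python
from bisect import bisect_left
from typing import List, Sequence


def exact_cdf(values: Sequence[float], split_points: Sequence[float]) -> List[float]:
    """Exact (non-sketched) CDF: for each split point p, the fraction of
    values strictly less than p, followed by 1.0 for the (-inf, +inf) range."""
    if not values:
        raise ValueError("need at least one value")
    ordered = sorted(values)
    n = len(ordered)
    # bisect_left on sorted data returns how many values are < p.
    result = [bisect_left(ordered, p) / n for p in split_points]
    result.append(1.0)  # the final unbounded range contains every item
    return result


print(exact_cdf([1, 2, 3, 4, 5], [1, 3, 4, 5, 10]))
# [0.0, 0.4, 0.6, 0.8, 1.0, 1.0]
```

With five split points the result has six entries, matching the Hive output shown above.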

  was:
Requirements for ds_kll_cdf() (Cumulative Distribution Function):
 - Receives a serialized KLL sketch of BINARY type as the first parameter (in 
Impala it can be STRING until BINARY is supported).
 - Receives one or more double values to create ranges from the sketched data.
 - In Hive the return type is an array of doubles. However, Impala can't 
currently return complex types from functions, so we have to find an 
alternative approach to implement this function. Follow the solution chosen 
in https://issues.apache.org/jira/browse/IMPALA-9962

An example:
{code:java}
select ds_kll_cdf(sketch_col, 1, 2, 3, 4) from sketches_table;
{code}
This will generate the following ranges: (-inf,1), (-inf,2), (-inf,3), 
(-inf,4), (-inf,+inf)
In Hive, the result would be an array of 5 doubles, one for each of the 5 
ranges, where each number is the probability in [0,1] that an item falls into 
that range; in other words, the fraction of items belonging to that range.

For example, given a sketch built from the input values 1,2,3,4,5:
{code:java}
select ds_kll_cdf(f, 1, 3, 4, 5, 10) from kll_sketches;
+----------------------------+
| _c0                        |
+----------------------------+
| [0.0,0.4,0.6,0.8,1.0,1.0]  |
+----------------------------+
{code}


> Implement ds_kll_cdf() function
> ---
>
> Key: IMPALA-10020
> URL: https://issues.apache.org/jira/browse/IMPALA-10020
> Project: IMPALA
>  Issue Type: New Feature
>  Components: Backend, Frontend
>Reporter: Gabor Kaszab
>Assignee: Gabor Kaszab
>Priority: Major
>
> Requirements for ds_kll_cdf() (Cumulative Distribution Function):
>  - Receives a serialized KLL sketch of BINARY type as the first parameter 
> (in Impala it can be STRING until BINARY is supported).
>  - Receives one or more float values to create ranges from the sketched data.
>  - In Hive the return type is an array of doubles. However, Impala can't 
> currently return complex types from functions, so we have to find an 
> alternative approach to implement this function. Follow the solution chosen 
> in https://issues.apache.org/jira/browse/IMPALA-9962
> An example:
> {code:java}
> select ds_kll_cdf(sketch_col, 1, 2, 3, 4) from sketches_table;
> {code}
> This will generate the following ranges: (-inf,1), (-inf,2), (-inf,3), 
> (-inf,4), (-inf,+inf)
> In Hive, the result would be an array of 5 doubles, one for each of the 5 
> ranges, where each number is the probability in [0,1] that an item falls 
> into that range; in other words, the fraction of items belonging to that 
> range.
> For example, given a sketch built from the input values 1,2,3,4,5:
> {code:java}
> select ds_kll_cdf(f, 1, 3, 4, 5, 10) from kll_sketches;
> +----------------------------+
> | _c0                        |
> +----------------------------+
> | [0.0,0.4,0.6,0.8,1.0,1.0]  |
> +----------------------------+
> {code}






[jira] [Updated] (IMPALA-10020) Implement ds_kll_cdf() function

2020-08-25 Thread Gabor Kaszab (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gabor Kaszab updated IMPALA-10020:
--
Description: 
Requirements for ds_kll_cdf() (Cumulative Distribution Function):
 - Receives a serialized KLL sketch of BINARY type as the first parameter (in 
Impala it can be STRING until BINARY is supported).
 - Receives one or more double values to create ranges from the sketched data.
 - In Hive the return type is an array of doubles. However, Impala can't 
currently return complex types from functions, so we have to find an 
alternative approach to implement this function. Follow the solution chosen 
in https://issues.apache.org/jira/browse/IMPALA-9962

An example:
{code:java}
select ds_kll_cdf(sketch_col, 1, 2, 3, 4) from sketches_table;
{code}
This will generate the following ranges: (-inf,1), (-inf,2), (-inf,3), 
(-inf,4), (-inf,+inf)
In Hive, the result would be an array of 5 doubles, one for each of the 5 
ranges, where each number is the probability in [0,1] that an item falls into 
that range; in other words, the fraction of items belonging to that range.

For example, given a sketch built from the input values 1,2,3,4,5:
{code:java}
select ds_kll_cdf(f, 1, 3, 4, 5, 10) from kll_sketches;
+----------------------------+
| _c0                        |
+----------------------------+
| [0.0,0.4,0.6,0.8,1.0,1.0]  |
+----------------------------+
{code}

  was:
Requirements for ds_kll_cdf() (Cumulative Distribution Function):
 - Receives a serialized KLL sketch of BINARY type as the first parameter (in 
Impala it can be STRING until BINARY is supported).
 - Receives one or more double values to create ranges from the sketched data.
 - In Hive the return type is an array of doubles. However, Impala can't 
currently return complex types from functions, so we have to find an 
alternative approach to implement this function. Follow the solution chosen 
in https://issues.apache.org/jira/browse/IMPALA-9962

An example:
{code:java}
select ds_kll_cdf(sketch_col, 1, 2, 3, 4) from sketches_table;
{code}
This will generate the following ranges: (-inf,1), (-inf,2), (-inf,3), 
(-inf,4), [4,+inf)
In Hive, the result would be an array of 5 doubles, one for each of the 5 
ranges, where each number is the probability in [0,1] that an item falls into 
that range; in other words, the fraction of items belonging to that range.

For example, given a sketch built from the input values 1,2,3,4,5:
{code:java}
select ds_kll_cdf(f, 1, 3, 4, 5, 10) from kll_sketches;
+----------------------------+
| _c0                        |
+----------------------------+
| [0.0,0.4,0.6,0.8,1.0,1.0]  |
+----------------------------+
{code}


> Implement ds_kll_cdf() function
> ---
>
> Key: IMPALA-10020
> URL: https://issues.apache.org/jira/browse/IMPALA-10020
> Project: IMPALA
>  Issue Type: New Feature
>  Components: Backend, Frontend
>Reporter: Gabor Kaszab
>Assignee: Gabor Kaszab
>Priority: Major
>
> Requirements for ds_kll_cdf() (Cumulative Distribution Function):
>  - Receives a serialized KLL sketch of BINARY type as the first parameter 
> (in Impala it can be STRING until BINARY is supported).
>  - Receives one or more double values to create ranges from the sketched data.
>  - In Hive the return type is an array of doubles. However, Impala can't 
> currently return complex types from functions, so we have to find an 
> alternative approach to implement this function. Follow the solution chosen 
> in https://issues.apache.org/jira/browse/IMPALA-9962
> An example:
> {code:java}
> select ds_kll_cdf(sketch_col, 1, 2, 3, 4) from sketches_table;
> {code}
> This will generate the following ranges: (-inf,1), (-inf,2), (-inf,3), 
> (-inf,4), (-inf,+inf)
> In Hive, the result would be an array of 5 doubles, one for each of the 5 
> ranges, where each number is the probability in [0,1] that an item falls 
> into that range; in other words, the fraction of items belonging to that 
> range.
> For example, given a sketch built from the input values 1,2,3,4,5:
> {code:java}
> select ds_kll_cdf(f, 1, 3, 4, 5, 10) from kll_sketches;
> +----------------------------+
> | _c0                        |
> +----------------------------+
> | [0.0,0.4,0.6,0.8,1.0,1.0]  |
> +----------------------------+
> {code}






[jira] [Assigned] (IMPALA-10020) Implement ds_kll_cdf() function

2020-08-25 Thread Gabor Kaszab (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gabor Kaszab reassigned IMPALA-10020:
-

Assignee: Gabor Kaszab

> Implement ds_kll_cdf() function
> ---
>
> Key: IMPALA-10020
> URL: https://issues.apache.org/jira/browse/IMPALA-10020
> Project: IMPALA
>  Issue Type: New Feature
>  Components: Backend, Frontend
>Reporter: Gabor Kaszab
>Assignee: Gabor Kaszab
>Priority: Major
>
> Requirements for ds_kll_cdf() (Cumulative Distribution Function):
>  - Receives a serialized KLL sketch of BINARY type as the first parameter 
> (in Impala it can be STRING until BINARY is supported).
>  - Receives one or more double values to create ranges from the sketched data.
>  - In Hive the return type is an array of doubles. However, Impala can't 
> currently return complex types from functions, so we have to find an 
> alternative approach to implement this function. Follow the solution chosen 
> in https://issues.apache.org/jira/browse/IMPALA-9962
> An example:
> {code:java}
> select ds_kll_cdf(sketch_col, 1, 2, 3, 4) from sketches_table;
> {code}
> This will generate the following ranges: (-inf,1), (-inf,2), (-inf,3), 
> (-inf,4), [4,+inf)
> In Hive, the result would be an array of 5 doubles, one for each of the 5 
> ranges, where each number is the probability in [0,1] that an item falls 
> into that range; in other words, the fraction of items belonging to that 
> range.
> For example, given a sketch built from the input values 1,2,3,4,5:
> {code:java}
> select ds_kll_cdf(f, 1, 3, 4, 5, 10) from kll_sketches;
> +----------------------------+
> | _c0                        |
> +----------------------------+
> | [0.0,0.4,0.6,0.8,1.0,1.0]  |
> +----------------------------+
> {code}






[jira] [Work started] (IMPALA-10020) Implement ds_kll_cdf() function

2020-08-25 Thread Gabor Kaszab (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on IMPALA-10020 started by Gabor Kaszab.
-
> Implement ds_kll_cdf() function
> ---
>
> Key: IMPALA-10020
> URL: https://issues.apache.org/jira/browse/IMPALA-10020
> Project: IMPALA
>  Issue Type: New Feature
>  Components: Backend, Frontend
>Reporter: Gabor Kaszab
>Assignee: Gabor Kaszab
>Priority: Major
>
> Requirements for ds_kll_cdf() (Cumulative Distribution Function):
>  - Receives a serialized KLL sketch of BINARY type as the first parameter 
> (in Impala it can be STRING until BINARY is supported).
>  - Receives one or more double values to create ranges from the sketched data.
>  - In Hive the return type is an array of doubles. However, Impala can't 
> currently return complex types from functions, so we have to find an 
> alternative approach to implement this function. Follow the solution chosen 
> in https://issues.apache.org/jira/browse/IMPALA-9962
> An example:
> {code:java}
> select ds_kll_cdf(sketch_col, 1, 2, 3, 4) from sketches_table;
> {code}
> This will generate the following ranges: (-inf,1), (-inf,2), (-inf,3), 
> (-inf,4), [4,+inf)
> In Hive, the result would be an array of 5 doubles, one for each of the 5 
> ranges, where each number is the probability in [0,1] that an item falls 
> into that range; in other words, the fraction of items belonging to that 
> range.
> For example, given a sketch built from the input values 1,2,3,4,5:
> {code:java}
> select ds_kll_cdf(f, 1, 3, 4, 5, 10) from kll_sketches;
> +----------------------------+
> | _c0                        |
> +----------------------------+
> | [0.0,0.4,0.6,0.8,1.0,1.0]  |
> +----------------------------+
> {code}






[jira] [Updated] (IMPALA-10104) multiple if funtion and multiple-agg cause impalad crashed

2020-08-25 Thread lxc (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lxc updated IMPALA-10104:
-
Description: 
sql:

SELECT max(datekey) as datekey ,
 if(`exp` like '%a','A',
 if(`exp` like '%b', 'B',
 if(`exp` like '%c', 'C',
 if(`exp` like '%d', 'D', 'E')))) test,
 sum(cast(money AS FLOAT)) / (count(*)/ 1000) AS ecpm,
 count(*) AS e_num,
 count(DISTINCT aa) AS r_num,
 sum(isd) AS d_num,
 sum(isc) AS c_num,
 count(DISTINCT bb) AS uv,
 sum(isd) /count(DISTINCT aa) AS d_rate,
 sum(isc) /count(DISTINCT aa) AS c_rate,
 sum(cast(money AS FLOAT)) AS money
 FROM tableA
 WHERE datekey = '20200812'
 GROUP BY test;

the coredump info:

 

Program terminated with signal 6, Aborted.
 #0 0x7f5498deb1f7 in raise () from /lib64/libc.so.6
 Missing separate debuginfos, use: debuginfo-install 
cyrus-sasl-gssapi-2.1.26-23.el7.x86_64 cyrus-sasl-lib-2.1.26-23.el7.x86_64 
cyrus-sasl-plain-2.1.26-23.el7.x86_64 glibc-2.17-196.el7.x86_64 
keyutils-libs-1.5.8-3.el7.x86_64 krb5-libs-1.15.1-8.el7.x86_64 
libcom_err-1.42.9-10.el7.x86_64 libdb-5.3.21-20.el7.x86_64 
libselinux-2.5-11.el7.x86_64 nss-softokn-freebl-3.28.3-6.el7.x86_64 
openssl-libs-1.0.2k-8.el7.x86_64 pcre-8.32-17.el7.x86_64 
zlib-1.2.7-17.el7.x86_64
 (gdb) bt
 #0 0x7f5498deb1f7 in raise () from /lib64/libc.so.6
 #1 0x7f5498dec8e8 in abort () from /lib64/libc.so.6
 #2 0x7f549bd7d3b5 in os::abort(bool) () from 
/opt/jdk/jre/lib/amd64/server/libjvm.so
 #3 0x7f549bf1f673 in VMError::report_and_die() () from 
/opt/jdk/jre/lib/amd64/server/libjvm.so
 #4 0x7f549bd828bf in JVM_handle_linux_signal () from 
/opt/jdk/jre/lib/amd64/server/libjvm.so
 #5 0x7f549bd78e13 in signalHandler(int, siginfo*, void*) () from 
/opt/jdk/jre/lib/amd64/server/libjvm.so
 #6 
 #7 0x011084ff in impala_udf::FunctionContext::Free(unsigned char*) ()
 #8 0x00b92d3f in 
impala::AggregateFunctions::StringValSerializeOrFinalize(impala_udf::FunctionContext*,
 impala_udf::StringVal const&) ()
 #9 0x013207ed in 
impala::AggFnEvaluator::SerializeOrFinalize(impala::Tuple*, 
impala::SlotDescriptor const&, impala::Tuple*, void*) ()
 #10 0x7f54321eb82f in ?? ()
 #11 0x7f535bf19e00 in ?? ()
 #12 0x0003 in ?? ()
 #13 0x0400e360a8c5 in ?? ()
 #14 0x0001 in ?? ()
 #15 0x7f4fd302f1e0 in ?? ()
 #16 0x435d70c0 in ?? ()
 #17 0x7f535bf19f00 in ?? ()
 #18 0x7f532b536000 in ?? ()
 #19 0x12effc08 in ?? ()
 #20 0x1e39d5d0 in ?? ()
 #21 0x0400 in ?? ()
 #22 0x in ?? ()

 

  was:
sql:

SELECT max(datekey) as datekey ,
if(`exp` like '%a','A',
if(`exp` like '%b', 'B',
if(`exp` like '%c', 'C',
if(`exp` like '%d', 'D', 'E')))) test,
sum(cast(money AS FLOAT)) / (count(*)/ 1000) AS ecpm,
count(*) AS e_num,
count(DISTINCT aa) AS r_num,
sum(isd) AS d_num,
sum(isc) AS c_num,
count(DISTINCT bb) AS uv,
sum(isd) /count(DISTINCT aa) AS d_rate,
sum(isc) /count(DISTINCT aa) AS c_rate,
sum(cast(money AS FLOAT)) AS money
FROM tableA
WHERE datekey = '20200812'
GROUP BY test;

the coredump info:

 

Program terminated with signal 6, Aborted.
#0 0x7f5498deb1f7 in raise () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install 
cyrus-sasl-gssapi-2.1.26-23.el7.x86_64 cyrus-sasl-lib-2.1.26-23.el7.x86_64 
cyrus-sasl-plain-2.1.26-23.el7.x86_64 glibc-2.17-196.el7.x86_64 
keyutils-libs-1.5.8-3.el7.x86_64 krb5-libs-1.15.1-8.el7.x86_64 
libcom_err-1.42.9-10.el7.x86_64 libdb-5.3.21-20.el7.x86_64 
libselinux-2.5-11.el7.x86_64 nss-softokn-freebl-3.28.3-6.el7.x86_64 
openssl-libs-1.0.2k-8.el7.x86_64 pcre-8.32-17.el7.x86_64 
zlib-1.2.7-17.el7.x86_64
(gdb) bt
#0 0x7f5498deb1f7 in raise () from /lib64/libc.so.6
#1 0x7f5498dec8e8 in abort () from /lib64/libc.so.6
#2 0x7f549bd7d3b5 in os::abort(bool) () from 
/opt/jdk/jre/lib/amd64/server/libjvm.so
#3 0x7f549bf1f673 in VMError::report_and_die() () from 
/opt/jdk/jre/lib/amd64/server/libjvm.so
#4 0x7f549bd828bf in JVM_handle_linux_signal () from 
/opt/jdk/jre/lib/amd64/server/libjvm.so
#5 0x7f549bd78e13 in signalHandler(int, siginfo*, void*) () from 
/opt/jdk/jre/lib/amd64/server/libjvm.so
#6 
#7 0x011084ff in impala_udf::FunctionContext::Free(unsigned char*) ()
#8 0x00b92d3f in 
impala::AggregateFunctions::StringValSerializeOrFinalize(impala_udf::FunctionContext*,
 impala_udf::StringVal const&) ()
#9 0x013207ed in 
impala::AggFnEvaluator::SerializeOrFinalize(impala::Tuple*, 
impala::SlotDescriptor const&, impala::Tuple*, void*) ()
#10 0x7f54321eb82f in ?? ()
#11 0x7f535bf19e00 in ?? ()
#12 0x0003 in ?? ()
#13 0x0400e360a8c5 in ?? ()
#14 0x0001 in ?? ()
#15 0x7f4fd302f1e0 in ?? ()
#16 0x435d70c0 in ?? ()
#17 0x7f535bf19f00 in ?? ()
#18 0x7f532b536000 in ?? ()
#19 0x12effc08 in ?? ()
#20 0x1e39d5d0 in ?? ()
#21 0x0400 in ?? ()

[jira] [Created] (IMPALA-10104) multiple if funtion and multiple-agg cause impalad crashed

2020-08-25 Thread lxc (Jira)
lxc created IMPALA-10104:


 Summary: multiple if funtion and multiple-agg cause  impalad 
crashed
 Key: IMPALA-10104
 URL: https://issues.apache.org/jira/browse/IMPALA-10104
 Project: IMPALA
  Issue Type: Bug
  Components: Backend, Frontend
Affects Versions: Impala 3.2.0
 Environment: CDH6.3.1
jdk 1.8.0_131
Reporter: lxc


sql:

SELECT max(datekey) as datekey ,
if(`exp` like '%a','A',
if(`exp` like '%b', 'B',
if(`exp` like '%c', 'C',
if(`exp` like '%d', 'D', 'E')))) test,
sum(cast(money AS FLOAT)) / (count(*)/ 1000) AS ecpm,
count(*) AS e_num,
count(DISTINCT aa) AS r_num,
sum(isd) AS d_num,
sum(isc) AS c_num,
count(DISTINCT bb) AS uv,
sum(isd) /count(DISTINCT aa) AS d_rate,
sum(isc) /count(DISTINCT aa) AS c_rate,
sum(cast(money AS FLOAT)) AS money
FROM tableA
WHERE datekey = '20200812'
GROUP BY test;

the coredump info:

 

Program terminated with signal 6, Aborted.
#0 0x7f5498deb1f7 in raise () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install 
cyrus-sasl-gssapi-2.1.26-23.el7.x86_64 cyrus-sasl-lib-2.1.26-23.el7.x86_64 
cyrus-sasl-plain-2.1.26-23.el7.x86_64 glibc-2.17-196.el7.x86_64 
keyutils-libs-1.5.8-3.el7.x86_64 krb5-libs-1.15.1-8.el7.x86_64 
libcom_err-1.42.9-10.el7.x86_64 libdb-5.3.21-20.el7.x86_64 
libselinux-2.5-11.el7.x86_64 nss-softokn-freebl-3.28.3-6.el7.x86_64 
openssl-libs-1.0.2k-8.el7.x86_64 pcre-8.32-17.el7.x86_64 
zlib-1.2.7-17.el7.x86_64
(gdb) bt
#0 0x7f5498deb1f7 in raise () from /lib64/libc.so.6
#1 0x7f5498dec8e8 in abort () from /lib64/libc.so.6
#2 0x7f549bd7d3b5 in os::abort(bool) () from 
/opt/jdk/jre/lib/amd64/server/libjvm.so
#3 0x7f549bf1f673 in VMError::report_and_die() () from 
/opt/jdk/jre/lib/amd64/server/libjvm.so
#4 0x7f549bd828bf in JVM_handle_linux_signal () from 
/opt/jdk/jre/lib/amd64/server/libjvm.so
#5 0x7f549bd78e13 in signalHandler(int, siginfo*, void*) () from 
/opt/jdk/jre/lib/amd64/server/libjvm.so
#6 
#7 0x011084ff in impala_udf::FunctionContext::Free(unsigned char*) ()
#8 0x00b92d3f in 
impala::AggregateFunctions::StringValSerializeOrFinalize(impala_udf::FunctionContext*,
 impala_udf::StringVal const&) ()
#9 0x013207ed in 
impala::AggFnEvaluator::SerializeOrFinalize(impala::Tuple*, 
impala::SlotDescriptor const&, impala::Tuple*, void*) ()
#10 0x7f54321eb82f in ?? ()
#11 0x7f535bf19e00 in ?? ()
#12 0x0003 in ?? ()
#13 0x0400e360a8c5 in ?? ()
#14 0x0001 in ?? ()
#15 0x7f4fd302f1e0 in ?? ()
#16 0x435d70c0 in ?? ()
#17 0x7f535bf19f00 in ?? ()
#18 0x7f532b536000 in ?? ()
#19 0x12effc08 in ?? ()
#20 0x1e39d5d0 in ?? ()
#21 0x0400 in ?? ()
#22 0x in ?? ()

 






[jira] [Comment Edited] (IMPALA-10101) Impala branch cdh5-2.9.0_5.12.1 bin/boostrip_build failed to download python requirements

2020-08-25 Thread Kevin Yu (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-10101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17183752#comment-17183752
 ] 

Kevin Yu edited comment on IMPALA-10101 at 8/25/20, 5:59 AM:
-

Thanks Tim for the reply. I have to look into the Python infra scripts to 
figure out how to build older versions of Cloudera Impala branches.


was (Author: keens312):
Thanks Tim for the reply.

> Impala branch cdh5-2.9.0_5.12.1 bin/boostrip_build failed to download python 
> requirements
> -
>
> Key: IMPALA-10101
> URL: https://issues.apache.org/jira/browse/IMPALA-10101
> Project: IMPALA
>  Issue Type: Bug
>  Components: Infrastructure
>Affects Versions: Impala 2.9.0
> Environment: ubuntu 1604 x86_64
>Reporter: Kevin Yu
>Priority: Major
>
> Reproduce steps:


