[jira] [Assigned] (IMPALA-7221) While reading from object store S3/ADLS at fast rates +500MB/sec TypeArrayKlass::allocate_common becomes a CPU bottleneck

2018-06-28 Thread Mostafa Mokhtar (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mostafa Mokhtar reassigned IMPALA-7221:
---

Assignee: Sailesh Mukil


[jira] [Updated] (IMPALA-7221) While reading from object store S3/ADLS at +500MB/sec TypeArrayKlass::allocate_common becomes a CPU bottleneck

2018-06-28 Thread Mostafa Mokhtar (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mostafa Mokhtar updated IMPALA-7221:

Summary: While reading from object store S3/ADLS at +500MB/sec 
TypeArrayKlass::allocate_common becomes a CPU bottleneck  (was: While reading 
from object store S3/ADLS at fast rates +500MB/sec 
TypeArrayKlass::allocate_common becomes a CPU bottleneck)


[jira] [Updated] (IMPALA-7221) While reading from object store S3/ADLS at fast rates +500MB/sec TypeArrayKlass::allocate_common becomes a CPU bottleneck

2018-06-28 Thread Mostafa Mokhtar (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mostafa Mokhtar updated IMPALA-7221:

Attachment: s3_alloc_expensive_2_ps.txt
s3_alloc_expensive_1_js.txt


[jira] [Created] (IMPALA-7221) While reading from object store S3/ADLS at fast rates +500MB/sec TypeArrayKlass::allocate_common becomes a CPU bottleneck

2018-06-28 Thread Mostafa Mokhtar (JIRA)
Mostafa Mokhtar created IMPALA-7221:
---

 Summary: While reading from object store S3/ADLS at fast rates 
+500MB/sec TypeArrayKlass::allocate_common becomes a CPU bottleneck
 Key: IMPALA-7221
 URL: https://issues.apache.org/jira/browse/IMPALA-7221
 Project: IMPALA
  Issue Type: Bug
  Components: Backend
Affects Versions: Impala 2.8.0
Reporter: Mostafa Mokhtar


From perf:

{code}
Samples: 1M of event 'cpu-clock', Event count (approx.): 32005850
  Children      Self  Command  Shared Object      Symbol
-   16.46%     0.04%  impalad  impalad            [.] hdfsRead
   - 16.45% hdfsRead
      - 9.71% jni_NewByteArray
           9.63% TypeArrayKlass::allocate_common
        6.57% __memmove_ssse3_back
+    9.72%     0.03%  impalad  libjvm.so          [.] jni_NewByteArray
+    9.67%     8.79%  impalad  libjvm.so          [.] TypeArrayKlass::allocate_common
+    8.82%     0.00%  impalad  [unknown]          [.]
+    7.67%     0.04%  impalad  [kernel.kallsyms]  [k] system_call_fastpath
+    7.19%     7.02%  impalad  impalad            [.] impala::ScalarColumnReader<...
+    7.18%     6.55%  impalad  libc-2.17.so       [.] __memmove_ssse3_back
+    6.32%     0.00%  impalad  [unknown]          [.] 0x001a9458
+    6.07%     0.00%  impalad  [kernel.kallsyms]  [k] do_softirq
+    6.07%     0.00%  impalad  [kernel.kallsyms]  [k] call_softirq
+    6.05%     0.24%  impalad  [kernel.kallsyms]  [k] __do_softirq
+    5.98%     0.00%  impalad  [kernel.kallsyms]  [k] xen_hvm_callback_vector
+    5.98%     0.00%  impalad  [kernel.kallsyms]  [k] xen_evtchn_do_upcall
+    5.98%     0.00%  impalad  [kernel.kallsyms]  [k] irq_exit
+    5.81%     0.03%  impalad  [kernel.kallsyms]  [k] net_rx_action
{code}

{code}
#0  0x7ffa3d78d69b in TypeArrayKlass::allocate_common(int, bool, Thread*) 
() from /usr/java/jdk1.8.0_121/jre/lib/amd64/server/libjvm.so
#1  0x7ffa3d3e22d2 in jni_NewByteArray () from 
/usr/java/jdk1.8.0_121/jre/lib/amd64/server/libjvm.so
#2  0x020ec13c in hdfsRead ()
#3  0x01100948 in impala::io::ScanRange::Read(unsigned char*, long, 
long*, bool*) ()
#4  0x010fa294 in 
impala::io::DiskIoMgr::ReadRange(impala::io::DiskIoMgr::DiskQueue*, 
impala::io::RequestContext*, impala::io::ScanRange*) ()
#5  0x010fa3f4 in 
impala::io::DiskIoMgr::WorkLoop(impala::io::DiskIoMgr::DiskQueue*) ()
#6  0x00d15193 in impala::Thread::SuperviseThread(std::string const&, 
std::string const&, boost::function, impala::Promise*) ()
#7  0x00d158d4 in boost::detail::thread_data, 
impala::Promise*), boost::_bi::list4, 
boost::_bi::value, boost::_bi::value >, 
boost::_bi::value*> > > >::run() ()
#8  0x012919aa in thread_proxy ()
#9  0x7ffa3b6a6e25 in start_thread () from /lib64/libpthread.so.0
#10 0x7ffa3b3d0bad in clone () from /lib64/libc.so.6
{code}
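The stack above is the pattern this report is about: with no direct ByteBuffer read available, every hdfsRead call ends up allocating a fresh Java byte[] and copying its contents back out. A minimal, hypothetical JNI sketch of that fallback path (illustrative names and method IDs, not the actual libhdfs code):

{code}
// Hedged sketch of the non-direct read path the profile points at: one Java
// byte[] allocation (jni_NewByteArray -> TypeArrayKlass::allocate_common) and
// one copy (__memmove_ssse3_back) per read. read_mid is assumed to refer to
// InputStream#read(byte[], int, int).
#include <jni.h>
#include <cstdint>

int32_t ReadViaByteArray(JNIEnv* env, jobject input_stream, jmethodID read_mid,
                         uint8_t* dst, int32_t length) {
  // One heap allocation per read: this is the allocation hot spot.
  jbyteArray jbuf = env->NewByteArray(length);
  if (jbuf == nullptr) return -1;

  jint nread = env->CallIntMethod(input_stream, read_mid, jbuf, 0, length);

  if (nread > 0) {
    // Second cost: copy from the Java heap into the caller's buffer.
    env->GetByteArrayRegion(jbuf, 0, nread, reinterpret_cast<jbyte*>(dst));
  }
  env->DeleteLocalRef(jbuf);
  return nread;
}
{code}

Every call pays one short-lived young-generation allocation plus a copy, which matches the allocate_common and __memmove_ssse3_back samples in the profile above.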

There is also log4j contention in the JVM due to writing error messages to the 
impalad.ERROR log, like this:

{code}
readDirect: FSDataInputStream#read error:
UnsupportedOperationException: Byte-buffer read unsupported by input 
streamjava.lang.UnsupportedOperationException: Byte-buffer read unsupported by 
input stream
at 
org.apache.hadoop.fs.FSDataInputStream.read(FSDataInputStream.java:150)
readDirect: FSDataInputStream#read error:
UnsupportedOperationException: Byte-buffer read unsupported by input 
streamjava.lang.UnsupportedOperationException: Byte-buffer read unsupported by 
input stream
at 
org.apache.hadoop.fs.FSDataInputStream.read(FSDataInputStream.java:150)
readDirect: FSDataInputStream#read error:
UnsupportedOperationException: Byte-buffer read unsupported by input 
streamjava.lang.UnsupportedOperationException: Byte-buffer read unsupported by 
input stream
{code}

Stack 
{code}
"Thread-66" #93 prio=5 os_prio=0 tid=0x10220800 nid=0x3412 waiting for 
monitor entry [0x7ff9a3609000]
   java.lang.Thread.State: BLOCKED (on object monitor)
at org.apache.log4j.Category.callAppenders(Category.java:204)
- waiting to lock <0x80229658> (a 

[jira] [Resolved] (IMPALA-2712) Investigate increasing batch size

2018-06-28 Thread Mostafa Mokhtar (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-2712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mostafa Mokhtar resolved IMPALA-2712.
-
Resolution: Won't Fix

> Investigate increasing batch size
> -
>
> Key: IMPALA-2712
> URL: https://issues.apache.org/jira/browse/IMPALA-2712
> Project: IMPALA
>  Issue Type: Task
>  Components: Perf Investigation
>Affects Versions: Impala 2.2
>Reporter: Mostafa Mokhtar
>Assignee: Mostafa Mokhtar
>Priority: Minor
>  Labels: performance
>
> Investigate increasing the batch size for better performance. 
> Initial results (columns are batch sizes):
> ||Query||10,000,000||1,000,000||100,000||10,000||1,000||100||10||
> |broadcast_join_1|5|4|4|4|3|3|33|
> |broadcast_join_2|9|7|8|8|12|18|96|
> |broadcast_join_3|55|63|67|73|81|133|556|
> |exchange_broadcast| |115.59|118|119|149|285|1,193|
> |exchange_shuffle|196|187|186|191|191.81| | |
> |filter_bigint_non_selective|9|6|6|6|7|31|275|
> |filter_bigint_selective|3|3|3|3|3|3|13|
> |filter_decimal_non_selective|3|3|3|3|3|9|65|
> |filter_decimal_selective|3|3|3|3|3|10|46|
> |filter_string_non_selective|2|2|2|2|3|17|143|
> |filter_string_selective|2|2|2|2|2|7|34|
> |groupBy_bigint_highndv|58|55|55|58|55|69|254|
> |groupBy_bigint_lowndv|13|9|9|9|10|26|202|
> |groupBy_decimal_highndv|116|87|81|105|102|103|257|
> |groupBy_decimal_lowndv|30|27|28|29|33|45|217|
> |groupBy_spilling| | |566|534|527|523|546|
> |insert_partitioned|392|383|375|385|483|451|486|
> |insert|392|383|375|385|483|451|486|
> |orderby_all| | |158|176|173|191|323|
> |orderby_bigint|30|34|34|35|34|49|281|
> |shuffle_join_one_to_many_string_with_groupby|554|568|613|561|549|577|739|
> |shuffle_join_union_all_with_groupby| | |97|109|122|119|262|



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Assigned] (IMPALA-2564) Introduce mechanism to limit query fan-out

2018-06-28 Thread Mostafa Mokhtar (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-2564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mostafa Mokhtar reassigned IMPALA-2564:
---

Assignee: (was: Mostafa Mokhtar)

> Introduce mechanism to limit query fan-out
> --
>
> Key: IMPALA-2564
> URL: https://issues.apache.org/jira/browse/IMPALA-2564
> Project: IMPALA
>  Issue Type: New Feature
>  Components: Distributed Exec
>Affects Versions: Impala 2.2
>Reporter: Mostafa Mokhtar
>Priority: Minor
>  Labels: customer, performance, scalability
>
> The target use case is small queries on large clusters.
> Today Impala schedules queries on all Impalad instances regardless of how 
> much data each Impalad would read; this spreads the work too thin across 
> nodes and exposes undesired scalability issues. 
> The proposal is to introduce a parameter that controls the min/max amount of 
> data read by a single Impala instance.
> The SimpleScheduler would combine several splits together in order to satisfy 
> the min size requirement for a single Impalad before moving on to the 
> next node.
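A self-contained sketch of the proposed idea (a hypothetical helper, not the actual SimpleScheduler code): keep assigning scan splits to the current node until it holds at least a minimum number of bytes, then move on, so small queries touch fewer hosts.

{code}
#include <cstdint>
#include <string>
#include <vector>

struct Split { std::string path; int64_t bytes; };

std::vector<std::vector<Split>> PackSplits(const std::vector<Split>& splits,
                                           int64_t min_bytes_per_node) {
  std::vector<std::vector<Split>> per_node;
  per_node.emplace_back();
  int64_t current_bytes = 0;
  for (const Split& s : splits) {
    per_node.back().push_back(s);
    current_bytes += s.bytes;
    if (current_bytes >= min_bytes_per_node) {
      per_node.emplace_back();   // start filling the next node
      current_bytes = 0;
    }
  }
  if (per_node.back().empty()) per_node.pop_back();
  return per_node;  // per_node.size() is the effective fan-out
}
{code}

With min_bytes_per_node set high enough, a small query collapses onto a handful of hosts instead of the whole cluster.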






[jira] [Resolved] (IMPALA-6657) Investigate why memory allocation in Exchange receiver node takes a long time

2018-06-28 Thread Mostafa Mokhtar (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-6657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mostafa Mokhtar resolved IMPALA-6657.
-
Resolution: Fixed

Turns out tcmalloc is the reason; the issue is fixed in 2.12.

> Investigate why memory allocation in Exchange receiver node takes a long time
> -
>
> Key: IMPALA-6657
> URL: https://issues.apache.org/jira/browse/IMPALA-6657
> Project: IMPALA
>  Issue Type: Task
>  Components: Backend
>Affects Versions: Impala 2.11.0
>Reporter: Mostafa Mokhtar
>Assignee: Mostafa Mokhtar
>Priority: Major
>  Labels: memorymanager, performance, scalability
> Attachments: Impala query profile.txt
>
>
> While inserting large amounts of data into a Kudu table, the Exchange 
> operator was observed to run slowly; the query profile showed that a large 
> portion of the time was spent in buffer pool memory allocation.
> {code}
>   EXCHANGE_NODE (id=1):(Total: 5h53m, non-child: 48s853ms, % non-child: 
> 0.23%)
>- ConvertRowBatchTime: 20s289ms
>- PeakMemoryUsage: 19.53 MB (20483562)
>- RowsReturned: 575.30M (575298780)
>- RowsReturnedRate: 27.10 K/sec
>   Buffer pool:
>  - AllocTime: 2h53m
>  - CumulativeAllocationBytes: 261.26 GB (280526643200)
>  - CumulativeAllocations: 13.70M (13697590)
>  - PeakReservation: 18.06 MB (18939904)
>  - PeakUnpinnedBytes: 0
>  - PeakUsedReservation: 18.06 MB (18939904)
>  - ReadIoBytes: 0
>  - ReadIoOps: 0 (0)
>  - ReadIoWaitTime: 0.000ns
>  - WriteIoBytes: 0
>  - WriteIoOps: 0 (0)
>  - WriteIoWaitTime: 0.000ns
>   RecvrSide:
> BytesReceived(8m32s): 20.91 GB, 37.03 GB, 45.62 GB, 53.22 GB, 
> 60.17 GB, 66.30 GB, 71.60 GB, 76.59 GB, 81.36 GB, 86.03 GB, 90.35 GB, 94.30 
> GB, 98.17 GB, 101.98 GB, 105.58 GB, 109.08 GB, 112.33 GB, 115.47 GB, 118.45 
> GB, 121.30 GB, 124.09 GB, 126.74 GB, 129.26 GB, 131.88 GB, 134.41 GB, 136.85 
> GB, 139.32 GB, 141.77 GB, 144.23 GB, 146.71 GB, 148.26 GB, 148.29 GB, 148.29 
> GB, 148.29 GB, 148.29 GB, 148.29 GB, 148.29 GB, 148.29 GB, 148.29 GB, 148.29 
> GB, 148.29 GB, 148.29 GB, 148.29 GB, 148.29 GB, 148.29 GB, 148.29 GB, 148.29 
> GB, 148.29 GB, 148.30 GB, 148.30 GB, 148.30 GB, 148.30 GB, 148.30 GB, 148.30 
> GB, 148.30 GB, 148.30 GB
>  - FirstBatchArrivalWaitTime: 1s071ms
>  - TotalBytesReceived: 148.30 GB (159234237617)
>  - TotalGetBatchTime: 5h53m
>- DataArrivalTimer: 5h52m
>   SenderSide:
>  - DeserializeRowBatchTime: 3h4m
>  - NumBatchesArrived: 6.85M (6848795)
>  - NumBatchesDeferred: 99.67K (99667)
>  - NumBatchesEnqueued: 6.85M (6848795)
>  - NumBatchesReceived: 6.85M (6848795)
>  - NumEarlySenders: 0 (0)
>  - NumEosReceived: 0 (0)
> {code}






[jira] [Resolved] (IMPALA-2682) Address exchange operator CPU bottlenecks in Thrift

2018-06-28 Thread Mostafa Mokhtar (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-2682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mostafa Mokhtar resolved IMPALA-2682.
-
Resolution: Fixed

Fixed in 2.13

> Address exchange operator CPU bottlenecks in Thrift
> ---
>
> Key: IMPALA-2682
> URL: https://issues.apache.org/jira/browse/IMPALA-2682
> Project: IMPALA
>  Issue Type: Bug
>  Components: Distributed Exec
>Affects Versions: Impala 2.2
>Reporter: Mostafa Mokhtar
>Assignee: Mostafa Mokhtar
>Priority: Minor
>  Labels: performance
> Attachments: Screen Shot 2015-11-25 at 4.19.28 PM.png
>
>
> Currently the exchange operator is a major bottleneck for partitioned table 
> inserts, broadcast and shuffle joins, as well as aggregates; exchange alone 
> accounts for 60% of the slowdown of partitioned table inserts compared to 
> un-partitioned ones.
> The CPU cost of the exchange operators can vary between 10-50% depending on 
> the cardinality and complexity of the other operators. 
> Part of the bottleneck in Thrift is due to the 512-byte buffer used by 
> TBufferedTransport, which causes reads to go through a "slow" path where 
> the payload is mem-copied in N chunks of 512 bytes. 
> These are the top contributing call stacks in the exchange operator. 
> Read path in thrift
> {code}
> Data Of Interest (CPU Metrics)
> 1 of 18: 58.1% (43.860s of 75.474s)
> impalad!apache::thrift::transport::TSocket::read - TSocket.cpp
> impalad!apache::thrift::transport::TTransport::read+0x5 - TTransport.h:109
> impalad!apache::thrift::transport::TBufferedTransport::readSlow+0x4b - 
> TBufferTransports.cpp:52
> impalad!apache::thrift::transport::TBufferBase::read+0xb9 - 
> TBufferTransports.h:69
> impalad!apache::thrift::transport::readAll+0x28
>  - TTransport.h:44
> impalad!apache::thrift::transport::TTransport::readAll+0xa - TTransport.h:126
> impalad!apache::thrift::protocol::TBinaryProtocolT::readStringBody+0xae
>  - TBinaryProtocol.tcc:458
> impalad!readString, 
> std::allocator > >+0x24 - TBinaryProtocol.tcc:412
> impalad!apache::thrift::protocol::TVirtualProtocol,
>  apache::thrift::protocol::TProtocolDefaults>::readString_virt+0x11 - 
> TVirtualProtocol.h:515
> impalad!apache::thrift::protocol::TProtocol::readString+0x10 - TProtocol.h:621
> impalad!impala::TRowBatch::read+0x17d - Results_types.cpp:89
> {code}
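The read path above goes through TBufferedTransport::readSlow whenever a read does not fit in the transport's internal buffer. A hedged sketch of constructing the transport with an explicitly larger buffer (the buffer size and shared_ptr flavor are illustrative and depend on the Thrift release in use):

{code}
#include <cstdint>
#include <string>
#include <boost/shared_ptr.hpp>
#include <thrift/transport/TBufferTransports.h>
#include <thrift/transport/TSocket.h>

using apache::thrift::transport::TBufferedTransport;
using apache::thrift::transport::TSocket;

boost::shared_ptr<TBufferedTransport> MakeBufferedTransport(
    const std::string& host, int port, uint32_t buffer_bytes = 64 * 1024) {
  boost::shared_ptr<TSocket> socket(new TSocket(host, port));
  // The second argument is the read/write buffer size; the small default
  // (512 bytes here) is what forces large payloads through readSlow()/writeSlow().
  return boost::shared_ptr<TBufferedTransport>(
      new TBufferedTransport(socket, buffer_bytes));
}
{code}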
> Send path compressing the data 
> {code}
> Data Of Interest (CPU Metrics)
> 1 of 1: 100.0% (65.480s of 65.480s)
> impalad!LZ4_compress64kCtx - [Unknown]
> impalad!impala::Lz4Compressor::ProcessBlock+0x32 - compress.cc:294
> impalad!impala::RowBatch::Serialize+0x20b - row-batch.cc:211
> impalad!impala::RowBatch::Serialize+0x34 - row-batch.cc:168
> impalad!impala::DataStreamSender::SerializeBatch+0x117 - 
> data-stream-sender.cc:463
> impalad!impala::DataStreamSender::Channel::SendCurrentBatch+0x44 - 
> data-stream-sender.cc:256
> impalad!impala::DataStreamSender::Channel::AddRow+0x48 - 
> data-stream-sender.cc:232
> impalad!impala::DataStreamSender::Send+0x6d2 - data-stream-sender.cc:444
> {code}
> Memory copy in send path
> {code}
> Data Of Interest (CPU Metrics)
> 1 of 137: 24.6% (6.870s of 27.911s)
> libc.so.6!memcpy - [Unknown]
> impalad!impala::Tuple::DeepCopyVarlenData+0xfe - tuple.cc:89
> impalad!impala::Tuple::DeepCopy+0xc3 - tuple.cc:69
> impalad!impala::DataStreamSender::Channel::AddRow+0xf5 - 
> data-stream-sender.cc:245
> impalad!impala::DataStreamSender::Send+0x6d2 - data-stream-sender.cc:444
> impalad!impala::PlanFragmentExecutor::OpenInternal+0x3e2 - 
> plan-fragment-executor.cc:355
> {code}
> Un-compressing in read path 
> {code}
> Data Of Interest (CPU Metrics)
> 1 of 1: 100.0% (25.100s of 25.100s)
> impalad!LZ4_uncompress - [Unknown]
> impalad!impala::Lz4Decompressor::ProcessBlock+0x47 - decompress.cc:454
> impalad!impala::RowBatch::RowBatch+0x3ec - row-batch.cc:108
> impalad!impala::DataStreamRecvr::SenderQueue::AddBatch+0x1a1 - 
> data-stream-recvr.cc:207
> impalad!impala::DataStreamMgr::AddData+0x134 - data-stream-mgr.cc:103
> impalad!impala::ImpalaServer::TransmitData+0x175 - impala-server.cc:1077
> impalad!impala::ImpalaInternalService::TransmitData+0x43 - 
> impala-internal-service.h:60
> {code}
> Write path in thrift 
> {code}
> Data Of Interest (CPU Metrics)
> 1 of 14: 31.9% (7.010s of 21.980s)
> libpthread.so.0!__send - [Unknown]
> impalad!apache::thrift::transport::TSocket::write_partial+0x36 - 
> TSocket.cpp:567
> impalad!apache::thrift::transport::TSocket::write+0x3c - TSocket.cpp:542
> impalad!apache::thrift::transport::TTransport::write+0xa - TTransport.h:158
> impalad!apache::thrift::transport::TBufferedTransport::writeSlow+0x79 - 
> TBufferTransports.cpp:93
> impalad!apache::thrift::transport::TTransport::write+0xb - TTransport.h:158
> impalad!writeString, 
> std::allocator > >+0x39 - 

[jira] [Assigned] (IMPALA-7020) Order by expressions in Analytical functions are not materialized causing slowdown

2018-06-26 Thread Mostafa Mokhtar (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mostafa Mokhtar reassigned IMPALA-7020:
---

Assignee: Adrian Ng  (was: Alexander Behm)

> Order by expressions in Analytical functions are not materialized causing 
> slowdown
> --
>
> Key: IMPALA-7020
> URL: https://issues.apache.org/jira/browse/IMPALA-7020
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Affects Versions: Impala 2.12.0
>Reporter: Mostafa Mokhtar
>Assignee: Adrian Ng
>Priority: Major
>  Labels: performance
> Attachments: Slow case profile.txt, Workaround profile.txt
>
>
> Order-by expressions in analytic functions are not materialized, which causes 
> queries to run much slower.
> The rewrite of the query below is 20x faster; profiles are attached.
> Repro 
> {code}
> select *
> FROM
>   (
> SELECT
>   o.*,
>   ROW_NUMBER() OVER(ORDER BY evt_ts DESC) AS rn
> FROM
>   (
> SELECT
>   l_orderkey,l_partkey,l_linenumber,l_quantity, cast (l_shipdate as 
> string) evt_ts
> FROM
>   lineitem
> WHERE
>   l_shipdate  BETWEEN '1992-01-01 00:00:00' AND '1992-01-15 00:00:00'
>   ) o
>   ) r
> WHERE
>   rn BETWEEN 1 AND 101
> ORDER BY rn;
> {code}
> Workaround 
> {code}
> select *
> FROM
>   (
> SELECT
>   o.*,
>   ROW_NUMBER() OVER(ORDER BY evt_ts DESC) AS rn
> FROM
>   (
> SELECT
>   l_orderkey,l_partkey,l_linenumber,l_quantity, cast (l_shipdate as 
> string) evt_ts
> FROM
>   lineitem
> WHERE
>   l_shipdate  BETWEEN '1992-01-01 00:00:00' AND '1992-01-15 00:00:00'
>   union all 
>   SELECT
>   l_orderkey,l_partkey,l_linenumber,l_quantity, cast (l_shipdate as 
> string) evt_ts
> FROM
>   lineitem limit 0
> 
>   ) o
>   ) r
> WHERE
>   rn BETWEEN 1 AND 101
> ORDER BY rn;
> {code}






[jira] [Assigned] (IMPALA-6746) Reduce the number of comparison for analytical functions with partitioning when incoming data is clustered

2018-06-26 Thread Mostafa Mokhtar (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-6746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mostafa Mokhtar reassigned IMPALA-6746:
---

Assignee: Mostafa Mokhtar  (was: Tianyi Wang)

> Reduce the number of comparison for analytical functions with partitioning 
> when incoming data is clustered
> --
>
> Key: IMPALA-6746
> URL: https://issues.apache.org/jira/browse/IMPALA-6746
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Affects Versions: Impala 2.13.0
>Reporter: Mostafa Mokhtar
>Assignee: Mostafa Mokhtar
>Priority: Major
> Attachments: percentile query profile 2.txt
>
>
> Checking whether the current row belongs to the same partition in ANALYTIC is 
> very expensive, as it does N comparisons, where N is the number of rows. In 
> cases where the cardinality of the partition column(s) is relatively small, 
> the values will be clustered.
> One optimization, as proposed by [~alex.behm], is to check the first and last 
> tuples in the batch and, if they match, avoid calling 
> AnalyticEvalNode::PrevRowCompare for the entire batch.
> For the attached query, which is a common pattern, the expected speedup is 
> 20-30%.
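An illustrative, self-contained sketch of that batch-level shortcut (hypothetical types, not the actual AnalyticEvalNode code): because the analytic input is sorted on the partition keys, if the first and last rows of a batch share the same partition key then every row in between does too, so the per-row comparison can be skipped for the whole batch.

{code}
#include <cstddef>
#include <cstdint>
#include <vector>

struct Row { int64_t partition_key; double value; };

bool BatchIsSinglePartition(const std::vector<Row>& batch) {
  if (batch.empty()) return true;
  return batch.front().partition_key == batch.back().partition_key;
}

// Caller sketch: only fall back to row-by-row comparisons when needed.
int CountPartitionBoundaries(const std::vector<Row>& batch) {
  if (BatchIsSinglePartition(batch)) return 0;  // fast path, no per-row work
  int boundaries = 0;
  for (size_t i = 1; i < batch.size(); ++i) {
    if (batch[i].partition_key != batch[i - 1].partition_key) ++boundaries;
  }
  return boundaries;
}
{code}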
> Query
> {code}
> select l_commitdate
> ,avg(l_extendedprice) as avg_perc
> ,percentile_cont (.25) within group (order by l_extendedprice asc) as 
> perc_25
> ,percentile_cont (.5) within group (order by l_extendedprice asc) as 
> perc_50
> ,percentile_cont (.75) within group (order by l_extendedprice asc) as 
> perc_75
> ,percentile_cont (.90) within group (order by l_extendedprice asc) as 
> perc_90
> from lineitem
> group by l_commitdate
> order by l_commitdate
> {code}
> Plan
> {code}
> F03:PLAN FRAGMENT [UNPARTITIONED] hosts=1 instances=1
> |  Per-Host Resources: mem-estimate=0B mem-reservation=0B
> PLAN-ROOT SINK
> |  mem-estimate=0B mem-reservation=0B
> |
> 09:MERGING-EXCHANGE [UNPARTITIONED]
> |  order by: l_commitdate ASC
> |  mem-estimate=0B mem-reservation=0B
> |  tuple-ids=5 row-size=66B cardinality=2559
> |
> F02:PLAN FRAGMENT [HASH(l_commitdate)] hosts=1 instances=1
> Per-Host Resources: mem-estimate=22.00MB mem-reservation=13.94MB
> 05:SORT
> |  order by: l_commitdate ASC
> |  mem-estimate=12.00MB mem-reservation=12.00MB spill-buffer=2.00MB
> |  tuple-ids=5 row-size=66B cardinality=2559
> |
> 08:AGGREGATE [FINALIZE]
> |  output: avg:merge(l_extendedprice), 
> _percentile_cont_interpolation:merge(l_extendedprice, 
> `_percentile_row_number_diff_0`), 
> _percentile_cont_interpolation:merge(l_extendedprice, 
> `_percentile_row_number_diff_1`), 
> _percentile_cont_interpolation:merge(l_extendedprice, 
> `_percentile_row_number_diff_2`), 
> _percentile_cont_interpolation:merge(l_extendedprice, 
> `_percentile_row_number_diff_3`)
> |  group by: l_commitdate
> |  mem-estimate=10.00MB mem-reservation=1.94MB spill-buffer=64.00KB
> |  tuple-ids=4 row-size=66B cardinality=2559
> |
> 07:EXCHANGE [HASH(l_commitdate)]
> |  mem-estimate=0B mem-reservation=0B
> |  tuple-ids=3 row-size=66B cardinality=2559
> |
> F01:PLAN FRAGMENT [HASH(l_commitdate)] hosts=1 instances=1
> Per-Host Resources: mem-estimate=64.00MB mem-reservation=22.00MB
> 04:AGGREGATE [STREAMING]
> |  output: avg(l_extendedprice), 
> _percentile_cont_interpolation(l_extendedprice, row_number() - 1 - 
> count(l_extendedprice) - 1 * 0.25), 
> _percentile_cont_interpolation(l_extendedprice, row_number() - 1 - 
> count(l_extendedprice) - 1 * 0.5), 
> _percentile_cont_interpolation(l_extendedprice, row_number() - 1 - 
> count(l_extendedprice) - 1 * 0.75), 
> _percentile_cont_interpolation(l_extendedprice, row_number() - 1 - 
> count(l_extendedprice) - 1 * 0.90)
> |  group by: l_commitdate
> |  mem-estimate=10.00MB mem-reservation=2.00MB spill-buffer=64.00KB
> |  tuple-ids=3 row-size=66B cardinality=2559
> |
> 03:ANALYTIC
> |  functions: count(l_extendedprice)
> |  partition by: l_commitdate
> |  mem-estimate=4.00MB mem-reservation=4.00MB spill-buffer=2.00MB
> |  tuple-ids=9,7,8 row-size=50B cardinality=59986052
> |
> 02:ANALYTIC
> |  functions: row_number()
> |  partition by: l_commitdate
> |  order by: l_extendedprice ASC
> |  window: ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
> |  mem-estimate=4.00MB mem-reservation=4.00MB spill-buffer=2.00MB
> |  tuple-ids=9,7 row-size=42B cardinality=59986052
> |
> 01:SORT
> |  order by: l_commitdate ASC NULLS FIRST, l_extendedprice ASC NULLS LAST
> |  mem-estimate=46.00MB mem-reservation=12.00MB spill-buffer=2.00MB
> |  tuple-ids=9 row-size=34B cardinality=59986052
> |
> 06:EXCHANGE [HASH(l_commitdate)]
> |  mem-estimate=0B mem-reservation=0B
> |  tuple-ids=0 row-size=34B cardinality=59986052
> |
> F00:PLAN FRAGMENT [RANDOM] hosts=1 instances=1
> Per-Host Resources: 

[jira] [Assigned] (IMPALA-6746) Reduce the number of comparison for analytical functions with partitioning when incoming data is clustered

2018-06-26 Thread Mostafa Mokhtar (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-6746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mostafa Mokhtar reassigned IMPALA-6746:
---

Assignee: Adrian Ng  (was: Mostafa Mokhtar)


[jira] [Resolved] (IMPALA-2937) Introduce sampled statistics

2018-06-15 Thread Mostafa Mokhtar (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-2937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mostafa Mokhtar resolved IMPALA-2937.
-
   Resolution: Fixed
Fix Version/s: Impala 2.12.0

> Introduce sampled statistics
> 
>
> Key: IMPALA-2937
> URL: https://issues.apache.org/jira/browse/IMPALA-2937
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Backend
>Affects Versions: Impala 2.2
>Reporter: Mostafa Mokhtar
>Priority: Minor
>  Labels: planner, supportability
> Fix For: Impala 2.12.0
>
>
> Stats computation can be expensive; introduce sampled scans for Parquet and 
> text files to speed up stats computation. 
> The database will have a default sampling rate, which can be overridden at 
> the table level.






[jira] [Created] (IMPALA-7172) Statestore should verify that all subscribers are running the same version of Impala

2018-06-14 Thread Mostafa Mokhtar (JIRA)
Mostafa Mokhtar created IMPALA-7172:
---

 Summary: Statestore should verify that all subscribers are running 
the same version of Impala
 Key: IMPALA-7172
 URL: https://issues.apache.org/jira/browse/IMPALA-7172
 Project: IMPALA
  Issue Type: New Feature
  Components: Distributed Exec
Affects Versions: Impala 2.13.0
Reporter: Mostafa Mokhtar


While running a metadata test that uses sync_ddl=1, the tests appeared to hang 
indefinitely.
It turned out one of the Impala daemons was running an older build, which 
caused statestore topic updates to fail continuously.

Ideally the statestore should track the version across subscribers and 
blacklist the ones that don't match the statestore and catalog server version.

Logs from SS
{code}
I0614 11:11:04.410529 57312 statestore.cc:259] Preparing initial 
impala-membership topic update for impa...@vb0204.halxg.cloudera.com:22000. 
Size = 2.06 KB
I0614 11:11:04.411222 57312 client-cache.cc:82] ReopenClient(): re-creating 
client for vb0204.halxg.cloudera.com:23000
I0614 11:11:04.411821 57312 client-cache.h:304] RPC Error: Client for 
vb0204.halxg.cloudera.com:23000 hit an unexpected exception: No more data to 
read., type: N6apache6thrift9transport19TTransportExceptionE, rpc: 
N6impala20TUpdateStateResponseE, send: done
I0614 11:11:04.411831 57312 client-cache.cc:174] Broken Connection, destroy 
client for vb0204.halxg.cloudera.com:23000
I0614 11:11:04.411861 57312 statestore.cc:891] Unable to send priority topic 
update message to subscriber impa...@vb0204.halxg.cloudera.com:22000, received 
error: RPC Error: Client for vb0204.halxg.cloudera.com:23000 hit an unexpected 
exception: No more data to read., type: 
N6apache6thrift9transport19TTransportExceptionE, rpc: 
N6impala20TUpdateStateResponseE, send: done
{code}

Log from Impalad 
{code}
I0614 11:03:19.479164 41915 thrift-util.cc:123] TAcceptQueueServer exception: 
N6apache6thrift8protocol18TProtocolExceptionE: TProtocolException: Invalid data
I0614 11:03:19.680028 41916 thrift-util.cc:123] TAcceptQueueServer exception: 
N6apache6thrift8protocol18TProtocolExceptionE: TProtocolException: Invalid data
I0614 11:03:19.680776 41917 thrift-util.cc:123] TAcceptQueueServer exception: 
N6apache6thrift8protocol18TProtocolExceptionE: TProtocolException: Invalid data
I0614 11:03:19.881295 41918 thrift-util.cc:123] TAcceptQueueServer exception: 
N6apache6thrift8protocol18TProtocolExceptionE: TProtocolException: Invalid data
{code}
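A self-contained sketch of the version gate suggested above (hypothetical names, not the actual statestore code): reject or blacklist a subscriber at registration time when its build version string does not match the statestore's own, instead of letting topic updates fail repeatedly.

{code}
#include <string>
#include <unordered_set>

class SubscriberRegistry {
 public:
  explicit SubscriberRegistry(std::string expected_version)
      : expected_version_(std::move(expected_version)) {}

  // Returns true if the subscriber may join; otherwise it is blacklisted.
  bool Register(const std::string& subscriber_id,
                const std::string& subscriber_version) {
    if (subscriber_version != expected_version_) {
      blacklisted_.insert(subscriber_id);
      return false;
    }
    return true;
  }

  bool IsBlacklisted(const std::string& subscriber_id) const {
    return blacklisted_.count(subscriber_id) > 0;
  }

 private:
  std::string expected_version_;
  std::unordered_set<std::string> blacklisted_;
};
{code}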






[jira] [Commented] (IMPALA-7096) Confirm that IMPALA-4835 does not increase chance of OOM for scans of wide tables

2018-06-12 Thread Mostafa Mokhtar (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-7096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16510283#comment-16510283
 ] 

Mostafa Mokhtar commented on IMPALA-7096:
-

[~tarmstrong]

Attached is an OOM profile where most of the memory is consumed by the 
HDFS_SCAN_NODE and very little memory is in the queues.

{code}
Query Status: Memory limit exceeded: Error occurred on backend 
vb0228.foo:22000 by fragment d7448d78ac85ba6b:303b7242000c
Memory left in process limit: 98.37 GB
Memory left in query limit: -371.21 KB
Query(d7448d78ac85ba6b:303b7242): memory limit exceeded. Limit=1.00 GB 
Reservation=816.00 MB ReservationLimit=819.20 MB OtherMemory=208.36 MB 
Total=1.00 GB Peak=1.00 GB
  Fragment d7448d78ac85ba6b:303b7242000c: Reservation=360.00 MB 
OtherMemory=85.60 MB Total=445.60 MB Peak=445.60 MB
HDFS_SCAN_NODE (id=2): Reservation=360.00 MB OtherMemory=85.45 MB 
Total=445.45 MB Peak=445.45 MB
  Exprs: Total=4.00 KB Peak=4.00 KB
  Queued Batches: Total=22.13 MB Peak=22.13 MB
KrpcDataStreamSender (dst_id=13): Total=17.72 KB Peak=17.72 KB
CodeGen: Total=1.68 KB Peak=679.50 KB
  Fragment d7448d78ac85ba6b:303b72420023: Reservation=96.00 MB 
OtherMemory=43.09 MB Total=139.09 MB Peak=139.09 MB
HDFS_SCAN_NODE (id=6): Reservation=96.00 MB OtherMemory=42.93 MB 
Total=138.93 MB Peak=138.93 MB
  Exprs: Total=4.00 KB Peak=4.00 KB
  Queued Batches: Total=26.10 MB Peak=26.10 MB
KrpcDataStreamSender (dst_id=16): Total=17.72 KB Peak=17.72 KB
CodeGen: Total=1.68 KB Peak=679.50 KB
  Fragment d7448d78ac85ba6b:303b7242003a: Reservation=360.00 MB 
OtherMemory=79.67 MB Total=439.67 MB Peak=439.67 MB
HDFS_SCAN_NODE (id=10): Reservation=360.00 MB OtherMemory=79.52 MB 
Total=439.52 MB Peak=439.52 MB
  Exprs: Total=4.00 KB Peak=4.00 KB
  Queued Batches: Total=22.20 MB Peak=22.20 MB
KrpcDataStreamSender (dst_id=19): Total=17.72 KB Peak=17.72 KB
CodeGen: Total=1.68 KB Peak=679.50 KB
{code}

> Confirm that IMPALA-4835 does not increase chance of OOM for scans of wide 
> tables
> -
>
> Key: IMPALA-7096
> URL: https://issues.apache.org/jira/browse/IMPALA-7096
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 2.13.0, Impala 3.1.0
>Reporter: Tim Armstrong
>Assignee: Tim Armstrong
>Priority: Blocker
>  Labels: resource-management
> Attachments: ScanConsumingMostMemory.txt
>
>
> IMPALA-7078 showed some cases where non-buffer memory could accumulate in the 
> row batch queue and cause memory consumption problems.
> The decision for whether to spin up a scanner thread in IMPALA-4835 
> implicitly assumes that buffer memory is the bulk of memory consumed by a 
> scan, but there may be cases where that is not true and the previous 
> heuristic would be more conservative about starting a scanner thread.
> We should investigate this further and figure out how to avoid it if there's 
> an issue.
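An illustrative sketch of the kind of accounting being discussed (hypothetical names and fields, not the IMPALA-4835 code): before spinning up another scanner thread, count the non-buffer memory a scan accumulates, not just reserved buffer memory.

{code}
#include <cstdint>

struct ScanMemoryStats {
  int64_t reserved_buffer_bytes;   // what a buffer-only heuristic considers
  int64_t queued_batch_bytes;      // non-buffer memory in the row batch queue
  int64_t other_scan_bytes;        // dictionaries, scratch allocations, ...
};

bool CanStartAnotherScannerThread(const ScanMemoryStats& stats,
                                  int64_t per_thread_estimate,
                                  int64_t mem_limit_remaining) {
  int64_t current = stats.reserved_buffer_bytes + stats.queued_batch_bytes +
                    stats.other_scan_bytes;
  return current + per_thread_estimate <= mem_limit_remaining;
}
{code}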






[jira] [Commented] (IMPALA-2682) Address exchange operator CPU bottlenecks in Thrift

2018-06-11 Thread Mostafa Mokhtar (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-2682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16508701#comment-16508701
 ] 

Mostafa Mokhtar commented on IMPALA-2682:
-

To get more speedup in exchange we should focus on:
* Codegen of the hash value computation
* Replacing % with something more efficient (an illustrative alternative is sketched below)

[~jbapple]
What technique did you use to remove the modulo operation?
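One common way to avoid the modulo when mapping a hash to a channel index (added here purely as an illustration; not necessarily the technique referred to above) is to multiply the 32-bit hash by the number of buckets and take the high 32 bits, which maps uniformly onto [0, num_channels) with a multiply and a shift.

{code}
#include <cstdint>

inline uint32_t FastRange32(uint32_t hash, uint32_t num_channels) {
  // (hash * num_channels) / 2^32, computed without division or modulo.
  return static_cast<uint32_t>(
      (static_cast<uint64_t>(hash) * num_channels) >> 32);
}
{code}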


[jira] [Commented] (IMPALA-6311) Evaluate smaller FPP for Bloom filters

2018-05-22 Thread Mostafa Mokhtar (JIRA)

[ 
https://issues.apache.org/jira/browse/IMPALA-6311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16486665#comment-16486665
 ] 

Mostafa Mokhtar commented on IMPALA-6311:
-

[~jbapple]
Yes, 75% FPP for a bloom filter is way high.
Increasing the target FPP does make sense.

> Evaluate smaller FPP for Bloom filters
> --
>
> Key: IMPALA-6311
> URL: https://issues.apache.org/jira/browse/IMPALA-6311
> Project: IMPALA
>  Issue Type: Task
>  Components: Perf Investigation
>Reporter: Jim Apple
>Priority: Major
>
> The Bloom filters are created by estimating the NDV and then using the FPP of 
> 75% to get the right size for the filter. This may be too high to be very 
> useful - if our filters are currently filtering more than 75% out, then it is 
> only because we are overestimating the NDV.
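For reference, a self-contained sketch of the standard Bloom filter sizing formula behind the description above (illustrative; Impala's actual sizing code may differ): for n expected distinct values (the NDV estimate) and a target false-positive probability p, the required number of bits is m = -n * ln(p) / (ln 2)^2.

{code}
#include <cmath>
#include <cstdint>
#include <cstdio>

int64_t BloomFilterBits(int64_t ndv, double fpp) {
  const double ln2 = std::log(2.0);
  return static_cast<int64_t>(std::ceil(-ndv * std::log(fpp) / (ln2 * ln2)));
}

int main() {
  // A 75% FPP target needs far fewer bits than, say, 10%:
  std::printf("ndv=1M fpp=0.75 -> %ld bits\n",
              static_cast<long>(BloomFilterBits(1000000, 0.75)));
  std::printf("ndv=1M fpp=0.10 -> %ld bits\n",
              static_cast<long>(BloomFilterBits(1000000, 0.10)));
  return 0;
}
{code}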






[jira] [Commented] (IMPALA-6311) Evaluate smaller FPP for Bloom filters

2018-05-22 Thread Mostafa Mokhtar (JIRA)

[ 
https://issues.apache.org/jira/browse/IMPALA-6311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16485741#comment-16485741
 ] 

Mostafa Mokhtar commented on IMPALA-6311:
-

Actually I was thinking of reducing the default maximum size.

The main bottleneck here is that the memory used for handling the runtime 
filter RPCs and aggregating them is untracked.
Increasing the max filter default size or the number of filters makes things 
worse on the coordinator.
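A rough, hypothetical sketch of the aggregation step whose memory is at issue (not Impala's actual coordinator code): the coordinator ORs one filter bitmap per sender into the final filter, so untracked memory and CPU scale with filter size times the number of filters and senders in flight.

{code}
#include <algorithm>
#include <cstdint>
#include <vector>

using FilterBits = std::vector<uint64_t>;  // one runtime filter's bit array

void OrInto(FilterBits* aggregate, const FilterBits& incoming) {
  const size_t n = std::min(aggregate->size(), incoming.size());
  for (size_t i = 0; i < n; ++i) (*aggregate)[i] |= incoming[i];
}
{code}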







[jira] [Created] (IMPALA-7021) Impala query fails with BlockMissingException while HDFS throws SSLHandshakeException

2018-05-13 Thread Mostafa Mokhtar (JIRA)
Mostafa Mokhtar created IMPALA-7021:
---

 Summary: Impala query fails with BlockMissingException while HDFS 
throws SSLHandshakeException
 Key: IMPALA-7021
 URL: https://issues.apache.org/jira/browse/IMPALA-7021
 Project: IMPALA
  Issue Type: Bug
  Components: Backend, Security
Affects Versions: Impala 2.12.0
Reporter: Mostafa Mokhtar
Assignee: Sailesh Mukil


Impala query failed with status
{code}
Status: Disk I/O error: Error reading from HDFS file: 
hdfs://vc1512.halxg.cloudera.com:8020/user/hive/warehouse/tpcds_3_parquet_6_rack.db/customer/1b4ebbf0879f988a-9733ba75005b_76476061_data.0.parq
 Error(255): Unknown error 255 Root cause: BlockMissingException: Could not 
obtain block: BP-1933019065-10.17.221.22-1485905285703:blk_1084803316_11063749 
file=/user/hive/warehouse/tpcds_3_parquet_6_rack.db/customer/1b4ebbf0879f988a-9733ba75005b_76476061_data.0.parq
{code}

Impala log
{code}
I0513 19:50:07.248292 150900 coordinator.cc:1092] Release admission control 
resources for query_id=f14b2814eea7b58b:c2eeddb4
W0513 19:50:11.313998 109914 ShortCircuitCache.java:826] 
ShortCircuitCache(0x188f4ffc): could not load 
1084803339_BP-1933019065-10.17.221.22-1485905285703 due to InvalidToken 
exception.
Java exception follows:
org.apache.hadoop.security.token.SecretManager$InvalidToken: access control 
error while attempting to set up short-circuit access to 
/user/hive/warehouse/tpcds_3_parquet_6_rack.db/customer/1b4ebbf0879f988a-9733ba750004_131907241_data.0.parq
at 
org.apache.hadoop.hdfs.BlockReaderFactory.requestFileDescriptors(BlockReaderFactory.java:653)
at 
org.apache.hadoop.hdfs.BlockReaderFactory.createShortCircuitReplicaInfo(BlockReaderFactory.java:552)
at 
org.apache.hadoop.hdfs.shortcircuit.ShortCircuitCache.create(ShortCircuitCache.java:804)
at 
org.apache.hadoop.hdfs.shortcircuit.ShortCircuitCache.fetchOrCreate(ShortCircuitCache.java:738)
at 
org.apache.hadoop.hdfs.BlockReaderFactory.getBlockReaderLocal(BlockReaderFactory.java:485)
at 
org.apache.hadoop.hdfs.BlockReaderFactory.build(BlockReaderFactory.java:355)
at 
org.apache.hadoop.hdfs.DFSInputStream.actualGetFromOneDataNode(DFSInputStream.java:1219)
at 
org.apache.hadoop.hdfs.DFSInputStream.fetchBlockByteRange(DFSInputStream.java:1159)
at org.apache.hadoop.hdfs.DFSInputStream.pread(DFSInputStream.java:1533)
at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:1492)
at 
org.apache.hadoop.fs.FSDataInputStream.read(FSDataInputStream.java:92)
W0513 19:50:11.317173 109914 ShortCircuitCache.java:826] 
ShortCircuitCache(0x188f4ffc): could not load 
1084803339_BP-1933019065-10.17.221.22-1485905285703 due to InvalidToken 
exception.
Java exception follows:
org.apache.hadoop.security.token.SecretManager$InvalidToken: access control 
error while attempting to set up short-circuit access to 
/user/hive/warehouse/tpcds_3_parquet_6_rack.db/customer/1b4ebbf0879f988a-9733ba750004_131907241_data.0.parq
at 
org.apache.hadoop.hdfs.BlockReaderFactory.requestFileDescriptors(BlockReaderFactory.java:653)
at 
org.apache.hadoop.hdfs.BlockReaderFactory.createShortCircuitReplicaInfo(BlockReaderFactory.java:552)
at 
org.apache.hadoop.hdfs.shortcircuit.ShortCircuitCache.create(ShortCircuitCache.java:804)
at 
org.apache.hadoop.hdfs.shortcircuit.ShortCircuitCache.fetchOrCreate(ShortCircuitCache.java:738)
at 
org.apache.hadoop.hdfs.BlockReaderFactory.getBlockReaderLocal(BlockReaderFactory.java:485)
at 
org.apache.hadoop.hdfs.BlockReaderFactory.build(BlockReaderFactory.java:355)
at 
org.apache.hadoop.hdfs.DFSInputStream.actualGetFromOneDataNode(DFSInputStream.java:1219)
at 
org.apache.hadoop.hdfs.DFSInputStream.fetchBlockByteRange(DFSInputStream.java:1159)
at org.apache.hadoop.hdfs.DFSInputStream.pread(DFSInputStream.java:1533)
at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:1492)
at 
org.apache.hadoop.fs.FSDataInputStream.read(FSDataInputStream.java:92)
W0513 19:50:11.318079 109914 DFSInputStream.java:1275] Connection failure: 
Failed to connect to /10.17.187.36:20002 for file 
/user/hive/warehouse/tpcds_3_parquet_6_rack.db/customer/1b4ebbf0879f988a-9733ba750004_131907241_data.0.parq
 for block 
BP-1933019065-10.17.221.22-1485905285703:blk_1084803339_11063772:org.apache.hadoop.security.token.SecretManager$InvalidToken:
 access control error while attempting to set up short-circuit access to 
/user/hive/warehouse/tpcds_3_parquet_6_rack.db/customer/1b4ebbf0879f988a-9733ba750004_131907241_data.0.parq
{code}

Impala error log
{code}
hdfsPread: FSDataInputStream#read error:
BlockMissingException: Could not obtain block: 
BP-1933019065-10.17.221.22-1485905285703:blk_1084803339_11063772 
{code}

[jira] [Assigned] (IMPALA-7020) Order by expressions in Analytical functions are not materialized causing slowdown

2018-05-13 Thread Mostafa Mokhtar (JIRA)

 [ 
https://issues.apache.org/jira/browse/IMPALA-7020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mostafa Mokhtar reassigned IMPALA-7020:
---

Assignee: Alexander Behm

> Order by expressions in Analytical functions are not materialized causing 
> slowdown
> --
>
> Key: IMPALA-7020
> URL: https://issues.apache.org/jira/browse/IMPALA-7020
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Affects Versions: Impala 2.12.0
>Reporter: Mostafa Mokhtar
>Assignee: Alexander Behm
>Priority: Major
>  Labels: performance
> Attachments: Slow case profile.txt, Workaround profile.txt
>
>
> Order by expressions in Analytical functions are not materialized and cause 
> queries to run much slower.
> The rewrite shown in the Workaround section below is 20x faster; profiles are attached.
> Repro 
> {code}
> select *
> FROM
>   (
> SELECT
>   o.*,
>   ROW_NUMBER() OVER(ORDER BY evt_ts DESC) AS rn
> FROM
>   (
> SELECT
>   l_orderkey,l_partkey,l_linenumber,l_quantity, cast (l_shipdate as 
> string) evt_ts
> FROM
>   lineitem
> WHERE
>   l_shipdate  BETWEEN '1992-01-01 00:00:00' AND '1992-01-15 00:00:00'
>   ) o
>   ) r
> WHERE
>   rn BETWEEN 1 AND 101
> ORDER BY rn;
> {code}
> Workaround 
> {code}
> select *
> FROM
>   (
> SELECT
>   o.*,
>   ROW_NUMBER() OVER(ORDER BY evt_ts DESC) AS rn
> FROM
>   (
> SELECT
>   l_orderkey,l_partkey,l_linenumber,l_quantity, cast (l_shipdate as 
> string) evt_ts
> FROM
>   lineitem
> WHERE
>   l_shipdate  BETWEEN '1992-01-01 00:00:00' AND '1992-01-15 00:00:00'
>   union all 
>   SELECT
>   l_orderkey,l_partkey,l_linenumber,l_quantity, cast (l_shipdate as 
> string) evt_ts
> FROM
>   lineitem limit 0
> 
>   ) o
>   ) r
> WHERE
>   rn BETWEEN 1 AND 101
> ORDER BY rn;
> {code}
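> The workaround appears to help because the zero-row UNION ALL branch forces the cast expression to be materialized before the analytic sort consumes it. A minimal sketch of the same pattern, using a hypothetical table t with columns id and c:
> {code}
> -- Hypothetical table and columns; the LIMIT 0 branch contributes no rows but
> -- (presumably) forces expr_col to be materialized into the union's tuple, so
> -- the analytic ORDER BY no longer re-evaluates the expression row by row.
> SELECT v.*, ROW_NUMBER() OVER (ORDER BY expr_col DESC) AS rn
> FROM (
>   SELECT id, cast(c AS string) AS expr_col FROM t
>   UNION ALL
>   SELECT id, cast(c AS string) AS expr_col FROM t LIMIT 0
> ) v;
> {code}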






[jira] [Commented] (IMPALA-7015) Insert into Kudu table returns with Status OK even if there are Kudu errors

2018-05-11 Thread Mostafa Mokhtar (JIRA)

[ 
https://issues.apache.org/jira/browse/IMPALA-7015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16472302#comment-16472302
 ] 

Mostafa Mokhtar commented on IMPALA-7015:
-


[~tmarsh] FYI

> Insert into Kudu table returns with Status OK even if there are Kudu errors
> ---
>
> Key: IMPALA-7015
> URL: https://issues.apache.org/jira/browse/IMPALA-7015
> Project: IMPALA
>  Issue Type: Bug
>Affects Versions: Impala 2.12.0
>Reporter: Mostafa Mokhtar
>Priority: Major
> Attachments: Insert into kudu profile with errors.txt
>
>
> DML statements against Kudu tables return status OK even if there are Kudu 
> errors.
> This behavior is misleading. 
> {code}
>   Summary:
> Session ID: 18430b000e5dd8dc:e3e5dadb4a15d4b4
> Session Type: BEESWAX
> Start Time: 2018-05-11 10:10:07.314218000
> End Time: 2018-05-11 10:10:07.434017000
> Query Type: DML
> Query State: FINISHED
> Query Status: OK
> Impala Version: impalad version 2.12.0-cdh5.15.0 RELEASE (build 
> 2f9498d5c2f980aa7ff9505c56654c8e59e026ca)
> User: mmokhtar
> Connected User: mmokhtar
> Delegated User: 
> Network Address: :::10.17.234.27:60760
> Default Db: tpcds_1000_kudu
> Sql Statement: insert into store_2 select * from store
> Coordinator: vd1317.foo:22000
> Query Options (set by configuration): 
> Query Options (set by configuration and planner): MT_DOP=0
> Plan: 
> {code}
> {code}
> Operator         #Hosts   Avg Time  Max Time  #Rows  Est. #Rows   Peak Mem  Est. Peak Mem  Detail
> --------------------------------------------------------------------------------------------------
> 02:PARTIAL SORT       5  909.030us   1.025ms  1.00K       1.00K    6.14 MB        4.00 MB
> 01:EXCHANGE           5    6.262ms   7.232ms  1.00K       1.00K   75.50 KB              0  KUDU(KuduPartition(tpcds_1000_kudu.store.s_store_sk))
> 00:SCAN KUDU          5    3.694ms   4.137ms  1.00K       1.00K    4.34 MB              0  tpcds_1000_kudu.store
> Errors: Key already present in Kudu table 'impala::tpcds_1000_kudu.store_2'. (1 of 1002 similar)
> {code}
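> Until the returned status reflects these row errors, a rough manual sanity check (a sketch reusing the tables from the statement above) is to compare the target table's row count against what the DML was expected to write:
> {code}
> -- Compare how many rows the target actually holds with what the insert was
> -- expected to add; silently rejected rows (e.g. duplicate primary keys) show
> -- up as a shortfall even though the query status was OK.
> SELECT count(*) FROM tpcds_1000_kudu.store;
> SELECT count(*) FROM tpcds_1000_kudu.store_2;
> {code}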






[jira] [Commented] (IMPALA-6897) Catalog server should flag tables with large number of small files

2018-05-10 Thread Mostafa Mokhtar (JIRA)

[ 
https://issues.apache.org/jira/browse/IMPALA-6897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16470688#comment-16470688
 ] 

Mostafa Mokhtar commented on IMPALA-6897:
-

The Catalog already has a section listing the top tables by metadata size.
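In the meantime, a quick manual way to spot such tables (a sketch; the table name is hypothetical) is to check the per-partition file counts and sizes that SHOW TABLE STATS already reports, or to list individual files with SHOW FILES IN:
{code}
-- Hypothetical table; the #Files and Size columns expose partitions made up of many small files.
SHOW TABLE STATS my_db.my_partitioned_table;
-- Lists every data file with its size, one row per file.
SHOW FILES IN my_db.my_partitioned_table;
{code}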

> Catalog server should flag tables with large number of small files
> --
>
> Key: IMPALA-6897
> URL: https://issues.apache.org/jira/browse/IMPALA-6897
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Catalog
>Affects Versions: Impala 2.13.0
>Reporter: bharath v
>Priority: Major
>  Labels: ramp-up, supportability
>
> Since the Catalog has all the file metadata available, it should help flag 
> tables with a large number of small files. This information can be propagated 
> to the coordinators and reflected in the query profiles, as is done for 
> "missing stats".






[jira] [Commented] (IMPALA-2945) Low memory estimation in local aggregation

2018-05-02 Thread Mostafa Mokhtar (JIRA)

[ 
https://issues.apache.org/jira/browse/IMPALA-2945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16461594#comment-16461594
 ] 

Mostafa Mokhtar commented on IMPALA-2945:
-

I disagree; the planner and admission control should not be blocked by one another.
There is no point in fixing memory-based admission control only to realize that 
the planner estimates are off because fixes were stalled.
Why would users migrate away from memory-based admission control?


> Low memory estimation in local aggregation
> --
>
> Key: IMPALA-2945
> URL: https://issues.apache.org/jira/browse/IMPALA-2945
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Affects Versions: Impala 2.0, Impala 2.3.0, Impala 2.5.0, Impala 2.4.0, 
> Impala 2.6.0, Impala 2.7.0, Impala 2.8.0, Impala 2.9.0, Impala 2.10.0, Impala 
> 2.11.0, Impala 2.12.0
>Reporter: Mostafa Mokhtar
>Priority: Major
>  Labels: planner, resource-management
>
> When computing the per-host memory estimate for local aggregations, the 
> planner does not take into account that data is randomly distributed across 
> nodes, leading to significant underestimation in some cases. The suggested fix 
> is to use min(agg input cardinality, NDV * #hosts) as the per-node cardinality 
> used for the per-node memory estimate.
> Impact: In the query below, the planner significantly underestimates the 
> per-node memory of agg node 03, estimating 3.8GB while the actual is 24.77GB.
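> One way to read the suggested formula, using purely hypothetical figures (not taken from the plan below): with roughly 6B input rows, an NDV of 200M for the grouping key and 20 hosts, each host aggregates a random ~300M-row slice and can therefore contain close to the full 200M distinct groups; min(agg input cardinality, NDV * #hosts) captures this once spread across the hosts.
> {code}
> -- Hypothetical figures for illustration only: 6B input rows, NDV = 200M, 20 hosts.
> -- Per-host upper bound on distinct groups = min(input / #hosts, NDV),
> -- i.e. nearly the full NDV rather than NDV / #hosts.
> SELECT least(6000000000 DIV 20, 200000000) AS per_host_group_upper_bound;  -- 200,000,000
> {code}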
> Query
> {code}
> select sum(l_extendedprice) / 7.0 as avg_yearly
> from
>   lineitem,
>   part
> where
>   p_partkey = l_partkey
>   and p_brand = 'Brand#23'
>   and p_container = 'MED BOX'
>   and l_quantity < (
> select
>   0.2 * avg(l_quantity)
> from
>   lineitem
> where
>   l_partkey = p_partkey
>   )
> {code}
> Plan
> {code}
> 12:AGGREGATE [FINALIZE]
> |  output: sum:merge(l_extendedprice)
> |  hosts=20 per-host-mem=unavailable
> |  tuple-ids=6 row-size=16B cardinality=1
> |
> 11:EXCHANGE [UNPARTITIONED]
> |  hosts=20 per-host-mem=unavailable
> |  tuple-ids=6 row-size=16B cardinality=1
> |
> 06:AGGREGATE
> |  output: sum(l_extendedprice)
> |  hosts=20 per-host-mem=10.00MB
> |  tuple-ids=6 row-size=16B cardinality=1
> |
> 05:HASH JOIN [RIGHT SEMI JOIN, PARTITIONED]
> |  hash predicates: l_partkey = p_partkey
> |  other join predicates: l_quantity < 0.2 * avg(l_quantity)
> |  hosts=20 per-host-mem=125.18MB
> |  tuple-ids=0,1 row-size=80B cardinality=29992141
> |
> |--10:EXCHANGE [HASH(p_partkey)]
> |  |  hosts=20 per-host-mem=0B
> |  |  tuple-ids=0,1 row-size=80B cardinality=29992141
> |  |
> |  04:HASH JOIN [INNER JOIN, BROADCAST]
> |  |  hash predicates: l_partkey = p_partkey
> |  |  hosts=20 per-host-mem=58.30MB
> |  |  tuple-ids=0,1 row-size=80B cardinality=29992141
> |  |
> |  |--09:EXCHANGE [BROADCAST]
> |  |  |  hosts=20 per-host-mem=0B
> |  |  |  tuple-ids=1 row-size=56B cardinality=100
> |  |  |
> |  |  01:SCAN HDFS [tpch_1000_decimal_parquet.part, RANDOM]
> |  | partitions=1/1 files=40 size=6.38GB
> |  | predicates: p_brand = 'Brand#23', p_container = 'MED BOX'
> |  | table stats: 2 rows total
> |  | column stats: all
> |  | hosts=20 per-host-mem=264.00MB
> |  | tuple-ids=1 row-size=56B cardinality=100
> |  |
> |  00:SCAN HDFS [tpch_1000_decimal_parquet.lineitem, RANDOM]
> | partitions=1/1 files=880 size=216.61GB
> | table stats: 589709 rows total
> | column stats: all
> | hosts=20 per-host-mem=264.00MB
> | tuple-ids=0 row-size=24B cardinality=589709
> |
> 08:AGGREGATE [FINALIZE]
> |  output: avg:merge(l_quantity)
> |  group by: l_partkey
> |  hosts=20 per-host-mem=167.89MB
> |  tuple-ids=4 row-size=16B cardinality=200052064
> |
> 07:EXCHANGE [HASH(l_partkey)]
> |  hosts=20 per-host-mem=0B
> |  tuple-ids=3 row-size=16B cardinality=200052064
> |
> 03:AGGREGATE
> |  output: avg(l_quantity)
> |  group by: l_partkey
> |  hosts=20 per-host-mem=3.28GB
> |  tuple-ids=3 row-size=16B cardinality=200052064
> |
> 02:SCAN HDFS [tpch_1000_decimal_parquet.lineitem, RANDOM]
>partitions=1/1 files=880 size=216.61GB
>table stats: 589709 rows total
>column stats: all
>hosts=20 per-host-mem=176.00MB
>tuple-ids=2 row-size=16B cardinality=589709
> {code}
> Summary
> |Operator ||#Hosts ||Avg Time ||Max Time ||#Rows ||Est. #Rows ||Peak Mem ||Est. Peak Mem ||Detail |
> |12:AGGREGATE |1 |256.620ms |256.620ms |1 |1 |92.00 KB |-1.00 B |FINALIZE |
> |11:EXCHANGE |1 |184.430us |184.430us |20 |1 |0 |-1.00 B |UNPARTITIONED |
> |06:AGGREGATE |20  |364.045ms|1s508ms|   20|   1| 
>9.37 MB|