[jira] [Assigned] (IMPALA-7221) While reading from object store S3/ADLS at fast rates +500MB/sec TypeArrayKlass::allocate_common becomes a CPU bottleneck
[ https://issues.apache.org/jira/browse/IMPALA-7221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mostafa Mokhtar reassigned IMPALA-7221:
---------------------------------------
    Assignee: Sailesh Mukil

> While reading from object store S3/ADLS at fast rates +500MB/sec
> TypeArrayKlass::allocate_common becomes a CPU bottleneck
> ----------------------------------------------------------------
>
>                 Key: IMPALA-7221
>                 URL: https://issues.apache.org/jira/browse/IMPALA-7221
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Backend
>    Affects Versions: Impala 2.8.0
>            Reporter: Mostafa Mokhtar
>            Assignee: Sailesh Mukil
>            Priority: Major
>         Attachments: s3_alloc_expensive_1_js.txt, s3_alloc_expensive_2_ps.txt
>
> From Perf
> {code}
> Samples: 1M of event 'cpu-clock', Event count (approx.): 32005850
>   Children      Self  Command  Shared Object      Symbol
> -   16.46%     0.04%  impalad  impalad            [.] hdfsRead
>    - 16.45% hdfsRead
>       - 9.71% jni_NewByteArray
>            9.63% TypeArrayKlass::allocate_common
>         6.57% __memmove_ssse3_back
> +    9.72%     0.03%  impalad  libjvm.so          [.] jni_NewByteArray
> +    9.67%     8.79%  impalad  libjvm.so          [.] TypeArrayKlass::allocate_common
> +    8.82%     0.00%  impalad  [unknown]          [.]
> +    7.67%     0.04%  impalad  [kernel.kallsyms]  [k] system_call_fastpath
> +    7.19%     7.02%  impalad  impalad            [.] impala::ScalarColumnReader<...>
> +    7.18%     6.55%  impalad  libc-2.17.so       [.] __memmove_ssse3_back
> +    6.32%     0.00%  impalad  [unknown]          [.] 0x001a9458
> +    6.07%     0.00%  impalad  [kernel.kallsyms]  [k] do_softirq
> +    6.07%     0.00%  impalad  [kernel.kallsyms]  [k] call_softirq
> +    6.05%     0.24%  impalad  [kernel.kallsyms]  [k] __do_softirq
> +    5.98%     0.00%  impalad  [kernel.kallsyms]  [k] xen_hvm_callback_vector
> +    5.98%     0.00%  impalad  [kernel.kallsyms]  [k] xen_evtchn_do_upcall
> +    5.98%     0.00%  impalad  [kernel.kallsyms]  [k] irq_exit
> +    5.81%     0.03%  impalad  [kernel.kallsyms]  [k] net_rx_action
> {code}
> {code}
> #0  0x7ffa3d78d69b in TypeArrayKlass::allocate_common(int, bool, Thread*) () from /usr/java/jdk1.8.0_121/jre/lib/amd64/server/libjvm.so
> #1  0x7ffa3d3e22d2 in jni_NewByteArray () from /usr/java/jdk1.8.0_121/jre/lib/amd64/server/libjvm.so
> #2  0x020ec13c in hdfsRead ()
> #3  0x01100948 in impala::io::ScanRange::Read(unsigned char*, long, long*, bool*) ()
> #4  0x010fa294 in impala::io::DiskIoMgr::ReadRange(impala::io::DiskIoMgr::DiskQueue*, impala::io::RequestContext*, impala::io::ScanRange*) ()
> #5  0x010fa3f4 in impala::io::DiskIoMgr::WorkLoop(impala::io::DiskIoMgr::DiskQueue*) ()
> #6  0x00d15193 in impala::Thread::SuperviseThread(std::string const&, std::string const&, boost::function, impala::Promise*) ()
> #7  0x00d158d4 in boost::detail::thread_data void (*)(std::string const&, std::string const&, boost::function, impala::Promise*), boost::_bi::list4, boost::_bi::value, boost::_bi::value >, boost::_bi::value*> > > >::run() ()
> #8  0x012919aa in thread_proxy ()
> #9  0x7ffa3b6a6e25 in start_thread () from /lib64/libpthread.so.0
> #10 0x7ffa3b3d0bad in clone () from /lib64/libc.so.6
> {code}
> There is also log4j contention in the JVM due to writing error messages to impalad.ERROR like this
> {code}
> readDirect: FSDataInputStream#read error: UnsupportedOperationException: Byte-buffer read unsupported by input stream
> java.lang.UnsupportedOperationException: Byte-buffer read unsupported by input stream
>         at org.apache.hadoop.fs.FSDataInputStream.read(FSDataInputStream.java:150)
> readDirect: FSDataInputStream#read error: UnsupportedOperationException: Byte-buffer read unsupported by input stream
> {code}
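The profile above shows where the time goes: every hdfsRead crosses JNI and allocates a fresh Java byte[] (jni_NewByteArray -> TypeArrayKlass::allocate_common), then copies the bytes back out (__memmove_ssse3_back). A minimal sketch of the difference between allocating per read and reusing one buffer; the names below are hypothetical stand-ins for the JNI calls, not libhdfs code:

```python
class AllocCounter:
    """Counts Java-side array allocations (a stand-in for the JVM heap)."""
    def __init__(self):
        self.allocations = 0

    def jni_new_byte_array(self, size):
        # Each call models jni_NewByteArray -> TypeArrayKlass::allocate_common.
        self.allocations += 1
        return bytearray(size)

def read_per_call(counter, n_reads, chunk=128 * 1024):
    """A fresh Java byte[] per hdfsRead, as the perf profile shows."""
    for _ in range(n_reads):
        counter.jni_new_byte_array(chunk)  # allocate, fill, copy out, discard
    return counter.allocations

def read_reused(counter, n_reads, chunk=128 * 1024):
    """Allocate once and reuse (the readDirect / direct-ByteBuffer idea)."""
    buf = counter.jni_new_byte_array(chunk)
    for _ in range(n_reads):
        pass  # each read fills the same buffer; no new JVM allocation
    return counter.allocations

print(read_per_call(AllocCounter(), 1000))  # 1000 allocations
print(read_reused(AllocCounter(), 1000))    # 1 allocation
```

At +500MB/sec the per-call variant runs the JVM allocator thousands of times per second per scanner thread, which is consistent with allocate_common showing up as ~10% of all samples.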
[jira] [Updated] (IMPALA-7221) While reading from object store S3/ADLS at +500MB/sec TypeArrayKlass::allocate_common becomes a CPU bottleneck
[ https://issues.apache.org/jira/browse/IMPALA-7221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mostafa Mokhtar updated IMPALA-7221:
------------------------------------
    Summary: While reading from object store S3/ADLS at +500MB/sec TypeArrayKlass::allocate_common becomes a CPU bottleneck  (was: While reading from object store S3/ADLS at fast rates +500MB/sec TypeArrayKlass::allocate_common becomes a CPU bottleneck)
[jira] [Updated] (IMPALA-7221) While reading from object store S3/ADLS at fast rates +500MB/sec TypeArrayKlass::allocate_common becomes a CPU bottleneck
[ https://issues.apache.org/jira/browse/IMPALA-7221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mostafa Mokhtar updated IMPALA-7221:
------------------------------------
    Attachment: s3_alloc_expensive_2_ps.txt
                s3_alloc_expensive_1_js.txt
[jira] [Created] (IMPALA-7221) While reading from object store S3/ADLS at fast rates +500MB/sec TypeArrayKlass::allocate_common becomes a CPU bottleneck
Mostafa Mokhtar created IMPALA-7221:
---------------------------------------

             Summary: While reading from object store S3/ADLS at fast rates +500MB/sec TypeArrayKlass::allocate_common becomes a CPU bottleneck
                 Key: IMPALA-7221
                 URL: https://issues.apache.org/jira/browse/IMPALA-7221
             Project: IMPALA
          Issue Type: Bug
          Components: Backend
    Affects Versions: Impala 2.8.0
            Reporter: Mostafa Mokhtar

There is also log4j contention in the JVM due to writing error messages to impalad.ERROR.

Stack
{code}
"Thread-66" #93 prio=5 os_prio=0 tid=0x10220800 nid=0x3412 waiting for monitor entry [0x7ff9a3609000]
   java.lang.Thread.State: BLOCKED (on object monitor)
        at org.apache.log4j.Category.callAppenders(Category.java:204)
        - waiting to lock <0x80229658> (a
{code}
[jira] [Resolved] (IMPALA-2712) Investigate increasing batch size
[ https://issues.apache.org/jira/browse/IMPALA-2712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mostafa Mokhtar resolved IMPALA-2712.
-------------------------------------
    Resolution: Won't Fix

> Investigate increasing batch size
> ---------------------------------
>
>                 Key: IMPALA-2712
>                 URL: https://issues.apache.org/jira/browse/IMPALA-2712
>             Project: IMPALA
>          Issue Type: Task
>          Components: Perf Investigation
>    Affects Versions: Impala 2.2
>            Reporter: Mostafa Mokhtar
>            Assignee: Mostafa Mokhtar
>            Priority: Minor
>              Labels: performance
>
> Investigate increasing batch size for better performance.
> Initial results (columns are batch sizes):
> ||Query||10,000,000||1,000,000||100,000||10,000||1,000||100||10||
> |broadcast_join_1|5|4|4|4|3|3|33|
> |broadcast_join_2|9|7|8|8|12|18|96|
> |broadcast_join_3|55|63|67|73|81|133|556|
> |exchange_broadcast| |115.59|118|119|149|285|1,193|
> |exchange_shuffle|196|187|186|191|191.81| | |
> |filter_bigint_non_selective|9|6|6|6|7|31|275|
> |filter_bigint_selective|3|3|3|3|3|3|13|
> |filter_decimal_non_selective|3|3|3|3|3|9|65|
> |filter_decimal_selective|3|3|3|3|3|10|46|
> |filter_string_non_selective|2|2|2|2|3|17|143|
> |filter_string_selective|2|2|2|2|2|7|34|
> |groupBy_bigint_highndv|58|55|55|58|55|69|254|
> |groupBy_bigint_lowndv|13|9|9|9|10|26|202|
> |groupBy_decimal_highndv|116|87|81|105|102|103|257|
> |groupBy_decimal_lowndv|30|27|28|29|33|45|217|
> |groupBy_spilling| | |566|534|527|523|546|
> |insert_partitioned|392|383|375|385|483|451|486|
> |insert|392|383|375|385|483|451|486|
> |orderby_all| | |158|176|173|191|323|
> |orderby_bigint|30|34|34|35|34|49|281|
> |shuffle_join_one_to_many_string_with_groupby|554|568|613|561|549|577|739|
> |shuffle_join_union_all_with_groupby| | |97|109|122|119|262|

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org
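The shape of the results above (times are flat down to a batch size of about 1,000, then blow up at 100 and 10) is what a fixed per-batch overhead amortized over fewer and fewer rows would produce. A toy cost model with made-up constants, purely to illustrate that shape, not Impala's actual overheads:

```python
import math

def query_cost(n_rows, batch_size, per_row=1.0, per_batch=500.0):
    # total cost = per-row work + (number of batches) * fixed per-batch overhead
    n_batches = math.ceil(n_rows / batch_size)
    return n_rows * per_row + n_batches * per_batch

# Overhead is negligible for large batches and dominates for tiny ones,
# mirroring the table's jump between batch sizes 1,000 and 10.
for bs in (1_000_000, 10_000, 1_000, 100, 10):
    print(bs, query_cost(1_000_000, bs))
```

This also suggests why the issue was closed Won't Fix: past roughly a thousand rows the per-batch overhead is already amortized away, so growing batches further buys little.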
[jira] [Assigned] (IMPALA-2564) Introduce mechanism to limit query fan-out
[ https://issues.apache.org/jira/browse/IMPALA-2564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mostafa Mokhtar reassigned IMPALA-2564:
---------------------------------------
    Assignee:     (was: Mostafa Mokhtar)

> Introduce mechanism to limit query fan-out
> ------------------------------------------
>
>                 Key: IMPALA-2564
>                 URL: https://issues.apache.org/jira/browse/IMPALA-2564
>             Project: IMPALA
>          Issue Type: New Feature
>          Components: Distributed Exec
>    Affects Versions: Impala 2.2
>            Reporter: Mostafa Mokhtar
>            Priority: Minor
>              Labels: customer, performance, scalability
>
> The target use case is small queries on large clusters.
> Today Impala schedules queries on all Impalad instances regardless of how much data each Impalad would read. This spreads the work too thin between nodes and exposes undesired scalability issues.
> The proposal is to introduce a parameter that controls the min/max amount of data read by a single Impalad instance.
> The SimpleScheduler would combine several splits together to satisfy the minimum size requirement for a single Impalad before moving on to the next node.
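The proposed behavior can be sketched as a greedy assignment that keeps packing splits onto one node until a minimum-bytes target is met before moving on. This is a hypothetical illustration of the idea, not SimpleScheduler code, and the function and parameter names are made up:

```python
def assign_splits(split_sizes, nodes, min_bytes_per_node):
    """Pack splits onto nodes, moving to the next node only once the
    current node has at least min_bytes_per_node assigned."""
    assignments = {n: [] for n in nodes}
    node_iter = iter(nodes)
    current = next(node_iter)
    assigned = 0
    for size in split_sizes:
        if assigned >= min_bytes_per_node:
            try:
                current = next(node_iter)  # target met: move to next node
                assigned = 0
            except StopIteration:
                pass  # out of nodes: keep filling the last one
        assignments[current].append(size)
        assigned += size
    return assignments

# A small query: 8 splits of 64MB on a 100-node cluster. With a 256MB
# minimum per node, only 2 nodes participate instead of fanning out to all.
a = assign_splits([64] * 8, [f"node{i}" for i in range(100)], 256)
used = [n for n, s in a.items() if s]
print(len(used))  # 2
```

The design trade-off is fan-out versus per-query overhead: fewer participating nodes means fewer fragment startups and exchange channels, at the cost of less read parallelism.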
[jira] [Resolved] (IMPALA-6657) Investigate why memory allocation in Exchange receiver node takes a long time
[ https://issues.apache.org/jira/browse/IMPALA-6657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mostafa Mokhtar resolved IMPALA-6657.
-------------------------------------
    Resolution: Fixed

It turns out tcmalloc was the reason; the issue is fixed in 2.12.

> Investigate why memory allocation in Exchange receiver node takes a long time
> -----------------------------------------------------------------------------
>
>                 Key: IMPALA-6657
>                 URL: https://issues.apache.org/jira/browse/IMPALA-6657
>             Project: IMPALA
>          Issue Type: Task
>          Components: Backend
>    Affects Versions: Impala 2.11.0
>            Reporter: Mostafa Mokhtar
>            Assignee: Mostafa Mokhtar
>            Priority: Major
>              Labels: memorymanager, performance, scalability
>         Attachments: Impala query profile.txt
>
> It was observed while inserting large amounts of data into a Kudu table that the Exchange operator was running slow; the query profile showed a big portion of the time was spent in memory allocation in the buffer pool.
> {code}
> EXCHANGE_NODE (id=1):(Total: 5h53m, non-child: 48s853ms, % non-child: 0.23%)
>    - ConvertRowBatchTime: 20s289ms
>    - PeakMemoryUsage: 19.53 MB (20483562)
>    - RowsReturned: 575.30M (575298780)
>    - RowsReturnedRate: 27.10 K/sec
>   Buffer pool:
>      - AllocTime: 2h53m
>      - CumulativeAllocationBytes: 261.26 GB (280526643200)
>      - CumulativeAllocations: 13.70M (13697590)
>      - PeakReservation: 18.06 MB (18939904)
>      - PeakUnpinnedBytes: 0
>      - PeakUsedReservation: 18.06 MB (18939904)
>      - ReadIoBytes: 0
>      - ReadIoOps: 0 (0)
>      - ReadIoWaitTime: 0.000ns
>      - WriteIoBytes: 0
>      - WriteIoOps: 0 (0)
>      - WriteIoWaitTime: 0.000ns
>   RecvrSide:
>     BytesReceived(8m32s): 20.91 GB, 37.03 GB, 45.62 GB, 53.22 GB, 60.17 GB, 66.30 GB,
>       71.60 GB, 76.59 GB, 81.36 GB, 86.03 GB, 90.35 GB, 94.30 GB, 98.17 GB, 101.98 GB,
>       105.58 GB, 109.08 GB, 112.33 GB, 115.47 GB, 118.45 GB, 121.30 GB, 124.09 GB,
>       126.74 GB, 129.26 GB, 131.88 GB, 134.41 GB, 136.85 GB, 139.32 GB, 141.77 GB,
>       144.23 GB, 146.71 GB, 148.26 GB, 148.29 GB, 148.29 GB, 148.29 GB, 148.29 GB,
>       148.29 GB, 148.29 GB, 148.29 GB, 148.29 GB, 148.29 GB, 148.29 GB, 148.29 GB,
>       148.29 GB, 148.29 GB, 148.29 GB, 148.29 GB, 148.29 GB, 148.29 GB, 148.30 GB,
>       148.30 GB, 148.30 GB, 148.30 GB, 148.30 GB, 148.30 GB, 148.30 GB, 148.30 GB
>      - FirstBatchArrivalWaitTime: 1s071ms
>      - TotalBytesReceived: 148.30 GB (159234237617)
>      - TotalGetBatchTime: 5h53m
>        - DataArrivalTimer: 5h52m
>   SenderSide:
>      - DeserializeRowBatchTime: 3h4m
>      - NumBatchesArrived: 6.85M (6848795)
>      - NumBatchesDeferred: 99.67K (99667)
>      - NumBatchesEnqueued: 6.85M (6848795)
>      - NumBatchesReceived: 6.85M (6848795)
>      - NumEarlySenders: 0 (0)
>      - NumEosReceived: 0 (0)
> {code}
[jira] [Resolved] (IMPALA-2682) Address exchange operator CPU bottlenecks in Thrift
[ https://issues.apache.org/jira/browse/IMPALA-2682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mostafa Mokhtar resolved IMPALA-2682.
-------------------------------------
    Resolution: Fixed

Fixed in 2.13.

> Address exchange operator CPU bottlenecks in Thrift
> ---------------------------------------------------
>
>                 Key: IMPALA-2682
>                 URL: https://issues.apache.org/jira/browse/IMPALA-2682
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Distributed Exec
>    Affects Versions: Impala 2.2
>            Reporter: Mostafa Mokhtar
>            Assignee: Mostafa Mokhtar
>            Priority: Minor
>              Labels: performance
>         Attachments: Screen Shot 2015-11-25 at 4.19.28 PM.png
>
> Currently the exchange operator is a major bottleneck for partitioned table inserts, broadcast and shuffle joins, as well as aggregates; exchange alone accounts for 60% of the slowdown of partitioned table inserts compared to un-partitioned ones.
> The CPU cost of exchange operators can vary between 10-50% depending on cardinality and the complexity of other operators.
> Part of the bottleneck in Thrift is due to the 512-byte buffer used by TBufferedTransport, which causes reads to go through a "slow" path where the payload is mem-copied over N chunks of 512 bytes.
> These are the top contributing call stacks in the exchange operator.
> Read path in thrift
> {code}
> Data Of Interest (CPU Metrics)
> 1 of 18: 58.1% (43.860s of 75.474s)
> impalad!apache::thrift::transport::TSocket::read - TSocket.cpp
> impalad!apache::thrift::transport::TTransport::read+0x5 - TTransport.h:109
> impalad!apache::thrift::transport::TBufferedTransport::readSlow+0x4b - TBufferTransports.cpp:52
> impalad!apache::thrift::transport::TBufferBase::read+0xb9 - TBufferTransports.h:69
> impalad!apache::thrift::transport::readAll+0x28 - TTransport.h:44
> impalad!apache::thrift::transport::TTransport::readAll+0xa - TTransport.h:126
> impalad!apache::thrift::protocol::TBinaryProtocolT::readStringBody+0xae - TBinaryProtocol.tcc:458
> impalad!readString, std::allocator > >+0x24 - TBinaryProtocol.tcc:412
> impalad!apache::thrift::protocol::TVirtualProtocol, apache::thrift::protocol::TProtocolDefaults>::readString_virt+0x11 - TVirtualProtocol.h:515
> impalad!apache::thrift::protocol::TProtocol::readString+0x10 - TProtocol.h:621
> impalad!impala::TRowBatch::read+0x17d - Results_types.cpp:89
> {code}
> Send path compressing the data
> {code}
> Data Of Interest (CPU Metrics)
> 1 of 1: 100.0% (65.480s of 65.480s)
> impalad!LZ4_compress64kCtx - [Unknown]
> impalad!impala::Lz4Compressor::ProcessBlock+0x32 - compress.cc:294
> impalad!impala::RowBatch::Serialize+0x20b - row-batch.cc:211
> impalad!impala::RowBatch::Serialize+0x34 - row-batch.cc:168
> impalad!impala::DataStreamSender::SerializeBatch+0x117 - data-stream-sender.cc:463
> impalad!impala::DataStreamSender::Channel::SendCurrentBatch+0x44 - data-stream-sender.cc:256
> impalad!impala::DataStreamSender::Channel::AddRow+0x48 - data-stream-sender.cc:232
> impalad!impala::DataStreamSender::Send+0x6d2 - data-stream-sender.cc:444
> {code}
> Memory copy in send path
> {code}
> Data Of Interest (CPU Metrics)
> 1 of 137: 24.6% (6.870s of 27.911s)
> libc.so.6!memcpy - [Unknown]
> impalad!impala::Tuple::DeepCopyVarlenData+0xfe - tuple.cc:89
> impalad!impala::Tuple::DeepCopy+0xc3 - tuple.cc:69
> impalad!impala::DataStreamSender::Channel::AddRow+0xf5 - data-stream-sender.cc:245
> impalad!impala::DataStreamSender::Send+0x6d2 - data-stream-sender.cc:444
> impalad!impala::PlanFragmentExecutor::OpenInternal+0x3e2 - plan-fragment-executor.cc:355
> {code}
> Un-compressing in read path
> {code}
> Data Of Interest (CPU Metrics)
> 1 of 1: 100.0% (25.100s of 25.100s)
> impalad!LZ4_uncompress - [Unknown]
> impalad!impala::Lz4Decompressor::ProcessBlock+0x47 - decompress.cc:454
> impalad!impala::RowBatch::RowBatch+0x3ec - row-batch.cc:108
> impalad!impala::DataStreamRecvr::SenderQueue::AddBatch+0x1a1 - data-stream-recvr.cc:207
> impalad!impala::DataStreamMgr::AddData+0x134 - data-stream-mgr.cc:103
> impalad!impala::ImpalaServer::TransmitData+0x175 - impala-server.cc:1077
> impalad!impala::ImpalaInternalService::TransmitData+0x43 - impala-internal-service.h:60
> {code}
> Write path in thrift
> {code}
> Data Of Interest (CPU Metrics)
> 1 of 14: 31.9% (7.010s of 21.980s)
> libpthread.so.0!__send - [Unknown]
> impalad!apache::thrift::transport::TSocket::write_partial+0x36 - TSocket.cpp:567
> impalad!apache::thrift::transport::TSocket::write+0x3c - TSocket.cpp:542
> impalad!apache::thrift::transport::TTransport::write+0xa - TTransport.h:158
> impalad!apache::thrift::transport::TBufferedTransport::writeSlow+0x79 - TBufferTransports.cpp:93
> impalad!apache::thrift::transport::TTransport::write+0xb - TTransport.h:158
> impalad!writeString, std::allocator > >+0x39 -
> {code}
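The 512-byte TBufferedTransport buffer described above means a multi-megabyte serialized row batch is shuttled through readSlow in thousands of small copies. A back-of-the-envelope model of the chunk count (an assumption-laden sketch, not Thrift internals):

```python
import math

def slow_path_copies(payload_bytes, buffer_size):
    # A read larger than the transport buffer is copied across
    # roughly ceil(payload / buffer_size) chunks on the slow path.
    return math.ceil(payload_bytes / buffer_size)

row_batch = 1 * 1024 * 1024                       # a 1MB serialized row batch
print(slow_path_copies(row_batch, 512))           # 2048 copies at 512B
print(slow_path_copies(row_batch, 64 * 1024))     # 16 copies at 64KB
```

The two orders of magnitude between those chunk counts is the kind of gap that makes a larger transport buffer (or bypassing buffering for bulk payloads) attractive.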
[jira] [Assigned] (IMPALA-7020) Order by expressions in Analytical functions are not materialized causing slowdown
[ https://issues.apache.org/jira/browse/IMPALA-7020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mostafa Mokhtar reassigned IMPALA-7020:
---------------------------------------
    Assignee: Adrian Ng  (was: Alexander Behm)

> Order by expressions in Analytical functions are not materialized causing slowdown
> ----------------------------------------------------------------------------------
>
>                 Key: IMPALA-7020
>                 URL: https://issues.apache.org/jira/browse/IMPALA-7020
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Frontend
>    Affects Versions: Impala 2.12.0
>            Reporter: Mostafa Mokhtar
>            Assignee: Adrian Ng
>            Priority: Major
>              Labels: performance
>         Attachments: Slow case profile.txt, Workaround profile.txt
>
> Order by expressions in Analytical functions are not materialized and cause queries to run much slower.
> The rewrite for the query below is 20x faster; profiles attached.
> Repro
> {code}
> select *
> FROM
> (
>   SELECT
>     o.*,
>     ROW_NUMBER() OVER(ORDER BY evt_ts DESC) AS rn
>   FROM
>   (
>     SELECT
>       l_orderkey, l_partkey, l_linenumber, l_quantity, cast(l_shipdate as string) evt_ts
>     FROM
>       lineitem
>     WHERE
>       l_shipdate BETWEEN '1992-01-01 00:00:00' AND '1992-01-15 00:00:00'
>   ) o
> ) r
> WHERE
>   rn BETWEEN 1 AND 101
> ORDER BY rn;
> {code}
> Workaround
> {code}
> select *
> FROM
> (
>   SELECT
>     o.*,
>     ROW_NUMBER() OVER(ORDER BY evt_ts DESC) AS rn
>   FROM
>   (
>     SELECT
>       l_orderkey, l_partkey, l_linenumber, l_quantity, cast(l_shipdate as string) evt_ts
>     FROM
>       lineitem
>     WHERE
>       l_shipdate BETWEEN '1992-01-01 00:00:00' AND '1992-01-15 00:00:00'
>     union all
>     SELECT
>       l_orderkey, l_partkey, l_linenumber, l_quantity, cast(l_shipdate as string) evt_ts
>     FROM
>       lineitem limit 0
>   ) o
> ) r
> WHERE
>   rn BETWEEN 1 AND 101
> ORDER BY rn;
> {code}
[jira] [Assigned] (IMPALA-6746) Reduce the number of comparison for analytical functions with partitioning when incoming data is clustered
[ https://issues.apache.org/jira/browse/IMPALA-6746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mostafa Mokhtar reassigned IMPALA-6746:
---------------------------------------
    Assignee: Mostafa Mokhtar  (was: Tianyi Wang)

> Reduce the number of comparison for analytical functions with partitioning
> when incoming data is clustered
> --------------------------------------------------------------------------
>
>                 Key: IMPALA-6746
>                 URL: https://issues.apache.org/jira/browse/IMPALA-6746
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Backend
>    Affects Versions: Impala 2.13.0
>            Reporter: Mostafa Mokhtar
>            Assignee: Mostafa Mokhtar
>            Priority: Major
>         Attachments: percentile query profile 2.txt
>
> Checking whether the current row belongs to the same partition in ANALYTIC is very expensive, as it does N comparisons where N is the number of rows. When the cardinality of the partition column(s) is relatively small, the values will be clustered.
> One optimization, as proposed by [~alex.behm], is to check the first and last tuples in the batch; if they match, avoid calling AnalyticEvalNode::PrevRowCompare for the entire batch.
> For the query attached, which is a common pattern, the expected speedup is 20-30%.
> Query
> {code}
> select l_commitdate
>     ,avg(l_extendedprice) as avg_perc
>     ,percentile_cont (.25) within group (order by l_extendedprice asc) as perc_25
>     ,percentile_cont (.5) within group (order by l_extendedprice asc) as perc_50
>     ,percentile_cont (.75) within group (order by l_extendedprice asc) as perc_75
>     ,percentile_cont (.90) within group (order by l_extendedprice asc) as perc_90
> from lineitem
> group by l_commitdate
> order by l_commitdate
> {code}
> Plan
> {code}
> F03:PLAN FRAGMENT [UNPARTITIONED] hosts=1 instances=1
> |  Per-Host Resources: mem-estimate=0B mem-reservation=0B
> PLAN-ROOT SINK
> |  mem-estimate=0B mem-reservation=0B
> |
> 09:MERGING-EXCHANGE [UNPARTITIONED]
> |  order by: l_commitdate ASC
> |  mem-estimate=0B mem-reservation=0B
> |  tuple-ids=5 row-size=66B cardinality=2559
> |
> F02:PLAN FRAGMENT [HASH(l_commitdate)] hosts=1 instances=1
> Per-Host Resources: mem-estimate=22.00MB mem-reservation=13.94MB
> 05:SORT
> |  order by: l_commitdate ASC
> |  mem-estimate=12.00MB mem-reservation=12.00MB spill-buffer=2.00MB
> |  tuple-ids=5 row-size=66B cardinality=2559
> |
> 08:AGGREGATE [FINALIZE]
> |  output: avg:merge(l_extendedprice), _percentile_cont_interpolation:merge(l_extendedprice, `_percentile_row_number_diff_0`), _percentile_cont_interpolation:merge(l_extendedprice, `_percentile_row_number_diff_1`), _percentile_cont_interpolation:merge(l_extendedprice, `_percentile_row_number_diff_2`), _percentile_cont_interpolation:merge(l_extendedprice, `_percentile_row_number_diff_3`)
> |  group by: l_commitdate
> |  mem-estimate=10.00MB mem-reservation=1.94MB spill-buffer=64.00KB
> |  tuple-ids=4 row-size=66B cardinality=2559
> |
> 07:EXCHANGE [HASH(l_commitdate)]
> |  mem-estimate=0B mem-reservation=0B
> |  tuple-ids=3 row-size=66B cardinality=2559
> |
> F01:PLAN FRAGMENT [HASH(l_commitdate)] hosts=1 instances=1
> Per-Host Resources: mem-estimate=64.00MB mem-reservation=22.00MB
> 04:AGGREGATE [STREAMING]
> |  output: avg(l_extendedprice), _percentile_cont_interpolation(l_extendedprice, row_number() - 1 - count(l_extendedprice) - 1 * 0.25), _percentile_cont_interpolation(l_extendedprice, row_number() - 1 - count(l_extendedprice) - 1 * 0.5), _percentile_cont_interpolation(l_extendedprice, row_number() - 1 - count(l_extendedprice) - 1 * 0.75), _percentile_cont_interpolation(l_extendedprice, row_number() - 1 - count(l_extendedprice) - 1 * 0.90)
> |  group by: l_commitdate
> |  mem-estimate=10.00MB mem-reservation=2.00MB spill-buffer=64.00KB
> |  tuple-ids=3 row-size=66B cardinality=2559
> |
> 03:ANALYTIC
> |  functions: count(l_extendedprice)
> |  partition by: l_commitdate
> |  mem-estimate=4.00MB mem-reservation=4.00MB spill-buffer=2.00MB
> |  tuple-ids=9,7,8 row-size=50B cardinality=59986052
> |
> 02:ANALYTIC
> |  functions: row_number()
> |  partition by: l_commitdate
> |  order by: l_extendedprice ASC
> |  window: ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
> |  mem-estimate=4.00MB mem-reservation=4.00MB spill-buffer=2.00MB
> |  tuple-ids=9,7 row-size=42B cardinality=59986052
> |
> 01:SORT
> |  order by: l_commitdate ASC NULLS FIRST, l_extendedprice ASC NULLS LAST
> |  mem-estimate=46.00MB mem-reservation=12.00MB spill-buffer=2.00MB
> |  tuple-ids=9 row-size=34B cardinality=59986052
> |
> 06:EXCHANGE [HASH(l_commitdate)]
> |  mem-estimate=0B mem-reservation=0B
> |  tuple-ids=0 row-size=34B cardinality=59986052
> |
> F00:PLAN FRAGMENT [RANDOM] hosts=1 instances=1
> Per-Host Resources:
> {code}
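The first/last-tuple check proposed above can be illustrated with a small counting model. This is a hypothetical sketch, not the AnalyticEvalNode implementation: since the SORT feeds rows ordered by the partition key, a batch whose first and last keys match contains a single partition, so the per-row compare can be skipped for that whole batch.

```python
def count_partition_compares(keys, batch_size):
    """Compare per-row checking vs. the first/last-of-batch shortcut."""
    naive = len(keys)  # one PrevRowCompare per row
    optimized = 0
    for i in range(0, len(keys), batch_size):
        batch = keys[i:i + batch_size]
        optimized += 1  # compare the batch's first and last partition keys
        if batch[0] != batch[-1]:
            optimized += len(batch)  # boundary batch: fall back to per-row
    return naive, optimized

# Clustered input (long runs of equal keys), as the sort guarantees:
keys = ["a"] * 1000 + ["b"] * 1000
print(count_partition_compares(keys, 100))  # (2000, 20)
```

Only batches that straddle a partition boundary pay the per-row cost, so with low-cardinality partition columns the compare count drops by roughly the batch size, consistent with the estimated 20-30% speedup on this query shape.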
[jira] [Assigned] (IMPALA-6746) Reduce the number of comparisons for analytical functions with partitioning when incoming data is clustered
[ https://issues.apache.org/jira/browse/IMPALA-6746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mostafa Mokhtar reassigned IMPALA-6746: --- Assignee: Adrian Ng (was: Mostafa Mokhtar) > Reduce the number of comparisons for analytical functions with partitioning > when incoming data is clustered > -- > > Key: IMPALA-6746 > URL: https://issues.apache.org/jira/browse/IMPALA-6746 > Project: IMPALA > Issue Type: Improvement > Components: Backend >Affects Versions: Impala 2.13.0 >Reporter: Mostafa Mokhtar >Assignee: Adrian Ng >Priority: Major > Attachments: percentile query profile 2.txt > > > Checking whether the current row belongs to the same partition in ANALYTIC is very > expensive, as it does N comparisons where N is the number of rows; when the > cardinality of the partition column(s) is relatively small, the values > will be clustered. > One optimization, proposed by [~alex.behm], is to check the first and last > tuples in the batch and, if they match, avoid calling > AnalyticEvalNode::PrevRowCompare for the entire batch. > For the attached query, which is a common pattern, the expected speedup is > 20-30%. 
> Query > {code} > select l_commitdate > ,avg(l_extendedprice) as avg_perc > ,percentile_cont (.25) within group (order by l_extendedprice asc) as > perc_25 > ,percentile_cont (.5) within group (order by l_extendedprice asc) as > perc_50 > ,percentile_cont (.75) within group (order by l_extendedprice asc) as > perc_75 > ,percentile_cont (.90) within group (order by l_extendedprice asc) as > perc_90 > from lineitem > group by l_commitdate > order by l_commitdate > {code} > Plan > {code} > F03:PLAN FRAGMENT [UNPARTITIONED] hosts=1 instances=1 > | Per-Host Resources: mem-estimate=0B mem-reservation=0B > PLAN-ROOT SINK > | mem-estimate=0B mem-reservation=0B > | > 09:MERGING-EXCHANGE [UNPARTITIONED] > | order by: l_commitdate ASC > | mem-estimate=0B mem-reservation=0B > | tuple-ids=5 row-size=66B cardinality=2559 > | > F02:PLAN FRAGMENT [HASH(l_commitdate)] hosts=1 instances=1 > Per-Host Resources: mem-estimate=22.00MB mem-reservation=13.94MB > 05:SORT > | order by: l_commitdate ASC > | mem-estimate=12.00MB mem-reservation=12.00MB spill-buffer=2.00MB > | tuple-ids=5 row-size=66B cardinality=2559 > | > 08:AGGREGATE [FINALIZE] > | output: avg:merge(l_extendedprice), > _percentile_cont_interpolation:merge(l_extendedprice, > `_percentile_row_number_diff_0`), > _percentile_cont_interpolation:merge(l_extendedprice, > `_percentile_row_number_diff_1`), > _percentile_cont_interpolation:merge(l_extendedprice, > `_percentile_row_number_diff_2`), > _percentile_cont_interpolation:merge(l_extendedprice, > `_percentile_row_number_diff_3`) > | group by: l_commitdate > | mem-estimate=10.00MB mem-reservation=1.94MB spill-buffer=64.00KB > | tuple-ids=4 row-size=66B cardinality=2559 > | > 07:EXCHANGE [HASH(l_commitdate)] > | mem-estimate=0B mem-reservation=0B > | tuple-ids=3 row-size=66B cardinality=2559 > | > F01:PLAN FRAGMENT [HASH(l_commitdate)] hosts=1 instances=1 > Per-Host Resources: mem-estimate=64.00MB mem-reservation=22.00MB > 04:AGGREGATE [STREAMING] > | output: 
avg(l_extendedprice), > _percentile_cont_interpolation(l_extendedprice, row_number() - 1 - > count(l_extendedprice) - 1 * 0.25), > _percentile_cont_interpolation(l_extendedprice, row_number() - 1 - > count(l_extendedprice) - 1 * 0.5), > _percentile_cont_interpolation(l_extendedprice, row_number() - 1 - > count(l_extendedprice) - 1 * 0.75), > _percentile_cont_interpolation(l_extendedprice, row_number() - 1 - > count(l_extendedprice) - 1 * 0.90) > | group by: l_commitdate > | mem-estimate=10.00MB mem-reservation=2.00MB spill-buffer=64.00KB > | tuple-ids=3 row-size=66B cardinality=2559 > | > 03:ANALYTIC > | functions: count(l_extendedprice) > | partition by: l_commitdate > | mem-estimate=4.00MB mem-reservation=4.00MB spill-buffer=2.00MB > | tuple-ids=9,7,8 row-size=50B cardinality=59986052 > | > 02:ANALYTIC > | functions: row_number() > | partition by: l_commitdate > | order by: l_extendedprice ASC > | window: ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW > | mem-estimate=4.00MB mem-reservation=4.00MB spill-buffer=2.00MB > | tuple-ids=9,7 row-size=42B cardinality=59986052 > | > 01:SORT > | order by: l_commitdate ASC NULLS FIRST, l_extendedprice ASC NULLS LAST > | mem-estimate=46.00MB mem-reservation=12.00MB spill-buffer=2.00MB > | tuple-ids=9 row-size=34B cardinality=59986052 > | > 06:EXCHANGE [HASH(l_commitdate)] > | mem-estimate=0B mem-reservation=0B > | tuple-ids=0 row-size=34B cardinality=59986052 > | > F00:PLAN FRAGMENT [RANDOM] hosts=1 instances=1 > Per-Host Resources: mem-estimate=88.00MB
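[Editorial sketch] The optimization proposed in IMPALA-6746 above — compare only the first and last tuples of a row batch against the previous row and skip the per-row PrevRowCompare calls when they all match — can be sketched as below. This is an illustrative model, not Impala's actual AnalyticEvalNode code: a "row" is reduced to its partition key, and the input is assumed sorted/clustered on that key (in the plan above it comes out of a SORT node), which is what makes the first/last check sufficient.

```cpp
#include <cstdint>
#include <vector>

// Illustrative stand-ins for Impala's row batch and partition comparator.
using Row = int64_t;             // pretend a row is just its partition key
using Batch = std::vector<Row>;

inline bool SamePartition(Row a, Row b) { return a == b; }

// Naive path: one partition comparison per row (N compares for N rows).
int CountBoundariesNaive(const Batch& batch, Row prev, int* comparisons) {
  int boundaries = 0;
  for (Row r : batch) {
    ++*comparisons;
    if (!SamePartition(prev, r)) ++boundaries;
    prev = r;
  }
  return boundaries;
}

// Proposed path: if the previous row, the first row, and the last row all
// share a partition, then (for sorted/clustered input) so does every row in
// between -- the per-row compares can be skipped for the whole batch.
int CountBoundariesBatched(const Batch& batch, Row prev, int* comparisons) {
  if (batch.empty()) return 0;
  *comparisons += 2;
  if (SamePartition(prev, batch.front()) &&
      SamePartition(batch.front(), batch.back())) {
    return 0;  // entire batch stays in the current partition
  }
  // Fall back to per-row compares only for batches crossing a boundary.
  return CountBoundariesNaive(batch, prev, comparisons);
}
```

For a 1024-row batch that stays inside one partition, the naive path does 1024 comparisons while the batched path does 2, which is where the estimated 20-30% speedup for low-cardinality partition columns comes from.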
[jira] [Resolved] (IMPALA-2937) Introduce sampled statistics
[ https://issues.apache.org/jira/browse/IMPALA-2937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mostafa Mokhtar resolved IMPALA-2937. - Resolution: Fixed Fix Version/s: Impala 2.12.0 > Introduce sampled statistics > > > Key: IMPALA-2937 > URL: https://issues.apache.org/jira/browse/IMPALA-2937 > Project: IMPALA > Issue Type: Sub-task > Components: Backend >Affects Versions: Impala 2.2 >Reporter: Mostafa Mokhtar >Priority: Minor > Labels: planner, supportability > Fix For: Impala 2.12.0 > > > Stats computation can be expensive; introduce sampled scans for Parquet and > text files to speed up stats computation. > The database will have a default sampling rate which can be overridden at the > table level. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Created] (IMPALA-7172) Statestore should verify that all subscribers are running the same version of Impala
Mostafa Mokhtar created IMPALA-7172: --- Summary: Statestore should verify that all subscribers are running the same version of Impala Key: IMPALA-7172 URL: https://issues.apache.org/jira/browse/IMPALA-7172 Project: IMPALA Issue Type: New Feature Components: Distributed Exec Affects Versions: Impala 2.13.0 Reporter: Mostafa Mokhtar While running a metadata test which uses sync_ddl=1, tests appeared to hang indefinitely. Turns out one of the Impala daemons was running an older build which caused statestore topic updates to continuously fail. Ideally the SS should track the version across subscribers and black list the ones that don't match the SS and CS version. Logs from SS {code} I0614 11:11:04.410529 57312 statestore.cc:259] Preparing initial impala-membership topic update for impa...@vb0204.halxg.cloudera.com:22000. Size = 2.06 KB I0614 11:11:04.411222 57312 client-cache.cc:82] ReopenClient(): re-creating client for vb0204.halxg.cloudera.com:23000 I0614 11:11:04.411821 57312 client-cache.h:304] RPC Error: Client for vb0204.halxg.cloudera.com:23000 hit an unexpected exception: No more data to read., type: N6apache6thrift9transport19TTransportExceptionE, rpc: N6impala20TUpdateStateResponseE, send: done I0614 11:11:04.411831 57312 client-cache.cc:174] Broken Connection, destroy client for vb0204.halxg.cloudera.com:23000 I0614 11:11:04.411861 57312 statestore.cc:891] Unable to send priority topic update message to subscriber impa...@vb0204.halxg.cloudera.com:22000, received error: RPC Error: Client for vb0204.halxg.cloudera.com:23000 hit an unexpected exception: No more data to read., type: N6apache6thrift9transport19TTransportExceptionE, rpc: N6impala20TUpdateStateResponseE, send: done {code} Log from Impalad {code} I0614 11:03:19.479164 41915 thrift-util.cc:123] TAcceptQueueServer exception: N6apache6thrift8protocol18TProtocolExceptionE: TProtocolException: Invalid data I0614 11:03:19.680028 41916 thrift-util.cc:123] TAcceptQueueServer exception: 
N6apache6thrift8protocol18TProtocolExceptionE: TProtocolException: Invalid data I0614 11:03:19.680776 41917 thrift-util.cc:123] TAcceptQueueServer exception: N6apache6thrift8protocol18TProtocolExceptionE: TProtocolException: Invalid data I0614 11:03:19.881295 41918 thrift-util.cc:123] TAcceptQueueServer exception: N6apache6thrift8protocol18TProtocolExceptionE: TProtocolException: Invalid data {code}
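[Editorial sketch] The version gate requested in IMPALA-7172 above could take the form of a check at subscriber registration: the statestore records its own build version and refuses (or blacklists) subscribers reporting a different one, instead of letting topic updates fail repeatedly later. The names below (CheckSubscriberVersion, RegistrationResult) are hypothetical illustrations, not actual Impala APIs.

```cpp
#include <string>

// Hypothetical sketch, not an actual Impala API: the statestore compares
// its build version against the one a subscriber reports at registration
// time and rejects mismatches up front.
struct RegistrationResult {
  bool accepted;
  std::string reason;
};

RegistrationResult CheckSubscriberVersion(const std::string& statestore_version,
                                          const std::string& subscriber_version) {
  if (subscriber_version != statestore_version) {
    return {false, "version mismatch: statestore=" + statestore_version +
                   " subscriber=" + subscriber_version};
  }
  return {true, ""};
}
```

A rejected subscriber would then surface a clear "version mismatch" error in its own log rather than the opaque TProtocolException loop shown above.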
[jira] [Commented] (IMPALA-7096) Confirm that IMPALA-4835 does not increase chance of OOM for scans of wide tables
[ https://issues.apache.org/jira/browse/IMPALA-7096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16510283#comment-16510283 ] Mostafa Mokhtar commented on IMPALA-7096: - [~tarmstrong] Attached OOM profile where most memory is consumed by the HDFS_SCAN_NODE and very little memory is in the queues. {code} Query Status: Memory limit exceeded: Error occurred on backend vb0228.foo:22000 by fragment d7448d78ac85ba6b:303b7242000c Memory left in process limit: 98.37 GB Memory left in query limit: -371.21 KB Query(d7448d78ac85ba6b:303b7242): memory limit exceeded. Limit=1.00 GB Reservation=816.00 MB ReservationLimit=819.20 MB OtherMemory=208.36 MB Total=1.00 GB Peak=1.00 GB Fragment d7448d78ac85ba6b:303b7242000c: Reservation=360.00 MB OtherMemory=85.60 MB Total=445.60 MB Peak=445.60 MB HDFS_SCAN_NODE (id=2): Reservation=360.00 MB OtherMemory=85.45 MB Total=445.45 MB Peak=445.45 MB Exprs: Total=4.00 KB Peak=4.00 KB Queued Batches: Total=22.13 MB Peak=22.13 MB KrpcDataStreamSender (dst_id=13): Total=17.72 KB Peak=17.72 KB CodeGen: Total=1.68 KB Peak=679.50 KB Fragment d7448d78ac85ba6b:303b72420023: Reservation=96.00 MB OtherMemory=43.09 MB Total=139.09 MB Peak=139.09 MB HDFS_SCAN_NODE (id=6): Reservation=96.00 MB OtherMemory=42.93 MB Total=138.93 MB Peak=138.93 MB Exprs: Total=4.00 KB Peak=4.00 KB Queued Batches: Total=26.10 MB Peak=26.10 MB KrpcDataStreamSender (dst_id=16): Total=17.72 KB Peak=17.72 KB CodeGen: Total=1.68 KB Peak=679.50 KB Fragment d7448d78ac85ba6b:303b7242003a: Reservation=360.00 MB OtherMemory=79.67 MB Total=439.67 MB Peak=439.67 MB HDFS_SCAN_NODE (id=10): Reservation=360.00 MB OtherMemory=79.52 MB Total=439.52 MB Peak=439.52 MB Exprs: Total=4.00 KB Peak=4.00 KB Queued Batches: Total=22.20 MB Peak=22.20 MB KrpcDataStreamSender (dst_id=19): Total=17.72 KB Peak=17.72 KB CodeGen: Total=1.68 KB Peak=679.50 KB {code} > Confirm that IMPALA-4835 does not increase chance of OOM for scans of wide > tables > - > > Key: IMPALA-7096 > URL: 
https://issues.apache.org/jira/browse/IMPALA-7096 > Project: IMPALA > Issue Type: Bug > Components: Backend >Affects Versions: Impala 2.13.0, Impala 3.1.0 >Reporter: Tim Armstrong >Assignee: Tim Armstrong >Priority: Blocker > Labels: resource-management > Attachments: ScanConsumingMostMemory.txt > > > IMPALA-7078 showed some cases where non-buffer memory could accumulate in the > row batch queue and cause memory consumption problems. > The decision for whether to spin up a scanner thread in IMPALA-4835 > implicitly assumes that buffer memory is the bulk of memory consumed by a > scan, but there may be cases where that is not true and the previous > heuristic would be more conservative about starting a scanner thread. > We should investigate this further and figure out how to avoid it if there's > an issue.
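[Editorial sketch] The concern in IMPALA-7096 above — that the thread-spin-up decision accounts only for reserved buffer memory — can be illustrated with a toy admission check. Budgeting an estimate of non-buffer memory (queued row batches, decompression scratch) as well makes the heuristic more conservative. All names and thresholds here are illustrative, not Impala's actual scanner-thread logic.

```cpp
#include <cstdint>

// Toy model of the scanner-thread spin-up decision; illustrative only.
// The buffer-only check assumes reserved I/O buffers dominate a scanner
// thread's footprint.
bool CanStartScannerThreadBufferOnly(int64_t mem_available,
                                     int64_t buffer_bytes_per_thread) {
  return buffer_bytes_per_thread <= mem_available;
}

// The conservative check also budgets estimated non-buffer memory, which
// the OOM profile above shows can dominate (queued batches, scratch, etc.).
bool CanStartScannerThreadConservative(int64_t mem_available,
                                       int64_t buffer_bytes_per_thread,
                                       int64_t est_non_buffer_bytes) {
  return buffer_bytes_per_thread + est_non_buffer_bytes <= mem_available;
}
```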
[jira] [Commented] (IMPALA-2682) Address exchange operator CPU bottlenecks in Thrift
[ https://issues.apache.org/jira/browse/IMPALA-2682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16508701#comment-16508701 ] Mostafa Mokhtar commented on IMPALA-2682: - To get more speedup in exchange we should focus on: * Codegen of hash value computation * Replacing % with something more efficient [~jbapple] What technique did you use to remove the modulo operation? > Address exchange operator CPU bottlenecks in Thrift > --- > > Key: IMPALA-2682 > URL: https://issues.apache.org/jira/browse/IMPALA-2682 > Project: IMPALA > Issue Type: Bug > Components: Distributed Exec >Affects Versions: Impala 2.2 >Reporter: Mostafa Mokhtar >Assignee: Mostafa Mokhtar >Priority: Minor > Labels: performance > Attachments: Screen Shot 2015-11-25 at 4.19.28 PM.png > > > Currently the exchange operator is a major bottleneck for partitioned table > inserts, broadcast and shuffle joins, as well as aggregates; exchange alone > accounts for 60% of the slowdown of partitioned table inserts compared to > un-partitioned ones. > The CPU cost of exchange operators can vary between 10-50% depending on > cardinality and complexity of other operators. > Part of the thrift bottleneck is due to the 512-byte buffer used by > TBufferedTransport, which causes reads to go through a "slow" path where > the payload is mem-copied over N chunks of 512 bytes. > These are the top contributing call stacks in the exchange operator. 
> Read path in thrift > {code} > Data Of Interest (CPU Metrics) > 1 of 18: 58.1% (43.860s of 75.474s) > impalad!apache::thrift::transport::TSocket::read - TSocket.cpp > impalad!apache::thrift::transport::TTransport::read+0x5 - TTransport.h:109 > impalad!apache::thrift::transport::TBufferedTransport::readSlow+0x4b - > TBufferTransports.cpp:52 > impalad!apache::thrift::transport::TBufferBase::read+0xb9 - > TBufferTransports.h:69 > impalad!apache::thrift::transport::readAll+0x28 > - TTransport.h:44 > impalad!apache::thrift::transport::TTransport::readAll+0xa - TTransport.h:126 > impalad!apache::thrift::protocol::TBinaryProtocolT::readStringBody+0xae > - TBinaryProtocol.tcc:458 > impalad!readString, > std::allocator > >+0x24 - TBinaryProtocol.tcc:412 > impalad!apache::thrift::protocol::TVirtualProtocol, > apache::thrift::protocol::TProtocolDefaults>::readString_virt+0x11 - > TVirtualProtocol.h:515 > impalad!apache::thrift::protocol::TProtocol::readString+0x10 - TProtocol.h:621 > impalad!impala::TRowBatch::read+0x17d - Results_types.cpp:89 > {code} > Send path compressing the data > {code} > Data Of Interest (CPU Metrics) > 1 of 1: 100.0% (65.480s of 65.480s) > impalad!LZ4_compress64kCtx - [Unknown] > impalad!impala::Lz4Compressor::ProcessBlock+0x32 - compress.cc:294 > impalad!impala::RowBatch::Serialize+0x20b - row-batch.cc:211 > impalad!impala::RowBatch::Serialize+0x34 - row-batch.cc:168 > impalad!impala::DataStreamSender::SerializeBatch+0x117 - > data-stream-sender.cc:463 > impalad!impala::DataStreamSender::Channel::SendCurrentBatch+0x44 - > data-stream-sender.cc:256 > impalad!impala::DataStreamSender::Channel::AddRow+0x48 - > data-stream-sender.cc:232 > impalad!impala::DataStreamSender::Send+0x6d2 - data-stream-sender.cc:444 > {code} > Memory copy in send path > {code} > Data Of Interest (CPU Metrics) > 1 of 137: 24.6% (6.870s of 27.911s) > libc.so.6!memcpy - [Unknown] > impalad!impala::Tuple::DeepCopyVarlenData+0xfe - tuple.cc:89 > 
impalad!impala::Tuple::DeepCopy+0xc3 - tuple.cc:69 > impalad!impala::DataStreamSender::Channel::AddRow+0xf5 - > data-stream-sender.cc:245 > impalad!impala::DataStreamSender::Send+0x6d2 - data-stream-sender.cc:444 > impalad!impala::PlanFragmentExecutor::OpenInternal+0x3e2 - > plan-fragment-executor.cc:355 > {code} > Un-compressing in read path > {code} > Data Of Interest (CPU Metrics) > 1 of 1: 100.0% (25.100s of 25.100s) > impalad!LZ4_uncompress - [Unknown] > impalad!impala::Lz4Decompressor::ProcessBlock+0x47 - decompress.cc:454 > impalad!impala::RowBatch::RowBatch+0x3ec - row-batch.cc:108 > impalad!impala::DataStreamRecvr::SenderQueue::AddBatch+0x1a1 - > data-stream-recvr.cc:207 > impalad!impala::DataStreamMgr::AddData+0x134 - data-stream-mgr.cc:103 > impalad!impala::ImpalaServer::TransmitData+0x175 - impala-server.cc:1077 > impalad!impala::ImpalaInternalService::TransmitData+0x43 - > impala-internal-service.h:60 > {code} > Write path in thrift > {code} > Data Of Interest (CPU Metrics) > 1 of 14: 31.9% (7.010s of 21.980s) > libpthread.so.0!__send - [Unknown] > impalad!apache::thrift::transport::TSocket::write_partial+0x36 - > TSocket.cpp:567 > impalad!apache::thrift::transport::TSocket::write+0x3c - TSocket.cpp:542 > impalad!apache::thrift::transport::TTransport::write+0xa - TTransport.h:158 >
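[Editorial sketch] On the modulo question asked in the comment above: one well-known way to map a hash to one of N partitions without a hardware divide is the multiply-shift "fastrange" reduction, `(uint64_t(hash) * N) >> 32`, which maps a 32-bit hash uniformly into [0, N) using a multiply and a shift. Whether this is the technique actually used in Impala's exchange is not stated here; this is a general sketch.

```cpp
#include <cstdint>

// Multiply-shift reduction of a 32-bit hash into [0, n): one multiply and
// one shift instead of the much slower divide behind `hash % n`. It is not
// the same mapping as `%`, but it is uniform when the hash bits are uniform.
inline uint32_t FastRange32(uint32_t hash, uint32_t n) {
  return static_cast<uint32_t>((static_cast<uint64_t>(hash) * n) >> 32);
}
```

Note the result depends on the high bits of the hash rather than the low bits, so the hash function itself must mix well.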
[jira] [Commented] (IMPALA-6311) Evaluate smaller FPP for Bloom filters
[ https://issues.apache.org/jira/browse/IMPALA-6311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16486665#comment-16486665 ] Mostafa Mokhtar commented on IMPALA-6311: - [~jbapple] Yes, 75% FPP for a bloom filter is way high. Reducing the target FPP does make sense. > Evaluate smaller FPP for Bloom filters > -- > > Key: IMPALA-6311 > URL: https://issues.apache.org/jira/browse/IMPALA-6311 > Project: IMPALA > Issue Type: Task > Components: Perf Investigation >Reporter: Jim Apple >Priority: Major > > The Bloom filters are created by estimating the NDV and then using the FPP of > 75% to get the right size for the filter. This may be too high to be very > useful - if our filters are currently filtering more than 75% out, then it is > only because we are overestimating NDV.
[jira] [Commented] (IMPALA-6311) Evaluate smaller FPP for Bloom filters
[ https://issues.apache.org/jira/browse/IMPALA-6311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16485741#comment-16485741 ] Mostafa Mokhtar commented on IMPALA-6311: - Actually I was thinking of reducing the default maximum size. The main bottleneck here is that the memory used for handling the runtime filter RPCs and aggregating them is untracked. Increasing the max filter default size or the number of filters makes things worse on the coordinator. > Evaluate smaller FPP for Bloom filters > -- > > Key: IMPALA-6311 > URL: https://issues.apache.org/jira/browse/IMPALA-6311 > Project: IMPALA > Issue Type: Task > Components: Perf Investigation >Reporter: Jim Apple >Priority: Major > > The Bloom filters are created by estimating the NDV and then using the FPP of > 75% to get the right size for the filter. This may be too high to be very > useful - if our filters are currently filtering more than 75% out, then it is > only because we are overestimating NDV.
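[Editorial sketch] For context on the FPP discussion in IMPALA-6311 above: the textbook Bloom-filter sizing formulas give m = -n ln(p) / (ln 2)^2 bits and k = -log2(p) hash functions for n distinct elements at false-positive probability p. A sketch follows; this is not Impala's actual sizing code, which uses power-of-two-sized block Bloom filters, so real sizes differ.

```cpp
#include <cmath>
#include <cstdint>

// Textbook Bloom filter sizing: bits needed for `ndv` distinct elements
// at false-positive probability `fpp`. Illustrative only.
int64_t BloomBits(int64_t ndv, double fpp) {
  const double ln2 = std::log(2.0);
  return static_cast<int64_t>(
      std::ceil(-static_cast<double>(ndv) * std::log(fpp) / (ln2 * ln2)));
}

// Optimal number of hash functions: k = -log2(fpp), rounded up.
int BloomHashes(double fpp) {
  return static_cast<int>(std::ceil(-std::log(fpp) / std::log(2.0)));
}
```

At fpp = 0.75 the formula yields under one bit per element and a single hash function, which illustrates why such filters are only useful when NDV is overestimated; pushing fpp down to 0.01 costs roughly 9.6 bits per element.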
[jira] [Created] (IMPALA-7021) Impala query fails with BlockMissingException while HDFS throws SSLHandshakeException
Mostafa Mokhtar created IMPALA-7021: --- Summary: Impala query fails with BlockMissingException while HDFS throws SSLHandshakeException Key: IMPALA-7021 URL: https://issues.apache.org/jira/browse/IMPALA-7021 Project: IMPALA Issue Type: Bug Components: Backend, Security Affects Versions: Impala 2.12.0 Reporter: Mostafa Mokhtar Assignee: Sailesh Mukil Impala query failed with status {code} Status: Disk I/O error: Error reading from HDFS file: hdfs://vc1512.halxg.cloudera.com:8020/user/hive/warehouse/tpcds_3_parquet_6_rack.db/customer/1b4ebbf0879f988a-9733ba75005b_76476061_data.0.parq Error(255): Unknown error 255 Root cause: BlockMissingException: Could not obtain block: BP-1933019065-10.17.221.22-1485905285703:blk_1084803316_11063749 file=/user/hive/warehouse/tpcds_3_parquet_6_rack.db/customer/1b4ebbf0879f988a-9733ba75005b_76476061_data.0.parq {code} Impala log {code} I0513 19:50:07.248292 150900 coordinator.cc:1092] Release admission control resources for query_id=f14b2814eea7b58b:c2eeddb4 W0513 19:50:11.313998 109914 ShortCircuitCache.java:826] ShortCircuitCache(0x188f4ffc): could not load 1084803339_BP-1933019065-10.17.221.22-1485905285703 due to InvalidToken exception. 
Java exception follows: org.apache.hadoop.security.token.SecretManager$InvalidToken: access control error while attempting to set up short-circuit access to /user/hive/warehouse/tpcds_3_parquet_6_rack.db/customer/1b4ebbf0879f988a-9733ba750004_131907241_data.0.parq at org.apache.hadoop.hdfs.BlockReaderFactory.requestFileDescriptors(BlockReaderFactory.java:653) at org.apache.hadoop.hdfs.BlockReaderFactory.createShortCircuitReplicaInfo(BlockReaderFactory.java:552) at org.apache.hadoop.hdfs.shortcircuit.ShortCircuitCache.create(ShortCircuitCache.java:804) at org.apache.hadoop.hdfs.shortcircuit.ShortCircuitCache.fetchOrCreate(ShortCircuitCache.java:738) at org.apache.hadoop.hdfs.BlockReaderFactory.getBlockReaderLocal(BlockReaderFactory.java:485) at org.apache.hadoop.hdfs.BlockReaderFactory.build(BlockReaderFactory.java:355) at org.apache.hadoop.hdfs.DFSInputStream.actualGetFromOneDataNode(DFSInputStream.java:1219) at org.apache.hadoop.hdfs.DFSInputStream.fetchBlockByteRange(DFSInputStream.java:1159) at org.apache.hadoop.hdfs.DFSInputStream.pread(DFSInputStream.java:1533) at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:1492) at org.apache.hadoop.fs.FSDataInputStream.read(FSDataInputStream.java:92) W0513 19:50:11.317173 109914 ShortCircuitCache.java:826] ShortCircuitCache(0x188f4ffc): could not load 1084803339_BP-1933019065-10.17.221.22-1485905285703 due to InvalidToken exception. 
Java exception follows: org.apache.hadoop.security.token.SecretManager$InvalidToken: access control error while attempting to set up short-circuit access to /user/hive/warehouse/tpcds_3_parquet_6_rack.db/customer/1b4ebbf0879f988a-9733ba750004_131907241_data.0.parq at org.apache.hadoop.hdfs.BlockReaderFactory.requestFileDescriptors(BlockReaderFactory.java:653) at org.apache.hadoop.hdfs.BlockReaderFactory.createShortCircuitReplicaInfo(BlockReaderFactory.java:552) at org.apache.hadoop.hdfs.shortcircuit.ShortCircuitCache.create(ShortCircuitCache.java:804) at org.apache.hadoop.hdfs.shortcircuit.ShortCircuitCache.fetchOrCreate(ShortCircuitCache.java:738) at org.apache.hadoop.hdfs.BlockReaderFactory.getBlockReaderLocal(BlockReaderFactory.java:485) at org.apache.hadoop.hdfs.BlockReaderFactory.build(BlockReaderFactory.java:355) at org.apache.hadoop.hdfs.DFSInputStream.actualGetFromOneDataNode(DFSInputStream.java:1219) at org.apache.hadoop.hdfs.DFSInputStream.fetchBlockByteRange(DFSInputStream.java:1159) at org.apache.hadoop.hdfs.DFSInputStream.pread(DFSInputStream.java:1533) at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:1492) at org.apache.hadoop.fs.FSDataInputStream.read(FSDataInputStream.java:92) W0513 19:50:11.318079 109914 DFSInputStream.java:1275] Connection failure: Failed to connect to /10.17.187.36:20002 for file /user/hive/warehouse/tpcds_3_parquet_6_rack.db/customer/1b4ebbf0879f988a-9733ba750004_131907241_data.0.parq for block BP-1933019065-10.17.221.22-1485905285703:blk_1084803339_11063772:org.apache.hadoop.security.token.SecretManager$InvalidToken: access control error while attempting to set up short-circuit access to /user/hive/warehouse/tpcds_3_parquet_6_rack.db/customer/1b4ebbf0879f988a-9733ba750004_131907241_data.0.parq {code} Impala error log {code} hdfsPread: FSDataInputStream#read error: BlockMissingException: Could not obtain block: BP-1933019065-10.17.221.22-1485905285703:blk_1084803339_11063772
[jira] [Assigned] (IMPALA-7020) Order by expressions in Analytical functions are not materialized causing slowdown
[ https://issues.apache.org/jira/browse/IMPALA-7020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mostafa Mokhtar reassigned IMPALA-7020: --- Assignee: Alexander Behm > Order by expressions in Analytical functions are not materialized causing > slowdown > -- > > Key: IMPALA-7020 > URL: https://issues.apache.org/jira/browse/IMPALA-7020 > Project: IMPALA > Issue Type: Bug > Components: Frontend >Affects Versions: Impala 2.12.0 >Reporter: Mostafa Mokhtar >Assignee: Alexander Behm >Priority: Major > Labels: performance > Attachments: Slow case profile.txt, Workaround profile.txt > > > Order by expressions in Analytical functions are not materialized and cause > queries to run much slower. > The rewrite for the query below is 20x faster, profiles attached. > Repro > {code} > select * > FROM > ( > SELECT > o.*, > ROW_NUMBER() OVER(ORDER BY evt_ts DESC) AS rn > FROM > ( > SELECT > l_orderkey,l_partkey,l_linenumber,l_quantity, cast (l_shipdate as > string) evt_ts > FROM > lineitem > WHERE > l_shipdate BETWEEN '1992-01-01 00:00:00' AND '1992-01-15 00:00:00' > ) o > ) r > WHERE > rn BETWEEN 1 AND 101 > ORDER BY rn; > {code} > Workaround > {code} > select * > FROM > ( > SELECT > o.*, > ROW_NUMBER() OVER(ORDER BY evt_ts DESC) AS rn > FROM > ( > SELECT > l_orderkey,l_partkey,l_linenumber,l_quantity, cast (l_shipdate as > string) evt_ts > FROM > lineitem > WHERE > l_shipdate BETWEEN '1992-01-01 00:00:00' AND '1992-01-15 00:00:00' > union all > SELECT > l_orderkey,l_partkey,l_linenumber,l_quantity, cast (l_shipdate as > string) evt_ts > FROM > lineitem limit 0 > > ) o > ) r > WHERE > rn BETWEEN 1 AND 101 > ORDER BY rn; > {code}
[jira] [Commented] (IMPALA-7015) Insert into Kudu table returns with Status OK even if there are Kudu errors
[ https://issues.apache.org/jira/browse/IMPALA-7015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16472302#comment-16472302 ] Mostafa Mokhtar commented on IMPALA-7015: - [~tmarsh] FYI > Insert into Kudu table returns with Status OK even if there are Kudu errors > --- > > Key: IMPALA-7015 > URL: https://issues.apache.org/jira/browse/IMPALA-7015 > Project: IMPALA > Issue Type: Bug >Affects Versions: Impala 2.12.0 >Reporter: Mostafa Mokhtar >Priority: Major > Attachments: Insert into kudu profile with errors.txt > > > DML statements against Kudu tables return status OK even if there are Kudu > errors. > This behavior is misleading. > {code} > Summary: > Session ID: 18430b000e5dd8dc:e3e5dadb4a15d4b4 > Session Type: BEESWAX > Start Time: 2018-05-11 10:10:07.314218000 > End Time: 2018-05-11 10:10:07.434017000 > Query Type: DML > Query State: FINISHED > Query Status: OK > Impala Version: impalad version 2.12.0-cdh5.15.0 RELEASE (build > 2f9498d5c2f980aa7ff9505c56654c8e59e026ca) > User: mmokhtar > Connected User: mmokhtar > Delegated User: > Network Address: :::10.17.234.27:60760 > Default Db: tpcds_1000_kudu > Sql Statement: insert into store_2 select * from store > Coordinator: vd1317.foo:22000 > Query Options (set by configuration): > Query Options (set by configuration and planner): MT_DOP=0 > Plan: > {code} > {code} > Operator #Hosts Avg Time Max Time #Rows Est. #Rows Peak Mem > Est. Peak Mem Detail > - > 02:PARTIAL SORT5 909.030us 1.025ms 1.00K 1.00K 6.14 MB > 4.00 MB > 01:EXCHANGE56.262ms 7.232ms 1.00K 1.00K 75.50 KB > 0 KUDU(KuduPartition(tpcds_1000_kudu.store.s_store_sk)) > 00:SCAN KUDU 53.694ms 4.137ms 1.00K 1.00K 4.34 MB > 0 tpcds_1000_kudu.store > Errors: Key already present in Kudu table > 'impala::tpcds_1000_kudu.store_2'. 
(1 of 1002 similar) > {code}
[jira] [Commented] (IMPALA-6897) Catalog server should flag tables with large number of small files
[ https://issues.apache.org/jira/browse/IMPALA-6897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16470688#comment-16470688 ] Mostafa Mokhtar commented on IMPALA-6897: - The Catalog has a section for Top tables in terms of metadata size. > Catalog server should flag tables with large number of small files > -- > > Key: IMPALA-6897 > URL: https://issues.apache.org/jira/browse/IMPALA-6897 > Project: IMPALA > Issue Type: Improvement > Components: Catalog >Affects Versions: Impala 2.13.0 >Reporter: bharath v >Priority: Major > Labels: ramp-up, supportability > > Since the Catalog has all the file metadata information available, it should help > flag tables with a large number of small files. This information can be > propagated to the coordinators and should be reflected in the query profiles > like how we do for "missing stats".
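[Editorial sketch] The flagging heuristic discussed in IMPALA-6897 above could be as simple as: a table has a "small files" problem when its file count is high and its average file size is far below the block size. The thresholds and the function name below are illustrative examples, not Impala's actual policy.

```cpp
#include <cstdint>

// Illustrative heuristic for flagging tables with many small files; the
// thresholds (10000 files, 16 MB average) are arbitrary examples.
bool HasSmallFilesProblem(int64_t num_files, int64_t total_bytes,
                          int64_t min_files = 10000,
                          int64_t small_file_bytes = 16LL * 1024 * 1024) {
  if (num_files < min_files) return false;
  return total_bytes / num_files < small_file_bytes;
}
```

The catalog already tracks per-table file counts and sizes, so a check like this could run when table metadata is loaded and the flag could ride along to coordinators with the rest of the table metadata.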
[jira] [Commented] (IMPALA-2945) Low memory estimation in local aggregation
[ https://issues.apache.org/jira/browse/IMPALA-2945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16461594#comment-16461594 ] Mostafa Mokhtar commented on IMPALA-2945: - I disagree; the planner and admission control should not be blocked by one another. There is no point fixing memory-based admission control only to realize that the planner estimates are off because fixes were stalled. Why would users migrate away from memory-based admission control? > Low memory estimation in local aggregation > -- > > Key: IMPALA-2945 > URL: https://issues.apache.org/jira/browse/IMPALA-2945 > Project: IMPALA > Issue Type: Bug > Components: Frontend >Affects Versions: Impala 2.0, Impala 2.3.0, Impala 2.5.0, Impala 2.4.0, > Impala 2.6.0, Impala 2.7.0, Impala 2.8.0, Impala 2.9.0, Impala 2.10.0, Impala > 2.11.0, Impala 2.12.0 >Reporter: Mostafa Mokhtar >Priority: Major > Labels: planner, resource-management > > When computing the per-host memory estimate for local aggregations, the > planner does not take into account that data is randomly distributed across > nodes, leading to significant underestimation in some cases. The suggested fix > is to use min(agg input cardinality, NDV * #hosts) as the per-node > cardinality used for the per-node memory estimate. > Impact: In the query below, the planner significantly underestimates the > per-node memory of agg node 03, estimating it at 3.8GB while the actual is > 24.77GB. 
> Query > {code} > select sum(l_extendedprice) / 7.0 as avg_yearly > from > lineitem, > part > where > p_partkey = l_partkey > and p_brand = 'Brand#23' > and p_container = 'MED BOX' > and l_quantity < ( > select > 0.2 * avg(l_quantity) > from > lineitem > where > l_partkey = p_partkey > ) > {code} > Plan > {code} > 12:AGGREGATE [FINALIZE] > | output: sum:merge(l_extendedprice) > | hosts=20 per-host-mem=unavailable > | tuple-ids=6 row-size=16B cardinality=1 > | > 11:EXCHANGE [UNPARTITIONED] > | hosts=20 per-host-mem=unavailable > | tuple-ids=6 row-size=16B cardinality=1 > | > 06:AGGREGATE > | output: sum(l_extendedprice) > | hosts=20 per-host-mem=10.00MB > | tuple-ids=6 row-size=16B cardinality=1 > | > 05:HASH JOIN [RIGHT SEMI JOIN, PARTITIONED] > | hash predicates: l_partkey = p_partkey > | other join predicates: l_quantity < 0.2 * avg(l_quantity) > | hosts=20 per-host-mem=125.18MB > | tuple-ids=0,1 row-size=80B cardinality=29992141 > | > |--10:EXCHANGE [HASH(p_partkey)] > | | hosts=20 per-host-mem=0B > | | tuple-ids=0,1 row-size=80B cardinality=29992141 > | | > | 04:HASH JOIN [INNER JOIN, BROADCAST] > | | hash predicates: l_partkey = p_partkey > | | hosts=20 per-host-mem=58.30MB > | | tuple-ids=0,1 row-size=80B cardinality=29992141 > | | > | |--09:EXCHANGE [BROADCAST] > | | | hosts=20 per-host-mem=0B > | | | tuple-ids=1 row-size=56B cardinality=100 > | | | > | | 01:SCAN HDFS [tpch_1000_decimal_parquet.part, RANDOM] > | | partitions=1/1 files=40 size=6.38GB > | | predicates: p_brand = 'Brand#23', p_container = 'MED BOX' > | | table stats: 2 rows total > | | column stats: all > | | hosts=20 per-host-mem=264.00MB > | | tuple-ids=1 row-size=56B cardinality=100 > | | > | 00:SCAN HDFS [tpch_1000_decimal_parquet.lineitem, RANDOM] > | partitions=1/1 files=880 size=216.61GB > | table stats: 589709 rows total > | column stats: all > | hosts=20 per-host-mem=264.00MB > | tuple-ids=0 row-size=24B cardinality=589709 > | > 08:AGGREGATE [FINALIZE] > | output: 
avg:merge(l_quantity) > | group by: l_partkey > | hosts=20 per-host-mem=167.89MB > | tuple-ids=4 row-size=16B cardinality=200052064 > | > 07:EXCHANGE [HASH(l_partkey)] > | hosts=20 per-host-mem=0B > | tuple-ids=3 row-size=16B cardinality=200052064 > | > 03:AGGREGATE > | output: avg(l_quantity) > | group by: l_partkey > | hosts=20 per-host-mem=3.28GB > | tuple-ids=3 row-size=16B cardinality=200052064 > | > 02:SCAN HDFS [tpch_1000_decimal_parquet.lineitem, RANDOM] >partitions=1/1 files=880 size=216.61GB >table stats: 589709 rows total >column stats: all >hosts=20 per-host-mem=176.00MB >tuple-ids=2 row-size=16B cardinality=589709 > {code} > Summary > |Operator ||#Hosts|| Avg Time|| Max Time||#Rows ||Est. > #Rows|| Peak Mem ||Est. Peak Mem ||Detail | > |12:AGGREGATE |1 |256.620ms| 256.620ms|1| 1| > 92.00 KB|-1.00 B| FINALIZE | > |11:EXCHANGE |1 |184.430us| 184.430us| 20| 1| > 0|-1.00 B| UNPARTITIONED | > |06:AGGREGATE |20 |364.045ms|1s508ms| 20| 1| >9.37 MB|
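[Editorial sketch] The fix suggested in IMPALA-2945 above — clamp the local aggregation's group count to min(agg input cardinality, NDV × #hosts) before dividing it across hosts — can be sketched as below. With randomly distributed input, every one of the NDV grouping values can show up on every host, so dividing NDV evenly across hosts underestimates per-node memory; this is an illustrative reading of the proposal, not the planner's actual code.

```cpp
#include <algorithm>
#include <cstdint>

// Per-node cardinality estimate for a local (pre-shuffle) aggregation.
// Total groups across all nodes are bounded by min(input rows, ndv * hosts)
// because each host can see up to `ndv` distinct groups; dividing by hosts
// gives the per-node estimate, i.e. min(input/hosts, ndv).
int64_t PerNodeAggCardinality(int64_t input_cardinality, int64_t ndv,
                              int64_t num_hosts) {
  return std::min(input_cardinality, ndv * num_hosts) / num_hosts;
}
```

For example, with 1M input rows, NDV = 100 and 20 hosts, each host still sees up to 100 groups, not 100 / 20 = 5 — the clamp keeps the per-node estimate at the full NDV.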