helifu has posted comments on this change. ( http://gerrit.cloudera.org:8080/13987 )
Change subject: KUDU-2854 short circuit predicates on dictionary-coded columns
......................................................................
Patch Set 4:
This morning on Slack, Todd suggested that I use the 'kudu perf loadgen' tool to
load data into an existing wide table with 280+ columns, since his benchmark was
based on a customer workload. So I ran some benchmarks today. The conclusion is
that the current patch does not regress that perf gain, since "DeltaPreparer"
does not show up in the profiles taken with the patch. Here are the details:
##1. I ran the benchmark with this patch:
(a) select * from my_wide_table where c280 = <value that does not exist>
Samples: 2K of event 'cycles', Event count (approx.): 176315474
25.30% rpc worker-6290 kudu-tserver [.]
kudu::cfile::BinaryPlainBlockDecoder::CopyNextAndEval(unsigned long*,
kudu::ColumnMaterializationContext*, kudu::SelectionVectorView*, kudu
12.46% rpc worker-6290 kudu-tserver [.]
kudu::cfile::CFileIterator::PrepareMatchingCodeWords(kudu::ColumnMaterializationContext*)
9.17% rpc worker-6290 kudu-tserver [.]
kudu::Slice::compare(kudu::Slice const&) const
3.74% rpc worker-6290 kudu-tserver [.]
kudu::tablet::CFileSet::Iterator::CreateColumnIterators(kudu::ScanSpec const*)
2.93% rpc worker-6290 kudu-tserver [.]
kudu::cfile::BinaryPlainBlockDecoder::ParseHeader()
2.44% rpc worker-6290 kudu-tserver [.]
boost::container::flat_map<int, std::unique_ptr<kudu::cfile::CFileReader,
std::default_delete<kudu::cfile::CFileReader> >, std::less<int>,
1.83% rpc worker-6290 kudu-tserver [.]
kudu::tablet::CFileSet::Iterator::GetIteratorStats(std::vector<kudu::IteratorStats,
std::allocator<kudu::IteratorStats> >*) const
1.83% rpc worker-6290 kudu-tserver [.]
bshuf_shuffle_bit_eightelem_SSE_avx2
1.83% rpc worker-6290 kudu-tserver [.] operator delete[](void*,
std::nothrow_t const&)
1.47% rpc worker-6290 kudu-tserver [.] LZ4_memcpy_using_offset
1.30% rpc reactor-628 [kernel.kallsyms] [k] find_busiest_group
1.10% rpc worker-6290 kudu-tserver [.]
tcmalloc::CentralFreeList::ReleaseToSpans(void*)
1.10% rpc worker-6290 kudu-tserver [.]
tcmalloc::ThreadCache::ReleaseToCentralCache(tcmalloc::ThreadCache::FreeList*,
unsigned int, int)
1.10% rpc worker-6290 kudu-tserver [.]
kudu::cfile::CFileIterator::Scan(kudu::ColumnMaterializationContext*)
1.10% rpc worker-6290 kudu-tserver [.]
kudu::cfile::BinaryDictBlockDecoder::GetFirstRowId() const
1.10% rpc worker-6290 kudu-tserver [.]
LZ4_decompress_generic.constprop.3
0.91% rpc worker-6290 libstdc++.so.6.0.21 [.] std::_Hash_bytes(void
const*, unsigned long, unsigned long)
0.75% rpc reactor-628 libssl.so.1.0.0 [.] 0x0000000000024667
0.73% rpc worker-6290 kudu-tserver [.]
kudu::ArenaBase<false>::Reset()
0.73% rpc worker-6290 kudu-tserver [.]
kudu::BitmapChangeBits(unsigned char*, unsigned long, unsigned long, bool)
0.73% rpc worker-6290 libstdc++.so.6.0.21 [.]
__cxxabiv1::__si_class_type_info::__do_dyncast(long,
__cxxabiv1::__class_type_info::__sub_kind, __cxxabiv1::__class_type_info
const*, void
0.73% rpc worker-6290 kudu-tserver [.]
kudu::cfile::CFileIterator::~CFileIterator()
0.73% rpc worker-6290 kudu-tserver [.]
kudu::cfile::CFileIterator::TrySkipDictCodedBlock(unsigned long*,
kudu::SelectionVectorView*, kudu::cfile::BlockDecoder*) const
0.73% rpc worker-6290 kudu-tserver [.]
kudu::tablet::DeltaApplier::MaterializeColumn(kudu::ColumnMaterializationContext*)
(b) select * from my_wide_table where c280 = <value that exists>
Samples: 2K of event 'cycles', Event count (approx.): 233293310
22.17% rpc worker-6290 kudu-tserver [.]
kudu::cfile::BinaryPlainBlockDecoder::CopyNextAndEval(unsigned long*,
kudu::ColumnMaterializationContext*, kudu::SelectionVectorView*, kudu
10.81% rpc worker-6290 kudu-tserver [.]
kudu::cfile::CFileIterator::PrepareMatchingCodeWords(kudu::ColumnMaterializationContext*)
10.25% rpc worker-6290 kudu-tserver [.]
kudu::Slice::compare(kudu::Slice const&) const
5.54% rpc worker-6290 kudu-tserver [.]
bshuf_shuffle_bit_eightelem_SSE_avx2
4.99% rpc worker-6290 kudu-tserver [.]
kudu::cfile::BinaryPlainBlockDecoder::ParseHeader()
2.96% rpc worker-6290 kudu-tserver [.]
kudu::tablet::CFileSet::Iterator::CreateColumnIterators(kudu::ScanSpec const*)
2.70% rpc reactor-628 [kernel.kallsyms] [k] find_busiest_group
1.66% rpc worker-6290 kudu-tserver [.]
bshuf_trans_byte_bitrow_SSE_avx2
1.41% rpc worker-6290 kudu-tserver [.]
boost::container::flat_map<int, std::unique_ptr<kudu::cfile::CFileReader,
std::default_delete<kudu::cfile::CFileReader> >, std::less<int>,
1.11% rpc worker-6290 kudu-tserver [.]
kudu::tablet::CFileSet::Iterator::PrepareColumn(kudu::ColumnMaterializationContext*)
1.11% rpc worker-6290 kudu-tserver [.] Bits::Count(void const*,
int)
1.06% rpc worker-6290 [kernel.kallsyms] [k] clear_page_c_e
0.88% rpc reactor-628 kudu-tserver [.] ev_run
0.83% rpc worker-6290 kudu-tserver [.]
tcmalloc::CentralFreeList::ReleaseToSpans(void*)
0.83% rpc worker-6290 kudu-tserver [.] LZ4_memcpy_using_offset
0.83% rpc worker-6290 [kernel.kallsyms] [k] page_fault
0.83% rpc worker-6290 kudu-tserver [.]
kudu::cfile::CFileIterator::TrySkipDictCodedBlock(unsigned long*,
kudu::SelectionVectorView*, kudu::cfile::BlockDecoder*) const
0.83% rpc worker-6290 kudu-tserver [.]
tcmalloc::ThreadCache::ReleaseToCentralCache(tcmalloc::ThreadCache::FreeList*,
unsigned int, int)
0.83% rpc worker-6290 [vdso] [.] __vdso_clock_gettime
0.83% rpc worker-6290 kudu-tserver [.]
kudu::cfile::CFileIterator::Scan(kudu::ColumnMaterializationContext*)
0.77% rpc worker-6290 libstdc++.so.6.0.21 [.] std::_Hash_bytes(void
const*, unsigned long, unsigned long)
0.73% rpc worker-6290 kudu-tserver [.]
tcmalloc::ThreadCache::FetchFromCentralCache(unsigned int, int, void*
(*)(unsigned long))
0.69% rpc reactor-628 kudu-tserver [.] operator delete[](void*,
std::nothrow_t const&)
0.67% raft [worker]-6 [kernel.kallsyms] [k] find_busiest_group
0.67% rpc reactor-628 [kernel.kallsyms] [k] cpumask_next_and
0.59% rpc reactor-628 [kernel.kallsyms] [k] find_next_bit
0.55% rpc worker-6290 kudu-tserver [.]
LZ4_decompress_generic.constprop.3
0.55% rpc worker-6290 kudu-tserver [.]
kudu::MaterializingIterator::MaterializeBlock(kudu::RowBlock*)
##2. I ran the benchmark without this patch:
(a) select * from my_wide_table where c280 = <value that does not exist>
Samples: 863 of event 'cycles', Event count (approx.): 177463370
21.86% rpc worker-6567 kudu-tserver [.]
kudu::cfile::BinaryPlainBlockDecoder::CopyNextAndEval(unsigned long*,
kudu::ColumnMaterializationContext*, kudu::SelectionVectorView*, kudu:
14.21% rpc worker-6567 kudu-tserver [.]
kudu::cfile::CFileIterator::Scan(kudu::ColumnMaterializationContext*)
11.29% rpc worker-6567 kudu-tserver [.]
kudu::Slice::compare(kudu::Slice const&) const
5.10% rpc worker-6567 kudu-tserver [.]
kudu::cfile::BinaryPlainBlockDecoder::ParseHeader()
3.28% rpc worker-6567 kudu-tserver [.]
bshuf_shuffle_bit_eightelem_SSE_avx2
2.68% rpc worker-6567 kudu-tserver [.]
boost::container::flat_map<int, std::unique_ptr<kudu::cfile::CFileReader,
std::default_delete<kudu::cfile::CFileReader> >, std::less<int>, b
2.57% rpc worker-6567 kudu-tserver [.]
kudu::tablet::CFileSet::Iterator::CreateColumnIterators(kudu::ScanSpec const*)
2.19% rpc worker-6567 kudu-tserver [.] operator delete[](void*,
std::nothrow_t const&)
1.77% rpc worker-6567 kudu-tserver [.]
kudu::cfile::CFileReader::NewIterator(std::unique_ptr<kudu::cfile::CFileIterator,
std::default_delete<kudu::cfile::CFileIterator> >*, kudu::
1.46% rpc worker-6567 kudu-tserver [.]
kudu::cfile::BinaryDictBlockDecoder::GetFirstRowId() const
1.46% rpc worker-6567 kudu-tserver [.]
kudu::SelectionVector::AnySelected() const
1.13% rpc worker-6577 [kernel.kallsyms] [k] fput
1.09% rpc worker-6567 kudu-tserver [.] LZ4_memcpy_using_offset
1.09% rpc worker-6567 kudu-tserver [.]
kudu::cfile::CFileIterator::~CFileIterator()
0.86% rpc reactor-656 [kernel.kallsyms] [k]
copy_user_enhanced_fast_string
0.81% rpc reactor-656 [kernel.kallsyms] [k] find_busiest_group
0.78% rpc worker-6567 kudu-tserver [.] std::_Hashtable<std::string,
std::pair<std::string const, kudu::ColumnPredicate>,
std::allocator<std::pair<std::string const, kudu::ColumnPr
0.73% rpc worker-6567 kudu-tserver [.]
kudu::BitmapChangeBits(unsigned char*, unsigned long, unsigned long, bool)
0.73% rpc worker-6567 kudu-tserver [.]
kudu::tserver::Scanner::has_fulfilled_limit() const
0.73% rpc worker-6567 kudu-tserver [.]
tcmalloc::ThreadCache::ReleaseToCentralCache(tcmalloc::ThreadCache::FreeList*,
unsigned int, int)
0.73% rpc worker-6567 kudu-tserver [.]
kudu::tablet::DeltaApplier::FinishBatch()
0.73% rpc worker-6567 libc-2.19.so [.] __clock_gettime
0.73% rpc worker-6567 kudu-tserver [.]
kudu::tablet::CFileSet::Iterator::InitializeSelectionVector(kudu::SelectionVector*)
0.73% rpc worker-6567 kudu-tserver [.] kudu::MonoTime::Now()
0.73% rpc worker-6567 kudu-tserver [.]
kudu::UnionIterator::HasNext() const
0.73% rpc worker-6567 kudu-tserver [.]
kudu::cfile::CFileIterator::PrepareBatch(unsigned long*)
0.73% rpc worker-6567 kudu-tserver [.]
kudu::ColumnSchemaPB::SharedDtor()
0.73% rpc worker-6567 kudu-tserver [.]
kudu::MaterializingIterator::MaterializeBlock(kudu::RowBlock*)
(b) select * from my_wide_table where c280 = <value that exists>
Samples: 2K of event 'cycles', Event count (approx.): 235218822
23.63% rpc worker-6567 kudu-tserver [.]
kudu::cfile::BinaryPlainBlockDecoder::CopyNextAndEval(unsigned long*,
kudu::ColumnMaterializationContext*, kudu::SelectionVectorView*, kudu
10.44% rpc worker-6567 kudu-tserver [.]
kudu::Slice::compare(kudu::Slice const&) const
9.34% rpc worker-6567 kudu-tserver [.]
kudu::cfile::CFileIterator::Scan(kudu::ColumnMaterializationContext*)
6.60% rpc worker-6567 kudu-tserver [.]
kudu::cfile::BinaryPlainBlockDecoder::ParseHeader()
5.22% rpc worker-6567 kudu-tserver [.]
bshuf_shuffle_bit_eightelem_SSE_avx2
2.62% rpc worker-6567 kudu-tserver [.]
kudu::tablet::CFileSet::Iterator::CreateColumnIterators(kudu::ScanSpec const*)
2.20% rpc worker-6567 kudu-tserver [.]
bshuf_trans_byte_bitrow_SSE_avx2
1.92% rpc worker-6567 kudu-tserver [.]
LZ4_decompress_generic.constprop.3
1.37% rpc worker-6567 kudu-tserver [.]
kudu::SelectionVector::AnySelected() const
1.26% rpc worker-6567 kudu-tserver [.]
std::_Hashtable<std::string, std::pair<std::string const,
kudu::ColumnPredicate>, std::allocator<std::pair<std::string const,
kudu::ColumnP
1.20% rpc worker-6567 kudu-tserver [.]
tcmalloc::CentralFreeList::Populate()
1.10% rpc worker-6567 kudu-tserver [.]
tcmalloc::ThreadCache::ReleaseToCentralCache(tcmalloc::ThreadCache::FreeList*,
unsigned int, int)
1.10% rpc worker-6567 [kernel.kallsyms] [k] clear_page_c_e
0.84% rpc worker-6567 kudu-tserver [.]
boost::container::flat_map<int, std::unique_ptr<kudu::cfile::CFileReader,
std::default_delete<kudu::cfile::CFileReader> >, std::less<int>,
0.82% rpc worker-6567 kudu-tserver [.] LZ4_memcpy_using_offset
0.82% rpc worker-6567 kudu-tserver [.]
kudu::tablet::CFileSet::Iterator::GetIteratorStats(std::vector<kudu::IteratorStats,
std::allocator<kudu::IteratorStats> >*) const
0.67% rpc reactor-656 kudu-tserver [.] std::unordered_map<unsigned
long, kudu::rpc::InboundCall*, std::hash<unsigned long>, std::equal_to<unsigned
long>, std::allocator<std::pair
0.62% rpc reactor-656 libssl.so.1.0.0 [.] 0x0000000000023566
0.55% rpc reactor-656 libcrypto.so.1.0.0 [.] 0x00000000000a9e6f
0.55% rpc worker-6567 kudu-tserver [.]
kudu::tablet::CFileSet::Iterator::MaterializeColumn(kudu::ColumnMaterializationContext*)
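For a rough with/without comparison, the "Event count (approx.)" values from the headers in ##1 and ##2 can be set side by side. This is only a sketch: the helper name relative_change_pct is made up for illustration, and it assumes the perf recordings covered comparable durations of the same query, which is not stated here (the sample counts differ, 863 vs. 2K), so the result is indicative at best.

```python
# "Event count (approx.)" values copied from the perf headers in ##1 and ##2.
EVENT_COUNTS = {
    "(a) value does not exist": {
        "with_patch": 176315474,
        "without_patch": 177463370,
    },
    "(b) value exists": {
        "with_patch": 233293310,
        "without_patch": 235218822,
    },
}


def relative_change_pct(new, base):
    """Return the percentage change of `new` relative to `base`."""
    return (new - base) / base * 100.0


for query, counts in EVENT_COUNTS.items():
    delta = relative_change_pct(counts["with_patch"], counts["without_patch"])
    # A negative value means fewer cycles were sampled with the patch.
    print(f"{query}: {delta:+.2f}% cycles with the patch")
```

Both differences come out below one percent in magnitude, which is consistent with "no regression" but is not a precise measurement.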
##3. I ran the benchmark on the released version 1.10.1, which doesn't include
the patch for KUDU-2381:
(a) select * from my_wide_table where c280 = <value that does not exist>
Samples: 1K of event 'cycles', Event count (approx.): 272379216
18.76% rpc worker-1038 kudu-tserver [.]
kudu::tablet::DeltaPreparer<kudu::tablet::DMSPreparerTraits>::Start(unsigned
long, int)
16.15% rpc worker-1038 kudu-tserver [.]
kudu::cfile::BinaryPlainBlockDecoder::CopyNextAndEval(unsigned long*,
kudu::ColumnMaterializationContext*, kudu::SelectionVectorView*, kudu
7.60% rpc worker-1038 kudu-tserver [.]
kudu::cfile::CFileIterator::Scan(kudu::ColumnMaterializationContext*)
7.37% rpc worker-1038 kudu-tserver [.]
kudu::Slice::compare(kudu::Slice const&) const
7.12% rpc worker-1038 kudu-tserver [.]
kudu::tablet::CFileSet::Iterator::FinishBatch()
4.51% rpc worker-1038 kudu-tserver [.]
kudu::tablet::DeltaPreparer<kudu::tablet::DMSPreparerTraits>::MayHaveDeltas()
const
3.13% rpc worker-1038 kudu-tserver [.] operator new[](unsigned
long)
2.14% rpc worker-1038 kudu-tserver [.]
kudu::SelectionVector::AnySelected() const
2.14% rpc worker-1038 kudu-tserver [.]
boost::container::flat_map<int, std::unique_ptr<kudu::cfile::CFileReader,
std::default_delete<kudu::cfile::CFileReader> >, std::less<int>,
1.67% rpc worker-1038 kudu-tserver [.]
kudu::cfile::BinaryPlainBlockDecoder::ParseHeader()
1.49% rpc worker-1038 kudu-tserver [.]
kudu::tablet::CFileSet::Iterator::CreateColumnIterators(kudu::ScanSpec const*)
1.43% rpc worker-1038 kudu-tserver [.] LZ4_decompress_fast
1.43% rpc worker-1038 kudu-tserver [.] operator delete[](void*,
std::nothrow_t const&)
1.40% rpc worker-1038 kudu-tserver [.]
std::_Hashtable<std::string, std::pair<std::string const,
kudu::ColumnPredicate>, std::allocator<std::pair<std::string const,
kudu::ColumnP
1.19% rpc worker-1038 kudu-tserver [.]
kudu::cfile::CFileIterator::seeked() const
0.95% rpc worker-1038 kudu-tserver [.]
kudu::cfile::CFileIterator::~CFileIterator()
0.83% rpc worker-1038 libstdc++.so.6.0.21 [.] std::_Hash_bytes(void
const*, unsigned long, unsigned long)
0.71% rpc worker-1038 kudu-tserver [.] kudu::(anonymous
namespace)::ShardedCache<(kudu::Cache::EvictionPolicy)1>::Lookup(kudu::Slice
const&, kudu::Cache::CacheBehavior)
0.71% rpc worker-1038 kudu-tserver [.]
kudu::cfile::CFileIterator::~CFileIterator()
0.71% rpc worker-1038 kudu-tserver [.]
kudu::tablet::DMSIterator::~DMSIterator()
0.71% rpc worker-1038 [vdso] [.] __vdso_clock_gettime
0.71% rpc worker-1038 kudu-tserver [.]
kudu::tserver::TabletServiceImpl::HandleContinueScanRequest(kudu::tserver::ScanRequestPB
const*, kudu::rpc::RpcContext const*, kudu::tserve
0.71% rpc worker-1038 kudu-tserver [.]
kudu::tablet::CFileSet::Iterator::InitializeSelectionVector(kudu::SelectionVector*)
0.59% rpc worker-1038 libstdc++.so.6.0.21 [.] std::basic_string<char,
std::char_traits<char>, std::allocator<char> >::basic_string(std::string const&)
0.56% rpc worker-1038 kudu-tserver [.]
kudu::ColumnSchemaFromPB(kudu::ColumnSchemaPB const&)
0.55% rpc worker-1038 kudu-tserver [.]
tcmalloc::CentralFreeList::Populate()
0.50% maintenance_sch [kernel.kallsyms] [k] queue_delayed_work_on
0.50% maintenance_sch kudu-tserver [.]
kudu::tablet::DiskRowSet::DeltaMemStoreEmpty() const
0.48% rpc worker-1038 kudu-tserver [.]
kudu::tablet::DeltaApplier::MaterializeColumn(kudu::ColumnMaterializationContext*)
(b) select * from my_wide_table where c280 = <value that exists>
Samples: 1K of event 'cycles', Event count (approx.): 304579571
16.99% rpc worker-1038 kudu-tserver [.]
kudu::tablet::DeltaPreparer<kudu::tablet::DMSPreparerTraits>::Start(unsigned
long, int)
14.65% rpc worker-1038 kudu-tserver [.]
kudu::cfile::BinaryPlainBlockDecoder::CopyNextAndEval(unsigned long*,
kudu::ColumnMaterializationContext*, kudu::SelectionVectorView*, kudu
8.92% rpc worker-1038 kudu-tserver [.]
kudu::tablet::CFileSet::Iterator::FinishBatch()
7.86% rpc worker-1038 kudu-tserver [.]
kudu::Slice::compare(kudu::Slice const&) const
6.37% rpc worker-1038 kudu-tserver [.]
kudu::cfile::CFileIterator::Scan(kudu::ColumnMaterializationContext*)
5.31% rpc worker-1038 kudu-tserver [.]
kudu::cfile::BinaryPlainBlockDecoder::ParseHeader()
4.25% rpc worker-1038 kudu-tserver [.]
bshuf_shuffle_bit_eightelem_SSE_avx2
3.40% rpc worker-1038 kudu-tserver [.]
kudu::tablet::DeltaPreparer<kudu::tablet::DMSPreparerTraits>::MayHaveDeltas()
const
2.55% rpc worker-1038 kudu-tserver [.] LZ4_decompress_fast
2.12% rpc worker-1038 kudu-tserver [.]
kudu::SelectionVector::AnySelected() const
1.91% rpc worker-1038 kudu-tserver [.] operator new[](unsigned
long)
1.83% rpc worker-1038 kudu-tserver [.]
boost::container::flat_map<int, std::unique_ptr<kudu::cfile::CFileReader,
std::default_delete<kudu::cfile::CFileReader> >, std::less<int>,
1.49% rpc worker-1038 kudu-tserver [.] operator delete[](void*,
std::nothrow_t const&)
1.36% rpc worker-1038 kudu-tserver [.]
kudu::tablet::CFileSet::Iterator::CreateColumnIterators(kudu::ScanSpec const*)
1.19% rpc worker-1038 kudu-tserver [.]
tcmalloc::CentralFreeList::Populate()
1.06% rpc worker-1038 kudu-tserver [.]
tcmalloc::ThreadCache::ReleaseToCentralCache(tcmalloc::ThreadCache::FreeList*,
unsigned int, int)
0.85% rpc worker-1038 kudu-tserver [.]
kudu::cfile::CFileIterator::~CFileIterator()
0.82% rpc worker-1038 kudu-tserver [.]
kudu::cfile::CFileReader::NewIterator(std::unique_ptr<kudu::cfile::CFileIterator,
std::default_delete<kudu::cfile::CFileIterator> >*, kudu:
0.64% rpc worker-1038 kudu-tserver [.]
std::vector<std::deque<kudu::tablet::DeltaPreparer<kudu::tablet::DMSPreparerTraits>::ColumnUpdate,
std::allocator<kudu::tablet::DeltaPrepar
0.64% rpc worker-1038 kudu-tserver [.]
std::_Deque_base<kudu::tablet::DeltaPreparer<kudu::tablet::DMSPreparerTraits>::ColumnUpdate,
std::allocator<kudu::tablet::DeltaPreparer<kud
0.64% rpc worker-1038 kudu-tserver [.] kudu::(anonymous
namespace)::ShardedCache<(kudu::Cache::EvictionPolicy)1>::Lookup(kudu::Slice
const&, kudu::Cache::CacheBehavior)
0.64% rpc worker-1038 kudu-tserver [.]
kudu::MaterializingIterator::HasNext() const
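The "no DeltaPreparer" check can also be done mechanically instead of by eye. Below is a minimal sketch, assuming the perf report text looks like what is pasted in this comment (an "NN.NN%" column starting each row, with long symbols wrapped onto the following lines). The names OVERHEAD_RE and symbol_overhead are hypothetical helpers, not part of perf or Kudu.

```python
import re

# Matches a row that starts with the "NN.NN%" overhead column.
OVERHEAD_RE = re.compile(r"^\s*(\d+(?:\.\d+)?)%\s+(.*)$")


def symbol_overhead(report_text, needle):
    """Sum the overhead of all rows whose (re-joined) text contains `needle`."""
    rows = []
    for line in report_text.splitlines():
        match = OVERHEAD_RE.match(line)
        if match:
            # New row: remember its percentage and the rest of the line.
            rows.append([float(match.group(1)), match.group(2)])
        elif rows:
            # Wrapped continuation: glue it onto the previous row.
            rows[-1][1] += " " + line.strip()
    return sum(pct for pct, text in rows if needle in text)


# A few rows copied from the 1.10.1 profile for query (a).
sample = """\
18.76% rpc worker-1038 kudu-tserver [.]
kudu::tablet::DeltaPreparer<kudu::tablet::DMSPreparerTraits>::Start(unsigned
long, int)
16.15% rpc worker-1038 kudu-tserver [.]
kudu::cfile::BinaryPlainBlockDecoder::CopyNextAndEval(unsigned long*,
4.51% rpc worker-1038 kudu-tserver [.]
kudu::tablet::DeltaPreparer<kudu::tablet::DMSPreparerTraits>::MayHaveDeltas()
const
"""

print(round(symbol_overhead(sample, "DeltaPreparer"), 2))
```

Run against the full listings, a sum of zero for "DeltaPreparer" in ##1 and ##2 and a non-zero sum in ##3 would match the conclusion stated at the top.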
--
To view, visit http://gerrit.cloudera.org:8080/13987
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id348583cc7d85773e8f32a189f4344d7a36a30b6
Gerrit-Change-Number: 13987
Gerrit-PatchSet: 4
Gerrit-Owner: helifu <[email protected]>
Gerrit-Reviewer: Adar Dembo <[email protected]>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: helifu <[email protected]>
Gerrit-Comment-Date: Thu, 05 Sep 2019 10:10:05 +0000
Gerrit-HasComments: No