Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/14197 )
Change subject: IMPALA-5092 Add support for VARCHAR in Kudu tables ...................................................................... Patch Set 14: (3 comments) The code changes and testing seem generally fine, the min-max filtering needs fixing though. http://gerrit.cloudera.org:8080/#/c/14197/14//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/14197/14//COMMIT_MSG@27 PS14, Line 27: IMPALA-5675 tracks adding UTF-8 Character length support to VARCHAR : columns and marked the truncation code with a TODO that references : that Jira. > I don't expect any additional sorting or predicate issues outside of any th UTF-8 has the nice property that sorting by (unsigned) byte values is equivalent to sorting in codepoint order so this should just work as far as I can see it. I.e. I think there aren't any additional ripple effects from the truncation, just everything behaves as you would expect from the values being truncated. There are other fancier collation algorithms, but as far as I'm aware nothing in our stack uses them by default. http://gerrit.cloudera.org:8080/#/c/14197/14//COMMIT_MSG@33 PS14, Line 33: * Manually reproduced a check failure due to multi-byte characters : and tested that length truncation resolve that issue. > I will look at implementing it. The main challenge is inserting data direct This would be a good sanity check, if only to prevent future breakage. But if it's not feasible right now, seems like we could live with the manual testing. http://gerrit.cloudera.org:8080/#/c/14197/14//COMMIT_MSG@47 PS14, Line 47: support > I am under the impression they should work (given it's just strings), but n I don't think there are major changes needed, but definitely some changes, since it crashes now. If we want to push this out we can disable the filters for now, e.g. DECIMAL filters were previously disabled like this: https://gerrit.cloudera.org/#/c/12113/19/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java Query submitted at: 2020-03-30 15:11:00 (Coordinator: http://tarmstrong-box2:25000) Query progress can be monitored at: http://tarmstrong-box2:25000/query_plan?query_id=194eb5100ef2d6ed:b373b84200000000 +----------+ | count(*) | +----------+ | 0 | +----------+ Fetched 1 row(s) in 0.37s [localhost.EXAMPLE.COM:21000] default> select straight_join count(*) from ctas_varchar join functional.chars_medium t on vc = t.varchar_col; Query: select straight_join count(*) from ctas_varchar join functional.chars_medium t on vc = t.varchar_col Query submitted at: 2020-03-30 15:11:12 (Coordinator: http://tarmstrong-box2:25000) Query progress can be monitored at: http://tarmstrong-box2:25000/query_plan?query_id=cc476b56d10a741e:99f338be00000000 [ ] 0% ERROR: Failed due to unreachable impalad(s): tarmstrong-box2:22001 from impalad_node1.ERROR: F0330 15:11:17.178375 8395 min-max-filter.cc:65] cc476b56d10a741e:99f338be00000002] Check failed: llvm_class != MIN_MAX_FILTER_LLVM_CLASS_NAMES.end() Not a valid type: 16 *** Check failure stack trace: *** @ 0x4f92d8c google::LogMessage::Fail() @ 0x4f94631 google::LogMessage::SendToLog() @ 0x4f92766 google::LogMessage::Flush() @ 0x4f95d2d google::LogMessageFatal::~LogMessageFatal() @ 0x2618bd0 impala::MinMaxFilter::GetLlvmClassName() @ 0x28aa046 impala::FilterContext::CodegenInsert() @ 0x2810a2e impala::PhjBuilderConfig::CodegenInsertRuntimeFilters() @ 0x280e26b impala::PhjBuilderConfig::Codegen() @ 0x280dfd5 impala::PhjBuilder::Codegen() @ 0x281ba6a impala::PartitionedHashJoinNode::Codegen() @ 0x272288f impala::ExecNode::Codegen() @ 0x287b2c6 impala::AggregationNodeBase::Codegen() @ 0x22d26f7 impala::FragmentInstanceState::Open() @ 0x22cf787 impala::FragmentInstanceState::Exec() @ 0x22e391e impala::QueryState::ExecFInstance() @ 0x22e1c3d _ZZN6impala10QueryState15StartFInstancesEvENKUlvE_clEv @ 0x22e5506 _ZN5boost6detail8function26void_function_obj_invoker0IZN6impala10QueryState15StartFInstancesEvEUlvE_vE6invokeERNS1_15function_bufferE @ 0x20c079b boost::function0<>::operator()() @ 0x26828ea impala::Thread::SuperviseThread() @ 0x268ab6e boost::_bi::list5<>::operator()<>() @ 0x268aa92 boost::_bi::bind_t<>::operator()() @ 0x268aa55 boost::detail::thread_data<>::run() @ 0x3eb9379 thread_proxy @ 0x7f2a077f06da start_thread @ 0x7f2a0423e88e clone Predicate pushdown basically works, but is not enabled in many cases unless you explicitly cast the literal to varchar, e.g. this works: select straight_join count(*) from ctas_varchar where vc > cast('1' as varchar(1)); -- To view, visit http://gerrit.cloudera.org:8080/14197 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0d4959410fdd882bfa980cb55e8a7837c7823da8 Gerrit-Change-Number: 14197 Gerrit-PatchSet: 14 Gerrit-Owner: Attila Bukor <abu...@apache.org> Gerrit-Reviewer: Attila Bukor <abu...@apache.org> Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com> Gerrit-Reviewer: Grant Henke <granthe...@apache.org> Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Gerrit-Reviewer: Tamas Mate <tm...@cloudera.com> Gerrit-Reviewer: Thomas Tauber-Marshall <tmarsh...@cloudera.com> Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com> Gerrit-Comment-Date: Mon, 30 Mar 2020 22:19:07 +0000 Gerrit-HasComments: Yes