Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14197 )

Change subject: IMPALA-5092 Add support for VARCHAR in Kudu tables
......................................................................


Patch Set 14:

(3 comments)

The code changes and testing seem generally fine, the min-max filtering needs 
fixing though.

http://gerrit.cloudera.org:8080/#/c/14197/14//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/14197/14//COMMIT_MSG@27
PS14, Line 27: IMPALA-5675 tracks adding UTF-8 Character length support to 
VARCHAR
             : columns and marked the truncation code with a TODO that 
references
             : that Jira.
> I don't expect any additional sorting or predicate issues outside of any th
UTF-8 has the nice property that sorting by (unsigned) byte values is 
equivalent to sorting in codepoint order so this should just work as far as I 
can see it.

I.e. I think there aren't any additional ripple effects from the truncation, 
just everything behaves as you would expect from the values being truncated.

There are other fancier collation algorithms, but as far as I'm aware nothing 
in our stack uses them by default.


http://gerrit.cloudera.org:8080/#/c/14197/14//COMMIT_MSG@33
PS14, Line 33: * Manually reproduced a check failure due to multi-byte 
characters
             :   and tested that length truncation resolve that issue.
> I will look at implementing it. The main challenge is inserting data direct
This would be a good sanity check, if only to prevent future breakage. But if 
it's not feasible right now, seems like we could live with the manual testing.


http://gerrit.cloudera.org:8080/#/c/14197/14//COMMIT_MSG@47
PS14, Line 47: support
> I am under the impression they should work (given it's just strings), but n
I don't think there are major changes needed, but definitely some changes, 
since it crashes now. If we want to push this out we can disable the filters 
for now, e.g. DECIMAL filters were previously disabled like this: 
https://gerrit.cloudera.org/#/c/12113/19/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java

  Query submitted at: 2020-03-30 15:11:00 (Coordinator: 
http://tarmstrong-box2:25000)
  Query progress can be monitored at: 
http://tarmstrong-box2:25000/query_plan?query_id=194eb5100ef2d6ed:b373b84200000000
  +----------+
  | count(*) |
  +----------+
  | 0        |
  +----------+
  Fetched 1 row(s) in 0.37s
  [localhost.EXAMPLE.COM:21000] default> select straight_join count(*) from 
ctas_varchar join functional.chars_medium t on vc = t.varchar_col;
  Query: select straight_join count(*) from ctas_varchar join 
functional.chars_medium t on vc = t.varchar_col
  Query submitted at: 2020-03-30 15:11:12 (Coordinator: 
http://tarmstrong-box2:25000)
  Query progress can be monitored at: 
http://tarmstrong-box2:25000/query_plan?query_id=cc476b56d10a741e:99f338be00000000
  [                                                                             
                       ] 0%
  ERROR: Failed due to unreachable impalad(s): tarmstrong-box2:22001


from impalad_node1.ERROR:

  F0330 15:11:17.178375  8395 min-max-filter.cc:65] 
cc476b56d10a741e:99f338be00000002] Check failed: llvm_class != 
MIN_MAX_FILTER_LLVM_CLASS_NAMES.end() Not a valid type: 16
  *** Check failure stack trace: ***
      @          0x4f92d8c  google::LogMessage::Fail()
      @          0x4f94631  google::LogMessage::SendToLog()
      @          0x4f92766  google::LogMessage::Flush()
      @          0x4f95d2d  google::LogMessageFatal::~LogMessageFatal()
      @          0x2618bd0  impala::MinMaxFilter::GetLlvmClassName()
      @          0x28aa046  impala::FilterContext::CodegenInsert()
      @          0x2810a2e  
impala::PhjBuilderConfig::CodegenInsertRuntimeFilters()
      @          0x280e26b  impala::PhjBuilderConfig::Codegen()
      @          0x280dfd5  impala::PhjBuilder::Codegen()
      @          0x281ba6a  impala::PartitionedHashJoinNode::Codegen()
      @          0x272288f  impala::ExecNode::Codegen()
      @          0x287b2c6  impala::AggregationNodeBase::Codegen()
      @          0x22d26f7  impala::FragmentInstanceState::Open()
      @          0x22cf787  impala::FragmentInstanceState::Exec()
      @          0x22e391e  impala::QueryState::ExecFInstance()
      @          0x22e1c3d  
_ZZN6impala10QueryState15StartFInstancesEvENKUlvE_clEv
      @          0x22e5506  
_ZN5boost6detail8function26void_function_obj_invoker0IZN6impala10QueryState15StartFInstancesEvEUlvE_vE6invokeERNS1_15function_bufferE
      @          0x20c079b  boost::function0<>::operator()()
      @          0x26828ea  impala::Thread::SuperviseThread()
      @          0x268ab6e  boost::_bi::list5<>::operator()<>()
      @          0x268aa92  boost::_bi::bind_t<>::operator()()
      @          0x268aa55  boost::detail::thread_data<>::run()
      @          0x3eb9379  thread_proxy
      @     0x7f2a077f06da  start_thread
      @     0x7f2a0423e88e  clone


Predicate pushdown basically works, but is not enabled in many cases unless you 
explicitly cast the literal to varchar, e.g. this works:

  select straight_join count(*) from ctas_varchar where vc > cast('1' as 
varchar(1));



--
To view, visit http://gerrit.cloudera.org:8080/14197
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0d4959410fdd882bfa980cb55e8a7837c7823da8
Gerrit-Change-Number: 14197
Gerrit-PatchSet: 14
Gerrit-Owner: Attila Bukor <abu...@apache.org>
Gerrit-Reviewer: Attila Bukor <abu...@apache.org>
Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com>
Gerrit-Reviewer: Grant Henke <granthe...@apache.org>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tmarsh...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-Comment-Date: Mon, 30 Mar 2020 22:19:07 +0000
Gerrit-HasComments: Yes

Reply via email to