GitHub user javadba commented on the pull request:

    https://github.com/apache/spark/pull/1586#issuecomment-50289795
  
    The updated code was caught by one of the cases in the Hive compatibility
suite.
    
    The Hive UDF length calculation appears to differ from the newly
implemented one, presumably due to differences in character-encoding handling.
For the fix, I will make the length() function use the same character encoding
as Hive to keep it compatible. The strlen() method will be the "outlet" that
permits flexible handling of multi-byte character sets in the general RDD (no
strlen method is defined in Hive proper). A short sketch of the discrepancy
follows.
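    
    To illustrate, here is a minimal Scala sketch (my own, not code from this
PR) of how a character count and a UTF-8 byte count can diverge on a
multi-byte value; the string and object name are invented for the example:
    
        // Assumed example: character count vs. UTF-8 byte count for a
        // hypothetical two-character, multi-byte string.
        object LengthEncodingDemo extends App {
          val s = "中文"
          // Character (code point) count, as a Hive-compatible length()
          // would presumably report:
          val charLength = s.codePointCount(0, s.length)  // 2
          // Byte count of the UTF-8 encoding, as a byte-oriented strlen
          // would report:
          val byteLength = s.getBytes("UTF-8").length     // 6
          println(s"chars = $charLength, utf8 bytes = $byteLength")
        }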
    
    I am going to roll back just the Hive portion of the commit and will
report back by the end of the evening.
    
     udf_length *** FAILED ***
    [info]   Results do not match for udf_length:
    [info]   SELECT length(dest1.name) FROM dest1
    [info]   == Logical Plan ==
    [info]   Project [Length(name#41188) AS c_0#41186]
    [info]    MetastoreRelation default, dest1, None
    [info]   
    [info]   == Optimized Logical Plan ==
    [info]   Project [Length(name#41188) AS c_0#41186]
    [info]    MetastoreRelation default, dest1, None
    [info]   
    [info]   == Physical Plan ==
    [info]   Project [Length(name#41188:0) AS c_0#41186]
    [info]    HiveTableScan [name#41188], (MetastoreRelation default, dest1, 
None), None
    [info]   c_0
    [info]   !== HIVE - 1 row(s) ==   == CATALYST - 1 row(s) ==
    [info]   !2                       6 (HiveComparisonTest.scala:366)
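    
    (The Hive value of 2 against Catalyst's 6 is consistent with the sketch
above: a two-character value occupying six bytes in UTF-8.)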

