Github user chenghao-intel commented on the pull request:
https://github.com/apache/spark/pull/7580#issuecomment-123588213

> 2. I have defined checkInputDataTypes. Based on the requirement to check
> that argument one is of type Array[T] and that the value to check is of type T,
> it doesn't look like ExpectsInputTypes is sufficient. Is this correct?

For complex data types, I don't think we need `ExpectsInputTypes`; overriding
`checkInputDataTypes` looks good to me.
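
For reference, a minimal sketch of such an override, assuming a binary expression whose first child is the array and whose second child is the value to search for (the class name, base class, and messages here are illustrative, not the exact code in this PR, and the required members of `BinaryExpression` vary a bit across Spark versions):

```scala
import org.apache.spark.sql.catalyst.analysis.TypeCheckResult
import org.apache.spark.sql.catalyst.expressions.{BinaryExpression, Expression}
import org.apache.spark.sql.types._

// Illustrative sketch: reject anything whose first argument is not an array,
// or whose element type differs from the type of the value being searched for.
case class ArrayContains(left: Expression, right: Expression)
  extends BinaryExpression {

  override def dataType: DataType = BooleanType

  override def checkInputDataTypes(): TypeCheckResult = left.dataType match {
    case ArrayType(elementType, _) if elementType == right.dataType =>
      TypeCheckResult.TypeCheckSuccess
    case ArrayType(elementType, _) =>
      TypeCheckResult.TypeCheckFailure(
        s"array element type $elementType does not match value type ${right.dataType}")
    case other =>
      TypeCheckResult.TypeCheckFailure(
        s"first argument should be an array type, but got $other")
  }

  // Evaluation is covered under point 4 below (nullSafeEval).
}
```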

> 3. It looks like hive uses ObjectInspectorUtils defined here:
> https://github.com/apache/hive/blob/af4aeab9c0dffc5f8e42428bf8b835dccc8771ef/serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/ObjectInspectorUtils.java#L941
> to see if types are equal. Is it necessary to do a recursive type check, if so
> does spark have a utility class/object to do this?

Actually, except for `BinaryType`, the other data types (in Catalyst) should work
correctly. The internal representation of `BinaryType` is `Array[Byte]` (`byte[]`
in Java), so you're right, we have to check the equality recursively. However,
this is a generic problem; we probably need to add a wrapper for `Array[Byte]`,
but we can leave that for a future implementation.
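
To make the `Array[Byte]` issue concrete: `equals` on a JVM byte array is reference equality, so any value-level comparison (whether a wrapper or a helper) has to special-case it, roughly like this (a hypothetical helper, not existing Spark code):

```scala
import java.util.Arrays

// Hypothetical helper: value equality that special-cases byte arrays
// (the internal representation of BinaryType), since byte[] == byte[]
// compares references on the JVM.
def binarySafeEquals(a: Any, b: Any): Boolean = (a, b) match {
  case (x: Array[Byte], y: Array[Byte]) => Arrays.equals(x, y)
  case (x, y)                           => x == y
}
```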

> 4. Does it make sense to define eval? The default scenario is that if
> there is a null input, to return null. Since that behavior doesn't match what
> hive does, it seems to make sense to define a custom eval which takes care of
> null checks on left and right.

Override `nullSafeEval` instead.
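
Continuing the sketch from point 2, a `nullSafeEval` might look roughly like the following; `BinaryExpression.eval` already returns null when either child evaluates to null, so only the non-null path needs to be written (this assumes the array value arrives as a `Seq[Any]`, which may not match the internal representation the PR ends up using):

```scala
// Illustrative only: both arguments are guaranteed non-null here, because the
// default eval in BinaryExpression short-circuits to null before calling this.
override protected def nullSafeEval(arr: Any, value: Any): Any = {
  arr.asInstanceOf[Seq[Any]].contains(value)
}
```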

> 5. Lastly, I am having trouble testing that if given a null argument,
> array_contains should return false. This is due to checkInputTypes throwing a
> runtime error if there is a type mismatch, in this case because null is not an
> Integer. What is the correct way to test this, if it makes sense to do so?

We can test that via the DataFrame test suite.
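
Something along these lines in the DataFrame function suite would exercise the whole path through analysis (hypothetical test data; this assumes the usual `QueryTest` helpers and SQL implicits are in scope, and the expected rows should follow whatever null semantics the PR settles on):

```scala
// Hypothetical DataFrame-level test sketch for array_contains.
val df = Seq(
  (Seq(1, 2, 3), 1),
  (Seq(4, 5, 6), 1)
).toDF("a", "b")

checkAnswer(
  df.selectExpr("array_contains(a, b)"),
  Seq(Row(true), Row(false))
)
```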

> 6. Do I need to be checking if types are comparable? If so, will this be
> a scala/java thing, or does spark have a notion of comparable?

For equality checking, I think every Catalyst data type handles that by correctly
overriding the `equals` method (`Array[Byte]` is a special case that we need to fix).
For comparability, you can check the subclasses of `AtomicType`, which have an
`ordering` property.
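
For illustration, ordering-based comparison can be dispatched on the data type roughly like this (note that `ordering` is package-private to Spark SQL, so a real implementation has to live inside the `org.apache.spark.sql` package tree; the helper name is made up):

```scala
import org.apache.spark.sql.types._

// Hypothetical helper: every AtomicType (IntegerType, StringType, ...) carries
// an Ordering over its internal representation, so comparison logic can be
// dispatched on the Catalyst data type rather than on the runtime class.
def orderingFor(dt: DataType): Ordering[Any] = dt match {
  case a: AtomicType => a.ordering.asInstanceOf[Ordering[Any]]
  case other =>
    throw new IllegalArgumentException(s"$other does not define an ordering")
}
```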