Github user chenghao-intel commented on the pull request:
https://github.com/apache/spark/pull/7580#issuecomment-123588213

> 2. I have defined checkInputDataTypes. Based on the requirement to check
> that argument one is of type Array[T] and that the value to check is of type T,
> it doesn't look like ExpectsInputTypes is sufficient. Is this correct?

For complex data types, I don't think we need `ExpectsInputTypes`; overriding
`checkInputDataTypes` looks good to me.
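
For reference, a minimal sketch of such an override, assuming a binary expression whose first child is the array and whose second child is the value to search for (the class name, base class, and messages here are illustrative, not the exact code in this PR, and the required members of `BinaryExpression` vary a bit across Spark versions):

```scala
import org.apache.spark.sql.catalyst.analysis.TypeCheckResult
import org.apache.spark.sql.catalyst.expressions.{BinaryExpression, Expression}
import org.apache.spark.sql.types._

// Illustrative sketch: reject anything whose first argument is not an array,
// or whose element type differs from the type of the value being searched for.
case class ArrayContains(left: Expression, right: Expression)
  extends BinaryExpression {

  override def dataType: DataType = BooleanType

  override def checkInputDataTypes(): TypeCheckResult = left.dataType match {
    case ArrayType(elementType, _) if elementType == right.dataType =>
      TypeCheckResult.TypeCheckSuccess
    case ArrayType(elementType, _) =>
      TypeCheckResult.TypeCheckFailure(
        s"array element type $elementType does not match value type ${right.dataType}")
    case other =>
      TypeCheckResult.TypeCheckFailure(
        s"first argument should be an array type, but got $other")
  }

  // Evaluation is covered under point 4 below (nullSafeEval).
}
```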

> 3. It looks like hive uses ObjectInspectorUtils defined here:
> https://github.com/apache/hive/blob/af4aeab9c0dffc5f8e42428bf8b835dccc8771ef/serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/ObjectInspectorUtils.java#L941
> to see if types are equal. Is it necessary to do a recursive type check, if so
> does spark have a utility class/object to do this?

Actually, except for `BinaryType`, the other data types (in Catalyst) should work
correctly. The internal representation of `BinaryType` is `Array[Byte]` (`byte[]`
in Java), so you're right, we have to check the equality recursively. However,
this is a generic problem; we probably need to add a wrapper for `Array[Byte]`,
but we can leave that for a future implementation.
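
To make the `Array[Byte]` issue concrete: `equals` on a JVM byte array is reference equality, so any value-level comparison (whether a wrapper or a helper) has to special-case it, roughly like this (a hypothetical helper, not existing Spark code):

```scala
import java.util.Arrays

// Hypothetical helper: value equality that special-cases byte arrays
// (the internal representation of BinaryType), since byte[] == byte[]
// compares references on the JVM.
def binarySafeEquals(a: Any, b: Any): Boolean = (a, b) match {
  case (x: Array[Byte], y: Array[Byte]) => Arrays.equals(x, y)
  case (x, y)                           => x == y
}
```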

> 4. Does it make sense to define eval? The default scenario is that if
> there is a null input, to return null. Since that behavior doesn't match what
> hive does, it seems to make sense to define a custom eval which takes care of
> null checks on left and right.

Override `nullSafeEval` instead.
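
Continuing the sketch from point 2, a `nullSafeEval` might look roughly like the following; `BinaryExpression.eval` already returns null when either child evaluates to null, so only the non-null path needs to be written (this assumes the array value arrives as a `Seq[Any]`, which may not match the internal representation the PR ends up using):

```scala
// Illustrative only: both arguments are guaranteed non-null here, because the
// default eval in BinaryExpression short-circuits to null before calling this.
override protected def nullSafeEval(arr: Any, value: Any): Any = {
  arr.asInstanceOf[Seq[Any]].contains(value)
}
```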

> 5. Lastly, I am having trouble testing that if given a null argument,
> array_contains should return false. This is due to checkInputTypes throwing a
> runtime error if there is a type mismatch, in this case because null is not an
> Integer. What is the correct way to test this, if it makes sense to do so?

We can test that via the DataFrame test suite.
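
Something along these lines in the DataFrame function suite would exercise the whole path through analysis (hypothetical test data; this assumes the usual `QueryTest` helpers and SQL implicits are in scope, and the expected rows should follow whatever null semantics the PR settles on):

```scala
// Hypothetical DataFrame-level test sketch for array_contains.
val df = Seq(
  (Seq(1, 2, 3), 1),
  (Seq(4, 5, 6), 1)
).toDF("a", "b")

checkAnswer(
  df.selectExpr("array_contains(a, b)"),
  Seq(Row(true), Row(false))
)
```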

> 6. Do I need to be checking if types are comparable? If so, will this be
> a scala/java thing, or does spark have a notion of comparable?

For equality checking, I think every Catalyst data type handles that by correctly
overriding the `equals` method (`Array[Byte]` is a special case that we need to fix).
For comparability, you can check the subclasses of `AtomicType`, which have an
`ordering` property.
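
For illustration, ordering-based comparison can be dispatched on the data type roughly like this (note that `ordering` is package-private to Spark SQL, so a real implementation has to live inside the `org.apache.spark.sql` package tree; the helper name is made up):

```scala
import org.apache.spark.sql.types._

// Hypothetical helper: every AtomicType (IntegerType, StringType, ...) carries
// an Ordering over its internal representation, so comparison logic can be
// dispatched on the Catalyst data type rather than on the runtime class.
def orderingFor(dt: DataType): Ordering[Any] = dt match {
  case a: AtomicType => a.ordering.asInstanceOf[Ordering[Any]]
  case other =>
    throw new IllegalArgumentException(s"$other does not define an ordering")
}
```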