Github user chenghao-intel commented on a diff in the pull request:
https://github.com/apache/spark/pull/2762#discussion_r19515074
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveInspectors.scala ---
@@ -123,43 +94,89 @@ private[hive] trait HiveInspectors {
case pi: PrimitiveObjectInspector => pi.getPrimitiveJavaObject(data)
case li: ListObjectInspector =>
Option(li.getList(data))
- .map(_.map(unwrapData(_, li.getListElementObjectInspector)).toSeq)
+ .map(_.map(unwrap(_, li.getListElementObjectInspector)).toSeq)
.orNull
case mi: MapObjectInspector =>
Option(mi.getMap(data)).map(
_.map {
case (k,v) =>
- (unwrapData(k, mi.getMapKeyObjectInspector),
- unwrapData(v, mi.getMapValueObjectInspector))
+ (unwrap(k, mi.getMapKeyObjectInspector),
+ unwrap(v, mi.getMapValueObjectInspector))
}.toMap).orNull
case si: StructObjectInspector =>
val allRefs = si.getAllStructFieldRefs
new GenericRow(
allRefs.map(r =>
- unwrapData(si.getStructFieldData(data,r),
r.getFieldObjectInspector)).toArray)
+ unwrap(si.getStructFieldData(data,r),
r.getFieldObjectInspector)).toArray)
}
- /** Converts native catalyst types to the types expected by Hive */
- def wrap(a: Any): AnyRef = a match {
- case s: String => s: java.lang.String
- case i: Int => i: java.lang.Integer
- case b: Boolean => b: java.lang.Boolean
- case f: Float => f: java.lang.Float
- case d: Double => d: java.lang.Double
- case l: Long => l: java.lang.Long
- case l: Short => l: java.lang.Short
- case l: Byte => l: java.lang.Byte
- case b: BigDecimal => HiveShim.createDecimal(b.underlying())
- case b: Array[Byte] => b
- case d: java.sql.Date => d
- case t: java.sql.Timestamp => t
- case s: Seq[_] => seqAsJavaList(s.map(wrap))
- case m: Map[_,_] =>
- // Some UDFs seem to assume we pass in a HashMap.
- val hashMap = new java.util.HashMap[AnyRef, AnyRef]()
- hashMap.putAll(m.map { case (k, v) => wrap(k) -> wrap(v) })
- hashMap
- case null => null
+ /**
+ * Converts native catalyst types to the types expected by Hive
+ * @param a the value to be wrapped
+ * @param oi This ObjectInspector associated with the value returned by this function, and
+ * the ObjectInspector should also be consistent with those returned from
+ * toInspector: DataType => ObjectInspector and
+ * toInspector: Expression => ObjectInspector
+ */
+ def wrap(a: Any, oi: ObjectInspector): AnyRef = if (a == null) {
+ null
+ } else {
+ oi match {
+ case x: ConstantObjectInspector => x.getWritableConstantValue
+ case x: PrimitiveObjectInspector => a match {
+ // TODO what if x.preferWritable() == true? reuse the writable?
--- End diff --
Yes, currently the `oi` should never be "preferWritable", because `toInspector`
doesn't return such an inspector. Even if we returned a new instance of
`Writable` here, that would be the same as what a `preferWritable`
`ObjectInspector` does internally.
As you suggested, we don't want to check the `oi` type dynamically, so I will
leave that, together with reusing the writable object, as a future improvement.
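
For readers following the thread, the trade-off behind "reuse the writable" can be sketched as below. This is only an illustration under stated assumptions: `IntBox`, `wrapFresh`, and `ReusingWrapper` are hypothetical stand-ins invented for this example and are not Hive or Spark classes, and the sketch does not claim to show what the PR or Hive actually does.

```scala
object WritableReuseSketch {
  // Hypothetical stand-in for a mutable Hadoop/Hive Writable (e.g. an int holder).
  // This is NOT a Hive class; it only mimics "a box whose value can be reset".
  final class IntBox(var value: Int)

  // Strategy 1: allocate a fresh box on every call.
  // Callers may safely keep earlier results, at the cost of one allocation per value.
  def wrapFresh(a: Int): IntBox = new IntBox(a)

  // Strategy 2: reuse a single box per wrapper instance.
  // No allocation per value, but every call overwrites the previously returned result.
  final class ReusingWrapper {
    private val box = new IntBox(0)
    def wrap(a: Int): IntBox = {
      box.value = a
      box
    }
  }

  def main(args: Array[String]): Unit = {
    val fresh1 = wrapFresh(1)
    val fresh2 = wrapFresh(2)
    // Two distinct objects, each keeping its own value.
    println(s"fresh: same object = ${fresh1 eq fresh2}, values = ${fresh1.value}, ${fresh2.value}")

    val wrapper = new ReusingWrapper
    val reused1 = wrapper.wrap(1)
    val reused2 = wrapper.wrap(2)
    // One shared object; the first reference now observes the second value.
    println(s"reused: same object = ${reused1 eq reused2}, value seen via first ref = ${reused1.value}")
  }
}
```

Reuse avoids per-row allocation but is only safe when the consumer reads each value before the next call, which is why it is reasonable to defer it as a separate improvement.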