[GitHub] spark pull request #19066: [SPARK-21255][SQL] simplify encoder for java enum

2017-08-28 Thread mike0sv
Github user mike0sv commented on a diff in the pull request:

https://github.com/apache/spark/pull/19066#discussion_r135511606
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/encoders/ExpressionEncoder.scala
 ---
@@ -81,19 +81,9 @@ object ExpressionEncoder {
   ClassTag[T](cls))
   }
 
-  def javaEnumSchema[T](beanClass: Class[T]): DataType = {
-StructType(Seq(StructField("enum",
-  StructType(Seq(StructField(beanClass.getSimpleName, StringType, 
nullable = false))),
-  nullable = false)))
-  }
-
   // TODO: improve error message for java bean encoder.
   def javaBean[T](beanClass: Class[T]): ExpressionEncoder[T] = {
-val schema = if (beanClass.isEnum) {
-  javaEnumSchema(beanClass)
-} else {
-  JavaTypeInference.inferDataType(beanClass)._1
-}
+val schema = JavaTypeInference.inferDataType(beanClass)._1
--- End diff --

If we don't support top level enums, maybe it would be appropriate to check 
that beanClass is not enum and throw exception otherwise


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #18488: [SPARK-21255][SQL][WIP] Fixed NPE when creating e...

2017-08-22 Thread mike0sv
Github user mike0sv commented on a diff in the pull request:

https://github.com/apache/spark/pull/18488#discussion_r134622727
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala
 ---
@@ -154,13 +154,13 @@ case class StaticInvoke(
 val evaluate = if (returnNullable) {
   if (ctx.defaultValue(dataType) == "null") {
 s"""
-  ${ev.value} = $callFunc;
+  ${ev.value} = (($javaType) ($callFunc));
--- End diff --

from janino documentation: "Type arguments: Are parsed, but otherwise 
ignored. The most significant restriction that follows is that you must cast 
return values from method invocations, e.g. "(String) myMap.get(key)" 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18488: [SPARK-21255][SQL][WIP] Fixed NPE when creating encoder ...

2017-08-22 Thread mike0sv
Github user mike0sv commented on the issue:

https://github.com/apache/spark/pull/18488
  
Found this in janino documentation, it explains the need for explicit 
casting: "Type arguments: Are parsed, but otherwise ignored. The most 
significant restriction that follows is that you must cast return values from 
method invocations, e.g. "(String) myMap.get(key)" 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18488: [SPARK-21255][SQL][WIP] Fixed NPE when creating encoder ...

2017-08-22 Thread mike0sv
Github user mike0sv commented on the issue:

https://github.com/apache/spark/pull/18488
  
@srowen you are right, we store string values of constant names (for test 
example, we would get A and B values, not google/elgoog)
I commented some of the changes for clarity, but I don't see any downsides.
There is only one change which affects not enum-specific code, the casting 
in code generation, but it's completely harmless


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #18488: [SPARK-21255][SQL][WIP] Fixed NPE when creating e...

2017-08-22 Thread mike0sv
Github user mike0sv commented on a diff in the pull request:

https://github.com/apache/spark/pull/18488#discussion_r134480556
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala
 ---
@@ -154,13 +154,13 @@ case class StaticInvoke(
 val evaluate = if (returnNullable) {
   if (ctx.defaultValue(dataType) == "null") {
 s"""
-  ${ev.value} = $callFunc;
+  ${ev.value} = (($javaType) ($callFunc));
--- End diff --

explicitly cast value to needed type, because without this generated code 
didn't compile with something like "cannot assign value of type Enum to 
%RealEnumClassName%


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #18488: [SPARK-21255][SQL][WIP] Fixed NPE when creating e...

2017-08-22 Thread mike0sv
Github user mike0sv commented on a diff in the pull request:

https://github.com/apache/spark/pull/18488#discussion_r134479564
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/encoders/ExpressionEncoder.scala
 ---
@@ -81,9 +81,19 @@ object ExpressionEncoder {
   ClassTag[T](cls))
   }
 
+  def javaEnumSchema[T](beanClass: Class[T]): DataType = {
+StructType(Seq(StructField("enum",
+  StructType(Seq(StructField(beanClass.getSimpleName, StringType, 
nullable = false))),
+  nullable = false)))
+  }
+
   // TODO: improve error message for java bean encoder.
   def javaBean[T](beanClass: Class[T]): ExpressionEncoder[T] = {
-val schema = JavaTypeInference.inferDataType(beanClass)._1
+val schema = if (beanClass.isEnum) {
+  javaEnumSchema(beanClass)
--- End diff --

If we use enum as top level object, we need another level of structType for 
it to be compatible with our ser/de structure


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #18488: [SPARK-21255][SQL][WIP] Fixed NPE when creating e...

2017-08-22 Thread mike0sv
Github user mike0sv commented on a diff in the pull request:

https://github.com/apache/spark/pull/18488#discussion_r134478774
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/JavaTypeInference.scala
 ---
@@ -429,6 +464,11 @@ object JavaTypeInference {
 valueNullable = true
   )
 
+case other if other.isEnum =>
+  CreateNamedStruct(expressions.Literal("enum") ::
+  StaticInvoke(JavaTypeInference.getClass, StringType, 
"serializeEnumName",
+  expressions.Literal.create(other.getName, StringType) :: 
inputObject :: Nil) :: Nil)
+
--- End diff --

we pass enum class name via literal to serializer


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #18488: [SPARK-21255][SQL][WIP] Fixed NPE when creating e...

2017-08-22 Thread mike0sv
Github user mike0sv commented on a diff in the pull request:

https://github.com/apache/spark/pull/18488#discussion_r134478386
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/JavaTypeInference.scala
 ---
@@ -345,6 +356,30 @@ object JavaTypeInference {
 }
   }
 
+  /** Returns a mapping from enum value to int for given enum type */
+  def enumSerializer[T <: Enum[T]](enum: Class[T]): T => UTF8String = {
+assert(enum.isEnum)
+inputObject: T =>
+  UTF8String.fromString(inputObject.name())
+  }
+
+  /** Returns value index for given enum type and value */
+  def serializeEnumName[T <: Enum[T]](enum: UTF8String, inputObject: T): 
UTF8String = {
+
enumSerializer(Utils.classForName(enum.toString).asInstanceOf[Class[T]])(inputObject)
+  }
+
+  /** Returns a mapping from int to enum value for given enum type */
+  def enumDeserializer[T <: Enum[T]](enum: Class[T]): InternalRow => T = {
+assert(enum.isEnum)
+value: InternalRow =>
+  Enum.valueOf(enum, value.getUTF8String(0).toString)
--- End diff --

Enum.valueOf uses cached string->value map


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #18488: [SPARK-21255][SQL][WIP] Fixed NPE when creating e...

2017-08-22 Thread mike0sv
Github user mike0sv commented on a diff in the pull request:

https://github.com/apache/spark/pull/18488#discussion_r134477534
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/JavaTypeInference.scala
 ---
@@ -345,6 +356,30 @@ object JavaTypeInference {
 }
   }
 
+  /** Returns a mapping from enum value to int for given enum type */
+  def enumSerializer[T <: Enum[T]](enum: Class[T]): T => UTF8String = {
+assert(enum.isEnum)
+inputObject: T =>
+  UTF8String.fromString(inputObject.name())
--- End diff --

we use enum constant name as field value


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #18488: [SPARK-21255][SQL][WIP] Fixed NPE when creating e...

2017-08-22 Thread mike0sv
Github user mike0sv commented on a diff in the pull request:

https://github.com/apache/spark/pull/18488#discussion_r134477280
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/JavaTypeInference.scala
 ---
@@ -345,6 +356,30 @@ object JavaTypeInference {
 }
   }
 
+  /** Returns a mapping from enum value to int for given enum type */
+  def enumSerializer[T <: Enum[T]](enum: Class[T]): T => UTF8String = {
+assert(enum.isEnum)
+inputObject: T =>
+  UTF8String.fromString(inputObject.name())
+  }
+
+  /** Returns value index for given enum type and value */
+  def serializeEnumName[T <: Enum[T]](enum: UTF8String, inputObject: T): 
UTF8String = {
+
enumSerializer(Utils.classForName(enum.toString).asInstanceOf[Class[T]])(inputObject)
+  }
--- End diff --

Utils.classForName delegates to Class.forName, which operates on native 
level, so additional optimizations like caching are not required


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #18488: [SPARK-21255][SQL][WIP] Fixed NPE when creating e...

2017-08-22 Thread mike0sv
Github user mike0sv commented on a diff in the pull request:

https://github.com/apache/spark/pull/18488#discussion_r134475878
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/JavaTypeInference.scala
 ---
@@ -303,6 +309,11 @@ object JavaTypeInference {
   keyData :: valueData :: Nil,
   returnNullable = false)
 
+  case other if other.isEnum =>
+StaticInvoke(JavaTypeInference.getClass, ObjectType(other), 
"deserializeEnumName",
+  expressions.Literal.create(other.getEnumConstants.apply(0), 
ObjectType(other))
+:: getPath :: Nil)
+
--- End diff --

we pass literal value of first enum constant to resolve type parameter of 
deserializeEnumName method


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #18488: [SPARK-21255][SQL][WIP] Fixed NPE when creating e...

2017-08-22 Thread mike0sv
Github user mike0sv commented on a diff in the pull request:

https://github.com/apache/spark/pull/18488#discussion_r134475250
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/JavaTypeInference.scala
 ---
@@ -118,6 +119,10 @@ object JavaTypeInference {
 val (valueDataType, nullable) = inferDataType(valueType, 
seenTypeSet)
 (MapType(keyDataType, valueDataType, nullable), true)
 
+  case other if other.isEnum =>
+(StructType(Seq(StructField(typeToken.getRawType.getSimpleName,
+  StringType, nullable = false))), true)
+
--- End diff --

We use struct type with string field to store enum type and it's value


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18488: [SPARK-21255][SQL][WIP] Fixed NPE when creating encoder ...

2017-08-22 Thread mike0sv
Github user mike0sv commented on the issue:

https://github.com/apache/spark/pull/18488
  
@srowen @HyukjinKwon what's your status on this? anything else I can do?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18488: [SPARK-21255][SQL][WIP] Fixed NPE when creating encoder ...

2017-08-15 Thread mike0sv
Github user mike0sv commented on the issue:

https://github.com/apache/spark/pull/18488
  
@srowen @HyukjinKwon , retest this please :)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #18488: [SPARK-21255][SQL][WIP] Fixed NPE when creating e...

2017-08-15 Thread mike0sv
Github user mike0sv commented on a diff in the pull request:

https://github.com/apache/spark/pull/18488#discussion_r133194603
  
--- Diff: 
sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/ExpressionInfo.java
 ---
@@ -79,7 +79,7 @@ public ExpressionInfo(
 assert name != null;
 assert arguments != null;
 assert examples != null;
-assert examples.isEmpty() || examples.startsWith("\n
Examples:");
+assert examples.isEmpty() || 
examples.startsWith(System.lineSeparator() + "Examples:");
--- End diff --

ok, I got rid of it


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18488: [SPARK-21255][SQL][WIP] Fixed NPE when creating encoder ...

2017-08-14 Thread mike0sv
Github user mike0sv commented on the issue:

https://github.com/apache/spark/pull/18488
  
@srowen @HyukjinKwon it seems like it's all ok now


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #18488: [SPARK-21255][SQL][WIP] Fixed NPE when creating e...

2017-08-10 Thread mike0sv
Github user mike0sv commented on a diff in the pull request:

https://github.com/apache/spark/pull/18488#discussion_r132489991
  
--- Diff: 
sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/ExpressionInfo.java
 ---
@@ -79,7 +79,7 @@ public ExpressionInfo(
 assert name != null;
 assert arguments != null;
 assert examples != null;
-assert examples.isEmpty() || examples.startsWith("\n
Examples:");
+assert examples.isEmpty() || 
examples.startsWith(System.lineSeparator() + "Examples:");
--- End diff --

No, but without it it's not possible to run tests if you have different 
line separators (on windows for example)



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18488: [SPARK-21255][SQL][WIP] Fixed NPE when creating encoder ...

2017-08-10 Thread mike0sv
Github user mike0sv commented on the issue:

https://github.com/apache/spark/pull/18488
  
@srowen @HyukjinKwon hey guys, I think i got this, take a look. some sparkr 
tests failed for some reason, but I think it's not my fault =|


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18488: [SPARK-21255][SQL][WIP] Fixed NPE when creating encoder ...

2017-07-03 Thread mike0sv
Github user mike0sv commented on the issue:

https://github.com/apache/spark/pull/18488
  
Ran into something strange. Changed ints to strings and it worked fine. But 
then I added a test for encoding bean with enum inside and the test failed. It 
failed because in my implementation deserializer returns Object (AnyRef), and 
if enum is top-level object it works fine. But if enum is a field, it tries to 
set it via setter, which does not accept arbitrary object. So, I changed 
implementation of deserializer to the following.
```  
  def enumDeserializer[T <: Enum[T]](enum: Class[T]): InternalRow => T = {
assert(enum.isEnum)
value: InternalRow =>
  Enum.valueOf(enum, value.toString)
  }

  def deserializeEnumName[T <: Enum[T]](typeDummy: T, inputObject: 
InternalRow): T = {
enumDeserializer(typeDummy.getClass.asInstanceOf[Class[T]])(inputObject)
  }
```

Now it failed with "Assignment conversion not possible from type 
"java.lang.Enum" to type "test.org.apache.spark.sql.JavaDatasetSuite$EnumBean"" 
at 
```
private test.org.apache.spark.sql.JavaDatasetSuite$EnumBean argValue;
private InternalRow argValue1;

final test.org.apache.spark.sql.JavaDatasetSuite$EnumBean value2 = 
resultIsNull ? null : 
org.apache.spark.sql.types.DataTypes.deserializeEnumName(argValue, argValue1);
```
which is odd, because if I call it from regular code it compiles just fine.
```  
EnumBean argValue = EnumBean.values()[0];
InternalRow argValue1 = null;
final EnumBean bean = deserializeEnumName(argValue, argValue1);
```

I even tried moving code to java class just in case, but that did no good.
Is there any difference in compiling environments that could be the cause 
of that?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #18488: [SPARK-21255][SQL][WIP] Fixed NPE when creating e...

2017-07-03 Thread mike0sv
Github user mike0sv commented on a diff in the pull request:

https://github.com/apache/spark/pull/18488#discussion_r125360956
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/JavaTypeInference.scala
 ---
@@ -344,6 +352,28 @@ object JavaTypeInference {
 }
   }
 
+  /** Returns a mapping from enum value to int for given enum type */
+  def enumSerializer[T](enum: Class[T]): T => Int = {
+assert(enum.isEnum)
+enum.getEnumConstants.toList.zipWithIndex.toMap
+  }
+
+  /** Returns value index for given enum type and value */
+  def serializeEnumName[T](enum: UTF8String, inputObject: T): Int = {
+
enumSerializer(Utils.classForName(enum.toString).asInstanceOf[Class[T]])(inputObject)
+  }
+
+  /** Returns a mapping from int to enum value for given enum type */
+  def enumDeserializer[T](enum: Class[T]): Int => T = {
--- End diff --

But it's still not very good, because we have to lookup enum by class name 
for each object we serialize/deserialize. Do you have any ideas?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #18488: [SPARK-21255][SQL][WIP] Fixed NPE when creating e...

2017-07-03 Thread mike0sv
Github user mike0sv commented on a diff in the pull request:

https://github.com/apache/spark/pull/18488#discussion_r125360539
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/JavaTypeInference.scala
 ---
@@ -344,6 +352,28 @@ object JavaTypeInference {
 }
   }
 
+  /** Returns a mapping from enum value to int for given enum type */
+  def enumSerializer[T](enum: Class[T]): T => Int = {
+assert(enum.isEnum)
+enum.getEnumConstants.toList.zipWithIndex.toMap
+  }
+
+  /** Returns value index for given enum type and value */
+  def serializeEnumName[T](enum: UTF8String, inputObject: T): Int = {
+
enumSerializer(Utils.classForName(enum.toString).asInstanceOf[Class[T]])(inputObject)
+  }
+
+  /** Returns a mapping from int to enum value for given enum type */
+  def enumDeserializer[T](enum: Class[T]): Int => T = {
--- End diff --

Good point. But if we use string values, there is already a hashmap inside 
enum implementation, accessible via valueOf.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #18488: [SPARK-21255][SQL][WIP] Fixed NPE when creating e...

2017-07-03 Thread mike0sv
Github user mike0sv commented on a diff in the pull request:

https://github.com/apache/spark/pull/18488#discussion_r125358120
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/JavaTypeInference.scala
 ---
@@ -127,19 +128,24 @@ object JavaTypeInference {
 
 // TODO: we should only collect properties that have getter and 
setter. However, some tests
 // pass in scala case class as java bean class which doesn't have 
getter and setter.
-val properties = getJavaBeanReadableProperties(other)
-val fields = properties.map { property =>
-  val returnType = 
typeToken.method(property.getReadMethod).getReturnType
-  val (dataType, nullable) = inferDataType(returnType, seenTypeSet 
+ other)
-  new StructField(property.getName, dataType, nullable)
+if (typeToken.getRawType.isEnum) {
--- End diff --

Probably, but int is cheaper. However, if we use string, there will be a 
possibility for meaningful queries. Also, there are strange cases with complex 
enums with different fields, even with complex types. I'd say string with 
constant name is enough, thou


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18488: [SPARK-21255][SQL][WIP] Fixed NPE when creating encoder ...

2017-07-03 Thread mike0sv
Github user mike0sv commented on the issue:

https://github.com/apache/spark/pull/18488
  
It won't work if I have enum field inside regular java bean.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18488: [SPARK-21255][SQL] Fixed NPE when creating encoder for e...

2017-07-03 Thread mike0sv
Github user mike0sv commented on the issue:

https://github.com/apache/spark/pull/18488
  
I reworked the code to ser/de enums into ints (according to declaring 
order). However, I recreate mapping for each object, which is very bad 
obviously. I need to create mapping once (for each partition i guess) and then 
use it for all objects. Please tell me how it can be achieved.
Also, is there a better place for my ser/de methods? 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18488: [SPARK-21255][SQL] Fixed NPE when creating encoder for e...

2017-06-30 Thread mike0sv
Github user mike0sv commented on the issue:

https://github.com/apache/spark/pull/18488
  
@kiszk check it out


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #18488: [SPARK-21255][SQL] Fixed NPE when creating encode...

2017-06-30 Thread mike0sv
Github user mike0sv commented on a diff in the pull request:

https://github.com/apache/spark/pull/18488#discussion_r125055741
  
--- Diff: 
sql/catalyst/src/test/java/org/apache/spark/sql/catalyst/EnumEncoderSuite.java 
---
@@ -0,0 +1,32 @@
+package org.apache.spark.sql.catalyst;
--- End diff --

done


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #18488: Enum support

2017-06-30 Thread mike0sv
GitHub user mike0sv opened a pull request:

https://github.com/apache/spark/pull/18488

Enum support

## What changes were proposed in this pull request?

Fixed NPE when creating encoder for enum.

## How was this patch tested?
Unit test in EnumEncoderSuite which creates an encoder for enum

Please review http://spark.apache.org/contributing.html before opening a 
pull request.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/mike0sv/spark enum-support

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/18488.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #18488


commit 2b6144d169d3c8c036d6d83b085f8487e603c04a
Author: mike <mike...@gmail.com>
Date:   2017-06-30T13:40:35Z

[SPARK-21255][SQL] Fixed NPE when creating encoder for enum

commit 7e95645382cecba56f083bdfc2fc401b58df07f9
Author: mike <mike...@gmail.com>
Date:   2017-06-30T13:46:19Z

[SPARK-21255][SQL] Fixed NPE when creating encoder for enum




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org