[GitHub] spark pull request: [SPARK-3771][SQL] AppendingParquetOutputFormat...
GitHub user ueshin opened a pull request: https://github.com/apache/spark/pull/2638 [SPARK-3771][SQL] AppendingParquetOutputFormat should use reflection to prevent breaking binary-compatibility. The original problem is [SPARK-3764](https://issues.apache.org/jira/browse/SPARK-3764). `AppendingParquetOutputFormat` uses a binary-incompatible method, `context.getTaskAttemptID`. This makes Spark itself binary-incompatible: if Spark is built against hadoop-1, the resulting artifact works only with hadoop-1, and vice versa. You can merge this pull request into a Git repository by running: $ git pull https://github.com/ueshin/apache-spark issues/SPARK-3771 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/2638.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2638 commit ec213c160393698fa01c62469263f050f3668453 Author: Takuya UESHIN ues...@happy-camper.st Date: 2014-10-02T14:22:46Z Use reflection to prevent breaking binary-compatibility. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
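The idea behind the fix is to look up `getTaskAttemptID` at runtime instead of linking against it at compile time, since `TaskAttemptContext` is a class in hadoop-1 but an interface in hadoop-2, which changes the compiled call instruction. A minimal sketch of that technique, using a hypothetical `FakeContext` as a stand-in for the Hadoop type:

```java
import java.lang.reflect.Method;

public class ReflectiveTaskAttemptId {

    // Hypothetical stand-in for Hadoop's TaskAttemptContext; in real code the
    // argument would be the actual context passed in by the framework.
    public static class FakeContext {
        public String getTaskAttemptID() { return "attempt_local_0001"; }
    }

    // Resolve the method on the runtime type, so the compiled class file never
    // records whether getTaskAttemptID lives on a class or an interface.
    public static Object getTaskAttemptId(Object context) {
        try {
            Method m = context.getClass().getMethod("getTaskAttemptID");
            return m.invoke(context);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("getTaskAttemptID not callable", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(getTaskAttemptId(new FakeContext()));
    }
}
```

The reflective lookup costs a little per call, but it runs identically against both Hadoop lines from a single artifact.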
[GitHub] spark pull request: [SPARK-3771][SQL] AppendingParquetOutputFormat...
Github user ueshin commented on the pull request: https://github.com/apache/spark/pull/2638#issuecomment-57800537 @srowen, Thank you for your comment. Indeed, when deploying completed apps to a Spark cluster, there is a particular instance of Spark. But Spark app developers will use the artifacts in Maven Central while developing and unit-testing. Those artifacts seem to be built for Hadoop 2, so if developers want to test with Hadoop 1, it won't work. What do you think?
[GitHub] spark pull request: [SPARK-3812] [BUILD] Adapt maven build to publ...
Github user ueshin commented on the pull request: https://github.com/apache/spark/pull/2673#issuecomment-58225042 Hi @pwendell, I had a similar issue related to artifacts in Maven Central and Hadoop versions. Could you take a look at [SPARK-3764](https://issues.apache.org/jira/browse/SPARK-3764) and #2638 please?
[GitHub] spark pull request: SPARK-2686 Add Length support to Spark SQL and...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/1586#discussion_r15445571 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringOperations.scala --- @@ -209,6 +209,70 @@ case class EndsWith(left: Expression, right: Expression) } /** + * A function that returns the number of bytes in an expression + */ +case class Length(child: Expression) extends UnaryExpression { + + type EvaluatedType = Any + + def dataType = IntegerType + override def foldable = child.foldable + def nullable = child.nullable + + override def toString = s"Length($child)" + + override def eval(input: Row): EvaluatedType = { +val string = child.eval(input) +if (string == null) { + null.asInstanceOf[DataType] --- End diff -- Hi, I think `asInstanceOf[DataType]` is not needed.
[GitHub] spark pull request: SPARK-2686 Add Length support to Spark SQL and...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/1586#discussion_r15445580 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringOperations.scala --- @@ -209,6 +209,70 @@ case class EndsWith(left: Expression, right: Expression) } /** + * A function that returns the number of bytes in an expression + */ +case class Length(child: Expression) extends UnaryExpression { + + type EvaluatedType = Any + + def dataType = IntegerType + override def foldable = child.foldable + def nullable = child.nullable + + override def toString = s"Length($child)" + + override def eval(input: Row): EvaluatedType = { +val string = child.eval(input) +if (string == null) { + null.asInstanceOf[DataType] +} else if (!string.isInstanceOf[String]) { + string.toString.length +} else { + string.toString.getBytes.length +} + } + +} + +object StrlenConstants { + val DefaultEncoding = "ISO-8859-1" +} + +/** + * A function that returns the number of characters in a string expression + */ +case class Strlen(child: Expression, encoding : Expression) extends UnaryExpression { + + type EvaluatedType = Any + + def dataType = IntegerType + + override def foldable = child.foldable + + def nullable = child.nullable + + override def toString = s"Strlen($child, $encoding)" + + override def eval(input: Row): EvaluatedType = { +val string = child.eval(input) +if (string == null) { + null.asInstanceOf[DataType] --- End diff -- Also, `asInstanceOf[DataType]` is not needed here.
[GitHub] spark pull request: SPARK-2686 Add Length support to Spark SQL and...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/1586#discussion_r15503212 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringOperations.scala --- @@ -209,6 +212,81 @@ case class EndsWith(left: Expression, right: Expression) } /** + * A function that returns the number of bytes in an expression + */ +case class Length(child: Expression) extends UnaryExpression { + + type EvaluatedType = Any + + override def dataType = IntegerType + + override def foldable = child.foldable + + override def nullable = child.nullable + + override def toString = s"Length($child)" + + override def eval(input: Row): EvaluatedType = { +val string = child.eval(input) +if (string == null) { + null +} else if (!string.isInstanceOf[String]) { + string.toString.length +} else { + string.toString.getBytes.length +} + } + +} + +object StrlenConstants { + val DefaultEncoding = "ISO-8859-1" +} + +/** + * A function that returns the number of characters in a string expression + */ +case class Strlen(child: Expression, encoding : Expression) extends UnaryExpression + with Logging { + + type EvaluatedType = Any + + override def dataType = IntegerType + + override def foldable = child.foldable + + override def nullable = child.nullable --- End diff -- Hi, `nullable` should now be `true`, because `eval` returns `null` when an `UnsupportedEncodingException` is thrown, regardless of `child.nullable`.
[GitHub] spark pull request: SPARK-2686 Add Length support to Spark SQL and...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/1586#discussion_r15503576 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringOperations.scala --- @@ -209,6 +212,81 @@ case class EndsWith(left: Expression, right: Expression) } /** + * A function that returns the number of bytes in an expression + */ +case class Length(child: Expression) extends UnaryExpression { + + type EvaluatedType = Any + + override def dataType = IntegerType + + override def foldable = child.foldable + + override def nullable = child.nullable + + override def toString = s"Length($child)" + + override def eval(input: Row): EvaluatedType = { +val string = child.eval(input) +if (string == null) { + null +} else if (!string.isInstanceOf[String]) { + string.toString.length +} else { + string.toString.getBytes.length +} + } + +} + +object StrlenConstants { + val DefaultEncoding = "ISO-8859-1" +} + +/** + * A function that returns the number of characters in a string expression + */ +case class Strlen(child: Expression, encoding : Expression) extends UnaryExpression + with Logging { + + type EvaluatedType = Any + + override def dataType = IntegerType + + override def foldable = child.foldable + + override def nullable = child.nullable --- End diff -- Ah, this is not related to my prior comment. In my prior comment I just mentioned that you should remove the unnecessary `asInstanceOf[DataType]`. After that, you changed the logic to return `null` if the exception is thrown, so the nullability of this operator also changed to `true`.
[GitHub] spark pull request: SPARK-2686 Add Length support to Spark SQL and...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/1586#discussion_r15503732 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringOperations.scala --- @@ -209,6 +212,81 @@ case class EndsWith(left: Expression, right: Expression) } /** + * A function that returns the number of bytes in an expression + */ +case class Length(child: Expression) extends UnaryExpression { + + type EvaluatedType = Any + + override def dataType = IntegerType + + override def foldable = child.foldable + + override def nullable = child.nullable + + override def toString = s"Length($child)" + + override def eval(input: Row): EvaluatedType = { +val string = child.eval(input) +if (string == null) { + null +} else if (!string.isInstanceOf[String]) { + string.toString.length +} else { + string.toString.getBytes.length +} + } + +} + +object StrlenConstants { + val DefaultEncoding = "ISO-8859-1" +} + +/** + * A function that returns the number of characters in a string expression + */ +case class Strlen(child: Expression, encoding : Expression) extends UnaryExpression + with Logging { + + type EvaluatedType = Any + + override def dataType = IntegerType + + override def foldable = child.foldable + + override def nullable = child.nullable --- End diff -- No, I'm just saying:

```scala
override def nullable = true
```

instead of

```scala
override def nullable = child.nullable
```
[GitHub] spark pull request: SPARK-2686 Add Length support to Spark SQL and...
Github user ueshin commented on the pull request: https://github.com/apache/spark/pull/1586#issuecomment-50437674 I'm sorry, but now I'm confused. `Length` and `Strlen` look like almost the same implementation. What do you intend the difference between them to be? BTW, `getBytes()` without a charset argument should not be used, because it depends on the environment, e.g. the platform's default charset.
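The platform-dependence the reviewer warns about is easy to see: the same string encodes to a different number of bytes depending on the JVM's default charset, whereas passing an explicit charset is deterministic. A small illustration (the variable names are just for this sketch):

```java
import java.nio.charset.StandardCharsets;

public class CharsetLength {
    public static void main(String[] args) {
        String s = "\uF93D\uF936"; // two CJK compatibility ideographs

        // Platform-dependent: the result changes with the JVM's default
        // charset (6 bytes under UTF-8, 2 under ISO-8859-1 with '?' fallback).
        int platformLen = s.getBytes().length;

        // Deterministic: a BMP ideograph is always 3 bytes in UTF-8.
        int utf8Len = s.getBytes(StandardCharsets.UTF_8).length;

        System.out.println(platformLen + " " + utf8Len);
    }
}
```

Because the no-argument overload silently depends on `file.encoding`, a query using it could return different lengths on different cluster nodes.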
[GitHub] spark pull request: SPARK-2686 Add Length support to Spark SQL and...
Github user ueshin commented on the pull request: https://github.com/apache/spark/pull/1586#issuecomment-50442618 First, I'd like to confirm: which do you want to add to HQL, `Length` or `Strlen`? The title of this PR says to add `Length` to HQL, but the implementation adds `Strlen`.
[GitHub] spark pull request: SPARK-2686 Add Length and Strlen support to Sp...
Github user ueshin commented on the pull request: https://github.com/apache/spark/pull/1586#issuecomment-50450124 @javadba Ah, I see. Thank you for the details. I'll continue to review :)
[GitHub] spark pull request: SPARK-2686 Add Length and Strlen support to Sp...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/1586#discussion_r15516740 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringOperations.scala --- @@ -209,6 +212,82 @@ case class EndsWith(left: Expression, right: Expression) } /** + * A function that returns the number of bytes in an expression + */ +case class Length(child: Expression) extends UnaryExpression { + + type EvaluatedType = Any + + override def dataType = IntegerType + + override def foldable = child.foldable + + override def nullable = child.nullable + + override def toString = s"Length($child)" + + override def eval(input: Row): EvaluatedType = { +val string = child.eval(input) +if (string == null) { + null +} else if (!string.isInstanceOf[String]) { + string.toString.length +} else { + new String(string.toString.getBytes, StrlenConstants.DefaultEncoding).length --- End diff -- Hive's `Length` seems to return the number of code points of the string like [this](https://github.com/apache/hive/blob/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/UDFLength.java#L41-L56). I think we can use `String#codePointCount` here.
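The `String#codePointCount` suggestion amounts to a one-liner; a minimal sketch (the helper name is hypothetical) showing why it matches the code-point semantics, counting a surrogate pair as a single character:

```java
public class CodePointLength {
    // Number of Unicode code points in s: a supplementary character encoded
    // as a surrogate pair counts once, unlike String#length which counts
    // UTF-16 code units.
    public static int length(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        System.out.println(length("\uD840\uDC0B")); // one supplementary character
        System.out.println(length("abc"));
    }
}
```

No charset or byte-array round trip is involved, which is what makes it attractive for the `Length` expression.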
[GitHub] spark pull request: SPARK-2686 Add Length and Strlen support to Sp...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/1586#discussion_r15517240 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringOperations.scala --- @@ -209,6 +212,82 @@ case class EndsWith(left: Expression, right: Expression) } /** + * A function that returns the number of bytes in an expression + */ +case class Length(child: Expression) extends UnaryExpression { + + type EvaluatedType = Any + + override def dataType = IntegerType + + override def foldable = child.foldable + + override def nullable = child.nullable + + override def toString = s"Length($child)" + + override def eval(input: Row): EvaluatedType = { +val string = child.eval(input) +if (string == null) { + null +} else if (!string.isInstanceOf[String]) { + string.toString.length --- End diff -- I'm not sure we should handle non-string types. In my opinion, we should handle only the string type (and the binary type, the same as Hive?), and non-string values should be cast or converted to string type before this. @marmbrus, what do you think about this?
[GitHub] spark pull request: SPARK-2686 Add Length and Strlen support to Sp...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/1586#discussion_r15517292 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringOperations.scala --- @@ -209,6 +212,82 @@ case class EndsWith(left: Expression, right: Expression) } /** + * A function that returns the number of bytes in an expression + */ +case class Length(child: Expression) extends UnaryExpression { + + type EvaluatedType = Any + + override def dataType = IntegerType + + override def foldable = child.foldable + + override def nullable = child.nullable + + override def toString = s"Length($child)" + + override def eval(input: Row): EvaluatedType = { +val string = child.eval(input) +if (string == null) { + null +} else if (!string.isInstanceOf[String]) { + string.toString.length +} else { + new String(string.toString.getBytes, StrlenConstants.DefaultEncoding).length +} + } + +} + +object StrlenConstants {
 + val DefaultEncoding = "ISO-8859-1" +} + +/** + * A function that returns the number of characters in a string expression + */ +case class Strlen(child: Expression, encoding : Expression) extends UnaryExpression + with Logging { + + type EvaluatedType = Any + + override def dataType = IntegerType + + override def foldable = child.foldable + + override def nullable = true + + override def toString = s"Strlen($child, $encoding)" + + override def eval(input: Row): EvaluatedType = { +val string = child.eval(input) +if (string == null) { + null +} else if (!string.isInstanceOf[String]) { + log.debug(s"Non-string value [$string] provided to strlen") + null +} else { + var evalEncoding = encoding.eval(input) + val strEncoding = +if (evalEncoding != null) { + evalEncoding.toString +} else { + StrlenConstants.DefaultEncoding +} + val s: String = --- End diff -- Could this be removed?
[GitHub] spark pull request: SPARK-2686 Add Length and Strlen support to Sp...
Github user ueshin commented on the pull request: https://github.com/apache/spark/pull/1586#issuecomment-50465289 @javadba I couldn't understand what you want `Strlen` to return. Could you clarify the semantics of `Strlen` again, please?
[GitHub] spark pull request: SPARK-2686 Add Length and Strlen support to Sp...
Github user ueshin commented on the pull request: https://github.com/apache/spark/pull/1586#issuecomment-50554217 Hi, the helper `len()` is not needed. What Hive's `Length` does is calculate the number of code points, which can be done with `String#codePointCount`.
[GitHub] spark pull request: SPARK-2686 Add Length and Strlen support to Sp...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/1586#discussion_r15560814 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringOperations.scala --- @@ -209,6 +212,82 @@ case class EndsWith(left: Expression, right: Expression) } /** + * A function that returns the number of bytes in an expression + */ +case class Length(child: Expression) extends UnaryExpression { + + type EvaluatedType = Any + + override def dataType = IntegerType + + override def foldable = child.foldable + + override def nullable = child.nullable + + override def toString = s"Length($child)" + + override def eval(input: Row): EvaluatedType = { +val string = child.eval(input) +if (string == null) { + null +} else if (!string.isInstanceOf[String]) { + string.toString.length +} else { + new String(string.toString.getBytes, StrlenConstants.DefaultEncoding).length +} + } + +} + +object StrlenConstants { + val DefaultEncoding = "ISO-8859-1" +} + +/** + * A function that returns the number of characters in a string expression + */ +case class Strlen(child: Expression, encoding : Expression) extends UnaryExpression + with Logging { + + type EvaluatedType = Any + + override def dataType = IntegerType + + override def foldable = child.foldable + + override def nullable = true + + override def toString = s"Strlen($child, $encoding)" + + override def eval(input: Row): EvaluatedType = { +val string = child.eval(input) +if (string == null) { + null +} else if (!string.isInstanceOf[String]) { + log.debug(s"Non-string value [$string] provided to strlen") + null +} else { + var evalEncoding = encoding.eval(input) + val strEncoding = +if (evalEncoding != null) { + evalEncoding.toString +} else { + StrlenConstants.DefaultEncoding +} + val s: String = + try { +new String(string.asInstanceOf[String].getBytes, strEncoding).length + } catch { +case ue: UnsupportedEncodingException => { + log.debug(s"strlen: Caught UnsupportedEncodingException for encoding=[$strEncoding]") --- End diff -- I think the first one is also not data-dependent, because `child` has to return a value of the type corresponding to `child.dataType`; e.g. if `child.dataType` is `StringType`, `child` can only return a `String` value. BTW, if you roll back to throwing `UnsupportedEncodingException`, please also roll back the `def nullable` I mentioned before.
[GitHub] spark pull request: [SPARK-2054][SQL] Code Generation for Expressi...
Github user ueshin commented on the pull request: https://github.com/apache/spark/pull/993#issuecomment-50578802 Hi @marmbrus, thanks for the great work! But it seems to break the build. I got the following result when I ran `sbt assembly` or `sbt publish-local`:

```
[error] (catalyst/compile:doc) Scaladoc generation failed
```

and I found a lot of error messages in the build log saying `value q is not a member of StringContext`.
[GitHub] spark pull request: SPARK-2686 Add Length and Strlen support to Sp...
Github user ueshin commented on the pull request: https://github.com/apache/spark/pull/1586#issuecomment-50637172 Hi @javadba, FYI. I believe there are 3 types of length for a string in Java/Scala. 1) the number of 16-bit characters in the string. To get this, use [`String#length`](http://docs.oracle.com/javase/7/docs/api/java/lang/String.html#length()) like:

```scala
scala> "\uF93D\uF936\uF949\uF942".length // chinese characters
res0: Int = 4

scala> "\uD840\uDC0B\uD842\uDFB7".length // 2 surrogate pairs
res1: Int = 4

scala> "1234567890ABC".length
res2: Int = 13
```

2) the number of code points in the string. This will be covered by `Length`. To get this, use [`String#codePointCount`](http://docs.oracle.com/javase/7/docs/api/java/lang/String.html#codePointCount(int,%20int)) like:

```scala
scala> "\uF93D\uF936\uF949\uF942".codePointCount(0, 4) // chinese characters
res0: Int = 4

scala> "\uD840\uDC0B\uD842\uDFB7".codePointCount(0, 4) // 2 surrogate pairs
res1: Int = 2

scala> "1234567890ABC".codePointCount(0, 13)
res2: Int = 13
```

3) the length of the byte array encoded from the string in some charset. To get this, use [`String#getBytes(charset)`](http://docs.oracle.com/javase/7/docs/api/java/lang/String.html#getBytes(java.lang.String))`.length` like:

```scala
scala> "\uF93D\uF936\uF949\uF942".getBytes("utf8").length // chinese characters
res0: Int = 12

scala> "\uD840\uDC0B\uD842\uDFB7".getBytes("utf8").length // 2 surrogate pairs
res1: Int = 8

scala> "1234567890ABC".getBytes("utf8").length
res2: Int = 13

scala> "\uF93D\uF936\uF949\uF942".getBytes("utf16").length // chinese characters
res3: Int = 10

scala> "\uD840\uDC0B\uD842\uDFB7".getBytes("utf16").length // 2 surrogate pairs
res4: Int = 10

scala> "1234567890ABC".getBytes("utf16").length
res5: Int = 28

scala> "\uF93D\uF936\uF949\uF942".getBytes("utf32").length // chinese characters
res6: Int = 16

scala> "\uD840\uDC0B\uD842\uDFB7".getBytes("utf32").length // 2 surrogate pairs
res7: Int = 8

scala> "1234567890ABC".getBytes("utf32").length
res8: Int = 52
```

At first I guessed you wanted 3) for `Strlen`, because only 3) is charset-related, but I saw a conversation indicating another type of length and lost track halfway.
[GitHub] spark pull request: SPARK-2686 Add Length and Strlen support to Sp...
Github user ueshin commented on the pull request: https://github.com/apache/spark/pull/1586#issuecomment-50858955 Oops, I had forgotten that Hive's `Length` can handle binary type. It would be better to use `Length` instead of `CharLength` and make it handle binary type.
[GitHub] spark pull request: SPARK-2686 Add Length and Strlen support to Sp...
Github user ueshin commented on the pull request: https://github.com/apache/spark/pull/1586#issuecomment-50947564 Hi @javadba, you said some of these differ from my results, but which one is different? As far as I can see, these tests check the same values as my results. Are some of them failing?
[GitHub] spark pull request: SPARK-2686 Add Length and Strlen support to Sp...
Github user ueshin commented on the pull request: https://github.com/apache/spark/pull/1586#issuecomment-50947924 Ah, I see. Thanks.
[GitHub] spark pull request: SPARK-2686 Add Length and Strlen support to Sp...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/1586#discussion_r15725455 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringOperations.scala --- @@ -208,6 +211,96 @@ case class EndsWith(left: Expression, right: Expression) def compare(l: String, r: String) = l.endsWith(r) } + +/** + * A function that returns the number of bytes in an expression + */ +case class Length(child: Expression) extends UnaryExpression { + + type EvaluatedType = Any + + override def dataType = IntegerType + + override def foldable = child.foldable + + override def nullable = child.nullable + + override def toString = s"Length($child)" + + override def eval(input: Row): EvaluatedType = { +val inputVal = child.eval(input) +if (inputVal == null) { + null +} else if (!inputVal.isInstanceOf[String]) { + inputVal.toString.length +} else { + OctetLenUtils.len(inputVal.asInstanceOf[String]) --- End diff -- We can just use `String#codePointCount`, which does the same thing as Hive's implementation. The `OctetLenUtils.len` implementation is not the same as Hive's: Hive takes an `o.a.h.io.Text` as the argument, whose `getBytes()` returns UTF-8 bytes, while `java.lang.String#getBytes` returns platform-dependent bytes. The reason Hive checks the bytes one by one is to avoid decoding the string from UTF-8 bytes. Encoding/decoding is heavy, so we should do it as little as possible.
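For comparison, here is a sketch of the Hive-style approach the comment describes: count the UTF-8 lead bytes directly (every byte not of the form `10xxxxxx` starts a code point), which gives the same answer as `String#codePointCount` but requires an encode first. The method name is just for this sketch:

```java
import java.nio.charset.StandardCharsets;

public class Utf8CodePoints {
    // Hive-style count over UTF-8 bytes: continuation bytes (0b10xxxxxx) are
    // skipped, so each code point is counted exactly once.
    public static int countFromUtf8(String s) {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        int n = 0;
        for (byte b : bytes) {
            if ((b & 0xC0) != 0x80) n++;
        }
        return n;
    }

    public static void main(String[] args) {
        String s = "a\uD840\uDC0B"; // 'a' plus one supplementary character
        System.out.println(countFromUtf8(s));
        System.out.println(s.codePointCount(0, s.length()));
    }
}
```

Both calls yield the same count; `codePointCount` simply skips the intermediate `getBytes` allocation, which is the reviewer's point.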
[GitHub] spark pull request: SPARK-2686 Add Length and Strlen support to Sp...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/1586#discussion_r15725507 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringOperations.scala --- @@ -208,6 +211,96 @@ case class EndsWith(left: Expression, right: Expression) def compare(l: String, r: String) = l.endsWith(r) } + +/** + * A function that returns the number of bytes in an expression + */ +case class Length(child: Expression) extends UnaryExpression { + + type EvaluatedType = Any + + override def dataType = IntegerType + + override def foldable = child.foldable + + override def nullable = child.nullable + + override def toString = s"Length($child)" + + override def eval(input: Row): EvaluatedType = { +val inputVal = child.eval(input) +if (inputVal == null) { + null +} else if (!inputVal.isInstanceOf[String]) { + inputVal.toString.length +} else { + OctetLenUtils.len(inputVal.asInstanceOf[String]) +} + } + +} + +object OctetLengthConstants { + val DefaultEncoding = "ISO-8859-1" --- End diff -- Should be `UTF-8`?
[GitHub] spark pull request: SPARK-2686 Add Length and Strlen support to Sp...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/1586#discussion_r15726561

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringOperations.scala ---

```scala
@@ -208,6 +211,96 @@ case class EndsWith(left: Expression, right: Expression)
   def compare(l: String, r: String) = l.endsWith(r)
 }
+
+/**
+ * A function that returns the number of bytes in an expression
+ */
+case class Length(child: Expression) extends UnaryExpression {
+
+  type EvaluatedType = Any
+
+  override def dataType = IntegerType
+
+  override def foldable = child.foldable
+
+  override def nullable = child.nullable
+
+  override def toString = s"Length($child)"
+
+  override def eval(input: Row): EvaluatedType = {
+    val inputVal = child.eval(input)
+    if (inputVal == null) {
+      null
+    } else if (!inputVal.isInstanceOf[String]) {
+      inputVal.toString.length
+    } else {
+      OctetLenUtils.len(inputVal.asInstanceOf[String])
```

--- End diff --

The usage is as follows:

```java
str.codePointCount(0, str.length)
```
[GitHub] spark pull request: SPARK-2813: [SQL] Implement SQRT() directly in...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/1750#discussion_r15740705

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala ---

```scala
@@ -33,6 +33,19 @@ case class UnaryMinus(child: Expression) extends UnaryExpression {
   }
 }
+
+case class Sqrt(child: Expression) extends UnaryExpression {
+  type EvaluatedType = Double
+
+  def dataType = child.dataType
+  override def foldable = child.foldable
+  def nullable = child.nullable
+  override def toString = s"SQRT($child)"
+
+  override def eval(input: Row): Double = {
+    math.sqrt(n1(child, input, _.toDouble(_)).asInstanceOf[Double])
```

--- End diff --

Hi, I'm wondering whether this will return `null` for a `null` input.
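The null-propagation concern raised above can be sketched outside Catalyst. This is a hypothetical helper (the name and shape are illustrative, not the PR's code): declaring the result as `Double` makes returning `null` impossible, so a null-safe version has to widen the result type.

```scala
// A null-safe evaluator sketch: propagate null instead of evaluating
// math.sqrt on a null input. Because the result may be null, the
// return type must be Any (or an Option), not Double -- which is
// exactly the problem with `eval` returning Double in the diff.
def nullSafeSqrt(value: Any): Any = value match {
  case null      => null
  case n: Number => math.sqrt(n.doubleValue())
}

assert(nullSafeSqrt(null) == null) // null propagates
assert(nullSafeSqrt(9) == 3.0)     // non-null input behaves as usual
```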
[GitHub] spark pull request: SPARK-2686 Add Length and OctetLen support to ...
Github user ueshin commented on the pull request: https://github.com/apache/spark/pull/1586#issuecomment-51145793

Hi @javadba, I tested `org.apache.spark.sql.SQLQuerySuite` and `org.apache.spark.sql.hive.execution.HiveQuerySuite` locally, and they worked fine even with the last commit 22eddbc reverted.
[GitHub] spark pull request: SPARK-2686 Add Length and OctetLen support to ...
Github user ueshin commented on the pull request: https://github.com/apache/spark/pull/1586#issuecomment-51179894

@javadba, @marmbrus I have seen the SOF case sometimes, though not with @javadba's sequence. I can't identify the exact reason yet, but I guess it is not related to @javadba's commits and is a more general problem. I'll file a new issue on JIRA with some sequences that reproduce it. I'd like to know the result of a Jenkins build without the last commit 22eddbc: if it succeeds, we should use OCTET/CHAR_LENGTH; if it fails, we could use OCTET/CHAR_LEN for now, I think.
[GitHub] spark pull request: [SPARK-2965][SQL] Fix HashOuterJoin output nul...
GitHub user ueshin opened a pull request: https://github.com/apache/spark/pull/1887

[SPARK-2965][SQL] Fix HashOuterJoin output nullabilities.

Output attributes of the opposite side of an `OuterJoin` should be nullable.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ueshin/apache-spark issues/SPARK-2965

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/1887.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #1887

commit bcb2d37e178a58e84aea1735985b649245b878a0
Author: Takuya UESHIN ues...@happy-camper.st
Date: 2014-08-11T08:44:38Z

    Fix HashOuterJoin output nullabilities.
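The rule this PR enforces can be shown with a simplified standalone sketch (the types here are hypothetical stand-ins, not Catalyst's): in an outer join, attributes coming from the side opposite the "outer" side must be marked nullable, because unmatched rows fill those columns with null.

```scala
// Simplified stand-ins for join types and output attributes.
sealed trait JoinType
case object LeftOuter  extends JoinType
case object RightOuter extends JoinType
case object FullOuter  extends JoinType

case class Attr(name: String, nullable: Boolean)

// Attributes from the non-preserved side become nullable; in a full
// outer join, both sides do.
def outputAttributes(joinType: JoinType,
                     left: Seq[Attr], right: Seq[Attr]): Seq[Attr] =
  joinType match {
    case LeftOuter  => left ++ right.map(_.copy(nullable = true))
    case RightOuter => left.map(_.copy(nullable = true)) ++ right
    case FullOuter  => (left ++ right).map(_.copy(nullable = true))
  }

assert(outputAttributes(LeftOuter, Seq(Attr("a", false)), Seq(Attr("b", false)))
  .map(_.nullable) == Seq(false, true))
```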
[GitHub] spark pull request: [SPARK-2969][SQL] Make ScalaReflection be able...
GitHub user ueshin opened a pull request: https://github.com/apache/spark/pull/1889

[SPARK-2969][SQL] Make ScalaReflection be able to handle ArrayType.containsNull and MapType.valueContainsNull.

Make `ScalaReflection` able to handle mappings like:

- `Seq[Int]` as `ArrayType(IntegerType, containsNull = false)`
- `Seq[java.lang.Integer]` as `ArrayType(IntegerType, containsNull = true)`
- `Map[Int, Long]` as `MapType(IntegerType, LongType, valueContainsNull = false)`
- `Map[Int, java.lang.Long]` as `MapType(IntegerType, LongType, valueContainsNull = true)`

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ueshin/apache-spark issues/SPARK-2969

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/1889.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #1889

commit 1a9a96bf752dc33d19cb3654e05bc35c28b9ae08
Author: Takuya UESHIN ues...@happy-camper.st
Date: 2014-08-11T10:46:24Z

    Modify ScalaReflection to handle ArrayType.containsNull and MapType.valueContainsNull.
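The motivation behind the proposed mapping can be seen in plain Scala (a sketch, not the PR's reflection code): a collection of a primitive value type cannot hold `null`, while its boxed counterpart can, which is what `containsNull` / `valueContainsNull` are meant to capture.

```scala
// A Seq[Int] cannot contain null, because Int is an unboxed value
// type -- hence containsNull = false in the proposed mapping.
val primitives: Seq[Int] = Seq(1, 2, 3)

// A Seq[java.lang.Integer] can contain null -- hence
// containsNull = true for boxed element types.
val boxed: Seq[java.lang.Integer] = Seq(Int.box(1), null)

assert(!primitives.contains(null))
assert(boxed.contains(null))
```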
[GitHub] spark pull request: [SPARK-2969][SQL] Make ScalaReflection be able...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/1889#discussion_r16094328

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/types/dataTypes.scala ---

```scala
@@ -372,7 +372,7 @@ object MapType {
  * The `valueContainsNull` is true.
  */
 def apply(keyType: DataType, valueType: DataType): MapType =
-  MapType(keyType: DataType, valueType: DataType, true)
+  MapType(keyType: DataType, valueType: DataType, false)
```

--- End diff --

Oops, I forgot to update the doc. I think the default value should be `false`, because non-nullable by default is more natural. For example, a `Map[Int, Long]` variable can't contain a `null` value; if we want to put `null` values into the map, we have to declare it as `Map[Int, Any]`, `Map[Int, java.lang.Long]`, or something similar. The same reasoning applies to the data type, and this matches the default value of `ArrayType.containsNull`.
[GitHub] spark pull request: [SPARK-2969][SQL] Make ScalaReflection be able...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/1889#discussion_r16096763

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/types/dataTypes.scala ---

```scala
@@ -372,7 +372,7 @@ object MapType {
  * The `valueContainsNull` is true.
  */
 def apply(keyType: DataType, valueType: DataType): MapType =
-  MapType(keyType: DataType, valueType: DataType, true)
+  MapType(keyType: DataType, valueType: DataType, false)
```

--- End diff --

Ah, I see. I agree that it is dangerous, and I'll revert the change. Should I change `ArrayType`'s default value as well?
[GitHub] spark pull request: [SPARK-2969][SQL] Make ScalaReflection be able...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/1889#discussion_r16097181

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/types/dataTypes.scala ---

```scala
@@ -372,7 +372,7 @@ object MapType {
  * The `valueContainsNull` is true.
  */
 def apply(keyType: DataType, valueType: DataType): MapType =
-  MapType(keyType: DataType, valueType: DataType, true)
+  MapType(keyType: DataType, valueType: DataType, false)
```

--- End diff --

Sure!
[GitHub] spark pull request: SPARK-3969 Optimizer should have a super class...
GitHub user ueshin opened a pull request: https://github.com/apache/spark/pull/2825

SPARK-3969 Optimizer should have a super class as an interface.

Some developers want to replace `Optimizer` to fit their projects, but can't do so because `Optimizer` is currently an `object`.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ueshin/apache-spark issues/SPARK-3969

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/2825.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #2825

commit 9547a23fd5ca0058d1044d19f6a96bdbb1e3b810
Author: Takuya UESHIN ues...@happy-camper.st
Date: 2014-10-16T10:52:19Z

    Extract abstract class from Optimizer for developers to be able to replace Optimizer.
[GitHub] spark pull request: [SPARK-3969][SQL] Optimizer should have a supe...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/2825#discussion_r18995386

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---

```scala
@@ -28,7 +28,9 @@
 import org.apache.spark.sql.catalyst.plans.logical._
 import org.apache.spark.sql.catalyst.rules._
 import org.apache.spark.sql.catalyst.types._

-object Optimizer extends RuleExecutor[LogicalPlan] {
+abstract class Optimizer extends RuleExecutor[LogicalPlan]
+
+object Optimizer extends Optimizer {
```

--- End diff --

Hi @srowen, thank you for your comment. I'll rename the `object`. And yes, this PR is a very small API change.
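The extension point being discussed can be shown with a simplified standalone sketch. The stub types below are hypothetical stand-ins for Catalyst's (real `RuleExecutor` holds batches of rewrite rules, not strings): turning the concrete `object` into an `abstract class` lets downstream projects substitute their own optimizer.

```scala
// Hypothetical stubs standing in for Catalyst types.
trait LogicalPlan

abstract class RuleExecutor[Plan] {
  def batches: Seq[String]           // stub: real Catalyst holds rule batches
  def apply(plan: Plan): Plan = plan // stub: real Catalyst runs the batches
}

// The PR's change in miniature: an abstract super class...
abstract class Optimizer extends RuleExecutor[LogicalPlan]

// ...with the built-in rules as one concrete implementation...
object DefaultOptimizer extends Optimizer {
  def batches = Seq("ConstantFolding", "NullPropagation")
}

// ...so a project can now plug in its own rules.
object MyOptimizer extends Optimizer {
  def batches = "MyCustomRule" +: DefaultOptimizer.batches
}

assert(MyOptimizer.batches.head == "MyCustomRule")
```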
[GitHub] spark pull request: [SPARK-3969][SQL] Optimizer should have a supe...
Github user ueshin commented on the pull request: https://github.com/apache/spark/pull/2825#issuecomment-59461447

Hi @chenghao-intel, thank you for your comment. Yes, that's right. I don't want to mix logical plan and physical plan optimization, and I'll extend `SparkStrategies` if I need to. I want to add some `Expression`s for my projects and optimize logical plans with them, but right now we can't replace `Optimizer` because it is an `object`, so I want to add an extension point for it.
[GitHub] spark pull request: [SPARK-3986][SQL] Fix package names to fit the...
GitHub user ueshin opened a pull request: https://github.com/apache/spark/pull/2835

[SPARK-3986][SQL] Fix package names to fit their directory names.

The package names of two test suites differ from their directory names:

- `GeneratedEvaluationSuite`
- `GeneratedMutableEvaluationSuite`

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ueshin/apache-spark issues/SPARK-3986

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/2835.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #2835

commit fa2cc05be1e40b7558f5137c294a4fde683ec318
Author: Takuya UESHIN ues...@happy-camper.st
Date: 2014-10-17T03:35:48Z

    Fix package names to fit their directory names.
[GitHub] spark pull request: [SPARK-3969][SQL] Optimizer should have a supe...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/2825#discussion_r19050063

--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/ExpressionOptimizationSuite.scala ---

```scala
@@ -30,7 +30,7 @@ class ExpressionOptimizationSuite extends ExpressionEvaluationSuite {
       expected: Any,
       inputRow: Row = EmptyRow): Unit = {
     val plan = Project(Alias(expression, s"Optimized($expression)")() :: Nil, NoRelation)
-    val optimizedPlan = Optimizer(plan)
+    val optimizedPlan = SparkOptimizer(plan)
     super.checkEvaluation(optimizedPlan.expressions.head, expected, inputRow)
   }
 }
```

--- End diff --

Of course not. I'll add a new line.
[GitHub] spark pull request: [SPARK-3969][SQL] Optimizer should have a supe...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/2825#discussion_r19050137

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---

```scala
@@ -28,7 +28,9 @@
 import org.apache.spark.sql.catalyst.plans.logical._
 import org.apache.spark.sql.catalyst.rules._
 import org.apache.spark.sql.catalyst.types._

-object Optimizer extends RuleExecutor[LogicalPlan] {
+abstract class Optimizer extends RuleExecutor[LogicalPlan]
+
+object SparkOptimizer extends Optimizer {
```

--- End diff --

Thank you for your suggestion. I agree that Catalyst should not be tightly coupled with Spark.
[GitHub] spark pull request: [SPARK-2052] [SQL] Add optimization for CaseCo...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/990#discussion_r13576056

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---

```scala
@@ -154,6 +155,10 @@ object NullPropagation extends Rule[LogicalPlan] {
       case left :: Literal(null, _) :: Nil => Literal(null, e.dataType)
       case _ => e
     }
+    case e: CaseConversionExpression => e.children match {
```

--- End diff --

Fixed the two points. Should I remove the mentioned lines?
[GitHub] spark pull request: [SPARK-2052] [SQL] Add optimization for CaseCo...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/990#discussion_r13576722

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---

```scala
@@ -154,6 +155,10 @@ object NullPropagation extends Rule[LogicalPlan] {
       case left :: Literal(null, _) :: Nil => Literal(null, e.dataType)
       case _ => e
     }
+    case e: CaseConversionExpression => e.children match {
```

--- End diff --

OK, I will. BTW, I noticed that there are some `Expression`s in the same situation, like `UnaryMinus`, `Cast`, or `Not` (or `BinaryArithmetic`, `BinaryComparison`, and `StringRegexExpression`, too?). I think they can also be handled by constant folding. What do you think?
[GitHub] spark pull request: [SPARK-2052] [SQL] Add optimization for CaseCo...
Github user ueshin commented on the pull request: https://github.com/apache/spark/pull/990#issuecomment-45582389

@marmbrus I checked and removed the unneeded rules from `NullPropagation`. Could you please review the changes?
[GitHub] spark pull request: [SPARK-2093] [SQL] NullPropagation should use ...
GitHub user ueshin opened a pull request: https://github.com/apache/spark/pull/1034

[SPARK-2093] [SQL] NullPropagation should use exact type value.

`NullPropagation` should use an exactly typed value when transforming `Count` or `Sum`.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ueshin/apache-spark issues/SPARK-2093

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/1034.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #1034

commit 9314806cf03f7a4222a67f8b285c4225bdef3083
Author: Takuya UESHIN ues...@happy-camper.st
Date: 2014-06-10T07:55:53Z

    Fix NullPropagation to use exact type value.
[GitHub] spark pull request: [SPARK-2093] [SQL] NullPropagation should use ...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/1034#discussion_r13584575

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---

```scala
@@ -100,8 +100,8 @@ object ColumnPruning extends Rule[LogicalPlan] {
 object NullPropagation extends Rule[LogicalPlan] {
   def apply(plan: LogicalPlan): LogicalPlan = plan transform {
     case q: LogicalPlan => q transformExpressionsUp {
-      case e @ Count(Literal(null, _)) => Literal(0, e.dataType)
-      case e @ Sum(Literal(c, _)) if c == 0 => Literal(0, e.dataType)
+      case e @ Count(Literal(null, _)) => Literal(0L, e.dataType)
```

--- End diff --

I agree. Even if we never change/specialize the return type, `SimplifyCasts` can remove the extra `Cast`. I'd also like to change the `Literal` value for `Sum` to `0L`, so that `SimplifyCasts` applies to the sum of integral values.
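Why the literal's runtime type matters can be seen in plain Scala (a sketch, not Catalyst code): `Count`'s result type is `Long`, and a boxed `Int` zero is a different runtime value than a boxed `Long` zero, so a literal carrying `0` where `0L` is expected is not type-correct.

```scala
// The same numeric value, boxed at two different runtime types.
val asInt:  Any = 0   // boxes to java.lang.Integer
val asLong: Any = 0L  // boxes to java.lang.Long

assert(asInt.isInstanceOf[java.lang.Integer])
assert(!asInt.isInstanceOf[java.lang.Long]) // an Int 0 is not a Long 0
assert(asLong.isInstanceOf[java.lang.Long])
```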
[GitHub] spark pull request: [SPARK-2052] [SQL] Add optimization for CaseCo...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/990#discussion_r13635448

--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/ConstantFoldingSuite.scala ---

```scala
@@ -173,4 +173,63 @@ class ConstantFoldingSuite extends OptimizerTest {
     comparePlans(optimized, correctAnswer)
   }
+
+  test("Constant folding test: expressions have null literals") {
+    val originalQuery =
+      testRelation
+        .select(
+          IsNull(Literal(null)) as 'c1,
+          IsNotNull(Literal(null)) as 'c2,
+
+          GetItem(Literal(null, ArrayType(IntegerType)), 1) as 'c3,
```

--- End diff --

Ah, I see, that's right! I'll move them back.
[GitHub] spark pull request: [SPARK-2196] [SQL] Fix nullability of CaseWhen...
GitHub user ueshin opened a pull request: https://github.com/apache/spark/pull/1133

[SPARK-2196] [SQL] Fix nullability of CaseWhen.

`CaseWhen` should use `branches.length` to check whether `elseValue` is provided or not.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ueshin/apache-spark issues/SPARK-2196

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/1133.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #1133

commit 4f049cc94954021aab34e2dac54b985da56a23ea
Author: Takuya UESHIN ues...@happy-camper.st
Date: 2014-06-19T09:49:33Z

    Fix nullability of CaseWhen.
[GitHub] spark pull request: [SPARK-2196] [SQL] Fix nullability of CaseWhen...
Github user ueshin commented on the pull request: https://github.com/apache/spark/pull/1133#issuecomment-46633857

I added some tests and noticed that `CaseWhen` should also be nullable if the `elseValue` is nullable.
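The nullability rule being fixed can be sketched as a standalone predicate (hypothetical names, not the PR's code): a `CASE WHEN` is nullable if the `ELSE` branch is missing, or if any value branch — including `ELSE` — is itself nullable.

```scala
// branchValuesNullable: nullability of each WHEN ... THEN value
// elseNullable: Some(nullability of ELSE), or None if ELSE is absent
def caseWhenNullable(branchValuesNullable: Seq[Boolean],
                     elseNullable: Option[Boolean]): Boolean =
  elseNullable match {
    case None           => true // no ELSE: unmatched rows yield null
    case Some(elseNull) => elseNull || branchValuesNullable.contains(true)
  }

assert(caseWhenNullable(Seq(false), None))                // missing ELSE
assert(caseWhenNullable(Seq(false), Some(true)))          // nullable ELSE
assert(!caseWhenNullable(Seq(false, false), Some(false))) // fully non-nullable
```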
[GitHub] spark pull request: [SPARK-2254] [SQL] ScalaRefection should mark ...
GitHub user ueshin opened a pull request: https://github.com/apache/spark/pull/1193

[SPARK-2254] [SQL] ScalaRefection should mark primitive types as non-nullable.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ueshin/apache-spark issues/SPARK-2254

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/1193.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #1193

commit cfd6088e4d348e9680de46437959d642336f2db0
Author: Takuya UESHIN ues...@happy-camper.st
Date: 2014-06-24T07:09:36Z

    Modify ScalaRefection.schemaFor method to return nullability of Scala Type.
[GitHub] spark pull request: [SPARK-2287] [SQL] Make ScalaReflection be abl...
GitHub user ueshin opened a pull request: https://github.com/apache/spark/pull/1226

[SPARK-2287] [SQL] Make ScalaReflection be able to handle Generic case classes.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ueshin/apache-spark issues/SPARK-2287

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/1226.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #1226

commit 7de570641d5bd099ed0e4b68f513aeb2b7ea3f1a
Author: Takuya UESHIN ues...@happy-camper.st
Date: 2014-06-24T08:08:48Z

    Make ScalaReflection be able to handle Generic case classes.
[GitHub] spark pull request: [SPARK-2287] [SQL] Make ScalaReflection be abl...
Github user ueshin commented on the pull request: https://github.com/apache/spark/pull/1226#issuecomment-47192918

This will cause a merge conflict with #1193, but I can fix it quickly.
[GitHub] spark pull request: [SPARK-2254] [SQL] ScalaRefection should mark ...
Github user ueshin commented on the pull request: https://github.com/apache/spark/pull/1193#issuecomment-47193596

@rxin Thanks! I will fix it soon.
[GitHub] spark pull request: [SPARK-2287] [SQL] Make ScalaReflection be abl...
Github user ueshin commented on the pull request: https://github.com/apache/spark/pull/1226#issuecomment-47194382

Fixed the merge conflict with #1193.
[GitHub] spark pull request: [SPARK-2287] [SQL] Make ScalaReflection be abl...
Github user ueshin commented on the pull request: https://github.com/apache/spark/pull/1226#issuecomment-47206743

Could you please retest this? The previous tests looked like something was wrong with the Hive metastore. (Can I ask Jenkins to retest?)
[GitHub] spark pull request: [SPARK-2295] [SQL] Make JavaBeans nullability ...
GitHub user ueshin opened a pull request: https://github.com/apache/spark/pull/1235

[SPARK-2295] [SQL] Make JavaBeans nullability stricter.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ueshin/apache-spark issues/SPARK-2295

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/1235.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #1235

commit 201c508aecb300f9439e3b9537ea3173c7db9ae5
Author: Takuya UESHIN ues...@happy-camper.st
Date: 2014-06-26T17:55:11Z

    Make JavaBeans nullability stricter.
[GitHub] spark pull request: [SPARK-2287] [SQL] Make ScalaReflection be abl...
Github user ueshin commented on the pull request: https://github.com/apache/spark/pull/1226#issuecomment-47298915 @marmbrus I see, I will do it next time, thanks. But I wonder why it failed...
[GitHub] spark pull request: [SPARK-2234][SQL]Spark SQL basicOperators add ...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/1151#discussion_r14285252

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/SqlParser.scala ---
@@ -136,13 +137,12 @@ class SqlParser extends StandardTokenParsers with PackratParsers {
     }
   }

-  protected lazy val query: Parser[LogicalPlan] = (
-    select * (
-      UNION ~ ALL ^^^ { (q1: LogicalPlan, q2: LogicalPlan) => Union(q1, q2) } |
-      UNION ~ opt(DISTINCT) ^^^ { (q1: LogicalPlan, q2: LogicalPlan) => Distinct(Union(q1, q2)) }
-    )
-    | insert | cache
-  )
+  protected lazy val query: Parser[LogicalPlan] =
+    select * (
+      UNION ~ ALL ^^^ { (q1: LogicalPlan, q2: LogicalPlan) => Union(q1, q2)} |
+      UNION ~ opt(DISTINCT) ^^^ { (q1: LogicalPlan, q2: LogicalPlan) => Distinct(Union(q1, q2))} |
--- End diff --

Hi @YanjieGao, Jenkins reports a scalastyle error here: "File line length exceeds 100 characters". You need to reformat the code around this line.
[GitHub] spark pull request: [SPARK-2287] [SQL] Make ScalaReflection be abl...
Github user ueshin commented on the pull request: https://github.com/apache/spark/pull/1226#issuecomment-47328485 The Hive tests passed this time. Why? I only merged master. And now the Python tests failed...
[GitHub] spark pull request: [SPARK-2287] [SQL] Make ScalaReflection be abl...
Github user ueshin commented on the pull request: https://github.com/apache/spark/pull/1226#issuecomment-47416531

Hi @marmbrus, I found a case on my local machine that throws the same exception with `test-only`, like:

```
testOnly org.apache.spark.sql.hive.execution.PruningSuite
```

even when my patch is not applied. And when I added a line to `PruningSuite`:

```
+ TestHive.loadTestTable("src")
  TestHive.reset()
```

then it worked. Would that help you?
[GitHub] spark pull request: [SPARK-2287] [SQL] Make ScalaReflection be abl...
Github user ueshin commented on the pull request: https://github.com/apache/spark/pull/1226#issuecomment-47417440 Ah, we must run `SHOW TABLES` before `TestHive.reset()`, the same as [here](https://github.com/ueshin/apache-spark/blob/issues/SPARK-2287/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveComparisonTest.scala#L249-L250), right?
[GitHub] spark pull request: [SPARK-2327] [SQL] Fix nullabilities of Join/G...
GitHub user ueshin opened a pull request: https://github.com/apache/spark/pull/1266

[SPARK-2327] [SQL] Fix nullabilities of Join/Generate/Aggregate.

Fix nullabilities of `Join`/`Generate`/`Aggregate` because:

- Output attributes of the opposite side of an `OuterJoin` should be nullable.
- Output attributes of the generator side of `Generate` should be nullable if `join` is `true` and `outer` is `true`.
- The `AttributeReference`s of `computedAggregates` of `Aggregate` should be the same as the `aggregateExpression`'s.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ueshin/apache-spark issues/SPARK-2327

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/1266.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #1266

commit f20f196e33e4bd6ab4adeb9bb4afea9b9256d902
Author: Takuya UESHIN <ues...@happy-camper.st>
Date:   2014-06-30T08:43:37Z

    Fix Join output nullabilities.

commit 09532ec6a67ba38b63deb6d0ea4607d7e8a07080
Author: Takuya UESHIN <ues...@happy-camper.st>
Date:   2014-06-30T08:45:10Z

    Fix Generate output nullabilities.

commit 0e31e377b4a8ba8c4bf698c2afea6a604a6c25bd
Author: Takuya UESHIN <ues...@happy-camper.st>
Date:   2014-06-30T08:48:52Z

    Fix Aggregate resultAttribute nullabilities.
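The first rule above (outer-join output attributes becoming nullable) can be sketched in isolation. This is a toy illustration in plain Java, not the actual Catalyst code; the `Attr` record and `JoinType` enum are invented for the example:

```java
import java.util.ArrayList;
import java.util.List;

// Toy sketch: for an outer join, attributes from the non-preserved side
// must be marked nullable, because unmatched rows fill them with null.
record Attr(String name, boolean nullable) { }
enum JoinType { INNER, LEFT_OUTER, RIGHT_OUTER, FULL_OUTER }

public class JoinOutput {
    static List<Attr> output(List<Attr> left, List<Attr> right, JoinType t) {
        // which sides can receive null-padded rows for this join type
        boolean leftNullable  = t == JoinType.RIGHT_OUTER || t == JoinType.FULL_OUTER;
        boolean rightNullable = t == JoinType.LEFT_OUTER  || t == JoinType.FULL_OUTER;
        List<Attr> out = new ArrayList<>();
        for (Attr a : left)  out.add(new Attr(a.name(), a.nullable() || leftNullable));
        for (Attr a : right) out.add(new Attr(a.name(), a.nullable() || rightNullable));
        return out;
    }

    public static void main(String[] args) {
        List<Attr> l = List.of(new Attr("id", false));
        List<Attr> r = List.of(new Attr("score", false));
        // left outer join: the right side's attributes become nullable
        System.out.println(output(l, r, JoinType.LEFT_OUTER));
    }
}
```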
[GitHub] spark pull request: [SPARK-2328] [SQL] Add execution of `SHOW TABL...
GitHub user ueshin opened a pull request: https://github.com/apache/spark/pull/1268

[SPARK-2328] [SQL] Add execution of `SHOW TABLES` before `TestHive.reset()`.

Unfortunately `PruningSuite` is executed first among the Hive tests, and `TestHive.reset()` breaks the test environment. To prevent this, we must run a query before calling reset for the first time.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ueshin/apache-spark issues/SPARK-2328

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/1268.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #1268

commit 043ceac67cdd53cb784e83012fca7a91804dae10
Author: Takuya UESHIN <ues...@happy-camper.st>
Date:   2014-06-30T10:36:18Z

    Add execution of `SHOW TABLES` before `TestHive.reset()`.
[GitHub] spark pull request: [SPARK-2327] [SQL] Fix nullabilities of Join/G...
Github user ueshin commented on the pull request: https://github.com/apache/spark/pull/1266#issuecomment-47518136 I guess this needs #1268 to pass Hive tests.
[GitHub] spark pull request: [SPARK-2328] [SQL] Add execution of `SHOW TABL...
Github user ueshin commented on the pull request: https://github.com/apache/spark/pull/1268#issuecomment-47518299 If #1226 is merged before this, this is not needed.
[GitHub] spark pull request: [SPARK-2327] [SQL] Fix nullabilities of Join/G...
Github user ueshin commented on the pull request: https://github.com/apache/spark/pull/1266#issuecomment-47635811 Oops, the #1268-related errors are gone, but other tests failed...
[GitHub] spark pull request: SPARK-1949. Servlet 2.5 vs 3.0 conflict in SBT...
Github user ueshin commented on the pull request: https://github.com/apache/spark/pull/906#issuecomment-47743422 Hi, I also encountered this kind of Servlet API conflict. When are you planning to merge this? Or is there something left to do?
[GitHub] spark pull request: SPARK-1949. Servlet 2.5 vs 3.0 conflict in SBT...
Github user ueshin commented on the pull request: https://github.com/apache/spark/pull/906#issuecomment-47804679 Close this? I encountered the conflict again after #1271 was merged. I believe we at least need to exclude it from the Hive-related dependencies.
[GitHub] spark pull request: [SPARK-2327] [SQL] Fix nullabilities of Join/G...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/1266#discussion_r14543737

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicOperators.scala ---
@@ -46,10 +46,22 @@ case class Generate(
     child: LogicalPlan)
   extends UnaryNode {

-  protected def generatorOutput: Seq[Attribute] =
-    alias
+  protected def generatorOutput: Seq[Attribute] = {
+    val output = alias
       .map(a => generator.output.map(_.withQualifiers(a :: Nil)))
       .getOrElse(generator.output)
+    if (join && outer) {
+      output.map {
+        case attr if !attr.resolved => attr
+        case attr if !attr.nullable =>
+          AttributeReference(
+            attr.name, attr.dataType, nullable = true)(attr.exprId, attr.qualifiers)
+        case attr => attr
--- End diff --

Ah, I see. It is the same as `withQualifiers`, right?
[GitHub] spark pull request: [SPARK-2327] [SQL] Fix nullabilities of Join/G...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/1266#discussion_r14543764

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins.scala ---
@@ -319,7 +319,27 @@ case class BroadcastNestedLoopJoin(
   override def otherCopyArgs = sqlContext :: Nil

-  def output = left.output ++ right.output
+  def output = {
--- End diff --

Ok, I'll add it.
[GitHub] spark pull request: [SPARK-2327] [SQL] Fix nullabilities of Join/G...
Github user ueshin commented on the pull request: https://github.com/apache/spark/pull/1266#issuecomment-4714 @marmbrus, Thank you for your comments. Modified to use `withNullability`. And no problem, because exact types and nullability are important for my project.
[GitHub] spark pull request: [SPARK-2235][SQL]Spark SQL basicOperator add I...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/1150#discussion_r14569171

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkStrategies.scala ---
@@ -274,7 +274,7 @@ private[sql] abstract class SparkStrategies extends QueryPlanner[SparkPlan] {
       case Unions(unionChildren) =>
         execution.Union(unionChildren.map(planLater))(sqlContext) :: Nil
       case logical.Except(left,right) =>
-        execution.Except(planLater(left),planLater(right)) :: Nil
--- End diff --

Hi, the diff says you deleted some trailing whitespace after `Nil`.
[GitHub] spark pull request: [SPARK-2366] [SQL] Add column pruning for the ...
Github user ueshin commented on the pull request: https://github.com/apache/spark/pull/1301#issuecomment-48077448 Thank you for your comments. Fixed and pushed.
[GitHub] spark pull request: [SPARK-2415] [SQL] RowWriteSupport should hand...
GitHub user ueshin opened a pull request: https://github.com/apache/spark/pull/1339

[SPARK-2415] [SQL] RowWriteSupport should handle empty ArrayType correctly.

`RowWriteSupport` doesn't write empty `ArrayType` values, so the value read back becomes `null`. It should write empty `ArrayType` values as they are.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ueshin/apache-spark issues/SPARK-2415

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/1339.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #1339

commit 2f05196c865583a966b81700d8a909e44d6abb22
Author: Takuya UESHIN <ues...@happy-camper.st>
Date:   2014-07-09T07:48:57Z

    Fix RowWriteSupport to handle empty ArrayType correctly.
[GitHub] spark pull request: [WIP][SPARK-2179][SQL] Public API for DataType...
Github user ueshin commented on the pull request: https://github.com/apache/spark/pull/1346#issuecomment-48560066 Hi, I'm wondering if `MapType` will have something like `containsNull` for `ArrayType`.
[GitHub] spark pull request: [WIP][SPARK-2179][SQL] Public API for DataType...
Github user ueshin commented on the pull request: https://github.com/apache/spark/pull/1346#issuecomment-48566629 @yhuai, I understand. Thank you for your reply.
[GitHub] spark pull request: [SPARK-2428][SQL] Add except and intersect met...
GitHub user ueshin opened a pull request: https://github.com/apache/spark/pull/1355

[SPARK-2428][SQL] Add except and intersect methods to SchemaRDD.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ueshin/apache-spark issues/SPARK-2428

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/1355.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #1355

commit b6fa2641730545068703fb5d50a652dcbf3e8e88
Author: Takuya UESHIN <ues...@happy-camper.st>
Date:   2014-07-10T09:10:37Z

    Add except and intersect methods to SchemaRDD.
[GitHub] spark pull request: [SPARK-2431][SQL] Refine StringComparison and ...
GitHub user ueshin opened a pull request: https://github.com/apache/spark/pull/1357

[SPARK-2431][SQL] Refine StringComparison and related codes.

Refine `StringComparison` and related codes as follows:

- `StringComparison` could be similar to `StringRegexExpression` or `CaseConversionExpression`.
- Nullability of `StringRegexExpression` could depend on children's nullabilities.
- Add a case that the like condition includes no wildcard to `LikeSimplification`.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ueshin/apache-spark issues/SPARK-2431

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/1357.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #1357

commit 680bb72ee0dafaeae34dbc3dbff3984f856b5e18
Author: Takuya UESHIN <ues...@happy-camper.st>
Date:   2014-07-10T15:49:07Z

    Refine StringComparison.

commit b9da9d21da49f7d92c9b194173805b1a627e841b
Author: Takuya UESHIN <ues...@happy-camper.st>
Date:   2014-07-10T15:49:55Z

    Fix nullability of StringRegexExpression.

commit 77766f55c331dac2f81f657d7aedb06538b29c41
Author: Takuya UESHIN <ues...@happy-camper.st>
Date:   2014-07-10T15:54:46Z

    Add a case that the like condition includes no wildcard to LikeSimplification.
[GitHub] spark pull request: [SPARK-2446][SQL] Add BinaryType support to Pa...
GitHub user ueshin opened a pull request: https://github.com/apache/spark/pull/1373

[SPARK-2446][SQL] Add BinaryType support to Parquet I/O.

To support `BinaryType`, the following changes are needed:

- Make `StringType` use `OriginalType.UTF8`
- Add `BinaryType` using `PrimitiveTypeName.BINARY` without `OriginalType`

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ueshin/apache-spark issues/SPARK-2446

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/1373.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #1373

commit 616e04a989edce9f5109a3b2ad11b78e1e0f1a77
Author: Takuya UESHIN <ues...@happy-camper.st>
Date:   2014-07-09T15:25:08Z

    Make StringType use OriginalType.UTF8.

commit ecacb925c4e1f2dbf863b604ad2600cc8c9663d8
Author: Takuya UESHIN <ues...@happy-camper.st>
Date:   2014-07-11T09:13:46Z

    Add BinaryType support to Parquet I/O.
[GitHub] spark pull request: [SPARK-2446][SQL] Add BinaryType support to Pa...
Github user ueshin commented on the pull request: https://github.com/apache/spark/pull/1373#issuecomment-48781530 @marmbrus Yes, I think so. But this new behavior is the same as Avro, Thrift, and the upcoming Hive (0.14). To load string data saved with previous versions, a `Cast` to `StringType` will be needed.
[GitHub] spark pull request: [SPARK-2467] Revert SparkBuild to publish-loca...
GitHub user ueshin opened a pull request: https://github.com/apache/spark/pull/1398

[SPARK-2467] Revert SparkBuild to publish-local to both .m2 and .ivy2.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ueshin/apache-spark issues/SPARK-2467

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/1398.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #1398

commit 7f01d587259221c5333880578e6658e6534c52c4
Author: Takuya UESHIN <ues...@happy-camper.st>
Date:   2014-07-14T06:28:44Z

    Revert SparkBuild to publish-local to both .m2 and .ivy2.
[GitHub] spark pull request: SPARK-2407: Added internal implementation of S...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/1359#discussion_r14915562

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringOperations.scala ---
@@ -205,3 +207,72 @@ case class EndsWith(left: Expression, right: Expression)
     extends BinaryExpression with StringComparison {
   def compare(l: String, r: String) = l.endsWith(r)
 }
+
+/**
+ * A function that takes a substring of its first argument starting at a given position.
+ * Defined for String and Binary types.
+ */
+case class Substring(str: Expression, pos: Expression, len: Expression) extends Expression {
+
+  type EvaluatedType = Any
+
+  def nullable: Boolean = true
--- End diff --

`nullable` could be `str.nullable || pos.nullable || len.nullable`?
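The suggestion above, deriving `nullable` from the children instead of hard-coding `true`, can be sketched with a minimal expression tree (a hypothetical Java illustration, not the Catalyst `Expression` API):

```java
// Toy model: an expression is nullable only if some child can produce null.
interface Expr { boolean nullable(); }

record Lit(Object value) implements Expr {
    // a literal is nullable exactly when it holds null
    public boolean nullable() { return value == null; }
}

record Substr(Expr str, Expr pos, Expr len) implements Expr {
    // the narrowing proposed in the review comment
    public boolean nullable() {
        return str.nullable() || pos.nullable() || len.nullable();
    }
}

public class NullabilityDemo {
    public static void main(String[] args) {
        Expr e = new Substr(new Lit("hello"), new Lit(0), new Lit(3));
        System.out.println(e.nullable()); // false: no child can be null

        Expr e2 = new Substr(new Lit(null), new Lit(0), new Lit(3));
        System.out.println(e2.nullable()); // true: the string child is a null literal
    }
}
```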
[GitHub] spark pull request: [SPARK-2504][SQL] Fix nullability of Substring...
GitHub user ueshin opened a pull request: https://github.com/apache/spark/pull/1426

[SPARK-2504][SQL] Fix nullability of Substring expression.

This is a follow-up of #1359 with nullability narrowing.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ueshin/apache-spark issues/SPARK-2504

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/1426.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #1426

commit 80958acf56d18ad1556282c6a1d06d12ffab02e4
Author: Takuya UESHIN <ues...@happy-camper.st>
Date:   2014-07-16T02:41:40Z

    Fix nullability of Substring expression.

commit 515783268188110646a3fee5c0015260f46d54b9
Author: Takuya UESHIN <ues...@happy-camper.st>
Date:   2014-07-16T02:41:50Z

    Remove unnecessary white spaces.
[GitHub] spark pull request: [SPARK-2509][SQL] Add optimization for Substri...
GitHub user ueshin opened a pull request: https://github.com/apache/spark/pull/1428

[SPARK-2509][SQL] Add optimization for Substring.

Cases where `Substring` includes a `null` literal could be added to `NullPropagation`.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ueshin/apache-spark issues/SPARK-2509

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/1428.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #1428

commit d9eb85f902c5498f9b7b6c6035471f622dd7d629
Author: Takuya UESHIN <ues...@happy-camper.st>
Date:   2014-07-16T03:31:45Z

    Add Substring cases to NullPropagation.
[GitHub] spark pull request: [SPARK-2509][SQL] Add optimization for Substri...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/1428#discussion_r14981973

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
@@ -171,6 +171,9 @@ object NullPropagation extends Rule[LogicalPlan] {
         case Literal(candidate, _) if candidate == v => true
         case _ => false
       })) => Literal(true, BooleanType)
+      case e @ Substring(Literal(null, _), _, _) => Literal(null, e.dataType)
--- End diff --

@rxin, thank you for your comment. The rule `ConstantFolding` can be applied only if the expression is foldable, i.e. ALL of the child expressions are foldable, while `NullPropagation` covers cases where at least one child is a `null` literal, regardless of the foldability of the others. So I think these cases are needed. BTW, `Substring` can be foldable, right?
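The distinction drawn above can be illustrated with a toy rewriter: constant folding needs every child to be foldable, while null propagation fires as soon as one child is a known `null` literal, even if another child is a non-foldable column reference. All names here are invented for the sketch; this is not the Catalyst implementation:

```java
// Toy expression nodes for the sketch.
interface Ex { }
record Lit(Object v) implements Ex { }           // a literal (foldable)
record Col(String name) implements Ex { }        // a column reference (NOT foldable)
record Substr(Ex str, Ex pos, Ex len) implements Ex { }

public class NullProp {
    // Null propagation: if the string argument is a known null literal,
    // the whole substring is null regardless of the other children.
    static Ex propagateNulls(Ex e) {
        if (e instanceof Substr s && s.str() instanceof Lit l && l.v() == null) {
            return new Lit(null);
        }
        return e;
    }

    public static void main(String[] args) {
        // pos is a column, so the expression as a whole is NOT foldable,
        // yet null propagation can still simplify it:
        Ex e = new Substr(new Lit(null), new Col("p"), new Lit(3));
        System.out.println(propagateNulls(e)); // simplified to a null literal
    }
}
```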
[GitHub] spark pull request: [SPARK-2509][SQL] Add optimization for Substri...
Github user ueshin commented on the pull request: https://github.com/apache/spark/pull/1428#issuecomment-49125413 @rxin Ah, wait for a moment. What do you think about foldability of the `Substring` I mentioned above?
[GitHub] spark pull request: [SPARK-2509][SQL] Add optimization for Substri...
Github user ueshin commented on the pull request: https://github.com/apache/spark/pull/1428#issuecomment-49125499 I'll open another issue about foldability of `Substring`.
[GitHub] spark pull request: [SPARK-2509][SQL] Add optimization for Substri...
Github user ueshin commented on the pull request: https://github.com/apache/spark/pull/1428#issuecomment-49125889 @rxin I agree that we need a better approach for `NullPropagation` than this much pattern matching.
[GitHub] spark pull request: [SPARK-2518][SQL] Fix foldability of Substring...
GitHub user ueshin opened a pull request: https://github.com/apache/spark/pull/1432

[SPARK-2518][SQL] Fix foldability of Substring expression.

This is a follow-up of #1428.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ueshin/apache-spark issues/SPARK-2518

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/1432.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #1432

commit 37d1ace86d909bc4204bbb9f55c56a76aae8c106
Author: Takuya UESHIN <ues...@happy-camper.st>
Date:   2014-07-16T06:22:17Z

    Fix foldability of Substring expression.
[GitHub] spark pull request: [SPARK-2190][SQL] Specialized ColumnType for T...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/1440#discussion_r14993202

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala ---
@@ -249,3 +263,7 @@ case class Cast(child: Expression, dataType: DataType) extends UnaryExpression {
     if (evaluated == null) null else cast(evaluated)
   }
 }
+
+object Cast {
+  private[sql] val simpleDateFormat = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss")
--- End diff --

Hi, `SimpleDateFormat` is not thread-safe, so `def` should be used instead of `val`.
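The thread-safety point is easy to trip over: `SimpleDateFormat` keeps mutable internal state, so a single shared `val` instance can produce corrupted results under concurrent use. Below is a small sketch (not the actual Spark fix) of the two usual remedies, a fresh instance per call or a per-thread cache:

```java
import java.text.SimpleDateFormat;
import java.util.Date;

public class DateFormats {
    // Remedy 1: a fresh instance per call. Always safe, but allocates each time.
    static SimpleDateFormat fresh() {
        return new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
    }

    // Remedy 2: one instance per thread. Safe and avoids repeated allocation.
    static final ThreadLocal<SimpleDateFormat> CACHED =
        ThreadLocal.withInitial(() -> new SimpleDateFormat("yyyy-MM-dd HH:mm:ss"));

    public static void main(String[] args) throws Exception {
        Date epoch = new Date(0L);
        String s = fresh().format(epoch);       // local-time rendering of the epoch
        Date back = CACHED.get().parse(s);      // round-trips to the same instant
        System.out.println(back.getTime() == epoch.getTime());
    }
}
```

(Since Java 8, `java.time.format.DateTimeFormatter` is immutable and thread-safe, which removes the problem entirely, but it was not an option for Spark at the time.)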
[GitHub] spark pull request: [SPARK-2535][SQL] Add StringComparison case to...
GitHub user ueshin opened a pull request: https://github.com/apache/spark/pull/1451

[SPARK-2535][SQL] Add StringComparison case to NullPropagation.

Cases where a `StringComparison` expression includes a `null` literal could be added to `NullPropagation`.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ueshin/apache-spark issues/SPARK-2535

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/1451.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #1451

commit 8f9b98439b6e3bf674ae22a6a6a716e27da509f2
Author: Takuya UESHIN <ues...@happy-camper.st>
Date:   2014-07-15T09:08:13Z

    Add StringComparison case to NullPropagation.

commit e99c237f661589792cde6451466d2c44e783cd82
Author: Takuya UESHIN <ues...@happy-camper.st>
Date:   2014-07-16T10:59:58Z

    Add some tests.
[GitHub] spark pull request: [SPARK-2588][SQL] Add some more DSLs.
GitHub user ueshin opened a pull request: https://github.com/apache/spark/pull/1491 [SPARK-2588][SQL] Add some more DSLs. You can merge this pull request into a Git repository by running: $ git pull https://github.com/ueshin/apache-spark issues/SPARK-2588 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/1491.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1491 commit 2310bf12156d7937181da199ed1d2305d7644cc7 Author: Takuya UESHIN ues...@happy-camper.st Date: 2014-07-15T08:55:00Z Add some more DSLs. commit 1023ea0bcd206d4d7f6312dd9726479b8f304a63 Author: Takuya UESHIN ues...@happy-camper.st Date: 2014-07-16T10:53:47Z Modify tests to use DSLs.
[GitHub] spark pull request: [SPARK-1210] Prevent ContextClassLoader of Act...
Github user ueshin commented on the pull request: https://github.com/apache/spark/pull/15#issuecomment-38432140 Added a test case.
Github user ueshin commented on the pull request: https://github.com/apache/spark/pull/15#issuecomment-38757007 `LocalActor` has 4 threads to handle Actor messages, but the replacement of the ContextClassLoader happens in only one of them. I believe that the **UNBALANCED** state is not desirable. Your patch fixes my original problem, but the undesirable state remains.
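The imbalance described here is easy to demonstrate with plain Java threads (an illustrative sketch with a hypothetical `allThreadsShareLoader` helper, not Spark's actor code): setting the context ClassLoader from inside one pool thread affects only that thread, so a pool serving messages ends up with mixed loaders.

```java
import java.util.concurrent.*;

public class ContextLoaderImbalance {
    // Returns true iff every worker thread in a fresh fixed pool sees the
    // same context ClassLoader after one worker has replaced its own.
    public static boolean allThreadsShareLoader(int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // Replace the loader from inside a single worker thread only.
            pool.submit(() -> Thread.currentThread()
                    .setContextClassLoader(new ClassLoader() {})).get();

            // Sample the loader seen by each worker; the latch holds every
            // worker busy so all `threads` threads take exactly one task.
            CountDownLatch ready = new CountDownLatch(threads);
            ConcurrentMap<ClassLoader, Boolean> seen = new ConcurrentHashMap<>();
            for (int i = 0; i < threads; i++) {
                pool.submit(() -> {
                    seen.put(Thread.currentThread().getContextClassLoader(), true);
                    ready.countDown();
                    ready.await();  // block until all workers have sampled
                    return null;
                });
            }
            ready.await();
            return seen.size() == 1;  // false => the pool is in a mixed state
        } catch (InterruptedException | ExecutionException e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdownNow();
        }
    }
}
```

With more than one thread the sampled loaders differ, which is exactly the unbalanced state the comment objects to: fixing the one observed failure does not restore a uniform pool.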
Github user ueshin commented on the pull request: https://github.com/apache/spark/pull/15#issuecomment-38766469 I checked the related code: - [CoarseGrainedExecutorBackend.scala#L56](https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/executor/CoarseGrainedExecutorBackend.scala#L56) - [MesosExecutorBackend.scala#L56-L59](https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/executor/MesosExecutorBackend.scala#L56-L59) I guess these threads should also be kept clean, so we could remove it entirely. How does that sound?
Github user ueshin commented on the pull request: https://github.com/apache/spark/pull/15#issuecomment-38872508 No, but what should I do? Rebase onto the current master branch?
Github user ueshin commented on the pull request: https://github.com/apache/spark/pull/15#issuecomment-38872759 Ah, I see, there are some changes in `Executor`.
Github user ueshin commented on the pull request: https://github.com/apache/spark/pull/15#issuecomment-38877616 Rebased onto master and pushed.
Github user ueshin commented on the pull request: https://github.com/apache/spark/pull/15#issuecomment-38890320 Thanks!
[GitHub] spark pull request: SPARK-1380: Add sort-merge based cogroup/joins...
GitHub user ueshin opened a pull request: https://github.com/apache/spark/pull/283 SPARK-1380: Add sort-merge based cogroup/joins. I've written cogroup/joins based on the sort-merge algorithm. You can merge this pull request into a Git repository by running: $ git pull https://github.com/ueshin/apache-spark issues/SPARK-1380 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/283.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #283 commit 1c8ba5a0d480f816a0c217618b40bb615474963d Author: Takuya UESHIN ues...@happy-camper.st Date: 2014-03-19T10:28:26Z Add sort-merge cogroup/joins. commit 99751661fcc7632a0f82816bbaca07bf822d3663 Author: Takuya UESHIN ues...@happy-camper.st Date: 2014-03-25T10:15:09Z Add Java APIs for sort-merge cogroup/joins.
Github user ueshin commented on the pull request: https://github.com/apache/spark/pull/283#issuecomment-39417683 @rxin Thank you for your reply. There are some cases where a merge join is useful for optimization: 1. If the data to be joined are already sorted by the join keys, a merge join can be done more efficiently than a hash join. In my test case both algorithms ran at almost the same speed, but the merge join was more scalable. 2. A merge join over data sorted by the same keys can be pipelined (each output can be produced immediately for arriving input tuples), even for an N-way join (N > 2). A hash join blocks because it must build a hash table before any output is produced. I think it is useful for users to be able to choose how to optimize their processing.
Github user ueshin commented on the pull request: https://github.com/apache/spark/pull/283#issuecomment-39421176 @mridulm Thank you for your reply. There are 2 points I have to mention about memory: - Before shuffle: if the data are already sorted, no extra memory is needed because no sort operation is required; if not, the merge join needs some amount of memory to sort the data in each partition. - After shuffle: the merge join needs at most the same amount of memory as a hash join while fetching data, but no more after that, because it can produce output immediately from its input. A hash join needs additional memory to build a hash table.
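The pipelining and memory argument can be made concrete with a minimal sort-merge inner join over pre-sorted lists (illustrative only, not the PR's RDD-based implementation): each output tuple is emitted as soon as both sides reach a matching key run, and no hash table is ever built.

```java
import java.util.*;

public class SortMergeJoin {
    /** Inner join of two lists pre-sorted by key; emits {key, left, right} tuples. */
    public static <K extends Comparable<K>, V, W> List<Object[]> join(
            List<Map.Entry<K, V>> left, List<Map.Entry<K, W>> right) {
        List<Object[]> out = new ArrayList<>();
        int i = 0, j = 0;
        while (i < left.size() && j < right.size()) {
            int cmp = left.get(i).getKey().compareTo(right.get(j).getKey());
            if (cmp < 0) i++;          // left key too small: advance left
            else if (cmp > 0) j++;     // right key too small: advance right
            else {
                // Matching key: emit the cross product of this key run,
                // then both cursors end up past the run.
                K key = left.get(i).getKey();
                int jStart = j;
                for (; i < left.size() && left.get(i).getKey().equals(key); i++) {
                    for (j = jStart; j < right.size()
                            && right.get(j).getKey().equals(key); j++) {
                        out.add(new Object[]{
                            key, left.get(i).getValue(), right.get(j).getValue()});
                    }
                }
            }
        }
        return out;
    }
}
```

Beyond the two cursors and the current key run, the loop holds no per-join state, which is the memory contrast with a hash join drawn in the comment above.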