[GitHub] spark pull request: [SPARK-3771][SQL] AppendingParquetOutputFormat...

2014-10-02 Thread ueshin
GitHub user ueshin opened a pull request:

https://github.com/apache/spark/pull/2638

[SPARK-3771][SQL] AppendingParquetOutputFormat should use reflection to 
prevent breaking binary-compatibility.

Original problem is 
[SPARK-3764](https://issues.apache.org/jira/browse/SPARK-3764).

`AppendingParquetOutputFormat` uses the binary-incompatible method 
`context.getTaskAttemptID`.
This makes Spark itself binary-incompatible: if Spark is built against 
hadoop-1, the artifact works only with hadoop-1, and vice versa.
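
A minimal sketch of the reflection approach (not the exact patch): `getTaskAttemptID` is declared on a class in hadoop-1 but on an interface in hadoop-2, so a direct call compiles to bytecode that is valid for only one of them, while a reflective lookup stays neutral.

```scala
import org.apache.hadoop.mapreduce.{TaskAttemptContext, TaskAttemptID}

// Sketch only: resolve getTaskAttemptID at runtime so the compiled bytecode
// references neither the hadoop-1 class method nor the hadoop-2 interface method.
def getTaskAttemptID(context: TaskAttemptContext): TaskAttemptID =
  context.getClass.getMethod("getTaskAttemptID").invoke(context)
    .asInstanceOf[TaskAttemptID]
```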


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ueshin/apache-spark issues/SPARK-3771

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/2638.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2638


commit ec213c160393698fa01c62469263f050f3668453
Author: Takuya UESHIN ues...@happy-camper.st
Date:   2014-10-02T14:22:46Z

Use reflection to prevent breaking binary-compatibility.







[GitHub] spark pull request: [SPARK-3771][SQL] AppendingParquetOutputFormat...

2014-10-03 Thread ueshin
Github user ueshin commented on the pull request:

https://github.com/apache/spark/pull/2638#issuecomment-57800537
  
@srowen, Thank you for your comment.
Indeed, when deploying completed apps to a Spark cluster, there is a particular 
instance of Spark.
But Spark app developers will use the artifacts in Maven Central while 
developing and unit-testing. Those artifacts seem to be built for Hadoop 2, so if 
developers want to test with Hadoop 1, it won't work.
What do you think?





[GitHub] spark pull request: [SPARK-3812] [BUILD] Adapt maven build to publ...

2014-10-07 Thread ueshin
Github user ueshin commented on the pull request:

https://github.com/apache/spark/pull/2673#issuecomment-58225042
  
Hi @pwendell, I had a similar issue related to artifacts in Maven Central 
and Hadoop versions.
Could you take a look at 
[SPARK-3764](https://issues.apache.org/jira/browse/SPARK-3764) and #2638 please?





[GitHub] spark pull request: SPARK-2686 Add Length support to Spark SQL and...

2014-07-27 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/1586#discussion_r15445571
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringOperations.scala ---
@@ -209,6 +209,70 @@ case class EndsWith(left: Expression, right: Expression)
 }

 /**
+ * A function that returns the number of bytes in an expression
+ */
+case class Length(child: Expression) extends UnaryExpression {
+
+  type EvaluatedType = Any
+
+  def dataType = IntegerType
+  override def foldable = child.foldable
+  def nullable = child.nullable
+
+  override def toString = s"Length($child)"
+
+  override def eval(input: Row): EvaluatedType = {
+    val string = child.eval(input)
+    if (string == null) {
+      null.asInstanceOf[DataType]
--- End diff --

Hi, I think `asInstanceOf[DataType]` is not needed.




[GitHub] spark pull request: SPARK-2686 Add Length support to Spark SQL and...

2014-07-27 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/1586#discussion_r15445580
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringOperations.scala ---
@@ -209,6 +209,70 @@ case class EndsWith(left: Expression, right: Expression)
 }

 /**
+ * A function that returns the number of bytes in an expression
+ */
+case class Length(child: Expression) extends UnaryExpression {
+
+  type EvaluatedType = Any
+
+  def dataType = IntegerType
+  override def foldable = child.foldable
+  def nullable = child.nullable
+
+  override def toString = s"Length($child)"
+
+  override def eval(input: Row): EvaluatedType = {
+    val string = child.eval(input)
+    if (string == null) {
+      null.asInstanceOf[DataType]
+    } else if (!string.isInstanceOf[String]) {
+      string.toString.length
+    } else {
+      string.toString.getBytes.length
+    }
+  }
+
+}
+
+object StrlenConstants {
+  val DefaultEncoding = "ISO-8859-1"
+}
+
+/**
+ * A function that returns the number of characters in a string expression
+ */
+case class Strlen(child: Expression, encoding: Expression) extends UnaryExpression {
+
+  type EvaluatedType = Any
+
+  def dataType = IntegerType
+
+  override def foldable = child.foldable
+
+  def nullable = child.nullable
+
+  override def toString = s"Strlen($child, $encoding)"
+
+  override def eval(input: Row): EvaluatedType = {
+    val string = child.eval(input)
+    if (string == null) {
+      null.asInstanceOf[DataType]
--- End diff --

Also `asInstanceOf[DataType]` is not needed.




[GitHub] spark pull request: SPARK-2686 Add Length support to Spark SQL and...

2014-07-28 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/1586#discussion_r15503212
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringOperations.scala ---
@@ -209,6 +212,81 @@ case class EndsWith(left: Expression, right: Expression)
 }

 /**
+ * A function that returns the number of bytes in an expression
+ */
+case class Length(child: Expression) extends UnaryExpression {
+
+  type EvaluatedType = Any
+
+  override def dataType = IntegerType
+
+  override def foldable = child.foldable
+
+  override def nullable = child.nullable
+
+  override def toString = s"Length($child)"
+
+  override def eval(input: Row): EvaluatedType = {
+    val string = child.eval(input)
+    if (string == null) {
+      null
+    } else if (!string.isInstanceOf[String]) {
+      string.toString.length
+    } else {
+      string.toString.getBytes.length
+    }
+  }
+
+}
+
+object StrlenConstants {
+  val DefaultEncoding = "ISO-8859-1"
+}
+
+/**
+ * A function that returns the number of characters in a string expression
+ */
+case class Strlen(child: Expression, encoding: Expression) extends UnaryExpression
+  with Logging {
+
+  type EvaluatedType = Any
+
+  override def dataType = IntegerType
+
+  override def foldable = child.foldable
+
+  override def nullable = child.nullable
--- End diff --

Hi, `nullable` now becomes `true`, because `eval` returns `null` when 
`UnsupportedEncodingException` is thrown, regardless of `child.nullable`.




[GitHub] spark pull request: SPARK-2686 Add Length support to Spark SQL and...

2014-07-28 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/1586#discussion_r15503576
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringOperations.scala ---
@@ -209,6 +212,81 @@ case class EndsWith(left: Expression, right: Expression)
 }

 /**
+ * A function that returns the number of bytes in an expression
+ */
+case class Length(child: Expression) extends UnaryExpression {
+
+  type EvaluatedType = Any
+
+  override def dataType = IntegerType
+
+  override def foldable = child.foldable
+
+  override def nullable = child.nullable
+
+  override def toString = s"Length($child)"
+
+  override def eval(input: Row): EvaluatedType = {
+    val string = child.eval(input)
+    if (string == null) {
+      null
+    } else if (!string.isInstanceOf[String]) {
+      string.toString.length
+    } else {
+      string.toString.getBytes.length
+    }
+  }
+
+}
+
+object StrlenConstants {
+  val DefaultEncoding = "ISO-8859-1"
+}
+
+/**
+ * A function that returns the number of characters in a string expression
+ */
+case class Strlen(child: Expression, encoding: Expression) extends UnaryExpression
+  with Logging {
+
+  type EvaluatedType = Any
+
+  override def dataType = IntegerType
+
+  override def foldable = child.foldable
+
+  override def nullable = child.nullable
--- End diff --

Ah, this is not related to my prior comment.
In that comment I just mentioned that you should remove the unnecessary 
`asInstanceOf`.

After that, you changed the logic to return `null` if the exception is 
thrown, so the nullability of this operator also changed to `true`.




[GitHub] spark pull request: SPARK-2686 Add Length support to Spark SQL and...

2014-07-28 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/1586#discussion_r15503732
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringOperations.scala ---
@@ -209,6 +212,81 @@ case class EndsWith(left: Expression, right: Expression)
 }

 /**
+ * A function that returns the number of bytes in an expression
+ */
+case class Length(child: Expression) extends UnaryExpression {
+
+  type EvaluatedType = Any
+
+  override def dataType = IntegerType
+
+  override def foldable = child.foldable
+
+  override def nullable = child.nullable
+
+  override def toString = s"Length($child)"
+
+  override def eval(input: Row): EvaluatedType = {
+    val string = child.eval(input)
+    if (string == null) {
+      null
+    } else if (!string.isInstanceOf[String]) {
+      string.toString.length
+    } else {
+      string.toString.getBytes.length
+    }
+  }
+
+}
+
+object StrlenConstants {
+  val DefaultEncoding = "ISO-8859-1"
+}
+
+/**
+ * A function that returns the number of characters in a string expression
+ */
+case class Strlen(child: Expression, encoding: Expression) extends UnaryExpression
+  with Logging {
+
+  type EvaluatedType = Any
+
+  override def dataType = IntegerType
+
+  override def foldable = child.foldable
+
+  override def nullable = child.nullable
--- End diff --

No, I'm just saying:

```scala
override def nullable = true
```

instead of

```scala
override def nullable = child.nullable
```





[GitHub] spark pull request: SPARK-2686 Add Length support to Spark SQL and...

2014-07-29 Thread ueshin
Github user ueshin commented on the pull request:

https://github.com/apache/spark/pull/1586#issuecomment-50437674
  
I'm sorry, but now I'm confused.
`Length` and `Strlen` are becoming almost the same implementation.
What do you intend the difference between them to be?

BTW, `getBytes()` without a charset argument should not be used, because it 
depends on the environment, e.g. the platform's default charset.
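
A quick illustration (the byte counts assume UTF-8 vs. ISO-8859-1 platform defaults):

```scala
// "é" encodes to 1 byte in ISO-8859-1 but 2 bytes in UTF-8, so the result
// of getBytes() without an argument varies with the platform default charset.
"é".getBytes().length        // platform-dependent
"é".getBytes("UTF-8").length // always 2
```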




[GitHub] spark pull request: SPARK-2686 Add Length support to Spark SQL and...

2014-07-29 Thread ueshin
Github user ueshin commented on the pull request:

https://github.com/apache/spark/pull/1586#issuecomment-50442618
  
First, I would like to confirm: which do you want to add to HQL, 
`Length` or `Strlen`?
The title of this PR says it adds `Length` to HQL, but the implementation 
adds `Strlen`.




[GitHub] spark pull request: SPARK-2686 Add Length and Strlen support to Sp...

2014-07-29 Thread ueshin
Github user ueshin commented on the pull request:

https://github.com/apache/spark/pull/1586#issuecomment-50450124
  
@javadba Ah, I see. Thank you for the details.
I'll continue reviewing :)




[GitHub] spark pull request: SPARK-2686 Add Length and Strlen support to Sp...

2014-07-29 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/1586#discussion_r15516740
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringOperations.scala ---
@@ -209,6 +212,82 @@ case class EndsWith(left: Expression, right: Expression)
 }

 /**
+ * A function that returns the number of bytes in an expression
+ */
+case class Length(child: Expression) extends UnaryExpression {
+
+  type EvaluatedType = Any
+
+  override def dataType = IntegerType
+
+  override def foldable = child.foldable
+
+  override def nullable = child.nullable
+
+  override def toString = s"Length($child)"
+
+  override def eval(input: Row): EvaluatedType = {
+    val string = child.eval(input)
+    if (string == null) {
+      null
+    } else if (!string.isInstanceOf[String]) {
+      string.toString.length
+    } else {
+      new String(string.toString.getBytes, StrlenConstants.DefaultEncoding).length
--- End diff --

Hive's `Length` seems to return the number of code points of the string 
like 
[this](https://github.com/apache/hive/blob/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/UDFLength.java#L41-L56).
I think we can use `String#codePointCount` here.




[GitHub] spark pull request: SPARK-2686 Add Length and Strlen support to Sp...

2014-07-29 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/1586#discussion_r15517240
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringOperations.scala ---
@@ -209,6 +212,82 @@ case class EndsWith(left: Expression, right: Expression)
 }

 /**
+ * A function that returns the number of bytes in an expression
+ */
+case class Length(child: Expression) extends UnaryExpression {
+
+  type EvaluatedType = Any
+
+  override def dataType = IntegerType
+
+  override def foldable = child.foldable
+
+  override def nullable = child.nullable
+
+  override def toString = s"Length($child)"
+
+  override def eval(input: Row): EvaluatedType = {
+    val string = child.eval(input)
+    if (string == null) {
+      null
+    } else if (!string.isInstanceOf[String]) {
+      string.toString.length
--- End diff --

I'm not sure we should handle non-string types.
In my opinion, we should handle only the string type (and the binary type, the 
same as Hive?), and non-string values should be cast or converted to the string 
type before this.

@marmbrus, what do you think about this?




[GitHub] spark pull request: SPARK-2686 Add Length and Strlen support to Sp...

2014-07-29 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/1586#discussion_r15517292
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringOperations.scala ---
@@ -209,6 +212,82 @@ case class EndsWith(left: Expression, right: Expression)
 }

 /**
+ * A function that returns the number of bytes in an expression
+ */
+case class Length(child: Expression) extends UnaryExpression {
+
+  type EvaluatedType = Any
+
+  override def dataType = IntegerType
+
+  override def foldable = child.foldable
+
+  override def nullable = child.nullable
+
+  override def toString = s"Length($child)"
+
+  override def eval(input: Row): EvaluatedType = {
+    val string = child.eval(input)
+    if (string == null) {
+      null
+    } else if (!string.isInstanceOf[String]) {
+      string.toString.length
+    } else {
+      new String(string.toString.getBytes, StrlenConstants.DefaultEncoding).length
+    }
+  }
+
+}
+
+object StrlenConstants {
+  val DefaultEncoding = "ISO-8859-1"
+}
+
+/**
+ * A function that returns the number of characters in a string expression
+ */
+case class Strlen(child: Expression, encoding: Expression) extends UnaryExpression
+  with Logging {
+
+  type EvaluatedType = Any
+
+  override def dataType = IntegerType
+
+  override def foldable = child.foldable
+
+  override def nullable = true
+
+  override def toString = s"Strlen($child, $encoding)"
+
+  override def eval(input: Row): EvaluatedType = {
+    val string = child.eval(input)
+    if (string == null) {
+      null
+    } else if (!string.isInstanceOf[String]) {
+      log.debug(s"Non-string value [$string] provided to strlen")
+      null
+    } else {
+      var evalEncoding = encoding.eval(input)
+      val strEncoding =
+        if (evalEncoding != null) {
+          evalEncoding.toString
+        } else {
+          StrlenConstants.DefaultEncoding
+        }
+      val s: String =
--- End diff --

Could be removed?




[GitHub] spark pull request: SPARK-2686 Add Length and Strlen support to Sp...

2014-07-29 Thread ueshin
Github user ueshin commented on the pull request:

https://github.com/apache/spark/pull/1586#issuecomment-50465289
  
@javadba I couldn't understand what you want `Strlen` to return.
Could you clarify the semantics of `Strlen` again, please?




[GitHub] spark pull request: SPARK-2686 Add Length and Strlen support to Sp...

2014-07-29 Thread ueshin
Github user ueshin commented on the pull request:

https://github.com/apache/spark/pull/1586#issuecomment-50554217
  
Hi, the helper `len()` is not needed.
What Hive's `Length` does is calculate the number of code points, 
which can be done with `String#codePointCount`.




[GitHub] spark pull request: SPARK-2686 Add Length and Strlen support to Sp...

2014-07-29 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/1586#discussion_r15560814
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringOperations.scala ---
@@ -209,6 +212,82 @@ case class EndsWith(left: Expression, right: Expression)
 }

 /**
+ * A function that returns the number of bytes in an expression
+ */
+case class Length(child: Expression) extends UnaryExpression {
+
+  type EvaluatedType = Any
+
+  override def dataType = IntegerType
+
+  override def foldable = child.foldable
+
+  override def nullable = child.nullable
+
+  override def toString = s"Length($child)"
+
+  override def eval(input: Row): EvaluatedType = {
+    val string = child.eval(input)
+    if (string == null) {
+      null
+    } else if (!string.isInstanceOf[String]) {
+      string.toString.length
+    } else {
+      new String(string.toString.getBytes, StrlenConstants.DefaultEncoding).length
+    }
+  }
+
+}
+
+object StrlenConstants {
+  val DefaultEncoding = "ISO-8859-1"
+}
+
+/**
+ * A function that returns the number of characters in a string expression
+ */
+case class Strlen(child: Expression, encoding: Expression) extends UnaryExpression
+  with Logging {
+
+  type EvaluatedType = Any
+
+  override def dataType = IntegerType
+
+  override def foldable = child.foldable
+
+  override def nullable = true
+
+  override def toString = s"Strlen($child, $encoding)"
+
+  override def eval(input: Row): EvaluatedType = {
+    val string = child.eval(input)
+    if (string == null) {
+      null
+    } else if (!string.isInstanceOf[String]) {
+      log.debug(s"Non-string value [$string] provided to strlen")
+      null
+    } else {
+      var evalEncoding = encoding.eval(input)
+      val strEncoding =
+        if (evalEncoding != null) {
+          evalEncoding.toString
+        } else {
+          StrlenConstants.DefaultEncoding
+        }
+      val s: String =
+        try {
+          new String(string.asInstanceOf[String].getBytes, strEncoding).length
+        } catch {
+          case ue: UnsupportedEncodingException => {
+            log.debug(s"strlen: Caught UnsupportedEncodingException for encoding=[$strEncoding]")
--- End diff --

I think the first one is also not data-dependent, because the `child` has to 
return a value of the type corresponding to `child.dataType`, e.g. if 
`child.dataType` is `StringType`, the `child` has to return only `String` values.

BTW, if you roll back to throwing `UnsupportedEncodingException`, please also 
roll back the `def nullable` I mentioned before.




[GitHub] spark pull request: [SPARK-2054][SQL] Code Generation for Expressi...

2014-07-30 Thread ueshin
Github user ueshin commented on the pull request:

https://github.com/apache/spark/pull/993#issuecomment-50578802
  
Hi @marmbrus, thanks for the great work!
But it seems to break the build.

I got the following result when I ran `sbt assembly` or `sbt publish-local`:

```
[error] (catalyst/compile:doc) Scaladoc generation failed
```

and I found a lot of error messages in the build log saying `value q is not 
a member of StringContext`.
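
For context, my understanding of the cause (an assumption, not verified against this build): the `q` string interpolator comes from scala-reflect's universe, and on Scala 2.10 it additionally requires the macro-paradise compiler plugin, so a missing dependency or plugin surfaces as exactly this error:

```scala
// Without this import in scope (and the macro-paradise plugin on Scala 2.10),
// q"..." fails with: value q is not a member of StringContext.
import scala.reflect.runtime.universe._

val tree = q"a + b" // builds a syntax tree for the expression a + b
```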




[GitHub] spark pull request: SPARK-2686 Add Length and Strlen support to Sp...

2014-07-30 Thread ueshin
Github user ueshin commented on the pull request:

https://github.com/apache/spark/pull/1586#issuecomment-50637172
  
Hi @javadba, FYI.
I believe there are 3 types of "length" for strings in Java/Scala.

1) the number of 16-bit characters in the string
To get this, use 
[`String#length`](http://docs.oracle.com/javase/7/docs/api/java/lang/String.html#length())
 like:

```scala
scala> "\uF93D\uF936\uF949\uF942".length // chinese characters
res0: Int = 4

scala> "\uD840\uDC0B\uD842\uDFB7".length // 2 surrogate pairs
res1: Int = 4

scala> "1234567890ABC".length
res2: Int = 13
```

2) the number of code points in the string
This will be covered by `Length`.
To get this, use 
[`String#codePointCount`](http://docs.oracle.com/javase/7/docs/api/java/lang/String.html#codePointCount(int,%20int))
 like:

```scala
scala> "\uF93D\uF936\uF949\uF942".codePointCount(0, 4) // chinese characters
res0: Int = 4

scala> "\uD840\uDC0B\uD842\uDFB7".codePointCount(0, 4) // 2 surrogate pairs
res1: Int = 2

scala> "1234567890ABC".codePointCount(0, 13)
res2: Int = 13
```

3) the length of the byte array encoded from the string in some charset
To get this, use 
[`String#getBytes(charset)`](http://docs.oracle.com/javase/7/docs/api/java/lang/String.html#getBytes(java.lang.String))`.length`
 like:

```scala
scala> "\uF93D\uF936\uF949\uF942".getBytes("utf8").length // chinese characters
res0: Int = 12

scala> "\uD840\uDC0B\uD842\uDFB7".getBytes("utf8").length // 2 surrogate pairs
res1: Int = 8

scala> "1234567890ABC".getBytes("utf8").length
res2: Int = 13

scala> "\uF93D\uF936\uF949\uF942".getBytes("utf16").length // chinese characters
res3: Int = 10

scala> "\uD840\uDC0B\uD842\uDFB7".getBytes("utf16").length // 2 surrogate pairs
res4: Int = 10

scala> "1234567890ABC".getBytes("utf16").length
res5: Int = 28

scala> "\uF93D\uF936\uF949\uF942".getBytes("utf32").length // chinese characters
res6: Int = 16

scala> "\uD840\uDC0B\uD842\uDFB7".getBytes("utf32").length // 2 surrogate pairs
res7: Int = 8

scala> "1234567890ABC".getBytes("utf32").length
res8: Int = 52
```

At first I guessed you wanted 3) for `Strlen`, because only 3) is 
charset-related, but then I saw a conversation indicating another type of 
length and lost track halfway.




[GitHub] spark pull request: SPARK-2686 Add Length and Strlen support to Sp...

2014-08-01 Thread ueshin
Github user ueshin commented on the pull request:

https://github.com/apache/spark/pull/1586#issuecomment-50858955
  
Oops, I had forgotten that Hive's `Length` can handle the binary type.
It would be better to use `Length` instead of `CharLength` and make it 
handle the binary type, for example as sketched below.
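
A hedged sketch of that suggestion (my own illustration, not code from this PR):

```scala
// Sketch only: like Hive, count code points for string input
// and raw bytes for binary input.
def length(value: Any): Int = value match {
  case s: String      => s.codePointCount(0, s.length)
  case b: Array[Byte] => b.length
}
```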




[GitHub] spark pull request: SPARK-2686 Add Length and Strlen support to Sp...

2014-08-01 Thread ueshin
Github user ueshin commented on the pull request:

https://github.com/apache/spark/pull/1586#issuecomment-50947564
  
Hi @javadba, you said some of them differ from my results, but which one is 
different?
As far as I can see, these tests check exactly the same things as my results.
Are some of them failing?




[GitHub] spark pull request: SPARK-2686 Add Length and Strlen support to Sp...

2014-08-01 Thread ueshin
Github user ueshin commented on the pull request:

https://github.com/apache/spark/pull/1586#issuecomment-50947924
  
Ah, I see. Thanks.




[GitHub] spark pull request: SPARK-2686 Add Length and Strlen support to Sp...

2014-08-01 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/1586#discussion_r15725455
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringOperations.scala ---
@@ -208,6 +211,96 @@ case class EndsWith(left: Expression, right: Expression)
   def compare(l: String, r: String) = l.endsWith(r)
 }

+
+/**
+ * A function that returns the number of bytes in an expression
+ */
+case class Length(child: Expression) extends UnaryExpression {
+
+  type EvaluatedType = Any
+
+  override def dataType = IntegerType
+
+  override def foldable = child.foldable
+
+  override def nullable = child.nullable
+
+  override def toString = s"Length($child)"
+
+  override def eval(input: Row): EvaluatedType = {
+    val inputVal = child.eval(input)
+    if (inputVal == null) {
+      null
+    } else if (!inputVal.isInstanceOf[String]) {
+      inputVal.toString.length
+    } else {
+      OctetLenUtils.len(inputVal.asInstanceOf[String])
--- End diff --

We can just use `String#codePointCount`, which does the same thing as 
Hive's implementation.

The `OctetLenUtils.len` implementation is not the same as Hive's. Hive passes 
an `o.a.h.io.Text` to the method, whose `getBytes()` returns UTF-8 bytes, but 
`java.lang.String#getBytes` returns platform-dependent bytes.

The reason Hive checks byte by byte here is to avoid decoding the string from 
its UTF-8 bytes.
Encoding/decoding is heavy, so we should do it as little as possible.
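
A small sketch of the byte-wise counting being described (my paraphrase of Hive's loop, assuming valid UTF-8 input): continuation bytes match the bit pattern `10xxxxxx`, so counting only the non-continuation bytes gives the number of code points without decoding.

```scala
// Sketch: count code points directly over UTF-8 bytes, with no String decode.
// A byte starts a code point unless (b & 0xC0) == 0x80 (a continuation byte).
def codePointCount(utf8: Array[Byte]): Int =
  utf8.count(b => (b & 0xC0) != 0x80)
```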




[GitHub] spark pull request: SPARK-2686 Add Length and Strlen support to Sp...

2014-08-01 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/1586#discussion_r15725507
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringOperations.scala ---
@@ -208,6 +211,96 @@ case class EndsWith(left: Expression, right: Expression)
   def compare(l: String, r: String) = l.endsWith(r)
 }

+
+/**
+ * A function that returns the number of bytes in an expression
+ */
+case class Length(child: Expression) extends UnaryExpression {
+
+  type EvaluatedType = Any
+
+  override def dataType = IntegerType
+
+  override def foldable = child.foldable
+
+  override def nullable = child.nullable
+
+  override def toString = s"Length($child)"
+
+  override def eval(input: Row): EvaluatedType = {
+    val inputVal = child.eval(input)
+    if (inputVal == null) {
+      null
+    } else if (!inputVal.isInstanceOf[String]) {
+      inputVal.toString.length
+    } else {
+      OctetLenUtils.len(inputVal.asInstanceOf[String])
+    }
+  }
+
+}
+
+object OctetLengthConstants {
+  val DefaultEncoding = "ISO-8859-1"
--- End diff --

Should be `UTF-8`?




[GitHub] spark pull request: SPARK-2686 Add Length and Strlen support to Sp...

2014-08-01 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/1586#discussion_r15726561
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringOperations.scala ---
@@ -208,6 +211,96 @@ case class EndsWith(left: Expression, right: Expression)
   def compare(l: String, r: String) = l.endsWith(r)
 }

+
+/**
+ * A function that returns the number of bytes in an expression
+ */
+case class Length(child: Expression) extends UnaryExpression {
+
+  type EvaluatedType = Any
+
+  override def dataType = IntegerType
+
+  override def foldable = child.foldable
+
+  override def nullable = child.nullable
+
+  override def toString = s"Length($child)"
+
+  override def eval(input: Row): EvaluatedType = {
+    val inputVal = child.eval(input)
+    if (inputVal == null) {
+      null
+    } else if (!inputVal.isInstanceOf[String]) {
+      inputVal.toString.length
+    } else {
+      OctetLenUtils.len(inputVal.asInstanceOf[String])
--- End diff --

The usage is as follows:

```scala
str.codePointCount(0, str.length)
```




[GitHub] spark pull request: SPARK-2813: [SQL] Implement SQRT() directly in...

2014-08-03 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/1750#discussion_r15740705
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala ---
@@ -33,6 +33,19 @@ case class UnaryMinus(child: Expression) extends UnaryExpression {
   }
 }

+case class Sqrt(child: Expression) extends UnaryExpression {
+  type EvaluatedType = Double
+
+  def dataType = child.dataType
+  override def foldable = child.foldable
+  def nullable = child.nullable
+  override def toString = s"SQRT($child)"
+
+  override def eval(input: Row): Double = {
+    math.sqrt(n1(child, input, _.toDouble(_)).asInstanceOf[Double])
--- End diff --

Hi, I'm wondering whether this will return `null` for a `null` input.





[GitHub] spark pull request: SPARK-2686 Add Length and OctetLen support to ...

2014-08-04 Thread ueshin
Github user ueshin commented on the pull request:

https://github.com/apache/spark/pull/1586#issuecomment-51145793
  
Hi @javadba, I tested `org.apache.spark.sql.SQLQuerySuite` and 
`org.apache.spark.sql.hive.execution.HiveQuerySuite` locally, and they worked 
fine even if I reverted the last commit 22eddbc.





[GitHub] spark pull request: SPARK-2686 Add Length and OctetLen support to ...

2014-08-05 Thread ueshin
Github user ueshin commented on the pull request:

https://github.com/apache/spark/pull/1586#issuecomment-51179894
  
@javadba, @marmbrus 
I saw the SOF case sometimes, though it was not with @javadba's sequence.
I can't identify the exact reason now, but I guess it is not related to 
@javadba's commits and is a more general problem.
I'll file a new issue on JIRA with some sequences to reproduce the problem.

I'd like to know the result of the Jenkins build without the last commit 
22eddbc.
If it succeeds, we should use OCTET/CHAR_LENGTH, or if it fails, we could use 
OCTET/CHAR_LEN for now, I think.





[GitHub] spark pull request: [SPARK-2965][SQL] Fix HashOuterJoin output nul...

2014-08-11 Thread ueshin
GitHub user ueshin opened a pull request:

https://github.com/apache/spark/pull/1887

[SPARK-2965][SQL] Fix HashOuterJoin output nullabilities.

Output attributes of the opposite side of an `OuterJoin` should be nullable; a sketch follows.
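
A minimal sketch of the rule (assuming catalyst's `Attribute#withNullability`; the actual patch may differ in detail):

```scala
import org.apache.spark.sql.catalyst.expressions.Attribute
import org.apache.spark.sql.catalyst.plans.{FullOuter, JoinType, LeftOuter, RightOuter}

// Sketch only: for a left outer join the right side's attributes become
// nullable, for a right outer join the left side's, for full outer both.
def outputFor(joinType: JoinType,
              left: Seq[Attribute],
              right: Seq[Attribute]): Seq[Attribute] = joinType match {
  case LeftOuter  => left ++ right.map(_.withNullability(true))
  case RightOuter => left.map(_.withNullability(true)) ++ right
  case FullOuter  =>
    left.map(_.withNullability(true)) ++ right.map(_.withNullability(true))
  case _ => left ++ right
}
```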

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ueshin/apache-spark issues/SPARK-2965

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/1887.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1887


commit bcb2d37e178a58e84aea1735985b649245b878a0
Author: Takuya UESHIN ues...@happy-camper.st
Date:   2014-08-11T08:44:38Z

Fix HashOuterJoin output nullabilities.







[GitHub] spark pull request: [SPARK-2969][SQL] Make ScalaReflection be able...

2014-08-11 Thread ueshin
GitHub user ueshin opened a pull request:

https://github.com/apache/spark/pull/1889

[SPARK-2969][SQL] Make ScalaReflection be able to handle 
MapType.containsNull and MapType.valueContainsNull.

Make `ScalaReflection` able to handle types like the following (see the sketch after the list):

- `Seq[Int]` as `ArrayType(IntegerType, containsNull = false)`
- `Seq[java.lang.Integer]` as `ArrayType(IntegerType, containsNull = true)`
- `Map[Int, Long]` as `MapType(IntegerType, LongType, valueContainsNull = 
false)`
- `Map[Int, java.lang.Long]` as `MapType(IntegerType, LongType, 
valueContainsNull = true)`
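
A hedged usage sketch of the mapping above (assuming catalyst's `ScalaReflection.schemaFor`; the exact return shape may differ by version):

```scala
import org.apache.spark.sql.catalyst.ScalaReflection

// A boxed value type signals that null values are allowed.
ScalaReflection.schemaFor[Map[Int, java.lang.Long]]
// expected: MapType(IntegerType, LongType, valueContainsNull = true)

ScalaReflection.schemaFor[Map[Int, Long]]
// expected: MapType(IntegerType, LongType, valueContainsNull = false)
```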

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ueshin/apache-spark issues/SPARK-2969

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/1889.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1889


commit 1a9a96bf752dc33d19cb3654e05bc35c28b9ae08
Author: Takuya UESHIN ues...@happy-camper.st
Date:   2014-08-11T10:46:24Z

Modify ScalaReflection to handle ArrayType.containsNull and 
MapType.valueContainsNull.







[GitHub] spark pull request: [SPARK-2969][SQL] Make ScalaReflection be able...

2014-08-11 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/1889#discussion_r16094328
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/types/dataTypes.scala ---
@@ -372,7 +372,7 @@ object MapType {
    * The `valueContainsNull` is true.
    */
   def apply(keyType: DataType, valueType: DataType): MapType =
-    MapType(keyType: DataType, valueType: DataType, true)
+    MapType(keyType: DataType, valueType: DataType, false)
--- End diff --

Oops, I forgot to change the doc.

I think the default value should be `false`, because non-nullable by default 
is more natural.
For example, when we think of a `Map[Int, Long]` variable, it can't contain a 
`null` value.
If we want to add a `null` value to the map, we have to declare the variable 
as `Map[Int, Any]`, `Map[Int, java.lang.Long]`, or something like that.
It is the same when thinking about the data type.

And this is the same as the default value of `ArrayType.containsNull`.






[GitHub] spark pull request: [SPARK-2969][SQL] Make ScalaReflection be able...

2014-08-12 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/1889#discussion_r16096763
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/types/dataTypes.scala ---
@@ -372,7 +372,7 @@ object MapType {
    * The `valueContainsNull` is true.
    */
   def apply(keyType: DataType, valueType: DataType): MapType =
-    MapType(keyType: DataType, valueType: DataType, true)
+    MapType(keyType: DataType, valueType: DataType, false)
--- End diff --

Ah, I see. I agree that it is dangerous.
I'll revert the change.
And should I change `ArrayType`'s default value as well?





[GitHub] spark pull request: [SPARK-2969][SQL] Make ScalaReflection be able...

2014-08-12 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/1889#discussion_r16097181
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/types/dataTypes.scala ---
@@ -372,7 +372,7 @@ object MapType {
    * The `valueContainsNull` is true.
    */
   def apply(keyType: DataType, valueType: DataType): MapType =
-    MapType(keyType: DataType, valueType: DataType, true)
+    MapType(keyType: DataType, valueType: DataType, false)
--- End diff --

Sure!





[GitHub] spark pull request: SPARK-3969 Optimizer should have a super class...

2014-10-16 Thread ueshin
GitHub user ueshin opened a pull request:

https://github.com/apache/spark/pull/2825

SPARK-3969 Optimizer should have a super class as an interface.

Some developers want to replace `Optimizer` to fit their projects, but they 
can't do so because `Optimizer` is currently an `object`. A sketch of the 
proposed extension point follows.
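
A minimal sketch of what the change enables (`MyRule` and `MyOptimizer` are hypothetical names, not part of this PR):

```scala
import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
import org.apache.spark.sql.catalyst.rules.{Rule, RuleExecutor}

// The extension point: an abstract class instead of a singleton object.
abstract class Optimizer extends RuleExecutor[LogicalPlan]

// A hypothetical project-specific rule (a no-op here).
object MyRule extends Rule[LogicalPlan] {
  def apply(plan: LogicalPlan): LogicalPlan = plan
}

// Projects can now supply their own optimizer with custom rule batches.
object MyOptimizer extends Optimizer {
  val batches = Batch("My Rules", FixedPoint(100), MyRule) :: Nil
}
```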

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ueshin/apache-spark issues/SPARK-3969

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/2825.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2825


commit 9547a23fd5ca0058d1044d19f6a96bdbb1e3b810
Author: Takuya UESHIN ues...@happy-camper.st
Date:   2014-10-16T10:52:19Z

Extract abstract class from Optimizer for developers to be able to replace 
Optimizer.







[GitHub] spark pull request: [SPARK-3969][SQL] Optimizer should have a supe...

2014-10-16 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/2825#discussion_r18995386
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
 ---
@@ -28,7 +28,9 @@ import org.apache.spark.sql.catalyst.plans.logical._
 import org.apache.spark.sql.catalyst.rules._
 import org.apache.spark.sql.catalyst.types._
 
-object Optimizer extends RuleExecutor[LogicalPlan] {
+abstract class Optimizer extends RuleExecutor[LogicalPlan]
+
+object Optimizer extends Optimizer {
--- End diff --

Hi @srowen, thank you for your comment.
I'll rename the `object`.
And yes, this PR is a very small API change.





[GitHub] spark pull request: [SPARK-3969][SQL] Optimizer should have a supe...

2014-10-16 Thread ueshin
Github user ueshin commented on the pull request:

https://github.com/apache/spark/pull/2825#issuecomment-59461447
  
Hi @chenghao-intel, thank you for your comment.
Yes, that's right. I don't want to mix logical-plan and physical-plan 
optimization, and I'll extend `SparkStrategies` if I need to.
I want to add some `Expression`s for my projects and optimize logical plans 
with them, but right now we can't replace `Optimizer` because it is an 
`object`, so I want to add an extension point for `Optimizer`.





[GitHub] spark pull request: [SPARK-3986][SQL] Fix package names to fit the...

2014-10-16 Thread ueshin
GitHub user ueshin opened a pull request:

https://github.com/apache/spark/pull/2835

[SPARK-3986][SQL] Fix package names to fit their directory names.

Package names of 2 test suites are different from their directory names.
- `GeneratedEvaluationSuite`
- `GeneratedMutableEvaluationSuite`

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ueshin/apache-spark issues/SPARK-3986

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/2835.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2835


commit fa2cc05be1e40b7558f5137c294a4fde683ec318
Author: Takuya UESHIN ues...@happy-camper.st
Date:   2014-10-17T03:35:48Z

Fix package names to fit their directory names.







[GitHub] spark pull request: [SPARK-3969][SQL] Optimizer should have a supe...

2014-10-17 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/2825#discussion_r19050063
  
--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/ExpressionOptimizationSuite.scala ---
@@ -30,7 +30,7 @@ class ExpressionOptimizationSuite extends ExpressionEvaluationSuite {
       expected: Any,
       inputRow: Row = EmptyRow): Unit = {
     val plan = Project(Alias(expression, s"Optimized($expression)")() :: Nil, NoRelation)
-    val optimizedPlan = Optimizer(plan)
+    val optimizedPlan = SparkOptimizer(plan)
     super.checkEvaluation(optimizedPlan.expressions.head, expected, inputRow)
   }
 }
--- End diff --

Of course not. I'll add a new line.





[GitHub] spark pull request: [SPARK-3969][SQL] Optimizer should have a supe...

2014-10-17 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/2825#discussion_r19050137
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
 ---
@@ -28,7 +28,9 @@ import org.apache.spark.sql.catalyst.plans.logical._
 import org.apache.spark.sql.catalyst.rules._
 import org.apache.spark.sql.catalyst.types._
 
-object Optimizer extends RuleExecutor[LogicalPlan] {
+abstract class Optimizer extends RuleExecutor[LogicalPlan]
+
+object SparkOptimizer extends Optimizer {
--- End diff --

Thank you for your suggestion.
I agree that Catalyst should not be tightly coupled with Spark.





[GitHub] spark pull request: [SPARK-2052] [SQL] Add optimization for CaseCo...

2014-06-09 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/990#discussion_r13576056
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
@@ -154,6 +155,10 @@ object NullPropagation extends Rule[LogicalPlan] {
         case left :: Literal(null, _) :: Nil => Literal(null, e.dataType)
         case _ => e
       }
+      case e: CaseConversionExpression => e.children match {
--- End diff --

Fixed the two points.
Should I remove the mentioned lines?




[GitHub] spark pull request: [SPARK-2052] [SQL] Add optimization for CaseCo...

2014-06-09 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/990#discussion_r13576722
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
@@ -154,6 +155,10 @@ object NullPropagation extends Rule[LogicalPlan] {
         case left :: Literal(null, _) :: Nil => Literal(null, e.dataType)
         case _ => e
       }
+      case e: CaseConversionExpression => e.children match {
--- End diff --

Ok, I will.
BTW, I noticed that there are some Expressions in the same situation, like 
`UnaryMinus`, `Cast`, or `Not` (or `BinaryArithmetic`, `BinaryComparison`, and 
`StringRegexExpression` too?).
I think they can also be handled by constant folding, for example as sketched 
below.
What do you think?
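
A hedged sketch of the rule shape I mean, mirroring the `NullPropagation` cases above (an illustration, not a tested patch):

```scala
import org.apache.spark.sql.catalyst.expressions._
import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
import org.apache.spark.sql.catalyst.rules.Rule

// Sketch: an expression that is null whenever its child is null
// can be folded to a null literal of the proper type.
object ExtraNullPropagation extends Rule[LogicalPlan] {
  def apply(plan: LogicalPlan): LogicalPlan = plan transform {
    case q: LogicalPlan => q transformExpressionsUp {
      case e @ UnaryMinus(Literal(null, _)) => Literal(null, e.dataType)
      case e @ Not(Literal(null, _))        => Literal(null, e.dataType)
    }
  }
}
```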




[GitHub] spark pull request: [SPARK-2052] [SQL] Add optimization for CaseCo...

2014-06-10 Thread ueshin
Github user ueshin commented on the pull request:

https://github.com/apache/spark/pull/990#issuecomment-45582389
  
@marmbrus I checked and removed the unneeded rules from `NullPropagation`.
Could you please check the changes?




[GitHub] spark pull request: [SPARK-2093] [SQL] NullPropagation should use ...

2014-06-10 Thread ueshin
GitHub user ueshin opened a pull request:

https://github.com/apache/spark/pull/1034

[SPARK-2093] [SQL] NullPropagation should use exact type value.

`NullPropagation` should use the exact type value when transforming `Count` 
or `Sum`.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ueshin/apache-spark issues/SPARK-2093

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/1034.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1034


commit 9314806cf03f7a4222a67f8b285c4225bdef3083
Author: Takuya UESHIN ues...@happy-camper.st
Date:   2014-06-10T07:55:53Z

Fix NullPropagation to use exact type value.






[GitHub] spark pull request: [SPARK-2093] [SQL] NullPropagation should use ...

2014-06-10 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/1034#discussion_r13584575
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
 ---
@@ -100,8 +100,8 @@ object ColumnPruning extends Rule[LogicalPlan] {
 object NullPropagation extends Rule[LogicalPlan] {
   def apply(plan: LogicalPlan): LogicalPlan = plan transform {
 case q: LogicalPlan => q transformExpressionsUp {
-  case e @ Count(Literal(null, _)) => Literal(0, e.dataType)
-  case e @ Sum(Literal(c, _)) if c == 0 => Literal(0, e.dataType)
+  case e @ Count(Literal(null, _)) => Literal(0L, e.dataType)
--- End diff --

I agree with that.
Even if we never change/specialize the return type, `SimplifyCasts` can 
remove the extra `Cast`.

And I'd like to change the `Literal` value for `Sum` to `0L` as well, so that 
`SimplifyCasts` also applies to the sum of integral values.
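
A standalone sketch of the interplay being described (hypothetical miniature 
types, not the real Catalyst rule): once `Count` folds to a literal of exactly 
its own return type, an enclosing `Cast` to that same type becomes a no-op 
that a `SimplifyCasts`-style rule can drop.

```
// Hypothetical miniature types; the real rule lives in Catalyst's Optimizer.
sealed trait DataType
case object IntegerType extends DataType
case object LongType extends DataType

sealed trait Expr { def dataType: DataType }
case class Lit(value: Any, dataType: DataType) extends Expr
case class Cast(child: Expr, dataType: DataType) extends Expr

// A cast is redundant when the child already has the target type.
def simplifyCast(e: Expr): Expr = e match {
  case Cast(child, dt) if child.dataType == dt => child
  case other => other
}

// Count(null) folded to a LongType literal keeps Cast(..., LongType) removable:
assert(simplifyCast(Cast(Lit(0L, LongType), LongType)) == Lit(0L, LongType))
```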




[GitHub] spark pull request: [SPARK-2052] [SQL] Add optimization for CaseCo...

2014-06-11 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/990#discussion_r13635448
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/ConstantFoldingSuite.scala
 ---
@@ -173,4 +173,63 @@ class ConstantFoldingSuite extends OptimizerTest {
 
 comparePlans(optimized, correctAnswer)
   }
+
+  test("Constant folding test: expressions have null literals") {
+val originalQuery =
+  testRelation
+.select(
+  IsNull(Literal(null)) as 'c1,
+  IsNotNull(Literal(null)) as 'c2,
+
+  GetItem(Literal(null, ArrayType(IntegerType)), 1) as 'c3,
--- End diff --

Ah, I see, that's right!
I'll move them back.




[GitHub] spark pull request: [SPARK-2196] [SQL] Fix nullability of CaseWhen...

2014-06-19 Thread ueshin
GitHub user ueshin opened a pull request:

https://github.com/apache/spark/pull/1133

[SPARK-2196] [SQL] Fix nullability of CaseWhen.

`CaseWhen` should use `branches.length` to check if `elseValue` is provided 
or not.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ueshin/apache-spark issues/SPARK-2196

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/1133.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1133


commit 4f049cc94954021aab34e2dac54b985da56a23ea
Author: Takuya UESHIN ues...@happy-camper.st
Date:   2014-06-19T09:49:33Z

Fix nullability of CaseWhen.






[GitHub] spark pull request: [SPARK-2196] [SQL] Fix nullability of CaseWhen...

2014-06-19 Thread ueshin
Github user ueshin commented on the pull request:

https://github.com/apache/spark/pull/1133#issuecomment-46633857
  
Added some tests, and I noticed that `CaseWhen` should also be nullable 
if the `elseValue` is nullable.
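
A standalone sketch of the combined rule (hypothetical miniature types, 
assuming the `branches` layout cond1, val1, cond2, val2, ..., with an optional 
trailing else value):

```
case class Expr(nullable: Boolean) // hypothetical stand-in for an expression

case class CaseWhen(branches: Seq[Expr]) {
  // Value expressions sit at odd indices; a trailing element at an even
  // index (odd branches.length) is the elseValue.
  private def values = branches.zipWithIndex.collect { case (e, i) if i % 2 == 1 => e }
  private def elseValue = if (branches.length % 2 == 0) None else Some(branches.last)

  // Nullable when any value branch is nullable, when there is no elseValue
  // (a non-matching row yields NULL), or when the elseValue is nullable.
  def nullable: Boolean = values.exists(_.nullable) || elseValue.forall(_.nullable)
}

assert(CaseWhen(Seq(Expr(false), Expr(false))).nullable)               // no else
assert(!CaseWhen(Seq(Expr(false), Expr(false), Expr(false))).nullable) // non-nullable else
```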




[GitHub] spark pull request: [SPARK-2254] [SQL] ScalaReflection should mark ...

2014-06-24 Thread ueshin
GitHub user ueshin opened a pull request:

https://github.com/apache/spark/pull/1193

[SPARK-2254] [SQL] ScalaReflection should mark primitive types as non-nullable.
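
A standalone sketch of the idea (not the actual `ScalaReflection` code): 
`schemaFor` pairs each data type with a nullability flag, and types that erase 
to JVM primitives can never hold `null`, so they are marked non-nullable.

```
import scala.reflect.runtime.universe._

case class Schema(dataType: String, nullable: Boolean) // hypothetical result type

def schemaFor[T: TypeTag]: Schema = typeOf[T] match {
  // AnyVal subtypes (Int, Long, Double, Boolean, ...) cannot be null.
  case t if t <:< typeOf[AnyVal] => Schema(t.toString, nullable = false)
  case t                         => Schema(t.toString, nullable = true)
}

assert(schemaFor[Int] == Schema("Int", nullable = false))
assert(schemaFor[String] == Schema("String", nullable = true))
```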



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ueshin/apache-spark issues/SPARK-2254

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/1193.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1193


commit cfd6088e4d348e9680de46437959d642336f2db0
Author: Takuya UESHIN ues...@happy-camper.st
Date:   2014-06-24T07:09:36Z

Modify ScalaReflection.schemaFor method to return nullability of Scala Type.






[GitHub] spark pull request: [SPARK-2287] [SQL] Make ScalaReflection be abl...

2014-06-26 Thread ueshin
GitHub user ueshin opened a pull request:

https://github.com/apache/spark/pull/1226

[SPARK-2287] [SQL] Make ScalaReflection be able to handle Generic case 
classes.
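
A standalone sketch of what handling generic case classes requires 
(hypothetical helper, not the actual `ScalaReflection` code): the concrete 
field types of `Pair[Int, String]` are only recoverable from a full `TypeTag`, 
not from the erased `Class[_]`.

```
import scala.reflect.runtime.universe._

case class Pair[A, B](first: A, second: B)

// Resolve constructor parameter types with the actual type arguments
// substituted for the class's declared type parameters.
def constructorParamTypes[T: TypeTag]: List[Type] = {
  val tpe = typeOf[T]
  val ctor = tpe.member(termNames.CONSTRUCTOR).asMethod
  ctor.paramLists.flatten.map(_.typeSignature.asSeenFrom(tpe, tpe.typeSymbol))
}

// Prints List(Int, String) rather than the unresolved List(A, B):
println(constructorParamTypes[Pair[Int, String]])
```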



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ueshin/apache-spark issues/SPARK-2287

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/1226.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1226


commit 7de570641d5bd099ed0e4b68f513aeb2b7ea3f1a
Author: Takuya UESHIN ues...@happy-camper.st
Date:   2014-06-24T08:08:48Z

Make ScalaReflection be able to handle Generic case classes.






[GitHub] spark pull request: [SPARK-2287] [SQL] Make ScalaReflection be abl...

2014-06-26 Thread ueshin
Github user ueshin commented on the pull request:

https://github.com/apache/spark/pull/1226#issuecomment-47192918
  
This will cause a merge conflict with #1193, but I can fix it soon.




[GitHub] spark pull request: [SPARK-2254] [SQL] ScalaReflection should mark ...

2014-06-26 Thread ueshin
Github user ueshin commented on the pull request:

https://github.com/apache/spark/pull/1193#issuecomment-47193596
  
@rxin Thanks! I will fix it soon.




[GitHub] spark pull request: [SPARK-2287] [SQL] Make ScalaReflection be abl...

2014-06-26 Thread ueshin
Github user ueshin commented on the pull request:

https://github.com/apache/spark/pull/1226#issuecomment-47194382
  
Fixed merge conflict with #1193.




[GitHub] spark pull request: [SPARK-2287] [SQL] Make ScalaReflection be abl...

2014-06-26 Thread ueshin
Github user ueshin commented on the pull request:

https://github.com/apache/spark/pull/1226#issuecomment-47206743
  
Could you please retest this?
In the previous tests, something seemed to be wrong with the Hive metastore.
(Can I ask Jenkins to retest?)




[GitHub] spark pull request: [SPARK-2295] [SQL] Make JavaBeans nullability ...

2014-06-26 Thread ueshin
GitHub user ueshin opened a pull request:

https://github.com/apache/spark/pull/1235

[SPARK-2295] [SQL] Make JavaBeans nullability stricter.



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ueshin/apache-spark issues/SPARK-2295

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/1235.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1235


commit 201c508aecb300f9439e3b9537ea3173c7db9ae5
Author: Takuya UESHIN ues...@happy-camper.st
Date:   2014-06-26T17:55:11Z

Make JavaBeans nullability stricter.






[GitHub] spark pull request: [SPARK-2287] [SQL] Make ScalaReflection be abl...

2014-06-26 Thread ueshin
Github user ueshin commented on the pull request:

https://github.com/apache/spark/pull/1226#issuecomment-47298915
  
@marmbrus I see, I will do it next time, thanks.
But why did it fail...




[GitHub] spark pull request: [SPARK-2234][SQL]Spark SQL basicOperators add ...

2014-06-27 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/1151#discussion_r14285252
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/SqlParser.scala ---
@@ -136,13 +137,12 @@ class SqlParser extends StandardTokenParsers with 
PackratParsers {
 }
   }
 
-  protected lazy val query: Parser[LogicalPlan] = (
-select * (
-UNION ~ ALL ^^^ { (q1: LogicalPlan, q2: LogicalPlan) => Union(q1, q2) } |
-UNION ~ opt(DISTINCT) ^^^ { (q1: LogicalPlan, q2: LogicalPlan) => Distinct(Union(q1, q2)) }
-  )
-| insert | cache
-  )
+  protected lazy val query: Parser[LogicalPlan] =
+   select * (
+   UNION ~ ALL ^^^ { (q1: LogicalPlan, q2: LogicalPlan) => Union(q1, q2)} |
+   UNION ~ opt(DISTINCT) ^^^ { (q1: LogicalPlan, q2: LogicalPlan) => Distinct(Union(q1, q2))} |
--- End diff --

Hi @YanjieGao, Jenkins reports a scalastyle error here: "File line length 
exceeds 100 characters".
You need to reformat the code around this line.




[GitHub] spark pull request: [SPARK-2287] [SQL] Make ScalaReflection be abl...

2014-06-27 Thread ueshin
Github user ueshin commented on the pull request:

https://github.com/apache/spark/pull/1226#issuecomment-47328485
  
The Hive tests passed. Why? I just merged master.
And now the Python tests failed...




[GitHub] spark pull request: [SPARK-2287] [SQL] Make ScalaReflection be abl...

2014-06-27 Thread ueshin
Github user ueshin commented on the pull request:

https://github.com/apache/spark/pull/1226#issuecomment-47416531
  
Hi @marmbrus,

I found a case on my local machine that throws the same exception with 
`test-only`, like:

```
testOnly org.apache.spark.sql.hive.execution.PruningSuite
```

even if my patch is not applied.

And I added a line to `PruningSuite`

```
+  TestHive.loadTestTable("src")
  TestHive.reset()
```

then it worked.

Would that be of any help to you?




[GitHub] spark pull request: [SPARK-2287] [SQL] Make ScalaReflection be abl...

2014-06-27 Thread ueshin
Github user ueshin commented on the pull request:

https://github.com/apache/spark/pull/1226#issuecomment-47417440
  
Ah, we must call `SHOW TABLES` before `TestHive.reset()` the same as 
[here](https://github.com/ueshin/apache-spark/blob/issues/SPARK-2287/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveComparisonTest.scala#L249-L250),
 right?




[GitHub] spark pull request: [SPARK-2327] [SQL] Fix nullabilities of Join/G...

2014-06-30 Thread ueshin
GitHub user ueshin opened a pull request:

https://github.com/apache/spark/pull/1266

[SPARK-2327] [SQL] Fix nullabilities of Join/Generate/Aggregate.

Fix the nullabilities of `Join`/`Generate`/`Aggregate` because:
- Output attributes of the opposite side of an `OuterJoin` should be nullable 
(see the sketch after this list).
- Output attributes of the generator side of `Generate` should be nullable if 
`join` is `true` and `outer` is `true`.
- The `AttributeReference`s of `computedAggregates` in `Aggregate` should be the 
same as the `aggregateExpression`'s.
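
Here is a standalone sketch of the first point (hypothetical miniature types): 
the attributes from the side of an outer join that may produce no matching row 
must be forced to `nullable = true`.

```
case class Attribute(name: String, nullable: Boolean) {
  def withNullability(n: Boolean): Attribute = if (n == nullable) this else copy(nullable = n)
}

sealed trait JoinType
case object Inner      extends JoinType
case object LeftOuter  extends JoinType
case object RightOuter extends JoinType
case object FullOuter  extends JoinType

// The non-preserved side can be padded with NULLs, so it must be nullable.
def joinOutput(jt: JoinType, left: Seq[Attribute], right: Seq[Attribute]): Seq[Attribute] =
  jt match {
    case LeftOuter  => left ++ right.map(_.withNullability(true))
    case RightOuter => left.map(_.withNullability(true)) ++ right
    case FullOuter  => (left ++ right).map(_.withNullability(true))
    case Inner      => left ++ right
  }
```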


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ueshin/apache-spark issues/SPARK-2327

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/1266.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1266


commit f20f196e33e4bd6ab4adeb9bb4afea9b9256d902
Author: Takuya UESHIN ues...@happy-camper.st
Date:   2014-06-30T08:43:37Z

Fix Join output nullabilities.

commit 09532ec6a67ba38b63deb6d0ea4607d7e8a07080
Author: Takuya UESHIN ues...@happy-camper.st
Date:   2014-06-30T08:45:10Z

Fix Generate output nullabilities.

commit 0e31e377b4a8ba8c4bf698c2afea6a604a6c25bd
Author: Takuya UESHIN ues...@happy-camper.st
Date:   2014-06-30T08:48:52Z

Fix Aggregate resultAttribute nullabilities.






[GitHub] spark pull request: [SPARK-2328] [SQL] Add execution of `SHOW TABL...

2014-06-30 Thread ueshin
GitHub user ueshin opened a pull request:

https://github.com/apache/spark/pull/1268

[SPARK-2328] [SQL] Add execution of `SHOW TABLES` before `TestHive.reset()`.

Unfortunately, when `PruningSuite` is executed first among the Hive tests, 
`TestHive.reset()` breaks the test environment.
To prevent this, we must run a query before calling reset for the first time.
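
A minimal sketch of the workaround, assuming the `TestHive` API of the time 
(`hql(...)` runs a HiveQL string): execute any cheap query once so that lazy 
initialization completes before the first `reset()`.

```
// You must run a query before calling reset() the first time; otherwise
// reset() runs against a half-initialized TestHive and breaks later tests.
TestHive.hql("SHOW TABLES")
TestHive.reset()
```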

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ueshin/apache-spark issues/SPARK-2328

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/1268.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1268


commit 043ceac67cdd53cb784e83012fca7a91804dae10
Author: Takuya UESHIN ues...@happy-camper.st
Date:   2014-06-30T10:36:18Z

Add execution of `SHOW TABLES` before `TestHive.reset()`.






[GitHub] spark pull request: [SPARK-2327] [SQL] Fix nullabilities of Join/G...

2014-06-30 Thread ueshin
Github user ueshin commented on the pull request:

https://github.com/apache/spark/pull/1266#issuecomment-47518136
  
I guess this needs #1268 to pass the Hive tests.




[GitHub] spark pull request: [SPARK-2328] [SQL] Add execution of `SHOW TABL...

2014-06-30 Thread ueshin
Github user ueshin commented on the pull request:

https://github.com/apache/spark/pull/1268#issuecomment-47518299
  
If #1226 is merged before this one, this PR is not needed.




[GitHub] spark pull request: [SPARK-2327] [SQL] Fix nullabilities of Join/G...

2014-07-01 Thread ueshin
Github user ueshin commented on the pull request:

https://github.com/apache/spark/pull/1266#issuecomment-47635811
  
Oops, the #1268-related errors are gone, but other tests failed...




[GitHub] spark pull request: SPARK-1949. Servlet 2.5 vs 3.0 conflict in SBT...

2014-07-02 Thread ueshin
Github user ueshin commented on the pull request:

https://github.com/apache/spark/pull/906#issuecomment-47743422
  
Hi, I encountered this kind of Servlet API conflict.
When are you planning to merge this? Or is there something left to do?




[GitHub] spark pull request: SPARK-1949. Servlet 2.5 vs 3.0 conflict in SBT...

2014-07-02 Thread ueshin
Github user ueshin commented on the pull request:

https://github.com/apache/spark/pull/906#issuecomment-47804679
  
Close this?
I encountered the conflict after #1271 was merged.
I believe we at least need to exclude it from the Hive-related dependencies.




[GitHub] spark pull request: [SPARK-2327] [SQL] Fix nullabilities of Join/G...

2014-07-03 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/1266#discussion_r14543737
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicOperators.scala
 ---
@@ -46,10 +46,22 @@ case class Generate(
 child: LogicalPlan)
   extends UnaryNode {
 
-  protected def generatorOutput: Seq[Attribute] =
-alias
+  protected def generatorOutput: Seq[Attribute] = {
+val output = alias
   .map(a => generator.output.map(_.withQualifiers(a :: Nil)))
   .getOrElse(generator.output)
+if (join && outer) {
+  output.map {
+case attr if !attr.resolved => attr
+case attr if !attr.nullable =>
+  AttributeReference(
+attr.name, attr.dataType, nullable = true)(attr.exprId, attr.qualifiers)
+case attr => attr
--- End diff --

Ah, I see.
It is the same as `withQualifiers`, right?
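
For the record, a standalone sketch of the simplification agreed on here 
(hypothetical miniature types): a copy-style `withNullability` helper, 
analogous to `withQualifiers`, replaces rebuilding the `AttributeReference` 
by hand.

```
case class Attribute(name: String, nullable: Boolean, qualifiers: Seq[String]) {
  def withNullability(n: Boolean): Attribute = if (n == nullable) this else copy(nullable = n)
  def withQualifiers(q: Seq[String]): Attribute = copy(qualifiers = q)
}

// A `join = true, outer = true` Generate pads missing generator rows with
// NULLs, so every generator-side attribute becomes nullable.
def generatorOutput(output: Seq[Attribute], join: Boolean, outer: Boolean): Seq[Attribute] =
  if (join && outer) output.map(_.withNullability(true)) else output
```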




[GitHub] spark pull request: [SPARK-2327] [SQL] Fix nullabilities of Join/G...

2014-07-03 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/1266#discussion_r14543764
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/joins.scala ---
@@ -319,7 +319,27 @@ case class BroadcastNestedLoopJoin(
 
   override def otherCopyArgs = sqlContext :: Nil
 
-  def output = left.output ++ right.output
+  def output = {
--- End diff --

Ok, I'll add it.




[GitHub] spark pull request: [SPARK-2327] [SQL] Fix nullabilities of Join/G...

2014-07-03 Thread ueshin
Github user ueshin commented on the pull request:

https://github.com/apache/spark/pull/1266#issuecomment-4714
  
@marmbrus, Thank you for your comments.
Modified to use `withNullability`.

And no problem, because exact types and nullability are important for my 
project.




[GitHub] spark pull request: [SPARK-2235][SQL]Spark SQL basicOperator add I...

2014-07-04 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/1150#discussion_r14569171
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/SparkStrategies.scala ---
@@ -274,7 +274,7 @@ private[sql] abstract class SparkStrategies extends 
QueryPlanner[SparkPlan] {
   case Unions(unionChildren) =>
 execution.Union(unionChildren.map(planLater))(sqlContext) :: Nil
   case logical.Except(left,right) =>
 
-execution.Except(planLater(left),planLater(right)) :: Nil   
--- End diff --

Hi, the diff says you deleted some trailing whitespace after `Nil`.




[GitHub] spark pull request: [SPARK-2366] [SQL] Add column pruning for the ...

2014-07-04 Thread ueshin
Github user ueshin commented on the pull request:

https://github.com/apache/spark/pull/1301#issuecomment-48077448
  
Thank you for your comments.
Fixed and pushed.




[GitHub] spark pull request: [SPARK-2415] [SQL] RowWriteSupport should hand...

2014-07-09 Thread ueshin
GitHub user ueshin opened a pull request:

https://github.com/apache/spark/pull/1339

[SPARK-2415] [SQL] RowWriteSupport should handle empty ArrayType correctly.

`RowWriteSupport` doesn't write an empty `ArrayType` value, so the value read 
back becomes `null`.
It should write an empty `ArrayType` value as it is.
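
A standalone sketch of the bug (hypothetical miniature of a Parquet-style 
writer, not the real `RowWriteSupport` code): skipping the group markers for 
an empty array makes the value round-trip as `null` instead of an empty `Seq`.

```
// Buggy version: nothing is written for an empty array, so the reader
// reconstructs the field as null.
def writeArrayBuggy(values: Seq[Any], write: Any => Unit,
                    startGroup: () => Unit, endGroup: () => Unit): Unit =
  if (values.nonEmpty) {
    startGroup()
    values.foreach(write)
    endGroup()
  }

// Fixed version: the (possibly empty) group is always written, so an empty
// array reads back as an empty array.
def writeArray(values: Seq[Any], write: Any => Unit,
               startGroup: () => Unit, endGroup: () => Unit): Unit = {
  startGroup()
  values.foreach(write)
  endGroup()
}
```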

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ueshin/apache-spark issues/SPARK-2415

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/1339.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1339


commit 2f05196c865583a966b81700d8a909e44d6abb22
Author: Takuya UESHIN ues...@happy-camper.st
Date:   2014-07-09T07:48:57Z

Fix RowWriteSupport to handle empty ArrayType correctly.






[GitHub] spark pull request: [WIP][SPARK-2179][SQL] Public API for DataType...

2014-07-09 Thread ueshin
Github user ueshin commented on the pull request:

https://github.com/apache/spark/pull/1346#issuecomment-48560066
  
Hi, I'm wondering if `MapType` will have something like the `containsNull` of 
`ArrayType`.




[GitHub] spark pull request: [WIP][SPARK-2179][SQL] Public API for DataType...

2014-07-09 Thread ueshin
Github user ueshin commented on the pull request:

https://github.com/apache/spark/pull/1346#issuecomment-48566629
  
@yhuai, I understand. Thank you for your reply.




[GitHub] spark pull request: [SPARK-2428][SQL] Add except and intersect met...

2014-07-10 Thread ueshin
GitHub user ueshin opened a pull request:

https://github.com/apache/spark/pull/1355

[SPARK-2428][SQL] Add except and intersect methods to SchemaRDD.



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ueshin/apache-spark issues/SPARK-2428

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/1355.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1355


commit b6fa2641730545068703fb5d50a652dcbf3e8e88
Author: Takuya UESHIN ues...@happy-camper.st
Date:   2014-07-10T09:10:37Z

Add except and intersect methods to SchemaRDD.






[GitHub] spark pull request: [SPARK-2431][SQL] Refine StringComparison and ...

2014-07-10 Thread ueshin
GitHub user ueshin opened a pull request:

https://github.com/apache/spark/pull/1357

[SPARK-2431][SQL] Refine StringComparison and related code.

Refine `StringComparison` and related code as follows:
- `StringComparison` could be made similar to `StringRegexExpression` or 
`CaseConversionExpression`.
- The nullability of `StringRegexExpression` could depend on its children's 
nullabilities.
- Add a case for like conditions that include no wildcard to 
`LikeSimplification` (see the sketch after this list).
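
A standalone sketch of the third point, using a plain regex the way a 
`LikeSimplification`-style rule could: a LIKE pattern containing no `%` or `_` 
at all is just an equality test.

```
// Matches only patterns with no SQL LIKE wildcards in them.
val equalTo = "([^_%]*)".r

def simplifyLike(column: String, pattern: String): String = pattern match {
  case equalTo(p) => s"$column = '$p'"          // no wildcard: plain equality
  case _          => s"$column LIKE '$pattern'" // keep the regex-based path
}

assert(simplifyLike("name", "abc") == "name = 'abc'")
assert(simplifyLike("name", "ab%") == "name LIKE 'ab%'")
```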

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ueshin/apache-spark issues/SPARK-2431

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/1357.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1357


commit 680bb72ee0dafaeae34dbc3dbff3984f856b5e18
Author: Takuya UESHIN ues...@happy-camper.st
Date:   2014-07-10T15:49:07Z

Refine StringComparison.

commit b9da9d21da49f7d92c9b194173805b1a627e841b
Author: Takuya UESHIN ues...@happy-camper.st
Date:   2014-07-10T15:49:55Z

Fix nullability of StringRegexExpression.

commit 77766f55c331dac2f81f657d7aedb06538b29c41
Author: Takuya UESHIN ues...@happy-camper.st
Date:   2014-07-10T15:54:46Z

Add a case that the like condition includes no wildcard to 
LikeSimplification.






[GitHub] spark pull request: [SPARK-2446][SQL] Add BinaryType support to Pa...

2014-07-11 Thread ueshin
GitHub user ueshin opened a pull request:

https://github.com/apache/spark/pull/1373

[SPARK-2446][SQL] Add BinaryType support to Parquet I/O.

To support `BinaryType`, the following changes are needed:
- Make `StringType` use `OriginalType.UTF8`
- Add `BinaryType` using `PrimitiveTypeName.BINARY` without `OriginalType` (see the sketch after this list)
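
A standalone sketch of the resulting mapping (hypothetical miniature types, 
not the real Parquet converter code): both SQL types map to Parquet's `BINARY` 
primitive, and only strings carry the `UTF8` annotation.

```
sealed trait SqlType
case object StringType extends SqlType
case object BinaryType extends SqlType

// (primitive type name, optional original-type annotation)
def toParquet(t: SqlType): (String, Option[String]) = t match {
  case StringType => ("BINARY", Some("UTF8")) // BINARY annotated as UTF8 text
  case BinaryType => ("BINARY", None)         // raw bytes, no annotation
}
```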

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ueshin/apache-spark issues/SPARK-2446

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/1373.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1373


commit 616e04a989edce9f5109a3b2ad11b78e1e0f1a77
Author: Takuya UESHIN ues...@happy-camper.st
Date:   2014-07-09T15:25:08Z

Make StringType use OriginalType.UTF8.

commit ecacb925c4e1f2dbf863b604ad2600cc8c9663d8
Author: Takuya UESHIN ues...@happy-camper.st
Date:   2014-07-11T09:13:46Z

Add BinaryType support to Parquet I/O.






[GitHub] spark pull request: [SPARK-2446][SQL] Add BinaryType support to Pa...

2014-07-11 Thread ueshin
Github user ueshin commented on the pull request:

https://github.com/apache/spark/pull/1373#issuecomment-48781530
  
@marmbrus Yes, I think so.
But this new behavior is the same as Avro, Thrift, and the next Hive release (0.14).
To load string data saved with previous versions, a `Cast` to `StringType` 
will be needed.




[GitHub] spark pull request: [SPARK-2467] Revert SparkBuild to publish-loca...

2014-07-14 Thread ueshin
GitHub user ueshin opened a pull request:

https://github.com/apache/spark/pull/1398

[SPARK-2467] Revert SparkBuild to publish-local to both .m2 and .ivy2.



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ueshin/apache-spark issues/SPARK-2467

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/1398.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1398


commit 7f01d587259221c5333880578e6658e6534c52c4
Author: Takuya UESHIN ues...@happy-camper.st
Date:   2014-07-14T06:28:44Z

Revert SparkBuild to publish-local to both .m2 and .ivy2.






[GitHub] spark pull request: SPARK-2407: Added internal implementation of S...

2014-07-14 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/1359#discussion_r14915562
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringOperations.scala
 ---
@@ -205,3 +207,72 @@ case class EndsWith(left: Expression, right: 
Expression)
 extends BinaryExpression with StringComparison {
   def compare(l: String, r: String) = l.endsWith(r)
 }
+
+/**
+ * A function that takes a substring of its first argument starting at a 
given position.
+ * Defined for String and Binary types.
+ */
+case class Substring(str: Expression, pos: Expression, len: Expression) 
extends Expression {
+  
+  type EvaluatedType = Any
+  
+  def nullable: Boolean = true
--- End diff --

`nullable` could be `str.nullable || pos.nullable || len.nullable` ?




[GitHub] spark pull request: [SPARK-2504][SQL] Fix nullability of Substring...

2014-07-15 Thread ueshin
GitHub user ueshin opened a pull request:

https://github.com/apache/spark/pull/1426

[SPARK-2504][SQL] Fix nullability of Substring expression.

This is a follow-up of #1359 with nullability narrowing.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ueshin/apache-spark issues/SPARK-2504

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/1426.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1426


commit 80958acf56d18ad1556282c6a1d06d12ffab02e4
Author: Takuya UESHIN ues...@happy-camper.st
Date:   2014-07-16T02:41:40Z

Fix nullability of Substring expression.

commit 515783268188110646a3fee5c0015260f46d54b9
Author: Takuya UESHIN ues...@happy-camper.st
Date:   2014-07-16T02:41:50Z

Remove unnecessary white spaces.






[GitHub] spark pull request: [SPARK-2509][SQL] Add optimization for Substri...

2014-07-15 Thread ueshin
GitHub user ueshin opened a pull request:

https://github.com/apache/spark/pull/1428

[SPARK-2509][SQL] Add optimization for Substring.

Cases where `Substring` includes a `null` literal could be added to 
`NullPropagation`.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ueshin/apache-spark issues/SPARK-2509

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/1428.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1428


commit d9eb85f902c5498f9b7b6c6035471f622dd7d629
Author: Takuya UESHIN ues...@happy-camper.st
Date:   2014-07-16T03:31:45Z

Add Substring cases to NullPropagation.






[GitHub] spark pull request: [SPARK-2509][SQL] Add optimization for Substri...

2014-07-15 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/1428#discussion_r14981973
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
 ---
@@ -171,6 +171,9 @@ object NullPropagation extends Rule[LogicalPlan] {
  case Literal(candidate, _) if candidate == v => true
  case _ => false
})) => Literal(true, BooleanType)
+  case e @ Substring(Literal(null, _), _, _) => Literal(null, e.dataType)
--- End diff --

@rxin, thank you for your comment.
The `ConstantFolding` rule can be applied only if the expression is 
foldable, i.e. ALL of its child expressions are foldable, whereas 
`NullPropagation` covers the case where there exists at least one `null` 
literal, regardless of the foldability of the others.
So I think these cases are needed.

BTW, `Substring` can be foldable, right?
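
A standalone sketch of the foldability fix I'd file as a follow-up 
(hypothetical miniature types): an expression is foldable exactly when all of 
its children are.

```
case class Expr(foldable: Boolean) // hypothetical stand-in for an expression

case class Substring(str: Expr, pos: Expr, len: Expr) {
  // Foldable only when every child is foldable, so ConstantFolding can
  // evaluate the whole expression at optimization time.
  def foldable: Boolean = str.foldable && pos.foldable && len.foldable
}

assert(Substring(Expr(true), Expr(true), Expr(true)).foldable)
assert(!Substring(Expr(true), Expr(false), Expr(true)).foldable)
```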




[GitHub] spark pull request: [SPARK-2509][SQL] Add optimization for Substri...

2014-07-15 Thread ueshin
Github user ueshin commented on the pull request:

https://github.com/apache/spark/pull/1428#issuecomment-49125413
  
@rxin Ah, wait a moment.
What do you think about the foldability of `Substring` that I mentioned above?




[GitHub] spark pull request: [SPARK-2509][SQL] Add optimization for Substri...

2014-07-15 Thread ueshin
Github user ueshin commented on the pull request:

https://github.com/apache/spark/pull/1428#issuecomment-49125499
  
I'll open another issue about the foldability of `Substring`.




[GitHub] spark pull request: [SPARK-2509][SQL] Add optimization for Substri...

2014-07-15 Thread ueshin
Github user ueshin commented on the pull request:

https://github.com/apache/spark/pull/1428#issuecomment-49125889
  
@rxin I agree that we need a better way for `NullPropagation` instead of 
so much pattern matching.




[GitHub] spark pull request: [SPARK-2518][SQL] Fix foldability of Substring...

2014-07-16 Thread ueshin
GitHub user ueshin opened a pull request:

https://github.com/apache/spark/pull/1432

[SPARK-2518][SQL] Fix foldability of Substring expression.

This is a follow-up of #1428.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ueshin/apache-spark issues/SPARK-2518

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/1432.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1432


commit 37d1ace86d909bc4204bbb9f55c56a76aae8c106
Author: Takuya UESHIN ues...@happy-camper.st
Date:   2014-07-16T06:22:17Z

Fix foldability of Substring expression.






[GitHub] spark pull request: [SPARK-2190][SQL] Specialized ColumnType for T...

2014-07-16 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/1440#discussion_r14993202
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala
 ---
@@ -249,3 +263,7 @@ case class Cast(child: Expression, dataType: DataType) 
extends UnaryExpression {
 if (evaluated == null) null else cast(evaluated)
   }
 }
+
+object Cast {
+  private[sql] val simpleDateFormat = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss")
--- End diff --

Hi, `SimpleDateFormat` is not thread-safe, so `def` should be used instead 
of `val`.
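
A runnable sketch of the point (plain JDK API): `SimpleDateFormat` keeps 
mutable parse state, so a shared `val` is unsafe under concurrent use; a `def` 
hands each caller a fresh instance, and a `ThreadLocal` avoids the repeated 
allocation.

```
import java.text.SimpleDateFormat

object Formats {
  // Unsafe if shared across threads:
  // val simpleDateFormat = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss")

  // Safe: a fresh instance per call.
  def simpleDateFormat = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss")

  // Also safe, and cheaper under heavy use: one instance per thread.
  val threadLocalFormat = new ThreadLocal[SimpleDateFormat] {
    override def initialValue(): SimpleDateFormat =
      new SimpleDateFormat("yyyy-MM-dd HH:mm:ss")
  }
}
```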


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: [SPARK-2535][SQL] Add StringComparison case to...

2014-07-16 Thread ueshin
GitHub user ueshin opened a pull request:

https://github.com/apache/spark/pull/1451

[SPARK-2535][SQL] Add StringComparison case to NullPropagation.

Cases where `StringComparison` expressions include a `null` literal could be 
added to `NullPropagation`.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ueshin/apache-spark issues/SPARK-2535

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/1451.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1451


commit 8f9b98439b6e3bf674ae22a6a6a716e27da509f2
Author: Takuya UESHIN ues...@happy-camper.st
Date:   2014-07-15T09:08:13Z

Add StringComparison case to NullPropagation.

commit e99c237f661589792cde6451466d2c44e783cd82
Author: Takuya UESHIN ues...@happy-camper.st
Date:   2014-07-16T10:59:58Z

Add some tests.






[GitHub] spark pull request: [SPARK-2588][SQL] Add some more DSLs.

2014-07-18 Thread ueshin
GitHub user ueshin opened a pull request:

https://github.com/apache/spark/pull/1491

[SPARK-2588][SQL] Add some more DSLs.



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ueshin/apache-spark issues/SPARK-2588

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/1491.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1491


commit 2310bf12156d7937181da199ed1d2305d7644cc7
Author: Takuya UESHIN ues...@happy-camper.st
Date:   2014-07-15T08:55:00Z

Add some more DSLs.

commit 1023ea0bcd206d4d7f6312dd9726479b8f304a63
Author: Takuya UESHIN ues...@happy-camper.st
Date:   2014-07-16T10:53:47Z

Modify tests to use DSLs.






[GitHub] spark pull request: [SPARK-1210] Prevent ContextClassLoader of Act...

2014-03-24 Thread ueshin
Github user ueshin commented on the pull request:

https://github.com/apache/spark/pull/15#issuecomment-38432140
  
Added a test case.




[GitHub] spark pull request: [SPARK-1210] Prevent ContextClassLoader of Act...

2014-03-26 Thread ueshin
Github user ueshin commented on the pull request:

https://github.com/apache/spark/pull/15#issuecomment-38757007
  
`LocalActor` has 4 threads to handle Actor messages, but the replacement of 
the ContextClassLoader happens on only one of them.
I believe that this **UNBALANCED** state is not desirable.

Your patch can fix my original problem, but the undesirable state remains.





[GitHub] spark pull request: [SPARK-1210] Prevent ContextClassLoader of Act...

2014-03-26 Thread ueshin
Github user ueshin commented on the pull request:

https://github.com/apache/spark/pull/15#issuecomment-38766469
  
I checked the related code:

- 
[CoarseGrainedExecutorBackend.scala#L56](https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/executor/CoarseGrainedExecutorBackend.scala#L56)
- 
[MesosExecutorBackend.scala#L56-L59](https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/executor/MesosExecutorBackend.scala#L56-L59)

I guess these threads should also be kept clean, so we could remove the 
replacement entirely.
How does that sound?




[GitHub] spark pull request: [SPARK-1210] Prevent ContextClassLoader of Act...

2014-03-27 Thread ueshin
Github user ueshin commented on the pull request:

https://github.com/apache/spark/pull/15#issuecomment-38872508
  
No, but what should I do? Rebase onto the current master branch?




[GitHub] spark pull request: [SPARK-1210] Prevent ContextClassLoader of Act...

2014-03-27 Thread ueshin
Github user ueshin commented on the pull request:

https://github.com/apache/spark/pull/15#issuecomment-38872759
  
Ah, I see, there are some changes in `Executor`.




[GitHub] spark pull request: [SPARK-1210] Prevent ContextClassLoader of Act...

2014-03-27 Thread ueshin
Github user ueshin commented on the pull request:

https://github.com/apache/spark/pull/15#issuecomment-38877616
  
Rebased onto master and pushed.




[GitHub] spark pull request: [SPARK-1210] Prevent ContextClassLoader of Act...

2014-03-27 Thread ueshin
Github user ueshin commented on the pull request:

https://github.com/apache/spark/pull/15#issuecomment-38890320
  
Thanks!




[GitHub] spark pull request: SPARK-1380: Add sort-merge based cogroup/joins...

2014-04-01 Thread ueshin
GitHub user ueshin opened a pull request:

https://github.com/apache/spark/pull/283

SPARK-1380: Add sort-merge based cogroup/joins.

I've written cogroup/joins based on the sort-merge algorithm.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ueshin/apache-spark issues/SPARK-1380

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/283.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #283


commit 1c8ba5a0d480f816a0c217618b40bb615474963d
Author: Takuya UESHIN ues...@happy-camper.st
Date:   2014-03-19T10:28:26Z

Add sort-merge cogroup/joins.

commit 99751661fcc7632a0f82816bbaca07bf822d3663
Author: Takuya UESHIN ues...@happy-camper.st
Date:   2014-03-25T10:15:09Z

Add Java APIs for sort-merge cogroup/joins.






[GitHub] spark pull request: SPARK-1380: Add sort-merge based cogroup/joins...

2014-04-03 Thread ueshin
Github user ueshin commented on the pull request:

https://github.com/apache/spark/pull/283#issuecomment-39417683
  
@rxin Thank you for your reply.

There are some cases where a merge join helps with optimization:

1. If the data to be joined are already sorted by the join keys, a merge join 
can be done more efficiently than a hash join. In my test case, both algorithms 
ran at almost the same speed, but the merge join scaled better.
2. A merge join over data sorted by the same keys can be pipelined (each 
output tuple can be produced immediately as input tuples arrive), even for an 
N-way join (N > 2). A hash join blocks while building a hash table before any 
output is produced.

I think it is useful for users to be able to choose how to optimize their 
processing. A sketch of the pipelined merge step follows.
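
To make point 2 concrete, here is a standalone sketch of a pipelined merge 
step over two key-sorted iterators (simplified to unique keys per side; a real 
implementation also buffers duplicate-key runs):

```
def mergeJoin[K, A, B](left: Iterator[(K, A)], right: Iterator[(K, B)])
                      (implicit ord: Ordering[K]): Iterator[(K, (A, B))] = {
  val l = left.buffered
  val r = right.buffered
  new Iterator[(K, (A, B))] {
    def hasNext: Boolean = {
      // Advance whichever side has the smaller head key until the heads match.
      while (l.hasNext && r.hasNext && ord.compare(l.head._1, r.head._1) != 0) {
        if (ord.lt(l.head._1, r.head._1)) l.next() else r.next()
      }
      l.hasNext && r.hasNext
    }
    def next(): (K, (A, B)) = {
      val (k, a) = l.next()
      (k, (a, r.head._2)) // emitted immediately; no hash table is built
    }
  }
}

// mergeJoin(Iterator(1 -> "a", 3 -> "c"), Iterator(1 -> "x", 2 -> "y", 3 -> "z")).toList
// == List((1, ("a", "x")), (3, ("c", "z")))
```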




[GitHub] spark pull request: SPARK-1380: Add sort-merge based cogroup/joins...

2014-04-03 Thread ueshin
Github user ueshin commented on the pull request:

https://github.com/apache/spark/pull/283#issuecomment-39421176
  
@mridulm Thank you for your reply.

There are 2 points I have to mention about memory:

- Before the shuffle:
If the data are already sorted, no extra memory is needed because no sort 
operation is needed; if not, the merge join needs some amount of memory to 
sort the data in each partition.
- After the shuffle:
The merge join needs at most the same amount of memory as a hash join while 
fetching data, but no more than that, because it can produce output immediately 
from its input. A hash join needs additional memory to build a hash table.




