[GitHub] spark pull request #17541: [SPARK-20229][SQL] add semanticHash to QueryPlan
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/17541#discussion_r110533216 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/LogicalRelation.scala --- @@ -43,17 +43,8 @@ case class LogicalRelation( com.google.common.base.Objects.hashCode(relation, output) } - override def sameResult(otherPlan: LogicalPlan): Boolean = { -otherPlan.canonicalized match { - case LogicalRelation(otherRelation, _, _) => relation == otherRelation - case _ => false -} - } - - // When comparing two LogicalRelations from within LogicalPlan.sameResult, we only need - // LogicalRelation.cleanArgs to return Seq(relation), since expectedOutputAttribute's - // expId can be different but the relation is still the same. - override lazy val cleanArgs: Seq[Any] = Seq(relation) + // Only care about relation when canonicalizing. + override def preCanonicalized: LogicalPlan = copy(catalogTable = None) --- End diff -- The builders of external data sources need to implement `equals` and `hashCode` if they want to utilize our cache management. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
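A minimal sketch of the point above, with hypothetical names (`PathRelation` is illustrative, not a Spark class): a relation that overrides `equals` and `hashCode` on the fields identifying its underlying data lets plan-level cache lookups recognize two independently constructed scans of the same source.

```scala
// Hypothetical relation class; the name and fields are illustrative only.
// Equality is defined on what identifies the data, not on object identity,
// so two independently constructed scans of the same source compare equal.
class PathRelation(val path: String, val format: String) {
  override def equals(other: Any): Boolean = other match {
    case r: PathRelation => path == r.path && format == r.format
    case _ => false
  }
  override def hashCode(): Int = 31 * path.hashCode + format.hashCode
}

val a = new PathRelation("/data/t1", "parquet")
val b = new PathRelation("/data/t1", "parquet")
println(a == b)  // true: a cache keyed on the relation can hit
```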
[GitHub] spark issue #17569: [SPARK-20253][SQL] Remove unnecessary nullchecks of a re...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17569 **[Test build #75628 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75628/testReport)** for PR 17569 at commit [`10cf4be`](https://github.com/apache/spark/commit/10cf4be41d1de37115edc140e1421caf5b23336a).
[GitHub] spark issue #17568: [SPARK-20254][SQL] Remove unnecessary data conversion fo...
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/17568 @cloud-fan how about this check for 2.?
[GitHub] spark issue #17569: [SPARK-20253][SQL] Remove unnecessary nullchecks of a re...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/17569 LGTM
[GitHub] spark pull request #17569: [SPARK-20253][SQL] Remove unnecessary nullchecks ...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/17569#discussion_r110530705 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala --- @@ -225,25 +225,26 @@ case class Invoke( getFuncResult(ev.value, s"${obj.value}.$functionName($argString)") } else { val funcResult = ctx.freshName("funcResult") + // If the function can return null, we do an extra check to make sure our null bit is still + // set correctly. + val postNullCheckAndAssign = if (!returnNullable) { --- End diff -- +1
[GitHub] spark issue #17569: [SPARK-20253][SQL] Remove unnecessary nullchecks of a re...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/17569 LGTM
[GitHub] spark pull request #17569: [SPARK-20253][SQL] Remove unnecessary nullchecks ...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/17569#discussion_r110530587 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala --- @@ -225,25 +225,26 @@ case class Invoke( getFuncResult(ev.value, s"${obj.value}.$functionName($argString)") } else { val funcResult = ctx.freshName("funcResult") + // If the function can return null, we do an extra check to make sure our null bit is still + // set correctly. + val postNullCheckAndAssign = if (!returnNullable) { --- End diff -- how about just `assignResult`?
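As a rough illustration of the branch under discussion (a simplified string-based sketch, not Spark's actual `Invoke` codegen; the helper name follows the `assignResult` suggestion): when the callee is known to never return null, a plain assignment suffices, otherwise the generated Java must re-check the result and keep the null bit consistent.

```scala
// Simplified codegen sketch (assumed shape, not Spark's real API): decide
// whether the generated Java needs a post-call null check.
def assignResult(valueVar: String, isNullVar: String, call: String,
                 returnNullable: Boolean): String =
  if (!returnNullable) {
    // Callee is known non-null: assign directly, no extra branch emitted.
    s"$valueVar = $call;"
  } else {
    // Callee may return null: keep the null bit consistent with the result.
    s"""Object funcResult = $call;
       |if (funcResult == null) { $isNullVar = true; } else { $valueVar = funcResult; }""".stripMargin
  }

println(assignResult("value", "isNull", "obj.toString()", returnNullable = false))
```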
[GitHub] spark pull request #17569: [SPARK-20253][SQL] Remove unnecessary nullchecks ...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/17569#discussion_r110530518 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/encoders/RowEncoder.scala --- @@ -262,17 +264,18 @@ object RowEncoder { input :: Nil) case _: DecimalType => - Invoke(input, "toJavaBigDecimal", ObjectType(classOf[java.math.BigDecimal])) + Invoke(input, "toJavaBigDecimal", ObjectType(classOf[java.math.BigDecimal]), +returnNullable = false) case StringType => - Invoke(input, "toString", ObjectType(classOf[String])) + Invoke(input, "toString", ObjectType(classOf[String]), returnNullable = false) --- End diff -- ok let's keep the default value unchanged
[GitHub] spark pull request #17541: [SPARK-20229][SQL] add semanticHash to QueryPlan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/17541#discussion_r110530462 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/QueryPlan.scala --- @@ -359,9 +359,59 @@ abstract class QueryPlan[PlanType <: QueryPlan[PlanType]] extends TreeNode[PlanT override protected def innerChildren: Seq[QueryPlan[_]] = subqueries /** - * Canonicalized copy of this query plan. + * Returns a plan where a best effort attempt has been made to transform `this` in a way + * that preserves the result but removes cosmetic variations (case sensitivity, ordering for + * commutative operations, expression id, etc.) + * + * Plans where `this.canonicalized == other.canonicalized` will always evaluate to the same + * result. + * + * Some nodes should overwrite this to provide proper canonicalize logic. + */ + lazy val canonicalized: PlanType = { +val canonicalizedChildren = children.map(_.canonicalized) +var id = -1 +preCanonicalized.mapExpressions { + case a: Alias => +id += 1 +// As the root of the expression, Alias will always take an arbitrary exprId, we need to +// normalize that for equality testing, by assigning expr id from 0 incrementally. The +// alias name doesn't matter and should be erased. +Alias(normalizeExprId(a.child), "")(ExprId(id), a.qualifier, isGenerated = a.isGenerated) + + case ar: AttributeReference if allAttributes.indexOf(ar.exprId) == -1 => +// Top level `AttributeReference` may also be used for output like `Alias`, we should +// normalize the epxrId too. +id += 1 +ar.withExprId(ExprId(id)) + + case other => normalizeExprId(other) +}.withNewChildren(canonicalizedChildren) + } + + /** + * Do some simple transformation on this plan before canonicalizing. Implementations can override + * this method to provide customer canonicalize logic without rewriting the whole logic. + */ - protected lazy val canonicalized: PlanType = this + protected def preCanonicalized: PlanType = this + + /** + * Normalize the exprIds in the given expression, by updating the exprId in `AttributeReference` + * with its referenced ordinal from input attributes. It's similar to `BindReferences` but we + * do not use `BindReferences` here as the plan may take the expression as a parameter with type + * `Attribute`, and replace it with `BoundReference` will cause error. + */ + protected def normalizeExprId[T <: Expression](e: T, input: AttributeSeq = allAttributes): T = { +e.transformUp { + case ar: AttributeReference => +val ordinal = input.indexOf(ar.exprId) +if (ordinal == -1) { + ar --- End diff -- No, actually this is unexpected: the attribute should either reference input attributes or represent new output at the top level. Keep it unchanged so that the equality check will fail later.
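The exprId normalization under review can be modeled in a few lines of plain Scala (a sketch under simplified assumptions, not the `QueryPlan` code itself): ids that resolve against the input are replaced by their ordinal, and unknown ids are left alone so that a later equality check fails, as the comment explains.

```scala
// Toy model of normalizeExprId: Attr stands in for AttributeReference, and the
// input is just the sequence of expression ids produced by the children.
case class Attr(id: Long)

def normalize(attrs: Seq[Attr], input: Seq[Long]): Seq[Attr] =
  attrs.map { a =>
    val ordinal = input.indexOf(a.id)
    // An id that references nothing in the input is kept unchanged on purpose,
    // so that equality between the canonicalized plans fails later.
    if (ordinal == -1) a else Attr(ordinal)
  }

// Two plans over the same data, assigned different arbitrary ids upstream,
// normalize to the same form:
println(normalize(Seq(Attr(101), Attr(103)), Seq(101, 102, 103)))  // List(Attr(0), Attr(2))
println(normalize(Seq(Attr(201), Attr(203)), Seq(201, 202, 203)))  // List(Attr(0), Attr(2))
```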
[GitHub] spark issue #17572: [SPARK-20260][MLLib] String interpolation required for e...
Github user vijaykramesh commented on the issue: https://github.com/apache/spark/pull/17572 @srowen fixed it in some more places. It seems like everywhere else the regexp matches, we actually want the $ in the output. Do you want me to squash the commits as well?
[GitHub] spark pull request #17576: Update Dataset to camel case (DataSet) to match D...
Github user kevinmcinerney closed the pull request at: https://github.com/apache/spark/pull/17576
[GitHub] spark issue #17546: [SPARK-20233] [SQL] Apply star-join filter heuristics to...
Github user ioana-delaney commented on the issue: https://github.com/apache/spark/pull/17546 @cloud-fan Do you have any comments?
[GitHub] spark issue #17575: [SPARK-20265][MLlib] Improve Prefix'span pre-processing ...
Github user Syrux commented on the issue: https://github.com/apache/spark/pull/17575 Yo Sean, I already pushed the requested changes, in case this is the correct place to do so (I can just revert them if not). I added two new methods to allow tests: first, a method that finds all frequent items in a database; second, a method that actually cleans the database using those frequent items. Although I didn't end up using the first method, the pre-processing method is now much clearer to understand, so I left the new method in. Just tell me if I need to put that piece of code back. I also added tests for multiple types of sequence databases: specifically, when there is at most one item per itemset, when there can be multiple items per itemset, and when cleaning the database empties it. Together they should cover all cases. Of course, the new implementation passes the tests perfectly, and the old one doesn't. Everything else remained as is. Tell me if the way I did it was ok. I hope it's up to standards :)
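The two-step pre-processing described above (find the frequent items, then clean the database with them) can be sketched in plain Scala. This is an illustration of the idea only, not the PrefixSpan implementation, and it assumes the minimum support is given as an absolute sequence count.

```scala
// A sequence database: each sequence is a list of itemsets.
// An item is frequent if it appears in at least minCount distinct sequences.
def frequentItems(db: Seq[Seq[Seq[Int]]], minCount: Int): Set[Int] =
  db.flatMap(_.flatten.distinct)   // each item counted once per sequence
    .groupBy(identity)
    .collect { case (item, occurrences) if occurrences.size >= minCount => item }
    .toSet

// Drop infrequent items, then drop itemsets and sequences left empty.
def cleanDatabase(db: Seq[Seq[Seq[Int]]], minCount: Int): Seq[Seq[Seq[Int]]] = {
  val freq = frequentItems(db, minCount)
  db.map(_.map(_.filter(freq)).filter(_.nonEmpty)).filter(_.nonEmpty)
}

val db = Seq(
  Seq(Seq(1, 2), Seq(3)),   // items 1 and 3 are frequent; 2 appears once
  Seq(Seq(1), Seq(3, 4)),
  Seq(Seq(5)))              // emptied entirely by cleaning
println(cleanDatabase(db, minCount = 2))  // List(List(List(1), List(3)), List(List(1), List(3)))
```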
[GitHub] spark issue #15899: [SPARK-18466] added withFilter method to RDD
Github user reggert commented on the issue: https://github.com/apache/spark/pull/15899 Strictly speaking, this doesn't just affect pair RDDs. It affects any RDD used in a `for` expression that involves a filter operation, which includes explicit `if` clauses as well as pattern matches.
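For context, the compiler desugars a filtering `for` expression into a `withFilter` call, which is why the method matters for any collection-like type, not just pair RDDs. A plain-collections illustration (no Spark dependency):

```scala
// A `for` expression with an `if` guard desugars to withFilter + map:
val pairs = Seq("a" -> 1, "b" -> 2, "c" -> 3)

val evens = for ((k, v) <- pairs if v % 2 == 0) yield k

// ...roughly what the compiler emits:
val evensDesugared =
  pairs.withFilter { case (_, v) => v % 2 == 0 }.map { case (k, _) => k }

println(evens)           // List(b)
println(evensDesugared)  // List(b)
```

A type without `withFilter` cannot be used with `if` guards in `for` expressions without falling back to a full `filter`, which is what the PR addresses for RDDs.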
[GitHub] spark pull request #17541: [SPARK-20229][SQL] add semanticHash to QueryPlan
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/17541#discussion_r110524716 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala --- @@ -267,7 +265,7 @@ case class FileSourceScanExec( val metadata = Map( "Format" -> relation.fileFormat.toString, -"ReadSchema" -> outputSchema.catalogString, +"requiredSchema" -> requiredSchema.catalogString, --- End diff -- This is also for display in `SparkPlanInfo`? Keep the original name `ReadSchema`?
[GitHub] spark pull request #17540: [SPARK-20213][SQL][UI] Fix DataFrameWriter operat...
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/17540#discussion_r110523942 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala --- @@ -180,9 +180,13 @@ class Dataset[T] private[sql]( // to happen right away to let these side effects take place eagerly. queryExecution.analyzed match { case c: Command => -LocalRelation(c.output, queryExecution.executedPlan.executeCollect()) --- End diff -- Yeah, the check I added to ensure we get the same results in the SQL tab has [several hundred failures](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75579/testReport/) that go through this. Looks like the path is almost always `spark.sql` when the SQL statement is a command like CTAS. I like your version and will update.
[GitHub] spark issue #17540: [SPARK-20213][SQL][UI] Fix DataFrameWriter operations in...
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/17540 Thanks for the review! I'll get the thrift-server tests fixed up next week.
[GitHub] spark pull request #17541: [SPARK-20229][SQL] add semanticHash to QueryPlan
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/17541#discussion_r110523770 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/QueryPlan.scala --- @@ -359,9 +359,59 @@ abstract class QueryPlan[PlanType <: QueryPlan[PlanType]] extends TreeNode[PlanT override protected def innerChildren: Seq[QueryPlan[_]] = subqueries /** - * Canonicalized copy of this query plan. + * Returns a plan where a best effort attempt has been made to transform `this` in a way + * that preserves the result but removes cosmetic variations (case sensitivity, ordering for + * commutative operations, expression id, etc.) + * + * Plans where `this.canonicalized == other.canonicalized` will always evaluate to the same + * result. + * + * Some nodes should overwrite this to provide proper canonicalize logic. + */ + lazy val canonicalized: PlanType = { +val canonicalizedChildren = children.map(_.canonicalized) +var id = -1 +preCanonicalized.mapExpressions { + case a: Alias => +id += 1 +// As the root of the expression, Alias will always take an arbitrary exprId, we need to +// normalize that for equality testing, by assigning expr id from 0 incrementally. The +// alias name doesn't matter and should be erased. +Alias(normalizeExprId(a.child), "")(ExprId(id), a.qualifier, isGenerated = a.isGenerated) + + case ar: AttributeReference if allAttributes.indexOf(ar.exprId) == -1 => +// Top level `AttributeReference` may also be used for output like `Alias`, we should +// normalize the epxrId too. +id += 1 +ar.withExprId(ExprId(id)) + + case other => normalizeExprId(other) +}.withNewChildren(canonicalizedChildren) + } + + /** + * Do some simple transformation on this plan before canonicalizing. Implementations can override + * this method to provide customer canonicalize logic without rewriting the whole logic. + */ - protected lazy val canonicalized: PlanType = this + protected def preCanonicalized: PlanType = this + + /** + * Normalize the exprIds in the given expression, by updating the exprId in `AttributeReference` + * with its referenced ordinal from input attributes. It's similar to `BindReferences` but we + * do not use `BindReferences` here as the plan may take the expression as a parameter with type + * `Attribute`, and replace it with `BoundReference` will cause error. + */ + protected def normalizeExprId[T <: Expression](e: T, input: AttributeSeq = allAttributes): T = { +e.transformUp { + case ar: AttributeReference => +val ordinal = input.indexOf(ar.exprId) +if (ordinal == -1) { + ar --- End diff -- No need to normalize exprIds in this case?
[GitHub] spark issue #17576: Update Dataset to camel case (DataSet) to match DataFram...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17576 Can one of the admins verify this patch?
[GitHub] spark pull request #17576: Update Dataset to camel case (DataSet) to match D...
GitHub user kevinmcinerney opened a pull request: https://github.com/apache/spark/pull/17576 Update Dataset to camel case (DataSet) to match DataFrames Shouldn't Datasets and DataFrames both be camel case for the ocd ppl out there? ## What changes were proposed in this pull request? (Please fill in changes proposed in this fix) ## How was this patch tested? (Please explain how this patch was tested. E.g. unit tests, integration tests, manual tests) (If this patch involves UI changes, please attach a screenshot; otherwise, remove this) Please review http://spark.apache.org/contributing.html before opening a pull request. You can merge this pull request into a Git repository by running: $ git pull https://github.com/kevinmcinerney/spark patch-1 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/17576.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #17576 commit 2e00ad22b1b57bb87914ec16582c033f84cf4a17 Author: Kevin Mc Inerney Date: 2017-04-08T17:54:19Z Update Dataset to camel case (DataSet) to match DataFrames Shouldn't Datasets and DataFrames both be camel case for the ocd ppl out there?
[GitHub] spark pull request #16820: [SPARK-19471] AggregationIterator does not initia...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/16820#discussion_r110523513 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameFunctionsSuite.scala --- @@ -448,6 +448,22 @@ class DataFrameFunctionsSuite extends QueryTest with SharedSQLContext { rand(Random.nextLong()), randn(Random.nextLong()) ).foreach(assertValuesDoNotChangeAfterCoalesceOrUnion(_)) } + + private def assertNoExceptions(c: Column): Unit = { +for (wholeStage <- Seq(true, false)) { + withSQLConf((SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key, wholeStage.toString)) { +spark.range(0, 5).toDF("a").agg(sum("a")).withColumn("v", c).collect() --- End diff -- I found that almost all the physical plans for joins have exactly the same issue. I will try to submit fixes for the joins one by one.
[GitHub] spark pull request #15567: [SPARK-14393][SQL] values generated by non-determ...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/15567#discussion_r110523451 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/basicPhysicalOperators.scala --- @@ -205,10 +206,11 @@ case class FilterExec(condition: Expression, child: SparkPlan) protected override def doExecute(): RDD[InternalRow] = { val numOutputRows = longMetric("numOutputRows") -child.execute().mapPartitionsInternal { iter => +child.execute().mapPartitionsWithIndexInternal { (index, iter) => val predicate = newPredicate(condition, child.output) + predicate.initialize(0) --- End diff -- Just wondering why `FilterExec` is not using `index` to initialize the conditions?
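A sketch of why the partition index is passed to `initialize` at all (a simplified toy model, not Spark's `Predicate` API): seeding per-partition state with the index keeps every partition from replaying the same pseudo-random stream, which is what always passing `0`, as the quoted diff does, would cause.

```scala
import scala.util.Random

// Toy non-deterministic expression: per-partition state is created in
// initialize, seeded from a fixed seed plus the partition index.
class ToyNondeterministic(seed: Long) {
  private var rng: Random = null
  def initialize(partitionIndex: Int): Unit = rng = new Random(seed + partitionIndex)
  def eval(): Double = rng.nextDouble()
}

val p0 = new ToyNondeterministic(42L)
p0.initialize(0)
val p1 = new ToyNondeterministic(42L)
p1.initialize(1)
// Different partition indexes yield different streams; initializing both with
// index 0 would make the "random" values identical across partitions.
println(p0.eval() == p1.eval())
```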
[GitHub] spark pull request #17541: [SPARK-20229][SQL] add semanticHash to QueryPlan
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/17541#discussion_r110523418 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/QueryPlan.scala --- @@ -359,9 +359,59 @@ abstract class QueryPlan[PlanType <: QueryPlan[PlanType]] extends TreeNode[PlanT override protected def innerChildren: Seq[QueryPlan[_]] = subqueries /** - * Canonicalized copy of this query plan. + * Returns a plan where a best effort attempt has been made to transform `this` in a way + * that preserves the result but removes cosmetic variations (case sensitivity, ordering for + * commutative operations, expression id, etc.) + * + * Plans where `this.canonicalized == other.canonicalized` will always evaluate to the same + * result. + * + * Some nodes should overwrite this to provide proper canonicalize logic. + */ + lazy val canonicalized: PlanType = { +val canonicalizedChildren = children.map(_.canonicalized) +var id = -1 +preCanonicalized.mapExpressions { + case a: Alias => +id += 1 +// As the root of the expression, Alias will always take an arbitrary exprId, we need to +// normalize that for equality testing, by assigning expr id from 0 incrementally. The +// alias name doesn't matter and should be erased. +Alias(normalizeExprId(a.child), "")(ExprId(id), a.qualifier, isGenerated = a.isGenerated) + + case ar: AttributeReference if allAttributes.indexOf(ar.exprId) == -1 => +// Top level `AttributeReference` may also be used for output like `Alias`, we should +// normalize the epxrId too. +id += 1 +ar.withExprId(ExprId(id)) + + case other => normalizeExprId(other) +}.withNewChildren(canonicalizedChildren) + } + + /** + * Do some simple transformation on this plan before canonicalizing. Implementations can override + * this method to provide customer canonicalize logic without rewriting the whole logic. --- End diff -- `customer` -> `customized`
[GitHub] spark issue #17350: [SPARK-20017][SQL] change the nullability of function 'S...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/17350 Thanks!
[GitHub] spark issue #17569: [SPARK-20253][SQL] Remove unnecessary nullchecks of a re...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17569 Merged build finished. Test PASSed.
[GitHub] spark issue #17569: [SPARK-20253][SQL] Remove unnecessary nullchecks of a re...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17569 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75627/ Test PASSed.
[GitHub] spark issue #17569: [SPARK-20253][SQL] Remove unnecessary nullchecks of a re...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17569 **[Test build #75627 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75627/testReport)** for PR 17569 at commit [`510fb53`](https://github.com/apache/spark/commit/510fb530ebf3d9235206cefe8e428bf3f8689cfc). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #17546: [SPARK-20233] [SQL] Apply star-join filter heuris...
Github user ioana-delaney commented on a diff in the pull request: https://github.com/apache/spark/pull/17546#discussion_r110522547 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/CostBasedJoinReorder.scala --- @@ -54,14 +54,12 @@ case class CostBasedJoinReorder(conf: SQLConf) extends Rule[LogicalPlan] with Pr private def reorder(plan: LogicalPlan, output: Seq[Attribute]): LogicalPlan = { val (items, conditions) = extractInnerJoins(plan) -// TODO: Compute the set of star-joins and use them in the join enumeration -// algorithm to prune un-optimal plan choices. val result = // Do reordering if the number of items is appropriate and join conditions exist. // We also need to check if costs of all items can be evaluated. if (items.size > 2 && items.size <= conf.joinReorderDPThreshold && conditions.nonEmpty && items.forall(_.stats(conf).rowCount.isDefined)) { -JoinReorderDP.search(conf, items, conditions, output) +JoinReorderDP(conf).search(conf, items, conditions, output) --- End diff -- Reverted.
[GitHub] spark issue #17575: [SPARK-20265][MLlib] Improve Prefix'span pre-processing ...
Github user Syrux commented on the issue: https://github.com/apache/spark/pull/17575 Ok, should I create a new JIRA and push the additional tests there? Or is here completely fine, since it's related to the current change? Tell me, and I will get the change done asap :)
[GitHub] spark issue #17350: [SPARK-20017][SQL] change the nullability of function 'S...
Github user zhaorongsheng commented on the issue: https://github.com/apache/spark/pull/17350 @gatorsmile Sorry for the late reply. I have checked all the functions' nullability settings and I didn't find any issues. Thanks~
[GitHub] spark issue #17575: [SPARK-20265][MLlib] Improve Prefix'span pre-processing ...
Github user srowen commented on the issue: https://github.com/apache/spark/pull/17575 Even a simplistic test of this case would give a lot more confidence that it's correct. If it means opening up a `private[spark]` method or two to make testing possible, that seems reasonable. I don't think it needs significant change. Something needs to exercise this code path.
[GitHub] spark pull request #17569: [SPARK-20253][SQL] Remove unnecessary nullchecks ...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/17569#discussion_r110520996 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala --- @@ -225,25 +225,26 @@ case class Invoke( getFuncResult(ev.value, s"${obj.value}.$functionName($argString)") } else { val funcResult = ctx.freshName("funcResult") + // If the function can return null, we do an extra check to make sure our null bit is still + // set correctly. + val postNullCheck = if (!returnNullable) { --- End diff -- Sure, done.
[GitHub] spark issue #17574: [SPARK-20264][SQL] asm should be non-test dependency in ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17574 Merged build finished. Test PASSed.
[GitHub] spark issue #17574: [SPARK-20264][SQL] asm should be non-test dependency in ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17574 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75626/
[GitHub] spark issue #17574: [SPARK-20264][SQL] asm should be non-test dependency in ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17574 **[Test build #75626 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75626/testReport)** for PR 17574 at commit [`2a03188`](https://github.com/apache/spark/commit/2a0318882a3133cc3dbd88f824a92f83cdf2c5e7).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #17546: [SPARK-20233] [SQL] Apply star-join filter heuris...
Github user wzhfy commented on a diff in the pull request: https://github.com/apache/spark/pull/17546#discussion_r110520685 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala --- @@ -736,6 +736,12 @@ object SQLConf { .checkValue(weight => weight >= 0 && weight <= 1, "The weight value must be in [0, 1].") .createWithDefault(0.7) + val JOIN_REORDER_DP_STAR_FILTER = +buildConf("spark.sql.cbo.joinReorder.dp.star.filter") + .doc("Applies star-join filter heuristics to cost based join enumeration.") + .booleanConf + .createWithDefault(false) --- End diff -- Yeah, I also think we should keep the default false.
[GitHub] spark issue #17569: [SPARK-20253][SQL] Remove unnecessary nullchecks of a re...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17569 **[Test build #75627 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75627/testReport)** for PR 17569 at commit [`510fb53`](https://github.com/apache/spark/commit/510fb530ebf3d9235206cefe8e428bf3f8689cfc).
[GitHub] spark issue #17575: [SPARK-20265][MLlib] Improve Prefix'span pre-processing ...
Github user Syrux commented on the issue: https://github.com/apache/spark/pull/17575 Yes exactly, the current implementation adds too many unnecessary delimiters. With this one-line change, delimiters are only placed where needed. Currently there are no tests that verify the algorithm cleans the sequences correctly; I only found that inefficiency by printing values while I implemented other things on my local fork. If you want, I can add some tests, but that will require a small refactor to separate the cleaning part into its own method, since calling the current method would directly invoke the main algorithm. Two of the existing tests did cover cases where sequences of zeros were left, although not at pertinent places (Integer/String type, variable-size itemsets: cleaning a five at the end of the third sequence leaves two zeros instead of one). I can however vouch that the previous code worked just fine. The results of the old implementation and this one are the same, and they also correspond to the results I obtained from another standalone CP-based implementation. It's just that this code makes the pre-processing more efficient.
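For context, the cleaning step being discussed can be sketched as below. This is an illustrative reconstruction, not MLlib's actual code: it assumes PrefixSpan's internal encoding of a sequence as an `Array[Int]` that starts and ends with `0` and uses `0` as the itemset delimiter; the object and method names are invented.

```scala
// Illustrative sketch (not Spark's actual implementation) of cleaning a
// 0-delimited PrefixSpan sequence: infrequent items are dropped, itemsets
// that become empty are removed, and exactly one delimiter is emitted
// between the surviving itemsets -- so no redundant zeros are left behind.
object SequenceCleaner {
  def clean(sequence: Array[Int], frequent: Set[Int]): Array[Int] = {
    val itemsets = sequence
      // Split on the 0 delimiter into itemsets (built in reverse, then fixed up).
      .foldLeft(List(List.empty[Int])) { (acc, x) =>
        if (x == 0) Nil :: acc else (x :: acc.head) :: acc.tail
      }
      .map(_.reverse)
      .reverse
      .map(_.filter(frequent)) // keep only frequent items
      .filter(_.nonEmpty)      // drop itemsets that became empty
    // Re-join with a single delimiter between itemsets.
    itemsets.foldLeft(Array(0)) { (acc, itemset) => acc ++ itemset :+ 0 }
  }
}
```

For example, cleaning `Array(0, 1, 2, 0, 5, 0, 3, 0)` with frequent items `{1, 2, 3}` removes the infrequent `5` together with its now-redundant delimiter, yielding `Array(0, 1, 2, 0, 3, 0)`.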
[GitHub] spark pull request #17541: [SPARK-20229][SQL] add semanticHash to QueryPlan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/17541#discussion_r110519122 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala --- @@ -423,8 +423,15 @@ case class CatalogRelation( Objects.hashCode(tableMeta.identifier, output) } - /** Only compare table identifier. */ --- End diff -- Actually we should compare more: e.g. if the table schema is altered, the new table relation should not be considered the same as the old table relation, even after canonicalization. Also, it's tricky to remove the output of a plan during canonicalization, as the parent plan may rely on the output.
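The trade-off above can be modeled with a toy example. All names here are invented, not Spark's actual classes: the point is only that canonicalization may clear auxiliary metadata before comparing plans, but must keep anything that affects the result, such as the schema, so an altered table does not compare equal to the old one.

```scala
// Toy model of the canonicalization trade-off discussed above.
object CanonicalizationSketch {
  case class TableMeta(owner: String, createTimeMs: Long)

  case class Relation(table: String, schema: Seq[String], meta: Option[TableMeta]) {
    // Drop metadata that cannot change the query result...
    def preCanonicalized: Relation = copy(meta = None)
    // ...so that sameResult ignores it, while still distinguishing schemas.
    def sameResult(other: Relation): Boolean =
      preCanonicalized == other.preCanonicalized
  }
}
```

Two relations differing only in metadata compare equal; two relations with different schemas do not, which is the behavior the comment argues for.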
[GitHub] spark issue #17541: [SPARK-20229][SQL] add semanticHash to QueryPlan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/17541 cc @gatorsmile any more comments?
[GitHub] spark issue #17569: [SPARK-20253][SQL] Remove unnecessary nullchecks of a re...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17569 Merged build finished. Test PASSed.
[GitHub] spark issue #17569: [SPARK-20253][SQL] Remove unnecessary nullchecks of a re...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17569 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75625/
[GitHub] spark issue #17569: [SPARK-20253][SQL] Remove unnecessary nullchecks of a re...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17569 **[Test build #75625 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75625/testReport)** for PR 17569 at commit [`3080ac2`](https://github.com/apache/spark/commit/3080ac2230e2512d6de3f6aadfed0e31b3b7eed3).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #17568: [SPARK-20254][SQL] Remove unnecessary data conversion fo...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17568 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75624/
[GitHub] spark issue #17568: [SPARK-20254][SQL] Remove unnecessary data conversion fo...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17568 Merged build finished. Test PASSed.
[GitHub] spark issue #17568: [SPARK-20254][SQL] Remove unnecessary data conversion fo...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17568 **[Test build #75624 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75624/testReport)** for PR 17568 at commit [`0679ebe`](https://github.com/apache/spark/commit/0679ebe17ed6c4619a7aef64fd41c2f21ffd3c7a).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #17242: [SPARK-19902][SQL] Add optimization rule to simplify exp...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/17242 ping @cloud-fan Can you take a look at this? If you don't think this is the appropriate direction, please let me know.
[GitHub] spark issue #17574: [SPARK-20264][SQL] asm should be non-test dependency in ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17574 **[Test build #75626 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75626/testReport)** for PR 17574 at commit [`2a03188`](https://github.com/apache/spark/commit/2a0318882a3133cc3dbd88f824a92f83cdf2c5e7).
[GitHub] spark pull request #17569: [SPARK-20253][SQL] Remove unnecessary nullchecks ...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/17569#discussion_r110517840 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala --- @@ -225,25 +225,26 @@ case class Invoke( getFuncResult(ev.value, s"${obj.value}.$functionName($argString)") } else { val funcResult = ctx.freshName("funcResult") + // If the function can return null, we do an extra check to make sure our null bit is still + // set correctly. + val postNullCheck = if (!returnNullable) { --- End diff -- nit: rename `postNullCheck`. It is actually not only a null check but also assigns the function result.
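The codegen branch under review can be sketched like this. The variable names (`funcResult`, `value`, `isNull`) echo the style of Spark's codegen, but this helper is illustrative only, not the real `Invoke` implementation: when the invoked method is declared non-nullable, the generated Java simply assigns the result; otherwise it must also keep the null bit consistent.

```scala
// Hedged sketch of generating the post-call assignment for Invoke-style
// codegen: elide the null check when the return value cannot be null.
object InvokeCodegenSketch {
  def assignFuncResult(
      funcResult: String,
      value: String,
      isNull: String,
      returnNullable: Boolean): String = {
    if (!returnNullable) {
      // Non-nullable return: no extra null check needed.
      s"$value = $funcResult;"
    } else {
      // Nullable return: re-check and set the null bit before assigning.
      s"""if ($funcResult == null) {
         |  $isNull = true;
         |} else {
         |  $value = $funcResult;
         |}""".stripMargin
    }
  }
}
```

This also makes viirya's naming nit concrete: the snippet both checks for null and performs the assignment, so a name like `assignFuncResult` describes it better than `postNullCheck`.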
[GitHub] spark issue #17574: [SPARK-20264][SQL] asm should be non-test dependency in ...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/17574 retest this please.
[GitHub] spark issue #17574: [SPARK-20264][SQL] asm should be non-test dependency in ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17574 Merged build finished. Test FAILed.
[GitHub] spark issue #17574: [SPARK-20264][SQL] asm should be non-test dependency in ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17574 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75623/
[GitHub] spark issue #17574: [SPARK-20264][SQL] asm should be non-test dependency in ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17574 **[Test build #75623 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75623/testReport)** for PR 17574 at commit [`2a03188`](https://github.com/apache/spark/commit/2a0318882a3133cc3dbd88f824a92f83cdf2c5e7).
* This patch **fails PySpark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #14731: [SPARK-17159] [streaming]: optimise check for new files ...
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/14731 @srowen anything else I need to do here?
[GitHub] spark issue #12004: [SPARK-7481] [build] Add spark-cloud module to pull in o...
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/12004 @srowen anything else I need to do here?
[GitHub] spark pull request #17342: [SPARK-12868][SQL] Allow adding jars from hdfs
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/17342#discussion_r110517523 --- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala --- @@ -2767,3 +2767,24 @@ private[spark] class CircularBuffer(sizeInBytes: Int = 10240) extends java.io.Ou new String(nonCircularBuffer, StandardCharsets.UTF_8) } } + + +/** + * Factory for URL stream handlers. It relies on 'protocol' to choose the appropriate + * UrlStreamHandlerFactory to create URLStreamHandler. Adding new 'if' branches in + * 'createURLStreamHandler' like 'hdfsHandler' to support more protocols. + */ +private[spark] class SparkUrlStreamHandlerFactory extends URLStreamHandlerFactory { + private var hdfsHandler : URLStreamHandler = _ + + def createURLStreamHandler(protocol: String): URLStreamHandler = { +if (protocol.compareToIgnoreCase("hdfs") == 0) { --- End diff -- Sorry, missed this. There's nothing explicit in 2.8+ right now; don't hold your breath. If people do want to co-dev one, I'd be happy to help. There's no point in me implementing something that isn't useful or going to be used by downstream projects.
[GitHub] spark issue #17467: [SPARK-20140][DStream] Remove hardcoded kinesis retry wa...
Github user yssharma commented on the issue: https://github.com/apache/spark/pull/17467 Just for info, while trying to use the `sc` in the `KinesisBackedBlockRDD`:

```
- Basic reading from Kinesis *** FAILED ***
  org.apache.spark.SparkException: Task not serializable
  at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:298)
  at org.apache.spark.util.ClosureCleaner$.org$apache$spark$util$ClosureCleaner$$clean(ClosureCleaner.scala:288)
  at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:108)
  at org.apache.spark.SparkContext.clean(SparkContext.scala:2284)
  at org.apache.spark.SparkContext.runJob(SparkContext.scala:2058)
  at org.apache.spark.SparkContext.runJob(SparkContext.scala:2084)
  at org.apache.spark.rdd.RDD$$anonfun$collect$1.apply(RDD.scala:936)
  at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
  at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
  at org.apache.spark.rdd.RDD.withScope(RDD.scala:362)
  ...
Cause: java.io.NotSerializableException: org.apache.spark.SparkContext
Serialization stack:
  - object not serializable (class: org.apache.spark.SparkContext, value: org.apache.spark.SparkContext@60c1663c)
  - field (class: org.apache.spark.streaming.kinesis.KinesisBackedBlockRDD, name: org$apache$spark$streaming$kinesis$KinesisBackedBlockRDD$$sc, type: class org.apache.spark.SparkContext)
  - object (class org.apache.spark.streaming.kinesis.KinesisBackedBlockRDD, KinesisBackedBlockRDD[0] at BlockRDD at KinesisBackedBlockRDD.scala:90)
  - field (class: org.apache.spark.NarrowDependency, name: _rdd, type: class org.apache.spark.rdd.RDD)
  - object (class org.apache.spark.OneToOneDependency, org.apache.spark.OneToOneDependency@52a33c3f)
  - writeObject data (class: scala.collection.immutable.List$SerializationProxy)
  - object (class scala.collection.immutable.List$SerializationProxy, scala.collection.immutable.List$SerializationProxy@71ed560f)
  - writeReplace data (class: scala.collection.immutable.List$SerializationProxy)
  - object (class scala.collection.immutable.$colon$colon, List(org.apache.spark.OneToOneDependency@52a33c3f))
  - field (class: org.apache.spark.rdd.RDD, name: org$apache$spark$rdd$RDD$$dependencies_, type: interface scala.collection.Seq)
  - object (class org.apache.spark.rdd.MapPartitionsRDD, MapPartitionsRDD[1] at map at KinesisBackedBlockRDDSuite.scala:83)
  - field (class: org.apache.spark.rdd.RDD$$anonfun$collect$1, name: $outer, type: class org.apache.spark.rdd.RDD)
  - object (class org.apache.spark.rdd.RDD$$anonfun$collect$1, )
  - field (class: org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$13, name: $outer, type: class org.apache.spark.rdd.RDD$$anonfun$collect$1)
  - object (class org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$13, )
  at org.apache.spark.serializer.SerializationDebugger$.improveException(SerializationDebugger.scala:40)
  at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:46)
  at org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:100)
  at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:295)
  at org.apache.spark.util.ClosureCleaner$.org$apache$spark$util$ClosureCleaner$$clean(ClosureCleaner.scala:288)
  at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:108)
  at org.apache.spark.SparkContext.clean(SparkContext.scala:2284)
  at org.apache.spark.SparkContext.runJob(SparkContext.scala:2058)
  at org.apache.spark.SparkContext.runJob(SparkContext.scala:2084)
  at org.apache.spark.rdd.RDD$$anonfun$collect$1.apply(RDD.scala:936)
```
[GitHub] spark issue #17506: [SPARK-20189][DStream] Fix spark kinesis testcases to re...
Github user yssharma commented on the issue: https://github.com/apache/spark/pull/17506 Is there anything else that can be done on this patch? The patch fixes all the deprecated-API test cases that try to use the AWS secret/id credentials instead of the builder.
[GitHub] spark issue #17467: [SPARK-20140][DStream] Remove hardcoded kinesis retry wa...
Github user yssharma commented on the issue: https://github.com/apache/spark/pull/17467 @brkyvz - thanks for taking the time to review the patch, appreciate it. Implemented all your suggestions: now passing a new map for the Kinesis configs and added a mechanism to use the builder for the configs. As for the Spark context, I wanted to use the SparkContext available in `KinesisBackedBlockRDD` directly as well (instead of creating a new config map), but the `sc` in `KinesisBackedBlockRDD` is not accessible there, and trying to use it causes serialization errors. Passing a separate config map looked like the only simple solution for accessing the Kinesis configs. The patch now does not use the `sc` at all and expects a kinesisConf to be passed to the `KinesisInputDStream` builder directly. Let me know your thoughts. Thanks again for the review comments.
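The failure mode in the stack trace quoted earlier in this thread is the classic closure-capture problem: a function that references a field of its enclosing object drags the whole object, here the SparkContext (via the `$outer` field in the trace), into the serialized task. A minimal, self-contained illustration using plain JVM serialization, with no Spark and all names invented:

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

object ClosureCaptureSketch {
  class FakeSparkContext // stands in for the non-serializable SparkContext

  class FakeRDD(val sc: FakeSparkContext, val waitTimeMs: Long)

  // A closure that keeps a reference to the whole RDD (as the compiled
  // lambda in the trace does via its $outer field) pulls in `sc` too,
  // so serializing it fails.
  class BadClosure(rdd: FakeRDD) extends (Long => Long) with Serializable {
    def apply(x: Long): Long = x + rdd.waitTimeMs
  }

  // Copying just the needed primitive keeps the closure serializable.
  class GoodClosure(waitTimeMs: Long) extends (Long => Long) with Serializable {
    def apply(x: Long): Long = x + waitTimeMs
  }

  // True if plain Java serialization of `obj` succeeds.
  def serializes(obj: AnyRef): Boolean =
    try {
      new ObjectOutputStream(new ByteArrayOutputStream()).writeObject(obj)
      true
    } catch { case _: NotSerializableException => false }
}
```

This is also why passing a small, serializable config map to the builder (as the patch does) sidesteps the problem: tasks then capture only the map, never the context.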
[GitHub] spark issue #17569: [SPARK-20253][SQL] Remove unnecessary nullchecks of a re...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17569 **[Test build #75625 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75625/testReport)** for PR 17569 at commit [`3080ac2`](https://github.com/apache/spark/commit/3080ac2230e2512d6de3f6aadfed0e31b3b7eed3).
[GitHub] spark pull request #17569: [SPARK-20253][SQL] Remove unnecessary nullchecks ...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/17569#discussion_r110516074 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/encoders/RowEncoder.scala --- @@ -262,17 +264,18 @@ object RowEncoder { input :: Nil) case _: DecimalType => - Invoke(input, "toJavaBigDecimal", ObjectType(classOf[java.math.BigDecimal])) + Invoke(input, "toJavaBigDecimal", ObjectType(classOf[java.math.BigDecimal]), +returnNullable = false) case StringType => - Invoke(input, "toString", ObjectType(classOf[String])) + Invoke(input, "toString", ObjectType(classOf[String]), returnNullable = false) --- End diff -- Here are the statistics for the 59 call sites of `Invoke()`: 18 where `dataType` is a primitive type, 21 where `returnNullable` is true (no specification at the call site), 19 where `returnNullable` is false, and 1 that sets a variable as `returnNullable`. What do you think?
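The flag under discussion controls whether the generated code guards the invocation result with a null check. A toy model of that decision, not Spark's actual codegen, might look like this:

```scala
// Toy model of the Invoke codegen decision discussed above (not Spark's
// actual code): when the invoked method is known never to return null
// (returnNullable = false), the generated snippet omits the post-call
// null check entirely.
final case class MiniInvoke(
    target: String,
    method: String,
    returnNullable: Boolean = true) {

  def genCode(resultVar: String): String = {
    val call = s"Object $resultVar = $target.$method();"
    if (returnNullable) {
      // Nullable result: guard it, as the pre-patch code always did.
      s"$call\nboolean ${resultVar}IsNull = ($resultVar == null);"
    } else {
      // Known non-null result (e.g. Decimal.toJavaBigDecimal): skip the check.
      call
    }
  }
}
```

With such a default, only call sites that explicitly pass `returnNullable = false` drop the check, which is why the statistics on how many sites set each value matter for choosing the default.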
[GitHub] spark issue #17541: [SPARK-20229][SQL] add semanticHash to QueryPlan
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17541 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75620/ Test PASSed.
[GitHub] spark issue #17541: [SPARK-20229][SQL] add semanticHash to QueryPlan
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17541 Merged build finished. Test PASSed.
[GitHub] spark issue #17541: [SPARK-20229][SQL] add semanticHash to QueryPlan
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17541 **[Test build #75620 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75620/testReport)** for PR 17541 at commit [`9305187`](https://github.com/apache/spark/commit/930518759489f64d96e439715872353e64d681a0). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #17568: [SPARK-20254][SQL] Remove unnecessary data conversion fo...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17568 **[Test build #75624 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75624/testReport)** for PR 17568 at commit [`0679ebe`](https://github.com/apache/spark/commit/0679ebe17ed6c4619a7aef64fd41c2f21ffd3c7a).
[GitHub] spark issue #17574: [SPARK-20264][SQL] asm should be non-test dependency in ...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/17574 Do we need it as a normal dependency? It looks like sql/core doesn't use it, and the build works without this dependency. Sorry if I am missing something.
[GitHub] spark issue #17575: [SPARK-20265][MLlib] Improve Prefix'span pre-processing ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17575 Can one of the admins verify this patch?
[GitHub] spark pull request #17575: [SPARK-20265][MLlib] Improve Prefix'span pre-proc...
GitHub user Syrux opened a pull request: https://github.com/apache/spark/pull/17575 [SPARK-20265][MLlib] Improve Prefix'span pre-processing efficiency ## What changes were proposed in this pull request? Improve PrefixSpan pre-processing efficiency by preventing sequences of zero in the cleaned database. The efficiency gain is reflected in the following graph: https://postimg.org/image/9x6ireuvn/ ## How was this patch tested? Using MLlib's existing PrefixSpan tests and tests of my own on the 8 datasets shown in the graph. All results obtained were strictly the same as with the original implementation (without this change). dev/run-tests was also run; no errors were found. Author: Cyril de Vogelaere. You can merge this pull request into a Git repository by running: $ git pull https://github.com/Syrux/spark SPARK-20265 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/17575.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #17575 commit 7af4945fbfb309f7a7784cba2b1fc4cb4945fba0 Author: Syrux Date: 2017-04-08T10:17:04Z [SPARK-20265][MLlib] Improve Prefix'span pre-processing efficiency
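The optimization's core idea, as I understand it (MLlib internally flattens each sequence into an `Array[Int]` with `0` delimiting itemsets), is that dropping infrequent items can leave empty itemsets, i.e. runs of consecutive zeros, in the cleaned database. A standalone sketch of collapsing those runs during cleaning, hypothetical and much simplified relative to the actual patch:

```scala
// Simplified, hypothetical sketch of the cleaning step: keep only frequent
// items and never emit two delimiter zeros in a row, so removed itemsets
// leave no empty "0,0" gaps behind in the cleaned database.
object CleanDb {
  def clean(seq: Array[Int], frequent: Set[Int]): Array[Int] = {
    val out = scala.collection.mutable.ArrayBuffer.empty[Int]
    var lastWasDelim = true // also suppresses a leading zero
    for (x <- seq) {
      if (x == 0) {
        if (!lastWasDelim) { out += 0; lastWasDelim = true }
      } else if (frequent(x)) {
        out += x
        lastWasDelim = false
      } // infrequent items are dropped entirely
    }
    // Drop a trailing delimiter left by a removed final itemset.
    while (out.nonEmpty && out.last == 0) out.remove(out.length - 1)
    out.toArray
  }
}
```

For example, cleaning `[1, 0, 2, 0, 3]` with `{1, 3}` frequent yields `[1, 0, 3]` rather than `[1, 0, 0, 3]`, so later projection passes never scan empty itemsets.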
[GitHub] spark issue #17574: [SPARK-20264][SQL] asm should be non-test dependency in ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17574 **[Test build #75623 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75623/testReport)** for PR 17574 at commit [`2a03188`](https://github.com/apache/spark/commit/2a0318882a3133cc3dbd88f824a92f83cdf2c5e7).
[GitHub] spark issue #17574: [SPARK-20264][SQL] asm should be non-test dependency in ...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/17574 retest this please.
[GitHub] spark issue #17569: [SPARK-20253][SQL] Remove unnecessary nullchecks of a re...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/17569 It seems there are places (e.g., `RowEncoder`) calling `isNullAt` which give `returnNullable` as true (the default value).
[GitHub] spark issue #17567: [SPARK-19991][CORE][YARN] FileSegmentManagedBuffer perfo...
Github user witgo commented on the issue: https://github.com/apache/spark/pull/17567 OK, I see.
[GitHub] spark issue #17569: [SPARK-20253][SQL] Remove unnecessary nullchecks of a re...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17569 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75621/ Test FAILed.
[GitHub] spark issue #17569: [SPARK-20253][SQL] Remove unnecessary nullchecks of a re...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17569 Merged build finished. Test FAILed.
[GitHub] spark issue #17569: [SPARK-20253][SQL] Remove unnecessary nullchecks of a re...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17569 **[Test build #75621 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75621/testReport)** for PR 17569 at commit [`a39803a`](https://github.com/apache/spark/commit/a39803ab0f77124add833bebb3cb0353306aa1f2). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #17469: [SPARK-20132][Docs] Add documentation for column string ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17469 **[Test build #75622 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75622/testReport)** for PR 17469 at commit [`bc03f3c`](https://github.com/apache/spark/commit/bc03f3c5799e749558696fef0723e592523fbcd9). * This patch **fails PySpark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #17469: [SPARK-20132][Docs] Add documentation for column string ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17469 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75622/ Test FAILed.
[GitHub] spark issue #17469: [SPARK-20132][Docs] Add documentation for column string ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17469 Merged build finished. Test FAILed.
[GitHub] spark issue #17469: [SPARK-20132][Docs] Add documentation for column string ...
Github user holdenk commented on the issue: https://github.com/apache/spark/pull/17469 Great! I'll still follow up with Shane & Josh re: @felixcheung triggering build as well.
[GitHub] spark issue #17469: [SPARK-20132][Docs] Add documentation for column string ...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/17469 Yes, it seems so from your comment, @holdenk.
[GitHub] spark issue #17469: [SPARK-20132][Docs] Add documentation for column string ...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/17469 retest this please
[GitHub] spark issue #17469: [SPARK-20132][Docs] Add documentation for column string ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17469 **[Test build #75622 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75622/testReport)** for PR 17469 at commit [`bc03f3c`](https://github.com/apache/spark/commit/bc03f3c5799e749558696fef0723e592523fbcd9).
[GitHub] spark issue #17469: [SPARK-20132][Docs] Add documentation for column string ...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/17469 retest this please
[GitHub] spark issue #17469: [SPARK-20132][Docs] Add documentation for column string ...
Github user holdenk commented on the issue: https://github.com/apache/spark/pull/17469 I've e-mailed them since the Jenkins configuration is a bit too involved (and I'd need Shane to sign off on any Jenkins change anyways). Sorry this is slowing down your PR @map222 and thank you so much for your patience with us :)
[GitHub] spark issue #17574: [SPARK-20264][SQL] asm should be non-test dependency in ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17574 **[Test build #3648 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3648/testReport)** for PR 17574 at commit [`2a03188`](https://github.com/apache/spark/commit/2a0318882a3133cc3dbd88f824a92f83cdf2c5e7). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #17569: [SPARK-20253][SQL] Remove unnecessary nullchecks of a re...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17569 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75619/ Test FAILed.
[GitHub] spark issue #17569: [SPARK-20253][SQL] Remove unnecessary nullchecks of a re...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17569 Merged build finished. Test FAILed.
[GitHub] spark issue #17469: [SPARK-20132][Docs] Add documentation for column string ...
Github user holdenk commented on the issue: https://github.com/apache/spark/pull/17469 Jenkins retest this please
[GitHub] spark issue #17569: [SPARK-20253][SQL] Remove unnecessary nullchecks of a re...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17569 **[Test build #75619 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75619/testReport)** for PR 17569 at commit [`fc6caac`](https://github.com/apache/spark/commit/fc6caacf5fca8cd89b1e324540761ae23f88d9d1). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #17569: [SPARK-20253][SQL] Remove unnecessary nullchecks ...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/17569#discussion_r110513988 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DatasetPrimitiveSuite.scala --- @@ -96,6 +96,16 @@ class DatasetPrimitiveSuite extends QueryTest with SharedSQLContext { checkDataset(dsBoolean.map(e => !e), false, true) } + test("mapPrimitiveArray") { --- End diff -- No, I have just added them to confirm this check works well.
[GitHub] spark pull request #17569: [SPARK-20253][SQL] Remove unnecessary nullchecks ...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/17569#discussion_r110513970 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/encoders/RowEncoder.scala --- @@ -262,17 +264,18 @@ object RowEncoder { input :: Nil) case _: DecimalType => - Invoke(input, "toJavaBigDecimal", ObjectType(classOf[java.math.BigDecimal])) + Invoke(input, "toJavaBigDecimal", ObjectType(classOf[java.math.BigDecimal]), +returnNullable = false) case StringType => - Invoke(input, "toString", ObjectType(classOf[String])) + Invoke(input, "toString", ObjectType(classOf[String]), returnNullable = false) --- End diff -- can we check how many places set `returnNullable` to true? If it's only a few, we can change the default value of `returnNullable` to false. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA.
[GitHub] spark pull request #17569: [SPARK-20253][SQL] Remove unnecessary nullchecks ...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/17569#discussion_r110513952 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DatasetPrimitiveSuite.scala --- @@ -96,6 +96,16 @@ class DatasetPrimitiveSuite extends QueryTest with SharedSQLContext { checkDataset(dsBoolean.map(e => !e), false, true) } + test("mapPrimitiveArray") { --- End diff -- do these tests fail before this PR?
[GitHub] spark issue #17569: [SPARK-20253][SQL] Remove unnecessary nullchecks of a re...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17569 **[Test build #75621 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75621/testReport)** for PR 17569 at commit [`a39803a`](https://github.com/apache/spark/commit/a39803ab0f77124add833bebb3cb0353306aa1f2).
[GitHub] spark pull request #17569: [SPARK-20253][SQL] Remove unnecessary nullchecks ...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/17569#discussion_r110513852 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala --- @@ -228,17 +228,13 @@ case class Invoke( s""" Object $funcResult = null; ${getFuncResult(funcResult, s"${obj.value}.$functionName($argString)")} -if ($funcResult == null) { - ${ev.isNull} = true; -} else { - ${ev.value} = (${ctx.boxedType(javaType)}) $funcResult; -} +${ev.value} = (${ctx.boxedType(javaType)}) $funcResult; """ } // If the function can return null, we do an extra check to make sure our null bit is still set // correctly. -val postNullCheck = if (ctx.defaultValue(dataType) == "null") { +val postNullCheck = if (ctx.defaultValue(dataType) == "null" && returnNullable) { --- End diff -- Yes, done
[GitHub] spark issue #17540: [SPARK-20213][SQL][UI] Fix DataFrameWriter operations in...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/17540 LGTM. @rdblue, the failed tests are Thrift server tests, which are hard to debug. You can run the Hive tests locally and see what failed (usually failing Thrift server tests mean we have failing Hive tests).
[GitHub] spark pull request #17569: [SPARK-20253][SQL] Remove unnecessary nullchecks ...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/17569#discussion_r110513816 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala --- @@ -356,7 +361,8 @@ object ScalaReflection extends ScalaReflection { udt.userClass.getAnnotation(classOf[SQLUserDefinedType]).udt(), Nil, dataType = ObjectType(udt.userClass.getAnnotation(classOf[SQLUserDefinedType]).udt())) -Invoke(obj, "deserialize", ObjectType(udt.userClass), getPath :: Nil) +Invoke(obj, "deserialize", ObjectType(udt.userClass), getPath :: Nil, --- End diff -- I see, it is a UDT. I have checked `deserialize` only within the Spark runtime.
[GitHub] spark pull request #17540: [SPARK-20213][SQL][UI] Fix DataFrameWriter operat...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/17540#discussion_r110513641 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala --- @@ -180,9 +180,13 @@ class Dataset[T] private[sql]( // to happen right away to let these side effects take place eagerly. queryExecution.analyzed match { case c: Command => -LocalRelation(c.output, queryExecution.executedPlan.executeCollect()) --- End diff -- Actually, do we need to do this? Most `Command`s are just local operations (talking with the metastore).
[GitHub] spark pull request #17540: [SPARK-20213][SQL][UI] Fix DataFrameWriter operat...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/17540#discussion_r110513606 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala --- @@ -180,9 +180,13 @@ class Dataset[T] private[sql]( // to happen right away to let these side effects take place eagerly. queryExecution.analyzed match { case c: Command => -LocalRelation(c.output, queryExecution.executedPlan.executeCollect()) --- End diff -- how about `LocalRelation(c.output, withAction("collect")(_.executeCollect()))`
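The idea behind the suggested `withAction("collect")(...)` wrapper can be illustrated with a minimal, self-contained sketch. This is NOT Spark's actual implementation — the real `Dataset.withAction` also posts SQL execution events and times the body — but it shows the pattern under discussion: wrapping the eager `executeCollect()` of a command so that start/end bookkeeping fires around it, which is what lets the UI attribute the resulting jobs to the action. The names `withAction` and `events` here are illustrative only.

```scala
import scala.collection.mutable.ArrayBuffer

// Records the callbacks that fire around each action (stand-in for posting
// execution start/end events to a listener bus).
val events = ArrayBuffer.empty[String]

// Hypothetical simplified withAction: run `body` with start/end bookkeeping,
// mirroring how Dataset.withAction wraps action bodies like executeCollect().
def withAction[T](name: String)(body: => T): T = {
  events += s"start:$name"
  try body
  finally events += s"end:$name"
}

// Eagerly "executing" a command and capturing its rows, analogous to
// LocalRelation(c.output, withAction("collect")(_.executeCollect())).
val rows = withAction("collect") { Seq("row1", "row2") }
```

The point of the pattern is that the side effects of a `Command` still happen eagerly, but now inside the same tracking scope as any other action.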
[GitHub] spark issue #17540: [SPARK-20213][SQL][UI] Fix DataFrameWriter operations in...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/17540 The `withNewExecutionId` was added at https://github.com/rxin/spark/commit/1b0317f64cfe99ff70580eeb99753cd0d31f849a#diff-89b9796aae086e790ddd9351f0db8115R134 . The execution id is used to track all jobs that belong to the same query, so I think it makes sense to call `withNewExecutionId` in action methods like `Dataset#collect` or `DataFrameWriter#insertInto`.
[GitHub] spark issue #17469: [SPARK-20132][Docs] Add documentation for column string ...
Github user holdenk commented on the issue: https://github.com/apache/spark/pull/17469 No, you're correct. The tooling around Jenkins hasn't had enough love as of late, since there are plans to replace a lot of it, so newer committers aren't always added everywhere they need to be. I've got some access, so I can look and see if I can fix it, but if I can't, we'll have to wait for Josh or Shane (who have been very helpful) to update the config.