date:20160724

[GitHub] spark pull request #14335: [SPARK-16697][ML][MLLib] improve LDA submitMiniBa...

2016-07-24 Thread srowen

Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/14335#discussion_r72011858
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/clustering/LDAOptimizer.scala ---
@@ -472,12 +473,13 @@ final class OnlineLDAOptimizer extends LDAOptimizer {
 gammaPart = gammad :: gammaPart
   }
   Iterator((stat, gammaPart))
-}
+}.persist(StorageLevel.MEMORY_AND_DISK)
--- End diff --

Oh, I mean, it's not always clear that it's OK to use the memory/disk, but, 
I think it's justified here. Yes it will serialize.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #14335: [SPARK-16697][ML][MLLib] improve LDA submitMiniBa...

2016-07-24 Thread srowen

Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/14335#discussion_r72011821
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/clustering/LDAOptimizer.scala ---
@@ -472,12 +473,13 @@ final class OnlineLDAOptimizer extends LDAOptimizer {
 gammaPart = gammad :: gammaPart
   }
   Iterator((stat, gammaPart))
-}
+}.persist(StorageLevel.MEMORY_AND_DISK)
 val statsSum: BDM[Double] = 
stats.map(_._1).treeAggregate(BDM.zeros[Double](k, vocabSize))(
   _ += _, _ += _)
-expElogbetaBc.unpersist()
 val gammat: BDM[Double] = breeze.linalg.DenseMatrix.vertcat(
   stats.map(_._2).flatMap(list => 
list).collect().map(_.toDenseMatrix): _*)
--- End diff --

Yes, I'm wondering why not do this in a distributed way; am I missing why 
it's only done serially on the driver? is it that the dense matrix class 
doesn't serialize or serialize well?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #14336: Don't run over all numFeatures if debug log disab...

2016-07-24 Thread srowen

Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/14336#discussion_r72011689
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/tree/impl/RandomForest.scala ---
@@ -118,9 +118,11 @@ private[spark] object RandomForest extends Logging {
 val splits = findSplits(retaggedInput, metadata, seed)
 timer.stop("findSplits")
 logDebug("numBins: feature: number of bins")
-logDebug(Range(0, metadata.numFeatures).map { featureIndex =>
-  s"\t$featureIndex\t${metadata.numBins(featureIndex)}"
-}.mkString("\n"))
+if (log.isDebugEnabled) {
+  logDebug(Range(0, metadata.numFeatures).map { featureIndex =>
--- End diff --

Also trivial, but I woulda written this as just `(0 until 
metadata.numFeatures.map(i => s"\t$i\t${metadata.numBins(i)}").mkString("\n")` 
I know it was like this already. Up to your taste.

Does this actually help much though? `logDebug` is already declared to take 
`(msg: => String)` meaning it's only evaluated lazily.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

2016-07-24 Thread dongjoon-hyun

Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/14132#discussion_r72011582
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
 ---
@@ -1774,6 +1775,49 @@ class Analyzer(
   }
 
   /**
+   * Substitute Hints.
+   * - BROADCAST/BROADCASTJOIN/MAPJOIN match the closest table with the 
given name parameters.
+   *
+   * This rule substitutes `UnresolvedRelation`s in `Substitute` batch 
before `ResolveRelations`
+   * rule is applied. Here are two reasons.
+   * - To support `MetastoreRelation` in Hive module.
+   * - To reduce the effect of `Hint` on the other rules.
+   *
+   * After this rule, it is guaranteed that there exists no unknown `Hint` 
in the plan.
+   * All new `Hint`s should be transformed into concrete Hint classes 
`BroadcastHint` here.
+   */
+  object SubstituteHints extends Rule[LogicalPlan] {
--- End diff --

First of all, I'll take a look again. I need some boot-up time. :)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

2016-07-24 Thread dongjoon-hyun

Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/14132#discussion_r72011413
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
 ---
@@ -1774,6 +1775,49 @@ class Analyzer(
   }
 
   /**
+   * Substitute Hints.
+   * - BROADCAST/BROADCASTJOIN/MAPJOIN match the closest table with the 
given name parameters.
+   *
+   * This rule substitutes `UnresolvedRelation`s in `Substitute` batch 
before `ResolveRelations`
+   * rule is applied. Here are two reasons.
+   * - To support `MetastoreRelation` in Hive module.
+   * - To reduce the effect of `Hint` on the other rules.
+   *
+   * After this rule, it is guaranteed that there exists no unknown `Hint` 
in the plan.
+   * All new `Hint`s should be transformed into concrete Hint classes 
`BroadcastHint` here.
+   */
+  object SubstituteHints extends Rule[LogicalPlan] {
--- End diff --

For the second one, I will revise again.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #14336: Don't run over all numFeatures if debug log disab...

2016-07-24 Thread srowen

Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/14336#discussion_r72011376
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/tree/impl/RandomForest.scala ---
@@ -118,9 +118,11 @@ private[spark] object RandomForest extends Logging {
 val splits = findSplits(retaggedInput, metadata, seed)
 timer.stop("findSplits")
 logDebug("numBins: feature: number of bins")
--- End diff --

Trivial, but why not also pull this into the block?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

2016-07-24 Thread dongjoon-hyun

Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/14132#discussion_r72011394
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
 ---
@@ -1774,6 +1775,49 @@ class Analyzer(
   }
 
   /**
+   * Substitute Hints.
+   * - BROADCAST/BROADCASTJOIN/MAPJOIN match the closest table with the 
given name parameters.
+   *
+   * This rule substitutes `UnresolvedRelation`s in `Substitute` batch 
before `ResolveRelations`
+   * rule is applied. Here are two reasons.
+   * - To support `MetastoreRelation` in Hive module.
+   * - To reduce the effect of `Hint` on the other rules.
+   *
+   * After this rule, it is guaranteed that there exists no unknown `Hint` 
in the plan.
+   * All new `Hint`s should be transformed into concrete Hint classes 
`BroadcastHint` here.
+   */
+  object SubstituteHints extends Rule[LogicalPlan] {
--- End diff --

I see. Since this is `SELECT hint`, which one do you mean?
- `Hint(Project)`
- `Project(Hint)`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #14333: [SPARK-16696][ML][MLLib] destroy KMeans bcNewCenters whe...

2016-07-24 Thread srowen

Github user srowen commented on the issue:

https://github.com/apache/spark/pull/14333
  
I don't understand the last change. As far as I can see it can be destroyed 
inside the loop iteration. It's also possible to reuse the broadcast (declare 
outside the loop), and unpersist each iteration, and destroy afterwards. But I 
don't see the need to hold on to all these broadcasts? 

Yes you make a fair point about recovery. I think your changes are safe in 
this respect, good.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #14296: [SPARK-16639][SQL] The query with having conditio...

2016-07-24 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/14296#discussion_r72011203
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
 ---
@@ -1207,6 +1207,12 @@ class Analyzer(
 val alias = Alias(ae, ae.toString)()
 aggregateExpressions += alias
 alias.toAttribute
+  case ne: NamedExpression => ne
--- End diff --

but `case e: Expression if grouping.exists(_.semanticEquals(e))` will 
filter them out right?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

2016-07-24 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/14132#discussion_r72011145
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
 ---
@@ -1774,6 +1775,49 @@ class Analyzer(
   }
 
   /**
+   * Substitute Hints.
+   * - BROADCAST/BROADCASTJOIN/MAPJOIN match the closest table with the 
given name parameters.
+   *
+   * This rule substitutes `UnresolvedRelation`s in `Substitute` batch 
before `ResolveRelations`
+   * rule is applied. Here are two reasons.
+   * - To support `MetastoreRelation` in Hive module.
+   * - To reduce the effect of `Hint` on the other rules.
+   *
+   * After this rule, it is guaranteed that there exists no unknown `Hint` 
in the plan.
+   * All new `Hint`s should be transformed into concrete Hint classes 
`BroadcastHint` here.
+   */
+  object SubstituteHints extends Rule[LogicalPlan] {
--- End diff --

then should we wrap the whole query plan with `Hint` instead of just the 
relation? 
https://github.com/apache/spark/pull/14132#discussion-diff-71820260R343


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #14331: [SPARK-16691][SQL] move BucketSpec to catalyst module an...

2016-07-24 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14331
  
**[Test build #62798 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62798/consoleFull)**
 for PR 14331 at commit 
[`069419d`](https://github.com/apache/spark/commit/069419d4048e05f829618d124acc784cc6c01397).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #14334: [SPARK-16703][SQL] Fixes window specification SQL format

2016-07-24 Thread rxin

Github user rxin commented on the issue:

https://github.com/apache/spark/pull/14334
  
BTW can you change the title to say "Remove extra whitespace in SQL 
generation for window functions"?



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #14297: [SPARK-16660][SQL] CreateViewCommand should not take Cat...

2016-07-24 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14297
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #14297: [SPARK-16660][SQL] CreateViewCommand should not take Cat...

2016-07-24 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14297
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62786/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #14297: [SPARK-16660][SQL] CreateViewCommand should not take Cat...

2016-07-24 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14297
  
**[Test build #62786 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62786/consoleFull)**
 for PR 14297 at commit 
[`a5cf2b3`](https://github.com/apache/spark/commit/a5cf2b3261bd974b215ceaf82d88181a2f87f446).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #14334: [SPARK-16703][SQL] Fixes window specification SQL format

2016-07-24 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14334
  
**[Test build #62797 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62797/consoleFull)**
 for PR 14334 at commit 
[`5f051dc`](https://github.com/apache/spark/commit/5f051dc64b795e665a27c36786eb1d5046c8715c).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #14331: [SPARK-16691][SQL] move BucketSpec to catalyst mo...

2016-07-24 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/14331#discussion_r72010784
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala
 ---
@@ -173,8 +190,18 @@ case class CatalogTable(
   override def toString: String = {
 val tableProperties = properties.map(p => p._1 + "=" + 
p._2).mkString("[", ", ", "]")
 val partitionColumns = partitionColumnNames.map("`" + _ + 
"`").mkString("[", ", ", "]")
-val sortColumns = sortColumnNames.map("`" + _ + "`").mkString("[", ", 
", "]")
-val bucketColumns = bucketColumnNames.map("`" + _ + "`").mkString("[", 
", ", "]")
+val bucketStrings = bucketSpec match {
+  case Some(BucketSpec(numBuckets, bucketColumnNames, 
sortColumnNames)) =>
+val bucketColumnsString = bucketColumnNames.map("`" + _ + 
"`").mkString("[", ", ", "]")
+val sortColumnsString = sortColumnNames.map("`" + _ + 
"`").mkString("[", ", ", "]")
--- End diff --

which `quoteIdentifier`?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #13077: [SPARK-10748] [Mesos] Log error instead of crashi...

2016-07-24 Thread srowen

Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/13077#discussion_r72010691
  
--- Diff: 
core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosClusterScheduler.scala
 ---
@@ -502,41 +502,53 @@ private[spark] class MesosClusterScheduler(
   } else {
 val offer = offerOption.get
 val taskId = 
TaskID.newBuilder().setValue(submission.submissionId).build()
-val (remainingResources, cpuResourcesToUse) =
-  partitionResources(offer.resources, "cpus", driverCpu)
-val (finalResources, memResourcesToUse) =
-  partitionResources(remainingResources.asJava, "mem", driverMem)
-val commandInfo = buildDriverCommand(submission)
-val appName = submission.schedulerProperties("spark.app.name")
-val taskInfo = TaskInfo.newBuilder()
-  .setTaskId(taskId)
-  .setName(s"Driver for $appName")
-  .setSlaveId(offer.slaveId)
-  .setCommand(commandInfo)
-  .addAllResources(cpuResourcesToUse.asJava)
-  .addAllResources(memResourcesToUse.asJava)
-offer.resources = finalResources.asJava
-
submission.schedulerProperties.get("spark.mesos.executor.docker.image").foreach 
{ image =>
-  val container = taskInfo.getContainerBuilder()
-  val volumes = submission.schedulerProperties
-.get("spark.mesos.executor.docker.volumes")
-.map(MesosSchedulerBackendUtil.parseVolumesSpec)
-  val portmaps = submission.schedulerProperties
-.get("spark.mesos.executor.docker.portmaps")
-.map(MesosSchedulerBackendUtil.parsePortMappingsSpec)
-  MesosSchedulerBackendUtil.addDockerInfo(
-container, image, volumes = volumes, portmaps = portmaps)
-  taskInfo.setContainer(container.build())
+var commandInfo: CommandInfo = null
+try {
+  commandInfo = buildDriverCommand(submission)
+} catch {
+  case e: SparkException =>
+afterLaunchCallback(submission.submissionId)
+finishedDrivers += new MesosClusterSubmissionState(submission, 
taskId,
+  SlaveID.newBuilder().setValue("").build(), None, null, None)
+logError(s"Failed to launch the driver with id: 
${submission.submissionId}, " +
+  s"cpu: $driverCpu, mem: $driverMem, reason: ${e.getMessage}")
+}
+if (commandInfo != null) {
--- End diff --

Rather than make a big long if statement, is it simpler and clearer to just 
return from the method in the catch block? then you can even ...

```
val commandInfo = try { buildDriverCommand(submission) } ...
```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #14339: [SPARK-16698][SQL] Field names having dots should be all...

2016-07-24 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14339
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #14339: [SPARK-16698][SQL] Field names having dots should be all...

2016-07-24 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14339
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62787/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #14339: [SPARK-16698][SQL] Field names having dots should be all...

2016-07-24 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14339
  
**[Test build #62787 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62787/consoleFull)**
 for PR 14339 at commit 
[`f5c566c`](https://github.com/apache/spark/commit/f5c566cf2609dfbbc56b76276becd9976b34f2dd).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #14334: [SPARK-16703][SQL] Fixes window specification SQL format

2016-07-24 Thread liancheng

Github user liancheng commented on the issue:

https://github.com/apache/spark/pull/14334
  
@rxin @dongjoon-hyun Thanks! JIRA ticket added and updated the golden files.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

2016-07-24 Thread dongjoon-hyun

Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/14132#discussion_r72010521
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
 ---
@@ -1774,6 +1775,49 @@ class Analyzer(
   }
 
   /**
+   * Substitute Hints.
+   * - BROADCAST/BROADCASTJOIN/MAPJOIN match the closest table with the 
given name parameters.
+   *
+   * This rule substitutes `UnresolvedRelation`s in `Substitute` batch 
before `ResolveRelations`
+   * rule is applied. Here are two reasons.
+   * - To support `MetastoreRelation` in Hive module.
+   * - To reduce the effect of `Hint` on the other rules.
+   *
+   * After this rule, it is guaranteed that there exists no unknown `Hint` 
in the plan.
+   * All new `Hint`s should be transformed into concrete Hint classes 
`BroadcastHint` here.
+   */
+  object SubstituteHints extends Rule[LogicalPlan] {
--- End diff --

Hi, @cloud-fan . Thank you again.

Yes. That's true. So, at my first commit of this PR, I actually implemented 
to attach just the concrete `BroadcastHint` to the relations in 
`AstBuilder.scala`. But, there were some advices to
- Move out the logic from the parser module (for preventing future updates 
on parser module)
- Generalize the hint syntax like the current code. 

Since a general `Hint` is attached on a logical plan not a table relation, 
we can substitute any other plans (filter/project/aggregator), too. The 
following is my previous comment about this situation. The child of hint is 
`JOIN`.

> Yep. I grasp your point. First, the given relation is just a logical 
plan, not a table. In the following case, the relation is Join.
> SELECT /*+ MAPJOIN(parquet_t0) */ * FROM parquet_t0, parquet_t1



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #14257: [SPARK-16621][SQL][WIP] Use a stable ID generation metho...

2016-07-24 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14257
  
**[Test build #62796 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62796/consoleFull)**
 for PR 14257 at commit 
[`06fb99c`](https://github.com/apache/spark/commit/06fb99c9485f353c166197d75f1f39a74cde3262).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #14296: [SPARK-16639][SQL] The query with having conditio...

2016-07-24 Thread viirya

Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/14296#discussion_r72009959
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
 ---
@@ -1207,6 +1207,12 @@ class Analyzer(
 val alias = Alias(ae, ae.toString)()
 aggregateExpressions += alias
 alias.toAttribute
+  case ne: NamedExpression => ne
--- End diff --

I think we don't need to replace named expr (attributes, alias)?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #14332: [SPARK-16694] [CORE] Use for/foreach rather than ...

2016-07-24 Thread srowen

Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/14332#discussion_r72009839
  
--- Diff: 
examples/src/main/scala/org/apache/spark/examples/ml/DataFrameExample.scala ---
@@ -54,14 +54,13 @@ object DataFrameExample {
   }
 }
 
-parser.parse(args, defaultParams).map { params =>
-  run(params)
-}.getOrElse {
-  sys.exit(1)
+parser.parse(args, defaultParams) match {
+  case Some(params) => run(params)
+  case _ => sys.exit(1)
 }
--- End diff --

Yes, `Option` has `foreach` as well and would be fine if the use case here 
were also to "do _X_ if the `Option` exists" but here we also need "... or do 
_Y_ if it doesn't". I'm not sure it's that `Unit` is special or not, but that 
it signals there is no meaningful result, which highlights the real issue: the 
only purpose is the _side effect_. The difference between transforming a return 
value with `fold` or whatever and then exiting with it, and using these 
constructs to exit, is precisely the difference here. You are arguing that 
side-effects are neither here nor there when thinking about this from a 
theoretical _functional programming_ perspective? can't agree with that at all! 
Transform without side effects? makes total sense for FP idioms. "Transforms" 
with no result and only side effects? not so much.

The blog you link makes the point that `fold` is concise, but less 
readable, and I agree, and I think that's a negative. The Scala API doc shows 
`match` in an example where side-effects are desired (`println`) and not for 
transformation. How about `if-else`? I do not particularly think `match` is 
necessary.

It's a good discussion but for consistency and readability I think this 
change (or to `if-else`) is positive.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #14331: [SPARK-16691][SQL] move BucketSpec to catalyst module an...

2016-07-24 Thread liancheng

Github user liancheng commented on the issue:

https://github.com/apache/spark/pull/14331
  
Overall LGTM, a few minor comments.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #14331: [SPARK-16691][SQL] move BucketSpec to catalyst mo...

2016-07-24 Thread liancheng

Github user liancheng commented on a diff in the pull request:

https://github.com/apache/spark/pull/14331#discussion_r72009811
  
--- Diff: 
sql/hive/src/test/scala/org/apache/spark/sql/sources/BucketedReadSuite.scala ---
@@ -20,10 +20,11 @@ package org.apache.spark.sql.sources
 import java.io.File
 
 import org.apache.spark.sql._
+import org.apache.spark.sql.catalyst.catalog.BucketSpec
 import org.apache.spark.sql.catalyst.expressions._
 import org.apache.spark.sql.catalyst.plans.physical.HashPartitioning
 import org.apache.spark.sql.execution.DataSourceScanExec
-import org.apache.spark.sql.execution.datasources.{BucketSpec, 
DataSourceStrategy}
+import org.apache.spark.sql.execution.datasources.{DataSourceStrategy}
--- End diff --

Nit: Remove the braces.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #14331: [SPARK-16691][SQL] move BucketSpec to catalyst mo...

2016-07-24 Thread liancheng

Github user liancheng commented on a diff in the pull request:

https://github.com/apache/spark/pull/14331#discussion_r72009802
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala
 ---
@@ -110,6 +110,25 @@ case class CatalogTablePartition(
 
 
 /**
+ * A container for bucketing information.
+ * Bucketing is a technology for decomposing data sets into more 
manageable parts, and the number
+ * of buckets is fixed so it does not fluctuate with data.
+ *
+ * @param numBuckets number of buckets.
+ * @param bucketColumnNames the names of the columns that used to generate 
the bucket id.
+ * @param sortColumnNames the names of the columns that used to sort data 
in each bucket.
+ */
+case class BucketSpec(
+numBuckets: Int,
+bucketColumnNames: Seq[String],
+sortColumnNames: Seq[String]) {
+  if (numBuckets <= 0) {
+throw new AnalysisException(s"Expected positive number of buckets, but 
got `$numBuckets`.")
+  }
+}
+
+
--- End diff --

Nit: Remove extra newline.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #14331: [SPARK-16691][SQL] move BucketSpec to catalyst mo...

2016-07-24 Thread liancheng

Github user liancheng commented on a diff in the pull request:

https://github.com/apache/spark/pull/14331#discussion_r72009804
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala
 ---
@@ -173,8 +190,18 @@ case class CatalogTable(
   override def toString: String = {
 val tableProperties = properties.map(p => p._1 + "=" + 
p._2).mkString("[", ", ", "]")
 val partitionColumns = partitionColumnNames.map("`" + _ + 
"`").mkString("[", ", ", "]")
-val sortColumns = sortColumnNames.map("`" + _ + "`").mkString("[", ", 
", "]")
-val bucketColumns = bucketColumnNames.map("`" + _ + "`").mkString("[", 
", ", "]")
+val bucketStrings = bucketSpec match {
+  case Some(BucketSpec(numBuckets, bucketColumnNames, 
sortColumnNames)) =>
+val bucketColumnsString = bucketColumnNames.map("`" + _ + 
"`").mkString("[", ", ", "]")
+val sortColumnsString = sortColumnNames.map("`" + _ + 
"`").mkString("[", ", ", "]")
--- End diff --

Use `quoteIdentifier` instead?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #14340: [SPARK-16534][Streaming][Kafka] Add Python API support f...

2016-07-24 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14340
  
**[Test build #62795 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62795/consoleFull)**
 for PR 14340 at commit 
[`38a89f6`](https://github.com/apache/spark/commit/38a89f68dac408453ad8cdb871d2b64f23d758e1).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #14342: [SPARK-16685] Remove audit-release scripts.

2016-07-24 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14342
  
**[Test build #62794 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62794/consoleFull)**
 for PR 14342 at commit 
[`c11042e`](https://github.com/apache/spark/commit/c11042e287dae9622a3d3ead8ea40833ad3c6078).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #14296: [SPARK-16639][SQL] The query with having conditio...

2016-07-24 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/14296#discussion_r72009643
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
 ---
@@ -1207,6 +1207,12 @@ class Analyzer(
 val alias = Alias(ae, ae.toString)()
 aggregateExpressions += alias
 alias.toAttribute
+  case ne: NamedExpression => ne
+  case e: Expression if grouping.exists(_.semanticEquals(e)) &&
+  !ResolveGroupingAnalytics.hasGroupingFunction(e) =>
--- End diff --

It will be greate if we can figure it out and add comment here.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #14296: [SPARK-16639][SQL] The query with having conditio...

2016-07-24 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/14296#discussion_r72009599
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala 
---
@@ -46,6 +46,14 @@ class SQLQuerySuite extends QueryTest with 
SharedSQLContext {
   Row("one", 6) :: Row("three", 3) :: Nil)
   }
 
+  test("having condition contains grouping column") {
+Seq(("one", 1), ("two", 2), ("three", 3), ("one", 5)).toDF("k", "v")
--- End diff --

wrap it with `withTempView`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #14296: [SPARK-16639][SQL] The query with having conditio...

2016-07-24 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/14296#discussion_r72009571
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
 ---
@@ -1207,6 +1207,12 @@ class Analyzer(
 val alias = Alias(ae, ae.toString)()
 aggregateExpressions += alias
 alias.toAttribute
+  case ne: NamedExpression => ne
--- End diff --

why this case?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #14342: [SPARK-16685] Remove audit-release scripts.

2016-07-24 Thread rxin

GitHub user rxin opened a pull request:

https://github.com/apache/spark/pull/14342

[SPARK-16685] Remove audit-release scripts.

## What changes were proposed in this pull request?
This patch removes dev/audit-release. It was initially created to do basic 
release auditing, but we haven't run them for the last one year+. 

## How was this patch tested?
N/A


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/rxin/spark SPARK-16685

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/14342.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #14342


commit c11042e287dae9622a3d3ead8ea40833ad3c6078
Author: Reynold Xin 
Date:   2016-07-25T05:13:34Z

Remove audit-release scripts.




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #14333: [SPARK-16696][ML][MLLib] destroy KMeans bcNewCenters whe...

2016-07-24 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14333
  
**[Test build #62793 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62793/consoleFull)**
 for PR 14333 at commit 
[`01f4d3a`](https://github.com/apache/spark/commit/01f4d3ab2e4c591b7b27b1ed92fbe07e1fc876d1).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #14327: [SPARK-16686][SQL] Project shouldn't be pushed do...

2016-07-24 Thread viirya

Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/14327#discussion_r72009385
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
 ---
@@ -150,11 +150,13 @@ class SimpleTestOptimizer extends Optimizer(
 
 /**
  * Pushes projects down beneath Sample to enable column pruning with 
sampling.
+ * This rule is only doable when the projects don't add new attributes.
  */
 object PushProjectThroughSample extends Rule[LogicalPlan] {
--- End diff --

yah. I will update this. At least one optimizer test uses this rule. The 
test should be changed too.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #14339: [SPARK-16698][SQL] Field names having dots should be all...

2016-07-24 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14339
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #14339: [SPARK-16698][SQL] Field names having dots should be all...

2016-07-24 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14339
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62785/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #14339: [SPARK-16698][SQL] Field names having dots should be all...

2016-07-24 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14339
  
**[Test build #62785 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62785/consoleFull)**
 for PR 14339 at commit 
[`cd5d04a`](https://github.com/apache/spark/commit/cd5d04a2661dc2e56b517478200a074ac075dec1).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #14313: [SPARK-16674][SQL] Avoid per-record type dispatch in JDB...

2016-07-24 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14313
  
**[Test build #62792 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62792/consoleFull)**
 for PR 14313 at commit 
[`a385318`](https://github.com/apache/spark/commit/a3853182b375539bd329fb10464c1a746d4eaa47).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #13077: [SPARK-10748] [Mesos] Log error instead of crashing Spar...

2016-07-24 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13077
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62783/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #14337: [SPARK-16699][SQL] Fix performance bug in hash ag...

2016-07-24 Thread asfgit

Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/14337


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #13077: [SPARK-10748] [Mesos] Log error instead of crashing Spar...

2016-07-24 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13077
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #14337: [SPARK-16699][SQL] Fix performance bug in hash aggregate...

2016-07-24 Thread rxin

Github user rxin commented on the issue:

https://github.com/apache/spark/pull/14337
  
@ooq next time please open a pull request against master branch rather than 
2.0. Thanks.



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #13077: [SPARK-10748] [Mesos] Log error instead of crashing Spar...

2016-07-24 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13077
  
**[Test build #62783 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62783/consoleFull)**
 for PR 13077 at commit 
[`ae85f81`](https://github.com/apache/spark/commit/ae85f8154e506c223a018d9c04c58967a82fa580).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #14341: [Minor][Doc][SQL] Fix two documents regarding size in by...

2016-07-24 Thread cloud-fan

Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/14341
  
LGTM


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #14337: [SPARK-16699][SQL] Fix performance bug in hash aggregate...

2016-07-24 Thread rxin

Github user rxin commented on the issue:

https://github.com/apache/spark/pull/14337
  
Merging in master/2.0.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #14339: [SPARK-16698][SQL] Field names having dots should be all...

2016-07-24 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14339
  
**[Test build #62791 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62791/consoleFull)**
 for PR 14339 at commit 
[`a3bb8d8`](https://github.com/apache/spark/commit/a3bb8d821b19c85261e1601fb4a2b92dcfc1814f).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #14327: [SPARK-16686][SQL] Project shouldn't be pushed do...

2016-07-24 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/14327#discussion_r72008673
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
 ---
@@ -150,11 +150,13 @@ class SimpleTestOptimizer extends Optimizer(
 
 /**
  * Pushes projects down beneath Sample to enable column pruning with 
sampling.
+ * This rule is only doable when the projects don't add new attributes.
  */
 object PushProjectThroughSample extends Rule[LogicalPlan] {
--- End diff --

the last case in `ColumnPruning`, it will generate a new `Project` under 
`Sample`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #14341: [Minor][Doc][SQL] Fix two documents regarding size in by...

2016-07-24 Thread viirya

Github user viirya commented on the issue:

https://github.com/apache/spark/pull/14341
  
cc @cloud-fan @liancheng Just minor document changes. Please review if this 
change is proper. Thanks.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #14339: [SPARK-16698][SQL] Field names having dots should be all...

2016-07-24 Thread cloud-fan

Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/14339
  
LGTM, cc @liancheng to take another look


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #14327: [SPARK-16686][SQL] Project shouldn't be pushed do...

2016-07-24 Thread viirya

Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/14327#discussion_r72008576
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
 ---
@@ -150,11 +150,13 @@ class SimpleTestOptimizer extends Optimizer(
 
 /**
  * Pushes projects down beneath Sample to enable column pruning with 
sampling.
+ * This rule is only doable when the projects don't add new attributes.
  */
 object PushProjectThroughSample extends Rule[LogicalPlan] {
--- End diff --

Not sure which part you mean? I don't see `ColumnPruning` handling `Sample`?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #14339: [SPARK-16698][SQL] Field names having dots should...

2016-07-24 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/14339#discussion_r72008560
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala 
---
@@ -2982,4 +2982,19 @@ class SQLQuerySuite extends QueryTest with 
SharedSQLContext {
 """.stripMargin), Nil)
 }
   }
+
+  test("SPARK-16674: field names containing dots for both fields and 
partitioned fields") {
+withTempPath { path =>
+  val data = (1 to 10).map(i => (i, s"data-$i", i % 2, if ((i % 2) == 
0) "a" else "b"))
+.toDF("col.1", "col.2", "part.col1", "part.col2")
+  data.write
+.format("parquet")
+.partitionBy("part.col1", "part.col2")
+.save(path.getCanonicalPath)
+  val copyData = 
spark.read.format("parquet").load(path.getCanonicalPath)
--- End diff --

`copyData` -> `readBack`?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #14341: [Minor][Doc][SQL] Fix two documents regarding size in by...

2016-07-24 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14341
  
**[Test build #62790 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62790/consoleFull)**
 for PR 14341 at commit 
[`2b555f4`](https://github.com/apache/spark/commit/2b555f4393482c16424dfb743f86f56105d8fe7d).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #14313: [SPARK-16674][SQL] Avoid per-record type dispatch in JDB...

2016-07-24 Thread cloud-fan

Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/14313
  
LGTM except some style comments, thanks for working on it!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #14313: [SPARK-16674][SQL] Avoid per-record type dispatch...

2016-07-24 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/14313#discussion_r72008447
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCRDD.scala
 ---
@@ -322,46 +322,133 @@ private[sql] class JDBCRDD(
 }
   }
 
-  // Each JDBC-to-Catalyst conversion corresponds to a tag defined here so 
that
-  // we don't have to potentially poke around in the Metadata once for 
every
-  // row.
-  // Is there a better way to do this?  I'd rather be using a type that
-  // contains only the tags I define.
-  abstract class JDBCConversion
-  case object BooleanConversion extends JDBCConversion
-  case object DateConversion extends JDBCConversion
-  case class  DecimalConversion(precision: Int, scale: Int) extends 
JDBCConversion
-  case object DoubleConversion extends JDBCConversion
-  case object FloatConversion extends JDBCConversion
-  case object IntegerConversion extends JDBCConversion
-  case object LongConversion extends JDBCConversion
-  case object BinaryLongConversion extends JDBCConversion
-  case object StringConversion extends JDBCConversion
-  case object TimestampConversion extends JDBCConversion
-  case object BinaryConversion extends JDBCConversion
-  case class ArrayConversion(elementConversion: JDBCConversion) extends 
JDBCConversion
+  // A `JDBCConversion` is responsible for converting and setting a value 
from `ResultSet`
+  // into a field for `MutableRow`. The last argument `Int` means the 
index for the
+  // value to be set in the row and also used for the value to retrieve 
from `ResultSet`.
+  private type JDBCConversion = (ResultSet, MutableRow, Int) => Unit
 
   /**
-   * Maps a StructType to a type tag list.
+   * Maps a StructType to conversions for each type.
*/
   def getConversions(schema: StructType): Array[JDBCConversion] =
 schema.fields.map(sf => getConversions(sf.dataType, sf.metadata))
 
   private def getConversions(dt: DataType, metadata: Metadata): 
JDBCConversion = dt match {
-case BooleanType => BooleanConversion
-case DateType => DateConversion
-case DecimalType.Fixed(p, s) => DecimalConversion(p, s)
-case DoubleType => DoubleConversion
-case FloatType => FloatConversion
-case IntegerType => IntegerConversion
-case LongType => if (metadata.contains("binarylong")) 
BinaryLongConversion else LongConversion
-case StringType => StringConversion
-case TimestampType => TimestampConversion
-case BinaryType => BinaryConversion
-case ArrayType(et, _) => ArrayConversion(getConversions(et, metadata))
+case BooleanType =>
+  (rs: ResultSet, row: MutableRow, pos: Int) =>
+row.setBoolean(pos, rs.getBoolean(pos + 1))
+
+case DateType =>
+  (rs: ResultSet, row: MutableRow, pos: Int) =>
+// DateTimeUtils.fromJavaDate does not handle null value, so we 
need to check it.
+val dateVal = rs.getDate(pos + 1)
+if (dateVal != null) {
+  row.setInt(pos, DateTimeUtils.fromJavaDate(dateVal))
+} else {
+  row.update(pos, null)
+}
+
+// When connecting with Oracle DB through JDBC, the precision and 
scale of BigDecimal
+// object returned by ResultSet.getBigDecimal is not correctly matched 
to the table
+// schema reported by ResultSetMetaData.getPrecision and 
ResultSetMetaData.getScale.
+// If inserting values like 1 into a column with NUMBER(12, 2) 
type, you get through
+// a BigDecimal object with scale as 0. But the dataframe schema has 
correct type as
+// DecimalType(12, 2). Thus, after saving the dataframe into parquet 
file and then
+// retrieve it, you will get wrong result 199.99.
+// So it is needed to set precision and scale for Decimal based on 
JDBC metadata.
+case DecimalType.Fixed(p, s) =>
+  (rs: ResultSet, row: MutableRow, pos: Int) =>
+val decimal =
+  nullSafeConvert[java.math.BigDecimal](rs.getBigDecimal(pos + 1), 
d => Decimal(d, p, s))
+row.update(pos, decimal)
+
+case DoubleType =>
+  (rs: ResultSet, row: MutableRow, pos: Int) =>
+row.setDouble(pos, rs.getDouble(pos + 1))
+
+case FloatType =>
+  (rs: ResultSet, row: MutableRow, pos: Int) =>
+row.setFloat(pos, rs.getFloat(pos + 1))
+
+case IntegerType =>
+  (rs: ResultSet, row: MutableRow, pos: Int) =>
+row.setInt(pos, rs.getInt(pos + 1))
+
+case LongType if metadata.contains("binarylong") =>
+  (rs: ResultSet, row: MutableRow, pos: Int) =>
+val bytes = rs.getBytes(pos + 1)
+var ans = 0L
+var j = 0
+

[GitHub] spark pull request #14341: [Minor][Doc][SQL] Fix two documents regarding siz...

2016-07-24 Thread viirya

GitHub user viirya opened a pull request:

https://github.com/apache/spark/pull/14341

[Minor][Doc][SQL] Fix two documents regarding size in bytes

## What changes were proposed in this pull request?

Fix two places in SQLConf documents regarding size in bytes and statistics.

## How was this patch tested?
No. Just change document.




You can merge this pull request into a Git repository by running:

$ git pull https://github.com/viirya/spark-1 fix-doc-size-in-bytes

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/14341.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #14341


commit 2b555f4393482c16424dfb743f86f56105d8fe7d
Author: Liang-Chi Hsieh 
Date:   2016-07-25T04:41:27Z

Fix doc.




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #14313: [SPARK-16674][SQL] Avoid per-record type dispatch...

2016-07-24 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/14313#discussion_r72008403
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCRDD.scala
 ---
@@ -322,46 +322,133 @@ private[sql] class JDBCRDD(
 }
   }
 
-  // Each JDBC-to-Catalyst conversion corresponds to a tag defined here so 
that
-  // we don't have to potentially poke around in the Metadata once for 
every
-  // row.
-  // Is there a better way to do this?  I'd rather be using a type that
-  // contains only the tags I define.
-  abstract class JDBCConversion
-  case object BooleanConversion extends JDBCConversion
-  case object DateConversion extends JDBCConversion
-  case class  DecimalConversion(precision: Int, scale: Int) extends 
JDBCConversion
-  case object DoubleConversion extends JDBCConversion
-  case object FloatConversion extends JDBCConversion
-  case object IntegerConversion extends JDBCConversion
-  case object LongConversion extends JDBCConversion
-  case object BinaryLongConversion extends JDBCConversion
-  case object StringConversion extends JDBCConversion
-  case object TimestampConversion extends JDBCConversion
-  case object BinaryConversion extends JDBCConversion
-  case class ArrayConversion(elementConversion: JDBCConversion) extends 
JDBCConversion
+  // A `JDBCConversion` is responsible for converting and setting a value 
from `ResultSet`
--- End diff --

`JDBCConversion` seems not a good name now, do you have better any ideas?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #14327: [SPARK-16686][SQL] Project shouldn't be pushed do...

2016-07-24 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/14327#discussion_r72008303
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
 ---
@@ -150,11 +150,13 @@ class SimpleTestOptimizer extends Optimizer(
 
 /**
  * Pushes projects down beneath Sample to enable column pruning with 
sampling.
+ * This rule is only doable when the projects don't add new attributes.
  */
 object PushProjectThroughSample extends Rule[LogicalPlan] {
--- End diff --

ah, looks like `ColumnPruning` already handles it, can we just remove this 
rule?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #14327: [SPARK-16686][SQL] Project shouldn't be pushed do...

2016-07-24 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/14327#discussion_r72008210
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
 ---
@@ -150,11 +150,13 @@ class SimpleTestOptimizer extends Optimizer(
 
 /**
  * Pushes projects down beneath Sample to enable column pruning with 
sampling.
+ * This rule is only doable when the projects don't add new attributes.
  */
 object PushProjectThroughSample extends Rule[LogicalPlan] {
--- End diff --

should we merge this rule into `ColumnPruning`?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #14340: [SPARK-16534][Streaming][Kafka] Add Python API support f...

2016-07-24 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14340
  
Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #14340: [SPARK-16534][Streaming][Kafka] Add Python API support f...

2016-07-24 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14340
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62784/
Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #14340: [SPARK-16534][Streaming][Kafka] Add Python API support f...

2016-07-24 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14340
  
**[Test build #62784 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62784/consoleFull)**
 for PR 14340 at commit 
[`dcca337`](https://github.com/apache/spark/commit/dcca3371da0c73bbcadb4331a2869e0303bd8ae8).
 * This patch **fails PySpark unit tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class Kafka010StreamTests(PySparkStreamingTestCase):`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

2016-07-24 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/14132#discussion_r72008134
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
 ---
@@ -1774,6 +1775,49 @@ class Analyzer(
   }
 
   /**
+   * Substitute Hints.
+   * - BROADCAST/BROADCASTJOIN/MAPJOIN match the closest table with the 
given name parameters.
+   *
+   * This rule substitutes `UnresolvedRelation`s in `Substitute` batch 
before `ResolveRelations`
+   * rule is applied. Here are two reasons.
+   * - To support `MetastoreRelation` in Hive module.
+   * - To reduce the effect of `Hint` on the other rules.
+   *
+   * After this rule, it is guaranteed that there exists no unknown `Hint` 
in the plan.
+   * All new `Hint`s should be transformed into concrete Hint classes 
`BroadcastHint` here.
+   */
+  object SubstituteHints extends Rule[LogicalPlan] {
--- End diff --

how does hive handle sql hints? This rule looks a little complex to me, 
according to the fact that hints can only be applied to table relation. It will 
be great if we can wrap `UnresolvedRelation` with `Hint` in the parser, and 
then it will be very easy to resolve hints at analyzer.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #14327: [SPARK-16686][SQL] Project shouldn't be pushed down thro...

2016-07-24 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14327
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62782/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #14327: [SPARK-16686][SQL] Project shouldn't be pushed down thro...

2016-07-24 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14327
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #14327: [SPARK-16686][SQL] Project shouldn't be pushed down thro...

2016-07-24 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14327
  
**[Test build #62782 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62782/consoleFull)**
 for PR 14327 at commit 
[`31a6f6f`](https://github.com/apache/spark/commit/31a6f6fef63b0f130738a6fc7a5f628e60947cb9).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #14313: [SPARK-16674][SQL] Avoid per-record type dispatch in JDB...

2016-07-24 Thread HyukjinKwon

Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/14313
  
@cloud-fan I just addressed your comments. I added another argument in 
`JDBCConversion` for `MutableRow` so that we can avoid type-boxing.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #14313: [SPARK-16674][SQL] Avoid per-record type dispatch in JDB...

2016-07-24 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14313
  
**[Test build #62789 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62789/consoleFull)**
 for PR 14313 at commit 
[`8ac66b1`](https://github.com/apache/spark/commit/8ac66b17f81a2fdc0866df26889b5e2fcc634c51).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #14313: [SPARK-16674][SQL] Avoid per-record type dispatch...

2016-07-24 Thread HyukjinKwon

Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/14313#discussion_r72007301
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCRDD.scala
 ---
@@ -407,84 +496,8 @@ private[sql] class JDBCRDD(
 var i = 0
 while (i < conversions.length) {
   val pos = i + 1
-  conversions(i) match {
-case BooleanConversion => mutableRow.setBoolean(i, 
rs.getBoolean(pos))
-case DateConversion =>
-  // DateTimeUtils.fromJavaDate does not handle null value, so 
we need to check it.
-  val dateVal = rs.getDate(pos)
-  if (dateVal != null) {
-mutableRow.setInt(i, DateTimeUtils.fromJavaDate(dateVal))
-  } else {
-mutableRow.update(i, null)
-  }
-// When connecting with Oracle DB through JDBC, the precision 
and scale of BigDecimal
-// object returned by ResultSet.getBigDecimal is not correctly 
matched to the table
-// schema reported by ResultSetMetaData.getPrecision and 
ResultSetMetaData.getScale.
-// If inserting values like 1 into a column with 
NUMBER(12, 2) type, you get through
-// a BigDecimal object with scale as 0. But the dataframe 
schema has correct type as
-// DecimalType(12, 2). Thus, after saving the dataframe into 
parquet file and then
-// retrieve it, you will get wrong result 199.99.
-// So it is needed to set precision and scale for Decimal 
based on JDBC metadata.
--- End diff --

Fxied.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #14327: [SPARK-16686][SQL] Project shouldn't be pushed down thro...

2016-07-24 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14327
  
**[Test build #62788 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62788/consoleFull)**
 for PR 14327 at commit 
[`5c4e7ff`](https://github.com/apache/spark/commit/5c4e75bd438ad7abbb37dc84cff9e5036567).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #14327: [SPARK-16686][SQL] Project shouldn't be pushed down thro...

2016-07-24 Thread viirya

Github user viirya commented on the issue:

https://github.com/apache/spark/pull/14327
  
Besides, according to the JIRA, this may need to be backported to 1.6.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #14327: [SPARK-16686][SQL] Project shouldn't be pushed down thro...

2016-07-24 Thread viirya

Github user viirya commented on the issue:

https://github.com/apache/spark/pull/14327
  
cc @cloud-fan @liancheng @yhuai Please review this change. Thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #14339: [SPARK-16698][SQL] Field names having dots should be all...

2016-07-24 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14339
  
**[Test build #62787 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62787/consoleFull)**
 for PR 14339 at commit 
[`f5c566c`](https://github.com/apache/spark/commit/f5c566cf2609dfbbc56b76276becd9976b34f2dd).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #14331: [SPARK-16691][SQL] move BucketSpec to catalyst module an...

2016-07-24 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14331
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62777/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #14331: [SPARK-16691][SQL] move BucketSpec to catalyst module an...

2016-07-24 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14331
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #14331: [SPARK-16691][SQL] move BucketSpec to catalyst module an...

2016-07-24 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14331
  
**[Test build #62777 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62777/consoleFull)**
 for PR 14331 at commit 
[`cc63f01`](https://github.com/apache/spark/commit/cc63f01b7362aa8c34d82ef0941bc8919e5f90b7).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #14331: [SPARK-16691][SQL] move BucketSpec to catalyst module an...

2016-07-24 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14331
  
**[Test build #62776 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62776/consoleFull)**
 for PR 14331 at commit 
[`59321d0`](https://github.com/apache/spark/commit/59321d003c1394c47dba5894d9363fc9ea65a803).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #14297: [SPARK-16660][SQL] CreateViewCommand should not take Cat...

2016-07-24 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14297
  
**[Test build #62786 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62786/consoleFull)**
 for PR 14297 at commit 
[`a5cf2b3`](https://github.com/apache/spark/commit/a5cf2b3261bd974b215ceaf82d88181a2f87f446).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #14331: [SPARK-16691][SQL] move BucketSpec to catalyst module an...

2016-07-24 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14331
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #14331: [SPARK-16691][SQL] move BucketSpec to catalyst module an...

2016-07-24 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14331
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62776/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #14204: [SPARK-16520] [WEBUI] Link executors to corresponding wo...

2016-07-24 Thread yhuai

Github user yhuai commented on the issue:

https://github.com/apache/spark/pull/14204
  
In my test, the column does not exist.






On Sun, Jul 24, 2016 at 6:41 PM -0700, "Tao Lin"  
wrote:












@yhuai Does the column exist? Are all of the cells in the column blank?

That's weird. It was OK when I committed. I'll check it out later.



â
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub, or mute the thread.


  
  











---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #14339: [SPARK-16698][SQL] Field names having dots should be all...

2016-07-24 Thread HyukjinKwon

Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/14339
  
Sure!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #14297: [SPARK-16660][SQL] CreateViewCommand should not take Cat...

2016-07-24 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14297
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62775/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #14313: [SPARK-16674][SQL] Avoid per-record type dispatch...

2016-07-24 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/14313#discussion_r72005471
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCRDD.scala
 ---
@@ -407,84 +496,8 @@ private[sql] class JDBCRDD(
 var i = 0
 while (i < conversions.length) {
   val pos = i + 1
-  conversions(i) match {
-case BooleanConversion => mutableRow.setBoolean(i, 
rs.getBoolean(pos))
-case DateConversion =>
-  // DateTimeUtils.fromJavaDate does not handle null value, so 
we need to check it.
-  val dateVal = rs.getDate(pos)
-  if (dateVal != null) {
-mutableRow.setInt(i, DateTimeUtils.fromJavaDate(dateVal))
-  } else {
-mutableRow.update(i, null)
-  }
-// When connecting with Oracle DB through JDBC, the precision 
and scale of BigDecimal
-// object returned by ResultSet.getBigDecimal is not correctly 
matched to the table
-// schema reported by ResultSetMetaData.getPrecision and 
ResultSetMetaData.getScale.
-// If inserting values like 1 into a column with 
NUMBER(12, 2) type, you get through
-// a BigDecimal object with scale as 0. But the dataframe 
schema has correct type as
-// DecimalType(12, 2). Thus, after saving the dataframe into 
parquet file and then
-// retrieve it, you will get wrong result 199.99.
-// So it is needed to set precision and scale for Decimal 
based on JDBC metadata.
-case DecimalConversion(p, s) =>
-  val decimalVal = rs.getBigDecimal(pos)
-  if (decimalVal == null) {
-mutableRow.update(i, null)
-  } else {
-mutableRow.update(i, Decimal(decimalVal, p, s))
-  }
-case DoubleConversion => mutableRow.setDouble(i, 
rs.getDouble(pos))
-case FloatConversion => mutableRow.setFloat(i, 
rs.getFloat(pos))
-case IntegerConversion => mutableRow.setInt(i, rs.getInt(pos))
-case LongConversion => mutableRow.setLong(i, rs.getLong(pos))
-// TODO(davies): use getBytes for better performance, if the 
encoding is UTF-8
-case StringConversion => mutableRow.update(i, 
UTF8String.fromString(rs.getString(pos)))
-case TimestampConversion =>
-  val t = rs.getTimestamp(pos)
-  if (t != null) {
-mutableRow.setLong(i, DateTimeUtils.fromJavaTimestamp(t))
-  } else {
-mutableRow.update(i, null)
-  }
-case BinaryConversion => mutableRow.update(i, rs.getBytes(pos))
-case BinaryLongConversion =>
-  val bytes = rs.getBytes(pos)
-  var ans = 0L
-  var j = 0
-  while (j < bytes.size) {
-ans = 256 * ans + (255 & bytes(j))
-j = j + 1
-  }
-  mutableRow.setLong(i, ans)
-case ArrayConversion(elementConversion) =>
-  val array = rs.getArray(pos).getArray
-  if (array != null) {
-val data = elementConversion match {
-  case TimestampConversion =>
-array.asInstanceOf[Array[java.sql.Timestamp]].map { 
timestamp =>
-  nullSafeConvert(timestamp, 
DateTimeUtils.fromJavaTimestamp)
-}
-  case StringConversion =>
-array.asInstanceOf[Array[java.lang.String]]
-  .map(UTF8String.fromString)
-  case DateConversion =>
-array.asInstanceOf[Array[java.sql.Date]].map { date =>
-  nullSafeConvert(date, DateTimeUtils.fromJavaDate)
-}
-  case DecimalConversion(p, s) =>
-array.asInstanceOf[Array[java.math.BigDecimal]].map { 
decimal =>
-  nullSafeConvert[java.math.BigDecimal](decimal, d => 
Decimal(d, p, s))
-}
-  case BinaryLongConversion =>
-throw new IllegalArgumentException(s"Unsupported array 
element conversion $i")
-  case _: ArrayConversion =>
-throw new IllegalArgumentException("Nested arrays 
unsupported")
-  case _ => array.asInstanceOf[Array[Any]]
-}
-mutableRow.update(i, new GenericArrayData(data))
-  } else {
-mutableRow.update(i, null)
-

[GitHub] spark pull request #14313: [SPARK-16674][SQL] Avoid per-record type dispatch...

2016-07-24 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/14313#discussion_r72005487
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCRDD.scala
 ---
@@ -322,46 +322,135 @@ private[sql] class JDBCRDD(
 }
   }
 
-  // Each JDBC-to-Catalyst conversion corresponds to a tag defined here so 
that
-  // we don't have to potentially poke around in the Metadata once for 
every
-  // row.
-  // Is there a better way to do this?  I'd rather be using a type that
-  // contains only the tags I define.
-  abstract class JDBCConversion
-  case object BooleanConversion extends JDBCConversion
-  case object DateConversion extends JDBCConversion
-  case class  DecimalConversion(precision: Int, scale: Int) extends 
JDBCConversion
-  case object DoubleConversion extends JDBCConversion
-  case object FloatConversion extends JDBCConversion
-  case object IntegerConversion extends JDBCConversion
-  case object LongConversion extends JDBCConversion
-  case object BinaryLongConversion extends JDBCConversion
-  case object StringConversion extends JDBCConversion
-  case object TimestampConversion extends JDBCConversion
-  case object BinaryConversion extends JDBCConversion
-  case class ArrayConversion(elementConversion: JDBCConversion) extends 
JDBCConversion
+  // A `JDBCConversion` is responsible for converting a value from 
`ResultSet`
+  // to a value in a field for `InternalRow`.
+  private type JDBCConversion = (ResultSet, Int) => Any
+
+  // This `ArrayElementConversion` is responsible for converting elements 
in
+  // an array from `ResultSet`.
+  private type ArrayElementConversion = (Object) => Any
 
   /**
-   * Maps a StructType to a type tag list.
+   * Maps a StructType to conversions for each type.
*/
   def getConversions(schema: StructType): Array[JDBCConversion] =
 schema.fields.map(sf => getConversions(sf.dataType, sf.metadata))
 
   private def getConversions(dt: DataType, metadata: Metadata): 
JDBCConversion = dt match {
-case BooleanType => BooleanConversion
-case DateType => DateConversion
-case DecimalType.Fixed(p, s) => DecimalConversion(p, s)
-case DoubleType => DoubleConversion
-case FloatType => FloatConversion
-case IntegerType => IntegerConversion
-case LongType => if (metadata.contains("binarylong")) 
BinaryLongConversion else LongConversion
-case StringType => StringConversion
-case TimestampType => TimestampConversion
-case BinaryType => BinaryConversion
-case ArrayType(et, _) => ArrayConversion(getConversions(et, metadata))
+case BooleanType =>
+  (rs: ResultSet, pos: Int) => rs.getBoolean(pos)
+
+case DateType =>
+  (rs: ResultSet, pos: Int) =>
+// DateTimeUtils.fromJavaDate does not handle null value, so we 
need to check it.
+val dateVal = rs.getDate(pos)
+if (dateVal != null) {
+  DateTimeUtils.fromJavaDate(dateVal)
+} else {
+  null
+}
+
+case DecimalType.Fixed(p, s) =>
+  (rs: ResultSet, pos: Int) =>
+val decimalVal = rs.getBigDecimal(pos)
+if (decimalVal != null) {
+  Decimal(decimalVal, p, s)
+} else {
+  null
+}
+
+case DoubleType =>
+  (rs: ResultSet, pos: Int) => rs.getDouble(pos)
+
+case FloatType =>
+  (rs: ResultSet, pos: Int) => rs.getFloat(pos)
+
+case IntegerType =>
+  (rs: ResultSet, pos: Int) => rs.getInt(pos)
+
+case LongType if metadata.contains("binarylong") =>
+  (rs: ResultSet, pos: Int) =>
+val bytes = rs.getBytes(pos)
+var ans = 0L
+var j = 0
+while (j < bytes.size) {
+  ans = 256 * ans + (255 & bytes(j))
+  j = j + 1
+}
+ans
+
+case LongType =>
+  (rs: ResultSet, pos: Int) => rs.getLong(pos)
+
+case StringType =>
+  (rs: ResultSet, pos: Int) =>
+// TODO(davies): use getBytes for better performance, if the 
encoding is UTF-8
+UTF8String.fromString(rs.getString(pos))
+
+case TimestampType =>
+  (rs: ResultSet, pos: Int) =>
+val t = rs.getTimestamp(pos)
+if (t != null) {
+  DateTimeUtils.fromJavaTimestamp(t)
+} else {
+  null
+}
+
+case BinaryType =>
+  (rs: ResultSet, pos: Int) => rs.getBytes(pos)
+
+case ArrayType(et, _) =>
+  val elementConversion: ArrayElementConversion = 
getArrayElementConversion(et, metadata)
+  (rs: ResultSet, pos: Int) =>
+

[GitHub] spark issue #14297: [SPARK-16660][SQL] CreateViewCommand should not take Cat...

2016-07-24 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14297
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #14297: [SPARK-16660][SQL] CreateViewCommand should not take Cat...

2016-07-24 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14297
  
**[Test build #62775 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62775/consoleFull)**
 for PR 14297 at commit 
[`5072959`](https://github.com/apache/spark/commit/50729594c1956d185bde4c2b41891ce1567cfd5a).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #14313: [SPARK-16674][SQL] Avoid per-record type dispatch...

2016-07-24 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/14313#discussion_r72005398
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCRDD.scala
 ---
@@ -322,46 +322,135 @@ private[sql] class JDBCRDD(
 }
   }
 
-  // Each JDBC-to-Catalyst conversion corresponds to a tag defined here so 
that
-  // we don't have to potentially poke around in the Metadata once for 
every
-  // row.
-  // Is there a better way to do this?  I'd rather be using a type that
-  // contains only the tags I define.
-  abstract class JDBCConversion
-  case object BooleanConversion extends JDBCConversion
-  case object DateConversion extends JDBCConversion
-  case class  DecimalConversion(precision: Int, scale: Int) extends 
JDBCConversion
-  case object DoubleConversion extends JDBCConversion
-  case object FloatConversion extends JDBCConversion
-  case object IntegerConversion extends JDBCConversion
-  case object LongConversion extends JDBCConversion
-  case object BinaryLongConversion extends JDBCConversion
-  case object StringConversion extends JDBCConversion
-  case object TimestampConversion extends JDBCConversion
-  case object BinaryConversion extends JDBCConversion
-  case class ArrayConversion(elementConversion: JDBCConversion) extends 
JDBCConversion
+  // A `JDBCConversion` is responsible for converting a value from 
`ResultSet`
+  // to a value in a field for `InternalRow`.
+  private type JDBCConversion = (ResultSet, Int) => Any
+
+  // This `ArrayElementConversion` is responsible for converting elements 
in
+  // an array from `ResultSet`.
+  private type ArrayElementConversion = (Object) => Any
--- End diff --

looks like this type alias is not very useful.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #14313: [SPARK-16674][SQL] Avoid per-record type dispatch...

2016-07-24 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/14313#discussion_r72005309
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCRDD.scala
 ---
@@ -322,46 +322,135 @@ private[sql] class JDBCRDD(
 }
   }
 
-  // Each JDBC-to-Catalyst conversion corresponds to a tag defined here so 
that
-  // we don't have to potentially poke around in the Metadata once for 
every
-  // row.
-  // Is there a better way to do this?  I'd rather be using a type that
-  // contains only the tags I define.
-  abstract class JDBCConversion
-  case object BooleanConversion extends JDBCConversion
-  case object DateConversion extends JDBCConversion
-  case class  DecimalConversion(precision: Int, scale: Int) extends 
JDBCConversion
-  case object DoubleConversion extends JDBCConversion
-  case object FloatConversion extends JDBCConversion
-  case object IntegerConversion extends JDBCConversion
-  case object LongConversion extends JDBCConversion
-  case object BinaryLongConversion extends JDBCConversion
-  case object StringConversion extends JDBCConversion
-  case object TimestampConversion extends JDBCConversion
-  case object BinaryConversion extends JDBCConversion
-  case class ArrayConversion(elementConversion: JDBCConversion) extends 
JDBCConversion
+  // A `JDBCConversion` is responsible for converting a value from 
`ResultSet`
+  // to a value in a field for `InternalRow`.
+  private type JDBCConversion = (ResultSet, Int) => Any
+
+  // This `ArrayElementConversion` is responsible for converting elements 
in
+  // an array from `ResultSet`.
+  private type ArrayElementConversion = (Object) => Any
 
   /**
-   * Maps a StructType to a type tag list.
+   * Maps a StructType to conversions for each type.
*/
   def getConversions(schema: StructType): Array[JDBCConversion] =
 schema.fields.map(sf => getConversions(sf.dataType, sf.metadata))
 
   private def getConversions(dt: DataType, metadata: Metadata): 
JDBCConversion = dt match {
-case BooleanType => BooleanConversion
-case DateType => DateConversion
-case DecimalType.Fixed(p, s) => DecimalConversion(p, s)
-case DoubleType => DoubleConversion
-case FloatType => FloatConversion
-case IntegerType => IntegerConversion
-case LongType => if (metadata.contains("binarylong")) 
BinaryLongConversion else LongConversion
-case StringType => StringConversion
-case TimestampType => TimestampConversion
-case BinaryType => BinaryConversion
-case ArrayType(et, _) => ArrayConversion(getConversions(et, metadata))
+case BooleanType =>
+  (rs: ResultSet, pos: Int) => rs.getBoolean(pos)
+
+case DateType =>
+  (rs: ResultSet, pos: Int) =>
+// DateTimeUtils.fromJavaDate does not handle null value, so we 
need to check it.
+val dateVal = rs.getDate(pos)
+if (dateVal != null) {
+  DateTimeUtils.fromJavaDate(dateVal)
+} else {
+  null
+}
+
+case DecimalType.Fixed(p, s) =>
+  (rs: ResultSet, pos: Int) =>
+val decimalVal = rs.getBigDecimal(pos)
+if (decimalVal != null) {
--- End diff --

use `nullSafeConvert`?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #14313: [SPARK-16674][SQL] Avoid per-record type dispatch...

2016-07-24 Thread HyukjinKwon

Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/14313#discussion_r72005232
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCRDD.scala
 ---
@@ -407,84 +496,8 @@ private[sql] class JDBCRDD(
 var i = 0
 while (i < conversions.length) {
   val pos = i + 1
-  conversions(i) match {
-case BooleanConversion => mutableRow.setBoolean(i, 
rs.getBoolean(pos))
-case DateConversion =>
-  // DateTimeUtils.fromJavaDate does not handle null value, so 
we need to check it.
-  val dateVal = rs.getDate(pos)
-  if (dateVal != null) {
-mutableRow.setInt(i, DateTimeUtils.fromJavaDate(dateVal))
-  } else {
-mutableRow.update(i, null)
-  }
-// When connecting with Oracle DB through JDBC, the precision 
and scale of BigDecimal
-// object returned by ResultSet.getBigDecimal is not correctly 
matched to the table
-// schema reported by ResultSetMetaData.getPrecision and 
ResultSetMetaData.getScale.
-// If inserting values like 1 into a column with 
NUMBER(12, 2) type, you get through
-// a BigDecimal object with scale as 0. But the dataframe 
schema has correct type as
-// DecimalType(12, 2). Thus, after saving the dataframe into 
parquet file and then
-// retrieve it, you will get wrong result 199.99.
-// So it is needed to set precision and scale for Decimal 
based on JDBC metadata.
--- End diff --

Oh, my mistake. I will add this.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #14333: [SPARK-16696][ML][MLLib] fix KMeans bcNewCenters unpersi...

2016-07-24 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14333
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #14333: [SPARK-16696][ML][MLLib] fix KMeans bcNewCenters unpersi...

2016-07-24 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14333
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62779/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #14333: [SPARK-16696][ML][MLLib] fix KMeans bcNewCenters unpersi...

2016-07-24 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14333
  
**[Test build #62779 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62779/consoleFull)**
 for PR 14333 at commit 
[`ecd15b2`](https://github.com/apache/spark/commit/ecd15b2b7bac21a9e37f998747a752903530).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #14313: [SPARK-16674][SQL] Avoid per-record type dispatch...

2016-07-24 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/14313#discussion_r72005172
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCRDD.scala
 ---
@@ -407,84 +496,8 @@ private[sql] class JDBCRDD(
 var i = 0
 while (i < conversions.length) {
   val pos = i + 1
-  conversions(i) match {
-case BooleanConversion => mutableRow.setBoolean(i, 
rs.getBoolean(pos))
-case DateConversion =>
-  // DateTimeUtils.fromJavaDate does not handle null value, so 
we need to check it.
-  val dateVal = rs.getDate(pos)
-  if (dateVal != null) {
-mutableRow.setInt(i, DateTimeUtils.fromJavaDate(dateVal))
-  } else {
-mutableRow.update(i, null)
-  }
-// When connecting with Oracle DB through JDBC, the precision 
and scale of BigDecimal
-// object returned by ResultSet.getBigDecimal is not correctly 
matched to the table
-// schema reported by ResultSetMetaData.getPrecision and 
ResultSetMetaData.getScale.
-// If inserting values like 1 into a column with 
NUMBER(12, 2) type, you get through
-// a BigDecimal object with scale as 0. But the dataframe 
schema has correct type as
-// DecimalType(12, 2). Thus, after saving the dataframe into 
parquet file and then
-// retrieve it, you will get wrong result 199.99.
-// So it is needed to set precision and scale for Decimal 
based on JDBC metadata.
--- End diff --

why do we remove these comments?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #14313: [SPARK-16674][SQL] Avoid per-record type dispatch...

2016-07-24 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/14313#discussion_r72005085
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCRDD.scala
 ---
@@ -322,46 +322,135 @@ private[sql] class JDBCRDD(
 }
   }
 
-  // Each JDBC-to-Catalyst conversion corresponds to a tag defined here so 
that
-  // we don't have to potentially poke around in the Metadata once for 
every
-  // row.
-  // Is there a better way to do this?  I'd rather be using a type that
-  // contains only the tags I define.
-  abstract class JDBCConversion
-  case object BooleanConversion extends JDBCConversion
-  case object DateConversion extends JDBCConversion
-  case class  DecimalConversion(precision: Int, scale: Int) extends 
JDBCConversion
-  case object DoubleConversion extends JDBCConversion
-  case object FloatConversion extends JDBCConversion
-  case object IntegerConversion extends JDBCConversion
-  case object LongConversion extends JDBCConversion
-  case object BinaryLongConversion extends JDBCConversion
-  case object StringConversion extends JDBCConversion
-  case object TimestampConversion extends JDBCConversion
-  case object BinaryConversion extends JDBCConversion
-  case class ArrayConversion(elementConversion: JDBCConversion) extends 
JDBCConversion
+  // A `JDBCConversion` is responsible for converting a value from 
`ResultSet`
+  // to a value in a field for `InternalRow`.
+  private type JDBCConversion = (ResultSet, Int) => Any
--- End diff --

also explain what's the 2 arguments in the comment?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #14339: [SPARK-16698][SQL] Field names having dots should be all...

2016-07-24 Thread cloud-fan

Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/14339
  
can we test it in `SQLQuerySuite` or something? I think 
`FileSourceStrategySuite` should be used to test the strategy.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #14339: [SPARK-16698][SQL] Field names having dots should be all...

2016-07-24 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14339
  
**[Test build #62785 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62785/consoleFull)**
 for PR 14339 at commit 
[`cd5d04a`](https://github.com/apache/spark/commit/cd5d04a2661dc2e56b517478200a074ac075dec1).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

1 2 3 >

1 - 100 of 290 matches

Mail list logo