[GitHub] spark issue #19824: [SPARK][STREAMING] Invoke onBatchCompletion() only when ...
Github user victor-wong commented on the issue: https://github.com/apache/spark/pull/19824 @viirya Sorry for the misleading title; I have changed it now.
[GitHub] spark issue #19856: [SPARK-22664] The logs about "Connected to Zookeeper" in...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19856

**[Test build #4002 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4002/testReport)** for PR 19856 at commit [`652068b`](https://github.com/apache/spark/commit/652068b58f4140206d461bc95cdac4531f750e65).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #18906: [SPARK-21692][PYSPARK][SQL] Add nullability support to P...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/18906 @ptkool, for https://github.com/apache/spark/pull/18906#issuecomment-348350023, can you point out the Scala API rather than just saying it's consistent with the Scala side and making reviewers check through the code here and there?
[GitHub] spark issue #19831: [SPARK-22626][SQL] Wrong Hive table statistics may trigg...
Github user wzhfy commented on the issue: https://github.com/apache/spark/pull/19831 Since Hive can't prevent users from setting wrong stats properties, I think this solution can alleviate the problem. Besides, it's consistent with what we do for `totalSize` and `rawDataSize` (only use the stats when > 0).
[GitHub] spark issue #19841: [SPARK-22642][SQL] the createdTempDir will not be delete...
Github user zuotingbing commented on the issue: https://github.com/apache/spark/pull/19841 Yes, I am also confused about why the diff is so big, but that is what git originally reported.
[GitHub] spark pull request #19831: [SPARK-22626][SQL] Wrong Hive table statistics ma...
Github user wzhfy commented on a diff in the pull request: https://github.com/apache/spark/pull/19831#discussion_r154250160

```diff
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala ---
@@ -418,7 +418,7 @@ private[hive] class HiveClientImpl(
     // Note that this statistics could be overridden by Spark's statistics if that's available.
     val totalSize = properties.get(StatsSetupConst.TOTAL_SIZE).map(BigInt(_))
     val rawDataSize = properties.get(StatsSetupConst.RAW_DATA_SIZE).map(BigInt(_))
-    val rowCount = properties.get(StatsSetupConst.ROW_COUNT).map(BigInt(_)).filter(_ >= 0)
+    val rowCount = properties.get(StatsSetupConst.ROW_COUNT).map(BigInt(_)).filter(_ > 0)
```

--- End diff --

Thanks for the investigation. It seems Hive can't protect its stats properties.
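To see the effect of this one-character change, here is a minimal sketch in plain Scala (illustrative only, using a bare `Map` in place of the PR's Hive table properties): with the old `>= 0` filter, a bogus property `numRows=0` survives as `Some(0)` and makes the optimizer treat the table as empty, while `> 0` discards it.

```scala
// Minimal sketch: how each filter treats a wrong "numRows = 0" property.
val properties = Map("numRows" -> "0") // bad stats written outside Spark

val oldRowCount = properties.get("numRows").map(BigInt(_)).filter(_ >= 0) // Some(0) -> table looks empty
val newRowCount = properties.get("numRows").map(BigInt(_)).filter(_ > 0)  // None    -> bad value ignored
```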
[GitHub] spark pull request #19783: [SPARK-21322][SQL] support histogram in filter ca...
Github user ron8hu commented on a diff in the pull request: https://github.com/apache/spark/pull/19783#discussion_r154250197

```diff
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/FilterEstimation.scala ---
@@ -332,8 +332,45 @@ case class FilterEstimation(plan: Filter) extends Logging {
         colStatsMap.update(attr, newStats)
       }

-      Some(1.0 / BigDecimal(ndv))
-    } else {
+      // We compute filter selectivity using Histogram information
+      attr.dataType match {
+        case StringType | BinaryType =>
+          Some(1.0 / BigDecimal(ndv))
+
+        case _ =>
+          // returns 1/ndv if there is no histogram
+          if (colStat.histogram.isEmpty) return Some(1.0 / BigDecimal(ndv))
+
+          // We traverse histogram bins to locate the literal value
+          val hgmBins = colStat.histogram.get.bins
+          val datum = EstimationUtils.toDecimal(literal.value, literal.dataType).toDouble
+          // find the interval where this datum locates
+          var lowerId, higherId = -1
+          for (i <- hgmBins.indices) {
+            // if datum > upperBound, just move to next bin
+            if (datum <= hgmBins(i).hi && lowerId < 0) lowerId = i
+            if (higherId < 0) {
+              if ((datum < hgmBins(i).hi || i == hgmBins.length - 1) ||
+                  ((datum == hgmBins(i).hi) && (datum < hgmBins(i + 1).hi))) {
+                higherId = i
+              }
+            }
+          }
+          assert(lowerId <= higherId)
+          val lowerBinNdv = hgmBins(lowerId).ndv
+          val higherBinNdv = hgmBins(higherId).ndv
+          // assume uniform distribution in each bin
+          val percent = if (lowerId == higherId) {
+            (1.0 / hgmBins.length) / math.max(lowerBinNdv, 1)
+          } else {
+            1.0 / hgmBins.length * (higherId - lowerId - 1) +
+              (1.0 / hgmBins.length) / math.max(lowerBinNdv, 1) +
+              (1.0 / hgmBins.length) / math.max(higherBinNdv, 1)
+          }
+          Some(percent)
```

--- End diff --

Good point.
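For intuition about the `percent` formula in the quoted diff, a toy worked example (numbers invented for illustration, not from the PR): each bin of an equi-height histogram holds 1/numBins of the rows, and under the uniform-distribution assumption an equality literal that falls into a single bin is further divided by that bin's ndv.

```scala
// Toy example: 4 equi-height bins, the literal falls into one bin with ndv = 5.
val numBins = 4
val binNdv = 5L
val selectivity = (1.0 / numBins) / math.max(binNdv, 1) // = 0.05, i.e. 5% of rows
```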
[GitHub] spark pull request #19783: [SPARK-21322][SQL] support histogram in filter ca...
Github user ron8hu commented on a diff in the pull request: https://github.com/apache/spark/pull/19783#discussion_r154249995

```diff
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/EstimationUtils.scala ---
@@ -114,4 +114,197 @@ object EstimationUtils {
     }
   }

+  /**
+   * Returns the number of the first bin into which a column values falls for a specified
+   * numeric equi-height histogram.
+   *
+   * @param value a literal value of a column
+   * @param histogram a numeric equi-height histogram
+   * @return the number of the first bin into which a column values falls.
+   */
+
+  def findFirstBinForValue(value: Double, histogram: Histogram): Int = {
+    var binId = 0
+    histogram.bins.foreach { bin =>
+      if (value > bin.hi) binId += 1
+    }
+    binId
+  }
+
+  /**
+   * Returns the number of the last bin into which a column values falls for a specified
+   * numeric equi-height histogram.
+   *
+   * @param value a literal value of a column
+   * @param histogram a numeric equi-height histogram
+   * @return the number of the last bin into which a column values falls.
+   */
+
+  def findLastBinForValue(value: Double, histogram: Histogram): Int = {
+    var binId = 0
+    for (i <- 0 until histogram.bins.length) {
+      if (value > histogram.bins(i).hi) {
+        // increment binId to point to next bin
+        binId += 1
+      }
+      if ((value == histogram.bins(i).hi) && (i < histogram.bins.length - 1)) {
+        if (value == histogram.bins(i + 1).lo) {
```

--- End diff --

No. I meant the upper bound of the array of bins in a histogram. The default length of the histogram bin array is 254. When i is equal to 253 (the last bin), i + 1 is 254, leading to an out-of-bounds error.
[GitHub] spark issue #19831: [SPARK-22626][SQL] Wrong Hive table statistics may trigg...
Github user wzhfy commented on the issue: https://github.com/apache/spark/pull/19831

> Besides, if the size stats totalSize or rawDataSize is wrong, the problem exists whether CBO is enabled or not.
> If CBO is enabled, the outputRowCount == 0, the getOutputSize is 1, sizeInBytes is 1, and this side can broadcast; if CBO is disabled, the sizeInBytes = (p.child.stats.sizeInBytes * outputRowSize) / childRowSize and this side can't broadcast.

@wangyum `totalSize` or `rawDataSize` can also be wrong, right?
[GitHub] spark issue #19823: [SPARK-22601][SQL] Data load is getting displayed succes...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19823 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84356/ Test PASSed.
[GitHub] spark issue #19823: [SPARK-22601][SQL] Data load is getting displayed succes...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19823 Merged build finished. Test PASSed.
[GitHub] spark issue #19823: [SPARK-22601][SQL] Data load is getting displayed succes...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19823

**[Test build #84356 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84356/testReport)** for PR 19823 at commit [`f6eb4ad`](https://github.com/apache/spark/commit/f6eb4ada04e21760f805b3e50062f91a438cef56).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #19783: [SPARK-21322][SQL] support histogram in filter ca...
Github user wzhfy commented on a diff in the pull request: https://github.com/apache/spark/pull/19783#discussion_r154248775

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/EstimationUtils.scala --- (the same `findFirstBinForValue`/`findLastBinForValue` hunk quoted above, ending at `if (value == histogram.bins(i + 1).lo) {`) --- End diff --

By "out of bound", do you mean it exceeds the 100-character line-length limit? You can just switch to a new line after `&&`.
[GitHub] spark pull request #19839: SPARK-22373 Bump Janino dependency version to fix...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/19839
[GitHub] spark issue #19856: [SPARK-22664] The logs about "Connected to Zookeeper" in...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19856 **[Test build #4002 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4002/testReport)** for PR 19856 at commit [`652068b`](https://github.com/apache/spark/commit/652068b58f4140206d461bc95cdac4531f750e65).
[GitHub] spark issue #19839: SPARK-22373 Bump Janino dependency version to fix thread...
Github user srowen commented on the issue: https://github.com/apache/spark/pull/19839 Merged to master/2.2/2.1
[GitHub] spark pull request #19783: [SPARK-21322][SQL] support histogram in filter ca...
Github user wzhfy commented on a diff in the pull request: https://github.com/apache/spark/pull/19783#discussion_r154248457

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/EstimationUtils.scala --- (the same hunk quoted above, at `var binId = 0` in `findFirstBinForValue`) --- End diff --

If a user changes the data, statistics will be removed or re-collected (currently only size); Spark already implements this.
[GitHub] spark pull request #19826: [SPARK-22428][DOC] Add spark application garbage ...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/19826
[GitHub] spark issue #19826: [SPARK-22428][DOC] Add spark application garbage collect...
Github user srowen commented on the issue: https://github.com/apache/spark/pull/19826 Merged to master
[GitHub] spark pull request #19831: [SPARK-22626][SQL] Wrong Hive table statistics ma...
Github user wangyum commented on a diff in the pull request: https://github.com/apache/spark/pull/19831#discussion_r154245570

```diff
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala ---
@@ -418,7 +418,7 @@ private[hive] class HiveClientImpl(
     // Note that this statistics could be overridden by Spark's statistics if that's available.
     val totalSize = properties.get(StatsSetupConst.TOTAL_SIZE).map(BigInt(_))
     val rawDataSize = properties.get(StatsSetupConst.RAW_DATA_SIZE).map(BigInt(_))
-    val rowCount = properties.get(StatsSetupConst.ROW_COUNT).map(BigInt(_)).filter(_ >= 0)
+    val rowCount = properties.get(StatsSetupConst.ROW_COUNT).map(BigInt(_)).filter(_ > 0)
```

--- End diff --

Maybe this could be clearer:

```scala
val rowCount = properties.get(StatsSetupConst.ROW_COUNT).map(BigInt(_))
val stats = if (totalSize.isDefined && totalSize.get > 0L) {
  Some(CatalogStatistics(sizeInBytes = totalSize.get, rowCount = rowCount.filter(_ > 0)))
} else if (rawDataSize.isDefined && rawDataSize.get > 0) {
  Some(CatalogStatistics(sizeInBytes = rawDataSize.get, rowCount = rowCount.filter(_ > 0)))
} else {
  None
}
```
[GitHub] spark issue #19823: [SPARK-22601][SQL] Data load is getting displayed succes...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/19823 LGTM pending Jenkins
[GitHub] spark issue #18906: [SPARK-21692][PYSPARK][SQL] Add nullability support to P...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18906 Merged build finished. Test FAILed.
[GitHub] spark issue #18906: [SPARK-21692][PYSPARK][SQL] Add nullability support to P...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18906 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84360/ Test FAILed.
[GitHub] spark issue #18906: [SPARK-21692][PYSPARK][SQL] Add nullability support to P...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18906

**[Test build #84360 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84360/testReport)** for PR 18906 at commit [`0727597`](https://github.com/apache/spark/commit/0727597bfdb35d03df7050c9777b29efb9add4b1).

* This patch **fails PySpark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #19839: SPARK-22373 Bump Janino dependency version to fix thread...
Github user Victsm commented on the issue: https://github.com/apache/spark/pull/19839 @srowen @kiszk @mgaido91 Is this patch ready to merge?
[GitHub] spark issue #19856: [SPARK-22664] The logs about "Connected to Zookeeper" in...
Github user liu-zhaokun commented on the issue: https://github.com/apache/spark/pull/19856 @srowen Thanks for your reply. Could you help me review it?
[GitHub] spark issue #19805: [SPARK-22649][PYTHON][SQL] Adding localCheckpoint to Dat...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/19805 Will review it this weekend.
[GitHub] spark pull request #19631: [SPARK-22372][core, yarn] Make cluster submission...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/19631#discussion_r154240414

```diff
--- Diff: core/src/main/scala/org/apache/spark/SecurityManager.scala ---
@@ -542,7 +496,54 @@ private[spark] class SecurityManager(
    * Gets the secret key.
    * @return the secret key as a String if authentication is enabled, otherwise returns null
    */
-  def getSecretKey(): String = secretKey
+  def getSecretKey(): String = {
+    if (isAuthenticationEnabled) {
+      Option(sparkConf.getenv(ENV_AUTH_SECRET))
```

--- End diff --

I changed things and now they work pretty much as before. It would be good to separate secret generation from distribution, but I'd rather do that separately.
[GitHub] spark issue #19631: [SPARK-22372][core, yarn] Make cluster submission use Sp...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19631 **[Test build #84363 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84363/testReport)** for PR 19631 at commit [`c752453`](https://github.com/apache/spark/commit/c752453b2a379f301d52692bfc639bb631520069).
[GitHub] spark pull request #19848: [SPARK-22162] Executors and the driver should use...
Github user rezasafi commented on a diff in the pull request: https://github.com/apache/spark/pull/19848#discussion_r154238528

```diff
--- Diff: core/src/main/scala/org/apache/spark/mapred/SparkHadoopMapRedUtil.scala ---
@@ -70,7 +70,8 @@ object SparkHadoopMapRedUtil extends Logging {
     if (shouldCoordinateWithDriver) {
       val outputCommitCoordinator = SparkEnv.get.outputCommitCoordinator
       val taskAttemptNumber = TaskContext.get().attemptNumber()
-      val canCommit = outputCommitCoordinator.canCommit(jobId, splitId, taskAttemptNumber)
+      val stageId = TaskContext.get().stageId()
+      val canCommit = outputCommitCoordinator.canCommit(stageId, splitId, taskAttemptNumber)
```

--- End diff --

It makes sense to use stageId there, since before, jobId was used instead of stageId. I will test that.
[GitHub] spark pull request #19848: [SPARK-22162] Executors and the driver should use...
Github user rezasafi commented on a diff in the pull request: https://github.com/apache/spark/pull/19848#discussion_r154237986

```diff
--- Diff: core/src/main/scala/org/apache/spark/internal/io/SparkHadoopWriter.scala ---
@@ -102,14 +103,15 @@ object SparkHadoopWriter extends Logging {
       context: TaskContext,
       config: HadoopWriteConfigUtil[K, V],
       jobTrackerId: String,
+      commitJobId: Int,
       sparkStageId: Int,
       sparkPartitionId: Int,
       sparkAttemptNumber: Int,
       committer: FileCommitProtocol,
       iterator: Iterator[(K, V)]): TaskCommitMessage = {
     // Set up a task.
     val taskContext = config.createTaskAttemptContext(
-      jobTrackerId, sparkStageId, sparkPartitionId, sparkAttemptNumber)
+      jobTrackerId, commitJobId, sparkPartitionId, sparkAttemptNumber)
```

--- End diff --

I removed it a few minutes ago.
[GitHub] spark issue #19848: [SPARK-22162] Executors and the driver should use consis...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19848 **[Test build #84362 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84362/testReport)** for PR 19848 at commit [`92f9180`](https://github.com/apache/spark/commit/92f9180b0fea71ff3ae4aa3049e04d6f1e3167be).
[GitHub] spark pull request #19848: [SPARK-22162] Executors and the driver should use...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/19848#discussion_r154235818

--- Diff: core/src/main/scala/org/apache/spark/mapred/SparkHadoopMapRedUtil.scala --- (the same `canCommit` hunk quoted above) --- End diff --

Shouldn't `CommitDeniedException` (below) be updated to use the stage ID also? Otherwise the exception might have incomplete information. With that change it's possible that `jobId` might become unused in this method.
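A sketch of what that suggestion might look like (hypothetical code, not the actual patch; it assumes `CommitDeniedException` keeps its existing `(message, jobID, splitID, attemptNumber)` constructor shape):

```scala
// Hypothetical: thread the stage ID through the denial path too, so the
// exception reports the same ID the coordinator checked against.
val stageId = TaskContext.get().stageId()
val canCommit = outputCommitCoordinator.canCommit(stageId, splitId, taskAttemptNumber)
if (!canCommit) {
  val message = s"attempt $taskAttemptNumber of partition $splitId in stage $stageId was denied"
  throw new CommitDeniedException(message, stageId, splitId, taskAttemptNumber)
}
```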
[GitHub] spark pull request #19848: [SPARK-22162] Executors and the driver should use...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/19848#discussion_r154236366

--- Diff: core/src/main/scala/org/apache/spark/internal/io/SparkHadoopWriter.scala --- (the same `createTaskAttemptContext` hunk quoted above) --- End diff --

`sparkStageId` is now unused in this method.
[GitHub] spark issue #19848: [SPARK-22162] Executors and the driver should use consis...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19848 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84355/ Test PASSed.
[GitHub] spark issue #19848: [SPARK-22162] Executors and the driver should use consis...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19848 Merged build finished. Test PASSed.
[GitHub] spark issue #19848: [SPARK-22162] Executors and the driver should use consis...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19848

**[Test build #84355 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84355/testReport)** for PR 19848 at commit [`f4ef351`](https://github.com/apache/spark/commit/f4ef351a05394ada5449e51a64606f0e5e7647c3).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #19828: [SPARK-22614] Dataset API: repartitionByRange(......
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/19828
[GitHub] spark issue #19828: [SPARK-22614] Dataset API: repartitionByRange(...)
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/19828 LGTM Thanks! Merged to master.
[GitHub] spark pull request #19714: [SPARK-22489][SQL] Shouldn't change broadcast joi...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/19714
[GitHub] spark issue #19714: [SPARK-22489][SQL] Shouldn't change broadcast join build...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/19714 LGTM Thanks! Merged to master. Could you submit a follow-up PR to document the behavior changes in the migration section of the Spark SQL guide? https://spark.apache.org/docs/latest/sql-programming-guide.html#migration-guide
[GitHub] spark issue #19715: [SPARK-22397][ML]add multiple columns support to Quantil...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19715

**[Test build #84357 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84357/testReport)** for PR 19715 at commit [`97ad483`](https://github.com/apache/spark/commit/97ad483d05740b43221d693c2f42f79609422a23).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #19715: [SPARK-22397][ML]add multiple columns support to Quantil...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19715 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84357/ Test PASSed.
[GitHub] spark issue #19715: [SPARK-22397][ML]add multiple columns support to Quantil...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19715 Merged build finished. Test PASSed.
[GitHub] spark issue #18906: [SPARK-21692][PYSPARK][SQL] Add nullability support to P...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18906

**[Test build #84361 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84361/testReport)** for PR 18906 at commit [`f8e4904`](https://github.com/apache/spark/commit/f8e490464d59a8802f68f1bf2c0d2a59e89dc008).

* This patch **fails Python style tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #18906: [SPARK-21692][PYSPARK][SQL] Add nullability support to P...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18906 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84361/ Test FAILed.
[GitHub] spark issue #19717: [SPARK-22646] [Submission] Spark on Kubernetes - basic s...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19717 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84354/ Test PASSed.
[GitHub] spark issue #18906: [SPARK-21692][PYSPARK][SQL] Add nullability support to P...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18906 Merged build finished. Test FAILed.
[GitHub] spark issue #19717: [SPARK-22646] [Submission] Spark on Kubernetes - basic s...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19717 Merged build finished. Test PASSed.
[GitHub] spark issue #19717: [SPARK-22646] [Submission] Spark on Kubernetes - basic s...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19717

**[Test build #84354 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84354/testReport)** for PR 19717 at commit [`0f48f03`](https://github.com/apache/spark/commit/0f48f036d6d68fb06d24298a11cc2c139d00735e).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #18906: [SPARK-21692][PYSPARK][SQL] Add nullability support to P...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18906 **[Test build #84361 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84361/testReport)** for PR 18906 at commit [`f8e4904`](https://github.com/apache/spark/commit/f8e490464d59a8802f68f1bf2c0d2a59e89dc008).
[GitHub] spark issue #18906: [SPARK-21692][PYSPARK][SQL] Add nullability support to P...
Github user ptkool commented on the issue: https://github.com/apache/spark/pull/18906 @holdenk I believe the changes in this PR match what's provided in the Scala API. Am I missing something?
[GitHub] spark issue #19717: [SPARK-22646] [Submission] Spark on Kubernetes - basic s...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19717 Merged build finished. Test PASSed.
[GitHub] spark issue #19717: [SPARK-22646] [Submission] Spark on Kubernetes - basic s...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19717 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84352/ Test PASSed.
[GitHub] spark issue #19717: [SPARK-22646] [Submission] Spark on Kubernetes - basic s...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19717

**[Test build #84352 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84352/testReport)** for PR 19717 at commit [`357bd85`](https://github.com/apache/spark/commit/357bd8529e44266beacaec50bf6507da41e4cd24).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #19826: [SPARK-22428][DOC] Add spark application garbage collect...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19826 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84358/ Test PASSed.
[GitHub] spark issue #19826: [SPARK-22428][DOC] Add spark application garbage collect...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19826

**[Test build #84358 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84358/testReport)** for PR 19826 at commit [`241dbd0`](https://github.com/apache/spark/commit/241dbd069929c9f9eb6f1f9e45485693c9d4cd60).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #19826: [SPARK-22428][DOC] Add spark application garbage collect...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19826 Merged build finished. Test PASSed.
[GitHub] spark issue #18906: [SPARK-21692][PYSPARK][SQL] Add nullability support to P...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18906 **[Test build #84360 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84360/testReport)** for PR 18906 at commit [`0727597`](https://github.com/apache/spark/commit/0727597bfdb35d03df7050c9777b29efb9add4b1).
[GitHub] spark pull request #19783: [SPARK-21322][SQL] support histogram in filter ca...
Github user ron8hu commented on a diff in the pull request: https://github.com/apache/spark/pull/19783#discussion_r154225069

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/EstimationUtils.scala --- (the same `findFirstBinForValue`/`findLastBinForValue` hunk quoted above, ending at `if (value == histogram.bins(i + 1).lo) {`) --- End diff --

I used two statements instead of one because, when i points to the last bin, the condition `value == histogram.bins(i + 1).lo` may be out of bounds. By separating the conditions into two statements, we can be sure that the out-of-bounds error will not happen.
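For reference, the combined form wzhfy suggested would also be safe, because Scala's `&&` short-circuits left to right and never evaluates `bins(i + 1)` unless the bounds check has already passed (a sketch, not code from the PR):

```scala
// && stops at the first false operand, so bins(i + 1) is only read
// when i < histogram.bins.length - 1 is known to hold.
if (value == histogram.bins(i).hi &&
    i < histogram.bins.length - 1 &&
    value == histogram.bins(i + 1).lo) {
  // value sits exactly on the boundary shared by bins i and i + 1
}
```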
[GitHub] spark pull request #19783: [SPARK-21322][SQL] support histogram in filter ca...
Github user ron8hu commented on a diff in the pull request: https://github.com/apache/spark/pull/19783#discussion_r154223769

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/EstimationUtils.scala --- (the same hunk quoted above, at `var binId = 0` in `findLastBinForValue`) --- End diff --

Same comment as in my last reply.
[GitHub] spark pull request #19783: [SPARK-21322][SQL] support histogram in filter ca...
Github user ron8hu commented on a diff in the pull request: https://github.com/apache/spark/pull/19783#discussion_r154223705

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/EstimationUtils.scala --- (the same hunk quoted above, at `var binId = 0` in `findFirstBinForValue`) --- End diff --

I hesitate to add an assert statement here. This is because an assert like this may cause the Spark system to crash if a user does not refresh his data statistics promptly. In the real world, a user may load data, collect statistics, and then add more incremental data without collecting statistics immediately. He may issue a SQL query against his newly added data, such as "WHERE column=xxx", where xxx is a new value from the incremental load. After all, statistics are auxiliary; a query should still run even if the statistics are not up to date.
[GitHub] spark issue #18906: [SPARK-21692][PYSPARK][SQL] Add nullability support to P...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18906 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84359/ Test FAILed.
[GitHub] spark issue #18906: [SPARK-21692][PYSPARK][SQL] Add nullability support to P...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18906 Merged build finished. Test FAILed.
[GitHub] spark issue #18906: [SPARK-21692][PYSPARK][SQL] Add nullability support to P...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18906

**[Test build #84359 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84359/testReport)** for PR 18906 at commit [`90e1684`](https://github.com/apache/spark/commit/90e16845e2e24a835387c13537c841d55e1b85c7).

* This patch **fails Python style tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #18906: [SPARK-21692][PYSPARK][SQL] Add nullability support to P...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18906 **[Test build #84359 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84359/testReport)** for PR 18906 at commit [`90e1684`](https://github.com/apache/spark/commit/90e16845e2e24a835387c13537c841d55e1b85c7).
[GitHub] spark issue #19826: [SPARK-22428][DOC] Add spark application garbage collect...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19826 **[Test build #84358 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84358/testReport)** for PR 19826 at commit [`241dbd0`](https://github.com/apache/spark/commit/241dbd069929c9f9eb6f1f9e45485693c9d4cd60).
[GitHub] spark issue #19823: [SPARK-22601][SQL] Data load is getting displayed succes...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19823 **[Test build #84356 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84356/testReport)** for PR 19823 at commit [`f6eb4ad`](https://github.com/apache/spark/commit/f6eb4ada04e21760f805b3e50062f91a438cef56).
[GitHub] spark issue #19715: [SPARK-22397][ML]add multiple columns support to Quantil...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19715 **[Test build #84357 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84357/testReport)** for PR 19715 at commit [`97ad483`](https://github.com/apache/spark/commit/97ad483d05740b43221d693c2f42f79609422a23).
[GitHub] spark issue #19826: [SPARK-22428][DOC] Add spark application garbage collect...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/19826 retest this please
[GitHub] spark issue #19823: [SPARK-22601][SQL] Data load is getting displayed succes...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/19823 ok to test
[GitHub] spark issue #19631: [SPARK-22372][core, yarn] Make cluster submission use Sp...
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/19631

> For instance if someone is starting a server that is getting hit by other users that server could be started with the same env and then inadvertently expose the secret to other users.

If I understand what you're saying correctly, that should be considered a security issue in that server application regardless of this change. The server should not be exposing its environment to unprivileged users.

That being said, it seems Spark's own thrift server does that. If the following works in a spark-shell, it probably would do the same through the STS:

```
scala> spark.sql("set spark.sql.columnNameOfCorruptRecord=${env:SCALA_HOME}").show()
+--------------------+------------------+
|                 key|             value|
+--------------------+------------------+
|spark.sql.columnN...|/apps/scala-2.11.7|
+--------------------+------------------+
```

So this change would expose that secret to users also in YARN mode (it's already exposed in Standalone and Mesos currently, because it's in the config). Let me think a little about this. I prefer the environment to the credentials approach because the latter is written to disk, but at the same time, that's less problematic than exposing the secret to users in the STS.
[GitHub] spark pull request #18692: [SPARK-21417][SQL] Infer join conditions using pr...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/18692
[GitHub] spark issue #18692: [SPARK-21417][SQL] Infer join conditions using propagate...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/18692 LGTM Thanks for your patience! It looks much better now. I really appreciate your contributions! Welcome to make more contributions! Thanks! Merged to master.
[GitHub] spark pull request #19631: [SPARK-22372][core, yarn] Make cluster submission...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/19631#discussion_r154218810

--- Diff: core/src/main/scala/org/apache/spark/SecurityManager.scala --- (the same `getSecretKey()` hunk quoted above) --- End diff --

I can add a different, internal config for this if re-using `SPARK_AUTH_SECRET_CONF` is confusing. But I'm not too concerned about exposing this to the user code running the application; they can just as easily get that info from the UGI currently. Spark already redacts this kind of information when writing it to things like the event log, which would be one place where it might leak out.
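A sketch of the lookup order under discussion (assuming the standard `spark.authenticate.secret` config key; the environment variable name here is illustrative, not necessarily the one the PR uses): prefer the environment, fall back to the config.

```scala
// Illustrative only: consult the environment first, then SparkConf.
val secret: Option[String] =
  Option(sparkConf.getenv("_SPARK_AUTH_SECRET")) // hypothetical env var name
    .orElse(sparkConf.getOption("spark.authenticate.secret"))
```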
[GitHub] spark pull request #19631: [SPARK-22372][core, yarn] Make cluster submission...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/19631#discussion_r154218236

```diff
--- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala ---
@@ -58,18 +58,14 @@ import org.apache.spark.util.{CallerContext, Utils}
 private[spark] class Client(
     val args: ClientArguments,
-    val hadoopConf: Configuration,
     val sparkConf: SparkConf)
   extends Logging {

   import Client._
   import YarnSparkHadoopUtil._

-  def this(clientArgs: ClientArguments, spConf: SparkConf) =
-    this(clientArgs, SparkHadoopUtil.get.newConfiguration(spConf), spConf)
-
   private val yarnClient = YarnClient.createYarnClient
-  private val yarnConf = new YarnConfiguration(hadoopConf)
+  private val hadoopConf = new YarnConfiguration(SparkHadoopUtil.newConfiguration(sparkConf))
```

--- End diff --

Both are effectively the same; I chose to keep the name used more often to minimize changes.
[GitHub] spark issue #18692: [SPARK-21417][SQL] Infer join conditions using propagate...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18692 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84351/ Test PASSed.
[GitHub] spark issue #18692: [SPARK-21417][SQL] Infer join conditions using propagate...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18692 Merged build finished. Test PASSed.
[GitHub] spark issue #18692: [SPARK-21417][SQL] Infer join conditions using propagate...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18692

**[Test build #84351 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84351/testReport)** for PR 18692 at commit [`9ab91a1`](https://github.com/apache/spark/commit/9ab91a19cefd63b7d28674992b68da8164d487ae).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #19811: [SPARK-18016][SQL] Code Generation: Constant Pool Limit ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19811 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84353/ Test FAILed.
[GitHub] spark issue #19811: [SPARK-18016][SQL] Code Generation: Constant Pool Limit ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19811 Merged build finished. Test FAILed.
[GitHub] spark issue #19811: [SPARK-18016][SQL] Code Generation: Constant Pool Limit ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19811

**[Test build #84353 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84353/testReport)** for PR 19811 at commit [`7ca94e1`](https://github.com/apache/spark/commit/7ca94e1ac2297e7c5e5de3a32a09050a010780fa).

* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #19717: [SPARK-22646] [Submission] Spark on Kubernetes - basic s...
Github user liyinan926 commented on the issue: https://github.com/apache/spark/pull/19717 @mridulm @jiangxb1987 @jerryshao @ueshin all comments so far have been addressed. The PR has also been updated to include a missing step for `local://` dependency support. PTAL. Thanks!
[GitHub] spark issue #19828: [SPARK-22614] Dataset API: repartitionByRange(...)
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19828 Merged build finished. Test PASSed.
[GitHub] spark issue #19828: [SPARK-22614] Dataset API: repartitionByRange(...)
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19828 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84347/ Test PASSed.
[GitHub] spark issue #19828: [SPARK-22614] Dataset API: repartitionByRange(...)
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19828

**[Test build #84347 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84347/testReport)** for PR 19828 at commit [`012baa0`](https://github.com/apache/spark/commit/012baa0e088dbabd6d10f943e8a794caadfbb207).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #19813: [SPARK-22600][SQL] Fix 64kb limit for deeply nest...
Github user mgaido91 commented on a diff in the pull request: https://github.com/apache/spark/pull/19813#discussion_r154201470

```diff
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/ExpressionCodegen.scala ---
@@ -0,0 +1,259 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.expressions.codegen
+
+import scala.collection.mutable
+
+import org.apache.spark.sql.catalyst.expressions._
+
+/**
+ * Defines APIs used in expression code generation.
+ */
+object ExpressionCodegen {
+
+  /**
+   * Given an expression, returns the all necessary parameters to evaluate it, so the generated
+   * code of this expression can be split in a function.
+   * The 1st string in returned tuple is the parameter strings used to call the function.
+   * The 2nd string in returned tuple is the parameter strings used to declare the function.
+   *
+   * Returns `None` if it can't produce valid parameters.
+   *
+   * Params to include:
+   * 1. Evaluated columns referred by this, children or deferred expressions.
+   * 2. Rows referred by this, children or deferred expressions.
+   * 3. Eliminated subexpressions referred by children expressions.
+   */
+  def getExpressionInputParams(
+      ctx: CodegenContext,
+      expr: Expression): Option[(Seq[String], Seq[String])] = {
+    val (inputAttrs, inputVars) = getInputVarsForChildren(ctx, expr)
+    val inputRows = ctx.INPUT_ROW +: getInputRowsForChildren(ctx, expr)
+    val subExprs = getSubExprInChildren(ctx, expr)
+
+    val paramsFromRows = inputRows.distinct.filter(_ != null).map { row =>
+      (row, s"InternalRow $row")
+    }
+    val paramsFromColumns = prepareFunctionParams(ctx, inputAttrs, inputVars)
+    val paramsFromSubExprs = getParamsForSubExprs(ctx, subExprs)
+    val paramsLength = getParamLength(ctx, inputAttrs, subExprs) + paramsFromRows.length
+
+    // Maximum allowed parameter number for Java's method descriptor.
+    if (paramsLength > 255) {
+      None
+    } else {
+      val allParams = (paramsFromRows ++ paramsFromColumns ++ paramsFromSubExprs).unzip
+      val callParams = allParams._1.distinct
+      val declParams = allParams._2.distinct
+      Some((callParams, declParams))
+    }
+  }
+
+  /**
+   * Returns the eliminated subexpressions in the children expressions.
+   */
+  def getSubExprInChildren(ctx: CodegenContext, expr: Expression): Seq[Expression] = {
+    expr.children.flatMap { child =>
+      child.collect {
+        case e if ctx.subExprEliminationExprs.contains(e) => e
+      }
+    }.distinct
+  }
+
+  /**
+   * Given the list of eliminated subexpressions used in the children expressions, returns the
+   * strings of funtion parameters. The first is the variable names used to call the function,
+   * the second is the parameters used to declare the function in generated code.
+   */
+  def getParamsForSubExprs(
+      ctx: CodegenContext,
+      subExprs: Seq[Expression]): Seq[(String, String)] = {
+    subExprs.flatMap { subExpr =>
+      val argType = ctx.javaType(subExpr.dataType)
+
+      val subExprState = ctx.subExprEliminationExprs(subExpr)
+      (subExprState.value, subExprState.isNull)
+
+      if (!subExpr.nullable || subExprState.isNull == "true" || subExprState.isNull == "false") {
+        Seq((subExprState.value, s"$argType ${subExprState.value}"))
+      } else {
+        Seq((subExprState.value, s"$argType ${subExprState.value}"),
+          (subExprState.isNull, s"boolean ${subExprState.isNull}"))
+      }
+    }.distinct
+  }
+
+  /**
+   * Retrieves previous input rows referred by children and deferred expressions.
+   */
+  def getInputRowsForChildren(ctx: CodegenContext, expr: Expression): Seq[String] = {
```
[GitHub] spark pull request #19813: [SPARK-22600][SQL] Fix 64kb limit for deeply nest...
Github user mgaido91 commented on a diff in the pull request: https://github.com/apache/spark/pull/19813#discussion_r154200835
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/ExpressionCodegen.scala ---
[GitHub] spark pull request #19813: [SPARK-22600][SQL] Fix 64kb limit for deeply nest...
Github user mgaido91 commented on a diff in the pull request: https://github.com/apache/spark/pull/19813#discussion_r154200309
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/ExpressionCodegen.scala ---
[GitHub] spark pull request #19813: [SPARK-22600][SQL] Fix 64kb limit for deeply nest...
Github user mgaido91 commented on a diff in the pull request: https://github.com/apache/spark/pull/19813#discussion_r154199646
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/ExpressionCodegen.scala ---
@@ -0,0 +1,259 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.expressions.codegen
+
+import scala.collection.mutable
+
+import org.apache.spark.sql.catalyst.expressions._
+
+/**
+ * Defines APIs used in expression code generation.
+ */
+object ExpressionCodegen {
+
+  /**
+   * Given an expression, returns all the necessary parameters to evaluate it, so the generated
+   * code of this expression can be split in a function.
+   * The 1st element in the returned tuple is the parameter strings used to call the function.
+   * The 2nd element in the returned tuple is the parameter strings used to declare the function.
+   *
+   * Returns `None` if it can't produce valid parameters.
+   *
+   * Params to include:
+   * 1. Evaluated columns referred by this, children or deferred expressions.
+   * 2. Rows referred by this, children or deferred expressions.
+   * 3. Eliminated subexpressions referred by children expressions.
+   */
+  def getExpressionInputParams(
+      ctx: CodegenContext,
+      expr: Expression): Option[(Seq[String], Seq[String])] = {
+    val (inputAttrs, inputVars) = getInputVarsForChildren(ctx, expr)
+    val inputRows = ctx.INPUT_ROW +: getInputRowsForChildren(ctx, expr)
+    val subExprs = getSubExprInChildren(ctx, expr)
+
+    val paramsFromRows = inputRows.distinct.filter(_ != null).map { row =>
+      (row, s"InternalRow $row")
+    }
+    val paramsFromColumns = prepareFunctionParams(ctx, inputAttrs, inputVars)
+    val paramsFromSubExprs = getParamsForSubExprs(ctx, subExprs)
+    val paramsLength = getParamLength(ctx, inputAttrs, subExprs) + paramsFromRows.length
+
+    // Maximum allowed parameter number for Java's method descriptor.
+    if (paramsLength > 255) {
+      None
+    } else {
+      val allParams = (paramsFromRows ++ paramsFromColumns ++ paramsFromSubExprs).unzip
+      val callParams = allParams._1.distinct
+      val declParams = allParams._2.distinct
+      Some((callParams, declParams))
+    }
+  }
+
+  /**
+   * Returns the eliminated subexpressions in the children expressions.
+   */
+  def getSubExprInChildren(ctx: CodegenContext, expr: Expression): Seq[Expression] = {
+    expr.children.flatMap { child =>
+      child.collect {
+        case e if ctx.subExprEliminationExprs.contains(e) => e
+      }
+    }.distinct
+  }
+
+  /**
+   * Given the list of eliminated subexpressions used in the children expressions, returns the
+   * strings of function parameters. The first is the variable names used to call the function,
+   * the second is the parameters used to declare the function in generated code.
+   */
+  def getParamsForSubExprs(
+      ctx: CodegenContext,
+      subExprs: Seq[Expression]): Seq[(String, String)] = {
+    subExprs.flatMap { subExpr =>
+      val argType = ctx.javaType(subExpr.dataType)
+
+      val subExprState = ctx.subExprEliminationExprs(subExpr)
+      (subExprState.value, subExprState.isNull)
--- End diff --
shouldn't this line be removed? I think it is useless..
--- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
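For context on the review point above: the flagged statement builds a tuple and immediately discards it, so it is dead code. A minimal, self-contained Scala sketch of the same pairing logic with that statement removed — `SubExprState` is a hypothetical stand-in for Spark's `SubExprEliminationState`, and `argType`/`nullable` are passed in rather than derived from a real `CodegenContext`:

    // Hypothetical stand-in: the generated variable holding the value, and
    // the variable name (or boolean literal) holding its nullness.
    case class SubExprState(value: String, isNull: String)

    // Same (call-argument, declaration) pairing as the patch, minus the
    // discarded tuple.
    def paramsForSubExprs(
        states: Seq[(SubExprState, String, Boolean)]): Seq[(String, String)] =
      states.flatMap { case (state, argType, nullable) =>
        // When the expression can't be null, or nullness folded to a literal,
        // only the value needs to cross the split-function boundary.
        if (!nullable || state.isNull == "true" || state.isNull == "false") {
          Seq((state.value, s"$argType ${state.value}"))
        } else {
          Seq((state.value, s"$argType ${state.value}"),
            (state.isNull, s"boolean ${state.isNull}"))
        }
      }.distinct

The `paramsLength > 255` check in the quoted method is a hard JVM limit: a method descriptor allows at most 255 parameter slots, where `long` and `double` take two slots each and an instance method's implicit `this` takes one (JVMS §4.3.3). A rough sketch of that slot accounting — `paramSlots` is an illustrative helper, not Spark API:

    def paramSlots(javaTypes: Seq[String], isStatic: Boolean): Int = {
      val thisSlot = if (isStatic) 0 else 1
      thisSlot + javaTypes.map {
        case "long" | "double" => 2 // two slots per JVM spec
        case _                 => 1
      }.sum
    }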
[GitHub] spark pull request #19813: [SPARK-22600][SQL] Fix 64kb limit for deeply nest...
Github user mgaido91 commented on a diff in the pull request: https://github.com/apache/spark/pull/19813#discussion_r154199208
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/ExpressionCodegen.scala ---
[GitHub] spark pull request #19813: [SPARK-22600][SQL] Fix 64kb limit for deeply nest...
Github user mgaido91 commented on a diff in the pull request: https://github.com/apache/spark/pull/19813#discussion_r154199027
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/ExpressionCodegen.scala ---
[GitHub] spark pull request #19813: [SPARK-22600][SQL] Fix 64kb limit for deeply nest...
Github user mgaido91 commented on a diff in the pull request: https://github.com/apache/spark/pull/19813#discussion_r154197311
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/ExpressionCodegen.scala ---
[GitHub] spark issue #19591: [SPARK-11035][core] Add in-process Spark app launcher.
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/19591 Ping. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19813: [SPARK-22600][SQL] Fix 64kb limit for deeply nest...
Github user mgaido91 commented on a diff in the pull request: https://github.com/apache/spark/pull/19813#discussion_r154195688
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/ExpressionCodegen.scala ---
+  def getExpressionInputParams(
+      ctx: CodegenContext,
+      expr: Expression): Option[(Seq[String], Seq[String])] = {
+    val (inputAttrs, inputVars) = getInputVarsForChildren(ctx, expr)
--- End diff --
nit: can we move this closer to when it is used, ie. before `val paramsFromColumns ...`?
--- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19813: [SPARK-22600][SQL] Fix 64kb limit for deeply nest...
Github user mgaido91 commented on a diff in the pull request: https://github.com/apache/spark/pull/19813#discussion_r154195724
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/ExpressionCodegen.scala ---
+    val (inputAttrs, inputVars) = getInputVarsForChildren(ctx, expr)
+    val inputRows = ctx.INPUT_ROW +: getInputRowsForChildren(ctx, expr)
+    val subExprs = getSubExprInChildren(ctx, expr)
--- End diff --
ditto
--- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19848: [SPARK-22162] Executors and the driver should use consis...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19848 **[Test build #84355 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84355/testReport)** for PR 19848 at commit [`f4ef351`](https://github.com/apache/spark/commit/f4ef351a05394ada5449e51a64606f0e5e7647c3). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19813: [SPARK-22600][SQL] Fix 64kb limit for deeply nest...
Github user mgaido91 commented on a diff in the pull request: https://github.com/apache/spark/pull/19813#discussion_r154194781
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/ExpressionCodegen.scala ---
+  def getExpressionInputParams(
+      ctx: CodegenContext,
+      expr: Expression): Option[(Seq[String], Seq[String])] = {
+    val (inputAttrs, inputVars) = getInputVarsForChildren(ctx, expr)
+    val inputRows = ctx.INPUT_ROW +: getInputRowsForChildren(ctx, expr)
--- End diff --
I see that as of now `ctx.INPUT_ROW` can't be `null` here, but can we add an assertion and maybe a comment for the reason of this? I think this would help people who might want to reuse this function in the future.
--- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
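A sketch of what the suggested guard could look like, under the assumption that this helper is only reachable when `INPUT_ROW` is set; the function name and assertion message below are hypothetical, and the real check would sit at the top of `getExpressionInputParams` using `ctx.INPUT_ROW` directly:

    // Fail fast with a reason rather than letting a null row name leak into
    // the generated Java source.
    def inputRowsForExpr(inputRow: String, rowsFromChildren: Seq[String]): Seq[String] = {
      assert(inputRow != null,
        "INPUT_ROW must be set before collecting split-function parameters")
      inputRow +: rowsFromChildren
    }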
[GitHub] spark pull request #19813: [SPARK-22600][SQL] Fix 64kb limit for deeply nest...
Github user mgaido91 commented on a diff in the pull request: https://github.com/apache/spark/pull/19813#discussion_r154193346
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala ---
@@ -55,8 +55,20 @@ import org.apache.spark.util.{ParentClassLoader, Utils}
  * to null.
  * @param value A term for a (possibly primitive) value of the result of the evaluation. Not
  *              valid if `isNull` is set to `true`.
+ * @param inputRow A term that holds the input row name when generating this code.
+ * @param inputVars A list of [[ExprInputVar]] that holds input variables when generating this
+ *                  code.
  */
-case class ExprCode(var code: String, var isNull: String, var value: String)
+case class ExprCode(
+    var code: String,
+    var isNull: String,
+    var value: String,
+    var inputRow: String = null,
+    val inputVars: mutable.ArrayBuffer[ExprInputVar] = mutable.ArrayBuffer.empty)
--- End diff --
if feasible, I'd prefer an immutable collection here (see my comment above)
--- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
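A sketch of the immutable alternative being asked for, with a hypothetical simplified `ExprInputVar` (the real class carries more); callers would rebuild the case class via `copy` instead of appending to a shared buffer:

    case class ExprInputVar(ordinal: Int, name: String) // simplified stand-in

    case class ExprCode(
        var code: String,
        var isNull: String,
        var value: String,
        var inputRow: String = null,
        inputVars: Seq[ExprInputVar] = Seq.empty)

    // Instead of mutating in place (eval.inputVars += v), produce a copy:
    // val updated = eval.copy(inputVars = eval.inputVars :+ v)

The trade-off is an extra allocation per update, in exchange for removing the risk of one expression's codegen mutating state observed by another.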
[GitHub] spark pull request #19813: [SPARK-22600][SQL] Fix 64kb limit for deeply nest...
Github user mgaido91 commented on a diff in the pull request: https://github.com/apache/spark/pull/19813#discussion_r154192687
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Expression.scala ---
@@ -115,9 +116,35 @@ abstract class Expression extends TreeNode[Expression] {
     }
   }
 
+  /**
+   * Records current input row and variables for this expression into created `ExprCode`.
+   */
+  private def populateInputs(ctx: CodegenContext, eval: ExprCode): Unit = {
+    if (ctx.INPUT_ROW != null) {
+      eval.inputRow = ctx.INPUT_ROW
+    }
+    if (ctx.currentVars != null) {
+      val boundRefs = this.collect {
+        case b @ BoundReference(ordinal, _, _) if ctx.currentVars(ordinal) != null =>
+          (ordinal, b)
+      }.toMap
+
+      ctx.currentVars.zipWithIndex.filter(_._1 != null).foreach { case (currentVar, idx) =>
--- End diff --
why is this filter needed?
--- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
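One likely answer, based on how `ctx.currentVars` behaves elsewhere in Spark codegen: its entries are null for ordinals whose column is still read from `ctx.INPUT_ROW` rather than from an already-generated local variable, so the filter keeps only ordinals that actually have a bound variable. An illustration with plain strings standing in for `ExprCode` values:

    // null entries model columns that must still be fetched from the input row.
    val currentVars: Seq[String] = Seq("project_value_0", null, "project_value_2")

    val bound = currentVars.zipWithIndex.filter(_._1 != null)
    // bound == Seq(("project_value_0", 0), ("project_value_2", 2)); ordinal 1
    // has no bound variable, so emitted code falls back to an InternalRow access.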