[GitHub] spark issue #20124: [WIP][SPARK-22126][ML] Fix model-specific optimization s...
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/20124 (Happy new year!) Just commented on the JIRA; let me know what you think. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20127: [SPARK-22932] [SQL] Refactor AnalysisContext
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/20127#discussion_r159150948
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -70,6 +71,8 @@ object AnalysisContext {
   }

   def get: AnalysisContext = value.get()
+
+  def reset(): Unit = value.remove()
--- End diff --
`private`
[GitHub] spark pull request #20127: [SPARK-22932] [SQL] Refactor AnalysisContext
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/20127#discussion_r159150900
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -95,6 +98,17 @@ class Analyzer(
     this(catalog, conf, conf.optimizerMaxIterations)
   }

+  override def execute(plan: LogicalPlan): LogicalPlan = {
+    AnalysisContext.reset()
+    try {
+      executeSameContext(plan)
+    } finally {
+      AnalysisContext.reset()
+    }
+  }
+
+  private def executeSameContext(plan: LogicalPlan): LogicalPlan = super.execute(plan)
--- End diff --
`executeWithSameContext`?
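The pattern under review here, resetting a thread-local context around the top-level `execute` while nested calls reuse the current context, can be sketched in isolation. The simplified `AnalysisContext` below is illustrative (it stores an `Int` rather than Spark's real context object); only the `reset`/`execute`/`executeSameContext` shape mirrors the diff:

```scala
// Minimal sketch of the reset-around-execute pattern from the diff above.
// The Int-valued context is a stand-in for Spark's real AnalysisContext.
object AnalysisContext {
  private val value = new ThreadLocal[Int]() {
    override def initialValue(): Int = 0
  }
  def get: Int = value.get()
  def set(v: Int): Unit = value.set(v)
  def reset(): Unit = value.remove()
}

class Analyzer {
  // Top-level entry point: starts from, and leaves behind, a fresh context.
  def execute(plan: String): String = {
    AnalysisContext.reset()
    try {
      executeSameContext(plan)
    } finally {
      AnalysisContext.reset() // never leak state to the next query on this thread
    }
  }

  // Nested analysis calls go through this variant so they share the
  // caller's context instead of wiping it.
  private def executeSameContext(plan: String): String = {
    AnalysisContext.set(AnalysisContext.get + 1)
    s"$plan:depth=${AnalysisContext.get}"
  }
}
```

The `finally` block is what makes the top-level entry safe: even if analysis throws, the thread-local is cleared before the thread is reused.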
[GitHub] spark issue #20132: [SPARK-13030][ML] Follow-up cleanups for OneHotEncoderEs...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/20132 The simplified logic for encoder looks good to me.
[GitHub] spark issue #20132: [SPARK-13030][ML] Follow-up cleanups for OneHotEncoderEs...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/20132 LGTM
[GitHub] spark issue #20133: [SPARK-22934] [SQL] Make optional clauses order insensit...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20133 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85576/ Test FAILed.
[GitHub] spark issue #20133: [SPARK-22934] [SQL] Make optional clauses order insensit...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20133 **[Test build #85576 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85576/testReport)** for PR 20133 at commit [`8ae8f18`](https://github.com/apache/spark/commit/8ae8f1832a62caf10a62511f339402c0d94f89ea).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #20133: [SPARK-22934] [SQL] Make optional clauses order insensit...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20133 Merged build finished. Test FAILed.
[GitHub] spark issue #20129: [SPARK-22933][SPARKR] R Structured Streaming API for wit...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20129 **[Test build #85577 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85577/testReport)** for PR 20129 at commit [`137d1cb`](https://github.com/apache/spark/commit/137d1cb186aa826842ff7897cfd165429fb0b44b).
[GitHub] spark issue #20078: [SPARK-22900] [Spark-Streaming] Remove unnecessary restr...
Github user sharkdtu commented on the issue: https://github.com/apache/spark/pull/20078 @felixcheung At the beginning, if numReceivers > totalExecutorCores, there are no CPU cores left for batch processing, and `ExecutorAllocationManager` can't listen to the metrics of any batches. As a result, it doesn't work.
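The constraint sharkdtu describes is simple arithmetic: each running receiver pins one core for its lifetime, so batch processing only gets what is left over. A hedged sketch (the helper name is mine, not Spark's):

```scala
// Illustration of the scheduling constraint described above: receivers each
// occupy one core, so if numReceivers >= totalExecutorCores, zero cores
// remain for batch processing and no batch metrics are ever produced.
def coresLeftForBatches(totalExecutorCores: Int, numReceivers: Int): Int =
  math.max(totalExecutorCores - numReceivers, 0)
```

When the result is zero, `ExecutorAllocationManager` sees no completed batches and therefore never scales up, which is the deadlock the PR is about.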
[GitHub] spark pull request #20070: SPARK-22896 Improvement in String interpolation
Github user chetkhatri commented on a diff in the pull request: https://github.com/apache/spark/pull/20070#discussion_r159152519
--- Diff: examples/src/main/scala/org/apache/spark/examples/mllib/LatentDirichletAllocationExample.scala ---
@@ -46,7 +46,10 @@ object LatentDirichletAllocationExample {
     val topics = ldaModel.topicsMatrix
     for (topic <- Range(0, 3)) {
       print(s"Topic $topic :")
-      for (word <- Range(0, ldaModel.vocabSize)) { print(s" ${topics(word, topic)}") }
+      for (word <- Range(0, ldaModel.vocabSize))
+      {
--- End diff --
@srowen sure done.
[GitHub] spark pull request #19992: [SPARK-22805][CORE] Use StorageLevel aliases in e...
Github user superbobry commented on a diff in the pull request: https://github.com/apache/spark/pull/19992#discussion_r159153806
--- Diff: core/src/test/scala/org/apache/spark/util/JsonProtocolSuite.scala ---
@@ -2022,12 +1947,7 @@ private[spark] object JsonProtocolSuite extends Assertions {
         |    "Port": 300
         |  },
         |  "Block ID": "rdd_0_0",
-        |  "Storage Level": {
--- End diff --
I've added a test ensuring all predefined storage levels can be read from the legacy format. Sidenote: I've also noticed that the legacy format incorrectly handled the predefined `StorageLevel.OFF_HEAP`, and in fact any other custom storage level with `useOffHeap = true`. It looks like a bug to me, wdyt?
[GitHub] spark issue #20129: [SPARK-22933][SPARKR] R Structured Streaming API for wit...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20129 **[Test build #85577 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85577/testReport)** for PR 20129 at commit [`137d1cb`](https://github.com/apache/spark/commit/137d1cb186aa826842ff7897cfd165429fb0b44b).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #20129: [SPARK-22933][SPARKR] R Structured Streaming API for wit...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20129 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85577/ Test PASSed.
[GitHub] spark issue #20129: [SPARK-22933][SPARKR] R Structured Streaming API for wit...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20129 Merged build finished. Test PASSed.
[GitHub] spark issue #20133: [SPARK-22934] [SQL] Make optional clauses order insensit...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20133 **[Test build #85578 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85578/testReport)** for PR 20133 at commit [`0894f5e`](https://github.com/apache/spark/commit/0894f5e5a6cac2f73ad30fc80de3cd82b3020de6).
[GitHub] spark issue #19992: [SPARK-22805][CORE] Use StorageLevel aliases in event lo...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19992 **[Test build #85579 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85579/testReport)** for PR 19992 at commit [`9fbfe40`](https://github.com/apache/spark/commit/9fbfe40f5ca83f080f56f3e91c7a6f3f27471df5).
[GitHub] spark issue #19992: [SPARK-22805][CORE] Use StorageLevel aliases in event lo...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19992 Merged build finished. Test FAILed.
[GitHub] spark issue #19992: [SPARK-22805][CORE] Use StorageLevel aliases in event lo...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19992 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85579/ Test FAILed.
[GitHub] spark issue #19992: [SPARK-22805][CORE] Use StorageLevel aliases in event lo...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19992 **[Test build #85579 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85579/testReport)** for PR 19992 at commit [`9fbfe40`](https://github.com/apache/spark/commit/9fbfe40f5ca83f080f56f3e91c7a6f3f27471df5).
* This patch **fails Scala style tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #19992: [SPARK-22805][CORE] Use StorageLevel aliases in event lo...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19992 **[Test build #85580 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85580/testReport)** for PR 19992 at commit [`cb1fe6a`](https://github.com/apache/spark/commit/cb1fe6a572d8085d36884bf950a840f972976458).
[GitHub] spark pull request #20133: [SPARK-22934] [SQL] Make optional clauses order i...
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/20133#discussion_r159156756
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/ParserUtils.scala ---
@@ -39,6 +41,17 @@ object ParserUtils {
     throw new ParseException(s"Operation not allowed: $message", ctx)
   }

+  def duplicateClausesNotAllowed(message: String, ctx: ParserRuleContext): Nothing = {
+    throw new ParseException(s"Found duplicate clauses: $message", ctx)
--- End diff --
Can't we merge these two functions to check the duplication? e.g.,
```
def checkDuplicateClauses[T](
    nodes: util.List[T], clauseName: String, ctx: ParserRuleContext): Unit = {
  if (nodes.size() > 1) {
    throw new ParseException(s"Found duplicate clauses: $clauseName", ctx)
  }
}
```
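maropu's suggested helper folds the size check and the error into one call site. A self-contained sketch, with ANTLR's `ParserRuleContext` replaced by a plain `String` and a simplified `ParseException` so it runs standalone (both substitutions are mine, not Spark's types):

```scala
// Standalone sketch of the merged duplicate-clause check suggested above.
// `ctx` is a String stand-in for ANTLR's ParserRuleContext.
case class ParseException(message: String, ctx: String) extends Exception(message)

def checkDuplicateClauses[T](
    nodes: java.util.List[T], clauseName: String, ctx: String): Unit = {
  // Each optional clause (LOCATION, COMMENT, ...) may appear at most once;
  // the ANTLR grammar collects repeats into a list, so size > 1 means a duplicate.
  if (nodes.size() > 1) {
    throw ParseException(s"Found duplicate clauses: $clauseName", ctx)
  }
}
```

Usage: the parser would call `checkDuplicateClauses(ctx.locationSpec, "LOCATION", ctx)` once per optional clause, replacing both the `if` and the throwing helper.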
[GitHub] spark pull request #19520: [SPARK-22298][WEB-UI] url encode APP id before ge...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/19520
[GitHub] spark pull request #19613: Fixed a typo
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/19613
[GitHub] spark pull request #19739: [SPARK-22513][BUILD] Provide build profile for ha...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/19739
[GitHub] spark pull request #20027: Branch 2.2
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/20027
[GitHub] spark pull request #19933: [SPARK-22744][CORE] Add a configuration to show t...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/19933
[GitHub] spark pull request #20104: Merge pull request #1 from apache/master
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/20104
[GitHub] spark pull request #18916: [SPARK-21705][CORE][DOC]Add spark.internal.config...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/18916
[GitHub] spark pull request #19936: Branch 0.5
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/19936
[GitHub] spark pull request #19919: [SPARK-22727] spark.executor.instances's default ...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/19919
[GitHub] spark pull request #20131: [MINOR] Fix a bunch of typos
Github user srowen closed the pull request at: https://github.com/apache/spark/pull/20131
[GitHub] spark pull request #20044: [SPARK-22857] Optimize code by inspecting code
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/20044
[GitHub] spark pull request #19035: [SPARK-21822][SQL]When insert Hive Table is finis...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/19035
[GitHub] spark pull request #19917: [SPARK-22725][SQL] Add failing test for select wi...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/19917
[GitHub] spark pull request #20130: [BUILD] Close stale PRs
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/20130
[GitHub] spark pull request #20132: [SPARK-13030][ML] Follow-up cleanups for OneHotEn...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/20132#discussion_r159157608
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/OneHotEncoderEstimator.scala ---
@@ -205,60 +210,58 @@ class OneHotEncoderModel private[ml] (

   import OneHotEncoderModel._

-  // Returns the category size for a given index with `dropLast` and `handleInvalid`
+  // Returns the category size for each index with `dropLast` and `handleInvalid`
   // taken into account.
-  private def configedCategorySize(orgCategorySize: Int, idx: Int): Int = {
+  private def getConfigedCategorySizes: Array[Int] = {
     val dropLast = getDropLast
     val keepInvalid = getHandleInvalid == OneHotEncoderEstimator.KEEP_INVALID

     if (!dropLast && keepInvalid) {
       // When `handleInvalid` is "keep", an extra category is added as last category
       // for invalid data.
-      orgCategorySize + 1
+      categorySizes.map(_ + 1)
     } else if (dropLast && !keepInvalid) {
       // When `dropLast` is true, the last category is removed.
-      orgCategorySize - 1
+      categorySizes.map(_ - 1)
     } else {
       // When `dropLast` is true and `handleInvalid` is "keep", the extra category for invalid
       // data is removed. Thus, it is the same as the plain number of categories.
-      orgCategorySize
+      categorySizes
     }
   }

   private def encoder: UserDefinedFunction = {
-    val oneValue = Array(1.0)
-    val emptyValues = Array.empty[Double]
-    val emptyIndices = Array.empty[Int]
-    val dropLast = getDropLast
-    val handleInvalid = getHandleInvalid
-    val keepInvalid = handleInvalid == OneHotEncoderEstimator.KEEP_INVALID
+    val keepInvalid = getHandleInvalid == OneHotEncoderEstimator.KEEP_INVALID
+    val configedSizes = getConfigedCategorySizes
+    val localCategorySizes = categorySizes

     // The udf performed on input data. The first parameter is the input value. The second
-    udf { (label: Double, idx: Int) =>
-      val plainNumCategories = categorySizes(idx)
-      val size = configedCategorySize(plainNumCategories, idx)
-
-      if (label < 0) {
-        throw new SparkException(s"Negative value: $label. Input can't be negative.")
-      } else if (label == size && dropLast && !keepInvalid) {
-        // When `dropLast` is true and `handleInvalid` is not "keep",
-        // the last category is removed.
-        Vectors.sparse(size, emptyIndices, emptyValues)
-      } else if (label >= plainNumCategories && keepInvalid) {
-        // When `handleInvalid` is "keep", encodes invalid data to last category (and removed
-        // if `dropLast` is true)
-        if (dropLast) {
-          Vectors.sparse(size, emptyIndices, emptyValues)
+    // parameter is the index in inputCols of the column being encoded.
+    udf { (label: Double, colIdx: Int) =>
+      val origCategorySize = localCategorySizes(colIdx)
+      // idx: index in vector of the single 1-valued element
+      val idx = if (label >= 0 && label < origCategorySize) {
+        label
+      } else {
+        if (keepInvalid) {
+          origCategorySize
         } else {
-          Vectors.sparse(size, Array(size - 1), oneValue)
+          if (label < 0) {
+            throw new SparkException(s"Negative value: $label. Input can't be negative. " +
--- End diff --
I have a question. Since we don't allow negative values when fitting, should we allow them in transforming even when `handleInvalid` is KEEP_INVALID?
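The size adjustment at the heart of this diff is a pure function of `dropLast` and `handleInvalid`, and can be sketched without any Spark dependencies. The helper below mirrors the three cases in the PR's `getConfigedCategorySizes` (the free-standing signature is mine; in Spark it is a method reading params from the model):

```scala
// Sketch of how `dropLast` and `handleInvalid = "keep"` change each output
// vector's size, per the three cases in the diff above.
def configedCategorySizes(
    categorySizes: Array[Int], dropLast: Boolean, keepInvalid: Boolean): Array[Int] = {
  if (!dropLast && keepInvalid) {
    categorySizes.map(_ + 1) // extra last category collects invalid values
  } else if (dropLast && !keepInvalid) {
    categorySizes.map(_ - 1) // last category dropped (avoids collinearity)
  } else {
    categorySizes            // the two adjustments cancel, or neither applies
  }
}
```

Precomputing the whole array once, instead of recomputing per row inside the UDF as the old `configedCategorySize(orgCategorySize, idx)` did, is exactly the simplification viirya is reviewing.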
[GitHub] spark issue #20133: [SPARK-22934] [SQL] Make optional clauses order insensit...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20133 **[Test build #85578 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85578/testReport)** for PR 20133 at commit [`0894f5e`](https://github.com/apache/spark/commit/0894f5e5a6cac2f73ad30fc80de3cd82b3020de6).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #20133: [SPARK-22934] [SQL] Make optional clauses order insensit...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20133 Merged build finished. Test PASSed.
[GitHub] spark issue #20133: [SPARK-22934] [SQL] Make optional clauses order insensit...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20133 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85578/ Test PASSed.
[GitHub] spark issue #19992: [SPARK-22805][CORE] Use StorageLevel aliases in event lo...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19992 **[Test build #85580 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85580/testReport)** for PR 19992 at commit [`cb1fe6a`](https://github.com/apache/spark/commit/cb1fe6a572d8085d36884bf950a840f972976458).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #14180: [SPARK-16367][PYSPARK] Support for deploying Anaconda an...
Github user jiangxb1987 commented on the issue: https://github.com/apache/spark/pull/14180 gentle ping @ueshin
[GitHub] spark issue #19992: [SPARK-22805][CORE] Use StorageLevel aliases in event lo...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19992 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85580/ Test PASSed.
[GitHub] spark issue #19992: [SPARK-22805][CORE] Use StorageLevel aliases in event lo...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19992 Merged build finished. Test PASSed.
[GitHub] spark issue #19077: [SPARK-21860][core]Improve memory reuse for heap memory ...
Github user jiangxb1987 commented on the issue: https://github.com/apache/spark/pull/19077 ping @cloud-fan shall we continue with this PR?
[GitHub] spark issue #18714: [SPARK-20236][SQL] runtime partition overwrite
Github user jiangxb1987 commented on the issue: https://github.com/apache/spark/pull/18714 Is this PR still targeted to 2.3? @cloud-fan @gatorsmile
[GitHub] spark issue #20127: [SPARK-22932] [SQL] Refactor AnalysisContext
Github user jiangxb1987 commented on the issue: https://github.com/apache/spark/pull/20127 lgtm
[GitHub] spark pull request #20068: [SPARK-17916][SQL] Fix empty string being parsed ...
Github user aa8y commented on a diff in the pull request: https://github.com/apache/spark/pull/20068#discussion_r159160508
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVOptions.scala ---
@@ -152,7 +152,11 @@ class CSVOptions(
     writerSettings.setIgnoreLeadingWhitespaces(ignoreLeadingWhiteSpaceFlagInWrite)
     writerSettings.setIgnoreTrailingWhitespaces(ignoreTrailingWhiteSpaceFlagInWrite)
     writerSettings.setNullValue(nullValue)
-    writerSettings.setEmptyValue(nullValue)
+    // The Univocity parser parses empty strings as `null` by default. This is the default behavior
+    // for Spark too, since `nullValue` defaults to an empty string and has a higher precedence to
+    // setEmptyValue(). But when `nullValue` is set to a different value, that would mean that the
+    // empty string should be parsed not as `null` but as an empty string.
+    writerSettings.setEmptyValue("")
--- End diff --
I talked about this with Hyukjin Kwon before. I think the previous behavior should _not_ be exposed as an option, as the previous behavior was a bug: it _always_ coerced empty values to `null`s. If `nullValue` was not set, it defaulted to `""`, which coerced `""` to `null`; setting the empty value to `""` had no effect in this case. If it was set to something else, say `\N`, then the empty value was also set to `\N`, which resulted in parsing both `\N` and `""` to `null`, since `""` was no longer considered an empty value and coercing `""` to `null` is the Univocity parser's default. Setting the empty value explicitly to the `""` literal ensures that an empty string is always parsed as an empty string, unless `nullValue` is unset or set to `""`, which is what people would do if they want `""` parsed as `null` (the old behavior).
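The semantics aa8y describes boil down to one precedence rule on the read side. The function below is an illustration of that rule only (it is not the Univocity or Spark API; `interpretField` and its signature are mine):

```scala
// Illustration (not the Univocity API) of the null-token semantics above:
// a field matching the configured nullValue token becomes null; anything
// else, including a quoted empty string, is kept as-is after the fix.
def interpretField(raw: String, nullValue: String): Option[String] = {
  if (raw == nullValue) None // field equals the configured null token
  else Some(raw)             // everything else, including "", survives
}
```

With `nullValue = ""` (the default), the empty string still maps to null, preserving the default behavior; only a non-default `nullValue` such as `\N` lets `""` through as a real empty string.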
[GitHub] spark pull request #20133: [SPARK-22934] [SQL] Make optional clauses order i...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/20133#discussion_r159170624
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala ---
@@ -1971,8 +1971,8 @@ abstract class DDLSuite extends QueryTest with SQLTestUtils {
       s"""
         |CREATE TABLE t(a int, b int, c int, d int)
         |USING parquet
-        |PARTITIONED BY(a, b)
         |LOCATION "${dir.toURI}"
+        |PARTITIONED BY(a, b)
--- End diff --
This is an end-to-end test for `ORDER-INSENSITIVENESS`. I do not want to introduce a new one for it
[GitHub] spark pull request #20133: [SPARK-22934] [SQL] Make optional clauses order i...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/20133#discussion_r159170627
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala ---
@@ -875,12 +875,13 @@ class HiveDDLSuite
   test("desc table for Hive table - bucketed + sorted table") {
     withTable("tbl") {
-      sql(s"""
-        CREATE TABLE tbl (id int, name string)
-        PARTITIONED BY (ds string)
-        CLUSTERED BY(id)
-        SORTED BY(id, name) INTO 1024 BUCKETS
-        """)
+      sql(
+        s"""
+          |CREATE TABLE tbl (id int, name string)
+          |CLUSTERED BY(id)
+          |SORTED BY(id, name) INTO 1024 BUCKETS
+          |PARTITIONED BY (ds string)
+        """.stripMargin)
--- End diff --
The same here.
[GitHub] spark issue #20133: [SPARK-22934] [SQL] Make optional clauses order insensit...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20133 **[Test build #85584 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85584/testReport)** for PR 20133 at commit [`68170bb`](https://github.com/apache/spark/commit/68170bb45c64bb5b694bfdffd7c7f02801f9b82e).
[GitHub] spark issue #20133: [SPARK-22934] [SQL] Make optional clauses order insensit...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20133 **[Test build #85585 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85585/testReport)** for PR 20133 at commit [`9818ab5`](https://github.com/apache/spark/commit/9818ab53d5b32aa89fe825a8a6ebce867ed51f01).
[GitHub] spark issue #20078: [SPARK-22900] [Spark-Streaming] Remove unnecessary restr...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/20078 hmm, that sounds like a different problem, why is numReceivers set to > spark.cores.max?
[GitHub] spark pull request #20133: [SPARK-22934] [SQL] Make optional clauses order i...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/20133#discussion_r159170197
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala ---
@@ -408,9 +417,17 @@ class SparkSqlAstBuilder(conf: SQLConf) extends AstBuilder(conf) {
       .map(visitIdentifierList(_).toArray)
       .getOrElse(Array.empty[String])
     val properties = Option(ctx.tableProps).map(visitPropertyKeyValues).getOrElse(Map.empty)
-    val bucketSpec = Option(ctx.bucketSpec()).map(visitBucketSpec)
+    val bucketSpec = if (ctx.bucketSpec().size > 1) {
+      duplicateClausesNotAllowed("CLUSTERED BY", ctx)
--- End diff --
Sure
[GitHub] spark pull request #20133: [SPARK-22934] [SQL] Make optional clauses order i...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/20133#discussion_r159168759
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala ---
@@ -1971,8 +1971,8 @@ abstract class DDLSuite extends QueryTest with SQLTestUtils {
       s"""
         |CREATE TABLE t(a int, b int, c int, d int)
         |USING parquet
-        |PARTITIONED BY(a, b)
         |LOCATION "${dir.toURI}"
+        |PARTITIONED BY(a, b)
--- End diff --
Is it a relevant change? Since the PR is about ORDER-INSENSITIVENESS, can we keep the original code instead of making an irrelevant change like this?
[GitHub] spark issue #19498: [SPARK-17756][PYTHON][STREAMING] Workaround to avoid ret...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19498 **[Test build #85581 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85581/testReport)** for PR 19498 at commit [`174ec21`](https://github.com/apache/spark/commit/174ec2139a7e0af049e2954494525fd3fff145e2).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #20087: [SPARK-21786][SQL] The 'spark.sql.parquet.compression.co...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20087 Merged build finished. Test FAILed.
[GitHub] spark pull request #19968: [SPARK-22769][CORE] When driver stopping, there i...
Github user KaiXinXiaoLei commented on a diff in the pull request: https://github.com/apache/spark/pull/19968#discussion_r159167935 --- Diff: core/src/main/scala/org/apache/spark/rpc/netty/Dispatcher.scala --- @@ -100,6 +102,7 @@ private[netty] class Dispatcher(nettyEnv: NettyRpcEnv, numUsableCores: Int) exte return } unregisterRpcEndpoint(rpcEndpointRef.name) + endpointsIsStopped.putIfAbsent(rpcEndpointRef.name, true) --- End diff -- OK, thanks
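The diff above records a stopped endpoint with `putIfAbsent`, so the first stop wins and concurrent repeat calls become no-ops. A minimal sketch of that idiom (the map's value type and the endpoint name are illustrative assumptions, not the actual Dispatcher code):

```scala
import java.util.concurrent.ConcurrentHashMap

// Record that an endpoint has been stopped exactly once, even if
// unregistration races across several threads: putIfAbsent only writes
// when the key is missing, so later calls leave the first value intact.
val endpointsIsStopped = new ConcurrentHashMap[String, java.lang.Boolean]()

endpointsIsStopped.putIfAbsent("driver-endpoint", true)  // first stop wins
endpointsIsStopped.putIfAbsent("driver-endpoint", false) // no-op: key already present
```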
[GitHub] spark pull request #20133: [SPARK-22934] [SQL] Make optional clauses order i...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/20133#discussion_r159168832 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLParserSuite.scala --- @@ -1153,65 +1191,165 @@ class DDLParserSuite extends PlanTest with SharedSQLContext { } } + test("Test CTAS against data source tables") { +val s1 = + """ +|CREATE TABLE IF NOT EXISTS mydb.page_view +|USING parquet +|COMMENT 'This is the staging page view table' +|LOCATION '/user/external/page_view' +|TBLPROPERTIES ('p1'='v1', 'p2'='v2') +|AS SELECT * FROM src + """.stripMargin + +val s2 = + """ +|CREATE TABLE IF NOT EXISTS mydb.page_view +|USING parquet +|LOCATION '/user/external/page_view' +|COMMENT 'This is the staging page view table' +|TBLPROPERTIES ('p1'='v1', 'p2'='v2') +|AS SELECT * FROM src + """.stripMargin + +val s3 = + """ +|CREATE TABLE IF NOT EXISTS mydb.page_view +|USING parquet +|COMMENT 'This is the staging page view table' +|LOCATION '/user/external/page_view' +|TBLPROPERTIES ('p1'='v1', 'p2'='v2') +|AS SELECT * FROM src + """.stripMargin + +checkParsing(s1) +checkParsing(s2) +checkParsing(s3) + +def checkParsing(sql: String): Unit = { + val (desc, exists) = extractTableDesc(sql) + assert(exists) + assert(desc.identifier.database == Some("mydb")) + assert(desc.identifier.table == "page_view") + assert(desc.storage.locationUri == Some(new URI("/user/external/page_view"))) + assert(desc.schema.isEmpty) // will be populated later when the table is actually created + assert(desc.comment == Some("This is the staging page view table")) + assert(desc.viewText.isEmpty) + assert(desc.viewDefaultDatabase.isEmpty) + assert(desc.viewQueryColumnNames.isEmpty) + assert(desc.partitionColumnNames.isEmpty) + assert(desc.provider == Some("parquet")) + assert(desc.properties == Map("p1" -> "v1", "p2" -> "v2")) +} + } + test("Test CTAS #1") { val s1 = - """CREATE EXTERNAL TABLE IF NOT EXISTS mydb.page_view + """ +|CREATE EXTERNAL TABLE IF NOT EXISTS mydb.page_view |COMMENT 'This is the staging page view table' |STORED AS RCFILE |LOCATION '/user/external/page_view' |TBLPROPERTIES ('p1'='v1', 'p2'='v2') -|AS SELECT * FROM src""".stripMargin +|AS SELECT * FROM src + """.stripMargin --- End diff -- nit. extra space before `"""`.
[GitHub] spark issue #19498: [SPARK-17756][PYTHON][STREAMING] Workaround to avoid ret...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19498 **[Test build #85581 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85581/testReport)** for PR 19498 at commit [`174ec21`](https://github.com/apache/spark/commit/174ec2139a7e0af049e2954494525fd3fff145e2).
[GitHub] spark pull request #20133: [SPARK-22934] [SQL] Make optional clauses order i...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/20133#discussion_r159168804 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala --- @@ -875,12 +875,13 @@ class HiveDDLSuite test("desc table for Hive table - bucketed + sorted table") { withTable("tbl") { - sql(s""" -CREATE TABLE tbl (id int, name string) -PARTITIONED BY (ds string) -CLUSTERED BY(id) -SORTED BY(id, name) INTO 1024 BUCKETS -""") + sql( +s""" + |CREATE TABLE tbl (id int, name string) + |CLUSTERED BY(id) + |SORTED BY(id, name) INTO 1024 BUCKETS + |PARTITIONED BY (ds string) +""".stripMargin) --- End diff -- Can we keep the original `HiveDDLSuite.scala` file, too?
[GitHub] spark pull request #20127: [SPARK-22932] [SQL] Refactor AnalysisContext
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/20127
[GitHub] spark pull request #20133: [SPARK-22934] [SQL] Make optional clauses order i...
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/20133#discussion_r159164626 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala --- @@ -408,9 +417,17 @@ class SparkSqlAstBuilder(conf: SQLConf) extends AstBuilder(conf) { .map(visitIdentifierList(_).toArray) .getOrElse(Array.empty[String]) val properties = Option(ctx.tableProps).map(visitPropertyKeyValues).getOrElse(Map.empty) -val bucketSpec = Option(ctx.bucketSpec()).map(visitBucketSpec) +val bucketSpec = if (ctx.bucketSpec().size > 1) { + duplicateClausesNotAllowed("CLUSTERED BY", ctx) --- End diff -- Can you split the validation logic and the extraction logic? In this case I'd move the check to line 411 and do the extract on line 420.
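The suggestion is to run the duplicate-clause check before, and separately from, extracting the clause value. A hedged sketch of that shape (the helper names here are hypothetical stand-ins, not the actual SparkSqlParser methods):

```scala
// Hypothetical helpers illustrating the review comment: validate all
// duplicate clauses up front, then extract each single optional value.
def checkDuplicateClauses[T](clauses: Seq[T], clauseName: String): Unit = {
  if (clauses.size > 1) {
    throw new IllegalArgumentException(s"Found duplicate clauses: $clauseName")
  }
}

def extractOptional[T](clauses: Seq[T]): Option[T] = clauses.headOption

val bucketClauses = Seq("CLUSTERED BY (id) INTO 4 BUCKETS")
checkDuplicateClauses(bucketClauses, "CLUSTERED BY") // validation first
val bucketSpec = extractOptional(bucketClauses)      // extraction afterwards
```

Keeping the two concerns apart means every clause is validated in one place, even when a later branch never extracts it.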
[GitHub] spark issue #20114: [SPARK-22530][PYTHON][SQL] Adding Arrow support for Arra...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/20114 Merged to master.
[GitHub] spark issue #18714: [SPARK-20236][SQL] runtime partition overwrite
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/18714 ah yes, please please :)
[GitHub] spark pull request #20133: [SPARK-22934] [SQL] Make optional clauses order i...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/20133#discussion_r159150554 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala --- @@ -384,22 +384,31 @@ class SparkSqlAstBuilder(conf: SQLConf) extends AstBuilder(conf) { * CREATE [TEMPORARY] TABLE [IF NOT EXISTS] [db_name.]table_name * USING table_provider * [OPTIONS table_property_list] - * [PARTITIONED BY (col_name, col_name, ...)] - * [CLUSTERED BY (col_name, col_name, ...) - *[SORTED BY (col_name [ASC|DESC], ...)] - *INTO num_buckets BUCKETS - * ] - * [LOCATION path] - * [COMMENT table_comment] - * [TBLPROPERTIES (property_name=property_value, ...)] + * create_table_clauses * [[AS] select_statement]; + * + * create_table_clauses (order insensitive): + * [PARTITIONED BY (col_name, col_name, ...)] --- End diff -- Isn't `[OPTIONS table_property_list]` one of `create_table_clauses`?
[GitHub] spark pull request #20133: [SPARK-22934] [SQL] Make optional clauses order i...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/20133#discussion_r159170195 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/ParserUtils.scala --- @@ -39,6 +41,17 @@ object ParserUtils { throw new ParseException(s"Operation not allowed: $message", ctx) } + def duplicateClausesNotAllowed(message: String, ctx: ParserRuleContext): Nothing = { +throw new ParseException(s"Found duplicate clauses: $message", ctx) --- End diff -- Sounds good to me!
[GitHub] spark issue #20087: [SPARK-21786][SQL] The 'spark.sql.parquet.compression.co...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20087 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85582/ Test FAILed.
[GitHub] spark pull request #20106: [SPARK-21616][SPARKR][DOCS] update R migration gu...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/20106
[GitHub] spark pull request #20114: [SPARK-22530][PYTHON][SQL] Adding Arrow support f...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/20114
[GitHub] spark issue #19498: [SPARK-17756][PYTHON][STREAMING] Workaround to avoid ret...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19498 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85581/ Test PASSed.
[GitHub] spark issue #20127: [SPARK-22932] [SQL] Refactor AnalysisContext
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/20127 Thanks! Merged to master
[GitHub] spark issue #19498: [SPARK-17756][PYTHON][STREAMING] Workaround to avoid ret...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/19498 retest this please
[GitHub] spark issue #19498: [SPARK-17756][PYTHON][STREAMING] Workaround to avoid ret...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19498 Merged build finished. Test PASSed.
[GitHub] spark pull request #20128: [SPARK-21893][SPARK-22142][TESTS][FOLLOWUP] Enabl...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/20128
[GitHub] spark pull request #20127: [SPARK-22932] [SQL] Refactor AnalysisContext
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/20127#discussion_r159168110 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -70,6 +71,8 @@ object AnalysisContext { } def get: AnalysisContext = value.get() + def reset(): Unit = value.remove() --- End diff -- Will be resolved by the future PR.
[GitHub] spark issue #20128: [SPARK-21893][SPARK-22142][TESTS][FOLLOWUP] Enables PySp...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/20128 Merged to master. Thank you @srowen, @felixcheung and @ueshin for reviewing this.
[GitHub] spark issue #20131: [MINOR] Fix a bunch of typos
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/20131 Merged to master - https://github.com/apache/spark/commit/c284c4e1f6f684ca8db1cc446fdcc43b46e3413c
[GitHub] spark pull request #20133: [SPARK-22934] [SQL] Make optional clauses order i...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/20133#discussion_r159170240 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala --- @@ -384,22 +384,31 @@ class SparkSqlAstBuilder(conf: SQLConf) extends AstBuilder(conf) { * CREATE [TEMPORARY] TABLE [IF NOT EXISTS] [db_name.]table_name * USING table_provider * [OPTIONS table_property_list] - * [PARTITIONED BY (col_name, col_name, ...)] - * [CLUSTERED BY (col_name, col_name, ...) - *[SORTED BY (col_name [ASC|DESC], ...)] - *INTO num_buckets BUCKETS - * ] - * [LOCATION path] - * [COMMENT table_comment] - * [TBLPROPERTIES (property_name=property_value, ...)] + * create_table_clauses * [[AS] select_statement]; + * + * create_table_clauses (order insensitive): + * [PARTITIONED BY (col_name, col_name, ...)] --- End diff -- forgot it.
[GitHub] spark issue #20087: [SPARK-21786][SQL] The 'spark.sql.parquet.compression.co...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20087 **[Test build #85582 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85582/testReport)** for PR 20087 at commit [`e9f705d`](https://github.com/apache/spark/commit/e9f705d0ad783da5bd091632e98a6151d4d21cb6). * This patch **fails Scala style tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20106: [SPARK-21616][SPARKR][DOCS] update R migration guide and...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/20106 Merged to master.
[GitHub] spark issue #20087: [SPARK-21786][SQL] The 'spark.sql.parquet.compression.co...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20087 **[Test build #85582 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85582/testReport)** for PR 20087 at commit [`e9f705d`](https://github.com/apache/spark/commit/e9f705d0ad783da5bd091632e98a6151d4d21cb6).
[GitHub] spark issue #20087: [SPARK-21786][SQL] The 'spark.sql.parquet.compression.co...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20087 **[Test build #85583 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85583/testReport)** for PR 20087 at commit [`d3aa7a0`](https://github.com/apache/spark/commit/d3aa7a01320b6af2866d7cf7c4f178eb23eae3ad).
[GitHub] spark issue #20082: [SPARK-22897][CORE]: Expose stageAttemptId in TaskContex...
Github user advancedxy commented on the issue: https://github.com/apache/spark/pull/20082 @cloud-fan Please take another look.
[GitHub] spark issue #19968: [SPARK-22769][CORE] When driver stopping, there is error...
Github user KaiXinXiaoLei commented on the issue: https://github.com/apache/spark/pull/19968 @srowen OK, I will update, thanks
[GitHub] spark pull request #20072: [SPARK-22790][SQL] add a configurable factor to d...
Github user CodingCat commented on a diff in the pull request: https://github.com/apache/spark/pull/20072#discussion_r159171970 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala --- @@ -261,6 +261,17 @@ object SQLConf { .booleanConf .createWithDefault(false) + val HADOOPFSRELATION_SIZE_FACTOR = buildConf( +"org.apache.spark.sql.execution.datasources.sizeFactor") --- End diff -- this is only for HadoopFSRelation
[GitHub] spark issue #20072: [SPARK-22790][SQL] add a configurable factor to describe...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20072 **[Test build #85586 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85586/testReport)** for PR 20072 at commit [`e97f419`](https://github.com/apache/spark/commit/e97f419a5c3347242832287a9e5b0e5662f9e6bb).
[GitHub] spark issue #20072: [SPARK-22790][SQL] add a configurable factor to describe...
Github user CodingCat commented on the issue: https://github.com/apache/spark/pull/20072 @wzhfy thanks for the review, please take a look
[GitHub] spark issue #20087: [SPARK-21786][SQL] The 'spark.sql.parquet.compression.co...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20087 **[Test build #85583 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85583/testReport)** for PR 20087 at commit [`d3aa7a0`](https://github.com/apache/spark/commit/d3aa7a01320b6af2866d7cf7c4f178eb23eae3ad). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20087: [SPARK-21786][SQL] The 'spark.sql.parquet.compression.co...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20087 Merged build finished. Test FAILed.
[GitHub] spark issue #20087: [SPARK-21786][SQL] The 'spark.sql.parquet.compression.co...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20087 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85583/ Test FAILed.
[GitHub] spark issue #20078: [SPARK-22900] [Spark-Streaming] Remove unnecessary restr...
Github user sharkdtu commented on the issue: https://github.com/apache/spark/pull/20078 @felixcheung if you submit spark on yarn with `spark.streaming.dynamicAllocation.enabled=true`, the `num-executors` cannot be set. So, at the beginning, there are only 2 (the default value) executors.
[GitHub] spark pull request #20127: [SPARK-22932] [SQL] Refactor AnalysisContext
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/20127#discussion_r159174832 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -95,6 +98,17 @@ class Analyzer( this(catalog, conf, conf.optimizerMaxIterations) } + override def execute(plan: LogicalPlan): LogicalPlan = { +AnalysisContext.reset() +try { + executeSameContext(plan) +} finally { + AnalysisContext.reset() +} + } + + private def executeSameContext(plan: LogicalPlan): LogicalPlan = super.execute(plan) --- End diff -- +1
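The `execute` override in the diff wraps the real work in reset/try/finally so each top-level analysis starts from, and leaves behind, a clean per-thread context, while nested `executeSameContext` calls share one context. A minimal sketch of the pattern, with a simplified stand-in (`DemoContext` holds an `Int`, not the real `AnalysisContext`):

```scala
// Stand-in for Spark's AnalysisContext: a thread-local value that must be
// reset around every top-level execute. Illustrative only.
object DemoContext {
  private val value = new ThreadLocal[Int] {
    override def initialValue(): Int = 0
  }
  def get: Int = value.get()
  def set(v: Int): Unit = value.set(v)
  def reset(): Unit = value.remove()
}

// Reset, run, and reset again in a finally block, so no state leaks to the
// next query even when the body throws.
def execute[A](body: => A): A = {
  DemoContext.reset()
  try body
  finally DemoContext.reset()
}
```

For example, `execute { DemoContext.set(7); DemoContext.get }` returns 7, and after it returns `DemoContext.get` is back at the initial 0.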
[GitHub] spark pull request #20072: [SPARK-22790][SQL] add a configurable factor to d...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/20072#discussion_r159175087 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/HadoopFsRelation.scala --- @@ -82,7 +82,11 @@ case class HadoopFsRelation( } } - override def sizeInBytes: Long = location.sizeInBytes + override def sizeInBytes: Long = { +val sizeFactor = sqlContext.conf.sizeToMemorySizeFactor +(location.sizeInBytes * sizeFactor).toLong --- End diff -- we should add a safe check for overflow.
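One way to add the requested overflow guard is to clamp the scaled size at `Long.MaxValue` explicitly. (On the JVM the `Double`-to-`Long` conversion already saturates at `Long.MaxValue`, so the clamp mainly documents the intent and makes the boundary testable.) A sketch, not the actual HadoopFsRelation code:

```scala
// Scale a file size by a configurable factor without overflowing Long.
// The multiply is done in Double, which can exceed the Long range; the
// explicit comparison clamps the result at Long.MaxValue.
def scaledSizeInBytes(sizeInBytes: Long, sizeFactor: Double): Long = {
  val scaled = sizeInBytes * sizeFactor
  if (scaled >= Long.MaxValue.toDouble) Long.MaxValue
  else scaled.toLong
}
```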
[GitHub] spark pull request #20072: [SPARK-22790][SQL] add a configurable factor to d...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/20072#discussion_r159175078 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala --- @@ -261,6 +261,17 @@ object SQLConf { .booleanConf .createWithDefault(false) + val DISK_TO_MEMORY_SIZE_FACTOR = buildConf( +"org.apache.spark.sql.execution.datasources.sizeFactor") --- End diff -- `...sizeFactor` is too vague, how about `fileDataSizeFactor`?
[GitHub] spark issue #19077: [SPARK-21860][core]Improve memory reuse for heap memory ...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/19077 The idea LGTM, but I think we can simplify the implementation by allowing the memory allocator to return a larger block of memory than requested.
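The suggested simplification is that a pooled allocator need not match request sizes exactly: rounding every request up to a fixed granularity lets a slightly larger pooled buffer satisfy it, improving reuse. A sketch of that idea (the 1 KB granularity is an illustrative assumption, not Spark's actual policy):

```scala
// Round an allocation request up to the pool granularity, so buffers of
// nearby sizes map to the same bucket and can be reused for each other.
def roundUpToGranularity(requested: Long, granularity: Long = 1024L): Long =
  ((requested + granularity - 1) / granularity) * granularity
```

With this in place the pool only has to track one free list per bucket size, rather than searching for an exact-size match on every allocation.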
[GitHub] spark pull request #20082: [SPARK-22897][CORE]: Expose stageAttemptId in Tas...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/20082#discussion_r159175320 --- Diff: core/src/main/scala/org/apache/spark/scheduler/Task.scala --- @@ -79,6 +79,7 @@ private[spark] abstract class Task[T]( SparkEnv.get.blockManager.registerTask(taskAttemptId) context = new TaskContextImpl( stageId, + stageAttemptId, // stageAttemptId and stageAttemptNumber are semantically equal --- End diff -- How much work would it be to rename the internal `stageAttemptId` to `stageAttemptNumber`?
[GitHub] spark issue #20082: [SPARK-22897][CORE]: Expose stageAttemptId in TaskContex...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/20082 LGTM
[GitHub] spark issue #20120: [SPARK-22926] [SQL] Respect table-level conf compression...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/20120 I think we should document the difference between table options and properties. AFAIK we added table properties to data source tables in Spark 2.3; previously, table options were the only place for users to put configs that change behavior.
[GitHub] spark issue #20119: [SPARK-21475][Core]Revert "[SPARK-21475][CORE] Use NIO's...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/20119 let's also cc the author. @jerryshao do you know if there is a way to fix the regression?