[GitHub] spark issue #16818: [SPARK-19451][SQL][Core] Underlying integer overflow in ...

2017-02-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16818
  
**[Test build #72432 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72432/testReport)** for PR 16818 at commit [`ea1f440`](https://github.com/apache/spark/commit/ea1f44026dc9f5b0f8e660b5adf8824b8deb94df).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `case class ValuePreceding(value: Long) extends FrameBoundary`
  * `case class ValueFollowing(value: Long) extends FrameBoundary`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16820: [SPARK-19471] AggregationIterator does not initialize th...

2017-02-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16820
  
Can one of the admins verify this patch?





[GitHub] spark issue #16819: [SPARK-16441][YARN] Set maxNumExecutor depends on yarn c...

2017-02-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16819
  
**[Test build #72434 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72434/testReport)** for PR 16819 at commit [`97e5eee`](https://github.com/apache/spark/commit/97e5eee6aaf2335a2af62e816b767d408c37a59e).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #16787: [WIP][SPARK-19448][SQL]optimize some duplication functio...

2017-02-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16787
  
Merged build finished. Test FAILed.





[GitHub] spark issue #16809: [SPARK-19463][SQL]refresh cache after the InsertIntoHado...

2017-02-06 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/16809
  
@cloud-fan The new behavior looks reasonable to me, unless users are 
expecting to keep the original cached data.

I went over the change history. I found that @sameeragarwal did this on purpose in 
https://github.com/apache/spark/pull/13566, even though he reported the 
issue in the initial PR (https://github.com/apache/spark/pull/13419). 

@sameeragarwal @hvanhovell @davies Why did we not automatically call 
`refreshByPath` after insert?
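
For context, a minimal sketch of the staleness under discussion. `ToyPathCache` and its methods are hypothetical stand-ins written for illustration, not Spark's cache manager or catalog API; the sketch only assumes that a cached read keeps a snapshot until something invalidates it by path.

```python
class ToyPathCache:
    """Hypothetical cache of data read from a path; not Spark's API."""

    def __init__(self, storage):
        self.storage = storage  # dict: path -> list of rows (the "files")
        self.cached = {}        # dict: path -> snapshot of rows

    def read(self, path):
        # The first read caches a snapshot; later reads reuse it.
        if path not in self.cached:
            self.cached[path] = list(self.storage[path])
        return self.cached[path]

    def refresh_by_path(self, path):
        # Drop the snapshot so the next read sees the current rows.
        self.cached.pop(path, None)

    def insert(self, path, rows, refresh=False):
        # Write to the underlying storage, optionally invalidating the cache.
        self.storage.setdefault(path, []).extend(rows)
        if refresh:
            self.refresh_by_path(path)


cache = ToyPathCache({"/data/t": [1, 2]})
cache.read("/data/t")                       # caches [1, 2]
cache.insert("/data/t", [3])                # no refresh
stale = cache.read("/data/t")               # still the old snapshot
cache.insert("/data/t", [4], refresh=True)  # refresh after insert
fresh = cache.read("/data/t")               # re-read from storage
```

Without the refresh, the second read still returns the pre-insert snapshot, which is what users who expect to keep the original cached data would rely on.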





[GitHub] spark issue #16820: [SPARK-19471] AggregationIterator does not initialize th...

2017-02-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16820
  
**[Test build #72445 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72445/testReport)** for PR 16820 at commit [`7ec5ebf`](https://github.com/apache/spark/commit/7ec5ebf867110f1c0105faee36ee6776ad36aab1).





[GitHub] spark pull request #16680: [SPARK-16101][SQL] Refactoring CSV schema inferen...

2017-02-06 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/16680#discussion_r99583137
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVFileFormat.scala ---
@@ -170,32 +111,21 @@ class CSVFileFormat extends TextBasedFileFormat with DataSourceRegister {
 }
   }
 
-  // Consumes the header in the iterator.
-  CSVRelation.dropHeaderLine(file, lineIterator, csvOptions)
-
-  val filteredIter = lineIterator.filter { line =>
-line.trim.nonEmpty && !line.startsWith(commentPrefix)
+  val linesWithoutHeader = if (csvOptions.headerFlag && file.start == 0) {
+// Note that if there are only comments in the first block, the header would probably
+// be not dropped.
+CSVUtils.dropHeaderLine(lines, csvOptions)
+  } else {
+lines
   }
 
+  val filteredLines = CSVUtils.filterCommentAndEmpty(linesWithoutHeader, csvOptions)
   val parser = new UnivocityParser(dataSchema, requiredSchema, csvOptions)
-  filteredIter.flatMap(parser.parse)
-}
-  }
-
-  /**
-   * Returns the first line of the first non-empty file in path
-   */
-  private def findFirstLine(options: CSVOptions, lines: Dataset[String]): String = {
-import lines.sqlContext.implicits._
-val nonEmptyLines = lines.filter(length(trim($"value")) > 0)
-if (options.isCommentSet) {
-  nonEmptyLines.filter(!$"value".startsWith(options.comment.toString)).first()
-} else {
-  nonEmptyLines.first()
+  filteredLines.flatMap(parser.parse)
 }
   }
 
-  private def readText(
--- End diff --

Sure, either way is fine with me. I just modeled it on 
`JsonFileFormat.createBaseRDD`.
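
A rough sketch of the per-split logic in the diff above, in plain Python rather than Spark code. `read_partition`, `drop_header_line`, and `is_skippable` are hypothetical names, and treating the header as the first non-blank, non-comment line is an assumption of this sketch, not something taken from `CSVUtils`.

```python
from itertools import dropwhile, islice


def is_skippable(line, comment):
    # Blank lines and comment lines are never data (assumed rule).
    stripped = line.strip()
    return not stripped or (comment is not None and stripped.startswith(comment))


def drop_header_line(lines, comment=None):
    # Skip leading blank/comment lines, then drop the first real line.
    remaining = dropwhile(lambda line: is_skippable(line, comment), lines)
    return islice(remaining, 1, None)


def read_partition(lines, start, header=True, comment=None):
    # Only the split that starts at byte offset 0 may contain the header.
    it = iter(lines)
    if header and start == 0:
        it = drop_header_line(it, comment)
    return [line for line in it if not is_skippable(line, comment)]


first = read_partition(["# note", "a,b", "1,2", "", "3,4"], start=0, comment="#")
later = read_partition(["5,6", "# c", "7,8"], start=128, comment="#")

# The caveat from the code comment: if the first split holds only comments,
# the header lands in a later split and is not dropped there.
comments_only = read_partition(["# c1", "# c2"], start=0, comment="#")
leaked = read_partition(["a,b", "1,2"], start=64, comment="#")
```

In this sketch `leaked` still contains the `a,b` header row, which is the edge case the note in the diff warns about.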





[GitHub] spark pull request #16803: [SPARK-19458][SQL]load hive jars from local repo ...

2017-02-06 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16803#discussion_r99583052
  
--- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala ---
@@ -480,7 +479,12 @@ object SparkSubmit extends CommandLineUtils {
 sysProp = "spark.driver.cores"),
   OptionAssigner(args.supervise.toString, STANDALONE | MESOS, CLUSTER,
 sysProp = "spark.driver.supervise"),
-  OptionAssigner(args.ivyRepoPath, STANDALONE, CLUSTER, sysProp = "spark.jars.ivy")
+  OptionAssigner(args.ivyRepoPath, ALL_CLUSTER_MGRS, ALL_DEPLOY_MODES,
+sysProp = "spark.jars.ivy"),
+  OptionAssigner(args.repositories, ALL_CLUSTER_MGRS, ALL_DEPLOY_MODES,
+sysProp = "spark.jars.repositories"),
--- End diff --

We need to document it in 
http://spark.apache.org/docs/latest/configuration.html, as we did for 
`spark.jars.ivy`.





[GitHub] spark issue #16803: [SPARK-19458][SQL]load hive jars from local repo which h...

2017-02-06 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/16803
  
Adding a new option `spark.jars.repositories` affects more than loading 
Hive jars, right?





[GitHub] spark issue #16751: [SPARK-19409][BUILD] Bump parquet version to 1.8.2

2017-02-06 Thread robbinspg
Github user robbinspg commented on the issue:

https://github.com/apache/spark/pull/16751
  
Sorry, I've been away for the weekend. Yes, we use Maven for our test runs. 
Looks like you have it under control.
Thanks





[GitHub] spark issue #16803: [SPARK-19458][SQL]load hive jars from local repo which h...

2017-02-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16803
  
**[Test build #72428 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72428/testReport)** for PR 16803 at commit [`1bb31e5`](https://github.com/apache/spark/commit/1bb31e51a73565a07dc703edf51578762a47f5b2).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #16803: [SPARK-19458][SQL]load hive jars from local repo which h...

2017-02-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16803
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72428/
Test PASSed.





[GitHub] spark pull request #16811: [SPARK-17629][ML] methods to return synonyms dire...

2017-02-06 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/16811#discussion_r99572391
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Word2Vec.scala ---
@@ -232,19 +232,40 @@ class Word2VecModel private[ml] (
   @Since("1.5.0")
   def findSynonyms(word: String, num: Int): DataFrame = {
 val spark = SparkSession.builder().getOrCreate()
-spark.createDataFrame(wordVectors.findSynonyms(word, num)).toDF("word", "similarity")
+spark.createDataFrame(findSynonymsLocal(word, num)).toDF("word", "similarity")
   }
 
   /**
-   * Find "num" number of words whose vector representation most similar to the supplied vector.
+   * Find "num" number of words whose vector representation is most similar to the supplied vector.
    * If the supplied vector is the vector representation of a word in the model's vocabulary,
    * that word will be in the results.  Returns a dataframe with the words and the cosine
    * similarities between the synonyms and the given word vector.
    */
   @Since("2.0.0")
   def findSynonyms(vec: Vector, num: Int): DataFrame = {
 val spark = SparkSession.builder().getOrCreate()
-spark.createDataFrame(wordVectors.findSynonyms(vec, num)).toDF("word", "similarity")
+spark.createDataFrame(findSynonymsLocal(vec, num)).toDF("word", "similarity")
+  }
+
+  /**
+   * Find "num" number of words whose vector representation is most similar to the supplied vector.
+   * If the supplied vector is the vector representation of a word in the model's vocabulary,
+   * that word will be in the results. Returns an array of the words and the cosine
+   * similarities between the synonyms and the given word vector.
+   */
+  @Since("2.2.0")
+  def findSynonymsLocal(vec: Vector, num: Int): Array[(String, Double)] = {
+wordVectors.findSynonyms(vec, num)
+  }
+
+  /**
+   * Find "num" number of words closest in similarity to the given word, not
+   * including the word itself. Returns a dataframe with the words and the
--- End diff --

(This doesn't return a DataFrame)
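
To illustrate the distinction, a small sketch of a local lookup that returns plain (word, similarity) pairs instead of a DataFrame. `find_synonyms_local` here is a hypothetical pure-Python function using cosine similarity over a toy vocabulary; it is not the MLlib implementation.

```python
import math


def find_synonyms_local(vectors, query, num):
    """Return the `num` words most cosine-similar to `query` as (word, similarity) pairs."""

    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        return dot / norm if norm else 0.0

    scored = [(word, cosine(vec, query)) for word, vec in vectors.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:num]


# Toy two-dimensional vocabulary, made up for this example.
vectors = {"king": [1.0, 0.0], "queen": [0.8, 0.6], "apple": [0.0, 1.0]}
result = find_synonyms_local(vectors, [1.0, 0.0], 2)
```

The result is an ordinary list of tuples, so a caller can use it without a `SparkSession`; the doc comment for such a method should say "array" rather than "dataframe".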





[GitHub] spark issue #16787: [SPARK-19448][SQL]optimize some duplication functions in...

2017-02-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16787
  
**[Test build #72442 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72442/testReport)** for PR 16787 at commit [`e971da0`](https://github.com/apache/spark/commit/e971da01e62c2e455504604120544f9a5e78588d).





[GitHub] spark issue #16792: [SPARK-19453][PYTHON][SQL][DOC] Correct and extend DataF...

2017-02-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16792
  
**[Test build #72441 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72441/testReport)** for PR 16792 at commit [`2bd4417`](https://github.com/apache/spark/commit/2bd441728e4263b4efc8be1fda87d502fb0daf8b).





[GitHub] spark issue #16820: [SPARK-19471] AggregationIterator does not initialize th...

2017-02-06 Thread yangw1234
Github user yangw1234 commented on the issue:

https://github.com/apache/spark/pull/16820
  
@mengxr  @rxin





[GitHub] spark issue #16818: [SPARK-19451][SQL][Core] Underlying integer overflow in ...

2017-02-06 Thread uncleGen
Github user uncleGen commented on the issue:

https://github.com/apache/spark/pull/16818
  
cc @srowen 





[GitHub] spark issue #16792: [SPARK-19453][PYTHON][SQL][DOC] Correct and extend DataF...

2017-02-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16792
  
Merged build finished. Test PASSed.





[GitHub] spark issue #16792: [SPARK-19453][PYTHON][SQL][DOC] Correct and extend DataF...

2017-02-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16792
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72441/
Test PASSed.





[GitHub] spark issue #16792: [SPARK-19453][PYTHON][SQL][DOC] Correct and extend DataF...

2017-02-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16792
  
**[Test build #72441 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72441/testReport)** for PR 16792 at commit [`2bd4417`](https://github.com/apache/spark/commit/2bd441728e4263b4efc8be1fda87d502fb0daf8b).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #16803: [SPARK-19458][SQL]load hive jars from local repo which h...

2017-02-06 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/16803
  
Please rename it to [SPARK-19458][BUILD][SQL]load hive jars from local repo 
which has downloaded





[GitHub] spark issue #16626: [SPARK-19261][SQL] Alter add columns for Hive serde and ...

2017-02-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16626
  
**[Test build #72447 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72447/testReport)** for PR 16626 at commit [`88c2f48`](https://github.com/apache/spark/commit/88c2f48f730460c38aef50d02af08dd1df5c2097).





[GitHub] spark issue #16680: [SPARK-16101][SQL] Refactoring CSV schema inference path...

2017-02-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16680
  
**[Test build #72446 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72446/testReport)** for PR 16680 at commit [`6f7fa9b`](https://github.com/apache/spark/commit/6f7fa9b91fc5e86c3de06530486d29bbaf19079f).





[GitHub] spark issue #16387: [SPARK-18986][Core] ExternalAppendOnlyMap shouldn't fail...

2017-02-06 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/16387
  
@samkum Any update?





[GitHub] spark issue #16787: [SPARK-19448][SQL]optimize some duplication functions in...

2017-02-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16787
  
**[Test build #72433 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72433/testReport)** for PR 16787 at commit [`57521c6`](https://github.com/apache/spark/commit/57521c6edfef58c48c12904ce3b7fb4949a76f82).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #16787: [SPARK-19448][SQL]optimize some duplication functions in...

2017-02-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16787
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72433/
Test FAILed.





[GitHub] spark issue #16787: [SPARK-19448][SQL]optimize some duplication functions in...

2017-02-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16787
  
Merged build finished. Test FAILed.





[GitHub] spark issue #16792: [SPARK-19453][PYTHON][SQL][DOC] Correct and extend DataF...

2017-02-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16792
  
**[Test build #72439 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72439/testReport)** for PR 16792 at commit [`78f04dc`](https://github.com/apache/spark/commit/78f04dc8a05c1b0373335f26464411e4a681729a).





[GitHub] spark issue #16815: [SPARK-19407][SS] defaultFS is used FileSystem.get inste...

2017-02-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16815
  
**[Test build #3558 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3558/testReport)** for PR 16815 at commit [`754f705`](https://github.com/apache/spark/commit/754f705e26602444ee69781104728d786fc70707).





[GitHub] spark issue #16815: [SPARK-19407][SS] defaultFS is used FileSystem.get inste...

2017-02-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16815
  
**[Test build #72436 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72436/testReport)** for PR 16815 at commit [`754f705`](https://github.com/apache/spark/commit/754f705e26602444ee69781104728d786fc70707).





[GitHub] spark issue #16819: [SPARK-16441][YARN] Set maxNumExecutor depends on yarn c...

2017-02-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16819
  
**[Test build #72434 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72434/testReport)** for PR 16819 at commit [`97e5eee`](https://github.com/apache/spark/commit/97e5eee6aaf2335a2af62e816b767d408c37a59e).





[GitHub] spark issue #16787: [SPARK-19448][SQL]optimize some duplication functions in...

2017-02-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16787
  
**[Test build #72438 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72438/testReport)** for PR 16787 at commit [`d822209`](https://github.com/apache/spark/commit/d82220973c945e08cd34855972461e96b56ea936).





[GitHub] spark issue #16818: [SPARK-19451][SQL][Core] Underlying integer overflow in ...

2017-02-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16818
  
Merged build finished. Test PASSed.





[GitHub] spark issue #16818: [SPARK-19451][SQL][Core] Underlying integer overflow in ...

2017-02-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16818
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72432/
Test PASSed.





[GitHub] spark issue #16787: [SPARK-19448][SQL]optimize some duplication functions in...

2017-02-06 Thread windpiger
Github user windpiger commented on the issue:

https://github.com/apache/spark/pull/16787
  
retest this please





[GitHub] spark pull request #16792: [SPARK-19453][PYTHON][SQL][DOC] Correct and exten...

2017-02-06 Thread zero323
Github user zero323 commented on a diff in the pull request:

https://github.com/apache/spark/pull/16792#discussion_r99572943
  
--- Diff: python/pyspark/sql/dataframe.py ---
@@ -1272,16 +1272,18 @@ def replace(self, to_replace, value, subset=None):
 """Returns a new :class:`DataFrame` replacing a value with another 
value.
 :func:`DataFrame.replace` and :func:`DataFrameNaFunctions.replace` 
are
 aliases of each other.
+Values `to_replace` and `value` should be homogeneous. Mixed 
string and numeric
--- End diff --

After a quick survey it looks like both behaviors (truncate and fp 
uniqueness) can be confusing so let's add both.
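
As a rough illustration of the two behaviors mentioned above, the plain-Python sketch below (no Spark required) shows what they might look like. It rests on two assumptions that the thread does not confirm: that "fp uniqueness" refers to numerically equal int and float keys colliding in a replacement dict, and that "truncate" refers to a fractional replacement value being cut down when written into an integer column. `cast_to_int_column` is a hypothetical helper, not a PySpark function.

```python
# Illustration only: plain Python, not PySpark.

# Assumed meaning of "fp uniqueness": 1 and 1.0 compare equal and hash
# equally, so a dict keeps a single entry for both keys.
to_replace = {1: "int key", 1.0: "float key"}
num_keys = len(to_replace)            # a single entry survives
surviving_value = to_replace[1]       # the later value overwrote the earlier one


def cast_to_int_column(value):
    """Hypothetical stand-in for casting a replacement to an integer column."""
    return int(value)                 # int() drops the fractional part


truncated = cast_to_int_column(2.7)
```

Either effect can pass silently, which is presumably why documenting both was suggested.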





[GitHub] spark issue #16815: [SPARK-19407][SS] defaultFS is used FileSystem.get inste...

2017-02-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16815
  
**[Test build #72435 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72435/testReport)**
 for PR 16815 at commit 
[`754f705`](https://github.com/apache/spark/commit/754f705e26602444ee69781104728d786fc70707).





[GitHub] spark issue #16787: [SPARK-19448][SQL]optimize some duplication functions in...

2017-02-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16787
  
Merged build finished. Test FAILed.





[GitHub] spark issue #16787: [SPARK-19448][SQL]optimize some duplication functions in...

2017-02-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16787
  
**[Test build #72442 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72442/testReport)**
 for PR 16787 at commit 
[`e971da0`](https://github.com/apache/spark/commit/e971da01e62c2e455504604120544f9a5e78588d).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #16787: [SPARK-19448][SQL]optimize some duplication functions in...

2017-02-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16787
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72442/
Test FAILed.





[GitHub] spark issue #16747: SPARK-16636 Add CalendarIntervalType to documentation

2017-02-06 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/16747
  
(FWIW, I am OK with this, but I am a little worried that it might be meant to 
be an internal type, perhaps in the future.)





[GitHub] spark issue #16818: [SPARK-19451][SQL][Core] Underlying integer overflow in ...

2017-02-06 Thread hvanhovell
Github user hvanhovell commented on the issue:

https://github.com/apache/spark/pull/16818
  
@uncleGen I think we should limit this to allowing long values for range 
frames only; row frames should not get larger than `1 << 31 + 1`. The reason 
for this is that we also need to be able to buffer that many rows, which is 
currently not practical (I have yet to see someone hitting this limit), 
and that `WindowExec` assumes that the buffers are integer bound (see 
`RowBuffer.size` for instance). Also, testing this will be a total PITA.

Just make sure we can construct a range frame that respects longs, and 
throw an error for row frames.
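
The rule proposed above can be sketched in a few lines of plain Python. This is only a sketch under stated assumptions, not Spark code: `check_frame_boundary` and the string frame types are hypothetical, and the row-frame bound is assumed to be the JVM `Int.MaxValue` (`2**31 - 1`).

```python
INT_MAX = 2**31 - 1      # assumed bound for row-based buffers (JVM Int.MaxValue)
LONG_MIN = -(2**63)      # JVM Long.MinValue
LONG_MAX = 2**63 - 1     # JVM Long.MaxValue


def check_frame_boundary(frame_type, offset):
    """Validate a window frame offset (hypothetical helper, not Spark code)."""
    if frame_type == "range":
        # Range frames compare ordering values, so any 64-bit offset is allowed.
        if not LONG_MIN <= offset <= LONG_MAX:
            raise ValueError("range frame offset must fit in a 64-bit long")
    elif frame_type == "rows":
        # Row frames imply buffering that many rows, so reject anything
        # beyond what an integer-sized buffer could hold.
        if abs(offset) > INT_MAX:
            raise ValueError("row frame offset must fit in a 32-bit int")
    else:
        raise ValueError("unknown frame type: %r" % (frame_type,))
    return offset
```

With this check, `check_frame_boundary("range", 2**40)` is accepted, while the same offset on a `"rows"` frame raises an error instead of overflowing silently.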





[GitHub] spark issue #16787: [WIP][SPARK-19448][SQL]optimize some duplication functio...

2017-02-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16787
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72448/
Test FAILed.





[GitHub] spark pull request #16787: [WIP][SPARK-19448][SQL]optimize some duplication ...

2017-02-06 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16787#discussion_r99585661
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala ---
@@ -841,5 +841,6 @@ private[client] class Shim_v1_2 extends Shim_v1_1 {
   case e: InvocationTargetException => throw e.getCause()
 }
   }
-
+  
 }
+
--- End diff --

This empty line?





[GitHub] spark issue #16787: [WIP][SPARK-19448][SQL]optimize some duplication functio...

2017-02-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16787
  
**[Test build #72448 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72448/testReport)**
 for PR 16787 at commit 
[`922eb9d`](https://github.com/apache/spark/commit/922eb9d182e547f0e5706f6ad8c924f4c9ef4496).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request #16787: [WIP][SPARK-19448][SQL]optimize some duplication ...

2017-02-06 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16787#discussion_r99585644
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala ---
@@ -841,5 +841,6 @@ private[client] class Shim_v1_2 extends Shim_v1_1 {
   case e: InvocationTargetException => throw e.getCause()
 }
   }
-
+  
--- End diff --

Please remove these empty spaces.





[GitHub] spark issue #16751: [SPARK-19409][BUILD] Bump parquet version to 1.8.2

2017-02-06 Thread srowen
Github user srowen commented on the issue:

https://github.com/apache/spark/pull/16751
  
Pardon me, but is there anywhere else keeping track of the build break with 
SBT? It's been failing for a while in master: 
https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-maven-hadoop-2.2/

I can have a look at it if nobody else is





[GitHub] spark pull request #16810: [SPARK-19464][CORE][YARN][test-hadoop2.6] Remove ...

2017-02-06 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/16810#discussion_r99561230
  
--- Diff: docs/building-spark.md ---
@@ -63,57 +63,30 @@ with Maven profile settings and so on like the direct 
Maven build. Example:
 
 This will build Spark distribution along with Python pip and R packages. 
For more information on usage, run `./dev/make-distribution.sh --help`
 
-## Specifying the Hadoop Version
-
-Because HDFS is not protocol-compatible across versions, if you want to 
read from HDFS, you'll need to build Spark against the specific HDFS version in 
your environment. You can do this through the `hadoop.version` property. If 
unset, Spark will build against Hadoop 2.2.0 by default. Note that certain 
build profiles are required for particular Hadoop versions:
-
-
-  
-Hadoop version | Profile required
-  
-  
-2.2.x | hadoop-2.2
-2.3.x | hadoop-2.3
-2.4.x | hadoop-2.4
-2.6.x | hadoop-2.6
-2.7.x and later 2.x | hadoop-2.7
-  
-
-
-Note that support for versions of Hadoop before 2.6 are deprecated as of 
Spark 2.1.0 and may be
-removed in Spark 2.2.0.
+## Specifying the Hadoop Version and Enabling YARN
 
+You can specify the exact version of Hadoop to compile against through the 
`hadoop.version` property. 
+If unset, Spark will build against Hadoop 2.6.0 by default.
--- End diff --

Yeah good call, let me fix up the version references to uniformly refer to 
the latest in each maintenance branch.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16810: [SPARK-19464][CORE][YARN][test-hadoop2.6] Remove support...

2017-02-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16810
  
**[Test build #72437 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72437/testReport)**
 for PR 16810 at commit 
[`0220aad`](https://github.com/apache/spark/commit/0220aad0d9842a1582a82c958f2158a9cba11377).





[GitHub] spark issue #16738: [SPARK-19398] Change one misleading log in TaskSetManage...

2017-02-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16738
  
**[Test build #3556 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3556/testReport)**
 for PR 16738 at commit 
[`a2cfd69`](https://github.com/apache/spark/commit/a2cfd6959ba670fdfe806880e216978d3c27b94d).





[GitHub] spark issue #16789: [SPARK-19444][ML][Documentation] Fix imports not being p...

2017-02-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16789
  
**[Test build #3557 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3557/testReport)**
 for PR 16789 at commit 
[`1d50a9d`](https://github.com/apache/spark/commit/1d50a9d7b9cde0584023243031c890c81cf591d7).





[GitHub] spark issue #16787: [SPARK-19448][SQL]optimize some duplication functions in...

2017-02-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16787
  
**[Test build #72440 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72440/testReport)**
 for PR 16787 at commit 
[`ebf875f`](https://github.com/apache/spark/commit/ebf875f6650bc182fbac3986745561ebe90f48d0).





[GitHub] spark issue #16819: [SPARK-16441][YARN] Set maxNumExecutor depends on yarn c...

2017-02-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16819
  
Merged build finished. Test PASSed.





[GitHub] spark issue #16819: [SPARK-16441][YARN] Set maxNumExecutor depends on yarn c...

2017-02-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16819
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72434/
Test PASSed.





[GitHub] spark issue #16787: [WIP][SPARK-19448][SQL]optimize some duplication functio...

2017-02-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16787
  
**[Test build #72448 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72448/testReport)**
 for PR 16787 at commit 
[`922eb9d`](https://github.com/apache/spark/commit/922eb9d182e547f0e5706f6ad8c924f4c9ef4496).





[GitHub] spark issue #16269: [SPARK-19080][SQL] simplify data source analysis

2017-02-06 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/16269
  
LGTM pending test





[GitHub] spark issue #16809: [SPARK-19463][SQL]refresh cache after the InsertIntoHado...

2017-02-06 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/16809
  
Found the design doc: 
https://docs.google.com/document/d/1h5SzfC5UsvIrRpeLNDKSMKrKJvohkkccFlXo-GBAwQQ/edit?ts=574f717f#

> An alternative is to support a new command  REFRESH path that invalidates 
and refreshes all the cached data (and the associated metadata) for any 
dataframe that contains the given data source path. This acts as an explicit 
hammer without modifying the default behavior. Given that it’s fairly late to 
make significant changes in 2.0, this option might be less intrusive to the 
default behavior.

Should we revisit what is the expected default behavior in 2.2?
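
To make the quoted "explicit hammer" idea concrete, here is a toy model in plain Python. It is a sketch under assumptions, not Spark's implementation: `PathCache`, its methods, and the example paths are all hypothetical, and it only covers the invalidation half (re-reading the data afterwards is left out).

```python
class PathCache:
    """Toy cache keyed by name; each entry remembers its source paths."""

    def __init__(self):
        self._entries = {}

    def put(self, name, paths, data):
        self._entries[name] = (tuple(paths), data)

    def cached_names(self):
        return sorted(self._entries)

    def refresh_path(self, path):
        """Drop every entry that reads from `path` or anything beneath it."""
        base = path.rstrip("/")
        prefix = base + "/"
        stale = [
            name
            for name, (paths, _) in self._entries.items()
            if any(p == base or p.startswith(prefix) for p in paths)
        ]
        for name in stale:
            del self._entries[name]
        return sorted(stale)


cache = PathCache()
cache.put("sales", ["/data/sales/2017"], [1, 2, 3])
cache.put("users", ["/data/users"], [4, 5])
invalidated = cache.refresh_path("/data/sales")
```

Only the entry whose source lies under the refreshed path is dropped; unrelated entries stay cached, which matches the "without modifying the default behavior" framing in the quote.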





[GitHub] spark issue #16820: [SPARK-19471] AggregationIterator does not initialize th...

2017-02-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16820
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72445/
Test FAILed.





[GitHub] spark issue #16820: [SPARK-19471] AggregationIterator does not initialize th...

2017-02-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16820
  
**[Test build #72445 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72445/testReport)**
 for PR 16820 at commit 
[`7ec5ebf`](https://github.com/apache/spark/commit/7ec5ebf867110f1c0105faee36ee6776ad36aab1).
 * This patch **fails to build**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistices to improve...

2017-02-06 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/16677
  
@sujith71955 Thanks for the test! The test numbers look promising!





[GitHub] spark issue #16820: [SPARK-19471] AggregationIterator does not initialize th...

2017-02-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16820
  
Merged build finished. Test FAILed.





[GitHub] spark issue #16787: [WIP][SPARK-19448][SQL]optimize some duplication functio...

2017-02-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16787
  
**[Test build #72443 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72443/testReport)**
 for PR 16787 at commit 
[`9ccf9e3`](https://github.com/apache/spark/commit/9ccf9e364f0ac57f5b7c91a9b1bed6fb4c24098c).





[GitHub] spark issue #16820: [SPARK-19471] AggregationIterator does not initialize th...

2017-02-06 Thread hvanhovell
Github user hvanhovell commented on the issue:

https://github.com/apache/spark/pull/16820
  
@yangw1234 could you also check if we need to do this for whole stage code 
generation?

...and you really need to add tests.





[GitHub] spark issue #16820: [SPARK-19471] AggregationIterator does not initialize th...

2017-02-06 Thread hvanhovell
Github user hvanhovell commented on the issue:

https://github.com/apache/spark/pull/16820
  
ok to test





[GitHub] spark issue #16626: [SPARK-19261][SQL] Alter add columns for Hive serde and ...

2017-02-06 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/16626
  
retest this please





[GitHub] spark issue #16386: [SPARK-18352][SQL] Support parsing multiline json files

2017-02-06 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/16386
  
Sorry, I missed the ping. Will review it tonight. 





[GitHub] spark pull request #16787: [WIP][SPARK-19448][SQL]optimize some duplication ...

2017-02-06 Thread windpiger
Github user windpiger commented on a diff in the pull request:

https://github.com/apache/spark/pull/16787#discussion_r99588025
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala ---
@@ -841,5 +841,6 @@ private[client] class Shim_v1_2 extends Shim_v1_1 {
   case e: InvocationTargetException => throw e.getCause()
 }
   }
-
+  
 }
+
--- End diff --

Yes, I made a mistake; there is another HiveShim.scala:
sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveShim.scala





[GitHub] spark issue #16818: [SPARK-19451][SQL][Core] Underlying integer overflow in ...

2017-02-06 Thread uncleGen
Github user uncleGen commented on the issue:

https://github.com/apache/spark/pull/16818
  
@hvanhovell Thanks for your suggestions; this is exactly what I failed to 
notice or consider.





[GitHub] spark issue #16816: Code style improvement

2017-02-06 Thread srowen
Github user srowen commented on the issue:

https://github.com/apache/spark/pull/16816
  
@zhoucen  please close this PR and read 
http://spark.apache.org/contributing.html





[GitHub] spark issue #16817: [SPARK-17213][SQL][FOLLOWUP] Re-enable Parquet filter te...

2017-02-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16817
  
**[Test build #72430 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72430/testReport)**
 for PR 16817 at commit 
[`71a206f`](https://github.com/apache/spark/commit/71a206ff2f16a77359e7fe64086573d6c99795a7).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistices to improve...

2017-02-06 Thread sujith71955
Github user sujith71955 commented on the issue:

https://github.com/apache/spark/pull/16677
  
@viirya I tested the approach mentioned above with sample data; it 
improved performance by almost 3x.
Test report:
Total number of executors = 3
Total memory assigned = 66 G
Total number of cores = 15
Number of partitions = 200
Data size = 10745616 (>10 million)
Limit value = 10000000 (10 million)

Query executed: create destination_table as select * from source_table 
limit 10000000;

Time taken with the current implementation (single partition): 383 seconds
**With map output statistics (the proposed Spark open source solution): 120 
seconds**, which is great!





[GitHub] spark issue #16803: [SPARK-19458][SQL]load hive jars from local repo which h...

2017-02-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16803
  
Merged build finished. Test PASSed.





[GitHub] spark issue #16787: [WIP][SPARK-19448][SQL]optimize some duplication functio...

2017-02-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16787
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72443/
Test FAILed.





[GitHub] spark issue #16787: [WIP][SPARK-19448][SQL]optimize some duplication functio...

2017-02-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16787
  
**[Test build #72443 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72443/testReport)**
 for PR 16787 at commit 
[`9ccf9e3`](https://github.com/apache/spark/commit/9ccf9e364f0ac57f5b7c91a9b1bed6fb4c24098c).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #16747: SPARK-16636 Add CalendarIntervalType to documentation

2017-02-06 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/16747
  
It seems there are several related ones here and there. Maybe 
https://github.com/apache/spark/pull/15751#issuecomment-258518577 is related 
too, because it is about supporting reading and writing that type, which may 
imply that we can explicitly specify a schema with that type.





[GitHub] spark issue #16787: [WIP][SPARK-19448][SQL]optimize some duplication functio...

2017-02-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16787
  
**[Test build #72449 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72449/testReport)**
 for PR 16787 at commit 
[`6566a59`](https://github.com/apache/spark/commit/6566a59e915e1a6e9e0a4bef8d4591a7ef6e18c2).





[GitHub] spark pull request #16816: Code style improvement

2017-02-06 Thread zhoucen
Github user zhoucen closed the pull request at:

https://github.com/apache/spark/pull/16816





[GitHub] spark pull request #16815: [SPARK-19407][SS] defaultFS is used FileSystem.ge...

2017-02-06 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/16815#discussion_r99556423
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamMetadata.scala
 ---
@@ -47,7 +47,7 @@ object StreamMetadata extends Logging {
 
   /** Read the metadata from file if it exists */
   def read(metadataFile: Path, hadoopConf: Configuration): Option[StreamMetadata] = {
-    val fs = FileSystem.get(hadoopConf)
+    val fs = FileSystem.get(metadataFile.toUri, hadoopConf)
--- End diff --

I think this should be `metadataFile.getFileSystem(hadoopConf)`
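
A minimal sketch of why the lookup matters, using only `java.net.URI` rather than Hadoop itself: the filesystem should follow the path's own scheme and fall back to the default only when the path has none. `resolveScheme` and the example URIs are hypothetical stand-ins, not Hadoop APIs; with Hadoop on the classpath, the corresponding real calls would be `FileSystem.get(hadoopConf)` (always the default filesystem) versus `metadataFile.getFileSystem(hadoopConf)`.

```scala
import java.net.URI

// Hypothetical stand-in for the scheme lookup; not a Hadoop API.
// A qualified path keeps its own scheme, an unqualified one uses the default.
def resolveScheme(path: URI, defaultFs: URI): String =
  Option(path.getScheme).getOrElse(defaultFs.getScheme)

// Assumed cluster default, analogous to fs.defaultFS.
val defaultFs = new URI("hdfs://namenode:8020")

// Checkpoint metadata stored on a non-default filesystem.
val qualified = new URI("s3a://bucket/checkpoint/metadata")
// A path without a scheme.
val unqualified = new URI("/checkpoint/metadata")

val qualifiedScheme = resolveScheme(qualified, defaultFs)     // "s3a"
val unqualifiedScheme = resolveScheme(unqualified, defaultFs) // "hdfs"
```

Looking up the filesystem from the configuration alone would always give the `hdfs` case here, which is the bug the diff addresses.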





[GitHub] spark issue #16819: [SPARK-16441][YARN] Set maxNumExecutor depends on yarn c...

2017-02-06 Thread srowen
Github user srowen commented on the issue:

https://github.com/apache/spark/pull/16819
  
I don't think this is a necessary change. Already, you can't ask for more 
resources than the cluster has; the cluster won't grant them. Capping it here 
means the app can't use more resources if the cluster suddenly gets more.

I see the problem you're trying to solve but the resource manager already 
ramps up requests slowly, so I don't think this is the issue.





[GitHub] spark issue #16747: SPARK-16636 Add CalendarIntervalType to documentation

2017-02-06 Thread srowen
Github user srowen commented on the issue:

https://github.com/apache/spark/pull/16747
  
CC @cloud-fan for https://github.com/apache/spark/pull/13008#r62947902 and 
@yhuai for https://github.com/apache/spark/pull/8597#r38769233 as they might be 
what you're referring to?





[GitHub] spark issue #16792: [SPARK-19453][PYTHON][SQL][DOC] Correct and extend DataF...

2017-02-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16792
  
Merged build finished. Test FAILed.





[GitHub] spark issue #16787: [SPARK-19448][SQL]optimize some duplication functions in...

2017-02-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16787
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72438/
Test FAILed.





[GitHub] spark issue #16787: [SPARK-19448][SQL]optimize some duplication functions in...

2017-02-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16787
  
Merged build finished. Test FAILed.





[GitHub] spark issue #16792: [SPARK-19453][PYTHON][SQL][DOC] Correct and extend DataF...

2017-02-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16792
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72439/
Test FAILed.





[GitHub] spark issue #16792: [SPARK-19453][PYTHON][SQL][DOC] Correct and extend DataF...

2017-02-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16792
  
**[Test build #72439 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72439/testReport)**
 for PR 16792 at commit 
[`78f04dc`](https://github.com/apache/spark/commit/78f04dc8a05c1b0373335f26464411e4a681729a).
 * This patch **fails Python style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #16787: [SPARK-19448][SQL]optimize some duplication functions in...

2017-02-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16787
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72440/
Test FAILed.





[GitHub] spark issue #16787: [SPARK-19448][SQL]optimize some duplication functions in...

2017-02-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16787
  
Merged build finished. Test FAILed.





[GitHub] spark issue #16787: [SPARK-19448][SQL]optimize some duplication functions in...

2017-02-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16787
  
**[Test build #72438 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72438/testReport)**
 for PR 16787 at commit 
[`d822209`](https://github.com/apache/spark/commit/d82220973c945e08cd34855972461e96b56ea936).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #16787: [SPARK-19448][SQL]optimize some duplication functions in...

2017-02-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16787
  
**[Test build #72440 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72440/testReport)**
 for PR 16787 at commit 
[`ebf875f`](https://github.com/apache/spark/commit/ebf875f6650bc182fbac3986745561ebe90f48d0).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #16680: [SPARK-16101][SQL] Refactoring CSV schema inference path...

2017-02-06 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/16680
  
(I am fine with changing the name only for the CSV ones for now as well. I 
would appreciate it if you could confirm.)





[GitHub] spark issue #16815: [SPARK-19407][SS] defaultFS is used FileSystem.get inste...

2017-02-06 Thread uncleGen
Github user uncleGen commented on the issue:

https://github.com/apache/spark/pull/16815
  
retest this please.





[GitHub] spark issue #16817: [SPARK-17213][SQL][FOLLOWUP] Re-enable Parquet filter te...

2017-02-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16817
  
Merged build finished. Test PASSed.





[GitHub] spark issue #16817: [SPARK-17213][SQL][FOLLOWUP] Re-enable Parquet filter te...

2017-02-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16817
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72430/
Test PASSed.





[GitHub] spark pull request #16820: [SPARK-19471] AggregationIterator does not initia...

2017-02-06 Thread yangw1234
GitHub user yangw1234 opened a pull request:

https://github.com/apache/spark/pull/16820

[SPARK-19471] AggregationIterator does not initialize the generated result 
projection before using it

## What changes were proposed in this pull request?

When AggregationIterator generates the result projection, it does not call the 
initialize method of the Projection class. This causes a runtime 
NullPointerException when the projection involves nondeterministic expressions.

This problem was introduced by #15567.
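
The failure mode can be sketched with a toy model that assumes nothing about Spark's actual classes: `ToyExpr`, `ToyRand`, and `ToyProjection` are hypothetical names. A nondeterministic expression keeps per-partition state that only exists after `initialize` runs, so evaluating the projection without initializing it dereferences null.

```scala
import scala.util.Try

// Hypothetical expression interface; deterministic expressions need no setup.
trait ToyExpr {
  def initialize(partitionIndex: Int): Unit = ()
  def eval(input: Int): Long
}

// Nondeterministic expression: its generator is created only in initialize.
final class ToyRand(seed: Long) extends ToyExpr {
  private var rng: java.util.Random = null
  override def initialize(partitionIndex: Int): Unit =
    rng = new java.util.Random(seed + partitionIndex)
  // Throws NullPointerException if initialize was never called.
  override def eval(input: Int): Long = rng.nextLong()
}

// Hypothetical projection that forwards initialize to every expression.
final class ToyProjection(exprs: Seq[ToyExpr]) {
  def initialize(partitionIndex: Int): Unit =
    exprs.foreach(_.initialize(partitionIndex))
  def apply(input: Int): Seq[Long] = exprs.map(_.eval(input))
}

// Mirrors the bug: the projection is used straight after construction.
def evalUninitialized(): Try[Seq[Long]] = {
  val projection = new ToyProjection(Seq(new ToyRand(42L)))
  Try(projection(1))
}

// Mirrors the fix: initialize with the partition index before first use.
def evalInitialized(partitionIndex: Int): Seq[Long] = {
  val projection = new ToyProjection(Seq(new ToyRand(42L)))
  projection.initialize(partitionIndex)
  projection(1)
}
```

In this model `evalUninitialized()` yields a `Failure` wrapping a `NullPointerException`, while `evalInitialized(0)` returns a value.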

## How was this patch tested?

manual tests


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/yangw1234/spark proj

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/16820.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #16820


commit 7ec5ebf867110f1c0105faee36ee6776ad36aab1
Author: wangyang 
Date:   2017-02-06T12:07:57Z

SPARK-19471







[GitHub] spark issue #16789: [SPARK-19444][ML][Documentation] Fix imports not being p...

2017-02-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16789
  
**[Test build #3557 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3557/testReport)**
 for PR 16789 at commit 
[`1d50a9d`](https://github.com/apache/spark/commit/1d50a9d7b9cde0584023243031c890c81cf591d7).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #16269: [SPARK-19080][SQL] simplify data source analysis

2017-02-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16269
  
**[Test build #72444 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72444/testReport)**
 for PR 16269 at commit 
[`9742f78`](https://github.com/apache/spark/commit/9742f78412026b8bd83a3cd2f58c682362318d69).





[GitHub] spark pull request #16269: [SPARK-19080][SQL] simplify data source analysis

2017-02-06 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/16269#discussion_r99578597
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala
 ---
@@ -374,25 +375,24 @@ case class BroadcastHint(child: LogicalPlan) extends UnaryNode {
  *  Map('a' -> Some('1'), 'b' -> Some('2')),
  *  and `INSERT INTO tbl PARTITION (a=1, b) AS ...`
  *  would have Map('a' -> Some('1'), 'b' -> None).
- * @param child the logical plan representing data to write to.
+ * @param query the logical plan representing data to write to.
  * @param overwrite overwrite existing table or partitions.
  * @param ifNotExists If true, only write if the table or partition does not exist.
  */
 case class InsertIntoTable(
     table: LogicalPlan,
     partition: Map[String, Option[String]],
-    child: LogicalPlan,
+    query: LogicalPlan,
     overwrite: Boolean,
     ifNotExists: Boolean)
   extends LogicalPlan {
-
-  override def children: Seq[LogicalPlan] = child :: Nil
-  override def output: Seq[Attribute] = Seq.empty
-
   assert(overwrite || !ifNotExists)
   assert(partition.values.forall(_.nonEmpty) || !ifNotExists)
 
-  override lazy val resolved: Boolean = childrenResolved && table.resolved
+  // We don't want `table` in children as sometimes we don't want to transform it.
+  override def children: Seq[LogicalPlan] = query :: Nil
+  override def output: Seq[Attribute] = Seq.empty
+  override lazy val resolved: Boolean = false
--- End diff --

Now we resolve `CreateTable` and `InsertIntoTable` to concrete commands, 
so the check still works.
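
The effect of leaving `table` out of `children` in the diff above can be sketched with a toy plan tree. This is an illustrative model only: `Plan`, `Relation`, `Insert`, and this `transformUp` are hypothetical and far simpler than Catalyst's tree nodes. A generic transform only descends into whatever `children` returns, so a field that is not listed there is never rewritten.

```scala
// Hypothetical miniature plan tree; not Catalyst's LogicalPlan.
sealed trait Plan {
  def children: Seq[Plan]
  def withNewChildren(newChildren: Seq[Plan]): Plan

  // Rewrite children first, then apply the rule to this node if it matches.
  def transformUp(rule: PartialFunction[Plan, Plan]): Plan = {
    val rewritten = withNewChildren(children.map(_.transformUp(rule)))
    if (rule.isDefinedAt(rewritten)) rule(rewritten) else rewritten
  }
}

final case class Relation(name: String) extends Plan {
  override def children: Seq[Plan] = Nil
  override def withNewChildren(newChildren: Seq[Plan]): Plan = this
}

final case class Insert(table: Plan, query: Plan) extends Plan {
  // `table` is deliberately not a child, so transforms never reach it.
  override def children: Seq[Plan] = Seq(query)
  override def withNewChildren(newChildren: Seq[Plan]): Plan =
    copy(query = newChildren.head)
}

// Example rule: upper-case every relation name it can see.
val upperCaseRelations: PartialFunction[Plan, Plan] = {
  case Relation(name) => Relation(name.toUpperCase)
}

val rewrittenInsert =
  Insert(Relation("target"), Relation("source")).transformUp(upperCaseRelations)
// rewrittenInsert == Insert(Relation("target"), Relation("SOURCE"))
```

Only the `query` side is rewritten; the `table` side stays as written until something handles it explicitly.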





[GitHub] spark issue #16680: [SPARK-16101][SQL] Refactoring CSV schema inference path...

2017-02-06 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/16680
  
@cloud-fan, I mainly mirrored the ones in the JSON datasource, and I am pretty 
sure you knew this when you added some comments. But let me just rebase this as 
is for now, in case you are okay with the points above and missed my 
reasons. 

I know that following the existing implementation is not always right, but maybe 
we could rename them together if the other one does not look appropriate. (I am 
fine with either way; I just want to be sure you know I had some reasons.)





[GitHub] spark pull request #16680: [SPARK-16101][SQL] Refactoring CSV schema inferen...

2017-02-06 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/16680#discussion_r99583374
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVInferSchema.scala
 ---
@@ -39,22 +37,76 @@ private[csv] object CSVInferSchema {
    * 3. Replace any null types with string type
    */
   def infer(
-      tokenRdd: RDD[Array[String]],
-      header: Array[String],
+      csv: Dataset[String],
--- End diff --

This one too; I just mirrored `json.InferSchema.infer`:

```
  def infer(
      json: RDD[String],
```





[GitHub] spark pull request #16680: [SPARK-16101][SQL] Refactoring CSV schema inferen...

2017-02-06 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/16680#discussion_r99583376
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVUtils.scala
 ---
@@ -0,0 +1,134 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.datasources.csv
+
+import org.apache.spark.sql.Dataset
+import org.apache.spark.sql.functions._
+import org.apache.spark.sql.types._
+
+object CSVUtils {
+  /**
+   * Filter ignorable rows for CSV dataset (lines empty and starting with `comment`).
+   * This is currently being used in CSV schema inference.
+   */
+  def filterCommentAndEmpty(lines: Dataset[String], options: CSVOptions): Dataset[String] = {
+    // Note that this was separately made by SPARK-18362. Logically, this should be the same
+    // with the one below, `filterCommentAndEmpty` but execution path is different. One of them
+    // might have to be removed in the near future if possible.
+    import lines.sqlContext.implicits._
+    val nonEmptyLines = lines.filter(length(trim($"value")) > 0)
+    if (options.isCommentSet) {
+      nonEmptyLines.filter(!$"value".startsWith(options.comment.toString))
+    } else {
+      nonEmptyLines
+    }
+  }
+
+  /**
+   * Filter ignorable rows for CSV iterator (lines empty and starting with `comment`).
+   * This is currently being used in CSV reading path and CSV schema inference.
+   */
+  def filterCommentAndEmpty(iter: Iterator[String], options: CSVOptions): Iterator[String] = {
+    iter.filter { line =>
+      line.trim.nonEmpty && !line.startsWith(options.comment.toString)
+    }
+  }
+
+  /**
+   * Skip the given first line so that only data can remain in a dataset.
+   * This is similar with `dropHeaderLine` below and currently being used in CSV schema inference.
+   */
+  def filterHeaderLine(
+       iter: Iterator[String],
+       firstLine: String,
+       options: CSVOptions): Iterator[String] = {
+    // Note that unlike actual CSV reading path, it simply filters the given first line. Therefore,
+    // this skips the line same with the header if exists. One of them might have to be removed
+    // in the near future if possible.
+    if (options.headerFlag) {
+      iter.filterNot(_ == firstLine)
+    } else {
+      iter
+    }
+  }
+
+  /**
+   * Drop header line so that only data can remain.
+   * This is similar with `filterHeaderLine` above and currently being used in CSV reading path.
+   */
+  def dropHeaderLine(iter: Iterator[String], options: CSVOptions): Iterator[String] = {
+    val nonEmptyLines = if (options.isCommentSet) {
+      val commentPrefix = options.comment.toString
+      iter.dropWhile { line =>
+        line.trim.isEmpty || line.trim.startsWith(commentPrefix)
+      }
+    } else {
+      iter.dropWhile(_.trim.isEmpty)
+    }
+
+    if (nonEmptyLines.hasNext) nonEmptyLines.drop(1)
+    iter
+  }
+
+  /**
+   * Helper method that converts string representation of a character to actual character.
+   * It handles some Java escaped strings and throws exception if given string is longer than one
+   * character.
+   */
+  @throws[IllegalArgumentException]
+  def toChar(str: String): Char = {
+    if (str.charAt(0) == '\\') {
+      str.charAt(1)
+      match {
+        case 't' => '\t'
+        case 'r' => '\r'
+        case 'b' => '\b'
+        case 'f' => '\f'
+        case '\"' => '\"' // In case user changes quote char and uses \" as delimiter in options
+        case '\'' => '\''
+        case 'u' if str == """\u0000""" => '\u0000'
+        case _ =>
+          throw new IllegalArgumentException(s"Unsupported special character for delimiter: $str")
+      }
+    } else if (str.length == 1) {
+  
