[GitHub] spark issue #21511: [SPARK-24491][Kubernetes] Configuration support for requ...

2018-11-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21511
  
Can one of the admins verify this patch?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22893: [SPARK-25868][MLlib] One part of Spark MLlib Kmean Logic...

2018-11-09 Thread KyleLi1985
Github user KyleLi1985 commented on the issue:

https://github.com/apache/spark/pull/22893
  
@SparkQA test this please


---




[GitHub] spark pull request #22992: [SPARK-24229] Update to Apache Thrift 0.10.0

2018-11-09 Thread Fokko
Github user Fokko closed the pull request at:

https://github.com/apache/spark/pull/22992


---




[GitHub] spark issue #22992: [SPARK-24229] Update to Apache Thrift 0.10.0

2018-11-09 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/22992
  
@mingwandroid, if you are worried about real issues, could you lend us a 
hand, please? Reopening the issue with a valid, reproducible case is always 
welcome.

The Apache Spark community takes correct CVE reports seriously and provides 
backports.
- http://spark.apache.org/security.html

Raising alarms only about real risks is the only way to keep people happy. 
We should not surprise people for the wrong reasons. Apache Spark issues and 
commits are precious resources. Not only you but all downstream users are 
affected, so we are trying to do our best to deliver only correct patches.

If we repeatedly cry `Wolf, Wolf` in incorrect situations, the credibility 
of Apache Spark's security alerts will go down gradually (and eventually 
seriously). Nobody will believe Spark's security alerts in the future.


---




[GitHub] spark issue #22899: [SPARK-25573] Combine resolveExpression and resolve in t...

2018-11-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22899
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #22899: [SPARK-25573] Combine resolveExpression and resolve in t...

2018-11-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22899
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4908/
Test PASSed.


---




[GitHub] spark issue #22899: [SPARK-25573] Combine resolveExpression and resolve in t...

2018-11-09 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22899
  
**[Test build #98674 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98674/testReport)**
 for PR 22899 at commit 
[`3a32007`](https://github.com/apache/spark/commit/3a320075e2749e5ff21fc6fef616406fd8756cc9).


---




[GitHub] spark pull request #22255: [SPARK-25102][Spark Core] Write Spark version inf...

2018-11-09 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/22255


---




[GitHub] spark pull request #22932: [SPARK-25102][SQL] Write Spark version to ORC/Par...

2018-11-09 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/22932


---




[GitHub] spark issue #22932: [SPARK-25102][SQL] Write Spark version to ORC/Parquet fi...

2018-11-09 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/22932
  
Thank you so much!


---




[GitHub] spark pull request #22932: [SPARK-25102][SQL] Write Spark version to ORC/Par...

2018-11-09 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/22932#discussion_r232444034
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcOutputWriter.scala ---
@@ -36,11 +37,17 @@ private[orc] class OrcOutputWriter(
   private[this] val serializer = new OrcSerializer(dataSchema)
 
   private val recordWriter = {
-    new OrcOutputFormat[OrcStruct]() {
+    val orcOutputFormat = new OrcOutputFormat[OrcStruct]() {
       override def getDefaultWorkFile(context: TaskAttemptContext, extension: String): Path = {
         new Path(path)
       }
-    }.getRecordWriter(context)
+    }
+    val filename = orcOutputFormat.getDefaultWorkFile(context, ".orc")
+    val options = OrcMapRedOutputFormat.buildOptions(context.getConfiguration)
+    val writer = OrcFile.createWriter(filename, options)
+    val recordWriter = new OrcMapreduceRecordWriter[OrcStruct](writer)
--- End diff --

Right. To avoid reflection, this was the only way.


---




[GitHub] spark issue #22998: [SPARK-26001][SQL]Reduce memory copy when writing decima...

2018-11-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22998
  
Can one of the admins verify this patch?


---




[GitHub] spark issue #22932: [SPARK-25102][SQL] Write Spark version to ORC/Parquet fi...

2018-11-09 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/22932
  
LGTM. Thanks! Merged to master.


---




[GitHub] spark issue #22998: [SPARK-26001][SQL]Reduce memory copy when writing decima...

2018-11-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22998
  
Can one of the admins verify this patch?


---




[GitHub] spark issue #22998: [SPARK-26001][SQL]Reduce memory copy when writing decima...

2018-11-09 Thread heary-cao
Github user heary-cao commented on the issue:

https://github.com/apache/spark/pull/22998
  
cc @mgaido91, @dongjoon-hyun , @cloud-fan , @kiszk 


---




[GitHub] spark issue #22998: [SPARK-26001][SQL]Reduce memory copy when writing decima...

2018-11-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22998
  
Can one of the admins verify this patch?


---




[GitHub] spark pull request #22998: [SPARK-26001][SQL]Reduce memory copy when writing...

2018-11-09 Thread heary-cao
GitHub user heary-cao opened a pull request:

https://github.com/apache/spark/pull/22998

[SPARK-26001][SQL]Reduce memory copy when writing decimal

## What changes were proposed in this pull request?

This PR makes two changes:

- When writing non-null decimals, we no longer zero out all 16 allocated 
bytes. If the number of bytes needed for a decimal is greater than 8, we do 
not need to zero out the first 8 bytes, because they will be overwritten when 
the decimal is written.

- When writing null decimals, we no longer zero out the 16 allocated bytes. 
`BitSetMethods.set` sets the null bit, and the length of the decimal is set 
to 0. When the decimal is read, the 16 bytes are not accessed, so this is safe.
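
The non-null case described above can be sketched roughly as follows. This is an illustration only, not the code in this PR: a plain `Array[Byte]` stands in for the row buffer, and `DecimalSlotSketch`, `writeNonNull`, `SlotSize`, and `WordSize` are hypothetical names.

```scala
import java.util.Arrays

// Illustrative only: a plain byte array stands in for the row buffer.
object DecimalSlotSketch {
  val SlotSize = 16 // bytes reserved per decimal, as in the description
  val WordSize = 8

  /** Writes `bytes` into the 16-byte slot that starts at `offset`. */
  def writeNonNull(buf: Array[Byte], offset: Int, bytes: Array[Byte]): Unit = {
    require(bytes.length <= SlotSize, "value does not fit in the slot")
    // Following the description, the first word is skipped only when the
    // value is longer than 8 bytes, since the copy below overwrites it fully.
    if (bytes.length <= WordSize) {
      Arrays.fill(buf, offset, offset + WordSize, 0.toByte)
    }
    // The second word may be only partly covered, so it is always cleared.
    Arrays.fill(buf, offset + WordSize, offset + SlotSize, 0.toByte)
    System.arraycopy(bytes, 0, buf, offset, bytes.length)
  }
}
```

With a 9-byte value, only bytes 8 to 15 of the slot are cleared before the copy; with a shorter value, the whole slot is cleared as before.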

## How was this patch tested?

The existing test cases.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/heary-cao/spark writeDecimal

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/22998.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #22998


commit bab69d426578a009ce7796e14757c6ae79d57f28
Author: caoxuewen 
Date:   2018-11-10T06:31:52Z

Reduce memory copy when writing decimal




---




[GitHub] spark pull request #22932: [SPARK-25102][SQL] Write Spark version to ORC/Par...

2018-11-09 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/22932#discussion_r232443802
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcOutputWriter.scala ---
@@ -36,11 +37,17 @@ private[orc] class OrcOutputWriter(
   private[this] val serializer = new OrcSerializer(dataSchema)
 
   private val recordWriter = {
-    new OrcOutputFormat[OrcStruct]() {
+    val orcOutputFormat = new OrcOutputFormat[OrcStruct]() {
       override def getDefaultWorkFile(context: TaskAttemptContext, extension: String): Path = {
         new Path(path)
       }
-    }.getRecordWriter(context)
+    }
+    val filename = orcOutputFormat.getDefaultWorkFile(context, ".orc")
+    val options = OrcMapRedOutputFormat.buildOptions(context.getConfiguration)
+    val writer = OrcFile.createWriter(filename, options)
+    val recordWriter = new OrcMapreduceRecordWriter[OrcStruct](writer)
--- End diff --

This is basically copied from getRecordWriter


---




[GitHub] spark issue #22954: [SPARK-25981][R] Enables Arrow optimization from R DataF...

2018-11-09 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/22954
  
Hey guys, thanks for reviewing! I will address the comments soon.


---




[GitHub] spark pull request #22976: [SPARK-25974][SQL]Optimizes Generates bytecode fo...

2018-11-09 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/22976#discussion_r232443266
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/GenerateOrdering.scala ---
@@ -68,57 +68,50 @@ object GenerateOrdering extends CodeGenerator[Seq[SortOrder], Ordering[InternalR
     genComparisons(ctx, ordering)
   }
 
+  /**
+   * Creates the variables for ordering based on the given order.
+   */
+  private def createOrderKeys(
+    ctx: CodegenContext,
+    row: String,
+    ordering: Seq[SortOrder]): Seq[ExprCode] = {
+    ctx.INPUT_ROW = row
+    // to use INPUT_ROW we must make sure currentVars is null
+    ctx.currentVars = null
+    ordering.map(_.child.genCode(ctx))
+  }
+
   /**
    * Generates the code for ordering based on the given order.
    */
   def genComparisons(ctx: CodegenContext, ordering: Seq[SortOrder]): String = {
     val oldInputRow = ctx.INPUT_ROW
     val oldCurrentVars = ctx.currentVars
-    val inputRow = "i"
-    ctx.INPUT_ROW = inputRow
-    // to use INPUT_ROW we must make sure currentVars is null
-    ctx.currentVars = null
-
-    val comparisons = ordering.map { order =>
-      val eval = order.child.genCode(ctx)
-      val asc = order.isAscending
-      val isNullA = ctx.freshName("isNullA")
-      val primitiveA = ctx.freshName("primitiveA")
-      val isNullB = ctx.freshName("isNullB")
-      val primitiveB = ctx.freshName("primitiveB")
+    val rowAKeys = createOrderKeys(ctx, "a", ordering)
+    val rowBKeys = createOrderKeys(ctx, "b", ordering)
+    val comparisons = rowAKeys.zip(rowBKeys).zipWithIndex.map { case ((l, r), i) =>
+      val dt = ordering(i).child.dataType
+      val asc = ordering(i).isAscending
+      val nullOrdering = ordering(i).nullOrdering
       s"""
-          ${ctx.INPUT_ROW} = a;
-          boolean $isNullA;
-          ${CodeGenerator.javaType(order.child.dataType)} $primitiveA;
-          {
-            ${eval.code}
-            $isNullA = ${eval.isNull};
-            $primitiveA = ${eval.value};
-          }
-          ${ctx.INPUT_ROW} = b;
-          boolean $isNullB;
-          ${CodeGenerator.javaType(order.child.dataType)} $primitiveB;
-          {
-            ${eval.code}
-            $isNullB = ${eval.isNull};
-            $primitiveB = ${eval.value};
-          }
-          if ($isNullA && $isNullB) {
+          ${l.code}
--- End diff --

Would you update this to use `|` and `.stripMargin`?


---




[GitHub] spark pull request #22976: [SPARK-25974][SQL]Optimizes Generates bytecode fo...

2018-11-09 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/22976#discussion_r232443230
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/GenerateOrdering.scala ---
@@ -133,7 +126,6 @@ object GenerateOrdering extends CodeGenerator[Seq[SortOrder], Ordering[InternalR
       returnType = "int",
       makeSplitFunction = { body =>
         s"""
--- End diff --

Would you update this to use `|` and `.stripMargin`?
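
For context, a minimal sketch of the `|` and `.stripMargin` style being requested. The names `StripMarginSketch` and `compareBlock` are hypothetical and are not taken from `GenerateOrdering`; the generated snippet is only a stand-in for real codegen output.

```scala
object StripMarginSketch {
  // Hypothetical helper: builds a small Java snippet the way a codegen
  // template might. Each template line starts with `|`, and stripMargin
  // removes the leading whitespace up to and including that marker.
  def compareBlock(left: String, right: String): String =
    s"""
       |int comp = $left.compareTo($right);
       |if (comp != 0) {
       |  return comp;
       |}
     """.stripMargin

  def main(args: Array[String]): Unit =
    println(compareBlock("a", "b"))
}
```

The Scala source can then stay indented with the surrounding code while the generated text keeps only the indentation written after the `|`.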


---




[GitHub] spark pull request #22976: [SPARK-25974][SQL]Optimizes Generates bytecode fo...

2018-11-09 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/22976#discussion_r232443205
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/GenerateOrdering.scala ---
@@ -154,7 +146,6 @@ object GenerateOrdering extends CodeGenerator[Seq[SortOrder], Ordering[InternalR
     // make sure INPUT_ROW is declared even if splitExpressions
     // returns an inlined block
     s"""
--- End diff --

Can we use just `code`?


---




[GitHub] spark issue #22993: [SPARK-24421][BUILD][CORE] Accessing sun.misc.Cleaner in...

2018-11-09 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22993
  
**[Test build #4422 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4422/testReport)**
 for PR 22993 at commit 
[`f137de7`](https://github.com/apache/spark/commit/f137de748e092315cc11e66deaafbcb469dd5764).
 * This patch **fails to build**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #22993: [SPARK-24421][BUILD][CORE] Accessing sun.misc.Cleaner in...

2018-11-09 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22993
  
**[Test build #4422 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4422/testReport)**
 for PR 22993 at commit 
[`f137de7`](https://github.com/apache/spark/commit/f137de748e092315cc11e66deaafbcb469dd5764).


---




[GitHub] spark issue #22893: [SPARK-25868][MLlib] One part of Spark MLlib Kmean Logic...

2018-11-09 Thread srowen
Github user srowen commented on the issue:

https://github.com/apache/spark/pull/22893
  
There's no merge conflict right now. You can just update the file and push 
the commit to your branch. If there were a merge conflict, you'd just rebase on 
apache/master, resolve the conflict, and force-push the branch.


---




[GitHub] spark issue #21732: [SPARK-24762][SQL] Enable Option of Product encoders

2018-11-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21732
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98672/
Test PASSed.


---




[GitHub] spark issue #21732: [SPARK-24762][SQL] Enable Option of Product encoders

2018-11-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21732
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21732: [SPARK-24762][SQL] Enable Option of Product encoders

2018-11-09 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21732
  
**[Test build #98672 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98672/testReport)**
 for PR 21732 at commit 
[`2d2057b`](https://github.com/apache/spark/commit/2d2057b4f2dbb541b4f2573944318f7a874fac3d).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark pull request #22759: [MINOR][SQL][DOC] Correct parquet nullability doc...

2018-11-09 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/22759#discussion_r232441937
  
--- Diff: docs/sql-programming-guide.md ---
@@ -706,7 +706,7 @@ data across a fixed number of buckets and can be used when a number of unique va
 
 [Parquet](http://parquet.io) is a columnar format that is supported by many other data processing systems.
 Spark SQL provides support for both reading and writing Parquet files that automatically preserves the schema
-of the original data. When writing Parquet files, all columns are automatically converted to be nullable for
+of the original data. When reading Parquet files, all columns are automatically converted to be nullable for
--- End diff --

This file has been reorganized. Could you merge the latest master?


---




[GitHub] spark issue #22759: [MINOR][SQL][DOC] Correct parquet nullability documentat...

2018-11-09 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/22759
  
LGTM

Could you do us a favor and add test cases to ensure that the generated 
Parquet files have the correct nullability?


---




[GitHub] spark issue #22932: [SPARK-25102][SQL] Write Spark version to ORC/Parquet fi...

2018-11-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22932
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98671/
Test PASSed.


---




[GitHub] spark issue #22932: [SPARK-25102][SQL] Write Spark version to ORC/Parquet fi...

2018-11-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22932
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #22932: [SPARK-25102][SQL] Write Spark version to ORC/Parquet fi...

2018-11-09 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22932
  
**[Test build #98671 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98671/testReport)**
 for PR 22932 at commit 
[`04457be`](https://github.com/apache/spark/commit/04457be5bc8e6023a9b9c2e71f9a123869465cbd).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #22994: [BUILD] refactor dev/lint-python in to something readabl...

2018-11-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22994
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #22994: [BUILD] refactor dev/lint-python in to something readabl...

2018-11-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22994
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98673/
Test FAILed.


---




[GitHub] spark issue #22994: [BUILD] refactor dev/lint-python in to something readabl...

2018-11-09 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22994
  
**[Test build #98673 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98673/testReport)**
 for PR 22994 at commit 
[`56329bc`](https://github.com/apache/spark/commit/56329bc9d9d28252032fe6fef8da2ffbb1ed0f9e).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #22992: [SPARK-24229] Update to Apache Thrift 0.10.0

2018-11-09 Thread mingwandroid
Github user mingwandroid commented on the issue:

https://github.com/apache/spark/pull/22992
  
Can you not just update this version so that people who care about CVE scan 
results can still use Apache Spark without worrying?


---




[GitHub] spark issue #22893: [SPARK-25868][MLlib] One part of Spark MLlib Kmean Logic...

2018-11-09 Thread KyleLi1985
Github user KyleLi1985 commented on the issue:

https://github.com/apache/spark/pull/22893
  
It seems the related file spark/python/pyspark/ml/clustering.py has been 
changed in the last few days. My latest local commit is still "bfe60fc on 30 
Jul". Do I need to re-fork Spark and open another pull request, or is there 
another way?


---




[GitHub] spark issue #22855: [SPARK-25839] [Core] Implement use of KryoPool in KryoSe...

2018-11-09 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22855
  
**[Test build #4421 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4421/testReport)**
 for PR 22855 at commit 
[`3bfc4eb`](https://github.com/apache/spark/commit/3bfc4ebbf214b6b0fadbaa10aa832303a59de97d).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #22275: [SPARK-25274][PYTHON][SQL] In toPandas with Arrow send u...

2018-11-09 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/22275
  
LGTM. The current change looks clearer. Thanks @BryanCutler 


---




[GitHub] spark issue #22994: [BUILD] refactor dev/lint-python in to something readabl...

2018-11-09 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22994
  
**[Test build #98673 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98673/testReport)**
 for PR 22994 at commit 
[`56329bc`](https://github.com/apache/spark/commit/56329bc9d9d28252032fe6fef8da2ffbb1ed0f9e).


---




[GitHub] spark issue #22992: [SPARK-24229] Update to Apache Thrift 0.10.0

2018-11-09 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/22992
  
Please provide a test case or reproducible steps for the issue. Otherwise, 
please close this PR.


---




[GitHub] spark issue #22994: [BUILD] refactor dev/lint-python in to something readabl...

2018-11-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22994
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4907/
Test PASSed.


---




[GitHub] spark issue #22994: [BUILD] refactor dev/lint-python in to something readabl...

2018-11-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22994
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #22994: [BUILD] refactor dev/lint-python in to something readabl...

2018-11-09 Thread shaneknapp
Github user shaneknapp commented on the issue:

https://github.com/apache/spark/pull/22994
  
test this please


---




[GitHub] spark issue #22994: [BUILD] refactor dev/lint-python in to something readabl...

2018-11-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22994
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98669/
Test FAILed.


---




[GitHub] spark issue #22994: [BUILD] refactor dev/lint-python in to something readabl...

2018-11-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22994
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #22994: [BUILD] refactor dev/lint-python in to something readabl...

2018-11-09 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22994
  
**[Test build #98669 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98669/testReport)**
 for PR 22994 at commit 
[`56329bc`](https://github.com/apache/spark/commit/56329bc9d9d28252032fe6fef8da2ffbb1ed0f9e).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #22996: [SPARK-25997][ML]add Python example code for Power Itera...

2018-11-09 Thread huaxingao
Github user huaxingao commented on the issue:

https://github.com/apache/spark/pull/22996
  
@holdenk 
Yes, it is. I will include the examples in ml-clustering.md. 



---




[GitHub] spark issue #21732: [SPARK-24762][SQL] Enable Option of Product encoders

2018-11-09 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21732
  
**[Test build #98672 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98672/testReport)**
 for PR 21732 at commit 
[`2d2057b`](https://github.com/apache/spark/commit/2d2057b4f2dbb541b4f2573944318f7a874fac3d).


---




[GitHub] spark issue #21732: [SPARK-24762][SQL] Enable Option of Product encoders

2018-11-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21732
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4906/
Test PASSed.


---




[GitHub] spark issue #21732: [SPARK-24762][SQL] Enable Option of Product encoders

2018-11-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21732
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #22305: [SPARK-24561][SQL][Python] User-defined window aggregati...

2018-11-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22305
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98663/
Test PASSed.


---




[GitHub] spark issue #22305: [SPARK-24561][SQL][Python] User-defined window aggregati...

2018-11-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22305
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22305: [SPARK-24561][SQL][Python] User-defined window aggregati...

2018-11-09 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22305
  
**[Test build #98663 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98663/testReport)**
 for PR 22305 at commit 
[`006b953`](https://github.com/apache/spark/commit/006b9533c6beb90fe93d8bc4ec875a78ec7b50af).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22997: SPARK-25999: make-distribution.sh failure with --r and -...

2018-11-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22997
  
Can one of the admins verify this patch?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22997: SPARK-25999: make-distribution.sh failure with --...

2018-11-09 Thread shanyu
GitHub user shanyu opened a pull request:

https://github.com/apache/spark/pull/22997

SPARK-25999: make-distribution.sh failure with --r and -Phadoop-provided

Signed-off-by: Shanyu Zhao 

## What changes were proposed in this pull request?

(Please fill in changes proposed in this fix)

## How was this patch tested?

(Please explain how this patch was tested. E.g. unit tests, integration 
tests, manual tests)
(If this patch involves UI changes, please attach a screenshot; otherwise, 
remove this)

Please review http://spark.apache.org/contributing.html before opening a 
pull request.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/shanyu/spark shanyu-25999

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/22997.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #22997


commit 090c3bc1c43c2286c825d74a82304ef00a75900c
Author: Shanyu Zhao 
Date:   2018-11-10T01:12:55Z

SPARK-25999: make-distribution.sh failure with --r and -Phadoop-provided

Signed-off-by: Shanyu Zhao 




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22932: [SPARK-25102][SQL] Write Spark version to ORC/Parquet fi...

2018-11-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22932
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4905/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22932: [SPARK-25102][SQL] Write Spark version to ORC/Parquet fi...

2018-11-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22932
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22932: [SPARK-25102][SQL] Write Spark version to ORC/Parquet fi...

2018-11-09 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22932
  
**[Test build #98671 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98671/testReport)**
 for PR 22932 at commit 
[`04457be`](https://github.com/apache/spark/commit/04457be5bc8e6023a9b9c2e71f9a123869465cbd).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22932: [SPARK-25102][SQL] Write Spark version to ORC/Par...

2018-11-09 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/22932#discussion_r232430599
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/orc/OrcFileFormat.scala ---
@@ -274,6 +278,15 @@ private[orc] class OrcOutputWriter(
 
   override def close(): Unit = {
 if (recordWriterInstantiated) {
+  // Hive 1.2.1 ORC initializes its private `writer` field at the 
first write.
+  try {
+val writerField = recordWriter.getClass.getDeclaredField("writer")
+writerField.setAccessible(true)
+val writer = writerField.get(recordWriter).asInstanceOf[Writer]
+writer.addUserMetadata(SPARK_VERSION_METADATA_KEY, 
UTF_8.encode(SPARK_VERSION_SHORT))
+  } catch {
+case NonFatal(e) => log.warn(e.toString, e)
+  }
--- End diff --

For this case, I'll refactor out all the new code (line 281 ~ 289).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22932: [SPARK-25102][SQL] Write Spark version to ORC/Par...

2018-11-09 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/22932#discussion_r232428893
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/orc/OrcFileFormat.scala ---
@@ -274,6 +278,15 @@ private[orc] class OrcOutputWriter(
 
   override def close(): Unit = {
 if (recordWriterInstantiated) {
+  // Hive 1.2.1 ORC initializes its private `writer` field at the 
first write.
+  try {
+val writerField = recordWriter.getClass.getDeclaredField("writer")
+writerField.setAccessible(true)
+val writer = writerField.get(recordWriter).asInstanceOf[Writer]
+writer.addUserMetadata(SPARK_VERSION_METADATA_KEY, 
UTF_8.encode(SPARK_VERSION_SHORT))
+  } catch {
+case NonFatal(e) => log.warn(e.toString, e)
+  }
--- End diff --

BTW, as you expected, we cannot use a single function for this. The 
`Writer` classes are not the same.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22938: [SPARK-25935][SQL] Prevent null rows from JSON parser

2018-11-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22938
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22938: [SPARK-25935][SQL] Prevent null rows from JSON parser

2018-11-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22938
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98660/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22938: [SPARK-25935][SQL] Prevent null rows from JSON parser

2018-11-09 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22938
  
**[Test build #98660 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98660/testReport)**
 for PR 22938 at commit 
[`9132af3`](https://github.com/apache/spark/commit/9132af3a8ee7404e3a14c280567a418a85693c07).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22932: [SPARK-25102][SQL] Write Spark version to ORC/Par...

2018-11-09 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/22932#discussion_r232428173
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcOutputWriter.scala
 ---
@@ -36,11 +41,17 @@ private[orc] class OrcOutputWriter(
   private[this] val serializer = new OrcSerializer(dataSchema)
 
   private val recordWriter = {
-new OrcOutputFormat[OrcStruct]() {
+val orcOutputFormat = new OrcOutputFormat[OrcStruct]() {
   override def getDefaultWorkFile(context: TaskAttemptContext, 
extension: String): Path = {
 new Path(path)
   }
-}.getRecordWriter(context)
+}
+val filename = orcOutputFormat.getDefaultWorkFile(context, ".orc")
+val options = 
OrcMapRedOutputFormat.buildOptions(context.getConfiguration)
+val writer = OrcFile.createWriter(filename, options)
+val recordWriter = new OrcMapreduceRecordWriter[OrcStruct](writer)
+writer.addUserMetadata(SPARK_VERSION_METADATA_KEY, 
UTF_8.encode(SPARK_VERSION_SHORT))
--- End diff --

Thank you for the review, @gatorsmile. Sure. I'll refactor out the following 
line.
```
writer.addUserMetadata(SPARK_VERSION_METADATA_KEY, 
UTF_8.encode(SPARK_VERSION_SHORT))
```


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22987: [SPARK-25979][SQL] Window function: allow parentheses ar...

2018-11-09 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/22987
  
Thanks. Technically, since it's categorized as a `BUG`, I'm +1 to have this 
in `branch-2.4` as a syntax bug fix.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22954: [SPARK-25981][R] Enables Arrow optimization from R DataF...

2018-11-09 Thread BryanCutler
Github user BryanCutler commented on the issue:

https://github.com/apache/spark/pull/22954
  
I don't know R well enough to review that code, but the results look 
awesome! Nice work @HyukjinKwon!!


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22954: [SPARK-25981][R] Enables Arrow optimization from ...

2018-11-09 Thread BryanCutler
Github user BryanCutler commented on a diff in the pull request:

https://github.com/apache/spark/pull/22954#discussion_r232425279
  
--- Diff: R/pkg/R/SQLContext.R ---
@@ -189,19 +238,67 @@ createDataFrame <- function(data, schema = NULL, 
samplingRatio = 1.0,
   x
 }
   }
+  data[] <- lapply(data, cleanCols)
 
-  # drop factors and wrap lists
-  data <- setNames(lapply(data, cleanCols), NULL)
+  args <- list(FUN = list, SIMPLIFY = FALSE, USE.NAMES = FALSE)
+  if (arrowEnabled) {
+shouldUseArrow <- tryCatch({
+  stopifnot(length(data) > 0)
+  dataHead <- head(data, 1)
+  # Currenty Arrow optimization does not support POSIXct and raw 
for now.
+  # Also, it does not support explicit float type set by users. It 
leads to
+  # incorrect conversion. We will fall back to the path without 
Arrow optimization.
+  if (any(sapply(dataHead, function(x) is(x, "POSIXct")))) {
+stop("Arrow optimization with R DataFrame does not support 
POSIXct type yet.")
+  }
+  if (any(sapply(dataHead, is.raw))) {
+stop("Arrow optimization with R DataFrame does not support raw 
type yet.")
+  }
+  if (inherits(schema, "structType")) {
+if (any(sapply(schema$fields(), function(x) 
x$dataType.toString() == "FloatType"))) {
+  stop("Arrow optimization with R DataFrame does not support 
FloatType type yet.")
--- End diff --

Any idea what's going on with the `FloatType`? Is it a problem on the arrow 
side?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22954: [SPARK-25981][R] Enables Arrow optimization from ...

2018-11-09 Thread BryanCutler
Github user BryanCutler commented on a diff in the pull request:

https://github.com/apache/spark/pull/22954#discussion_r232425031
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/api/r/SQLUtils.scala 
---
@@ -225,4 +226,25 @@ private[sql] object SQLUtils extends Logging {
 }
 sparkSession.sessionState.catalog.listTables(db).map(_.table).toArray
   }
+
+  /**
+   * R callable function to read a file in Arrow stream format and create 
a `RDD`
--- End diff --

nit: a `RDD` -> an `RDD`


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22932: [SPARK-25102][SQL] Write Spark version to ORC/Par...

2018-11-09 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/22932#discussion_r232424657
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/orc/OrcFileFormat.scala ---
@@ -274,6 +278,15 @@ private[orc] class OrcOutputWriter(
 
   override def close(): Unit = {
 if (recordWriterInstantiated) {
+  // Hive 1.2.1 ORC initializes its private `writer` field at the 
first write.
+  try {
+val writerField = recordWriter.getClass.getDeclaredField("writer")
+writerField.setAccessible(true)
+val writer = writerField.get(recordWriter).asInstanceOf[Writer]
+writer.addUserMetadata(SPARK_VERSION_METADATA_KEY, 
UTF_8.encode(SPARK_VERSION_SHORT))
+  } catch {
+case NonFatal(e) => log.warn(e.toString, e)
+  }
--- End diff --

The same comment here.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22932: [SPARK-25102][SQL] Write Spark version to ORC/Par...

2018-11-09 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/22932#discussion_r232424626
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcOutputWriter.scala
 ---
@@ -36,11 +41,17 @@ private[orc] class OrcOutputWriter(
   private[this] val serializer = new OrcSerializer(dataSchema)
 
   private val recordWriter = {
-new OrcOutputFormat[OrcStruct]() {
+val orcOutputFormat = new OrcOutputFormat[OrcStruct]() {
   override def getDefaultWorkFile(context: TaskAttemptContext, 
extension: String): Path = {
 new Path(path)
   }
-}.getRecordWriter(context)
+}
+val filename = orcOutputFormat.getDefaultWorkFile(context, ".orc")
+val options = 
OrcMapRedOutputFormat.buildOptions(context.getConfiguration)
+val writer = OrcFile.createWriter(filename, options)
+val recordWriter = new OrcMapreduceRecordWriter[OrcStruct](writer)
+writer.addUserMetadata(SPARK_VERSION_METADATA_KEY, 
UTF_8.encode(SPARK_VERSION_SHORT))
--- End diff --

Could we create a separate function for adding this metadata?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22913: [SPARK-25902][SQL] Add support for dates with millisecon...

2018-11-09 Thread BryanCutler
Github user BryanCutler commented on the issue:

https://github.com/apache/spark/pull/22913
  
Sounds good, thanks @javierluraschi !


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22996: [SPARK-25997][ML]add Python example code for Power Itera...

2018-11-09 Thread holdenk
Github user holdenk commented on the issue:

https://github.com/apache/spark/pull/22996
  
Thanks for working on this! I noticed you have the example on / off tags. 
Normally those mean the example is included in the documentation somewhere 
that uses those tags. Is that the plan for this PR?
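
For context, the on / off tags are comment markers (`# $example on$` and 
`# $example off$` in the Python examples) that delimit the part of an example 
file meant to appear in the docs. A minimal sketch of how such markers could be 
used to pull out a snippet follows. `extract_tagged` and the sample source are 
hypothetical and only illustrate the idea; this is not the actual docs tooling.

```python
ON_TAG = "# $example on$"
OFF_TAG = "# $example off$"


def extract_tagged(source):
    """Return the lines that sit between on/off marker comments."""
    kept = []
    inside = False
    for line in source.splitlines():
        stripped = line.strip()
        if stripped == ON_TAG:
            inside = True       # start copying after this marker
        elif stripped == OFF_TAG:
            inside = False      # stop copying until the next on-marker
        elif inside:
            kept.append(line)
    return kept


# Illustrative file contents only; never executed, just scanned as text.
SAMPLE = """\
from __future__ import print_function
# $example on$
from pyspark.ml.clustering import PowerIterationClustering
# $example off$
spark = None  # session setup would go here, outside the snippet
# $example on$
pic = PowerIterationClustering(k=2, maxIter=20)
# $example off$
"""

if __name__ == "__main__":
    for line in extract_tagged(SAMPLE):
        print(line)
```

Only the lines between the markers survive, so setup code outside the tags 
stays out of the rendered snippet.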


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22275: [SPARK-25274][PYTHON][SQL] In toPandas with Arrow...

2018-11-09 Thread holdenk
Github user holdenk commented on a diff in the pull request:

https://github.com/apache/spark/pull/22275#discussion_r232420076
  
--- Diff: python/pyspark/sql/tests.py ---
@@ -4923,6 +4923,28 @@ def test_timestamp_dst(self):
 self.assertPandasEqual(pdf, df_from_python.toPandas())
 self.assertPandasEqual(pdf, df_from_pandas.toPandas())
 
+def test_toPandas_batch_order(self):
+
+# Collects Arrow RecordBatches out of order in driver JVM then 
re-orders in Python
+def run_test(num_records, num_parts, max_records):
+df = self.spark.range(num_records, 
numPartitions=num_parts).toDF("a")
+with 
self.sql_conf({"spark.sql.execution.arrow.maxRecordsPerBatch": max_records}):
+pdf, pdf_arrow = self._toPandas_arrow_toggle(df)
+self.assertPandasEqual(pdf, pdf_arrow)
+
+cases = [
+(1024, 512, 2),  # Try large num partitions for good chance of 
not collecting in order
+(512, 64, 2),# Try medium num partitions to test out of 
order collection
+(64, 8, 2),  # Try small number of partitions to test out 
of order collection
+(64, 64, 1), # Test single batch per partition
+(64, 1, 64), # Test single partition, single batch
+(64, 1, 8),  # Test single partition, multiple batches
+(30, 7, 2),  # Test different sized partitions
+]
--- End diff --

I like the new tests. I think a 0.1 second delay on one of the partitions is enough.
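
The ordering property these cases exercise can be sketched without Spark: 
batches may arrive in any order, and the original row order is restored from 
each batch's position. In the sketch below, `make_batches`, `restore_order`, 
and the `(partition_index, batch_index, rows)` tuples are hypothetical 
stand-ins, and the partition sizing is a simplification. It is not the PR's 
implementation.

```python
import random


def make_batches(num_records, num_parts, max_records):
    """Split range(num_records) into partitions, then into batches.

    Partition sizes differ by at most one row (a simplification).
    """
    base, extra = divmod(num_records, num_parts)
    batches = []
    start = 0
    for part in range(num_parts):
        size = base + (1 if part < extra else 0)
        rows = list(range(start, start + size))
        start += size
        # Cut the partition into batches of at most max_records rows.
        for batch, offset in enumerate(range(0, size, max_records)):
            batches.append((part, batch, rows[offset:offset + max_records]))
    return batches


def restore_order(arrived):
    """Sort batches by (partition, batch) and flatten them into rows."""
    ordered = sorted(arrived, key=lambda item: (item[0], item[1]))
    return [row for _, _, rows in ordered for row in rows]


if __name__ == "__main__":
    rng = random.Random(0)
    batches = make_batches(30, 7, 2)
    rng.shuffle(batches)  # simulate out-of-order arrival
    print(restore_order(batches) == list(range(30)))
```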


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22275: [SPARK-25274][PYTHON][SQL] In toPandas with Arrow...

2018-11-09 Thread holdenk
Github user holdenk commented on a diff in the pull request:

https://github.com/apache/spark/pull/22275#discussion_r232420015
  
--- Diff: python/pyspark/sql/tests.py ---
@@ -4923,6 +4923,34 @@ def test_timestamp_dst(self):
 self.assertPandasEqual(pdf, df_from_python.toPandas())
 self.assertPandasEqual(pdf, df_from_pandas.toPandas())
 
+def test_toPandas_batch_order(self):
+
+def delay_first_part(partition_index, iterator):
+if partition_index == 0:
+time.sleep(0.1)
--- End diff --

I like this :)


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18610: [SPARK-21386] ML LinearRegression supports warm start fr...

2018-11-09 Thread holdenk
Github user holdenk commented on the issue:

https://github.com/apache/spark/pull/18610
  
@JohnHBrock this PR is pretty old, so the biggest challenge is going to be 
updating it to the current master branch. There's some discussion around the 
types needing to be changed as well. If this is something you want to work on, 
I'd love to do what I can to help with the review process.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22996: [SPARK-25997][ML]add Python example code for Power Itera...

2018-11-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22996
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98670/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22996: [SPARK-25997][ML]add Python example code for Power Itera...

2018-11-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22996
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22996: [SPARK-25997][ML]add Python example code for Power Itera...

2018-11-09 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22996
  
**[Test build #98670 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98670/testReport)**
 for PR 22996 at commit 
[`905b542`](https://github.com/apache/spark/commit/905b542a8618269bdc079f3c335a80c13d2214fa).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22996: add Python example code for Power Iteration Clustering i...

2018-11-09 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22996
  
**[Test build #98670 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98670/testReport)**
 for PR 22996 at commit 
[`905b542`](https://github.com/apache/spark/commit/905b542a8618269bdc079f3c335a80c13d2214fa).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22995: [SPARK-25998] [CORE] Change TorrentBroadcast to hold wea...

2018-11-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22995
  
Can one of the admins verify this patch?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22996: add Python example code for Power Iteration Clustering i...

2018-11-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22996
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22996: add Python example code for Power Iteration Clustering i...

2018-11-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22996
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4904/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22996: add Python example code for Power Iteration Clust...

2018-11-09 Thread huaxingao
GitHub user huaxingao opened a pull request:

https://github.com/apache/spark/pull/22996

add Python example code for Power Iteration Clustering in spark.ml

## What changes were proposed in this pull request?

Add a Python example for Power Iteration Clustering in spark.ml

## How was this patch tested?

Manually tested

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/huaxingao/spark spark-25997

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/22996.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #22996


commit 905b542a8618269bdc079f3c335a80c13d2214fa
Author: Huaxin Gao 
Date:   2018-11-09T22:32:17Z

add Python example code for Power Iteration Clustering in spark.ml




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22995: [SPARK-25998] [CORE] Change TorrentBroadcast to h...

2018-11-09 Thread bkrieger
GitHub user bkrieger opened a pull request:

https://github.com/apache/spark/pull/22995

[SPARK-25998] [CORE] Change TorrentBroadcast to hold weak reference of 
broadcast object

## What changes were proposed in this pull request?

This PR changes the broadcast object in TorrentBroadcast from a strong 
reference to a weak reference. This allows it to be garbage collected even if 
the Dataset is held in memory. This is ok, because the broadcast object can 
always be re-read.
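
The idea can be illustrated outside Spark. The following is a minimal Python 
sketch of a holder that keeps only a weak reference to a value and reloads it 
on demand. `WeakCachedValue`, `Payload`, and the loader are hypothetical names; 
the PR itself changes Scala code in TorrentBroadcast, which is not reproduced 
here.

```python
import weakref


class Payload(object):
    """Stand-in for a large value; instances of plain classes are weak-referenceable."""

    def __init__(self, data):
        self.data = data


class WeakCachedValue(object):
    """Holds a value weakly and re-reads it through `loader` when it is gone."""

    def __init__(self, loader):
        self._loader = loader
        self._ref = None   # weak reference to the last loaded value, if any
        self.loads = 0     # how many times the loader has actually run

    def get(self):
        value = self._ref() if self._ref is not None else None
        if value is None:
            # Either never loaded, or the garbage collector reclaimed it.
            value = self._loader()
            self.loads += 1
            self._ref = weakref.ref(value)
        return value


if __name__ == "__main__":
    holder = WeakCachedValue(lambda: Payload([1, 2, 3]))
    value = holder.get()
    print(holder.loads)
    del value              # drop the only strong reference
    print(holder.get().data, holder.loads)
```

While a caller holds the returned value, repeated `get()` calls reuse it. Once 
all strong references are dropped, the value becomes collectable and the next 
`get()` pays the cost of loading it again, which is the trade-off the 
description refers to.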

## How was this patch tested?

Tested in Spark shell by taking a heap dump, full repro steps listed in 
https://issues.apache.org/jira/browse/SPARK-25998.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/bkrieger/spark bk/torrent-broadcast-weak

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/22995.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #22995


commit a2683b62985fc9c7d15fb92f3bb170a4b5225058
Author: Brandon Krieger 
Date:   2018-11-08T23:04:06Z

use weak reference for torrent broadcast

commit 99fbeecf43a289648a56d178fa55e188ce75bdb7
Author: Brandon Krieger 
Date:   2018-11-09T21:04:51Z

fix compile

commit 5e0a179c168a70b0166abe4bb51a1d26a2f1d666
Author: Brandon Krieger 
Date:   2018-11-09T21:33:22Z

fix

commit 1908b5b8dfa6c0b55db3bd9a90e21ca713e5bf25
Author: Brandon Krieger 
Date:   2018-11-09T21:48:44Z

no npe

commit 24183e5b8b63e0b4e117856ab4de7eb1b0ea6c9a
Author: Brandon Krieger 
Date:   2018-11-09T21:52:21Z

no option

commit f212da322242386ce3b71e9961a964e60b587287
Author: Brandon Krieger 
Date:   2018-11-09T22:08:23Z

typo




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22994: [BUILD] refactor dev/lint-python in to something readabl...

2018-11-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22994
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22994: [BUILD] refactor dev/lint-python in to something readabl...

2018-11-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22994
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22994: [BUILD] refactor dev/lint-python in to something readabl...

2018-11-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22994
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98668/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22994: [BUILD] refactor dev/lint-python in to something readabl...

2018-11-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22994
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98667/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22994: [BUILD] refactor dev/lint-python in to something readabl...

2018-11-09 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22994
  
**[Test build #98669 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98669/testReport)**
 for PR 22994 at commit 
[`56329bc`](https://github.com/apache/spark/commit/56329bc9d9d28252032fe6fef8da2ffbb1ed0f9e).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22994: [BUILD] refactor dev/lint-python in to something readabl...

2018-11-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22994
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22994: [BUILD] refactor dev/lint-python in to something readabl...

2018-11-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22994
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4903/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22994: [BUILD] refactor dev/lint-python in to something readabl...

2018-11-09 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22994
  
**[Test build #98668 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98668/testReport)**
 for PR 22994 at commit 
[`c05683b`](https://github.com/apache/spark/commit/c05683bab177b7b203fe0ca440a19810fc2df418).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22994: [BUILD] refactor dev/lint-python in to something readabl...

2018-11-09 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22994
  
**[Test build #98667 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98667/testReport)**
 for PR 22994 at commit 
[`6bddfec`](https://github.com/apache/spark/commit/6bddfec5cb76584c172552d8a3822e29e12c5654).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22994: [BUILD] refactor dev/lint-python in to something readabl...

2018-11-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22994
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4902/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org


