[GitHub] spark issue #19470: [SPARK-14387][SPARK-16628][SPARK-18355][SQL] Use Spark s...

2017-10-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19470
  
**[Test build #82720 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82720/testReport)**
 for PR 19470 at commit 
[`8e7fe9b`](https://github.com/apache/spark/commit/8e7fe9b9e3fc6121686caad45dcfdb4ff08f0c4a).
 * This patch **fails due to an unknown error code, -9**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19470: [SPARK-14387][SPARK-16628][SPARK-18355][SQL] Use Spark s...

2017-10-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19470
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #19488: [SPARK-22266][SQL] The same aggregate function was evalu...

2017-10-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19488
  
**[Test build #82722 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82722/testReport)**
 for PR 19488 at commit 
[`32bdf77`](https://github.com/apache/spark/commit/32bdf771fe70444ac23adf796702b5a26e085805).
 * This patch **fails due to an unknown error code, -9**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #19488: [SPARK-22266][SQL] The same aggregate function was evalu...

2017-10-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19488
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82722/
Test FAILed.


---




[GitHub] spark issue #19269: [SPARK-22026][SQL][WIP] data source v2 write path

2017-10-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19269
  
**[Test build #82719 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82719/testReport)**
 for PR 19269 at commit 
[`ac3de3c`](https://github.com/apache/spark/commit/ac3de3c2c5c50df8e5f2c618a78a4fa4c6756797).
 * This patch **fails due to an unknown error code, -9**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `case class WriteToDataSourceV2Command(writer: DataSourceV2Writer, 
query: LogicalPlan)`
  * `class RowToInternalRowDataWriterFactory(`
  * `class RowToInternalRowDataWriter(rowWriter: DataWriter[Row], encoder: 
ExpressionEncoder[Row])`


---




[GitHub] spark issue #19488: [SPARK-22266][SQL] The same aggregate function was evalu...

2017-10-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19488
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #19470: [SPARK-14387][SPARK-16628][SPARK-18355][SQL] Use Spark s...

2017-10-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19470
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82720/
Test FAILed.


---




[GitHub] spark issue #19489: The declared package "org.apache.hive.service.cli.thrift...

2017-10-13 Thread srowen
Github user srowen commented on the issue:

https://github.com/apache/spark/pull/19489
  
@dahaian close this please


---




[GitHub] spark issue #19269: [SPARK-22026][SQL][WIP] data source v2 write path

2017-10-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19269
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #19269: [SPARK-22026][SQL][WIP] data source v2 write path

2017-10-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19269
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82719/
Test FAILed.


---




[GitHub] spark issue #19470: [SPARK-14387][SPARK-16628][SPARK-18355][SQL] Use Spark s...

2017-10-13 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/19470
  
Retest this please.


---




[GitHub] spark issue #19470: [SPARK-14387][SPARK-16628][SPARK-18355][SQL] Use Spark s...

2017-10-13 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/19470
  
The R failure seems to be unrelated.
```
[error] running 
/home/jenkins/workspace/SparkPullRequestBuilder/R/run-tests.sh ; process was 
terminated by signal 9
Attempting to post to Github...
```


---




[GitHub] spark issue #19470: [SPARK-14387][SPARK-16628][SPARK-18355][SQL] Use Spark s...

2017-10-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19470
  
**[Test build #82724 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82724/testReport)**
 for PR 19470 at commit 
[`8e7fe9b`](https://github.com/apache/spark/commit/8e7fe9b9e3fc6121686caad45dcfdb4ff08f0c4a).


---




[GitHub] spark issue #18732: [SPARK-20396][SQL][PySpark] groupby().apply() with panda...

2017-10-13 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/18732
  
A late question: shall we create another API for it instead of reusing 
`pandas_udf`? cc @ueshin


---




[GitHub] spark pull request #19480: [SPARK-22226][SQL] splitExpression can create too...

2017-10-13 Thread mgaido91
Github user mgaido91 commented on a diff in the pull request:

https://github.com/apache/spark/pull/19480#discussion_r144480289
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CodeGenerationSuite.scala
 ---
@@ -201,6 +201,23 @@ class CodeGenerationSuite extends SparkFunSuite with 
ExpressionEvalHelper {
 }
   }
 
+  test("SPARK-6: group splitted expressions into one method per nested 
class") {
--- End diff --

Thank you very much for your help @viirya! In my use cases the issue seemed to be 
connected to the `dropDuplicates` method, so I focused on it, but thanks to your 
suggestion I now realize that `dropDuplicates` by itself is not enough: it also needs 
some functions applied to the columns to trigger the issue. Thank you so much. Where 
should I add this test case?


---




[GitHub] spark issue #19489: The declared package "org.apache.hive.service.cli.thrift...

2017-10-13 Thread dahaian
Github user dahaian commented on the issue:

https://github.com/apache/spark/pull/19489
  
@srowen Sorry, I was just asking a question. If someone can help me, I will close 
this patch.


---




[GitHub] spark pull request #19480: [SPARK-22226][SQL] splitExpression can create too...

2017-10-13 Thread mgaido91
Github user mgaido91 commented on a diff in the pull request:

https://github.com/apache/spark/pull/19480#discussion_r144480542
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala
 ---
@@ -798,10 +831,46 @@ class CodegenContext {
|  ${makeSplitFunction(body)}
|}
  """.stripMargin
-addNewFunction(name, code)
+addNewFunctionInternal(name, code, inlineToOuterClass = false)
   }
 
-  foldFunctions(functions.map(name => 
s"$name(${arguments.map(_._2).mkString(", ")})"))
+  // Here we store all the methods which have been added to the outer 
class.
+  val outerClassFunctions = functions
+.filter(_.subclassName.isEmpty)
+.map(_.functionName)
+
+  // Here we handle all the methods which have been added to the 
nested subclasses and
+  // not to the outer class.
+  // Since they can be many, their direct invocation in the outer 
class adds many entries
+  // to the outer class' constant pool. This can cause the constant pool to grow 
past the JVM limit.
+  // To avoid this problem, we group them and we call only the 
grouping methods in the
+  // outer class.
+  val innerClassFunctions = functions
+.filter(_.subclassName.isDefined)
+.foldLeft(ListMap.empty[(String, String), Seq[String]]) { case 
(acc, f) =>
+  val key = (f.subclassName.get, f.subclassInstance.get)
+  acc.updated(key, acc.getOrElse(key, Seq.empty[String]) ++ 
Seq(f.functionName))
+}
+.flatMap { case ((subclassName, subclassInstance), 
subclassFunctions) =>
+  if (subclassFunctions.size > 
CodeGenerator.MERGE_SPLIT_METHODS_THRESHOLD) {
+// Adding a new function to each subclass which contains
+// the invocation of all the ones which have been added to
+// that subclass
+val code = s"""
+|private $returnType $func($argString) {
+|  
${makeSplitFunction(foldFunctions(subclassFunctions.map(name =>
+  s"$name(${arguments.map(_._2).mkString(", ")})")))}
--- End diff --

we are inside the string interpolator here, so `|` is not missing
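The grouping strategy in this diff can be sketched with plain collections. This is a hypothetical toy model, not Spark's code: `Fn` stands in for a generated function, and `applyGroup` is an invented name for the per-subclass wrapper method; the point is that once a nested class holds more than `MERGE_SPLIT_METHODS_THRESHOLD` functions, only one call per subclass reaches the outer class, keeping its constant pool small.

```scala
// Toy model of the per-subclass grouping: bucket generated functions by the
// nested class they live in; large buckets collapse to a single wrapper call.
object GroupSplitSketch {
  val MERGE_SPLIT_METHODS_THRESHOLD = 3

  // Hypothetical stand-in for a generated function and its owning subclass.
  case class Fn(name: String, subclass: Option[String])

  def outerCalls(fns: Seq[Fn]): Seq[String] = {
    val (inner, outer) = fns.partition(_.subclass.isDefined)
    val grouped = inner.groupBy(_.subclass.get).toSeq.flatMap { case (sub, fs) =>
      if (fs.size > MERGE_SPLIT_METHODS_THRESHOLD) Seq(s"$sub.applyGroup()") // one wrapper call
      else fs.map(f => s"$sub.${f.name}()")                                  // few: call directly
    }
    outer.map(f => s"${f.name}()") ++ grouped
  }

  def main(args: Array[String]): Unit = {
    val fns = (1 to 5).map(i => Fn(s"f$i", Some("NestedClass1"))) :+ Fn("g", None)
    // Five functions in NestedClass1 collapse to one wrapper invocation.
    assert(outerCalls(fns) == Seq("g()", "NestedClass1.applyGroup()"))
  }
}
```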


---




[GitHub] spark issue #18732: [SPARK-20396][SQL][PySpark] groupby().apply() with panda...

2017-10-13 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/18732
  
I think @viirya raised this question too - 
https://github.com/apache/spark/pull/18732#issuecomment-333065737 and I think I 
also left a few worries about this here and there. To me, +0.


---




[GitHub] spark issue #19489: The declared package "org.apache.hive.service.cli.thrift...

2017-10-13 Thread srowen
Github user srowen commented on the issue:

https://github.com/apache/spark/pull/19489
  
This isn't where you ask questions, and the branch itself is messed up. 
Please close it.
Questions go to the mailing list.


---




[GitHub] spark pull request #19480: [SPARK-22226][SQL] splitExpression can create too...

2017-10-13 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/19480#discussion_r144480771
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CodeGenerationSuite.scala
 ---
@@ -201,6 +201,23 @@ class CodeGenerationSuite extends SparkFunSuite with 
ExpressionEvalHelper {
 }
   }
 
+  test("SPARK-6: group splitted expressions into one method per nested 
class") {
--- End diff --

I was adding it to `DataFrameAggregateSuite`.


---




[GitHub] spark pull request #19480: [SPARK-22226][SQL] splitExpression can create too...

2017-10-13 Thread mgaido91
Github user mgaido91 commented on a diff in the pull request:

https://github.com/apache/spark/pull/19480#discussion_r144481305
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala
 ---
@@ -1010,6 +1079,8 @@ object CodeGenerator extends Logging {
   // This is the value of HugeMethodLimit in the OpenJDK JVM settings
   val DEFAULT_JVM_HUGE_METHOD_LIMIT = 8000
 
+  val MERGE_SPLIT_METHODS_THRESHOLD = 3
--- End diff --

@gatorsmile it comes from [this 
discussion](https://github.com/apache/spark/pull/19480/files/bdc2fdb18688b86a1b30d6ea8a692a3e95858049#r144282402).


---




[GitHub] spark issue #19490: [Trivial][DOC] update code style for InteractionExample

2017-10-13 Thread srowen
Github user srowen commented on the issue:

https://github.com/apache/spark/pull/19490
  
I don't think this is a style fix? It's too trivial to bother with.


---




[GitHub] spark issue #19489: The declared package "org.apache.hive.service.cli.thrift...

2017-10-13 Thread dahaian
Github user dahaian commented on the issue:

https://github.com/apache/spark/pull/19489
  
@srowen Sorry, I do not understand what you meant by "the mailing list". Please tell 
me how to ask a question, and I will close this patch.


---




[GitHub] spark issue #19488: [SPARK-22266][SQL] The same aggregate function was evalu...

2017-10-13 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/19488
  
retest this please


---




[GitHub] spark issue #19489: The declared package "org.apache.hive.service.cli.thrift...

2017-10-13 Thread srowen
Github user srowen commented on the issue:

https://github.com/apache/spark/pull/19489
  
This should be closed no matter what. Please start at the web site. 
http://spark.apache.org/community.html


---




[GitHub] spark issue #19488: [SPARK-22266][SQL] The same aggregate function was evalu...

2017-10-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19488
  
**[Test build #82725 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82725/testReport)**
 for PR 19488 at commit 
[`32bdf77`](https://github.com/apache/spark/commit/32bdf771fe70444ac23adf796702b5a26e085805).


---




[GitHub] spark issue #19464: [SPARK-22233] [core] Allow user to filter out empty spli...

2017-10-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19464
  
**[Test build #82726 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82726/testReport)**
 for PR 19464 at commit 
[`534d8fb`](https://github.com/apache/spark/commit/534d8fbcd7dfbdc9af06a4d926f6a353f429fce8).


---




[GitHub] spark pull request #19489: The declared package "org.apache.hive.service.cli...

2017-10-13 Thread dahaian
Github user dahaian closed the pull request at:

https://github.com/apache/spark/pull/19489


---




[GitHub] spark pull request #19419: [SPARK-22188] [CORE] Adding security headers for ...

2017-10-13 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/19419#discussion_r144483070
  
--- Diff: core/src/main/scala/org/apache/spark/ui/JettyUtils.scala ---
@@ -89,6 +90,14 @@ private[spark] object JettyUtils extends Logging {
 val result = servletParams.responder(request)
 response.setHeader("Cache-Control", "no-cache, no-store, 
must-revalidate")
 response.setHeader("X-Frame-Options", xFrameOptionsValue)
+
conf.get(UI_X_XSS_PROTECTION).foreach(response.setHeader("X-XSS-Protection", _))
+if (conf.get(UI_X_CONTENT_TYPE_OPTIONS.key).equals("true")) {
--- End diff --

At the moment this would fail if unset
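The failure mode srowen points out can be shown with a small, self-contained sketch; the `Map` here is a hypothetical stand-in for `SparkConf`, and only the key name is taken from the diff above. Comparing a possibly-unset value to `"true"` can blow up, while `Option`-based access defaults safely.

```scala
// Sketch of guarding against an unset config key: Option#contains never throws,
// whereas dereferencing a missing value and calling .equals on it would.
object HeaderGuardSketch {
  def shouldSetNosniff(conf: Map[String, String]): Boolean =
    conf.get("spark.ui.xContentTypeOptions.enabled").contains("true")

  def main(args: Array[String]): Unit = {
    assert(!shouldSetNosniff(Map.empty))          // key unset: no failure, header off
    assert(shouldSetNosniff(Map("spark.ui.xContentTypeOptions.enabled" -> "true")))
  }
}
```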


---




[GitHub] spark pull request #19419: [SPARK-22188] [CORE] Adding security headers for ...

2017-10-13 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/19419#discussion_r144482977
  
--- Diff: 
core/src/main/scala/org/apache/spark/internal/config/package.scala ---
@@ -420,6 +420,25 @@ package object config {
   .toSequence
   .createWithDefault(Nil)
 
+
+  private[spark] val UI_X_XSS_PROTECTION =
+ConfigBuilder("spark.ui.xXssProtection")
+  .doc("Value for HTTP X-XSS-Protection response header")
+  .stringConf
+  .createOptional
+
+  private[spark] val UI_X_CONTENT_TYPE_OPTIONS =
+ConfigBuilder("spark.ui.xContentTypeOptions.enabled")
+  .doc("Set to 'true' for setting X-Content-Type-Options HTTP response 
header to 'nosniff'")
+  .stringConf
--- End diff --

This should be a boolean option then?
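A minimal sketch of the suggestion, assuming the flag is modeled as a `Boolean` with an explicit default rather than a raw string. The `Map` stands in for `SparkConf`; this is not Spark's `ConfigBuilder` API.

```scala
// Reading an on/off setting as a typed Boolean with a default: no string
// comparison, and a missing key falls back to the default instead of failing.
object BooleanConfSketch {
  def getBoolean(conf: Map[String, String], key: String, default: Boolean): Boolean =
    conf.get(key).map(_.toBoolean).getOrElse(default)

  def main(args: Array[String]): Unit = {
    assert(!getBoolean(Map.empty, "spark.ui.xContentTypeOptions.enabled", default = false))
    assert(getBoolean(Map("spark.ui.xContentTypeOptions.enabled" -> "true"),
      "spark.ui.xContentTypeOptions.enabled", default = false))
  }
}
```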


---




[GitHub] spark pull request #19419: [SPARK-22188] [CORE] Adding security headers for ...

2017-10-13 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/19419#discussion_r144483296
  
--- Diff: docs/configuration.md ---
@@ -2013,7 +2013,62 @@ Apart from these, the following properties are also 
available, and may be useful
 
 
 
+### HTTP Security Headers
 
+Apache Spark can be configured to include HTTP headers which aid in preventing Cross-Site 
+Scripting (XSS), Cross-Frame Scripting (XFS), and MIME sniffing, and also enforce HTTP 
+Strict Transport Security.
+
+
+Property NameDefaultMeaning
+
+spark.ui.xXssProtection
+None
+
+Value for HTTP X-XSS-Protection response header. You can 
choose appropriate value 
+from below:
+
+  0 (Disables XSS filtering)
--- End diff --

Markdown won't work here? I think it won't. Close the `` tags though. 
It's not clear below what values the user sets, though. The values are `0`, 
`1`, or `1; mode=block`, and that should be clear.
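The three values srowen lists can be captured in a small validating helper. This is an illustrative sketch only; `xssProtectionHeader` is an invented name, not a Spark API.

```scala
// Build the X-XSS-Protection header pair, rejecting anything outside the
// three documented values: "0", "1", and "1; mode=block".
object XssHeaderSketch {
  val validValues = Set("0", "1", "1; mode=block")

  def xssProtectionHeader(value: String): (String, String) = {
    require(validValues.contains(value), s"unexpected X-XSS-Protection value: $value")
    "X-XSS-Protection" -> value
  }

  def main(args: Array[String]): Unit = {
    val (name, value) = xssProtectionHeader("1; mode=block")
    assert(name == "X-XSS-Protection" && value == "1; mode=block")
  }
}
```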


---




[GitHub] spark pull request #19419: [SPARK-22188] [CORE] Adding security headers for ...

2017-10-13 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/19419#discussion_r144483342
  
--- Diff: docs/configuration.md ---
@@ -2013,7 +2013,62 @@ Apart from these, the following properties are also 
available, and may be useful
 
 
 
+### HTTP Security Headers
 
+Apache Spark can be configured to include HTTP headers which aid in preventing Cross-Site 
+Scripting (XSS), Cross-Frame Scripting (XFS), and MIME sniffing, and also enforce HTTP 
+Strict Transport Security.
+
+
+Property NameDefaultMeaning
+
+spark.ui.xXssProtection
+None
+
+Value for HTTP X-XSS-Protection response header. You can 
choose appropriate value 
+from below:
+
+  0 (Disables XSS filtering)
+  1 (Enables XSS filtering. If a cross-site scripting 
attack is detected, 
+the browser will sanitize the page.)
+  1; mode=block (Enables XSS filtering. The browser 
will prevent rendering 
+of the page if an attack is detected.)
+ 
+
+
+
+spark.ui.allowFramingFrom
+SAMEORIGIN
+
+Value for X-Frame-Options HTTP response header
+You can provide the "website uri" which can only be 
displayed in a frame on 
+the specified origin. 
+
--- End diff --

Remove this


---




[GitHub] spark issue #19489: The declared package "org.apache.hive.service.cli.thrift...

2017-10-13 Thread dahaian
Github user dahaian commented on the issue:

https://github.com/apache/spark/pull/19489
  
@srowen I closed it.


---




[GitHub] spark pull request #19488: [SPARK-22266][SQL] The same aggregate function wa...

2017-10-13 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/19488#discussion_r144483555
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/planning/patterns.scala
 ---
@@ -205,14 +205,15 @@ object PhysicalAggregation {
 case logical.Aggregate(groupingExpressions, resultExpressions, child) 
=>
   // A single aggregate expression might appear multiple times in 
resultExpressions.
   // In order to avoid evaluating an individual aggregate function 
multiple times, we'll
-  // build a set of the distinct aggregate expressions and build a 
function which can
+  // build a map of the distinct aggregate expressions and build a 
function which can
   // be used to re-write expressions so that they reference the single 
copy of the
   // aggregate function which actually gets computed.
-  val aggregateExpressions = resultExpressions.flatMap { expr =>
+  val aggregateExpressionMap = resultExpressions.flatMap { expr =>
 expr.collect {
-  case agg: AggregateExpression => agg
+  case agg: AggregateExpression => (agg.canonicalized, 
agg.deterministic) -> agg
--- End diff --

I think non-deterministic functions should not be deduplicated, e.g. 
`select max(a + rand()), max(a + rand()) from ...` should still evaluate 2 
aggregate functions.

my suggestion:
```
val aggregateExpressions = resultExpressions.flatMap { expr =>
  expr.collect {
case agg: AggregateExpression => agg
  }
}
val aggregateExpressionMap = 
aggregateExpressions.filter(_.deterministic).map { agg =>
  agg.canonicalized -> agg
}.toMap
```
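The suggestion above can be modeled with plain collections. This is a hypothetical toy, not Spark's code: `Agg` stands in for `AggregateExpression`, and a string stands in for its canonicalized form.

```scala
// Toy model: deterministic aggregates collapse to one representative per
// canonical form; non-deterministic ones are all kept and evaluated.
case class Agg(canonical: String, deterministic: Boolean)

object DedupSketch {
  def dedup(aggs: Seq[Agg]): Seq[Agg] = {
    val (det, nonDet) = aggs.partition(_.deterministic)
    det.groupBy(_.canonical).values.map(_.head).toSeq ++ nonDet
  }

  def main(args: Array[String]): Unit = {
    val aggs = Seq(
      Agg("max(a)", deterministic = true),           // duplicate: collapses
      Agg("max(a)", deterministic = true),
      Agg("max(a + rand())", deterministic = false), // both kept
      Agg("max(a + rand())", deterministic = false))
    val result = dedup(aggs)
    assert(result.count(_.canonical == "max(a)") == 1)
    assert(result.count(!_.deterministic) == 2)
  }
}
```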


---




[GitHub] spark issue #19476: [SPARK-22062][CORE] Spill large block to disk in BlockMa...

2017-10-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19476
  
**[Test build #82727 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82727/testReport)**
 for PR 19476 at commit 
[`2d00fd2`](https://github.com/apache/spark/commit/2d00fd25043c40ab271f4da1dc5b4dd7a407fc6f).


---




[GitHub] spark pull request #19419: [SPARK-22188] [CORE] Adding security headers for ...

2017-10-13 Thread jerryshao
Github user jerryshao commented on a diff in the pull request:

https://github.com/apache/spark/pull/19419#discussion_r144488504
  
--- Diff: docs/configuration.md ---
@@ -2013,7 +2013,62 @@ Apart from these, the following properties are also 
available, and may be useful
 
 
 
+### HTTP Security Headers
--- End diff --

I think this could move to `security.md` as security-related advanced 
configuration.


---




[GitHub] spark issue #19363: [SPARK-22224][SQL]Override toString of KeyValue/Relation...

2017-10-13 Thread yaooqinn
Github user yaooqinn commented on the issue:

https://github.com/apache/spark/pull/19363
  
cc @viirya @cloud-fan @gatorsmile 


---




[GitHub] spark pull request #19419: [SPARK-22188] [CORE] Adding security headers for ...

2017-10-13 Thread krishna-pandey
Github user krishna-pandey commented on a diff in the pull request:

https://github.com/apache/spark/pull/19419#discussion_r144495102
  
--- Diff: docs/configuration.md ---
@@ -2013,7 +2013,62 @@ Apart from these, the following properties are also 
available, and may be useful
 
 
 
+### HTTP Security Headers
 
+Apache Spark can be configured to include HTTP headers which aid in preventing Cross-Site 
+Scripting (XSS), Cross-Frame Scripting (XFS), and MIME sniffing, and also enforce HTTP 
+Strict Transport Security.
+
+
+Property NameDefaultMeaning
+
+spark.ui.xXssProtection
+None
+
+Value for HTTP X-XSS-Protection response header. You can 
choose appropriate value 
+from below:
+
+  0 (Disables XSS filtering)
+  1 (Enables XSS filtering. If a cross-site scripting 
attack is detected, 
+the browser will sanitize the page.)
+  1; mode=block (Enables XSS filtering. The browser 
will prevent rendering 
+of the page if an attack is detected.)
+ 
+
+
+
+spark.ui.allowFramingFrom
+SAMEORIGIN
+
+Value for X-Frame-Options HTTP response header
+You can provide the "website uri" which can only be 
displayed in a frame on 
+the specified origin. 
+
--- End diff --

Done.


---




[GitHub] spark pull request #19419: [SPARK-22188] [CORE] Adding security headers for ...

2017-10-13 Thread krishna-pandey
Github user krishna-pandey commented on a diff in the pull request:

https://github.com/apache/spark/pull/19419#discussion_r144495051
  
--- Diff: docs/configuration.md ---
@@ -2013,7 +2013,62 @@ Apart from these, the following properties are also 
available, and may be useful
 
 
 
+### HTTP Security Headers
 
+Apache Spark can be configured to include HTTP headers which aid in preventing Cross-Site 
+Scripting (XSS), Cross-Frame Scripting (XFS), and MIME sniffing, and also enforce HTTP 
+Strict Transport Security.
+
+
+Property NameDefaultMeaning
+
+spark.ui.xXssProtection
+None
+
+Value for HTTP X-XSS-Protection response header. You can 
choose appropriate value 
+from below:
+
+  0 (Disables XSS filtering)
--- End diff --

Removed the lists; however, that markup is used in many places in the existing 
configuration.md.


---




[GitHub] spark pull request #19363: [SPARK-22224][SQL]Override toString of KeyValue/R...

2017-10-13 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/19363#discussion_r144500085
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/KeyValueGroupedDataset.scala ---
@@ -564,4 +565,30 @@ class KeyValueGroupedDataset[K, V] private[sql](
   encoder: Encoder[R]): Dataset[R] = {
 cogroup(other)((key, left, right) => f.call(key, left.asJava, 
right.asJava).asScala)(encoder)
   }
+
+  override def toString: String = {
+try {
+  val builder = new StringBuilder
+  val kFields = kExprEnc.schema.map {
+case f => s"${f.name}: ${f.dataType.simpleString(2)}"
+  }
+  val vFields = vExprEnc.schema.map {
+case f => s"${f.name}: ${f.dataType.simpleString(2)}"
+  }
+  builder.append("[key: [")
+  builder.append(kFields.take(2).mkString(", "))
+  if (kFields.length > 2) {
+builder.append(" ... " + (kFields.length - 2) + " more field(s)")
+  }
+  builder.append("], value: [")
+  builder.append(vFields.take(2).mkString(", "))
+  if (vFields.length > 2) {
+builder.append(" ... " + (vFields.length - 2) + " more field(s)")
+  }
+  builder.append("]]").toString()
+} catch {
+  case NonFatal(e) =>
--- End diff --

Although some methods of `StringBuilder` can throw exceptions, I didn't see that a 
simple `append` can. Accessing `schema` should not throw an exception at all, so I 
don't get why the try/catch is needed.
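The truncation logic under discussion can be isolated as a pure function; the field strings are hypothetical examples, and `summarize` is an invented helper name, not the PR's actual method.

```scala
// Show at most two "name: type" fields and summarize the remainder, mirroring
// the "... N more field(s)" formatting in the toString implementations above.
object ToStringSketch {
  def summarize(fields: Seq[String]): String = {
    val shown = fields.take(2).mkString(", ")
    if (fields.length > 2) s"$shown ... ${fields.length - 2} more field(s)" else shown
  }

  def main(args: Array[String]): Unit = {
    assert(summarize(Seq("a: int")) == "a: int")
    assert(summarize(Seq("a: int", "b: string", "c: double")) ==
      "a: int, b: string ... 1 more field(s)")
  }
}
```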


---




[GitHub] spark pull request #19363: [SPARK-22224][SQL]Override toString of KeyValue/R...

2017-10-13 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/19363#discussion_r144501685
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/RelationalGroupedDataset.scala ---
@@ -465,6 +466,24 @@ class RelationalGroupedDataset protected[sql](
 
 Dataset.ofRows(df.sparkSession, plan)
   }
+
+  override def toString: String = {
+try {
+  val builder = new StringBuilder
+  builder.append("RelationalGroupedDataset: [key: [")
+  val kFields = groupingExprs.map(_.asInstanceOf[NamedExpression]).map 
{
+case f => s"${f.name}: ${f.dataType.simpleString(2)}"
+  }
+  builder.append(kFields.take(2).mkString(", "))
+  if (kFields.length > 2) {
+builder.append(" ... " + (kFields.length - 2) + " more field(s)")
+  }
+  builder.append(s"], value: ${df.toString}, $groupType]").toString()
--- End diff --

`value:` or `df:`?

`$groupType` -> `type: $groupType`?


---




[GitHub] spark pull request #19363: [SPARK-22224][SQL]Override toString of KeyValue/R...

2017-10-13 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/19363#discussion_r144501966
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/QueryTest.scala ---
@@ -46,6 +36,20 @@ abstract class QueryTest extends PlanTest {
   Locale.setDefault(Locale.US)
 
   /**
+   * Makes sure the answer matches the expected result
+   */
+  def checkString(expected: String, actual: String): Unit = {
+if (expected != actual) {
+  fail(
+"KeyValueGroupedDataset.toString() gives wrong result:\n\n" + 
sideBySide(
--- End diff --

You didn't only use this against `KeyValueGroupedDataset`.


---




[GitHub] spark issue #19470: [SPARK-14387][SPARK-16628][SPARK-18355][SQL] Use Spark s...

2017-10-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19470
  
**[Test build #82724 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82724/testReport)**
 for PR 19470 at commit 
[`8e7fe9b`](https://github.com/apache/spark/commit/8e7fe9b9e3fc6121686caad45dcfdb4ff08f0c4a).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #19470: [SPARK-14387][SPARK-16628][SPARK-18355][SQL] Use Spark s...

2017-10-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19470
  
Merged build finished. Test PASSed.


---




[GitHub] spark pull request #19363: [SPARK-22224][SQL]Override toString of KeyValue/R...

2017-10-13 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/19363#discussion_r144500697
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/RelationalGroupedDataset.scala ---
@@ -465,6 +466,24 @@ class RelationalGroupedDataset protected[sql](
 
 Dataset.ofRows(df.sparkSession, plan)
   }
+
+  override def toString: String = {
+try {
+  val builder = new StringBuilder
+  builder.append("RelationalGroupedDataset: [key: [")
+  val kFields = groupingExprs.map(_.asInstanceOf[NamedExpression]).map {
--- End diff --

Are all expressions in `groupingExprs` `NamedExpression`?


---




[GitHub] spark pull request #19363: [SPARK-22224][SQL]Override toString of KeyValue/R...

2017-10-13 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/19363#discussion_r144502598
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/RelationalGroupedDatasetSuite.scala
 ---
@@ -0,0 +1,42 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql
+
+import org.apache.spark.sql.test.SharedSQLContext
+
+class RelationalGroupedDatasetSuite extends QueryTest with 
SharedSQLContext {
+  import testImplicits._
+
+  test("Check RelationalGroupedDataset toString: Single data") {
+val kvDataset = (1 to 3).toDF("id").as[SingleData].groupBy("id")
--- End diff --

Is `SingleData` necessary?


---




[GitHub] spark issue #19470: [SPARK-14387][SPARK-16628][SPARK-18355][SQL] Use Spark s...

2017-10-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19470
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82724/
Test PASSed.


---




[GitHub] spark pull request #19363: [SPARK-22224][SQL]Override toString of KeyValue/R...

2017-10-13 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/19363#discussion_r144502680
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/RelationalGroupedDatasetSuite.scala
 ---
@@ -0,0 +1,42 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql
+
+import org.apache.spark.sql.test.SharedSQLContext
+
+class RelationalGroupedDatasetSuite extends QueryTest with 
SharedSQLContext {
+  import testImplicits._
+
+  test("Check RelationalGroupedDataset toString: Single data") {
+val kvDataset = (1 to 3).toDF("id").as[SingleData].groupBy("id")
+val expected = "RelationalGroupedDataset: [key: [id: int], value: [id: 
int], GroupByType]"
+val actual = kvDataset.toString
+checkString(expected, actual)
+  }
+
+  test("Check RelationalGroupedDataset toString: over length schema ") {
+val kvDataset = (1 to 3).map( x => (x, x.toString, x.toLong))
+  .toDF("id", "val1", "val2").as[TripleData].groupBy("id")
--- End diff --

Is `TripleData` necessary?


---




[GitHub] spark pull request #19491: [SPARK-22273][SQL] Fix key/value schema field nam...

2017-10-13 Thread ueshin
GitHub user ueshin opened a pull request:

https://github.com/apache/spark/pull/19491

[SPARK-22273][SQL] Fix key/value schema field names in HashMapGenerators.

## What changes were proposed in this pull request?

When fixing schema field names using escape characters with 
`addReferenceMinorObj()` at 
[SPARK-18952](https://issues.apache.org/jira/browse/SPARK-18952) (#16361), 
double-quotes around the names were remained and the names become something 
like `"((java.lang.String) references[1])"`.

```java
/* 055 */ private int maxSteps = 2;
/* 056 */ private int numRows = 0;
/* 057 */ private org.apache.spark.sql.types.StructType keySchema = new 
org.apache.spark.sql.types.StructType().add("((java.lang.String) 
references[1])", org.apache.spark.sql.types.DataTypes.StringType);
/* 058 */ private org.apache.spark.sql.types.StructType valueSchema = 
new org.apache.spark.sql.types.StructType().add("((java.lang.String) 
references[2])", org.apache.spark.sql.types.DataTypes.LongType);
/* 059 */ private Object emptyVBase;
```

We should remove the double-quotes to refer the values in `references` 
properly:

```java
/* 055 */ private int maxSteps = 2;
/* 056 */ private int numRows = 0;
/* 057 */ private org.apache.spark.sql.types.StructType keySchema = new 
org.apache.spark.sql.types.StructType().add(((java.lang.String) references[1]), 
org.apache.spark.sql.types.DataTypes.StringType);
/* 058 */ private org.apache.spark.sql.types.StructType valueSchema = 
new org.apache.spark.sql.types.StructType().add(((java.lang.String) 
references[2]), org.apache.spark.sql.types.DataTypes.LongType);
/* 059 */ private Object emptyVBase;
```
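The bug can be illustrated with a tiny Python sketch of a codegen helper (invented names, not Spark's actual API): quoting a reference expression turns it into a string literal in the emitted Java, instead of an expression evaluated at runtime.

```python
# Illustrative sketch of the quoting bug described above. When a schema
# field name has been replaced by a reference expression such as
# ((java.lang.String) references[1]), wrapping it in double-quotes makes
# the generated Java treat it as a literal field name.
def add_field_buggy(expr):
    # Always quotes, so a reference expression becomes a bogus literal.
    return f'.add("{expr}", ...)'

def add_field_fixed(expr, is_reference):
    # Quote only plain names; emit reference expressions verbatim.
    return f'.add({expr}, ...)' if is_reference else f'.add("{expr}", ...)'
```

With the fix, `.add(((java.lang.String) references[1]), ...)` is emitted, so the generated class reads the real field name out of `references` at runtime.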

## How was this patch tested?

Existing tests.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ueshin/apache-spark issues/SPARK-22273

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/19491.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #19491


commit c23e3d060a473c1d2558c60689a6f552ccbbe6f2
Author: Takuya UESHIN 
Date:   2017-09-28T13:33:35Z

Remove double-quotes around field name.




---




[GitHub] spark pull request #19363: [SPARK-22224][SQL]Override toString of KeyValue/R...

2017-10-13 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/19363#discussion_r144501447
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/RelationalGroupedDataset.scala ---
@@ -465,6 +466,24 @@ class RelationalGroupedDataset protected[sql](
 
 Dataset.ofRows(df.sparkSession, plan)
   }
+
+  override def toString: String = {
+try {
+  val builder = new StringBuilder
+  builder.append("RelationalGroupedDataset: [key: [")
--- End diff --

`key:` or `groupingBy:`?


---




[GitHub] spark issue #19491: [SPARK-22273][SQL] Fix key/value schema field names in H...

2017-10-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19491
  
**[Test build #82728 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82728/testReport)**
 for PR 19491 at commit 
[`c23e3d0`](https://github.com/apache/spark/commit/c23e3d060a473c1d2558c60689a6f552ccbbe6f2).


---




[GitHub] spark issue #17290: [SPARK-16599][CORE] java.util.NoSuchElementException: No...

2017-10-13 Thread aphasingnirvana
Github user aphasingnirvana commented on the issue:

https://github.com/apache/spark/pull/17290
  
@srowen I believe the bug still persists. Shouldn't we reopen it?


---




[GitHub] spark issue #19487: [SPARK-21549][CORE] Respect OutputFormats with no/invali...

2017-10-13 Thread steveloughran
Github user steveloughran commented on the issue:

https://github.com/apache/spark/pull/19487
  
LGTM. I'm going to put out today a slight roll of my PathOutputCommitter 
class, which is one layer above FileOutputCommitter: it lets people write 
committers without output & work paths, yet avoids going near the complexity that is 
FileOutputFormat. See 
[PathOutputCommitter](https://github.com/hortonworks-spark/cloud-integration/blob/master/spark-cloud-integration/src/main/scala/com/hortonworks/spark/cloud/commit/PathOutputCommitProtocol.scala);
 if I get my changes into Hadoop 3, then this committer will work for all 
versions of Hadoop 3.x, even though the S3A stuff is targeting Hadoop 3.1.


---




[GitHub] spark pull request #18979: [SPARK-21762][SQL] FileFormatWriter/BasicWriteTas...

2017-10-13 Thread steveloughran
Github user steveloughran commented on a diff in the pull request:

https://github.com/apache/spark/pull/18979#discussion_r144505454
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/BasicWriteStatsTracker.scala
 ---
@@ -57,7 +60,14 @@ class BasicWriteTaskStatsTracker(hadoopConf: 
Configuration)
   private def getFileSize(filePath: String): Long = {
 val path = new Path(filePath)
 val fs = path.getFileSystem(hadoopConf)
-fs.getFileStatus(path).getLen()
+try {
+  fs.getFileStatus(path).getLen()
+} catch {
+  case e: FileNotFoundException =>
+// may arise against eventually consistent object stores
+logInfo(s"File $path is not yet visible", e)
--- End diff --

say "Reported file size in job summary may be invalid"?


---




[GitHub] spark issue #17290: [SPARK-16599][CORE] java.util.NoSuchElementException: No...

2017-10-13 Thread srowen
Github user srowen commented on the issue:

https://github.com/apache/spark/pull/17290
  
I don't think this change was the ultimate fix, and it caused another 
problem, so no, I don't think this PR should be reopened.


---




[GitHub] spark issue #18805: [SPARK-19112][CORE] Support for ZStandard codec

2017-10-13 Thread hvanhovell
Github user hvanhovell commented on the issue:

https://github.com/apache/spark/pull/18805
  
retest this please


---




[GitHub] spark issue #18805: [SPARK-19112][CORE] Support for ZStandard codec

2017-10-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18805
  
**[Test build #82729 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82729/testReport)**
 for PR 18805 at commit 
[`029a753`](https://github.com/apache/spark/commit/029a753ad4be6881c4e1721eecdfaad0f8b158bd).


---




[GitHub] spark issue #18979: [SPARK-21762][SQL] FileFormatWriter/BasicWriteTaskStatsT...

2017-10-13 Thread steveloughran
Github user steveloughran commented on the issue:

https://github.com/apache/spark/pull/18979
  
Noted :)
@dongjoon-hyun: is the issue with ORC that if there's nothing to write, it 
doesn't generate a file (so avoiding the issue where you sometimes get 0-byte 
ORC files & things downstream fail)?

If so, the warning message which @gatorsmile has proposed is potentially 
going to mislead people into worrying about a problem which isn't there, and 
the numFiles metric is going to mislead.

I'm starting to worry about how noisy the log would be, both there and when 
working with S3 when it's exhibiting delayed visibility (rarer).

1. What if this patch just logged at debug: less noise, but still something 
there if people are trying to debug a mismatch?
1. If there's no file found, numFiles doesn't get incremented.
1. I count the number of files actually submitted.
1. And in `getFinalStats()` log @ info if there is a mismatch.

This would line things up in future for actually returning the list of 
expected vs actual files as a metric where it could be reported.


---




[GitHub] spark issue #19488: [SPARK-22266][SQL] The same aggregate function was evalu...

2017-10-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19488
  
**[Test build #82725 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82725/testReport)**
 for PR 19488 at commit 
[`32bdf77`](https://github.com/apache/spark/commit/32bdf771fe70444ac23adf796702b5a26e085805).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #19488: [SPARK-22266][SQL] The same aggregate function was evalu...

2017-10-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19488
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #19488: [SPARK-22266][SQL] The same aggregate function was evalu...

2017-10-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19488
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82725/
Test PASSed.


---




[GitHub] spark issue #18979: [SPARK-21762][SQL] FileFormatWriter/BasicWriteTaskStatsT...

2017-10-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18979
  
**[Test build #82730 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82730/testReport)**
 for PR 18979 at commit 
[`adab985`](https://github.com/apache/spark/commit/adab985d0455f3f549ebceb03528f0a6a45a31c0).


---




[GitHub] spark issue #19464: [SPARK-22233] [core] Allow user to filter out empty spli...

2017-10-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19464
  
**[Test build #82726 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82726/testReport)**
 for PR 19464 at commit 
[`534d8fb`](https://github.com/apache/spark/commit/534d8fbcd7dfbdc9af06a4d926f6a353f429fce8).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #19464: [SPARK-22233] [core] Allow user to filter out empty spli...

2017-10-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19464
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #19464: [SPARK-22233] [core] Allow user to filter out empty spli...

2017-10-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19464
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82726/
Test PASSed.


---




[GitHub] spark issue #19476: [SPARK-22062][CORE] Spill large block to disk in BlockMa...

2017-10-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19476
  
**[Test build #82727 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82727/testReport)**
 for PR 19476 at commit 
[`2d00fd2`](https://github.com/apache/spark/commit/2d00fd25043c40ab271f4da1dc5b4dd7a407fc6f).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #19476: [SPARK-22062][CORE] Spill large block to disk in BlockMa...

2017-10-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19476
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82727/
Test PASSed.


---




[GitHub] spark issue #19476: [SPARK-22062][CORE] Spill large block to disk in BlockMa...

2017-10-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19476
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning

2017-10-13 Thread amankothari04
Github user amankothari04 commented on the issue:

https://github.com/apache/spark/pull/16578
  
@viirya did you get a chance to review this ?


---




[GitHub] spark issue #18979: [SPARK-21762][SQL] FileFormatWriter/BasicWriteTaskStatsT...

2017-10-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18979
  
**[Test build #82731 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82731/testReport)**
 for PR 18979 at commit 
[`649f8da`](https://github.com/apache/spark/commit/649f8da245443567b842b697a4d47e5241eb5946).


---




[GitHub] spark pull request #19452: [SPARK-22136][SS] Evaluate one-sided conditions e...

2017-10-13 Thread tdas
Github user tdas commented on a diff in the pull request:

https://github.com/apache/spark/pull/19452#discussion_r144532481
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamingSymmetricHashJoinHelperSuite.scala
 ---
@@ -0,0 +1,118 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.streaming
+
+import org.apache.spark.sql.Column
+import org.apache.spark.sql.catalyst.analysis.SimpleAnalyzer
+import org.apache.spark.sql.catalyst.expressions.AttributeReference
+import org.apache.spark.sql.execution.{LeafExecNode, LocalTableScanExec, 
SparkPlan}
+import org.apache.spark.sql.execution.columnar.InMemoryTableScanExec
+import 
org.apache.spark.sql.execution.streaming.StreamingSymmetricHashJoinHelper.JoinConditionSplitPredicates
+import org.apache.spark.sql.types.DataTypes
+
+class StreamingSymmetricHashJoinHelperSuite extends StreamTest {
+  import org.apache.spark.sql.functions._
+
+  val attributeA = AttributeReference("a", DataTypes.IntegerType)()
+  val attributeB = AttributeReference("b", DataTypes.IntegerType)()
+  val attributeC = AttributeReference("c", DataTypes.IntegerType)()
+  val attributeD = AttributeReference("d", DataTypes.IntegerType)()
+  val colA = new Column(attributeA)
+  val colB = new Column(attributeB)
+  val colC = new Column(attributeC)
+  val colD = new Column(attributeD)
+
+  val left = new LocalTableScanExec(Seq(attributeA, attributeB), Seq())
+  val right = new LocalTableScanExec(Seq(attributeC, attributeD), Seq())
+
+  test("empty") {
+val split = JoinConditionSplitPredicates(None, left, right)
+assert(split.leftSideOnly.isEmpty)
+assert(split.rightSideOnly.isEmpty)
+assert(split.bothSides.isEmpty)
+assert(split.full.isEmpty)
+  }
+
+  test("only literals") {
+// Literal-only conjuncts end up on the left side because that's the 
first bucket they fit in.
+// There's no semantic reason they couldn't be in any bucket.
+val predicate = (lit(1) < lit(5) && lit(6) < lit(7) && lit(0) === 
lit(-1)).expr
+val split = JoinConditionSplitPredicates(Some(predicate), left, right)
+
+assert(split.leftSideOnly.contains(predicate))
+assert(split.rightSideOnly.isEmpty)
--- End diff --

Nit: why should rightSideOnly be empty? The literals CAN be evaluated using 
the right side only. This feels like a very asymmetrical situation.


---




[GitHub] spark pull request #19452: [SPARK-22136][SS] Evaluate one-sided conditions e...

2017-10-13 Thread tdas
Github user tdas commented on a diff in the pull request:

https://github.com/apache/spark/pull/19452#discussion_r144533029
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamingSymmetricHashJoinHelperSuite.scala
 ---
@@ -0,0 +1,118 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.streaming
+
+import org.apache.spark.sql.Column
+import org.apache.spark.sql.catalyst.analysis.SimpleAnalyzer
+import org.apache.spark.sql.catalyst.expressions.AttributeReference
+import org.apache.spark.sql.execution.{LeafExecNode, LocalTableScanExec, 
SparkPlan}
+import org.apache.spark.sql.execution.columnar.InMemoryTableScanExec
+import 
org.apache.spark.sql.execution.streaming.StreamingSymmetricHashJoinHelper.JoinConditionSplitPredicates
+import org.apache.spark.sql.types.DataTypes
+
+class StreamingSymmetricHashJoinHelperSuite extends StreamTest {
+  import org.apache.spark.sql.functions._
+
+  val attributeA = AttributeReference("a", DataTypes.IntegerType)()
+  val attributeB = AttributeReference("b", DataTypes.IntegerType)()
+  val attributeC = AttributeReference("c", DataTypes.IntegerType)()
+  val attributeD = AttributeReference("d", DataTypes.IntegerType)()
--- End diff --

the tests would be much more intuitive and therefore easier to read if 
these columns and variables were named such that they are obviously part of the left 
or right side, e.g. leftAttribute1 instead of attributeA, leftAttribute2 instead of 
attributeB, etc.


---




[GitHub] spark issue #19491: [SPARK-22273][SQL] Fix key/value schema field names in H...

2017-10-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19491
  
**[Test build #82728 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82728/testReport)**
 for PR 19491 at commit 
[`c23e3d0`](https://github.com/apache/spark/commit/c23e3d060a473c1d2558c60689a6f552ccbbe6f2).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #19491: [SPARK-22273][SQL] Fix key/value schema field names in H...

2017-10-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19491
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82728/
Test PASSed.


---




[GitHub] spark issue #19491: [SPARK-22273][SQL] Fix key/value schema field names in H...

2017-10-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19491
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #18979: [SPARK-21762][SQL] FileFormatWriter/BasicWriteTaskStatsT...

2017-10-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18979
  
**[Test build #82732 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82732/testReport)**
 for PR 18979 at commit 
[`d3f96f6`](https://github.com/apache/spark/commit/d3f96f63f263f134a79e76e8bdd1961f333b7c7a).


---




[GitHub] spark pull request #19480: [SPARK-22226][SQL] splitExpression can create too...

2017-10-13 Thread mgaido91
Github user mgaido91 commented on a diff in the pull request:

https://github.com/apache/spark/pull/19480#discussion_r144541081
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CodeGenerationSuite.scala
 ---
@@ -201,6 +201,23 @@ class CodeGenerationSuite extends SparkFunSuite with 
ExpressionEvalHelper {
 }
   }
 
+  test("SPARK-22226: group splitted expressions into one method per nested class") {
--- End diff --

@viirya I have good news and bad news... Thanks to your suggestion I have 
been able to understand and reproduce the issue. Moreover, I found another 
issue which is fixed by this patch, and I am adding a UT for that too: in some 
cases, we might get a 
```
Code of method apply(...) grows beyond 64 KB
```
error, and with this PR the problem is fixed.

The bad news is that the UT you provided still fails, but with a different 
error: it is still a Constant Pool limit exceeded exception, but it 
is in a NestedClass. From my analysis, this is caused by another problem, i.e. 
that we might reference too many fields of the superclass in the NestedClasses. 
This might be addressed by trying to tune the magic number which I brought 
to 1000k in this PR, but I am pretty sure that it will also be addressed by the 
ongoing PR for SPARK-18016, since it is trying to reduce the number of 
variables. Thus I consider this out of scope for this PR.
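The core idea of the grouping can be sketched in a few lines of Python (the cap value and names are invented for illustration; they are not the PR's actual constants): when codegen splits a huge expression into many small methods, cap how many methods land in each nested class so no single generated class exhausts its constant pool.

```python
# Hand-wavy illustration of grouping split expression methods into nested
# classes: chunk the generated method names so each nested class stays
# under a fixed cap, keeping per-class constant pool usage bounded.
def group_into_nested_classes(method_names, max_per_class=1000):
    classes = []
    for i in range(0, len(method_names), max_per_class):
        classes.append(method_names[i:i + max_per_class])
    return classes
```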


---




[GitHub] spark issue #19448: [SPARK-22217] [SQL] ParquetFileFormat to support arbitra...

2017-10-13 Thread steveloughran
Github user steveloughran commented on the issue:

https://github.com/apache/spark/pull/19448
  
Thanks for reviewing this/getting it in. Personally, I had it in the 
"improvement" category rather than bug fix. If it wasn't for that line in the 
docs, there'd be no ambiguity about improvement vs. fix, and there is always a 
lower-risk way to fix a doc/code mismatch: change the docs.

But I'm grateful for it being in; with the backport to branch-2, Ryan should 
be able to use it semi-immediately.


---




[GitHub] spark issue #19488: [SPARK-22266][SQL] The same aggregate function was evalu...

2017-10-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19488
  
**[Test build #82733 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82733/testReport)**
 for PR 19488 at commit 
[`1cca72b`](https://github.com/apache/spark/commit/1cca72ba34a9e29cf51b56f7e9c8840c75ffe5da).


---




[GitHub] spark issue #19480: [SPARK-22226][SQL] splitExpression can create too many m...

2017-10-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19480
  
**[Test build #82734 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82734/testReport)**
 for PR 19480 at commit 
[`c4601b4`](https://github.com/apache/spark/commit/c4601b4ae1f6ba618e1a92e75b06faaf8d14bee3).


---




[GitHub] spark pull request #19492: [SPARK-22228][SQL] Add support for array

2017-10-13 Thread mgaido91
GitHub user mgaido91 opened a pull request:

https://github.com/apache/spark/pull/19492

[SPARK-22228][SQL] Add support for array to from_json

## What changes were proposed in this pull request?

The fix introduces support for parsing arrays of primitive types in the 
`from_json` function. Currently, it supports only arrays of structs, which are 
equivalent to arrays of JSON objects, but it doesn't allow parsing arrays of 
simple types, like arrays of strings (e.g. `["this is a valid JSON", "we cannot 
parse before the PR"]`).

## How was this patch tested?

Added UTs.
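To make the intended semantics concrete: in Spark the real API is `from_json(col, schema)` with an `ArrayType` schema, which is not used here. The pure-Python stand-in below only illustrates the behavior the PR enables - parsing a JSON array of strings, and returning null (here `None`) for malformed or mismatched input, as `from_json` does.

```python
import json

def parse_json_array_of_strings(s):
    """Illustrative stand-in for from_json(col, ArrayType(StringType)):
    returns the parsed list if the input is a JSON array of strings,
    None otherwise (from_json returns null on malformed input)."""
    try:
        value = json.loads(s)
    except json.JSONDecodeError:
        return None
    if isinstance(value, list) and all(isinstance(x, str) for x in value):
        return value
    return None

print(parse_json_array_of_strings('["this is a valid JSON", "parsed now"]'))
# ['this is a valid JSON', 'parsed now']
print(parse_json_array_of_strings('{"a": 1}'))  # None: not an array of strings
```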


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/mgaido91/spark SPARK-22228

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/19492.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #19492


commit cba1f4accaf5dbe305e366eab33879577d7f9a93
Author: Marco Gaido 
Date:   2017-10-11T10:29:26Z

[SPARK-22228] Add support for array to from_json




---




[GitHub] spark issue #19448: [SPARK-22217] [SQL] ParquetFileFormat to support arbitra...

2017-10-13 Thread steveloughran
Github user steveloughran commented on the issue:

https://github.com/apache/spark/pull/19448
  
PS, for people who are interested in dynamic committers, 
[MAPREDUCE-6823](https://issues.apache.org/jira/browse/MAPREDUCE-6823) is 
something to look at. It allows you to switch committers under pretty much 
everything *other than parquet*...this patch helps make Parquet manageable too


---




[GitHub] spark pull request #19488: [SPARK-22266][SQL] The same aggregate function wa...

2017-10-13 Thread maryannxue
Github user maryannxue commented on a diff in the pull request:

https://github.com/apache/spark/pull/19488#discussion_r144543377
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/planning/patterns.scala
 ---
@@ -205,14 +205,15 @@ object PhysicalAggregation {
 case logical.Aggregate(groupingExpressions, resultExpressions, child) 
=>
   // A single aggregate expression might appear multiple times in 
resultExpressions.
   // In order to avoid evaluating an individual aggregate function 
multiple times, we'll
-  // build a set of the distinct aggregate expressions and build a 
function which can
+  // build a map of the distinct aggregate expressions and build a 
function which can
   // be used to re-write expressions so that they reference the single 
copy of the
   // aggregate function which actually gets computed.
-  val aggregateExpressions = resultExpressions.flatMap { expr =>
+  val aggregateExpressionMap = resultExpressions.flatMap { expr =>
 expr.collect {
-  case agg: AggregateExpression => agg
+  case agg: AggregateExpression => (agg.canonicalized, 
agg.deterministic) -> agg
--- End diff --

Thank you, @cloud-fan! You've made a very good point! We should handle 
non-deterministic expressions correctly here. I have confirmed, though, that all 
aggregate functions should be deterministic (and fortunately Spark already 
guarantees this in the analysis stage). So I think we are safe to use 
`agg.canonicalized` without considering the `deterministic` field here, which 
also avoids confusion. I have added a code comment addressing this point in my 
latest commit.
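The deduplication being discussed can be sketched as follows. This is a hedged stand-in: Catalyst's `Expression.canonicalized` is far more involved than the toy `canonicalize` below (which merely normalizes case and whitespace), but the keying-and-rewrite structure is the same idea - keep one representative per canonical form, and rewrite every other occurrence to reference it.

```python
def canonicalize(expr: str) -> str:
    # Toy stand-in for Catalyst's Expression.canonicalized: normalize
    # cosmetic differences so semantically equal expressions compare equal.
    return "".join(expr.lower().split())

def dedup_aggregates(agg_exprs):
    """Keep one representative per canonical form, preserving first-seen
    order, plus a rewrite map from each original to its representative."""
    representatives = {}
    rewrite = {}
    for agg in agg_exprs:
        key = canonicalize(agg)
        if key not in representatives:
            representatives[key] = agg
        rewrite[agg] = representatives[key]
    return list(representatives.values()), rewrite

exprs = ["sum(a)", "SUM(a)", "avg(b)", "sum( a )"]
distinct, rewrite = dedup_aggregates(exprs)
print(distinct)            # ['sum(a)', 'avg(b)'] -> computed once each
print(rewrite["SUM(a)"])   # 'sum(a)'
```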


---




[GitHub] spark issue #19492: [SPARK-22228][SQL] Add support for array...

2017-10-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19492
  
Can one of the admins verify this patch?


---




[GitHub] spark issue #18979: [SPARK-21762][SQL] FileFormatWriter/BasicWriteTaskStatsT...

2017-10-13 Thread steveloughran
Github user steveloughran commented on the issue:

https://github.com/apache/spark/pull/18979
  
The latest PR update pulls in @dongjoon-hyun's new test; to avoid a merge 
conflict in the Insert suite I've rebased against master.

1. Everything handles missing files on output.
2. There's only one logInfo at the end of the execute call, so if many 
empty files are created, the logs aren't too noisy.
3. There is now some implicit counting of how many files were missing 
(`missing = submittedFiles - numFiles`); this isn't aggregated or reported, though.


---




[GitHub] spark pull request #19487: [SPARK-21549][CORE] Respect OutputFormats with no...

2017-10-13 Thread steveloughran
Github user steveloughran commented on a diff in the pull request:

https://github.com/apache/spark/pull/19487#discussion_r144545827
  
--- Diff: 
core/src/main/scala/org/apache/spark/internal/io/HadoopMapReduceCommitProtocol.scala
 ---
@@ -60,15 +71,6 @@ class HadoopMapReduceCommitProtocol(jobId: String, path: 
String)
*/
   private def absPathStagingDir: Path = new Path(path, "_temporary-" + 
jobId)
--- End diff --

What if path is null here when passed in? I'd actually expect an 
IllegalArgumentException to be raised, at least from looking at 
`Path.checkPathArg()`.


---




[GitHub] spark issue #19448: [SPARK-22217] [SQL] ParquetFileFormat to support arbitra...

2017-10-13 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/19448
  
I guess we wouldn't change the docs in branch-2.2 alone as we have a safe 
fix here for this mismatch anyway. I think I just wanted to say this backport 
can be justified.


---




[GitHub] spark issue #19487: [SPARK-21549][CORE] Respect OutputFormats with no/invali...

2017-10-13 Thread steveloughran
Github user steveloughran commented on the issue:

https://github.com/apache/spark/pull/19487
  
Looking a bit more at this. I see it handles `null` as well as empty, and also 
other forms of invalid URI which Path can't handle today (multiple colons, 
except with `file://` on Windows, etc.). And as long as you don't call 
`absPathStagingDir`, you don't get an exception there.

Do you actually want the code to downgrade if an invalid URI is passed in? 
That is: only skip if the path is empty/null, but not for other things like a 
path of "::"? There you may want to reject it. In that case you'd change the 
`hasValidPath` check to just look at the string, and reinstate the test for 
the path with an invalid non-empty URI.

BTW, "test:" is possibly a valid path: the home directory in the scheme 
"test".
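A minimal sketch of the distinction being proposed: treat a null/empty path as "no absolute-path staging dir", but reject a malformed URI loudly instead of silently downgrading. This is illustrative Python, not the Scala `HadoopMapReduceCommitProtocol` code, and the classification rules are invented for the example (real Hadoop `Path` parsing is stricter and more nuanced).

```python
def classify_output_path(path):
    """Return 'absent' for None/empty (skip the staging-dir logic),
    'valid' for a plausibly parseable path, or raise ValueError for a
    malformed one -- rejecting it loudly instead of silently skipping."""
    if path is None or path == "":
        return "absent"
    scheme, sep, _rest = path.partition(":")
    if sep and not scheme.isalpha():
        # e.g. "::" -- no legal scheme before the first colon
        raise ValueError(f"Malformed path URI: {path!r}")
    return "valid"

print(classify_output_path(None))            # absent
print(classify_output_path("hdfs://nn/out")) # valid
print(classify_output_path("test:"))         # valid: home dir in scheme "test"
```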



---




[GitHub] spark issue #18979: [SPARK-21762][SQL] FileFormatWriter/BasicWriteTaskStatsT...

2017-10-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18979
  
**[Test build #82730 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82730/testReport)**
 for PR 18979 at commit 
[`adab985`](https://github.com/apache/spark/commit/adab985d0455f3f549ebceb03528f0a6a45a31c0).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #18979: [SPARK-21762][SQL] FileFormatWriter/BasicWriteTaskStatsT...

2017-10-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18979
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82730/
Test PASSed.


---




[GitHub] spark issue #18979: [SPARK-21762][SQL] FileFormatWriter/BasicWriteTaskStatsT...

2017-10-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18979
  
Merged build finished. Test PASSed.


---




[GitHub] spark pull request #19480: [SPARK-22226][SQL] splitExpression can create too...

2017-10-13 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/19480#discussion_r144554795
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CodeGenerationSuite.scala
 ---
@@ -201,6 +201,23 @@ class CodeGenerationSuite extends SparkFunSuite with 
ExpressionEvalHelper {
 }
   }
 
+  test("SPARK-22226: group splitted expressions into one method per nested 
class") {
--- End diff --

@mgaido91 Did you mean "2: compact primitive declarations into arrays" in 
SPARK-18016?


---




[GitHub] spark pull request #19480: [SPARK-22226][SQL] splitExpression can create too...

2017-10-13 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/19480#discussion_r144555259
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala 
---
@@ -2103,4 +2103,35 @@ class DataFrameSuite extends QueryTest with 
SharedSQLContext {
   testData2.select(lit(7), 'a, 'b).orderBy(lit(1), lit(2), lit(3)),
   Seq(Row(7, 1, 1), Row(7, 1, 2), Row(7, 2, 1), Row(7, 2, 2), Row(7, 
3, 1), Row(7, 3, 2)))
   }
+
+  test("SPARK-22226: splitExpressions should not cause \"Code of method 
grows beyond 64 KB\"") {
--- End diff --

`splitExpressions should not generate codes beyond 64KB`?


---




[GitHub] spark issue #19263: [SPARK-22050][CORE] Allow BlockUpdated events to be opti...

2017-10-13 Thread michaelmior
Github user michaelmior commented on the issue:

https://github.com/apache/spark/pull/19263
  
@jerryshao Agreed on both points. Changed!


---




[GitHub] spark issue #18732: [SPARK-20396][SQL][PySpark] groupby().apply() with panda...

2017-10-13 Thread icexelloss
Github user icexelloss commented on the issue:

https://github.com/apache/spark/pull/18732
  
@cloud-fan, it's a good question. I thought quite a bit about it and 
discussed it with @viirya:
https://github.com/apache/spark/pull/18732#pullrequestreview-66106082

Just to recap, I think from an API perspective, having just one decorator, 
`pandas_udf`, makes it easier for users - they don't need to think about 
which decorator to use where. It does make the implementation a little more 
complicated, because some code has to interpret the context in which a 
pandas_udf is used, i.e., a `pandas_udf` in `groupby().apply()` is a 
`pandas.DataFrame -> pandas.DataFrame`, while in `withColumn` or `select` it's 
a `pandas.Series -> pandas.Series`.

Another thought: even if we were to introduce something like 
`pandas_df_udf`, we might still run into issues in the future where, say, we 
want an aggregate pandas udf that defines a mapping `pandas.Series -> scalar`. 
So I don't think we can define a decorator for every input/output shape, 
because there can potentially be many.
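The design point - one decorator whose meaning is interpreted by the call site - can be sketched without pandas at all. Below, plain lists stand in for `pandas.Series`/`pandas.DataFrame`, and `apply_grouped`/`apply_select` are invented stand-ins for the two call sites; this is not the PySpark API.

```python
def pandas_udf(func):
    """Single decorator; the *call site* decides how to invoke the
    function, mirroring the API choice discussed (one decorator,
    interpreted in context). All names here are illustrative."""
    func._is_pandas_udf = True
    return func

def apply_grouped(groups, udf):
    # groupby().apply()-style call site: whole group in, whole group out.
    return {key: udf(rows) for key, rows in groups.items()}

def apply_select(column, udf):
    # select()/withColumn()-style call site: applied over one column.
    return [udf([v])[0] for v in column]

@pandas_udf
def double_all(rows):
    return [v * 2 for v in rows]

print(apply_grouped({"a": [1, 2], "b": [3]}, double_all))  # {'a': [2, 4], 'b': [6]}
print(apply_select([1, 2, 3], double_all))                 # [2, 4, 6]
```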


---




[GitHub] spark pull request #19480: [SPARK-22226][SQL] splitExpression can create too...

2017-10-13 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/19480#discussion_r144559514
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CodeGenerationSuite.scala
 ---
@@ -201,6 +201,23 @@ class CodeGenerationSuite extends SparkFunSuite with 
ExpressionEvalHelper {
 }
   }
 
+  test("SPARK-22226: group splitted expressions into one method per nested 
class") {
--- End diff --

@mgaido91 Thanks for trying it. Yeah, those expressions like `skewness` are 
very complicated, so they're likely to cause the issue you encountered.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18805: [SPARK-19112][CORE] Support for ZStandard codec

2017-10-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18805
  
**[Test build #82729 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82729/testReport)**
 for PR 18805 at commit 
[`029a753`](https://github.com/apache/spark/commit/029a753ad4be6881c4e1721eecdfaad0f8b158bd).
 * This patch **fails from timeout after a configured wait of \`250m\`**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #18805: [SPARK-19112][CORE] Support for ZStandard codec

2017-10-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18805
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82729/
Test FAILed.


---




[GitHub] spark issue #18805: [SPARK-19112][CORE] Support for ZStandard codec

2017-10-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18805
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #19481: [SPARK-21907][CORE][BACKPORT 2.2] oom during spill

2017-10-13 Thread hvanhovell
Github user hvanhovell commented on the issue:

https://github.com/apache/spark/pull/19481
  
@eyalfa can you close the PR? The GitHub infra does not do this automatically 
for backports.


---



