[GitHub] spark pull request #19388: [SPARK-22162] Executors and the driver should use...

2017-09-29 Thread squito
Github user squito commented on a diff in the pull request:

https://github.com/apache/spark/pull/19388#discussion_r141924015
  
--- Diff: 
core/src/test/scala/org/apache/spark/rdd/PairRDDFunctionsSuite.scala ---
@@ -524,6 +525,13 @@ class PairRDDFunctionsSuite extends SparkFunSuite with 
SharedSparkContext {
 pairs.saveAsNewAPIHadoopFile[ConfigTestFormat]("ignored")
   }
 
+  test("The JobId on driver and executor should be the same during the 
commit") {
+val pairs = sc.parallelize(Array((new Integer(1), new Integer(2))), 1).
+  map( p => (new Integer(p._1 + 1), new Integer(p._2 + 1))).filter( p 
=> p._1 > 0)
--- End diff --

oh, of course on the pairs, sorry I missed that.


---




[GitHub] spark issue #19382: [SPARK-22158][SQL] convertMetastoreOrc/Parquet should no...

2017-09-29 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/19382
  
@gatorsmile .
While testing, I noticed that Parquet also ignores table properties in the `convertMetastoreParquet=false` case. That case should be handled in another JIRA issue; #19218 may cover it, too.
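
Roughly, the case looks like this (a sketch only; the option names follow the test code later in this thread):

```
// Sketch of the convertMetastoreParquet=false case described above.
spark.sql("SET spark.sql.hive.convertMetastoreParquet=false")
spark.sql(
  """CREATE TABLE t(id INT) USING hive
    |OPTIONS(fileFormat 'parquet', compression 'gzip')
  """.stripMargin)
spark.sql("INSERT INTO t SELECT 1")  // writes through the Hive serde path; the compression option is ignored
```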


---




[GitHub] spark issue #19149: [SPARK-21652][SQL][FOLLOW-UP] Fix rule conflict between ...

2017-09-29 Thread adrian-ionescu
Github user adrian-ionescu commented on the issue:

https://github.com/apache/spark/pull/19149
  
I think the idea is good, but maybe the fix can be refined by identifying which rules conflict with `InferFilterFromConstraints` and rerunning only those? It seems wasteful right now to run rules like `CombineUnions` or `RewriteCorrelatedScalarSubquery` twice (before and after) when they are obviously not affected by `InferFilterFromConstraints`.


---




[GitHub] spark issue #19388: [SPARK-22162] Executors and the driver should use consis...

2017-09-29 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19388
  
**[Test build #82325 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82325/testReport)**
 for PR 19388 at commit 
[`4c0985e`](https://github.com/apache/spark/commit/4c0985ef9584d893e73a09100e93881ac6687dca).


---




[GitHub] spark pull request #19388: [SPARK-22162] Executors and the driver should use...

2017-09-29 Thread rezasafi
Github user rezasafi commented on a diff in the pull request:

https://github.com/apache/spark/pull/19388#discussion_r141921618
  
--- Diff: 
core/src/test/scala/org/apache/spark/rdd/PairRDDFunctionsSuite.scala ---
@@ -524,6 +525,13 @@ class PairRDDFunctionsSuite extends SparkFunSuite with 
SharedSparkContext {
 pairs.saveAsNewAPIHadoopFile[ConfigTestFormat]("ignored")
   }
 
+  test("The JobId on driver and executor should be the same during the 
commit") {
+val pairs = sc.parallelize(Array((new Integer(1), new Integer(2))), 1).
+  map( p => (new Integer(p._1 + 1), new Integer(p._2 + 1))).filter( p 
=> p._1 > 0)
--- End diff --

The extra parentheses are there because I am trying to build an array of pairs. I will add another pair to make that clear. I will apply your other comments as well. Thank you very much.
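
For example, with a second pair the intent reads like this (a sketch, not the final test code):

```
// Sketch with a second pair added, so the array-of-pairs construction is explicit.
val pairs = sc.parallelize(Array((new Integer(1), new Integer(2)), (new Integer(3), new Integer(4))), 1)
  .map { case (k, v) => (new Integer(k + 1), new Integer(v + 1)) }
  .filter { case (k, _) => k > 0 }
```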


---




[GitHub] spark pull request #19382: [SPARK-22158][SQL] convertMetastoreOrc/Parquet sh...

2017-09-29 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/19382#discussion_r141920280
  
--- Diff: 
sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala 
---
@@ -1437,30 +1439,75 @@ class HiveDDLSuite
 }
   }
 
-  test("create hive serde table with new syntax") {
+  test("create hive serde table with new syntax - orc") {
+Seq("true", "false").foreach { value =>
+  withSQLConf(HiveUtils.CONVERT_METASTORE_ORC.key -> value) {
+withTable("t", "t2", "t3") {
+  withTempPath { path =>
+sql(
+  s"""
+|CREATE TABLE t(id int) USING hive
+|OPTIONS(fileFormat 'orc', compression 'Zlib')
+|LOCATION '${path.toURI}'
+   """.stripMargin)
+val table = 
spark.sessionState.catalog.getTableMetadata(TableIdentifier("t"))
+assert(DDLUtils.isHiveTable(table))
+assert(table.storage.serde == 
Some("org.apache.hadoop.hive.ql.io.orc.OrcSerde"))
+assert(table.storage.properties.get("compression") == 
Some("Zlib"))
+assert(spark.table("t").collect().isEmpty)
+
+sql("INSERT INTO t SELECT 1")
+checkAnswer(spark.table("t"), Row(1))
+// Check if this is compressed as ZLIB.
+val maybeOrcFile = 
path.listFiles().find(_.getName.startsWith("part"))
+assert(maybeOrcFile.isDefined)
+val orcFilePath = maybeOrcFile.get.toPath.toString
+val expectedCompressionKind =
+  OrcFileOperator.getFileReader(orcFilePath).get.getCompression
+assert("ZLIB" === expectedCompressionKind.name())
+
+sql("CREATE TABLE t2 USING HIVE AS SELECT 1 AS c1, 'a' AS c2")
+val table2 = 
spark.sessionState.catalog.getTableMetadata(TableIdentifier("t2"))
+assert(DDLUtils.isHiveTable(table2))
+assert(
+  table2.storage.serde == 
Some("org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe"))
+checkAnswer(spark.table("t2"), Row(1, "a"))
+
+sql("CREATE TABLE t3(a int, p int) USING hive PARTITIONED BY 
(p)")
+sql("INSERT INTO t3 PARTITION(p=1) SELECT 0")
+checkAnswer(spark.table("t3"), Row(0, 1))
+  }
+}
+  }
+}
+  }
+
+  test("create hive serde table with new syntax - parquet") {
 withTable("t", "t2", "t3") {
   withTempPath { path =>
 sql(
   s"""
-|CREATE TABLE t(id int) USING hive
-|OPTIONS(fileFormat 'orc', compression 'Zlib')
-|LOCATION '${path.toURI}'
-  """.stripMargin)
+ |CREATE TABLE t(id int) USING hive
+ |OPTIONS(fileFormat 'parquet', compression 'gzip')
+ |LOCATION '${path.toURI}'
+   """.stripMargin)
 val table = 
spark.sessionState.catalog.getTableMetadata(TableIdentifier("t"))
 assert(DDLUtils.isHiveTable(table))
-assert(table.storage.serde == 
Some("org.apache.hadoop.hive.ql.io.orc.OrcSerde"))
-assert(table.storage.properties.get("compression") == Some("Zlib"))
+assert(table.storage.serde ==
+  
Some("org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe"))
+assert(table.storage.properties.get("compression") == Some("gzip"))
 assert(spark.table("t").collect().isEmpty)
 
 sql("INSERT INTO t SELECT 1")
 checkAnswer(spark.table("t"), Row(1))
-// Check if this is compressed as ZLIB.
-val maybeOrcFile = 
path.listFiles().find(!_.getName.endsWith(".crc"))
-assert(maybeOrcFile.isDefined)
-val orcFilePath = maybeOrcFile.get.toPath.toString
-val expectedCompressionKind =
-  OrcFileOperator.getFileReader(orcFilePath).get.getCompression
-assert("ZLIB" === expectedCompressionKind.name())
+val maybeParquetFile = path.listFiles().find(f => 
f.getName.startsWith("part"))
+assert(maybeParquetFile.isDefined)
+
+val footer = ParquetFileReader.readFooter(
+  sparkContext.hadoopConfiguration,
+  new Path(maybeParquetFile.get.getPath),
+  NO_FILTER)
+assert("GZIP" === 
footer.getBlocks.get(0).getColumns().get(0).getCodec.toString)
--- End diff --

It's for getting the compression codec.


---




[GitHub] spark pull request #19382: [SPARK-22158][SQL] convertMetastoreOrc/Parquet sh...

2017-09-29 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/19382#discussion_r141919885
  
--- Diff: 
sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala 
---
@@ -1437,30 +1439,75 @@ class HiveDDLSuite
 }
   }
 
-  test("create hive serde table with new syntax") {
+  test("create hive serde table with new syntax - orc") {
+Seq("true", "false").foreach { value =>
+  withSQLConf(HiveUtils.CONVERT_METASTORE_ORC.key -> value) {
+withTable("t", "t2", "t3") {
+  withTempPath { path =>
+sql(
+  s"""
+|CREATE TABLE t(id int) USING hive
+|OPTIONS(fileFormat 'orc', compression 'Zlib')
+|LOCATION '${path.toURI}'
+   """.stripMargin)
+val table = 
spark.sessionState.catalog.getTableMetadata(TableIdentifier("t"))
+assert(DDLUtils.isHiveTable(table))
+assert(table.storage.serde == 
Some("org.apache.hadoop.hive.ql.io.orc.OrcSerde"))
+assert(table.storage.properties.get("compression") == 
Some("Zlib"))
+assert(spark.table("t").collect().isEmpty)
+
+sql("INSERT INTO t SELECT 1")
+checkAnswer(spark.table("t"), Row(1))
+// Check if this is compressed as ZLIB.
+val maybeOrcFile = 
path.listFiles().find(_.getName.startsWith("part"))
+assert(maybeOrcFile.isDefined)
+val orcFilePath = maybeOrcFile.get.toPath.toString
+val expectedCompressionKind =
+  OrcFileOperator.getFileReader(orcFilePath).get.getCompression
+assert("ZLIB" === expectedCompressionKind.name())
+
+sql("CREATE TABLE t2 USING HIVE AS SELECT 1 AS c1, 'a' AS c2")
+val table2 = 
spark.sessionState.catalog.getTableMetadata(TableIdentifier("t2"))
+assert(DDLUtils.isHiveTable(table2))
+assert(
+  table2.storage.serde == 
Some("org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe"))
+checkAnswer(spark.table("t2"), Row(1, "a"))
+
+sql("CREATE TABLE t3(a int, p int) USING hive PARTITIONED BY 
(p)")
+sql("INSERT INTO t3 PARTITION(p=1) SELECT 0")
+checkAnswer(spark.table("t3"), Row(0, 1))
+  }
+}
+  }
+}
+  }
+
+  test("create hive serde table with new syntax - parquet") {
 withTable("t", "t2", "t3") {
   withTempPath { path =>
 sql(
   s"""
-|CREATE TABLE t(id int) USING hive
-|OPTIONS(fileFormat 'orc', compression 'Zlib')
-|LOCATION '${path.toURI}'
-  """.stripMargin)
+ |CREATE TABLE t(id int) USING hive
+ |OPTIONS(fileFormat 'parquet', compression 'gzip')
--- End diff --

Parquet uses `gzip`.


---




[GitHub] spark issue #19382: [SPARK-22158][SQL] convertMetastoreOrc/Parquet should no...

2017-09-29 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19382
  
**[Test build #82324 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82324/testReport)**
 for PR 19382 at commit 
[`d218c98`](https://github.com/apache/spark/commit/d218c98d7af75ffc13e9900a9370885f83f55dc6).


---




[GitHub] spark pull request #19382: [SPARK-22158][SQL] convertMetastoreOrc/Parquet sh...

2017-09-29 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/19382#discussion_r141919805
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveStrategies.scala ---
@@ -189,12 +189,12 @@ case class RelationConversions(
   private def convert(relation: HiveTableRelation): LogicalRelation = {
 val serde = 
relation.tableMeta.storage.serde.getOrElse("").toLowerCase(Locale.ROOT)
 if (serde.contains("parquet")) {
-  val options = Map(ParquetOptions.MERGE_SCHEMA ->
+  val options = relation.tableMeta.storage.properties + 
(ParquetOptions.MERGE_SCHEMA ->
--- End diff --

Hi, @gatorsmile. It's the same for Parquet. Without this, the test case for Parquet fails.


---




[GitHub] spark issue #19149: [SPARK-21652][SQL][FOLLOW-UP] Fix rule conflict between ...

2017-09-29 Thread yhuai
Github user yhuai commented on the issue:

https://github.com/apache/spark/pull/19149
  
Can we add a test?


---




[GitHub] spark pull request #19388: [SPARK-22162] Executors and the driver should use...

2017-09-29 Thread rezasafi
Github user rezasafi commented on a diff in the pull request:

https://github.com/apache/spark/pull/19388#discussion_r141917674
  
--- Diff: 
core/src/test/scala/org/apache/spark/rdd/PairRDDFunctionsSuite.scala ---
@@ -524,6 +525,13 @@ class PairRDDFunctionsSuite extends SparkFunSuite with 
SharedSparkContext {
 pairs.saveAsNewAPIHadoopFile[ConfigTestFormat]("ignored")
   }
 
+  test("The JobId on driver and executor should be the same during the 
commit") {
+val pairs = sc.parallelize(Array((new Integer(1), new Integer(2))), 1).
+  map( p => (new Integer(p._1 + 1), new Integer(p._2 + 1))).filter( p 
=> p._1 > 0)
--- End diff --

The extra parentheses are there because the array's element type is a pair. I will add another pair to make that clear, and I will apply your other changes.


---




[GitHub] spark issue #19263: [SPARK-22050][CORE] Allow BlockUpdated events to be opti...

2017-09-29 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19263
  
**[Test build #82323 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82323/testReport)**
 for PR 19263 at commit 
[`eaeeef2`](https://github.com/apache/spark/commit/eaeeef2025923502cf3804d0d377dc1d2d1c57b0).


---




[GitHub] spark pull request #19263: [SPARK-22050][CORE] Allow BlockUpdated events to ...

2017-09-29 Thread michaelmior
Github user michaelmior commented on a diff in the pull request:

https://github.com/apache/spark/pull/19263#discussion_r141916752
  
--- Diff: 
core/src/main/scala/org/apache/spark/scheduler/EventLoggingListener.scala ---
@@ -66,6 +67,7 @@ private[spark] class EventLoggingListener(
 
   private val shouldCompress = 
sparkConf.getBoolean("spark.eventLog.compress", false)
   private val shouldOverwrite = 
sparkConf.getBoolean("spark.eventLog.overwrite", false)
+  private val shouldLogBlockUpdates = 
sparkConf.getBoolean("spark.eventLog.blockUpdates", false)
--- End diff --

Thanks for clarifying. I opted to convert other event log settings as well 
for uniformity.


---




[GitHub] spark issue #18760: [SPARK-21560][Core] Add hold mode for the LiveListenerBu...

2017-09-29 Thread vanzin
Github user vanzin commented on the issue:

https://github.com/apache/spark/pull/18760
  
> I use offer(non-blocking) to control the empty rate in the configuration.

Won't that just block all posts until you hit the empty rate? I don't see a 
lot of difference.

But yeah, it would be good to test how things work with queues before 
adding yet more features to this part of the code.


---




[GitHub] spark pull request #19263: [SPARK-22050][CORE] Allow BlockUpdated events to ...

2017-09-29 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/19263#discussion_r141915135
  
--- Diff: 
core/src/main/scala/org/apache/spark/scheduler/EventLoggingListener.scala ---
@@ -66,6 +67,7 @@ private[spark] class EventLoggingListener(
 
   private val shouldCompress = 
sparkConf.getBoolean("spark.eventLog.compress", false)
   private val shouldOverwrite = 
sparkConf.getBoolean("spark.eventLog.overwrite", false)
+  private val shouldLogBlockUpdates = 
sparkConf.getBoolean("spark.eventLog.blockUpdates", false)
--- End diff --

The other configs predate the config library; it's ok either way for those, 
but new configs should use the new library.
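
For illustration, a minimal sketch of a config entry defined with Spark's internal config builder (the key name and default here are assumptions, not necessarily what this PR ends up using):

```
// Must live under the org.apache.spark package, since ConfigBuilder is private[spark].
import org.apache.spark.internal.config.ConfigBuilder

object EventLogConf {
  // Assumed key name and default; the final PR may choose differently.
  val EVENT_LOG_BLOCK_UPDATES =
    ConfigBuilder("spark.eventLog.logBlockUpdates.enabled")
      .doc("Whether to log block update events in the event log.")
      .booleanConf
      .createWithDefault(false)
}

// In EventLoggingListener, roughly:
//   private val shouldLogBlockUpdates = sparkConf.get(EventLogConf.EVENT_LOG_BLOCK_UPDATES)
```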


---




[GitHub] spark issue #19392: [SPARK-22169][SQL] table name with numbers and character...

2017-09-29 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19392
  
**[Test build #82322 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82322/testReport)**
 for PR 19392 at commit 
[`9a11231`](https://github.com/apache/spark/commit/9a11231742692e33fac3c466c2a03a15ca8a16c3).


---




[GitHub] spark issue #19392: [SPARK-22169][SQL] table name with numbers and character...

2017-09-29 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/19392
  
cc @hvanhovell @gatorsmile 


---




[GitHub] spark pull request #19392: [SPARK-22169][SQL] table name with numbers and ch...

2017-09-29 Thread cloud-fan
GitHub user cloud-fan opened a pull request:

https://github.com/apache/spark/pull/19392

[SPARK-22169][SQL] table name with numbers and characters should be parsed successfully

## What changes were proposed in this pull request?

By definition, a table name in Spark can be something like `123x`, `25a`, etc. However, some special cases are unsupported, like `12L`, `34M`, etc., because the lexer parses them into numeric literal tokens instead of identifier tokens. A simple fix is to include these literal tokens in the `identifier` parser rule.
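
A minimal sketch of the behavior described above (assuming a `SparkSession` named `spark`; illustrative only, not the regression test itself):

```
spark.sql("CREATE TABLE 123x (id INT) USING parquet")  // digits followed by letters already parse as an identifier
spark.sql("CREATE TABLE 12L (id INT) USING parquet")   // previously failed: `12L` was lexed as a numeric literal
spark.sql("SELECT * FROM 12L")                         // parses once these literal tokens are accepted as identifiers
```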

TODO:
decimal literals are still unsupported, e.g. `1L.23D`. This is because `.23D` is also a valid token; we would need some lexer hack to parse this input into 3 tokens (`1L`, `.`, `23D`), and I'm not sure it's worth it.

## How was this patch tested?

regression test

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/cloud-fan/spark parser-bug

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/19392.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #19392


commit 9a11231742692e33fac3c466c2a03a15ca8a16c3
Author: Wenchen Fan 
Date:   2017-09-29T16:13:19Z

table name with numbers and characters should be supported




---




[GitHub] spark pull request #19382: [SPARK-22158][SQL] convertMetastoreOrc should not...

2017-09-29 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/19382#discussion_r141908151
  
--- Diff: 
sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala 
---
@@ -1438,39 +1438,44 @@ class HiveDDLSuite
   }
 
   test("create hive serde table with new syntax") {
-withTable("t", "t2", "t3") {
-  withTempPath { path =>
-sql(
-  s"""
-|CREATE TABLE t(id int) USING hive
-|OPTIONS(fileFormat 'orc', compression 'Zlib')
-|LOCATION '${path.toURI}'
-  """.stripMargin)
-val table = 
spark.sessionState.catalog.getTableMetadata(TableIdentifier("t"))
-assert(DDLUtils.isHiveTable(table))
-assert(table.storage.serde == 
Some("org.apache.hadoop.hive.ql.io.orc.OrcSerde"))
-assert(table.storage.properties.get("compression") == Some("Zlib"))
-assert(spark.table("t").collect().isEmpty)
-
-sql("INSERT INTO t SELECT 1")
-checkAnswer(spark.table("t"), Row(1))
-// Check if this is compressed as ZLIB.
-val maybeOrcFile = 
path.listFiles().find(!_.getName.endsWith(".crc"))
-assert(maybeOrcFile.isDefined)
-val orcFilePath = maybeOrcFile.get.toPath.toString
-val expectedCompressionKind =
-  OrcFileOperator.getFileReader(orcFilePath).get.getCompression
-assert("ZLIB" === expectedCompressionKind.name())
-
-sql("CREATE TABLE t2 USING HIVE AS SELECT 1 AS c1, 'a' AS c2")
-val table2 = 
spark.sessionState.catalog.getTableMetadata(TableIdentifier("t2"))
-assert(DDLUtils.isHiveTable(table2))
-assert(table2.storage.serde == 
Some("org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe"))
-checkAnswer(spark.table("t2"), Row(1, "a"))
-
-sql("CREATE TABLE t3(a int, p int) USING hive PARTITIONED BY (p)")
-sql("INSERT INTO t3 PARTITION(p=1) SELECT 0")
-checkAnswer(spark.table("t3"), Row(0, 1))
+Seq("true", "false").foreach { value =>
+  withSQLConf(HiveUtils.CONVERT_METASTORE_ORC.key -> value) {
+withTable("t", "t2", "t3") {
+  withTempPath { path =>
+sql(
+  s"""
+|CREATE TABLE t(id int) USING hive
+|OPTIONS(fileFormat 'orc', compression 'Zlib')
--- End diff --

Thank you for review, @gatorsmile . I'll check that, too.


---




[GitHub] spark pull request #19391: [SPARK-22146] FileNotFoundException while reading...

2017-09-29 Thread mgaido91
Github user mgaido91 closed the pull request at:

https://github.com/apache/spark/pull/19391


---




[GitHub] spark issue #19391: [SPARK-22146] FileNotFoundException while reading ORC fi...

2017-09-29 Thread mgaido91
Github user mgaido91 commented on the issue:

https://github.com/apache/spark/pull/19391
  
yes, thank you @srowen! It has been backported now! Sorry for wasting your 
time with this wrong PR. And thank you again for your help.


---




[GitHub] spark pull request #18732: [SPARK-20396][SQL][PySpark] groupby().apply() wit...

2017-09-29 Thread icexelloss
Github user icexelloss commented on a diff in the pull request:

https://github.com/apache/spark/pull/18732#discussion_r141906843
  
--- Diff: python/pyspark/sql/group.py ---
@@ -194,6 +194,28 @@ def pivot(self, pivot_col, values=None):
 jgd = self._jgd.pivot(pivot_col, values)
 return GroupedData(jgd, self.sql_ctx)
 
+def apply(self, udf_obj):
--- End diff --

I see. It's a totally valid concern. Yeah I think we can distinguish them 
by returnType.

I will update the doc.


---




[GitHub] spark issue #19368: [SPARK-22146] FileNotFoundException while reading ORC fi...

2017-09-29 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/19368
  
@mgaido91 I just pushed it to the 2.2 branch, since the risk of this fix is small.


---




[GitHub] spark pull request #19386: [SPARK-22161] [SQL] Add Impala-modified TPC-DS qu...

2017-09-29 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/19386


---




[GitHub] spark issue #19386: [SPARK-22161] [SQL] Add Impala-modified TPC-DS queries

2017-09-29 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/19386
  
Thanks! Merged to master/2.2


---




[GitHub] spark issue #19391: [SPARK-22146] FileNotFoundException while reading ORC fi...

2017-09-29 Thread srowen
Github user srowen commented on the issue:

https://github.com/apache/spark/pull/19391
  
Oh, we just back-port the original PR then. Do you need it in 2.2?


---




[GitHub] spark issue #19272: [Spark-21842][Mesos] Support Kerberos ticket renewal and...

2017-09-29 Thread ArtRand
Github user ArtRand commented on the issue:

https://github.com/apache/spark/pull/19272
  
Hello @vanzin 

Thanks for taking a look at this. Good to know that there can be downstream 
errors depending on the situation. 

I would very much appreciate a proper review of this work when you have some time; I'm very keen on getting this into the next release.


---




[GitHub] spark issue #19391: [SPARK-22146] FileNotFoundException while reading ORC fi...

2017-09-29 Thread mgaido91
Github user mgaido91 commented on the issue:

https://github.com/apache/spark/pull/19391
  
@srowen  Yes, it cherry-picked cleanly, why?


---




[GitHub] spark issue #19391: [SPARK-22146] FileNotFoundException while reading ORC fi...

2017-09-29 Thread srowen
Github user srowen commented on the issue:

https://github.com/apache/spark/pull/19391
  
Did the original PR not cherry-pick cleanly?


---




[GitHub] spark issue #19391: [SPARK-22146] FileNotFoundException while reading ORC fi...

2017-09-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19391
  
Can one of the admins verify this patch?


---




[GitHub] spark issue #19391: [SPARK-22146] FileNotFoundException while reading ORC fi...

2017-09-29 Thread mgaido91
Github user mgaido91 commented on the issue:

https://github.com/apache/spark/pull/19391
  
This is a backport to branch-2.2 of SPARK-22146, which has already been merged to master.

cc @dongjoon-hyun @gatorsmile 


---




[GitHub] spark issue #19270: [SPARK-21809] : Change Stage Page to use datatables to s...

2017-09-29 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19270
  
**[Test build #82321 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82321/testReport)**
 for PR 19270 at commit 
[`c1f85ae`](https://github.com/apache/spark/commit/c1f85aeab03e2d172261431d6e5e46cbd303e069).


---




[GitHub] spark pull request #19391: [SPARK-22146] FileNotFoundException while reading...

2017-09-29 Thread mgaido91
GitHub user mgaido91 opened a pull request:

https://github.com/apache/spark/pull/19391

[SPARK-22146] FileNotFoundException while reading ORC files containing 
special characters

## What changes were proposed in this pull request?

Reading ORC files containing special characters like '%' fails with a 
FileNotFoundException.
This PR aims to fix the problem.
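
A minimal sketch of the kind of path that triggers the issue (illustrative only, not the added UT):

```
// A directory whose name contains a special character such as '%'.
val dir = "/tmp/orc_%_test"
spark.range(10).write.orc(dir)
spark.read.orc(dir).count()  // threw FileNotFoundException before this fix
```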

## How was this patch tested?

Added UT.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/mgaido91/spark SPARK-22146_branch-2.2

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/19391.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #19391


commit 570e0e801ad584924e26b96f21a4902f6de0cfdf
Author: Marco Gaido 
Date:   2017-09-29T06:14:53Z

[SPARK-22146] FileNotFoundException while reading ORC files containing 
special characters

## What changes were proposed in this pull request?

Reading ORC files containing special characters like '%' fails with a 
FileNotFoundException.
This PR aims to fix the problem.

## How was this patch tested?

Added UT.

Author: Marco Gaido 
Author: Marco Gaido 

Closes #19368 from mgaido91/SPARK-22146.




---




[GitHub] spark issue #19290: [SPARK-22063][R] Upgrades lintr to latest commit sha1 ID

2017-09-29 Thread shaneknapp
Github user shaneknapp commented on the issue:

https://github.com/apache/spark/pull/19290
  
hey all, i'm here.  was out sick for the past few days and trying to get
caught up.  sorry about that!

so...  what version of lintr do we need to put on the workers?

On Fri, Sep 29, 2017 at 3:32 AM, UCB AMPLab wrote:

> Test PASSed.
> Refer to this link for build results (access rights to CI server needed):
> https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82312/
> Test PASSed.
>



---




[GitHub] spark pull request #18732: [SPARK-20396][SQL][PySpark] groupby().apply() wit...

2017-09-29 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/18732#discussion_r141892887
  
--- Diff: python/pyspark/sql/group.py ---
@@ -194,6 +194,28 @@ def pivot(self, pivot_col, values=None):
 jgd = self._jgd.pivot(pivot_col, values)
 return GroupedData(jgd, self.sql_ctx)
 
+def apply(self, udf_obj):
--- End diff --

I'm basically concerned that there is no clear difference between the current pandas UDF and the new one for `apply`. But it seems we can distinguish them by looking at the return type? If so, we may have no need for `pandas_df_udf`.

But we should update the doc of `pandas_udf` for this kind of (`apply`) pandas UDF.


---




[GitHub] spark pull request #19181: [SPARK-21907][CORE] oom during spill

2017-09-29 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/19181#discussion_r141892238
  
--- Diff: 
core/src/test/java/org/apache/spark/util/collection/unsafe/sort/UnsafeInMemorySorterSuite.java
 ---
@@ -139,4 +139,44 @@ public int compare(
 }
 assertEquals(dataToSort.length, iterLength);
   }
+
+  @Test
+  public void freeAfterOOM() {
+final TestMemoryManager testMemoryManager = new TestMemoryManager(new 
SparkConf().set("spark.memory.offHeap.enabled", "false"));
--- End diff --

nit: It would be better to wrap this line, since it is too long.


---




[GitHub] spark pull request #19359: [SPARK-22129][SPARK-22138] Release script improve...

2017-09-29 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/19359


---




[GitHub] spark issue #18732: [SPARK-20396][SQL][PySpark] groupby().apply() with panda...

2017-09-29 Thread icexelloss
Github user icexelloss commented on the issue:

https://github.com/apache/spark/pull/18732
  
Thanks all for the initial review! I will address some comments and upload 
a new version today.


---




[GitHub] spark pull request #18732: [SPARK-20396][SQL][PySpark] groupby().apply() wit...

2017-09-29 Thread icexelloss
Github user icexelloss commented on a diff in the pull request:

https://github.com/apache/spark/pull/18732#discussion_r141891357
  
--- Diff: python/pyspark/sql/group.py ---
@@ -194,6 +194,28 @@ def pivot(self, pivot_col, values=None):
 jgd = self._jgd.pivot(pivot_col, values)
 return GroupedData(jgd, self.sql_ctx)
 
+def apply(self, udf_obj):
--- End diff --

I am not sure that is necessary. We can check that the function takes just one parameter in `apply()` without introducing a new `pandas_df_udf`.



---




[GitHub] spark issue #19263: [SPARK-22050][CORE] Allow BlockUpdated events to be opti...

2017-09-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19263
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #19263: [SPARK-22050][CORE] Allow BlockUpdated events to be opti...

2017-09-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19263
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82316/
Test PASSed.


---




[GitHub] spark issue #19263: [SPARK-22050][CORE] Allow BlockUpdated events to be opti...

2017-09-29 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19263
  
**[Test build #82316 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82316/testReport)**
 for PR 19263 at commit 
[`0bb4130`](https://github.com/apache/spark/commit/0bb4130337e5862d92eaf653fbffdd3ebff7f364).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark pull request #18732: [SPARK-20396][SQL][PySpark] groupby().apply() wit...

2017-09-29 Thread icexelloss
Github user icexelloss commented on a diff in the pull request:

https://github.com/apache/spark/pull/18732#discussion_r141886413
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/python/ArrowEvalPythonExec.scala
 ---
@@ -44,14 +44,17 @@ case class ArrowEvalPythonExec(udfs: Seq[PythonUDF], 
output: Seq[Attribute], chi
 val schemaOut = 
StructType.fromAttributes(output.drop(child.output.length).zipWithIndex
   .map { case (attr, i) => attr.withName(s"_$i") })
 
+val batchedIter: Iterator[Iterator[InternalRow]] =
+  iter.grouped(conf.arrowMaxRecordsPerBatch).map(_.iterator)
+
--- End diff --

I actually found that this code doesn't work now. I will fix it.

@ueshin is right, this is to reuse `ArrowEvalPython` for both the current pandas UDF and `apply()`. I basically want to lift the batching logic out of `ArrowEvalPython` so the caller can decide how they want rows to be batched into a `RecordBatch`.

In the current pandas UDF case, it batches by `conf.arrowMaxRecordsPerBatch`, and in `apply` it batches by one group per batch.
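
A rough sketch of the two batching strategies (the second helper is illustrative, not the PR's API):

```
// Current pandas UDF path: fixed-size batches of at most arrowMaxRecordsPerBatch rows.
val byMaxRecords: Iterator[Iterator[InternalRow]] =
  iter.grouped(conf.arrowMaxRecordsPerBatch).map(_.iterator)

// groupby().apply() path: one group of rows per Arrow RecordBatch (illustrative helper).
def oneBatchPerGroup(groups: Iterator[Seq[InternalRow]]): Iterator[Iterator[InternalRow]] =
  groups.map(_.iterator)
```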


---




[GitHub] spark issue #19386: [SPARK-22161] [SQL] Add Impala-modified TPC-DS queries

2017-09-29 Thread juliuszsompolski
Github user juliuszsompolski commented on the issue:

https://github.com/apache/spark/pull/19386
  
LGTM


---




[GitHub] spark pull request #18732: [SPARK-20396][SQL][PySpark] groupby().apply() wit...

2017-09-29 Thread icexelloss
Github user icexelloss commented on a diff in the pull request:

https://github.com/apache/spark/pull/18732#discussion_r141885324
  
--- Diff: python/pyspark/sql/tests.py ---
@@ -3376,6 +3377,74 @@ def test_vectorized_udf_empty_partition(self):
 res = df.select(f(col('id')))
 self.assertEquals(df.collect(), res.collect())
 
+@unittest.skipIf(not _have_pandas or not _have_arrow, "Pandas or Arrow not 
installed")
+class GroupbyApplyTests(ReusedPySparkTestCase):
+@classmethod
+def setUpClass(cls):
+ReusedPySparkTestCase.setUpClass()
+cls.spark = SparkSession(cls.sc)
+
+@classmethod
+def tearDownClass(cls):
+ReusedPySparkTestCase.tearDownClass()
+cls.spark.stop()
+
+def assertFramesEqual(self, expected, result):
+msg = ("DataFrames are not equal: " +
+   ("\n\nExpected:\n%s\n%s" % (expected, expected.dtypes)) +
+   ("\n\nResult:\n%s\n%s" % (result, result.dtypes)))
+self.assertTrue(expected.equals(result), msg=msg)
+
+@property
+def data(self):
+from pyspark.sql.functions import pandas_udf, array, explode, col, 
lit
+return self.spark.range(10).toDF('id') \
+.withColumn("vs", array([lit(i) for i in range(20, 30)])) \
+.withColumn("v", explode(col('vs'))).drop('vs')
+
+def test_groupby_apply_simple(self):
+from pyspark.sql.functions import pandas_udf
+df = self.data
+
+def foo(df):
+ret = df
+ret = ret.assign(v1=df.v * df.id * 1.0)
+ret = ret.assign(v2=df.v + df.id)
+return ret
+
+foo_udf = pandas_udf(
+foo,
+StructType(
+[StructField('id', LongType()),
+ StructField('v', IntegerType()),
+ StructField('v1', DoubleType()),
+ StructField('v2', LongType())]))
--- End diff --

Yes, the column names are specified in the returnType, and the returnType must be a `StructType`.

The rationale is that `apply()` is a mapping from a `pd.DataFrame` to a `pd.DataFrame`, therefore the returnType must be a `StructType`.

This is the best way I can think of to specify the column names and returnType. It makes sense to me because there should be a one-to-one mapping between the return value of the function (a `pd.DataFrame`) and its schema (a `StructType` containing column names and data types).

Also, because `pd.DataFrame` doesn't support nested types, there is no ambiguity about whether a `StructType` indicates a `pd.DataFrame` or a nested type.


---




[GitHub] spark pull request #18732: [SPARK-20396][SQL][PySpark] groupby().apply() wit...

2017-09-29 Thread icexelloss
Github user icexelloss commented on a diff in the pull request:

https://github.com/apache/spark/pull/18732#discussion_r141883671
  
--- Diff: python/pyspark/sql/functions.py ---
@@ -2206,6 +2207,10 @@ def pandas_udf(f=None, returnType=StringType()):
 | 8|  JOHN DOE|  22|
 +--+--++
 """
+import pandas as pd
+if isinstance(returnType, pd.Series):
+returnType = from_pandas_dtypes(returnType)
--- End diff --

I agree that having a consistent way to express the return type is good.

The reason I added this is to enable this usage:

```
sample_df = df.filter(df.id == 1).toPandas()

def foo(df):
  ret = df  # some transformation on the input pd.DataFrame
  return ret

foo_udf = pandas_udf(foo, foo(sample_df).dtypes)

df.groupBy('id').apply(foo_udf)
```

The pattern is quite useful in interactive usage. Here the user no longer needs to specify the return schema of `foo` manually. And if the user changes the return columns of `foo`, they don't need to change the return type of `pandas_udf`.

I am leaning towards keeping this but I am willing to be convinced.




---




[GitHub] spark issue #13493: [SPARK-15750][MLLib][PYSPARK] Constructing FPGrowth fail...

2017-09-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13493
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82319/
Test FAILed.


---




[GitHub] spark issue #13493: [SPARK-15750][MLLib][PYSPARK] Constructing FPGrowth fail...

2017-09-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13493
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #13493: [SPARK-15750][MLLib][PYSPARK] Constructing FPGrowth fail...

2017-09-29 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13493
  
**[Test build #82319 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82319/testReport)**
 for PR 13493 at commit 
[`fbbcd26`](https://github.com/apache/spark/commit/fbbcd263c32a008873c7f080e5abadf1c01fa006).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #19390: [SPARK-18935][MESOS] Fix dynamic reservations on mesos

2017-09-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19390
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #19390: [SPARK-18935][MESOS] Fix dynamic reservations on mesos

2017-09-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19390
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82320/
Test PASSed.


---




[GitHub] spark issue #19390: [SPARK-18935][MESOS] Fix dynamic reservations on mesos

2017-09-29 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19390
  
**[Test build #82320 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82320/testReport)**
 for PR 19390 at commit 
[`2db3793`](https://github.com/apache/spark/commit/2db379316fb2155f8f1b658015df24c14bfd2aaf).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #19390: [SPARK-18935][MESOS] Fix dynamic reservations on mesos

2017-09-29 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19390
  
**[Test build #82318 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82318/testReport)**
 for PR 19390 at commit 
[`2fe6f70`](https://github.com/apache/spark/commit/2fe6f70b3410f08b1803ec9d4b6b7ee135316855).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #19390: [SPARK-18935][MESOS] Fix dynamic reservations on mesos

2017-09-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19390
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82318/
Test PASSed.


---




[GitHub] spark issue #19390: [SPARK-18935][MESOS] Fix dynamic reservations on mesos

2017-09-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19390
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #19390: [SPARK-18935][MESOS] Fix dynamic reservations on mesos

2017-09-29 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19390
  
**[Test build #82320 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82320/testReport)**
 for PR 19390 at commit 
[`2db3793`](https://github.com/apache/spark/commit/2db379316fb2155f8f1b658015df24c14bfd2aaf).


---




[GitHub] spark issue #13493: [SPARK-15750][MLLib][PYSPARK] Constructing FPGrowth fail...

2017-09-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13493
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #13493: [SPARK-15750][MLLib][PYSPARK] Constructing FPGrowth fail...

2017-09-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13493
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82315/
Test FAILed.


---




[GitHub] spark issue #13493: [SPARK-15750][MLLib][PYSPARK] Constructing FPGrowth fail...

2017-09-29 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13493
  
**[Test build #82315 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82315/testReport)**
 for PR 13493 at commit 
[`3575444`](https://github.com/apache/spark/commit/357544403c6f7d4fb165c68213e647bc442dce07).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #19390: [SPARK-18935][MESOS] Fix dynamic reservations on mesos

2017-09-29 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19390
  
**[Test build #82318 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82318/testReport)**
 for PR 19390 at commit 
[`2fe6f70`](https://github.com/apache/spark/commit/2fe6f70b3410f08b1803ec9d4b6b7ee135316855).


---




[GitHub] spark issue #13493: [SPARK-15750][MLLib][PYSPARK] Constructing FPGrowth fail...

2017-09-29 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13493
  
**[Test build #82319 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82319/testReport)**
 for PR 13493 at commit 
[`fbbcd26`](https://github.com/apache/spark/commit/fbbcd263c32a008873c7f080e5abadf1c01fa006).


---




[GitHub] spark issue #19390: [SPARK-18935][MESOS] Fix dynamic reservations on mesos

2017-09-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19390
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #19390: [SPARK-18935][MESOS] Fix dynamic reservations on mesos

2017-09-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19390
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82317/
Test PASSed.


---




[GitHub] spark issue #19390: [SPARK-18935][MESOS] Fix dynamic reservations on mesos

2017-09-29 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19390
  
**[Test build #82317 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82317/testReport)**
 for PR 19390 at commit 
[`28717d9`](https://github.com/apache/spark/commit/28717d9be1f6bcca541eeb8da0d4083db8465610).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark pull request #13493: [SPARK-15750][MLLib][PYSPARK] Constructing FPGrow...

2017-09-29 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request:

https://github.com/apache/spark/pull/13493#discussion_r141852693
  
--- Diff: python/pyspark/mllib/tests.py ---
@@ -1762,6 +1763,18 @@ def test_pca(self):
 self.assertEqualUpToSign(pcs.toArray()[:, k - 1], 
expected_pcs[:, k - 1])
 
 
+class FPGrowthTest(MLlibTestCase):
+
+def test_fpgrowth(self):
+data = [["a", "b", "c"], ["a", "b", "d", "e"], ["a", "c", "e"], 
["a", "c", "f"]]
+rdd = self.sc.parallelize(data, 2)
+model = FPGrowth.train(rdd, 0.6, 2)
--- End diff --

This line is useless.


---




[GitHub] spark issue #19390: [SPARK-18935][MESOS] Fix dynamic reservations on mesos

2017-09-29 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19390
  
**[Test build #82317 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82317/testReport)**
 for PR 19390 at commit 
[`28717d9`](https://github.com/apache/spark/commit/28717d9be1f6bcca541eeb8da0d4083db8465610).


---




[GitHub] spark pull request #19390: [SPARK-18935][MESOS] Fix dynamic reservations

2017-09-29 Thread skonto
GitHub user skonto opened a pull request:

https://github.com/apache/spark/pull/19390

[SPARK-18935][MESOS] Fix dynamic reservations

## What changes were proposed in this pull request?

- Solves the issue described in the ticket by preserving reservation and 
allocation info.
- Adds extra debug-level logging to make debugging easier in the future, e.g.:

> 17/09/29 14:53:07 DEBUG MesosCoarseGrainedSchedulerBackend: Accepting 
offer: f20de49b-dee3-45dd-a3c1-73418b7de891-O32 with attributes: Map() 
allocation info: role: "spark-prive"
>  reservation info: name: "ports"
> type: RANGES
> ranges {
>   range {
> begin: 31000
> end: 32000
>   }
> }
> role: "spark-prive"
> reservation {
>   principal: "test"
> }
> allocation_info {
>   role: "spark-prive"
> }

- Some style cleanup.

## How was this patch tested?

Manually by running the example in the ticket with and without a principal.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/skonto/spark fix_dynamic_reservation

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/19390.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #19390


commit 28717d9be1f6bcca541eeb8da0d4083db8465610
Author: Stavros Kontopoulos 
Date:   2017-09-29T11:49:23Z

fix dynamic reservations




---




[GitHub] spark issue #19263: [SPARK-22050][CORE] Allow BlockUpdated events to be opti...

2017-09-29 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19263
  
**[Test build #82316 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82316/testReport)**
 for PR 19263 at commit 
[`0bb4130`](https://github.com/apache/spark/commit/0bb4130337e5862d92eaf653fbffdd3ebff7f364).


---




[GitHub] spark issue #19344: [SPARK-22122][SQL] Use analyzed logical plans to count i...

2017-09-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19344
  
Merged build finished. Test FAILed.


---




[GitHub] spark pull request #19263: [SPARK-22050][CORE] Allow BlockUpdated events to ...

2017-09-29 Thread michaelmior
Github user michaelmior commented on a diff in the pull request:

https://github.com/apache/spark/pull/19263#discussion_r141849398
  
--- Diff: 
core/src/main/scala/org/apache/spark/scheduler/EventLoggingListener.scala ---
@@ -66,6 +67,7 @@ private[spark] class EventLoggingListener(
 
   private val shouldCompress = 
sparkConf.getBoolean("spark.eventLog.compress", false)
   private val shouldOverwrite = 
sparkConf.getBoolean("spark.eventLog.overwrite", false)
+  private val shouldLogBlockUpdates = 
sparkConf.getBoolean("spark.eventLog.blockUpdates", false)
--- End diff --

@vanzin The reason I didn't is that none of the other config settings nearby do. Would it make sense to change the other event log settings as part of this?


---




[GitHub] spark issue #19344: [SPARK-22122][SQL] Use analyzed logical plans to count i...

2017-09-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19344
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82314/
Test FAILed.


---




[GitHub] spark issue #19344: [SPARK-22122][SQL] Use analyzed logical plans to count i...

2017-09-29 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19344
  
**[Test build #82314 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82314/testReport)**
 for PR 19344 at commit 
[`5691cf6`](https://github.com/apache/spark/commit/5691cf6f7bda6732b7d610701c2485397a6a94b5).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #13493: [SPARK-15750][MLLib][PYSPARK] Constructing FPGrowth fail...

2017-09-29 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13493
  
**[Test build #82315 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82315/testReport)**
 for PR 13493 at commit 
[`3575444`](https://github.com/apache/spark/commit/357544403c6f7d4fb165c68213e647bc442dce07).


---




[GitHub] spark issue #13493: [SPARK-15750][MLLib][PYSPARK] Constructing FPGrowth fail...

2017-09-29 Thread WeichenXu123
Github user WeichenXu123 commented on the issue:

https://github.com/apache/spark/pull/13493
  
ping @zjffdu Looks like a reasonable fix, but please resolve the conflicts! Thanks!


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19287: [SPARK-22074][Core] Task killed by other attempt task sh...

2017-09-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19287
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19287: [SPARK-22074][Core] Task killed by other attempt task sh...

2017-09-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19287
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82313/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19287: [SPARK-22074][Core] Task killed by other attempt task sh...

2017-09-29 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19287
  
**[Test build #82313 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82313/testReport)**
 for PR 19287 at commit 
[`94fd257`](https://github.com/apache/spark/commit/94fd257c3a8da6ef4473eab72e826af57b10ed47).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19181: [SPARK-21907][CORE] oom during spill

2017-09-29 Thread juliuszsompolski
Github user juliuszsompolski commented on a diff in the pull request:

https://github.com/apache/spark/pull/19181#discussion_r141843276
  
--- Diff: 
core/src/test/java/org/apache/spark/util/collection/unsafe/sort/UnsafeInMemorySorterSuite.java
 ---
@@ -139,4 +139,44 @@ public int compare(
 }
 assertEquals(dataToSort.length, iterLength);
   }
+
+  @Test
+  public void freeAfterOOM() {
+final TestMemoryManager testMemoryManager = new TestMemoryManager(new 
SparkConf().set("spark.memory.offHeap.enabled", "false"));
+final TaskMemoryManager memoryManager = new TaskMemoryManager(
+testMemoryManager, 0);
+final TestMemoryConsumer consumer = new 
TestMemoryConsumer(memoryManager);
+final MemoryBlock dataPage = memoryManager.allocatePage(2048, 
consumer);
+final Object baseObject = dataPage.getBaseObject();
+// Write the records into the data page:
+long position = dataPage.getBaseOffset();
+
+final HashPartitioner hashPartitioner = new HashPartitioner(4);
+// Use integer comparison for comparing prefixes (which are partition 
ids, in this case)
+final PrefixComparator prefixComparator = PrefixComparators.LONG;
+final RecordComparator recordComparator = new RecordComparator() {
+  @Override
+  public int compare(
+  Object leftBaseObject,
+  long leftBaseOffset,
+  Object rightBaseObject,
+  long rightBaseOffset) {
+return 0;
+  }
+};
+UnsafeInMemorySorter sorter = new UnsafeInMemorySorter(consumer, 
memoryManager,
+recordComparator, prefixComparator, 100, shouldUseRadixSort());
+
+testMemoryManager.markExecutionAsOutOfMemoryOnce();
+try {
+  sorter.reset();
+} catch( OutOfMemoryError oom ) {
+  //as expected
+}
+// this currently fails on NPE at 
org.apache.spark.memory.MemoryConsumer.freeArray(MemoryConsumer.java:108)
+sorter.free();
+//simulate a 'back to back' free.
--- End diff --

nit: ws: `// simulate ...`


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19181: [SPARK-21907][CORE] oom during spill

2017-09-29 Thread juliuszsompolski
Github user juliuszsompolski commented on a diff in the pull request:

https://github.com/apache/spark/pull/19181#discussion_r141843447
  
--- Diff: 
core/src/test/scala/org/apache/spark/memory/TestMemoryManager.scala ---
@@ -58,11 +58,15 @@ class TestMemoryManager(conf: SparkConf)
 
   override def maxOffHeapStorageMemory: Long = 0L
 
-  private var oomOnce = false
+  private var conseqOOM = 0
   private var available = Long.MaxValue
 
   def markExecutionAsOutOfMemoryOnce(): Unit = {
-oomOnce = true
+markConseqOOM(1)
+  }
+
+  def markConseqOOM( n : Int) : Unit = {
--- End diff --

nit: markConsequentOOM


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19181: [SPARK-21907][CORE] oom during spill

2017-09-29 Thread juliuszsompolski
Github user juliuszsompolski commented on a diff in the pull request:

https://github.com/apache/spark/pull/19181#discussion_r141843414
  
--- Diff: 
core/src/test/scala/org/apache/spark/memory/TestMemoryManager.scala ---
@@ -58,11 +58,15 @@ class TestMemoryManager(conf: SparkConf)
 
   override def maxOffHeapStorageMemory: Long = 0L
 
-  private var oomOnce = false
+  private var conseqOOM = 0
--- End diff --

nit: conseq -> consequent


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19181: [SPARK-21907][CORE] oom during spill

2017-09-29 Thread juliuszsompolski
Github user juliuszsompolski commented on a diff in the pull request:

https://github.com/apache/spark/pull/19181#discussion_r141843193
  
--- Diff: 
core/src/test/java/org/apache/spark/util/collection/unsafe/sort/UnsafeInMemorySorterSuite.java
 ---
@@ -139,4 +139,44 @@ public int compare(
 }
 assertEquals(dataToSort.length, iterLength);
   }
+
+  @Test
+  public void freeAfterOOM() {
+final TestMemoryManager testMemoryManager = new TestMemoryManager(new 
SparkConf().set("spark.memory.offHeap.enabled", "false"));
+final TaskMemoryManager memoryManager = new TaskMemoryManager(
+testMemoryManager, 0);
+final TestMemoryConsumer consumer = new 
TestMemoryConsumer(memoryManager);
+final MemoryBlock dataPage = memoryManager.allocatePage(2048, 
consumer);
+final Object baseObject = dataPage.getBaseObject();
+// Write the records into the data page:
+long position = dataPage.getBaseOffset();
+
+final HashPartitioner hashPartitioner = new HashPartitioner(4);
+// Use integer comparison for comparing prefixes (which are partition 
ids, in this case)
+final PrefixComparator prefixComparator = PrefixComparators.LONG;
+final RecordComparator recordComparator = new RecordComparator() {
+  @Override
+  public int compare(
+  Object leftBaseObject,
+  long leftBaseOffset,
+  Object rightBaseObject,
+  long rightBaseOffset) {
+return 0;
+  }
+};
+UnsafeInMemorySorter sorter = new UnsafeInMemorySorter(consumer, 
memoryManager,
+recordComparator, prefixComparator, 100, shouldUseRadixSort());
+
+testMemoryManager.markExecutionAsOutOfMemoryOnce();
+try {
+  sorter.reset();
+} catch( OutOfMemoryError oom ) {
+  //as expected
+}
+// this currently fails on NPE at 
org.apache.spark.memory.MemoryConsumer.freeArray(MemoryConsumer.java:108)
--- End diff --

nit: tense: "this currently fails" -> "[SPARK-21907] this failed ..."
By the time anyone reads it, it will hopefully not fail :-)


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19181: [SPARK-21907][CORE] oom during spill

2017-09-29 Thread juliuszsompolski
Github user juliuszsompolski commented on a diff in the pull request:

https://github.com/apache/spark/pull/19181#discussion_r141842424
  
--- Diff: 
core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorter.java
 ---
@@ -85,7 +85,7 @@
   private final LinkedList spillWriters = new 
LinkedList<>();
 
   // These variables are reset after spilling:
-  @Nullable private volatile UnsafeInMemorySorter inMemSorter;
+  private @Nullable volatile UnsafeInMemorySorter inMemSorter;
--- End diff --

nit: unnecessary change.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19181: [SPARK-21907][CORE] oom during spill

2017-09-29 Thread juliuszsompolski
Github user juliuszsompolski commented on a diff in the pull request:

https://github.com/apache/spark/pull/19181#discussion_r141842918
  
--- Diff: 
core/src/test/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorterSuite.java
 ---
@@ -503,6 +511,39 @@ public void testGetIterator() throws Exception {
 verifyIntIterator(sorter.getIterator(279), 279, 300);
   }
 
+  @Test
+  public void testOOMDuringSpill() throws Exception {
+final UnsafeExternalSorter sorter = newSorter();
+for (int i = 0; sorter.hasSpaceForAnotherRecord(); ++i) {
+  insertNumber(sorter, i);
+}
+// todo: this might actually not be zero if pageSize is somehow 
configured differently,
+// so we actually have to compute the expected spill size according to 
the configured page size
+assertEquals(0,  sorter.getSpillSize());
--- End diff --

If this might actually not be zero, maybe don't test this assertion?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19181: [SPARK-21907][CORE] oom during spill

2017-09-29 Thread juliuszsompolski
Github user juliuszsompolski commented on a diff in the pull request:

https://github.com/apache/spark/pull/19181#discussion_r141842522
  
--- Diff: 
core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeInMemorySorter.java
 ---
@@ -162,14 +162,25 @@ private int getUsableCapacity() {
*/
   public void free() {
 if (consumer != null) {
-  consumer.freeArray(array);
+  if (null != array) {
--- End diff --

nit: RHS literal (array != null)


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19181: [SPARK-21907][CORE] oom during spill

2017-09-29 Thread juliuszsompolski
Github user juliuszsompolski commented on a diff in the pull request:

https://github.com/apache/spark/pull/19181#discussion_r141842730
  
--- Diff: 
core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeInMemorySorter.java
 ---
@@ -162,14 +162,25 @@ private int getUsableCapacity() {
*/
   public void free() {
 if (consumer != null) {
-  consumer.freeArray(array);
+  if (null != array) {
+consumer.freeArray(array);
+  }
   array = null;
 }
   }
 
   public void reset() {
 if (consumer != null) {
   consumer.freeArray(array);
+  // this is needed to prevent a 'nested' spill,
--- End diff --

nit: it doesn't prevent a nested spill, it only renders it harmless.
Remove this line - the rest of the comment is true.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19181: [SPARK-21907][CORE] oom during spill

2017-09-29 Thread juliuszsompolski
Github user juliuszsompolski commented on a diff in the pull request:

https://github.com/apache/spark/pull/19181#discussion_r141843914
  
--- Diff: 
core/src/test/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorterSuite.java
 ---
@@ -19,10 +19,18 @@
 
 import java.io.File;
 import java.io.IOException;
+import java.io.PrintWriter;
 import java.util.Arrays;
 import java.util.LinkedList;
 import java.util.UUID;
 
+import jodd.io.StringOutputStream;
+import org.hamcrest.Description;
+import org.hamcrest.Matcher;
+import org.hamcrest.Matchers;
+import org.hamcrest.TypeSafeMatcher;
+import org.junit.Rule;
+import org.junit.rules.ExpectedException;
--- End diff --

nit: I think you don't use most of these imports anymore.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19181: [SPARK-21907][CORE] oom during spill

2017-09-29 Thread juliuszsompolski
Github user juliuszsompolski commented on a diff in the pull request:

https://github.com/apache/spark/pull/19181#discussion_r141843227
  
--- Diff: 
core/src/test/java/org/apache/spark/util/collection/unsafe/sort/UnsafeInMemorySorterSuite.java
 ---
@@ -139,4 +139,44 @@ public int compare(
 }
 assertEquals(dataToSort.length, iterLength);
   }
+
+  @Test
+  public void freeAfterOOM() {
+final TestMemoryManager testMemoryManager = new TestMemoryManager(new 
SparkConf().set("spark.memory.offHeap.enabled", "false"));
+final TaskMemoryManager memoryManager = new TaskMemoryManager(
+testMemoryManager, 0);
+final TestMemoryConsumer consumer = new 
TestMemoryConsumer(memoryManager);
+final MemoryBlock dataPage = memoryManager.allocatePage(2048, 
consumer);
+final Object baseObject = dataPage.getBaseObject();
+// Write the records into the data page:
+long position = dataPage.getBaseOffset();
+
+final HashPartitioner hashPartitioner = new HashPartitioner(4);
+// Use integer comparison for comparing prefixes (which are partition 
ids, in this case)
+final PrefixComparator prefixComparator = PrefixComparators.LONG;
+final RecordComparator recordComparator = new RecordComparator() {
+  @Override
+  public int compare(
+  Object leftBaseObject,
+  long leftBaseOffset,
+  Object rightBaseObject,
+  long rightBaseOffset) {
+return 0;
+  }
+};
+UnsafeInMemorySorter sorter = new UnsafeInMemorySorter(consumer, 
memoryManager,
+recordComparator, prefixComparator, 100, shouldUseRadixSort());
+
+testMemoryManager.markExecutionAsOutOfMemoryOnce();
+try {
+  sorter.reset();
+} catch( OutOfMemoryError oom ) {
+  //as expected
--- End diff --

nit: ws: `// as expected`


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19181: [SPARK-21907][CORE] oom during spill

2017-09-29 Thread juliuszsompolski
Github user juliuszsompolski commented on a diff in the pull request:

https://github.com/apache/spark/pull/19181#discussion_r141843046
  
--- Diff: 
core/src/test/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorterSuite.java
 ---
@@ -503,6 +511,39 @@ public void testGetIterator() throws Exception {
 verifyIntIterator(sorter.getIterator(279), 279, 300);
   }
 
+  @Test
+  public void testOOMDuringSpill() throws Exception {
+final UnsafeExternalSorter sorter = newSorter();
+for (int i = 0; sorter.hasSpaceForAnotherRecord(); ++i) {
+  insertNumber(sorter, i);
+}
+// todo: this might actually not be zero if pageSize is somehow 
configured differently,
+// so we actually have to compute the expected spill size according to 
the configured page size
+assertEquals(0,  sorter.getSpillSize());
+// we expect the next insert to attempt growing the pointerssArray
+// first allocation is expected to fail, then a spill is triggered 
which attempts another allocation
+// which also fails and we expect to see this OOM here.
+// the original code messed with a released array within the spill code
+// and ended up with a failed assertion.
+// we also expect the location of the OOM to be 
org.apache.spark.util.collection.unsafe.sort.UnsafeInMemorySorter.reset
+memoryManager.markConseqOOM(2);
+OutOfMemoryError expectedOOM = null;
+try {
+  insertNumber(sorter, 1024);
+}
+// we expect an OutOfMemoryError here, anything else (i.e the original 
NPE is a failure)
+catch( OutOfMemoryError oom ){
+  expectedOOM = oom;
+}
+
+assertNotNull("expected OutOfMmoryError but it seems operation 
surprisingly succeeded"
+,expectedOOM);
+String oomStackTrace = Utils.exceptionString(expectedOOM);
+assertThat("expected OutOfMemoryError in 
org.apache.spark.util.collection.unsafe.sort.UnsafeInMemorySorter.reset"
+, oomStackTrace
+, 
Matchers.containsString("org.apache.spark.util.collection.unsafe.sort.UnsafeInMemorySorter.reset"));
--- End diff --

nit: move commas to end of line (3x)


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19181: [SPARK-21907][CORE] oom during spill

2017-09-29 Thread juliuszsompolski
Github user juliuszsompolski commented on a diff in the pull request:

https://github.com/apache/spark/pull/19181#discussion_r141842948
  
--- Diff: 
core/src/test/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorterSuite.java
 ---
@@ -503,6 +511,39 @@ public void testGetIterator() throws Exception {
 verifyIntIterator(sorter.getIterator(279), 279, 300);
   }
 
+  @Test
+  public void testOOMDuringSpill() throws Exception {
+final UnsafeExternalSorter sorter = newSorter();
+for (int i = 0; sorter.hasSpaceForAnotherRecord(); ++i) {
+  insertNumber(sorter, i);
+}
+// todo: this might actually not be zero if pageSize is somehow 
configured differently,
+// so we actually have to compute the expected spill size according to 
the configured page size
+assertEquals(0,  sorter.getSpillSize());
+// we expect the next insert to attempt growing the pointerssArray
+// first allocation is expected to fail, then a spill is triggered 
which attempts another allocation
+// which also fails and we expect to see this OOM here.
+// the original code messed with a released array within the spill code
+// and ended up with a failed assertion.
+// we also expect the location of the OOM to be 
org.apache.spark.util.collection.unsafe.sort.UnsafeInMemorySorter.reset
+memoryManager.markConseqOOM(2);
+OutOfMemoryError expectedOOM = null;
+try {
+  insertNumber(sorter, 1024);
+}
+// we expect an OutOfMemoryError here, anything else (i.e the original 
NPE is a failure)
+catch( OutOfMemoryError oom ){
--- End diff --

nit: ws: catch (OutOfMemoryError oom) {


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19181: [SPARK-21907][CORE] oom during spill

2017-09-29 Thread juliuszsompolski
Github user juliuszsompolski commented on a diff in the pull request:

https://github.com/apache/spark/pull/19181#discussion_r141843494
  
--- Diff: 
core/src/test/scala/org/apache/spark/memory/TestMemoryManager.scala ---
@@ -58,11 +58,15 @@ class TestMemoryManager(conf: SparkConf)
 
   override def maxOffHeapStorageMemory: Long = 0L
 
-  private var oomOnce = false
+  private var conseqOOM = 0
   private var available = Long.MaxValue
 
   def markExecutionAsOutOfMemoryOnce(): Unit = {
-oomOnce = true
+markConseqOOM(1)
+  }
+
+  def markConseqOOM( n : Int) : Unit = {
--- End diff --

nit: ws: `(n: Int): Unit`
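
Putting the two nits together (the rename plus the whitespace fix), the helper 
would read roughly as below; the body is an assumption, since the quoted diff 
cuts off before it:

```scala
// Illustrative fragment only, wrapped in a throwaway class so it compiles standalone.
class TestMemoryManagerSketch {
  private var consequentOOM = 0

  def markExecutionAsOutOfMemoryOnce(): Unit = markConsequentOOM(1)

  def markConsequentOOM(n: Int): Unit = {
    // Assumption: the real method just bumps the counter of consecutive forced OOMs.
    consequentOOM += n
  }
}
```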


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19290: [SPARK-22063][R] Upgrades lintr to latest commit sha1 ID

2017-09-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19290
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82312/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19290: [SPARK-22063][R] Upgrades lintr to latest commit sha1 ID

2017-09-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19290
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19290: [SPARK-22063][R] Upgrades lintr to latest commit sha1 ID

2017-09-29 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19290
  
**[Test build #82312 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82312/testReport)**
 for PR 19290 at commit 
[`550750e`](https://github.com/apache/spark/commit/550750eb264d5143007e4e9327dc250643af0f84).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19344: [SPARK-22122][SQL] Use analyzed logical plans to ...

2017-09-29 Thread maropu
Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/19344#discussion_r141836921
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/TPCDSQueryBenchmark.scala
 ---
@@ -66,24 +64,11 @@ object TPCDSQueryBenchmark extends Logging {
 classLoader = Thread.currentThread().getContextClassLoader)
 
   // This is an indirect hack to estimate the size of each query's 
input by traversing the
-  // logical plan and adding up the sizes of all tables that appear in 
the plan. Note that this
-  // currently doesn't take WITH subqueries into account which might 
lead to fairly inaccurate
-  // per-row processing time for those cases.
+  // logical plan and adding up the sizes of all tables that appear in 
the plan.
   val queryRelations = scala.collection.mutable.HashSet[String]()
-  spark.sql(queryString).queryExecution.logical.map {
-case UnresolvedRelation(t: TableIdentifier) =>
-  queryRelations.add(t.table)
-case lp: LogicalPlan =>
-  lp.expressions.foreach { _ foreach {
-case subquery: SubqueryExpression =>
-  subquery.plan.foreach {
-case UnresolvedRelation(t: TableIdentifier) =>
-  queryRelations.add(t.table)
-case _ =>
-  }
-case _ =>
-  }
-}
+  spark.sql(queryString).queryExecution.analyzed.map {
+case SubqueryAlias(name, _: LogicalRelation) =>
+  queryRelations.add(name)
--- End diff --

IIUC ditto; `HiveTableRelation` never happens here.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19344: [SPARK-22122][SQL] Use analyzed logical plans to ...

2017-09-29 Thread maropu
Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/19344#discussion_r141836639
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/TPCDSQueryBenchmark.scala
 ---
@@ -66,24 +64,11 @@ object TPCDSQueryBenchmark extends Logging {
 classLoader = Thread.currentThread().getContextClassLoader)
 
   // This is an indirect hack to estimate the size of each query's 
input by traversing the
-  // logical plan and adding up the sizes of all tables that appear in 
the plan. Note that this
-  // currently doesn't take WITH subqueries into account which might 
lead to fairly inaccurate
-  // per-row processing time for those cases.
+  // logical plan and adding up the sizes of all tables that appear in 
the plan.
   val queryRelations = scala.collection.mutable.HashSet[String]()
-  spark.sql(queryString).queryExecution.logical.map {
-case UnresolvedRelation(t: TableIdentifier) =>
-  queryRelations.add(t.table)
-case lp: LogicalPlan =>
-  lp.expressions.foreach { _ foreach {
-case subquery: SubqueryExpression =>
-  subquery.plan.foreach {
-case UnresolvedRelation(t: TableIdentifier) =>
-  queryRelations.add(t.table)
-case _ =>
-  }
-case _ =>
-  }
-}
+  spark.sql(queryString).queryExecution.analyzed.map {
+case SubqueryAlias(name, _: LogicalRelation) =>
--- End diff --

I checked again and found we can't use `catalogTable` here because these 
TPCDS tables are local temporary ones (IIUC these tables are always 
transformed into `SubqueryAlias`(`LocalRelation`)).
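
If that's the case, a sketch of the collection that handles the local 
temp-view shape as well as the file-based one could look roughly like this 
(`spark` and `queryString` as in the benchmark; the exact match arms are an 
assumption, not the final patch):

```scala
import org.apache.spark.sql.catalyst.plans.logical.{LocalRelation, SubqueryAlias}
import org.apache.spark.sql.execution.datasources.LogicalRelation

val queryRelations = scala.collection.mutable.HashSet[String]()
spark.sql(queryString).queryExecution.analyzed.foreach {
  // temp views backed by files (as in the diff above)
  case SubqueryAlias(name, _: LogicalRelation) => queryRelations.add(name)
  // temp views that resolve to in-memory relations
  case SubqueryAlias(name, _: LocalRelation) => queryRelations.add(name)
  case _ =>
}
```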


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19344: [SPARK-22122][SQL] Use analyzed logical plans to count i...

2017-09-29 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19344
  
**[Test build #82314 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82314/testReport)**
 for PR 19344 at commit 
[`5691cf6`](https://github.com/apache/spark/commit/5691cf6f7bda6732b7d610701c2485397a6a94b5).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19389: [SPARK-22165][SQL] Resolve type conflicts between decima...

2017-09-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19389
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19389: [SPARK-22165][SQL] Resolve type conflicts between decima...

2017-09-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19389
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82310/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19389: [SPARK-22165][SQL] Resolve type conflicts between decima...

2017-09-29 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19389
  
**[Test build #82310 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82310/testReport)**
 for PR 19389 at commit 
[`1e10336`](https://github.com/apache/spark/commit/1e10336128bf1e78a889ee4438e4519bb12bd84a).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19344: [SPARK-22122][SQL] Use analyzed logical plans to count i...

2017-09-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19344
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82311/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org


