[GitHub] spark pull request #16975: [SPARK-19522] Fix executor memory in local-cluste...

2017-02-22 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/16975#discussion_r102569401
  
--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
@@ -470,12 +470,25 @@ class SparkContext(config: SparkConf) extends Logging {
       files.foreach(addFile)
     }
 
-    _executorMemory = _conf.getOption("spark.executor.memory")
-      .orElse(Option(System.getenv("SPARK_EXECUTOR_MEMORY")))
-      .orElse(Option(System.getenv("SPARK_MEM"))
-      .map(warnSparkMem))
-      .map(Utils.memoryStringToMb)
-      .getOrElse(1024)
+    _executorMemory = {
+      val defaultMemory = 1024
+      val configuredMemory = _conf.getOption("spark.executor.memory")
+        .orElse(Option(System.getenv("SPARK_EXECUTOR_MEMORY")))
+        .orElse(Option(System.getenv("SPARK_MEM")).map(warnSparkMem))
+        .map(Utils.memoryStringToMb)
+      // In local-cluster mode, always use the slave memory specified in the master string
+      // In other modes, use the configured memory if it exists
+      master match {
+        case SparkMasterRegex.LOCAL_CLUSTER_REGEX(_, _, em) =>
+          if (configuredMemory.isDefined) {
--- End diff --

sure
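
The diff above is cut off by the archive. A sketch of how the new block plausibly continues, based on the visible comments (the warning text, the `em.toInt` conversion, and the fallback are assumptions, not the merged code):

```
      master match {
        // In local-cluster mode the per-slave memory from the master string wins
        case SparkMasterRegex.LOCAL_CLUSTER_REGEX(_, _, em) =>
          if (configuredMemory.isDefined) {
            // Assumed warning when both are set
            logWarning("Executor memory is configured in both the master " +
              "string and spark.executor.memory; using the master string value.")
          }
          em.toInt
        // In other modes, fall back to the configured memory or the 1024 MB default
        case _ => configuredMemory.getOrElse(defaultMemory)
      }
```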





[GitHub] spark pull request #16975: [SPARK-19522] Fix executor memory in local-cluste...

2017-02-17 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/16975#discussion_r101887609
  
--- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala ---
@@ -466,7 +466,7 @@ object SparkSubmit extends CommandLineUtils {
       // Other options
       OptionAssigner(args.executorCores, STANDALONE | YARN, ALL_DEPLOY_MODES,
         sysProp = "spark.executor.cores"),
-      OptionAssigner(args.executorMemory, STANDALONE | MESOS | YARN, ALL_DEPLOY_MODES,
+      OptionAssigner(args.executorMemory, ALL_CLUSTER_MGRS, ALL_DEPLOY_MODES,
--- End diff --

also we're talking about a net addition of 7 LOC in `SparkContext.scala`, 
about half of which are comments and warning logs. It's really not that much 
code.





[GitHub] spark pull request #16975: [SPARK-19522] Fix executor memory in local-cluste...

2017-02-17 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/16975#discussion_r101887589
  
--- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala ---
@@ -466,7 +466,7 @@ object SparkSubmit extends CommandLineUtils {
       // Other options
       OptionAssigner(args.executorCores, STANDALONE | YARN, ALL_DEPLOY_MODES,
         sysProp = "spark.executor.cores"),
-      OptionAssigner(args.executorMemory, STANDALONE | MESOS | YARN, ALL_DEPLOY_MODES,
+      OptionAssigner(args.executorMemory, ALL_CLUSTER_MGRS, ALL_DEPLOY_MODES,
--- End diff --

The inconsistency is already inherent in the parameters of `local-cluster[]`, so I'm not introducing it here with this change. I personally think it's a really bad interface to force the user to set executor memory in two different places and require that the two values match.





[GitHub] spark pull request #16975: [SPARK-19522] Fix executor memory in local-cluste...

2017-02-17 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/16975#discussion_r101857385
  
--- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala ---
@@ -466,7 +466,7 @@ object SparkSubmit extends CommandLineUtils {
       // Other options
       OptionAssigner(args.executorCores, STANDALONE | YARN, ALL_DEPLOY_MODES,
         sysProp = "spark.executor.cores"),
-      OptionAssigner(args.executorMemory, STANDALONE | MESOS | YARN, ALL_DEPLOY_MODES,
+      OptionAssigner(args.executorMemory, ALL_CLUSTER_MGRS, ALL_DEPLOY_MODES,
--- End diff --

```
You may, for whatever reason, want to run executors with less than that, 
which your change doesn't seem to allow.
```
Yeah, I thought about this long and hard, but I just couldn't come up with a case where you would possibly want the worker size to be different from the executor size in local-cluster mode. If you want to launch 5 workers (2GB), each with 2 executors (1GB), then you might as well just launch 10 executors (1GB) or run real standalone mode locally. I think it's better to fix the out-of-the-box case than to try to cover all potentially non-existent corner cases.





[GitHub] spark pull request #16975: [SPARK-19522] Fix executor memory in local-cluste...

2017-02-17 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/16975#discussion_r101857230
  
--- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala ---
@@ -466,7 +466,7 @@ object SparkSubmit extends CommandLineUtils {
       // Other options
       OptionAssigner(args.executorCores, STANDALONE | YARN, ALL_DEPLOY_MODES,
         sysProp = "spark.executor.cores"),
-      OptionAssigner(args.executorMemory, STANDALONE | MESOS | YARN, ALL_DEPLOY_MODES,
+      OptionAssigner(args.executorMemory, ALL_CLUSTER_MGRS, ALL_DEPLOY_MODES,
--- End diff --

If this were the only change, then specifying `local-cluster[2,1,2048]` wouldn't actually do anything, because we're not setting `spark.executor.memory` anywhere. You could do `--master local-cluster[2,1,2048] --conf spark.executor.memory=2g`, but that's cumbersome, and now there are two ways to set the executor memory.





[GitHub] spark pull request #16975: [SPARK-19522] Fix executor memory in local-cluste...

2017-02-17 Thread andrewor14
GitHub user andrewor14 opened a pull request:

https://github.com/apache/spark/pull/16975

[SPARK-19522] Fix executor memory in local-cluster mode

## What changes were proposed in this pull request?

```
bin/spark-shell --master local-cluster[2,1,2048]
```
is supposed to launch 2 executors, each with 2GB of memory. However, when I ran this on the master branch, I only got executors with 1GB of memory. This patch fixes that problem.

## How was this patch tested?

`SparkSubmitSuite`, manual tests.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/andrewor14/spark local-cluster-executor-mem

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/16975.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #16975


commit db3773c8aeb44e69610c17ff5bc169d7085333ee
Author: Andrew Or <andrewo...@gmail.com>
Date:   2017-02-08T23:27:07Z

Fix propagation of executor memory in local-cluster mode

commit 1a5bdfeca931a624b82933772c57c605e02dc58d
Author: Andrew Or <andrewo...@gmail.com>
Date:   2017-02-17T16:47:44Z

Log warning if memory is explicitly set

commit b1a13dc3b1a59593945f0fb6d13683ed81acf9b7
Author: Andrew Or <andrewo...@gmail.com>
Date:   2017-02-17T16:48:16Z

Merge branch 'master' of github.com:apache/spark into local-cluster-executor-mem







[GitHub] spark pull request #13899: [SPARK-16196][SQL] Codegen in-memory scan with Co...

2017-02-16 Thread andrewor14
Github user andrewor14 closed the pull request at:

https://github.com/apache/spark/pull/13899





[GitHub] spark issue #13899: [SPARK-16196][SQL] Codegen in-memory scan with ColumnarB...

2017-02-16 Thread andrewor14
Github user andrewor14 commented on the issue:

https://github.com/apache/spark/pull/13899
  
Closing for now; too many conflicts.





[GitHub] spark issue #16819: [SPARK-16441][YARN] Set maxNumExecutor depends on yarn c...

2017-02-06 Thread andrewor14
Github user andrewor14 commented on the issue:

https://github.com/apache/spark/pull/16819
  
I agree. Resource managers already expect applications to request more than what's available, so we don't have to duplicate that check ourselves in Spark.





[GitHub] spark issue #16823: [SPARK] Config methods simplification at SparkSession#Bu...

2017-02-06 Thread andrewor14
Github user andrewor14 commented on the issue:

https://github.com/apache/spark/pull/16823
  
This is a bad idea! First, it breaks backward compatibility; second, we intentionally didn't want to make it so general that the user can pass in arbitrary objects. Can you please close this PR?





[GitHub] spark pull request #15396: [SPARK-14804][Spark][Graphx] Fix checkpointing of...

2017-01-27 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/15396#discussion_r98218538
  
--- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala ---
@@ -1589,7 +1589,8 @@ abstract class RDD[T: ClassTag](
    * This is introduced as an alias for `isCheckpointed` to clarify the semantics of the
    * return value. Exposed for testing.
    */
-  private[spark] def isCheckpointedAndMaterialized: Boolean = isCheckpointed
+  private[spark] def isCheckpointedAndMaterialized: Boolean =
+    checkpointData.exists(_.isCheckpointed)
--- End diff --

looks good, but you forgot to make this final





[GitHub] spark issue #16081: [SPARK][EXAMPLE] Added missing semicolon in quick-start-...

2016-11-30 Thread andrewor14
Github user andrewor14 commented on the issue:

https://github.com/apache/spark/pull/16081
  
and 2.0





[GitHub] spark issue #16073: [SPARK-18640] Add synchronization to TaskScheduler.runni...

2016-11-30 Thread andrewor14
Github user andrewor14 commented on the issue:

https://github.com/apache/spark/pull/16073
  
and 2.0





[GitHub] spark issue #16073: [SPARK-18640] Add synchronization to TaskScheduler.runni...

2016-11-30 Thread andrewor14
Github user andrewor14 commented on the issue:

https://github.com/apache/spark/pull/16073
  
LGTM, merging into master and 2.1.





[GitHub] spark issue #16081: [SPARK][EXAMPLE] Added missing semicolon in quick-start-...

2016-11-30 Thread andrewor14
Github user andrewor14 commented on the issue:

https://github.com/apache/spark/pull/16081
  
OK, merging into master and 2.1.





[GitHub] spark issue #15462: [SPARK-17680] [SQL] [TEST] Added test cases for InMemory...

2016-11-28 Thread andrewor14
Github user andrewor14 commented on the issue:

https://github.com/apache/spark/pull/15462
  
@kiszk is there a JIRA associated specifically with adding tests for 
`InMemoryRelation`?





[GitHub] spark issue #15462: [SPARK-17680] [SQL] [TEST] Added test cases for InMemory...

2016-11-28 Thread andrewor14
Github user andrewor14 commented on the issue:

https://github.com/apache/spark/pull/15462
  
LGTM, merging into master and 2.1. Thanks.





[GitHub] spark issue #15993: [SPARK-18050][SQL] do not create default database if it ...

2016-11-23 Thread andrewor14
Github user andrewor14 commented on the issue:

https://github.com/apache/spark/pull/15993
  
Sounds good. Merging into master and 2.1.





[GitHub] spark pull request #15462: [SPARK-17680] [SQL] [TEST] Added test cases for I...

2016-11-22 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/15462#discussion_r89205894
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/columnar/InMemoryColumnarQuerySuite.scala ---
@@ -20,18 +20,83 @@ package org.apache.spark.sql.execution.columnar
 import java.nio.charset.StandardCharsets
 import java.sql.{Date, Timestamp}
 
-import org.apache.spark.sql.{QueryTest, Row}
+import org.apache.spark.sql.{DataFrame, QueryTest, Row}
 import org.apache.spark.sql.internal.SQLConf
 import org.apache.spark.sql.test.SharedSQLContext
 import org.apache.spark.sql.test.SQLTestData._
 import org.apache.spark.sql.types._
-import org.apache.spark.storage.StorageLevel.MEMORY_ONLY
+import org.apache.spark.storage.StorageLevel._
 
 class InMemoryColumnarQuerySuite extends QueryTest with SharedSQLContext {
   import testImplicits._
 
   setupTestData()
 
+  def cachePrimitiveTest(data: DataFrame, dataType: String) {
--- End diff --

minor: `private def`
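
That is, something like this (a sketch of the suggested visibility change, not the final code):

```
// Helpers used only inside the suite should be private and return Unit explicitly
private def cachePrimitiveTest(data: DataFrame, dataType: String): Unit = {
  // ... body unchanged
}
```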





[GitHub] spark issue #15462: [SPARK-17680] [SQL] [TEST] Added test cases for InMemory...

2016-11-22 Thread andrewor14
Github user andrewor14 commented on the issue:

https://github.com/apache/spark/pull/15462
  
retest this please





[GitHub] spark pull request #15462: [SPARK-17680] [SQL] [TEST] Added test cases for I...

2016-11-22 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/15462#discussion_r89205780
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/columnar/InMemoryColumnarQuerySuite.scala ---
@@ -58,6 +123,12 @@ class InMemoryColumnarQuerySuite extends QueryTest with SharedSQLContext {
     }.map(Row.fromTuple))
   }
 
+  test("access only some column of the all of columns") {
+    val df = spark.range(1, 100).map(i => (i, (i + 1).toFloat)).toDF("i", "f").cache
+    df.count
--- End diff --

please be explicit with the actions here `count()`, `cache()`
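
A minimal sketch of the suggested style, assuming the suite's `testImplicits` are in scope:

```
// Explicit parentheses mark cache() and count() as side-effecting actions
val df = spark.range(1, 100)
  .map(i => (i, (i + 1).toFloat))
  .toDF("i", "f")
  .cache()
df.count()
```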





[GitHub] spark pull request #15462: [SPARK-17680] [SQL] [TEST] Added test cases for I...

2016-11-22 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/15462#discussion_r89205541
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/columnar/InMemoryColumnarQuerySuite.scala ---
@@ -20,18 +20,83 @@ package org.apache.spark.sql.execution.columnar
 import java.nio.charset.StandardCharsets
 import java.sql.{Date, Timestamp}
 
-import org.apache.spark.sql.{QueryTest, Row}
+import org.apache.spark.sql.{DataFrame, QueryTest, Row}
 import org.apache.spark.sql.internal.SQLConf
 import org.apache.spark.sql.test.SharedSQLContext
 import org.apache.spark.sql.test.SQLTestData._
 import org.apache.spark.sql.types._
-import org.apache.spark.storage.StorageLevel.MEMORY_ONLY
+import org.apache.spark.storage.StorageLevel._
 
 class InMemoryColumnarQuerySuite extends QueryTest with SharedSQLContext {
   import testImplicits._
 
   setupTestData()
 
+  def cachePrimitiveTest(data: DataFrame, dataType: String) {
+    data.createOrReplaceTempView(s"testData$dataType")
+    val storageLevel = MEMORY_ONLY
+    val plan = spark.sessionState.executePlan(data.logicalPlan).sparkPlan
+    val inMemoryRelation = InMemoryRelation(useCompression = true, 5, storageLevel, plan, None)
+
+    assert(inMemoryRelation.cachedColumnBuffers.getStorageLevel == storageLevel)
+    inMemoryRelation.cachedColumnBuffers.collect().head match {
+      case _: CachedBatch => assert(true)
--- End diff --

no need to do assert true here
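
That is, the success branch can simply do nothing (a sketch):

```
inMemoryRelation.cachedColumnBuffers.collect().head match {
  case _: CachedBatch => // expected type; nothing to assert
  case other => fail(s"Unexpected cached batch type: ${other.getClass.getName}")
}
```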





[GitHub] spark pull request #15462: [SPARK-17680] [SQL] [TEST] Added test cases for I...

2016-11-22 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/15462#discussion_r89205861
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/columnar/InMemoryColumnarQuerySuite.scala ---
@@ -246,4 +317,59 @@ class InMemoryColumnarQuerySuite extends QueryTest with SharedSQLContext {
     assert(cached.batchStats.value === expectedAnswer.size * INT.defaultSize)
   }
 
+  test("access columns in CachedBatch without whole stage codegen") {
+    // whole stage codegen is not applied to a row with more than WHOLESTAGE_MAX_NUM_FIELDS fields
+    withSQLConf(SQLConf.WHOLESTAGE_MAX_NUM_FIELDS.key -> "2") {
+      val data = Seq(null, true, 1.toByte, 3.toShort, 7, 15.toLong,
+        31.25.toFloat, 63.75, new Date(127), new Timestamp(25500L), null)
+      val dataTypes = Seq(NullType, BooleanType, ByteType, ShortType, IntegerType, LongType,
+        FloatType, DoubleType, DateType, TimestampType, IntegerType)
+      val schemas = dataTypes.zipWithIndex.map { case (dataType, index) =>
+        StructField(s"col$index", dataType, true)
+      }
+      val rdd = sparkContext.makeRDD(Seq(Row.fromSeq(data)))
+      val df = spark.createDataFrame(rdd, StructType(schemas))
+      val row = df.persist.take(1).apply(0)
+      checkAnswer(df, row)
+    }
+
+    withSQLConf(SQLConf.WHOLESTAGE_MAX_NUM_FIELDS.key -> "2") {
--- End diff --

similarly can you split these into multiple smaller unit tests?





[GitHub] spark pull request #15462: [SPARK-17680] [SQL] [TEST] Added test cases for I...

2016-11-22 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/15462#discussion_r89205730
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/columnar/InMemoryColumnarQuerySuite.scala ---
@@ -20,18 +20,83 @@ package org.apache.spark.sql.execution.columnar
 import java.nio.charset.StandardCharsets
 import java.sql.{Date, Timestamp}
 
-import org.apache.spark.sql.{QueryTest, Row}
+import org.apache.spark.sql.{DataFrame, QueryTest, Row}
 import org.apache.spark.sql.internal.SQLConf
 import org.apache.spark.sql.test.SharedSQLContext
 import org.apache.spark.sql.test.SQLTestData._
 import org.apache.spark.sql.types._
-import org.apache.spark.storage.StorageLevel.MEMORY_ONLY
+import org.apache.spark.storage.StorageLevel._
 
 class InMemoryColumnarQuerySuite extends QueryTest with SharedSQLContext {
   import testImplicits._
 
   setupTestData()
 
+  def cachePrimitiveTest(data: DataFrame, dataType: String) {
+    data.createOrReplaceTempView(s"testData$dataType")
+    val storageLevel = MEMORY_ONLY
+    val plan = spark.sessionState.executePlan(data.logicalPlan).sparkPlan
+    val inMemoryRelation = InMemoryRelation(useCompression = true, 5, storageLevel, plan, None)
+
+    assert(inMemoryRelation.cachedColumnBuffers.getStorageLevel == storageLevel)
+    inMemoryRelation.cachedColumnBuffers.collect().head match {
+      case _: CachedBatch => assert(true)
+      case other => fail(s"Unexpected cached batch type: ${other.getClass.getName}")
+    }
+    checkAnswer(inMemoryRelation, data.collect().toSeq)
+  }
+
+  test("all data type w && w/o nullability") {
+    // all primitives
+    Seq(true, false).map { nullability =>
--- End diff --

can you split these into 2 separate tests? It's more debuggable that way. I would use a private helper method to abstract the logic.
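
A sketch of the suggested structure (the helper name is illustrative, not from the PR):

```
private def testAllPrimitiveTypes(nullable: Boolean): Unit = {
  // build a DataFrame covering every primitive type with the given
  // nullability and run it through cachePrimitiveTest
}

test("all data types, nullable") { testAllPrimitiveTypes(nullable = true) }
test("all data types, non-nullable") { testAllPrimitiveTypes(nullable = false) }
```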





[GitHub] spark issue #15978: [SPARK-18507][SQL] HiveExternalCatalog.listPartitions sh...

2016-11-22 Thread andrewor14
Github user andrewor14 commented on the issue:

https://github.com/apache/spark/pull/15978
  
(Oops never mind, not my fault! :p)





[GitHub] spark issue #15978: [SPARK-18507][SQL] HiveExternalCatalog.listPartitions sh...

2016-11-22 Thread andrewor14
Github user andrewor14 commented on the issue:

https://github.com/apache/spark/pull/15978
  
@cloud-fan can you make a patch for 2.0?





[GitHub] spark issue #15978: [SPARK-18507][SQL] HiveExternalCatalog.listPartitions sh...

2016-11-22 Thread andrewor14
Github user andrewor14 commented on the issue:

https://github.com/apache/spark/pull/15978
  
Oops, that was my fault. Thanks, merging into master and 2.1.





[GitHub] spark issue #15811: [SPARK-18361] [PySpark] Expose RDD localCheckpoint in Py...

2016-11-22 Thread andrewor14
Github user andrewor14 commented on the issue:

https://github.com/apache/spark/pull/15811
  
I did





[GitHub] spark issue #15896: [SPARK-18465] Uncache table shouldn't throw an exception...

2016-11-21 Thread andrewor14
Github user andrewor14 commented on the issue:

https://github.com/apache/spark/pull/15896
  
I personally think `UNCACHE TABLE IF EXISTS` is best. It preserves the old behavior but lets the user make sure a table is not cached if they really want to.
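
Under the proposed syntax, usage would look something like this (a sketch; `logs` is a hypothetical table name):

```
// Succeeds whether or not the table is currently cached
spark.sql("UNCACHE TABLE IF EXISTS logs")
// Without IF EXISTS, uncaching a table that was never cached is the case
// that currently throws the exception under discussion
spark.sql("UNCACHE TABLE logs")
```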





[GitHub] spark issue #15953: [SPARK-18517][SQL] DROP TABLE IF EXISTS should not warn ...

2016-11-21 Thread andrewor14
Github user andrewor14 commented on the issue:

https://github.com/apache/spark/pull/15953
  
Makes sense. This LGTM, merging into master and 2.0. Thanks.





[GitHub] spark issue #15811: [SPARK-18361] [PySpark] Expose RDD localCheckpoint in Py...

2016-11-21 Thread andrewor14
Github user andrewor14 commented on the issue:

https://github.com/apache/spark/pull/15811
  
LGTM, merging into master. Thanks.





[GitHub] spark issue #15811: [SPARK-18361] [PySpark] Expose RDD localCheckpoint in Py...

2016-11-15 Thread andrewor14
Github user andrewor14 commented on the issue:

https://github.com/apache/spark/pull/15811
  
Looks good, just one question.





[GitHub] spark pull request #15811: [SPARK-18361] [PySpark] Expose RDD localCheckpoin...

2016-11-15 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/15811#discussion_r88115031
  
--- Diff: python/pyspark/rdd.py ---
@@ -181,6 +181,7 @@ def __init__(self, jrdd, ctx, jrdd_deserializer=AutoBatchedSerializer(PickleSeri
         self._jrdd = jrdd
         self.is_cached = False
         self.is_checkpointed = False
+        self.is_locally_checkpointed = False
--- End diff --

is this read anywhere?





[GitHub] spark issue #15811: [SPARK-18361] [PySpark] Expose RDD localCheckpoint in Py...

2016-11-15 Thread andrewor14
Github user andrewor14 commented on the issue:

https://github.com/apache/spark/pull/15811
  
ok to test





[GitHub] spark pull request #15833: [SPARK-18353][CORE] spark.rpc.askTimeout defalut ...

2016-11-15 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/15833#discussion_r88113719
  
--- Diff: core/src/main/scala/org/apache/spark/deploy/Client.scala ---
@@ -221,7 +221,9 @@ object Client {
 val conf = new SparkConf()
 val driverArgs = new ClientArguments(args)
 
-conf.set("spark.rpc.askTimeout", "10")
+if (!conf.contains("spark.rpc.askTimeout")) {
--- End diff --

also we should probably add this to `config/package.scala` so we don't 
duplicate it
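
A sketch of what that might look like in `config/package.scala` (the constant name and builder chain are assumptions, not code from this PR):

```
// Hypothetical entry in org.apache.spark.internal.config
private[spark] val RPC_ASK_TIMEOUT = ConfigBuilder("spark.rpc.askTimeout")
  .stringConf
  .createOptional
```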





[GitHub] spark issue #15766: [SPARK-18271][SQL]hash udf in HiveSessionCatalog.hiveFun...

2016-11-04 Thread andrewor14
Github user andrewor14 commented on the issue:

https://github.com/apache/spark/pull/15766
  
retest this please





[GitHub] spark issue #15756: [SPARK-18256] Improve the performance of event log repla...

2016-11-04 Thread andrewor14
Github user andrewor14 commented on the issue:

https://github.com/apache/spark/pull/15756
  
retest this please





[GitHub] spark issue #15739: [SPARK-16808][Core] prepend base URI for links on main h...

2016-11-03 Thread andrewor14
Github user andrewor14 commented on the issue:

https://github.com/apache/spark/pull/15739
  
Also there's another patch trying to solve the same issue: #15742





[GitHub] spark issue #15756: [SPARK-18256] Improve the performance of event log repla...

2016-11-03 Thread andrewor14
Github user andrewor14 commented on the issue:

https://github.com/apache/spark/pull/15756
  
LGTM. That's a massive amount of time spent in `Class.getSimpleName`!





[GitHub] spark pull request #15756: [SPARK-18256] Improve the performance of event lo...

2016-11-03 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/15756#discussion_r86422590
  
--- Diff: core/src/main/scala/org/apache/spark/util/JsonProtocol.scala ---
@@ -540,7 +544,8 @@ private[spark] object JsonProtocol {
 
   def taskStartFromJson(json: JValue): SparkListenerTaskStart = {
     val stageId = (json \ "Stage ID").extract[Int]
-    val stageAttemptId = (json \ "Stage Attempt ID").extractOpt[Int].getOrElse(0)
+    val stageAttemptId =
+      Utils.jsonOption(json \ "Stage Attempt ID").map(_.extract[Int]).getOrElse(0)
--- End diff --

if so maybe we should add a scalastyle to ban it





[GitHub] spark pull request #15756: [SPARK-18256] Improve the performance of event lo...

2016-11-03 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/15756#discussion_r86422652
  
--- Diff: core/src/main/scala/org/apache/spark/util/JsonProtocol.scala ---
@@ -540,7 +544,8 @@ private[spark] object JsonProtocol {
 
   def taskStartFromJson(json: JValue): SparkListenerTaskStart = {
     val stageId = (json \ "Stage ID").extract[Int]
-    val stageAttemptId = (json \ "Stage Attempt ID").extractOpt[Int].getOrElse(0)
+    val stageAttemptId =
+      Utils.jsonOption(json \ "Stage Attempt ID").map(_.extract[Int]).getOrElse(0)
--- End diff --

oh wait you already did 😄 





[GitHub] spark pull request #15756: [SPARK-18256] Improve the performance of event lo...

2016-11-03 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/15756#discussion_r86422521
  
--- Diff: core/src/main/scala/org/apache/spark/util/JsonProtocol.scala ---
@@ -540,7 +544,8 @@ private[spark] object JsonProtocol {
 
   def taskStartFromJson(json: JValue): SparkListenerTaskStart = {
     val stageId = (json \ "Stage ID").extract[Int]
-    val stageAttemptId = (json \ "Stage Attempt ID").extractOpt[Int].getOrElse(0)
+    val stageAttemptId =
+      Utils.jsonOption(json \ "Stage Attempt ID").map(_.extract[Int]).getOrElse(0)
--- End diff --

what's the difference? Is `extractOpt` really slow?
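
For context, `Utils.jsonOption` is a plain pattern match, roughly like this (a sketch from memory of the Spark helper, not quoted from this PR):

```
import org.json4s.JsonAST.{JNothing, JValue}

def jsonOption(json: JValue): Option[JValue] = {
  json match {
    case JNothing => None  // absent field: no reflection, no exception
    case value: JValue => Some(value)
  }
}
```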





[GitHub] spark issue #15739: [SPARK-16808][Core] prepend base URI for links on main h...

2016-11-03 Thread andrewor14
Github user andrewor14 commented on the issue:

https://github.com/apache/spark/pull/15739
  
ok to test @vanzin





[GitHub] spark issue #15698: [SPARK-18182] Expose ReplayListenerBus.read() overload w...

2016-11-01 Thread andrewor14
Github user andrewor14 commented on the issue:

https://github.com/apache/spark/pull/15698
  
LGTM retest this please





[GitHub] spark issue #15410: [SPARK-17843][Web UI] Indicate event logs pending for pr...

2016-10-14 Thread andrewor14
Github user andrewor14 commented on the issue:

https://github.com/apache/spark/pull/15410
  
We shouldn't display file names but we should display application names and 
IDs, something the user understands. We don't have to do that as part of this 
issue.





[GitHub] spark issue #15458: [SPARK-17899][SQL] add a debug mode to keep raw table pr...

2016-10-14 Thread andrewor14
Github user andrewor14 commented on the issue:

https://github.com/apache/spark/pull/15458
  
I see. Then maybe we should add a comment above the config to note that 
several commands don't work (e.g. ALTER TABLE) if this is turned on, even if 
it's only internal.





[GitHub] spark issue #15458: [SPARK-17899][SQL] add a debug mode to keep raw table pr...

2016-10-13 Thread andrewor14
Github user andrewor14 commented on the issue:

https://github.com/apache/spark/pull/15458
  
Yes, that's why it's `internal`.





[GitHub] spark issue #15456: [SPARK-17686][Core] Support printing out scala and java ...

2016-10-13 Thread andrewor14
Github user andrewor14 commented on the issue:

https://github.com/apache/spark/pull/15456
  
Merging into master





[GitHub] spark issue #15458: [SPARK-17899][SQL] add a debug mode to keep raw table pr...

2016-10-13 Thread andrewor14
Github user andrewor14 commented on the issue:

https://github.com/apache/spark/pull/15458
  
JK, actually it doesn't merge cleanly into 2.0.





[GitHub] spark issue #15458: [SPARK-17899][SQL] add a debug mode to keep raw table pr...

2016-10-13 Thread andrewor14
Github user andrewor14 commented on the issue:

https://github.com/apache/spark/pull/15458
  
Cool beans. Merging into master and 2.0.





[GitHub] spark issue #15400: [SPARK-11272] [Web UI] Add support for downloading event...

2016-10-13 Thread andrewor14
Github user andrewor14 commented on the issue:

https://github.com/apache/spark/pull/15400
  
This one LGTM; I'm merging it into master. Thanks for working on this.





[GitHub] spark issue #15400: [SPARK-11272] [Web UI] Add support for downloading event...

2016-10-13 Thread andrewor14
Github user andrewor14 commented on the issue:

https://github.com/apache/spark/pull/15400
  
Usually we retest the PR if it's been a few days since it last ran tests. 
We have had build breaks before where we merged a PR that passed tests a long 
time ago.





[GitHub] spark issue #15410: [SPARK-17843][Web UI] Indicate event logs pending for pr...

2016-10-12 Thread andrewor14
Github user andrewor14 commented on the issue:

https://github.com/apache/spark/pull/15410
  
ok to test

I think the idea is good, but it would be a better UX if we displayed the pending applications as rows in the existing table (or a new one) and indicated there that they're still being processed. It might be a bigger code change, but I think it's worth it.





[GitHub] spark pull request #15410: [SPARK-17843][Web UI] Indicate event logs pending...

2016-10-12 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/15410#discussion_r83044768
  
--- Diff: core/src/main/scala/org/apache/spark/deploy/history/HistoryPage.scala ---
@@ -38,6 +39,13 @@ private[history] class HistoryPage(parent: HistoryServer) extends WebUIPage("")
       {providerConfig.map { case (k, v) => {k}: {v} }}
 
     {
+      if (eventLogsUnderProcessCount > 0) {
+        There are {eventLogsUnderProcessCount} event log(s) currently being
+        processed which may result in additional applications getting listed on this page.
--- End diff --

Instead of just telling the user how many logs are pending, it would be a lot more useful to tell them which ones are pending, so they can wait before clicking on a specific application.





[GitHub] spark issue #15396: [SPARK-14804][Spark][Graphx] Fix checkpointing of Vertex...

2016-10-12 Thread andrewor14
Github user andrewor14 commented on the issue:

https://github.com/apache/spark/pull/15396
  
Looks good. I left a suggestion that I think will make the code cleaner.





[GitHub] spark pull request #15396: [SPARK-14804][Spark][Graphx] Fix checkpointing of...

2016-10-12 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/15396#discussion_r83043442
  
--- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala ---
@@ -1589,7 +1589,8 @@ abstract class RDD[T: ClassTag](
    * This is introduced as an alias for `isCheckpointed` to clarify the semantics of the
    * return value. Exposed for testing.
    */
-  private[spark] def isCheckpointedAndMaterialized: Boolean = isCheckpointed
+  private[spark] def isCheckpointedAndMaterialized: Boolean =
+    checkpointData.exists(_.isCheckpointed)
--- End diff --

One way to fix the duplicate code is to make this one final and have 
`isCheckpointed` call this one, then implementations of RDD can feel free to 
override `isCheckpointed` without affecting our internal implementation.
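
Concretely, the suggestion amounts to something like this (a sketch, not the merged code):

```
// The internal alias is final, so its semantics cannot change under us
private[spark] final def isCheckpointedAndMaterialized: Boolean =
  checkpointData.exists(_.isCheckpointed)

// The public method delegates to it; subclasses (e.g. VertexRDD) can
// override isCheckpointed without affecting internal call sites
def isCheckpointed: Boolean = isCheckpointedAndMaterialized
```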





[GitHub] spark pull request #15396: [SPARK-14804][Spark][Graphx] Fix checkpointing of...

2016-10-12 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/15396#discussion_r83042522
  
--- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala ---
@@ -1589,7 +1589,8 @@ abstract class RDD[T: ClassTag](
    * This is introduced as an alias for `isCheckpointed` to clarify the semantics of the
--- End diff --

oh wait I just realized this is exactly the same as `isCheckpointed`. It's 
kind of confusing why we have two methods that do the same thing. Is it because 
VertexRDD or something overrides one but not the other?





[GitHub] spark issue #15400: [SPARK-11272] [Web UI] Add support for downloading event...

2016-10-12 Thread andrewor14
Github user andrewor14 commented on the issue:

https://github.com/apache/spark/pull/15400
  
retest this please





[GitHub] spark issue #15353: [SPARK-17724][WebUI][Streaming] Unevaluated new lines in...

2016-10-10 Thread andrewor14
Github user andrewor14 commented on the issue:

https://github.com/apache/spark/pull/15353
  
@keypointt by "working" I mean it should be replaced by a line break, not a 
space





[GitHub] spark issue #15405: [SPARK-15917][CORE] Added support for number of executor...

2016-10-10 Thread andrewor14
Github user andrewor14 commented on the issue:

https://github.com/apache/spark/pull/15405
  
Thanks for working on this. It's great to see how small the patch turned 
out to be!





[GitHub] spark pull request #15405: [SPARK-15917][CORE] Added support for number of e...

2016-10-10 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/15405#discussion_r82671287
  
--- Diff: core/src/main/scala/org/apache/spark/deploy/master/Master.scala ---
@@ -637,6 +637,16 @@ private[deploy] class Master(
       }
       freeWorkers = freeWorkers.filter(canLaunchExecutor)
     }
+
+    val numExecutorsScheduled = assignedExecutors.sum
+    val numExecutorsLaunched = app.executors.size
+    // Check to see if we managed to launch the requested number of executors
+    if(numUsable != 0 && numExecutorsLaunched != app.executorLimit &&
+      numExecutorsScheduled != app.executorLimit) {
--- End diff --

Another thing is, how noisy is this? Do we log this if dynamic allocation 
is turned on (we shouldn't)?
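
For illustration only, a sketch of how the warning could be silenced under 
dynamic allocation — `maybeWarnUnderProvisioned` is a hypothetical helper; 
only the `spark.dynamicAllocation.enabled` key is real:

```
import org.apache.spark.SparkConf

// Hypothetical guard: skip the warning when dynamic allocation is on,
// since the executor count is expected to fluctuate by design there.
def maybeWarnUnderProvisioned(conf: SparkConf, requested: Int, launched: Int): Unit = {
  val dynamicAllocation = conf.getBoolean("spark.dynamicAllocation.enabled", false)
  if (!dynamicAllocation && launched < requested) {
    println(s"Only launched $launched of $requested requested executors.")
  }
}
```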





[GitHub] spark issue #15405: [SPARK-15917][CORE] Added support for number of executor...

2016-10-10 Thread andrewor14
Github user andrewor14 commented on the issue:

https://github.com/apache/spark/pull/15405
  
add to whitelist





[GitHub] spark issue #15347: [SPARK-16827] Stop reporting spill metrics as shuffle me...

2016-10-07 Thread andrewor14
Github user andrewor14 commented on the issue:

https://github.com/apache/spark/pull/15347
  
OK, this change by itself LGTM. @dafrista would you mind creating a 
separate JIRA (or pointing me to an existing one) for the TODO then? Merging 
this into master.





[GitHub] spark issue #15353: [SPARK-17724][WebUI][Streaming] Unevaluated new lines in...

2016-10-05 Thread andrewor14
Github user andrewor14 commented on the issue:

https://github.com/apache/spark/pull/15353
  
But this isn't the original intention, which is to actually add a line 
break where `\n` is today. IIRC this works correctly on Chrome but not on 
Safari (or the other way round?). If you can make it work across all browsers 
then I will merge it. :)





[GitHub] spark issue #15353: [SPARK-17724][WebUI][Streaming] Unevaluated new lines in...

2016-10-05 Thread andrewor14
Github user andrewor14 commented on the issue:

https://github.com/apache/spark/pull/15353
  
Also this is a more general problem, not just for streaming





[GitHub] spark pull request #15347: [SPARK-16827] Stop reporting spill metrics as shu...

2016-10-05 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/15347#discussion_r82042420
  
--- Diff: core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorter.java ---
@@ -145,7 +145,9 @@ private UnsafeExternalSorter(
     // Use getSizeAsKb (not bytes) to maintain backwards compatibility for units
     // this.fileBufferSizeBytes = (int) conf.getSizeAsKb("spark.shuffle.file.buffer", "32k") * 1024;
     this.fileBufferSizeBytes = 32 * 1024;
-    this.writeMetrics = taskContext.taskMetrics().shuffleWriteMetrics();
+    // The spill metrics are stored in a new ShuffleWriteMetrics, and then discarded (this fixes SPARK-16827).
+    // TODO: Instead, separate spill metrics should be stored and reported (tracked in SPARK-3577).
+    this.writeMetrics = new ShuffleWriteMetrics();
--- End diff --

so this isn't actually used anywhere right now? Is that what the TODO is 
about?





[GitHub] spark issue #15350: [SPARK-17778][Tests]Mock SparkContext to reduce memory u...

2016-10-05 Thread andrewor14
Github user andrewor14 commented on the issue:

https://github.com/apache/spark/pull/15350
  
I think that's OK. This is supposed to be a unit test for the BlockManager, 
not a test of how the BlockManager interacts with the rest of the system. LGTM
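
As a rough sketch of the mocking approach under discussion — assuming Mockito, 
which Spark's test code already uses; the actual suite wiring differs:

```
import org.mockito.Mockito.{mock, when}
import org.apache.spark.{SparkConf, SparkContext}

// A mocked SparkContext answers only what the test needs, so the suite
// avoids the memory cost of a real driver per test case.
val conf = new SparkConf(false).set("spark.app.name", "block-manager-test")
val sc = mock(classOf[SparkContext])
when(sc.getConf).thenReturn(conf)
```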





[GitHub] spark issue #15290: [SPARK-17715] [Scheduler] Make task launch logs DEBUG

2016-09-29 Thread andrewor14
Github user andrewor14 commented on the issue:

https://github.com/apache/spark/pull/15290
  
Merging into master





[GitHub] spark issue #15290: [SPARK-17715] [Scheduler] Make task launch logs DEBUG

2016-09-29 Thread andrewor14
Github user andrewor14 commented on the issue:

https://github.com/apache/spark/pull/15290
  
ok to test





[GitHub] spark pull request #15247: [SPARK-17672] Spark 2.0 history server web Ui tak...

2016-09-29 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/15247#discussion_r81218774
  
--- Diff: core/src/main/scala/org/apache/spark/deploy/history/ApplicationHistoryProvider.scala ---
@@ -109,4 +109,11 @@ private[history] abstract class ApplicationHistoryProvider {
   @throws(classOf[SparkException])
   def writeEventLogs(appId: String, attemptId: Option[String], zipStream: ZipOutputStream): Unit
 
+  /**
+   * Returns an ApplicationHistoryInfo for the appId.
+   *
+   * @return ApplicationHistoryInfo of one appId if exists.
--- End diff --

This is kind of verbose. I would just say `@return the 
[[ApplicationHistoryInfo]] for the appId if it exists`





[GitHub] spark pull request #15247: [SPARK-17672] Spark 2.0 history server web Ui tak...

2016-09-29 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/15247#discussion_r81219143
  
--- Diff: core/src/main/scala/org/apache/spark/deploy/history/ApplicationHistoryProvider.scala ---
@@ -109,4 +109,11 @@ private[history] abstract class ApplicationHistoryProvider {
   @throws(classOf[SparkException])
   def writeEventLogs(appId: String, attemptId: Option[String], zipStream: ZipOutputStream): Unit
 
+  /**
+   * Returns an ApplicationHistoryInfo for the appId.
+   *
+   * @return ApplicationHistoryInfo of one appId if exists.
--- End diff --

I'll fix this myself when I merge





[GitHub] spark issue #15247: [SPARK-17672] Spark 2.0 history server web Ui takes too ...

2016-09-29 Thread andrewor14
Github user andrewor14 commented on the issue:

https://github.com/apache/spark/pull/15247
  
LGTM, merging into master and 2.0, thanks.





[GitHub] spark pull request #15247: [SPARK-17672] Spark 2.0 history server web Ui tak...

2016-09-29 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/15247#discussion_r81219049
  
--- Diff: core/src/main/scala/org/apache/spark/status/api/v1/ApiRootResource.scala ---
@@ -222,6 +222,7 @@ private[spark] object ApiRootResource {
 private[spark] trait UIRoot {
   def getSparkUI(appKey: String): Option[SparkUI]
   def getApplicationInfoList: Iterator[ApplicationInfo]
+  def getApplicationInfo(appId: String): Option[ApplicationInfo]
--- End diff --

also this





[GitHub] spark pull request #15247: [SPARK-17672] Spark 2.0 history server web Ui tak...

2016-09-29 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/15247#discussion_r81218992
  
--- Diff: core/src/main/scala/org/apache/spark/deploy/history/HistoryServer.scala ---
@@ -182,6 +182,10 @@ class HistoryServer(
     getApplicationList().iterator.map(ApplicationsListResource.appHistoryInfoToPublicAppInfo)
   }
 
+  def getApplicationInfo(appId: String): Option[ApplicationInfo] = {
--- End diff --

I believe this is a public API, but it's a reasonable one





[GitHub] spark issue #15295: [SPARK-17720][SQL] introduce global SQL conf

2016-09-29 Thread andrewor14
Github user andrewor14 commented on the issue:

https://github.com/apache/spark/pull/15295
  
retest this please





[GitHub] spark issue #15221: [SPARK-17648][CORE] TaskScheduler really needs offers to...

2016-09-29 Thread andrewor14
Github user andrewor14 commented on the issue:

https://github.com/apache/spark/pull/15221
  
This looks reasonable. Merging into master. I will leave it out of 
branch-2.0 just in case.





[GitHub] spark pull request #15295: [SPARK-17720][SQL] introduce global SQL conf

2016-09-29 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/15295#discussion_r81150854
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala ---
@@ -791,7 +791,7 @@ object SparkSession {
       // Get the session from current thread's active session.
       var session = activeThreadSession.get()
       if ((session ne null) && !session.sparkContext.isStopped) {
-        options.foreach { case (k, v) => session.conf.set(k, v) }
+        options.foreach { case (k, v) => session.sessionState.conf.setConfString(k, v) }
--- End diff --

is this change related?





[GitHub] spark issue #15295: [SPARK-17720][SQL] introduce global SQL conf

2016-09-29 Thread andrewor14
Github user andrewor14 commented on the issue:

https://github.com/apache/spark/pull/15295
  
LGTM. Pretty straightforward.





[GitHub] spark issue #15244: Update spark-standalone.md to fix link

2016-09-26 Thread andrewor14
Github user andrewor14 commented on the issue:

https://github.com/apache/spark/pull/15244
  
Thanks, merged into master, 2.0, and 1.6.





[GitHub] spark pull request #15243: Fixing comment since Actor is not used anymore.

2016-09-26 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/15243#discussion_r80478992
  
--- Diff: core/src/main/scala/org/apache/spark/deploy/worker/WorkerWatcher.scala ---
@@ -21,7 +21,7 @@ import org.apache.spark.internal.Logging
 import org.apache.spark.rpc._
 
 /**
- * Actor which connects to a worker process and terminates the JVM if the connection is severed.
+ * WorkerWatcher RpcEndpoint which connects to a worker process and terminates the JVM if the connection is severed.
--- End diff --

yeah, just say `Endpoint`





[GitHub] spark issue #15137: [SPARK-17512][Core] Avoid formatting to python path for ...

2016-09-22 Thread andrewor14
Github user andrewor14 commented on the issue:

https://github.com/apache/spark/pull/15137
  
Got it. I think back when I wrote that comment we didn't support cluster 
mode with Python yet. It's just a little confusing when reading that comment 
in isolation.





[GitHub] spark issue #15137: [SPARK-17512][Core] Avoid formatting to python path for ...

2016-09-21 Thread andrewor14
Github user andrewor14 commented on the issue:

https://github.com/apache/spark/pull/15137
  
I've merged it. One more thing, would you mind correcting the comment in 
`PythonRunner#formatPath`? Right now it says "we currently only support local 
python files", which is apparently not true!





[GitHub] spark issue #15137: [SPARK-17512][Core] Avoid formatting to python path for ...

2016-09-21 Thread andrewor14
Github user andrewor14 commented on the issue:

https://github.com/apache/spark/pull/15137
  
Good catch. I'm merging this into master, 2.0 and 1.6. Thanks.





[GitHub] spark issue #15181: [SPARK-17623][CORE] Clarify type of TaskEndReason with a...

2016-09-21 Thread andrewor14
Github user andrewor14 commented on the issue:

https://github.com/apache/spark/pull/15181
  
Thanks for the cleanup. I'm merging this into master. Because this patch 
touches multiple files in the critical scheduler code, I'm hesitant to 
backport it.





[GitHub] spark pull request #15181: [SPARK-17623][CORE] Clarify type of TaskEndReason...

2016-09-21 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/15181#discussion_r79939560
  
--- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskResultGetter.scala ---
@@ -118,14 +118,14 @@ private[spark] class TaskResultGetter(sparkEnv: SparkEnv, scheduler: TaskSchedul
 
   def enqueueFailedTask(taskSetManager: TaskSetManager, tid: Long, taskState: TaskState,
     serializedData: ByteBuffer) {
-    var reason : TaskEndReason = UnknownReason
+    var reason : TaskFailedReason = UnknownReason
     try {
       getTaskResultExecutor.execute(new Runnable {
         override def run(): Unit = Utils.logUncaughtExceptions {
           val loader = Utils.getContextOrSparkClassLoader
           try {
             if (serializedData != null && serializedData.limit() > 0) {
-              reason = serializer.get().deserialize[TaskEndReason](
+              reason = serializer.get().deserialize[TaskFailedReason](
--- End diff --

great. It's always better to be more explicit





[GitHub] spark pull request #15181: [SPARK-17623][CORE] Clarify type of TaskEndReason...

2016-09-21 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/15181#discussion_r79894168
  
--- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskResultGetter.scala ---
@@ -118,14 +118,14 @@ private[spark] class TaskResultGetter(sparkEnv: SparkEnv, scheduler: TaskSchedul
 
   def enqueueFailedTask(taskSetManager: TaskSetManager, tid: Long, taskState: TaskState,
     serializedData: ByteBuffer) {
-    var reason : TaskEndReason = UnknownReason
+    var reason : TaskFailedReason = UnknownReason
     try {
       getTaskResultExecutor.execute(new Runnable {
         override def run(): Unit = Utils.logUncaughtExceptions {
           val loader = Utils.getContextOrSparkClassLoader
           try {
             if (serializedData != null && serializedData.limit() > 0) {
-              reason = serializer.get().deserialize[TaskEndReason](
+              reason = serializer.get().deserialize[TaskFailedReason](
--- End diff --

Is this totally safe? Can you point me to the code where we serialize this?
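
For context on why the question matters: with Java serialization, 
`deserialize[T]` is essentially an unchecked cast, so narrowing the type 
parameter is safe exactly when every value written on the other side really 
is a `TaskFailedReason`. A sketch with plain `java.io` (not Spark's 
`SerializerInstance`):

```
import java.io._

// Round-trip via Java serialization. Due to erasure, asInstanceOf[T] is
// unchecked here; a wrong T only fails later, where the value is used.
def roundTrip[T](obj: Serializable): T = {
  val bytes = new ByteArrayOutputStream()
  new ObjectOutputStream(bytes).writeObject(obj)
  val in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray))
  in.readObject().asInstanceOf[T]
}
```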





[GitHub] spark pull request #15159: [SPARK-17605][SPARK_SUBMIT] Add option spark.useP...

2016-09-21 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/15159#discussion_r79867613
  
--- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala ---
@@ -70,6 +70,8 @@ private[deploy] class SparkSubmitArguments(args: Seq[String], env: Map[String, S
   var proxyUser: String = null
   var principal: String = null
   var keytab: String = null
+  var usePython: Boolean = false
--- End diff --

so now there's both `isPython` and `usePython`? Can you just use the old one?





[GitHub] spark issue #15001: [SPARK-17438][WebUI] Show Application.executorLimit in t...

2016-09-19 Thread andrewor14
Github user andrewor14 commented on the issue:

https://github.com/apache/spark/pull/15001
  
LGTM, merging into master and 2.0. Thanks @zsxwing!





[GitHub] spark pull request #15001: [SPARK-17438][WebUI] Show Application.executorLim...

2016-09-19 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/15001#discussion_r79449912
  
--- Diff: core/src/main/scala/org/apache/spark/deploy/master/ui/ApplicationPage.scala ---
@@ -70,6 +70,16 @@ private[ui] class ApplicationPage(parent: MasterWebUI) extends WebUIPage("app")
 }
 
 
+  
+Executor Limit: 
+{
+  if (app.executorLimit == Int.MaxValue) "Unlimited" else app.executorLimit
--- End diff --

I think for now it's OK. Conceptually the number of executors is 
constrained by the lower of the two: the executor limit and the cores limit 
(when translated into executors).
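
As a back-of-the-envelope illustration of that constraint (a hypothetical 
helper, not code from `Master.scala`):

```
// Executors actually attainable: the executor limit, capped by how many
// executors the cores limit can pay for.
def effectiveExecutorCap(executorLimit: Int, coresLimit: Int, coresPerExecutor: Int): Int =
  math.min(executorLimit, coresLimit / coresPerExecutor)

// e.g. a limit of 10 executors with 8 cores at 2 cores/executor yields 4:
// effectiveExecutorCap(10, 8, 2) == 4
```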





[GitHub] spark issue #15133: [SPARK-17578][Docs] Add spark.app.name default value for...

2016-09-17 Thread andrewor14
Github user andrewor14 commented on the issue:

https://github.com/apache/spark/pull/15133
  
Yeah `SparkSession` will be the new thing moving forward. `SparkContext` is 
kind of just a legacy thing.





[GitHub] spark issue #15133: [SPARK-17578][Docs] Add spark.app.name default value for...

2016-09-17 Thread andrewor14
Github user andrewor14 commented on the issue:

https://github.com/apache/spark/pull/15133
  
We should probably just make it a random UUID in all cases to be 
consistent. I don't know if people check whether `spark.app.name` is set, so 
that might be a backward compatibility concern (though one that we kind of 
already broke with `SparkSession`).
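
Sketched against the real `SparkConf.setIfMissing` API, the random-UUID idea 
could look like this (the policy itself is just the suggestion above, not 
shipped behavior):

```
import java.util.UUID
import org.apache.spark.SparkConf

// Fall back to a random UUID only when the user has not set an app name.
val conf = new SparkConf()
conf.setIfMissing("spark.app.name", UUID.randomUUID().toString)
```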





[GitHub] spark issue #14338: [SPARK-16701] Make parameters configurable in BlockManag...

2016-09-16 Thread andrewor14
Github user andrewor14 commented on the issue:

https://github.com/apache/spark/pull/14338
  
I am opposed to this change because this is internal state that was never 
intended for the user to tweak. If there is a specific problem you're trying 
to fix then we should address that more generally instead. Please close 
this PR.





[GitHub] spark issue #14644: [MESOS] Enable GPU support with Mesos

2016-09-16 Thread andrewor14
Github user andrewor14 commented on the issue:

https://github.com/apache/spark/pull/14644
  
Tim, please file a JIRA!





[GitHub] spark issue #14765: [SPARK-15815] Keeping tell yarn the target executors in ...

2016-09-16 Thread andrewor14
Github user andrewor14 commented on the issue:

https://github.com/apache/spark/pull/14765
  
From the JIRA description it seems that this issue arises not only in the 
context of dynamic allocation. If that's the case then we should definitely 
not just arbitrarily remove code from `ExecutorAllocationManager`. Let's 
discuss a more general solution on the issue, but for now we should close 
this PR since it's neither sufficient nor correct.





[GitHub] spark issue #14862: [SPARK-17295][SQL] Create TestHiveSessionState use refle...

2016-09-16 Thread andrewor14
Github user andrewor14 commented on the issue:

https://github.com/apache/spark/pull/14862
  
What exactly does this change buy us? It doesn't allow us to remove any of 
the inheritance code. `TestHiveSessionState` is not fundamentally different 
from `HiveSessionState`, so I think it's actually less clean to introduce a 
new catalog implementation for this purpose.

@jiangxb1987 would you mind closing this PR?





[GitHub] spark issue #15099: [SPARK-17541][SQL] fix some DDL bugs about table managem...

2016-09-16 Thread andrewor14
Github user andrewor14 commented on the issue:

https://github.com/apache/spark/pull/15099
  
LGTM!





[GitHub] spark issue #13822: [SPARK-16115][SQL] Change output schema to be partition ...

2016-09-16 Thread andrewor14
Github user andrewor14 commented on the issue:

https://github.com/apache/spark/pull/13822
  
By the way I'm not super active in this community anymore. If you want a 
quicker response you could try your luck by pinging @yhuai or @cloud-fan 
instead.





[GitHub] spark pull request #13822: [SPARK-16115][SQL] Change output schema to be par...

2016-09-16 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/13822#discussion_r79247914
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala ---
@@ -660,6 +662,10 @@ case class ShowPartitionsCommand(
   override def run(sparkSession: SparkSession): Seq[Row] = {
     val catalog = sparkSession.sessionState.catalog
 
+    if (!catalog.tableExists(table)) {
+      throw new AnalysisException(s" Table does not exist")
--- End diff --

Can you include the table name in this message? Also please remove the 
random space in the beginning. Something like:
```
s"Table $table does not exist"
```





[GitHub] spark pull request #13822: [SPARK-16115][SQL] Change output schema to be par...

2016-09-16 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/13822#discussion_r79247651
  
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala ---
@@ -595,6 +595,19 @@ class HiveDDLSuite
 }
   }
 
+  test("test show partitions") {
--- End diff --

can you call this something more specific, like `show partitions on 
non-existent table`





[GitHub] spark pull request #13822: [SPARK-16115][SQL] Change output schema to be par...

2016-09-16 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/13822#discussion_r79247588
  
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala ---
@@ -595,6 +595,19 @@ class HiveDDLSuite
     }
   }
 
+  test("test show partitions") {
+    val message = intercept[AnalysisException] {
+      sql("SHOW PARTITIONS default.nonexistentTable")
+    }.getMessage
+    assert(message.contains("Table does not exist"))
+
+    withTable("t1") {
+      sql("CREATE TABLE t1 (key STRING, value STRING) PARTITIONED BY (ds STRING)")
+      sql("ALTER TABLE t1 ADD PARTITION (ds = '1')")
+      assert(sql(" SHOW PARTITIONS t1").schema.getFieldIndex("partition") == Some(0))
--- End diff --

I'm not sure if this is worth testing, since this is just Hive behavior




