[GitHub] spark issue #9766: [SPARK-11775][PYSPARK][SQL] Allow PySpark to register Jav...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/9766 **[Test build #66801 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66801/consoleFull)** for PR 9766 at commit [`d481821`](https://github.com/apache/spark/commit/d4818217dc6e29a72a4e470dbe08cda197933162). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #15360: [SPARK-17073] [SQL] [FOLLOWUP] generate column-level sta...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15360 **[Test build #66799 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66799/consoleFull)** for PR 15360 at commit [`1e64163`](https://github.com/apache/spark/commit/1e641633cbd38a4a990a1cebafeff7be276a0fec).
[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15148 **[Test build #66800 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66800/consoleFull)** for PR 15148 at commit [`1b63173`](https://github.com/apache/spark/commit/1b6317396629b9f290a279dd735923c0fc8efd89).
[GitHub] spark pull request #15335: [SPARK-17769][Core][Scheduler]Some FetchFailure r...
Github user markhamstra commented on a diff in the pull request: https://github.com/apache/spark/pull/15335#discussion_r82944965

--- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala ---

```diff
@@ -1255,27 +1255,46 @@ class DAGScheduler(
             s"longer running")
         }
 
-        if (disallowStageRetryForTest) {
-          abortStage(failedStage, "Fetch failure will not retry stage due to testing config",
-            None)
-        } else if (failedStage.failedOnFetchAndShouldAbort(task.stageAttemptId)) {
-          abortStage(failedStage, s"$failedStage (${failedStage.name}) " +
-            s"has failed the maximum allowable number of " +
-            s"times: ${Stage.MAX_CONSECUTIVE_FETCH_FAILURES}. " +
-            s"Most recent failure reason: ${failureMessage}", None)
-        } else {
-          if (failedStages.isEmpty) {
-            // Don't schedule an event to resubmit failed stages if failed isn't empty, because
-            // in that case the event will already have been scheduled.
-            // TODO: Cancel running tasks in the stage
-            logInfo(s"Resubmitting $mapStage (${mapStage.name}) and " +
-              s"$failedStage (${failedStage.name}) due to fetch failure")
-            messageScheduler.schedule(new Runnable {
-              override def run(): Unit = eventProcessLoop.post(ResubmitFailedStages)
-            }, DAGScheduler.RESUBMIT_TIMEOUT, TimeUnit.MILLISECONDS)
+        val shouldAbortStage =
+          failedStage.failedOnFetchAndShouldAbort(task.stageAttemptId) ||
+          disallowStageRetryForTest
+
+        if (shouldAbortStage) {
+          val abortMessage = if (disallowStageRetryForTest) {
+            "Fetch failure will not retry stage due to testing config"
+          } else {
+            s"""$failedStage (${failedStage.name})
+               |has failed the maximum allowable number of
+               |times: ${Stage.MAX_CONSECUTIVE_FETCH_FAILURES}.
+               |Most recent failure reason: $failureMessage""".stripMargin.replaceAll("\n", " ")
           }
+          abortStage(failedStage, abortMessage, None)
+        } else { // update failedStages and make sure a ResubmitFailedStages event is enqueued
+          // TODO: Cancel running tasks in the failed stage -- cf. SPARK-17064
+          val noResubmitEnqueued = !failedStages.contains(failedStage)
           failedStages += failedStage
           failedStages += mapStage
+          if (noResubmitEnqueued) {
+            // We expect one executor failure to trigger many FetchFailures in rapid succession,
+            // but all of those task failures can typically be handled by a single resubmission of
+            // the failed stage. We avoid flooding the scheduler's event queue with resubmit
+            // messages by checking whether a resubmit is already in the event queue for the
+            // failed stage. If there is already a resubmit enqueued for a different failed
+            // stage, that event would also be sufficient to handle the current failed stage, but
+            // producing a resubmit for each failed stage makes debugging and logging a little
+            // simpler while not producing an overwhelming number of scheduler events.
+            logInfo(
+              s"Resubmitting $mapStage (${mapStage.name}) and " +
+              s"$failedStage (${failedStage.name}) due to fetch failure"
+            )
+            messageScheduler.schedule(
```

--- End diff --

Ah, sorry for ascribing the prior comment to your preferences. That comment actually did make sense a long time ago, when the resubmitting of stages really was done periodically by an Akka scheduled event that fired every so many seconds. I'm pretty sure the RESUBMIT_TIMEOUT stuff is also legacy code that no longer makes sense and isn't necessary any more. So, do you want to do the follow-up PR to get rid of it, or shall I?
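The debouncing idea under review (many rapid FetchFailures for one stage should produce at most one resubmit event in the scheduler's queue) can be sketched in isolation. This is a toy Python model, not Spark's actual scheduler; the class and method names are made up for illustration.

```python
from collections import deque

class MiniScheduler:
    """Toy sketch of the resubmit debounce: a resubmit event is enqueued
    only when the failed stage is not already awaiting resubmission."""

    def __init__(self):
        self.failed_stages = set()   # stages awaiting resubmission
        self.event_queue = deque()   # pending scheduler events

    def on_fetch_failure(self, failed_stage, map_stage):
        # A resubmit is already enqueued for this stage iff it is tracked.
        no_resubmit_enqueued = failed_stage not in self.failed_stages
        self.failed_stages.add(failed_stage)
        self.failed_stages.add(map_stage)
        if no_resubmit_enqueued:
            self.event_queue.append("ResubmitFailedStages")

    def resubmit(self):
        # One event drains every failed stage, however many failures occurred.
        self.event_queue.popleft()
        stages, self.failed_stages = self.failed_stages, set()
        return sorted(stages)
```

A hundred fetch failures for the same stage still enqueue a single event, which is the behavior the diff's `noResubmitEnqueued` check guarantees.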
[GitHub] spark issue #15434: [SPARK-17873][SQL] ALTER TABLE RENAME TO should allow us...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/15434 LGTM
[GitHub] spark pull request #15434: [SPARK-17873][SQL] ALTER TABLE RENAME TO should a...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/15434#discussion_r82944802

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala ---

```diff
@@ -459,11 +459,20 @@ class SessionCatalog(
    * If a database is specified in `oldName`, this will rename the table in that database.
    * If no database is specified, this will first attempt to rename a temporary table with
    * the same name, then, if that does not exist, rename the table in the current database.
+   *
+   * This assumes the database specified in `newName` matches the one in `oldName`.
    */
-  def renameTable(oldName: TableIdentifier, newName: String): Unit = synchronized {
+  def renameTable(oldName: TableIdentifier, newName: TableIdentifier): Unit = synchronized {
     val db = formatDatabaseName(oldName.database.getOrElse(currentDb))
+    newName.database.map(formatDatabaseName).foreach { newDb =>
```

--- End diff --

uh, I see. If this is by design, I do not have more questions. LGTM
[GitHub] spark pull request #13065: [SPARK-15214][SQL] Code-generation for Generate
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/13065#discussion_r82944265

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/GenerateExec.scala ---

```diff
@@ -99,5 +102,182 @@ case class GenerateExec(
       }
     }
   }
-}
+
+  override def inputRDDs(): Seq[RDD[InternalRow]] = {
+    child.asInstanceOf[CodegenSupport].inputRDDs()
+  }
+
+  protected override def doProduce(ctx: CodegenContext): String = {
+    // We need to add some code here for terminating generators.
+    child.asInstanceOf[CodegenSupport].produce(ctx, this)
+  }
+
+  override def doConsume(ctx: CodegenContext, input: Seq[ExprCode], row: ExprCode): String = {
+    ctx.currentVars = input
+    ctx.copyResult = true
+
+    // Add input rows to the values when we are joining
+    val values = if (join) {
+      input
+    } else {
+      Seq.empty
+    }
+
+    // Generate the driving expression.
+    val data = boundGenerator.genCode(ctx)
+
+    boundGenerator match {
+      case e: CollectionGenerator => codeGenCollection(ctx, e, values, data, row)
+      case g => codeGenTraversableOnce(ctx, g, values, data, row)
+    }
+  }
+
+  /**
+   * Generate code for [[CollectionGenerator]] expressions.
+   */
+  private def codeGenCollection(
+      ctx: CodegenContext,
+      e: CollectionGenerator,
+      input: Seq[ExprCode],
+      data: ExprCode,
+      row: ExprCode): String = {
+
+    // Generate looping variables.
+    val index = ctx.freshName("index")
+
+    // Add a check if the generate outer flag is true.
+    val checks = optionalCode(outer, data.isNull)
+
+    // Add position
+    val position = if (e.position) {
+      Seq(ExprCode("", "false", index))
+    } else {
+      Seq.empty
+    }
+
+    // Generate code for either ArrayData or MapData
+    val (initMapData, updateRowData, values) = e.collectionSchema match {
+      case ArrayType(st: StructType, nullable) if e.inline =>
+        val row = codeGenAccessor(ctx, data.value, "col", index, st, nullable, checks)
+        val fieldChecks = checks ++ optionalCode(nullable, row.isNull)
+        val columns = st.fields.toSeq.zipWithIndex.map { case (f, i) =>
+          codeGenAccessor(ctx, row.value, f.name, i.toString, f.dataType, f.nullable, fieldChecks)
+        }
+        ("", row.code, columns)
+
+      case ArrayType(dataType, nullable) =>
+        ("", "", Seq(codeGenAccessor(ctx, data.value, "col", index, dataType, nullable, checks)))
+
+      case MapType(keyType, valueType, valueContainsNull) =>
+        // Materialize the key and the value arrays before we enter the loop.
+        val keyArray = ctx.freshName("keyArray")
+        val valueArray = ctx.freshName("valueArray")
+        val initArrayData =
+          s"""
+             |ArrayData $keyArray = ${data.isNull} ? null : ${data.value}.keyArray();
+             |ArrayData $valueArray = ${data.isNull} ? null : ${data.value}.valueArray();
+           """.stripMargin
+        val values = Seq(
+          codeGenAccessor(ctx, keyArray, "key", index, keyType, nullable = false, checks),
+          codeGenAccessor(ctx, valueArray, "value", index, valueType, valueContainsNull, checks))
+        (initArrayData, "", values)
+    }
+
+    // In case of outer we need to make sure the loop is executed at-least once when the array/map
+    // contains no input. We do this by setting the looping index to -1 if there is no input,
+    // evaluation of the array is prevented by a check in the accessor code.
+    val numElements = ctx.freshName("numElements")
+    val init = if (outer) s"$numElements == 0 ? -1 : 0" else "0"
+    val numOutput = metricTerm(ctx, "numOutputRows")
+    s"""
+       |${data.code}
+       |$initMapData
+       |int $numElements = ${data.isNull} ? 0 : ${data.value}.numElements();
+       |for (int $index = $init; $index < $numElements; $index++) {
+       |  $numOutput.add(1);
+       |  $updateRowData
+       |  ${consume(ctx, input ++ position ++ values)}
+       |}
+     """.stripMargin
+  }
+
+  /**
+   * Generate code for a regular [[TraversableOnce]] returning [[Generator]].
+   */
+  private def codeGenTraversableOnce(
+      ctx: CodegenContext,
+      e: Expression,
+      input: Seq[ExprCode],
+      data: ExprCode,
+      row: ExprCode): String = {
+
+    // Generate looping variables.
+    val iterator = ctx.freshName("iterator")
+    val hasNext = ctx.freshName("hasNext")
+    val current = ctx.freshName("row")
+
+    // Add a check
```
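The "outer" handling described in the diff's comments (run the loop at least once when the collection is empty or null, emitting nulls, via the `index = -1` trick) can be modelled outside of code generation. This Python sketch is illustrative only; the function and its parameters are hypothetical, not Spark APIs.

```python
def generate(rows, collection_of, outer=False, join=False):
    """Sketch of Generate semantics: explode a per-row collection into
    output rows. With outer=True an empty or missing collection still
    yields one row carrying None, mirroring the generated loop that runs
    once with index -1 when numElements is 0."""
    out = []
    for row in rows:
        elems = collection_of(row)
        if not elems:
            if outer:
                # Loop body runs "at least once": emit a null element.
                out.append(row + (None,) if join else (None,))
            continue
        for e in elems:
            # join=True prepends the input row, like the `values` prefix.
            out.append(row + (e,) if join else (e,))
    return out
```

Without `outer`, rows whose collection is empty disappear from the output; with `outer` they survive as a single null-extended row.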
[GitHub] spark issue #15230: [SPARK-17657] [SQL] Disallow Users to Change Table Type
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15230 **[Test build #66798 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66798/consoleFull)** for PR 15230 at commit [`e19536c`](https://github.com/apache/spark/commit/e19536c3c645b70f6cf1df747a7798188acf2935).
[GitHub] spark issue #15433: [SPARK-17822] Use weak reference in JVMObjectTracker.obj...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15433 **[Test build #66797 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66797/consoleFull)** for PR 15433 at commit [`7d50d84`](https://github.com/apache/spark/commit/7d50d84f90fcda9e5dec79c9be834870c83443c4).
[GitHub] spark issue #15230: [SPARK-17657] [SQL] Disallow Users to Change Table Type
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/15230 retest this please
[GitHub] spark pull request #15377: [SPARK-17802] Improved caller context logging.
Github user jerryshao commented on a diff in the pull request: https://github.com/apache/spark/pull/15377#discussion_r82943152

--- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala ---

```diff
@@ -2474,25 +2478,42 @@ private[spark] class CallerContext(
   val context = "SPARK_" + from + appIdStr + appAttemptIdStr + jobIdStr + stageIdStr +
     stageAttemptIdStr + taskIdStr + taskAttemptNumberStr
 
+  lazy val conf = new Configuration
+
   /**
    * Set up the caller context [[context]] by invoking Hadoop CallerContext API of
    * [[org.apache.hadoop.ipc.CallerContext]], which was added in hadoop 2.8.
    */
   def setCurrentContext(): Boolean = {
-    var succeed = false
-    try {
-      // scalastyle:off classforname
-      val callerContext = Class.forName("org.apache.hadoop.ipc.CallerContext")
-      val Builder = Class.forName("org.apache.hadoop.ipc.CallerContext$Builder")
-      // scalastyle:on classforname
-      val builderInst = Builder.getConstructor(classOf[String]).newInstance(context)
-      val hdfsContext = Builder.getMethod("build").invoke(builderInst)
-      callerContext.getMethod("setCurrent", callerContext).invoke(null, hdfsContext)
-      succeed = true
-    } catch {
-      case NonFatal(e) => logInfo("Fail to set Spark caller context", e)
+    if (!CallerContext.callerContextSupported) {
+      false
+    } else {
+      if (!conf.getBoolean("hadoop.caller.context.enabled", false)) {
+        logInfo("Hadoop caller context is not enabled")
+        CallerContext.callerContextSupported = false
+        false
+      } else {
+        try {
+        // scalastyle:off classforname
```

--- End diff --

Nit: indent is not correct, use 2 ws.
[GitHub] spark issue #9766: [SPARK-11775][PYSPARK][SQL] Allow PySpark to register Jav...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/9766 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66796/ Test FAILed.
[GitHub] spark issue #9766: [SPARK-11775][PYSPARK][SQL] Allow PySpark to register Jav...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/9766 **[Test build #66796 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66796/consoleFull)** for PR 9766 at commit [`e9832f6`](https://github.com/apache/spark/commit/e9832f6c3dbbf9649333af5ab9a0a0fd0954c237).

* This patch **fails to build**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #8318: [SPARK-1267][PYSPARK] Adds pip installer for pyspark
Github user holdenk commented on the issue: https://github.com/apache/spark/pull/8318 I'd be happy to take it from where @jhlch is at - I've got some bandwidth available to work on additional PySpark stuff and it seems like the interest on the committer side is here now so I'd love to help make this happen :)
[GitHub] spark issue #15448: [SPARK-17108][SQL]: Fix BIGINT and INT comparison failur...
Github user hvanhovell commented on the issue: https://github.com/apache/spark/pull/15448 Also add unit tests please.
[GitHub] spark issue #9766: [SPARK-11775][PYSPARK][SQL] Allow PySpark to register Jav...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/9766 Merged build finished. Test FAILed.
[GitHub] spark issue #15448: [SPARK-17108][SQL]: Fix BIGINT and INT comparison failur...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15448 Can one of the admins verify this patch?
[GitHub] spark issue #15448: [SPARK-17108][SQL]: Fix BIGINT and INT comparison failur...
Github user hvanhovell commented on the issue: https://github.com/apache/spark/pull/15448 Could you try to fix this by adding implicit casting to the `GetMapValue` (make it extend `ImplicitCastInputTypes` instead of `ExpectsInputTypes`)?
[GitHub] spark pull request #15377: [SPARK-17802] Improved caller context logging.
Github user jerryshao commented on a diff in the pull request: https://github.com/apache/spark/pull/15377#discussion_r82942829

--- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala ---

```diff
@@ -2474,25 +2478,42 @@ private[spark] class CallerContext(
   val context = "SPARK_" + from + appIdStr + appAttemptIdStr + jobIdStr + stageIdStr +
     stageAttemptIdStr + taskIdStr + taskAttemptNumberStr
 
+  lazy val conf = new Configuration
```

--- End diff --

Please use `SparkHadoopUtils#conf`.
[GitHub] spark issue #8318: [SPARK-1267][PYSPARK] Adds pip installer for pyspark
Github user jhlch commented on the issue: https://github.com/apache/spark/pull/8318 I've got [a branch that has a solid first pass at making pyspark pip installable.](https://github.com/apache/spark/compare/master...jhlch:pipinstall) A few questions are:

* How does this integrate with the typical build? Once the jar is built, it needs to be put in a location pointed to by setup.py and MANIFEST.in.
* What version requirements are there for numpy and pandas? I'm not confident that the ones I list are correct or as specific as they could be.
* Setup automated testing:
  * run-tests and run-tests.py should use environments where pyspark has been pip installed, and remove the 'find jars' etc. thing they currently do.
  * testpypi exists and could be useful in CI to make sure packaging and distribution never break. CI python envs could be initialized using `pip install --extra-index-url https://testpypi.python.org/pypi pyspark`

I've got too much on my plate to see this to the finish line in the next few months, but I do want to see this happen. Is someone else willing to take it from here? If not, I'll come back to it in Dec/Jan.
[GitHub] spark issue #9766: [SPARK-11775][PYSPARK][SQL] Allow PySpark to register Jav...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/9766 **[Test build #66796 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66796/consoleFull)** for PR 9766 at commit [`e9832f6`](https://github.com/apache/spark/commit/e9832f6c3dbbf9649333af5ab9a0a0fd0954c237).
[GitHub] spark pull request #15448: [SPARK-17108][SQL]: Fix BIGINT and INT comparison...
GitHub user weiqingy opened a pull request: https://github.com/apache/spark/pull/15448

[SPARK-17108][SQL]: Fix BIGINT and INT comparison failure in spark sql

## What changes were proposed in this pull request?

Add a function to check if two integers are compatible when invoking `acceptsType()` in `DataType`.

## How was this patch tested?

Manually. E.g.

```
spark.sql("create table t3(a map>)")
spark.sql("select * from t3 where a[1] is not null")
```

Before:

```
cannot resolve 't.`a`[1]' due to data type mismatch: argument 2 requires bigint type, however, '1' is of int type.; line 1 pos 22
org.apache.spark.sql.AnalysisException: cannot resolve 't.`a`[1]' due to data type mismatch: argument 2 requires bigint type, however, '1' is of int type.; line 1 pos 22
    at org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
    at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.scala:82)
    at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.scala:74)
    at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:307)
```

After: Passed the sql query. No error above.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/weiqingy/spark SPARK_17108

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/15448.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #15448

commit ec3d55296abc9f355a0f0db0f40e04abb4b58d94
Author: Weiqing Yang
Date: 2016-10-12T06:14:48Z

[SPARK-17108][SQL]: Fix BIGINT and INT comparison failure in spark sql
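The fix proposed here amounts to loosening the key-type check so an `int` literal can index a `bigint`-keyed map, i.e. allowing lossless widening between integral types. A standalone sketch of that compatibility rule (function and type names are hypothetical, not Spark's actual `acceptsType()`):

```python
# Byte widths of the integral SQL types involved in the check.
INTEGRAL_WIDTHS = {"tinyint": 1, "smallint": 2, "int": 4, "bigint": 8}

def accepts_key_type(required, provided):
    """A map keyed by `required` can accept a `provided` key when the
    types match exactly, or when both are integral and the provided
    value widens into the required type without loss."""
    if required == provided:
        return True
    if required in INTEGRAL_WIDTHS and provided in INTEGRAL_WIDTHS:
        return INTEGRAL_WIDTHS[provided] <= INTEGRAL_WIDTHS[required]
    return False
```

Under this rule `a[1]` against a `bigint`-keyed map is accepted (the `int` literal widens to `bigint`), while the reverse direction and non-integral keys are still rejected, which is consistent with hvanhovell's suggestion of implicit input casting.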
[GitHub] spark pull request #15377: [SPARK-17802] Improved caller context logging.
Github user jerryshao commented on a diff in the pull request: https://github.com/apache/spark/pull/15377#discussion_r82942438

--- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala ---

```diff
@@ -2432,6 +2432,10 @@ private[spark] object Utils extends Logging {
   }
 }
 
+private[util] object CallerContext {
+  var callerContextSupported: Boolean = true
```

--- End diff --

What is the usage of this flag? I don't see any other place use it, all just setters.
[GitHub] spark issue #15307: [SPARK-17731][SQL][STREAMING] Metrics for structured str...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15307 **[Test build #66795 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66795/consoleFull)** for PR 15307 at commit [`4c08d56`](https://github.com/apache/spark/commit/4c08d569f7817e222550ef7578c6e01f90bc4ee0).
[GitHub] spark issue #15421: [SPARK-17811] SparkR cannot parallelize data.frame with ...
Github user falaki commented on the issue: https://github.com/apache/spark/pull/15421 @felixcheung and @wangmiao1981 thanks! That's a good point. I will try testing it on different versions of R.
[GitHub] spark pull request #15421: [SPARK-17811] SparkR cannot parallelize data.fram...
Github user falaki commented on a diff in the pull request: https://github.com/apache/spark/pull/15421#discussion_r82940884

--- Diff: core/src/main/scala/org/apache/spark/api/r/SerDe.scala ---

```diff
@@ -125,15 +125,34 @@ private[spark] object SerDe {
   }
 
   def readDate(in: DataInputStream): Date = {
-    Date.valueOf(readString(in))
+    try {
+      val inStr = readString(in)
+      if (inStr == "NA") {
+        null
+      } else {
+        Date.valueOf(inStr)
+      }
+    } catch {
+      // On windows we get NegativeArraySizeException for NAs in R
```

--- End diff --

No. I will revert this change.
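The `NA` handling in the diff above boils down to: check for R's `NA` sentinel string before attempting date parsing, rather than catching a parser exception afterwards. A minimal Python sketch of the same idea (the function name is hypothetical, not Spark's SerDe):

```python
from datetime import date

def read_date(raw):
    """Return None for R's NA sentinel; otherwise parse an ISO date,
    mirroring the early 'NA' check added to SerDe.readDate."""
    if raw == "NA":
        return None
    return date.fromisoformat(raw)
```

Checking the sentinel up front keeps the exception handler reserved for genuinely malformed input instead of routinely swallowing NA values.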
[GitHub] spark issue #15447: [SPARK-14804][Graphx] Graph vertexRDD/EdgeRDD checkpoint...
Github user apivovarov commented on the issue: https://github.com/apache/spark/pull/15447 Related PRs https://github.com/apache/spark/pull/15396 https://github.com/apache/spark/pull/12576
[GitHub] spark issue #15447: [SPARK-14804][Graphx] Graph vertexRDD/EdgeRDD checkpoint...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15447 Can one of the admins verify this patch?
[GitHub] spark issue #15375: [SPARK-17790][SPARKR] Support for parallelizing R data.f...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15375 Merged build finished. Test PASSed.
[GitHub] spark issue #15375: [SPARK-17790][SPARKR] Support for parallelizing R data.f...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15375 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66792/ Test PASSed.
[GitHub] spark pull request #15447: [SPARK-14804][Graphx] Graph vertexRDD/EdgeRDD che...
GitHub user apivovarov opened a pull request: https://github.com/apache/spark/pull/15447 [SPARK-14804][Graphx] Graph vertexRDD/EdgeRDD checkpoint results Clas… EdgeRDD/VertexRDD wraps partitionsRDD e.g. `EdgeRDDImpl.checkpoint()` calls `partitionsRDD.checkpoint()` EdgeRDD/VertexRDD `isCheckpointed()` method should be implemented the same way - it should call `partitionsRDD.isCheckpointed` You can merge this pull request into a Git repository by running: $ git pull https://github.com/apivovarov/spark 14804 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/15447.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #15447 commit b123b68589d59d65db6210f1792a48d7f94e09bb Author: Alexander Pivovarov Date: 2016-10-12T05:48:37Z [SPARK-14804][Graphx] Graph vertexRDD/EdgeRDD checkpoint results ClassCastException
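The delegation this PR calls for can be illustrated with minimal stand-in classes (hypothetical names; the real `EdgeRDDImpl`/`VertexRDDImpl` wrap a Spark `partitionsRDD`). The point is that a wrapper which forwards `checkpoint()` but not `isCheckpointed` keeps reporting `false` after checkpointing:

```scala
// Minimal stand-ins: the wrapper must forward isCheckpointed to
// partitionsRDD, just as checkpoint() already forwards to it.
class FakePartitionsRDD {
  private var checkpointed = false
  def checkpoint(): Unit = { checkpointed = true }
  def isCheckpointed: Boolean = checkpointed
}

class FakeEdgeRDD(val partitionsRDD: FakePartitionsRDD) {
  def checkpoint(): Unit = partitionsRDD.checkpoint()
  def isCheckpointed: Boolean = partitionsRDD.isCheckpointed // the fix
}
```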
[GitHub] spark issue #15445: [SPARK-17817][PySpark][FOLLOWUP] PySpark RDD Repartition...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15445 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66789/ Test PASSed.
[GitHub] spark issue #15445: [SPARK-17817][PySpark][FOLLOWUP] PySpark RDD Repartition...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15445 Merged build finished. Test PASSed.
[GitHub] spark issue #15375: [SPARK-17790][SPARKR] Support for parallelizing R data.f...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15375 **[Test build #66792 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66792/consoleFull)** for PR 15375 at commit [`836e874`](https://github.com/apache/spark/commit/836e8745c346c59f78958e10aec1c6f9537242b9). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #15445: [SPARK-17817][PySpark][FOLLOWUP] PySpark RDD Repartition...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15445 **[Test build #66789 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66789/consoleFull)** for PR 15445 at commit [`be6d153`](https://github.com/apache/spark/commit/be6d1537e9bbd2cc2484e4d8da9d901b16725c97). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #9766: [SPARK-11775][PYSPARK][SQL] Allow PySpark to register Jav...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/9766 **[Test build #66794 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66794/consoleFull)** for PR 9766 at commit [`45a9b7a`](https://github.com/apache/spark/commit/45a9b7af6afbb2ab1287cc41fafbaa1de823eafa). * This patch **fails Scala style tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #9766: [SPARK-11775][PYSPARK][SQL] Allow PySpark to register Jav...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/9766 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66794/ Test FAILed.
[GitHub] spark issue #9766: [SPARK-11775][PYSPARK][SQL] Allow PySpark to register Jav...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/9766 Merged build finished. Test FAILed.
[GitHub] spark issue #9766: [SPARK-11775][PYSPARK][SQL] Allow PySpark to register Jav...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/9766 **[Test build #66794 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66794/consoleFull)** for PR 9766 at commit [`45a9b7a`](https://github.com/apache/spark/commit/45a9b7af6afbb2ab1287cc41fafbaa1de823eafa).
[GitHub] spark pull request #15230: [SPARK-17657] [SQL] Disallow Users to Change Tabl...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/15230#discussion_r82940270 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala --- @@ -225,6 +225,11 @@ case class AlterTableSetPropertiesCommand( val catalog = sparkSession.sessionState.catalog val table = catalog.getTableMetadata(tableName) DDLUtils.verifyAlterTableType(catalog, table, isView) +// Not allowed to switch the table type. +if (properties.contains("EXTERNAL")) { --- End diff -- This is officially documented in the Hive documentation, as shown in the [link](https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL): `TBLPROPERTIES ("EXTERNAL"="TRUE") in release 0.6.0+ (HIVE-1329) – Change a managed table to an external table and vice versa for "FALSE".` This is the only Hive property users are not allowed to change; users may still change the other Hive-specific properties, because Hive also allows that. Users are not allowed to change our Spark-reserved properties either. See the function call `verifyTableProperties` in `[alterTable](https://github.com/apache/spark/blob/b9a147181d5e38d9abed0c7215f4c5cb695f579c/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala#L393)`.
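The guard in the diff above can be sketched on its own. The helper name here is hypothetical (the real check lives inline in `AlterTableSetPropertiesCommand`), but it mirrors the `properties.contains("EXTERNAL")` condition:

```scala
// Reject any attempt to flip a table between managed and external
// through TBLPROPERTIES, as the diff above does.
def checkNoExternalFlip(properties: Map[String, String]): Unit =
  require(!properties.contains("EXTERNAL"),
    "Cannot change a managed table to an external table (or back) via TBLPROPERTIES")
```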
[GitHub] spark issue #15173: [SPARK-15698][SQL][Streaming][Follw-up]Fix FileStream so...
Github user tdas commented on the issue: https://github.com/apache/spark/pull/15173 @zsxwing Why was this not merged to 2.0?
[GitHub] spark pull request #15439: [SPARK-17880][DOC] The url linking to `Accumulato...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/15439
[GitHub] spark issue #15427: [SPARK-17866][SPARK-17867][SQL] Fix Dataset.dropduplicat...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15427 Merged build finished. Test PASSed.
[GitHub] spark issue #15427: [SPARK-17866][SPARK-17867][SQL] Fix Dataset.dropduplicat...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15427 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66790/ Test PASSed.
[GitHub] spark issue #15427: [SPARK-17866][SPARK-17867][SQL] Fix Dataset.dropduplicat...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15427 **[Test build #66790 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66790/consoleFull)** for PR 15427 at commit [`81339dc`](https://github.com/apache/spark/commit/81339dc429104633ee28cf078f643b5050564557). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #15439: [SPARK-17880][DOC] The url linking to `AccumulatorV2` in...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/15439 Thanks - merging in master/2.0.
[GitHub] spark pull request #15440: Fix hadoop.version in building-spark.md
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/15440
[GitHub] spark issue #15440: Fix hadoop.version in building-spark.md
Github user rxin commented on the issue: https://github.com/apache/spark/pull/15440 Thanks - merging in master/branch-2.0.
[GitHub] spark pull request #15434: [SPARK-17873][SQL] ALTER TABLE RENAME TO should a...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/15434#discussion_r82938529 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala --- @@ -459,11 +459,20 @@ class SessionCatalog( * If a database is specified in `oldName`, this will rename the table in that database. * If no database is specified, this will first attempt to rename a temporary table with * the same name, then, if that does not exist, rename the table in the current database. + * + * This assumes the database specified in `newName` matches the one in `oldName`. */ - def renameTable(oldName: TableIdentifier, newName: String): Unit = synchronized { + def renameTable(oldName: TableIdentifier, newName: TableIdentifier): Unit = synchronized { val db = formatDatabaseName(oldName.database.getOrElse(currentDb)) +newName.database.map(formatDatabaseName).foreach { newDb => --- End diff -- See the PR description: we should use the database of the source table, so that users can just write `db.tbl1 RENAME TO tbl2`. This is different from Hive, as we don't support moving a table from one database to another.
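The database-consistency rule described here can be sketched as a standalone check (hypothetical helper name; the real logic sits inside `SessionCatalog.renameTable`): a target database, if given, must equal the source table's database.

```scala
// If RENAME TO specifies a database, it must match the source table's
// database, since moving a table across databases is not supported.
def checkRenameTarget(oldDb: String, newDb: Option[String]): Unit =
  newDb.foreach { db =>
    require(db == oldDb,
      s"RENAME TO cannot move table from database '$oldDb' to '$db'")
  }
```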
[GitHub] spark pull request #15423: [SPARK-17860][SQL] SHOW COLUMN's database conflic...
Github user dilipbiswal commented on a diff in the pull request: https://github.com/apache/spark/pull/15423#discussion_r82938410 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala --- @@ -207,6 +208,7 @@ class SQLQueryTestSuite extends QueryTest with SharedSQLContext { // Returns true if the plan is supposed to be sorted. def isSorted(plan: LogicalPlan): Boolean = plan match { case _: Join | _: Aggregate | _: Generate | _: Sample | _: Distinct => false + case _: ShowColumnsCommand => true --- End diff -- @cloud-fan @viirya Thanks :-) I will change it.
[GitHub] spark issue #15434: [SPARK-17873][SQL] ALTER TABLE RENAME TO should allow us...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/15434 Just FYI. Hive allows the following changes: ```SQL ALTER TABLE db1.tbl RENAME TO db2.tbl2 ```
[GitHub] spark issue #15406: [Spark-17745][ml][PySpark] update NB python api - add we...
Github user sethah commented on the issue: https://github.com/apache/spark/pull/15406 We should add weights to the doctests to demonstrate them and make sure they're working.
[GitHub] spark pull request #15423: [SPARK-17860][SQL] SHOW COLUMN's database conflic...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/15423#discussion_r82937473 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala --- @@ -207,6 +208,7 @@ class SQLQueryTestSuite extends QueryTest with SharedSQLContext { // Returns true if the plan is supposed to be sorted. def isSorted(plan: LogicalPlan): Boolean = plan match { case _: Join | _: Aggregate | _: Generate | _: Sample | _: Distinct => false + case _: ShowColumnsCommand => true --- End diff -- +1 as mentioned in previous comment.
[GitHub] spark issue #15307: [SPARK-17731][SQL][STREAMING] Metrics for structured str...
Github user tdas commented on the issue: https://github.com/apache/spark/pull/15307 @marmbrus Could you take a look?
[GitHub] spark issue #11610: [SPARK-13777] [ML] Remove constant features from trainin...
Github user sethah commented on the issue: https://github.com/apache/spark/pull/11610 This problem should be handled by https://github.com/apache/spark/pull/15394 if it is merged. It seems this is no longer active, and we are pursuing alternative solutions. Shall we close this?
[GitHub] spark pull request #15423: [SPARK-17860][SQL] SHOW COLUMN's database conflic...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/15423#discussion_r82937255 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala --- @@ -207,6 +208,7 @@ class SQLQueryTestSuite extends QueryTest with SharedSQLContext { // Returns true if the plan is supposed to be sorted. def isSorted(plan: LogicalPlan): Boolean = plan match { case _: Join | _: Aggregate | _: Generate | _: Sample | _: Distinct => false + case _: ShowColumnsCommand => true --- End diff -- Marking `ShowColumnsCommand` as sorted is more weird; I'd like to leave the result sorted.
[GitHub] spark issue #9008: [SPARK-9478] [ml] Add class weights to Random Forest
Github user sethah commented on the issue: https://github.com/apache/spark/pull/9008 @rotationsymmetry Could you please close this?
[GitHub] spark issue #15375: [SPARK-17790][SPARKR] Support for parallelizing R data.f...
Github user tdas commented on the issue: https://github.com/apache/spark/pull/15375 @falaki @felixcheung The DirectKafkaStreamSuite is a known flaky test. Nothing in this patch should affect Kafka.
[GitHub] spark pull request #15414: [SPARK-17848][ML] Move LabelCol datatype cast int...
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/15414#discussion_r82931901 --- Diff: mllib/src/test/scala/org/apache/spark/ml/PredictorSuite.scala --- @@ -0,0 +1,57 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.ml + +import org.apache.spark.SparkFunSuite +import org.apache.spark.ml.linalg._ +import org.apache.spark.ml.param.ParamMap +import org.apache.spark.ml.util._ +import org.apache.spark.mllib.util.MLlibTestSparkContext +import org.apache.spark.sql.{DataFrame, Dataset} +import org.apache.spark.sql.types._ + +class PredictorSuite extends SparkFunSuite with MLlibTestSparkContext with DefaultReadWriteTest { + + import testImplicits._ + + class MockPredictor(override val uid: String) --- End diff -- move into companion object.
[GitHub] spark pull request #15414: [SPARK-17848][ML] Move LabelCol datatype cast int...
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/15414#discussion_r82932068 --- Diff: mllib/src/main/scala/org/apache/spark/ml/Predictor.scala --- @@ -121,10 +122,18 @@ abstract class Predictor[ * and put it in an RDD with strong types. */ protected def extractLabeledPoints(dataset: Dataset[_]): RDD[LabeledPoint] = { -dataset.select(col($(labelCol)).cast(DoubleType), col($(featuresCol))).rdd.map { +dataset.select(col($(labelCol)), col($(featuresCol))).rdd.map { case Row(label: Double, features: Vector) => LabeledPoint(label, features) } } + + /** + * Return the given DataFrame, with [[labelCol]] casted to DoubleType. + */ +protected def castDataSet(dataset: Dataset[_]): DataFrame = { --- End diff -- let's just put this logic directly in `fit`
[GitHub] spark pull request #15414: [SPARK-17848][ML] Move LabelCol datatype cast int...
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/15414#discussion_r82935295

--- Diff: mllib/src/test/scala/org/apache/spark/ml/PredictorSuite.scala ---
@@ -0,0 +1,57 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.ml
+
+import org.apache.spark.SparkFunSuite
+import org.apache.spark.ml.linalg._
+import org.apache.spark.ml.param.ParamMap
+import org.apache.spark.ml.util._
+import org.apache.spark.mllib.util.MLlibTestSparkContext
+import org.apache.spark.sql.{DataFrame, Dataset}
+import org.apache.spark.sql.types._
+
+class PredictorSuite extends SparkFunSuite with MLlibTestSparkContext with DefaultReadWriteTest {
+
+  import testImplicits._
+
+  class MockPredictor(override val uid: String)
+    extends Predictor[Vector, MockPredictor, MockPredictionModel] {
+
+    override def train(dataset: Dataset[_]): MockPredictionModel = {
+      require(dataset.schema("label").dataType == DoubleType)
+      new MockPredictionModel(uid)
+    }
+
+    override def copy(extra: ParamMap): MockPredictor = defaultCopy(extra)
+  }
+
+  class MockPredictionModel(override val uid: String)
+    extends PredictionModel[Vector, MockPredictionModel] {
+
+    override def predict(features: Vector): Double = 1.0

--- End diff --

`override def predict(features: Vector): Double = throw new NotImplementedError()`

We can do this for everything except `train`.
[GitHub] spark pull request #15414: [SPARK-17848][ML] Move LabelCol datatype cast int...
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/15414#discussion_r82932894

--- Diff: mllib/src/test/scala/org/apache/spark/ml/PredictorSuite.scala ---
@@ -0,0 +1,57 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.ml
+
+import org.apache.spark.SparkFunSuite
+import org.apache.spark.ml.linalg._
+import org.apache.spark.ml.param.ParamMap
+import org.apache.spark.ml.util._
+import org.apache.spark.mllib.util.MLlibTestSparkContext
+import org.apache.spark.sql.{DataFrame, Dataset}
+import org.apache.spark.sql.types._
+
+class PredictorSuite extends SparkFunSuite with MLlibTestSparkContext with DefaultReadWriteTest {
+
+  import testImplicits._
+
+  class MockPredictor(override val uid: String)
+    extends Predictor[Vector, MockPredictor, MockPredictionModel] {
+
+    override def train(dataset: Dataset[_]): MockPredictionModel = {
+      require(dataset.schema("label").dataType == DoubleType)
+      new MockPredictionModel(uid)
+    }
+
+    override def copy(extra: ParamMap): MockPredictor = defaultCopy(extra)
+  }
+
+  class MockPredictionModel(override val uid: String)
+    extends PredictionModel[Vector, MockPredictionModel] {
+
+    override def predict(features: Vector): Double = 1.0
+
+    override def copy(extra: ParamMap): MockPredictionModel = defaultCopy(extra)
+  }
+
+  test("should support all NumericType labels and not support other types") {
+    val predictor = new MockPredictor("mock")
+    MLTestingUtils.checkNumericTypes[MockPredictionModel, MockPredictor](

--- End diff --

Why don't we just cycle through the types here and call `fit`? I think it's a bit confusing the way it is now.
[GitHub] spark pull request #15414: [SPARK-17848][ML] Move LabelCol datatype cast int...
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/15414#discussion_r82932799

--- Diff: mllib/src/test/scala/org/apache/spark/ml/util/MLTestingUtils.scala ---
@@ -117,7 +117,7 @@ object MLTestingUtils extends SparkFunSuite {
       Seq(ShortType, LongType, IntegerType, FloatType, ByteType, DoubleType, DecimalType(10, 0))
     types.map { t =>
       val castDF = df.select(col(labelColName).cast(t), col(featuresColName))
-      t -> TreeTests.setMetadata(castDF, 2, labelColName, featuresColName)
+      t -> TreeTests.setMetadata(castDF, 0, labelColName, featuresColName)

--- End diff --

What is this for? If the intent is to force `getNumClasses` to infer the number of classes, then you're no longer testing the non-inferred case. Further, the point of this PR is to eliminate the need to do that, since it is not a robust solution, IMO. Also, I'd like to remove the dependence on `TreeTests` here (and `genRegressionDF`) and just explicitly set the attributes in the functions.
[GitHub] spark issue #15172: [SPARK-13331] AES support for over-the-wire encryption
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15172 Merged build finished. Test PASSed.
[GitHub] spark issue #15172: [SPARK-13331] AES support for over-the-wire encryption
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15172 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66786/
[GitHub] spark issue #15172: [SPARK-13331] AES support for over-the-wire encryption
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15172

**[Test build #66786 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66786/consoleFull)** for PR 15172 at commit [`46b52e6`](https://github.com/apache/spark/commit/46b52e63918376dcf5dde0359fdfe1efa2456dfd).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark issue #15307: [SPARK-17731][SQL][STREAMING] Metrics for structured str...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15307 Merged build finished. Test PASSed.
[GitHub] spark issue #15172: [SPARK-13331] AES support for over-the-wire encryption
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15172 Merged build finished. Test PASSed.
[GitHub] spark issue #15307: [SPARK-17731][SQL][STREAMING] Metrics for structured str...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15307 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66784/
[GitHub] spark issue #15172: [SPARK-13331] AES support for over-the-wire encryption
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15172 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66785/
[GitHub] spark issue #15307: [SPARK-17731][SQL][STREAMING] Metrics for structured str...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15307

**[Test build #66784 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66784/consoleFull)** for PR 15307 at commit [`35bf508`](https://github.com/apache/spark/commit/35bf5089f0d79ba0ba007ca9983a75616f1a553d).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark issue #15172: [SPARK-13331] AES support for over-the-wire encryption
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15172

**[Test build #66785 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66785/consoleFull)** for PR 15172 at commit [`0bf663f`](https://github.com/apache/spark/commit/0bf663f0d8a71b2944d4030dc0ef95e36ee35471).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark issue #15446: [SPARK-17882][SparkR] Fix swallowed exception in RBacken...
Github user falaki commented on the issue: https://github.com/apache/spark/pull/15446 @shivaram yes, I just noticed it during my debugging and fixed it.
[GitHub] spark pull request #15335: [SPARK-17769][Core][Scheduler]Some FetchFailure r...
Github user squito commented on a diff in the pull request: https://github.com/apache/spark/pull/15335#discussion_r82933318

--- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala ---
@@ -1255,27 +1255,46 @@ class DAGScheduler(
             s"longer running")
         }

-        if (disallowStageRetryForTest) {
-          abortStage(failedStage, "Fetch failure will not retry stage due to testing config",
-            None)
-        } else if (failedStage.failedOnFetchAndShouldAbort(task.stageAttemptId)) {
-          abortStage(failedStage, s"$failedStage (${failedStage.name}) " +
-            s"has failed the maximum allowable number of " +
-            s"times: ${Stage.MAX_CONSECUTIVE_FETCH_FAILURES}. " +
-            s"Most recent failure reason: ${failureMessage}", None)
-        } else {
-          if (failedStages.isEmpty) {
-            // Don't schedule an event to resubmit failed stages if failed isn't empty, because
-            // in that case the event will already have been scheduled.
-            // TODO: Cancel running tasks in the stage
-            logInfo(s"Resubmitting $mapStage (${mapStage.name}) and " +
-              s"$failedStage (${failedStage.name}) due to fetch failure")
-            messageScheduler.schedule(new Runnable {
-              override def run(): Unit = eventProcessLoop.post(ResubmitFailedStages)
-            }, DAGScheduler.RESUBMIT_TIMEOUT, TimeUnit.MILLISECONDS)
+        val shouldAbortStage =
+          failedStage.failedOnFetchAndShouldAbort(task.stageAttemptId) ||
+          disallowStageRetryForTest
+
+        if (shouldAbortStage) {
+          val abortMessage = if (disallowStageRetryForTest) {
+            "Fetch failure will not retry stage due to testing config"
+          } else {
+            s"""$failedStage (${failedStage.name})
+               |has failed the maximum allowable number of
+               |times: ${Stage.MAX_CONSECUTIVE_FETCH_FAILURES}.
+               |Most recent failure reason: $failureMessage""".stripMargin.replaceAll("\n", " ")
           }
+          abortStage(failedStage, abortMessage, None)
+        } else { // update failedStages and make sure a ResubmitFailedStages event is enqueued
+          // TODO: Cancel running tasks in the failed stage -- cf. SPARK-17064
+          val noResubmitEnqueued = !failedStages.contains(failedStage)

--- End diff --

I think I was worried about the opposite problem -- perhaps we add `mapStage` to `failedStages`, but fail to fire a `Resubmit` event. Maybe too many negatives to think through this clearly -- my intention was *more* logging & resubmission, not less. I suppose I was thinking of it as:

```scala
val addedToFailedStages = failedStages.add(failedStage) | failedStages.add(mapStage)
if (addedToFailedStages) {
  logStuff()
  resubmit()
}
```

the point being, to avoid another case of the bug which started this all -- adding to `failedStages` but failing to ever `Resubmit`.

I was thinking of something more like this (though as you'll see, this case is fine). Say you have two jobs submitted concurrently, which share the first few stages: A -> B -> C and A -> B -> D. There is an executor failure while they are both running their independent parts, C and D, concurrently. The failure is detected in C first, so it marks B and C as failed. Later on, the failure is detected in D, so it marks B and D as failed. If the first resubmit was already processed, it's fine: B is already running, and we mark D as waiting on B. Similarly, it's fine if the resubmit wasn't processed yet when the failure is detected in D -- then when the resubmit is processed, we resubmit all three stages. I think it also works out even if stage A needs to get resubmitted as well -- it's handled in the same call that does the resubmit for B, when it checks for missing parents. (In fact, thinking through these cases makes me think we don't even need to resubmit the `mapStage` at all -- the `failedStage` will submit itself on its resubmit, since it will notice its parents aren't ready, which is why there isn't a case where this check would really matter.)

Anyway, the point is not that I can show you a case where we *do* need to make sure there is a resubmit. The point is that I'm *not* sure that we do *not* need it, which is why I thought it was better to err on the side of over-logging / resubmitting.
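The non-short-circuiting `|` in the reviewer's sketch is doing real work: unlike `||`, it evaluates both `add` calls, so the map stage is recorded even when the failed stage was newly added. A minimal, self-contained illustration with hypothetical integer stage ids standing in for real `Stage` objects:

```scala
import scala.collection.mutable

object FailedStagesDemo {
  def main(args: Array[String]): Unit = {
    // With short-circuiting ||, the second add never runs once the first returns true,
    // so the map stage (2) would be silently dropped.
    val shortCircuit = mutable.HashSet[Int]()
    val ignored = shortCircuit.add(1) || shortCircuit.add(2)
    println(shortCircuit.contains(2)) // the map stage was never recorded

    // With non-short-circuiting |, both adds always run.
    val both = mutable.HashSet[Int]()
    val addedToFailedStages = both.add(1) | both.add(2)
    println(both.contains(2))
    println(addedToFailedStages) // something new was enqueued, so fire a Resubmit
  }
}
```

`mutable.HashSet.add` returns `true` only when the element was not already present, which is what makes the combined boolean a usable "did we enqueue anything new?" flag.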
[GitHub] spark pull request #15422: [SPARK-17850][Core]Add a flag to ignore corrupt f...
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/15422#discussion_r82932947

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -588,6 +588,12 @@ object SQLConf {
     .doubleConf
     .createWithDefault(0.05)

+  val IGNORE_CORRUPT_FILES = SQLConfigBuilder("spark.sql.files.ignoreCorruptFiles")
+    .doc("Whether to ignore corrupt files. If true, the Spark jobs will continue to run when " +
+      "encountering corrupt files and contents that have been read will still be returned.")
+    .booleanConf
+    .createWithDefault(false)
+

--- End diff --

Curious why we are duplicating the parameter in the sql namespace. Won't `spark.files.ignoreCorruptFiles` do?
[GitHub] spark pull request #15422: [SPARK-17850][Core]Add a flag to ignore corrupt f...
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/15422#discussion_r82933077

--- Diff: core/src/main/scala/org/apache/spark/internal/config/package.scala ---
@@ -170,4 +170,9 @@ package object config {
     .doc("Port to use for the block managed on the driver.")
     .fallbackConf(BLOCK_MANAGER_PORT)

+  private[spark] val IGNORE_CORRUPT_FILES = ConfigBuilder("spark.files.ignoreCorruptFiles")
+    .doc("Whether to ignore corrupt files. If true, the Spark jobs will continue to run when " +
+      "encountering corrupt files and contents that have been read will still be returned.")
+    .booleanConf
+    .createWithDefault(false)

--- End diff --

So either way we will have a behavioral change -- in NewHadoopRDD vs. HadoopRDD. IMO that is fine, given that we are standardizing the behavior and this was a corner case anyway. Setting the default to false makes sense.
[GitHub] spark pull request #15422: [SPARK-17850][Core]Add a flag to ignore corrupt f...
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/15422#discussion_r82932992

--- Diff: core/src/main/scala/org/apache/spark/rdd/NewHadoopRDD.scala ---
@@ -179,7 +183,16 @@ class NewHadoopRDD[K, V](
       override def hasNext: Boolean = {
         if (!finished && !havePair) {
-          finished = !reader.nextKeyValue
+          try {
+            finished = !reader.nextKeyValue
+          } catch {
+            case e: IOException =>
+              if (ignoreCorruptFiles) {
+                finished = true
+              } else {
+                throw e
+              }
+          }

--- End diff --

Thanks for changing this too!
[GitHub] spark pull request #15422: [SPARK-17850][Core]Add a flag to ignore corrupt f...
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/15422#discussion_r82932645

--- Diff: core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala ---
@@ -253,8 +256,12 @@ class HadoopRDD[K, V](
       try {
         finished = !reader.next(key, value)
       } catch {
-        case eof: EOFException =>
-          finished = true
+        case e: IOException =>
+          if (ignoreCorruptFiles) {
+            finished = true
+          } else {
+            throw e
+          }

--- End diff --

nit: `case e: IOException if ignoreCorruptFiles =>` would have been more concise.
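The nit relies on Scala's pattern guards: a `catch` case with an `if` guard only matches when the guard holds, so when `ignoreCorruptFiles` is false the `IOException` simply propagates with no explicit re-throw. A self-contained sketch of that behavior, using a hypothetical `finishedAfterRead` helper rather than the actual `HadoopRDD` iterator:

```scala
import java.io.IOException

object IgnoreCorruptDemo {
  // Returns true when the record stream is finished; with the flag set,
  // a corrupt read is treated as end-of-stream instead of failing the task.
  def finishedAfterRead(ignoreCorruptFiles: Boolean)(read: () => Boolean): Boolean =
    try !read() catch {
      // Guarded case: only matches when ignoreCorruptFiles is true;
      // otherwise the IOException escapes on its own.
      case _: IOException if ignoreCorruptFiles => true
    }

  def main(args: Array[String]): Unit = {
    val corrupt: () => Boolean = () => throw new IOException("corrupt block")

    println(finishedAfterRead(ignoreCorruptFiles = true)(corrupt))

    val propagated =
      try { finishedAfterRead(ignoreCorruptFiles = false)(corrupt); false }
      catch { case _: IOException => true }
    println(propagated) // without the flag, the exception escapes unchanged
  }
}
```

The guarded form is more concise than the `if/else`-with-rethrow in the diff and avoids rewrapping the exception's stack trace, since the unmatched exception is never caught in the first place.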
[GitHub] spark issue #15444: [SPARK-17870][MLLIB][ML]Change statistic to pValue for S...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15444 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66787/
[GitHub] spark issue #15444: [SPARK-17870][MLLIB][ML]Change statistic to pValue for S...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15444 Merged build finished. Test PASSed.
[GitHub] spark issue #15444: [SPARK-17870][MLLIB][ML]Change statistic to pValue for S...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15444

**[Test build #66787 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66787/consoleFull)** for PR 15444 at commit [`b98ccdf`](https://github.com/apache/spark/commit/b98ccdfd696cb89cb4793a140c87c498ce5c3086).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark issue #9766: [SPARK-11775][PYSPARK][SQL] Allow PySpark to register Jav...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/9766 Merged build finished. Test FAILed.
[GitHub] spark issue #9766: [SPARK-11775][PYSPARK][SQL] Allow PySpark to register Jav...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/9766

**[Test build #66793 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66793/consoleFull)** for PR 9766 at commit [`dc6d5f9`](https://github.com/apache/spark/commit/dc6d5f927d93566ee1c3b935db864f2e517bc7e0).
 * This patch **fails to build**.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark issue #9766: [SPARK-11775][PYSPARK][SQL] Allow PySpark to register Jav...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/9766 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66793/
[GitHub] spark issue #9766: [SPARK-11775][PYSPARK][SQL] Allow PySpark to register Jav...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/9766 **[Test build #66793 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66793/consoleFull)** for PR 9766 at commit [`dc6d5f9`](https://github.com/apache/spark/commit/dc6d5f927d93566ee1c3b935db864f2e517bc7e0).
[GitHub] spark issue #15443: [SPARK-17881] [SQL] Aggregation function for generating ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15443 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66782/
[GitHub] spark issue #15443: [SPARK-17881] [SQL] Aggregation function for generating ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15443 Merged build finished. Test PASSed.
[GitHub] spark issue #15443: [SPARK-17881] [SQL] Aggregation function for generating ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15443

**[Test build #66782 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66782/consoleFull)** for PR 15443 at commit [`a843920`](https://github.com/apache/spark/commit/a843920983914de7efd21608b8f0e39c70b210d7).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
   * `case class StringHistogram(`
   * `case class StringHistogramInfo(`
   * `class StringHistogramInfoSerializer`
[GitHub] spark issue #15375: [SPARK-17790][SPARKR] Support for parallelizing R data.f...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15375 **[Test build #66792 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66792/consoleFull)** for PR 15375 at commit [`836e874`](https://github.com/apache/spark/commit/836e8745c346c59f78958e10aec1c6f9537242b9).
[GitHub] spark pull request #15398: [SPARK-17647][SQL] Fix backslash escaping in 'LIK...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/15398#discussion_r82931395

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/StringUtils.scala ---

```diff
@@ -25,26 +25,25 @@ object StringUtils {
   // replace the _ with .{1} exactly match 1 time of any character
   // replace the % with .*, match 0 or more times with any character
-  def escapeLikeRegex(v: String): String = {
-    if (!v.isEmpty) {
-      "(?s)" + (' ' +: v.init).zip(v).flatMap {
-        case (prev, '\\') => ""
-        case ('\\', c) =>
-          c match {
-            case '_' => "_"
-            case '%' => "%"
-            case _ => Pattern.quote("\\" + c)
-          }
-        case (prev, c) =>
-          c match {
-            case '_' => "."
-            case '%' => ".*"
-            case _ => Pattern.quote(Character.toString(c))
-          }
-      }.mkString
-    } else {
-      v
+  def escapeLikeRegex(str: String): String = {
+    val builder = new StringBuilder()
+    var escaping = false
+    for (next <- str) {
+      if (escaping) {
+        builder ++= Pattern.quote(Character.toString(next))
```

--- End diff --

`\Q\\E\Qa\E` is correct. But doesn't it become `\Qa\E` in this change? For `\\a`, the prefixing `\\` will go to the next branch and enable `escaping`. Then the next char `a` will be quoted here, so it becomes `\Qa\E`. BTW, before this change it would be `\Q\a\E`.
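For readers following the escaping question, the loop under discussion can be exercised outside Spark. The Java sketch below transcribes the new Scala loop, with the non-escape branches filled in to mirror the old match arms; treat it as an illustration of the behavior viirya describes (the class name and the filled-in branches are assumptions), not the final merged code.

```java
import java.util.regex.Pattern;

public class LikeEscapeSketch {
    // Translate a SQL LIKE pattern into a Java regex:
    // '_' matches exactly one character, '%' matches any run of characters,
    // and a backslash escapes the character that follows it.
    static String escapeLikeRegex(String str) {
        StringBuilder builder = new StringBuilder();
        boolean escaping = false;
        for (char next : str.toCharArray()) {
            if (escaping) {
                // An escaped character is always matched literally, so the
                // input "\\a" becomes \Qa\E -- the behavior viirya points out.
                builder.append(Pattern.quote(Character.toString(next)));
                escaping = false;
            } else if (next == '\\') {
                escaping = true;          // defer to the next character
            } else if (next == '_') {
                builder.append('.');      // any single character
            } else if (next == '%') {
                builder.append(".*");     // any sequence of characters
            } else {
                builder.append(Pattern.quote(Character.toString(next)));
            }
        }
        return "(?s)" + builder;
    }

    public static void main(String[] args) {
        System.out.println(escapeLikeRegex("\\a"));   // (?s)\Qa\E
        System.out.println(escapeLikeRegex("a%b"));   // (?s)\Qa\E.*\Qb\E
    }
}
```

Under this reading, the change drops the literal backslash from the quoted output (`\Qa\E` instead of `\Q\a\E`), which is harmless for matching since `Pattern.quote` already protects the character.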
[GitHub] spark issue #15446: [SPARK-17882][SparkR] Fix swallowed exception in RBacken...
Github user shivaram commented on the issue: https://github.com/apache/spark/pull/15446 cc @falaki Is this also a part of #15375 ?
[GitHub] spark issue #15446: [SPARK-17882][SparkR] Fix swallowed exception in RBacken...
Github user shivaram commented on the issue: https://github.com/apache/spark/pull/15446 Thanks @jrshust for the PR. Jenkins, ok to test
[GitHub] spark pull request #15335: [SPARK-17769][Core][Scheduler]Some FetchFailure r...
Github user squito commented on a diff in the pull request: https://github.com/apache/spark/pull/15335#discussion_r82931294

--- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala ---

```diff
@@ -1255,27 +1255,46 @@ class DAGScheduler(
           s"longer running")
       }

-      if (disallowStageRetryForTest) {
-        abortStage(failedStage, "Fetch failure will not retry stage due to testing config",
-          None)
-      } else if (failedStage.failedOnFetchAndShouldAbort(task.stageAttemptId)) {
-        abortStage(failedStage, s"$failedStage (${failedStage.name}) " +
-          s"has failed the maximum allowable number of " +
-          s"times: ${Stage.MAX_CONSECUTIVE_FETCH_FAILURES}. " +
-          s"Most recent failure reason: ${failureMessage}", None)
-      } else {
-        if (failedStages.isEmpty) {
-          // Don't schedule an event to resubmit failed stages if failed isn't empty, because
-          // in that case the event will already have been scheduled.
-          // TODO: Cancel running tasks in the stage
-          logInfo(s"Resubmitting $mapStage (${mapStage.name}) and " +
-            s"$failedStage (${failedStage.name}) due to fetch failure")
-          messageScheduler.schedule(new Runnable {
-            override def run(): Unit = eventProcessLoop.post(ResubmitFailedStages)
-          }, DAGScheduler.RESUBMIT_TIMEOUT, TimeUnit.MILLISECONDS)
+      val shouldAbortStage =
+        failedStage.failedOnFetchAndShouldAbort(task.stageAttemptId) ||
+        disallowStageRetryForTest
+
+      if (shouldAbortStage) {
+        val abortMessage = if (disallowStageRetryForTest) {
+          "Fetch failure will not retry stage due to testing config"
+        } else {
+          s"""$failedStage (${failedStage.name})
+             |has failed the maximum allowable number of
+             |times: ${Stage.MAX_CONSECUTIVE_FETCH_FAILURES}.
+             |Most recent failure reason: $failureMessage""".stripMargin.replaceAll("\n", " ")
         }
+        abortStage(failedStage, abortMessage, None)
+      } else { // update failedStages and make sure a ResubmitFailedStages event is enqueued
+        // TODO: Cancel running tasks in the failed stage -- cf. SPARK-17064
+        val noResubmitEnqueued = !failedStages.contains(failedStage)
         failedStages += failedStage
         failedStages += mapStage
+        if (noResubmitEnqueued) {
+          // We expect one executor failure to trigger many FetchFailures in rapid succession,
+          // but all of those task failures can typically be handled by a single resubmission of
+          // the failed stage. We avoid flooding the scheduler's event queue with resubmit
+          // messages by checking whether a resubmit is already in the event queue for the
+          // failed stage. If there is already a resubmit enqueued for a different failed
+          // stage, that event would also be sufficient to handle the current failed stage, but
+          // producing a resubmit for each failed stage makes debugging and logging a little
+          // simpler while not producing an overwhelming number of scheduler events.
+          logInfo(
+            s"Resubmitting $mapStage (${mapStage.name}) and " +
+            s"$failedStage (${failedStage.name}) due to fetch failure"
+          )
+          messageScheduler.schedule(
```

--- End diff --

yeah probably a separate PR, sorry this was just an opportunity for me to rant :) And sorry if I worded it poorly, but I was not suggesting the one w/ "Periodically" as a better comment -- in fact I think it's a *bad* comment; I just wanted to mention it was another description which used to be there long ago. This was my suggestion:

```
If we get one fetch-failure, we often get more fetch failures across multiple executors. We will get better parallelism when we resubmit the mapStage if we can resubmit when we know about as many of those failures as possible. So this is a heuristic to add a small delay to see if we gather a few more failures before we resubmit.
```
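The heuristic under discussion, deferring the resubmit briefly and deduplicating repeat failures of the same stage, can be sketched outside Spark. The class below is an illustration only, not Spark's `DAGScheduler`: the class name, the 200 ms delay, and the string-keyed stage set are all invented for the example.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class ResubmitCoalescer {
    private static final long RESUBMIT_DELAY_MS = 200; // illustrative delay

    private final ScheduledExecutorService scheduler =
        Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "resubmit-timer");
            t.setDaemon(true); // don't keep the JVM alive for the timer
            return t;
        });
    private final Set<String> failedStages = new HashSet<>();
    private int resubmitRuns = 0;

    // Called once per fetch failure; repeated failures of the same stage
    // collapse into a single scheduled resubmit.
    public synchronized void onFetchFailure(String failedStage) {
        boolean noResubmitEnqueued = !failedStages.contains(failedStage);
        failedStages.add(failedStage);
        if (noResubmitEnqueued) {
            // Delay instead of resubmitting immediately: a dying executor tends
            // to produce a burst of fetch failures, and waiting briefly lets one
            // resubmission cover as many of them as possible.
            scheduler.schedule(this::resubmitFailedStages,
                RESUBMIT_DELAY_MS, TimeUnit.MILLISECONDS);
        }
    }

    private synchronized void resubmitFailedStages() {
        for (String stage : failedStages) {
            System.out.println("Resubmitting " + stage + " due to fetch failure");
        }
        failedStages.clear();
        resubmitRuns++;
    }

    public synchronized int resubmitRuns() {
        return resubmitRuns;
    }

    public static void main(String[] args) throws InterruptedException {
        ResubmitCoalescer coalescer = new ResubmitCoalescer();
        // Five failures in rapid succession produce a single "Resubmitting" line.
        for (int i = 0; i < 5; i++) {
            coalescer.onFetchFailure("ShuffleMapStage 4");
        }
        Thread.sleep(600);
    }
}
```

This captures the trade-off squito's suggested comment describes: the short delay is a heuristic to gather a few more failures before resubmitting, while the enqueue-once check keeps the event queue from flooding.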
[GitHub] spark issue #9766: [SPARK-11775][PYSPARK][SQL] Allow PySpark to register Jav...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/9766 Build finished. Test FAILed.
[GitHub] spark issue #9766: [SPARK-11775][PYSPARK][SQL] Allow PySpark to register Jav...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/9766 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66791/ Test FAILed.
[GitHub] spark issue #15375: [SPARK-17790][SPARKR] Support for parallelizing R data.f...
Github user shivaram commented on the issue: https://github.com/apache/spark/pull/15375 Jenkins, retest this please
[GitHub] spark issue #9766: [SPARK-11775][PYSPARK][SQL] Allow PySpark to register Jav...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/9766

**[Test build #66791 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66791/consoleFull)** for PR 9766 at commit [`9de8c0e`](https://github.com/apache/spark/commit/9de8c0e7c0a2108b519c8adce7af5162578b04c9).

* This patch **fails RAT tests**.
* This patch **does not merge cleanly**.
* This patch adds no public classes.
[GitHub] spark issue #15427: [SPARK-17866][SPARK-17867][SQL] Fix Dataset.dropduplicat...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15427

**[Test build #66790 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66790/consoleFull)** for PR 15427 at commit [`81339dc`](https://github.com/apache/spark/commit/81339dc429104633ee28cf078f643b5050564557).
[GitHub] spark issue #9766: [SPARK-11775][PYSPARK][SQL] Allow PySpark to register Jav...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/9766

**[Test build #66791 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66791/consoleFull)** for PR 9766 at commit [`9de8c0e`](https://github.com/apache/spark/commit/9de8c0e7c0a2108b519c8adce7af5162578b04c9).
[GitHub] spark issue #15295: [SPARK-17720][SQL] introduce static SQL conf
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/15295 Merging to master! Thanks!