[GitHub] spark issue #22654: [SPARK-25660][SQL] Fix for the backward slash as CSV fie...

2018-10-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22654
  
**[Test build #97306 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97306/testReport)**
 for PR 22654 at commit 
[`20856b4`](https://github.com/apache/spark/commit/20856b4132cbc6aa34484144112f3463e47c4906).


---




[GitHub] spark pull request #22654: [SPARK-25660][SQL] Fix for the backward slash as ...

2018-10-12 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request:

https://github.com/apache/spark/pull/22654#discussion_r224799147
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVSuite.scala
 ---
@@ -1826,4 +1826,13 @@ class CSVSuite extends QueryTest with SharedSQLContext with SQLTestUtils with Te
     val df = spark.read.option("enforceSchema", false).csv(input)
     checkAnswer(df, Row("1", "2"))
   }
+
+  test("using the backward slash as the delimiter") {
+    val input = Seq("""abc\1""").toDS()
--- End diff --

I prohibited the single backslash as the delimiter and now throw an exception 
with a tip to use a double backslash instead.
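
Roughly, the check looks like this (a minimal sketch under assumed names, not 
the exact patch; the real change lives in the CSV option parsing path):

```
// Sketch: reject a lone backslash as the delimiter and point users at "\\".
def toChar(str: String): Char = str match {
  case "\\" =>
    throw new IllegalArgumentException(
      "Single backslash is prohibited. It has special meaning as beginning of " +
        "an escape sequence. To get the backslash character, pass a string with " +
        "two backslashes as the delimiter.")
  case "\\\\" => '\\'                      // the escaped form users are pointed to
  case "\\t" => '\t'                       // other Java-style escapes keep working
  case s if s.length == 1 => s.charAt(0)   // ordinary one-character delimiters
  case s =>
    throw new IllegalArgumentException(s"Delimiter cannot be more than one character: $s")
}
```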


---




[GitHub] spark issue #22709: [SPARK-25718][SQL]Detect recursive reference in Avro sch...

2018-10-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22709
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #22709: [SPARK-25718][SQL]Detect recursive reference in Avro sch...

2018-10-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22709
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3925/
Test PASSed.


---




[GitHub] spark issue #22703: [SPARK-25705][BUILD][STREAMING] Remove Kafka 0.8 integra...

2018-10-12 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/22703
  
retest this please


---




[GitHub] spark issue #20999: [SPARK-14922][SPARK-17732][SPARK-23866][SQL] Support par...

2018-10-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20999
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #20999: [SPARK-14922][SPARK-17732][SPARK-23866][SQL] Support par...

2018-10-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20999
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97304/
Test FAILed.


---




[GitHub] spark issue #22379: [SPARK-25393][SQL] Adding new function from_csv()

2018-10-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22379
  
**[Test build #97313 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97313/testReport)**
 for PR 22379 at commit 
[`c3a31d4`](https://github.com/apache/spark/commit/c3a31d4ea6f3da305f6ab08eca2484043564bd2f).


---




[GitHub] spark issue #22699: [SPARK-25711][Core] Improve start-history-server.sh: sho...

2018-10-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22699
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20999: [SPARK-14922][SPARK-17732][SPARK-23866][SQL] Support par...

2018-10-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20999
  
**[Test build #97304 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97304/testReport)**
 for PR 20999 at commit 
[`441acf3`](https://github.com/apache/spark/commit/441acf342a5fb11dd351f66a92a73e6dcfcfde76).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #22703: [SPARK-25705][BUILD][STREAMING] Remove Kafka 0.8 integra...

2018-10-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22703
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #22703: [SPARK-25705][BUILD][STREAMING] Remove Kafka 0.8 integra...

2018-10-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22703
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3922/
Test PASSed.


---




[GitHub] spark issue #22703: [SPARK-25705][BUILD][STREAMING] Remove Kafka 0.8 integra...

2018-10-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22703
  
**[Test build #97307 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97307/testReport)**
 for PR 22703 at commit 
[`6e34ce7`](https://github.com/apache/spark/commit/6e34ce7ab7961531d97655e0733ed92f701fbbfd).


---




[GitHub] spark issue #22699: [SPARK-25711][Core] Improve start-history-server.sh: sho...

2018-10-12 Thread gengliangwang
Github user gengliangwang commented on the issue:

https://github.com/apache/spark/pull/22699
  
retest this please.


---




[GitHub] spark issue #20999: [SPARK-14922][SPARK-17732][SPARK-23866][SQL] Support par...

2018-10-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20999
  
**[Test build #97309 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97309/testReport)**
 for PR 20999 at commit 
[`9b84057`](https://github.com/apache/spark/commit/9b8405748b3756024e346a2f00d4561f7617b16e).


---




[GitHub] spark issue #20999: [SPARK-14922][SPARK-17732][SPARK-23866][SQL] Support par...

2018-10-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20999
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20999: [SPARK-14922][SPARK-17732][SPARK-23866][SQL] Support par...

2018-10-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20999
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3924/
Test PASSed.


---




[GitHub] spark pull request #22709: [SPARK-25718][SQL]Detect recursive reference in A...

2018-10-12 Thread gengliangwang
GitHub user gengliangwang opened a pull request:

https://github.com/apache/spark/pull/22709

[SPARK-25718][SQL] Detect recursive reference in Avro schema and throw exception

## What changes were proposed in this pull request?

Avro schemas allow recursive references, e.g. the schema for a linked list in 
https://avro.apache.org/docs/1.8.2/spec.html#schema_record
```
{
  "type": "record",
  "name": "LongList",
  "aliases": ["LinkedLongs"],                       // old name for this
  "fields" : [
    {"name": "value", "type": "long"},              // each element has a long
    {"name": "next", "type": ["null", "LongList"]}  // optional next element
  ]
}
```

In current Spark SQL, it is impossible to convert such a schema to a 
`StructType`: running `SchemaConverters.toSqlType(avroSchema)` results in a 
stack overflow exception.

We should detect the recursive reference and throw an exception for it.
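
A minimal sketch of the detection idea (a sketch, not the exact patch: it uses 
the same `existingRecordNames`-style tracking as the diff quoted later in this 
digest, with a stand-in exception type):

```
import scala.collection.JavaConverters._

import org.apache.avro.Schema
import org.apache.avro.Schema.Type._

// Walk the schema, carrying the set of record full names seen on the current
// path; seeing a name twice means the schema references itself.
def assertNoRecursion(schema: Schema, seen: Set[String] = Set.empty): Unit = {
  schema.getType match {
    case RECORD =>
      val name = schema.getFullName
      if (seen.contains(name)) {
        throw new IllegalStateException(s"Found recursive reference in Avro schema: $name")
      }
      schema.getFields.asScala.foreach(f => assertNoRecursion(f.schema(), seen + name))
    case UNION => schema.getTypes.asScala.foreach(assertNoRecursion(_, seen))
    case ARRAY => assertNoRecursion(schema.getElementType, seen)
    case MAP   => assertNoRecursion(schema.getValueType, seen)
    case _     => // primitive types terminate the walk
  }
}
```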
## How was this patch tested?

New unit test case.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/gengliangwang/spark avroRecursiveRef

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/22709.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #22709


commit c97f54347cf08edfa1f31ab7026700170a67c848
Author: Gengliang Wang 
Date:   2018-10-12T14:59:58Z

detect recusive reference loop in avro schema




---




[GitHub] spark issue #22709: [SPARK-25718][SQL]Detect recursive reference in Avro sch...

2018-10-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22709
  
**[Test build #97310 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97310/testReport)**
 for PR 22709 at commit 
[`c97f543`](https://github.com/apache/spark/commit/c97f54347cf08edfa1f31ab7026700170a67c848).


---




[GitHub] spark issue #22593: [Streaming][DOC] Fix typo & formatting for JavaDoc

2018-10-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22593
  
**[Test build #97311 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97311/testReport)**
 for PR 22593 at commit 
[`d7487c5`](https://github.com/apache/spark/commit/d7487c56cc26cb23d0486479195b174cacabb5af).


---




[GitHub] spark issue #22709: [SPARK-25718][SQL]Detect recursive reference in Avro sch...

2018-10-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22709
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97310/
Test PASSed.


---




[GitHub] spark issue #22709: [SPARK-25718][SQL]Detect recursive reference in Avro sch...

2018-10-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22709
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #22678: [SPARK-25685][BUILD] Allow running tests in Jenkins in e...

2018-10-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22678
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #22678: [SPARK-25685][BUILD] Allow running tests in Jenkins in e...

2018-10-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22678
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97303/
Test PASSed.


---




[GitHub] spark pull request #21688: [SPARK-21809] : Change Stage Page to use datatabl...

2018-10-12 Thread tgravescs
Github user tgravescs commented on a diff in the pull request:

https://github.com/apache/spark/pull/21688#discussion_r224850397
  
--- Diff: core/src/main/scala/org/apache/spark/status/LiveEntity.scala ---
@@ -341,7 +341,9 @@ private class LiveExecutorStageSummary(
   metrics.shuffleWriteMetrics.recordsWritten,
   metrics.memoryBytesSpilled,
   metrics.diskBytesSpilled,
-  isBlacklisted)
+  isBlacklisted,
--- End diff --

@vanzin ideas on how to better handle this?

I don't see a really clean way to populate these fields from the 
AppStatusListener before they are written. For context, in this PR they are 
currently populated in the AppStatusStore.executorSummary call before going 
back to the user.

We could potentially split this into a separate API, or on the UI side query 
both that and the executor info and join them, but that seems like a lot more 
data.



---




[GitHub] spark issue #22482: WIP - [SPARK-10816][SS] Support session window natively

2018-10-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22482
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97305/
Test PASSed.


---




[GitHub] spark issue #22482: WIP - [SPARK-10816][SS] Support session window natively

2018-10-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22482
  
**[Test build #97305 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97305/testReport)**
 for PR 22482 at commit 
[`fd6377b`](https://github.com/apache/spark/commit/fd6377b69eaa8e1891448219744810562ebf4586).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark pull request #21688: [SPARK-21809] : Change Stage Page to use datatabl...

2018-10-12 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21688#discussion_r224856923
  
--- Diff: core/src/main/scala/org/apache/spark/status/LiveEntity.scala ---
@@ -341,7 +341,9 @@ private class LiveExecutorStageSummary(
   metrics.shuffleWriteMetrics.recordsWritten,
   metrics.memoryBytesSpilled,
   metrics.diskBytesSpilled,
-  isBlacklisted)
+  isBlacklisted,
--- End diff --

I don't think the executor stage summary should contain this info.

Is it really that difficult to do the join? The current stage page already 
does that:

https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/ui/jobs/StagePage.scala#L710

Is the concern that doing something like that from JS would be expensive or 
too difficult?


---




[GitHub] spark issue #22666: [SPARK-25672][SQL] schema_of_csv() - schema inference fr...

2018-10-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22666
  
**[Test build #97314 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97314/testReport)**
 for PR 22666 at commit 
[`c038aaa`](https://github.com/apache/spark/commit/c038aaa2291b79c723af956bcf5e220ae8b776a3).


---




[GitHub] spark issue #22698: [SPARK-25710][SQL] range should report metrics correctly

2018-10-12 Thread kiszk
Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/22698
  
LGTM


---




[GitHub] spark issue #22699: [SPARK-25711][Core] Improve start-history-server.sh: sho...

2018-10-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22699
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3923/
Test PASSed.


---




[GitHub] spark issue #22699: [SPARK-25711][Core] Improve start-history-server.sh: sho...

2018-10-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22699
  
**[Test build #97308 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97308/testReport)**
 for PR 22699 at commit 
[`5e05c60`](https://github.com/apache/spark/commit/5e05c604fdc9913a1424a569deb16ec3301bd4e4).


---




[GitHub] spark issue #22678: [SPARK-25685][BUILD] Allow running tests in Jenkins in e...

2018-10-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22678
  
**[Test build #97303 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97303/testReport)**
 for PR 22678 at commit 
[`d8e7ad0`](https://github.com/apache/spark/commit/d8e7ad09f70a0d907720417c8dbdd675df59a6b9).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark pull request #22288: [SPARK-22148][SPARK-15815][Scheduler] Acquire new...

2018-10-12 Thread abellina
Github user abellina commented on a diff in the pull request:

https://github.com/apache/spark/pull/22288#discussion_r224833138
  
--- Diff: 
core/src/main/scala/org/apache/spark/scheduler/BlacklistTracker.scala ---
@@ -146,21 +146,31 @@ private[scheduler] class BlacklistTracker (
     nextExpiryTime = math.min(execMinExpiry, nodeMinExpiry)
   }
 
+  private def killExecutor(exec: String, msg: String): Unit = {
+    allocationClient match {
+      case Some(a) =>
+        logInfo(msg)
+        a.killExecutors(Seq(exec), adjustTargetNumExecutors = false, countFailures = false,
+          force = true)
+      case None =>
+        logInfo(s"Not attempting to kill blacklisted executor id $exec " +
+          s"since allocation client is not defined.")
+    }
+  }
+
   private def killBlacklistedExecutor(exec: String): Unit = {
     if (conf.get(config.BLACKLIST_KILL_ENABLED)) {
-      allocationClient match {
-        case Some(a) =>
-          logInfo(s"Killing blacklisted executor id $exec " +
-            s"since ${config.BLACKLIST_KILL_ENABLED.key} is set.")
-          a.killExecutors(Seq(exec), adjustTargetNumExecutors = false, countFailures = false,
-            force = true)
-        case None =>
-          logWarning(s"Not attempting to kill blacklisted executor id $exec " +
-            s"since allocation client is not defined.")
-      }
+      killExecutor(exec,
+        s"Killing blacklisted executor id $exec since ${config.BLACKLIST_KILL_ENABLED.key} is set.")
     }
   }
 
+  private[scheduler] def killBlacklistedIdleExecutor(exec: String): Unit = {
+    killExecutor(exec,
--- End diff --

Should this code be guarded by `if (conf.get(config.BLACKLIST_KILL_ENABLED))`, 
as in the other `killBlacklistedExecutor` function?
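
For illustration, the guarded version being asked about would look roughly like 
this (a sketch mirroring the `killBlacklistedExecutor` pattern in the diff 
above, not merged code):

```
+  private[scheduler] def killBlacklistedIdleExecutor(exec: String): Unit = {
+    if (conf.get(config.BLACKLIST_KILL_ENABLED)) {
+      killExecutor(exec,
+        s"Killing blacklisted idle executor id $exec since " +
+          s"${config.BLACKLIST_KILL_ENABLED.key} is set.")
+    }
+  }
```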


---




[GitHub] spark issue #21588: [SPARK-24590][BUILD] Make Jenkins tests passed with hado...

2018-10-12 Thread srowen
Github user srowen commented on the issue:

https://github.com/apache/spark/pull/21588
  
I know this is probably just reviving an old thread elsewhere, but we don't 
know how to update our 1.2.1 Hive fork anyway, it seems? If so, and the fork is 
undesirable, it seems like time to drop it.

If it's hard to get onto mainstream Hive 1.x, then how is 2.x? Certainly it's 
reasonable to drop 1.x support in Spark 3.0. Does that solve anything?


---




[GitHub] spark pull request #22379: [SPARK-25393][SQL] Adding new function from_csv()

2018-10-12 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request:

https://github.com/apache/spark/pull/22379#discussion_r224843696
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/functions.scala ---
@@ -3854,6 +3854,38 @@ object functions {
   @scala.annotation.varargs
   def map_concat(cols: Column*): Column = withExpr { MapConcat(cols.map(_.expr)) }
 
+  /**
+   * Parses a column containing a CSV string into a `StructType` with the specified schema.
+   * Returns `null`, in the case of an unparseable string.
+   *
+   * @param e a string column containing CSV data.
+   * @param schema the schema to use when parsing the CSV string
+   * @param options options to control how the CSV is parsed. Accepts the same options as the
+   *                CSV data source.
+   *
+   * @group collection_funcs
+   * @since 3.0.0
+   */
+  def from_csv(e: Column, schema: StructType, options: Map[String, String]): Column = withExpr {
+    CsvToStructs(schema, options, e.expr)
+  }
+
+  /**
+   * (Java-specific) Parses a column containing a CSV string into a `StructType`
+   * with the specified schema. Returns `null`, in the case of an unparseable string.
+   *
+   * @param e a string column containing CSV data.
+   * @param schema the schema to use when parsing the CSV string
+   * @param options options to control how the CSV is parsed. Accepts the same options as the
+   *                CSV data source.
+   *
+   * @group collection_funcs
+   * @since 3.0.0
+   */
+  def from_csv(e: Column, schema: String, options: java.util.Map[String, String]): Column = {
--- End diff --

What stopped me from doing that was that I didn't know how to support the 
`Column` type in R. I even opened a JIRA ticket for a similar issue related to 
`schema_of_json`: https://issues.apache.org/jira/browse/SPARK-25446 . 
`from_json()` accepts the schema as `characterOrstructType`, and how to extend 
it to support the `Column` type as well is not clear to me:

https://github.com/apache/spark/blob/17781d75308c328b11cab3658ca4f358539414f2/R/pkg/R/functions.R#L2186
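
For reference, a usage sketch of the Scala API shown in the diff above (assumes 
a spark-shell style session, i.e. `spark` and its implicits are in scope):

```
import org.apache.spark.sql.functions.from_csv
import org.apache.spark.sql.types._

import spark.implicits._

val schema = new StructType().add("a", IntegerType).add("b", StringType)
val df = Seq("1,hello").toDF("csv")

// Produces a struct column with fields a = 1 and b = "hello".
df.select(from_csv($"csv", schema, Map.empty[String, String]).as("parsed")).show()
```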
 


---




[GitHub] spark pull request #20761: [SPARK-20327][CORE][YARN] Add CLI support for YAR...

2018-10-12 Thread szyszy
Github user szyszy commented on a diff in the pull request:

https://github.com/apache/spark/pull/20761#discussion_r224850514
  
--- Diff: 
resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ResourceRequestHelper.scala
 ---
@@ -0,0 +1,140 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.deploy.yarn
+
+import java.lang.{Long => JLong}
+import java.lang.reflect.InvocationTargetException
+
+import scala.collection.mutable
+import scala.util.Try
+
+import org.apache.hadoop.yarn.api.records.Resource
+
+import org.apache.spark.{SparkConf, SparkException}
+import org.apache.spark.deploy.yarn.config._
+import org.apache.spark.internal.Logging
+import org.apache.spark.internal.config._
+import org.apache.spark.util.Utils
+
+/**
+ * This helper class uses some of Hadoop 3 methods from the YARN API,
+ * so we need to use reflection to avoid compile error when building against Hadoop 2.x
+ */
+private object ResourceRequestHelper extends Logging {
+  private val AMOUNT_AND_UNIT_REGEX = "([0-9]+)([A-Za-z]*)".r
+  private val RESOURCE_INFO_CLASS = "org.apache.hadoop.yarn.api.records.ResourceInformation"
+
+  /**
+   * Validates sparkConf and throws a SparkException if any of standard resources (memory or cores)
+   * is defined with the property spark.yarn.x.resource.y
+   */
+  def validateResources(sparkConf: SparkConf): Unit = {
+    val resourceDefinitions = Seq[(String, String)](
+      (AM_MEMORY.key, YARN_AM_RESOURCE_TYPES_PREFIX + "memory"),
--- End diff --

Sure!
Did you mean this documentation? 

https://hadoop.apache.org/docs/r3.0.1/hadoop-yarn/hadoop-yarn-site/ResourceModel.html
I think it's required to check all the keys for memory / vcore that YARN 
deprecates, as those will flow through Spark and eventually reach YARN's 
`ResourceInformation`, which will just blow up since only `memory-mb` and `vcores` 
are not deprecated. The reason this hasn't caused a problem with the current 
Spark code is that it uses the `Resource` object and does not use 
`ResourceInformation` at all.
So we need to disallow these:
- cpu-vcores
- memory
- mb

What do you think?
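
A sketch of the validation being discussed (the disallowed names follow the 
list above; the config prefixes are assumptions based on the quoted diff):

```
import org.apache.spark.{SparkConf, SparkException}

// Standard memory/cores must come through the regular Spark settings, so any
// spark.yarn.*.resource.<name> spelling of them should be rejected.
val disallowedResourceNames =
  Seq("memory", "memory-mb", "mb", "cores", "vcores", "cpu-vcores")

def validateResources(sparkConf: SparkConf): Unit = {
  val offenders = for {
    prefix <- Seq("spark.yarn.am.resource.", "spark.yarn.driver.resource.",
                  "spark.yarn.executor.resource.")
    name <- disallowedResourceNames
    key = prefix + name
    if sparkConf.contains(key)
  } yield key
  if (offenders.nonEmpty) {
    throw new SparkException("Standard resources must not be requested as " +
      s"custom resource types: ${offenders.mkString(", ")}")
  }
}
```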



---




[GitHub] spark issue #22709: [SPARK-25718][SQL]Detect recursive reference in Avro sch...

2018-10-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22709
  
**[Test build #97310 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97310/testReport)**
 for PR 22709 at commit 
[`c97f543`](https://github.com/apache/spark/commit/c97f54347cf08edfa1f31ab7026700170a67c848).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark pull request #22379: [SPARK-25393][SQL] Adding new function from_csv()

2018-10-12 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request:

https://github.com/apache/spark/pull/22379#discussion_r224846183
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ExprUtils.scala
 ---
@@ -0,0 +1,45 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.expressions
+
+import org.apache.spark.sql.AnalysisException
+import org.apache.spark.sql.catalyst.util.ArrayBasedMapData
+import org.apache.spark.sql.types.{MapType, StringType, StructType}
+
+object ExprUtils {
+
+  def evalSchemaExpr(exp: Expression): StructType = exp match {
--- End diff --

The difference between the two functions appeared when I modified 
`evalSchemaExpr` in `JsonExprUtils` to support `schema_of_json`. When we rebase 
and prepare `schema_of_csv` in PR https://github.com/apache/spark/pull/22666, 
we will merge those two functions.
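
For context, the `evalSchemaExpr` body is truncated in the quoted diff; a 
plausible completion (an assumption modeled on the analogous JSON helper, not 
the verbatim patch) is:

```
def evalSchemaExpr(exp: Expression): StructType = exp match {
  // Only a foldable string literal is accepted; it is parsed as a DDL schema.
  case Literal(s, StringType) => StructType.fromDDL(s.toString)
  case e => throw new AnalysisException(
    s"Schema should be specified in DDL format as a string literal instead of ${e.sql}")
}
```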


---




[GitHub] spark pull request #20761: [SPARK-20327][CORE][YARN] Add CLI support for YAR...

2018-10-12 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/20761#discussion_r224851274
  
--- Diff: 
resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ResourceRequestHelper.scala
 ---
@@ -0,0 +1,140 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.deploy.yarn
+
+import java.lang.{Long => JLong}
+import java.lang.reflect.InvocationTargetException
+
+import scala.collection.mutable
+import scala.util.Try
+
+import org.apache.hadoop.yarn.api.records.Resource
+
+import org.apache.spark.{SparkConf, SparkException}
+import org.apache.spark.deploy.yarn.config._
+import org.apache.spark.internal.Logging
+import org.apache.spark.internal.config._
+import org.apache.spark.util.Utils
+
+/**
+ * This helper class uses some of Hadoop 3 methods from the YARN API,
+ * so we need to use reflection to avoid compile error when building against Hadoop 2.x
+ */
+private object ResourceRequestHelper extends Logging {
+  private val AMOUNT_AND_UNIT_REGEX = "([0-9]+)([A-Za-z]*)".r
+  private val RESOURCE_INFO_CLASS = "org.apache.hadoop.yarn.api.records.ResourceInformation"
+
+  /**
+   * Validates sparkConf and throws a SparkException if any of standard resources (memory or cores)
+   * is defined with the property spark.yarn.x.resource.y
+   */
+  def validateResources(sparkConf: SparkConf): Unit = {
+    val resourceDefinitions = Seq[(String, String)](
+      (AM_MEMORY.key, YARN_AM_RESOURCE_TYPES_PREFIX + "memory"),
--- End diff --

I'm not familiar with the YARN code or what it does here.

I'm just worried about users setting cpu/memory resources outside of the proper 
Spark settings, and also about the inconsistency in your code (using both 
`memory` and `memory-mb`).


---




[GitHub] spark issue #22482: WIP - [SPARK-10816][SS] Support session window natively

2018-10-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22482
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #22654: [SPARK-25660][SQL] Fix for the backward slash as CSV fie...

2018-10-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22654
  
**[Test build #97306 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97306/testReport)**
 for PR 22654 at commit 
[`20856b4`](https://github.com/apache/spark/commit/20856b4132cbc6aa34484144112f3463e47c4906).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #21157: [SPARK-22674][PYTHON] Removed the namedtuple pickling pa...

2018-10-12 Thread holdenk
Github user holdenk commented on the issue:

https://github.com/apache/spark/pull/21157
  
OK, it looks like it was @HyukjinKwon who suggested that we remove this hack in 
general rather than keeping the partial workaround; can I get your thoughts on 
why? It seems like the partial workaround would give us the best of both worlds 
(e.g. we don't break people's existing Spark code and we handle Python tuples 
better).


---




[GitHub] spark issue #22657: [SPARK-25670][TEST] Reduce number of tested timezones in...

2018-10-12 Thread srowen
Github user srowen commented on the issue:

https://github.com/apache/spark/pull/22657
  
Merged to master


---




[GitHub] spark pull request #22700: [SPARK-25712][Core][Minor] Improve usage message ...

2018-10-12 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/22700


---




[GitHub] spark pull request #22678: [SPARK-25685][BUILD] Allow running tests in Jenki...

2018-10-12 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/22678


---




[GitHub] spark pull request #22689: [SPARK-25697][CORE]When zstd compression enabled,...

2018-10-12 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/22689


---




[GitHub] spark pull request #21322: [SPARK-24225][CORE] Support closing AutoClosable ...

2018-10-12 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/21322#discussion_r224875111
  
--- Diff: 
core/src/main/scala/org/apache/spark/storage/memory/MemoryStore.scala ---
@@ -384,15 +385,30 @@ private[spark] class MemoryStore(
     }
   }
 
+  private def maybeReleaseResources(resource: (BlockId, MemoryEntry[_])): Unit = {
+    maybeReleaseResources(resource._1, resource._2)
+  }
+
+  private def maybeReleaseResources(blockId: BlockId, entry: MemoryEntry[_]): Unit = {
+    entry match {
+      case SerializedMemoryEntry(buffer, _, _) => buffer.dispose()
--- End diff --

Why not just make these case classes `Closeable`? Then you can close them 
consistently.
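
A sketch of that suggestion (toy stand-ins for the real entry classes; the 
names and shapes here are assumptions):

```
import java.io.Closeable

trait Disposable { def dispose(): Unit }

// Simplified entries: a shared Closeable gives callers one uniform release path.
sealed trait MemoryEntry extends Closeable
final case class SerializedMemoryEntry(buffer: Disposable) extends MemoryEntry {
  override def close(): Unit = buffer.dispose()
}
final case class DeserializedMemoryEntry(values: Array[Any]) extends MemoryEntry {
  override def close(): Unit = values.foreach {
    case c: Closeable => c.close()
    case _ =>            // non-closeable values need no cleanup
  }
}

// Any entry can now be released the same way:
def release(entry: MemoryEntry): Unit = entry.close()
```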


---




[GitHub] spark pull request #21322: [SPARK-24225][CORE] Support closing AutoClosable ...

2018-10-12 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/21322#discussion_r224875899
  
--- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala ---
@@ -1930,6 +1930,18 @@ private[spark] object Utils extends Logging {
 }
   }
 
+  def tryClose(value: Any): Unit = {
--- End diff --

This should accept at best `AnyRef`. It doesn't really seem like we need a new 
global utility method for this; it's a little unusual to try closing things 
that aren't `Closeable`, and we can try to rationalize that in the callers 
above if possible.


---




[GitHub] spark pull request #21322: [SPARK-24225][CORE] Support closing AutoClosable ...

2018-10-12 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/21322#discussion_r224874828
  
--- Diff: 
core/src/main/scala/org/apache/spark/storage/memory/MemoryStore.scala ---
@@ -384,15 +385,30 @@ private[spark] class MemoryStore(
     }
   }
 
+  private def maybeReleaseResources(resource: (BlockId, MemoryEntry[_])): Unit = {
+    maybeReleaseResources(resource._1, resource._2)
+  }
+
+  private def maybeReleaseResources(blockId: BlockId, entry: MemoryEntry[_]): Unit = {
+    entry match {
+      case SerializedMemoryEntry(buffer, _, _) => buffer.dispose()
+      case DeserializedMemoryEntry(values: Array[Any], _, _) => maybeCloseValues(values, blockId)
+      case _ =>
+    }
+  }
+
+  private def maybeCloseValues(values: Array[Any], blockId: BlockId): Unit = {
+    if (blockId.isBroadcast) {
+      values.foreach(value => Utils.tryClose(value))
--- End diff --

Just a style thing, but could be `values.foreach(Utils.tryClose)`


---




[GitHub] spark issue #22670: [SPARK-25631][SPARK-25632][SQL][TEST] Improve the test r...

2018-10-12 Thread srowen
Github user srowen commented on the issue:

https://github.com/apache/spark/pull/22670
  
@dilipbiswal I like this change too. The suite goes from 4:34 to 0:53. I wonder 
if we can make this change elsewhere in the general Kafka test config? This 
kind of setting seems useful everywhere.


---




[GitHub] spark issue #22670: [SPARK-25631][SPARK-25632][SQL][TEST] Improve the test r...

2018-10-12 Thread dilipbiswal
Github user dilipbiswal commented on the issue:

https://github.com/apache/spark/pull/22670
  
@srowen Thanks. Did you mean the test cases should extend a shared Spark 
context (SharedKafkaSparkContext) which would have this property set?

Actually Sean, there are 3 suites in this directory: 
`DirectKafkaStreamSuite.scala`, `KafkaDataConsumerSuite.scala`, and 
`KafkaRDDSuite.scala`. Given that only two tests are affected by this timeout 
(which we are fixing here), do you think we need to take on this refactoring 
work as part of this PR? Please let me know.


---




[GitHub] spark issue #22670: [SPARK-25631][SPARK-25632][SQL][TEST] Improve the test r...

2018-10-12 Thread srowen
Github user srowen commented on the issue:

https://github.com/apache/spark/pull/22670
  
I don't mean refactoring of that scale. I wonder if there are 1-2 other places 
where common Kafka params are set in tests that we could add this to for now, 
that kind of thing. This change is OK by itself too, though.


---




[GitHub] spark issue #21688: [SPARK-21809] : Change Stage Page to use datatables to s...

2018-10-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21688
  
**[Test build #97312 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97312/testReport)**
 for PR 21688 at commit 
[`3a1f274`](https://github.com/apache/spark/commit/3a1f27412f3aad2d4db8174a2a7f5e5924f76502).


---




[GitHub] spark pull request #20761: [SPARK-20327][CORE][YARN] Add CLI support for YAR...

2018-10-12 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/20761#discussion_r224830769
  
--- Diff: 
resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ResourceRequestHelper.scala
 ---
@@ -0,0 +1,140 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.deploy.yarn
+
+import java.lang.{Long => JLong}
+import java.lang.reflect.InvocationTargetException
+
+import scala.collection.mutable
+import scala.util.Try
+
+import org.apache.hadoop.yarn.api.records.Resource
+
+import org.apache.spark.{SparkConf, SparkException}
+import org.apache.spark.deploy.yarn.config._
+import org.apache.spark.internal.Logging
+import org.apache.spark.internal.config._
+import org.apache.spark.util.Utils
+
+/**
+ * This helper class uses some of Hadoop 3 methods from the YARN API,
+ * so we need to use reflection to avoid compile error when building against Hadoop 2.x
+ */
+private object ResourceRequestHelper extends Logging {
+  private val AMOUNT_AND_UNIT_REGEX = "([0-9]+)([A-Za-z]*)".r
+  private val RESOURCE_INFO_CLASS = "org.apache.hadoop.yarn.api.records.ResourceInformation"
+
+  /**
+   * Validates sparkConf and throws a SparkException if any of standard resources (memory or cores)
+   * is defined with the property spark.yarn.x.resource.y
+   */
+  def validateResources(sparkConf: SparkConf): Unit = {
+    val resourceDefinitions = Seq[(String, String)](
+      (AM_MEMORY.key, YARN_AM_RESOURCE_TYPES_PREFIX + "memory"),
--- End diff --

Still waiting for a word on this.


---




[GitHub] spark pull request #22709: [SPARK-25718][SQL]Detect recursive reference in A...

2018-10-12 Thread gengliangwang
Github user gengliangwang commented on a diff in the pull request:

https://github.com/apache/spark/pull/22709#discussion_r224831076
  
--- Diff: 
external/avro/src/main/scala/org/apache/spark/sql/avro/SchemaConverters.scala 
---
@@ -67,21 +71,28 @@ object SchemaConverters {
   case ENUM => SchemaType(StringType, nullable = false)
 
   case RECORD =>
+if (existingRecordNames.contains(avroSchema.getFullName)) {
--- End diff --

Another approach is to check the whole JSON string schema 
(`avroSchema.toString`) here, but that seems overkill. Avro requires the full 
name of a record to be unique.


---




[GitHub] spark pull request #22379: [SPARK-25393][SQL] Adding new function from_csv()

2018-10-12 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request:

https://github.com/apache/spark/pull/22379#discussion_r224844756
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/CSVUtils.scala ---
@@ -0,0 +1,57 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.csv
+
+object CSVUtils {
+  /**
+   * Filter ignorable rows for CSV iterator (lines empty and starting with `comment`).
+   * This is currently being used in CSV reading path and CSV schema inference.
+   */
+  def filterCommentAndEmpty(iter: Iterator[String], options: CSVOptions): Iterator[String] = {
+    iter.filter { line =>
+      line.trim.nonEmpty && !line.startsWith(options.comment.toString)
+    }
+  }
+
+  /**
+   * Helper method that converts string representation of a character to actual character.
+   * It handles some Java escaped strings and throws exception if given string is longer than one
+   * character.
+   */
+  @throws[IllegalArgumentException]
+  def toChar(str: String): Char = {
--- End diff --

There shouldn't be duplicates there. I moved all functions used in 
`sql/catalyst` out of `sql/core`. 


---




[GitHub] spark pull request #22295: [SPARK-25255][PYTHON]Add getActiveSession to Spar...

2018-10-12 Thread holdenk
Github user holdenk commented on a diff in the pull request:

https://github.com/apache/spark/pull/22295#discussion_r224860828
  
--- Diff: python/pyspark/sql/session.py ---
@@ -231,6 +231,7 @@ def __init__(self, sparkContext, jsparkSession=None):
                 or SparkSession._instantiatedSession._sc._jsc is None:
             SparkSession._instantiatedSession = self
             self._jvm.SparkSession.setDefaultSession(self._jsparkSession)
+            self._jvm.SparkSession.setActiveSession(self._jsparkSession)
--- End diff --

If we're going to support this we should have a test for it, or if we aren't 
going to support this right now we should document the behaviour.


---




[GitHub] spark pull request #22295: [SPARK-25255][PYTHON]Add getActiveSession to Spar...

2018-10-12 Thread holdenk
Github user holdenk commented on a diff in the pull request:

https://github.com/apache/spark/pull/22295#discussion_r224858616
  
--- Diff: python/pyspark/sql/session.py ---
@@ -252,6 +253,22 @@ def newSession(self):
 """
 return self.__class__(self._sc, self._jsparkSession.newSession())
 
+@since(3.0)
--- End diff --

@HyukjinKwon are you OK to mark this comment as resolved since we're now 
targeting `3.0`?


---




[GitHub] spark pull request #22295: [SPARK-25255][PYTHON]Add getActiveSession to Spar...

2018-10-12 Thread holdenk
Github user holdenk commented on a diff in the pull request:

https://github.com/apache/spark/pull/22295#discussion_r224858233
  
--- Diff: python/pyspark/sql/functions.py ---
@@ -2633,6 +2633,23 @@ def sequence(start, stop, step=None):
             _to_java_column(start), _to_java_column(stop), _to_java_column(step)))
 
 
+@since(3.0)
+def getActiveSession():
+    """
+    Returns the active SparkSession for the current thread
+    """
+    from pyspark.sql import SparkSession
+    sc = SparkContext._active_spark_context
--- End diff --

If this is being done to simplify the implementation and we don't expect people 
to call it directly here, we should mention that in the docstring and also use 
an `_` prefix.

I disagree with @HyukjinKwon about this behaviour being what people would 
expect -- it doesn't match the Scala behaviour, and one of the reasons to have 
something like `getActiveSession()` instead of `getOrCreate()` is to allow 
folks to do one thing if we have an active session and something else if we 
don't.

What about: if `sc` is `None` we just return `None`, since we can't have an 
active session without an active `SparkContext` -- does that sound reasonable?

That being said, if folks feel strongly about this I'm _ok_ with us setting up 
a SparkContext, but we need to document that if that's the path we go.
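
For reference, the Scala behaviour being matched returns an `Option`, so 
callers can branch on whether a session exists (a minimal illustration):

```
import org.apache.spark.sql.SparkSession

SparkSession.getActiveSession match {
  case Some(spark) => spark.sql("SELECT 1").show()  // reuse the active session
  case None        => println("no active session")  // caller picks the fallback
}
```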


---




[GitHub] spark pull request #22295: [SPARK-25255][PYTHON]Add getActiveSession to Spar...

2018-10-12 Thread holdenk
Github user holdenk commented on a diff in the pull request:

https://github.com/apache/spark/pull/22295#discussion_r224860350
  
--- Diff: python/pyspark/sql/tests.py ---
@@ -3654,6 +3654,109 @@ def test_jvm_default_session_already_set(self):
         spark.stop()
 
 
+class SparkSessionTests2(unittest.TestCase):
+
+    def test_active_session(self):
+        spark = SparkSession.builder \
+            .master("local") \
+            .getOrCreate()
+        try:
+            activeSession = SparkSession.getActiveSession()
+            df = activeSession.createDataFrame([(1, 'Alice')], ['age', 'name'])
+            self.assertEqual(df.collect(), [Row(age=1, name=u'Alice')])
+        finally:
+            spark.stop()
+
+    def test_get_active_session_when_no_active_session(self):
+        active = SparkSession.getActiveSession()
+        self.assertEqual(active, None)
+        spark = SparkSession.builder \
+            .master("local") \
+            .getOrCreate()
+        active = SparkSession.getActiveSession()
+        self.assertEqual(active, spark)
+        spark.stop()
+        active = SparkSession.getActiveSession()
+        self.assertEqual(active, None)
--- End diff --

Given the change to how we construct the SparkSession, can we add a test that 
makes sure we do whatever we decide to do with the SparkContext?


---




[GitHub] spark issue #18457: [SPARK-21241][MLlib]- Add setIntercept to StreamingLinea...

2018-10-12 Thread holdenk
Github user holdenk commented on the issue:

https://github.com/apache/spark/pull/18457
  
Sounds like we're not going to change this, @SoulGuedria, but we'd love your 
contributions in Spark ML, where things are actively being developed.


---




[GitHub] spark pull request #21688: [SPARK-21809] : Change Stage Page to use datatabl...

2018-10-12 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21688#discussion_r224865402
  
--- Diff: core/src/main/scala/org/apache/spark/status/LiveEntity.scala ---
@@ -341,7 +341,9 @@ private class LiveExecutorStageSummary(
   metrics.shuffleWriteMetrics.recordsWritten,
   metrics.memoryBytesSpilled,
   metrics.diskBytesSpilled,
-  isBlacklisted)
+  isBlacklisted,
--- End diff --

> you are sending the entire ExecutorSummary for all executors when you really just need 2 fields out of it of some executors

The current Scala code avoids that by fetching just the executor summaries it 
needs; the problem I can see from JS is that it could cause many requests to 
the driver (which are more expensive than a hash table or LevelDB lookup), but 
maybe it's not so bad if it's restricted to what is being shown on a single 
page.


---




[GitHub] spark issue #20503: [SPARK-23299][SQL][PYSPARK] Fix __repr__ behaviour for R...

2018-10-12 Thread holdenk
Github user holdenk commented on the issue:

https://github.com/apache/spark/pull/20503
  
Gentle ping again to @ashashwat. Also @HyukjinKwon, what are your opinions on 
the test coverage?


---




[GitHub] spark issue #22414: [SPARK-25424][SQL] Window duration and slide duration wi...

2018-10-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22414
  
**[Test build #4374 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4374/testReport)**
 for PR 22414 at commit 
[`89e05f2`](https://github.com/apache/spark/commit/89e05f261c9d9495ef04d4d3cccb49c6b9a587fb).


---




[GitHub] spark issue #22710: DO NOT MERGE

2018-10-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22710
  
**[Test build #97315 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97315/testReport)**
 for PR 22710 at commit 
[`ca4f4f3`](https://github.com/apache/spark/commit/ca4f4f39730e86fada6d136049a11ecc8e31b81d).


---




[GitHub] spark pull request #22288: [SPARK-22148][SPARK-15815][Scheduler] Acquire new...

2018-10-12 Thread abellina
Github user abellina commented on a diff in the pull request:

https://github.com/apache/spark/pull/22288#discussion_r224879925
  
--- Diff: 
core/src/main/scala/org/apache/spark/scheduler/BlacklistTracker.scala ---
@@ -146,21 +146,31 @@ private[scheduler] class BlacklistTracker (
     nextExpiryTime = math.min(execMinExpiry, nodeMinExpiry)
   }
 
+  private def killExecutor(exec: String, msg: String): Unit = {
+    allocationClient match {
+      case Some(a) =>
+        logInfo(msg)
+        a.killExecutors(Seq(exec), adjustTargetNumExecutors = false, countFailures = false,
+          force = true)
+      case None =>
+        logInfo(s"Not attempting to kill blacklisted executor id $exec " +
+          s"since allocation client is not defined.")
+    }
+  }
+
   private def killBlacklistedExecutor(exec: String): Unit = {
     if (conf.get(config.BLACKLIST_KILL_ENABLED)) {
-      allocationClient match {
-        case Some(a) =>
-          logInfo(s"Killing blacklisted executor id $exec " +
-            s"since ${config.BLACKLIST_KILL_ENABLED.key} is set.")
-          a.killExecutors(Seq(exec), adjustTargetNumExecutors = false, countFailures = false,
-            force = true)
-        case None =>
-          logWarning(s"Not attempting to kill blacklisted executor id $exec " +
-            s"since allocation client is not defined.")
-      }
+      killExecutor(exec,
+        s"Killing blacklisted executor id $exec since ${config.BLACKLIST_KILL_ENABLED.key} is set.")
     }
   }
 
+  private[scheduler] def killBlacklistedIdleExecutor(exec: String): Unit = {
+    killExecutor(exec,
--- End diff --

Makes sense. I guess there is no point in toggling this on and off (e.g. no 
IDLE_BLACKLIST_KILL_ENABLED).


---




[GitHub] spark issue #22710: DO NOT MERGE

2018-10-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22710
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97315/
Test PASSed.


---




[GitHub] spark issue #20999: [SPARK-14922][SPARK-17732][SPARK-23866][SQL] Support par...

2018-10-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20999
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97309/
Test PASSed.


---




[GitHub] spark issue #21710: [SPARK-24207][R]add R API for PrefixSpan

2018-10-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21710
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #22593: [Streaming][DOC] Fix typo & formatting for JavaDoc

2018-10-12 Thread srowen
Github user srowen commented on the issue:

https://github.com/apache/spark/pull/22593
  
Merged to master/2.4


---




[GitHub] spark issue #20999: [SPARK-14922][SPARK-17732][SPARK-23866][SQL] Support par...

2018-10-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20999
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21688: [SPARK-21809] : Change Stage Page to use datatables to s...

2018-10-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21688
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97312/
Test PASSed.


---




[GitHub] spark issue #21688: [SPARK-21809] : Change Stage Page to use datatables to s...

2018-10-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21688
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #22670: [SPARK-25631][SPARK-25632][SQL][TEST] Improve the test r...

2018-10-12 Thread dilipbiswal
Github user dilipbiswal commented on the issue:

https://github.com/apache/spark/pull/22670
  
@srowen OK, let me look.


---




[GitHub] spark issue #17631: [SPARK-20319][SQL] Already quoted identifiers are gettin...

2018-10-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17631
  
Can one of the admins verify this patch?


---




[GitHub] spark issue #22657: [SPARK-25670][TEST] Reduce number of tested timezones in...

2018-10-12 Thread MaxGekk
Github user MaxGekk commented on the issue:

https://github.com/apache/spark/pull/22657
  
@srowen @HyukjinKwon @cloud-fan Thank you for your review of the PR.


---




[GitHub] spark issue #22654: [SPARK-25660][SQL] Fix for the backward slash as CSV fie...

2018-10-12 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/22654
  
LGTM

Thanks! Merged to master.


---




[GitHub] spark issue #21710: [SPARK-24207][R]add R API for PrefixSpan

2018-10-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21710
  
**[Test build #97317 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97317/testReport)**
 for PR 21710 at commit 
[`0bab5ac`](https://github.com/apache/spark/commit/0bab5aca283bacdfe36ba1c669521df9d7ff81f3).


---




[GitHub] spark issue #22703: [SPARK-25705][BUILD][STREAMING] Remove Kafka 0.8 integra...

2018-10-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22703
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #22703: [SPARK-25705][BUILD][STREAMING] Remove Kafka 0.8 integra...

2018-10-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22703
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97307/
Test PASSed.


---




[GitHub] spark issue #22703: [SPARK-25705][BUILD][STREAMING] Remove Kafka 0.8 integra...

2018-10-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22703
  
**[Test build #97307 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97307/testReport)**
 for PR 22703 at commit 
[`6e34ce7`](https://github.com/apache/spark/commit/6e34ce7ab7961531d97655e0733ed92f701fbbfd).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark pull request #20761: [SPARK-20327][CORE][YARN] Add CLI support for YAR...

2018-10-12 Thread szyszy
Github user szyszy commented on a diff in the pull request:

https://github.com/apache/spark/pull/20761#discussion_r224909836
  
--- Diff: 
resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ResourceRequestHelper.scala
 ---
@@ -0,0 +1,140 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.deploy.yarn
+
+import java.lang.{Long => JLong}
+import java.lang.reflect.InvocationTargetException
+
+import scala.collection.mutable
+import scala.util.Try
+
+import org.apache.hadoop.yarn.api.records.Resource
+
+import org.apache.spark.{SparkConf, SparkException}
+import org.apache.spark.deploy.yarn.config._
+import org.apache.spark.internal.Logging
+import org.apache.spark.internal.config._
+import org.apache.spark.util.Utils
+
+/**
+ * This helper class uses some of Hadoop 3 methods from the YARN API,
+ * so we need to use reflection to avoid compile error when building 
against Hadoop 2.x
+ */
+private object ResourceRequestHelper extends Logging {
+  private val AMOUNT_AND_UNIT_REGEX = "([0-9]+)([A-Za-z]*)".r
+  private val RESOURCE_INFO_CLASS = 
"org.apache.hadoop.yarn.api.records.ResourceInformation"
+
+  /**
+   * Validates sparkConf and throws a SparkException if any of standard 
resources (memory or cores)
+   * is defined with the property spark.yarn.x.resource.y
+   */
+  def validateResources(sparkConf: SparkConf): Unit = {
+val resourceDefinitions = Seq[(String, String)](
+  (AM_MEMORY.key, YARN_AM_RESOURCE_TYPES_PREFIX + "memory"),
--- End diff --

These are two separate things:
1. I don't yet reject every deprecated standard resource name known to YARN 
(explained in the previous comment); I will address that soon.
2. Using `memory-mb` is the only way to initialize the memory resource with 
the YARN client, via the method `ResourceUtils.reinitializeResources`. 
I played around with this a bit: if I omit the standard resources and specify 
only custom resources before calling `ResourceUtils.reinitializeResources`, 
an internal YARN exception is thrown. 
Unfortunately, invoking this method is the simplest way to build tests on 
custom resource types, to the best of my knowledge, so I can't really do much 
about this; see the sketch below.
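
For illustration, a minimal sketch of what I mean, assuming a Hadoop 3 
classpath (the helper name is my own, the exact `newInstance` overload is 
from memory, and the production code has to go through reflection instead so 
that Hadoop 2 builds still compile):

```
import scala.collection.JavaConverters._

import org.apache.hadoop.yarn.api.records.ResourceTypeInfo
import org.apache.hadoop.yarn.util.resource.ResourceUtils

// Hypothetical test helper: re-registers the resource types known to YARN so
// that a test can exercise custom resources. The standard resources must be
// present under their canonical names ("memory-mb", "vcores"); omitting them
// makes reinitializeResources throw an internal YARN exception.
def initializeResourceTypes(customResources: Seq[String]): Unit = {
  val standard = Seq(
    ResourceTypeInfo.newInstance("memory-mb", "Mi"),
    ResourceTypeInfo.newInstance("vcores", ""))
  val custom = customResources.map(name => ResourceTypeInfo.newInstance(name, ""))
  ResourceUtils.reinitializeResources((standard ++ custom).asJava)
}
```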

> and also the inconsistency in your code (using both memory and memory-mb).

What did you mean by this? The only use of `"memory"` anywhere in the change 
is to prevent it from being used with the new resource configs.


---




[GitHub] spark pull request #20761: [SPARK-20327][CORE][YARN] Add CLI support for YAR...

2018-10-12 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/20761#discussion_r224913824
  
--- Diff: 
resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ResourceRequestHelper.scala
 ---
@@ -0,0 +1,140 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.deploy.yarn
+
+import java.lang.{Long => JLong}
+import java.lang.reflect.InvocationTargetException
+
+import scala.collection.mutable
+import scala.util.Try
+
+import org.apache.hadoop.yarn.api.records.Resource
+
+import org.apache.spark.{SparkConf, SparkException}
+import org.apache.spark.deploy.yarn.config._
+import org.apache.spark.internal.Logging
+import org.apache.spark.internal.config._
+import org.apache.spark.util.Utils
+
+/**
+ * This helper class uses some of Hadoop 3 methods from the YARN API,
+ * so we need to use reflection to avoid compile error when building 
against Hadoop 2.x
+ */
+private object ResourceRequestHelper extends Logging {
+  private val AMOUNT_AND_UNIT_REGEX = "([0-9]+)([A-Za-z]*)".r
+  private val RESOURCE_INFO_CLASS = 
"org.apache.hadoop.yarn.api.records.ResourceInformation"
+
+  /**
+   * Validates sparkConf and throws a SparkException if any of standard 
resources (memory or cores)
+   * is defined with the property spark.yarn.x.resource.y
+   */
+  def validateResources(sparkConf: SparkConf): Unit = {
+val resourceDefinitions = Seq[(String, String)](
+  (AM_MEMORY.key, YARN_AM_RESOURCE_TYPES_PREFIX + "memory"),
--- End diff --

> What did you mean by this?

I meant you were initializing `memory-mb` in tests but checking only 
`memory` here. That smells like you should be checking `memory-mb` here.

These kinds of things should have comments in the code so that in the future 
we know why they are that way.


---




[GitHub] spark issue #22710: DO NOT MERGE

2018-10-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22710
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #22710: DO NOT MERGE

2018-10-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22710
  
**[Test build #97315 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97315/testReport)**
 for PR 22710 at commit 
[`ca4f4f3`](https://github.com/apache/spark/commit/ca4f4f39730e86fada6d136049a11ecc8e31b81d).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #22322: [SPARK-25312][Documentation, Spark Core] Add description...

2018-10-12 Thread srowen
Github user srowen commented on the issue:

https://github.com/apache/spark/pull/22322
  
Ping @npoberezkin 


---




[GitHub] spark pull request #22593: [Streaming][DOC] Fix typo & formatting for JavaDo...

2018-10-12 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/22593


---




[GitHub] spark pull request #22703: [SPARK-25705][BUILD][STREAMING] Remove Kafka 0.8 ...

2018-10-12 Thread koeninger
Github user koeninger commented on a diff in the pull request:

https://github.com/apache/spark/pull/22703#discussion_r224899199
  
--- Diff: docs/streaming-kafka-0-10-integration.md ---
@@ -3,7 +3,11 @@ layout: global
 title: Spark Streaming + Kafka Integration Guide (Kafka broker version 
0.10.0 or higher)
 ---
 
-The Spark Streaming integration for Kafka 0.10 is similar in design to the 
0.8 [Direct Stream 
approach](streaming-kafka-0-8-integration.html#approach-2-direct-approach-no-receivers).
  It provides simple parallelism,  1:1 correspondence between Kafka partitions 
and Spark partitions, and access to offsets and metadata. However, because the 
newer integration uses the [new Kafka consumer 
API](http://kafka.apache.org/documentation.html#newconsumerapi) instead of the 
simple API, there are notable differences in usage. This version of the 
integration is marked as experimental, so the API is potentially subject to 
change.
+The Spark Streaming integration for Kafka 0.10 provides simple 
parallelism, 1:1 correspondence between Kafka 
+partitions and Spark partitions, and access to offsets and metadata. 
However, because the newer integration uses 
+the [new Kafka consumer 
API](https://kafka.apache.org/documentation.html#newconsumerapi) instead of the 
simple API, 
+there are notable differences in usage. This version of the integration is 
marked as experimental, so the API is 
--- End diff --

Do we want to leave the new integration marked as experimental if it is now 
the only available one?


---




[GitHub] spark issue #21157: [SPARK-22674][PYTHON] Removed the namedtuple pickling pa...

2018-10-12 Thread superbobry
Github user superbobry commented on the issue:

https://github.com/apache/spark/pull/21157
  
Nope, the job I was referring to is not open source, but I think the speedup 
is easy to justify: a much smaller payload and faster deserialization:

```
>>> from collections import namedtuple
>>> Stats = namedtuple("Stats", ["sample_mean", "sample_variance"])
>>> import pickle
>>> len(pickle.dumps(Stats(42, 42)))
31
>>> len(pickle.dumps(("Stats", Stats._fields, (42, 42))))
68
```


---




[GitHub] spark issue #22645: [SPARK-25566][SPARK-25567][WEBUI][SQL]Support pagination...

2018-10-12 Thread srowen
Github user srowen commented on the issue:

https://github.com/apache/spark/pull/22645
  
Merged to master


---




[GitHub] spark issue #22645: [SPARK-25566][SPARK-25567][WEBUI][SQL]Support pagination...

2018-10-12 Thread shahidki31
Github user shahidki31 commented on the issue:

https://github.com/apache/spark/pull/22645
  
Thanks a lot @srowen 



---




[GitHub] spark issue #22689: [SPARK-25697][CORE]When zstd compression enabled, InProg...

2018-10-12 Thread srowen
Github user srowen commented on the issue:

https://github.com/apache/spark/pull/22689
  
Merged to master/2.4


---




[GitHub] spark issue #22699: [SPARK-25711][Core] Improve start-history-server.sh: sho...

2018-10-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22699
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #22699: [SPARK-25711][Core] Improve start-history-server.sh: sho...

2018-10-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22699
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97308/
Test FAILed.


---




[GitHub] spark issue #21710: [SPARK-24207][R]add R API for PrefixSpan

2018-10-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21710
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3928/
Test PASSed.


---




[GitHub] spark issue #20999: [SPARK-14922][SPARK-17732][SPARK-23866][SQL] Support par...

2018-10-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20999
  
**[Test build #97309 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97309/testReport)**
 for PR 20999 at commit 
[`9b84057`](https://github.com/apache/spark/commit/9b8405748b3756024e346a2f00d4561f7617b16e).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #22593: [Streaming][DOC] Fix typo & formatting for JavaDoc

2018-10-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22593
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97311/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22593: [Streaming][DOC] Fix typo & formatting for JavaDoc

2018-10-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22593
  
Merged build finished. Test PASSed.


---




[GitHub] spark pull request #22654: [SPARK-25660][SQL] Fix for the backward slash as ...

2018-10-12 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/22654


---



