[GitHub] spark pull request #20647: [SPARK-23303][SQL] improve the explain result for...

2018-02-20 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/20647#discussion_r169556960
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Relation.scala
 ---
@@ -107,17 +106,24 @@ case class DataSourceV2Relation(
 }
 
 /**
- * A specialization of DataSourceV2Relation with the streaming bit set to 
true. Otherwise identical
- * to the non-streaming relation.
+ * A specialization of [[DataSourceV2Relation]] with the streaming bit set 
to true.
+ *
+ * Note that, this plan has a mutable reader, so Spark won't apply 
operator push-down for this plan,
+ * to avoid making the plan mutable. We should consolidate this plan and 
[[DataSourceV2Relation]]
+ * after we figure out how to apply operator push-down for streaming data 
sources.
--- End diff --

Currently the streaming execution creates the reader once. The reader is 
mutable and contains 2 kinds of states:
1. operator push-down states, e.g. the filters being pushed down.
2. streaming related states, like offsets, kafka connection, etc.

For continues mode, it's fine. We create the reader, set offsets, construct 
the plan, get the physical plan, and process. We mutate the reader states at 
the beginning and never mutate it again.

For micro-batch mode, we have a problem. We create the reader at the 
beginning, set reader offset, construct the plan and get the physical plan for 
every batch. This means we apply operator push-down to this reader many times, 
and data source v2 doesn't define what the behavior should be for this case. 
Thus we can't apply operator push-down for streaming data sources.

@marmbrus @tdas @zsxwing @jose-torres I have 2 proposals to support 
operator push down for streaming relation:
1. Introduce a `reset` API to `DataSourceReader` to clear out the operator 
push-down states. Then we can call `reset` for every micro-batch and safely 
apply operator pushdown.
2. Do plan analyzing/optimizing/planning only once for micro-batch mode. 
Theoretically it's not good, as different micro-batch may have different 
statistics and the optimal physical plan is different, we should rerun the 
planner for each batch. The benefit is, plan analyzing/optimizing/planning may 
be costly, doing it once can mitigate the cost. Also adaptive execution can 
help so it's not that bad to reuse the same physical plan.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20575: [SPARK-23386][DEPLOY] enable direct application links in...

2018-02-20 Thread gerashegalov
Github user gerashegalov commented on the issue:

https://github.com/apache/spark/pull/20575
  
@vanzin what do you mean by "as part of parsing the logs"? This PR is about 
avoiding the long wait for eventLogs to be read from a remote filesystem, and 
being parsed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20640: [SPARK-19755][Mesos] Blacklist is always active f...

2018-02-20 Thread IgorBerman
Github user IgorBerman commented on a diff in the pull request:

https://github.com/apache/spark/pull/20640#discussion_r169556816
  
--- Diff: 
resource-managers/mesos/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosCoarseGrainedSchedulerBackend.scala
 ---
@@ -648,15 +645,6 @@ private[spark] class 
MesosCoarseGrainedSchedulerBackend(
   totalGpusAcquired -= gpus
   gpusByTaskId -= taskId
 }
-// If it was a failure, mark the slave as failed for blacklisting 
purposes
-if (TaskState.isFailed(state)) {
-  slave.taskFailures += 1
-
-  if (slave.taskFailures >= MAX_SLAVE_FAILURES) {
-logInfo(s"Blacklisting Mesos slave $slaveId due to too many 
failures; " +
--- End diff --

@kayousterhout BlacklistTracker has it's own logging that is concerned with 
blacklisted nodes, won't it be enough? on the other hand, if blacklisting is 
disabled, which is default, then we will lose this information.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20640: [SPARK-19755][Mesos] Blacklist is always active for Meso...

2018-02-20 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20640
  
**[Test build #87580 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87580/testReport)**
 for PR 20640 at commit 
[`e2ddc1b`](https://github.com/apache/spark/commit/e2ddc1be19e2f978df4fe84073aff3f5b46afe45).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20640: [SPARK-19755][Mesos] Blacklist is always active f...

2018-02-20 Thread IgorBerman
Github user IgorBerman commented on a diff in the pull request:

https://github.com/apache/spark/pull/20640#discussion_r169554433
  
--- Diff: 
resource-managers/mesos/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosCoarseGrainedSchedulerBackend.scala
 ---
@@ -571,7 +568,7 @@ private[spark] class MesosCoarseGrainedSchedulerBackend(
   cpus + totalCoresAcquired <= maxCores &&
   mem <= offerMem &&
   numExecutors < executorLimit &&
-  slaves.get(slaveId).map(_.taskFailures).getOrElse(0) < 
MAX_SLAVE_FAILURES &&
+  !scheduler.nodeBlacklist().contains(slaveId) &&
--- End diff --

You are right, thank! 

here:

https://github.com/apache/spark/blob/9e50a1d37a4cf0c34e20a7c1a910ceaff41535a2/core/src/main/scala/org/apache/spark/scheduler/BlacklistTracker.scala#L193

I've managed to track it to 
https://github.com/apache/spark/blob/e18d6f5326e0d9ea03d31de5ce04cb84d3b8ab37/core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala#L852

I'll change it to offerHostname


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20613: [SPARK-23368][SQL] Avoid unnecessary Exchange or Sort af...

2018-02-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20613
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20613: [SPARK-23368][SQL] Avoid unnecessary Exchange or Sort af...

2018-02-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20613
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87574/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20613: [SPARK-23368][SQL] Avoid unnecessary Exchange or Sort af...

2018-02-20 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20613
  
**[Test build #87574 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87574/testReport)**
 for PR 20613 at commit 
[`27a1af2`](https://github.com/apache/spark/commit/27a1af22e94f49b7801c4f49443ebadb1ff35571).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20603: [SPARK-23418][SQL]: Fail DataSourceV2 reads when ...

2018-02-20 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/20603


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20603: [SPARK-23418][SQL]: Fail DataSourceV2 reads when user sc...

2018-02-20 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/20603
  
thanks, merging to master!


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20647: [SPARK-23303][SQL] improve the explain result for data s...

2018-02-20 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20647
  
**[Test build #87579 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87579/testReport)**
 for PR 20647 at commit 
[`9dd1986`](https://github.com/apache/spark/commit/9dd1986558013f93a0c77f6e62c277b51140ea39).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20647: [SPARK-23303][SQL] improve the explain result for data s...

2018-02-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20647
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/982/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20647: [SPARK-23303][SQL] improve the explain result for data s...

2018-02-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20647
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20647: [SPARK-23303][SQL] improve the explain result for data s...

2018-02-20 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20647
  
**[Test build #87578 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87578/testReport)**
 for PR 20647 at commit 
[`e1f24a4`](https://github.com/apache/spark/commit/e1f24a4be751ef39ea4d45b71a315810dcb9fd74).
 * This patch **fails to build**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20647: [SPARK-23303][SQL] improve the explain result for data s...

2018-02-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20647
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87578/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20647: [SPARK-23303][SQL] improve the explain result for data s...

2018-02-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20647
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20647: [SPARK-23303][SQL] improve the explain result for data s...

2018-02-20 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20647
  
**[Test build #87578 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87578/testReport)**
 for PR 20647 at commit 
[`e1f24a4`](https://github.com/apache/spark/commit/e1f24a4be751ef39ea4d45b71a315810dcb9fd74).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20647: [SPARK-23303][SQL] improve the explain result for data s...

2018-02-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20647
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/981/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20647: [SPARK-23303][SQL] improve the explain result for data s...

2018-02-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20647
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20647: [SPARK-23303][SQL] improve the explain result for data s...

2018-02-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20647
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87577/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20647: [SPARK-23303][SQL] improve the explain result for data s...

2018-02-20 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20647
  
**[Test build #87577 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87577/testReport)**
 for PR 20647 at commit 
[`138679e`](https://github.com/apache/spark/commit/138679ee8b713c20ac89d44a2e8e5d82c69a).
 * This patch **fails to build**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20647: [SPARK-23303][SQL] improve the explain result for data s...

2018-02-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20647
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20647: [SPARK-23303][SQL] improve the explain result for data s...

2018-02-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20647
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20647: [SPARK-23303][SQL] improve the explain result for data s...

2018-02-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20647
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/980/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20647: [SPARK-23303][SQL] improve the explain result for data s...

2018-02-20 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/20647
  
CC @tdas @gatorsmile @rdblue  


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20647: [SPARK-23303][SQL] improve the explain result for data s...

2018-02-20 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20647
  
**[Test build #87577 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87577/testReport)**
 for PR 20647 at commit 
[`138679e`](https://github.com/apache/spark/commit/138679ee8b713c20ac89d44a2e8e5d82c69a).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20647: [SPARK-23303][SQL] improve the explain result for...

2018-02-20 Thread cloud-fan
GitHub user cloud-fan opened a pull request:

https://github.com/apache/spark/pull/20647

[SPARK-23303][SQL] improve the explain result for data source v2 relations

## What changes were proposed in this pull request?

The current explain result for data source v2 relation is unreadable:
```
== Parsed Logical Plan ==
'Filter ('i > 6)
+- AnalysisBarrier
  +- Project [j#1]
 +- DataSourceV2Relation [i#0, j#1], 
org.apache.spark.sql.sources.v2.AdvancedDataSourceV2$Reader@3b415940

== Analyzed Logical Plan ==
j: int
Project [j#1]
+- Filter (i#0 > 6)
   +- Project [j#1, i#0]
  +- DataSourceV2Relation [i#0, j#1], 
org.apache.spark.sql.sources.v2.AdvancedDataSourceV2$Reader@3b415940

== Optimized Logical Plan ==
Project [j#1]
+- Filter isnotnull(i#0)
   +- DataSourceV2Relation [i#0, j#1], 
org.apache.spark.sql.sources.v2.AdvancedDataSourceV2$Reader@3b415940

== Physical Plan ==
*(1) Project [j#1]
+- *(1) Filter isnotnull(i#0)
   +- *(1) DataSourceV2Scan [i#0, j#1], 
org.apache.spark.sql.sources.v2.AdvancedDataSourceV2$Reader@3b415940
```

after this PR
```
== Parsed Logical Plan ==
'Project [unresolvedalias('j, None)]
+- AnalysisBarrier
  +- Relation AdvancedDataSourceV2[i#0, j#1]

== Analyzed Logical Plan ==
j: int
Project [j#1]
+- Relation AdvancedDataSourceV2[i#0, j#1]

== Optimized Logical Plan ==
Relation AdvancedDataSourceV2[j#1]

== Physical Plan ==
*(1) Scan AdvancedDataSourceV2[j#1]
```
---
```
== Analyzed Logical Plan ==
i: int, j: int
Filter (i#88 > 3)
+- Relation JavaAdvancedDataSourceV2[i#88, j#89]

== Optimized Logical Plan ==
Filter isnotnull(i#88)
+- Relation JavaAdvancedDataSourceV2[i#88, j#89] (PushedFilter: 
[GreaterThan(i,3)])

== Physical Plan ==
*(1) Filter isnotnull(i#88)
+- *(1) Scan JavaAdvancedDataSourceV2[i#88, j#89] (PushedFilter: 
[GreaterThan(i,3)])
```

an example for streaming query
```
== Parsed Logical Plan ==
Aggregate [value#6], [value#6, count(1) AS count(1)#11L]
+- SerializeFromObject [staticinvoke(class 
org.apache.spark.unsafe.types.UTF8String, StringType, fromString, input[0, 
java.lang.String, true], true, false) AS value#6]
   +- MapElements , class java.lang.String, 
[StructField(value,StringType,true)], obj#5: java.lang.String
  +- DeserializeToObject cast(value#25 as string).toString, obj#4: 
java.lang.String
 +- Streaming Relation FakeDataSourceV2$[value#25]

== Analyzed Logical Plan ==
value: string, count(1): bigint
Aggregate [value#6], [value#6, count(1) AS count(1)#11L]
+- SerializeFromObject [staticinvoke(class 
org.apache.spark.unsafe.types.UTF8String, StringType, fromString, input[0, 
java.lang.String, true], true, false) AS value#6]
   +- MapElements , class java.lang.String, 
[StructField(value,StringType,true)], obj#5: java.lang.String
  +- DeserializeToObject cast(value#25 as string).toString, obj#4: 
java.lang.String
 +- Streaming Relation FakeDataSourceV2$[value#25]

== Optimized Logical Plan ==
Aggregate [value#6], [value#6, count(1) AS count(1)#11L]
+- SerializeFromObject [staticinvoke(class 
org.apache.spark.unsafe.types.UTF8String, StringType, fromString, input[0, 
java.lang.String, true], true, false) AS value#6]
   +- MapElements , class java.lang.String, 
[StructField(value,StringType,true)], obj#5: java.lang.String
  +- DeserializeToObject value#25.toString, obj#4: java.lang.String
 +- Streaming Relation FakeDataSourceV2$[value#25]

== Physical Plan ==
*(4) HashAggregate(keys=[value#6], functions=[count(1)], output=[value#6, 
count(1)#11L])
+- StateStoreSave [value#6], state info [ checkpoint = 
*(redacted)/cloud/dev/spark/target/tmp/temporary-549f264b-2531-4fcb-a52f-433c77347c12/state,
 runId = f84d9da9-2f8c-45c1-9ea1-70791be684de, opId = 0, ver = 0, numPartitions 
= 5], Complete, 0
   +- *(3) HashAggregate(keys=[value#6], functions=[merge_count(1)], 
output=[value#6, count#16L])
  +- StateStoreRestore [value#6], state info [ checkpoint = 
*(redacted)/cloud/dev/spark/target/tmp/temporary-549f264b-2531-4fcb-a52f-433c77347c12/state,
 runId = f84d9da9-2f8c-45c1-9ea1-70791be684de, opId = 0, ver = 0, numPartitions 
= 5]
 +- *(2) HashAggregate(keys=[value#6], functions=[merge_count(1)], 
output=[value#6, count#16L])
+- Exchange hashpartitioning(value#6, 5)
   +- *(1) HashAggregate(keys=[value#6], 
functions=[partial_count(1)], output=[value#6, count#16L])
  +- *(1) SerializeFromObject [staticinvoke(class 
org.apache.spark.unsafe.types.UTF8String, StringType, fromString, input[0, 
java.lang.String, true], true, false) AS value#6]
 

[GitHub] spark issue #20572: [SPARK-17147][STREAMING][KAFKA] Allow non-consecutive of...

2018-02-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20572
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20572: [SPARK-17147][STREAMING][KAFKA] Allow non-consecutive of...

2018-02-20 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20572
  
**[Test build #87576 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87576/testReport)**
 for PR 20572 at commit 
[`08f3570`](https://github.com/apache/spark/commit/08f3570c0491f96abcaa9a6dc0f4e3030cfea6c0).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20572: [SPARK-17147][STREAMING][KAFKA] Allow non-consecutive of...

2018-02-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20572
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87576/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20572: [SPARK-17147][STREAMING][KAFKA] Allow non-consecutive of...

2018-02-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20572
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20572: [SPARK-17147][STREAMING][KAFKA] Allow non-consecutive of...

2018-02-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20572
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/979/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20572: [SPARK-17147][STREAMING][KAFKA] Allow non-consecutive of...

2018-02-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20572
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20572: [SPARK-17147][STREAMING][KAFKA] Allow non-consecutive of...

2018-02-20 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20572
  
**[Test build #87576 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87576/testReport)**
 for PR 20572 at commit 
[`08f3570`](https://github.com/apache/spark/commit/08f3570c0491f96abcaa9a6dc0f4e3030cfea6c0).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20572: [SPARK-17147][STREAMING][KAFKA] Allow non-consecu...

2018-02-20 Thread koeninger
Github user koeninger commented on a diff in the pull request:

https://github.com/apache/spark/pull/20572#discussion_r169538019
  
--- Diff: 
external/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka010/CachedKafkaConsumer.scala
 ---
@@ -53,7 +51,7 @@ class CachedKafkaConsumer[K, V] private(
 
   // TODO if the buffer was kept around as a random-access structure,
   // could possibly optimize re-calculating of an RDD in the same batch
-  protected var buffer = ju.Collections.emptyList[ConsumerRecord[K, 
V]]().iterator
+  protected var buffer = 
ju.Collections.emptyListIterator[ConsumerRecord[K, V]]()
--- End diff --

Agreed, think it should be ok


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20572: [SPARK-17147][STREAMING][KAFKA] Allow non-consecutive of...

2018-02-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20572
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87575/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20572: [SPARK-17147][STREAMING][KAFKA] Allow non-consecutive of...

2018-02-20 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20572
  
**[Test build #87575 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87575/testReport)**
 for PR 20572 at commit 
[`2246750`](https://github.com/apache/spark/commit/224675046be2fbc38fea59c59394736e19042eb4).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20572: [SPARK-17147][STREAMING][KAFKA] Allow non-consecu...

2018-02-20 Thread koeninger
Github user koeninger commented on a diff in the pull request:

https://github.com/apache/spark/pull/20572#discussion_r169537541
  
--- Diff: 
external/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka010/CachedKafkaConsumer.scala
 ---
@@ -83,13 +81,50 @@ class CachedKafkaConsumer[K, V] private(
 s"Failed to get records for $groupId $topic $partition $offset 
after polling for $timeout")
   record = buffer.next()
   assert(record.offset == offset,
-s"Got wrong record for $groupId $topic $partition even after 
seeking to offset $offset")
+s"Got wrong record for $groupId $topic $partition even after 
seeking to offset $offset " +
+  s"got offset ${record.offset} instead. If this is a compacted 
topic, consider enabling " +
+  "spark.streaming.kafka.allowNonConsecutiveOffsets"
+  )
 }
 
 nextOffset = offset + 1
 record
   }
 
+  /**
+   * Start a batch on a compacted topic
+   */
+  def compactedStart(offset: Long, timeout: Long): Unit = {
+logDebug(s"compacted start $groupId $topic $partition starting 
$offset")
+// This seek may not be necessary, but it's hard to tell due to gaps 
in compacted topics
+if (offset != nextOffset) {
+  logInfo(s"Initial fetch for compacted $groupId $topic $partition 
$offset")
+  seek(offset)
+  poll(timeout)
+}
+  }
+
+  /**
+   * Get the next record in the batch from a compacted topic.
+   * Assumes compactedStart has been called first, and ignores gaps.
+   */
+  def compactedNext(timeout: Long): ConsumerRecord[K, V] = {
+if (!buffer.hasNext()) { poll(timeout) }
+assert(buffer.hasNext(),
--- End diff --

That's a "shouldn't happen unless the topicpartition or broker is gone" 
kind of thing.  Semantically I could see that being more like require than 
assert, but don't have a strong opinion.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20572: [SPARK-17147][STREAMING][KAFKA] Allow non-consecu...

2018-02-20 Thread koeninger
Github user koeninger commented on a diff in the pull request:

https://github.com/apache/spark/pull/20572#discussion_r169536036
  
--- Diff: 
external/kafka-0-10/src/test/scala/org/apache/spark/streaming/kafka010/KafkaRDDSuite.scala
 ---
@@ -22,12 +22,17 @@ import java.{ util => ju }
 import scala.collection.JavaConverters._
 import scala.util.Random
 
+import kafka.common.TopicAndPartition
--- End diff --

Right, LogCleaner hadn't yet been moved to the new apis, added a comment to 
that effect.
Think we're ok here because it's just being used to mock up a compacted 
topic, not in the actual dstream api.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20572: [SPARK-17147][STREAMING][KAFKA] Allow non-consecutive of...

2018-02-20 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20572
  
**[Test build #87575 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87575/testReport)**
 for PR 20572 at commit 
[`2246750`](https://github.com/apache/spark/commit/224675046be2fbc38fea59c59394736e19042eb4).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20572: [SPARK-17147][STREAMING][KAFKA] Allow non-consecutive of...

2018-02-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20572
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20572: [SPARK-17147][STREAMING][KAFKA] Allow non-consecutive of...

2018-02-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20572
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/978/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14936: [SPARK-7877][MESOS] Allow configuration of framework tim...

2018-02-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14936
  
Build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14936: [SPARK-7877][MESOS] Allow configuration of framework tim...

2018-02-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14936
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87573/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14936: [SPARK-7877][MESOS] Allow configuration of framework tim...

2018-02-20 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14936
  
**[Test build #87573 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87573/testReport)**
 for PR 14936 at commit 
[`bd9ace4`](https://github.com/apache/spark/commit/bd9ace4ea815bb9e4395dc2cafd52e6889250b29).
 * This patch **fails PySpark unit tests**.
 * This patch **does not merge cleanly**.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20613: [SPARK-23368][SQL] Avoid unnecessary Exchange or Sort af...

2018-02-20 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20613
  
**[Test build #87574 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87574/testReport)**
 for PR 20613 at commit 
[`27a1af2`](https://github.com/apache/spark/commit/27a1af22e94f49b7801c4f49443ebadb1ff35571).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20613: [SPARK-23368][SQL] Avoid unnecessary Exchange or Sort af...

2018-02-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20613
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/977/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20613: [SPARK-23368][SQL] Avoid unnecessary Exchange or Sort af...

2018-02-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20613
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20613: [SPARK-23368][SQL] Avoid unnecessary Exchange or Sort af...

2018-02-20 Thread kiszk
Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/20613
  
retest this please


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20057: [SPARK-22880][SQL] Add cascadeTruncate option to ...

2018-02-20 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/20057#discussion_r169530627
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/jdbc/TeradataDialect.scala ---
@@ -31,4 +31,19 @@ private case object TeradataDialect extends JdbcDialect {
 case BooleanType => Option(JdbcType("CHAR(1)", java.sql.Types.CHAR))
 case _ => None
   }
+
+  override def isCascadingTruncateTable(): Option[Boolean] = Some(false)
+
+  /**
+   * The SQL query used to truncate a table.
+   * @param table The table to truncate.
+   * @param cascade Whether or not to cascade the truncation. Default 
value is the
+   *value of isCascadingTruncateTable(). Ignored for 
Teradata as it is unsupported
+   * @return The SQL query to use for truncating a table
+   */
+  override def getTruncateQuery(
+  table: String,
+  cascade: Option[Boolean] = isCascadingTruncateTable): String = {
+s"TRUNCATE TABLE $table"
--- End diff --

Thank you so much for confirming, @klinvill !

@danielvdende . Could you update `TeradataDialect` according to @klinvill 
's advice?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20646: [SPARK-23408][SS] Synchronize successive AddDataMemory a...

2018-02-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20646
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20646: [SPARK-23408][SS] Synchronize successive AddDataMemory a...

2018-02-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20646
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87572/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20646: [SPARK-23408][SS] Synchronize successive AddDataMemory a...

2018-02-20 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20646
  
**[Test build #87572 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87572/testReport)**
 for PR 20646 at commit 
[`1df90e7`](https://github.com/apache/spark/commit/1df90e796e9388d7992bc55f9f87bfd71af2f7f9).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20612: [SPARK-23424][SQL]Add codegenStageId in comment

2018-02-20 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/20612


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20419: [SPARK-23032][SQL][FOLLOW-UP]Add codegenStageId i...

2018-02-20 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/20419


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20612: [SPARK-23424][SQL]Add codegenStageId in comment

2018-02-20 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/20612
  
thanks, merging to master!


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20057: [SPARK-22880][SQL] Add cascadeTruncate option to ...

2018-02-20 Thread klinvill
Github user klinvill commented on a diff in the pull request:

https://github.com/apache/spark/pull/20057#discussion_r169526767
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/jdbc/TeradataDialect.scala ---
@@ -31,4 +31,19 @@ private case object TeradataDialect extends JdbcDialect {
 case BooleanType => Option(JdbcType("CHAR(1)", java.sql.Types.CHAR))
 case _ => None
   }
+
+  override def isCascadingTruncateTable(): Option[Boolean] = Some(false)
+
+  /**
+   * The SQL query used to truncate a table.
+   * @param table The table to truncate.
+   * @param cascade Whether or not to cascade the truncation. Default 
value is the
+   *value of isCascadingTruncateTable(). Ignored for 
Teradata as it is unsupported
+   * @return The SQL query to use for truncating a table
+   */
+  override def getTruncateQuery(
+  table: String,
+  cascade: Option[Boolean] = isCascadingTruncateTable): String = {
+s"TRUNCATE TABLE $table"
--- End diff --

Hi @dongjoon-hyun, I was the original author of the TeradataDialect and 
@gatorsmile reviewed and committed it. You are correct, Teradata does not 
support the TRUNCATE statement. Instead Teradata uses a DELETE statement so you 
should be able to use `DELETE FROM $table ALL` instead of `TRUNCATE TABLE 
$table`


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20646: [SPARK-23408][SS] Synchronize successive AddDataMemory a...

2018-02-20 Thread tdas
Github user tdas commented on the issue:

https://github.com/apache/spark/pull/20646
  
Actually, I am having second thoughts about this. This is fundamentally 
changing how the tests work, especially for stress tests. The stress tests 
actually test these corner cases (by randomly adding successive AddData) about 
what if data was being added while the previously added data is being picked 
up. With this change, we will accidentally not test those race-condition-prone 
cases. 

Second, we are taking multiple locks here in multiple sources, and the 
StreamExecution is likely to take the same locks. I am really afraid that we 
are introducing deadlocks by doing this.

I am still thinking what the right approach here. I think it should be
- Explicit synchronized adding of data to multiple sources.
- Not holding locks in multiple sources. 






---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20635: [SPARK-23053][CORE][BRANCH-2.1] taskBinarySerialization ...

2018-02-20 Thread squito
Github user squito commented on the issue:

https://github.com/apache/spark/pull/20635
  
thanks @ivoson , merged!


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20646: [SPARK-23408][SS] Synchronize successive AddDataM...

2018-02-20 Thread tdas
Github user tdas commented on a diff in the pull request:

https://github.com/apache/spark/pull/20646#discussion_r169522041
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamTest.scala ---
@@ -429,7 +429,25 @@ trait StreamTest extends QueryTest with 
SharedSQLContext with TimeLimits with Be
 val defaultCheckpointLocation =
   Utils.createTempDir(namePrefix = 
"streaming.metadata").getCanonicalPath
 try {
-  startedTest.foreach { action =>
+  val actionIterator = startedTest.iterator.buffered
+  while (actionIterator.hasNext) {
+// Synchronize sequential addDataMemory actions.
--- End diff --

// Synchronize --> // Collectively synchronize  actions so that the 
data gets added together in a single batch.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20646: [SPARK-23408][SS] Synchronize successive AddDataM...

2018-02-20 Thread tdas
Github user tdas commented on a diff in the pull request:

https://github.com/apache/spark/pull/20646#discussion_r169521344
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamTest.scala ---
@@ -429,7 +429,25 @@ trait StreamTest extends QueryTest with 
SharedSQLContext with TimeLimits with Be
 val defaultCheckpointLocation =
   Utils.createTempDir(namePrefix = 
"streaming.metadata").getCanonicalPath
 try {
-  startedTest.foreach { action =>
+  val actionIterator = startedTest.iterator.buffered
+  while (actionIterator.hasNext) {
+// Synchronize sequential addDataMemory actions.
+val addDataMemoryActions = ArrayBuffer[AddDataMemory[_]]()
+while (actionIterator.hasNext && 
actionIterator.head.isInstanceOf[AddDataMemory[_]]) {
+  
addDataMemoryActions.append(actionIterator.next().asInstanceOf[AddDataMemory[_]])
+}
+if (addDataMemoryActions.nonEmpty) {
+  val synchronizeAll = addDataMemoryActions
--- End diff --

This is some magic-ish code. Can you add a bit more comments on how this 
compose thing works?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20631: [SPARK-23454][SS][DOCS] Added trigger information...

2018-02-20 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/20631


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20645: SPARK-23472: Add defaultJavaOptions for drivers and exec...

2018-02-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20645
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87569/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20645: SPARK-23472: Add defaultJavaOptions for drivers and exec...

2018-02-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20645
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20645: SPARK-23472: Add defaultJavaOptions for drivers and exec...

2018-02-20 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20645
  
**[Test build #87569 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87569/testReport)**
 for PR 20645 at commit 
[`89ee1f3`](https://github.com/apache/spark/commit/89ee1f330047c6c8a986e62e8201d9b7890cf4c8).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20643: [SPARK-23468][core] Stringify auth secret before ...

2018-02-20 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/20643


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14936: [SPARK-7877][MESOS] Allow configuration of framework tim...

2018-02-20 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14936
  
**[Test build #87573 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87573/testReport)**
 for PR 14936 at commit 
[`bd9ace4`](https://github.com/apache/spark/commit/bd9ace4ea815bb9e4395dc2cafd52e6889250b29).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20644: [SPARK-23470][ui] Use first attempt of last stage...

2018-02-20 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/20644


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20644: [SPARK-23470][ui] Use first attempt of last stage to def...

2018-02-20 Thread sameeragarwal
Github user sameeragarwal commented on the issue:

https://github.com/apache/spark/pull/20644
  
merging this to master/2.3. Thanks!


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20644: [SPARK-23470][ui] Use first attempt of last stage to def...

2018-02-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20644
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87568/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20646: [SPARK-23408][SS] Synchronize successive AddDataMemory a...

2018-02-20 Thread jose-torres
Github user jose-torres commented on the issue:

https://github.com/apache/spark/pull/20646
  
(https://issues.apache.org/jira/browse/SPARK-23369 was already filed for 
previous flake)


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20644: [SPARK-23470][ui] Use first attempt of last stage to def...

2018-02-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20644
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20644: [SPARK-23470][ui] Use first attempt of last stage to def...

2018-02-20 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20644
  
**[Test build #87568 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87568/testReport)**
 for PR 20644 at commit 
[`0dc95ad`](https://github.com/apache/spark/commit/0dc95ad83e9d736dedbb5d09d18513508847c0e4).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20646: [SPARK-23408][SS] Synchronize successive AddDataMemory a...

2018-02-20 Thread jose-torres
Github user jose-torres commented on the issue:

https://github.com/apache/spark/pull/20646
  
retest this please


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20646: [SPARK-23408][SS] Synchronize successive AddDataMemory a...

2018-02-20 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20646
  
**[Test build #87572 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87572/testReport)**
 for PR 20646 at commit 
[`1df90e7`](https://github.com/apache/spark/commit/1df90e796e9388d7992bc55f9f87bfd71af2f7f9).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20646: [SPARK-23408][SS] Synchronize successive AddDataMemory a...

2018-02-20 Thread jose-torres
Github user jose-torres commented on the issue:

https://github.com/apache/spark/pull/20646
  
> java.lang.RuntimeException: [unresolved dependency: 
com.sun.jersey#jersey-core;1.14: configuration not found in 
com.sun.jersey#jersey-core;1.14: 'master(compile)'. Missing configuration: 
'compile'. It was required from org.apache.hadoop#hadoop-yarn-common;2.6.5 
compile]

Surely unrelated to this change.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20622: [SPARK-23441][SS] Remove queryExecutionThread.interrupt(...

2018-02-20 Thread tdas
Github user tdas commented on the issue:

https://github.com/apache/spark/pull/20622
  
@jose-torres I had a long offline chat with @zsxwing, kudos to him for 
catching a corner case in the current solution. The following sequence of 
events may occur.

- In the query thread, the epoch tracking thread is started
- Before the query thread actually starts the Spark job, the epoch tracking 
thread may detect some sort of reconfiguration and attempt to cancelJob even 
before the query thread has started spark jobs.
- Query thread starts spark job, gets blocked, never terminates. 

Fundamentally, its not a great setup that one thread is starting the jobs 
and another thread is canceling them. Because of the async nature, we have no 
way reasoning which attempt wins, starting or cancelling. Rather let's make 
sure that we start and cancel in the same thread (then we can do some 
reasoning). Here is an alternate solution.
- The epoch thread ONLY interrupts the query thread. It's not responsible 
for any Spark state management (other than the enum state).
- The query thread cancels jobs and stops sources in the `finally` clause.

There is less likely to be race conditions that end up not canceling Spark 
job as a single thread (the query thread) is responsible for all Spark state 
management.







---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20646: [SPARK-23408][SS] Synchronize successive AddDataMemory a...

2018-02-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20646
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20646: [SPARK-23408][SS] Synchronize successive AddDataMemory a...

2018-02-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20646
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87571/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20646: [SPARK-23408][SS] Synchronize successive AddDataMemory a...

2018-02-20 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20646
  
**[Test build #87571 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87571/testReport)**
 for PR 20646 at commit 
[`1df90e7`](https://github.com/apache/spark/commit/1df90e796e9388d7992bc55f9f87bfd71af2f7f9).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20644: [SPARK-23470][ui] Use first attempt of last stage to def...

2018-02-20 Thread zsxwing
Github user zsxwing commented on the issue:

https://github.com/apache/spark/pull/20644
  
Verified that this patch fixes the issue. LGTM


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20604: [SPARK-23365][CORE] Do not adjust num executors when kil...

2018-02-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20604
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87567/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20604: [SPARK-23365][CORE] Do not adjust num executors when kil...

2018-02-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20604
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20604: [SPARK-23365][CORE] Do not adjust num executors when kil...

2018-02-20 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20604
  
**[Test build #87567 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87567/testReport)**
 for PR 20604 at commit 
[`e69851b`](https://github.com/apache/spark/commit/e69851b9b5a1a6bc1ecc1c14a656d4c08572b9d7).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20643: [SPARK-23468][core] Stringify auth secret before storing...

2018-02-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20643
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87566/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20643: [SPARK-23468][core] Stringify auth secret before storing...

2018-02-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20643
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20643: [SPARK-23468][core] Stringify auth secret before storing...

2018-02-20 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20643
  
**[Test build #87566 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87566/testReport)**
 for PR 20643 at commit 
[`77d009d`](https://github.com/apache/spark/commit/77d009d33a4ff6ba85453f10836b984d5e5acdf9).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20632: [SPARK-3159] added subtree pruning in the translation fr...

2018-02-20 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20632
  
**[Test build #4103 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4103/testReport)**
 for PR 20632 at commit 
[`eeb3cc9`](https://github.com/apache/spark/commit/eeb3cc93ffa39198ccaba75fc7a63032325baf1e).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20644: [SPARK-23470][ui] Use first attempt of last stage to def...

2018-02-20 Thread zsxwing
Github user zsxwing commented on the issue:

https://github.com/apache/spark/pull/20644
  
Thanks for fixing this so fast. I'm testing it now.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20635: [SPARK-23053][CORE][BRANCH-2.1] taskBinarySerialization ...

2018-02-20 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20635
  
**[Test build #4102 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4102/testReport)**
 for PR 20635 at commit 
[`bd88903`](https://github.com/apache/spark/commit/bd88903aca841e6ad55144127e4c11e9844ef6ce).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20616: [SPARK-23434][SQL] Spark should not warn `metadat...

2018-02-20 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/20616


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20640: [SPARK-19755][Mesos] Blacklist is always active f...

2018-02-20 Thread kayousterhout
Github user kayousterhout commented on a diff in the pull request:

https://github.com/apache/spark/pull/20640#discussion_r169500415
  
--- Diff: 
resource-managers/mesos/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosCoarseGrainedSchedulerBackend.scala
 ---
@@ -571,7 +568,7 @@ private[spark] class MesosCoarseGrainedSchedulerBackend(
   cpus + totalCoresAcquired <= maxCores &&
   mem <= offerMem &&
   numExecutors < executorLimit &&
-  slaves.get(slaveId).map(_.taskFailures).getOrElse(0) < 
MAX_SLAVE_FAILURES &&
+  !scheduler.nodeBlacklist().contains(slaveId) &&
--- End diff --

In other places it looks like the hostname is used in the blacklist - why 
does this check against the slaveId instead of the offerHostname?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20616: [SPARK-23434][SQL] Spark should not warn `metadata direc...

2018-02-20 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/20616
  
Thank you, @zsxwing and @cloud-fan .


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20616: [SPARK-23434][SQL] Spark should not warn `metadata direc...

2018-02-20 Thread zsxwing
Github user zsxwing commented on the issue:

https://github.com/apache/spark/pull/20616
  
LGTM. Merging to master. Thanks!


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20646: [SPARK-23408][SS] Synchronize successive AddDataMemory a...

2018-02-20 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20646
  
**[Test build #87571 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87571/testReport)**
 for PR 20646 at commit 
[`1df90e7`](https://github.com/apache/spark/commit/1df90e796e9388d7992bc55f9f87bfd71af2f7f9).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20646: [SPARK-23408][SS] Synchronize successive AddDataMemory a...

2018-02-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20646
  
Can one of the admins verify this patch?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20616: [SPARK-23434][SQL] Spark should not warn `metadata direc...

2018-02-20 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/20616
  
Here, it is. It's `AccessControlException`, @zsxwing .
```
18/02/20 23:46:53 WARN streaming.FileStreamSink: Error while looking for 
metadata directory.
org.apache.hadoop.security.AccessControlException: Permission denied: 
user=spark, access=EXECUTE, 
inode="/tmp/people.json/_spark_metadata":ambari-qa:hdfs:-rw-r--r--
at 
org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:319)
at 
org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkTraverse(FSPermissionChecker.java:259)
at 
org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:205)
at 
org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:190)
at 
org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1955)
at 
org.apache.hadoop.hdfs.server.namenode.FSDirStatAndListingOp.getFileInfo(FSDirStatAndListingOp.java:109)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getFileInfo(FSNamesystem.java:4111)
at 
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getFileInfo(NameNodeRpcServer.java:1137)
at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getFileInfo(ClientNamenodeProtocolServerSideTranslatorPB.java:866)
at 
org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:640)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2351)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2347)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2345)

at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at 
org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106)
at 
org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:73)
at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:2110)
at 
org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1305)
at 
org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1301)
at 
org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at 
org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1317)
at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1426)
at 
org.apache.spark.sql.execution.streaming.FileStreamSink$.hasMetadata(FileStreamSink.scala:46)
at 
org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:354)
at 
org.apache.spark.sql.execution.datasources.json.TextInputJsonDataSource$.createBaseDataset(JsonDataSource.scala:114)
at 
org.apache.spark.sql.execution.datasources.json.TextInputJsonDataSource$.infer(JsonDataSource.scala:95)
at 
org.apache.spark.sql.execution.datasources.json.JsonDataSource.inferSchema(JsonDataSource.scala:63)
at 
org.apache.spark.sql.execution.datasources.json.JsonFileFormat.inferSchema(JsonFileFormat.scala:57)
at 
org.apache.spark.sql.execution.datasources.DataSource$$anonfun$8.apply(DataSource.scala:202)
at 
org.apache.spark.sql.execution.datasources.DataSource$$anonfun$8.apply(DataSource.scala:202)
at scala.Option.orElse(Option.scala:289)
at 
org.apache.spark.sql.execution.datasources.DataSource.getOrInferFileFormatSchema(DataSource.scala:201)
at 
org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:392)
at 
org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:239)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:227)
at org.apache.spark.sql.DataFrameReader.json(DataFrameReader.scala:397)
at org.apache.spark.sql.DataFrameReader.json(DataFrameReader.scala:340)
at $line14.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw.(:24)
at $line14.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw.(:29)
at $li

[GitHub] spark pull request #20640: [SPARK-19755][Mesos] Blacklist is always active f...

2018-02-20 Thread kayousterhout
Github user kayousterhout commented on a diff in the pull request:

https://github.com/apache/spark/pull/20640#discussion_r169497847
  
--- Diff: 
resource-managers/mesos/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosCoarseGrainedSchedulerBackend.scala
 ---
@@ -648,15 +645,6 @@ private[spark] class 
MesosCoarseGrainedSchedulerBackend(
   totalGpusAcquired -= gpus
   gpusByTaskId -= taskId
 }
-// If it was a failure, mark the slave as failed for blacklisting 
purposes
-if (TaskState.isFailed(state)) {
-  slave.taskFailures += 1
-
-  if (slave.taskFailures >= MAX_SLAVE_FAILURES) {
-logInfo(s"Blacklisting Mesos slave $slaveId due to too many 
failures; " +
--- End diff --

Is it a concern to lose this error message? (I don't know anything about 
Mesos but it does seem potentially useful?)


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20646: [SPARK-23408][SS] Synchronize successive AddDataM...

2018-02-20 Thread jose-torres
GitHub user jose-torres opened a pull request:

https://github.com/apache/spark/pull/20646

[SPARK-23408][SS] Synchronize successive AddDataMemory actions in 
StreamTest.

## What changes were proposed in this pull request?

The stream-stream join tests add data to multiple sources, and expect it 
all to show up in the next batch. But there's a race condition; the new batch 
might trigger when only one of the AddData actions has been reached.

Fortunately, MemoryStream synchronizes batch generation on itself, and 
StreamExecution won't generate empty batches. So we can resolve this race 
condition by synchronizing successive AddDataMemory actions against every 
MemoryStream together. Then we can be sure StreamExecution won't start 
generating a batch before all the data is present.

## How was this patch tested?
existing tests


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jose-torres/spark flaky

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/20646.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #20646


commit d540be6bb051a33d2f6bd69a49fbe11afe9f0a65
Author: Jose Torres 
Date:   2018-02-20T23:34:16Z

just use synchronization

commit d91c55f1a17b03aa2d46682e76c6eb207e71a521
Author: Jose Torres 
Date:   2018-02-20T23:38:35Z

Merge branch 'master' of https://github.com/apache/spark into flaky

commit dce075f53c8a1418dac99c9b7b7f9b7e79ed17ff
Author: Jose Torres 
Date:   2018-02-20T23:45:40Z

fix merge




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20646: [SPARK-23408][SS] Synchronize successive AddDataMemory a...

2018-02-20 Thread jose-torres
Github user jose-torres commented on the issue:

https://github.com/apache/spark/pull/20646
  
@tdas 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



  1   2   3   4   >