[GitHub] spark pull request #16186: [SPARK-18758][SS] StreamingQueryListener events f...

2016-12-06 Thread zsxwing
Github user zsxwing commented on a diff in the pull request:

https://github.com/apache/spark/pull/16186#discussion_r91240700
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamingQueryListenerBus.scala ---
@@ -63,18 +83,29 @@ class StreamingQueryListenerBus(sparkListenerBus: LiveListenerBus)
 }
   }
 
+  /**
+   * Dispatch events to registered StreamingQueryListeners. Only the events associated with
+   * queries started in the same SparkSession as this ListenerBus will be dispatched to the
+   * listeners.
+   */
   override protected def doPostEvent(
       listener: StreamingQueryListener,
       event: StreamingQueryListener.Event): Unit = {
+    val runIdsToReportTo = activeQueryRunIds.synchronized { activeQueryRunIds.toSet }
--- End diff --

Why do we need to clone the set? You can just use `activeQueryRunIds.synchronized { activeQueryRunIds.contains(...) }`, right?
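To make the tradeoff concrete, here is a minimal, self-contained sketch — the `RunIdTracker` class and method names are ours for illustration, not Spark's actual code — contrasting the snapshot approach in the diff with the reviewer's suggested `contains` check under the lock:

```scala
import scala.collection.mutable
import java.util.UUID

// Hypothetical stand-in for the run-id bookkeeping in StreamingQueryListenerBus.
class RunIdTracker {
  private val activeQueryRunIds = new mutable.HashSet[UUID]

  def register(runId: UUID): Unit = activeQueryRunIds.synchronized {
    activeQueryRunIds += runId
  }

  // As written in the diff: copy the whole set under the lock, then query the
  // immutable snapshot outside it. Costs one allocation per dispatched event.
  def shouldReportViaSnapshot(runId: UUID): Boolean = {
    val runIdsToReportTo = activeQueryRunIds.synchronized { activeQueryRunIds.toSet }
    runIdsToReportTo.contains(runId)
  }

  // The reviewer's suggestion: do the single membership check while holding
  // the lock, so no copy is needed.
  def shouldReportViaContains(runId: UUID): Boolean = activeQueryRunIds.synchronized {
    activeQueryRunIds.contains(runId)
  }
}
```

For a single lookup the two are equivalent; a snapshot only pays off when several checks must observe one consistent view of the set.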




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16192: [SPARK-18764][Core]Add a warning log when skipping a cor...

2016-12-06 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/16192
  
LGTM.






[GitHub] spark issue #16191: spark decision tree

2016-12-06 Thread lklong
Github user lklong commented on the issue:

https://github.com/apache/spark/pull/16191
  
Hi, could somebody help to reply to this question?
Thanks very much!





[GitHub] spark issue #16192: [SPARK-18764][Core]Add a warning log when skipping a cor...

2016-12-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16192
  
**[Test build #69782 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69782/consoleFull)** for PR 16192 at commit [`96b4836`](https://github.com/apache/spark/commit/96b48363ef58dbf1d7f2cf695ce30f05493f2990).





[GitHub] spark pull request #16192: [SPARK-18764][Core]Add a warning log when skippin...

2016-12-06 Thread zsxwing
GitHub user zsxwing opened a pull request:

https://github.com/apache/spark/pull/16192

[SPARK-18764][Core]Add a warning log when skipping a corrupted file

## What changes were proposed in this pull request?

It's better to log a warning when skipping a corrupted file. This is helpful when we want the job to finish first, then find the corrupted files in the log and fix them.
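As a rough illustration of the proposed behavior — the `readAll` helper and its parameters are hypothetical stand-ins, not the actual HadoopRDD/FileScanRDD code:

```scala
import java.io.IOException

// Sketch: when ignoreCorruptFiles is on, a corrupted file should not fail the
// job, but it should leave a trace in the log so it can be found and fixed.
object CorruptFileSkipper {
  def readAll(
      files: Seq[String],
      readFile: String => Seq[String],
      logWarning: String => Unit,
      ignoreCorruptFiles: Boolean): Seq[String] = {
    files.flatMap { file =>
      try readFile(file)
      catch {
        case e: IOException if ignoreCorruptFiles =>
          // The change proposed in this PR: emit a warning instead of
          // skipping silently.
          logWarning(s"Skipped the rest of the content in the corrupted file: $file ($e)")
          Seq.empty
      }
    }
  }
}
```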

## How was this patch tested?

Jenkins

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/zsxwing/spark SPARK-18764

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/16192.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #16192


commit 96b48363ef58dbf1d7f2cf695ce30f05493f2990
Author: Shixiong Zhu 
Date:   2016-12-07T07:39:50Z

Add a warning log when skipping a corrupted file







[GitHub] spark issue #16171: [SPARK-18739][ML][PYSPARK][WIP] Classification and regre...

2016-12-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16171
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69780/
Test FAILed.





[GitHub] spark issue #16171: [SPARK-18739][ML][PYSPARK][WIP] Classification and regre...

2016-12-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16171
  
**[Test build #69780 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69780/consoleFull)** for PR 16171 at commit [`a0e8433`](https://github.com/apache/spark/commit/a0e8433f03d21a728dd843feef61b264314f44f8).
 * This patch **fails PySpark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #16171: [SPARK-18739][ML][PYSPARK][WIP] Classification and regre...

2016-12-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16171
  
Merged build finished. Test FAILed.





[GitHub] spark issue #16191: spark decision tree

2016-12-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16191
  
Can one of the admins verify this patch?





[GitHub] spark pull request #16191: spark decision tree

2016-12-06 Thread zhuangxue
GitHub user zhuangxue opened a pull request:

https://github.com/apache/spark/pull/16191

spark decision tree

What algorithm is used in the Spark decision tree (ID3, C4.5, or CART)?


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/apache/spark master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/16191.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #16191


commit 16eaad9daed0b633e6a714b5704509aa7107d6e5
Author: Sean Owen 
Date:   2016-11-10T18:20:03Z

[SPARK-18262][BUILD][SQL] JSON.org license is now CatX

## What changes were proposed in this pull request?

Try excluding org.json:json from the hive-exec dep as it's Cat X now. It may be that it's not used by the part of Hive that Spark uses anyway.

## How was this patch tested?

Existing tests

Author: Sean Owen 

Closes #15798 from srowen/SPARK-18262.

commit b533fa2b205544b42dcebe0a6fee9d8275f6da7d
Author: Michael Allman 
Date:   2016-11-10T21:41:13Z

[SPARK-17993][SQL] Fix Parquet log output redirection

(Link to Jira issue: https://issues.apache.org/jira/browse/SPARK-17993)
## What changes were proposed in this pull request?

PR #14690 broke Parquet log output redirection for converted partitioned Hive tables. For example, when querying Parquet files written by parquet-mr 1.6.0, Spark prints a torrent of (harmless) warning messages from the Parquet reader:

```
Oct 18, 2016 7:42:18 PM WARNING: org.apache.parquet.CorruptStatistics: Ignoring statistics because created_by could not be parsed (see PARQUET-251): parquet-mr version 1.6.0
org.apache.parquet.VersionParser$VersionParseException: Could not parse created_by: parquet-mr version 1.6.0 using format: (.+) version ((.*) )?\(build ?(.*)\)
        at org.apache.parquet.VersionParser.parse(VersionParser.java:112)
        at org.apache.parquet.CorruptStatistics.shouldIgnoreStatistics(CorruptStatistics.java:60)
        at org.apache.parquet.format.converter.ParquetMetadataConverter.fromParquetStatistics(ParquetMetadataConverter.java:263)
        at org.apache.parquet.hadoop.ParquetFileReader$Chunk.readAllPages(ParquetFileReader.java:583)
        at org.apache.parquet.hadoop.ParquetFileReader.readNextRowGroup(ParquetFileReader.java:513)
        at org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetRecordReader.checkEndOfRowGroup(VectorizedParquetRecordReader.java:270)
        at org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetRecordReader.nextBatch(VectorizedParquetRecordReader.java:225)
        at org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetRecordReader.nextKeyValue(VectorizedParquetRecordReader.java:137)
        at org.apache.spark.sql.execution.datasources.RecordReaderIterator.hasNext(RecordReaderIterator.scala:39)
        at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:102)
        at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:162)
        at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:102)
        at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.scan_nextBatch$(Unknown Source)
        at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown Source)
        at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
        at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:372)
        at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:231)
        at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:225)
        at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$24.apply(RDD.scala:803)
        at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$24.apply(RDD.scala:803)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:283)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
        at org.apache.spark.scheduler.Task.run(Task.scala:99)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:282)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
```
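The redirection being fixed concerns Parquet's use of java.util.logging (JUL). As a self-contained sketch of the general mechanism — not Spark's actual implementation — a JUL logger's default console handlers can be replaced with one that forwards records to a sink of our choosing:

```scala
import java.util.logging.{Handler, LogRecord, Logger}

// Replace a JUL logger's console output with a custom sink. Callers should
// keep a reference to the returned Logger: JUL holds loggers weakly, and an
// unreferenced logger (and its handlers) can be garbage-collected.
def redirectJulLogger(name: String)(sink: String => Unit): Logger = {
  val logger = Logger.getLogger(name)
  logger.getHandlers.foreach(logger.removeHandler) // drop default console handlers
  logger.setUseParentHandlers(false)               // don't bubble up to the root handlers
  logger.addHandler(new Handler {
    override def publish(record: LogRecord): Unit = sink(record.getMessage)
    override def flush(): Unit = ()
    override def close(): Unit = ()
  })
  logger
}
```

In practice a sink like this would forward to SLF4J/log4j so the Parquet warnings obey Spark's log configuration instead of spilling to stderr.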

This only happens during execution, not planning, 

[GitHub] spark issue #16190: [SPARK-18762][WEBUI][WIP] Web UI should be http:4040 ins...

2016-12-06 Thread sarutak
Github user sarutak commented on the issue:

https://github.com/apache/spark/pull/16190
  
@viirya Thanks! I've fixed it.





[GitHub] spark issue #16190: [SPARK-18761][WEBUI][WIP] Web UI should be http:4040 ins...

2016-12-06 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/16190
  
I think the jira number should be SPARK-18762, instead of SPARK-18761.





[GitHub] spark issue #16190: [SPARK-18761][WEBUI][WIP] Web UI should be http:4040 ins...

2016-12-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16190
  
**[Test build #69781 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69781/consoleFull)** for PR 16190 at commit [`4518010`](https://github.com/apache/spark/commit/451801007496cd853d0053285a0757e31015a12a).





[GitHub] spark pull request #16190: [SPARK-18761][WEBUI] Web UI should be http:4040 i...

2016-12-06 Thread sarutak
GitHub user sarutak opened a pull request:

https://github.com/apache/spark/pull/16190

[SPARK-18761][WEBUI] Web UI should be http:4040 instead of https:4040

## What changes were proposed in this pull request?

When SSL is enabled, the Spark shell shows:
```
Spark context Web UI available at https://192.168.99.1:4040
```
This is wrong because 4040 is the http port, not https; it only redirects to the https port. More importantly, this introduces several broken links in the UI. For example, in the master UI, the worker link is https:8081 instead of http:8081 or https:8481.
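The intent of the fix can be sketched with a small, hypothetical model (the `UiConf` class and helper names are ours, not Spark's): the advertised Web UI URL should always use the http scheme, because the bound port speaks plain HTTP and merely redirects to the separate https port.

```scala
// Hypothetical model of the UI addressing described above.
case class UiConf(sslEnabled: Boolean, httpPort: Int, httpsPort: Int)

// The URL to print in the shell: http even when SSL is on, since the
// http port is what we actually bind and advertise.
def advertisedUrl(host: String, conf: UiConf): String =
  s"http://$host:${conf.httpPort}"

// With SSL enabled, requests to the http port are redirected here.
def redirectTarget(host: String, conf: UiConf): Option[String] =
  if (conf.sslEnabled) Some(s"https://$host:${conf.httpsPort}") else None
```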

CC: @mengxr @liancheng 



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sarutak/spark SPARK-18761

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/16190.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #16190


commit 451801007496cd853d0053285a0757e31015a12a
Author: sarutak 
Date:   2016-12-07T07:14:15Z

Reverted the change in SPARK-16988







[GitHub] spark issue #16171: [SPARK-18739][ML][PYSPARK][WIP] Classification and regre...

2016-12-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16171
  
**[Test build #69780 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69780/consoleFull)** for PR 16171 at commit [`a0e8433`](https://github.com/apache/spark/commit/a0e8433f03d21a728dd843feef61b264314f44f8).





[GitHub] spark issue #16187: [SPARK-18760][SQL] Consistent format specification for F...

2016-12-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16187
  
**[Test build #69779 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69779/consoleFull)** for PR 16187 at commit [`73d7910`](https://github.com/apache/spark/commit/73d7910fec565bb61f5dcd10d6bfd9cce467193a).





[GitHub] spark issue #15422: [SPARK-17850][Core]Add a flag to ignore corrupt files

2016-12-06 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/15422
  
@zsxwing shouldn't we at least log the exception?






[GitHub] spark issue #16171: [SPARK-18739][ML][PYSPARK][WIP] Classification and regre...

2016-12-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16171
  
**[Test build #69777 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69777/consoleFull)** for PR 16171 at commit [`ef5954b`](https://github.com/apache/spark/commit/ef5954b19b6bca5fb7b603351ce087085ac23e9b).
 * This patch **fails PySpark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #16171: [SPARK-18739][ML][PYSPARK][WIP] Classification and regre...

2016-12-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16171
  
Merged build finished. Test FAILed.





[GitHub] spark issue #16171: [SPARK-18739][ML][PYSPARK][WIP] Classification and regre...

2016-12-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16171
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69777/
Test FAILed.





[GitHub] spark issue #16043: [SPARK-18601][SQL] Simplify Create/Get complex expressio...

2016-12-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16043
  
Merged build finished. Test FAILed.





[GitHub] spark issue #16043: [SPARK-18601][SQL] Simplify Create/Get complex expressio...

2016-12-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16043
  
**[Test build #69778 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69778/consoleFull)** for PR 16043 at commit [`113f7be`](https://github.com/apache/spark/commit/113f7be992988ae3e8b2d11916a1731456f0647c).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
   * `  case class ClassifiedEntries(undetermined : Seq[Expression],`





[GitHub] spark issue #16043: [SPARK-18601][SQL] Simplify Create/Get complex expressio...

2016-12-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16043
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69778/
Test FAILed.





[GitHub] spark issue #16043: [SPARK-18601][SQL] Simplify Create/Get complex expressio...

2016-12-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16043
  
**[Test build #69778 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69778/consoleFull)** for PR 16043 at commit [`113f7be`](https://github.com/apache/spark/commit/113f7be992988ae3e8b2d11916a1731456f0647c).





[GitHub] spark issue #16187: [SPARK-18760][SQL] Consistent format specification for F...

2016-12-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16187
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69772/
Test FAILed.





[GitHub] spark issue #16187: [SPARK-18760][SQL] Consistent format specification for F...

2016-12-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16187
  
Merged build finished. Test FAILed.





[GitHub] spark issue #16187: [SPARK-18760][SQL] Consistent format specification for F...

2016-12-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16187
  
**[Test build #69772 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69772/consoleFull)** for PR 16187 at commit [`566c800`](https://github.com/apache/spark/commit/566c8007dcf74594c23ef2b1fcc394ce64029e9b).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #16168: [SPARK-18209][SQL] More robust view canonicalization wit...

2016-12-06 Thread jiangxb1987
Github user jiangxb1987 commented on the issue:

https://github.com/apache/spark/pull/16168
  
@hvanhovell @nsyca @gatorsmile @rxin Thank you for your suggestions! I will try to come up with a better approach ASAP!





[GitHub] spark issue #16173: [SPARK-18742][CORE]readd spark.broadcast.factory conf to...

2016-12-06 Thread windpiger
Github user windpiger commented on the issue:

https://github.com/apache/spark/pull/16173
  
OK, the BroadcastFactory comment says: `SparkContext uses a user-specified BroadcastFactory implementation to instantiate a particular broadcast for the entire Spark job.` So I think it is designed for external implementations.





[GitHub] spark pull request #16168: [SPARK-18209][SQL] More robust view canonicalizat...

2016-12-06 Thread jiangxb1987
Github user jiangxb1987 commented on a diff in the pull request:

https://github.com/apache/spark/pull/16168#discussion_r91235259
  
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLViewSuite.scala ---
@@ -448,19 +476,105 @@ class SQLViewSuite extends QueryTest with SQLTestUtils with TestHiveSingleton {
 }
   }
 
+  test("Using view after change the origin view") {
+withView("v1", "v2") {
+  sql("CREATE VIEW v1 AS SELECT id FROM jt")
+  sql("CREATE VIEW v2 AS SELECT * FROM v1")
+  withTable("jt2", "jt3") {
+// Don't change the view schema
+val df2 = (1 until 10).map(i => i + i).toDF("id")
--- End diff --

Good point! I'll add the case.





[GitHub] spark pull request #16168: [SPARK-18209][SQL] More robust view canonicalizat...

2016-12-06 Thread jiangxb1987
Github user jiangxb1987 commented on a diff in the pull request:

https://github.com/apache/spark/pull/16168#discussion_r91235206
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveSessionCatalog.scala ---
@@ -55,16 +55,19 @@ private[sql] class HiveSessionCatalog(
 conf,
 hadoopConf) {
 
-  override def lookupRelation(name: TableIdentifier, alias: 
Option[String]): LogicalPlan = {
+  override def lookupRelation(
+  name: TableIdentifier,
+  alias: Option[String],
+  databaseHint: Option[String] = None): LogicalPlan = {
 val table = formatTableName(name.table)
-val db = formatDatabaseName(name.database.getOrElse(currentDb))
+val db = formatDatabaseName(name.database.getOrElse(databaseHint.getOrElse(currentDb)))
 if (db == globalTempViewManager.database) {
   val relationAlias = alias.getOrElse(table)
   globalTempViewManager.get(table).map { viewDef =>
 SubqueryAlias(relationAlias, viewDef, Some(name))
   }.getOrElse(throw new NoSuchTableException(db, table))
 } else if (name.database.isDefined || !tempTables.contains(table)) {
-  val database = name.database.map(formatDatabaseName)
+  val database = Some(db).map(formatDatabaseName)
--- End diff --

+1





[GitHub] spark pull request #16168: [SPARK-18209][SQL] More robust view canonicalizat...

2016-12-06 Thread jiangxb1987
Github user jiangxb1987 commented on a diff in the pull request:

https://github.com/apache/spark/pull/16168#discussion_r91235152
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala ---
@@ -126,6 +146,55 @@ private[hive] class HiveMetastoreCatalog(sparkSession: 
SparkSession) extends Log
 }
   }
 
+  /**
+   * Apply Projection on unresolved logical plan to:
+   * 1. Omit the columns which are not referenced by the view;
+   * 2. Reorder the columns to keep the same order with the view;
+   */
+  private def withProjection(plan: LogicalPlan, schema: StructType): LogicalPlan = {
+// All fields in schema should exist in plan.schema, or we should throw an AnalysisException
+// to notify the underlying schema has been changed.
+if (schema.fields.forall { field =>
+  plan.schema.fields.exists(other => compareStructField(field, other))}) {
+  val output = schema.fields.map { field =>
+plan.output.find { expr =>
+  expr.name == field.name && expr.dataType == field.dataType}.getOrElse(
+throw new AnalysisException("The underlying schema doesn't match the original " +
+  s"schema, expected ${schema.sql} but got ${plan.schema.sql}")
+  )}
+  Project(output, plan)
+} else {
+  throw new AnalysisException("The underlying schema doesn't match the original schema, " +
+s"expected ${schema.sql} but got ${plan.schema.sql}")
+}
+  }
+
+  /**
+   * Compare the both [[StructField]] to verify whether they have the same 
name and dataType.
+   */
+  private def compareStructField(field: StructField, other: StructField): Boolean = {
+field.name == other.name && field.dataType == other.dataType
+  }
+
+  /**
+   * Aliases the schema of the LogicalPlan to the view attribute names
+   */
+  private def aliasColumns(plan: LogicalPlan, fields: Seq[StructField]): LogicalPlan = {
+val output = fields.map(field => (field.name, field.getComment))
+if (plan.output.size != output.size) {
--- End diff --

This should not happen. I just want to ensure we are safe here in case the 
`withProjection` has been modified.





[GitHub] spark pull request #16189: [SPARK-18761][CORE][WIP] Introduce "task reaper" ...

2016-12-06 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/16189#discussion_r91235127
  
--- Diff: core/src/main/scala/org/apache/spark/executor/Executor.scala ---
@@ -432,6 +435,57 @@ private[spark] class Executor(
   }
 
   /**
+   * Supervises the killing / cancellation of a task by sending the 
interrupted flag, optionally
+   * sending a Thread.interrupt(), and monitoring the task until it 
finishes.
+   */
+  private class TaskReaper(taskRunner: TaskRunner, interruptThread: 
Boolean) extends Runnable {
+
+private[this] val killPollingFrequencyMs: Long =
+  conf.getTimeAsMs("spark.task.killPollingFrequency", "10s")
+
+private[this] val killTimeoutMs: Long = conf.getTimeAsMs("spark.task.killTimeout", "2m")
--- End diff --

My goal here was to let users set this to `-1` to disable killing of the 
executor JVM. I'll add a test to make sure that this flag actually behaves that 
way.





[GitHub] spark pull request #16189: [SPARK-18761][CORE][WIP] Introduce "task reaper" ...

2016-12-06 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/16189#discussion_r91235062
  
--- Diff: core/src/main/scala/org/apache/spark/executor/Executor.scala ---
@@ -432,6 +435,57 @@ private[spark] class Executor(
   }
 
   /**
+   * Supervises the killing / cancellation of a task by sending the 
interrupted flag, optionally
+   * sending a Thread.interrupt(), and monitoring the task until it 
finishes.
+   */
+  private class TaskReaper(taskRunner: TaskRunner, interruptThread: 
Boolean) extends Runnable {
+
+private[this] val killPollingFrequencyMs: Long =
+  conf.getTimeAsMs("spark.task.killPollingFrequency", "10s")
--- End diff --

On the fence about documenting these publicly, but am willing to do so and 
appreciate naming suggestions.





[GitHub] spark pull request #16189: [SPARK-18761][CORE][WIP] Introduce "task reaper" ...

2016-12-06 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/16189#discussion_r91235005
  
--- Diff: core/src/main/scala/org/apache/spark/executor/Executor.scala ---
@@ -161,12 +163,7 @@ private[spark] class Executor(
* @param interruptThread whether to interrupt the task thread
*/
   def killAllTasks(interruptThread: Boolean) : Unit = {
-// kill all the running tasks
-for (taskRunner <- runningTasks.values().asScala) {
-  if (taskRunner != null) {
-taskRunner.kill(interruptThread)
-  }
-}
+runningTasks.keys().asScala.foreach(t => killTask(t, interruptThread = 
interruptThread))
--- End diff --

A careful reviewer will notice that it's possible for `killTask` to be 
called twice for the same task, either via multiple calls to `killTask` here or 
via a call to `killTask` followed by a later `killAllTasks` call. I think that 
this should technically be okay as of the code in this first draft of this 
patch since having multiple TaskReapers for the same task should be fine, but I 
can also appreciate how this could cause resource exhaustion issues in the 
pathological case where killTask is spammed continuously. If we think it's 
important to avoid multiple reapers in this case then a simple solution would 
be to add a `synchronized` method on `TaskRunner` which submits a `TaskReaper` 
on the first kill request and is a no-op on subsequent requests.
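The once-only submission guard suggested above can be sketched generically. This is a minimal illustration in Python, not Spark's actual API; `TaskRunner`, `kill`, and `_submit_reaper` are hypothetical stand-ins for the Scala code:

```python
import threading

class TaskRunner:
    """Toy stand-in for an executor task runner; all names here are illustrative."""

    def __init__(self):
        self._lock = threading.Lock()
        self._reaper_submitted = False
        self.reapers_submitted = 0

    def kill(self):
        # Submit a reaper on the first kill request only; later requests
        # (e.g. killTask followed by a killAllTasks sweep) become no-ops,
        # so repeated kill calls cannot exhaust the reaper thread pool.
        with self._lock:
            if self._reaper_submitted:
                return False
            self._reaper_submitted = True
        self._submit_reaper()
        return True

    def _submit_reaper(self):
        # Stands in for something like threadPool.submit(new TaskReaper(...))
        self.reapers_submitted += 1

runner = TaskRunner()
results = [runner.kill() for _ in range(3)]
```

Only the first call submits a reaper; `results` comes back `[True, False, False]` with a single submission, which is the idempotence the synchronized method would provide.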





[GitHub] spark pull request #16168: [SPARK-18209][SQL] More robust view canonicalizat...

2016-12-06 Thread jiangxb1987
Github user jiangxb1987 commented on a diff in the pull request:

https://github.com/apache/spark/pull/16168#discussion_r91234885
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala ---
@@ -114,9 +117,26 @@ private[hive] class HiveMetastoreCatalog(sparkSession: 
SparkSession) extends Log
   alias.map(a => SubqueryAlias(a, qualifiedTable, 
None)).getOrElse(qualifiedTable)
 } else if (table.tableType == CatalogTableType.VIEW) {
   val viewText = table.viewText.getOrElse(sys.error("Invalid view without text."))
+  val unresolvedPlan = sparkSession.sessionState.sqlParser.parsePlan(viewText).transform {
+case u: UnresolvedRelation if u.tableIdentifier.database.isEmpty =>
+  u.copy(tableIdentifier = TableIdentifier(u.tableIdentifier.table, table.currentDatabase))
+  }
+  // Resolve the plan and check whether the analyzed plan is valid.
+  val resolvedPlan = try {
+val resolvedPlan = sparkSession.sessionState.analyzer.execute(unresolvedPlan)
+sparkSession.sessionState.analyzer.checkAnalysis(resolvedPlan)
+
+resolvedPlan
+  } catch {
+case NonFatal(e) =>
+  throw new RuntimeException(s"Failed to analyze the canonicalized SQL: $viewText", e)
+  }
+  val planWithProjection = table.originalSchema.map(withProjection(resolvedPlan, _))
+.getOrElse(resolvedPlan)
+
   SubqueryAlias(
 alias.getOrElse(table.identifier.table),
-sparkSession.sessionState.sqlParser.parsePlan(viewText),
+aliasColumns(planWithProjection, table.schema.fields),
--- End diff --

It might be a bit more complex, but I think we certainly can do that.





[GitHub] spark pull request #16168: [SPARK-18209][SQL] More robust view canonicalizat...

2016-12-06 Thread jiangxb1987
Github user jiangxb1987 commented on a diff in the pull request:

https://github.com/apache/spark/pull/16168#discussion_r91234770
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala ---
@@ -114,9 +117,26 @@ private[hive] class HiveMetastoreCatalog(sparkSession: 
SparkSession) extends Log
   alias.map(a => SubqueryAlias(a, qualifiedTable, 
None)).getOrElse(qualifiedTable)
 } else if (table.tableType == CatalogTableType.VIEW) {
   val viewText = table.viewText.getOrElse(sys.error("Invalid view without text."))
+  val unresolvedPlan = sparkSession.sessionState.sqlParser.parsePlan(viewText).transform {
+case u: UnresolvedRelation if u.tableIdentifier.database.isEmpty =>
+  u.copy(tableIdentifier = TableIdentifier(u.tableIdentifier.table, table.currentDatabase))
+  }
+  // Resolve the plan and check whether the analyzed plan is valid.
+  val resolvedPlan = try {
+val resolvedPlan = sparkSession.sessionState.analyzer.execute(unresolvedPlan)
+sparkSession.sessionState.analyzer.checkAnalysis(resolvedPlan)
+
+resolvedPlan
+  } catch {
+case NonFatal(e) =>
+  throw new RuntimeException(s"Failed to analyze the canonicalized SQL: $viewText", e)
+  }
+  val planWithProjection = table.originalSchema.map(withProjection(resolvedPlan, _))
--- End diff --

I think so.





[GitHub] spark pull request #16189: [SPARK-18761][CORE][WIP] Introduce "task reaper" ...

2016-12-06 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/16189#discussion_r91234781
  
--- Diff: core/src/test/scala/org/apache/spark/JobCancellationSuite.scala 
---
@@ -209,6 +209,41 @@ class JobCancellationSuite extends SparkFunSuite with 
Matchers with BeforeAndAft
 assert(jobB.get() === 100)
   }
 
+  test("task reaper kills JVM if killed tasks keep running for too long") {
+val conf = new SparkConf().set("spark.task.killTimeout", "5s")
+sc = new SparkContext("local-cluster[2,1,1024]", "test", conf)
+
+// Add a listener to release the semaphore once any tasks are launched.
+val sem = new Semaphore(0)
+sc.addSparkListener(new SparkListener {
+  override def onTaskStart(taskStart: SparkListenerTaskStart) {
+sem.release()
+  }
+})
+
+// jobA is the one to be cancelled.
+val jobA = Future {
+  sc.setJobGroup("jobA", "this is a job to be cancelled", 
interruptOnCancel = true)
+  sc.parallelize(1 to 1, 2).map { i =>
+while (true) { }
+  }.count()
+}
+
+// Block until both tasks of job A have started and cancel job A.
+sem.acquire(2)
+// Small delay to ensure tasks actually start executing the task body
+Thread.sleep(1000)
--- End diff --

This is slightly ugly but it's needed to avoid a race where this regression 
test can spuriously pass (and thereby fail to test anything) in case we cancel 
a task right after it has launched on the executor but before the UDF in the 
task has actually run.





[GitHub] spark pull request #16121: [SPARK-16589][PYTHON] Chained cartesian produces ...

2016-12-06 Thread holdenk
Github user holdenk commented on a diff in the pull request:

https://github.com/apache/spark/pull/16121#discussion_r91234665
  
--- Diff: python/pyspark/serializers.py ---
@@ -278,50 +278,51 @@ def __repr__(self):
 return "AutoBatchedSerializer(%s)" % self.serializer
 
 
-class CartesianDeserializer(FramedSerializer):
+class CartesianDeserializer(Serializer):
 
 """
 Deserializes the JavaRDD cartesian() of two PythonRDDs.
 """
 
 def __init__(self, key_ser, val_ser):
-FramedSerializer.__init__(self)
 self.key_ser = key_ser
 self.val_ser = val_ser
 
-def prepare_keys_values(self, stream):
-key_stream = self.key_ser._load_stream_without_unbatching(stream)
-val_stream = self.val_ser._load_stream_without_unbatching(stream)
-key_is_batched = isinstance(self.key_ser, BatchedSerializer)
-val_is_batched = isinstance(self.val_ser, BatchedSerializer)
-for (keys, vals) in zip(key_stream, val_stream):
-keys = keys if key_is_batched else [keys]
-vals = vals if val_is_batched else [vals]
-yield (keys, vals)
+def _load_stream_without_unbatching(self, stream):
+key_batch_stream = self.key_ser._load_stream_without_unbatching(stream)
+val_batch_stream = self.val_ser._load_stream_without_unbatching(stream)
+for (key_batch, val_batch) in zip(key_batch_stream, val_batch_stream):
+yield product(key_batch, val_batch)
--- End diff --

Maybe consider adding a comment here explaining why the interaction of 
batching & product
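A hypothetical illustration of that interaction, using toy batch streams rather than PySpark's actual serializers: each deserializer yields one *batch* per framed item, `zip` pairs up the corresponding key/value batches, and `itertools.product` expands each batch pair back into individual element pairs:

```python
from itertools import chain, product

# Toy batch streams, as a key/value deserializer pair might yield them.
# The two sides can use different batch sizes, which is exactly why the
# deserializer must expand batch pairs rather than zip raw elements.
key_batches = [[1, 2], [3]]
val_batches = [["a"], ["b"]]

# zip pairs corresponding batches; product expands each pair of batches
# into the (key, value) tuples of the cartesian result.
pairs = list(chain.from_iterable(
    product(kb, vb) for kb, vb in zip(key_batches, val_batches)))
```

Here `pairs` reconstructs `[(1, "a"), (2, "a"), (3, "b")]` from the batched streams.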





[GitHub] spark pull request #16121: [SPARK-16589][PYTHON] Chained cartesian produces ...

2016-12-06 Thread holdenk
Github user holdenk commented on a diff in the pull request:

https://github.com/apache/spark/pull/16121#discussion_r91232276
  
--- Diff: python/pyspark/serializers.py ---
@@ -96,7 +96,7 @@ def load_stream(self, stream):
 raise NotImplementedError
 
 def _load_stream_without_unbatching(self, stream):
--- End diff --

Even though this is internal, it might make sense to have a docstring for 
this since we're changing its behaviour.





[GitHub] spark pull request #16121: [SPARK-16589][PYTHON] Chained cartesian produces ...

2016-12-06 Thread holdenk
Github user holdenk commented on a diff in the pull request:

https://github.com/apache/spark/pull/16121#discussion_r91234619
  
--- Diff: python/pyspark/serializers.py ---
@@ -278,50 +278,51 @@ def __repr__(self):
 return "AutoBatchedSerializer(%s)" % self.serializer
 
 
-class CartesianDeserializer(FramedSerializer):
+class CartesianDeserializer(Serializer):
 
 """
 Deserializes the JavaRDD cartesian() of two PythonRDDs.
--- End diff --

Maybe we should document this a bit given that we had problems with the 
implementation. (e.g. expand on the "Due to batching, we can't use the Java 
cartesian method." comment from `rdd.py` to explain how this is intended to 
function).





[GitHub] spark pull request #16189: [SPARK-18761][CORE][WIP] Introduce "task reaper" ...

2016-12-06 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/16189#discussion_r91234733
  
--- Diff: core/src/main/scala/org/apache/spark/executor/Executor.scala ---
@@ -432,6 +435,57 @@ private[spark] class Executor(
   }
 
   /**
+   * Supervises the killing / cancellation of a task by sending the 
interrupted flag, optionally
+   * sending a Thread.interrupt(), and monitoring the task until it 
finishes.
+   */
+  private class TaskReaper(taskRunner: TaskRunner, interruptThread: 
Boolean) extends Runnable {
+
+private[this] val killPollingFrequencyMs: Long =
+  conf.getTimeAsMs("spark.task.killPollingFrequency", "10s")
+
+private[this] val killTimeoutMs: Long = conf.getTimeAsMs("spark.task.killTimeout", "2m")
+
+private[this] val takeThreadDump: Boolean =
+  conf.getBoolean("spark.task.threadDumpKilledTasks", true)
+
+override def run(): Unit = {
+  val startTimeMs = System.currentTimeMillis()
+  def elapsedTimeMs = System.currentTimeMillis() - startTimeMs
+
+  while (!taskRunner.isFinished && elapsedTimeMs < killTimeoutMs) {
+taskRunner.kill(interruptThread = interruptThread)
+taskRunner.synchronized {
+  Thread.sleep(killPollingFrequencyMs)
+}
+if (!taskRunner.isFinished) {
+  logWarning(s"Killed task ${taskRunner.taskId} is still running after $elapsedTimeMs ms")
+  if (takeThreadDump) {
+try {
+  val threads = Utils.getThreadDump()
+  threads.find(_.threadName == taskRunner.threadName).foreach { thread =>
+logWarning(s"Thread dump from task ${taskRunner.taskId}:\n${thread.stackTrace}")
+  }
+} catch {
+  case NonFatal(e) =>
+logWarning("Exception thrown while obtaining thread dump: ", e)
+}
+  }
+}
+  }
+  if (!taskRunner.isFinished && killTimeoutMs > 0 && elapsedTimeMs > killTimeoutMs) {
+if (isLocal) {
+  logError(s"Killed task ${taskRunner.taskId} could not be stopped 
within " +
--- End diff --

Even if we did throw an exception here, it wouldn't exit the JVM in local 
mode because we don't set an uncaught exception handler in local mode (see code 
further up in this file).





[GitHub] spark pull request #16189: [SPARK-18761][CORE][WIP] Introduce "task reaper" ...

2016-12-06 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/16189#discussion_r91234634
  
--- Diff: core/src/main/scala/org/apache/spark/executor/Executor.scala ---
@@ -229,9 +230,11 @@ private[spark] class Executor(
   // ClosedByInterruptException during execBackend.statusUpdate which 
causes
   // Executor to crash
   Thread.interrupted()
+  notifyAll()
 }
 
 override def run(): Unit = {
+  Thread.currentThread().setName(threadName)
--- End diff --

Task ids should be unique, so this thread name should be unique as well. 
Hence, I don't think it's super important to reset the thread's name when 
returning it to the task thread pool: the thread will just be renamed as soon 
as it's recycled for a new task, and if the task has already exited then it'll 
be clear from the thread state / context that this is just a completed task's 
thread that's been returned to the pool.





[GitHub] spark issue #16171: [SPARK-18739][ML][PYSPARK][WIP] Classification and regre...

2016-12-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16171
  
**[Test build #69777 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69777/consoleFull)**
 for PR 16171 at commit 
[`ef5954b`](https://github.com/apache/spark/commit/ef5954b19b6bca5fb7b603351ce087085ac23e9b).





[GitHub] spark pull request #16189: [SPARK-18761][CORE][WIP] Introduce "task reaper" ...

2016-12-06 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/16189#discussion_r91234685
  
--- Diff: core/src/main/scala/org/apache/spark/executor/Executor.scala ---
@@ -192,13 +189,17 @@ private[spark] class Executor(
   serializedTask: ByteBuffer)
 extends Runnable {
 
+val threadName = s"Executor task launch worker for task $taskId"
--- End diff --

This naming scheme was intentionally chosen to match the pattern that we 
use for sorting threads in the executor thread dump page. I'll manually verify 
that this worked as expected there.





[GitHub] spark issue #16189: [SPARK-18761][CORE][WIP] Introduce "task reaper" to over...

2016-12-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16189
  
**[Test build #69776 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69776/consoleFull)**
 for PR 16189 at commit 
[`a46f9c2`](https://github.com/apache/spark/commit/a46f9c2436d533ff838674cb63e397d1007e34de).





[GitHub] spark pull request #16168: [SPARK-18209][SQL] More robust view canonicalizat...

2016-12-06 Thread jiangxb1987
Github user jiangxb1987 commented on a diff in the pull request:

https://github.com/apache/spark/pull/16168#discussion_r91234443
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/command/views.scala ---
@@ -207,31 +205,56 @@ case class CreateViewCommand(
   }
 
   /**
-   * Returns a [[CatalogTable]] that can be used to save in the catalog. 
This comment canonicalize
-   * SQL based on the analyzed plan, and also creates the proper schema 
for the view.
+   * Returns a [[CatalogTable]] that can be used to save in the catalog. 
This stores the following
+   * properties for a view:
+   * 1. The `viewText` which is used to generate a logical plan when we 
resolve a view;
+   * 2. The `currentDatabase` which sets the current database on Analyze 
stage;
+   * 3. The `schema` which ensure we generate the correct output.
*/
   private def prepareTable(sparkSession: SparkSession, aliasedPlan: LogicalPlan): CatalogTable = {
-val viewSQL: String = new SQLBuilder(aliasedPlan).toSQL
+val currentDatabase = sparkSession.sessionState.catalog.getCurrentDatabase
 
-// Validate the view SQL - make sure we can parse it and analyze it.
-// If we cannot analyze the generated query, there is probably a bug in SQL generation.
-try {
-  sparkSession.sql(viewSQL).queryExecution.assertAnalyzed()
-} catch {
-  case NonFatal(e) =>
-throw new RuntimeException(s"Failed to analyze the canonicalized SQL: $viewSQL", e)
-}
+if (originalText.isDefined) {
+  val viewSQL = originalText.get
+
+  // Validate the view SQL - make sure we can resolve it with currentDatabase.
+  val originalSchema = try {
+val unresolvedPlan = sparkSession.sessionState.sqlParser.parsePlan(viewSQL)
+val resolvedPlan = sparkSession.sessionState.analyzer.execute(unresolvedPlan)
+sparkSession.sessionState.analyzer.checkAnalysis(resolvedPlan)
+
+resolvedPlan.schema
+  } catch {
+case NonFatal(e) =>
+  throw new RuntimeException(s"Failed to analyze the canonicalized SQL: $viewSQL", e)
+  }
 
-CatalogTable(
-  identifier = name,
-  tableType = CatalogTableType.VIEW,
-  storage = CatalogStorageFormat.empty,
-  schema = aliasedPlan.schema,
-  properties = properties,
-  viewOriginalText = originalText,
-  viewText = Some(viewSQL),
-  comment = comment
-)
+  CatalogTable(
+identifier = name,
+tableType = CatalogTableType.VIEW,
+storage = CatalogStorageFormat.empty,
+schema = aliasedPlan.schema,
+originalSchema = Some(originalSchema),
+properties = properties,
+viewOriginalText = originalText,
+viewText = Some(viewSQL),
+currentDatabase = Some(currentDatabase),
+comment = comment
+  )
+} else {
--- End diff --

I should add comments to this code. The `originalText` is non-empty only if 
the command is generated from SQL.





[GitHub] spark issue #16171: [SPARK-18739][ML][PYSPARK][WIP] Classification and regre...

2016-12-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16171
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69775/
Test FAILed.





[GitHub] spark issue #16171: [SPARK-18739][ML][PYSPARK][WIP] Classification and regre...

2016-12-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16171
  
Merged build finished. Test FAILed.





[GitHub] spark pull request #16189: [SPARK-18761][CORE][WIP] Introduce "task reaper" ...

2016-12-06 Thread JoshRosen
GitHub user JoshRosen opened a pull request:

https://github.com/apache/spark/pull/16189

[SPARK-18761][CORE][WIP] Introduce "task reaper" to oversee task killing in 
executors

## What changes were proposed in this pull request?

Spark's current task cancellation / task killing mechanism is "best effort" 
because some tasks may not be interruptible or may not respond to their 
"killed" flags being set. If a significant fraction of a cluster's task slots 
are occupied by tasks that have been marked as killed but remain running then 
this can lead to a situation where new jobs and tasks are starved of resources 
that are being used by these zombie tasks.

This patch aims to address this problem by adding a "task reaper" mechanism 
to executors. At a high-level, task killing now launches a new thread which 
attempts to kill the task and then watches the task and periodically checks 
whether it has been killed. The TaskReaper will periodically re-attempt to call 
`TaskRunner.kill()` and will log warnings if the task keeps running. I modified 
TaskRunner to rename its thread at the start of the task, allowing TaskReaper 
to take a thread dump and filter it in order to log stacktraces from tasks that 
we are waiting to finish. After a configurable timeout, if the task has not 
been killed then the TaskReaper will throw an exception to trigger executor JVM 
death, thereby forcibly freeing any resources consumed by the zombie tasks.
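The kill-watch-escalate loop described above can be sketched as a generic watchdog pattern. This is an illustrative Python sketch, not Spark's actual TaskReaper code; `TaskHandle`, the timing values, and the `escalate` callback are invented here for the example:

```python
import threading
import time

class TaskHandle:
    """Minimal stand-in for a running task that may ignore its kill flag."""
    def __init__(self):
        self.killed = False
        self.finished = threading.Event()

    def kill(self):
        self.killed = True  # best effort: the task must check this flag itself

class TaskReaper(threading.Thread):
    """Re-attempts the kill, then escalates after a configurable timeout."""
    def __init__(self, task, poll_interval=0.01, timeout=0.1, escalate=None):
        super().__init__(daemon=True)
        self.task = task
        self.poll_interval = poll_interval
        self.timeout = timeout
        self.escalate = escalate or (lambda: None)

    def run(self):
        deadline = time.monotonic() + self.timeout
        while not self.task.finished.is_set():
            self.task.kill()  # periodically re-attempt the kill
            if time.monotonic() >= deadline:
                self.escalate()  # e.g. log thread dumps, then kill the JVM
                return
            self.task.finished.wait(self.poll_interval)

# A cooperative task that honors the flag finishes; the reaper never escalates.
task = TaskHandle()
events = []
reaper = TaskReaper(task, timeout=5.0, escalate=lambda: events.append("escalated"))
reaper.start()
time.sleep(0.02)
task.finished.set()   # the task notices task.killed and exits
reaper.join()
print(events)         # -> [] for a cooperative task
```

A zombie task that never sets `finished` would instead trigger `escalate()` once the deadline passes, which mirrors the forced-JVM-death fallback described in the patch.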

There are some aspects of the design that I'd like to think about a bit 
more, but I've opened this as `[WIP]` now in order to solicit early feedback. 
I'll comment on some of my thoughts directly on the diff.

## How was this patch tested?

Tested via a new test case in `JobCancellationSuite`, plus manual testing. 

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/JoshRosen/spark cancellation

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/16189.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #16189


commit 2c28594b980845bda1d4db7ae866a91caaad4fff
Author: Josh Rosen 
Date:   2016-12-07T06:17:38Z

Add failing regression test.

commit a46f9c2436d533ff838674cb63e397d1007e34de
Author: Josh Rosen 
Date:   2016-12-07T06:18:43Z

Add TaskReaper to executor.







[GitHub] spark issue #16171: [SPARK-18739][ML][PYSPARK][WIP] Classification and regre...

2016-12-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16171
  
**[Test build #69775 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69775/consoleFull)**
 for PR 16171 at commit 
[`a0dc2c8`](https://github.com/apache/spark/commit/a0dc2c8342df4040b1cd9c5c1827271bbe22278f).
 * This patch **fails Python style tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class JavaClassificationModel(JavaPredictionModel, 
HasRawPredictionCol):`
  * `class JavaProbabilisticClassificationModel(JavaClassificationModel, 
HasProbabilityCol):`
  * `class OneVsRestModel(JavaModel, OneVsRestParams, HasFeaturesCol, 
HasPredictionCol,`
  * `class AFTSurvivalRegressionModel(JavaModel, HasFeaturesCol, 
HasPredictionCol,`
  * `class JavaPredictionModel(HasFeaturesCol, HasPredictionCol):`





[GitHub] spark pull request #16168: [SPARK-18209][SQL] More robust view canonicalizat...

2016-12-06 Thread jiangxb1987
Github user jiangxb1987 commented on a diff in the pull request:

https://github.com/apache/spark/pull/16168#discussion_r91234292
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/command/views.scala ---
@@ -207,31 +205,56 @@ case class CreateViewCommand(
   }
 
   /**
-   * Returns a [[CatalogTable]] that can be used to save in the catalog. 
This comment canonicalize
-   * SQL based on the analyzed plan, and also creates the proper schema 
for the view.
+   * Returns a [[CatalogTable]] that can be used to save in the catalog. 
This stores the following
+   * properties for a view:
+   * 1. The `viewText` which is used to generate a logical plan when we 
resolve a view;
+   * 2. The `currentDatabase` which sets the current database on Analyze 
stage;
+   * 3. The `schema` which ensure we generate the correct output.
*/
   private def prepareTable(sparkSession: SparkSession, aliasedPlan: 
LogicalPlan): CatalogTable = {
-val viewSQL: String = new SQLBuilder(aliasedPlan).toSQL
+val currentDatabase = 
sparkSession.sessionState.catalog.getCurrentDatabase
 
-// Validate the view SQL - make sure we can parse it and analyze it.
-// If we cannot analyze the generated query, there is probably a bug 
in SQL generation.
-try {
-  sparkSession.sql(viewSQL).queryExecution.assertAnalyzed()
-} catch {
-  case NonFatal(e) =>
-throw new RuntimeException(s"Failed to analyze the canonicalized 
SQL: $viewSQL", e)
-}
+if (originalText.isDefined) {
+  val viewSQL = originalText.get
+
+  // Validate the view SQL - make sure we can resolve it with 
currentDatabase.
+  val originalSchema = try {
+val unresolvedPlan = 
sparkSession.sessionState.sqlParser.parsePlan(viewSQL)
+val resolvedPlan = 
sparkSession.sessionState.analyzer.execute(unresolvedPlan)
+sparkSession.sessionState.analyzer.checkAnalysis(resolvedPlan)
+
+resolvedPlan.schema
+  } catch {
+case NonFatal(e) =>
+  throw new RuntimeException(s"Failed to analyze the canonicalized 
SQL: $viewSQL", e)
--- End diff --

Yeah, I agree we should throw an AnalysisException here.





[GitHub] spark issue #16171: [SPARK-18739][ML][PYSPARK][WIP] Classification and regre...

2016-12-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16171
  
**[Test build #69775 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69775/consoleFull)**
 for PR 16171 at commit 
[`a0dc2c8`](https://github.com/apache/spark/commit/a0dc2c8342df4040b1cd9c5c1827271bbe22278f).





[GitHub] spark pull request #16168: [SPARK-18209][SQL] More robust view canonicalizat...

2016-12-06 Thread jiangxb1987
Github user jiangxb1987 commented on a diff in the pull request:

https://github.com/apache/spark/pull/16168#discussion_r91234171
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/command/views.scala ---
@@ -207,31 +205,56 @@ case class CreateViewCommand(
   }
 
   /**
-   * Returns a [[CatalogTable]] that can be used to save in the catalog. 
This comment canonicalize
-   * SQL based on the analyzed plan, and also creates the proper schema 
for the view.
+   * Returns a [[CatalogTable]] that can be used to save in the catalog. 
This stores the following
+   * properties for a view:
+   * 1. The `viewText` which is used to generate a logical plan when we 
resolve a view;
+   * 2. The `currentDatabase` which sets the current database on Analyze 
stage;
+   * 3. The `schema` which ensure we generate the correct output.
*/
   private def prepareTable(sparkSession: SparkSession, aliasedPlan: 
LogicalPlan): CatalogTable = {
-val viewSQL: String = new SQLBuilder(aliasedPlan).toSQL
+val currentDatabase = 
sparkSession.sessionState.catalog.getCurrentDatabase
 
-// Validate the view SQL - make sure we can parse it and analyze it.
-// If we cannot analyze the generated query, there is probably a bug 
in SQL generation.
-try {
-  sparkSession.sql(viewSQL).queryExecution.assertAnalyzed()
-} catch {
-  case NonFatal(e) =>
-throw new RuntimeException(s"Failed to analyze the canonicalized 
SQL: $viewSQL", e)
-}
+if (originalText.isDefined) {
--- End diff --

I should add a comment above this code. The `originalText` is non-empty only 
if the command is generated from SQL.





[GitHub] spark issue #16131: [SPARK-18701][ML] Fix Poisson GLM failure due to wrong i...

2016-12-06 Thread sethah
Github user sethah commented on the issue:

https://github.com/apache/spark/pull/16131
  
LGTM





[GitHub] spark issue #16131: [SPARK-18701][ML] Fix Poisson GLM failure due to wrong i...

2016-12-06 Thread actuaryzhang
Github user actuaryzhang commented on the issue:

https://github.com/apache/spark/pull/16131
  
@srowen Is this ready to be merged? 





[GitHub] spark pull request #16150: [SPARK-18349][SparkR]:Update R API documentation ...

2016-12-06 Thread wangmiao1981
Github user wangmiao1981 commented on a diff in the pull request:

https://github.com/apache/spark/pull/16150#discussion_r91232450
  
--- Diff: R/pkg/R/mllib.R ---
@@ -1389,7 +1399,9 @@ setMethod("spark.gaussianMixture", signature(data = 
"SparkDataFrame", formula =
 #  Get the summary of a multivariate gaussian mixture model
 
 #' @param object a fitted gaussian mixture model.
-#' @return \code{summary} returns the model's lambda, mu, sigma, k, dim 
and posterior.
+#' @return \code{summary} returns summary of the fitted model, which is a 
list.
+#' The list includes the model's \code{lambda} (lambda), \code{mu} 
(mu),
+#' \code{sigma} (sigma), and \code{posterior} (posterior).
--- End diff --

Same reason as the above one.





[GitHub] spark issue #16186: [SPARK-18758][SS] StreamingQueryListener events from a S...

2016-12-06 Thread tdas
Github user tdas commented on the issue:

https://github.com/apache/spark/pull/16186
  
@marmbrus @zsxwing @brkyvz Please review.





[GitHub] spark pull request #16150: [SPARK-18349][SparkR]:Update R API documentation ...

2016-12-06 Thread wangmiao1981
Github user wangmiao1981 commented on a diff in the pull request:

https://github.com/apache/spark/pull/16150#discussion_r91232063
  
--- Diff: R/pkg/R/mllib.R ---
@@ -1852,9 +1867,9 @@ summary.treeEnsemble <- function(model) {
 
 #  Get the summary of a Random Forest Regression Model
 
-#' @return \code{summary} returns a summary object of the fitted model, a 
list of components
-#' including formula, number of features, list of features, 
feature importances, number of
-#' trees, and tree weights
+#' @return \code{summary} returns summary information of the fitted model, 
which is a list.
+#' The list of components includes \code{ans} (formula, number of 
features, list of features,
+#' feature importances, number of trees, and tree weights).
--- End diff --

The two places return `summary.treeEnsemble(object)`. What shall I put in 
the `\code{}`?





[GitHub] spark issue #16187: [SPARK-18760][SQL] Consistent format specification for F...

2016-12-06 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/16187
  
Actually if possible please merge this in branch-2.1.






[GitHub] spark issue #16182: [SPARK-18754][SS] Rename recentProgresses to recentProgr...

2016-12-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16182
  
**[Test build #3471 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3471/consoleFull)**
 for PR 16182 at commit 
[`184a6d1`](https://github.com/apache/spark/commit/184a6d182b84ad297c7bbff65362a703dbbad2b1).





[GitHub] spark issue #16182: [SPARK-18754][SS] Rename recentProgresses to recentProgr...

2016-12-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16182
  
**[Test build #69774 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69774/consoleFull)**
 for PR 16182 at commit 
[`184a6d1`](https://github.com/apache/spark/commit/184a6d182b84ad297c7bbff65362a703dbbad2b1).





[GitHub] spark issue #16182: [SPARK-18754][SS] Rename recentProgresses to recentProgr...

2016-12-06 Thread tdas
Github user tdas commented on the issue:

https://github.com/apache/spark/pull/16182
  
retest this please





[GitHub] spark pull request #16150: [SPARK-18349][SparkR]:Update R API documentation ...

2016-12-06 Thread wangmiao1981
Github user wangmiao1981 commented on a diff in the pull request:

https://github.com/apache/spark/pull/16150#discussion_r91231575
  
--- Diff: R/pkg/R/mllib.R ---
@@ -661,7 +665,10 @@ setMethod("fitted", signature(object = "KMeansModel"),
 #  Get the summary of a k-means model
 
 #' @param object a fitted k-means model.
-#' @return \code{summary} returns the model's features, coefficients, k, 
size and cluster.
+#' @return \code{summary} returns summary information of the fitted model, 
which is a list.
+#' The list includes the model's \code{coefficients} (model 
cluster centers),
--- End diff --

For the return list, I didn't see features and k. Doesn't an R function only 
return its last expression?





[GitHub] spark issue #16165: [SPARK-8617] [WEBUI] HistoryServer: Include in-progress ...

2016-12-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16165
  
**[Test build #69773 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69773/consoleFull)**
 for PR 16165 at commit 
[`51401b9`](https://github.com/apache/spark/commit/51401b90cc91dcca376b66d115532c00561f7e1e).





[GitHub] spark pull request #16150: [SPARK-18349][SparkR]:Update R API documentation ...

2016-12-06 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/16150#discussion_r91231272
  
--- Diff: R/pkg/R/mllib.R ---
@@ -1389,7 +1399,9 @@ setMethod("spark.gaussianMixture", signature(data = 
"SparkDataFrame", formula =
 #  Get the summary of a multivariate gaussian mixture model
 
 #' @param object a fitted gaussian mixture model.
-#' @return \code{summary} returns the model's lambda, mu, sigma, k, dim 
and posterior.
+#' @return \code{summary} returns summary of the fitted model, which is a 
list.
+#' The list includes the model's \code{lambda} (lambda), \code{mu} 
(mu),
+#' \code{sigma} (sigma), and \code{posterior} (posterior).
--- End diff --

missing k, dim?





[GitHub] spark issue #16186: [SPARK-18758][SS] StreamingQueryListener events from a S...

2016-12-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16186
  
Merged build finished. Test PASSed.





[GitHub] spark issue #16186: [SPARK-18758][SS] StreamingQueryListener events from a S...

2016-12-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16186
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69769/
Test PASSed.





[GitHub] spark issue #16186: [SPARK-18758][SS] StreamingQueryListener events from a S...

2016-12-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16186
  
**[Test build #69769 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69769/consoleFull)**
 for PR 16186 at commit 
[`9585ae4`](https://github.com/apache/spark/commit/9585ae41916e577cdb1a5d822cf69efa4af3e7d8).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request #16150: [SPARK-18349][SparkR]:Update R API documentation ...

2016-12-06 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/16150#discussion_r91231151
  
--- Diff: R/pkg/R/mllib.R ---
@@ -661,7 +665,10 @@ setMethod("fitted", signature(object = "KMeansModel"),
 #  Get the summary of a k-means model
 
 #' @param object a fitted k-means model.
-#' @return \code{summary} returns the model's features, coefficients, k, 
size and cluster.
+#' @return \code{summary} returns summary information of the fitted model, 
which is a list.
+#' The list includes the model's \code{coefficients} (model 
cluster centers),
--- End diff --

are we missing features and k?





[GitHub] spark issue #16188: Branch 1.6 decision tree

2016-12-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16188
  
Can one of the admins verify this patch?





[GitHub] spark issue #16182: [SPARK-18754][SS] Rename recentProgresses to recentProgr...

2016-12-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16182
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69771/
Test FAILed.





[GitHub] spark pull request #16188: Branch 1.6 decision tree

2016-12-06 Thread zhuangxue
GitHub user zhuangxue opened a pull request:

https://github.com/apache/spark/pull/16188

Branch 1.6 decision tree

Which algorithm is used in the Spark decision tree (is it ID3, C4.5, or CART)?


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/apache/spark branch-1.6

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/16188.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #16188


commit 4c28b4c8f342fde937ff77ab30f898dfe3186c03
Author: Gabriele Nizzoli 
Date:   2016-02-02T18:57:18Z

[SPARK-13121][STREAMING] java mapWithState mishandles scala Option

Java mapWithState with Function3 has a wrong conversion of Java `Optional` to 
Scala `Option`; the fixed code uses the same conversion used in the mapWithState 
call that takes Function4 as input. `Optional.fromNullable(v.get)` fails if v is 
`None`; better to use `JavaUtils.optionToOptional(v)` instead.

Author: Gabriele Nizzoli 

Closes #11007 from gabrielenizzoli/branch-1.6.
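The bug this commit fixes is a general one: the argument `v.get` is evaluated before `fromNullable` can guard it. A Python analogy, illustrative only: `Option` is modeled here as a list of zero or one elements, and `from_nullable`, `unsafe_convert`, and `safe_convert` are invented names, not Spark or Guava APIs:

```python
def from_nullable(value):
    # Like a fromNullable wrapper: turns a possibly-None value into an option.
    return [] if value is None else [value]

def unsafe_convert(opt):
    # Mirrors Optional.fromNullable(v.get): opt[0] is evaluated eagerly,
    # so an empty option raises before from_nullable can handle it.
    return from_nullable(opt[0])

def safe_convert(opt):
    # Mirrors JavaUtils.optionToOptional(v): handles the empty case itself.
    return from_nullable(opt[0] if opt else None)

print(safe_convert([3]))   # -> [3]
print(safe_convert([]))    # -> []
try:
    unsafe_convert([])
except IndexError:
    print("unsafe_convert fails on an empty option")
```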

commit 9c0cf22f7681ae05d894ae05f6a91a9467787519
Author: Grzegorz Chilkiewicz 
Date:   2016-02-02T19:16:24Z

[SPARK-12711][ML] ML StopWordsRemover does not protect itself from column 
name duplication

Fixes problem and verifies fix by test suite.
Also - adds optional parameter: nullable (Boolean) to: 
SchemaUtils.appendColumn
and deduplicates SchemaUtils.appendColumn functions.

Author: Grzegorz Chilkiewicz 

Closes #10741 from grzegorz-chilkiewicz/master.

(cherry picked from commit b1835d727234fdff42aa8cadd17ddcf43b0bed15)
Signed-off-by: Joseph K. Bradley 

commit 3c92333ee78f249dae37070d3b6558b9c92ec7f4
Author: Daoyuan Wang 
Date:   2016-02-02T19:09:40Z

[SPARK-13056][SQL] map column would throw NPE if value is null

Jira:
https://issues.apache.org/jira/browse/SPARK-13056

Create a map like
{ "a": "somestring", "b": null}
Query like
SELECT col["b"] FROM t1;
NPE would be thrown.

Author: Daoyuan Wang 

Closes #10964 from adrian-wang/npewriter.

(cherry picked from commit 358300c795025735c3b2f96c5447b1b227d4abc1)
Signed-off-by: Michael Armbrust 

Conflicts:
sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala
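The failure mode here reduces to a plain null-value lookup. An illustrative Python sketch, not the SQL fix itself; `map_value_naive` and `map_value_safe` are invented names for the before/after behavior:

```python
# A map column with a null value, as in the JIRA example.
row = {"a": "somestring", "b": None}

def map_value_naive(m, key):
    # Mirrors the pre-fix reader: it dereferences the stored value without a
    # null check, so a null map value blows up (the NPE analogue).
    return len(m[key])

def map_value_safe(m, key):
    # Mirrors the fixed behavior: a null value is surfaced as null (None).
    v = m.get(key)
    return None if v is None else len(v)

print(map_value_safe(row, "a"))  # -> 10
print(map_value_safe(row, "b"))  # -> None
```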

commit e81333be05cc5e2a41e5eb1a630c5af59a47dd23
Author: Kevin (Sangwoo) Kim 
Date:   2016-02-02T21:24:09Z

[DOCS] Update StructType.scala

The example will throw error like
:20: error: not found: value StructType

Need to add this line:
import org.apache.spark.sql.types._

Author: Kevin (Sangwoo) Kim 

Closes #10141 from swkimme/patch-1.

(cherry picked from commit b377b03531d21b1d02a8f58b3791348962e1f31b)
Signed-off-by: Michael Armbrust 

commit 2f8abb4afc08aa8dc4ed763bcb93ff6b1d6f0d78
Author: Adam Budde 
Date:   2016-02-03T03:35:33Z

[SPARK-13122] Fix race condition in MemoryStore.unrollSafely()

https://issues.apache.org/jira/browse/SPARK-13122

A race condition can occur in MemoryStore's unrollSafely() method if two 
threads that
return the same value for currentTaskAttemptId() execute this method 
concurrently. This
change makes the operation of reading the initial amount of unroll memory 
used, performing
the unroll, and updating the associated memory maps atomic in order to 
avoid this race
condition.

Initial proposed fix wraps all of unrollSafely() in a 
memoryManager.synchronized { } block. A cleaner approach might be to introduce 
a mechanism that synchronizes based on task attempt ID. An alternative option 
might be to track unroll/pending unroll memory based on block ID rather than 
task attempt ID.

Author: Adam Budde 

Closes #11012 from budde/master.

(cherry picked from commit ff71261b651a7b289ea2312abd6075da8b838ed9)
Signed-off-by: Andrew Or 

Conflicts:
core/src/main/scala/org/apache/spark/storage/MemoryStore.scala
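The read-modify-write race described in this commit can be reduced to a generic pattern. This is an illustrative Python sketch, not the MemoryStore code; the lock plays the role of the `memoryManager.synchronized { }` block the fix introduces:

```python
import threading

unroll_memory = {}       # taskAttemptId -> bytes reserved for unrolling
lock = threading.Lock()  # stands in for memoryManager.synchronized

def unroll_safely(task_id, amount, iterations=10_000):
    # Reading the current value, unrolling, and writing the update must be one
    # atomic step; without the lock, two threads sharing a task_id can both
    # read the same initial value and one update is lost.
    for _ in range(iterations):
        with lock:
            current = unroll_memory.get(task_id, 0)
            unroll_memory[task_id] = current + amount

threads = [threading.Thread(target=unroll_safely, args=(1, 1)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(unroll_memory[1])  # -> 40000: no updates lost under the lock
```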

commit 5fe8796c2fa859e30cf5ba293bee8957e23163bc
Author: Mario Briggs 
Date:   2016-02-03T17:50:28Z

[SPARK-12739][STREAMING] Details of batch in Streaming tab uses two 
Duration columns

I have clearly prefix the two 'Duration' columns in 'Details of Batch' 
Streaming tab as 'Output Op Duration' and 'Job Duration'

Author: Mario Briggs 
Author: mariobriggs 

Closes #11022 from mariobriggs/spark-12739.

(cherry picked from commit e9eb248edfa81d75f99c9afc2063e6b3d9ee7392)
Signed-off-by: Shixiong Zhu 

commit cdfb2a1410aa799596c8b751187dbac28b2cc678
Author: Wenchen Fan 
Date:   2016-02-04T00:13:23Z

[SPARK-13101][SQL][BRANCH-1.6] nullability of array type element should not 
fail analysis of encoder

nullability should only be considered as an optimization rather than part 
of th

[GitHub] spark issue #16182: [SPARK-18754][SS] Rename recentProgresses to recentProgr...

2016-12-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16182
  
Merged build finished. Test FAILed.





[GitHub] spark issue #16182: [SPARK-18754][SS] Rename recentProgresses to recentProgr...

2016-12-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16182
  
**[Test build #69771 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69771/consoleFull)**
 for PR 16182 at commit 
[`184a6d1`](https://github.com/apache/spark/commit/184a6d182b84ad297c7bbff65362a703dbbad2b1).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request #16183: [SPARK-18671][SS][test-maven] Follow up PR to fix...

2016-12-06 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/16183





[GitHub] spark issue #16183: [SPARK-18671][SS][test-maven] Follow up PR to fix test f...

2016-12-06 Thread tdas
Github user tdas commented on the issue:

https://github.com/apache/spark/pull/16183
  
Merging this to master and 2.1





[GitHub] spark issue #16187: [SPARK-18760][SQL] Consistent format specification for F...

2016-12-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16187
  
**[Test build #69772 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69772/consoleFull)**
 for PR 16187 at commit 
[`566c800`](https://github.com/apache/spark/commit/566c8007dcf74594c23ef2b1fcc394ce64029e9b).





[GitHub] spark issue #16187: [SPARK-18760][SQL] Consistent format specification for F...

2016-12-06 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/16187
  
cc @cloud-fan 





[GitHub] spark pull request #16187: [SPARK-18760][SQL] Consistent format specificatio...

2016-12-06 Thread rxin
GitHub user rxin opened a pull request:

https://github.com/apache/spark/pull/16187

[SPARK-18760][SQL] Consistent format specification for FileFormats

## What changes were proposed in this pull request?
We currently rely on FileFormat implementations to override toString in 
order to get a proper explain output. It'd be better to just depend on 
shortName for those.

Before:
```
scala> spark.read.text("test.text").explain()
== Physical Plan ==
*FileScan text [value#15] Batched: false, Format: 
org.apache.spark.sql.execution.datasources.text.TextFileFormat@xyz, Location: 
InMemoryFileIndex[file:/scratch/rxin/spark/test.text], PartitionFilters: [], 
PushedFilters: [], ReadSchema: struct<value:string>
```

After:
```
scala> spark.read.text("test.text").explain()
== Physical Plan ==
*FileScan text [value#15] Batched: false, Format: text, Location: 
InMemoryFileIndex[file:/scratch/rxin/spark/test.text], PartitionFilters: [], 
PushedFilters: [], ReadSchema: struct<value:string>
```

Also closes #14680.

## How was this patch tested?
Verified in spark-shell.
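
The idea in the description can be sketched as follows. This is a simplified, illustrative stand-in rather than Spark's real classes (in Spark itself the short name actually comes from the `DataSourceRegister` trait that file formats mix in): the explain output builds its `Format:` entry from a stable `shortName` instead of `toString`, which falls back to the JVM default `ClassName@hashCode` unless every implementation remembers to override it.

```scala
// Illustrative sketch only -- not Spark's actual FileFormat hierarchy.
trait FileFormat {
  def shortName: String
}

class TextFileFormat extends FileFormat {
  override def shortName: String = "text"
}

object ExplainDemo {
  // Build the "Format:" entry of the explain string from the stable
  // shortName rather than from toString.
  def formatEntry(fmt: FileFormat): String = s"Format: ${fmt.shortName}"

  def main(args: Array[String]): Unit = {
    val fmt = new TextFileFormat
    println(formatEntry(fmt))  // prints: Format: text
    // Relying on toString instead yields the unhelpful default,
    // e.g. "Format: TextFileFormat@1b6d3586" (hash varies per run).
    println(s"Format: $fmt")
  }
}
```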

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/rxin/spark SPARK-18760

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/16187.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #16187


commit 566c8007dcf74594c23ef2b1fcc394ce64029e9b
Author: Reynold Xin 
Date:   2016-12-07T05:22:40Z

[SPARK-18760][SQL] Consistent format specification for FileFormats







[GitHub] spark issue #11119: [SPARK-10780][ML] Add an initial model to kmeans

2016-12-06 Thread sethah
Github user sethah commented on the issue:

https://github.com/apache/spark/pull/11119
  
ping?





[GitHub] spark issue #15628: [SPARK-17471][ML] Add compressed method to ML matrices

2016-12-06 Thread sethah
Github user sethah commented on the issue:

https://github.com/apache/spark/pull/15628
  
ping @dbtsai :)





[GitHub] spark pull request #14537: [SPARK-16948][SQL] Use metastore schema instead o...

2016-12-06 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/14537





[GitHub] spark pull request #6848: [SPARK-8398][CORE] Hadoop input/output format adva...

2016-12-06 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/6848





[GitHub] spark pull request #16181: Branch 2.1

2016-12-06 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/16181





[GitHub] spark pull request #9543: [SPARK-11482][SQL] Make maven repo for Hive metast...

2016-12-06 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/9543





[GitHub] spark pull request #8318: [SPARK-1267][PYSPARK] Adds pip installer for pyspa...

2016-12-06 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/8318





[GitHub] spark pull request #7265: [SPARK-7263] Add new shuffle manager which stores ...

2016-12-06 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/7265





[GitHub] spark issue #16184: [SPARK-18753][SQL] Keep pushed-down null literal as a fi...

2016-12-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16184
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69767/
Test PASSed.





[GitHub] spark issue #16184: [SPARK-18753][SQL] Keep pushed-down null literal as a fi...

2016-12-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16184
  
Merged build finished. Test PASSed.





[GitHub] spark issue #16184: [SPARK-18753][SQL] Keep pushed-down null literal as a fi...

2016-12-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16184
  
**[Test build #69767 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69767/consoleFull)**
 for PR 16184 at commit 
[`c6fe345`](https://github.com/apache/spark/commit/c6fe34511fc1ea5c36713d435dc64673deceae7f).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #9543: [SPARK-11482][SQL] Make maven repo for Hive metastore jar...

2016-12-06 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/9543
  
I'm going to close this one for now.





[GitHub] spark issue #8785: [Spark-10625] [SQL] Spark SQL JDBC read/write is unable t...

2016-12-06 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/8785
  
@tribbloid is this a problem that needs to be fixed?






[GitHub] spark issue #7265: [SPARK-7263] Add new shuffle manager which stores shuffle...

2016-12-06 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/7265
  
I'm going to close this for now. Next year we might actually come back and 
revisit this - probably not with the current parquet implementation, since it is 
not very efficient, but with some sort of columnar format.






[GitHub] spark issue #6848: [SPARK-8398][CORE] Hadoop input/output format advanced co...

2016-12-06 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/6848
  
I'm going to close this for now.






[GitHub] spark issue #16135: [SPARK-18700][SQL] Add ReadWriteLock for each table's re...

2016-12-06 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/16135
  
cc @ericl  can you take a look at this?





[GitHub] spark issue #16129: [SPARK-18678][ML] Skewed feature subsampling in Random f...

2016-12-06 Thread sethah
Github user sethah commented on the issue:

https://github.com/apache/spark/pull/16129
  
This LGTM. Now that I'm looking at it, the test suite **never actually 
tests for correctness**, just basic input/output sizes. We really should have 
better tests, but it's ok with me if it's done in a separate JIRA. 

Also, I'd be in favor of changing the title since, while it does affect 
RandomForest/ML, it's really an error in the SamplingUtils, and this method is 
used in at least one other place (RangePartitioner).





[GitHub] spark issue #16122: [SPARK-18681][SQL] Fix filtering to compatible with part...

2016-12-06 Thread mallman
Github user mallman commented on the issue:

https://github.com/apache/spark/pull/16122
  
I haven't been able to get a proper unit test environment running where the 
embedded metastore conf is different from the client conf. I did validate that 
Spark without this patch failed to execute a query on a table with an integer 
type partition column filtering on that column where the metastore has direct 
sql access disabled, whereas Spark with this patch works and behaves as 
expected.

@wangyum I don't believe your test uses Hive in a way that's compatible 
with Spark. Can you please remove it?

@ericl Any ideas on how to unit test the case where the client and 
metastore have different configurations?





[GitHub] spark issue #14537: [SPARK-16948][SQL] Use metastore schema instead of infer...

2016-12-06 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/14537
  
Schema inference was completely replaced with the metastore schema in 
#14690. I think we can close this now? cc @cloud-fan @liancheng 





[GitHub] spark issue #16183: [SPARK-18671][SS][test-maven] Follow up PR to fix test f...

2016-12-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16183
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69764/
Test PASSed.




