[GitHub] spark issue #17009: [SPARK-19674][SQL]Ignore non-existing driver accumulator...

2017-02-22 Thread carsonwang
Github user carsonwang commented on the issue:

https://github.com/apache/spark/pull/17009
  
Thanks @cloud-fan. `driver accumulators don't belong to this execution` is more appropriate. I'll update the wording.





[GitHub] spark pull request #17009: [SPARK-19674][SQL]Ignore non-existing driver accu...

2017-02-22 Thread carsonwang
Github user carsonwang commented on a diff in the pull request:

https://github.com/apache/spark/pull/17009#discussion_r102656311
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/ui/SQLListenerSuite.scala ---
@@ -147,6 +147,10 @@ class SQLListenerSuite extends SparkFunSuite with SharedSQLContext with JsonTest

 checkAnswer(listener.getExecutionMetrics(0), accumulatorUpdates.mapValues(_ * 2))

+// Non-existing driver accumulator updates should be filtered and no exception will be thrown.
--- End diff --

I was doing some experiments of my own that add physical operators at runtime. The metrics of the added operators are not registered with the execution, so I got a NoSuchElementException.
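
For readers following along, a minimal self-contained sketch (stand-in names, not code from the patch) of the filtering behavior being tested here:

```scala
// Hedged sketch: `registeredMetricIds` stands in for the metric IDs an
// execution registers, and driver accumulator updates arrive as (id, value)
// pairs. Filtering against the registry drops updates for unknown IDs
// instead of failing with NoSuchElementException on a blind lookup.
val registeredMetricIds: Set[Long] = Set(1L, 2L, 3L)
val driverAccumUpdates: Seq[(Long, Long)] = Seq((2L, 10L), (99L, 5L))

val safeUpdates = driverAccumUpdates.filter { case (id, _) =>
  registeredMetricIds.contains(id)  // the update for id 99 is dropped
}
```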






[GitHub] spark issue #17036: [SPARK-19706][pyspark] add Column.contains in pyspark

2017-02-22 Thread davies
Github user davies commented on the issue:

https://github.com/apache/spark/pull/17036
  
lgtm





[GitHub] spark pull request #16928: [SPARK-18699][SQL] Put malformed tokens into a ne...

2017-02-22 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/16928#discussion_r102656121
  
--- Diff: python/pyspark/sql/readwriter.py ---
@@ -193,8 +193,9 @@ def json(self, path, schema=None, primitivesAsString=None, prefersDecimal=None,
 
 *  ``PERMISSIVE`` : sets other fields to ``null`` when it meets a corrupted \
   record and puts the malformed string into a new field configured by \
- ``columnNameOfCorruptRecord``. When a schema is set by user, it sets \
- ``null`` for extra fields.
+ ``columnNameOfCorruptRecord``. A user-defined schema can include \
+ a string type field named ``columnNameOfCorruptRecord`` for corrupt records. \
+ When a schema is set by user, it sets ``null`` for extra fields.
--- End diff --

what about the other 2 modes? do they also set null for extra fields?
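
For reference, a hedged sketch of the PERMISSIVE behavior the docstring describes (written in Scala rather than the Python under review; the schema, column name, and path are illustrative):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

val spark = SparkSession.builder().master("local[*]").appName("permissive-demo").getOrCreate()

// A user-defined schema that includes a string field for corrupt records.
val schema = StructType(Seq(
  StructField("a", IntegerType, nullable = true),
  StructField("_corrupt_record", StringType, nullable = true)))

val df = spark.read
  .schema(schema)
  .option("mode", "PERMISSIVE")
  .option("columnNameOfCorruptRecord", "_corrupt_record")
  .json("/tmp/input.json")  // illustrative path
// Malformed lines land intact in _corrupt_record; the other fields are null.
```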





[GitHub] spark issue #17036: [SPARK-19706][pyspark] add Column.contains in pyspark

2017-02-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17036
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73332/
Test FAILed.





[GitHub] spark issue #17036: [SPARK-19706][pyspark] add Column.contains in pyspark

2017-02-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17036
  
**[Test build #73332 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73332/testReport)** for PR 17036 at commit [`cad6379`](https://github.com/apache/spark/commit/cad63790171ea82e9f032895c80858d1a349d1a8).
 * This patch **fails Python style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #17036: [SPARK-19706][pyspark] add Column.contains in pyspark

2017-02-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17036
  
Merged build finished. Test FAILed.





[GitHub] spark issue #17036: [SPARK-19706][pyspark] add Column.contains in pyspark

2017-02-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17036
  
**[Test build #73332 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73332/testReport)** for PR 17036 at commit [`cad6379`](https://github.com/apache/spark/commit/cad63790171ea82e9f032895c80858d1a349d1a8).





[GitHub] spark issue #17034: [SPARK-19704][ML] AFTSurvivalRegression should support n...

2017-02-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17034
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73330/
Test PASSed.





[GitHub] spark issue #17034: [SPARK-19704][ML] AFTSurvivalRegression should support n...

2017-02-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17034
  
**[Test build #73330 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73330/testReport)** for PR 17034 at commit [`fd59ca9`](https://github.com/apache/spark/commit/fd59ca90422300f0d2e54bc809555c54edb43af8).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #17034: [SPARK-19704][ML] AFTSurvivalRegression should support n...

2017-02-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17034
  
Merged build finished. Test PASSed.





[GitHub] spark pull request #17036: [SPARK-19706][pyspark] add Column.contains in pys...

2017-02-22 Thread cloud-fan
GitHub user cloud-fan opened a pull request:

https://github.com/apache/spark/pull/17036

[SPARK-19706][pyspark] add Column.contains in pyspark

## What changes were proposed in this pull request?

To be consistent with the Scala API, we should also add `contains` to `Column` in pyspark.
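
For reference, a minimal sketch of the Scala `Column.contains` being mirrored (the DataFrame contents are illustrative):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").appName("contains-demo").getOrCreate()
import spark.implicits._

val df = Seq("Spark", "Flink").toDF("name")
// Column.contains yields a boolean Column testing substring containment.
df.filter($"name".contains("par")).show()  // keeps only "Spark"
```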

## How was this patch tested?

updated unit test

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/cloud-fan/spark pyspark

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/17036.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #17036


commit cad63790171ea82e9f032895c80858d1a349d1a8
Author: Wenchen Fan 
Date:   2017-02-23T07:39:05Z

add Column.contains in pyspark







[GitHub] spark issue #17036: [SPARK-19706][pyspark] add Column.contains in pyspark

2017-02-22 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/17036
  
cc @davies 





[GitHub] spark issue #17035: [SPARK-19705][SQL] Preferred location supporting HDFS ca...

2017-02-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17035
  
Can one of the admins verify this patch?





[GitHub] spark issue #17001: [SPARK-19667][SQL]create table with hiveenabled in defau...

2017-02-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17001
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73325/
Test PASSed.





[GitHub] spark issue #15821: [SPARK-13534][WIP][PySpark] Using Apache Arrow to increa...

2017-02-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15821
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73321/
Test FAILed.





[GitHub] spark issue #15821: [SPARK-13534][WIP][PySpark] Using Apache Arrow to increa...

2017-02-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15821
  
Merged build finished. Test FAILed.





[GitHub] spark issue #17001: [SPARK-19667][SQL]create table with hiveenabled in defau...

2017-02-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17001
  
Merged build finished. Test PASSed.





[GitHub] spark pull request #16928: [SPARK-18699][SQL] Put malformed tokens into a ne...

2017-02-22 Thread maropu
Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/16928#discussion_r102654066
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/UnivocityParser.scala ---
@@ -202,21 +221,25 @@ private[csv] class UnivocityParser(
   }
   numMalformedRecords += 1
   None
-} else if (options.failFast && schema.length != tokens.length) {
+} else if (options.failFast && inputSchema.length != tokens.length) {
   throw new RuntimeException(s"Malformed line in FAILFAST mode: " +
 s"${tokens.mkString(options.delimiter.toString)}")
 } else {
-  val checkedTokens = if (options.permissive && schema.length > tokens.length) {
-tokens ++ new Array[String](schema.length - tokens.length)
-  } else if (options.permissive && schema.length < tokens.length) {
-tokens.take(schema.length)
+  val checkedTokens = if (options.permissive && inputSchema.length > tokens.length) {
+tokens ++ new Array[String](inputSchema.length - tokens.length)
+  } else if (options.permissive && inputSchema.length < tokens.length) {
+tokens.take(inputSchema.length)
   } else {
 tokens
   }
 
   try {
 Some(convert(checkedTokens))
   } catch {
+case NonFatal(e) if options.permissive =>
+  val row = new GenericInternalRow(requiredSchema.length)
+  corruptFieldIndex.map(row(_) = UTF8String.fromString(input))
--- End diff --

Aha, fixed. Thanks!





[GitHub] spark pull request #17035: [SPARK-19705][SQL] Preferred location supporting ...

2017-02-22 Thread tanejagagan
GitHub user tanejagagan opened a pull request:

https://github.com/apache/spark/pull/17035

[SPARK-19705][SQL] Preferred location supporting HDFS cache for FileScanRDD

Added support for HDFS cache using TaskLocation.inMemoryLocationTag.
NewHadoopRDD and HadoopRDD both support the HDFS cache using TaskLocation.inMemoryLocationTag, where "hdfs_cache_" is prepended to the hostname and then interpreted by the scheduler.
With this enhancement, the same tag ("hdfs_cache_") is prepended to the hostname if the FilePartition contains a single file and that file is cached on one or more hosts.
The current implementation does not handle the case where a FilePartition has multiple files, as the preferredLocation calculation there is more complex.
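
For clarity, a hedged sketch of the naming convention described above; the helper and host names are illustrative, and the tag value mirrors the `TaskLocation.inMemoryLocationTag` mentioned in the description, not code from the patch:

```scala
// Preferred-location strings handed to the scheduler: a host whose HDFS
// block is cached in memory gets the tag prepended to its hostname.
val inMemoryLocationTag = "hdfs_cache_"

def preferredLocation(host: String, cachedInHdfs: Boolean): String =
  if (cachedInHdfs) inMemoryLocationTag + host else host

println(preferredLocation("datanode-1", cachedInHdfs = true))   // hdfs_cache_datanode-1
println(preferredLocation("datanode-2", cachedInHdfs = false))  // datanode-2
```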

## What changes were proposed in this pull request?

(Please fill in changes proposed in this fix)

## How was this patch tested?

(Please explain how this patch was tested. E.g. unit tests, integration 
tests, manual tests)
(If this patch involves UI changes, please attach a screenshot; otherwise, 
remove this)

Please review http://spark.apache.org/contributing.html before opening a 
pull request.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/tanejagagan/spark branch-19705

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/17035.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #17035


commit a9288e56f30f7d9f051e06502171d7f2639913a7
Author: gagan taneja 
Date:   2017-02-23T07:26:36Z

[SPARK-19705][SQL] Preferred location supporting HDFS cache for FileScanRDD

Added support for HDFS cache using TaskLocation.inMemoryLocationTag.
NewHadoopRDD and HadoopRDD both support the HDFS cache using TaskLocation.inMemoryLocationTag, where "hdfs_cache_" is prepended to the hostname and then interpreted by the scheduler.
With this enhancement, the same tag ("hdfs_cache_") is prepended to the hostname if the FilePartition contains a single file and that file is cached on one or more hosts.
The current implementation does not handle the case where a FilePartition has multiple files, as the preferredLocation calculation there is more complex.







[GitHub] spark issue #15821: [SPARK-13534][WIP][PySpark] Using Apache Arrow to increa...

2017-02-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15821
  
**[Test build #73321 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73321/testReport)** for PR 15821 at commit [`9c8ea63`](https://github.com/apache/spark/commit/9c8ea63ccec4ceb9dd40834bca530a6265385579).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #17001: [SPARK-19667][SQL]create table with hiveenabled in defau...

2017-02-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17001
  
**[Test build #73325 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73325/testReport)** for PR 17001 at commit [`d327994`](https://github.com/apache/spark/commit/d327994593395a69465717a2d401672f025cac36).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request #16928: [SPARK-18699][SQL] Put malformed tokens into a ne...

2017-02-22 Thread maropu
Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/16928#discussion_r102653357
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/UnivocityParser.scala ---
@@ -202,21 +212,41 @@ private[csv] class UnivocityParser(
   }
   numMalformedRecords += 1
   None
-} else if (options.failFast && schema.length != tokens.length) {
+} else if (options.failFast && dataSchema.length != tokens.length) {
   throw new RuntimeException(s"Malformed line in FAILFAST mode: " +
 s"${tokens.mkString(options.delimiter.toString)}")
 } else {
-  val checkedTokens = if (options.permissive && schema.length > tokens.length) {
-tokens ++ new Array[String](schema.length - tokens.length)
-  } else if (options.permissive && schema.length < tokens.length) {
-tokens.take(schema.length)
+  val checkedTokens = if (options.permissive) {
+// If a length of parsed tokens is not equal to expected one, it makes the length the same
+// with the expected. If the length is shorter, it adds extra tokens in the tail.
+// If longer, it drops extra tokens.
+val lengthSafeTokens = if (dataSchema.length > tokens.length) {
+  tokens ++ new Array[String](dataSchema.length - tokens.length)
+} else if (dataSchema.length < tokens.length) {
+  tokens.take(dataSchema.length)
+} else {
+  tokens
+}
+
+// If we need to handle corrupt fields, it adds an extra token to skip a field for malformed
--- End diff --

@HyukjinKwon Does this fix satisfy your intention? I slightly modified the code based on yours.





[GitHub] spark issue #16928: [SPARK-18699][SQL] Put malformed tokens into a new field...

2017-02-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16928
  
**[Test build #73331 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73331/testReport)** for PR 16928 at commit [`512fb42`](https://github.com/apache/spark/commit/512fb42404fee1c702bc9e18ad36f15da9e0b273).





[GitHub] spark issue #17009: [SPARK-19674][SQL]Ignore non-existing driver accumulator...

2017-02-22 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/17009
  
The change looks reasonable





[GitHub] spark issue #17028: [SPARK-19691][SQL] Fix ClassCastException when calculati...

2017-02-22 Thread maropu
Github user maropu commented on the issue:

https://github.com/apache/spark/pull/17028
  
@HyukjinKwon @hvanhovell How about the latest fix?





[GitHub] spark pull request #17009: [SPARK-19674][SQL]Ignore non-existing driver accu...

2017-02-22 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/17009#discussion_r102652094
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/ui/SQLListenerSuite.scala ---
@@ -147,6 +147,10 @@ class SQLListenerSuite extends SparkFunSuite with SharedSQLContext with JsonTest

 checkAnswer(listener.getExecutionMetrics(0), accumulatorUpdates.mapValues(_ * 2))

+// Non-existing driver accumulator updates should be filtered and no exception will be thrown.
--- End diff --

when will this happen?





[GitHub] spark issue #17017: [SPARK-19682][SparkR] Issue warning (or error) when subs...

2017-02-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17017
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73329/
Test PASSed.





[GitHub] spark pull request #16976: [SPARK-19610][SQL] Support parsing multiline CSV ...

2017-02-22 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/16976#discussion_r102650051
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVDataSource.scala ---
@@ -0,0 +1,256 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.datasources.csv
+
+import java.io.InputStream
+import java.nio.charset.{Charset, StandardCharsets}
+
+import com.univocity.parsers.csv.{CsvParser, CsvParserSettings}
+import org.apache.hadoop.conf.Configuration
+import org.apache.hadoop.fs.{FileStatus, Path}
+import org.apache.hadoop.io.{LongWritable, Text}
+import org.apache.hadoop.mapred.TextInputFormat
+import org.apache.hadoop.mapreduce.Job
+import org.apache.hadoop.mapreduce.lib.input.FileInputFormat
+
+import org.apache.spark.TaskContext
+import org.apache.spark.input.{PortableDataStream, StreamInputFormat}
+import org.apache.spark.rdd.{BinaryFileRDD, RDD}
+import org.apache.spark.sql.{Dataset, Encoders, SparkSession}
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.execution.datasources._
+import org.apache.spark.sql.execution.datasources.text.TextFileFormat
+import org.apache.spark.sql.types.StructType
+
+/**
+ * Common functions for parsing CSV files
+ */
+abstract class CSVDataSource extends Serializable {
+  def isSplitable: Boolean
+
+  /**
+   * Parse a [[PartitionedFile]] into [[InternalRow]] instances.
+   */
+  def readFile(
+  conf: Configuration,
+  file: PartitionedFile,
+  parser: UnivocityParser,
+  parsedOptions: CSVOptions): Iterator[InternalRow]
+
+  /**
+   * Infers the schema from `inputPaths` files.
+   */
+  def infer(
+  sparkSession: SparkSession,
+  inputPaths: Seq[FileStatus],
+  parsedOptions: CSVOptions): Option[StructType]
+
+  /**
+   * Generates a header from the given row which is null-safe and duplicate-safe.
+   */
+  protected def makeSafeHeader(
+  row: Array[String],
+  caseSensitive: Boolean,
+  options: CSVOptions): Array[String] = {
+if (options.headerFlag) {
+  val duplicates = {
+val headerNames = row.filter(_ != null)
+  .map(name => if (caseSensitive) name else name.toLowerCase)
+headerNames.diff(headerNames.distinct).distinct
+  }
+
+  row.zipWithIndex.map { case (value, index) =>
+if (value == null || value.isEmpty || value == options.nullValue) {
+  // When there are empty strings or the values set in `nullValue`, put the
+  // index as the suffix.
+  s"_c$index"
+} else if (!caseSensitive && duplicates.contains(value.toLowerCase)) {
+  // When there are case-insensitive duplicates, put the index as the suffix.
+  s"$value$index"
+} else if (duplicates.contains(value)) {
+  // When there are duplicates, put the index as the suffix.
+  s"$value$index"
+} else {
+  value
+}
+  }
+} else {
+  row.zipWithIndex.map { case (_, index) =>
+// Uses default column names, "_c#" where # is its position of fields
+// when header option is disabled.
+s"_c$index"
+  }
+}
+  }
+}
+
+object CSVDataSource {
+  def apply(options: CSVOptions): CSVDataSource = {
+if (options.wholeFile) {
+  WholeFileCSVDataSource
+} else {
+  TextInputCSVDataSource
+}
+  }
+}
+
+object TextInputCSVDataSource extends CSVDataSource {
+  override val isSplitable: Boolean = true
+
+  override def readFile(
+  conf: Configuration,
+  file: PartitionedFile,
+  parser: UnivocityParser,
+  parsedOptions: CSVOptions): Iterator[InternalRow] = {
+val lines = {
+  val 

[GitHub] spark pull request #16976: [SPARK-19610][SQL] Support parsing multiline CSV ...

2017-02-22 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/16976#discussion_r102649816
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVDataSource.scala ---
@@ -0,0 +1,256 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.datasources.csv
+
+import java.io.InputStream
+import java.nio.charset.{Charset, StandardCharsets}
+
+import com.univocity.parsers.csv.{CsvParser, CsvParserSettings}
+import org.apache.hadoop.conf.Configuration
+import org.apache.hadoop.fs.{FileStatus, Path}
+import org.apache.hadoop.io.{LongWritable, Text}
+import org.apache.hadoop.mapred.TextInputFormat
+import org.apache.hadoop.mapreduce.Job
+import org.apache.hadoop.mapreduce.lib.input.FileInputFormat
+
+import org.apache.spark.TaskContext
+import org.apache.spark.input.{PortableDataStream, StreamInputFormat}
+import org.apache.spark.rdd.{BinaryFileRDD, RDD}
+import org.apache.spark.sql.{Dataset, Encoders, SparkSession}
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.execution.datasources._
+import org.apache.spark.sql.execution.datasources.text.TextFileFormat
+import org.apache.spark.sql.types.StructType
+
+/**
+ * Common functions for parsing CSV files
+ */
+abstract class CSVDataSource extends Serializable {
+  def isSplitable: Boolean
+
+  /**
+   * Parse a [[PartitionedFile]] into [[InternalRow]] instances.
+   */
+  def readFile(
+  conf: Configuration,
+  file: PartitionedFile,
+  parser: UnivocityParser,
+  parsedOptions: CSVOptions): Iterator[InternalRow]
+
+  /**
+   * Infers the schema from `inputPaths` files.
+   */
+  def infer(
+  sparkSession: SparkSession,
+  inputPaths: Seq[FileStatus],
+  parsedOptions: CSVOptions): Option[StructType]
+
+  /**
+   * Generates a header from the given row which is null-safe and duplicate-safe.
+   */
+  protected def makeSafeHeader(
--- End diff --

I assume this is exactly the same as https://github.com/apache/spark/pull/16976/files#diff-56fbd53c6ada276cb4930affe3720be3L77





[GitHub] spark issue #17017: [SPARK-19682][SparkR] Issue warning (or error) when subs...

2017-02-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17017
  
Merged build finished. Test PASSed.





[GitHub] spark issue #17033: [DOCS] application environment rest api

2017-02-22 Thread uncleGen
Github user uncleGen commented on the issue:

https://github.com/apache/spark/pull/17033
  
cc @vanzin 





[GitHub] spark issue #17017: [SPARK-19682][SparkR] Issue warning (or error) when subs...

2017-02-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17017
  
**[Test build #73329 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73329/testReport)** for PR 17017 at commit [`46f6ca8`](https://github.com/apache/spark/commit/46f6ca8f0a6c36917f4b634262f5657bcde36f2b).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request #16976: [SPARK-19610][SQL] Support parsing multiline CSV ...

2017-02-22 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/16976#discussion_r102649654
  
--- Diff: python/test_support/sql/ages_newlines.csv ---
@@ -0,0 +1,6 @@
+Joe,20,"Hi,
+I am Jeo"
+Tom,30,"My name is Tom"
+Hyukjin,25,"I am Hyukjin
--- End diff --

wow you are only 25?





[GitHub] spark pull request #16976: [SPARK-19610][SQL] Support parsing multiline CSV ...

2017-02-22 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/16976#discussion_r102651748
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/UnivocityParser.scala ---
@@ -233,3 +236,28 @@ private[csv] class UnivocityParser(
 }
   }
 }
+
+private[csv] object UnivocityParser {
+  def tokenizeStream(
+  inputStream: InputStream,
+  header: Boolean,
+  settings: CsvParserSettings): Iterator[Array[String]] = new Iterator[Array[String]] {
+private val parser = new CsvParser(settings)
+// Note that, here we assume `inputStream` is the whole file that might include the header.
+parser.beginParsing(inputStream)
+private var nextRecord = {
+  if (header) {
+parser.parseNext()
+  }
+  parser.parseNext()
+}
+
+override def hasNext: Boolean = nextRecord != null
+
+override def next(): Array[String] = {
+  val curRecord = nextRecord
--- End diff --

nit: it's safer to add a check
```
if (!hasNext) {
  throw new NoSuchElementException("End of stream")
}
```





[GitHub] spark pull request #17001: [SPARK-19667][SQL]create table with hiveenabled i...

2017-02-22 Thread windpiger
Github user windpiger commented on a diff in the pull request:

https://github.com/apache/spark/pull/17001#discussion_r102650536
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala ---
@@ -339,10 +340,17 @@ private[hive] class HiveClientImpl(
 
   override def getDatabase(dbName: String): CatalogDatabase = withHiveState {
 Option(client.getDatabase(dbName)).map { d =>
+  // default database's location always use the warehouse path,
+  // and since the location of database stored in metastore is qualified,
+  // here we also make qualify for warehouse location
+  val dbLocation = if (dbName == SessionCatalog.DEFAULT_DATABASE) {
+SessionCatalog.makeQualifiedPath(sparkConf.get(WAREHOUSE_PATH), hadoopConf).toString
--- End diff --

Agreed! 👍 





[GitHub] spark issue #16923: [SPARK-19038][Hive][YARN] Correctly figure out keytab fi...

2017-02-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16923
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73326/
Test PASSed.





[GitHub] spark issue #16923: [SPARK-19038][Hive][YARN] Correctly figure out keytab fi...

2017-02-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16923
  
Merged build finished. Test PASSed.





[GitHub] spark issue #16923: [SPARK-19038][Hive][YARN] Correctly figure out keytab fi...

2017-02-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16923
  
**[Test build #73326 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73326/testReport)** for PR 16923 at commit [`08d53d2`](https://github.com/apache/spark/commit/08d53d201f62c19cd26c09c29240c71dcb9d08e3).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #17034: [SPARK-19704][ML] AFTSurvivalRegression should support n...

2017-02-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17034
  
**[Test build #73330 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73330/testReport)** for PR 17034 at commit [`fd59ca9`](https://github.com/apache/spark/commit/fd59ca90422300f0d2e54bc809555c54edb43af8).





[GitHub] spark pull request #17001: [SPARK-19667][SQL]create table with hiveenabled i...

2017-02-22 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/17001#discussion_r102648988
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala ---
@@ -339,10 +340,17 @@ private[hive] class HiveClientImpl(
 
   override def getDatabase(dbName: String): CatalogDatabase = withHiveState {
 Option(client.getDatabase(dbName)).map { d =>
+  // default database's location always use the warehouse path,
+  // and since the location of database stored in metastore is qualified,
+  // here we also make qualify for warehouse location
+  val dbLocation = if (dbName == SessionCatalog.DEFAULT_DATABASE) {
+SessionCatalog.makeQualifiedPath(sparkConf.get(WAREHOUSE_PATH), hadoopConf).toString
--- End diff --

What I want is consistency. We have now decided to define the location of the default database as the warehouse path, and we should stick with it. The main goal of this PR is not to fix the bug when sharing a metastore db, but to change the definition of the default database location.
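
For context, a hedged sketch of what "qualifying" a location means here, using the standard Hadoop `FileSystem.makeQualified` call (the helper itself is illustrative, not from the patch):

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path

// Resolve a possibly scheme-less path against its FileSystem, so e.g.
// "/user/hive/warehouse" becomes "hdfs://namenode:8020/user/hive/warehouse".
def qualify(location: String, hadoopConf: Configuration): String = {
  val path = new Path(location)
  path.getFileSystem(hadoopConf).makeQualified(path).toString
}
```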





[GitHub] spark issue #16534: [SPARK-19161][PYTHON][SQL] Improving UDF Docstrings

2017-02-22 Thread zero323
Github user zero323 commented on the issue:

https://github.com/apache/spark/pull/16534
  
To a very limited extent. It can bring some useful information in IPython / Jupyter (maybe some other tools as well) but won't work with the built-in `help` / `pydoc.help`.

You can compare:

```python
from functools import wraps

def f(x, *args):
    """This is
    some function"""
    return x

class F():
    def __init__(self, f):
        self.f = f
    def __call__(self, x):
        return f(x)

g = wraps(f)(F(f))

@wraps(f)
def h(x):
    return F(f)(x)

# In IPython/Jupyter (not plain Python): ?g
help(g)

# In IPython/Jupyter (not plain Python): ?h
help(h)
```

As far as I am aware it is either this or dynamic inheritance.





[GitHub] spark pull request #17034: [SPARK-19704][ML] AFTSurvivalRegression should su...

2017-02-22 Thread zhengruifeng
GitHub user zhengruifeng opened a pull request:

https://github.com/apache/spark/pull/17034

[SPARK-19704][ML] AFTSurvivalRegression should support numeric censorCol

## What changes were proposed in this pull request?
Make `AFTSurvivalRegression` support a numeric censorCol.

## How was this patch tested?

Existing tests and added tests.
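
For reference, a hedged usage sketch (Scala ML API, illustrative data) of what the change allows, namely a censor column that is numeric but not DoubleType:

```scala
import org.apache.spark.ml.linalg.Vectors
import org.apache.spark.ml.regression.AFTSurvivalRegression
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").appName("aft-demo").getOrCreate()

// `censor` is an Int column here; with this change it no longer has to be Double.
val df = spark.createDataFrame(Seq(
  (1.218, 1, Vectors.dense(1.560, -0.605)),
  (2.949, 0, Vectors.dense(0.346, 2.158)),
  (3.627, 1, Vectors.dense(1.380, 0.231)))).toDF("label", "censor", "features")

val model = new AFTSurvivalRegression().setCensorCol("censor").fit(df)
```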

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/zhengruifeng/spark aft_numeric_censor

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/17034.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #17034









[GitHub] spark issue #16594: [SPARK-17078] [SQL] Show stats when explain

2017-02-22 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/16594
  
LGTM except one comment





[GitHub] spark issue #17017: [SPARK-19682][SparkR] Issue warning (or error) when subs...

2017-02-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17017
  
**[Test build #73329 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73329/testReport)** for PR 17017 at commit [`46f6ca8`](https://github.com/apache/spark/commit/46f6ca8f0a6c36917f4b634262f5657bcde36f2b).





[GitHub] spark pull request #17017: [SPARK-19682][SparkR] Issue warning (or error) wh...

2017-02-22 Thread actuaryzhang
Github user actuaryzhang commented on a diff in the pull request:

https://github.com/apache/spark/pull/17017#discussion_r102648326
  
--- Diff: R/pkg/R/DataFrame.R ---
@@ -1776,6 +1780,10 @@ setMethod("[[", signature(x = "SparkDataFrame", i = "numericOrcharacter"),
 #' @note [[<- since 2.1.1
 setMethod("[[<-", signature(x = "SparkDataFrame", i = "numericOrcharacter"),
   function(x, i, value) {
+if (length(i) > 1) {
+  warning("Subset index has length > 1. Only the first index is used.")
+  i <- i[1]
+}
--- End diff --

@felixcheung Test added. Thanks for catching this. 





[GitHub] spark pull request #16928: [SPARK-18699][SQL] Put malformed tokens into a ne...

2017-02-22 Thread maropu
Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/16928#discussion_r102648290
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/UnivocityParser.scala ---
@@ -202,21 +221,25 @@ private[csv] class UnivocityParser(
   }
   numMalformedRecords += 1
   None
-} else if (options.failFast && schema.length != tokens.length) {
+} else if (options.failFast && inputSchema.length != tokens.length) {
   throw new RuntimeException(s"Malformed line in FAILFAST mode: " +
 s"${tokens.mkString(options.delimiter.toString)}")
 } else {
-  val checkedTokens = if (options.permissive && schema.length > tokens.length) {
-tokens ++ new Array[String](schema.length - tokens.length)
-  } else if (options.permissive && schema.length < tokens.length) {
-tokens.take(schema.length)
+  val checkedTokens = if (options.permissive && inputSchema.length > tokens.length) {
+tokens ++ new Array[String](inputSchema.length - tokens.length)
+  } else if (options.permissive && inputSchema.length < tokens.length) {
+tokens.take(inputSchema.length)
--- End diff --

okay, I'll update soon





[GitHub] spark pull request #16928: [SPARK-18699][SQL] Put malformed tokens into a ne...

2017-02-22 Thread maropu
Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/16928#discussion_r102648114
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/UnivocityParser.scala ---
@@ -45,24 +45,41 @@ private[csv] class UnivocityParser(
   // A `ValueConverter` is responsible for converting the given value to a desired type.
   private type ValueConverter = String => Any
 
+  private val corruptFieldIndex = schema.getFieldIndex(options.columnNameOfCorruptRecord)
+  corruptFieldIndex.foreach { corrFieldIndex =>
+require(schema(corrFieldIndex).dataType == StringType)
+require(schema(corrFieldIndex).nullable)
+  }
+
+  private val inputSchema = StructType(schema.filter(_.name != options.columnNameOfCorruptRecord))
+
   private val valueConverters =
-schema.map(f => makeConverter(f.name, f.dataType, f.nullable, options)).toArray
+inputSchema.map(f => makeConverter(f.name, f.dataType, f.nullable, options)).toArray
 
   private val parser = new CsvParser(options.asParserSettings)
 
   private var numMalformedRecords = 0
 
   private val row = new GenericInternalRow(requiredSchema.length)
 
-  private val indexArr: Array[Int] = {
+  // This parser loads an `indexArr._1`-th position value in input tokens,
+  // then put the value in `row(indexArr._2)`.
+  private val indexArr: Array[(Int, Int)] = {
 val fields = if (options.dropMalformed) {
   // If `dropMalformed` is enabled, then it needs to parse all the values
   // so that we can decide which row is malformed.
   requiredSchema ++ schema.filterNot(requiredSchema.contains(_))
 } else {
   requiredSchema
 }
-fields.map(schema.indexOf(_: StructField)).toArray
+val fieldsWithIndexes = fields.zipWithIndex
+corruptFieldIndex.map { case corrFieldIndex =>
+  fieldsWithIndexes.filter { case (_, i) => i != corrFieldIndex }
+}.getOrElse {
+  fieldsWithIndexes
+}.map { case (f, i) =>
+  (inputSchema.indexOf(f), i)
+}.toArray
--- End diff --

okay, I'll update





[GitHub] spark pull request #16928: [SPARK-18699][SQL] Put malformed tokens into a ne...

2017-02-22 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/16928#discussion_r102648089
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/UnivocityParser.scala
 ---
@@ -202,21 +221,25 @@ private[csv] class UnivocityParser(
   }
   numMalformedRecords += 1
   None
-} else if (options.failFast && schema.length != tokens.length) {
+} else if (options.failFast && inputSchema.length != tokens.length) {
   throw new RuntimeException(s"Malformed line in FAILFAST mode: " +
 s"${tokens.mkString(options.delimiter.toString)}")
 } else {
-  val checkedTokens = if (options.permissive && schema.length > tokens.length) {
-tokens ++ new Array[String](schema.length - tokens.length)
-  } else if (options.permissive && schema.length < tokens.length) {
-tokens.take(schema.length)
+  val checkedTokens = if (options.permissive && inputSchema.length > tokens.length) {
+tokens ++ new Array[String](inputSchema.length - tokens.length)
+  } else if (options.permissive && inputSchema.length < tokens.length) {
+tokens.take(inputSchema.length)
   } else {
 tokens
   }
 
   try {
 Some(convert(checkedTokens))
   } catch {
+case NonFatal(e) if options.permissive =>
+  val row = new GenericInternalRow(requiredSchema.length)
+  corruptFieldIndex.map(row(_) = UTF8String.fromString(input))
--- End diff --

BTW, I would like to note that this is actually an important pattern that we should avoid. It could cause bugs that are hard to figure out (see [SPARK-16694](https://issues.apache.org/jira/browse/SPARK-16694)).
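
For reference, a minimal sketch (plain Scala, hypothetical names, not Spark code) of why `foreach` is preferred over `map` when only a side effect is wanted:

```scala
// `map` builds an Option[Unit] that is immediately discarded, which hides the
// intent and invites the kind of bug SPARK-16694 addressed; `foreach` runs the
// side effect and returns Unit directly.
val corruptFieldIndex: Option[Int] = Some(0)
val row = new Array[Any](1)

corruptFieldIndex.map(i => row(i) = "bad record")     // compiles, but misleading
corruptFieldIndex.foreach(i => row(i) = "bad record") // idiomatic for side effects
```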





[GitHub] spark issue #16928: [SPARK-18699][SQL] Put malformed tokens into a new field...

2017-02-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16928
  
**[Test build #73328 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73328/testReport)** for PR 16928 at commit [`8d9386a`](https://github.com/apache/spark/commit/8d9386abea0941a40a89fd4860c5568ec55d7d95).





[GitHub] spark issue #17028: [SPARK-19691][SQL] Fix ClassCastException when calculati...

2017-02-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17028
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73320/
Test PASSed.





[GitHub] spark issue #16928: [SPARK-18699][SQL] Put malformed tokens into a new field...

2017-02-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16928
  
Merged build finished. Test PASSed.





[GitHub] spark issue #16928: [SPARK-18699][SQL] Put malformed tokens into a new field...

2017-02-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16928
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73322/
Test PASSed.





[GitHub] spark issue #17028: [SPARK-19691][SQL] Fix ClassCastException when calculati...

2017-02-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17028
  
Merged build finished. Test PASSed.





[GitHub] spark issue #16928: [SPARK-18699][SQL] Put malformed tokens into a new field...

2017-02-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16928
  
**[Test build #73322 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73322/testReport)** for PR 16928 at commit [`c86febe`](https://github.com/apache/spark/commit/c86febe6b018faafa62e0bf6444f8cd4326fb021).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #17028: [SPARK-19691][SQL] Fix ClassCastException when calculati...

2017-02-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17028
  
**[Test build #73320 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73320/testReport)** for PR 17028 at commit [`ef26f26`](https://github.com/apache/spark/commit/ef26f262cc747505cb0d2a55d6ee0c531263ac0a).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request #15415: [SPARK-14503][ML] spark.ml API for FPGrowth

2017-02-22 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request:

https://github.com/apache/spark/pull/15415#discussion_r102647844
  
--- Diff: mllib/src/test/scala/org/apache/spark/ml/fpm/FPGrowthSuite.scala 
---
@@ -0,0 +1,130 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.ml.fpm
+
+import org.apache.spark.SparkFunSuite
+import org.apache.spark.ml.util.DefaultReadWriteTest
+import org.apache.spark.mllib.util.MLlibTestSparkContext
+import org.apache.spark.sql.{DataFrame, Dataset, Row, SparkSession}
+import org.apache.spark.sql.functions._
+import org.apache.spark.sql.types._
+
+class FPGrowthSuite extends SparkFunSuite with MLlibTestSparkContext with DefaultReadWriteTest {
+
+  @transient var dataset: Dataset[_] = _
+
+  override def beforeAll(): Unit = {
+super.beforeAll()
+dataset = FPGrowthSuite.getFPGrowthData(spark)
+  }
+
+  test("FPGrowth fit and transform with different data types") {
+Array(IntegerType, StringType, ShortType, LongType, ByteType).foreach { dt =>
+  val intData = dataset.withColumn("features", col("features").cast(ArrayType(dt)))
+  val model = new FPGrowth().setMinSupport(0.5).fit(intData)
+  val generatedRules = model.setMinConfidence(0.5).getAssociationRules
+  val expectedRules = spark.createDataFrame(Seq(
+(Array("2"), Array("1"), 1.0),
+(Array("1"), Array("2"), 0.75)
+  )).toDF("antecedent", "consequent", "confidence")
+.withColumn("antecedent", col("antecedent").cast(ArrayType(dt)))
+.withColumn("consequent", col("consequent").cast(ArrayType(dt)))
+  assert(expectedRules.sort("antecedent").rdd.collect().sameElements(
+generatedRules.sort("antecedent").rdd.collect()))
+
+  val transformed = model.transform(intData)
+  val expectedTransformed = spark.createDataFrame(Seq(
+(0, Array("1", "2"), Array.emptyIntArray),
+(0, Array("1", "2"), Array.emptyIntArray),
+(0, Array("1", "2"), Array.emptyIntArray),
+(0, Array("1", "3"), Array(2))
+  )).toDF("id", "features", "prediction")
+.withColumn("features", col("features").cast(ArrayType(dt)))
+.withColumn("prediction", col("prediction").cast(ArrayType(dt)))
+  assert(expectedTransformed.sort("id").rdd.collect().sameElements(
+transformed.sort("id").rdd.collect()))
+}
+  }
+
+  test("FPGrowth getFreqItems") {
+val model = new FPGrowth().setMinSupport(0.7).fit(dataset)
+val expectedFreq = spark.createDataFrame(Seq(
+  (Array("1"), 4L),
+  (Array("2"), 3L),
+  (Array("1", "2"), 3L),
+  (Array("2", "1"), 3L)
--- End diff --

As the order of the frequent items is not guaranteed, the expected result lists both orderings; a little hack. An order-insensitive alternative is sketched below.
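
A minimal sketch of an order-insensitive comparison, in plain Scala:

```scala
// Compare itemsets as Sets so the assertion no longer depends on item order.
val expected = Set(Set("1"), Set("2"), Set("1", "2"))
val produced = Seq(Array("1"), Array("2"), Array("2", "1")).map(_.toSet).toSet
assert(produced == expected)
```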





[GitHub] spark pull request #16594: [SPARK-17078] [SQL] Show stats when explain

2017-02-22 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/16594#discussion_r102647596
  
--- Diff: 
sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 ---
@@ -794,6 +795,7 @@ EXPLAIN: 'EXPLAIN';
 FORMAT: 'FORMAT';
 LOGICAL: 'LOGICAL';
 CODEGEN: 'CODEGEN';
+COST: 'COST';
--- End diff --

also put it in `nonReserved`





[GitHub] spark issue #16938: [SPARK-19583][SQL]CTAS for data source table with a crea...

2017-02-22 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/16938
  
@tejasapatil Spark doesn't need to behave exactly the same as Hive; we follow Hive's behavior when it's reasonable, and use our own logic when Hive's behavior doesn't make sense.





[GitHub] spark pull request #16928: [SPARK-18699][SQL] Put malformed tokens into a ne...

2017-02-22 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/16928#discussion_r102646572
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/UnivocityParser.scala
 ---
@@ -202,21 +221,25 @@ private[csv] class UnivocityParser(
   }
   numMalformedRecords += 1
   None
-} else if (options.failFast && schema.length != tokens.length) {
+} else if (options.failFast && inputSchema.length != tokens.length) {
   throw new RuntimeException(s"Malformed line in FAILFAST mode: " +
 s"${tokens.mkString(options.delimiter.toString)}")
 } else {
-  val checkedTokens = if (options.permissive && schema.length > tokens.length) {
-tokens ++ new Array[String](schema.length - tokens.length)
-  } else if (options.permissive && schema.length < tokens.length) {
-tokens.take(schema.length)
+  val checkedTokens = if (options.permissive && inputSchema.length > tokens.length) {
+tokens ++ new Array[String](inputSchema.length - tokens.length)
+  } else if (options.permissive && inputSchema.length < tokens.length) {
+tokens.take(inputSchema.length)
--- End diff --

not related to this PR, but can you add some comments for this code block? It's kind of hard to follow the logic here





[GitHub] spark issue #11211: [SPARK-13330][PYSPARK] PYTHONHASHSEED is not propgated t...

2017-02-22 Thread zjffdu
Github user zjffdu commented on the issue:

https://github.com/apache/spark/pull/11211
  
@holdenk description is updated. 





[GitHub] spark pull request #16928: [SPARK-18699][SQL] Put malformed tokens into a ne...

2017-02-22 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/16928#discussion_r102646336
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/UnivocityParser.scala
 ---
@@ -202,21 +221,25 @@ private[csv] class UnivocityParser(
   }
   numMalformedRecords += 1
   None
-} else if (options.failFast && schema.length != tokens.length) {
+} else if (options.failFast && inputSchema.length != tokens.length) {
   throw new RuntimeException(s"Malformed line in FAILFAST mode: " +
 s"${tokens.mkString(options.delimiter.toString)}")
 } else {
-  val checkedTokens = if (options.permissive && schema.length > tokens.length) {
-tokens ++ new Array[String](schema.length - tokens.length)
-  } else if (options.permissive && schema.length < tokens.length) {
-tokens.take(schema.length)
+  val checkedTokens = if (options.permissive && inputSchema.length > tokens.length) {
+tokens ++ new Array[String](inputSchema.length - tokens.length)
+  } else if (options.permissive && inputSchema.length < tokens.length) {
+tokens.take(inputSchema.length)
   } else {
 tokens
   }
 
   try {
 Some(convert(checkedTokens))
   } catch {
+case NonFatal(e) if options.permissive =>
+  val row = new GenericInternalRow(requiredSchema.length)
+  corruptFieldIndex.map(row(_) = UTF8String.fromString(input))
--- End diff --

`map` -> `foreach`





[GitHub] spark issue #16938: [SPARK-19583][SQL]CTAS for data source table with a crea...

2017-02-22 Thread tejasapatil
Github user tejasapatil commented on the issue:

https://github.com/apache/spark/pull/16938
  
I looked into the code. It looks like that version is merely for picking the Hive shim and the metastore interactions, and has nothing to do with the semantics of SQL operations. So you are most likely correct.

@gatorsmile @cloud-fan : Is the goal of Hive support in Spark to adhere to a specific release of Hive (as long as the Hive behavior is sane and consistent .. otherwise it doesn't make sense to follow it)?





[GitHub] spark issue #16971: [SPARK-19573][SQL] Make NaN/null handling consistent in ...

2017-02-22 Thread MLnick
Github user MLnick commented on the issue:

https://github.com/apache/spark/pull/16971
  
Yes, my point was that returning null is not very idiomatic in Scala; better to return an Option or an empty collection. Option doesn't work for Java compat, so an empty Array is best in this case, I believe.

+1 for an empty Array, and if we can return the quantiles for the non-empty / non-NaN cols as per your suggestion, that is ideal.
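
A minimal sketch (plain Scala, not the actual Spark implementation) of the empty-Array convention being discussed:

```scala
// Hypothetical helper: a column with no usable values yields Array.empty
// instead of null, so Java callers can iterate the result without null checks.
def quantileOrEmpty(values: Array[Double], p: Double): Array[Double] = {
  val clean = values.filterNot(_.isNaN).sorted
  if (clean.isEmpty) Array.empty[Double]
  else Array(clean(((clean.length - 1) * p).toInt))
}

quantileOrEmpty(Array(Double.NaN, Double.NaN), 0.5) // Array() rather than null
quantileOrEmpty(Array(3.0, 1.0, 2.0), 0.5)          // Array(2.0)
```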





[GitHub] spark pull request #16928: [SPARK-18699][SQL] Put malformed tokens into a ne...

2017-02-22 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/16928#discussion_r102646058
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/UnivocityParser.scala
 ---
@@ -190,8 +208,9 @@ private[csv] class UnivocityParser(
   }
 
   private def convertWithParseMode(
-  tokens: Array[String])(convert: Array[String] => InternalRow): Option[InternalRow] = {
-if (options.dropMalformed && schema.length != tokens.length) {
+  input: String)(convert: Array[String] => InternalRow): Option[InternalRow] = {
+val tokens = parser.parseLine(input)
+if (options.dropMalformed && inputSchema.length != tokens.length) {
--- End diff --

I think it's more readable to write `options.dropMalformed && corruptFieldIndex.isDefined`





[GitHub] spark pull request #15415: [SPARK-14503][ML] spark.ml API for FPGrowth

2017-02-22 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request:

https://github.com/apache/spark/pull/15415#discussion_r102646065
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/fpm/FPGrowth.scala ---
@@ -0,0 +1,341 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.ml.fpm
+
+import scala.collection.mutable.ArrayBuffer
+import scala.reflect.ClassTag
+
+import org.apache.hadoop.fs.Path
+
+import org.apache.spark.annotation.{Experimental, Since}
+import org.apache.spark.ml.{Estimator, Model}
+import org.apache.spark.ml.param._
+import org.apache.spark.ml.param.shared.{HasFeaturesCol, HasPredictionCol}
+import org.apache.spark.ml.util._
+import org.apache.spark.mllib.fpm.{AssociationRules => MLlibAssociationRules,
+  FPGrowth => MLlibFPGrowth}
+import org.apache.spark.mllib.fpm.FPGrowth.FreqItemset
+import org.apache.spark.sql._
+import org.apache.spark.sql.functions._
+import org.apache.spark.sql.types._
+
+/**
+ * Common params for FPGrowth and FPGrowthModel
+ */
+private[fpm] trait FPGrowthParams extends Params with HasFeaturesCol with HasPredictionCol {
+
+  /**
+   * Minimal support level of the frequent pattern. [0.0, 1.0]. Any pattern that appears
+   * more than (minSupport * size-of-the-dataset) times will be output
+   * Default: 0.3
+   * @group param
+   */
+  @Since("2.2.0")
+  val minSupport: DoubleParam = new DoubleParam(this, "minSupport",
+"the minimal support level of the frequent pattern (Default: 0.3)",
+ParamValidators.inRange(0.0, 1.0))
+  setDefault(minSupport -> 0.3)
+
+  /** @group getParam */
+  @Since("2.2.0")
+  def getMinSupport: Double = $(minSupport)
+
+  /**
+   * Number of partitions (>=1) used by parallel FP-growth. By default the param is not set, and
+   * partition number of the input dataset is used.
+   * @group expertParam
+   */
+  @Since("2.2.0")
+  val numPartitions: IntParam = new IntParam(this, "numPartitions",
+"Number of partitions used by parallel FP-growth", 
ParamValidators.gtEq[Int](1))
+
+  /** @group expertGetParam */
+  @Since("2.2.0")
+  def getNumPartitions: Int = $(numPartitions)
+
+  /**
+   * Minimal confidence for generating Association Rule.
+   * Note that minConfidence has no effect during fitting.
+   * Default: 0.8
+   * @group param
+   */
+  @Since("2.2.0")
+  val minConfidence: DoubleParam = new DoubleParam(this, "minConfidence",
+"minimal confidence for generating Association Rule (Default: 0.8)",
+ParamValidators.inRange(0.0, 1.0))
+  setDefault(minConfidence -> 0.8)
+
+  /** @group getParam */
+  @Since("2.2.0")
+  def getMinConfidence: Double = $(minConfidence)
+
+  /**
+   * Validates and transforms the input schema.
+   * @param schema input schema
+   * @return output schema
+   */
+  @Since("2.2.0")
+  protected def validateAndTransformSchema(schema: StructType): StructType = {
+val inputType = schema($(featuresCol)).dataType
+require(inputType.isInstanceOf[ArrayType],
+  s"The input column must be ArrayType, but got $inputType.")
+SchemaUtils.appendColumn(schema, $(predictionCol), schema($(featuresCol)).dataType)
+  }
+}
+
+/**
+ * :: Experimental ::
+ * A parallel FP-growth algorithm to mine frequent itemsets.
+ *
+ * @see [[http://dx.doi.org/10.1145/1454008.1454027 Li et al., PFP: Parallel FP-Growth for Query
+ *  Recommendation]]
+ */
+@Since("2.2.0")
+@Experimental
+class FPGrowth @Since("2.2.0") (
+@Since("2.2.0") override val uid: String)
+  extends Estimator[FPGrowthModel] with FPGrowthParams with DefaultParamsWritable {
+
+  @Since("2.2.0")
+  def this() = this(Identifiable.randomUID("fpgrowth"))
+
+  /** @group setParam */
+  @Since("2.2.0")
+  def setMinSupport(value: Double): this.type = set(minSupport, value)
   

[GitHub] spark pull request #16928: [SPARK-18699][SQL] Put malformed tokens into a ne...

2017-02-22 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/16928#discussion_r102645418
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/UnivocityParser.scala
 ---
@@ -45,24 +45,41 @@ private[csv] class UnivocityParser(
   // A `ValueConverter` is responsible for converting the given value to a desired type.
   private type ValueConverter = String => Any

+  private val corruptFieldIndex = schema.getFieldIndex(options.columnNameOfCorruptRecord)
+  corruptFieldIndex.foreach { corrFieldIndex =>
+require(schema(corrFieldIndex).dataType == StringType)
+require(schema(corrFieldIndex).nullable)
+  }
+
+  private val inputSchema = StructType(schema.filter(_.name != options.columnNameOfCorruptRecord))
--- End diff --

(For example, [this code path](https://github.com/apache/spark/pull/16928/files/c86febe6b018faafa62e0bf6444f8cd4326fb021#diff-d19881aceddcaa5c60620fdcda99b4c4R213) depends on the length of the data schema: it drops/adds tokens after comparing the lengths of the data schema and the tokens. If we kept the corrupt column, the schema length would differ from the tokens.)
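
A simplified sketch of that pad-or-truncate step, with hypothetical names:

```scala
// Align parsed tokens to the data schema length: pad short rows with nulls and
// drop extra tokens from long rows. This is why the corrupt-record column must
// be excluded from the schema length being compared here.
def alignTokens(tokens: Array[String], dataSchemaLen: Int): Array[String] =
  if (tokens.length < dataSchemaLen) {
    tokens ++ new Array[String](dataSchemaLen - tokens.length)
  } else {
    tokens.take(dataSchemaLen)
  }
```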






[GitHub] spark issue #16971: [SPARK-19573][SQL] Make NaN/null handling consistent in ...

2017-02-22 Thread zhengruifeng
Github user zhengruifeng commented on the issue:

https://github.com/apache/spark/pull/16971
  
@thunterdb Good point. I will check the `sampled` in `def query`. 

@MLnick @gatorsmile I prefer an empty array as the result for an empty dataset or for columns that contain only NaN values.
Also, consider the case where only some of the columns contain nothing but NaN: the current implementation will return null, so the results for all columns cannot be obtained. I think the results for the ordinary columns should still be accessible.
```
val rows = spark.sparkContext.parallelize(Seq(Row(Double.NaN, 1.0, Double.NaN),
  Row(1.0, -1.0, null), Row(-1.0, Double.NaN, null), Row(Double.NaN, Double.NaN, null),
  Row(null, null, Double.NaN), Row(null, 1.0, null), Row(-1.0, null, Double.NaN),
  Row(Double.NaN, null, null)))
val schema = StructType(Seq(StructField("input1", DoubleType, nullable = true),
  StructField("input2", DoubleType, nullable = true),
  StructField("input3", DoubleType, nullable = true)))
val dfNaN = spark.createDataFrame(rows, schema)
val resNaNAll = dfNaN.stat.approxQuantile(Array("input1", "input2", "input3"),
  Array(q1, q2), epsilon)
```
In the returned array, the results for columns `input1` and `input2` should be fine, and the result for `input3` is empty: `Array(Array(num1, num2), Array(num3, num4), Array())`.





[GitHub] spark issue #16938: [SPARK-19583][SQL]CTAS for data source table with a crea...

2017-02-22 Thread windpiger
Github user windpiger commented on the issue:

https://github.com/apache/spark/pull/16938
  
@tejasapatil In my opinion, testing against Hive 2.0.0 is just for comparison with Spark; the goal is to decide on these behaviors for Spark, not to make them consistent with Hive 2.0.0 or Hive 1.2.1, isn't it?





[GitHub] spark pull request #16928: [SPARK-18699][SQL] Put malformed tokens into a ne...

2017-02-22 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/16928#discussion_r102645047
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/UnivocityParser.scala
 ---
@@ -45,24 +45,41 @@ private[csv] class UnivocityParser(
   // A `ValueConverter` is responsible for converting the given value to a desired type.
   private type ValueConverter = String => Any

+  private val corruptFieldIndex = schema.getFieldIndex(options.columnNameOfCorruptRecord)
+  corruptFieldIndex.foreach { corrFieldIndex =>
+require(schema(corrFieldIndex).dataType == StringType)
+require(schema(corrFieldIndex).nullable)
+  }
+
+  private val inputSchema = StructType(schema.filter(_.name != options.columnNameOfCorruptRecord))
--- End diff --

It is because CSV parsing depends on the positional order of the schema and the tokens. In the case of JSON, values can simply be mapped by key, but for CSV they are matched by position in the schema.

So, it seems this filters the corrupt field out in order to match the data schema with the parsed tokens.
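
For illustration, a sketch of that filtering with assumed column names:

```scala
import org.apache.spark.sql.types._

// CSV tokens are positional, so the corrupt-record column, which has no
// corresponding token in the input line, is removed from the parsing schema.
val schema = StructType(Seq(
  StructField("a", IntegerType),
  StructField("_corrupt_record", StringType),
  StructField("b", DoubleType)))
val inputSchema = StructType(schema.filterNot(_.name == "_corrupt_record"))
// inputSchema now lines up index-for-index with the parsed tokens.
```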





[GitHub] spark pull request #16928: [SPARK-18699][SQL] Put malformed tokens into a ne...

2017-02-22 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/16928#discussion_r102645010
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/UnivocityParser.scala
 ---
@@ -45,24 +45,41 @@ private[csv] class UnivocityParser(
   // A `ValueConverter` is responsible for converting the given value to a desired type.
   private type ValueConverter = String => Any

+  private val corruptFieldIndex = schema.getFieldIndex(options.columnNameOfCorruptRecord)
+  corruptFieldIndex.foreach { corrFieldIndex =>
+require(schema(corrFieldIndex).dataType == StringType)
+require(schema(corrFieldIndex).nullable)
+  }
+
+  private val inputSchema = StructType(schema.filter(_.name != options.columnNameOfCorruptRecord))
+
   private val valueConverters =
-schema.map(f => makeConverter(f.name, f.dataType, f.nullable, options)).toArray
+inputSchema.map(f => makeConverter(f.name, f.dataType, f.nullable, options)).toArray

   private val parser = new CsvParser(options.asParserSettings)

   private var numMalformedRecords = 0

   private val row = new GenericInternalRow(requiredSchema.length)

-  private val indexArr: Array[Int] = {
+  // This parser loads an `indexArr._1`-th position value in input tokens,
+  // then put the value in `row(indexArr._2)`.
+  private val indexArr: Array[(Int, Int)] = {
 val fields = if (options.dropMalformed) {
   // If `dropMalformed` is enabled, then it needs to parse all the values
   // so that we can decide which row is malformed.
   requiredSchema ++ schema.filterNot(requiredSchema.contains(_))
 } else {
   requiredSchema
 }
-fields.map(schema.indexOf(_: StructField)).toArray
+val fieldsWithIndexes = fields.zipWithIndex
+corruptFieldIndex.map { case corrFieldIndex =>
+  fieldsWithIndexes.filter { case (_, i) => i != corrFieldIndex }
+}.getOrElse {
+  fieldsWithIndexes
+}.map { case (f, i) =>
+  (inputSchema.indexOf(f), i)
+}.toArray
--- End diff --

SGTM





[GitHub] spark issue #16938: [SPARK-19583][SQL]CTAS for data source table with a crea...

2017-02-22 Thread tejasapatil
Github user tejasapatil commented on the issue:

https://github.com/apache/spark/pull/16938
  
@windpiger : I realised that you are checking the hive behavior against 
Hive 2.0.0. Spark is expected to support semantics for Hive 1.2.1 : 
https://github.com/apache/spark/blob/3881f342b49efdb1e0d5ee27f616451ea1928c5d/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveUtils.scala#L58

I am not up to date with the differences between those two releases of Hive wrt this discussion. Can you confirm whether the observations reported earlier in the discussion are valid against Hive 1.2.1?





[GitHub] spark issue #16841: [SPARK-18871][SQL][TESTS] New test cases for IN/NOT IN s...

2017-02-22 Thread kevinyu98
Github user kevinyu98 commented on the issue:

https://github.com/apache/spark/pull/16841
  
@gatorsmile sure, I will do that. Thanks.





[GitHub] spark pull request #16928: [SPARK-18699][SQL] Put malformed tokens into a ne...

2017-02-22 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/16928#discussion_r102644680
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/UnivocityParser.scala
 ---
@@ -45,24 +45,41 @@ private[csv] class UnivocityParser(
   // A `ValueConverter` is responsible for converting the given value to a desired type.
   private type ValueConverter = String => Any

+  private val corruptFieldIndex = schema.getFieldIndex(options.columnNameOfCorruptRecord)
+  corruptFieldIndex.foreach { corrFieldIndex =>
+require(schema(corrFieldIndex).dataType == StringType)
+require(schema(corrFieldIndex).nullable)
+  }
+
+  private val inputSchema = StructType(schema.filter(_.name != options.columnNameOfCorruptRecord))
--- End diff --

JSON doesn't do this; why the difference? cc @HyukjinKwon 





[GitHub] spark issue #17033: [DOCS] application environment rest api

2017-02-22 Thread uncleGen
Github user uncleGen commented on the issue:

https://github.com/apache/spark/pull/17033
  
cc @srowen 





[GitHub] spark pull request #16928: [SPARK-18699][SQL] Put malformed tokens into a ne...

2017-02-22 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/16928#discussion_r102644140
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/UnivocityParser.scala
 ---
@@ -45,24 +45,41 @@ private[csv] class UnivocityParser(
   // A `ValueConverter` is responsible for converting the given value to a desired type.
   private type ValueConverter = String => Any

+  private val corruptFieldIndex = schema.getFieldIndex(options.columnNameOfCorruptRecord)
+  corruptFieldIndex.foreach { corrFieldIndex =>
+require(schema(corrFieldIndex).dataType == StringType)
+require(schema(corrFieldIndex).nullable)
+  }
+
+  private val inputSchema = StructType(schema.filter(_.name != options.columnNameOfCorruptRecord))
+
   private val valueConverters =
-schema.map(f => makeConverter(f.name, f.dataType, f.nullable, options)).toArray
+inputSchema.map(f => makeConverter(f.name, f.dataType, f.nullable, options)).toArray

   private val parser = new CsvParser(options.asParserSettings)

   private var numMalformedRecords = 0

   private val row = new GenericInternalRow(requiredSchema.length)

-  private val indexArr: Array[Int] = {
+  // This parser loads an `indexArr._1`-th position value in input tokens,
+  // then put the value in `row(indexArr._2)`.
+  private val indexArr: Array[(Int, Int)] = {
 val fields = if (options.dropMalformed) {
   // If `dropMalformed` is enabled, then it needs to parse all the values
   // so that we can decide which row is malformed.
   requiredSchema ++ schema.filterNot(requiredSchema.contains(_))
 } else {
   requiredSchema
 }
-fields.map(schema.indexOf(_: StructField)).toArray
+val fieldsWithIndexes = fields.zipWithIndex
+corruptFieldIndex.map { case corrFieldIndex =>
+  fieldsWithIndexes.filter { case (_, i) => i != corrFieldIndex }
+}.getOrElse {
+  fieldsWithIndexes
+}.map { case (f, i) =>
+  (inputSchema.indexOf(f), i)
+}.toArray
--- End diff --

Per 

> a string converter in a corrupt field a bit looks weird,

We are already filling up the tokens in permissive mode. We could make this a no-op too.





[GitHub] spark pull request #16928: [SPARK-18699][SQL] Put malformed tokens into a ne...

2017-02-22 Thread maropu
Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/16928#discussion_r102643961
  
--- Diff: python/pyspark/sql/readwriter.py ---
@@ -367,10 +368,18 @@ def csv(self, path, schema=None, sep=None, encoding=None, quote=None, escape=Non
 If None is set, it uses the default value, session local timezone.

 * ``PERMISSIVE`` : sets other fields to ``null`` when it meets a corrupted record.
-When a schema is set by user, it sets ``null`` for extra fields.
+If users set a string type field named ``columnNameOfCorruptRecord`` in a
--- End diff --

okay





[GitHub] spark issue #16841: [SPARK-18871][SQL][TESTS] New test cases for IN/NOT IN s...

2017-02-22 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/16841
  
To make the results consistent between big-endian and little-endian platforms, we can improve the queries with extra ORDER BY clauses (see the sketch below).

@robbinspg Which queries failed? @kevinyu98 Can you collect the failed 
cases and submit another PR for resolving the issues? Thanks!
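
With hypothetical table and column names, and assuming a `spark` session is in scope, the kind of change being suggested:

```scala
// Before: row order depends on partitioning and hashing, so the expected
// output can differ across platforms.
spark.sql("SELECT t1a, t1b FROM t1 WHERE t1a IN (SELECT t2a FROM t2)")

// After: an explicit ORDER BY makes the expected output deterministic.
spark.sql("SELECT t1a, t1b FROM t1 WHERE t1a IN (SELECT t2a FROM t2) ORDER BY t1a, t1b")
```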







[GitHub] spark pull request #16928: [SPARK-18699][SQL] Put malformed tokens into a ne...

2017-02-22 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/16928#discussion_r102643746
  
--- Diff: python/pyspark/sql/readwriter.py ---
@@ -367,10 +368,18 @@ def csv(self, path, schema=None, sep=None, encoding=None, quote=None, escape=Non
 If None is set, it uses the default value, session local timezone.

 * ``PERMISSIVE`` : sets other fields to ``null`` when it meets a corrupted record.
-When a schema is set by user, it sets ``null`` for extra fields.
+If users set a string type field named ``columnNameOfCorruptRecord`` in a
--- End diff --

Can we just copy the doc from `json`? Or, if you make some changes, please keep the two consistent.





[GitHub] spark issue #16841: [SPARK-18871][SQL][TESTS] New test cases for IN/NOT IN s...

2017-02-22 Thread kevinyu98
Github user kevinyu98 commented on the issue:

https://github.com/apache/spark/pull/16841
  
Hello Pete: Thanks for running the test case. Can you send the failing test case file to me? Also, I can provide new test files with the output files; can you help test them on your platforms? Thanks.






[GitHub] spark pull request #16987: [SPARK-19633][SS] FileSource read from FileSink

2017-02-22 Thread lw-lin
Github user lw-lin closed the pull request at:

https://github.com/apache/spark/pull/16987





[GitHub] spark issue #16987: [SPARK-19633][SS] FileSource read from FileSink

2017-02-22 Thread lw-lin
Github user lw-lin commented on the issue:

https://github.com/apache/spark/pull/16987
  
Using deterministic file names sounds great. Thanks! I'm closing this.





[GitHub] spark issue #17033: [DOCS] application environment rest api

2017-02-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17033
  
Merged build finished. Test PASSed.





[GitHub] spark issue #17033: [DOCS] application environment rest api

2017-02-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17033
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73327/
Test PASSed.





[GitHub] spark issue #17033: [DOCS] application environment rest api

2017-02-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17033
  
**[Test build #73327 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73327/testReport)** for PR 17033 at commit [`312981b`](https://github.com/apache/spark/commit/312981b22549971f4e58ad8e91bca984300efab7).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request #17023: [SPARK-19695][SQL] Throw an exception if a `colum...

2017-02-22 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/17023





[GitHub] spark issue #16998: [SPARK-19665][SQL][WIP] Improve constraint propagation

2017-02-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16998
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73316/
Test PASSed.





[GitHub] spark issue #17023: [SPARK-19695][SQL] Throw an exception if a `columnNameOf...

2017-02-22 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/17023
  
thanks, merging to master!





[GitHub] spark issue #16998: [SPARK-19665][SQL][WIP] Improve constraint propagation

2017-02-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16998
  
Merged build finished. Test PASSed.





[GitHub] spark issue #16998: [SPARK-19665][SQL][WIP] Improve constraint propagation

2017-02-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16998
  
**[Test build #73316 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73316/testReport)** for PR 16998 at commit [`5be21b3`](https://github.com/apache/spark/commit/5be21b32d5b4e3e36e50317a385a554206967668).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request #17001: [SPARK-19667][SQL]create table with hiveenabled i...

2017-02-22 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/17001#discussion_r102642601
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala 
---
@@ -339,10 +340,17 @@ private[hive] class HiveClientImpl(
 
   override def getDatabase(dbName: String): CatalogDatabase = withHiveState {
 Option(client.getDatabase(dbName)).map { d =>
+  // default database's location always use the warehouse path,
+  // and since the location of database stored in metastore is qualified,
+  // here we also make qualify for warehouse location
+  val dbLocation = if (dbName == SessionCatalog.DEFAULT_DATABASE) {
+SessionCatalog.makeQualifiedPath(sparkConf.get(WAREHOUSE_PATH), hadoopConf).toString
--- End diff --

This won't work for `InMemoryCatalog`, isn't it?

You should either implement this logic in all `ExternalCatalog`s, or put it in `SessionCatalog`.
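
For illustration, a minimal sketch of the `SessionCatalog` option, assuming the qualification lives in a single helper there (the method name and signature are illustrative, not the actual Spark API):

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path

// Qualify the default database's location against the warehouse path,
// mirroring how the metastore stores qualified locations. Any other
// database keeps the location reported by the underlying catalog.
def qualifiedDbLocation(
    dbName: String,
    reportedLocation: String,
    warehousePath: String,
    hadoopConf: Configuration): String = {
  if (dbName == "default") {
    val path = new Path(warehousePath)
    val fs = path.getFileSystem(hadoopConf)
    fs.makeQualified(path).toString
  } else {
    reportedLocation
  }
}
```

Placed in `SessionCatalog`, this would apply uniformly to both `HiveExternalCatalog` and `InMemoryCatalog` instead of being duplicated in each `ExternalCatalog` implementation.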





[GitHub] spark issue #17033: [DOCS] application environment rest api

2017-02-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17033
  
**[Test build #73327 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73327/testReport)** for PR 17033 at commit [`312981b`](https://github.com/apache/spark/commit/312981b22549971f4e58ad8e91bca984300efab7).





[GitHub] spark pull request #17033: [DOCS] application environment rest api

2017-02-22 Thread uncleGen
GitHub user uncleGen opened a pull request:

https://github.com/apache/spark/pull/17033

[DOCS] application environment rest api

## What changes were proposed in this pull request?

Add documentation for the application environment REST API.

## How was this patch tested?

Jenkins.
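
For reference, a minimal sketch of fetching the documented endpoint from Scala, assuming the `/applications/[app-id]/environment` path shown in the Spark monitoring docs (the application ID and driver UI port below are illustrative placeholders):

```scala
import scala.io.Source

// Hypothetical application ID; list running applications via
// http://localhost:4040/api/v1/applications to find a real one.
val appId = "app-20170223052828-0000"
val url = s"http://localhost:4040/api/v1/applications/$appId/environment"

// Returns the application's runtime, Spark properties, and system
// properties as a JSON document.
println(Source.fromURL(url).mkString)
```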


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/uncleGen/spark doc-restapi-environment

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/17033.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #17033


commit 312981b22549971f4e58ad8e91bca984300efab7
Author: uncleGen 
Date:   2017-02-23T05:28:28Z

Docs for rest api of environment.







[GitHub] spark issue #16938: [SPARK-19583][SQL]CTAS for data source table with a crea...

2017-02-22 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/16938
  
Thank you for your work! 

Maybe one last question.

```
**2. CREATE TABLE ... PARTITIONED BY ... LOCATION path AS SELECT ...**
a) path exists
  hive(external) -> not supported
  spark(hive with HiveExternalCatalog) -> ok
  spark(parquet with HiveExternalCatalog) -> throw exception (path already exists)
  spark(parquet with InMemoryCatalog) -> throw exception (path already exists)
```

In the above case, you used `path exists`. I assume this refers to the existence of the table directory. Are these behaviors still the same when the specific partition directory already exists?
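
For concreteness, a hypothetical statement exercising this case, assuming a `SparkSession` named `spark` (the table name and path are illustrative):

```scala
// Per the matrix above, with a parquet table and an already-existing
// target directory, this CTAS is expected to throw an exception
// (path already exists) under HiveExternalCatalog or InMemoryCatalog.
spark.sql(
  """
    |CREATE TABLE ctas_demo
    |USING parquet
    |PARTITIONED BY (p)
    |LOCATION '/tmp/ctas_demo_path'
    |AS SELECT 1 AS a, 2 AS p
  """.stripMargin)
```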









[GitHub] spark issue #16923: [SPARK-19038][Hive][YARN] Correctly figure out keytab fi...

2017-02-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16923
  
**[Test build #73326 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73326/testReport)** for PR 16923 at commit [`08d53d2`](https://github.com/apache/spark/commit/08d53d201f62c19cd26c09c29240c71dcb9d08e3).





[GitHub] spark issue #17001: [SPARK-19667][SQL]create table with hiveenabled in defau...

2017-02-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17001
  
**[Test build #73325 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73325/testReport)** for PR 17001 at commit [`d327994`](https://github.com/apache/spark/commit/d327994593395a69465717a2d401672f025cac36).





[GitHub] spark issue #16826: [SPARK-19540][SQL] Add ability to clone SparkSession whe...

2017-02-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16826
  
Merged build finished. Test FAILed.





[GitHub] spark issue #16826: [SPARK-19540][SQL] Add ability to clone SparkSession whe...

2017-02-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16826
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73319/
Test FAILed.





[GitHub] spark issue #16826: [SPARK-19540][SQL] Add ability to clone SparkSession whe...

2017-02-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16826
  
**[Test build #73319 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73319/testReport)** for PR 16826 at commit [`dd2dedd`](https://github.com/apache/spark/commit/dd2dedd6d578a3b5a75359be72677e61eea751e3).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #13599: [SPARK-13587] [PYSPARK] Support virtualenv in pyspark

2017-02-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13599
  
**[Test build #73324 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73324/testReport)** for PR 13599 at commit [`89b194f`](https://github.com/apache/spark/commit/89b194fd4b043939569d2118bd46f8c617ef924c).




