[GitHub] spark pull request #17644: [SPARK-17729] [SQL] Enable creating hive bucketed...

2017-05-14 Thread tejasapatil
Github user tejasapatil commented on a diff in the pull request:

https://github.com/apache/spark/pull/17644#discussion_r116414803
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/InsertIntoHiveTable.scala ---
@@ -307,6 +307,27 @@ case class InsertIntoHiveTable(
       }
     }
 
+    table.bucketSpec match {
+      case Some(bucketSpec) =>
+        // Writes to bucketed hive tables are allowed only if user does not care about maintaining
+        // table's bucketing ie. both "hive.enforce.bucketing" and "hive.enforce.sorting" are
+        // set to false
+        val enforceBucketingConfig = "hive.enforce.bucketing"
+        val enforceSortingConfig = "hive.enforce.sorting"
+
+        val message = s"Output Hive table ${table.identifier} is bucketed but Spark" +
+          "currently does NOT populate bucketed output which is compatible with Hive."
+
+        if (hadoopConf.get(enforceBucketingConfig, "true").toBoolean ||
+          hadoopConf.get(enforceSortingConfig, "true").toBoolean) {
+          throw new AnalysisException(message)
+        } else {
+          logWarning(message + s" Inserting data anyways since both $enforceBucketingConfig and " +
+            s"$enforceSortingConfig are set to false.")
--- End diff --

In Hive: it would lead to wrong results.

In Spark (on master and also after this PR): the table scan operation 
does not take bucketing into account, so the table would be read as a regular table. 
So it won't be read "wrong"; it's just that we won't take advantage of bucketing.
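
For readers following the thread, the check in the quoted diff can be sketched in plain Python. This is only an illustration of the decision logic, not Spark code: `check_bucketed_insert`, `AnalysisError`, and the dict-based `conf` are hypothetical stand-ins for the Scala method, `AnalysisException`, and `hadoopConf`. One simplification is labeled in the comments: any value other than "true" is treated as false here, whereas Scala's `toBoolean` would reject unparseable strings.

```python
import logging

logger = logging.getLogger("bucketing_sketch")

ENFORCE_BUCKETING = "hive.enforce.bucketing"
ENFORCE_SORTING = "hive.enforce.sorting"


class AnalysisError(Exception):
    """Hypothetical stand-in for Spark's AnalysisException."""


def _is_true(conf, key):
    # Like hadoopConf.get(key, "true"): a missing key counts as enabled.
    # Assumption: anything other than "true" (case-insensitive) is false.
    return conf.get(key, "true").strip().lower() == "true"


def check_bucketed_insert(table_name, is_bucketed, conf):
    """Return True if the insert may proceed; raise if enforcement is on."""
    if not is_bucketed:
        return True
    message = (
        f"Output Hive table {table_name} is bucketed but Spark "
        "currently does NOT populate bucketed output which is compatible with Hive."
    )
    # The write is rejected if EITHER setting is still enabled.
    if _is_true(conf, ENFORCE_BUCKETING) or _is_true(conf, ENFORCE_SORTING):
        raise AnalysisError(message)
    logger.warning(
        "%s Inserting data anyway since both %s and %s are set to false.",
        message, ENFORCE_BUCKETING, ENFORCE_SORTING,
    )
    return True
```

Because both settings default to "true", a caller has to set both to "false" explicitly before a write to a bucketed table goes through; setting only one of them still raises.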


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #17980: [SPARK-20728][SQL] Make ORCFileFormat configurabl...

2017-05-14 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/17980#discussion_r116414546
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/sources/DDLSourceLoadSuite.scala ---
@@ -55,10 +56,12 @@ class DDLSourceLoadSuite extends DataSourceTest with SharedSQLContext {
   }
 
   test("should fail to load ORC without Hive Support") {
--- End diff --

We can remove this test case when we remove `sql/hive` ORCFileFormat.





[GitHub] spark issue #16989: [SPARK-19659] Fetch big blocks to disk when shuffle-read...

2017-05-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16989
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76925/
Test PASSed.





[GitHub] spark issue #16989: [SPARK-19659] Fetch big blocks to disk when shuffle-read...

2017-05-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16989
  
Merged build finished. Test PASSed.





[GitHub] spark issue #16989: [SPARK-19659] Fetch big blocks to disk when shuffle-read...

2017-05-14 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16989
  
**[Test build #76925 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76925/testReport)**
 for PR 16989 at commit 
[`80b3154`](https://github.com/apache/spark/commit/80b31545a1d6b6890e3cc0d549781ca15d7d46dc).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #17980: [SPARK-20728][SQL] Make ORCFileFormat configurable betwe...

2017-05-14 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17980
  
**[Test build #76931 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76931/testReport)**
 for PR 17980 at commit 
[`73d56f2`](https://github.com/apache/spark/commit/73d56f2f9e3cb91a93a555654a6f9e9933e9ef7a).





[GitHub] spark pull request #17644: [SPARK-17729] [SQL] Enable creating hive bucketed...

2017-05-14 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/17644#discussion_r116412797
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/InsertIntoHiveTable.scala ---
@@ -307,6 +307,27 @@ case class InsertIntoHiveTable(
       }
     }
 
+    table.bucketSpec match {
+      case Some(bucketSpec) =>
+        // Writes to bucketed hive tables are allowed only if user does not care about maintaining
+        // table's bucketing ie. both "hive.enforce.bucketing" and "hive.enforce.sorting" are
+        // set to false
+        val enforceBucketingConfig = "hive.enforce.bucketing"
+        val enforceSortingConfig = "hive.enforce.sorting"
+
+        val message = s"Output Hive table ${table.identifier} is bucketed but Spark" +
+          "currently does NOT populate bucketed output which is compatible with Hive."
+
+        if (hadoopConf.get(enforceBucketingConfig, "true").toBoolean ||
+          hadoopConf.get(enforceSortingConfig, "true").toBoolean) {
+          throw new AnalysisException(message)
+        } else {
+          logWarning(message + s" Inserting data anyways since both $enforceBucketingConfig and " +
+            s"$enforceSortingConfig are set to false.")
--- End diff --

So after insertion (if not enforcing), the table is still a bucketed table, 
but reading it will cause wrong results?





[GitHub] spark issue #17981: [SPARK-15767][ML][SparkR] Decision Tree wrapper in Spark...

2017-05-14 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17981
  
**[Test build #76930 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76930/testReport)**
 for PR 17981 at commit 
[`7e383a2`](https://github.com/apache/spark/commit/7e383a2f7e488c4277ee418454d1bbc69c8c8eb2).





[GitHub] spark issue #17981: [SPARK-15767][ML][SparkR] Decision Tree wrapper in Spark...

2017-05-14 Thread zhengruifeng
Github user zhengruifeng commented on the issue:

https://github.com/apache/spark/pull/17981
  
Jenkins, please retest this please





[GitHub] spark pull request #17978: [SPARK-20736][Python] PySpark StringIndexer suppo...

2017-05-14 Thread actuaryzhang
Github user actuaryzhang commented on a diff in the pull request:

https://github.com/apache/spark/pull/17978#discussion_r116411579
  
--- Diff: python/pyspark/ml/feature.py ---
@@ -2115,22 +2115,32 @@ class StringIndexer(JavaEstimator, HasInputCol, HasOutputCol, HasHandleInvalid,
     .. versionadded:: 1.4.0
     """
 
+    stringOrderType = Param(Params._dummy(), "stringOrderType",
+                            "How to order labels of string column. The first label after " +
+                            "ordering is assigned an index of 0. Supported options: " +
+                            "frequencyDesc, frequencyAsc, alphabetDsec, alphabetAsc.",
+                            typeConverter=TypeConverters.toString)
+
     @keyword_only
-    def __init__(self, inputCol=None, outputCol=None, handleInvalid="error"):
+    def __init__(self, inputCol=None, outputCol=None, handleInvalid="error",
+                 stringOrderType="frequencyDesc"):
         """
-        __init__(self, inputCol=None, outputCol=None, handleInvalid="error")
+        __init__(self, inputCol=None, outputCol=None, handleInvalid="error", \
--- End diff --

@HyukjinKwon  Thank you. Added tests. 
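
To make the four `stringOrderType` options described in the Param doc concrete, here is a minimal pure-Python sketch. It is not the PySpark implementation: `index_labels` is a hypothetical helper, the alphabetical tie-break for labels with equal frequency is an assumption, and the third option is spelled `alphabetDesc` on the assumption that `alphabetDsec` in the quoted diff is a typo.

```python
from collections import Counter


def index_labels(values, string_order_type="frequencyDesc"):
    """Map each distinct label to an index; the first label after ordering gets 0."""
    counts = Counter(values)
    if string_order_type == "frequencyDesc":
        # Most frequent label first. Assumption: ties broken alphabetically.
        ordered = sorted(counts, key=lambda label: (-counts[label], label))
    elif string_order_type == "frequencyAsc":
        # Least frequent label first. Assumption: ties broken alphabetically.
        ordered = sorted(counts, key=lambda label: (counts[label], label))
    elif string_order_type == "alphabetDesc":
        ordered = sorted(counts, reverse=True)
    elif string_order_type == "alphabetAsc":
        ordered = sorted(counts)
    else:
        raise ValueError(f"Unsupported stringOrderType: {string_order_type}")
    return {label: index for index, label in enumerate(ordered)}


labels = ["a", "b", "b", "c", "c", "c"]
print(index_labels(labels))                 # {'c': 0, 'b': 1, 'a': 2}
print(index_labels(labels, "alphabetAsc"))  # {'a': 0, 'b': 1, 'c': 2}
```

With the default `frequencyDesc`, the most common label receives index 0, which matches the default value shown in the new `__init__` signature.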





[GitHub] spark issue #17978: [SPARK-20736][Python] PySpark StringIndexer supports Str...

2017-05-14 Thread actuaryzhang
Github user actuaryzhang commented on the issue:

https://github.com/apache/spark/pull/17978
  
@viirya Thanks much for your review. I corrected the typo and added some 
tests. 





[GitHub] spark issue #17980: [SPARK-20728][SQL] Make ORCFileFormat configurable betwe...

2017-05-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17980
  
Merged build finished. Test FAILed.





[GitHub] spark pull request #16598: [SPARK-19236][Core] Added createOrReplaceGlobalTe...

2017-05-14 Thread arman1371
Github user arman1371 commented on a diff in the pull request:

https://github.com/apache/spark/pull/16598#discussion_r116410932
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -2603,6 +2603,21 @@ class Dataset[T] private[sql](
   def createGlobalTempView(viewName: String): Unit = withPlan {
     createTempViewCommand(viewName, replace = false, global = true)
   }
+
+  /**
+   * Creates or replaces a global temporary view using the given name. The lifetime of this
+   * temporary view is tied to this Spark application.
+   *
+   * Global temporary view is cross-session. Its lifetime is the lifetime of the Spark application,
+   * i.e. it will be automatically dropped when the application terminates. It's tied to a system
+   * preserved database `_global_temp`, and we must use the qualified name to refer a global temp
+   * view, e.g. `SELECT * FROM _global_temp.view1`.
+   *
+   * @group basic
--- End diff --

The `createOrReplaceGlobalTempView` method is not in the Java API.
@rxin said it should be added since 2.1.1.





[GitHub] spark issue #17980: [SPARK-20728][SQL] Make ORCFileFormat configurable betwe...

2017-05-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17980
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76924/
Test FAILed.





[GitHub] spark issue #17980: [SPARK-20728][SQL] Make ORCFileFormat configurable betwe...

2017-05-14 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17980
  
**[Test build #76924 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76924/testReport)**
 for PR 17980 at commit 
[`7716234`](https://github.com/apache/spark/commit/77162342c66ee21f784b900d892a26739631c151).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #17978: [SPARK-20736][Python] PySpark StringIndexer supports Str...

2017-05-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17978
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76929/
Test PASSed.





[GitHub] spark issue #17978: [SPARK-20736][Python] PySpark StringIndexer supports Str...

2017-05-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17978
  
Merged build finished. Test PASSed.





[GitHub] spark issue #17978: [SPARK-20736][Python] PySpark StringIndexer supports Str...

2017-05-14 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17978
  
**[Test build #76929 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76929/testReport)**
 for PR 17978 at commit 
[`f66a445`](https://github.com/apache/spark/commit/f66a4455aba7ffc69d1b397cb828879d84bb39a6).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request #17973: [SPARK-20731][SQL] Add ability to change or omit ...

2017-05-14 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/17973#discussion_r116408851
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVSuite.scala ---
@@ -622,6 +622,31 @@ class CSVSuite extends QueryTest with SharedSQLContext with SQLTestUtils {
     }
   }
 
+  test("save tsv with tsv suffix") {
+    withTempDir { dir =>
+      val csvDir = new File(dir, "csv").getCanonicalPath
+      val cars = spark.read
+        .format("csv")
+        .option("header", "true")
+        .load(testFile(carsFile))
+
+      cars.coalesce(1).write
+        .option("header", "true")
+        .option("fileExtension", ".tsv")
+        .option("delimiter", "\t")

Just curious: what is the reason you need to omit the extension?





[GitHub] spark issue #17933: [SPARK-20588][SQL] Cache TimeZone instances.

2017-05-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17933
  
Merged build finished. Test PASSed.





[GitHub] spark issue #17981: [SPARK-15767][ML][SparkR] Decision Tree wrapper in Spark...

2017-05-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17981
  
Merged build finished. Test FAILed.





[GitHub] spark pull request #17910: [SPARK-20669][ML] LogisticRegression family shoul...

2017-05-14 Thread sethah
Github user sethah commented on a diff in the pull request:

https://github.com/apache/spark/pull/17910#discussion_r116408136
  
--- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/LogisticRegressionSuite.scala ---
@@ -2318,8 +2319,8 @@ class LogisticRegressionSuite
       assert(m1.interceptVector ~== m2.interceptVector absTol 0.05)
     }
     val testParams = Seq(
-      ("binomial", smallBinaryDataset, 2),
-      ("multinomial", smallMultinomialDataset, 3)
+      ("Binomial", smallBinaryDataset, 2),

The changes you made don't address this comment at all, and there are no 
tests for the suggestion from Yanbo either.





[GitHub] spark issue #17933: [SPARK-20588][SQL] Cache TimeZone instances.

2017-05-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17933
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76923/
Test PASSed.





[GitHub] spark issue #17981: [SPARK-15767][ML][SparkR] Decision Tree wrapper in Spark...

2017-05-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17981
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76927/
Test FAILed.





[GitHub] spark issue #17933: [SPARK-20588][SQL] Cache TimeZone instances.

2017-05-14 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17933
  
**[Test build #76923 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76923/testReport)**
 for PR 17933 at commit 
[`3cdbb3a`](https://github.com/apache/spark/commit/3cdbb3acf12b2082056e8b4e2eb3f1645fa1bde7).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #17981: [SPARK-15767][ML][SparkR] Decision Tree wrapper in Spark...

2017-05-14 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17981
  
**[Test build #76927 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76927/testReport)**
 for PR 17981 at commit 
[`7e383a2`](https://github.com/apache/spark/commit/7e383a2f7e488c4277ee418454d1bbc69c8c8eb2).
 * This patch **fails SparkR unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #17978: [SPARK-20736][Python] PySpark StringIndexer supports Str...

2017-05-14 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17978
  
**[Test build #76929 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76929/testReport)**
 for PR 17978 at commit 
[`f66a445`](https://github.com/apache/spark/commit/f66a4455aba7ffc69d1b397cb828879d84bb39a6).





[GitHub] spark issue #17848: [SPARK-20586] [SQL] Add deterministic and distinctLike t...

2017-05-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17848
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76922/
Test PASSed.





[GitHub] spark issue #17848: [SPARK-20586] [SQL] Add deterministic and distinctLike t...

2017-05-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17848
  
Merged build finished. Test PASSed.





[GitHub] spark issue #17848: [SPARK-20586] [SQL] Add deterministic and distinctLike t...

2017-05-14 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17848
  
**[Test build #76922 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76922/testReport)**
 for PR 17848 at commit 
[`d276b44`](https://github.com/apache/spark/commit/d276b44ce3f68344ae1151c930105fe291a925ec).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #17978: [SPARK-20736][Python] PySpark StringIndexer supports Str...

2017-05-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17978
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76928/
Test FAILed.





[GitHub] spark issue #17978: [SPARK-20736][Python] PySpark StringIndexer supports Str...

2017-05-14 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17978
  
**[Test build #76928 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76928/testReport)**
 for PR 17978 at commit 
[`44f0a36`](https://github.com/apache/spark/commit/44f0a362dd085022de215e9ab8d9536145f20d4d).
 * This patch **fails PySpark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #17978: [SPARK-20736][Python] PySpark StringIndexer supports Str...

2017-05-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17978
  
Merged build finished. Test FAILed.





[GitHub] spark issue #17924: [SPARK-20682][SQL] Support a new faster ORC data source ...

2017-05-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17924
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76920/
Test PASSed.





[GitHub] spark issue #17978: [SPARK-20736][Python] PySpark StringIndexer supports Str...

2017-05-14 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17978
  
**[Test build #76928 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76928/testReport)**
 for PR 17978 at commit 
[`44f0a36`](https://github.com/apache/spark/commit/44f0a362dd085022de215e9ab8d9536145f20d4d).





[GitHub] spark issue #17924: [SPARK-20682][SQL] Support a new faster ORC data source ...

2017-05-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17924
  
Merged build finished. Test PASSed.





[GitHub] spark issue #17924: [SPARK-20682][SQL] Support a new faster ORC data source ...

2017-05-14 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17924
  
**[Test build #76920 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76920/testReport)**
 for PR 17924 at commit 
[`85ef731`](https://github.com/apache/spark/commit/85ef73134b7b7450e0689e138339433a30b92dea).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #17981: [SPARK-15767][ML][SparkR] Decision Tree wrapper in Spark...

2017-05-14 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17981
  
**[Test build #76927 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76927/testReport)**
 for PR 17981 at commit 
[`7e383a2`](https://github.com/apache/spark/commit/7e383a2f7e488c4277ee418454d1bbc69c8c8eb2).





[GitHub] spark issue #17933: [SPARK-20588][SQL] Cache TimeZone instances.

2017-05-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17933
  
Merged build finished. Test PASSed.





[GitHub] spark issue #17933: [SPARK-20588][SQL] Cache TimeZone instances.

2017-05-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17933
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76921/
Test PASSed.





[GitHub] spark issue #17933: [SPARK-20588][SQL] Cache TimeZone instances.

2017-05-14 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17933
  
**[Test build #76921 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76921/testReport)**
 for PR 17933 at commit 
[`7935a1a`](https://github.com/apache/spark/commit/7935a1a8d8336924e361559d7a708d73b8568e68).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #17981: [SPARK-15767][ML][SparkR] Decision Tree wrapper in Spark...

2017-05-14 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17981
  
**[Test build #76926 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76926/testReport)**
 for PR 17981 at commit 
[`68041a0`](https://github.com/apache/spark/commit/68041a0db7cd391fdff22bb52636fe140012fa44).
 * This patch **fails R style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #17981: [SPARK-15767][ML][SparkR] Decision Tree wrapper in Spark...

2017-05-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17981
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76926/
Test FAILed.





[GitHub] spark issue #17981: [SPARK-15767][ML][SparkR] Decision Tree wrapper in Spark...

2017-05-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17981
  
Merged build finished. Test FAILed.





[GitHub] spark issue #17981: [SPARK-15767][ML][SparkR] Decision Tree wrapper in Spark...

2017-05-14 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17981
  
**[Test build #76926 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76926/testReport)**
 for PR 17981 at commit 
[`68041a0`](https://github.com/apache/spark/commit/68041a0db7cd391fdff22bb52636fe140012fa44).





[GitHub] spark pull request #17981: [SPARK-15767][ML][SparkR] Decision Tree wrapper i...

2017-05-14 Thread zhengruifeng
GitHub user zhengruifeng opened a pull request:

https://github.com/apache/spark/pull/17981

[SPARK-15767][ML][SparkR] Decision Tree wrapper in SparkR

## What changes were proposed in this pull request?
support decision tree in R

## How was this patch tested?
added tests

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/zhengruifeng/spark dt_r

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/17981.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #17981


commit 7ea29392d17e0ec5ecbd1e9c1d09c7fdc04fee35
Author: Zheng RuiFeng 
Date:   2017-05-12T10:00:36Z

create pr

commit 68041a0db7cd391fdff22bb52636fe140012fa44
Author: Zheng RuiFeng 
Date:   2017-05-15T03:07:27Z

fix wrong call







[GitHub] spark issue #16989: [SPARK-19659] Fetch big blocks to disk when shuffle-read...

2017-05-14 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16989
  
**[Test build #76925 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76925/testReport)**
 for PR 16989 at commit 
[`80b3154`](https://github.com/apache/spark/commit/80b31545a1d6b6890e3cc0d549781ca15d7d46dc).





[GitHub] spark issue #17910: [SPARK-20669][ML] LogisticRegression family should be ca...

2017-05-14 Thread zhengruifeng
Github user zhengruifeng commented on the issue:

https://github.com/apache/spark/pull/17910
  
Ping @yanboliang 





[GitHub] spark issue #17980: [SPARK-20728][SQL] Make ORCFileFormat configurable betwe...

2017-05-14 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17980
  
**[Test build #76924 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76924/testReport)**
 for PR 17980 at commit 
[`7716234`](https://github.com/apache/spark/commit/77162342c66ee21f784b900d892a26739631c151).





[GitHub] spark pull request #17980: [SPARK-20728][SQL] Make ORCFileFormat configurabl...

2017-05-14 Thread dongjoon-hyun
GitHub user dongjoon-hyun opened a pull request:

https://github.com/apache/spark/pull/17980

[SPARK-20728][SQL] Make ORCFileFormat configurable between sql/hive and 
sql/core

## What changes were proposed in this pull request?

[SPARK-20682](https://issues.apache.org/jira/browse/SPARK-20682) is trying 
to improve Apache Spark to have a new ORCFileFormat based on Apache ORC for 
many reasons.

On top of that, this PR depends on SPARK-20682 and aims to provide a 
configuration to choose the default ORCFileFormat from legacy `sql/hive` module 
or new `sql/core` module.

For example, this configuration will affect the following operations.
```
spark.read.orc(...)
```

```
CREATE TABLE t
USING ORC
...
```

Since SPARK-20682 (#17924 and #17943) is still under review, I'm 
inevitably including the dependent code. I'll update this PR and the previous 
ones according to the review results. Also, in this PR, I updated 
`ParquetReadBenchmark` to help reviewers understand the current state 
of Apache Spark.

## How was this patch tested?

Pass the Jenkins with new test suites.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/dongjoon-hyun/spark SPARK-20728

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/17980.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #17980


commit 77162342c66ee21f784b900d892a26739631c151
Author: Dongjoon Hyun 
Date:   2017-05-15T02:33:15Z

[SPARK-20728][SQL] Make ORCFileFormat configurable between sql/hive and 
sql/core







[GitHub] spark pull request #17978: [SPARK-20736][Python] PySpark StringIndexer suppo...

2017-05-14 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/17978#discussion_r116400199
  
--- Diff: python/pyspark/ml/feature.py ---
@@ -2115,22 +2115,32 @@ class StringIndexer(JavaEstimator, HasInputCol, 
HasOutputCol, HasHandleInvalid,
 .. versionadded:: 1.4.0
 """
 
+stringOrderType = Param(Params._dummy(), "stringOrderType",
+"How to order labels of string column. The 
first label after " +
+"ordering is assigned an index of 0. Supported 
options: " +
+"frequencyDesc, frequencyAsc, alphabetDsec, 
alphabetAsc.",
--- End diff --

alphabetDsec -> alphabetDesc
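
To make the four option names concrete, here is a minimal plain-Scala sketch of the ordering semantics described in the param doc ("the first label after ordering is assigned an index of 0"). It is not Spark's implementation: `orderLabels` is a hypothetical helper, and breaking frequency ties alphabetically is an assumption of this sketch.

```scala
object StringOrderSketch {
  // Hypothetical helper (not Spark's implementation): returns the distinct
  // labels in index order, so the label at position 0 gets index 0.
  def orderLabels(values: Seq[String], orderType: String): Seq[String] = {
    // Count the occurrences of each distinct label.
    val counts: Seq[(String, Int)] =
      values.groupBy(identity).map { case (label, vs) => (label, vs.size) }.toSeq

    orderType match {
      // Frequency ties are broken alphabetically; that is an assumption here.
      case "frequencyDesc" => counts.sortBy { case (label, n) => (-n, label) }.map(_._1)
      case "frequencyAsc"  => counts.sortBy { case (label, n) => (n, label) }.map(_._1)
      case "alphabetDesc"  => counts.map(_._1).sorted.reverse
      case "alphabetAsc"   => counts.map(_._1).sorted
      case other =>
        throw new IllegalArgumentException(s"Unsupported stringOrderType: $other")
    }
  }

  def main(args: Array[String]): Unit = {
    val column = Seq("b", "a", "b", "c", "b", "a")
    println(orderLabels(column, "frequencyDesc").mkString(", ")) // b, a, c
    println(orderLabels(column, "alphabetDesc").mkString(", "))  // c, b, a
  }
}
```

With a misspelled option such as `alphabetDsec`, a user copying the name from the doc string would hit the unsupported-option branch, which is why the typo matters.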





[GitHub] spark issue #17978: [SPARK-20736][Python] PySpark StringIndexer supports Str...

2017-05-14 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/17978
  
The code changes look good, but we need to add a test for this.





[GitHub] spark issue #17848: [SPARK-20586] [SQL] Add deterministic and distinctLike t...

2017-05-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17848
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76919/
Test FAILed.





[GitHub] spark issue #17848: [SPARK-20586] [SQL] Add deterministic and distinctLike t...

2017-05-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17848
  
Merged build finished. Test FAILed.





[GitHub] spark issue #17848: [SPARK-20586] [SQL] Add deterministic and distinctLike t...

2017-05-14 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17848
  
**[Test build #76919 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76919/testReport)**
 for PR 17848 at commit 
[`387af4b`](https://github.com/apache/spark/commit/387af4b98b3b32a89904d05678eb58d76852160c).
 * This patch **fails PySpark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #17848: [SPARK-20586] [SQL] Add deterministic and distinctLike t...

2017-05-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17848
  
Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #17848: [SPARK-20586] [SQL] Add deterministic and distinctLike t...

2017-05-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17848
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76918/
Test FAILed.





[GitHub] spark issue #17848: [SPARK-20586] [SQL] Add deterministic and distinctLike t...

2017-05-14 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17848
  
**[Test build #76918 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76918/testReport)**
 for PR 17848 at commit 
[`c496b62`](https://github.com/apache/spark/commit/c496b6219e58fcd6d223eb2579087a76ce911310).
 * This patch **fails PySpark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #17933: [SPARK-20588][SQL] Cache TimeZone instances.

2017-05-14 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17933
  
**[Test build #76923 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76923/testReport)**
 for PR 17933 at commit 
[`3cdbb3a`](https://github.com/apache/spark/commit/3cdbb3acf12b2082056e8b4e2eb3f1645fa1bde7).





[GitHub] spark issue #17933: [SPARK-20588][SQL] Cache TimeZone instances.

2017-05-14 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/17933
  
LGTM





[GitHub] spark pull request #17933: [SPARK-20588][SQL] Cache TimeZone instances.

2017-05-14 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/17933#discussion_r116398935
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala
 ---
@@ -20,9 +20,12 @@ package org.apache.spark.sql.catalyst.util
 import java.sql.{Date, Timestamp}
 import java.text.{DateFormat, SimpleDateFormat}
 import java.util.{Calendar, Locale, TimeZone}
+import java.util.concurrent.ConcurrentHashMap
+import java.util.function.{Function => JFunction}
 import javax.xml.bind.DatatypeConverter
 
 import scala.annotation.tailrec
+import scala.collection.mutable
--- End diff --

Thanks, I'll remove it.





[GitHub] spark pull request #17933: [SPARK-20588][SQL] Cache TimeZone instances.

2017-05-14 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/17933#discussion_r116398915
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala
 ---
@@ -98,6 +101,15 @@ object DateTimeUtils {
 sdf
   }
 
+  private val computedTimeZones = new ConcurrentHashMap[String, TimeZone]
+  private val computeTimeZone = new JFunction[String, TimeZone] {
+override def apply(timeZoneId: String): TimeZone = 
TimeZone.getTimeZone(timeZoneId)
+  }
+
+  def getTimeZone(timeZoneId: String): TimeZone = {
+computedTimeZones.computeIfAbsent(timeZoneId, computeTimeZone)
--- End diff --

I believe Java 7 support was removed as of Spark 2.2.0.
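
For reference, a self-contained sketch of the caching pattern from the diff, runnable outside Spark. The wrapper object `TimeZoneCache` and the demo `main` are assumptions added for illustration; the field and method names follow the diff. `ConcurrentHashMap.computeIfAbsent` only exists on Java 8 and later, which is why the Java 7 question matters.

```scala
import java.util.TimeZone
import java.util.concurrent.ConcurrentHashMap
import java.util.function.{Function => JFunction}

// Hypothetical wrapper object; in the diff these members live in DateTimeUtils.
object TimeZoneCache {
  // Thread-safe cache keyed by time zone ID.
  private val computedTimeZones = new ConcurrentHashMap[String, TimeZone]

  // An explicit java.util.function.Function instance, as in the diff, so the
  // code does not depend on Scala converting a lambda to a Java interface.
  private val computeTimeZone = new JFunction[String, TimeZone] {
    override def apply(timeZoneId: String): TimeZone =
      TimeZone.getTimeZone(timeZoneId)
  }

  // computeIfAbsent requires Java 8 or later.
  def getTimeZone(timeZoneId: String): TimeZone =
    computedTimeZones.computeIfAbsent(timeZoneId, computeTimeZone)
}

object TimeZoneCacheDemo {
  def main(args: Array[String]): Unit = {
    val first = TimeZoneCache.getTimeZone("UTC")
    val second = TimeZoneCache.getTimeZone("UTC")
    // The second lookup returns the cached instance rather than a new object.
    assert(first eq second)
    assert(first.getID == "UTC")
  }
}
```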





[GitHub] spark issue #17941: [SPARK-20684][R] Expose createGlobalTempView and dropGlo...

2017-05-14 Thread falaki
Github user falaki commented on the issue:

https://github.com/apache/spark/pull/17941
  
@felixcheung we all know that the SparkR API (and R in general) is not perfect 
when it comes to ETL on unstructured data. For example, we don't have a great 
story for nested data. To overcome these limitations, many users ETL their data 
in Python or Scala and then analyze it in R.

With the introduction of sessions, that workflow is partially broken. You can 
still do it, but you need to persist the table. The global temp view solves 
that problem. It exists in PySpark, so I think it deserves to exist in SparkR 
as well.





[GitHub] spark issue #17848: [SPARK-20586] [SQL] Add deterministic and distinctLike t...

2017-05-14 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17848
  
**[Test build #76922 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76922/testReport)**
 for PR 17848 at commit 
[`d276b44`](https://github.com/apache/spark/commit/d276b44ce3f68344ae1151c930105fe291a925ec).





[GitHub] spark pull request #17933: [SPARK-20588][SQL] Cache TimeZone instances.

2017-05-14 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/17933#discussion_r116398563
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala
 ---
@@ -20,9 +20,12 @@ package org.apache.spark.sql.catalyst.util
 import java.sql.{Date, Timestamp}
 import java.text.{DateFormat, SimpleDateFormat}
 import java.util.{Calendar, Locale, TimeZone}
+import java.util.concurrent.ConcurrentHashMap
+import java.util.function.{Function => JFunction}
 import javax.xml.bind.DatatypeConverter
 
 import scala.annotation.tailrec
+import scala.collection.mutable
--- End diff --

We can remove this now.





[GitHub] spark pull request #17933: [SPARK-20588][SQL] Cache TimeZone instances.

2017-05-14 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/17933#discussion_r116398454
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala
 ---
@@ -98,6 +101,15 @@ object DateTimeUtils {
 sdf
   }
 
+  private val computedTimeZones = new ConcurrentHashMap[String, TimeZone]
+  private val computeTimeZone = new JFunction[String, TimeZone] {
+    override def apply(timeZoneId: String): TimeZone = TimeZone.getTimeZone(timeZoneId)
+  }
+
+  def getTimeZone(timeZoneId: String): TimeZone = {
+    computedTimeZones.computeIfAbsent(timeZoneId, computeTimeZone)
--- End diff --

Is Java 7 support completely removed?
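
For context, `ConcurrentHashMap.computeIfAbsent` and `java.util.function.Function` were added in Java 8. If Java 7 compatibility were still needed, the same cache could be sketched with `putIfAbsent`; the object name `TimeZoneCache` below is hypothetical and not part of the PR:

```scala
import java.util.TimeZone
import java.util.concurrent.ConcurrentHashMap

// Hypothetical helper, not from the PR: caches TimeZone instances
// without relying on Java 8's computeIfAbsent.
object TimeZoneCache {
  private val cache = new ConcurrentHashMap[String, TimeZone]

  def getTimeZone(id: String): TimeZone = {
    val cached = cache.get(id)
    if (cached != null) {
      cached
    } else {
      val created = TimeZone.getTimeZone(id)
      // putIfAbsent returns the existing value, or null if this thread won the race.
      val previous = cache.putIfAbsent(id, created)
      if (previous != null) previous else created
    }
  }
}
```

Two threads may both call `TimeZone.getTimeZone` for the same ID, but only one instance is kept. Since `TimeZone` is mutable, this sketch assumes callers never modify the shared instances.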





[GitHub] spark issue #17936: [SPARK-20638][Core]Optimize the CartesianRDD to reduce r...

2017-05-14 Thread ConeyLiu
Github user ConeyLiu commented on the issue:

https://github.com/apache/spark/pull/17936
  
Yeah, I can test it. `ALS` is a practical use case, so choosing it as the 
test case is more convincing. I also want to see how much this PR 
improves performance even after #17742 was merged.





[GitHub] spark pull request #17933: [SPARK-20588][SQL] Cache TimeZone instances.

2017-05-14 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/17933#discussion_r116398184
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala
 ---
@@ -98,6 +99,14 @@ object DateTimeUtils {
 sdf
   }
 
+  private val threadLocalTimeZones = new ThreadLocal[mutable.Map[String, TimeZone]] {
--- End diff --

Sounds good to me.





[GitHub] spark issue #17858: [SPARK-20594][SQL]The staging directory should be a chil...

2017-05-14 Thread zuotingbing
Github user zuotingbing commented on the issue:

https://github.com/apache/spark/pull/17858
  
Thank you all. Delete the branch.





[GitHub] spark issue #17979: [SPARK-19320][MESOS][WIP]allow specifying a hard limit o...

2017-05-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17979
  
Can one of the admins verify this patch?





[GitHub] spark pull request #17979: [SPARK-19320][MESOS][WIP]allow specifying a hard ...

2017-05-14 Thread yanji84
GitHub user yanji84 opened a pull request:

https://github.com/apache/spark/pull/17979

[SPARK-19320][MESOS][WIP]allow specifying a hard limit on number of gpus 
required in each spark executor when running on mesos

## What changes were proposed in this pull request?

Currently, Spark only allows specifying overall GPU resources as an upper 
limit. This PR adds a new conf parameter that sets a hard limit on the 
number of GPU cores for each executor while still respecting the overall GPU 
resource constraint.
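
One way to read the rule described above is sketched below. This is an illustration under assumptions, not the PR's implementation: `GpuOfferSketch`, `gpusToAcquire`, and all parameter names are hypothetical, and the sketch assumes an executor takes either exactly its per-executor GPU count from an offer or nothing.

```scala
// Hypothetical sketch, not the PR's code: decide how many GPUs to take
// from a single resource offer.
object GpuOfferSketch {
  // offered:     GPUs available in the offer
  // perExecutor: hard per-executor requirement (assumed semantics)
  // maxTotal:    overall upper limit across all executors
  // acquired:    GPUs already held by running executors
  // Returns the number of GPUs to take, or 0 to skip the offer.
  def gpusToAcquire(offered: Int, perExecutor: Int, maxTotal: Int, acquired: Int): Int = {
    val remaining = maxTotal - acquired
    if (offered >= perExecutor && remaining >= perExecutor) perExecutor else 0
  }
}
```

Under these assumptions, an offer with too few GPUs is skipped rather than partially used, and no executor is launched once the overall limit would be exceeded.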

## How was this patch tested?

Unit testing

Please review http://spark.apache.org/contributing.html before opening a 
pull request.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/yanji84/spark ji/set_allow_set_docker_user

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/17979.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #17979


commit 5f8ccd5789137363e035d1dfb9a05d3b9bf3ce6b
Author: Ji Yan 
Date:   2017-03-10T05:30:11Z

respect both gpu and maxgpu

commit 33ebff693d9b78a15221f931dbbca777cba944e0
Author: Ji Yan 
Date:   2017-03-10T05:43:21Z

Merge branch 'master' into ji/hard_limit_on_gpu

commit c2c1c5b66436a439e1d7342b7a2c58c502e26d6b
Author: Ji Yan 
Date:   2017-03-10T05:30:11Z

respect both gpu and maxgpu

commit c5c5c379fc27f579952700fdf2d15dae9eba104a
Author: Ji Yan 
Date:   2017-05-13T16:25:48Z

Merge branch 'ji/hard_limit_on_gpu' of https://github.com/yanji84/spark 
into ji/hard_limit_on_gpu

commit ba87b35817a7288b9b6aa41f4ac2244e235f2efd
Author: Ji Yan 
Date:   2017-05-13T16:53:59Z

fix syntax

commit 5ef2881a2b1e1180b73d532988bab72c5fdab64c
Author: Ji Yan 
Date:   2017-05-14T20:02:16Z

fix gpu offer

commit c301f3d1e05cc7359142a6cfb8222ad65cbdd9eb
Author: Ji Yan 
Date:   2017-05-14T20:15:55Z

syntax fix

commit 7a07742f4e004e0e88aa2b3bc5143adab3689644
Author: Ji Yan 
Date:   2017-05-15T00:30:50Z

pass all tests







[GitHub] spark issue #17936: [SPARK-20638][Core]Optimize the CartesianRDD to reduce r...

2017-05-14 Thread jtengyp
Github user jtengyp commented on the issue:

https://github.com/apache/spark/pull/17936
  
@ConeyLiu I think you should test the Cartesian phase directly with the 
following patch.

val user = model.userFeatures
val item = model.productFeatures
val start = System.nanoTime()
val rate = user.cartesian(item)
println(rate.count())
val time = (System.nanoTime() - start) / 1e9

The recommendForAll in mllib ALS was changed by the newly merged PR #17742, 
so your PR may not fit this case.
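
The measurement in the patch above could be wrapped in a small reusable helper. A minimal sketch, where `Timing.timed` is a hypothetical name rather than a Spark API:

```scala
// Hypothetical helper, not part of Spark: runs a block once and
// reports how long it took.
object Timing {
  // Returns the block's result together with the elapsed time in seconds.
  def timed[A](body: => A): (A, Double) = {
    val start = System.nanoTime()
    val result = body
    val seconds = (System.nanoTime() - start) / 1e9
    (result, seconds)
  }
}
```

With it, the patch would read roughly `val (n, secs) = Timing.timed(user.cartesian(item).count())`. The `count()` call matters because RDD transformations are lazy, so without an action nothing would be measured.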





[GitHub] spark pull request #17898: [SPARK-20638][Core]Optimize the CartesianRDD to r...

2017-05-14 Thread jtengyp
Github user jtengyp closed the pull request at:

https://github.com/apache/spark/pull/17898





[GitHub] spark issue #17933: [SPARK-20588][SQL] Cache TimeZone instances.

2017-05-14 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17933
  
**[Test build #76921 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76921/testReport)**
 for PR 17933 at commit 
[`7935a1a`](https://github.com/apache/spark/commit/7935a1a8d8336924e361559d7a708d73b8568e68).





[GitHub] spark pull request #17933: [SPARK-20588][SQL] Cache TimeZone instances per t...

2017-05-14 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/17933#discussion_r116396203
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala
 ---
@@ -98,6 +99,14 @@ object DateTimeUtils {
 sdf
   }
 
+  private val threadLocalTimeZones = new ThreadLocal[mutable.Map[String, TimeZone]] {
--- End diff --

That's a good point.
How about using `ConcurrentHashMap` instead?





[GitHub] spark issue #17924: [SPARK-20682][SQL] Support a new faster ORC data source ...

2017-05-14 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17924
  
**[Test build #76920 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76920/testReport)**
 for PR 17924 at commit 
[`85ef731`](https://github.com/apache/spark/commit/85ef73134b7b7450e0689e138339433a30b92dea).





[GitHub] spark issue #17924: [SPARK-20682][SQL] Support a new faster ORC data source ...

2017-05-14 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/17924
  
Retest this please.





[GitHub] spark issue #17978: [SPARK-20736][Python] PySpark StringIndexer supports Str...

2017-05-14 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/17978
  
(I am not used to ML. I just left a trivial comment for Python.)





[GitHub] spark pull request #17978: [SPARK-20736][Python] PySpark StringIndexer suppo...

2017-05-14 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/17978#discussion_r116395911
  
--- Diff: python/pyspark/ml/feature.py ---
@@ -2115,22 +2115,32 @@ class StringIndexer(JavaEstimator, HasInputCol, HasOutputCol, HasHandleInvalid,
     .. versionadded:: 1.4.0
     """
 
+    stringOrderType = Param(Params._dummy(), "stringOrderType",
+                            "How to order labels of string column. The first label after " +
+                            "ordering is assigned an index of 0. Supported options: " +
+                            "frequencyDesc, frequencyAsc, alphabetDsec, alphabetAsc.",
+                            typeConverter=TypeConverters.toString)
+
     @keyword_only
-    def __init__(self, inputCol=None, outputCol=None, handleInvalid="error"):
+    def __init__(self, inputCol=None, outputCol=None, handleInvalid="error",
+                 stringOrderType="frequencyDesc"):
         """
-        __init__(self, inputCol=None, outputCol=None, handleInvalid="error")
+        __init__(self, inputCol=None, outputCol=None, handleInvalid="error", \
--- End diff --

(Probably, the trailing `\` could be removed.)
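
For readers unfamiliar with the parameter, the ordering described in the `stringOrderType` doc string can be sketched in plain Scala. This is an illustration, not the StringIndexer implementation: `StringOrderSketch` is a hypothetical name, ties are broken alphabetically as an assumption, and the option spelled `alphabetDsec` in the diff is read here as `alphabetDesc`.

```scala
// Hypothetical sketch of the label ordering; not Spark's implementation.
object StringOrderSketch {
  // Maps each distinct label to its index under the given order type.
  // The first label after ordering gets index 0.
  def index(labels: Seq[String], orderType: String): Map[String, Int] = {
    val counts = labels.groupBy(identity).map { case (l, ls) => (l, ls.size) }.toSeq
    val ordered = orderType match {
      // Tie-breaking by label is an assumption made for determinism.
      case "frequencyDesc" => counts.sortBy { case (l, c) => (-c, l) }
      case "frequencyAsc" => counts.sortBy { case (l, c) => (c, l) }
      case "alphabetDesc" => counts.sortBy(_._1).reverse
      case "alphabetAsc" => counts.sortBy(_._1)
      case other => throw new IllegalArgumentException(s"Unsupported order type: $other")
    }
    ordered.map(_._1).zipWithIndex.toMap
  }
}
```

With the default `frequencyDesc`, the most frequent label receives index 0.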





[GitHub] spark pull request #17978: [SPARK-20736][Python] PySpark StringIndexer suppo...

2017-05-14 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/17978#discussion_r116395876
  
--- Diff: python/pyspark/ml/feature.py ---
@@ -2115,22 +2115,32 @@ class StringIndexer(JavaEstimator, HasInputCol, HasOutputCol, HasHandleInvalid,
     .. versionadded:: 1.4.0
     """
 
+    stringOrderType = Param(Params._dummy(), "stringOrderType",
+                            "How to order labels of string column. The first label after " +
+                            "ordering is assigned an index of 0. Supported options: " +
+                            "frequencyDesc, frequencyAsc, alphabetDsec, alphabetAsc.",
+                            typeConverter=TypeConverters.toString)
+
     @keyword_only
-    def __init__(self, inputCol=None, outputCol=None, handleInvalid="error"):
+    def __init__(self, inputCol=None, outputCol=None, handleInvalid="error",
+                 stringOrderType="frequencyDesc"):
         """
-        __init__(self, inputCol=None, outputCol=None, handleInvalid="error")
+        __init__(self, inputCol=None, outputCol=None, handleInvalid="error", \
--- End diff --

Probably, the trailing `\` could be removed.





[GitHub] spark issue #17848: [SPARK-20586] [SQL] Add deterministic and distinctLike t...

2017-05-14 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17848
  
**[Test build #76919 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76919/testReport)**
 for PR 17848 at commit 
[`387af4b`](https://github.com/apache/spark/commit/387af4b98b3b32a89904d05678eb58d76852160c).





[GitHub] spark issue #17848: [SPARK-20586] [SQL] Add deterministic and distinctLike t...

2017-05-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17848
  
Merged build finished. Test FAILed.





[GitHub] spark issue #17848: [SPARK-20586] [SQL] Add deterministic and distinctLike t...

2017-05-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17848
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76915/
Test FAILed.





[GitHub] spark issue #17848: [SPARK-20586] [SQL] Add deterministic and distinctLike t...

2017-05-14 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17848
  
**[Test build #76915 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76915/testReport)**
 for PR 17848 at commit 
[`00b4dff`](https://github.com/apache/spark/commit/00b4dff4e4b57f1406d99957655e2cb3bd85ad8e).
 * This patch **fails PySpark unit tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `throw new IOException(s\"UDF class $className doesn't 
implement any UDF interface\")`
  * `throw new IOException(s\"It is invalid to implement multiple 
UDF interfaces, UDF class $className\")`
  * `case n => logError(s\"UDF class with $n type arguments is 
not supported \")`
  * `logError(s\"Can not instantiate class $className, please 
make sure it has public non argument constructor\")`
  * `  case e: ClassNotFoundException => logError(s\"Can not load class 
$className, please make sure it is on the classpath\")`





[GitHub] spark pull request #17848: [SPARK-20586] [SQL] Add deterministic and distinc...

2017-05-14 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/17848#discussion_r116395129
  
--- Diff: 
sql/core/src/test/java/test/org/apache/spark/sql/JavaUDFSuite.java ---
@@ -104,5 +105,36 @@ public void udf4Test() {
       sum += result.getLong(0);
     }
     Assert.assertEquals(55, sum);
+    Assert.assertTrue("EXPLAIN outputs are expected to contain the UDF name.",
+        spark.sql("EXPLAIN SELECT inc(1) AS f").collectAsList().toString().contains("inc"));
--- End diff --

This is to fix the issue of name loss for JavaUDF. 





[GitHub] spark issue #17848: [SPARK-20586] [SQL] Add deterministic and distinctLike t...

2017-05-14 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17848
  
**[Test build #76918 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76918/testReport)**
 for PR 17848 at commit 
[`c496b62`](https://github.com/apache/spark/commit/c496b6219e58fcd6d223eb2579087a76ce911310).





[GitHub] spark issue #17978: [SPARK-20736][Python] PySpark StringIndexer supports Str...

2017-05-14 Thread actuaryzhang
Github user actuaryzhang commented on the issue:

https://github.com/apache/spark/pull/17978
  
@viirya @MLnick @BryanCutler @yinxusen @brkyvz @HyukjinKwon @srowen 
Ping for reviews or comments. Thanks much.





[GitHub] spark issue #17978: [SPARK-20736][Python] PySpark StringIndexer supports Str...

2017-05-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17978
  
Merged build finished. Test PASSed.





[GitHub] spark issue #17978: [SPARK-20736][Python] PySpark StringIndexer supports Str...

2017-05-14 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17978
  
**[Test build #76917 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76917/testReport)**
 for PR 17978 at commit 
[`1f336ab`](https://github.com/apache/spark/commit/1f336ab70719f4074f4ac69cc0bb4750723b0bd5).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #17978: [SPARK-20736][Python] PySpark StringIndexer supports Str...

2017-05-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17978
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76917/
Test PASSed.





[GitHub] spark issue #17978: [SPARK-20736][Python] PySpark StringIndexer supports Str...

2017-05-14 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17978
  
**[Test build #76917 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76917/testReport)**
 for PR 17978 at commit 
[`1f336ab`](https://github.com/apache/spark/commit/1f336ab70719f4074f4ac69cc0bb4750723b0bd5).





[GitHub] spark issue #17467: [SPARK-20140][DStream] Remove hardcoded kinesis retry wa...

2017-05-14 Thread yssharma
Github user yssharma commented on the issue:

https://github.com/apache/spark/pull/17467
  
@budde @brkyvz - Any feedback on this one, please?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #17967: [SPARK-14659][ML] RFormula consistent with R when handli...

2017-05-14 Thread actuaryzhang
Github user actuaryzhang commented on the issue:

https://github.com/apache/spark/pull/17967
  
@felixcheung Once this PR gets in, I'll update the SparkR side and include 
some tests. 





[GitHub] spark issue #17978: [SPARK-20736][Python] PySpark StringIndexer supports Str...

2017-05-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17978
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76916/
Test FAILed.





[GitHub] spark issue #17978: [SPARK-20736][Python] PySpark StringIndexer supports Str...

2017-05-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17978
  
Merged build finished. Test FAILed.





[GitHub] spark issue #17978: [SPARK-20736][Python] PySpark StringIndexer supports Str...

2017-05-14 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17978
  
**[Test build #76916 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76916/testReport)**
 for PR 17978 at commit 
[`bd80b37`](https://github.com/apache/spark/commit/bd80b37d9728624c6455ceca12198ce763b32a91).
 * This patch **fails Python style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request #17969: [SPARK-20729][SPARKR][ML] Reduce boilerplate in S...

2017-05-14 Thread zero323
Github user zero323 commented on a diff in the pull request:

https://github.com/apache/spark/pull/17969#discussion_r116393810
  
--- Diff: R/pkg/R/mllib_wrapper.R ---
@@ -0,0 +1,61 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+#' S4 class that represents a Java ML model
+#'
+#' @param jobj a Java object reference to the backing Scala model
+#' @export
+#' @note JavaModel since 2.3.0
+setClass("JavaModel", representation(jobj = "jobj"))
+
+#' Makes predictions from a Java ML model
+#'
+#' @param object a Spark ML model.
+#' @param newData a SparkDataFrame for testing.
+#' @return \code{predict} returns a SparkDataFrame containing predicted value.
+#' @rdname spark.predict
+#' @aliases predict,JavaModel-method
--- End diff --

I believe there is no conflict here. If you find this useful, you can use 
templates to include additional information about generic operations. A very 
simple example: 
https://github.com/zero323/spark/commit/64a3e854792181e159d39b9e747170b707f2711d

which would create a section like this:


![image](https://cloud.githubusercontent.com/assets/1554276/26038702/72b70280-390e-11e7-922c-0d1dece4816e.png)

This can be further parametrized if needed.





[GitHub] spark issue #17978: [SPARK-20736][Python] PySpark StringIndexer supports Str...

2017-05-14 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17978
  
**[Test build #76916 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76916/testReport)**
for PR 17978 at commit
[`bd80b37`](https://github.com/apache/spark/commit/bd80b37d9728624c6455ceca12198ce763b32a91).





[GitHub] spark issue #17848: [SPARK-20586] [SQL] Add deterministic and distinctLike t...

2017-05-14 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/17848
  
@zero323 
- When `x` is non-deterministic, all the expressions that are derived from 
`x` (i.e., `y_i`, `z_i`, `v_i`) will be non-deterministic as well. 
- When `x` is materialized (computed) first, the generated columns are 
deterministic, so the results will be consistent.

Not sure whether it answers your concern?
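
As a rough illustration of the two cases above (plain Python, not Spark code): the names `make_lazy_column` and `x_lazy`, and the `+ 1` derivation, are hypothetical stand-ins for a non-deterministic column `x` and the expressions derived from it. It is only a sketch of the reasoning, assuming "materialized" means the values of `x` are computed once and then reused.

```python
import random


def make_lazy_column(rng):
    """Return a function that recomputes the column x on every call."""
    return lambda: [rng.random() for _ in range(3)]


rng = random.Random(0)
x_lazy = make_lazy_column(rng)

# Case 1: each derived expression re-evaluates x, so it inherits
# x's non-determinism and two evaluations do not agree.
y = [v + 1 for v in x_lazy()]
z = [v + 1 for v in x_lazy()]
derived_agree_lazy = (y == z)

# Case 2: x is materialized once; expressions derived from the
# stored values are deterministic and therefore consistent.
x_materialized = x_lazy()
y_m = [v + 1 for v in x_materialized]
z_m = [v + 1 for v in x_materialized]
derived_agree_materialized = (y_m == z_m)

print(derived_agree_lazy, derived_agree_materialized)
```

In the first case the two derived lists differ because `x` is drawn again for each of them; in the second they match because both read the same stored values.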





[GitHub] spark issue #17848: [SPARK-20586] [SQL] Add deterministic and distinctLike t...

2017-05-14 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17848
  
**[Test build #76915 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76915/testReport)**
for PR 17848 at commit
[`00b4dff`](https://github.com/apache/spark/commit/00b4dff4e4b57f1406d99957655e2cb3bd85ad8e).




