[GitHub] spark pull request #17644: [SPARK-17729] [SQL] Enable creating hive bucketed...
Github user tejasapatil commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17644#discussion_r116414803

    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/InsertIntoHiveTable.scala ---
    @@ -307,6 +307,27 @@ case class InsertIntoHiveTable(
           }
         }

    +    table.bucketSpec match {
    +      case Some(bucketSpec) =>
    +        // Writes to bucketed hive tables are allowed only if user does not care about maintaining
    +        // table's bucketing ie. both "hive.enforce.bucketing" and "hive.enforce.sorting" are
    +        // set to false
    +        val enforceBucketingConfig = "hive.enforce.bucketing"
    +        val enforceSortingConfig = "hive.enforce.sorting"
    +
    +        val message = s"Output Hive table ${table.identifier} is bucketed but Spark" +
    +          "currently does NOT populate bucketed output which is compatible with Hive."
    +
    +        if (hadoopConf.get(enforceBucketingConfig, "true").toBoolean ||
    +          hadoopConf.get(enforceSortingConfig, "true").toBoolean) {
    +          throw new AnalysisException(message)
    +        } else {
    +          logWarning(message + s" Inserting data anyways since both $enforceBucketingConfig and " +
    +            s"$enforceSortingConfig are set to false.")
    --- End diff --

    In Hive: it would lead to wrong results. In Spark (on master and also after this PR): the table scan operation does not take bucketing into account, so the table would be read as a regular table. So it won't be read "wrong"; it's just that we won't take advantage of bucketing.

---
If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA.
---
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #17980: [SPARK-20728][SQL] Make ORCFileFormat configurabl...
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17980#discussion_r116414546

    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/sources/DDLSourceLoadSuite.scala ---
    @@ -55,10 +56,12 @@ class DDLSourceLoadSuite extends DataSourceTest with SharedSQLContext {
       }

       test("should fail to load ORC without Hive Support") {
    --- End diff --

    We can remove this test case when we remove `sql/hive` ORCFileFormat.
[GitHub] spark issue #16989: [SPARK-19659] Fetch big blocks to disk when shuffle-read...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16989 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76925/ Test PASSed.
[GitHub] spark issue #16989: [SPARK-19659] Fetch big blocks to disk when shuffle-read...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16989 Merged build finished. Test PASSed.
[GitHub] spark issue #16989: [SPARK-19659] Fetch big blocks to disk when shuffle-read...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16989 **[Test build #76925 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76925/testReport)** for PR 16989 at commit [`80b3154`](https://github.com/apache/spark/commit/80b31545a1d6b6890e3cc0d549781ca15d7d46dc).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark issue #17980: [SPARK-20728][SQL] Make ORCFileFormat configurable betwe...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17980 **[Test build #76931 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76931/testReport)** for PR 17980 at commit [`73d56f2`](https://github.com/apache/spark/commit/73d56f2f9e3cb91a93a555654a6f9e9933e9ef7a).
[GitHub] spark pull request #17644: [SPARK-17729] [SQL] Enable creating hive bucketed...
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17644#discussion_r116412797

    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/InsertIntoHiveTable.scala ---
    @@ -307,6 +307,27 @@ case class InsertIntoHiveTable(
           }
         }

    +    table.bucketSpec match {
    +      case Some(bucketSpec) =>
    +        // Writes to bucketed hive tables are allowed only if user does not care about maintaining
    +        // table's bucketing ie. both "hive.enforce.bucketing" and "hive.enforce.sorting" are
    +        // set to false
    +        val enforceBucketingConfig = "hive.enforce.bucketing"
    +        val enforceSortingConfig = "hive.enforce.sorting"
    +
    +        val message = s"Output Hive table ${table.identifier} is bucketed but Spark" +
    +          "currently does NOT populate bucketed output which is compatible with Hive."
    +
    +        if (hadoopConf.get(enforceBucketingConfig, "true").toBoolean ||
    +          hadoopConf.get(enforceSortingConfig, "true").toBoolean) {
    +          throw new AnalysisException(message)
    +        } else {
    +          logWarning(message + s" Inserting data anyways since both $enforceBucketingConfig and " +
    +            s"$enforceSortingConfig are set to false.")
    --- End diff --

    So after insertion (if not enforcing), the table is still a bucketed table, but reading it will produce wrong results?
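
The guard in the quoted diff boils down to a two-flag decision rule: the insert is rejected unless both `hive.enforce.bucketing` and `hive.enforce.sorting` are explicitly set to false, and each flag is treated as true when it is absent. The following is only a minimal Python sketch of that rule, not the PR's code: `AnalysisError`, `check_bucketed_insert`, and the plain `dict` standing in for the Hadoop configuration are all hypothetical names introduced for illustration.

```python
class AnalysisError(Exception):
    """Hypothetical stand-in for Spark's AnalysisException."""


def _flag(conf, key):
    # A missing key counts as "true", mirroring hadoopConf.get(key, "true").
    return conf.get(key, "true").strip().lower() == "true"


def check_bucketed_insert(conf, table_name):
    """Raise if either enforcement flag is on; otherwise return the warning text.

    `conf` is a plain dict used here in place of a Hadoop Configuration.
    """
    bucketing_key = "hive.enforce.bucketing"
    sorting_key = "hive.enforce.sorting"
    message = (
        f"Output Hive table {table_name} is bucketed but Spark "
        "currently does NOT populate bucketed output which is compatible with Hive."
    )
    if _flag(conf, bucketing_key) or _flag(conf, sorting_key):
        raise AnalysisError(message)
    # Both flags are off: the write proceeds, but the caller should log this.
    return (
        message
        + f" Inserting data anyways since both {bucketing_key} and "
        + f"{sorting_key} are set to false."
    )
```

Under this rule, an empty configuration rejects the insert, and so does a configuration that disables only one of the two flags.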
[GitHub] spark issue #17981: [SPARK-15767][ML][SparkR] Decision Tree wrapper in Spark...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17981 **[Test build #76930 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76930/testReport)** for PR 17981 at commit [`7e383a2`](https://github.com/apache/spark/commit/7e383a2f7e488c4277ee418454d1bbc69c8c8eb2).
[GitHub] spark issue #17981: [SPARK-15767][ML][SparkR] Decision Tree wrapper in Spark...
Github user zhengruifeng commented on the issue: https://github.com/apache/spark/pull/17981 Jenkins, retest this please
[GitHub] spark pull request #17978: [SPARK-20736][Python] PySpark StringIndexer suppo...
Github user actuaryzhang commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17978#discussion_r116411579

    --- Diff: python/pyspark/ml/feature.py ---
    @@ -2115,22 +2115,32 @@ class StringIndexer(JavaEstimator, HasInputCol, HasOutputCol, HasHandleInvalid,
         .. versionadded:: 1.4.0
         """

    +    stringOrderType = Param(Params._dummy(), "stringOrderType",
    +                            "How to order labels of string column. The first label after " +
    +                            "ordering is assigned an index of 0. Supported options: " +
    +                            "frequencyDesc, frequencyAsc, alphabetDsec, alphabetAsc.",
    +                            typeConverter=TypeConverters.toString)
    +
         @keyword_only
    -    def __init__(self, inputCol=None, outputCol=None, handleInvalid="error"):
    +    def __init__(self, inputCol=None, outputCol=None, handleInvalid="error",
    +                 stringOrderType="frequencyDesc"):
             """
    -        __init__(self, inputCol=None, outputCol=None, handleInvalid="error")
    +        __init__(self, inputCol=None, outputCol=None, handleInvalid="error", \
    --- End diff --

    @HyukjinKwon Thank you. Added tests.
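
The `stringOrderType` docstring in the diff above describes four ways to order labels, with the first label after ordering receiving index 0. A rough pure-Python sketch of that mapping follows; it does not use PySpark, and `index_labels` is a hypothetical helper. Two assumptions are made here: the option spelled `alphabetDsec` in the quoted docstring is taken to mean `alphabetDesc`, and ties in frequency are broken alphabetically, which the quoted text does not specify.

```python
from collections import Counter


def index_labels(values, string_order_type="frequencyDesc"):
    """Map each distinct label to an index according to the chosen ordering."""
    counts = Counter(values)
    if string_order_type == "frequencyDesc":
        # Most frequent first; alphabetical tie-break is an assumption.
        ordered = sorted(counts, key=lambda label: (-counts[label], label))
    elif string_order_type == "frequencyAsc":
        # Least frequent first; alphabetical tie-break is an assumption.
        ordered = sorted(counts, key=lambda label: (counts[label], label))
    elif string_order_type == "alphabetDesc":
        ordered = sorted(counts, reverse=True)
    elif string_order_type == "alphabetAsc":
        ordered = sorted(counts)
    else:
        raise ValueError(f"Unsupported stringOrderType: {string_order_type}")
    return {label: index for index, label in enumerate(ordered)}


labels = ["a", "b", "b", "c", "c", "c"]
print(index_labels(labels))                  # frequencyDesc is the default
print(index_labels(labels, "alphabetAsc"))
```

With the default `frequencyDesc`, the most common label gets index 0, which matches the default value `stringOrderType="frequencyDesc"` in the new `__init__` signature.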
[GitHub] spark issue #17978: [SPARK-20736][Python] PySpark StringIndexer supports Str...
Github user actuaryzhang commented on the issue: https://github.com/apache/spark/pull/17978 @viirya Thanks much for your review. I corrected the typo and added some tests.
[GitHub] spark issue #17980: [SPARK-20728][SQL] Make ORCFileFormat configurable betwe...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17980 Merged build finished. Test FAILed.
[GitHub] spark pull request #16598: [SPARK-19236][Core] Added createOrReplaceGlobalTe...
Github user arman1371 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16598#discussion_r116410932

    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
    @@ -2603,6 +2603,21 @@ class Dataset[T] private[sql](
       def createGlobalTempView(viewName: String): Unit = withPlan {
         createTempViewCommand(viewName, replace = false, global = true)
       }
    +
    +  /**
    +   * Creates or replaces a global temporary view using the given name. The lifetime of this
    +   * temporary view is tied to this Spark application.
    +   *
    +   * Global temporary view is cross-session. Its lifetime is the lifetime of the Spark application,
    +   * i.e. it will be automatically dropped when the application terminates. It's tied to a system
    +   * preserved database `_global_temp`, and we must use the qualified name to refer a global temp
    +   * view, e.g. `SELECT * FROM _global_temp.view1`.
    +   *
    +   * @group basic
    --- End diff --

    The createOrReplaceGlobalTempView method is not in the Java API. @rxin said it should be added since 2.1.1
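
The difference between the existing `createGlobalTempView` (called with `replace = false`) and the proposed create-or-replace variant can be illustrated with a toy registry. This is only a sketch of the semantics described in the quoted Scaladoc, not Spark's catalog: `GlobalTempViews` and its methods are hypothetical, and the database name `_global_temp` is copied from the diff as quoted.

```python
class GlobalTempViews:
    """Toy, application-scoped registry of global temporary views."""

    DATABASE = "_global_temp"  # name as quoted in the diff

    def __init__(self):
        self._views = {}

    def create(self, name, plan):
        # Mirrors replace = false: an existing name is an error.
        if name in self._views:
            raise ValueError(f"Global temp view '{name}' already exists")
        self._views[name] = plan

    def create_or_replace(self, name, plan):
        # Mirrors replace = true: silently overwrite any existing view.
        self._views[name] = plan

    def lookup(self, qualified_name):
        # Views are only reachable through the qualified name.
        database, _, name = qualified_name.partition(".")
        if database != self.DATABASE or name not in self._views:
            raise KeyError(qualified_name)
        return self._views[name]


views = GlobalTempViews()
views.create("view1", "plan A")
views.create_or_replace("view1", "plan B")
print(views.lookup("_global_temp.view1"))  # prints "plan B"
```

Because the registry lives for the whole application rather than one session, every session that queries `_global_temp.view1` sees the replaced definition.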
[GitHub] spark issue #17980: [SPARK-20728][SQL] Make ORCFileFormat configurable betwe...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17980 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76924/ Test FAILed.
[GitHub] spark issue #17980: [SPARK-20728][SQL] Make ORCFileFormat configurable betwe...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17980 **[Test build #76924 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76924/testReport)** for PR 17980 at commit [`7716234`](https://github.com/apache/spark/commit/77162342c66ee21f784b900d892a26739631c151).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark issue #17978: [SPARK-20736][Python] PySpark StringIndexer supports Str...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17978 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76929/ Test PASSed.
[GitHub] spark issue #17978: [SPARK-20736][Python] PySpark StringIndexer supports Str...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17978 Merged build finished. Test PASSed.
[GitHub] spark issue #17978: [SPARK-20736][Python] PySpark StringIndexer supports Str...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17978 **[Test build #76929 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76929/testReport)** for PR 17978 at commit [`f66a445`](https://github.com/apache/spark/commit/f66a4455aba7ffc69d1b397cb828879d84bb39a6).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark pull request #17973: [SPARK-20731][SQL] Add ability to change or omit ...
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17973#discussion_r116408851

    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVSuite.scala ---
    @@ -622,6 +622,31 @@ class CSVSuite extends QueryTest with SharedSQLContext with SQLTestUtils {
         }
       }

    +  test("save tsv with tsv suffix") {
    +    withTempDir { dir =>
    +      val csvDir = new File(dir, "csv").getCanonicalPath
    +      val cars = spark.read
    +        .format("csv")
    +        .option("header", "true")
    +        .load(testFile(carsFile))
    +
    +      cars.coalesce(1).write
    +        .option("header", "true")
    +        .option("fileExtension", ".tsv")
    +        .option("delimiter", "\t")
    --- End diff --

    Just curious: what is the reason you need to omit the extension?
[GitHub] spark issue #17933: [SPARK-20588][SQL] Cache TimeZone instances.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17933 Merged build finished. Test PASSed.
[GitHub] spark issue #17981: [SPARK-15767][ML][SparkR] Decision Tree wrapper in Spark...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17981 Merged build finished. Test FAILed.
[GitHub] spark pull request #17910: [SPARK-20669][ML] LogisticRegression family shoul...
Github user sethah commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17910#discussion_r116408136

    --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/LogisticRegressionSuite.scala ---
    @@ -2318,8 +2319,8 @@ class LogisticRegressionSuite
           assert(m1.interceptVector ~== m2.interceptVector absTol 0.05)
         }
         val testParams = Seq(
    -      ("binomial", smallBinaryDataset, 2),
    -      ("multinomial", smallMultinomialDataset, 3)
    +      ("Binomial", smallBinaryDataset, 2),
    --- End diff --

    The changes you made don't address this comment at all, and there are no tests for the suggestion from Yanbo either.
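
The test change from `"binomial"` to `"Binomial"` in the quoted diff suggests that the `family` parameter is meant to accept mixed-case values. One common way to get that behavior, sketched here in plain Python under that assumption, is to lower-case the value once at validation time and compare against a fixed set. `normalize_family` is a hypothetical helper, not part of Spark, and the set of families listed is an assumption based on the names in the diff plus `auto`.

```python
SUPPORTED_FAMILIES = ("auto", "binomial", "multinomial")


def normalize_family(family):
    """Validate a family name case-insensitively and return its canonical form."""
    lowered = family.lower()
    if lowered not in SUPPORTED_FAMILIES:
        raise ValueError(
            f"Unsupported family '{family}'; expected one of {SUPPORTED_FAMILIES}"
        )
    return lowered


for name in ("binomial", "Binomial", "MULTINOMIAL"):
    print(name, "->", normalize_family(name))
```

A test suite for this kind of change would normally cover both the mixed-case inputs and an invalid name, which is the sort of coverage the review comment above asks for.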
[GitHub] spark issue #17933: [SPARK-20588][SQL] Cache TimeZone instances.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17933 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76923/ Test PASSed.
[GitHub] spark issue #17981: [SPARK-15767][ML][SparkR] Decision Tree wrapper in Spark...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17981 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76927/ Test FAILed.
[GitHub] spark issue #17933: [SPARK-20588][SQL] Cache TimeZone instances.
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17933 **[Test build #76923 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76923/testReport)** for PR 17933 at commit [`3cdbb3a`](https://github.com/apache/spark/commit/3cdbb3acf12b2082056e8b4e2eb3f1645fa1bde7).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark issue #17981: [SPARK-15767][ML][SparkR] Decision Tree wrapper in Spark...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17981 **[Test build #76927 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76927/testReport)** for PR 17981 at commit [`7e383a2`](https://github.com/apache/spark/commit/7e383a2f7e488c4277ee418454d1bbc69c8c8eb2).
 * This patch **fails SparkR unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark issue #17978: [SPARK-20736][Python] PySpark StringIndexer supports Str...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17978 **[Test build #76929 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76929/testReport)** for PR 17978 at commit [`f66a445`](https://github.com/apache/spark/commit/f66a4455aba7ffc69d1b397cb828879d84bb39a6).
[GitHub] spark issue #17848: [SPARK-20586] [SQL] Add deterministic and distinctLike t...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17848 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76922/ Test PASSed.
[GitHub] spark issue #17848: [SPARK-20586] [SQL] Add deterministic and distinctLike t...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17848 Merged build finished. Test PASSed.
[GitHub] spark issue #17848: [SPARK-20586] [SQL] Add deterministic and distinctLike t...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17848 **[Test build #76922 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76922/testReport)** for PR 17848 at commit [`d276b44`](https://github.com/apache/spark/commit/d276b44ce3f68344ae1151c930105fe291a925ec).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark issue #17978: [SPARK-20736][Python] PySpark StringIndexer supports Str...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17978 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76928/ Test FAILed.
[GitHub] spark issue #17978: [SPARK-20736][Python] PySpark StringIndexer supports Str...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17978 **[Test build #76928 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76928/testReport)** for PR 17978 at commit [`44f0a36`](https://github.com/apache/spark/commit/44f0a362dd085022de215e9ab8d9536145f20d4d).
 * This patch **fails PySpark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark issue #17978: [SPARK-20736][Python] PySpark StringIndexer supports Str...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17978 Merged build finished. Test FAILed.
[GitHub] spark issue #17924: [SPARK-20682][SQL] Support a new faster ORC data source ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17924 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76920/ Test PASSed.
[GitHub] spark issue #17978: [SPARK-20736][Python] PySpark StringIndexer supports Str...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17978 **[Test build #76928 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76928/testReport)** for PR 17978 at commit [`44f0a36`](https://github.com/apache/spark/commit/44f0a362dd085022de215e9ab8d9536145f20d4d).
[GitHub] spark issue #17924: [SPARK-20682][SQL] Support a new faster ORC data source ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17924 Merged build finished. Test PASSed.
[GitHub] spark issue #17924: [SPARK-20682][SQL] Support a new faster ORC data source ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17924 **[Test build #76920 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76920/testReport)** for PR 17924 at commit [`85ef731`](https://github.com/apache/spark/commit/85ef73134b7b7450e0689e138339433a30b92dea). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #17981: [SPARK-15767][ML][SparkR] Decision Tree wrapper in Spark...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17981 **[Test build #76927 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76927/testReport)** for PR 17981 at commit [`7e383a2`](https://github.com/apache/spark/commit/7e383a2f7e488c4277ee418454d1bbc69c8c8eb2).
[GitHub] spark issue #17933: [SPARK-20588][SQL] Cache TimeZone instances.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17933 Merged build finished. Test PASSed.
[GitHub] spark issue #17933: [SPARK-20588][SQL] Cache TimeZone instances.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17933 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76921/ Test PASSed.
[GitHub] spark issue #17933: [SPARK-20588][SQL] Cache TimeZone instances.
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17933 **[Test build #76921 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76921/testReport)** for PR 17933 at commit [`7935a1a`](https://github.com/apache/spark/commit/7935a1a8d8336924e361559d7a708d73b8568e68). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #17981: [SPARK-15767][ML][SparkR] Decision Tree wrapper in Spark...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17981 **[Test build #76926 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76926/testReport)** for PR 17981 at commit [`68041a0`](https://github.com/apache/spark/commit/68041a0db7cd391fdff22bb52636fe140012fa44). * This patch **fails R style tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #17981: [SPARK-15767][ML][SparkR] Decision Tree wrapper in Spark...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17981 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76926/ Test FAILed.
[GitHub] spark issue #17981: [SPARK-15767][ML][SparkR] Decision Tree wrapper in Spark...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17981 Merged build finished. Test FAILed.
[GitHub] spark issue #17981: [SPARK-15767][ML][SparkR] Decision Tree wrapper in Spark...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17981 **[Test build #76926 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76926/testReport)** for PR 17981 at commit [`68041a0`](https://github.com/apache/spark/commit/68041a0db7cd391fdff22bb52636fe140012fa44).
[GitHub] spark pull request #17981: [SPARK-15767][ML][SparkR] Decision Tree wrapper i...
GitHub user zhengruifeng opened a pull request: https://github.com/apache/spark/pull/17981 [SPARK-15767][ML][SparkR] Decision Tree wrapper in SparkR ## What changes were proposed in this pull request? Support decision tree in R. ## How was this patch tested? Added tests. You can merge this pull request into a Git repository by running: $ git pull https://github.com/zhengruifeng/spark dt_r Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/17981.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #17981 commit 7ea29392d17e0ec5ecbd1e9c1d09c7fdc04fee35 Author: Zheng RuiFeng Date: 2017-05-12T10:00:36Z create pr commit 68041a0db7cd391fdff22bb52636fe140012fa44 Author: Zheng RuiFeng Date: 2017-05-15T03:07:27Z fix wrong call
[GitHub] spark issue #16989: [SPARK-19659] Fetch big blocks to disk when shuffle-read...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16989 **[Test build #76925 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76925/testReport)** for PR 16989 at commit [`80b3154`](https://github.com/apache/spark/commit/80b31545a1d6b6890e3cc0d549781ca15d7d46dc).
[GitHub] spark issue #17910: [SPARK-20669][ML] LogisticRegression family should be ca...
Github user zhengruifeng commented on the issue: https://github.com/apache/spark/pull/17910 Ping @yanboliang
[GitHub] spark issue #17980: [SPARK-20728][SQL] Make ORCFileFormat configurable betwe...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17980 **[Test build #76924 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76924/testReport)** for PR 17980 at commit [`7716234`](https://github.com/apache/spark/commit/77162342c66ee21f784b900d892a26739631c151).
[GitHub] spark pull request #17980: [SPARK-20728][SQL] Make ORCFileFormat configurabl...
GitHub user dongjoon-hyun opened a pull request: https://github.com/apache/spark/pull/17980 [SPARK-20728][SQL] Make ORCFileFormat configurable between sql/hive and sql/core ## What changes were proposed in this pull request? [SPARK-20682](https://issues.apache.org/jira/browse/SPARK-20682) is trying to improve Apache Spark by adding a new ORCFileFormat based on Apache ORC, for many reasons. This PR depends on SPARK-20682 and aims to provide a configuration for choosing the default ORCFileFormat: either the one in the legacy `sql/hive` module or the one in the new `sql/core` module. For example, this configuration affects the following operations. ``` spark.read.orc(...) ``` ``` CREATE TABLE t USING ORC ... ``` Since the SPARK-20682 PRs (#17924 and #17943) are still under review, I'm necessarily including the dependent code. I'll update this and the previous PR according to the review results. Also, in this PR, I updated `ParquetReadBenchmark` to help reviewers understand the current state of Apache Spark. ## How was this patch tested? Passes Jenkins with new test suites. You can merge this pull request into a Git repository by running: $ git pull https://github.com/dongjoon-hyun/spark SPARK-20728 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/17980.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #17980 commit 77162342c66ee21f784b900d892a26739631c151 Author: Dongjoon Hyun Date: 2017-05-15T02:33:15Z [SPARK-20728][SQL] Make ORCFileFormat configurable between sql/hive and sql/core
[GitHub] spark pull request #17978: [SPARK-20736][Python] PySpark StringIndexer suppo...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/17978#discussion_r116400199 --- Diff: python/pyspark/ml/feature.py --- @@ -2115,22 +2115,32 @@ class StringIndexer(JavaEstimator, HasInputCol, HasOutputCol, HasHandleInvalid, .. versionadded:: 1.4.0 """ +stringOrderType = Param(Params._dummy(), "stringOrderType", +"How to order labels of string column. The first label after " + +"ordering is assigned an index of 0. Supported options: " + +"frequencyDesc, frequencyAsc, alphabetDsec, alphabetAsc.", --- End diff -- alphabetDsec -> alphabetDesc
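To illustrate what the `stringOrderType` doc string in the diff above describes, here is a minimal plain-Scala sketch of the four orderings, with the first label after ordering getting index 0. It is not Spark's implementation: the object `LabelOrdering` and the method `orderLabels` are hypothetical names, and breaking frequency ties alphabetically is an assumption made only so the result is deterministic.

```scala
object LabelOrdering {
  // Hypothetical helper, not part of Spark: maps each distinct label to an
  // index according to the requested ordering. The first label after
  // ordering is assigned index 0.
  def orderLabels(values: Seq[String], orderType: String): Map[String, Int] = {
    // Count how often each label occurs.
    val counts: Seq[(String, Int)] =
      values.groupBy(identity).map { case (label, vs) => (label, vs.size) }.toSeq

    val ordered: Seq[String] = orderType match {
      // Assumption: ties in frequency are broken alphabetically.
      case "frequencyDesc" => counts.sortBy { case (label, n) => (-n, label) }.map(_._1)
      case "frequencyAsc"  => counts.sortBy { case (label, n) => (n, label) }.map(_._1)
      case "alphabetDesc"  => counts.map(_._1).sorted.reverse
      case "alphabetAsc"   => counts.map(_._1).sorted
      case other =>
        throw new IllegalArgumentException(s"Unsupported stringOrderType: $other")
    }
    ordered.zipWithIndex.toMap
  }
}
```

For the input `Seq("a", "b", "b", "c", "c", "c")`, `frequencyDesc` assigns index 0 to `c`, while `alphabetAsc` assigns index 0 to `a`.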
[GitHub] spark issue #17978: [SPARK-20736][Python] PySpark StringIndexer supports Str...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/17978 The code changes look good, but we need to add a test for this.
[GitHub] spark issue #17848: [SPARK-20586] [SQL] Add deterministic and distinctLike t...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17848 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76919/ Test FAILed.
[GitHub] spark issue #17848: [SPARK-20586] [SQL] Add deterministic and distinctLike t...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17848 Merged build finished. Test FAILed.
[GitHub] spark issue #17848: [SPARK-20586] [SQL] Add deterministic and distinctLike t...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17848 **[Test build #76919 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76919/testReport)** for PR 17848 at commit [`387af4b`](https://github.com/apache/spark/commit/387af4b98b3b32a89904d05678eb58d76852160c). * This patch **fails PySpark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #17848: [SPARK-20586] [SQL] Add deterministic and distinctLike t...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17848 Merged build finished. Test FAILed.
[GitHub] spark issue #17848: [SPARK-20586] [SQL] Add deterministic and distinctLike t...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17848 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76918/ Test FAILed.
[GitHub] spark issue #17848: [SPARK-20586] [SQL] Add deterministic and distinctLike t...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17848 **[Test build #76918 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76918/testReport)** for PR 17848 at commit [`c496b62`](https://github.com/apache/spark/commit/c496b6219e58fcd6d223eb2579087a76ce911310). * This patch **fails PySpark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #17933: [SPARK-20588][SQL] Cache TimeZone instances.
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17933 **[Test build #76923 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76923/testReport)** for PR 17933 at commit [`3cdbb3a`](https://github.com/apache/spark/commit/3cdbb3acf12b2082056e8b4e2eb3f1645fa1bde7).
[GitHub] spark issue #17933: [SPARK-20588][SQL] Cache TimeZone instances.
Github user viirya commented on the issue: https://github.com/apache/spark/pull/17933 LGTM
[GitHub] spark pull request #17933: [SPARK-20588][SQL] Cache TimeZone instances.
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/17933#discussion_r116398935 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala --- @@ -20,9 +20,12 @@ package org.apache.spark.sql.catalyst.util import java.sql.{Date, Timestamp} import java.text.{DateFormat, SimpleDateFormat} import java.util.{Calendar, Locale, TimeZone} +import java.util.concurrent.ConcurrentHashMap +import java.util.function.{Function => JFunction} import javax.xml.bind.DatatypeConverter import scala.annotation.tailrec +import scala.collection.mutable --- End diff -- Thanks, I'll remove it.
[GitHub] spark pull request #17933: [SPARK-20588][SQL] Cache TimeZone instances.
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/17933#discussion_r116398915 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala --- @@ -98,6 +101,15 @@ object DateTimeUtils { sdf } + private val computedTimeZones = new ConcurrentHashMap[String, TimeZone] + private val computeTimeZone = new JFunction[String, TimeZone] { +override def apply(timeZoneId: String): TimeZone = TimeZone.getTimeZone(timeZoneId) + } + + def getTimeZone(timeZoneId: String): TimeZone = { +computedTimeZones.computeIfAbsent(timeZoneId, computeTimeZone) --- End diff -- I believe Java 7 support was removed as of Spark 2.2.0.
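For readers following the Java 7 question: `ConcurrentHashMap.computeIfAbsent` and `java.util.function.Function` were both introduced in Java 8, which is why the diff above cannot compile against Java 7. Below is a minimal standalone sketch of the same caching pattern, assuming Java 8 or later. The wrapper object name `TimeZoneCache` is hypothetical; the members mirror the names in the diff rather than the final merged code.

```scala
import java.util.TimeZone
import java.util.concurrent.ConcurrentHashMap
import java.util.function.{Function => JFunction}

// Hypothetical standalone wrapper; in the diff these members live in DateTimeUtils.
object TimeZoneCache {
  // Thread-safe cache from time zone ID to the resolved TimeZone instance.
  private val computedTimeZones = new ConcurrentHashMap[String, TimeZone]

  // An explicit java.util.function.Function instance, as in the diff, so the
  // code does not rely on Scala-lambda-to-Java-interface conversion.
  private val computeTimeZone = new JFunction[String, TimeZone] {
    override def apply(timeZoneId: String): TimeZone = TimeZone.getTimeZone(timeZoneId)
  }

  // Resolves the ID on first use and returns the cached instance afterwards.
  def getTimeZone(timeZoneId: String): TimeZone =
    computedTimeZones.computeIfAbsent(timeZoneId, computeTimeZone)
}
```

Repeated calls such as `TimeZoneCache.getTimeZone("UTC")` return the same cached object. Since `TimeZone` is mutable, this sketch assumes callers treat the shared instance as read-only.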
[GitHub] spark issue #17941: [SPARK-20684][R] Expose createGlobalTempView and dropGlo...
Github user falaki commented on the issue: https://github.com/apache/spark/pull/17941 @felixcheung we all know that the SparkR (and, in general, R) API is not perfect when it comes to ETLing unstructured data. For example, we don't have a great story for nested data. To overcome these limitations, many users ETL their data in Python or Scala and then analyze it in R. With the introduction of sessions, that workflow is partially broken. You can still do it, but you need to persist the table. The global temp view solves that problem. It exists in PySpark, so I think it deserves to exist in SparkR as well.
[GitHub] spark issue #17848: [SPARK-20586] [SQL] Add deterministic and distinctLike t...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17848 **[Test build #76922 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76922/testReport)** for PR 17848 at commit [`d276b44`](https://github.com/apache/spark/commit/d276b44ce3f68344ae1151c930105fe291a925ec).
[GitHub] spark pull request #17933: [SPARK-20588][SQL] Cache TimeZone instances.
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/17933#discussion_r116398563 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala --- @@ -20,9 +20,12 @@ package org.apache.spark.sql.catalyst.util import java.sql.{Date, Timestamp} import java.text.{DateFormat, SimpleDateFormat} import java.util.{Calendar, Locale, TimeZone} +import java.util.concurrent.ConcurrentHashMap +import java.util.function.{Function => JFunction} import javax.xml.bind.DatatypeConverter import scala.annotation.tailrec +import scala.collection.mutable --- End diff -- We can remove this now.
[GitHub] spark pull request #17933: [SPARK-20588][SQL] Cache TimeZone instances.
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/17933#discussion_r116398454 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala --- @@ -98,6 +101,15 @@ object DateTimeUtils { sdf } + private val computedTimeZones = new ConcurrentHashMap[String, TimeZone] + private val computeTimeZone = new JFunction[String, TimeZone] { +override def apply(timeZoneId: String): TimeZone = TimeZone.getTimeZone(timeZoneId) + } + + def getTimeZone(timeZoneId: String): TimeZone = { +computedTimeZones.computeIfAbsent(timeZoneId, computeTimeZone) --- End diff -- Is Java 7 support completely removed?
[GitHub] spark issue #17936: [SPARK-20638][Core]Optimize the CartesianRDD to reduce r...
Github user ConeyLiu commented on the issue: https://github.com/apache/spark/pull/17936 Yeah, I can test it. `ALS` is a practical use case, so choosing it as a test case is more convincing. I also want to see how much this PR improves things even after #17742 is merged.
[GitHub] spark pull request #17933: [SPARK-20588][SQL] Cache TimeZone instances.
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/17933#discussion_r116398184 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala --- @@ -98,6 +99,14 @@ object DateTimeUtils { sdf } + private val threadLocalTimeZones = new ThreadLocal[mutable.Map[String, TimeZone]] { --- End diff -- Sounds good to me.
[GitHub] spark issue #17858: [SPARK-20594][SQL]The staging directory should be a chil...
Github user zuotingbing commented on the issue: https://github.com/apache/spark/pull/17858 Thank you all. Delete the branch.
[GitHub] spark issue #17979: [SPARK-19320][MESOS][WIP]allow specifying a hard limit o...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17979 Can one of the admins verify this patch?
[GitHub] spark pull request #17979: [SPARK-19320][MESOS][WIP]allow specifying a hard ...
GitHub user yanji84 opened a pull request: https://github.com/apache/spark/pull/17979

[SPARK-19320][MESOS][WIP]allow specifying a hard limit on number of gpus required in each spark executor when running on mesos

## What changes were proposed in this pull request?

Currently, Spark only allows specifying overall GPU resources as an upper limit. This adds a new conf parameter that allows specifying a hard limit on the number of GPU cores for each executor while still respecting the overall GPU resource constraint.

## How was this patch tested?

Unit testing

Please review http://spark.apache.org/contributing.html before opening a pull request.

You can merge this pull request into a Git repository by running: $ git pull https://github.com/yanji84/spark ji/set_allow_set_docker_user

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/17979.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #17979

commit 5f8ccd5789137363e035d1dfb9a05d3b9bf3ce6b Author: Ji Yan Date: 2017-03-10T05:30:11Z respect both gpu and maxgpu
commit 33ebff693d9b78a15221f931dbbca777cba944e0 Author: Ji Yan Date: 2017-03-10T05:43:21Z Merge branch 'master' into ji/hard_limit_on_gpu
commit c2c1c5b66436a439e1d7342b7a2c58c502e26d6b Author: Ji Yan Date: 2017-03-10T05:30:11Z respect both gpu and maxgpu
commit c5c5c379fc27f579952700fdf2d15dae9eba104a Author: Ji Yan Date: 2017-05-13T16:25:48Z Merge branch 'ji/hard_limit_on_gpu' of https://github.com/yanji84/spark into ji/hard_limit_on_gpu
commit ba87b35817a7288b9b6aa41f4ac2244e235f2efd Author: Ji Yan Date: 2017-05-13T16:53:59Z fix syntax
commit 5ef2881a2b1e1180b73d532988bab72c5fdab64c Author: Ji Yan Date: 2017-05-14T20:02:16Z fix gpu offer
commit c301f3d1e05cc7359142a6cfb8222ad65cbdd9eb Author: Ji Yan Date: 2017-05-14T20:15:55Z syntax fix
commit 7a07742f4e004e0e88aa2b3bc5143adab3689644 Author: Ji Yan Date: 2017-05-15T00:30:50Z pass all tests
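The pull request description above talks about a per-executor hard limit that still respects the overall GPU cap. One possible reading of that rule can be sketched in Python as follows. This is only an illustration: the function and parameter names are hypothetical, they are not the conf keys or the Mesos scheduler code from the PR, and the actual patch may resolve offers differently.

```python
def gpus_for_executor(offered_gpus, gpus_per_executor, max_gpus, gpus_acquired):
    """Return how many GPUs to take from one offer for one new executor.

    Hypothetical names (assumptions, not taken from the PR):
      offered_gpus      - GPUs available in the resource offer
      gpus_per_executor - hard per-executor requirement
      max_gpus          - overall cap across all executors
      gpus_acquired     - GPUs already held by running executors
    """
    if gpus_per_executor <= 0:
        return 0  # nothing requested per executor
    if offered_gpus < gpus_per_executor:
        return 0  # offer cannot satisfy the hard per-executor requirement
    if max_gpus - gpus_acquired < gpus_per_executor:
        return 0  # launching would exceed the overall cap
    return gpus_per_executor  # take exactly the per-executor amount
```

Under this reading an offer is either large enough to launch an executor with exactly the requested number of GPUs, or it contributes no GPUs at all.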
[GitHub] spark issue #17936: [SPARK-20638][Core]Optimize the CartesianRDD to reduce r...
Github user jtengyp commented on the issue: https://github.com/apache/spark/pull/17936 @ConeyLiu, I think you should test the Cartesian phase directly with the following patch:

    val user = model.userFeatures
    val item = model.productFeatures
    val start = System.nanoTime()
    val rate = user.cartesian(item)
    println(rate.count())
    val time = (System.nanoTime() - start) / 1e9

The recommendForAll in mllib ALS has been changed by a new PR, #17742, which has been merged. Your PR may not fit this case.
[GitHub] spark pull request #17898: [SPARK-20638][Core]Optimize the CartesianRDD to r...
Github user jtengyp closed the pull request at: https://github.com/apache/spark/pull/17898
[GitHub] spark issue #17933: [SPARK-20588][SQL] Cache TimeZone instances.
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17933 **[Test build #76921 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76921/testReport)** for PR 17933 at commit [`7935a1a`](https://github.com/apache/spark/commit/7935a1a8d8336924e361559d7a708d73b8568e68).
[GitHub] spark pull request #17933: [SPARK-20588][SQL] Cache TimeZone instances per t...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/17933#discussion_r116396203 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala --- @@ -98,6 +99,14 @@ object DateTimeUtils { sdf } + private val threadLocalTimeZones = new ThreadLocal[mutable.Map[String, TimeZone]] { --- End diff -- That's a good point. How about using `ConcurrentHashMap` instead?
[GitHub] spark issue #17924: [SPARK-20682][SQL] Support a new faster ORC data source ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17924 **[Test build #76920 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76920/testReport)** for PR 17924 at commit [`85ef731`](https://github.com/apache/spark/commit/85ef73134b7b7450e0689e138339433a30b92dea).
[GitHub] spark issue #17924: [SPARK-20682][SQL] Support a new faster ORC data source ...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/17924 Retest this please.
[GitHub] spark issue #17978: [SPARK-20736][Python] PySpark StringIndexer supports Str...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/17978 (I am not used to ML. I just left a trivial comment for Python.)
[GitHub] spark pull request #17978: [SPARK-20736][Python] PySpark StringIndexer suppo...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/17978#discussion_r116395911 --- Diff: python/pyspark/ml/feature.py --- @@ -2115,22 +2115,32 @@ class StringIndexer(JavaEstimator, HasInputCol, HasOutputCol, HasHandleInvalid, .. versionadded:: 1.4.0 """ +stringOrderType = Param(Params._dummy(), "stringOrderType", +"How to order labels of string column. The first label after " + +"ordering is assigned an index of 0. Supported options: " + +"frequencyDesc, frequencyAsc, alphabetDsec, alphabetAsc.", +typeConverter=TypeConverters.toString) + @keyword_only -def __init__(self, inputCol=None, outputCol=None, handleInvalid="error"): +def __init__(self, inputCol=None, outputCol=None, handleInvalid="error", + stringOrderType="frequencyDesc"): """ -__init__(self, inputCol=None, outputCol=None, handleInvalid="error") +__init__(self, inputCol=None, outputCol=None, handleInvalid="error", \ --- End diff -- (Probably, the leading `\` could be removed.)
[GitHub] spark pull request #17978: [SPARK-20736][Python] PySpark StringIndexer suppo...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/17978#discussion_r116395876 --- Diff: python/pyspark/ml/feature.py --- @@ -2115,22 +2115,32 @@ class StringIndexer(JavaEstimator, HasInputCol, HasOutputCol, HasHandleInvalid, .. versionadded:: 1.4.0 """ +stringOrderType = Param(Params._dummy(), "stringOrderType", +"How to order labels of string column. The first label after " + +"ordering is assigned an index of 0. Supported options: " + +"frequencyDesc, frequencyAsc, alphabetDsec, alphabetAsc.", +typeConverter=TypeConverters.toString) + @keyword_only -def __init__(self, inputCol=None, outputCol=None, handleInvalid="error"): +def __init__(self, inputCol=None, outputCol=None, handleInvalid="error", + stringOrderType="frequencyDesc"): """ -__init__(self, inputCol=None, outputCol=None, handleInvalid="error") +__init__(self, inputCol=None, outputCol=None, handleInvalid="error", \ --- End diff -- Probably, the leading `\` could be removed.
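The `stringOrderType` docstring quoted in the diff above says that the first label after ordering gets index 0. A minimal pure-Python sketch of what the four orderings could mean is shown here. It does not call Spark; the helper name `label_indices` is hypothetical, the alphabetical tie-break for equal frequencies is an assumption, and the option is spelled `alphabetDesc` here although the quoted diff text reads `alphabetDsec`.

```python
from collections import Counter


def label_indices(values, string_order_type="frequencyDesc"):
    """Map each distinct label to an index; the first label after ordering gets 0."""
    counts = Counter(values)
    if string_order_type == "frequencyDesc":
        # Most frequent first; ties broken alphabetically (assumption).
        ordered = sorted(counts, key=lambda label: (-counts[label], label))
    elif string_order_type == "frequencyAsc":
        # Least frequent first; ties broken alphabetically (assumption).
        ordered = sorted(counts, key=lambda label: (counts[label], label))
    elif string_order_type == "alphabetDesc":
        ordered = sorted(counts, reverse=True)
    elif string_order_type == "alphabetAsc":
        ordered = sorted(counts)
    else:
        raise ValueError("unsupported stringOrderType: %s" % string_order_type)
    return {label: index for index, label in enumerate(ordered)}
```

For the column `["a", "b", "b", "c", "c", "c"]`, `frequencyDesc` assigns index 0 to `"c"`, while `alphabetAsc` assigns index 0 to `"a"`.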
[GitHub] spark issue #17848: [SPARK-20586] [SQL] Add deterministic and distinctLike t...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17848 **[Test build #76919 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76919/testReport)** for PR 17848 at commit [`387af4b`](https://github.com/apache/spark/commit/387af4b98b3b32a89904d05678eb58d76852160c).
[GitHub] spark issue #17848: [SPARK-20586] [SQL] Add deterministic and distinctLike t...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17848 Merged build finished. Test FAILed.
[GitHub] spark issue #17848: [SPARK-20586] [SQL] Add deterministic and distinctLike t...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17848 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76915/ Test FAILed.
[GitHub] spark issue #17848: [SPARK-20586] [SQL] Add deterministic and distinctLike t...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17848 **[Test build #76915 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76915/testReport)** for PR 17848 at commit [`00b4dff`](https://github.com/apache/spark/commit/00b4dff4e4b57f1406d99957655e2cb3bd85ad8e). * This patch **fails PySpark unit tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `throw new IOException(s\"UDF class $className doesn't implement any UDF interface\")` * `throw new IOException(s\"It is invalid to implement multiple UDF interfaces, UDF class $className\")` * `case n => logError(s\"UDF class with $n type arguments is not supported \")` * `logError(s\"Can not instantiate class $className, please make sure it has public non argument constructor\")` * ` case e: ClassNotFoundException => logError(s\"Can not load class $className, please make sure it is on the classpath\")`
[GitHub] spark pull request #17848: [SPARK-20586] [SQL] Add deterministic and distinc...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/17848#discussion_r116395129 --- Diff: sql/core/src/test/java/test/org/apache/spark/sql/JavaUDFSuite.java --- @@ -104,5 +105,36 @@ public void udf4Test() { sum += result.getLong(0); } Assert.assertEquals(55, sum); +Assert.assertTrue("EXPLAIN outputs are expected to contain the UDF name.", +spark.sql("EXPLAIN SELECT inc(1) AS f").collectAsList().toString().contains("inc")); --- End diff -- This is to fix the issue of name loss for JavaUDF.
[GitHub] spark issue #17848: [SPARK-20586] [SQL] Add deterministic and distinctLike t...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17848 **[Test build #76918 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76918/testReport)** for PR 17848 at commit [`c496b62`](https://github.com/apache/spark/commit/c496b6219e58fcd6d223eb2579087a76ce911310).
[GitHub] spark issue #17978: [SPARK-20736][Python] PySpark StringIndexer supports Str...
Github user actuaryzhang commented on the issue: https://github.com/apache/spark/pull/17978 @viirya @MLnick @BryanCutler @yinxusen @brkyvz @HyukjinKwon @srowen Ping for reviews or comments. Thanks much.
[GitHub] spark issue #17978: [SPARK-20736][Python] PySpark StringIndexer supports Str...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17978 Merged build finished. Test PASSed.
[GitHub] spark issue #17978: [SPARK-20736][Python] PySpark StringIndexer supports Str...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17978 **[Test build #76917 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76917/testReport)** for PR 17978 at commit [`1f336ab`](https://github.com/apache/spark/commit/1f336ab70719f4074f4ac69cc0bb4750723b0bd5). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #17978: [SPARK-20736][Python] PySpark StringIndexer supports Str...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17978 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76917/ Test PASSed.
[GitHub] spark issue #17978: [SPARK-20736][Python] PySpark StringIndexer supports Str...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17978 **[Test build #76917 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76917/testReport)** for PR 17978 at commit [`1f336ab`](https://github.com/apache/spark/commit/1f336ab70719f4074f4ac69cc0bb4750723b0bd5).
[GitHub] spark issue #17467: [SPARK-20140][DStream] Remove hardcoded kinesis retry wa...
Github user yssharma commented on the issue: https://github.com/apache/spark/pull/17467 @budde @brkyvz - Any feedback on this one, please?
[GitHub] spark issue #17967: [SPARK-14659][ML] RFormula consistent with R when handli...
Github user actuaryzhang commented on the issue: https://github.com/apache/spark/pull/17967 @felixcheung Once this PR gets in, I'll update the SparkR side and include some tests.
[GitHub] spark issue #17978: [SPARK-20736][Python] PySpark StringIndexer supports Str...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17978 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76916/ Test FAILed.
[GitHub] spark issue #17978: [SPARK-20736][Python] PySpark StringIndexer supports Str...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17978 Merged build finished. Test FAILed.
[GitHub] spark issue #17978: [SPARK-20736][Python] PySpark StringIndexer supports Str...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17978 **[Test build #76916 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76916/testReport)** for PR 17978 at commit [`bd80b37`](https://github.com/apache/spark/commit/bd80b37d9728624c6455ceca12198ce763b32a91). * This patch **fails Python style tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #17969: [SPARK-20729][SPARKR][ML] Reduce boilerplate in S...
Github user zero323 commented on a diff in the pull request: https://github.com/apache/spark/pull/17969#discussion_r116393810 --- Diff: R/pkg/R/mllib_wrapper.R --- @@ -0,0 +1,61 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +#http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +#' S4 class that represents a Java ML model +#' +#' @param jobj a Java object reference to the backing Scala model +#' @export +#' @note JavaModel since 2.3.0 +setClass("JavaModel", representation(jobj = "jobj")) + +#' Makes predictions from a Java ML model +#' +#' @param object a Spark ML model. +#' @param newData a SparkDataFrame for testing. +#' @return \code{predict} returns a SparkDataFrame containing predicted value. +#' @rdname spark.predict +#' @aliases predict,JavaModel-method --- End diff -- I believe there is no conflict here. If you find this useful you can use templates to include additional information about generic operations. Very simple example https://github.com/zero323/spark/commit/64a3e854792181e159d39b9e747170b707f2711d which would create section like this: ![image](https://cloud.githubusercontent.com/assets/1554276/26038702/72b70280-390e-11e7-922c-0d1dece4816e.png) This can be further parametrized if needed. 
[GitHub] spark issue #17978: [SPARK-20736][Python] PySpark StringIndexer supports Str...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17978 **[Test build #76916 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76916/testReport)** for PR 17978 at commit [`bd80b37`](https://github.com/apache/spark/commit/bd80b37d9728624c6455ceca12198ce763b32a91).
[GitHub] spark issue #17848: [SPARK-20586] [SQL] Add deterministic and distinctLike t...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/17848 @zero323 - When `x` is non-deterministic, all the expressions that are derived from `x` (i.e., `y_i`, `z_i`, `v_i`) will be non-deterministic. - When `x` is materialized and computed first, the generated columns are deterministic, so the results will be consistent. Does that answer your concern?
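The propagation rule described in the comment above (anything derived from a non-deterministic `x` is itself non-deterministic, while a materialized `x` behaves as a plain deterministic column) can be illustrated with a small Python sketch. The `Expr` class and its field names are hypothetical and are not Catalyst's expression API; the sketch only assumes that an expression is deterministic when it and all of its children are.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Expr:
    """Toy expression node (hypothetical, not Spark's Expression class)."""
    name: str
    children: Tuple["Expr", ...] = ()
    self_deterministic: bool = True  # False for e.g. a random-valued UDF

    @property
    def deterministic(self) -> bool:
        # Deterministic only if this node and every child are deterministic.
        return self.self_deterministic and all(
            child.deterministic for child in self.children
        )


# x is non-deterministic, so everything derived from it is too.
x = Expr("x", self_deterministic=False)
y = Expr("y", children=(x,))
z = Expr("z", children=(y,))

# Once x has been materialized, it acts as an ordinary column reference.
x_materialized = Expr("x_materialized")
y_from_materialized = Expr("y", children=(x_materialized,))
```

Here `z.deterministic` is `False` because `x` sits at the root of its lineage, whereas `y_from_materialized.deterministic` is `True`.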
[GitHub] spark issue #17848: [SPARK-20586] [SQL] Add deterministic and distinctLike t...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17848 **[Test build #76915 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76915/testReport)** for PR 17848 at commit [`00b4dff`](https://github.com/apache/spark/commit/00b4dff4e4b57f1406d99957655e2cb3bd85ad8e).