[GitHub] spark pull request #14203: [SPARK-16546][SQL][PySpark] update python datafra...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/14203 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #14158: [SPARK-13547] [SQL] [WEBUI] Add SQL query in web UI's SQ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14158 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62364/
[GitHub] spark issue #14158: [SPARK-13547] [SQL] [WEBUI] Add SQL query in web UI's SQ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14158 Merged build finished. Test PASSed.
[GitHub] spark issue #14158: [SPARK-13547] [SQL] [WEBUI] Add SQL query in web UI's SQ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14158 **[Test build #62364 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62364/consoleFull)** for PR 14158 at commit [`41c2daa`](https://github.com/apache/spark/commit/41c2daa19a4b4dc340f6345e2624fd269565638b). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #14203: [SPARK-16546][SQL][PySpark] update python dataframe.drop
Github user rxin commented on the issue: https://github.com/apache/spark/pull/14203 Thanks - merging in master.
[GitHub] spark issue #14215: [SPARK-16544][SQL][WIP] Support for conversion from comp...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14215 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62365/
[GitHub] spark issue #14215: [SPARK-16544][SQL][WIP] Support for conversion from comp...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14215 Merged build finished. Test PASSed.
[GitHub] spark issue #14215: [SPARK-16544][SQL][WIP] Support for conversion from comp...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14215 **[Test build #62365 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62365/consoleFull)** for PR 14215 at commit [`b45f2ea`](https://github.com/apache/spark/commit/b45f2eae8417d9fdf1ecb8de7dd0a43a3d4c0fa8). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #14169: [SPARK-16515][SQL]set default record reader and writer f...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/14169 Are all script transforms broken? Don't we already have a test case that actually runs script transforms?
[GitHub] spark pull request #14090: [SPARK-16112][SparkR] Programming guide for gappl...
Github user shivaram commented on a diff in the pull request: https://github.com/apache/spark/pull/14090#discussion_r70923795

--- Diff: docs/sparkr.md ---
@@ -316,6 +314,139 @@ head(ldf, 3) {% endhighlight %}

Run a given function on a large dataset grouping by input column(s) and using `gapply` or `gapplyCollect`

# gapply
Apply a function to each group of a `SparkDataFrame`. The function is to be applied to each group of the `SparkDataFrame` and should have only two parameters: grouping key and R `data.frame` corresponding to that key. The groups are chosen from `SparkDataFrame`s column(s). The output of function should be a `data.frame`. Schema specifies the row format of the resulting `SparkDataFrame`. It must represent R function's output schema on the basis of Spark data types. The column names of the returned `data.frame` are set by user.

Data type mapping between R and Spark:

| R         | Spark     |
|-----------|-----------|
| byte      | byte      |
| integer   | integer   |
| float     | float     |
| double    | double    |
| numeric   | double    |
| character | string    |
| string    | string    |
| binary    | binary    |
| raw       | binary    |
| logical   | boolean   |
| timestamp | timestamp |
| date      | date      |
| array     | array     |
| list      | array     |
| map       | map       |
| env       | map       |
| struct    |           |

--- End diff --

Not really - as I mentioned the getSQLDatatype looks at the schema - the method which looks at the R objects is in https://github.com/apache/spark/blob/2e4075e2ece9574100c79558cab054485e25c2ee/R/pkg/R/serialize.R#L84
[GitHub] spark pull request #14090: [SPARK-16112][SparkR] Programming guide for gappl...
Github user NarineK commented on a diff in the pull request: https://github.com/apache/spark/pull/14090#discussion_r70923645

--- Diff: docs/sparkr.md --- (same `gapply` documentation hunk quoted above) --- End diff --

Sounds good. For the mapping between 'POSIXct / POSIXlt' and 'timestamp', and 'Date' and 'date', do we need to update the 'getSQLDataType' method? https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/api/r/SQLUtils.scala#L91
[GitHub] spark issue #14214: [SPARK-16545][SQL] Eliminate one unnecessary round of ph...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14214 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62362/
[GitHub] spark issue #14214: [SPARK-16545][SQL] Eliminate one unnecessary round of ph...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14214 Merged build finished. Test PASSed.
[GitHub] spark issue #14214: [SPARK-16545][SQL] Eliminate one unnecessary round of ph...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14214 **[Test build #62362 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62362/consoleFull)** for PR 14214 at commit [`8ec635f`](https://github.com/apache/spark/commit/8ec635fe7403baf5149e3f6714872bf706b37cd7). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #14150: [SPARK-16494] [ML] Upgrade breeze version to 0.12
Github user yanboliang commented on the issue: https://github.com/apache/spark/pull/14150 cc @srowen
[GitHub] spark issue #14151: [SPARK-16496][SQL] Add wholetext as option for reading t...
Github user ScrapCodes commented on the issue: https://github.com/apache/spark/pull/14151 @rxin Do you think it looks okay now?
[GitHub] spark issue #14087: [SPARK-16411][SQL][STREAMING] Add textFile to Structured...
Github user ScrapCodes commented on the issue: https://github.com/apache/spark/pull/14087 @marmbrus Do you think this is useful?
[GitHub] spark pull request #14090: [SPARK-16112][SparkR] Programming guide for gappl...
Github user shivaram commented on a diff in the pull request: https://github.com/apache/spark/pull/14090#discussion_r70922863

--- Diff: docs/sparkr.md --- (same `gapply` documentation hunk quoted above) --- End diff --

And as you mentioned above we can also change `date` to `Date` to be more specific. (It would be ideal, now that I think of it, to link these R types to the CRAN help pages. For example we can link to https://stat.ethz.ch/R-manual/R-devel/library/base/html/Dates.html for `Date` and https://stat.ethz.ch/R-manual/R-devel/library/base/html/DateTimeClasses.html for `POSIXct / POSIXlt`.)
[GitHub] spark issue #14216: [SPARK-16561][MLLib] fix multivarOnlineSummary min/max b...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14216 Merged build finished. Test PASSed.
[GitHub] spark issue #14216: [SPARK-16561][MLLib] fix multivarOnlineSummary min/max b...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14216 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62366/
[GitHub] spark pull request #14090: [SPARK-16112][SparkR] Programming guide for gappl...
Github user shivaram commented on a diff in the pull request: https://github.com/apache/spark/pull/14090#discussion_r70922747

--- Diff: docs/sparkr.md --- (same `gapply` documentation hunk quoted above) --- End diff --

We can remove map and struct. For timestamp, let's replace the R side of the table with `POSIXct` / `POSIXlt`.
[GitHub] spark issue #14216: [SPARK-16561][MLLib] fix multivarOnlineSummary min/max b...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14216 **[Test build #62366 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62366/consoleFull)** for PR 14216 at commit [`cbb104a`](https://github.com/apache/spark/commit/cbb104a4c48fc425517e5b68c67054b1dc4455dd). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #14169: [SPARK-16515][SQL]set default record reader and writer f...
Github user chenghao-intel commented on the issue: https://github.com/apache/spark/pull/14169 HiveConf provides the default values `org.apache.hadoop.hive.ql.exec.TextRecordReader` and `org.apache.hadoop.hive.ql.exec.TextRecordWriter` for the keys `hive.script.recordreader` and `hive.script.recordwriter` respectively; however, SQLConf doesn't provide those keys, which means the default values will be null. This causes the backward incompatibility.
[GitHub] spark issue #14169: [SPARK-16515][SQL]set default record reader and writer f...
Github user adrian-wang commented on the issue: https://github.com/apache/spark/pull/14169 @rxin In Spark 2.0, those conf values starting with "hive.", which have default values in HiveConf, can no longer get their default values.
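[Editor's note] The gap described in the two comments above can be sketched in miniature. The key names and Hive class names below are the real ones quoted in the thread; the `lookup` helper and everything else is illustrative, not Spark's actual implementation:

```python
# Sketch of the backward-compatibility gap: a conf backed by a table of
# built-in defaults (the HiveConf behavior) resolves the reader class,
# while a conf without that table (the SQLConf situation) returns None.
HIVE_DEFAULTS = {
    "hive.script.recordreader": "org.apache.hadoop.hive.ql.exec.TextRecordReader",
    "hive.script.recordwriter": "org.apache.hadoop.hive.ql.exec.TextRecordWriter",
}

def lookup(conf, key, defaults=None):
    """Return the user-set value, else the built-in default, else None."""
    if key in conf:
        return conf[key]
    if defaults is not None:
        return defaults.get(key)
    return None

# With HiveConf-style defaults, the reader class always resolves:
print(lookup({}, "hive.script.recordreader", HIVE_DEFAULTS))
# Without the defaults table, the same lookup yields None:
print(lookup({}, "hive.script.recordreader"))
```

The fix the PR title describes amounts to registering equivalent defaults on the SQLConf side so the second lookup stops returning null.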
[GitHub] spark pull request #14090: [SPARK-16112][SparkR] Programming guide for gappl...
Github user NarineK commented on a diff in the pull request: https://github.com/apache/spark/pull/14090#discussion_r70921996

--- Diff: docs/sparkr.md --- (same `gapply` documentation hunk quoted above) --- End diff --

Thanks for the explanation, @shivaram! So, I'll remove map, struct and timestamp and leave the rest as is. Does that sound fine?
[GitHub] spark issue #14035: [SPARK-16356][ML] Add testImplicits for ML unit tests an...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/14035 ping @mengxr and @yanboliang
[GitHub] spark issue #14217: [SPARK-16562][SQL] Do not allow downcast in INT32 based ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14217 **[Test build #62367 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62367/consoleFull)** for PR 14217 at commit [`97303c9`](https://github.com/apache/spark/commit/97303c97e990c12abebf309fe3ab9dd0fc31e515).
[GitHub] spark issue #14217: [SPARK-16562][SQL] Do not allow downcast in INT32 based ...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/14217 cc @liancheng
[GitHub] spark pull request #14217: [SPARK-16562][SQL] Do not allow downcast in INT32...
GitHub user HyukjinKwon opened a pull request: https://github.com/apache/spark/pull/14217

[SPARK-16562][SQL] Do not allow downcast in INT32 based types for normal Parquet reader

## What changes were proposed in this pull request?

Currently, INT32-based types (`ShortType`, `ByteType`, `IntegerType`) can be downcast in any combination. For example, the code below:

```scala
val path = "/tmp/test.parquet"
val data = (1 to 4).map(Tuple1(_.toInt))
data.toDF("a").write.parquet(path)
val schema = StructType(StructField("a", ShortType, true) :: Nil)
spark.read.schema(schema).parquet(path).show()
```

works fine. This should not be allowed. This only happens when the vectorized reader is disabled.

## How was this patch tested?

Unit test in `ParquetIOSuite`.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/HyukjinKwon/spark SPARK-16562

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/14217.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #14217

commit 97303c97e990c12abebf309fe3ab9dd0fc31e515
Author: hyukjinkwon
Date: 2016-07-15T04:51:44Z

    Do not allow downcast in INT32 based types for non-vectorized Parquet reader
[GitHub] spark issue #14214: [SPARK-16545][SQL] Eliminate one unnecessary round of ph...
Github user mariobriggs commented on the issue: https://github.com/apache/spark/pull/14214 What I tried to do as a 'side fix' was this: eliminate [1], since it was a lazy val, and move [2] out of the code path of the main thread, i.e. let the ListenerBus thread pay the penalty of producing the physical plan for logging (I was coming from a performance test scenario, so it allowed me to proceed :-) ). So the change was that SparkListenerSQLExecutionStart only takes a QueryExecution as an input parameter, not physicalPlanDescription & SparkPlanInfo. However, this cannot be the solution, since SparkListenerSQLExecutionStart is already a public API. [3] remains. As you might have already noticed, ConsoleSink also suffers from the same problem as [2]; these are inside Dataset.withTypedCallback/withCallback, but only for debug purposes.
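[Editor's note] A minimal sketch of the idea in the comment above (all names here are hypothetical stand-ins, not Spark's actual classes): the event carries only a lazily-evaluated plan holder, so the listener thread, rather than the main thread, pays the cost of producing the plan description.

```python
import queue
import threading

class LazyPlan:
    """Hypothetical stand-in for QueryExecution: the costly physical-plan
    description is computed only when first requested."""
    def __init__(self, query):
        self.query = query
        self._desc = None
        self.planned_on = None  # records which thread paid the planning cost

    def describe(self):
        if self._desc is None:
            self.planned_on = threading.current_thread().name
            self._desc = "physical plan for %s" % self.query  # expensive in reality
        return self._desc

events = queue.Queue()
descriptions = []

def listener_bus():
    # The listener thread, not the caller, triggers the expensive description.
    while True:
        plan = events.get()
        if plan is None:
            break
        descriptions.append(plan.describe())

t = threading.Thread(target=listener_bus, name="ListenerBus")
t.start()
p = LazyPlan("SELECT 1")
events.put(p)      # main thread enqueues without planning anything
events.put(None)   # sentinel to stop the listener
t.join()
print(p.planned_on)  # the planning cost was paid on the ListenerBus thread
```

As the comment notes, Spark can't actually adopt this shape for SparkListenerSQLExecutionStart without breaking a public API; the sketch only shows the performance intuition.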
[GitHub] spark pull request #14090: [SPARK-16112][SparkR] Programming guide for gappl...
Github user shivaram commented on a diff in the pull request: https://github.com/apache/spark/pull/14090#discussion_r70920785

--- Diff: docs/sparkr.md --- (same `gapply` documentation hunk quoted above) --- End diff --

That's a good point - so users can create a schema with `struct`, and that maps to a corresponding SQL type, but they can't create any R objects that will be parsed as `struct`. The main reason our schema is more flexible than our serialization / deserialization support is that the schema can be used to, say, read JSON files or JDBC tables etc. For the use case here, where users are returning a `data.frame` from a UDF, I don't think there is any valid mapping for `struct` from R.
[GitHub] spark issue #14045: [SPARK-16362][SQL] Support ArrayType and StructType in v...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14045 **[Test build #62363 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62363/consoleFull)** for PR 14045 at commit [`1788d4c`](https://github.com/apache/spark/commit/1788d4c3fb9d547390cdea2bcf28c597bee540d2). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #14216: [SPARK-16561][MLLib] fix multivarOnlineSummary min/max b...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14216 **[Test build #62366 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62366/consoleFull)** for PR 14216 at commit [`cbb104a`](https://github.com/apache/spark/commit/cbb104a4c48fc425517e5b68c67054b1dc4455dd).
[GitHub] spark pull request #14090: [SPARK-16112][SparkR] Programming guide for gappl...
Github user NarineK commented on a diff in the pull request: https://github.com/apache/spark/pull/14090#discussion_r70920518 --- Diff: docs/sparkr.md --- @@ -316,6 +314,139 @@ (quoting the same `gapply` data type mapping table in docs/sparkr.md, at the `struct` row) --- End diff -- @shivaram, I've looked at the following list: https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/api/r/SQLUtils.scala#L92 It is called when creating a schema's fields, and it has map, struct, timestamp, etc.
[GitHub] spark pull request #14216: [SPARK-16561][MLLib] fix multivarOnlineSummary mi...
GitHub user WeichenXu123 opened a pull request: https://github.com/apache/spark/pull/14216 [SPARK-16561][MLLib] fix multivarOnlineSummary min/max bug ## What changes were proposed in this pull request? Add a member vector `cnnz` that counts the number of non-zero values in each dimension, and use `cnnz` instead of `nnz` when calculating min/max. ## How was this patch tested? Existing tests. You can merge this pull request into a Git repository by running: $ git pull https://github.com/WeichenXu123/spark multivarOnlineSummary Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/14216.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #14216 commit cbb104a4c48fc425517e5b68c67054b1dc4455dd Author: WeichenXu Date: 2016-07-12T05:08:42Z improve multivarOnlineSummary
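The bug can be illustrated outside of MLlib: when summarizing sparse vectors online, a dimension's min/max must account for its implicit zeros, which requires a per-dimension non-zero count (`cnnz`) rather than a single global count. A minimal plain-Python sketch (hypothetical names, not MLlib's actual implementation):

```python
import math

def summarize_min_max(rows, dim):
    """Online min/max over sparse rows, each a dict of index -> non-zero value.

    cnnz[j] counts the non-zero entries seen in dimension j; if a dimension
    has fewer non-zeros than total rows, it implicitly contains zeros, so 0
    must participate in that dimension's min/max.
    """
    n = 0
    cnnz = [0] * dim
    cur_min = [math.inf] * dim
    cur_max = [-math.inf] * dim
    for row in rows:
        n += 1
        for j, v in row.items():
            cnnz[j] += 1
            cur_min[j] = min(cur_min[j], v)
            cur_max[j] = max(cur_max[j], v)
    for j in range(dim):
        if cnnz[j] < n:  # dimension j has at least one implicit zero
            cur_min[j] = min(cur_min[j], 0.0)
            cur_max[j] = max(cur_max[j], 0.0)
    return cur_min, cur_max

# Two sparse rows of dimension 2: column 0 sees -1.0 and an implicit 0,
# column 1 sees 2.0 and an implicit 0.
lo, hi = summarize_min_max([{0: -1.0}, {1: 2.0}], 2)
```

With a single global `nnz`, the per-dimension "are there implicit zeros?" check above cannot be made correctly, which is exactly the min/max bug this PR addresses.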
[GitHub] spark issue #14045: [SPARK-16362][SQL] Support ArrayType and StructType in v...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14045 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62363/ Test FAILed.
[GitHub] spark pull request #14090: [SPARK-16112][SparkR] Programming guide for gappl...
Github user NarineK commented on a diff in the pull request: https://github.com/apache/spark/pull/14090#discussion_r70920244 --- Diff: docs/sparkr.md --- @@ -316,6 +314,139 @@ (quoting the same `gapply` data type mapping table in docs/sparkr.md, at the `struct` row) --- End diff -- @felixcheung, I think according to the following mapping we expect 'date': https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/api/r/SQLUtils.scala#L91 And it seems that there is a 'Date' class in base R. Do I understand correctly?
[GitHub] spark issue #14045: [SPARK-16362][SQL] Support ArrayType and StructType in v...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14045 Merged build finished. Test FAILed.
[GitHub] spark issue #14214: [SPARK-16545][SQL] Eliminate one unnecessary round of ph...
Github user mariobriggs commented on the issue: https://github.com/apache/spark/pull/14214 > [1] should not be eliminated in general; I don't understand all the internal aspects of IncrementalExecution, but my general thinking was that [1] can be eliminated because `executedPlan` is a `lazy val` on QueryExecution. > [2] is eliminated by this patch, by replacing the queryExecution with incrementalExecution provided by [3]; If the goal is to keep this as minimal as possible for now and wait for SPARK-16264 (which, I was also thinking, is where it will have to finally wait for full resolution), why not keep [1] and make the change to [2] the simple case of changing [L52](https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/ForeachSink.scala#L52) to the following ``` new Dataset(data.sparkSession, data.queryExecution, implicitly[Encoder[T]]) ``` with no further changes required to your earlier code? Would it then be the case that the wrong physical plan gets logged in SparkListenerSQLExecutionStart?
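The `lazy val` point can be shown in miniature: a lazily initialized field is computed at most once, so repeated access to an already-materialized `executedPlan` does not trigger another planning round. A hedged Python analogue (Scala's `lazy val` roughly corresponds to `functools.cached_property`; the class and counter below are invented for illustration):

```python
from functools import cached_property

class QueryExecutionSketch:
    """Toy stand-in for QueryExecution: planning runs at most once per instance."""
    planning_rounds = 0  # class-level counter, for demonstration only

    @cached_property
    def executed_plan(self):
        # The expensive physical planning would happen here; with a lazy
        # value it runs only on first access, then the result is cached.
        type(self).planning_rounds += 1
        return "physical-plan"

qe = QueryExecutionSketch()
first = qe.executed_plan
second = qe.executed_plan  # reuses the cached value, no second planning round
```

This is why accessing `lastExecution.executedPlan` again after it has been materialized (point [1] above) costs nothing extra.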
[GitHub] spark issue #14215: [SPARK-16544][SQL][WIP] Support for conversion from comp...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14215 **[Test build #62365 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62365/consoleFull)** for PR 14215 at commit [`b45f2ea`](https://github.com/apache/spark/commit/b45f2eae8417d9fdf1ecb8de7dd0a43a3d4c0fa8).
[GitHub] spark issue #14215: [SPARK-16544][SQL][WIP] Support for conversion from comp...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/14215 Hi @gatorsmile @dongjoon-hyun @liancheng, currently this handles upcasting only for `NumericType` (excluding `DecimalType`) and only for the non-vectorized reader. Before proceeding further, I want to be sure that this approach looks good. Could I ask for some feedback, please?
[GitHub] spark pull request #14215: [SPARK-16544][SQL][WIP] Support for conversion fr...
GitHub user HyukjinKwon opened a pull request: https://github.com/apache/spark/pull/14215 [SPARK-16544][SQL][WIP] Support for conversion from compatible schema for Parquet data source when data types are not matched ## What changes were proposed in this pull request? This PR adds schema compatibility for Parquet. Currently, if the user-given schema is different from the Parquet schema, an exception is thrown even when the user-given schema is compatible with the Parquet schema. For example, executing the code below:

```scala
val path = "/tmp/test.parquet"
val data = (1 to 4).map(Tuple1(_))
spark.createDataFrame(data).toDF("a").write.parquet(path)
val schema = StructType(StructField("a", LongType, true) :: Nil)
spark.read.schema(schema).parquet(path).show()
```

throws an exception as below:

```
org.apache.parquet.io.ParquetDecodingException: Can not read value at 1 in block 0
...
```

This PR lets Parquet support this schema compatibility. - [x] Schema compatibility for `NumericType` except `DecimalType`. - [ ] Schema compatibility for other `AtomicType`. - [ ] Schema compatibility for vectorized reader. ## How was this patch tested? Unit tests in `ParquetIOSuite`. You can merge this pull request into a Git repository by running: $ git pull https://github.com/HyukjinKwon/spark SPARK-16544 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/14215.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #14215 commit b45f2eae8417d9fdf1ecb8de7dd0a43a3d4c0fa8 Author: hyukjinkwon Date: 2016-07-15T03:37:45Z Support for conversion from compatible schema for Parquet data source when data types are not matched
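The notion of a "compatible" requested schema can be sketched as a numeric-widening check: a stored type may be read as a wider one (e.g. `int` as `long` in the PR's example) but never narrowed. This is a plain-Python illustration of the idea only, not Spark's actual resolution logic; the ordering below is an assumption mirroring common SQL numeric-precedence rules:

```python
# Assumed widening order for the numeric types discussed (DecimalType excluded).
WIDENING_ORDER = ["byte", "short", "int", "long", "float", "double"]

def can_upcast(stored: str, requested: str) -> bool:
    """True if a column stored as `stored` may safely be read as `requested`."""
    if stored == requested:
        return True
    if stored in WIDENING_ORDER and requested in WIDENING_ORDER:
        # Reading as an equal-or-wider numeric type is allowed.
        return WIDENING_ORDER.index(stored) <= WIDENING_ORDER.index(requested)
    return False  # non-numeric or unknown combinations are rejected here

ok = can_upcast("int", "long")        # the example from the PR description
narrowing = can_upcast("long", "int")  # narrowing must be rejected
```

Under such a check, the PR's example (Parquet file written with `IntegerType`, read with a `LongType` schema) would be accepted instead of failing with `ParquetDecodingException`.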
[GitHub] spark issue #14158: [SPARK-13547] [SQL] [WEBUI] Add SQL query in web UI's SQ...
Github user nblintao commented on the issue: https://github.com/apache/spark/pull/14158 Updated by truncating long texts and adding a tooltip. The detailed description and the screenshot at https://github.com/apache/spark/pull/14158#issue-165127460 have also been updated.
[GitHub] spark issue #14158: [SPARK-13547] [SQL] [WEBUI] Add SQL query in web UI's SQ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14158 **[Test build #62364 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62364/consoleFull)** for PR 14158 at commit [`41c2daa`](https://github.com/apache/spark/commit/41c2daa19a4b4dc340f6345e2624fd269565638b).
[GitHub] spark issue #14045: [SPARK-16362][SQL][WIP] Support ArrayType and StructType...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14045 **[Test build #62363 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62363/consoleFull)** for PR 14045 at commit [`1788d4c`](https://github.com/apache/spark/commit/1788d4c3fb9d547390cdea2bcf28c597bee540d2).
[GitHub] spark issue #14214: [SPARK-16545][SQL] Eliminate one unnecessary round of ph...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14214 **[Test build #62362 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62362/consoleFull)** for PR 14214 at commit [`8ec635f`](https://github.com/apache/spark/commit/8ec635fe7403baf5149e3f6714872bf706b37cd7).
[GitHub] spark pull request #14214: [SPARK-16545][SQL] Eliminate one unnecessary roun...
GitHub user lw-lin opened a pull request: https://github.com/apache/spark/pull/14214 [SPARK-16545][SQL] Eliminate one unnecessary round of physical planning in ForeachSink ## Problem As reported in [SPARK-16545](https://issues.apache.org/jira/browse/SPARK-16545), in `ForeachSink` we trigger 3 rounds of physical planning. Specifically: [1] In `StreamExecution`, [lastExecution.executedPlan](https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala#L369) [2] In `ForeachSink`, [foreachPartition()](https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/ForeachSink.scala#L69) calls withNewExecutionId(..., **_queryExecution_**), which further calls [**_queryExecution_**.executedPlan](https://github.com/apache/spark/blob/9a5071996b968148f6b9aba12e0d3fe888d9acd8/sql/core/src/main/scala/org/apache/spark/sql/execution/SQLExecution.scala#L55) [3] In `ForeachSink`, [val rdd = { ... incrementalExecution = new IncrementalExecution ...}](https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/ForeachSink.scala#L53) ## What changes were proposed in this pull request? [1] should not be eliminated in general; **[2] is eliminated by this patch, by replacing the `queryExecution` with the `incrementalExecution` provided by [3];** [3] should be eliminated but cannot be at this stage; let's revisit it when SPARK-16264 is resolved. ## How was this patch tested? 
- checked manually that there are now only 2 rounds of physical planning in ForeachSink after this patch - existing tests ensure it causes no regression You can merge this pull request into a Git repository by running: $ git pull https://github.com/lw-lin/spark physical-3x Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/14214.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #14214 commit 8ec635fe7403baf5149e3f6714872bf706b37cd7 Author: Liwei Lin Date: 2016-07-15T02:12:02Z Fix foreachPartition
[GitHub] spark issue #14169: [SPARK-16515][SQL]set default record reader and writer f...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/14169 What do you mean by "Since Spark 2.0 has deleted those config keys from hive conf"?
[GitHub] spark issue #13990: [SPARK-16287][SQL] Implement str_to_map SQL function
Github user techaddict commented on the issue: https://github.com/apache/spark/pull/13990 @cloud-fan anything else, or is it good to merge?
[GitHub] spark issue #14203: [SPARK-16546][SQL][PySpark] update python dataframe.drop
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14203 Merged build finished. Test PASSed.
[GitHub] spark issue #14203: [SPARK-16546][SQL][PySpark] update python dataframe.drop
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14203 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62361/ Test PASSed.
[GitHub] spark issue #14203: [SPARK-16546][SQL][PySpark] update python dataframe.drop
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14203 **[Test build #62361 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62361/consoleFull)** for PR 14203 at commit [`3952ea0`](https://github.com/apache/spark/commit/3952ea059945b014323cbdba22766212bfe25b54). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #14203: [SPARK-16546][SQL][PySpark] update python datafra...
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/14203#discussion_r70913944 --- Diff: python/pyspark/sql/dataframe.py --- @@ -1416,13 +1416,25 @@ def drop(self, col): >>> df.join(df2, df.name == df2.name, 'inner').drop(df2.name).collect() [Row(age=5, name=u'Bob', height=85)] + +>>> df.join(df2, df.name == df2.name, 'inner').drop(df2.name) \\ --- End diff -- @rxin I've now updated the test case to make it clearer. Thanks!
[GitHub] spark issue #14203: [SPARK-16546][SQL][PySpark] update python dataframe.drop
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14203 **[Test build #62361 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62361/consoleFull)** for PR 14203 at commit [`3952ea0`](https://github.com/apache/spark/commit/3952ea059945b014323cbdba22766212bfe25b54).
[GitHub] spark pull request #14211: [SPARK-16557][SQL] Remove stale doc in sql/README...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/14211
[GitHub] spark issue #14211: [SPARK-16557][SQL] Remove stale doc in sql/README.md
Github user rxin commented on the issue: https://github.com/apache/spark/pull/14211 Merging in master/2.0.
[GitHub] spark issue #14210: [SPARK-16556] [SPARK-16559] [SQL] Fix Two Bugs in Bucket...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14210 Merged build finished. Test PASSed.
[GitHub] spark issue #14210: [SPARK-16556] [SPARK-16559] [SQL] Fix Two Bugs in Bucket...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14210 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62359/ Test PASSed.
[GitHub] spark issue #14210: [SPARK-16556] [SPARK-16559] [SQL] Fix Two Bugs in Bucket...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14210 **[Test build #62359 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62359/consoleFull)** for PR 14210 at commit [`2d76a9f`](https://github.com/apache/spark/commit/2d76a9f1eb50aef1d8036fd59b315bfa401195b3). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #14154: [SPARK-16497][SQL] Don't throw an exception if drop non-...
Github user lianhuiwang commented on the issue: https://github.com/apache/spark/pull/14154 OK, I'll close it. Thanks.
[GitHub] spark pull request #14154: [SPARK-16497][SQL] Don't throw an exception if dr...
Github user lianhuiwang closed the pull request at: https://github.com/apache/spark/pull/14154
[GitHub] spark pull request #14201: [SPARK-14702] Make environment of SparkLauncher l...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/14201#discussion_r70912587

--- Diff: launcher/src/main/java/org/apache/spark/launcher/SparkLauncher.java ---
@@ -359,6 +364,82 @@ public SparkLauncher setVerbose(boolean verbose) {
   }

   /**
+   * Sets the working directory of the driver process.
+   * @param dir The directory to set as the driver's working directory.
+   * @return This launcher.
+   */
+  public SparkLauncher directory(File dir) {
+    builder.workingDir = dir;
+    return this;
+  }
+
+  /**
+   * Specifies that stderr in the driver should be redirected to stdout.
+   * @return This launcher.
+   */
+  public SparkLauncher redirectError() {
+    builder.redirectErrorStream = true;
+    return this;
+  }
+
+  /**
+   * Redirects error output to the specified Redirect.
+   * @param to The method of redirection.
+   * @return This launcher.
+   */
+  public SparkLauncher redirectError(ProcessBuilder.Redirect to) {
+    builder.errorStream = to;
+    return this;
+  }
+
+  /**
+   * Redirects standard output to the specified Redirect.
+   * @param to The method of redirection.
+   * @return This launcher.
+   */
+  public SparkLauncher redirectOutput(ProcessBuilder.Redirect to) {
+    builder.outputStream = to;
+    return this;
+  }
+
+  /**
+   * Redirects error output to the specified File.
+   * @param errFile The file to which stderr is written.
+   * @return This launcher.
+   */
+  public SparkLauncher redirectError(File errFile) {
+    builder.errorStream = ProcessBuilder.Redirect.to(errFile);
+    return this;
+  }
+
+  /**
+   * Redirects standard output to the specified File.
+   * @param outFile The file to which stdout is written.
+   * @return This launcher.
+   */
+  public SparkLauncher redirectOutput(File outFile) {
+    builder.outputStream = ProcessBuilder.Redirect.to(outFile);
+    return this;
+  }
+
+  /**
+   * Sets all output to be logged and redirected to a logger with the specified name.
+   * @param loggerName The name of the logger to log stdout and stderr.
+   * @return This launcher.
+   */
+  public SparkLauncher redirectToLog(String loggerName) {
+    try {
+      // NOTE: the below ordering is important, so builder.redirectToLog is only set to true iff
+      // the preceding put() finishes without exception.
+      builder.getEffectiveConfig().put(CHILD_PROCESS_LOGGER_NAME, loggerName);

--- End diff --

No, `getEffectiveConfig()` is updated whenever you modify the configuration (e.g. via `setConf`).
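For readers unfamiliar with the `java.lang.ProcessBuilder` API these launcher methods delegate to, here is a minimal, self-contained sketch (plain Java, not Spark code; class and file names are illustrative) showing the same redirection semantics:

```java
import java.io.File;
import java.lang.ProcessBuilder.Redirect;

public class RedirectDemo {
    // Mirrors the effect of SparkLauncher.redirectError()/redirectOutput(File):
    // stderr is merged into stdout, and stdout is sent to a file.
    static ProcessBuilder configure(File outFile) {
        ProcessBuilder pb = new ProcessBuilder("echo", "hello");
        pb.redirectErrorStream(true);             // stderr -> stdout
        pb.redirectOutput(Redirect.to(outFile));  // stdout -> file
        return pb;
    }

    public static void main(String[] args) throws Exception {
        File out = File.createTempFile("proc", ".log");
        Process p = configure(out).start();
        p.waitFor();
        System.out.println("exit=" + p.exitValue());
    }
}
```

The launcher simply stores these settings on its internal builder and applies them when the driver process is started.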
[GitHub] spark pull request #14201: [SPARK-14702] Make environment of SparkLauncher l...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/14201#discussion_r70912543

--- Diff: launcher/src/main/java/org/apache/spark/launcher/SparkLauncher.java ---
@@ -82,8 +83,12 @@
   /** Used internally to create unique logger names. */
   private static final AtomicInteger COUNTER = new AtomicInteger();

+  public static final ThreadFactory REDIRECTOR_FACTORY = new NamedThreadFactory("launcher-proc-%d");

--- End diff --

It doesn't need to be public. Package private (a.k.a. no modifier) is enough.
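As context for the review comment, a named thread factory of the kind the diff references can be declared package private. The body below is an illustrative sketch of what `NamedThreadFactory("launcher-proc-%d")` presumably does (the class name and name pattern come from the diff; the implementation is an assumption):

```java
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicLong;

// Package private (no access modifier): visible within the package only,
// keeping the factory out of the launcher's public API.
class NamedThreadFactory implements ThreadFactory {
    private final String nameFormat;
    private final AtomicLong count = new AtomicLong();

    NamedThreadFactory(String nameFormat) {
        this.nameFormat = nameFormat;
    }

    @Override
    public Thread newThread(Runnable r) {
        // Threads get sequential names: launcher-proc-0, launcher-proc-1, ...
        Thread t = new Thread(r, String.format(nameFormat, count.getAndIncrement()));
        t.setDaemon(true);  // redirector threads should not keep the JVM alive
        return t;
    }
}

public class FactoryDemo {
    public static void main(String[] args) {
        ThreadFactory f = new NamedThreadFactory("launcher-proc-%d");
        System.out.println(f.newThread(() -> {}).getName());  // launcher-proc-0
    }
}
```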
[GitHub] spark issue #14213: [SPARK-14817][ML][MLLIB][DOC] Made DataFrame-based API p...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14213 **[Test build #62360 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62360/consoleFull)** for PR 14213 at commit [`00c9941`](https://github.com/apache/spark/commit/00c9941b7c113afc1d7ab2a59b50c208f46e0cc9). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #14213: [SPARK-14817][ML][MLLIB][DOC] Made DataFrame-based API p...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14213 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62360/ Test PASSed.
[GitHub] spark issue #14213: [SPARK-14817][ML][MLLIB][DOC] Made DataFrame-based API p...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14213 Merged build finished. Test PASSed.
[GitHub] spark issue #14213: [SPARK-14817][ML][MLLIB][DOC] Made DataFrame-based API p...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14213 **[Test build #62360 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62360/consoleFull)** for PR 14213 at commit [`00c9941`](https://github.com/apache/spark/commit/00c9941b7c113afc1d7ab2a59b50c208f46e0cc9).
[GitHub] spark pull request #14213: [SPARK-14817][ML][MLLIB][DOC] Made DataFrame-base...
GitHub user jkbradley opened a pull request: https://github.com/apache/spark/pull/14213

[SPARK-14817][ML][MLLIB][DOC] Made DataFrame-based API primary in MLlib guide

## What changes were proposed in this pull request?

Made DataFrame-based API primary
* Spark doc menu bar and other places now link to ml-guide.html, not mllib-guide.html
* mllib-guide.html keeps the RDD-specific list of features, with a link at the top redirecting people to ml-guide.html
* ml-guide.html includes a "maintenance mode" announcement about the RDD-based API
  * **Reviewers: please check this carefully**
* (minor) Titles for the DF API no longer include the "- spark.ml" suffix. Titles for the RDD API have the "- RDD-based API" suffix
* Moved the migration guide to ml-guide from mllib-guide
  * Also moved past guides from mllib-migration-guides to ml-migration-guides, with a redirect link on mllib-migration-guides
  * **Reviewers**: I did not change any of the content of the migration guides.

Reorganized DataFrame-based guide:
* ml-guide.html mimics the old mllib-guide.html page in terms of content: overview, migration guide, etc.
* Moved the Pipeline description into ml-pipeline.html and tuning into ml-tuning.html
  * **Reviewers**: I did not change the content of these guides, except some intro text.
* The sidebar remains the same, but with pipeline and tuning sections added

Other:
* ml-classification-regression.html: Moved text about linear methods to a new section in the page

## How was this patch tested?

Generated docs locally

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/jkbradley/spark ml-guide-2.0

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/14213.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #14213

commit 00c9941b7c113afc1d7ab2a59b50c208f46e0cc9
Author: Joseph K. Bradley
Date: 2016-07-15T01:18:36Z
    Reorganized MLlib Programming Guide to make DataFrame-based API the primary API
[GitHub] spark pull request #14129: [SPARK-16280][SQL][WIP] Implement histogram_numer...
GitHub user tilumi reopened a pull request: https://github.com/apache/spark/pull/14129

[SPARK-16280][SQL][WIP] Implement histogram_numeric SQL function

## What changes were proposed in this pull request?

(Please fill in changes proposed in this fix)

## How was this patch tested?

(Please explain how this patch was tested. E.g. unit tests, integration tests, manual tests)

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/tilumi/spark SPARK-16280

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/14129.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #14129

commit c286d187fdc51dcb3e9bb65a0e1b250ec3049391
Author: Lucas Yang
Date: 2016-07-11T01:54:42Z
    implement histogram_numeric SQL function

commit ced9954206ea921ec2213cbf3a5485054212ebad
Author: Lucas Yang
Date: 2016-07-13T02:11:22Z
    add histogram_numeric test

commit 7a91110ab70f707803edc8f0302a0459f3aee9fc
Author: Lucas Yang
Date: 2016-07-13T11:17:23Z
    add ImperativeNumericHistogram

commit a56e8836c28a9fc189f81173fbabed5332d6adee
Author: Lucas Yang
Date: 2016-07-15T00:20:23Z
    histogram benchmark

commit 62d44c12323fe9684814d1f681a2e7a29884b07d
Author: Lucas Yang
Date: 2016-07-15T00:39:34Z
    polish Benchmark_SPARK_16280

commit 2beadd1f37aab22341e01b75aa6c22bb032da35a
Author: Lucas Yang
Date: 2016-07-15T01:25:37Z
    polish Benchmark_SPARK_16280
[GitHub] spark pull request #14129: [SPARK-16280][SQL][WIP] Implement histogram_numer...
Github user tilumi closed the pull request at: https://github.com/apache/spark/pull/14129
[GitHub] spark issue #14129: [SPARK-16280][SQL][WIP] Implement histogram_numeric SQL ...
Github user tilumi commented on the issue: https://github.com/apache/spark/pull/14129 I implemented 3 kinds of histogram_numeric and the result is (10, 100)).map((pair) => {
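For background on what such implementations approximate: Hive's `histogram_numeric` is based on a streaming histogram that keeps at most B (centroid, count) bins and, when a new value overflows the budget, merges the two closest bins. A minimal, self-contained sketch of that idea (illustrative only, not the code from this PR):

```java
import java.util.ArrayList;
import java.util.List;

public class StreamingHistogram {
    private final int maxBins;
    // Parallel lists of bin centroids and counts, kept sorted by centroid.
    private final List<Double> centers = new ArrayList<>();
    private final List<Long> counts = new ArrayList<>();

    public StreamingHistogram(int maxBins) { this.maxBins = maxBins; }

    public void add(double x) {
        // Insert the value as its own (x, 1) bin, preserving sort order.
        int i = 0;
        while (i < centers.size() && centers.get(i) < x) i++;
        centers.add(i, x);
        counts.add(i, 1L);
        if (centers.size() > maxBins) mergeClosest();
    }

    private void mergeClosest() {
        // Find the adjacent pair of centroids with the smallest gap.
        int best = 0;
        for (int i = 1; i < centers.size() - 1; i++) {
            if (centers.get(i + 1) - centers.get(i) < centers.get(best + 1) - centers.get(best)) best = i;
        }
        // Replace the pair with their count-weighted average.
        long c = counts.get(best) + counts.get(best + 1);
        double merged = (centers.get(best) * counts.get(best)
                + centers.get(best + 1) * counts.get(best + 1)) / c;
        centers.set(best, merged);
        counts.set(best, c);
        centers.remove(best + 1);
        counts.remove(best + 1);
    }

    public int numBins() { return centers.size(); }

    public long totalCount() {
        long t = 0;
        for (long c : counts) t += c;
        return t;
    }

    public static void main(String[] args) {
        StreamingHistogram h = new StreamingHistogram(4);
        for (int v = 0; v < 100; v++) h.add(v % 10);
        System.out.println(h.numBins() + " bins over " + h.totalCount() + " values");
    }
}
```

A declarative (expression-based) variant folds values into such a state, while an imperative variant (as the `ImperativeNumericHistogram` commit name suggests) updates it directly; benchmarking the two is exactly what the PR's benchmark commits appear to compare.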
[GitHub] spark issue #14211: [SPARK-16557][SQL] Remove stale doc in sql/README.md
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14211 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62356/ Test PASSed.
[GitHub] spark issue #14211: [SPARK-16557][SQL] Remove stale doc in sql/README.md
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14211 Merged build finished. Test PASSed.
[GitHub] spark issue #14211: [SPARK-16557][SQL] Remove stale doc in sql/README.md
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14211 **[Test build #62356 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62356/consoleFull)** for PR 14211 at commit [`e507177`](https://github.com/apache/spark/commit/e5071777f6c02a74395c83d5162aa3274ba136e4). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #14196: [SPARK-16540][YARN][CORE] Avoid adding jars twice...
Github user jerryshao commented on a diff in the pull request: https://github.com/apache/spark/pull/14196#discussion_r70909029

--- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala ---
@@ -2409,9 +2409,9 @@ private[spark] object Utils extends Logging {
    * "spark.yarn.dist.jars" properties, while in other modes it returns the jar files pointed by
    * only the "spark.jars" property.
    */
-  def getUserJars(conf: SparkConf): Seq[String] = {
+  def getUserJars(conf: SparkConf, isShell: Boolean = false): Seq[String] = {

--- End diff --

Do I still need to update the docs, or maybe this can be done later?
[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14132 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62357/ Test PASSed.
[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14132 Merged build finished. Test PASSed.
[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14132 **[Test build #62357 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62357/consoleFull)** for PR 14132 at commit [`f77a0fa`](https://github.com/apache/spark/commit/f77a0fafbce0195133a9680c2b636222ba491e2b). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14132 Merged build finished. Test PASSed.
[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14132 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62354/ Test PASSed.
[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14132 **[Test build #62354 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62354/consoleFull)** for PR 14132 at commit [`717f47a`](https://github.com/apache/spark/commit/717f47abb5b8574a611c4a256fde3e620fdce92b). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #14169: [SPARK-16515][SQL]set default record reader and writer f...
Github user jameszhouyi commented on the issue: https://github.com/apache/spark/pull/14169 Hi Spark guys, could you please help to review this PR to merge it into Spark 2.0.0? Thanks in advance! Best Regards, Yi
[GitHub] spark issue #14210: [SPARK-16556] [SPARK-16559] [SQL] Fix Two Bugs in Bucket...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14210 Merged build finished. Test FAILed.
[GitHub] spark issue #14210: [SPARK-16556] [SPARK-16559] [SQL] Fix Two Bugs in Bucket...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14210 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62355/ Test FAILed.
[GitHub] spark issue #14210: [SPARK-16556] [SPARK-16559] [SQL] Fix Two Bugs in Bucket...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14210 **[Test build #62355 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62355/consoleFull)** for PR 14210 at commit [`680b6f0`](https://github.com/apache/spark/commit/680b6f0faa835eecb4cd7b9e6add4700fdfa809c). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #14210: [SPARK-16556] [SQL] Fix Silent Ignorance of Bucket Speci...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14210 **[Test build #62359 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62359/consoleFull)** for PR 14210 at commit [`2d76a9f`](https://github.com/apache/spark/commit/2d76a9f1eb50aef1d8036fd59b315bfa401195b3).
[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/14132

The following is updated.
- Add more descriptions and test cases (about finding the closest table and nested hints).
- Support no-parameter hints like `/*+ INDEX */`.
- Generalize the `hintStatement` rule.
- Simplify `withHints`.
- Move `toUpperCase` into the Analyzer rule.
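The hint syntax under discussion uses Oracle-style optimizer hint comments. As a rough illustration of what a `hintStatement` rule has to recognize (a regex sketch with a hypothetical `HintDemo` class, not Spark's actual ANTLR grammar), a hint is a name plus an optional parameter list inside `/*+ ... */`:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HintDemo {
    // Matches /*+ NAME */ or /*+ NAME(arg1, arg2) */; the name is normalized
    // to upper case, as the review suggests doing in the Analyzer rule.
    private static final Pattern HINT =
            Pattern.compile("/\\*\\+\\s*(\\w+)\\s*(?:\\(([^)]*)\\))?\\s*\\*/");

    // Returns a canonical "NAME" or "NAME(args)" string, or null if no hint.
    public static String parse(String sql) {
        Matcher m = HINT.matcher(sql);
        if (!m.find()) return null;
        String name = m.group(1).toUpperCase();
        String args = m.group(2) == null ? "" : m.group(2).trim();
        return args.isEmpty() ? name : name + "(" + args + ")";
    }

    public static void main(String[] args) {
        System.out.println(parse("SELECT /*+ broadcast(t) */ * FROM t"));  // BROADCAST(t)
        System.out.println(parse("SELECT /*+ INDEX */ * FROM t"));         // INDEX
    }
}
```

Normalizing case in an analysis step rather than the parser keeps the grammar simpler, which matches the "move `toUpperCase` into the Analyzer rule" item above.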
[GitHub] spark issue #14079: [SPARK-8425][CORE] New Blacklist Mechanism
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14079 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62352/ Test PASSed.
[GitHub] spark issue #14079: [SPARK-8425][CORE] New Blacklist Mechanism
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14079 Merged build finished. Test PASSed.
[GitHub] spark pull request #14201: [SPARK-14702] Make environment of SparkLauncher l...
Github user andreweduffy commented on a diff in the pull request: https://github.com/apache/spark/pull/14201#discussion_r70905418

--- Diff: launcher/src/main/java/org/apache/spark/launcher/SparkLauncher.java ---
@@ -359,6 +364,82 @@ public SparkLauncher setVerbose(boolean verbose) {
[same hunk adding directory(File) and the redirectError/redirectOutput overloads, quoted in full earlier in this thread, ending with:]
+  /**
+   * Sets all output to be logged and redirected to a logger with the specified name.
+   * @param loggerName The name of the logger to log stdout and stderr.
+   * @return This launcher.
+   */
+  public SparkLauncher redirectToLog(String loggerName) {
+    try {
+      // NOTE: the below ordering is important, so builder.redirectToLog is only set to true iff
+      // the preceding put() finishes without exception.
+      builder.getEffectiveConfig().put(CHILD_PROCESS_LOGGER_NAME, loggerName);

--- End diff --

Should I also modify `startApplication` to read from builder.conf? It appears to use `builder.getEffectiveConfig()`, which as far as I can tell is sourced from a properties file.
[GitHub] spark issue #14079: [SPARK-8425][CORE] New Blacklist Mechanism
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14079 **[Test build #62352 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62352/consoleFull)** for PR 14079 at commit [`351a9a7`](https://github.com/apache/spark/commit/351a9a7e2893a0b90c57233d5e44a52c147bb2a8). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #14212: [SPARK-16558][Examples][MLlib] examples/mllib/LDAExample...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14212 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62358/ Test PASSed.
[GitHub] spark issue #14212: [SPARK-16558][Examples][MLlib] examples/mllib/LDAExample...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14212 Merged build finished. Test PASSed.
[GitHub] spark issue #14212: [SPARK-16558][Examples][MLlib] examples/mllib/LDAExample...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14212 **[Test build #62358 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62358/consoleFull)** for PR 14212 at commit [`596aba6`](https://github.com/apache/spark/commit/596aba6c80bb2c9c5f90f6cdeb5a0c20e3590f55). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #14090: [SPARK-16112][SparkR] Programming guide for gappl...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/14090#discussion_r70905195

--- Diff: docs/sparkr.md ---
@@ -316,6 +314,139 @@ head(ldf, 3) {% endhighlight %}

Run a given function on a large dataset, grouping by input column(s), using `gapply` or `gapplyCollect`.

# gapply

Apply a function to each group of a `SparkDataFrame`. The function is applied to each group of the `SparkDataFrame` and should have only two parameters: the grouping key and an R `data.frame` corresponding to that key. The groups are chosen from the `SparkDataFrame`'s column(s). The output of the function should be a `data.frame`. The schema specifies the row format of the resulting `SparkDataFrame`; it must represent the R function's output schema in terms of Spark data types. The column names of the returned `data.frame` are set by the user. Data type mapping between R and Spark:

| R | Spark |
|---|---|
| byte | byte |
| integer | integer |
| float | float |
| double | double |
| numeric | double |
| character | string |
| string | string |
| binary | binary |
| raw | binary |
| logical | boolean |
| timestamp | timestamp |
| date | date |
| array | array |
| list | array |
| map | map |
| env | map |
| struct | |

--- End diff --

I don't think `date` is a type either.
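To make the `gapply` description above concrete, here is a minimal SparkR sketch. It follows the style of the SparkR programming guide; the use of the built-in `faithful` dataset and the column names are illustrative assumptions, not part of the diff under review.

```r
# Sketch: compute the max "eruptions" value per "waiting" group with gapply.
# Assumes a SparkR session is already running (sparkR.session()).
df <- createDataFrame(faithful)

# The schema describes the rows the function returns, using Spark data types
# per the mapping table above (R numeric -> Spark double).
schema <- structType(structField("waiting", "double"),
                     structField("max_eruption", "double"))

result <- gapply(
  df,
  "waiting",          # grouping column(s)
  function(key, x) {  # key: grouping key; x: R data.frame for that group
    data.frame(key, max(x$eruptions))
  },
  schema)

head(collect(arrange(result, "waiting")))
```

`gapplyCollect` works the same way but collects the result back as an R `data.frame`, so no schema argument is needed.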
[GitHub] spark issue #14211: [SPARK-16557][SQL] Remove stale doc in sql/README.md
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/14211 LGTM
[GitHub] spark pull request #14201: [SPARK-14702] Make environment of SparkLauncher l...
Github user andreweduffy commented on a diff in the pull request: https://github.com/apache/spark/pull/14201#discussion_r70904921

--- Diff: launcher/src/main/java/org/apache/spark/launcher/SparkLauncher.java ---
@@ -82,8 +83,12 @@

```java
  /** Used internally to create unique logger names. */
  private static final AtomicInteger COUNTER = new AtomicInteger();

+ public static final ThreadFactory REDIRECTOR_FACTORY = new NamedThreadFactory("launcher-proc-%d");
```

--- End diff --

How do you think this should be shared without making it public static in either SparkLauncher or ChildProcAppHandle?
[GitHub] spark pull request #14179: [SPARK-16055][SPARKR] warning added while using s...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/14179#discussion_r70904015

--- Diff: R/pkg/R/sparkR.R ---
@@ -155,6 +155,9 @@ sparkR.sparkContext <- function(

```r
  existingPort <- Sys.getenv("EXISTING_SPARKR_BACKEND_PORT", "")
  if (existingPort != "") {
+   if (sparkPackages != "") {
+     warning("--packages flag should be used with spark-submit")
```

--- End diff --

@shivaram maybe it should, but sparkR.session() is already called in the sparkR shell, and calling SparkSession again with sparkPackages does nothing:

```
> sparkR.session(sparkPackages = "com.databricks:spark-avro_2.10:2.0.1")
Java ref type org.apache.spark.sql.SparkSession id 1
> read.df("", source = "avro")
16/07/14 23:55:43 ERROR RBackendHandler: loadDF on org.apache.spark.sql.api.r.SQLUtils failed
Error in invokeJava(isStatic = TRUE, className, methodName, ...) :
  org.apache.spark.sql.AnalysisException: Failed to find data source: avro. Please use Spark package http://spark-packages.org/package/databricks/spark-avro;
```

@krishnakalyan3 something like "sparkPackages has no effect when using spark-submit or sparkR shell, please use the --packages commandline instead"
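Putting the review suggestion together, the check under discussion could be sketched as follows. This is a sketch of the proposed change with the reviewer's suggested wording, not merged code; `sparkPackages` and `EXISTING_SPARKR_BACKEND_PORT` come from the surrounding `sparkR.sparkContext` context.

```r
# Inside sparkR.sparkContext: an existing backend port means we were launched
# by spark-submit / the sparkR shell, where sparkPackages has no effect.
existingPort <- Sys.getenv("EXISTING_SPARKR_BACKEND_PORT", "")
if (existingPort != "") {
  if (sparkPackages != "") {
    warning(paste("sparkPackages has no effect when using spark-submit or sparkR shell,",
                  "please use the --packages commandline instead"))
  }
}
```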
[GitHub] spark issue #14079: [SPARK-8425][CORE] New Blacklist Mechanism
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14079 Merged build finished. Test PASSed.
[GitHub] spark issue #14079: [SPARK-8425][CORE] New Blacklist Mechanism
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14079 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62351/