[GitHub] spark pull request #14309: [SPARK-11977][SQL] Support accessing a column con...
Github user sun-rui commented on a diff in the pull request: https://github.com/apache/spark/pull/14309#discussion_r71982269
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala ---
@@ -641,6 +641,10 @@ class DataFrameSuite extends QueryTest with SharedSQLContext { Row(key, value, key + 1) }.toSeq) assert(df.schema.map(_.name) === Seq("key", "valueRenamed", "newCol"))
+
+// Renaming to a column that contains "." character
+val df2 = testData.toDF().withColumnRenamed("value", "value.Renamed")
+assert(df2.schema.map(_.name) === Seq("key", "value.Renamed"))
--- End diff --
Please add more test cases verifying that columns whose names contain '.' can be accessed without backticks.
--- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
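The behavior under review (referencing a column literally named `value.Renamed` without backticks) can be sketched with a toy resolver in plain Python. This is an illustrative sketch only, not Spark's actual analyzer; the `resolve` helper and its rules are made up to show the idea: an exact top-level match wins even when the name contains a dot, and backticks force the literal interpretation.

```python
# Toy name resolver sketching the semantics under review (NOT Spark's
# actual analyzer; `resolve` is a hypothetical helper).
def resolve(columns, name):
    if name.startswith("`") and name.endswith("`"):
        literal = name[1:-1]          # backticks force a literal column name
        if literal in columns:
            return literal
        raise KeyError(name)
    if name in columns:               # exact match wins, even if it has a '.'
        return name
    head, dot, rest = name.partition(".")
    if dot and head in columns:       # otherwise '.' means struct field access
        return (head, rest)
    raise KeyError(name)

cols = ["key", "value.Renamed"]
resolve(cols, "value.Renamed")        # accessible without backticks
resolve(cols, "`value.Renamed`")      # backticked form still works
```

Under this sketch, `resolve(["s"], "s.field")` falls back to struct-field access, which is exactly the ambiguity the requested test cases should pin down.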
[GitHub] spark pull request #14331: [SPARK-16691][SQL] move BucketSpec to catalyst mo...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/14331#discussion_r71982259
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala ---
@@ -365,9 +365,6 @@ private[hive] class HiveClientImpl( }, schema = schema, partitionColumnNames = partCols.map(_.name),
-sortColumnNames = Seq(), // TODO: populate this
-bucketColumnNames = h.getBucketCols.asScala,
-numBuckets = h.getNumBuckets,
--- End diff --
nvm. I did not see the above post.
[GitHub] spark pull request #14331: [SPARK-16691][SQL] move BucketSpec to catalyst mo...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/14331#discussion_r71982245
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala ---
@@ -365,9 +365,6 @@ private[hive] class HiveClientImpl( }, schema = schema, partitionColumnNames = partCols.map(_.name),
-sortColumnNames = Seq(), // TODO: populate this
-bucketColumnNames = h.getBucketCols.asScala,
-numBuckets = h.getNumBuckets,
--- End diff --
This PR replaces these attributes with `bucketSpec`. Just wondering why we do not need to populate the `bucketSpec`?
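The consolidation being discussed (three loose table attributes collapsed into one optional `bucketSpec`) can be sketched in a few lines. This is a loose, hypothetical Python mirror of the Scala case classes, not Spark's real API; the field names are illustrative:

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical mirror of Spark's BucketSpec: numBuckets,
# bucketColumnNames and sortColumnNames travel together.
@dataclass
class BucketSpec:
    num_buckets: int
    bucket_column_names: List[str]
    sort_column_names: List[str] = field(default_factory=list)

@dataclass
class CatalogTable:
    identifier: str
    partition_column_names: List[str] = field(default_factory=list)
    # One optional field replaces three separate attributes;
    # None means the table is not bucketed.
    bucket_spec: Optional[BucketSpec] = None

table = CatalogTable("db.tbl", bucket_spec=BucketSpec(4, ["key"]))
```

The design win is that "not bucketed" becomes a single `None` instead of three attributes that must be kept mutually consistent.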
[GitHub] spark pull request #14331: [SPARK-16691][SQL] move BucketSpec to catalyst mo...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/14331#discussion_r71982187
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala ---
@@ -764,10 +761,7 @@ private[hive] class HiveClientImpl( hiveTable.setFields(schema.asJava) } hiveTable.setPartCols(partCols.asJava)
-// TODO: set sort columns here too
-hiveTable.setBucketCols(table.bucketColumnNames.asJava)
--- End diff --
We don't support bucketed Hive tables now, and I think we never will, because we have a different hash implementation.
[GitHub] spark issue #14331: [SPARK-16691][SQL] move BucketSpec to catalyst module an...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14331 **[Test build #62763 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62763/consoleFull)** for PR 14331 at commit [`beefff2`](https://github.com/apache/spark/commit/beefff2861ac142dd3a416a29cf101b39b11ac4d).
[GitHub] spark pull request #14331: [SPARK-16691][SQL] move BucketSpec to catalyst mo...
GitHub user cloud-fan opened a pull request: https://github.com/apache/spark/pull/14331 [SPARK-16691][SQL] move BucketSpec to catalyst module and use it in CatalogTable
## What changes were proposed in this pull request?
It's weird that we have `BucketSpec` to abstract bucket info, but don't use it in `CatalogTable`. This PR moves `BucketSpec` into the catalyst module.
## How was this patch tested?
Existing tests.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/cloud-fan/spark check
Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/14331.patch
To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #14331
commit beefff2861ac142dd3a416a29cf101b39b11ac4d
Author: Wenchen Fan
Date: 2016-07-23T02:23:55Z
move BucketSpec to catalyst module and use it in CatalogTable
[GitHub] spark issue #14331: [SPARK-16691][SQL] move BucketSpec to catalyst module an...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/14331 cc @yhuai @liancheng
[GitHub] spark issue #14328: [MINOR] Close old PRs that should be closed but have not...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/14328 Actually I'm having trouble with the merge script. You should merge this yourself.
[GitHub] spark issue #14328: [MINOR] Close old PRs that should be closed but have not...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/14328 Merging in master.
[GitHub] spark pull request #14175: [SPARK-16522][MESOS] Spark application throws exc...
Github user sun-rui commented on a diff in the pull request: https://github.com/apache/spark/pull/14175#discussion_r71982037
--- Diff: core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosCoarseGrainedSchedulerBackend.scala ---
@@ -552,7 +552,12 @@ private[spark] class MesosCoarseGrainedSchedulerBackend( taskId: String, reason: String): Unit = { stateLock.synchronized { - removeExecutor(taskId, SlaveLost(reason)) + // Do not call removeExecutor() after this scheduler backend was stopped because
--- End diff --
Not only removeExecutor() but also other methods, such as reviveOffers() and killTask(), should not be called after the backend is stopped. If you prefer adding the comment in the parent class, it seems more complete to add it to all methods that may encounter such a case. However, I don't think that is necessary, as the exceptions thrown in such cases already notify the caller that the call is invalid, which is exactly how this issue was found.
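The race being discussed (a Mesos status update arriving after `stop()`) is the classic stopped-flag-under-a-lock pattern. A minimal Python sketch of the idea follows; the class and method names are made up for illustration and are not the actual `MesosCoarseGrainedSchedulerBackend` code:

```python
import threading

class SchedulerBackendSketch:
    """Illustrative stand-in for a scheduler backend (not Spark's API)."""

    def __init__(self):
        self._state_lock = threading.Lock()
        self._stopped = False
        self.executors = {"task-1": "slave-a"}

    def stop(self):
        with self._state_lock:
            self._stopped = True

    def executor_terminated(self, task_id, reason):
        # Late callbacks may race with stop(); checking the flag under
        # the same lock turns them into harmless no-ops instead of
        # exceptions thrown from a torn-down backend.
        with self._state_lock:
            if self._stopped:
                return
            self.executors.pop(task_id, None)
```

Taking the flag and the state mutation under one lock is what makes the check race-free; a bare boolean without the lock would still admit the original failure.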
[GitHub] spark issue #14257: [SPARK-16621][SQL][WIP] Use a stable ID generation metho...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14257 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62761/ Test FAILed.
[GitHub] spark issue #14257: [SPARK-16621][SQL][WIP] Use a stable ID generation metho...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14257 Merged build finished. Test FAILed.
[GitHub] spark issue #14257: [SPARK-16621][SQL][WIP] Use a stable ID generation metho...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14257 **[Test build #62761 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62761/consoleFull)** for PR 14257 at commit [`12180db`](https://github.com/apache/spark/commit/12180dbf6346cd5d81623fbb44e77f392dc2108c).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14132 **[Test build #62762 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62762/consoleFull)** for PR 14132 at commit [`3fa276d`](https://github.com/apache/spark/commit/3fa276d20d9bef56f3bc25e3a6f9d333cfaefdaf).
[GitHub] spark issue #14257: [SPARK-16621][SQL][WIP] Use a stable ID generation metho...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14257 **[Test build #62761 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62761/consoleFull)** for PR 14257 at commit [`12180db`](https://github.com/apache/spark/commit/12180dbf6346cd5d81623fbb44e77f392dc2108c).
[GitHub] spark issue #14324: [SPARK-16664][SQL] Fix persist call on Data frames with ...
Github user breakdawn commented on the issue: https://github.com/apache/spark/pull/14324 @lw-lin Personally, the multiple-classes approach is smoother based on the current implementation. But either way it's a big change; maybe it's better to open another JIRA issue to involve more discussion.
[GitHub] spark pull request #14317: [SPARK-16380][EXAMPLES] Update SQL examples and p...
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/14317#discussion_r71981344 --- Diff: examples/src/main/python/sql/datasource.py --- @@ -0,0 +1,154 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +#http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +from __future__ import print_function + +from pyspark.sql import SparkSession +# $example on:schema_merging$ +from pyspark.sql import Row +# $example off:schema_merging$ + +""" +A simple example demonstrating Spark SQL data sources.
+Run with: + ./bin/spark-submit examples/src/main/python/sql/datasource.py """ + + +def basic_datasource_example(spark): +# $example on:generic_load_save_functions$ +df = spark.read.load("examples/src/main/resources/users.parquet") +df.select("name", "favorite_color").write.save("namesAndFavColors.parquet") +# $example off:generic_load_save_functions$ + +# $example on:manual_load_options$ +df = spark.read.load("examples/src/main/resources/people.json", format="json") +df.select("name", "age").write.save("namesAndAges.parquet", format="parquet") +# $example off:manual_load_options$ + +# $example on:direct_sql$ +df = spark.sql("SELECT * FROM parquet.`examples/src/main/resources/users.parquet`") +# $example off:direct_sql$ + + +def parquet_example(spark): +# $example on:basic_parquet_example$ +peopleDF = spark.read.json("examples/src/main/resources/people.json") + +# DataFrames can be saved as Parquet files, maintaining the schema information. +peopleDF.write.parquet("people.parquet") + +# Read in the Parquet file created above. +# Parquet files are self-describing so the schema is preserved. +# The result of loading a parquet file is also a DataFrame. +parquetFile = spark.read.parquet("people.parquet") + +# Parquet files can also be used to create a temporary view and then used in SQL statements. +parquetFile.createOrReplaceTempView("parquetFile") +teenagers = spark.sql("SELECT name FROM parquetFile WHERE age >= 13 AND age <= 19") +teenagers.show() +# +--+ +# | name| +# +--+ +# |Justin| +# +--+ +# $example off:basic_parquet_example$ + + +def parquet_schema_merging_example(spark): +# $example on:schema_merging$ +# spark is from the previous example.
+# Create a simple DataFrame, stored into a partition directory +sc = spark.sparkContext + +squaresDF = spark.createDataFrame(sc.parallelize(range(1, 6)) + .map(lambda i: Row(single=i, double=i ** 2))) +squaresDF.write.parquet("data/test_table/key=1") + +# Create another DataFrame in a new partition directory, +# adding a new column and dropping an existing column +cubesDF = spark.createDataFrame(sc.parallelize(range(6, 11)) +.map(lambda i: Row(single=i, triple=i ** 3))) +cubesDF.write.parquet("data/test_table/key=2") + +# Read the partitioned table +mergedDF = spark.read.option("mergeSchema", "true").parquet("data/test_table") +mergedDF.printSchema() + +# The final schema consists of all 3 columns in the Parquet files together +# with the partitioning column appeared in the partition directory paths. +# root +# |-- double: long (nullable = true) +# |-- single: long (nullable = true) +# |-- triple: long (nullable = true) +# |-- key: integer (nullable = true) +# $example off:schema_merging$ + + +def json_dataset_examplg(spark): +# $example on:json_dataset$ +# spark is from the previous example. +sc = spark.sparkContext + +# A JSON dataset is pointed to by path. +# The path can be either a single text file or a directory storing text files +path = "examples/src/main/resources/people.json"
[GitHub] spark pull request #14317: [SPARK-16380][EXAMPLES] Update SQL examples and p...
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/14317#discussion_r71981335 --- Diff: examples/src/main/python/sql/basic.py --- @@ -0,0 +1,194 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +#http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +from __future__ import print_function + +# $example on:init_session$ +from pyspark.sql import SparkSession +# $example off:init_session$ + +# $example on:schema_inferring$ +from pyspark.sql import Row +# $example off:schema_inferring$ + +# $example on:programmatic_schema$ +# Import data types +from pyspark.sql.types import * +# $example off:programmatic_schema$ + +""" +A simple example demonstrating basic Spark SQL features.
+Run with: + ./bin/spark-submit examples/src/main/python/sql/basic.py """ + + +def basic_df_example(spark): +# $example on:create_df$ +# spark is an existing SparkSession +df = spark.read.json("examples/src/main/resources/people.json") +# Displays the content of the DataFrame to stdout +df.show() +# ++---+ +# | age| name| +# ++---+ +# |null|Michael| +# | 30| Andy| +# | 19| Justin| +# ++---+ +# $example off:create_df$ + +# $example on:untyped_ops$ +# spark, df are from the previous example +# Print the schema in a tree format +df.printSchema() +# root +# |-- age: long (nullable = true) +# |-- name: string (nullable = true) + +# Select only the "name" column +df.select("name").show() +# +---+ +# | name| +# +---+ +# |Michael| +# | Andy| +# | Justin| +# +---+ + +# Select everybody, but increment the age by 1 +df.select(df['name'], df['age'] + 1).show()
--- End diff --
Yea, I know I brought up this issue, but it is still in question... Although `df['...']` has a potential issue with self-joins, it is the way Pandas DataFrame works. Considering we've tried to work around various self-join issues within Catalyst, now I tend to preserve it as is. Maybe we'll deprecate this syntax later.
[GitHub] spark pull request #14317: [SPARK-16380][EXAMPLES] Update SQL examples and p...
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/14317#discussion_r71981312
--- Diff: docs/sql-programming-guide.md ---
@@ -79,7 +79,7 @@ The entry point into all functionality in Spark is the [`SparkSession`](api/java The entry point into all functionality in Spark is the [`SparkSession`](api/python/pyspark.sql.html#pyspark.sql.SparkSession) class. To create a basic `SparkSession`, just use `SparkSession.builder`: -{% include_example init_session python/sql.py %} +{% include_example init_session python/sql/basic.py %}
--- End diff --
For Scala and Java, it's a convention that the file name should be the same as the (major) class defined in the file, while camel-case file names don't conform to Python code conventions. You may check other PySpark file names in the repo as a reference.
[GitHub] spark issue #14313: [SPARK-16674][SQL] Avoid per-record type dispatch in JDB...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14313 Merged build finished. Test PASSed.
[GitHub] spark issue #14313: [SPARK-16674][SQL] Avoid per-record type dispatch in JDB...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14313 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62760/ Test PASSed.
[GitHub] spark issue #14313: [SPARK-16674][SQL] Avoid per-record type dispatch in JDB...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14313 **[Test build #62760 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62760/consoleFull)** for PR 14313 at commit [`5335093`](https://github.com/apache/spark/commit/53350935b476e2a30dfd03f7fbfe857e6c4316d0).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #14324: [SPARK-16664][SQL] Fix persist call on Data frames with ...
Github user lw-lin commented on the issue: https://github.com/apache/spark/pull/14324 @breakdawn what else can we do to actually fix the ≥ 8118 cols issue? We're actually running out of the constant pool when we compile the generated code. So maybe compile it into multiple classes? Or just fall back to the non-code-gen path?
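The second option mentioned here (fall back to a non-code-generated path when compilation fails) is essentially a try/except around the compiler. The sketch below is a hedged Python illustration with invented names; a hard column threshold stands in for the JVM's actual constant-pool overflow error, which in reality surfaces at compile time rather than from a fixed count:

```python
class CodegenLimitError(Exception):
    """Stand-in for the JVM error raised when generated code overflows
    a class-file limit such as the constant pool (illustrative only)."""

MAX_COLS = 8117  # hypothetical threshold, echoing the ~8118-column report

def compile_generated(num_cols):
    # Pretend compiler: refuses schemas that would overflow the limit.
    if num_cols > MAX_COLS:
        raise CodegenLimitError(f"{num_cols} columns overflow the constant pool")
    return lambda row: tuple(row)      # "fast" compiled evaluator

def interpreted(row):
    return tuple(row)                  # slow but always-safe path

def build_evaluator(num_cols):
    # Fallback strategy: try codegen first, degrade gracefully on failure.
    try:
        return compile_generated(num_cols)
    except CodegenLimitError:
        return interpreted
```

Both evaluators compute the same result; the fallback only trades speed for the guarantee that wide schemas still work.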
[GitHub] spark issue #14313: [SPARK-16674][SQL] Avoid per-record type dispatch in JDB...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14313 **[Test build #62760 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62760/consoleFull)** for PR 14313 at commit [`5335093`](https://github.com/apache/spark/commit/53350935b476e2a30dfd03f7fbfe857e6c4316d0).
[GitHub] spark issue #9457: [SPARK-11496][GRAPHX] Parallel implementation of personal...
Github user moustaki commented on the issue: https://github.com/apache/spark/pull/9457 Thanks @dbtsai!
[GitHub] spark issue #14319: [SPARK-16635] [WEBUI] [SQL] [WIP] Provide Session suppor...
Github user nblintao commented on the issue: https://github.com/apache/spark/pull/14319 I'm doing more study to figure out the requirements for providing per-session information to users, so I decided to close this PR for now. The work will continue after we figure out how this page should be designed. Thanks @yhuai, @ajbozarth, and others involved.
[GitHub] spark pull request #14319: [SPARK-16635] [WEBUI] [SQL] [WIP] Provide Session...
Github user nblintao closed the pull request at: https://github.com/apache/spark/pull/14319
[GitHub] spark pull request #14313: [SPARK-16674][SQL] Avoid per-record type dispatch...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/14313#discussion_r71980099
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCRDD.scala ---
@@ -407,84 +495,8 @@ private[sql] class JDBCRDD( var i = 0 while (i < conversions.length) {
--- End diff --
Using functional transformations is generally slower due to virtual function calls. This part is executed a lot, and such overhead would become really heavy. See https://github.com/databricks/scala-style-guide#traversal-and-zipwithindex
[GitHub] spark pull request #14313: [SPARK-16674][SQL] Avoid per-record type dispatch...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/14313#discussion_r71980089 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCRDD.scala --- @@ -322,46 +322,134 @@ private[sql] class JDBCRDD( } } - // Each JDBC-to-Catalyst conversion corresponds to a tag defined here so that - // we don't have to potentially poke around in the Metadata once for every - // row. - // Is there a better way to do this? I'd rather be using a type that - // contains only the tags I define. - abstract class JDBCConversion - case object BooleanConversion extends JDBCConversion - case object DateConversion extends JDBCConversion - case class DecimalConversion(precision: Int, scale: Int) extends JDBCConversion - case object DoubleConversion extends JDBCConversion - case object FloatConversion extends JDBCConversion - case object IntegerConversion extends JDBCConversion - case object LongConversion extends JDBCConversion - case object BinaryLongConversion extends JDBCConversion - case object StringConversion extends JDBCConversion - case object TimestampConversion extends JDBCConversion - case object BinaryConversion extends JDBCConversion - case class ArrayConversion(elementConversion: JDBCConversion) extends JDBCConversion + // A `JDBCConversion` is responsible for converting a value from `ResultSet` + // to a value in a field for `InternalRow`. + private type JDBCConversion = (ResultSet, Int) => Any + + // This `ArrayElementConversion` is responsible for converting elements in + // an array from `ResultSet`. + private type ArrayElementConversion = (Object) => Any /** - * Maps a StructType to a type tag list. + * Maps a StructType to conversions for each type. 
*/ def getConversions(schema: StructType): Array[JDBCConversion] = schema.fields.map(sf => getConversions(sf.dataType, sf.metadata)) private def getConversions(dt: DataType, metadata: Metadata): JDBCConversion = dt match { -case BooleanType => BooleanConversion -case DateType => DateConversion -case DecimalType.Fixed(p, s) => DecimalConversion(p, s) -case DoubleType => DoubleConversion -case FloatType => FloatConversion -case IntegerType => IntegerConversion -case LongType => if (metadata.contains("binarylong")) BinaryLongConversion else LongConversion -case StringType => StringConversion -case TimestampType => TimestampConversion -case BinaryType => BinaryConversion -case ArrayType(et, _) => ArrayConversion(getConversions(et, metadata)) +case BooleanType => + (rs: ResultSet, pos: Int) => rs.getBoolean(pos) + +case DateType => + (rs: ResultSet, pos: Int) => +// DateTimeUtils.fromJavaDate does not handle null value, so we need to check it. +val dateVal = rs.getDate(pos) +if (dateVal != null) { --- End diff -- I guess this is a critical path. I think we don't need to introduce virtual function calls.
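The pattern the patch adopts — resolve each column's conversion closure once per schema, then apply it per record — can be sketched in isolation. This is an illustrative toy (plain arrays of strings standing in for `java.sql.ResultSet`; `ColType`, `conversionFor`, and `convertRow` are invented names, not Spark's API):

```scala
// Toy model of per-schema (not per-record) type dispatch: pattern-match on
// the column type exactly once, capture the result as a closure, then run
// the hot per-row loop with plain function application.
sealed trait ColType
case object IntCol extends ColType
case object StrCol extends ColType

object ConversionSketch {
  // One conversion closure per column, analogous to the patch's
  // `(ResultSet, Int) => Any` type alias.
  type Conversion = (Array[String], Int) => Any

  def conversionFor(t: ColType): Conversion = t match {
    case IntCol => (row, pos) => row(pos).toInt
    case StrCol => (row, pos) => row(pos)
  }

  def convertRow(schema: Array[ColType], row: Array[String]): Array[Any] = {
    val conversions = schema.map(conversionFor) // dispatch happens here, once
    val out = new Array[Any](row.length)
    var i = 0
    while (i < row.length) { // hot loop: no match, just apply the closure
      out(i) = conversions(i)(row, i)
      i += 1
    }
    out
  }
}
```

In the real code the closures would be built once per partition, so a scan of a million rows performs the `match` only once per column rather than once per field value.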
[GitHub] spark issue #14330: [SPARK-16693][SPARKR] Remove methods deprecated in 2.0.0...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14330 Merged build finished. Test PASSed.
[GitHub] spark issue #14330: [SPARK-16693][SPARKR] Remove methods deprecated in 2.0.0...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14330 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62759/ Test PASSed.
[GitHub] spark issue #14330: [SPARK-16693][SPARKR] Remove methods deprecated in 2.0.0...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14330 **[Test build #62759 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62759/consoleFull)** for PR 14330 at commit [`9cc2260`](https://github.com/apache/spark/commit/9cc22601569c85928505164d47a0d01371244f8d). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #14330: [SPARK-16693][SPARKR] Remove methods deprecated in 2.0.0...
Github user markhamstra commented on the issue: https://github.com/apache/spark/pull/14330 See JIRA comment.
[GitHub] spark issue #14330: [SPARK-16693][SPARKR] Remove methods deprecated in 2.0.0...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14330 **[Test build #62759 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62759/consoleFull)** for PR 14330 at commit [`9cc2260`](https://github.com/apache/spark/commit/9cc22601569c85928505164d47a0d01371244f8d).
[GitHub] spark issue #14330: [SPARK-16693][SPARKR] Remove methods deprecated in 2.0.0...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14330 **[Test build #62758 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62758/consoleFull)** for PR 14330 at commit [`3162215`](https://github.com/apache/spark/commit/31622151eae38644ccd22051872797e5632bf5c8). * This patch **fails some tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #14330: [SPARK-16693][SPARKR] Remove methods deprecated in 2.0.0...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14330 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62758/ Test FAILed.
[GitHub] spark issue #14330: [SPARK-16693][SPARKR] Remove methods deprecated in 2.0.0...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14330 Merged build finished. Test FAILed.
[GitHub] spark issue #14330: [SPARK-16693][SPARKR] Remove methods deprecated in 2.0.0...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14330 **[Test build #62758 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62758/consoleFull)** for PR 14330 at commit [`3162215`](https://github.com/apache/spark/commit/31622151eae38644ccd22051872797e5632bf5c8).
[GitHub] spark pull request #14330: [SPARK-16693][SPARKR] Remove methods deprecated i...
GitHub user felixcheung opened a pull request: https://github.com/apache/spark/pull/14330 [SPARK-16693][SPARKR] Remove methods deprecated in 2.0.0 or before ## What changes were proposed in this pull request? Remove deprecated functions, which include: SQLContext/HiveContext stuff, sparkR.init, jsonFile, parquetFile ## How was this patch tested? unit tests @shivaram You can merge this pull request into a Git repository by running: $ git pull https://github.com/felixcheung/spark rremovedeprecate Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/14330.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #14330 commit fa92cc10df02345f114c33ccb9cd7846a94236f9 Author: Felix Cheung Date: 2016-07-23T20:47:20Z remove deprecated functions, and back compat dispatching for omitting sparkContext or sqlContext parameters commit 503c93560651d8479caffb48b56415405ba3869c Author: Felix Cheung Date: 2016-07-23T21:34:00Z fix namespace file commit 31622151eae38644ccd22051872797e5632bf5c8 Author: Felix Cheung Date: 2016-07-23T22:14:27Z fix test
[GitHub] spark issue #14275: [SPARK-16637] Unified containerizer
Github user skonto commented on the issue: https://github.com/apache/spark/pull/14275 LGTM.
[GitHub] spark pull request #14275: [SPARK-16637] Unified containerizer
Github user skonto commented on a diff in the pull request: https://github.com/apache/spark/pull/14275#discussion_r71978483 --- Diff: core/src/main/scala/org/apache/spark/deploy/mesos/MesosDriverDescription.scala --- @@ -40,24 +41,28 @@ private[spark] class MesosDriverDescription( val cores: Double, val supervise: Boolean, val command: Command, -val schedulerProperties: Map[String, String], +schedulerProperties: Map[String, String], --- End diff -- That was my point too above. No public getters are generated if no `val` is used, and this is justified when the field is not used outside of the class.
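The `val`-versus-plain-parameter distinction above can be shown with a minimal example (illustrative only, not Spark code): dropping `val` from a constructor parameter removes the generated public getter, so the argument stays private to the class body.

```scala
// A `val` constructor parameter becomes a public field with a getter;
// a bare parameter is just an argument, visible only inside the class.
class WithVal(val conf: Map[String, String])

class WithoutVal(conf: Map[String, String]) {
  // `conf` is still usable internally, e.g. to derive other state:
  val size: Int = conf.size
  // ...but callers cannot access `conf` itself.
}

val a = new WithVal(Map("k" -> "v"))
val k = a.conf("k") // compiles: `val` generated a getter
val b = new WithoutVal(Map("k" -> "v"))
// b.conf           // would NOT compile: no getter was generated
```

This is why removing `val` is a reasonable way to shrink a class's public surface when the parameter is only consumed during construction.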
[GitHub] spark issue #14307: [SPARK-16672][SQL] SQLBuilder should not raise exception...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14307 Merged build finished. Test PASSed.
[GitHub] spark issue #14329: [SPARKR][DOCS] fix broken url in doc
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14329 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62757/ Test PASSed.
[GitHub] spark issue #14307: [SPARK-16672][SQL] SQLBuilder should not raise exception...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14307 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62756/ Test PASSed.
[GitHub] spark issue #14329: [SPARKR][DOCS] fix broken url in doc
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14329 Merged build finished. Test PASSed.
[GitHub] spark issue #14329: [SPARKR][DOCS] fix broken url in doc
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14329 **[Test build #62757 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62757/consoleFull)** for PR 14329 at commit [`06d8b41`](https://github.com/apache/spark/commit/06d8b415a3bce4c997683defce87b4833b56b1a9). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #14307: [SPARK-16672][SQL] SQLBuilder should not raise exception...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14307 **[Test build #62756 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62756/consoleFull)** for PR 14307 at commit [`70f5401`](https://github.com/apache/spark/commit/70f5401e5d1a606117f85b1caa6c29724c623dff). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #14086: [SPARK-16463][SQL] Support `truncate` option in Overwrit...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14086 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62755/ Test PASSed.
[GitHub] spark issue #14086: [SPARK-16463][SQL] Support `truncate` option in Overwrit...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14086 Merged build finished. Test PASSed.
[GitHub] spark issue #14086: [SPARK-16463][SQL] Support `truncate` option in Overwrit...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14086 **[Test build #62755 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62755/consoleFull)** for PR 14086 at commit [`8b452cb`](https://github.com/apache/spark/commit/8b452cb51814ed196a0cd16312074de3ea28330d). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #14329: [SPARKR][DOCS] fix broken url in doc
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14329 **[Test build #62757 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62757/consoleFull)** for PR 14329 at commit [`06d8b41`](https://github.com/apache/spark/commit/06d8b415a3bce4c997683defce87b4833b56b1a9).
[GitHub] spark issue #14329: [SPARKR][DOCS] fix broken url in doc
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/14329 @shivaram
[GitHub] spark pull request #14329: [SPARKR][DOCS] fix broken url in doc
GitHub user felixcheung opened a pull request: https://github.com/apache/spark/pull/14329 [SPARKR][DOCS] fix broken url in doc ## What changes were proposed in this pull request? Fix broken url; also, the sparkR.session.stop Rd should have it in the header ![image](https://cloud.githubusercontent.com/assets/8969467/17080129/26d41308-50d9-11e6-8967-79d6c920313f.png) The Data type section is in the middle of a list of gapply/gapplyCollect subsections: ![image](https://cloud.githubusercontent.com/assets/8969467/17080122/f992d00a-50d8-11e6-8f2c-fd5786213920.png) ## How was this patch tested? manual test You can merge this pull request into a Git repository by running: $ git pull https://github.com/felixcheung/spark rdoclinkfix Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/14329.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #14329 commit 40ca13c17e8e97732733e7bc200254459920d2f9 Author: Felix Cheung Date: 2016-07-23T20:13:49Z doc fix commit 06d8b415a3bce4c997683defce87b4833b56b1a9 Author: Felix Cheung Date: 2016-07-23T20:20:21Z Merge branch 'master' of https://github.com/apache/spark into rdoclinkfix
[GitHub] spark pull request #14313: [SPARK-16674][SQL] Avoid per-record type dispatch...
Github user jaceklaskowski commented on a diff in the pull request: https://github.com/apache/spark/pull/14313#discussion_r71977368 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCRDD.scala --- @@ -407,84 +495,8 @@ private[sql] class JDBCRDD( var i = 0 while (i < conversions.length) { --- End diff -- Why `while` not `foreach` or similar?
[GitHub] spark pull request #14313: [SPARK-16674][SQL] Avoid per-record type dispatch...
Github user jaceklaskowski commented on a diff in the pull request: https://github.com/apache/spark/pull/14313#discussion_r71977344 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCRDD.scala --- @@ -322,46 +322,134 @@ private[sql] class JDBCRDD( } } - // Each JDBC-to-Catalyst conversion corresponds to a tag defined here so that - // we don't have to potentially poke around in the Metadata once for every - // row. - // Is there a better way to do this? I'd rather be using a type that - // contains only the tags I define. - abstract class JDBCConversion - case object BooleanConversion extends JDBCConversion - case object DateConversion extends JDBCConversion - case class DecimalConversion(precision: Int, scale: Int) extends JDBCConversion - case object DoubleConversion extends JDBCConversion - case object FloatConversion extends JDBCConversion - case object IntegerConversion extends JDBCConversion - case object LongConversion extends JDBCConversion - case object BinaryLongConversion extends JDBCConversion - case object StringConversion extends JDBCConversion - case object TimestampConversion extends JDBCConversion - case object BinaryConversion extends JDBCConversion - case class ArrayConversion(elementConversion: JDBCConversion) extends JDBCConversion + // A `JDBCConversion` is responsible for converting a value from `ResultSet` + // to a value in a field for `InternalRow`. + private type JDBCConversion = (ResultSet, Int) => Any + + // This `ArrayElementConversion` is responsible for converting elements in + // an array from `ResultSet`. + private type ArrayElementConversion = (Object) => Any /** - * Maps a StructType to a type tag list. + * Maps a StructType to conversions for each type. 
*/ def getConversions(schema: StructType): Array[JDBCConversion] = schema.fields.map(sf => getConversions(sf.dataType, sf.metadata)) private def getConversions(dt: DataType, metadata: Metadata): JDBCConversion = dt match { -case BooleanType => BooleanConversion -case DateType => DateConversion -case DecimalType.Fixed(p, s) => DecimalConversion(p, s) -case DoubleType => DoubleConversion -case FloatType => FloatConversion -case IntegerType => IntegerConversion -case LongType => if (metadata.contains("binarylong")) BinaryLongConversion else LongConversion -case StringType => StringConversion -case TimestampType => TimestampConversion -case BinaryType => BinaryConversion -case ArrayType(et, _) => ArrayConversion(getConversions(et, metadata)) +case BooleanType => + (rs: ResultSet, pos: Int) => rs.getBoolean(pos) + +case DateType => + (rs: ResultSet, pos: Int) => +// DateTimeUtils.fromJavaDate does not handle null value, so we need to check it. +val dateVal = rs.getDate(pos) +if (dateVal != null) { + DateTimeUtils.fromJavaDate(dateVal) +} else { + null +} + +case DecimalType.Fixed(p, s) => + (rs: ResultSet, pos: Int) => +val decimalVal = rs.getBigDecimal(pos) +if (decimalVal == null) { + null +} else { + Decimal(decimalVal, p, s) +} + +case DoubleType => + (rs: ResultSet, pos: Int) => rs.getDouble(pos) + +case FloatType => + (rs: ResultSet, pos: Int) => rs.getFloat(pos) + +case IntegerType => + (rs: ResultSet, pos: Int) => rs.getInt(pos) + +case LongType if metadata.contains("binarylong") => + (rs: ResultSet, pos: Int) => +val bytes = rs.getBytes(pos) +var ans = 0L +var j = 0 +while (j < bytes.size) { + ans = 256 * ans + (255 & bytes(j)) + j = j + 1 +} +ans + +case LongType => + (rs: ResultSet, pos: Int) => rs.getLong(pos) + +case StringType => + (rs: ResultSet, pos: Int) => +// TODO(davies): use getBytes for better performance, if the encoding is UTF-8 +UTF8String.fromString(rs.getString(pos)) + +case TimestampType => + (rs: ResultSet, pos: Int) => +val t = 
rs.getTimestamp(pos) +if (t != null) { + DateTimeUtils.fromJavaTimestamp(t) +} else { + null +} + +case BinaryType => + (rs: ResultSet, pos: Int) => rs.getBytes(pos) + +case ArrayType(et, _) => + val elementConversion: ArrayElementConversion = getArrayElementConversion(et, metadata) + (rs: ResultSet, pos: Int) =>
[GitHub] spark pull request #14313: [SPARK-16674][SQL] Avoid per-record type dispatch...
Github user jaceklaskowski commented on a diff in the pull request: https://github.com/apache/spark/pull/14313#discussion_r71977337 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCRDD.scala --- @@ -322,46 +322,134 @@ private[sql] class JDBCRDD( } } - // Each JDBC-to-Catalyst conversion corresponds to a tag defined here so that - // we don't have to potentially poke around in the Metadata once for every - // row. - // Is there a better way to do this? I'd rather be using a type that - // contains only the tags I define. - abstract class JDBCConversion - case object BooleanConversion extends JDBCConversion - case object DateConversion extends JDBCConversion - case class DecimalConversion(precision: Int, scale: Int) extends JDBCConversion - case object DoubleConversion extends JDBCConversion - case object FloatConversion extends JDBCConversion - case object IntegerConversion extends JDBCConversion - case object LongConversion extends JDBCConversion - case object BinaryLongConversion extends JDBCConversion - case object StringConversion extends JDBCConversion - case object TimestampConversion extends JDBCConversion - case object BinaryConversion extends JDBCConversion - case class ArrayConversion(elementConversion: JDBCConversion) extends JDBCConversion + // A `JDBCConversion` is responsible for converting a value from `ResultSet` + // to a value in a field for `InternalRow`. + private type JDBCConversion = (ResultSet, Int) => Any + + // This `ArrayElementConversion` is responsible for converting elements in + // an array from `ResultSet`. + private type ArrayElementConversion = (Object) => Any /** - * Maps a StructType to a type tag list. + * Maps a StructType to conversions for each type. 
*/ def getConversions(schema: StructType): Array[JDBCConversion] = schema.fields.map(sf => getConversions(sf.dataType, sf.metadata)) private def getConversions(dt: DataType, metadata: Metadata): JDBCConversion = dt match { -case BooleanType => BooleanConversion -case DateType => DateConversion -case DecimalType.Fixed(p, s) => DecimalConversion(p, s) -case DoubleType => DoubleConversion -case FloatType => FloatConversion -case IntegerType => IntegerConversion -case LongType => if (metadata.contains("binarylong")) BinaryLongConversion else LongConversion -case StringType => StringConversion -case TimestampType => TimestampConversion -case BinaryType => BinaryConversion -case ArrayType(et, _) => ArrayConversion(getConversions(et, metadata)) +case BooleanType => + (rs: ResultSet, pos: Int) => rs.getBoolean(pos) + +case DateType => + (rs: ResultSet, pos: Int) => +// DateTimeUtils.fromJavaDate does not handle null value, so we need to check it. +val dateVal = rs.getDate(pos) +if (dateVal != null) { + DateTimeUtils.fromJavaDate(dateVal) +} else { + null +} + +case DecimalType.Fixed(p, s) => + (rs: ResultSet, pos: Int) => +val decimalVal = rs.getBigDecimal(pos) +if (decimalVal == null) { + null +} else { + Decimal(decimalVal, p, s) +} + +case DoubleType => + (rs: ResultSet, pos: Int) => rs.getDouble(pos) + +case FloatType => + (rs: ResultSet, pos: Int) => rs.getFloat(pos) + +case IntegerType => + (rs: ResultSet, pos: Int) => rs.getInt(pos) + +case LongType if metadata.contains("binarylong") => + (rs: ResultSet, pos: Int) => +val bytes = rs.getBytes(pos) +var ans = 0L +var j = 0 +while (j < bytes.size) { + ans = 256 * ans + (255 & bytes(j)) + j = j + 1 +} +ans + +case LongType => + (rs: ResultSet, pos: Int) => rs.getLong(pos) + +case StringType => + (rs: ResultSet, pos: Int) => +// TODO(davies): use getBytes for better performance, if the encoding is UTF-8 +UTF8String.fromString(rs.getString(pos)) + +case TimestampType => + (rs: ResultSet, pos: Int) => +val t = 
rs.getTimestamp(pos) +if (t != null) { --- End diff -- same as above ---
[GitHub] spark pull request #14313: [SPARK-16674][SQL] Avoid per-record type dispatch...
Github user jaceklaskowski commented on a diff in the pull request: https://github.com/apache/spark/pull/14313#discussion_r71977329 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCRDD.scala --- @@ -322,46 +322,134 @@ private[sql] class JDBCRDD( } } - // Each JDBC-to-Catalyst conversion corresponds to a tag defined here so that - // we don't have to potentially poke around in the Metadata once for every - // row. - // Is there a better way to do this? I'd rather be using a type that - // contains only the tags I define. - abstract class JDBCConversion - case object BooleanConversion extends JDBCConversion - case object DateConversion extends JDBCConversion - case class DecimalConversion(precision: Int, scale: Int) extends JDBCConversion - case object DoubleConversion extends JDBCConversion - case object FloatConversion extends JDBCConversion - case object IntegerConversion extends JDBCConversion - case object LongConversion extends JDBCConversion - case object BinaryLongConversion extends JDBCConversion - case object StringConversion extends JDBCConversion - case object TimestampConversion extends JDBCConversion - case object BinaryConversion extends JDBCConversion - case class ArrayConversion(elementConversion: JDBCConversion) extends JDBCConversion + // A `JDBCConversion` is responsible for converting a value from `ResultSet` + // to a value in a field for `InternalRow`. + private type JDBCConversion = (ResultSet, Int) => Any + + // This `ArrayElementConversion` is responsible for converting elements in + // an array from `ResultSet`. + private type ArrayElementConversion = (Object) => Any /** - * Maps a StructType to a type tag list. + * Maps a StructType to conversions for each type. 
*/ def getConversions(schema: StructType): Array[JDBCConversion] = schema.fields.map(sf => getConversions(sf.dataType, sf.metadata)) private def getConversions(dt: DataType, metadata: Metadata): JDBCConversion = dt match { -case BooleanType => BooleanConversion -case DateType => DateConversion -case DecimalType.Fixed(p, s) => DecimalConversion(p, s) -case DoubleType => DoubleConversion -case FloatType => FloatConversion -case IntegerType => IntegerConversion -case LongType => if (metadata.contains("binarylong")) BinaryLongConversion else LongConversion -case StringType => StringConversion -case TimestampType => TimestampConversion -case BinaryType => BinaryConversion -case ArrayType(et, _) => ArrayConversion(getConversions(et, metadata)) +case BooleanType => + (rs: ResultSet, pos: Int) => rs.getBoolean(pos) + +case DateType => + (rs: ResultSet, pos: Int) => +// DateTimeUtils.fromJavaDate does not handle null value, so we need to check it. +val dateVal = rs.getDate(pos) +if (dateVal != null) { + DateTimeUtils.fromJavaDate(dateVal) +} else { + null +} + +case DecimalType.Fixed(p, s) => + (rs: ResultSet, pos: Int) => +val decimalVal = rs.getBigDecimal(pos) +if (decimalVal == null) { --- End diff -- Same as above (plus you're checking equality with `null` opposite to the above -- consistency violated) --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
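The point of the patch discussed above is to resolve each column's conversion once per schema, so the per-row loop applies prebuilt closures instead of pattern-matching a type tag for every record. A rough, Spark-free Python sketch of that idea (all names here are illustrative, not the actual Spark API):

```python
# Build one converter closure per column up front; the per-row loop then
# just applies them, with no per-record type dispatch.
def make_converter(data_type):
    if data_type == "int":
        return lambda row, pos: int(row[pos])
    if data_type == "string":
        return lambda row, pos: str(row[pos])
    if data_type == "double":
        return lambda row, pos: float(row[pos])
    raise ValueError(f"unsupported type: {data_type}")

def convert_rows(schema, rows):
    converters = [make_converter(dt) for dt in schema]  # resolved once
    return [
        tuple(conv(row, i) for i, conv in enumerate(converters))
        for row in rows
    ]

rows = [("1", "a", "2.5"), ("2", "b", "3.0")]
print(convert_rows(["int", "string", "double"], rows))
```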
[GitHub] spark pull request #14313: [SPARK-16674][SQL] Avoid per-record type dispatch...
Github user jaceklaskowski commented on a diff in the pull request: https://github.com/apache/spark/pull/14313#discussion_r71977310 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCRDD.scala --- @@ -322,46 +322,134 @@ private[sql] class JDBCRDD( } } - // Each JDBC-to-Catalyst conversion corresponds to a tag defined here so that - // we don't have to potentially poke around in the Metadata once for every - // row. - // Is there a better way to do this? I'd rather be using a type that - // contains only the tags I define. - abstract class JDBCConversion - case object BooleanConversion extends JDBCConversion - case object DateConversion extends JDBCConversion - case class DecimalConversion(precision: Int, scale: Int) extends JDBCConversion - case object DoubleConversion extends JDBCConversion - case object FloatConversion extends JDBCConversion - case object IntegerConversion extends JDBCConversion - case object LongConversion extends JDBCConversion - case object BinaryLongConversion extends JDBCConversion - case object StringConversion extends JDBCConversion - case object TimestampConversion extends JDBCConversion - case object BinaryConversion extends JDBCConversion - case class ArrayConversion(elementConversion: JDBCConversion) extends JDBCConversion + // A `JDBCConversion` is responsible for converting a value from `ResultSet` + // to a value in a field for `InternalRow`. + private type JDBCConversion = (ResultSet, Int) => Any + + // This `ArrayElementConversion` is responsible for converting elements in + // an array from `ResultSet`. + private type ArrayElementConversion = (Object) => Any /** - * Maps a StructType to a type tag list. + * Maps a StructType to conversions for each type. 
*/ def getConversions(schema: StructType): Array[JDBCConversion] = schema.fields.map(sf => getConversions(sf.dataType, sf.metadata)) private def getConversions(dt: DataType, metadata: Metadata): JDBCConversion = dt match { -case BooleanType => BooleanConversion -case DateType => DateConversion -case DecimalType.Fixed(p, s) => DecimalConversion(p, s) -case DoubleType => DoubleConversion -case FloatType => FloatConversion -case IntegerType => IntegerConversion -case LongType => if (metadata.contains("binarylong")) BinaryLongConversion else LongConversion -case StringType => StringConversion -case TimestampType => TimestampConversion -case BinaryType => BinaryConversion -case ArrayType(et, _) => ArrayConversion(getConversions(et, metadata)) +case BooleanType => + (rs: ResultSet, pos: Int) => rs.getBoolean(pos) + +case DateType => + (rs: ResultSet, pos: Int) => +// DateTimeUtils.fromJavaDate does not handle null value, so we need to check it. +val dateVal = rs.getDate(pos) +if (dateVal != null) { --- End diff -- `Option(dateVal).map(...).orNull`? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
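The `Option(dateVal).map(...).orNull` suggestion above factors the repeated null guard into a single expression. The same idea as a tiny helper in Python (an illustrative sketch; `from_java_date` is a hypothetical stand-in for `DateTimeUtils.fromJavaDate`, which returns days since the epoch):

```python
import datetime

def null_safe(f):
    """Wrap a conversion so None passes through, like Option(x).map(f).orNull."""
    return lambda x: None if x is None else f(x)

# Hypothetical stand-in for DateTimeUtils.fromJavaDate: days since 1970-01-01.
from_java_date = null_safe(lambda d: (d - datetime.date(1970, 1, 1)).days)

print(from_java_date(None))                       # None
print(from_java_date(datetime.date(1970, 1, 2)))  # 1
```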
[GitHub] spark issue #14307: [SPARK-16672][SQL] SQLBuilder should not raise exception...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14307 **[Test build #62756 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62756/consoleFull)** for PR 14307 at commit [`70f5401`](https://github.com/apache/spark/commit/70f5401e5d1a606117f85b1caa6c29724c623dff).
[GitHub] spark pull request #14307: [SPARK-16672][SQL] SQLBuilder should not raise ex...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/14307#discussion_r71976751 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/SQLBuilderSuite.scala --- @@ -0,0 +1,33 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.hive + +import org.apache.spark.sql.QueryTest +import org.apache.spark.sql.catalyst.SQLBuilder +import org.apache.spark.sql.hive.test.TestHiveSingleton +import org.apache.spark.sql.test.SQLTestUtils + +class SQLBuilderSuite extends QueryTest with SQLTestUtils with TestHiveSingleton { --- End diff -- Oh, I see.
[GitHub] spark issue #14182: [SPARK-16444][WIP][SparkR]: Isotonic Regression wrapper ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14182 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62754/ Test FAILed.
[GitHub] spark issue #14086: [SPARK-16463][SQL] Support `truncate` option in Overwrit...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14086 **[Test build #62755 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62755/consoleFull)** for PR 14086 at commit [`8b452cb`](https://github.com/apache/spark/commit/8b452cb51814ed196a0cd16312074de3ea28330d).
[GitHub] spark issue #14182: [SPARK-16444][WIP][SparkR]: Isotonic Regression wrapper ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14182 Merged build finished. Test FAILed.
[GitHub] spark issue #14182: [SPARK-16444][WIP][SparkR]: Isotonic Regression wrapper ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14182 **[Test build #62754 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62754/consoleFull)** for PR 14182 at commit [`7f68211`](https://github.com/apache/spark/commit/7f68211e362677e3599f4af7d574962b06611ab5). * This patch **fails R style tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #14182: [SPARK-16444][WIP][SparkR]: Isotonic Regression wrapper ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14182 **[Test build #62754 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62754/consoleFull)** for PR 14182 at commit [`7f68211`](https://github.com/apache/spark/commit/7f68211e362677e3599f4af7d574962b06611ab5).
[GitHub] spark pull request #14086: [SPARK-16463][SQL] Support `truncate` option in O...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/14086#discussion_r71976641 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCWriteSuite.scala --- @@ -145,14 +153,24 @@ class JDBCWriteSuite extends SharedSQLContext with BeforeAndAfter { assert(2 === spark.read.jdbc(url, "TEST.APPENDTEST", new Properties()).collect()(0).length) } - test("CREATE then INSERT to truncate") { + test("Truncate") { +JdbcDialects.registerDialect(testH2Dialect) val df = spark.createDataFrame(sparkContext.parallelize(arr2x2), schema2) val df2 = spark.createDataFrame(sparkContext.parallelize(arr1x2), schema2) +val df3 = spark.createDataFrame(sparkContext.parallelize(arr2x3), schema3) df.write.jdbc(url1, "TEST.TRUNCATETEST", properties) -df2.write.mode(SaveMode.Overwrite).jdbc(url1, "TEST.TRUNCATETEST", properties) +df2.write.mode(SaveMode.Overwrite).option("truncate", true) + .jdbc(url1, "TEST.TRUNCATETEST", properties) assert(1 === spark.read.jdbc(url1, "TEST.TRUNCATETEST", properties).count()) assert(2 === spark.read.jdbc(url1, "TEST.TRUNCATETEST", properties).collect()(0).length) + +val m = intercept[SparkException] { --- End diff -- Sure, that would be better.
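The `intercept[SparkException] { ... }` pattern in the test above captures an expected exception so its message can be asserted. A minimal analog with the standard library's `unittest` (all names here are hypothetical stand-ins, not Spark's API):

```python
import unittest

class FakeWriter:
    """Hypothetical stand-in for a JDBC writer whose dialect cannot truncate."""
    def save(self, truncate=False):
        if truncate:
            raise RuntimeError("TRUNCATE TABLE is not supported by this dialect")

class TruncateTest(unittest.TestCase):
    def test_unsupported_truncate(self):
        # Like intercept[SparkException]: capture the exception, then
        # assert on its message.
        with self.assertRaises(RuntimeError) as cm:
            FakeWriter().save(truncate=True)
        self.assertIn("not supported", str(cm.exception))

unittest.main(argv=["truncate_test"], exit=False)
```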
[GitHub] spark pull request #14098: [SPARK-16380][SQL][Example]:Update SQL examples a...
Github user wangmiao1981 closed the pull request at: https://github.com/apache/spark/pull/14098
[GitHub] spark issue #14098: [SPARK-16380][SQL][Example]:Update SQL examples and prog...
Github user wangmiao1981 commented on the issue: https://github.com/apache/spark/pull/14098 As #14317 has been merged, I'm closing this PR.
[GitHub] spark pull request #14317: [SPARK-16380][EXAMPLES] Update SQL examples and p...
Github user wangmiao1981 commented on a diff in the pull request: https://github.com/apache/spark/pull/14317#discussion_r71976456 --- Diff: examples/src/main/python/sql/datasource.py --- @@ -0,0 +1,154 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +#http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +from __future__ import print_function + +from pyspark.sql import SparkSession +# $example on:schema_merging$ +from pyspark.sql import Row +# $example off:schema_merging$ + +""" +A simple example demonstrating Spark SQL data sources. 
+Run with: + ./bin/spark-submit examples/src/main/python/sql/datasource.py +""" + + +def basic_datasource_example(spark): +# $example on:generic_load_save_functions$ +df = spark.read.load("examples/src/main/resources/users.parquet") +df.select("name", "favorite_color").write.save("namesAndFavColors.parquet") +# $example off:generic_load_save_functions$ + +# $example on:manual_load_options$ +df = spark.read.load("examples/src/main/resources/people.json", format="json") +df.select("name", "age").write.save("namesAndAges.parquet", format="parquet") +# $example off:manual_load_options$ + +# $example on:direct_sql$ +df = spark.sql("SELECT * FROM parquet.`examples/src/main/resources/users.parquet`") +# $example off:direct_sql$ + + +def parquet_example(spark): +# $example on:basic_parquet_example$ +peopleDF = spark.read.json("examples/src/main/resources/people.json") + +# DataFrames can be saved as Parquet files, maintaining the schema information. +peopleDF.write.parquet("people.parquet") + +# Read in the Parquet file created above. +# Parquet files are self-describing so the schema is preserved. +# The result of loading a parquet file is also a DataFrame. +parquetFile = spark.read.parquet("people.parquet") + +# Parquet files can also be used to create a temporary view and then used in SQL statements. +parquetFile.createOrReplaceTempView("parquetFile") +teenagers = spark.sql("SELECT name FROM parquetFile WHERE age >= 13 AND age <= 19") +teenagers.show() +# +--+ +# | name| +# +--+ +# |Justin| +# +--+ +# $example off:basic_parquet_example$ + + +def parquet_schema_merging_example(spark): +# $example on:schema_merging$ +# spark is from the previous example. 
+# Create a simple DataFrame, stored into a partition directory +sc = spark.sparkContext + +squaresDF = spark.createDataFrame(sc.parallelize(range(1, 6)) + .map(lambda i: Row(single=i, double=i ** 2))) +squaresDF.write.parquet("data/test_table/key=1") + +# Create another DataFrame in a new partition directory, +# adding a new column and dropping an existing column +cubesDF = spark.createDataFrame(sc.parallelize(range(6, 11)) +.map(lambda i: Row(single=i, triple=i ** 3))) +cubesDF.write.parquet("data/test_table/key=2") + +# Read the partitioned table +mergedDF = spark.read.option("mergeSchema", "true").parquet("data/test_table") +mergedDF.printSchema() + +# The final schema consists of all 3 columns in the Parquet files together +# with the partitioning column appeared in the partition directory paths. +# root +# |-- double: long (nullable = true) +# |-- single: long (nullable = true) +# |-- triple: long (nullable = true) +# |-- key: integer (nullable = true) +# $example off:schema_merging$ + + +def json_dataset_examplg(spark): +# $example on:json_dataset$ +# spark is from the previous example. +sc = spark.sparkContext + +# A JSON dataset is pointed to by path. +# The path can be either a single text file or a directory storing text files +path =
[GitHub] spark pull request #14317: [SPARK-16380][EXAMPLES] Update SQL examples and p...
Github user wangmiao1981 commented on a diff in the pull request: https://github.com/apache/spark/pull/14317#discussion_r71976400 --- Diff: examples/src/main/python/sql/basic.py --- @@ -0,0 +1,194 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +#http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +from __future__ import print_function + +# $example on:init_session$ +from pyspark.sql import SparkSession +# $example off:init_session$ + +# $example on:schema_inferring$ +from pyspark.sql import Row +# $example off:schema_inferring$ + +# $example on:programmatic_schema$ +# Import data types +from pyspark.sql.types import * +# $example off:programmatic_schema$ + +""" +A simple example demonstrating basic Spark SQL features. 
+Run with: + ./bin/spark-submit examples/src/main/python/sql/basic.py +""" + + +def basic_df_example(spark): +# $example on:create_df$ +# spark is an existing SparkSession +df = spark.read.json("examples/src/main/resources/people.json") +# Displays the content of the DataFrame to stdout +df.show() +# ++---+ +# | age| name| +# ++---+ +# |null|Michael| +# | 30| Andy| +# | 19| Justin| +# ++---+ +# $example off:create_df$ + +# $example on:untyped_ops$ +# spark, df are from the previous example +# Print the schema in a tree format +df.printSchema() +# root +# |-- age: long (nullable = true) +# |-- name: string (nullable = true) + +# Select only the "name" column +df.select("name").show() +# +---+ +# | name| +# +---+ +# |Michael| +# | Andy| +# | Justin| +# +---+ + +# Select everybody, but increment the age by 1 +df.select(df['name'], df['age'] + 1).show() --- End diff -- Do you want to use `col('...')`. I have tested it and it works. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #14307: [SPARK-16672][SQL] SQLBuilder should not raise ex...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/14307#discussion_r71976388 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/SQLBuilderSuite.scala --- @@ -0,0 +1,33 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.hive + +import org.apache.spark.sql.QueryTest +import org.apache.spark.sql.catalyst.SQLBuilder +import org.apache.spark.sql.hive.test.TestHiveSingleton +import org.apache.spark.sql.test.SQLTestUtils + +class SQLBuilderSuite extends QueryTest with SQLTestUtils with TestHiveSingleton { --- End diff -- LogicalPlanToSQLSuite?
[GitHub] spark pull request #14317: [SPARK-16380][EXAMPLES] Update SQL examples and p...
Github user wangmiao1981 commented on a diff in the pull request: https://github.com/apache/spark/pull/14317#discussion_r71976352 --- Diff: docs/sql-programming-guide.md --- @@ -79,7 +79,7 @@ The entry point into all functionality in Spark is the [`SparkSession`](api/java The entry point into all functionality in Spark is the [`SparkSession`](api/python/pyspark.sql.html#pyspark.sql.SparkSession) class. To create a basic `SparkSession`, just use `SparkSession.builder`: -{% include_example init_session python/sql.py %} +{% include_example init_session python/sql/basic.py %} --- End diff -- The file name is not consistent with Scala and Java version. The file names are SparkSQLExample.scala and SparkSQLExample.java. The Hive and Data Source examples file names are not consistent either.
[GitHub] spark pull request #14317: [SPARK-16380][EXAMPLES] Update SQL examples and p...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/14317
[GitHub] spark issue #14318: [SPARK-16690][TEST] rename SQLTestUtils.withTempTable to...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/14318 I'm going to cherry pick this into branch-2.0 to avoid conflicts in bug fixes.
[GitHub] spark pull request #14318: [SPARK-16690][TEST] rename SQLTestUtils.withTempT...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/14318
[GitHub] spark issue #14317: [SPARK-16380][EXAMPLES] Update SQL examples and programm...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/14317 Merging in master/2.0.
[GitHub] spark issue #14098: [SPARK-16380][SQL][Example]:Update SQL examples and prog...
Github user wangmiao1981 commented on the issue: https://github.com/apache/spark/pull/14098 @liancheng Thanks! I will review the PR #14317
[GitHub] spark issue #14318: [SPARK-16690][TEST] rename SQLTestUtils.withTempTable to...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/14318 Merging in master/2.0.
[GitHub] spark issue #14328: [MINOR] Close old PRs that should be closed but have not...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/14328 LGTM
[GitHub] spark pull request #14326: [SPARK-3181] [ML] Implement RobustRegression with...
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/14326#discussion_r71975650

--- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/RobustRegression.scala ---
@@ -0,0 +1,466 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.ml.regression
+
+import scala.collection.mutable
+
+import breeze.linalg.{DenseVector => BDV}
+import breeze.optimize.{CachedDiffFunction, DiffFunction, LBFGS => BreezeLBFGS, LBFGSB => BreezeLBFGSB}
+
+import org.apache.spark.SparkException
+import org.apache.spark.annotation.Since
+import org.apache.spark.internal.Logging
+import org.apache.spark.ml.PredictorParams
+import org.apache.spark.ml.feature.Instance
+import org.apache.spark.ml.linalg.{Vector, Vectors}
+import org.apache.spark.ml.linalg.BLAS._
+import org.apache.spark.ml.param.{DoubleParam, ParamMap, ParamValidators}
+import org.apache.spark.ml.param.shared._
+import org.apache.spark.ml.util._
+import org.apache.spark.mllib.linalg.VectorImplicits._
+import org.apache.spark.mllib.stat.MultivariateOnlineSummarizer
+import org.apache.spark.rdd.RDD
+import org.apache.spark.sql.{Dataset, Row}
+import org.apache.spark.sql.functions._
+import org.apache.spark.storage.StorageLevel
+
+/**
+ * Params for robust regression.
+ */
+private[regression] trait RobustRegressionParams extends PredictorParams with HasRegParam
+  with HasMaxIter with HasTol with HasFitIntercept with HasStandardization with HasWeightCol {
+
+  /**
+   * The shape parameter to control the amount of robustness. Must be > 1.0.
+   * At larger values of M, the huber criterion becomes more similar to least squares regression;
+   * for small values of M, the criterion is more similar to L1 regression.
+   * Default is 1.35 to get as much robustness as possible while retaining
+   * 95% statistical efficiency for normally distributed data.
+   */
+  @Since("2.1.0")
+  final val m = new DoubleParam(this, "m", "The shape parameter to control the amount of " +
+    "robustness. Must be > 1.0.", ParamValidators.gt(1.0))
+
+  /** @group getParam */
+  @Since("2.1.0")
+  def getM: Double = $(m)
+}
+
+/**
+ * Robust regression.
+ *
+ * The learning objective is to minimize the huber loss, with regularization.
+ *
+ * The robust regression optimizes the squared loss for the samples where
+ * {{{ |\frac{(y - X \beta)}{\sigma}| \leq M }}}
+ * and the absolute loss for the samples where
+ * {{{ |\frac{(y - X \beta)}{\sigma}| \geq M }}},
+ * where \beta and \sigma are parameters to be optimized.
+ *
+ * This supports two types of regularization: None and L2.
+ *
+ * This estimator is different from the R implementation of Robust Regression
+ * ([[http://www.ats.ucla.edu/stat/r/dae/rreg.htm]]) because the R implementation does a
+ * weighted least squares implementation with weights given to each sample on the basis
+ * of how much the residual is greater than a certain threshold.
+ */
+@Since("2.1.0")
+class RobustRegression @Since("2.1.0") (@Since("2.1.0") override val uid: String)
+  extends Regressor[Vector, RobustRegression, RobustRegressionModel]
+  with RobustRegressionParams with Logging {
+
+  @Since("2.1.0")
+  def this() = this(Identifiable.randomUID("robReg"))
+
+  /**
+   * Sets the value of param [[m]].
+   * Default is 1.35.
+   * @group setParam
+   */
+  @Since("2.1.0")
+  def setM(value: Double): this.type = set(m, value)
+  setDefault(m -> 1.35)
+
+  /**
+   * Sets the regularization parameter.
+   * Default is 0.0.
+   * @group setParam
+   */
+  @Since("2.1.0")
+  def setRegParam(value: Double): this.type = set(regParam, value)
+  setDefault(regParam -> 0.0)
+
+  /**
+   * Sets if we should fit the intercept.
+   * Default is true.
+   * @group setParam
+   */
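The huber criterion described in the Scaladoc above (quadratic inside the threshold M, linear outside) can be sketched as a standalone function. This is an illustrative sketch only, not the actual RobustRegression internals; `HuberSketch` and `huberLoss` are assumed names.

```scala
// Illustrative sketch of the huber criterion from the Scaladoc above.
// For |residual| <= M the loss is quadratic (least-squares-like);
// beyond M it grows linearly (L1-like), which limits outlier influence.
// M defaults to 1.35, matching the param's documented default.
object HuberSketch {
  def huberLoss(residual: Double, m: Double = 1.35): Double = {
    val a = math.abs(residual)
    if (a <= m) 0.5 * residual * residual
    else m * (a - 0.5 * m) // linear tail; continuous and differentiable at |residual| = M
  }
}
```

The two branches agree in value and slope at the threshold, which is what makes the criterion smooth enough for L-BFGS-style optimizers.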
[GitHub] spark issue #14242: Add a comment
Github user kzhang28 commented on the issue: https://github.com/apache/spark/pull/14242 @srowen I closed it. Thank you for your kind reminder.
[GitHub] spark pull request #13986: [SPARK-16617] Upgrade to Avro 1.8.1
Github user benmccann closed the pull request at: https://github.com/apache/spark/pull/13986
[GitHub] spark issue #13986: [SPARK-16617] Upgrade to Avro 1.8.1
Github user benmccann commented on the issue: https://github.com/apache/spark/pull/13986 I'll close for now until Hadoop 3.x. Thanks
[GitHub] spark pull request #14242: Add a comment
Github user kzhang28 closed the pull request at: https://github.com/apache/spark/pull/14242
[GitHub] spark issue #14194: [SPARK-16485][DOC][ML] Fixed several inline formatting i...
Github user lins05 commented on the issue: https://github.com/apache/spark/pull/14194 @jkbradley Could you please take a look at this simple fix?
[GitHub] spark issue #14327: [SPARK-16686][SQL] Project shouldn't be pushed down thro...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14327 Merged build finished. Test PASSed.
[GitHub] spark issue #14327: [SPARK-16686][SQL] Project shouldn't be pushed down thro...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14327 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62753/ Test PASSed.
[GitHub] spark issue #14327: [SPARK-16686][SQL] Project shouldn't be pushed down thro...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14327 **[Test build #62753 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62753/consoleFull)** for PR 14327 at commit [`6d1616d`](https://github.com/apache/spark/commit/6d1616d41cc1158089ac0f38a6402a0fef58b191).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #14324: [SPARK-16664][SQL] Fix persist call on Data frames with ...
Github user breakdawn commented on the issue: https://github.com/apache/spark/pull/14324 @lw-lin umm, thanks for pointing it out. Since the limit is 8117, 1 will fail, so that case needs an update.
[GitHub] spark pull request #14086: [SPARK-16463][SQL] Support `truncate` option in O...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/14086#discussion_r71973330

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCWriteSuite.scala ---
@@ -145,14 +153,24 @@ class JDBCWriteSuite extends SharedSQLContext with BeforeAndAfter {
     assert(2 === spark.read.jdbc(url, "TEST.APPENDTEST", new Properties()).collect()(0).length)
   }

-  test("CREATE then INSERT to truncate") {
+  test("Truncate") {
+    JdbcDialects.registerDialect(testH2Dialect)
     val df = spark.createDataFrame(sparkContext.parallelize(arr2x2), schema2)
     val df2 = spark.createDataFrame(sparkContext.parallelize(arr1x2), schema2)
+    val df3 = spark.createDataFrame(sparkContext.parallelize(arr2x3), schema3)
     df.write.jdbc(url1, "TEST.TRUNCATETEST", properties)
-    df2.write.mode(SaveMode.Overwrite).jdbc(url1, "TEST.TRUNCATETEST", properties)
+    df2.write.mode(SaveMode.Overwrite).option("truncate", true)
+      .jdbc(url1, "TEST.TRUNCATETEST", properties)
     assert(1 === spark.read.jdbc(url1, "TEST.TRUNCATETEST", properties).count())
     assert(2 === spark.read.jdbc(url1, "TEST.TRUNCATETEST", properties).collect()(0).length)
+
+    val m = intercept[SparkException] {
--- End diff --
To check my understanding here, this overwrites the table with a different schema (new column `seq`), showing that the truncate fails because the schema has changed. It would be nice to also test the case where the truncate works, though we can't really test whether it truncates vs. drops. Could you, for example, just repeat the code on lines 163-166 here to verify that overwriting results in the same contents?
[GitHub] spark issue #14328: [MINOR] Close old PRs that should be closed but have not...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14328 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62752/ Test PASSed.
[GitHub] spark issue #14328: [MINOR] Close old PRs that should be closed but have not...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14328 Merged build finished. Test PASSed.
[GitHub] spark issue #14328: [MINOR] Close old PRs that should be closed but have not...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14328 **[Test build #62752 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62752/consoleFull)** for PR 14328 at commit [`c5a50bd`](https://github.com/apache/spark/commit/c5a50bd8f0947681f1cd2ceb2e14b6440f4f2ddc).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #14327: [SPARK-16686][SQL] Project shouldn't be pushed down thro...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14327 **[Test build #62753 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62753/consoleFull)** for PR 14327 at commit [`6d1616d`](https://github.com/apache/spark/commit/6d1616d41cc1158089ac0f38a6402a0fef58b191).
[GitHub] spark pull request #14327: [SPARK-16686][SQL] Project shouldn't be pushed do...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/14327#discussion_r71972546

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala ---
@@ -422,6 +422,35 @@ class DatasetSuite extends QueryTest with SharedSQLContext {
     3, 17, 27, 58, 62)
   }

+  test("SPARK-16686: Dataset.sample with seed results shouldn't depend on downstream usage") {
+    val udfOne = spark.udf.register("udfOne", (n: Int) => {
+      if (n == 1) {
+        throw new RuntimeException("udfOne shouldn't see swid=1!")
--- End diff --
Thanks! I've updated it.
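The invariant this test exercises — sampling with a fixed seed should produce the same rows no matter what is computed downstream — can be sketched outside Spark with a seeded RNG. This is an illustrative sketch only; `SeedSketch` and `sample` are assumed names, not Spark APIs.

```scala
// Illustrative sketch of deterministic seeded sampling: two runs with the
// same seed select exactly the same elements, so downstream operations
// (a projection, a UDF, etc.) cannot change which rows were sampled.
object SeedSketch {
  def sample(data: Seq[Int], fraction: Double, seed: Long): Seq[Int] = {
    val rng = new scala.util.Random(seed)
    // One RNG draw per element, in order, so the selection depends only on the seed.
    data.filter(_ => rng.nextDouble() < fraction)
  }
}
```

The bug being fixed in the PR was the opposite behavior: pushing a projection below the sample changed what the sampled rows were, which this sort of seeded determinism rules out.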
[GitHub] spark issue #14280: [SPARK-16515][SQL][FOLLOW-UP] Fix test `script` on OS X/...
Github user srowen commented on the issue: https://github.com/apache/spark/pull/14280 LGTM
[GitHub] spark issue #14320: [SPARK-16416] [Core] force eager creation of loggers to ...
Github user srowen commented on the issue: https://github.com/apache/spark/pull/14320 OK, seems reasonable to me as is.
[GitHub] spark issue #14327: [SPARK-16686][SQL] Project shouldn't be pushed down thro...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14327 Merged build finished. Test PASSed.
[GitHub] spark issue #14327: [SPARK-16686][SQL] Project shouldn't be pushed down thro...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14327 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62751/ Test PASSed.
[GitHub] spark issue #14327: [SPARK-16686][SQL] Project shouldn't be pushed down thro...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14327 **[Test build #62751 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62751/consoleFull)** for PR 14327 at commit [`9521a5a`](https://github.com/apache/spark/commit/9521a5aca87bead3dcfeabd7abe3468194984ea3).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.