[GitHub] spark issue #19691: [SPARK-14922][SPARK-17732][SQL]ALTER TABLE DROP PARTITIO...
Github user DazhuangSu commented on the issue: https://github.com/apache/spark/pull/19691 @maropu Sorry. I don't really have much time this month. I can close this pr and somebody can continue on this problem. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22320: [SPARK-25313][SQL]Fix regression in FileFormatWriter out...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22320 **[Test build #95692 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95692/testReport)** for PR 22320 at commit [`4590c98`](https://github.com/apache/spark/commit/4590c9837026e820d7d91300a7ab3f87a668755c). * This patch **fails due to an unknown error code, -9**. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22319: [SPARK-25044][SQL][followup] add back UserDefinedFunctio...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22319 **[Test build #95696 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95696/testReport)** for PR 22319 at commit [`9e060a4`](https://github.com/apache/spark/commit/9e060a4cc9360a0ebf59db05c0e1466b8e66b157). * This patch **fails due to an unknown error code, -9**. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22336: [SPARK-25306][SQL][FOLLOWUP] Change `test` to `ignore` i...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22336 **[Test build #95693 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95693/testReport)** for PR 22336 at commit [`69f207f`](https://github.com/apache/spark/commit/69f207f8a4531435c4a8df790780557968a33bb1). * This patch **fails due to an unknown error code, -9**. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22328: [SPARK-22666][ML][SQL] Spark datasource for image format
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22328 **[Test build #95695 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95695/testReport)** for PR 22328 at commit [`4d52754`](https://github.com/apache/spark/commit/4d527548bb7db3f2f3c1ddda206e552d223ca27b). * This patch **fails due to an unknown error code, -9**. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22337: [SPARK-25338][Test][kafka][kinesis][flume] Ensure to cal...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22337 **[Test build #95698 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95698/testReport)** for PR 22337 at commit [`a429ddb`](https://github.com/apache/spark/commit/a429ddb00ec42a68c40c4abe34ea435248a54d82). * This patch **fails due to an unknown error code, -9**. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22112: [SPARK-23243][Core] Fix RDD.repartition() data correctne...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22112 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22112: [SPARK-23243][Core] Fix RDD.repartition() data correctne...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22112 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95697/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22336: [SPARK-25306][SQL][FOLLOWUP] Change `test` to `ignore` i...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22336 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22337: [SPARK-25338][Test][kafka][kinesis][flume] Ensure to cal...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22337 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22328: [SPARK-22666][ML][SQL] Spark datasource for image format
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22328 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22319: [SPARK-25044][SQL][followup] add back UserDefinedFunctio...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22319 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22328: [SPARK-22666][ML][SQL] Spark datasource for image format
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22328 **[Test build #95694 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95694/testReport)** for PR 22328 at commit [`bd6178c`](https://github.com/apache/spark/commit/bd6178c0bff7c5ffade1dce61894191f63a50976). * This patch **fails due to an unknown error code, -9**. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22328: [SPARK-22666][ML][SQL] Spark datasource for image format
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22328 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95694/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22328: [SPARK-22666][ML][SQL] Spark datasource for image format
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22328 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22337: [SPARK-25338][Test][kafka][kinesis][flume] Ensure to cal...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22337 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95698/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22319: [SPARK-25044][SQL][followup] add back UserDefinedFunctio...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22319 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95696/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22328: [SPARK-22666][ML][SQL] Spark datasource for image format
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22328 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95695/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22112: [SPARK-23243][Core] Fix RDD.repartition() data correctne...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22112 **[Test build #95697 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95697/testReport)** for PR 22112 at commit [`8952d08`](https://github.com/apache/spark/commit/8952d082b7b9082d38f5b332ccded2d2d7c96b08). * This patch **fails due to an unknown error code, -9**. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22336: [SPARK-25306][SQL][FOLLOWUP] Change `test` to `ignore` i...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22336 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95693/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22320: [SPARK-25313][SQL]Fix regression in FileFormatWriter out...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22320 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22319: [SPARK-25044][SQL][followup] add back UserDefinedFunctio...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22319 retest this please --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22320: [SPARK-25313][SQL]Fix regression in FileFormatWriter out...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22320 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95692/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22112: [SPARK-23243][Core] Fix RDD.repartition() data correctne...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22112 retest this please --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22336: [SPARK-25306][SQL][FOLLOWUP] Change `test` to `ignore` i...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22336 retest this please --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22319: [SPARK-25044][SQL][followup] add back UserDefinedFunctio...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22319 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22319: [SPARK-25044][SQL][followup] add back UserDefinedFunctio...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22319 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2860/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22112: [SPARK-23243][Core] Fix RDD.repartition() data correctne...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22112 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2861/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22336: [SPARK-25306][SQL][FOLLOWUP] Change `test` to `ignore` i...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22336 **[Test build #95699 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95699/testReport)** for PR 22336 at commit [`69f207f`](https://github.com/apache/spark/commit/69f207f8a4531435c4a8df790780557968a33bb1). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22112: [SPARK-23243][Core] Fix RDD.repartition() data correctne...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22112 **[Test build #95701 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95701/testReport)** for PR 22112 at commit [`8952d08`](https://github.com/apache/spark/commit/8952d082b7b9082d38f5b332ccded2d2d7c96b08). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22319: [SPARK-25044][SQL][followup] add back UserDefinedFunctio...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22319 **[Test build #95700 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95700/testReport)** for PR 22319 at commit [`9e060a4`](https://github.com/apache/spark/commit/9e060a4cc9360a0ebf59db05c0e1466b8e66b157). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22320: [SPARK-25313][SQL]Fix regression in FileFormatWriter out...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22320 retest this please --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22112: [SPARK-23243][Core] Fix RDD.repartition() data correctne...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22112 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22320: [SPARK-25313][SQL]Fix regression in FileFormatWriter out...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22320 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22320: [SPARK-25313][SQL]Fix regression in FileFormatWriter out...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22320 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2863/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22336: [SPARK-25306][SQL][FOLLOWUP] Change `test` to `ignore` i...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22336 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22320: [SPARK-25313][SQL]Fix regression in FileFormatWriter out...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22320 **[Test build #95702 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95702/testReport)** for PR 22320 at commit [`4590c98`](https://github.com/apache/spark/commit/4590c9837026e820d7d91300a7ab3f87a668755c). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22336: [SPARK-25306][SQL][FOLLOWUP] Change `test` to `ignore` i...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22336 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2862/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22337: [SPARK-25338][Test][kafka][kinesis][flume] Ensure to cal...
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/22337 retest this please --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22337: [SPARK-25338][Test][kafka][kinesis][flume] Ensure to cal...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22337 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22337: [SPARK-25338][Test][kafka][kinesis][flume] Ensure to cal...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22337 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2864/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22337: [SPARK-25338][Test][kafka][kinesis][flume] Ensure to cal...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22337 **[Test build #95703 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95703/testReport)** for PR 22337 at commit [`a429ddb`](https://github.com/apache/spark/commit/a429ddb00ec42a68c40c4abe34ea435248a54d82). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22277: [SPARK-25276] Redundant constrains when using alias
Github user ajithme commented on the issue: https://github.com/apache/spark/pull/22277

@jiangxb1987 Thank you for the feedback. A couple of points:

1. If we introduce a predicate that refers to an alias (`a > z`, as you mentioned), the query fails with an error:

```
spark-sql> create table table1 (a int);
18/09/05 13:00:28 WARN HiveMetaStore: Location: file:/user/hive/warehouse/table1 specified for non-external table:table1
Time taken: 0.152 seconds
spark-sql> select a, a as c from table1 where a > 10 and a > c;
18/09/05 13:01:04 WARN ObjectStore: Failed to get database global_temp, returning NoSuchObjectException
Error in query: cannot resolve '`c`' given input columns: [table1.a]; line 1 pos 50;
'Project ['a, 'a AS c#6]
+- 'Filter ((a#7 > 10) && (a#7 > 'c))
   +- SubqueryAlias table1
      +- HiveTableRelation `default`.`table1`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [a#7]
```

So I think `a > z` is an invalid scenario. Please correct me if I am wrong.

2. If we add a self-referring predicate such as __a > a__ instead of __a > z__, the PR still produces a valid constraint list:

```
(x#5 > x#5),(b#1 <=> y#6),(x#5 > 10),(z#7 <=> x#5),isnotnull(x#5)
```
[GitHub] spark issue #19691: [SPARK-14922][SPARK-17732][SQL]ALTER TABLE DROP PARTITIO...
Github user maropu commented on the issue: https://github.com/apache/spark/pull/19691 ok @mgaido91 can u take this over? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22328: [SPARK-22666][ML][SQL] Spark datasource for image...
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/22328#discussion_r215179601

    --- Diff: mllib/src/main/scala/org/apache/spark/ml/source/image/ImageDataSource.scala ---
    @@ -0,0 +1,53 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements. See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License. You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.ml.source.image
    +
    +/**
    + * `image` package implements Spark SQL data source API for loading IMAGE data as `DataFrame`.
    + * The loaded `DataFrame` has one `StructType` column: `image`.
    + * The schema of the `image` column is:
    + *  - origin: String (represent the origin of image. If loaded from file, then it is file path)
    + *  - height: Int (height of image)
    + *  - width: Int (width of image)
    + *  - nChannels: Int (number of image channels)
    + *  - mode: Int (OpenCV-compatible type)
    + *  - data: BinaryType (Image bytes in OpenCV-compatible order: row-wise BGR in most cases)
    + *
    + * To use IMAGE data source, you need to set "image" as the format in `DataFrameReader` and
    + * optionally specify the datasource options, for example:
    + * {{{
    + * // Scala
    + * val df = spark.read.format("image")
    + *   .option("dropImageFailures", "true")
    + *   .load("data/mllib/images/imagesWithPartitions")
    + *
    + * // Java
    + * Dataset df = spark.read().format("image")
    + *   .option("dropImageFailures", "true")
    + *   .load("data/mllib/images/imagesWithPartitions");
    + * }}}
    + *
    + * IMAGE data source supports the following options:
    + *  - "dropImageFailures": Whether to drop the files that are not valid images from the result.
    + *
    + * @note This IMAGE data source does not support "write".
    + *
    + * @note This class is public for documentation purpose. Please don't use this class directly.
    + * Rather, use the data source API as illustrated above.
    + */
    +class ImageDataSource private() {}
    --- End diff --

Re: @cloud-fan The Scala package doc doesn't work for Java, which requires a different format.

Re: @HyukjinKwon It would be nice to have some documentation on the site, though I didn't find a list of built-in data sources there. I think it is okay to have docs in both locations: for IDE users and for people searching on the web.
[GitHub] spark issue #22337: [SPARK-25338][Test][kafka][kinesis][flume] Ensure to cal...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22337 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22337: [SPARK-25338][Test][kafka][kinesis][flume] Ensure to cal...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22337 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95703/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22337: [SPARK-25338][Test][kafka][kinesis][flume] Ensure to cal...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22337 **[Test build #95703 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95703/testReport)** for PR 22337 at commit [`a429ddb`](https://github.com/apache/spark/commit/a429ddb00ec42a68c40c4abe34ea435248a54d82). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22332: [SPARK-25333][SQL] Ability add new columns in Dat...
Github user wmellouli commented on a diff in the pull request: https://github.com/apache/spark/pull/22332#discussion_r215179856

    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
    @@ -2226,16 +2226,18 @@ class Dataset[T] private[sql](
        * `column`'s expression must only refer to attributes supplied by this Dataset. It is an
        * error to add a column that refers to some other Dataset.
        *
    -   * You can choose to add new columns either at the end (default behavior) or at the beginning.
    +   * The position of the new column start from 0, and a negative position means at the end (default behavior).
    --- End diff --

I modified it as you suggested.
[GitHub] spark pull request #22332: [SPARK-25333][SQL] Ability add new columns in Dat...
Github user wmellouli commented on a diff in the pull request: https://github.com/apache/spark/pull/22332#discussion_r215184928

    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala ---
    @@ -831,13 +831,21 @@ class DataFrameSuite extends QueryTest with SharedSQLContext {
           }.toSeq)
         assert(df.schema.map(_.name) === Seq("key", "value", "newCol"))
     
    -    val df2 = testData.toDF().withColumn("newCol", col("key") + 1, false)
    +    val df2 = testData.toDF().withColumn("newCol", col("key") + 1, 0)
    --- End diff --

The negative-position case is already covered for both the public method `withColumn` and the private method `withColumns`:
- https://github.com/apache/spark/pull/22332/files#diff-5d2ebf4e9ca5a990136b276859769289R852
- https://github.com/apache/spark/pull/22332/files#diff-5d2ebf4e9ca5a990136b276859769289R907

I'm testing three cases at the same time, each of which adds the new column at the end:
- negative position
- last position
- position greater than the number of columns
[GitHub] spark pull request #22332: [SPARK-25333][SQL] Ability add new columns in Dat...
Github user wmellouli commented on a diff in the pull request: https://github.com/apache/spark/pull/22332#discussion_r215185048

    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
    @@ -2226,16 +2226,18 @@ class Dataset[T] private[sql](
        * `column`'s expression must only refer to attributes supplied by this Dataset. It is an
        * error to add a column that refers to some other Dataset.
        *
    -   * You can choose to add new columns either at the end (default behavior) or at the beginning.
    +   * The position of the new column start from 0, and a negative position means at the end (default behavior).
        */
    -  def withColumn(colName: String, col: Column, atTheEnd: Boolean): DataFrame =
    -    withColumns(Seq(colName), Seq(col), atTheEnd)
    +  def withColumn(colName: String, col: Column, atPosition: Int): DataFrame =
    --- End diff --

Added
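The position rule discussed in this thread (positions start from 0; a negative position, the last position, and a position greater than the number of columns all put the new column at the end) can be illustrated with a minimal sketch. This is not the code from PR 22332: `ColumnPositionSketch` and `insertAt` are hypothetical names, and the sketch only models the ordering of column names on a plain `Seq[String]`, not Spark's `withColumn` (for example, it ignores what happens when the column already exists).

```scala
// Hypothetical illustration only: models where a new column name would land
// for a given `atPosition`, following the rule described in the review thread.
object ColumnPositionSketch {

  def insertAt(columns: Seq[String], newCol: String, atPosition: Int): Seq[String] = {
    // Negative or too-large positions fall back to "append at the end".
    val idx =
      if (atPosition < 0 || atPosition > columns.size) columns.size
      else atPosition
    val (before, after) = columns.splitAt(idx)
    (before :+ newCol) ++ after
  }

  def main(args: Array[String]): Unit = {
    val cols = Seq("key", "value")
    // Position 0 puts the new column first.
    assert(insertAt(cols, "newCol", 0) == Seq("newCol", "key", "value"))
    // Negative, last, and out-of-range positions all append at the end.
    assert(insertAt(cols, "newCol", -1) == Seq("key", "value", "newCol"))
    assert(insertAt(cols, "newCol", 2) == Seq("key", "value", "newCol"))
    assert(insertAt(cols, "newCol", 10) == Seq("key", "value", "newCol"))
    println("all position cases behave as described")
  }
}
```

Treating every out-of-range value as "append" keeps the default case (add at the end) reachable without callers needing to know the current number of columns.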
[GitHub] spark issue #22332: [SPARK-25333][SQL] Ability add new columns in Dataset in...
Github user wmellouli commented on the issue: https://github.com/apache/spark/pull/22332 @jaceklaskowski I refactored with what you suggested in your review. Let me know what you think. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22338: [SPARK-25317][CORE] Avoid perf regression in Murm...
GitHub user mgaido91 opened a pull request: https://github.com/apache/spark/pull/22338

    [SPARK-25317][CORE] Avoid perf regression in Murmur3 Hash on UTF8String

    ## What changes were proposed in this pull request?

    SPARK-10399 introduced a performance regression in the hash computation for UTF8String.

    The regression can be evaluated with the code attached to the JIRA. That code runs in about 120 us per method on my laptop (MacBook Pro 2.5 GHz Intel Core i7, RAM 16 GB 1600 MHz DDR3), while the code from branch 2.3 takes about 45 us on the same machine. After the PR, the code takes about 45 us on the master branch too.

    ## How was this patch tested?

    Running the perf test from the JIRA.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/mgaido91/spark SPARK-25317

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/22338.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #22338

commit 91adce590461dda885d88319a700a775e63f9ce6
Author: Marco Gaido
Date: 2018-09-04T15:02:07Z

    [SPARK-25317][CORE] Avoid perf regression in Murmur3 Hash on UTF8String
[GitHub] spark issue #22338: [SPARK-25317][CORE] Avoid perf regression in Murmur3 Has...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22338 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22338: [SPARK-25317][CORE] Avoid perf regression in Murmur3 Has...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22338 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2865/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22323: [SPARK-25262][K8S] Allow SPARK_LOCAL_DIRS to be t...
Github user rvesse commented on a diff in the pull request: https://github.com/apache/spark/pull/22323#discussion_r215187426 --- Diff: docs/running-on-kubernetes.md --- @@ -215,6 +215,19 @@ spark.kubernetes.driver.volumes.persistentVolumeClaim.checkpointpvc.options.clai The configuration properties for mounting volumes into the executor pods use prefix `spark.kubernetes.executor.` instead of `spark.kubernetes.driver.`. For a complete list of available options for each supported type of volumes, please refer to the [Spark Properties](#spark-properties) section below. +## Local Storage + +Spark uses temporary scratch space to spill data to disk during shuffles and other operations. When using Kubernetes as the resource manager the pods will be created with an [emptyDir](https://kubernetes.io/docs/concepts/storage/volumes/#emptydir) volume mounted for each directory listed in `SPARK_LOCAL_DIRS`. If no directories are explicitly specified then a default directory is created and configured appropriately. + +`emptyDir` volumes use the ephemeral storage feature of Kubernetes and do not persist beyond the life of the pod. + +### Using RAM for local storage + +As `emptyDir` volumes use the nodes backing storage for ephemeral storage this default behaviour may not be appropriate for some compute environments. For example if you have diskless nodes with remote storage mounted over a network having lots of executors doing IO to this remote storage may actually degrade performance. + +In this case it may be desirable to set `spark.kubernetes.local.dirs.tmpfs=true` in your configuration which will cause the `emptyDir` volumes to be configured as `tmpfs` i.e. RAM backed volumes. When configured like this Sparks local storage usage will count towards your pods memory usage therefore you may wish to increase your memory requests via the normal `spark.driver.memory` and `spark.executor.memory` configuration properties. 
--- End diff -- @liyinan926 Yes, it is in the case of K8S, per the [documentation](https://kubernetes.io/docs/concepts/storage/volumes/#emptydir): > However, you can set the emptyDir.medium field to "Memory" to tell Kubernetes to mount a tmpfs (RAM-backed filesystem) for you instead. While tmpfs is very fast, be aware that unlike disks, tmpfs is cleared on node reboot and **any files you write will count against your Container’s memory limit**. Emphasis added by me. Since the container memory requests and limits are driven by `spark.*.memory`, this is the appropriate setting to change. Changing the memory overhead would also increase these limits, but if the user has a rough idea of how much memory they need, asking for it explicitly is easier. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
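A minimal sketch of the suggestion in the comment above, assuming the property name `spark.kubernetes.local.dirs.tmpfs` as proposed in the diff under review. The class `TmpfsConfSketch`, the method `executorConf`, and the idea of adding an estimated scratch size to the memory request are hypothetical illustrations, not part of Spark.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class TmpfsConfSketch {
    /**
     * Builds executor properties for RAM-backed local dirs.
     * heapMiB and scratchMiB are hypothetical estimates supplied by the user;
     * with tmpfs, scratch files count against the container memory limit,
     * so the sketch asks for both explicitly via spark.executor.memory.
     */
    static Map<String, String> executorConf(int heapMiB, int scratchMiB) {
        Map<String, String> conf = new LinkedHashMap<>();
        // Property name taken from the documentation diff under review.
        conf.put("spark.kubernetes.local.dirs.tmpfs", "true");
        // Spark memory settings accept an "m" suffix for mebibytes.
        conf.put("spark.executor.memory", (heapMiB + scratchMiB) + "m");
        return conf;
    }
}
```

Each entry could then be passed as a `--conf key=value` pair to `spark-submit`. Whether to raise the memory setting or the memory overhead instead is the judgment call discussed in the comment.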
[GitHub] spark issue #22338: [SPARK-25317][CORE] Avoid perf regression in Murmur3 Has...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22338 **[Test build #95704 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95704/testReport)** for PR 22338 at commit [`91adce5`](https://github.com/apache/spark/commit/91adce590461dda885d88319a700a775e63f9ce6). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19691: [SPARK-14922][SPARK-17732][SQL]ALTER TABLE DROP PARTITIO...
Github user mgaido91 commented on the issue: https://github.com/apache/spark/pull/19691 @DazhuangSu @maropu sure, thanks, I'll submit a PR for this soon. Thanks. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22318: [SPARK-25150][SQL] Rewrite condition when dedupli...
Github user peter-toth commented on a diff in the pull request: https://github.com/apache/spark/pull/22318#discussion_r215189187 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -754,11 +754,16 @@ class Analyzer( * a logical plan node's children. */ object ResolveReferences extends Rule[LogicalPlan] { + +private val emptyAttrMap = new AttributeMap[Attribute](Map.empty) --- End diff -- @mgaido91, I agree with you and am happy to do it; I just saw your concerns about binary compatibility. @cloud-fan @maropu, please share your thoughts on this. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22339: SPARK-17159 Significant speed up for running spar...
GitHub user ScrapCodes opened a pull request: https://github.com/apache/spark/pull/22339 SPARK-17159 Significant speed up for running spark streaming against Object store. ## What changes were proposed in this pull request? Original work by Steve Loughran. Based on #17745. This is a minimal patch of changes to FileInputDStream to reduce file status requests when querying files. Each file status call costs 3+ HTTP calls to the object store. This patch eliminates the need for those calls by using FileStatus objects. This is a minor optimisation when working with filesystems, but a significant one when working with object stores. ## How was this patch tested? Tests included. Existing tests pass. Please review http://spark.apache.org/contributing.html before opening a pull request. You can merge this pull request into a Git repository by running: $ git pull https://github.com/ScrapCodes/spark PR_17745 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/22339.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #22339 commit 2fba9af597349fc023e04a845d1cfacfc3ab7d9e Author: Steve Loughran Date: 2017-04-24T13:04:04Z SPARK-17159 Significant speed up for running spark streaming against Object store. Based on #17745. Original work by Steve Loughran. This is a minimal patch of changes to FileInputDStream to reduce File status requests when querying files. This is a minor optimisation when working with filesystems, but significant when working with object stores. Change-Id: I269d98902f615818941c88de93a124c65453756e --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
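The idea described in the pull request above can be sketched without Hadoop: a listing call already returns per-file metadata, so filtering on the listed entries avoids one extra status request per file. Everything below (`ListingSketch`, `Status`, `CountingStore`, the method names) is a hypothetical stand-in that only counts requests; it is not the FileInputDStream code from the patch.

```java
import java.util.ArrayList;
import java.util.List;

final class ListingSketch {
    /** Hypothetical stand-in for the metadata a listing call already returns. */
    static final class Status {
        final String path;
        final long modificationTime;
        Status(String path, long modificationTime) {
            this.path = path;
            this.modificationTime = modificationTime;
        }
    }

    /** Hypothetical store that counts metadata round trips. */
    static final class CountingStore {
        private final List<Status> files = new ArrayList<>();
        int listRequests = 0;
        int statusRequests = 0;

        void add(String path, long modificationTime) {
            files.add(new Status(path, modificationTime));
        }

        List<Status> listStatus() {
            listRequests++;
            return new ArrayList<>(files);
        }

        Status getStatus(String path) {
            statusRequests++; // on an object store this would be several HTTP calls
            for (Status s : files) {
                if (s.path.equals(path)) {
                    return s;
                }
            }
            throw new IllegalArgumentException("no such file: " + path);
        }
    }

    /** Looks each listed path up again: one extra status request per file. */
    static List<String> newFilesPerPathLookup(CountingStore store, long since) {
        List<String> result = new ArrayList<>();
        for (Status listed : store.listStatus()) {
            Status s = store.getStatus(listed.path);
            if (s.modificationTime > since) {
                result.add(s.path);
            }
        }
        return result;
    }

    /** Reuses the status objects from the listing: no extra requests. */
    static List<String> newFilesFromListing(CountingStore store, long since) {
        List<String> result = new ArrayList<>();
        for (Status listed : store.listStatus()) {
            if (listed.modificationTime > since) {
                result.add(listed.path);
            }
        }
        return result;
    }
}
```

Both methods select the same files; only the number of status requests differs, which is where the cost lies on an object store.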
[GitHub] spark pull request #22340: [SPARK-25337][SQL] `runSparkSubmit` should provid...
GitHub user dongjoon-hyun opened a pull request: https://github.com/apache/spark/pull/22340 [SPARK-25337][SQL] `runSparkSubmit` should provide non-testing mode ## What changes were proposed in this pull request? The Scala 2.12 test fails due to a classpath issue. - https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-maven-hadoop-2.7-ubuntu-scala-2.12/ ## How was this patch tested? Manual test. After merging, it will be tested via Jenkins. ```scala $ dev/change-scala-version.sh 2.12 $ build/mvn -DskipTests -Phive -Pscala-2.12 clean package $ build/mvn -Phive -Pscala-2.12 -Dtest=none -DwildcardSuites=org.apache.spark.sql.hive.HiveExternalCatalogVersionsSuite test ... HiveExternalCatalogVersionsSuite: - backward compatibility ... Tests: succeeded 1, failed 0, canceled 0, ignored 0, pending 0 All tests passed. ``` You can merge this pull request into a Git repository by running: $ git pull https://github.com/dongjoon-hyun/spark SPARK-25337 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/22340.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #22340 commit d451f989bb3c06304d2b962678f1cab7b561df10 Author: Dongjoon Hyun Date: 2018-09-05T07:47:40Z [SPARK-25337][SQL] `runSparkSubmit` should provide non-testing mode --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22340: [SPARK-25337][SQL] `runSparkSubmit` should provide non-t...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22340 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2866/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22339: SPARK-17159 Significant speed up for running spark strea...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22339 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22340: [SPARK-25337][SQL] `runSparkSubmit` should provide non-t...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22340 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22339: SPARK-17159 Significant speed up for running spark strea...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22339 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2867/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22339: SPARK-17159 Significant speed up for running spark strea...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22339 **[Test build #95706 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95706/testReport)** for PR 22339 at commit [`2fba9af`](https://github.com/apache/spark/commit/2fba9af597349fc023e04a845d1cfacfc3ab7d9e). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22340: [SPARK-25337][SQL] `runSparkSubmit` should provide non-t...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22340 **[Test build #95705 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95705/testReport)** for PR 22340 at commit [`d451f98`](https://github.com/apache/spark/commit/d451f989bb3c06304d2b962678f1cab7b561df10). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22338: [SPARK-25317][CORE] Avoid perf regression in Murm...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/22338#discussion_r215192739 --- Diff: common/unsafe/src/main/java/org/apache/spark/unsafe/hash/Murmur3_x86_32.java --- @@ -69,22 +70,27 @@ public static int hashUnsafeWords(Object base, long offset, int lengthInBytes, i } public static int hashUnsafeBytesBlock(MemoryBlock base, int seed) { +return hashUnsafeBytesBlock(base, Ints.checkedCast(base.size()), seed); + } + + private static int hashUnsafeBytesBlock(MemoryBlock base, int lengthInBytes, int seed) { // This is not compatible with original and another implementations. // But remain it for backward compatibility for the components existing before 2.3. -int lengthInBytes = Ints.checkedCast(base.size()); assert (lengthInBytes >= 0): "lengthInBytes cannot be negative"; int lengthAligned = lengthInBytes - lengthInBytes % 4; -int h1 = hashBytesByIntBlock(base.subBlock(0, lengthAligned), seed); +int h1 = hashBytesByIntBlock(base, lengthAligned, seed); +long offset = base.getBaseOffset(); +Object o = base.getBaseObject(); for (int i = lengthAligned; i < lengthInBytes; i++) { - int halfWord = base.getByte(i); + int halfWord = Platform.getByte(o, offset + i); --- End diff -- So it seems the performance regression is due to the cost of virtual function calls on `MemoryBlock`? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
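For readers following the diff above, here is a self-contained sketch of the hashing shape it touches: whole 4-byte words are mixed first, then each trailing byte is read on its own. It works on a plain `byte[]` rather than `MemoryBlock` or `Platform`, assumes little-endian word assembly, and uses the standard Murmur3 32-bit constants. The diff does not show the loop body after the `halfWord` read, so mixing each trailing byte as its own block is an assumption here; the class and method names are hypothetical.

```java
final class Murmur3TailSketch {
    private static final int C1 = 0xcc9e2d51;
    private static final int C2 = 0x1b873593;

    /** Hashes aligned words first, then each trailing byte separately. */
    static int hashBytes(byte[] data, int seed) {
        int lengthInBytes = data.length;
        int lengthAligned = lengthInBytes - lengthInBytes % 4;
        int h1 = seed;
        // Aligned part: assemble each 4-byte word in little-endian order.
        for (int i = 0; i < lengthAligned; i += 4) {
            int word = (data[i] & 0xff)
                | ((data[i + 1] & 0xff) << 8)
                | ((data[i + 2] & 0xff) << 16)
                | ((data[i + 3] & 0xff) << 24);
            h1 = mixH1(h1, mixK1(word));
        }
        // Tail: each remaining byte is sign-extended and mixed as its own block
        // (assumed; this is what makes the result differ from canonical Murmur3).
        for (int i = lengthAligned; i < lengthInBytes; i++) {
            int halfWord = data[i];
            h1 = mixH1(h1, mixK1(halfWord));
        }
        return fmix(h1, lengthInBytes);
    }

    private static int mixK1(int k1) {
        k1 *= C1;
        k1 = Integer.rotateLeft(k1, 15);
        k1 *= C2;
        return k1;
    }

    private static int mixH1(int h1, int k1) {
        h1 ^= k1;
        h1 = Integer.rotateLeft(h1, 13);
        return h1 * 5 + 0xe6546b64;
    }

    // Standard Murmur3 finalizer: fold in the length, then avalanche.
    private static int fmix(int h1, int length) {
        h1 ^= length;
        h1 ^= h1 >>> 16;
        h1 *= 0x85ebca6b;
        h1 ^= h1 >>> 13;
        h1 *= 0xc2b2ae35;
        h1 ^= h1 >>> 16;
        return h1;
    }
}
```

The sketch has no virtual calls in either loop, which is the property the question above is probing; it says nothing about where the measured regression actually comes from.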
[GitHub] spark issue #22340: [SPARK-25337][SQL][TEST] `runSparkSubmit` should provide...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/22340 cc @srowen and @cloud-fan --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22328: [SPARK-22666][ML][SQL] Spark datasource for image...
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/22328#discussion_r215200249 --- Diff: mllib/src/main/scala/org/apache/spark/ml/source/image/ImageOptions.scala --- @@ -0,0 +1,28 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.ml.source.image + +import org.apache.spark.sql.catalyst.util.CaseInsensitiveMap + +private[image] class ImageOptions( +@transient private val parameters: CaseInsensitiveMap[String]) extends Serializable { + + def this(parameters: Map[String, String]) = this(CaseInsensitiveMap(parameters)) + + val dropImageFailures = parameters.getOrElse("dropImageFailures", "false").toBoolean --- End diff -- because `parameters` is `Map[String, String]` type. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
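The reply above rests on every option value arriving as a string, so the boolean is parsed on read. A rough Java analogue, assuming a `TreeMap` with `String.CASE_INSENSITIVE_ORDER` as a stand-in for `CaseInsensitiveMap`; the class name `ImageOptionsSketch` is hypothetical. One difference to keep in mind: `Boolean.parseBoolean` returns `false` for unrecognised text, whereas Scala's `toBoolean` throws.

```java
import java.util.Map;
import java.util.TreeMap;

final class ImageOptionsSketch {
    // Keys compare case-insensitively, mimicking a case-insensitive option map.
    private final Map<String, String> parameters =
        new TreeMap<>(String.CASE_INSENSITIVE_ORDER);

    /** Parsed once from its string form; defaults to false when absent. */
    final boolean dropImageFailures;

    ImageOptionsSketch(Map<String, String> raw) {
        parameters.putAll(raw);
        dropImageFailures = Boolean.parseBoolean(
            parameters.getOrDefault("dropImageFailures", "false"));
    }
}
```

With string-typed storage, a caller that passes a boolean through a typed overload still ends up with its string form in the map, which is then parsed here.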
[GitHub] spark pull request #22338: [SPARK-25317][CORE] Avoid perf regression in Murm...
Github user mgaido91 commented on a diff in the pull request: https://github.com/apache/spark/pull/22338#discussion_r215202638 --- Diff: common/unsafe/src/main/java/org/apache/spark/unsafe/hash/Murmur3_x86_32.java --- @@ -69,22 +70,27 @@ public static int hashUnsafeWords(Object base, long offset, int lengthInBytes, i } public static int hashUnsafeBytesBlock(MemoryBlock base, int seed) { +return hashUnsafeBytesBlock(base, Ints.checkedCast(base.size()), seed); + } + + private static int hashUnsafeBytesBlock(MemoryBlock base, int lengthInBytes, int seed) { // This is not compatible with original and another implementations. // But remain it for backward compatibility for the components existing before 2.3. -int lengthInBytes = Ints.checkedCast(base.size()); assert (lengthInBytes >= 0): "lengthInBytes cannot be negative"; int lengthAligned = lengthInBytes - lengthInBytes % 4; -int h1 = hashBytesByIntBlock(base.subBlock(0, lengthAligned), seed); +int h1 = hashBytesByIntBlock(base, lengthAligned, seed); +long offset = base.getBaseOffset(); +Object o = base.getBaseObject(); for (int i = lengthAligned; i < lengthInBytes; i++) { - int halfWord = base.getByte(i); + int halfWord = Platform.getByte(o, offset + i); --- End diff -- That was my guess too at the beginning, but if you make only this change, performance doesn't change. What @kiszk said seems reasonable, namely that the clue is the size of the generated Java bytecode, but it needs more investigation. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22328: [SPARK-22666][ML][SQL] Spark datasource for image format
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22328 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2868/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22328: [SPARK-22666][ML][SQL] Spark datasource for image format
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22328 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22328: [SPARK-22666][ML][SQL] Spark datasource for image format
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22328 **[Test build #95707 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95707/testReport)** for PR 22328 at commit [`3fffd7e`](https://github.com/apache/spark/commit/3fffd7e055fad74d7392c88d678935b3e924588f). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22336: [SPARK-25306][SQL][FOLLOWUP] Change `test` to `ignore` i...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22336 **[Test build #95699 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95699/testReport)** for PR 22336 at commit [`69f207f`](https://github.com/apache/spark/commit/69f207f8a4531435c4a8df790780557968a33bb1). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22336: [SPARK-25306][SQL][FOLLOWUP] Change `test` to `ignore` i...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22336 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22336: [SPARK-25306][SQL][FOLLOWUP] Change `test` to `ignore` i...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22336 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95699/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22341: [SPARK-24889][Core] Update block info when unpers...
GitHub user viirya opened a pull request: https://github.com/apache/spark/pull/22341 [SPARK-24889][Core] Update block info when unpersist rdds ## What changes were proposed in this pull request? We update block info coming from executors at certain points, such as when an RDD is cached. However, when an RDD is removed by unpersisting, we don't ask for a block info update, so the block info is not updated. There are two options for fixing this: 1. Ask for a block info update when unpersisting. This is the simplest, but it changes driver-executor communication a bit. 2. Update block info when processing the RDD unpersist event. We send a `SparkListenerUnpersistRDD` event when unpersisting an RDD. When processing this event, we can update the block info of the RDD. This only changes event processing code, so the risk seems to be lowest. Currently this patch takes option 2 as the lowest-risk choice. If we agree that the first option carries no risk, we can switch to it. ## How was this patch tested? Unit tests. You can merge this pull request into a Git repository by running: $ git pull https://github.com/viirya/spark-1 SPARK-24889 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/22341.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #22341 commit dd5f766e0f270cfc58ca4298c39179469f021f78 Author: Liang-Chi Hsieh Date: 2018-08-30T23:17:46Z Update memory and disk info when unpersist rdds. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
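Option 2 from the description above can be illustrated with a small bookkeeping sketch: the listener already knows how much memory each RDD's blocks use per executor, so on the unpersist event it can subtract that usage locally instead of waiting for executors to report. All names here (`StorageStatusSketch`, `onBlockUpdated`, `onUnpersistRdd`, `memoryUsed`) are hypothetical and do not correspond to the actual listener code in the patch.

```java
import java.util.HashMap;
import java.util.Map;

final class StorageStatusSketch {
    // rddId -> (executorId -> bytes of memory used by that RDD's blocks)
    private final Map<Integer, Map<String, Long>> rddMemoryByExecutor = new HashMap<>();
    // executorId -> total bytes of memory used by cached blocks
    private final Map<String, Long> executorMemoryUsed = new HashMap<>();

    /** Called when an executor reports a block change for an RDD. */
    void onBlockUpdated(int rddId, String executorId, long memoryDelta) {
        rddMemoryByExecutor
            .computeIfAbsent(rddId, id -> new HashMap<>())
            .merge(executorId, memoryDelta, Long::sum);
        executorMemoryUsed.merge(executorId, memoryDelta, Long::sum);
    }

    /** Called when the unpersist event is processed; no executor round trip. */
    void onUnpersistRdd(int rddId) {
        Map<String, Long> perExecutor = rddMemoryByExecutor.remove(rddId);
        if (perExecutor == null) {
            return; // nothing was cached for this RDD
        }
        // Give back the memory this RDD was using on each executor.
        perExecutor.forEach((executorId, used) ->
            executorMemoryUsed.merge(executorId, -used, Long::sum));
    }

    long memoryUsed(String executorId) {
        return executorMemoryUsed.getOrDefault(executorId, 0L);
    }
}
```

The trade-off matches the one described in the pull request: only event-processing code changes, at the cost of the listener deriving the new totals itself rather than receiving them from executors.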
[GitHub] spark issue #22341: [SPARK-24889][Core] Update block info when unpersist rdd...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22341 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22341: [SPARK-24889][Core] Update block info when unpersist rdd...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22341 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2869/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22341: [SPARK-24889][Core] Update block info when unpersist rdd...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22341 **[Test build #95708 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95708/testReport)** for PR 22341 at commit [`dd5f766`](https://github.com/apache/spark/commit/dd5f766e0f270cfc58ca4298c39179469f021f78). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22320: [SPARK-25313][SQL]Fix regression in FileFormatWri...
Github user jaceklaskowski commented on a diff in the pull request: https://github.com/apache/spark/pull/22320#discussion_r215214259 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/CreateHiveTableAsSelectCommand.scala --- @@ -82,7 +83,7 @@ case class CreateHiveTableAsSelectCommand( query, overwrite = true, ifPartitionNotExists = false, - outputColumns = outputColumns).run(sparkSession, child) + outputColumnNames = outputColumnNames).run(sparkSession, child) --- End diff -- Why is this duplication needed here? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22320: [SPARK-25313][SQL]Fix regression in FileFormatWri...
Github user jaceklaskowski commented on a diff in the pull request: https://github.com/apache/spark/pull/22320#discussion_r215213849 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/test/DataFrameReaderWriterSuite.scala --- @@ -805,6 +805,80 @@ class DataFrameReaderWriterSuite extends QueryTest with SharedSQLContext with Be } } + test("Insert overwrite table command should output correct schema: basic") { +withTable("tbl", "tbl2") { + withView("view1") { +val df = spark.range(10).toDF("id") --- End diff -- "case sensitive"? How so, since Spark SQL is case-insensitive by default? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22320: [SPARK-25313][SQL]Fix regression in FileFormatWri...
Github user jaceklaskowski commented on a diff in the pull request: https://github.com/apache/spark/pull/22320#discussion_r215215098 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala --- @@ -754,6 +754,54 @@ class HiveDDLSuite } } + test("Insert overwrite Hive table should output correct schema") { +withSQLConf(CONVERT_METASTORE_PARQUET.key -> "false") { + withTable("tbl", "tbl2") { +withView("view1") { + spark.sql("CREATE TABLE tbl(id long)") + spark.sql("INSERT OVERWRITE TABLE tbl VALUES 4") --- End diff -- I might be missing something, but why does this test use SQL statements rather than the DataFrameWriter API, e.g. `Seq(4).toDF("id").write.mode(SaveMode.Overwrite).saveAsTable("tbl")`? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22328: [SPARK-22666][ML][SQL] Spark datasource for image...
Github user jaceklaskowski commented on a diff in the pull request: https://github.com/apache/spark/pull/22328#discussion_r215216011 --- Diff: mllib/src/main/scala/org/apache/spark/ml/source/image/ImageDataSource.scala --- @@ -0,0 +1,51 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.ml.source.image + +/** + * `image` package implements Spark SQL data source API for loading IMAGE data as `DataFrame`. + * The loaded `DataFrame` has one `StructType` column: `image`. + * The schema of the `image` column is: + * - origin: String (represent the origin of image. If loaded from file, then it is file path) + * - height: Int (height of image) + * - width: Int (width of image) + * - nChannels: Int (number of image channels) + * - mode: Int (OpenCV-compatible type) + * - data: BinaryType (Image bytes in OpenCV-compatible order: row-wise BGR in most cases) + * + * To use IMAGE data source, you need to set "image" as the format in `DataFrameReader` and + * optionally specify options, for example: + * {{{ + * // Scala + * val df = spark.read.format("image") + * .option("dropImageFailures", "true") --- End diff -- Really? What about `option(key: String, value: Boolean): DataFrameReader` then? 
There are more --> http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.DataFrameReader --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22328: [SPARK-22666][ML][SQL] Spark datasource for image format
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22328 **[Test build #95707 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95707/testReport)** for PR 22328 at commit [`3fffd7e`](https://github.com/apache/spark/commit/3fffd7e055fad74d7392c88d678935b3e924588f). * This patch **fails PySpark unit tests**. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22328: [SPARK-22666][ML][SQL] Spark datasource for image format
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22328 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22328: [SPARK-22666][ML][SQL] Spark datasource for image format
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22328 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95707/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22339: SPARK-17159 Significant speed up for running spark strea...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22339 **[Test build #95706 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95706/testReport)** for PR 22339 at commit [`2fba9af`](https://github.com/apache/spark/commit/2fba9af597349fc023e04a845d1cfacfc3ab7d9e). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22341: [SPARK-24889][Core] Update block info when unpersist rdd...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22341 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95708/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22341: [SPARK-24889][Core] Update block info when unpersist rdd...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22341 **[Test build #95708 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95708/testReport)** for PR 22341 at commit [`dd5f766`](https://github.com/apache/spark/commit/dd5f766e0f270cfc58ca4298c39179469f021f78). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22341: [SPARK-24889][Core] Update block info when unpersist rdd...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22341 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22339: SPARK-17159 Significant speed up for running spark strea...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22339 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22339: SPARK-17159 Significant speed up for running spark strea...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22339 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95706/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22341: [SPARK-24889][Core] Update block info when unpersist rdd...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/22341 retest this please. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22341: [SPARK-24889][Core] Update block info when unpersist rdd...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22341 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22341: [SPARK-24889][Core] Update block info when unpersist rdd...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22341 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2870/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22341: [SPARK-24889][Core] Update block info when unpersist rdd...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22341 **[Test build #95709 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95709/testReport)** for PR 22341 at commit [`dd5f766`](https://github.com/apache/spark/commit/dd5f766e0f270cfc58ca4298c39179469f021f78). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22320: [SPARK-25313][SQL]Fix regression in FileFormatWriter out...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22320 **[Test build #95702 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95702/testReport)** for PR 22320 at commit [`4590c98`](https://github.com/apache/spark/commit/4590c9837026e820d7d91300a7ab3f87a668755c). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22320: [SPARK-25313][SQL]Fix regression in FileFormatWriter out...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22320 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95702/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org