[GitHub] spark issue #14579: [SPARK-16921][PYSPARK] RDD/DataFrame persist()/cache() s...
Github user holdenk commented on the issue: https://github.com/apache/spark/pull/14579 Right, I wouldn't expect it to error with subclassing, just to not pipeline successfully, and only in a very long-shot corner case. I think try/finally with persistence is not an uncommon pattern (something similar happens frequently inside Spark ML/MLlib, but it's in Scala code). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
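The try/finally persistence pattern mentioned in this comment can be sketched roughly as follows. This is a minimal, Spark-free illustration rather than code from the pull request: `FakeRDD` is a hypothetical stand-in that only tracks cache state, and `persisted` is an illustrative helper that is not part of PySpark. The only assumption borrowed from PySpark is that `persist()` and `unpersist()` exist and return the dataset itself.

```python
from contextlib import contextmanager


class FakeRDD:
    """Hypothetical stand-in for an RDD/DataFrame; it only tracks cache state."""

    def __init__(self):
        self.is_cached = False

    def persist(self):
        self.is_cached = True
        return self  # mirrors PySpark, where persist() returns the dataset

    def unpersist(self):
        self.is_cached = False
        return self


@contextmanager
def persisted(dataset):
    """Persist `dataset` for the duration of a block, then always release it."""
    dataset.persist()
    try:
        yield dataset
    finally:
        # Runs even when the body raises, so the cached data is not leaked.
        dataset.unpersist()


rdd = FakeRDD()
try:
    with persisted(rdd) as cached:
        was_cached_inside = cached.is_cached
        raise RuntimeError("simulated job failure")
except RuntimeError:
    pass

print(was_cached_inside, rdd.is_cached)  # True False
```

The point of the sketch is only that the release step sits in `finally`, so a failing job still unpersists its input.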
[GitHub] spark issue #14580: [SPARK-16991][SQL] Fix `EliminateOuterJoin` optimizer to...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/14580 Oracle supports it... http://docs.oracle.com/javadb/10.10.1.2/ref/rrefsqljusing.html
[GitHub] spark issue #12004: [SPARK-7481][build] [WIP] Add Hadoop 2.6+ spark-cloud mo...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/12004 **[Test build #63588 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63588/consoleFull)** for PR 12004 at commit [`cb07c1d`](https://github.com/apache/spark/commit/cb07c1d7b79944059e477b0b615ce061b08cef00).
[GitHub] spark issue #14590: [SPARK-17008][SPARK-17009][SQL] Normalization and isolat...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14590 **[Test build #3216 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3216/consoleFull)** for PR 14590 at commit [`e061820`](https://github.com/apache/spark/commit/e0618203c317f8b8211c0e983403834f8e39a950).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #14454: [Minor] [ML] Rename TreeEnsembleModels to TreeEnsembleMo...
Github user yanboliang commented on the issue: https://github.com/apache/spark/pull/14454 ping @srowen
[GitHub] spark issue #14559: [SPARK-16968]Add additional options in jdbc when creatin...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14559 **[Test build #63587 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63587/consoleFull)** for PR 14559 at commit [`57be055`](https://github.com/apache/spark/commit/57be055c542d1720bb9fd57810d4c2593444).
[GitHub] spark pull request #14591: [SPARK-17010][MINOR][DOC]Wrong description in mem...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/14591
[GitHub] spark issue #14559: [SPARK-16968]Add additional options in jdbc when creatin...
Github user GraceH commented on the issue: https://github.com/apache/spark/pull/14559 @HyukjinKwon and @srowen, here is the initial proposal. Please let me know your comments. I will refine it with unit tests later. BTW, readwriter.py calls the high-level API jdbc(url, table, connectionProperties). If we don't change that API the way the reader API does, we may not need to expose JDBCOptions in that file. What do you think?
[GitHub] spark issue #14592: [SPARK-17011][SQL] Support testing exceptions in SQLQuer...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/14592 LGTM pending Jenkins.
[GitHub] spark issue #14591: [SPARK-17010][MINOR][DOC]Wrong description in memory man...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/14591 Merging in master/2.0.
[GitHub] spark pull request #14592: [SPARK-17011][SQL] Support testing exceptions in ...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/14592#discussion_r74370013
--- Diff: sql/core/src/test/resources/sql-tests/results/number-format.sql.out ---
@@ -19,16 +19,24 @@ struct<2147483648:bigint,(-2147483649):bigint>
 -- !query 2
-select 9223372036854775808, -9223372036854775809
+select 9223372036854775807, -9223372036854775808
--- End diff --
sgtm
[GitHub] spark issue #14580: [SPARK-16991][SQL] Fix `EliminateOuterJoin` optimizer to...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/14580 Which NoSQL platforms support `Using Outer Join`?
[GitHub] spark issue #14593: [MINOR][DOCS] Fix style in examples and inconsistent ind...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14593 Merged build finished. Test PASSed.
[GitHub] spark issue #14593: [MINOR][DOCS] Fix style in examples and inconsistent ind...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14593 **[Test build #63586 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63586/consoleFull)** for PR 14593 at commit [`e4a832e`](https://github.com/apache/spark/commit/e4a832e61989297a77dae4a3b6cc1044dd66d499).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #14593: [MINOR][DOCS] Fix style in examples and inconsistent ind...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14593 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/63586/ Test PASSed.
[GitHub] spark issue #13775: [SPARK-16060][SQL] Vectorized Orc reader
Github user viirya commented on the issue: https://github.com/apache/spark/pull/13775 @yhuai You mean just using `sql("SELECT * FROM t").count()`?
[GitHub] spark pull request #13775: [SPARK-16060][SQL] Vectorized Orc reader
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/13775#discussion_r74369496
--- Diff: sql/hive/src/main/java/org/apache/hadoop/hive/ql/io/orc/VectorizedSparkOrcNewRecordReader.java ---
@@ -0,0 +1,318 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.ql.io.orc;
+
+import java.io.IOException;
+import java.nio.charset.StandardCharsets;
+import java.util.ArrayList;
+import java.util.List;
+
+import org.apache.commons.lang.NotImplementedException;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.hive.ql.exec.vector.ColumnVector;
+import org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector;
+import org.apache.hadoop.hive.ql.exec.vector.DecimalColumnVector;
+import org.apache.hadoop.hive.ql.exec.vector.DoubleColumnVector;
+import org.apache.hadoop.hive.ql.exec.vector.LongColumnVector;
+import org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch;
+import org.apache.hadoop.io.NullWritable;
+import org.apache.hadoop.mapred.JobConf;
+import org.apache.hadoop.mapreduce.InputSplit;
+import org.apache.hadoop.mapreduce.TaskAttemptContext;
+import org.apache.hadoop.mapreduce.lib.input.FileSplit;
+
+import org.apache.spark.sql.catalyst.InternalRow;
+import org.apache.spark.sql.catalyst.util.ArrayData;
+import org.apache.spark.sql.catalyst.util.MapData;
+import org.apache.spark.sql.types.DataType;
+import org.apache.spark.sql.types.Decimal;
+import org.apache.spark.unsafe.types.CalendarInterval;
+import org.apache.spark.unsafe.types.UTF8String;
+
+/**
+ * A RecordReader that returns InternalRow for Spark SQL execution.
+ * This reader uses an internal reader that returns Hive's VectorizedRowBatch. An adapter
+ * class is used to return internal row by directly accessing data in column vectors.
+ */
+public class VectorizedSparkOrcNewRecordReader
+    extends org.apache.hadoop.mapreduce.RecordReader{
+  private final org.apache.hadoop.mapred.RecordReader reader;
+  private final int numColumns;
+  private VectorizedRowBatch internalValue;
+  private float progress = 0.0f;
+  private List columnIDs;
+
+  private long numRowsOfBatch = 0;
+  private int indexOfRow = 0;
+
+  private final Row row;
+
+  public VectorizedSparkOrcNewRecordReader(
+      Reader file,
+      JobConf conf,
+      FileSplit fileSplit,
+      List columnIDs) throws IOException {
+    List types = file.getTypes();
+    numColumns = (types.size() == 0) ? 0 : types.get(0).getSubtypesCount();
+    this.reader = new SparkVectorizedOrcRecordReader(file, conf,
+      new org.apache.hadoop.mapred.FileSplit(fileSplit));
+
+    this.columnIDs = new ArrayList<>(columnIDs);
+    this.internalValue = this.reader.createValue();
+    this.progress = reader.getProgress();
+    this.row = new Row(this.internalValue.cols, this.columnIDs);
+  }
+
+  @Override
+  public void close() throws IOException {
+    reader.close();
+  }
+
+  @Override
+  public NullWritable getCurrentKey() throws IOException,
+      InterruptedException {
+    return NullWritable.get();
+  }
+
+  @Override
+  public InternalRow getCurrentValue() throws IOException,
+      InterruptedException {
+    if (indexOfRow >= numRowsOfBatch) {
+      return null;
+    }
+    row.rowId = indexOfRow;
+    indexOfRow++;
+
+    return row;
+  }
+
+  @Override
+  public float getProgress() throws IOException, InterruptedException {
+    return progress;
+  }
+
+  @Override
+  public void initialize(InputSplit split, TaskAttemptContext context)
+      throws IOException, InterruptedException {
+  }
+
+  @Override
+  public boolean nextKeyValue() throws IOException, InterruptedException {
+    if
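The row-adapter idea described in the class comment above (one reusable row object whose `rowId` is advanced over a columnar batch, as `getCurrentValue` does with `indexOfRow` and `numRowsOfBatch`) might be sketched like this. It is a simplified Python illustration under stated assumptions, not the reader from the pull request: `RowView` and `BatchReader` are hypothetical names, and plain lists stand in for Hive's column vectors.

```python
class RowView:
    """Reusable row that reads values straight out of the column lists."""

    def __init__(self, columns):
        self.columns = columns  # one list of values per column
        self.row_id = 0

    def get(self, ordinal):
        # No per-row copy: index into the column at the current row position.
        return self.columns[ordinal][self.row_id]


class BatchReader:
    """Hands out the same RowView, repositioned, for each row of one batch."""

    def __init__(self, columns):
        self.num_rows = len(columns[0]) if columns else 0
        self.index = 0
        self.row = RowView(columns)

    def current_value(self):
        if self.index >= self.num_rows:
            return None  # batch exhausted
        self.row.row_id = self.index
        self.index += 1
        return self.row


reader = BatchReader([[1, 2, 3], ["a", "b", "c"]])
rows = []
while True:
    row = reader.current_value()
    if row is None:
        break
    # Values must be read before the next call, since the view is reused.
    rows.append((row.get(0), row.get(1)))

print(rows)  # [(1, 'a'), (2, 'b'), (3, 'c')]
```

Because the same view object is returned every time, a caller that keeps a reference to it across calls would see it change; that trade-off is what avoids allocating one row per record.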
[GitHub] spark pull request #14592: [SPARK-17011][SQL] Support testing exceptions in ...
Github user petermaxlee commented on a diff in the pull request: https://github.com/apache/spark/pull/14592#discussion_r74369492
--- Diff: sql/core/src/test/resources/sql-tests/results/number-format.sql.out ---
@@ -19,16 +19,24 @@ struct<2147483648:bigint,(-2147483649):bigint>
 -- !query 2
-select 9223372036854775808, -9223372036854775809
+select 9223372036854775807, -9223372036854775808
--- End diff --
Can I add it in a separate pull request? I want to add all literal parsing here, but don't want to distract from this pull request.
[GitHub] spark pull request #14592: [SPARK-17011][SQL] Support testing exceptions in ...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/14592#discussion_r74369336
--- Diff: sql/core/src/test/resources/sql-tests/results/number-format.sql.out ---
@@ -19,16 +19,24 @@ struct<2147483648:bigint,(-2147483649):bigint>
 -- !query 2
-select 9223372036854775808, -9223372036854775809
+select 9223372036854775807, -9223372036854775808
--- End diff --
can you also add the boundary conditions for int as well?
[GitHub] spark pull request #14592: [SPARK-17011][SQL] Support testing exceptions in ...
Github user petermaxlee commented on a diff in the pull request: https://github.com/apache/spark/pull/14592#discussion_r74369163
--- Diff: sql/core/src/test/resources/sql-tests/results/number-format.sql.out ---
@@ -19,16 +19,24 @@ struct<2147483648:bigint,(-2147483649):bigint>
 -- !query 2
-select 9223372036854775808, -9223372036854775809
+select 9223372036854775807, -9223372036854775808
 -- !query 2 schema
-struct<9223372036854775808:decimal(19,0),(-9223372036854775809):decimal(19,0)>
+struct<9223372036854775807:bigint,(-9223372036854775808):decimal(19,0)>
--- End diff --
Here it is https://issues.apache.org/jira/browse/SPARK-17013
[GitHub] spark issue #14593: [MINOR][DOCS] Fix style in examples and inconsistent ind...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14593 **[Test build #63586 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63586/consoleFull)** for PR 14593 at commit [`e4a832e`](https://github.com/apache/spark/commit/e4a832e61989297a77dae4a3b6cc1044dd66d499).
[GitHub] spark issue #14593: [MINOR][DOCS] Fix style in examples and inconsistent ind...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14593 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/63583/ Test PASSed.
[GitHub] spark issue #14593: [MINOR][DOCS] Fix style in examples and inconsistent ind...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14593 Merged build finished. Test PASSed.
[GitHub] spark pull request #14592: [SPARK-17011][SQL] Support testing exceptions in ...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/14592#discussion_r74369002
--- Diff: sql/core/src/test/resources/sql-tests/results/number-format.sql.out ---
@@ -19,16 +19,24 @@ struct<2147483648:bigint,(-2147483649):bigint>
 -- !query 2
-select 9223372036854775808, -9223372036854775809
+select 9223372036854775807, -9223372036854775808
 -- !query 2 schema
-struct<9223372036854775808:decimal(19,0),(-9223372036854775809):decimal(19,0)>
+struct<9223372036854775807:bigint,(-9223372036854775808):decimal(19,0)>
--- End diff --
@petermaxlee can you file a jira ticket for this bug?
[GitHub] spark issue #14593: [MINOR][DOCS] Fix style in examples and inconsistent ind...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14593 **[Test build #63583 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63583/consoleFull)** for PR 14593 at commit [`3cff947`](https://github.com/apache/spark/commit/3cff9477b10814d8fc9eeb27556b285e01d38956).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `[Tokenization](http://en.wikipedia.org/wiki/Lexical_analysis#Tokenization) is the process of taking text (such as a sentence) and breaking it into individual terms (usually words). A simple [Tokenizer](api/scala/index.html#org.apache.spark.ml.feature.Tokenizer) class provides this functionality. The example below shows how to split sentences into sequences of words.`
  * `* *(Breaking change)* The `apply` and `copy` methods for the case class [`BoostingStrategy`](api/scala/index.html#org.apache.spark.mllib.tree.configuration.BoostingStrategy) have been changed because of a modification to the case class fields. This could be an issue for users who use `BoostingStrategy` to set GBT parameters.`
  * `* *(Breaking change)* The return value of [`LDA.run`](api/scala/index.html#org.apache.spark.mllib.clustering.LDA) has changed. It now returns an abstract class `LDAModel` instead of the concrete class `DistributedLDAModel`. The object of type `LDAModel` can still be cast to the appropriate concrete type, which depends on the optimization algorithm.`
  * `* In `DecisionTree`, the deprecated class method `train` has been removed. (The object/static `train` methods remain.)`
  * `* The `scoreCol` output column (with default value "score") was renamed to be `probabilityCol` (with default value "probability"). The type was originally `Double` (for the probability of class 1.0), but it is now `Vector` (for the probability of each class, to support multiclass classification in the future).`
  * `labels - the number of times any class was predicted correctly (true positives) normalized by the number of data`
[GitHub] spark pull request #14592: [SPARK-17011][SQL] Support testing exceptions in ...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/14592#discussion_r74368974
--- Diff: sql/core/src/test/resources/sql-tests/results/number-format.sql.out ---
@@ -19,16 +19,24 @@ struct<2147483648:bigint,(-2147483649):bigint>
 -- !query 2
-select 9223372036854775808, -9223372036854775809
+select 9223372036854775807, -9223372036854775808
 -- !query 2 schema
-struct<9223372036854775808:decimal(19,0),(-9223372036854775809):decimal(19,0)>
+struct<9223372036854775807:bigint,(-9223372036854775808):decimal(19,0)>
--- End diff --
I'd call this a bug. I tried in Spark 1.6 and it was returning double (which was worse). Here's postgres:
```
rxin=# select pg_typeof(-9223372036854775808);
 pg_typeof
-----------
 bigint
(1 row)

rxin=# select pg_typeof(-9223372036854775807);
 pg_typeof
-----------
 bigint
(1 row)

rxin=# select pg_typeof(-9223372036854775806);
 pg_typeof
-----------
 bigint
(1 row)
```
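One plausible reading of the result quoted above (an assumption on my part, not something the thread confirms) is that the unsigned digits are typed first and the unary minus is applied afterwards, so `9223372036854775808` overflows bigint before it is ever negated. The sketch below contrasts that order with typing the signed value as a whole. The function names are hypothetical and the type rule is deliberately simplified to int, bigint and `decimal(p,0)`.

```python
INT_MIN, INT_MAX = -2**31, 2**31 - 1
LONG_MIN, LONG_MAX = -2**63, 2**63 - 1


def type_of(value):
    """Smallest of int / bigint / decimal(p,0) that can hold `value`."""
    if INT_MIN <= value <= INT_MAX:
        return "int"
    if LONG_MIN <= value <= LONG_MAX:
        return "bigint"
    return f"decimal({len(str(abs(value)))},0)"


def type_sign_after(text):
    """Type the unsigned digits, then negate (the suspected buggy order)."""
    return type_of(int(text.lstrip("-")))


def type_sign_first(text):
    """Type the signed value as a whole."""
    return type_of(int(text))


literals = (
    "2147483647",
    "-2147483648",
    "9223372036854775807",
    "-9223372036854775808",
    "-9223372036854775809",
)
for literal in literals:
    print(literal, type_sign_after(literal), type_sign_first(literal))
```

Under these assumptions only the two minimum values differ between the strategies: `-2147483648` comes out as bigint versus int, and `-9223372036854775808` as `decimal(19,0)` versus bigint, which matches both the schema in the diff and the postgres output.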
[GitHub] spark issue #13775: [SPARK-16060][SQL] Vectorized Orc reader
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13775 **[Test build #63585 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63585/consoleFull)** for PR 13775 at commit [`06066eb`](https://github.com/apache/spark/commit/06066eb241eb97c4cf363adff2b0160b8a423ab8).
[GitHub] spark pull request #13775: [SPARK-16060][SQL] Vectorized Orc reader
Github user dafrista commented on a diff in the pull request: https://github.com/apache/spark/pull/13775#discussion_r74368871
--- Diff: sql/hive/src/main/java/org/apache/hadoop/hive/ql/io/orc/VectorizedSparkOrcNewRecordReader.java ---
@@ -0,0 +1,318 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.ql.io.orc;
+
+import java.io.IOException;
+import java.nio.charset.StandardCharsets;
+import java.util.ArrayList;
+import java.util.List;
+
+import org.apache.commons.lang.NotImplementedException;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.hive.ql.exec.vector.ColumnVector;
+import org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector;
+import org.apache.hadoop.hive.ql.exec.vector.DecimalColumnVector;
+import org.apache.hadoop.hive.ql.exec.vector.DoubleColumnVector;
+import org.apache.hadoop.hive.ql.exec.vector.LongColumnVector;
+import org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch;
+import org.apache.hadoop.io.NullWritable;
+import org.apache.hadoop.mapred.JobConf;
+import org.apache.hadoop.mapreduce.InputSplit;
+import org.apache.hadoop.mapreduce.TaskAttemptContext;
+import org.apache.hadoop.mapreduce.lib.input.FileSplit;
+
+import org.apache.spark.sql.catalyst.InternalRow;
+import org.apache.spark.sql.catalyst.util.ArrayData;
+import org.apache.spark.sql.catalyst.util.MapData;
+import org.apache.spark.sql.types.DataType;
+import org.apache.spark.sql.types.Decimal;
+import org.apache.spark.unsafe.types.CalendarInterval;
+import org.apache.spark.unsafe.types.UTF8String;
+
+/**
+ * A RecordReader that returns InternalRow for Spark SQL execution.
+ * This reader uses an internal reader that returns Hive's VectorizedRowBatch. An adapter
+ * class is used to return internal row by directly accessing data in column vectors.
+ */
+public class VectorizedSparkOrcNewRecordReader
+    extends org.apache.hadoop.mapreduce.RecordReader{
+  private final org.apache.hadoop.mapred.RecordReader reader;
+  private final int numColumns;
+  private VectorizedRowBatch internalValue;
+  private float progress = 0.0f;
+  private List columnIDs;
+
+  private long numRowsOfBatch = 0;
+  private int indexOfRow = 0;
+
+  private final Row row;
+
+  public VectorizedSparkOrcNewRecordReader(
+      Reader file,
+      JobConf conf,
+      FileSplit fileSplit,
+      List columnIDs) throws IOException {
+    List types = file.getTypes();
+    numColumns = (types.size() == 0) ? 0 : types.get(0).getSubtypesCount();
+    this.reader = new SparkVectorizedOrcRecordReader(file, conf,
+      new org.apache.hadoop.mapred.FileSplit(fileSplit));
+
+    this.columnIDs = new ArrayList<>(columnIDs);
+    this.internalValue = this.reader.createValue();
+    this.progress = reader.getProgress();
+    this.row = new Row(this.internalValue.cols, this.columnIDs);
+  }
+
+  @Override
+  public void close() throws IOException {
+    reader.close();
+  }
+
+  @Override
+  public NullWritable getCurrentKey() throws IOException,
+      InterruptedException {
+    return NullWritable.get();
+  }
+
+  @Override
+  public InternalRow getCurrentValue() throws IOException,
+      InterruptedException {
+    if (indexOfRow >= numRowsOfBatch) {
+      return null;
+    }
+    row.rowId = indexOfRow;
+    indexOfRow++;
+
+    return row;
+  }
+
+  @Override
+  public float getProgress() throws IOException, InterruptedException {
+    return progress;
+  }
+
+  @Override
+  public void initialize(InputSplit split, TaskAttemptContext context)
+      throws IOException, InterruptedException {
+  }
+
+  @Override
+  public boolean nextKeyValue() throws IOException, InterruptedException {
+    if
[GitHub] spark issue #14592: [SPARK-17011][SQL] Support testing exceptions in SQLQuer...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/14592 I'd like to do it incrementally, ideally one SQL test file (xxx.sql) per PR. We can have many PRs open at the same time, since they are unlikely to conflict.
[GitHub] spark pull request #14590: [SPARK-17008][SPARK-17009][SQL] Normalization and...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/14590#discussion_r74368764 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala --- @@ -126,14 +129,18 @@ class SQLQueryTestSuite extends QueryTest with SharedSQLContext { cleaned.split("(?<=[^]);").map(_.trim).filter(_ != "").toSeq } +// Create a local SparkSession to have stronger isolation between different test cases. +// This does not isolate catalog changes. +val localSparkSession = spark.newSession() --- End diff -- SparkSession should be fine. SparkContext is the expensive one.
[GitHub] spark pull request #14592: [SPARK-17011][SQL] Support testing exceptions in ...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/14592#discussion_r74368708 --- Diff: sql/core/src/test/resources/sql-tests/results/number-format.sql.out --- @@ -19,16 +19,24 @@ struct<2147483648:bigint,(-2147483649):bigint> -- !query 2 -select 9223372036854775808, -9223372036854775809 +select 9223372036854775807, -9223372036854775808 -- !query 2 schema -struct<9223372036854775808:decimal(19,0),(-9223372036854775809):decimal(19,0)> +struct<9223372036854775807:bigint,(-9223372036854775808):decimal(19,0)> --- End diff -- maybe a parser bug? cc @hvanhovell
[GitHub] spark issue #14580: [SPARK-16991][SQL] Fix `EliminateOuterJoin` optimizer to...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14580 **[Test build #63584 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63584/consoleFull)** for PR 14580 at commit [`ddb4ddd`](https://github.com/apache/spark/commit/ddb4dddb1829098ef012cc63ddf059d663b8454b).
[GitHub] spark pull request #14590: [SPARK-17008][SPARK-17009][SQL] Normalization and...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/14590#discussion_r74368596 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala --- @@ -126,14 +129,18 @@ class SQLQueryTestSuite extends QueryTest with SharedSQLContext { cleaned.split("(?<=[^]);").map(_.trim).filter(_ != "").toSeq } +// Create a local SparkSession to have stronger isolation between different test cases. +// This does not isolate catalog changes. +val localSparkSession = spark.newSession() --- End diff -- Is it expensive? I do remember other tests share one spark session for performance reasons.
[GitHub] spark issue #14568: [SPARK-10868] monotonicallyIncreasingId() supports offse...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14568 Merged build finished. Test FAILed.
[GitHub] spark issue #14568: [SPARK-10868] monotonicallyIncreasingId() supports offse...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14568 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/63576/ Test FAILed.
[GitHub] spark issue #14580: [SPARK-16991][SQL] Fix `EliminateOuterJoin` optimizer to...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/14580 I think we should think outside the SQL box. We know that Spark is not a subset of a DBMS.
[GitHub] spark issue #14568: [SPARK-10868] monotonicallyIncreasingId() supports offse...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14568 **[Test build #63576 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63576/consoleFull)** for PR 14568 at commit [`b4d4ea6`](https://github.com/apache/spark/commit/b4d4ea6213d1792e76a25cfe385fb2e3f11bfb6e). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #13775: [SPARK-16060][SQL] Vectorized Orc reader
Github user yhuai commented on the issue: https://github.com/apache/spark/pull/13775 for the benchmark, how about we just test the scan operation?
[GitHub] spark issue #14593: [MINOR][DOCS] Fix style in examples and inconsistent ind...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/14593 Also, just for reviewers: the inconsistencies I listed in the PR description occur randomly across the documentation. This PR makes them consistent, following the style guidelines and matching what the majority of the documentation already does.
[GitHub] spark issue #14593: [MINOR][DOCS] Fix style in examples and inconsistent ind...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/14593 BTW, this does not fix some wrong examples and inconsistently indented code in `structured-streaming-programming-guide.md`, because https://github.com/apache/spark/pull/14564 is handling them. I made a separate PR for this because that PR was originally about fixing code in `./examples`.
[GitHub] spark issue #14593: [MINOR][DOCS] Fix style in examples and inconsistent ind...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14593 **[Test build #63583 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63583/consoleFull)** for PR 14593 at commit [`3cff947`](https://github.com/apache/spark/commit/3cff9477b10814d8fc9eeb27556b285e01d38956).
[GitHub] spark pull request #14593: [MINOR][DOCS] Fix style in examples and inconsist...
GitHub user HyukjinKwon opened a pull request: https://github.com/apache/spark/pull/14593 [MINOR][DOCS] Fix style in examples and inconsistent indentation across documentation ## What changes were proposed in this pull request? This PR fixes the documentation as below: - Remove unnecessary spaces that make spacing inconsistent across the documentation. - Fix the style of examples in the documentation. This includes the following: - Python uses 4 spaces, and Java and Scala use 2 spaces (See https://cwiki.apache.org/confluence/display/SPARK/Spark+Code+Style+Guide). - Avoid excessive parentheses and curly braces for anonymous functions. (See https://github.com/databricks/scala-style-guide#anonymous) - Make indentation consistent for XML. - Remove trailing whitespace at the end of files and lines. ## How was this patch tested? N/A You can merge this pull request into a Git repository by running: $ git pull https://github.com/HyukjinKwon/spark minor-documentation Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/14593.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #14593 commit c7d3e7b10b6bec585361aadef43e1f2046c0f5e2 Author: hyukjinkwon Date: 2016-08-11T04:06:39Z Fix style in examples and inconsistent indentation across documentation commit 3cff9477b10814d8fc9eeb27556b285e01d38956 Author: hyukjinkwon Date: 2016-08-11T04:36:18Z Fix all similar instances across documentation
[GitHub] spark issue #14592: [SPARK-17011][SQL] Support testing exceptions in SQLQuer...
Github user petermaxlee commented on the issue: https://github.com/apache/spark/pull/14592 @cloud-fan after adding enough features to the test harness, do you think I should port all tests over in a single pull request, or do it more incrementally?
[GitHub] spark issue #14580: [SPARK-16991][SQL] Fix `EliminateOuterJoin` optimizer to...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/14580 Not sure which RDBMSs support `Using Outer Join`. The `NULL` values generated by outer joins are removed. This sounds a little bit strange. After all, `NULL` also has a meaning. In the plan (by EXPLAIN), it is not easy to tell whether this is a regular outer join or a using outer join. That is why I think we should introduce a new join type. At least then users can easily see that they are triggering a using outer join.
[GitHub] spark issue #14592: [SPARK-17011][SQL] Support testing exceptions in SQLQuer...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14592 **[Test build #63582 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63582/consoleFull)** for PR 14592 at commit [`76defce`](https://github.com/apache/spark/commit/76defceb9fbaf13ca522da750d92eeb5f7799472).
[GitHub] spark pull request #13775: [SPARK-16060][SQL] Vectorized Orc reader
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/13775#discussion_r74367827 --- Diff: sql/hive/src/main/java/org/apache/hadoop/hive/ql/io/orc/VectorizedSparkOrcNewRecordReader.java --- @@ -0,0 +1,318 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.hadoop.hive.ql.io.orc; + +import java.io.IOException; +import java.nio.charset.StandardCharsets; +import java.util.ArrayList; +import java.util.List; + +import org.apache.commons.lang.NotImplementedException; + +import org.apache.hadoop.conf.Configuration; +import org.apache.hadoop.hive.ql.exec.vector.ColumnVector; +import org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector; +import org.apache.hadoop.hive.ql.exec.vector.DecimalColumnVector; +import org.apache.hadoop.hive.ql.exec.vector.DoubleColumnVector; +import org.apache.hadoop.hive.ql.exec.vector.LongColumnVector; +import org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch; +import org.apache.hadoop.io.NullWritable; +import org.apache.hadoop.mapred.JobConf; +import org.apache.hadoop.mapreduce.InputSplit; +import org.apache.hadoop.mapreduce.TaskAttemptContext; +import org.apache.hadoop.mapreduce.lib.input.FileSplit; + +import org.apache.spark.sql.catalyst.InternalRow; +import org.apache.spark.sql.catalyst.util.ArrayData; +import org.apache.spark.sql.catalyst.util.MapData; +import org.apache.spark.sql.types.DataType; +import org.apache.spark.sql.types.Decimal; +import org.apache.spark.unsafe.types.CalendarInterval; +import org.apache.spark.unsafe.types.UTF8String; + +/** + * A RecordReader that returns InternalRow for Spark SQL execution. + * This reader uses an internal reader that returns Hive's VectorizedRowBatch. An adapter + * class is used to return internal row by directly accessing data in column vectors. 
+ */ +public class VectorizedSparkOrcNewRecordReader +extends org.apache.hadoop.mapreduce.RecordReader{ + private final org.apache.hadoop.mapred.RecordReader reader; + private final int numColumns; + private VectorizedRowBatch internalValue; + private float progress = 0.0f; + private List columnIDs; + + private long numRowsOfBatch = 0; + private int indexOfRow = 0; + + private final Row row; + + public VectorizedSparkOrcNewRecordReader( + Reader file, + JobConf conf, + FileSplit fileSplit, + List columnIDs) throws IOException { +List types = file.getTypes(); +numColumns = (types.size() == 0) ? 0 : types.get(0).getSubtypesCount(); +this.reader = new SparkVectorizedOrcRecordReader(file, conf, + new org.apache.hadoop.mapred.FileSplit(fileSplit)); + +this.columnIDs = new ArrayList<>(columnIDs); +this.internalValue = this.reader.createValue(); +this.progress = reader.getProgress(); +this.row = new Row(this.internalValue.cols, this.columnIDs); + } + + @Override + public void close() throws IOException { +reader.close(); + } + + @Override + public NullWritable getCurrentKey() throws IOException, + InterruptedException { +return NullWritable.get(); + } + + @Override + public InternalRow getCurrentValue() throws IOException, + InterruptedException { +if (indexOfRow >= numRowsOfBatch) { + return null; +} +row.rowId = indexOfRow; +indexOfRow++; + +return row; + } + + @Override + public float getProgress() throws IOException, InterruptedException { +return progress; + } + + @Override + public void initialize(InputSplit split, TaskAttemptContext context) + throws IOException, InterruptedException { + } + + @Override + public boolean nextKeyValue() throws IOException, InterruptedException { +if
[GitHub] spark issue #14588: [SPARK-17005][SQL] fix method tpe in trait AnnotationApi...
Github user keypointt commented on the issue: https://github.com/apache/spark/pull/14588 Oh I'm so sorry. It's breaking 2.10 anyway. I'll double check Scala version compatibility next time before submitting a PR.
[GitHub] spark pull request #14588: [SPARK-17005][SQL] fix method tpe in trait Annota...
Github user keypointt closed the pull request at: https://github.com/apache/spark/pull/14588
[GitHub] spark issue #14102: [SPARK-16434][SQL] Avoid per-record type dispatch in JSO...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14102 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/63575/ Test PASSed.
[GitHub] spark issue #14102: [SPARK-16434][SQL] Avoid per-record type dispatch in JSO...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14102 Merged build finished. Test PASSed.
[GitHub] spark issue #14102: [SPARK-16434][SQL] Avoid per-record type dispatch in JSO...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14102 **[Test build #63575 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63575/consoleFull)** for PR 14102 at commit [`bceda7b`](https://github.com/apache/spark/commit/bceda7ba4f06c0b6fd99f11ef2662f9f3a154af0). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #14589: [SPARK-17007][SQL] Move test data files into a te...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/14589
[GitHub] spark issue #14592: [SPARK-17011][SQL] Support testing exceptions in SQLQuer...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14592 **[Test build #63580 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63580/consoleFull)** for PR 14592 at commit [`1a7cdc0`](https://github.com/apache/spark/commit/1a7cdc029f1cba22f7b8c59eaa241575b287983f).
[GitHub] spark issue #14583: [SPARK-16994][SQL] PushDownPredicate should not ignore l...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14583 **[Test build #63581 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63581/consoleFull)** for PR 14583 at commit [`d23d348`](https://github.com/apache/spark/commit/d23d348bb0c88211d87063bceaaabff7cc7a8a7a).
[GitHub] spark issue #14589: [SPARK-17007][SQL] Move test data files into a test-data...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/14589 Thanks - merging in master.
[GitHub] spark pull request #14592: [SPARK-17011][SQL] Support testing exceptions in ...
Github user petermaxlee commented on a diff in the pull request: https://github.com/apache/spark/pull/14592#discussion_r74367468 --- Diff: sql/core/src/test/resources/sql-tests/results/number-format.sql.out --- @@ -19,16 +19,24 @@ struct<2147483648:bigint,(-2147483649):bigint> -- !query 2 -select 9223372036854775808, -9223372036854775809 +select 9223372036854775807, -9223372036854775808 -- !query 2 schema -struct<9223372036854775808:decimal(19,0),(-9223372036854775809):decimal(19,0)> +struct<9223372036854775807:bigint,(-9223372036854775808):decimal(19,0)> --- End diff -- Also cc @sarutak who wrote the original test case.
[GitHub] spark issue #14589: [SPARK-17007][SQL] Move test data files into a test-data...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14589 Merged build finished. Test PASSed.
[GitHub] spark issue #14589: [SPARK-17007][SQL] Move test data files into a test-data...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14589 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/63573/ Test PASSed.
[GitHub] spark pull request #14592: [SPARK-17011][SQL] Support testing exceptions in ...
Github user petermaxlee commented on a diff in the pull request: https://github.com/apache/spark/pull/14592#discussion_r74367410 --- Diff: sql/core/src/test/resources/sql-tests/results/number-format.sql.out --- @@ -19,16 +19,24 @@ struct<2147483648:bigint,(-2147483649):bigint> -- !query 2 -select 9223372036854775808, -9223372036854775809 +select 9223372036854775807, -9223372036854775808 -- !query 2 schema -struct<9223372036854775808:decimal(19,0),(-9223372036854775809):decimal(19,0)> +struct<9223372036854775807:bigint,(-9223372036854775808):decimal(19,0)> --- End diff -- "-9223372036854775808" is a valid long value (Long.MinValue) but Spark treats it as a decimal(19, 0) because "9223372036854775808" is out of range. Is this expected? cc @cloud-fan
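The behaviour questioned above can be reproduced outside Spark with a small sketch. It assumes, as the comment suggests, that the digits are typed before the unary minus is applied; the class and method names (`LiteralTyping`, `typeOfUnsignedLiteral`, `typeOfNegatedLiteral`) are hypothetical and are not Spark's parser code.

```java
import java.math.BigDecimal;
import java.math.BigInteger;

// Hypothetical illustration only; this is not how Spark's parser is written.
final class LiteralTyping {
  private static final BigInteger INT_MAX = BigInteger.valueOf(Integer.MAX_VALUE);
  private static final BigInteger LONG_MAX = BigInteger.valueOf(Long.MAX_VALUE);

  /** Picks a type for an unsigned digit string, before any unary minus is seen. */
  static String typeOfUnsignedLiteral(String digits) {
    BigInteger value = new BigInteger(digits);
    if (value.compareTo(INT_MAX) <= 0) {
      return "int";
    }
    if (value.compareTo(LONG_MAX) <= 0) {
      return "bigint";
    }
    // Too large for a long: fall back to a decimal with scale 0.
    return "decimal(" + new BigDecimal(value).precision() + ",0)";
  }

  /** Treats "-digits" as unary minus over an already typed literal (assumed model). */
  static String typeOfNegatedLiteral(String digits) {
    // Negation keeps the operand's type, so the sign never widens the range.
    return typeOfUnsignedLiteral(digits);
  }

  public static void main(String[] args) {
    System.out.println(typeOfUnsignedLiteral("9223372036854775807"));
    System.out.println(typeOfNegatedLiteral("9223372036854775808"));
    // Parsing sign and digits together does fit in a long.
    System.out.println(Long.parseLong("-9223372036854775808") == Long.MIN_VALUE);
  }
}
```

Under this model the second literal comes out as decimal(19,0) even though the signed value equals Long.MIN_VALUE, which matches the schema in the diff. Whether that is a parser bug or intended is the open question in the thread.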
[GitHub] spark issue #14589: [SPARK-17007][SQL] Move test data files into a test-data...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14589 **[Test build #63573 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63573/consoleFull)** for PR 14589 at commit [`3bc7c03`](https://github.com/apache/spark/commit/3bc7c03cb7ea226e2ace29e771b9b64eee91d13d). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #14546: [SPARK-16955][SQL] Using ordinals in ORDER BY and GROUP ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14546 **[Test build #63579 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63579/consoleFull)** for PR 14546 at commit [`32c639c`](https://github.com/apache/spark/commit/32c639c49d23f0873b5fcc4c28fa809ee87f7005).
[GitHub] spark pull request #14592: [SPARK-17011][SQL] Support testing exceptions in ...
GitHub user petermaxlee opened a pull request: https://github.com/apache/spark/pull/14592 [SPARK-17011][SQL] Support testing exceptions in SQLQueryTestSuite ## What changes were proposed in this pull request? This patch adds exception testing to SQLQueryTestSuite. When there is an exception in query execution, the query result contains the exception class along with the exception message. As part of this, I moved some additional test cases for limit from SQLQuerySuite over to SQLQueryTestSuite. ## How was this patch tested? This is a test harness change. You can merge this pull request into a Git repository by running: $ git pull https://github.com/petermaxlee/spark SPARK-17011 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/14592.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #14592 commit 1a7cdc029f1cba22f7b8c59eaa241575b287983f Author: petermaxlee Date: 2016-08-11T04:19:56Z [SPARK-17011][SQL] Support testing exceptions in SQLQueryTestSuite
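The idea in this description, recording the exception class and message as the query's output instead of failing the suite, can be sketched roughly as follows. This is an illustration only, not the SQLQueryTestSuite code: `run_query`, `fake_execute`, and the error message are all hypothetical.

```python
def run_query(execute, sql):
    """Return the query output, or the exception class name and message on failure."""
    try:
        return execute(sql)
    except Exception as exc:
        # The failure is recorded as output so it can be compared against
        # a golden file like any other result.
        return f"{type(exc).__name__}\n{exc}"


def fake_execute(sql):
    """Hypothetical stand-in for a real SQL engine."""
    if "limit -1" in sql.lower():
        raise ValueError("The limit expression must be non-negative")
    return "1"
```

With a harness shaped like this, a negative-limit test case becomes an ordinary golden-file comparison rather than a hand-written `intercept` block.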
[GitHub] spark issue #14397: [SPARK-16771][SQL] WITH clause should not fall into infi...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14397 **[Test build #63578 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63578/consoleFull)** for PR 14397 at commit [`178813e`](https://github.com/apache/spark/commit/178813ebf6e7d5f58ebab7784e07bfd5b8c5d883).
[GitHub] spark issue #14546: [SPARK-16955][SQL] Using ordinals in ORDER BY and GROUP ...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/14546 Thank you for the review, @gatorsmile.
[GitHub] spark issue #14397: [SPARK-16771][SQL] WITH clause should not fall into infi...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14397 **[Test build #63577 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63577/consoleFull)** for PR 14397 at commit [`624bb3d`](https://github.com/apache/spark/commit/624bb3d9f6ffe558c1897501c06c76f938e15602).
[GitHub] spark issue #14397: [SPARK-16771][SQL] WITH clause should not fall into infi...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/14397 Rebased just to resolve conflicts.
[GitHub] spark issue #14452: [SPARK-16849][SQL] Improve subquery execution in CTE by ...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/14452 @gatorsmile Do you have a concrete example for that?
[GitHub] spark pull request #14590: [SPARK-17008][SPARK-17009][SQL] Normalization and...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/14590
[GitHub] spark issue #14590: [SPARK-17008][SPARK-17009][SQL] Normalization and isolat...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/14590 The failed Python test is unrelated. I'm going to merge this in master. Thanks.
[GitHub] spark pull request #14590: [SPARK-17008][SPARK-17009][SQL] Normalization and...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/14590#discussion_r74366616 --- Diff: sql/core/src/test/resources/sql-tests/results/datetime.sql.out --- @@ -0,0 +1,10 @@ +-- Automatically generated by org.apache.spark.sql.SQLQueryTestSuite --- End diff -- It might be better to remove the package name so we don't need to change all the generated files when we move this class.
[GitHub] spark issue #14580: [SPARK-16991][SQL] Fix `EliminateOuterJoin` optimizer to...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/14580 Already pinged the previously involved Committers. Let us see what their feedback is.
[GitHub] spark issue #14580: [SPARK-16991][SQL] Fix `EliminateOuterJoin` optimizer to...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/14580 Yep. `EliminateOuterJoin` should be updated properly. Any ideas? If you have a more general idea, you can make a PR to override this. You made this optimizer. :)
[GitHub] spark issue #14580: [SPARK-16991][SQL] Fix `EliminateOuterJoin` optimizer to...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/14580 That is a public API. We are unable to remove it. https://github.com/apache/spark/pull/8600 has a serious bug. It has been fixed in another PR: https://github.com/apache/spark/pull/10353. Now, the issue is how to deal with using/natural outer joins. Maybe we can introduce new join types. Or, in the rule, we can find a hacky way to know whether this outer join is a natural/using join.
[GitHub] spark issue #14580: [SPARK-16991][SQL] Fix `EliminateOuterJoin` optimizer to...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/14580 Yep. Exactly. That is what I mean. That is not a regular outer join of the kind you considered in this optimizer, and now both features are in Spark.
[GitHub] spark issue #14590: [SPARK-17008][SPARK-17009][SQL] Normalization and isolat...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14590 **[Test build #3216 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3216/consoleFull)** for PR 14590 at commit [`e061820`](https://github.com/apache/spark/commit/e0618203c317f8b8211c0e983403834f8e39a950).
[GitHub] spark issue #14580: [SPARK-16991][SQL] Fix `EliminateOuterJoin` optimizer to...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/14580 For the regular outer join, the rule works fine. The issue you hit is caused by "using outer join" + "outer join elimination". Thus, your fix does not resolve the root issue.
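To make the "using outer join" + "outer join elimination" interaction concrete, here is a toy model in plain Python rather than Spark. It assumes the standard SQL semantics that the single key column produced by `USING` behaves like `COALESCE(left.key, right.key)`; whether this is exactly the case reported in the PR is an assumption, and the helper names are hypothetical.

```python
def full_outer_join_using(left, right):
    """Full outer join of two {key: value} dicts with USING(key) semantics."""
    rows = []
    for key in sorted(set(left) | set(right)):
        # The merged key column acts like COALESCE(left.key, right.key):
        # it is non-null even when one side has no matching row.
        rows.append((key, left.get(key), right.get(key)))
    return rows


def inner_join_using(left, right):
    """Inner join of two {key: value} dicts on their shared keys."""
    return [(k, left[k], right[k]) for k in sorted(set(left) & set(right))]


left = {1: "a", 2: "b"}
right = {2: "x", 3: "y"}

# Filter on the merged key column after each kind of join.
outer_filtered = [r for r in full_outer_join_using(left, right) if r[0] == 1]
inner_filtered = [r for r in inner_join_using(left, right) if r[0] == 1]
```

Here `outer_filtered` keeps the left-only row while `inner_filtered` is empty, so a rule that rewrites the outer join to an inner join because of a filter on the merged key would change the result.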
[GitHub] spark issue #14590: [SPARK-17008][SPARK-17009][SQL] Normalization and isolat...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14590 Merged build finished. Test FAILed.
[GitHub] spark issue #14590: [SPARK-17008][SPARK-17009][SQL] Normalization and isolat...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14590 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/63572/ Test FAILed.
[GitHub] spark pull request #14576: [SPARK-16391][SQL] ReduceAggregator and partial a...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/14576#discussion_r74365669 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/expressions/ReduceAggregator.scala --- @@ -0,0 +1,79 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.expressions + +import org.apache.spark.annotation.Experimental +import org.apache.spark.sql.Encoder +import org.apache.spark.sql.catalyst.encoders.ExpressionEncoder + +/** + * :: Experimental :: + * An aggregator that uses a single associative and commutative reduce function. This reduce + * function can be used to go through all input values and reduces them to a single value. + * If there is no input, a null value is returned. + * + * @since 2.1.0 + */ +@Experimental +abstract class ReduceAggregator[T] extends Aggregator[T, (Boolean, T), T] { + + // Question 1: Should func and encoder be parameters rather than abstract methods? + // rxin: abstract method has better java compatibility and forces naming the concrete impl, + // whereas parameter has better type inference (infer encoders via context bounds). 
--- End diff -- +1
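The class comment in the diff describes an aggregator whose buffer is a `(Boolean, T)` pair and which returns null when there is no input. A rough Python sketch of that shape is below; it is not the Scala implementation from the PR, the method names simply mirror the usual zero/reduce/merge/finish steps, and `aggregate` is a hypothetical driver that folds per-partition partial results.

```python
from functools import reduce as fold


class ReduceAggregator:
    """Buffer is (initialized, value); initialized is False until the first input."""

    def __init__(self, func):
        self.func = func  # assumed associative and commutative

    def zero(self):
        return (False, None)

    def reduce(self, buffer, item):
        initialized, value = buffer
        # The first item seeds the buffer; later items are combined with func.
        return (True, self.func(value, item)) if initialized else (True, item)

    def merge(self, b1, b2):
        # An uninitialized buffer contributes nothing to the merge.
        if not b1[0]:
            return b2
        if not b2[0]:
            return b1
        return (True, self.func(b1[1], b2[1]))

    def finish(self, buffer):
        return buffer[1]  # None when there was no input


def aggregate(agg, partitions):
    """Hypothetical driver: reduce each partition, then merge the partials."""
    partials = [fold(agg.reduce, part, agg.zero()) for part in partitions]
    return agg.finish(fold(agg.merge, partials, agg.zero()))
```

The boolean flag is what lets an empty partition be merged away without needing an identity element for `func`.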
[GitHub] spark issue #14590: [SPARK-17008][SPARK-17009][SQL] Normalization and isolat...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14590 **[Test build #63572 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63572/consoleFull)** for PR 14590 at commit [`e061820`](https://github.com/apache/spark/commit/e0618203c317f8b8211c0e983403834f8e39a950). * This patch **fails PySpark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #14591: [SPARK-17010][MINOR][DOC]Wrong description in memory man...
Github user WangTaoTheTonic commented on the issue: https://github.com/apache/spark/pull/14591 @andrewor14
[GitHub] spark issue #14588: [SPARK-17005][SQL] fix method tpe in trait AnnotationApi...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/14588 Please let me cc @srowen to make sure, because I believe this is about the build.
[GitHub] spark issue #14568: [SPARK-10868] monotonicallyIncreasingId() supports offse...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14568 **[Test build #63576 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63576/consoleFull)** for PR 14568 at commit [`b4d4ea6`](https://github.com/apache/spark/commit/b4d4ea6213d1792e76a25cfe385fb2e3f11bfb6e).
[GitHub] spark issue #14580: [SPARK-16991][SQL] Fix `EliminateOuterJoin` optimizer to...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/14580 In addition, I'm wondering if you really want to remove that feature, which was merged into the 1.6 branch on Sep. 21, 2015 and has already been released?
[GitHub] spark issue #14591: [SPARK-17010][MINOR][DOC]Wrong description in memory man...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14591 **[Test build #63574 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63574/consoleFull)** for PR 14591 at commit [`9d9bc2a`](https://github.com/apache/spark/commit/9d9bc2ae1420d91cea7779f38d329579e1ec126a). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #14591: [SPARK-17010][MINOR][DOC]Wrong description in memory man...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14591 Merged build finished. Test PASSed.
[GitHub] spark issue #14591: [SPARK-17010][MINOR][DOC]Wrong description in memory man...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14591 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/63574/ Test PASSed.
[GitHub] spark issue #14580: [SPARK-16991][SQL] Fix `EliminateOuterJoin` optimizer to...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/14580 Hi, @gatorsmile. Thank you for the review. BTW, could you give me the reason for the following comment? > The fix does not look right to me. What do you think is the root cause? I think I missed your context. For me, the current optimizer definitely works incorrectly (as we see in the reported case) and this PR fixes that now. I think this is not a SQL standard issue. If you give some counterexamples, I can grasp your concern here.
[GitHub] spark pull request #14583: [SPARK-16994][SQL] PushDownPredicate should not i...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/14583#discussion_r74363622 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala --- @@ -1988,6 +1988,11 @@ class SQLQuerySuite extends QueryTest with SharedSQLContext { } } + test("SPARK-16994: filter should not be pushed down into local limit") { --- End diff -- Thank you, @gatorsmile .
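The test name in this diff states the rule; the reason is that a filter and a limit do not commute. A minimal illustration on plain Python lists (not Spark plans, and with hypothetical helper names):

```python
def local_limit(rows, n):
    """Keep the first n rows."""
    return rows[:n]


def filter_rows(rows, predicate):
    """Keep the rows for which predicate is true."""
    return [r for r in rows if predicate(r)]


def is_even(r):
    return r % 2 == 0


rows = [1, 2, 3, 4, 5]

# Original plan: limit first, then filter.
filter_after_limit = filter_rows(local_limit(rows, 3), is_even)
# Incorrectly "optimized" plan: filter pushed below the limit.
filter_before_limit = local_limit(filter_rows(rows, is_even), 3)
```

The two plans return different rows, so pushing the predicate through the limit changes the query's meaning.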
[GitHub] spark issue #14102: [SPARK-16434][SQL] Avoid per-record type dispatch in JSO...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/14102 @cloud-fan Thanks! I think it is ready to be reviewed.
[GitHub] spark issue #14102: [SPARK-16434][SQL] Avoid per-record type dispatch in JSO...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14102 **[Test build #63575 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63575/consoleFull)** for PR 14102 at commit [`bceda7b`](https://github.com/apache/spark/commit/bceda7ba4f06c0b6fd99f11ef2662f9f3a154af0).
[GitHub] spark issue #14591: [SPARK-17010][MINOR][DOC]Wrong description in memory man...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14591 **[Test build #63574 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63574/consoleFull)** for PR 14591 at commit [`9d9bc2a`](https://github.com/apache/spark/commit/9d9bc2ae1420d91cea7779f38d329579e1ec126a).
[GitHub] spark pull request #14591: [SPARK-17010][MINOR][DOC]Wrong description in mem...
GitHub user WangTaoTheTonic opened a pull request: https://github.com/apache/spark/pull/14591 [SPARK-17010][MINOR][DOC]Wrong description in memory management document ## What changes were proposed in this pull request? Change the remaining percentage to the correct one. ## How was this patch tested? Manual review You can merge this pull request into a Git repository by running: $ git pull https://github.com/WangTaoTheTonic/spark patch-1 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/14591.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #14591 commit 9d9bc2ae1420d91cea7779f38d329579e1ec126a Author: Tao Wang Date: 2016-08-11T02:44:53Z Update tuning.md
[GitHub] spark issue #14589: [SPARK-17007][SQL] Move test data files into a test-data...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14589 **[Test build #63573 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63573/consoleFull)** for PR 14589 at commit [`3bc7c03`](https://github.com/apache/spark/commit/3bc7c03cb7ea226e2ace29e771b9b64eee91d13d).
[GitHub] spark issue #14589: [SPARK-17007][SQL] Move test data files into a test-data...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14589 Merged build finished. Test FAILed.
[GitHub] spark issue #14589: [SPARK-17007][SQL] Move test data files into a test-data...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14589 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/63571/ Test FAILed.
[GitHub] spark issue #14589: [SPARK-17007][SQL] Move test data files into a test-data...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14589 **[Test build #63571 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63571/consoleFull)** for PR 14589 at commit [`17bc9c0`](https://github.com/apache/spark/commit/17bc9c0b259be87782e313f18b2b88de134811af). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #14567: [SPARK-16992][PYSPARK] Python Pep8 formatting and...
Github user holdenk commented on a diff in the pull request: https://github.com/apache/spark/pull/14567#discussion_r74362668
--- Diff: python/pyspark/cloudpickle.py ---
@@ -194,7 +194,7 @@ def save_function(self, obj, name=None):
 # we'll pickle the actual function object rather than simply saving a
 # reference (as is done in default pickler), via save_function_tuple.
 if islambda(obj) or obj.__code__.co_filename == '' or themodule is None:
-#print("save global", islambda(obj), obj.__code__.co_filename, modname, themodule)
+# print("save global", islambda(obj), obj.__code__.co_filename, modname, themodule)
--- End diff --
Seems like we might just want to remove this commented-out line?
[GitHub] spark pull request #14567: [SPARK-16992][PYSPARK] Python Pep8 formatting and...
Github user holdenk commented on a diff in the pull request: https://github.com/apache/spark/pull/14567#discussion_r74362636
--- Diff: python/pep8rc ---
@@ -0,0 +1,21 @@
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements. See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+[pep8]
--- End diff --
There is another pep8 config file at ./dev/tox.ini. It seems like it would be good to have a single file (and to unify the ignore lists as well).
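As a rough illustration of what "unify the ignore lists" could look like, here is a minimal sketch that takes the union of the `ignore` entries from two pep8-style config files. It assumes both files carry a `[pep8]` section with a comma-separated `ignore` key. The helper name `merged_ignore_list` and the sample file contents are hypothetical and are not taken from the Spark repository.

```python
import configparser


def merged_ignore_list(*config_texts, section="pep8", key="ignore"):
    """Return the sorted union of comma-separated codes found under section/key."""
    codes = set()
    for text in config_texts:
        parser = configparser.ConfigParser()
        parser.read_string(text)
        # Files without the section or key simply contribute nothing.
        if parser.has_option(section, key):
            raw = parser.get(section, key)
            codes.update(c.strip() for c in raw.split(",") if c.strip())
    return sorted(codes)


# Hypothetical file contents, for illustration only; not Spark's actual settings.
PEP8RC_TEXT = "[pep8]\nignore = E402,E731\nmax-line-length = 100\n"
TOX_INI_TEXT = "[pep8]\nignore = E731,E241\n"

unified = merged_ignore_list(PEP8RC_TEXT, TOX_INI_TEXT)
print("ignore=" + ",".join(unified))
```

The printed line could then be pasted into whichever single config file is kept. Taking the union is only one possible policy; a project might instead prefer the stricter of the two lists.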
[GitHub] spark issue #14590: [SPARK-17008][SPARK-17009][SQL] Normalization and isolat...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14590 **[Test build #63572 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63572/consoleFull)** for PR 14590 at commit [`e061820`](https://github.com/apache/spark/commit/e0618203c317f8b8211c0e983403834f8e39a950).