[GitHub] spark issue #12646: [SPARK-14878][SQL] Trim characters string function suppo...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/12646 **[Test build #79788 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79788/testReport)** for PR 12646 at commit [`51ecfc8`](https://github.com/apache/spark/commit/51ecfc8e4acb0ffd6389726a8fa381dd040925a9). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #12646: [SPARK-14878][SQL] Trim characters string functio...
Github user kevinyu98 commented on a diff in the pull request: https://github.com/apache/spark/pull/12646#discussion_r128426729

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/functions.scala ---
@@ -2304,7 +2304,15 @@ object functions {
    * @group string_funcs
    * @since 1.5.0
    */
-  def ltrim(e: Column): Column = withExpr {StringTrimLeft(e.expr) }
+  def ltrim(e: Column): Column = withExpr {StringTrimLeft(e.expr)}
+
+  /**
+   * Trim the specified character string from left end for the specified string column.
+   * @group string_funcs
+   * @since 2.2.0
--- End diff --

sure
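For context on the overload under review: the PR adds trim-function variants that take an explicit set of trim characters. A minimal standalone sketch of the intended left-trim semantics, in plain Java; the class and method names are illustrative only, and Spark's actual `ltrim` is implemented as the `StringTrimLeft` expression, not like this:

```java
// Standalone sketch of trimming a set of characters from the left end of a
// string. Illustrative only; not Spark's implementation.
public class TrimSketch {

    // Drop every leading character of `s` that occurs in `trimChars`.
    public static String ltrim(String s, String trimChars) {
        int i = 0;
        while (i < s.length() && trimChars.indexOf(s.charAt(i)) >= 0) {
            i++;
        }
        return s.substring(i);
    }

    public static void main(String[] args) {
        System.out.println(ltrim("xxhello", "x"));   // hello
        System.out.println(ltrim("xyxyabc", "xy"));  // abc
    }
}
```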
[GitHub] spark issue #18680: [SPARK-21472][SQL] Introduce ArrowColumnVector as a read...
Github user ueshin commented on the issue: https://github.com/apache/spark/pull/18680 @BryanCutler Thank you for reviewing! As for scope, yes, I'd like these APIs to be public. Do you have any concerns about it?
[GitHub] spark pull request #18680: [SPARK-21472][SQL] Introduce ArrowColumnVector as...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/18680#discussion_r128425605

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/arrow/ArrowUtils.scala ---
@@ -0,0 +1,109 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.arrow
+
+import scala.collection.JavaConverters._
+
+import org.apache.arrow.memory.RootAllocator
+import org.apache.arrow.vector.types.FloatingPointPrecision
+import org.apache.arrow.vector.types.pojo.{ArrowType, Field, FieldType, Schema}
+
+import org.apache.spark.sql.types._
+
+object ArrowUtils {
+
+  val rootAllocator = new RootAllocator(Long.MaxValue)
+
+  // todo: support more types.
+
+  def toArrowType(dt: DataType): ArrowType = dt match {
+    case BooleanType => ArrowType.Bool.INSTANCE
+    case ByteType => new ArrowType.Int(8, true)
+    case ShortType => new ArrowType.Int(8 * 2, true)
+    case IntegerType => new ArrowType.Int(8 * 4, true)
+    case LongType => new ArrowType.Int(8 * 8, true)
+    case FloatType => new ArrowType.FloatingPoint(FloatingPointPrecision.SINGLE)
+    case DoubleType => new ArrowType.FloatingPoint(FloatingPointPrecision.DOUBLE)
+    case StringType => ArrowType.Utf8.INSTANCE
+    case BinaryType => ArrowType.Binary.INSTANCE
+    case DecimalType.Fixed(precision, scale) => new ArrowType.Decimal(precision, scale)
+    case _ => throw new UnsupportedOperationException(s"Unsupported data type: ${dt.simpleString}")
+  }
+
+  def fromArrowType(dt: ArrowType): DataType = dt match {
+    case ArrowType.Bool.INSTANCE => BooleanType
+    case int: ArrowType.Int if int.getIsSigned && int.getBitWidth == 8 => ByteType
+    case int: ArrowType.Int if int.getIsSigned && int.getBitWidth == 8 * 2 => ShortType
+    case int: ArrowType.Int if int.getIsSigned && int.getBitWidth == 8 * 4 => IntegerType
+    case int: ArrowType.Int if int.getIsSigned && int.getBitWidth == 8 * 8 => LongType
+    case float: ArrowType.FloatingPoint
+      if float.getPrecision() == FloatingPointPrecision.SINGLE => FloatType
+    case float: ArrowType.FloatingPoint
+      if float.getPrecision() == FloatingPointPrecision.DOUBLE => DoubleType
+    case ArrowType.Utf8.INSTANCE => StringType
+    case ArrowType.Binary.INSTANCE => BinaryType
+    case d: ArrowType.Decimal => DecimalType(d.getPrecision, d.getScale)
+    case _ => throw new UnsupportedOperationException(s"Unsupported data type: $dt")
+  }
+
+  def toArrowField(name: String, dt: DataType, nullable: Boolean): Field = {
--- End diff --

No, this is used to create an Arrow schema from `StructType` in `ArrowUtils.toArrowSchema()`, too.
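The two conversion functions above form an inverse pair keyed on Arrow's signed-integer bit widths (8 * 1, 8 * 2, 8 * 4, 8 * 8 for Byte/Short/Int/Long). A standalone sketch of that round-trip mapping, with Spark type names as plain strings since neither the Spark nor the Arrow classes are assumed here (the class name is illustrative):

```java
// Standalone sketch of the toArrowType/fromArrowType inverse pair for the
// signed integral types, keyed on Arrow's bit width.
public class TypeMappingSketch {

    // Mirrors toArrowType: ByteType -> Int(8), ShortType -> Int(8 * 2), etc.
    public static int bitWidth(String sparkType) {
        switch (sparkType) {
            case "ByteType":    return 8;
            case "ShortType":   return 16;
            case "IntegerType": return 32;
            case "LongType":    return 64;
            default:
                throw new UnsupportedOperationException("Unsupported data type: " + sparkType);
        }
    }

    // Mirrors fromArrowType's guards on getBitWidth() for signed ints.
    public static String sparkType(int bitWidth) {
        switch (bitWidth) {
            case 8:  return "ByteType";
            case 16: return "ShortType";
            case 32: return "IntegerType";
            case 64: return "LongType";
            default:
                throw new UnsupportedOperationException("Unsupported bit width: " + bitWidth);
        }
    }

    public static void main(String[] args) {
        // Round trip: every supported type maps back to itself.
        for (String t : new String[]{"ByteType", "ShortType", "IntegerType", "LongType"}) {
            System.out.println(t + " <-> " + bitWidth(t) + " bits");
        }
    }
}
```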
[GitHub] spark pull request #18680: [SPARK-21472][SQL] Introduce ArrowColumnVector as...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/18680#discussion_r128425637

--- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/ReadOnlyColumnVector.java ---
@@ -0,0 +1,250 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.vectorized;
+
+import org.apache.spark.memory.MemoryMode;
+import org.apache.spark.sql.types.*;
+
+/**
+ * An abstract class for read-only column vector.
+ */
+public abstract class ReadOnlyColumnVector extends ColumnVector {
--- End diff --

I agree that it'd be better to refactor `ColumnVector`, but `ColumnVector` is tied to `ColumnarBatch` and other classes, so we should do that refactoring, and refactor `ColumnarBatch` at the same time, in future PRs.
[GitHub] spark pull request #18680: [SPARK-21472][SQL] Introduce ArrowColumnVector as...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/18680#discussion_r128425617

--- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/ArrowColumnVector.java ---
@@ -0,0 +1,545 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.vectorized;
+
+import org.apache.arrow.vector.*;
+import org.apache.arrow.vector.complex.*;
+import org.apache.arrow.vector.holders.NullableVarCharHolder;
+
+import org.apache.spark.memory.MemoryMode;
+import org.apache.spark.sql.execution.arrow.ArrowUtils;
+import org.apache.spark.sql.types.*;
+import org.apache.spark.unsafe.types.UTF8String;
+
+/**
+ * A column backed by Apache Arrow.
+ */
+public final class ArrowColumnVector extends ReadOnlyColumnVector {
+
+  private final ArrowVectorAccessor accessor;
+
+  @Override
+  public long nullsNativeAddress() {
+    throw new RuntimeException("Cannot get native address for arrow column");
+  }
+
+  @Override
+  public long valuesNativeAddress() {
+    throw new RuntimeException("Cannot get native address for arrow column");
+  }
+
+  @Override
+  public void close() {
+    if (childColumns != null) {
+      for (int i = 0; i < childColumns.length; i++) {
+        childColumns[i].close();
+      }
+    }
+    accessor.close();
+  }
+
+  //
+  // APIs dealing with nulls
+  //
+
+  @Override
+  public boolean isNullAt(int rowId) {
+    return accessor.isNullAt(rowId);
+  }
+
+  //
+  // APIs dealing with Booleans
+  //
+
+  @Override
+  public boolean getBoolean(int rowId) {
+    return accessor.getBoolean(rowId);
+  }
+
+  @Override
+  public boolean[] getBooleans(int rowId, int count) {
+    boolean[] array = new boolean[count];
+    for (int i = 0; i < count; ++i) {
+      array[i] = accessor.getBoolean(rowId + i);
+    }
+    return array;
+  }
+
+  //
+  // APIs dealing with Bytes
+  //
+
+  @Override
+  public byte getByte(int rowId) {
+    return accessor.getByte(rowId);
+  }
+
+  @Override
+  public byte[] getBytes(int rowId, int count) {
+    byte[] array = new byte[count];
+    for (int i = 0; i < count; ++i) {
+      array[i] = accessor.getByte(rowId + i);
+    }
+    return array;
+  }
+
+  //
+  // APIs dealing with Shorts
+  //
+
+  @Override
+  public short getShort(int rowId) {
+    return accessor.getShort(rowId);
+  }
+
+  @Override
+  public short[] getShorts(int rowId, int count) {
+    short[] array = new short[count];
+    for (int i = 0; i < count; ++i) {
+      array[i] = accessor.getShort(rowId + i);
+    }
+    return array;
+  }
+
+  //
+  // APIs dealing with Ints
+  //
+
+  @Override
+  public int getInt(int rowId) {
+    return accessor.getInt(rowId);
+  }
+
+  @Override
+  public int[] getInts(int rowId, int count) {
+    int[] array = new int[count];
+    for (int i = 0; i < count; ++i) {
+      array[i] = accessor.getInt(rowId + i);
+    }
+    return array;
+  }
+
+  @Override
+  public int getDictId(int rowId) {
+    throw new UnsupportedOperationException();
+  }
+
+  //
+  // APIs dealing with Longs
+  //
+
+  @Override
+  public long getLong(int rowId) {
+    return accessor.getLong(rowId);
+  }
+
+  @Override
+  public long[] getLongs(int rowId, int count) {
+    long[] array = new long[count];
+    for (int i = 0; i < count; ++i) {
+      array[i] = accessor.getLong(rowId + i);
+    }
+    return array;
+  }
+
+  //
+  // APIs dealing with floats
+  //
+
+  @Override
+  public float getFloat(int rowId) {
+    return accessor.getFloat(rowId);
+  }
+
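All the batch getters quoted above follow one pattern: allocate an array of `count` elements and fill it from the per-row accessor starting at `rowId`. A standalone sketch of that pattern, with a plain `IntUnaryOperator` standing in for the Arrow accessor (the class name is hypothetical):

```java
import java.util.function.IntUnaryOperator;

// Standalone sketch of the batch-getter pattern used throughout
// ArrowColumnVector: allocate `count` slots and fill them from the per-row
// accessor starting at `rowId`.
public class BatchGetterSketch {

    public static int[] getInts(IntUnaryOperator accessor, int rowId, int count) {
        int[] array = new int[count];
        for (int i = 0; i < count; ++i) {
            array[i] = accessor.applyAsInt(rowId + i);
        }
        return array;
    }

    public static void main(String[] args) {
        // Rows 3..5 of a column whose value at row r is r * 2.
        System.out.println(java.util.Arrays.toString(getInts(r -> r * 2, 3, 3))); // [6, 8, 10]
    }
}
```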
[GitHub] spark issue #18468: [SPARK-20783][SQL] Create CachedBatchColumnVector to abs...
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/18468 cc: @cloud-fan and @ueshin
[GitHub] spark issue #18652: [WIP] Pull non-deterministic joining keys from Join oper...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/18652

I just checked Hive's behavior (2.1.0). I tried a query like:

select * from l left outer join r on rand(l.a) > 0.1 and rand(cast(l.b as int)) > 0.2 and rand(r.c) > 0.2 and rand(cast(r.d as int)) > 0.5;

The conditions `rand(r.c) > 0.2 and rand(cast(r.d as int)) > 0.5` are pushed down to a Filter operator:

    TableScan
      alias: r
      Statistics: Num rows: 2 Data size: 10 Basic stats: COMPLETE Column stats: NONE
      Select Operator
        expressions: c (type: int), d (type: double)
        outputColumnNames: _col0, _col1
        Statistics: Num rows: 2 Data size: 10 Basic stats: COMPLETE Column stats: NONE
        Filter Operator
          predicate: ((rand(UDFToInteger(_col1)) > 0.5) and (rand(_col0) > 0.2)) (type: boolean)
          Statistics: Num rows: 1 Data size: 5 Basic stats: COMPLETE Column stats: NONE
          HashTable Sink Operator
            filter predicates:
              0 {(rand(_col0) > 0.1)} {(rand(UDFToInteger(_col1)) > 0.2)}
              1
            keys:
              0

The other conditions `rand(l.a) > 0.1 and rand(cast(l.b as int)) > 0.2` remain as filter predicates on the Join operator:

    Map Join Operator
      condition map:
        Left Outer Join0 to 1
      filter predicates:
        0 {(rand(_col0) > 0.1)} {(rand(UDFToInteger(_col1)) > 0.2)}
        1
      keys:
        0
        1

For a query with non-deterministic joining keys, `select * from l left outer join r on rand(l.a) = rand(r.c);`, there is no push down; Hive simply evaluates the joining keys:

    Map Join Operator
      condition map:
        Left Outer Join0 to 1
      keys:
        0 rand(_col0) (type: double)
        1 rand(_col0) (type: double)
      outputColumnNames: _col0, _col1, _col2, _col3
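The behavior described above can be summarized as: Hive pushes a join conjunct below the join only when every column it references comes from a single side. A hypothetical standalone sketch of that classification, modeling each predicate as the set of columns it references (not Hive's or Spark's code; all names are illustrative):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Standalone sketch of the pushdown rule the Hive plans above illustrate:
// a conjunct is pushable to one side of the join only if all the columns it
// references belong to that side.
public class PushdownSketch {

    public static List<Set<String>> pushableTo(List<Set<String>> predicates,
                                               Set<String> sideColumns) {
        List<Set<String>> pushable = new ArrayList<>();
        for (Set<String> refs : predicates) {
            if (sideColumns.containsAll(refs)) {
                pushable.add(refs);
            }
        }
        return pushable;
    }

    public static void main(String[] args) {
        // rand(r.c) > 0.2 references {c}: pushable to the r side {c, d}.
        // rand(l.a) > 0.1 references {a}: not pushable to the r side.
        List<Set<String>> preds = List.of(Set.of("c"), Set.of("a"));
        System.out.println(pushableTo(preds, Set.of("c", "d"))); // [[c]]
    }
}
```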
[GitHub] spark issue #18185: [SPARK-20962][SQL] Support subquery column aliases in FR...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/18185 Will review it tonight. Thanks!
[GitHub] spark issue #18388: [SPARK-21175] Reject OpenBlocks when memory shortage on ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18388 Merged build finished. Test PASSed.
[GitHub] spark issue #18388: [SPARK-21175] Reject OpenBlocks when memory shortage on ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18388 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/79785/ Test PASSed.
[GitHub] spark issue #18680: [SPARK-21472][SQL] Introduce ArrowColumnVector as a read...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18680 **[Test build #79787 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79787/testReport)** for PR 18680 at commit [`91b94ef`](https://github.com/apache/spark/commit/91b94ef6d08771fe8e5eb5d41f43153af9a75f06).
[GitHub] spark issue #18388: [SPARK-21175] Reject OpenBlocks when memory shortage on ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18388 **[Test build #79785 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79785/testReport)** for PR 18388 at commit [`7dd2cec`](https://github.com/apache/spark/commit/7dd2cec311189feb555f3cfdbb27b29676efc18b).

 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark pull request #18680: [SPARK-21472][SQL] Introduce ArrowColumnVector as...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/18680#discussion_r128422124

--- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/ReadOnlyColumnVector.java ---
@@ -0,0 +1,250 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.vectorized;
+
+import org.apache.spark.memory.MemoryMode;
+import org.apache.spark.sql.types.*;
+
+/**
+ * An abstract class for read-only column vector.
+ */
+public abstract class ReadOnlyColumnVector extends ColumnVector {
+
+  protected ReadOnlyColumnVector(int capacity, MemoryMode memMode) {
--- End diff --

I see, I'll modify it to accept `dataType`, but I guess we shouldn't pass it to `ColumnVector`, to avoid illegally allocating child columns.
[GitHub] spark issue #18550: [Minor][SS][DOCS] Minor doc change for kafka integration
Github user viirya commented on the issue: https://github.com/apache/spark/pull/18550 ping @tdas Please take a look at this simple doc change. Thanks.
[GitHub] spark pull request #18444: [SPARK-16542][SQL][PYSPARK] Fix bugs about types ...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/18444
[GitHub] spark issue #18444: [SPARK-16542][SQL][PYSPARK] Fix bugs about types that re...
Github user ueshin commented on the issue: https://github.com/apache/spark/pull/18444 Thanks! Merging to master.
[GitHub] spark issue #18655: [SPARK-21440][SQL][PYSPARK] Refactor ArrowConverters and...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18655 **[Test build #79786 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79786/testReport)** for PR 18655 at commit [`7084b38`](https://github.com/apache/spark/commit/7084b388d87c8347b79898827658d7827bf5649d).
[GitHub] spark issue #12646: [SPARK-14878][SQL] Trim characters string function suppo...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/12646 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/79782/ Test PASSed.
[GitHub] spark issue #12646: [SPARK-14878][SQL] Trim characters string function suppo...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/12646 Merged build finished. Test PASSed.
[GitHub] spark issue #12646: [SPARK-14878][SQL] Trim characters string function suppo...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/12646 **[Test build #79782 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79782/testReport)** for PR 12646 at commit [`9bb80ea`](https://github.com/apache/spark/commit/9bb80eaf8e0b4339850d8c48e221c8ad1e477552).

 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark issue #18468: [SPARK-20783][SQL] Create CachedBatchColumnVector to abs...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18468 Merged build finished. Test PASSed.
[GitHub] spark issue #18468: [SPARK-20783][SQL] Create CachedBatchColumnVector to abs...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18468 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/79783/ Test PASSed.
[GitHub] spark issue #18468: [SPARK-20783][SQL] Create CachedBatchColumnVector to abs...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18468 **[Test build #79783 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79783/testReport)** for PR 18468 at commit [`4b4e281`](https://github.com/apache/spark/commit/4b4e2812d250d3d46fdbcd29c3e66964ea6dd345).

 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark pull request #18607: [SPARK-21362][SQL][Adding Apache Drill JDBC Diale...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18607#discussion_r128414516

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/jdbc/ApacheDrillDialect.scala ---
@@ -0,0 +1,31 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.jdbc
+
+import java.sql.Types
+
+import org.apache.spark.sql.types.{BooleanType, DataType, LongType, MetadataBuilder}
--- End diff --

ditto.
[GitHub] spark pull request #18607: [SPARK-21362][SQL][Adding Apache Drill JDBC Diale...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18607#discussion_r128414496

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/jdbc/ApacheDrillDialect.scala ---
@@ -0,0 +1,31 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.jdbc
+
+import java.sql.Types
--- End diff --

Do we use this import?
[GitHub] spark issue #16924: [SPARK-19531] Send UPDATE_LENGTH for Spark History servi...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/16924 In `FsHistoryProvider`, since there is a check for file size, I think it is designed to find updated logs from running applications? https://github.com/apache/spark/blob/e26dac5feb02033f980b1e69c9b0ff50869b6f9e/core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala#L331 Btw, in `EventLoggingListener`, if it writes to local files, it seems to me the file length will be updated and so `FsHistoryProvider` can find the updated logs. This seems reasonable for making the non-local fs functionally consistent with the local fs.
[GitHub] spark issue #18576: [SPARK-21351][SQL] Update nullability based on children'...
Github user maropu commented on the issue: https://github.com/apache/spark/pull/18576 ping
[GitHub] spark issue #18185: [SPARK-20962][SQL] Support subquery column aliases in FR...
Github user maropu commented on the issue: https://github.com/apache/spark/pull/18185 ping
[GitHub] spark pull request #18487: [SPARK-21243][Core] Limit no. of map outputs in a...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/18487#discussion_r128410279 --- Diff: core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala --- @@ -443,12 +459,57 @@ final class ShuffleBlockFetcherIterator( } private def fetchUpToMaxBytes(): Unit = { -// Send fetch requests up to maxBytesInFlight -while (fetchRequests.nonEmpty && - (bytesInFlight == 0 || -(reqsInFlight + 1 <= maxReqsInFlight && - bytesInFlight + fetchRequests.front.size <= maxBytesInFlight))) { - sendRequest(fetchRequests.dequeue()) +// Send fetch requests up to maxBytesInFlight. If you cannot fetch from a remote host +// immediately, defer the request until the next time it can be processed. + +// Process any outstanding deferred fetch requests if possible. +if (deferredFetchRequests.nonEmpty) { + for ((remoteAddress, defReqQueue) <- deferredFetchRequests) { +while (isRemoteBlockFetchable(defReqQueue) && +!isRemoteAddressMaxedOut(remoteAddress, defReqQueue.front)) { + val request = defReqQueue.dequeue() + logDebug(s"Processing deferred fetch request for $remoteAddress with " ++ s"${request.blocks.length} blocks") + send(remoteAddress, request) + if (defReqQueue.isEmpty) { +deferredFetchRequests -= remoteAddress + } +} + } +} + +// Process any regular fetch requests if possible. +while (isRemoteBlockFetchable(fetchRequests)) { + val request = fetchRequests.dequeue() + val remoteAddress = request.address + if (isRemoteAddressMaxedOut(remoteAddress, request)) { +logDebug(s"Deferring fetch request for $remoteAddress with ${request.blocks.size} blocks") +val defReqQueue = deferredFetchRequests.getOrElse(remoteAddress, new Queue[FetchRequest]()) +defReqQueue.enqueue(request) +deferredFetchRequests(remoteAddress) = defReqQueue --- End diff -- the `defReqQueue` is mutable, so we don't need to do this.
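The per-address deferral scheme being reviewed in the diff above can be modeled in a few lines. This is a hypothetical Python sketch, not Spark's Scala implementation: requests to an address that already has too many blocks in flight are parked in a per-address queue, and deferred queues are drained first on the next pass. As noted in the review, empty deferred queues are deliberately kept around, since later requests for the same address may still be deferred.

```python
from collections import deque

class FetchScheduler:
    """Toy model of per-address block-fetch deferral (hypothetical class,
    loosely following the logic quoted in the diff above)."""

    def __init__(self, max_blocks_per_address):
        self.max_blocks_per_address = max_blocks_per_address
        self.blocks_in_flight = {}   # address -> blocks currently in flight
        self.deferred = {}           # address -> queue of deferred block counts
        self.sent = []               # (address, num_blocks) actually dispatched

    def _maxed_out(self, address, num_blocks):
        # Would dispatching this request push the address over its cap?
        return (self.blocks_in_flight.get(address, 0) + num_blocks
                > self.max_blocks_per_address)

    def _send(self, address, num_blocks):
        self.blocks_in_flight[address] = (
            self.blocks_in_flight.get(address, 0) + num_blocks)
        self.sent.append((address, num_blocks))

    def fetch_up_to_max(self, requests):
        # Drain deferred queues first; empty queues stay in the map because
        # requests processed below may still be deferred onto them.
        for address, queue in self.deferred.items():
            while queue and not self._maxed_out(address, queue[0]):
                self._send(address, queue.popleft())
        # Then process regular requests, deferring any that would exceed
        # the per-address cap.
        for address, num_blocks in requests:
            if self._maxed_out(address, num_blocks):
                self.deferred.setdefault(address, deque()).append(num_blocks)
            else:
                self._send(address, num_blocks)
```

With a cap of 2 blocks per address, a second request for a saturated host is deferred while requests for other hosts proceed.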
[GitHub] spark pull request #18487: [SPARK-21243][Core] Limit no. of map outputs in a...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/18487#discussion_r128410233 --- Diff: core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala --- @@ -443,12 +459,57 @@ final class ShuffleBlockFetcherIterator( } private def fetchUpToMaxBytes(): Unit = { -// Send fetch requests up to maxBytesInFlight -while (fetchRequests.nonEmpty && - (bytesInFlight == 0 || -(reqsInFlight + 1 <= maxReqsInFlight && - bytesInFlight + fetchRequests.front.size <= maxBytesInFlight))) { - sendRequest(fetchRequests.dequeue()) +// Send fetch requests up to maxBytesInFlight. If you cannot fetch from a remote host +// immediately, defer the request until the next time it can be processed. + +// Process any outstanding deferred fetch requests if possible. +if (deferredFetchRequests.nonEmpty) { + for ((remoteAddress, defReqQueue) <- deferredFetchRequests) { +while (isRemoteBlockFetchable(defReqQueue) && +!isRemoteAddressMaxedOut(remoteAddress, defReqQueue.front)) { + val request = defReqQueue.dequeue() + logDebug(s"Processing deferred fetch request for $remoteAddress with " ++ s"${request.blocks.length} blocks") + send(remoteAddress, request) + if (defReqQueue.isEmpty) { +deferredFetchRequests -= remoteAddress --- End diff -- we can leave the empty queue here, as we may still have fetch requests to put in this queue.
[GitHub] spark pull request #18487: [SPARK-21243][Core] Limit no. of map outputs in a...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/18487#discussion_r128409414 --- Diff: core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala --- @@ -375,6 +390,7 @@ final class ShuffleBlockFetcherIterator( result match { case r @ SuccessFetchResult(blockId, address, size, buf, isNetworkReqDone) => if (address != blockManager.blockManagerId) { +numBlocksInFlightPerAddress(address) = numBlocksInFlightPerAddress(address) - 1 --- End diff -- can we do this earlier? e.g. right after the fetch result is enqueued to `results`.
[GitHub] spark pull request #18487: [SPARK-21243][Core] Limit no. of map outputs in a...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/18487#discussion_r128408952 --- Diff: core/src/main/scala/org/apache/spark/internal/config/package.scala --- @@ -321,6 +321,17 @@ package object config { .intConf .createWithDefault(3) + private[spark] val REDUCER_MAX_BLOCKS_IN_FLIGHT_PER_ADDRESS = +ConfigBuilder("spark.reducer.maxBlocksInFlightPerAddress") + .doc("This configuration limits the number of remote blocks being fetched per reduce task" + +" from a given host port. When a large number of blocks are being requested from a given" + +" address in a single fetch or simultaneously, this could crash the serving executor or" + +" Node Manager. This is especially useful to reduce the load on the Node Manager when" + --- End diff -- shall we say `shuffle service` instead of `Node Manager`?
[GitHub] spark pull request #18503: [SPARK-21271][SQL] Ensure Unsafe.sizeInBytes is a...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/18503#discussion_r128408940 --- Diff: sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/UnsafeRow.java --- @@ -167,6 +167,7 @@ public UnsafeRow() {} */ public void pointTo(Object baseObject, long baseOffset, int sizeInBytes) { assert numFields >= 0 : "numFields (" + numFields + ") should >= 0"; +assert sizeInBytes % 8 == 0 : "sizeInBytes (" + sizeInBytes + ") should be a multiple of 8"; --- End diff -- Yes, done.
[GitHub] spark pull request #18503: [SPARK-21271][SQL] Ensure Unsafe.sizeInBytes is a...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/18503#discussion_r128408918 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamSuite.scala --- @@ -479,6 +479,61 @@ class StreamSuite extends StreamTest { CheckAnswer((1, 2), (2, 2), (3, 2))) } + testQuietly("store to and recover from a checkpoint") { --- End diff -- Ah, you are right. This test currently relies on the internal assert in `Unsafe.pointTo` to check for a multiple of 8.
[GitHub] spark issue #18487: [SPARK-21243][Core] Limit no. of map outputs in a shuffl...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/18487 @rxin it's kind of a stability fix (it makes the shuffle service more stable), so I'm ok to backport if the conflict is small.
[GitHub] spark pull request #18503: [SPARK-21271][SQL] Ensure Unsafe.sizeInBytes is a...
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/18503#discussion_r128408410 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/HDFSBackedStateStoreProvider.scala --- @@ -363,7 +363,8 @@ private[state] class HDFSBackedStateStoreProvider extends StateStoreProvider wit val valueRowBuffer = new Array[Byte](valueSize) ByteStreams.readFully(input, valueRowBuffer, 0, valueSize) val valueRow = new UnsafeRow(valueSchema.fields.length) -valueRow.pointTo(valueRowBuffer, valueSize) +// If valueSize in existing file is not multiple of 8, round it down to multiple of 8 +valueRow.pointTo(valueRowBuffer, (valueSize / 8) * 8) --- End diff -- This isn't rounding; it essentially floors to a multiple of 8. @cloud-fan is this safe to do with ANY row generated in earlier Spark 2.0 - 2.2? I want to be 100% sure.
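The distinction the reviewer draws (flooring versus rounding) comes down to integer division discarding the remainder. A quick illustration of the `(valueSize / 8) * 8` arithmetic, written here in Python for brevity:

```python
def floor_to_multiple_of_8(size_in_bytes):
    # Integer division truncates toward zero for non-negative sizes, so
    # this lands on the nearest multiple of 8 that is <= size_in_bytes:
    # a floor, not a round-to-nearest.
    return (size_in_bytes // 8) * 8

assert floor_to_multiple_of_8(64) == 64   # already aligned: unchanged
assert floor_to_multiple_of_8(67) == 64   # floored down, not rounded up to 72
assert floor_to_multiple_of_8(71) == 64   # even 71 floors to 64, never 72
```

Rounding to the nearest multiple would map 67 to 64 but 71 to 72; flooring maps both to 64, which is why the review insists on precise wording in the comment.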
[GitHub] spark pull request #18503: [SPARK-21271][SQL] Ensure Unsafe.sizeInBytes is a...
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/18503#discussion_r128408166 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamSuite.scala --- @@ -479,6 +479,61 @@ class StreamSuite extends StreamTest { CheckAnswer((1, 2), (2, 2), (3, 2))) } + testQuietly("store to and recover from a checkpoint") { --- End diff -- It does not really check it explicitly... does it? It tests it implicitly by creating checkpoints and then restarting. There are other tests that already do the same thing. E.g. this test is effectively the same as https://github.com/apache/spark/blob/master/sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamingAggregationSuite.scala#L88
[GitHub] spark issue #18281: [SPARK-21027][ML][PYTHON] Added tunable parallelism to o...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18281 Merged build finished. Test PASSed.
[GitHub] spark issue #18281: [SPARK-21027][ML][PYTHON] Added tunable parallelism to o...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18281 **[Test build #79784 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79784/testReport)** for PR 18281 at commit [`ce14172`](https://github.com/apache/spark/commit/ce14172711b51a4321ed02a3cf8450a54374d4f5). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #18281: [SPARK-21027][ML][PYTHON] Added tunable parallelism to o...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18281 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/79784/ Test PASSed.
[GitHub] spark issue #18388: [SPARK-21175] Reject OpenBlocks when memory shortage on ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18388 **[Test build #79785 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79785/testReport)** for PR 18388 at commit [`7dd2cec`](https://github.com/apache/spark/commit/7dd2cec311189feb555f3cfdbb27b29676efc18b).
[GitHub] spark issue #18664: [SPARK-21375][PYSPARK][SQL][WIP] Add Date and Timestamp ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18664 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/79781/ Test FAILed.
[GitHub] spark issue #18664: [SPARK-21375][PYSPARK][SQL][WIP] Add Date and Timestamp ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18664 Merged build finished. Test FAILed.
[GitHub] spark issue #18664: [SPARK-21375][PYSPARK][SQL][WIP] Add Date and Timestamp ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18664 **[Test build #79781 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79781/testReport)** for PR 18664 at commit [`b709d78`](https://github.com/apache/spark/commit/b709d78c03701f92f617651879ee33dada0c4da1). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #17848: [SPARK-20586] [SQL] Add deterministic to ScalaUDF and Ja...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17848 Merged build finished. Test PASSed.
[GitHub] spark issue #17848: [SPARK-20586] [SQL] Add deterministic to ScalaUDF and Ja...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17848 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/79780/ Test PASSed.
[GitHub] spark issue #17848: [SPARK-20586] [SQL] Add deterministic to ScalaUDF and Ja...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17848 **[Test build #79780 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79780/testReport)** for PR 17848 at commit [`0ea4691`](https://github.com/apache/spark/commit/0ea4691d3ea979b86cb7c44f8290ff7dc805a8a7). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #18029: [SPARK-20168][DStream] Add changes to use kinesis fetche...
Github user yssharma commented on the issue: https://github.com/apache/spark/pull/18029 @budde @brkyvz - could I get some love here please.
[GitHub] spark issue #18281: [SPARK-21027][ML][PYTHON] Added tunable parallelism to o...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18281 **[Test build #79784 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79784/testReport)** for PR 18281 at commit [`ce14172`](https://github.com/apache/spark/commit/ce14172711b51a4321ed02a3cf8450a54374d4f5).
[GitHub] spark issue #18468: [SPARK-20783][SQL] Create CachedBatchColumnVector to abs...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18468 **[Test build #79783 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79783/testReport)** for PR 18468 at commit [`4b4e281`](https://github.com/apache/spark/commit/4b4e2812d250d3d46fdbcd29c3e66964ea6dd345).
[GitHub] spark issue #18607: [SPARK-21362][SQL][Adding Apache Drill JDBC Dialect]
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/18607 Could you also add the docker-based test suite, like what we did in https://github.com/apache/spark/pull/9893/files?
[GitHub] spark issue #12646: [SPARK-14878][SQL] Trim characters string function suppo...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/12646 **[Test build #79782 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79782/testReport)** for PR 12646 at commit [`9bb80ea`](https://github.com/apache/spark/commit/9bb80eaf8e0b4339850d8c48e221c8ad1e477552).
[GitHub] spark pull request #12646: [SPARK-14878][SQL] Trim characters string functio...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/12646#discussion_r128398109 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/functions.scala --- @@ -2304,7 +2304,15 @@ object functions { * @group string_funcs * @since 1.5.0 */ - def ltrim(e: Column): Column = withExpr {StringTrimLeft(e.expr) } + def ltrim(e: Column): Column = withExpr {StringTrimLeft(e.expr)} + + /** + * Trim the specified character string from left end for the specified string column. + * @group string_funcs + * @since 2.2.0 --- End diff -- Update the versions to 2.3.0
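For readers following the review: unlike the plain `ltrim`, which strips whitespace, the variant under discussion removes any leading characters belonging to a given set. A rough behavioral model in Python (a hypothetical helper for illustration, not the Spark API itself):

```python
def ltrim_chars(s, trim_chars):
    # Advance past characters at the left end while they belong to the
    # trim set, mirroring SQL's TRIM(LEADING trimStr FROM str) semantics.
    i = 0
    while i < len(s) and s[i] in trim_chars:
        i += 1
    return s[i:]

assert ltrim_chars("xxSpark", "x") == "Spark"
assert ltrim_chars("xySparkx", "xy") == "Sparkx"  # only the left end is trimmed
assert ltrim_chars("Spark", "x") == "Spark"       # nothing to trim: unchanged
```

Note that the trim argument is treated as a set of characters, not as a prefix string to match whole.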
[GitHub] spark issue #12646: [SPARK-14878][SQL] Trim characters string function suppo...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/12646 retest this please
[GitHub] spark issue #18686: [SPARK-21477] [SQL] [MINOR] Mark LocalTableScanExec's in...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18686 Merged build finished. Test PASSed.
[GitHub] spark issue #18686: [SPARK-21477] [SQL] [MINOR] Mark LocalTableScanExec's in...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18686 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/79777/ Test PASSed.
[GitHub] spark issue #18686: [SPARK-21477] [SQL] [MINOR] Mark LocalTableScanExec's in...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18686 **[Test build #79777 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79777/testReport)** for PR 18686 at commit [`662d377`](https://github.com/apache/spark/commit/662d377ebcf8c62afa87cabaa6bfd4cd77fb9630). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #18664: [SPARK-21375][PYSPARK][SQL][WIP] Add Date and Timestamp ...
Github user BryanCutler commented on the issue: https://github.com/apache/spark/pull/18664 @ueshin @holdenk I think I'm seeing an issue with transferring timestamp data to Pandas with Arrow, so I'll try to explain. Spark will assume the timestamp is in local time, so when converting to internal data, it will adjust with an offset to UTC from the local timezone. Currently, the internal data is converted to Arrow without a timezone, which Arrow takes as timezone-unaware. When a Pandas DataFrame is created from that data, it does not adjust to local time, so a different timestamp is shown. For my case below, using PST as local time, it will add 8 hours.

```
In [2]: dt = datetime.datetime(1970, 1, 1, 0, 0, 1)

In [5]: TimestampType().toInternal(dt)
Out[5]: 2880100

In [8]: df = spark.createDataFrame([(dt,)], schema=StructType([StructField("ts", TimestampType(), True)]))

In [7]: df.show()
+-------------------+
|                 ts|
+-------------------+
|1970-01-01 00:00:01|
+-------------------+

In [9]: spark.conf.set("spark.sql.execution.arrow.enable", "true")

In [10]: df.toPandas()
Out[10]:
                   ts
0 1970-01-01 08:00:01

In [11]: spark.conf.set("spark.sql.execution.arrow.enable", "false")

In [12]: df.toPandas()
Out[12]:
                   ts
0 1970-01-01 00:00:01
```

It wasn't a problem before Arrow because the data gets converted before going into Pandas. I believe there are a few different ways to handle this:

1) Adjust the Spark internal data to represent local time, not UTC time, and create an Arrow field without specifying the timezone.
2) Give the Arrow field the timezone from `DateTimeUtils.defaultTimeZone()` and adjust the internal data to represent local time, not UTC time.
3) Give the Arrow field a "UTC" timezone; then no adjustments need to be done to the internal data, but I think Pandas will still display as UTC and it would be up to the user to change the timezone.

I'm not sure what the best solution is because there could be issues with them all, any thoughts?
--- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
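To make the UTC-tagging idea in option 3 above concrete, here is a small pandas-only sketch (no Spark involved; the epoch-microsecond arithmetic and the `US/Pacific` zone name are illustrative assumptions, not Spark's actual code path):

```python
import datetime
import pandas as pd

# A naive timestamp, as a user would create it.
dt = datetime.datetime(1970, 1, 1, 0, 0, 1)

# Microseconds since the epoch, interpreting the wall clock as UTC
# (the general shape of Spark's internal representation).
micros = int(dt.replace(tzinfo=datetime.timezone.utc).timestamp() * 1_000_000)

# Tag the column as UTC up front, so nothing shifts silently when the
# data crosses into pandas...
s = pd.to_datetime(pd.Series([micros]), unit="us", utc=True)

# ...and leave local-time rendering as an explicit, user-driven step.
local = s.dt.tz_convert("US/Pacific")
```

Under this scheme the stored value stays `1970-01-01 00:00:01+00:00`, and the 8-hour shift only appears when the user asks for it via `tz_convert`.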
[GitHub] spark issue #18444: [SPARK-16542][SQL][PYSPARK] Fix bugs about types that re...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18444 Merged build finished. Test PASSed.
[GitHub] spark issue #18444: [SPARK-16542][SQL][PYSPARK] Fix bugs about types that re...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18444 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/79774/ Test PASSed.
[GitHub] spark issue #18444: [SPARK-16542][SQL][PYSPARK] Fix bugs about types that re...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18444

**[Test build #79774 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79774/testReport)** for PR 18444 at commit [`a340745`](https://github.com/apache/spark/commit/a3407459405c2a5b3c7539d5075853e65c80f9cd).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #18664: [SPARK-21375][PYSPARK][SQL][WIP] Add Date and Timestamp ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18664 **[Test build #79781 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79781/testReport)** for PR 18664 at commit [`b709d78`](https://github.com/apache/spark/commit/b709d78c03701f92f617651879ee33dada0c4da1).
[GitHub] spark issue #18684: [SPARK-21475][Core] Use NIO's Files API to replace FileI...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18684 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/79771/ Test PASSed.
[GitHub] spark issue #18684: [SPARK-21475][Core] Use NIO's Files API to replace FileI...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18684 Merged build finished. Test PASSed.
[GitHub] spark issue #18684: [SPARK-21475][Core] Use NIO's Files API to replace FileI...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18684

**[Test build #79771 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79771/testReport)** for PR 18684 at commit [`f2d534a`](https://github.com/apache/spark/commit/f2d534a1693c31138b464ed1094dc05888cdc3d0).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #18676: [SPARK-21463] Allow userSpecifiedSchema to overri...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/18676
[GitHub] spark issue #18676: [SPARK-21463] Allow userSpecifiedSchema to override part...
Github user brkyvz commented on the issue: https://github.com/apache/spark/pull/18676 Thanks! Merging to master
[GitHub] spark issue #18487: [SPARK-21243][Core] Limit no. of map outputs in a shuffl...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18487 hm is this a bug fix? if not we shouldn't cherry pick it.
[GitHub] spark pull request #18462: [SPARK-21333][Docs] Removed invalid joinTypes fro...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/18462
[GitHub] spark issue #18462: [SPARK-21333][Docs] Removed invalid joinTypes from javad...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/18462 Thanks! Merging to master/2.2
[GitHub] spark pull request #18674: [SPARK-21456][MESOS] Make the driver failover_tim...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/18674
[GitHub] spark issue #18674: [SPARK-21456][MESOS] Make the driver failover_timeout co...
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/18674 Merging to master.
[GitHub] spark issue #18674: [SPARK-21456][MESOS] Make the driver failover_timeout co...
Github user susanxhuynh commented on the issue: https://github.com/apache/spark/pull/18674 @vanzin Thanks for the review. I have made the changes you recommended (documenting the zero default value and using the config key).
[GitHub] spark issue #17848: [SPARK-20586] [SQL] Add deterministic to ScalaUDF and Ja...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17848 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/79779/ Test FAILed.
[GitHub] spark issue #17848: [SPARK-20586] [SQL] Add deterministic to ScalaUDF and Ja...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17848 Merged build finished. Test FAILed.
[GitHub] spark issue #17848: [SPARK-20586] [SQL] Add deterministic to ScalaUDF and Ja...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17848

**[Test build #79779 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79779/testReport)** for PR 17848 at commit [`43bb9a9`](https://github.com/apache/spark/commit/43bb9a9254d0d694b2be57ec6a3574d53e9c3141).

* This patch **fails to generate documentation**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #18674: [SPARK-21456][MESOS] Make the driver failover_timeout co...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18674 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/79778/ Test PASSed.
[GitHub] spark issue #18674: [SPARK-21456][MESOS] Make the driver failover_timeout co...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18674 Merged build finished. Test PASSed.
[GitHub] spark issue #18674: [SPARK-21456][MESOS] Make the driver failover_timeout co...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18674

**[Test build #79778 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79778/testReport)** for PR 18674 at commit [`f4a001f`](https://github.com/apache/spark/commit/f4a001faa612655c6c2aa7a7da85248be862241a).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #17848: [SPARK-20586] [SQL] Add deterministic to ScalaUDF and Ja...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17848 **[Test build #79780 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79780/testReport)** for PR 17848 at commit [`0ea4691`](https://github.com/apache/spark/commit/0ea4691d3ea979b86cb7c44f8290ff7dc805a8a7).
[GitHub] spark issue #17848: [SPARK-20586] [SQL] Add deterministic to ScalaUDF and Ja...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17848 **[Test build #79779 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79779/testReport)** for PR 17848 at commit [`43bb9a9`](https://github.com/apache/spark/commit/43bb9a9254d0d694b2be57ec6a3574d53e9c3141).
[GitHub] spark pull request #18665: [SPARK-21446] [SQL] Fix setAutoCommit never execu...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/18665
[GitHub] spark issue #18674: [SPARK-21456][MESOS] Make the driver failover_timeout co...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18674 **[Test build #79778 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79778/testReport)** for PR 18674 at commit [`f4a001f`](https://github.com/apache/spark/commit/f4a001faa612655c6c2aa7a7da85248be862241a).
[GitHub] spark issue #18665: [SPARK-21446] [SQL] Fix setAutoCommit never executed
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/18665 Thanks! Merging to master/2.2/2.1
[GitHub] spark issue #18676: [SPARK-21463] Allow userSpecifiedSchema to override part...
Github user zsxwing commented on the issue: https://github.com/apache/spark/pull/18676 LGTM
[GitHub] spark issue #17848: [SPARK-20586] [SQL] Add deterministic to ScalaUDF and Ja...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17848 Merged build finished. Test FAILed.
[GitHub] spark issue #17848: [SPARK-20586] [SQL] Add deterministic to ScalaUDF and Ja...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17848 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/79776/ Test FAILed.
[GitHub] spark issue #17848: [SPARK-20586] [SQL] Add deterministic to ScalaUDF and Ja...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17848

**[Test build #79776 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79776/testReport)** for PR 17848 at commit [`d0a9086`](https://github.com/apache/spark/commit/d0a90865ca7c6a9afd6fbb28b3e8d1c9c602013c).

* This patch **fails to generate documentation**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #18684: [SPARK-21475][Core] Use NIO's Files API to replace FileI...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18684 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/79769/ Test PASSed.
[GitHub] spark issue #18686: [SQL] [MINOR] Mark LocalTableScanExec's input data trans...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18686 **[Test build #79777 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79777/testReport)** for PR 18686 at commit [`662d377`](https://github.com/apache/spark/commit/662d377ebcf8c62afa87cabaa6bfd4cd77fb9630).
[GitHub] spark issue #18684: [SPARK-21475][Core] Use NIO's Files API to replace FileI...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18684 Merged build finished. Test PASSed.
[GitHub] spark issue #18686: [SQL] [MINOR] Mark LocalTableScanExec's input data trans...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/18686 cc @cloud-fan
[GitHub] spark issue #18684: [SPARK-21475][Core] Use NIO's Files API to replace FileI...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18684

**[Test build #79769 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79769/testReport)** for PR 18684 at commit [`b9dad5a`](https://github.com/apache/spark/commit/b9dad5ac976261359623fafbbfa9389310272238).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #18686: [SQL] [MINOR] Mark LocalTableScanExec's input dat...
GitHub user gatorsmile opened a pull request: https://github.com/apache/spark/pull/18686

[SQL] [MINOR] Mark LocalTableScanExec's input data transient

## What changes were proposed in this pull request?

This PR is to mark the parameters `rows` and `unsafeRow` transient. This avoids serializing the unneeded objects.

## How was this patch tested?

N/A

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/gatorsmile/spark LocalTableScanExec

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/18686.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #18686
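The idea behind the patch, excluding bulky derived input from serialization while keeping the small metadata, can be sketched with a plain-Python analogy (this is not Spark's actual Scala code, where the `@transient` annotation plays this role; the class and field names below are hypothetical):

```python
import pickle

class LocalScan:
    """Toy stand-in for an operator that caches bulky input rows."""
    def __init__(self, rows):
        self.rows = rows             # bulky data we don't want shipped
        self.schema = ["id", "val"]  # small metadata that must survive

    def __getstate__(self):
        # Analogous to marking a field transient: drop `rows` when serializing.
        state = self.__dict__.copy()
        del state["rows"]
        return state

    def __setstate__(self, state):
        self.__dict__.update(state)
        self.rows = None  # not needed on the deserializing side

scan = LocalScan(rows=list(range(1_000_000)))
blob = pickle.dumps(scan)      # stays tiny: `rows` was excluded
restored = pickle.loads(blob)  # metadata intact, rows absent
```

The serialized blob carries only the schema, so shipping the operator does not drag a million rows along with it.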
[GitHub] spark issue #18503: [SPARK-21271][SQL] Ensure Unsafe.sizeInBytes is a multip...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18503 Merged build finished. Test PASSed.
[GitHub] spark issue #18503: [SPARK-21271][SQL] Ensure Unsafe.sizeInBytes is a multip...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18503 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/79770/ Test PASSed.
[GitHub] spark issue #18503: [SPARK-21271][SQL] Ensure Unsafe.sizeInBytes is a multip...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18503

**[Test build #79770 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79770/testReport)** for PR 18503 at commit [`762f02a`](https://github.com/apache/spark/commit/762f02a2c9211ab953a2dc4b2d9938911f2e883d).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #17848: [SPARK-20586] [SQL] Add deterministic to ScalaUDF and Ja...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17848 **[Test build #79776 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79776/testReport)** for PR 17848 at commit [`d0a9086`](https://github.com/apache/spark/commit/d0a90865ca7c6a9afd6fbb28b3e8d1c9c602013c).
[GitHub] spark issue #18281: [SPARK-21027][ML][PYTHON] Added tunable parallelism to o...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18281 Merged build finished. Test FAILed.