[GitHub] spark issue #18749: [SPARK-21485][FOLLOWUP][SQL][DOCS] Describes examples an...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/18749 retest this please --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18828: [SPARK-21619][SQL] Fail the execution of canonicalized p...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18828 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80280/ Test PASSed.
[GitHub] spark issue #18828: [SPARK-21619][SQL] Fail the execution of canonicalized p...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18828 Merged build finished. Test PASSed.
[GitHub] spark issue #18828: [SPARK-21619][SQL] Fail the execution of canonicalized p...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18828 **[Test build #80280 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80280/testReport)** for PR 18828 at commit [`861f255`](https://github.com/apache/spark/commit/861f255638e0b301e3a21b5a0bc491fbfc9537d4). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #18317: [SPARK-21113][CORE] Read ahead input stream to am...
Github user sitalkedia commented on a diff in the pull request: https://github.com/apache/spark/pull/18317#discussion_r131516266 --- Diff: core/src/main/java/org/apache/spark/io/ReadAheadInputStream.java --- @@ -0,0 +1,279 @@ +/* + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.spark.io; + +import com.google.common.base.Preconditions; +import org.apache.spark.storage.StorageUtils; + +import javax.annotation.concurrent.GuardedBy; +import java.io.IOException; +import java.io.InputStream; +import java.nio.ByteBuffer; +import java.util.concurrent.ExecutorService; +import java.util.concurrent.Executors; +import java.util.concurrent.locks.Condition; +import java.util.concurrent.locks.Lock; +import java.util.concurrent.locks.ReentrantLock; + +/** + * {@link InputStream} implementation which asynchronously reads ahead from the underlying input + * stream when specified amount of data has been read from the current buffer. It does it by maintaining + * two buffer - active buffer and read ahead buffer. Active buffer contains data which should be returned + * when a read() call is issued. The read ahead buffer is used to asynchronously read from the underlying + * input stream and once the current active buffer is exhausted, we flip the two buffers so that we can + * start reading from the read ahead buffer without being blocked in disk I/O. 
+ */ +public class ReadAheadInputStream extends InputStream { + + private Lock stateChangeLock = new ReentrantLock(); + + @GuardedBy("stateChangeLock") + private ByteBuffer activeBuffer; + + @GuardedBy("stateChangeLock") + private ByteBuffer readAheadBuffer; + + @GuardedBy("stateChangeLock") + private boolean endOfStream; + + @GuardedBy("stateChangeLock") + // true if async read is in progress + private boolean isReadInProgress; + + @GuardedBy("stateChangeLock") + // true if read is aborted due to an exception in reading from underlying input stream. + private boolean isReadAborted; + + @GuardedBy("stateChangeLock") + private Exception readException; + + // If the remaining data size in the current buffer is below this threshold, + // we issue an async read from the underlying input stream. + private final int readAheadThresholdInBytes; + + private final InputStream underlyingInputStream; + + private final ExecutorService executorService = Executors.newSingleThreadExecutor(); + + private final Condition asyncReadComplete = stateChangeLock.newCondition(); + + private final byte[] oneByte = new byte[1]; + + public ReadAheadInputStream(InputStream inputStream, int bufferSizeInBytes, int readAheadThresholdInBytes) { +Preconditions.checkArgument(bufferSizeInBytes > 0, "bufferSizeInBytes should be greater than 0"); +Preconditions.checkArgument(readAheadThresholdInBytes > 0 && readAheadThresholdInBytes < bufferSizeInBytes, +"readAheadThresholdInBytes should be greater than 0 and less than bufferSizeInBytes" ); +activeBuffer = ByteBuffer.allocate(bufferSizeInBytes); +readAheadBuffer = ByteBuffer.allocate(bufferSizeInBytes); +this.readAheadThresholdInBytes = readAheadThresholdInBytes; +this.underlyingInputStream = inputStream; +activeBuffer.flip(); +readAheadBuffer.flip(); + } + + private boolean isEndOfStream() { +if(activeBuffer.remaining() == 0 && readAheadBuffer.remaining() == 0 && endOfStream) { + return true; +} +return false; + } + + + private void readAsync(final 
ByteBuffer byteBuffer) throws IOException { +stateChangeLock.lock(); +if (endOfStream || isReadInProgress) { + stateChangeLock.unlock(); + return; +} +byteBuffer.position(0); +byteBuffer.flip(); +isReadInProgress = true; +stateChangeLock.unlock(); +executorService.execute(() -> { + byte[] arr; + stateChangeLock.lock(); + arr = byteBuffer.array(); + stateChangeLock.unlock(); + // Please note that it is safe to release the lock and read into the read ahead buffer
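The Javadoc in this diff describes a double-buffering scheme: the caller drains an active buffer while a background thread fills a read-ahead buffer, and the two are swapped once the active one is exhausted. A minimal sketch of that idea follows, under simplifying assumptions. The class name `SimpleReadAheadInputStream` is hypothetical and this is not Spark's `ReadAheadInputStream`. It starts the next background fill immediately after each swap instead of waiting for a `readAheadThresholdInBytes` threshold, and it hands buffers over through a `Future` rather than the explicit `ReentrantLock` and `Condition` used in the PR.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical, simplified double-buffered read-ahead stream (not Spark's implementation).
class SimpleReadAheadInputStream extends InputStream {
  private final InputStream in;
  // Single daemon worker thread that fills the spare buffer in the background.
  private final ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
    Thread t = new Thread(r, "read-ahead");
    t.setDaemon(true);
    return t;
  });
  private byte[] active;           // buffer the caller is currently draining
  private byte[] spare;            // buffer being filled asynchronously
  private int pos = 0;             // next index to return from active
  private int limit = 0;           // number of valid bytes in active
  private Future<Integer> pending; // in-flight fill of spare; null once end of stream is seen

  SimpleReadAheadInputStream(InputStream in, int bufferSize) {
    if (bufferSize <= 0) throw new IllegalArgumentException("bufferSize must be > 0");
    this.in = in;
    this.active = new byte[bufferSize];
    this.spare = new byte[bufferSize];
    startReadAhead();
  }

  // Schedules an asynchronous fill of the spare buffer.
  private void startReadAhead() {
    final byte[] target = spare;
    Callable<Integer> task = () -> fill(target);
    pending = executor.submit(task);
  }

  // Runs on the worker thread: reads until target is full or the stream ends.
  // Returns the number of bytes read, or -1 if nothing was left to read.
  private int fill(byte[] target) throws IOException {
    int total = 0;
    while (total < target.length) {
      int n = in.read(target, total, target.length - total);
      if (n < 0) break;
      total += n;
    }
    return total == 0 ? -1 : total;
  }

  // Waits for the in-flight fill, swaps the two buffers, and starts the next fill.
  private boolean swapBuffers() throws IOException {
    if (pending == null) return false;
    int n;
    try {
      n = pending.get();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IOException("interrupted while waiting for read-ahead", e);
    } catch (ExecutionException e) {
      throw new IOException("read-ahead failed", e.getCause());
    }
    if (n < 0) {
      pending = null; // end of stream reached
      return false;
    }
    byte[] tmp = active;
    active = spare;
    spare = tmp;
    pos = 0;
    limit = n;
    startReadAhead(); // the old active buffer is fully drained, so refilling it is safe
    return true;
  }

  @Override
  public int read() throws IOException {
    if (pos >= limit && !swapBuffers()) return -1;
    return active[pos++] & 0xFF;
  }

  @Override
  public void close() throws IOException {
    // Simplification: a production version would also wait for an in-flight read
    // to finish before closing the underlying stream.
    executor.shutdownNow();
    in.close();
  }
}
```

The `Future` handoff keeps the sketch short: actions in the background task happen-before the return of `Future.get()`, so the caller sees the filled bytes without extra locking. The trade-off is that a reader that outpaces the underlying stream still blocks in `get()`, which is the same situation the PR handles with `asyncReadComplete`.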
[GitHub] spark issue #18538: [SPARK-14516][ML] Adding ClusteringEvaluator with the im...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18538 Merged build finished. Test FAILed.
[GitHub] spark issue #18538: [SPARK-14516][ML] Adding ClusteringEvaluator with the im...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18538 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80281/ Test FAILed.
[GitHub] spark issue #18538: [SPARK-14516][ML] Adding ClusteringEvaluator with the im...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18538 **[Test build #80281 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80281/testReport)** for PR 18538 at commit [`cfcb106`](https://github.com/apache/spark/commit/cfcb106788e5ea2b905767ff23825c4e5a9bc1e9). * This patch **fails Scala style tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #18538: [SPARK-14516][ML] Adding ClusteringEvaluator with the im...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18538 **[Test build #80281 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80281/testReport)** for PR 18538 at commit [`cfcb106`](https://github.com/apache/spark/commit/cfcb106788e5ea2b905767ff23825c4e5a9bc1e9).
[GitHub] spark issue #18538: [SPARK-14516][ML] Adding ClusteringEvaluator with the im...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/18538 retest this please
[GitHub] spark issue #18538: [SPARK-14516][ML] Adding ClusteringEvaluator with the im...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/18538 ok to test
[GitHub] spark issue #18538: [SPARK-14516][ML] Adding ClusteringEvaluator with the im...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/18538 test this please
[GitHub] spark issue #18765: [SPARK-19720][CORE][BACKPORT-2.1] Redact sensitive infor...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/18765 You need to close it by yourself. Thanks!
[GitHub] spark issue #18802: [SPARK-18535][SPARK-19720][CORE][BACKPORT-2.1] Redact se...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/18802 cc @markgrover @vanzin Could you please take a look at this?
[GitHub] spark pull request #18779: [SPARK-21580][SQL]Integers in aggregation express...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/18779
[GitHub] spark issue #18828: [SPARK-21619][SQL] Fail the execution of canonicalized p...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18828 Merged build finished. Test FAILed.
[GitHub] spark issue #18828: [SPARK-21619][SQL] Fail the execution of canonicalized p...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18828 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80279/ Test FAILed.
[GitHub] spark issue #18828: [SPARK-21619][SQL] Fail the execution of canonicalized p...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18828 **[Test build #80279 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80279/testReport)** for PR 18828 at commit [`84e76b9`](https://github.com/apache/spark/commit/84e76b938517363de7d20adb9f2b87486293edac). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #18779: [SPARK-21580][SQL]Integers in aggregation expressions ar...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/18779 Thanks! Merging to master/2.2
[GitHub] spark issue #18779: [SPARK-21580][SQL]Integers in aggregation expressions ar...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/18779 Really appreciate your patience and your work! This is how we work in Spark SQL. Thanks!
[GitHub] spark pull request #18668: [SPARK-21637][SPARK-21451][SQL]get `spark.hadoop....
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/18668#discussion_r131515429 --- Diff: core/src/main/scala/org/apache/spark/deploy/SparkHadoopUtil.scala --- @@ -99,18 +100,36 @@ class SparkHadoopUtil extends Logging { hadoopConf.set("fs.s3a.session.token", sessionToken) } } - // Copy any "spark.hadoop.foo=bar" system properties into conf as "foo=bar" - conf.getAll.foreach { case (key, value) => -if (key.startsWith("spark.hadoop.")) { - hadoopConf.set(key.substring("spark.hadoop.".length), value) -} - } + appendSparkHadoopConfigs(conf, hadoopConf) val bufferSize = conf.get("spark.buffer.size", "65536") hadoopConf.set("io.file.buffer.size", bufferSize) } } /** + * Appends spark.hadoop.* configurations from a [[SparkConf]] to a Hadoop + * configuration without the spark.hadoop. prefix. + */ + def appendSparkHadoopConfigs(conf: SparkConf, hadoopConf: Configuration): Unit = { +// Copy any "spark.hadoop.foo=bar" spark properties into conf as "foo=bar" +conf.getAll.foreach { case (key, value) if key.startsWith("spark.hadoop.") => + hadoopConf.set(key.substring("spark.hadoop.".length), value) +} + } + + /** + * Appends spark.hadoop.* configurations from a Map to another without the spark.hadoop. prefix. + */ + def appendSparkHadoopConfigs( + srcMap: Map[String, String], + destMap: HashMap[String, String]): Unit = { +// Copy any "spark.hadoop.foo=bar" system properties into destMap as "foo=bar" +srcMap.foreach { case (key, value) if key.startsWith("spark.hadoop.") => + destMap.put(key.substring("spark.hadoop.".length), value) +} --- End diff -- Your solution requires another `case _ => `
[GitHub] spark pull request #18668: [SPARK-21637][SPARK-21451][SQL]get `spark.hadoop....
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/18668#discussion_r131515413 --- Diff: core/src/main/scala/org/apache/spark/deploy/SparkHadoopUtil.scala --- @@ -99,18 +100,36 @@ class SparkHadoopUtil extends Logging { hadoopConf.set("fs.s3a.session.token", sessionToken) } } - // Copy any "spark.hadoop.foo=bar" system properties into conf as "foo=bar" - conf.getAll.foreach { case (key, value) => -if (key.startsWith("spark.hadoop.")) { - hadoopConf.set(key.substring("spark.hadoop.".length), value) -} - } + appendSparkHadoopConfigs(conf, hadoopConf) val bufferSize = conf.get("spark.buffer.size", "65536") hadoopConf.set("io.file.buffer.size", bufferSize) } } /** + * Appends spark.hadoop.* configurations from a [[SparkConf]] to a Hadoop + * configuration without the spark.hadoop. prefix. + */ + def appendSparkHadoopConfigs(conf: SparkConf, hadoopConf: Configuration): Unit = { +// Copy any "spark.hadoop.foo=bar" spark properties into conf as "foo=bar" +conf.getAll.foreach { case (key, value) if key.startsWith("spark.hadoop.") => + hadoopConf.set(key.substring("spark.hadoop.".length), value) +} + } + + /** + * Appends spark.hadoop.* configurations from a Map to another without the spark.hadoop. prefix. + */ + def appendSparkHadoopConfigs( + srcMap: Map[String, String], + destMap: HashMap[String, String]): Unit = { +// Copy any "spark.hadoop.foo=bar" system properties into destMap as "foo=bar" +srcMap.foreach { case (key, value) if key.startsWith("spark.hadoop.") => + destMap.put(key.substring("spark.hadoop.".length), value) +} --- End diff -- Your change is different from what I posted before ```Scala for ((key, value) <- conf if key.startsWith("spark.hadoop.")) { propMap.put(key.substring("spark.hadoop.".length), value) } ```
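Background on the `case _ =>` remark: in Scala, passing a pattern-matching block such as `{ case (key, value) if key.startsWith("spark.hadoop.") => ... }` to `foreach` produces a function that throws `scala.MatchError` for any entry whose key fails the guard, so the first non-`spark.hadoop.*` setting would abort the copy. Either a fallback `case _ =>` or the `for ((key, value) <- conf if ...)` form quoted above skips such entries instead. The filtering behavior both fixes aim for can be sketched in Java, the language of the other diff in this digest. The class name `SparkHadoopPrefix` is hypothetical and this is not Spark's code.

```java
import java.util.Map;

// Hypothetical Java helper mirroring the intent of the Scala appendSparkHadoopConfigs overloads.
final class SparkHadoopPrefix {
  static final String PREFIX = "spark.hadoop.";

  private SparkHadoopPrefix() {}

  // Copies every "spark.hadoop.foo=bar" entry into dest as "foo=bar".
  // Keys without the prefix are skipped rather than treated as an error.
  static void appendSparkHadoopConfigs(Map<String, String> src, Map<String, String> dest) {
    for (Map.Entry<String, String> entry : src.entrySet()) {
      String key = entry.getKey();
      if (key.startsWith(PREFIX)) {
        dest.put(key.substring(PREFIX.length()), entry.getValue());
      }
    }
  }
}
```

An explicit `if` inside the loop plays the same role as the guard in the Scala `for` comprehension: non-matching entries fall through without any fallback branch.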
[GitHub] spark pull request #18623: [SPARK-21374][CORE] Fix reading globbed paths fro...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/18623
[GitHub] spark pull request #18848: [SPARK-21374][CORE] Fix reading globbed paths fro...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/18848
[GitHub] spark issue #18848: [SPARK-21374][CORE] Fix reading globbed paths from S3 in...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/18848 Thanks! Merging to master.
[GitHub] spark issue #18779: [SPARK-21580][SQL]Integers in aggregation expressions ar...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18779 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80277/ Test PASSed.
[GitHub] spark issue #18779: [SPARK-21580][SQL]Integers in aggregation expressions ar...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18779 Merged build finished. Test PASSed.
[GitHub] spark issue #18779: [SPARK-21580][SQL]Integers in aggregation expressions ar...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18779 **[Test build #80277 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80277/testReport)** for PR 18779 at commit [`2bf42b1`](https://github.com/apache/spark/commit/2bf42b11cec794d19ed8f2fd000a9cd7aeb159bb). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #18317: [SPARK-21113][CORE] Read ahead input stream to am...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/18317#discussion_r131514782 --- Diff: core/src/main/java/org/apache/spark/io/ReadAheadInputStream.java --- @@ -0,0 +1,279 @@ +/* + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.spark.io; + +import com.google.common.base.Preconditions; +import org.apache.spark.storage.StorageUtils; + +import javax.annotation.concurrent.GuardedBy; +import java.io.IOException; +import java.io.InputStream; +import java.nio.ByteBuffer; +import java.util.concurrent.ExecutorService; +import java.util.concurrent.Executors; +import java.util.concurrent.locks.Condition; +import java.util.concurrent.locks.Lock; +import java.util.concurrent.locks.ReentrantLock; + +/** + * {@link InputStream} implementation which asynchronously reads ahead from the underlying input + * stream when specified amount of data has been read from the current buffer. It does it by maintaining + * two buffer - active buffer and read ahead buffer. Active buffer contains data which should be returned + * when a read() call is issued. The read ahead buffer is used to asynchronously read from the underlying + * input stream and once the current active buffer is exhausted, we flip the two buffers so that we can + * start reading from the read ahead buffer without being blocked in disk I/O. 
+ */ +public class ReadAheadInputStream extends InputStream { + + private Lock stateChangeLock = new ReentrantLock(); + + @GuardedBy("stateChangeLock") + private ByteBuffer activeBuffer; + + @GuardedBy("stateChangeLock") + private ByteBuffer readAheadBuffer; + + @GuardedBy("stateChangeLock") + private boolean endOfStream; + + @GuardedBy("stateChangeLock") + // true if async read is in progress + private boolean isReadInProgress; + + @GuardedBy("stateChangeLock") + // true if read is aborted due to an exception in reading from underlying input stream. + private boolean isReadAborted; + + @GuardedBy("stateChangeLock") + private Exception readException; + + // If the remaining data size in the current buffer is below this threshold, + // we issue an async read from the underlying input stream. + private final int readAheadThresholdInBytes; + + private final InputStream underlyingInputStream; + + private final ExecutorService executorService = Executors.newSingleThreadExecutor(); + + private final Condition asyncReadComplete = stateChangeLock.newCondition(); + + private final byte[] oneByte = new byte[1]; + + public ReadAheadInputStream(InputStream inputStream, int bufferSizeInBytes, int readAheadThresholdInBytes) { +Preconditions.checkArgument(bufferSizeInBytes > 0, "bufferSizeInBytes should be greater than 0"); +Preconditions.checkArgument(readAheadThresholdInBytes > 0 && readAheadThresholdInBytes < bufferSizeInBytes, +"readAheadThresholdInBytes should be greater than 0 and less than bufferSizeInBytes" ); +activeBuffer = ByteBuffer.allocate(bufferSizeInBytes); +readAheadBuffer = ByteBuffer.allocate(bufferSizeInBytes); +this.readAheadThresholdInBytes = readAheadThresholdInBytes; +this.underlyingInputStream = inputStream; +activeBuffer.flip(); +readAheadBuffer.flip(); + } + + private boolean isEndOfStream() { +if(activeBuffer.remaining() == 0 && readAheadBuffer.remaining() == 0 && endOfStream) { + return true; +} +return false; + } + + + private void readAsync(final 
ByteBuffer byteBuffer) throws IOException { +stateChangeLock.lock(); +if (endOfStream || isReadInProgress) { + stateChangeLock.unlock(); + return; +} +byteBuffer.position(0); +byteBuffer.flip(); +isReadInProgress = true; +stateChangeLock.unlock(); +executorService.execute(() -> { + byte[] arr; + stateChangeLock.lock(); + arr = byteBuffer.array(); + stateChangeLock.unlock(); + // Please note that it is safe to release the lock and read into the read ahead buffer +
[GitHub] spark issue #18668: [SPARK-21637][SPARK-21451][SQL]get `spark.hadoop.*` prop...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18668 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80278/ Test FAILed.
[GitHub] spark issue #18668: [SPARK-21637][SPARK-21451][SQL]get `spark.hadoop.*` prop...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18668 Merged build finished. Test FAILed.
[GitHub] spark issue #18668: [SPARK-21637][SPARK-21451][SQL]get `spark.hadoop.*` prop...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18668 **[Test build #80278 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80278/testReport)** for PR 18668 at commit [`55729fa`](https://github.com/apache/spark/commit/55729fa343e285f3711154dc0adcb4585a5e6f1f). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #18828: [SPARK-21619][SQL] Fail the execution of canonicalized p...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18828 **[Test build #80280 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80280/testReport)** for PR 18828 at commit [`861f255`](https://github.com/apache/spark/commit/861f255638e0b301e3a21b5a0bc491fbfc9537d4).
[GitHub] spark issue #18828: [SPARK-21619][SQL] Fail the execution of canonicalized p...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18828 **[Test build #80279 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80279/testReport)** for PR 18828 at commit [`84e76b9`](https://github.com/apache/spark/commit/84e76b938517363de7d20adb9f2b87486293edac).
[GitHub] spark issue #18317: [SPARK-21113][CORE] Read ahead input stream to amortize ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18317 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80272/ Test PASSed.
[GitHub] spark issue #18317: [SPARK-21113][CORE] Read ahead input stream to amortize ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18317 Merged build finished. Test PASSed.
[GitHub] spark issue #18317: [SPARK-21113][CORE] Read ahead input stream to amortize ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18317 **[Test build #80272 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80272/testReport)** for PR 18317 at commit [`89cbac8`](https://github.com/apache/spark/commit/89cbac877eefe0e0dc5d5956331c228be0382f72). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #18749: [SPARK-21485][FOLLOWUP][SQL][DOCS] Describes examples an...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18749 Merged build finished. Test FAILed.
[GitHub] spark issue #18749: [SPARK-21485][FOLLOWUP][SQL][DOCS] Describes examples an...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18749 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80274/ Test FAILed.
[GitHub] spark issue #18749: [SPARK-21485][FOLLOWUP][SQL][DOCS] Describes examples an...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18749 **[Test build #80274 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80274/testReport)** for PR 18749 at commit [`974eab2`](https://github.com/apache/spark/commit/974eab27d77169c7bc7205595e3702b32865). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #18668: [SPARK-21637][SPARK-21451][SQL]get `spark.hadoop.*` prop...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18668 **[Test build #80278 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80278/testReport)** for PR 18668 at commit [`55729fa`](https://github.com/apache/spark/commit/55729fa343e285f3711154dc0adcb4585a5e6f1f).
[GitHub] spark pull request #18668: [SPARK-21637][SPARK-21451][SQL]get `spark.hadoop....
Github user yaooqinn commented on a diff in the pull request: https://github.com/apache/spark/pull/18668#discussion_r131513659
--- Diff: core/src/main/scala/org/apache/spark/deploy/SparkHadoopUtil.scala ---
@@ -99,17 +100,30 @@ class SparkHadoopUtil extends Logging {
           hadoopConf.set("fs.s3a.session.token", sessionToken)
         }
       }
-      // Copy any "spark.hadoop.foo=bar" system properties into conf as "foo=bar"
-      conf.getAll.foreach { case (key, value) =>
-        if (key.startsWith("spark.hadoop.")) {
-          hadoopConf.set(key.substring("spark.hadoop.".length), value)
-        }
-      }
+      appendSparkHadoopConfigs(conf, hadoopConf)
       val bufferSize = conf.get("spark.buffer.size", "65536")
       hadoopConf.set("io.file.buffer.size", bufferSize)
     }
   }
 
+  def appendSparkHadoopConfigs(conf: SparkConf, hadoopConf: Configuration): Unit = {
+    // Copy any "spark.hadoop.foo=bar" system properties into conf as "foo=bar"
+    conf.getAll.foreach { case (key, value) =>
+      if (key.startsWith("spark.hadoop.")) {
+        hadoopConf.set(key.substring("spark.hadoop.".length), value)
+      }
+    }
+  }
+
+  def appendSparkHadoopConfigs(propMap: HashMap[String, String]): Unit = {
+    // Copy any "spark.hadoop.foo=bar" system properties into conf as "foo=bar"
+    sys.props.foreach { case (key, value) =>
+      if (key.startsWith("spark.hadoop.")) {
+        propMap.put(key.substring("spark.hadoop.".length), value)
+      }
--- End diff --

ok
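Both `appendSparkHadoopConfigs` overloads in this diff apply the same rule: a key `spark.hadoop.foo` with value `bar` is copied into the target as `foo=bar`, and every other key is skipped. A minimal Java sketch of just that rule, assuming plain `Map<String, String>` source and target instead of `SparkConf`, Hadoop `Configuration`, or `sys.props`. The class and method names here are hypothetical.

```java
import java.util.Map;

// Hypothetical helper illustrating the prefix-stripping rule from the diff:
// "spark.hadoop.foo=bar" is copied as "foo=bar"; other keys are ignored.
final class SparkHadoopPrefix {
  static final String PREFIX = "spark.hadoop.";

  private SparkHadoopPrefix() {}

  // Copies matching entries from source into target with the prefix removed.
  // Existing target entries with the same stripped key are overwritten.
  static void appendStripped(Map<String, String> source, Map<String, String> target) {
    for (Map.Entry<String, String> e : source.entrySet()) {
      String key = e.getKey();
      if (key.startsWith(PREFIX)) {
        target.put(key.substring(PREFIX.length()), e.getValue());
      }
    }
  }
}
```

Taking the target map as a parameter, rather than returning a new map, matches how the diff appends into an existing `hadoopConf` or `propMap`.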
[GitHub] spark pull request #18668: [SPARK-21637][SPARK-21451][SQL]get `spark.hadoop....
Github user yaooqinn commented on a diff in the pull request: https://github.com/apache/spark/pull/18668#discussion_r131513657
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveUtils.scala ---
@@ -404,6 +405,8 @@ private[spark] object HiveUtils extends Logging {
     propMap.put(ConfVars.METASTORE_EVENT_LISTENERS.varname, "")
     propMap.put(ConfVars.METASTORE_END_FUNCTION_LISTENERS.varname, "")
+    SparkHadoopUtil.get.appendSparkHadoopConfigs(propMap)
--- End diff --

ok
[GitHub] spark issue #18317: [SPARK-21113][CORE] Read ahead input stream to amortize ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18317 Merged build finished. Test FAILed.
[GitHub] spark issue #18317: [SPARK-21113][CORE] Read ahead input stream to amortize ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18317 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80273/ Test FAILed.
[GitHub] spark issue #18317: [SPARK-21113][CORE] Read ahead input stream to amortize ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18317 **[Test build #80273 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80273/testReport)** for PR 18317 at commit [`d84269d`](https://github.com/apache/spark/commit/d84269d7208ccacb20795d5a46fcca52584bcbef). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #18851: [SPARK-21644][SQL] LocalLimit.maxRows is defined incorre...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18851 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80275/ Test FAILed.
[GitHub] spark issue #18851: [SPARK-21644][SQL] LocalLimit.maxRows is defined incorre...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18851 Merged build finished. Test FAILed.
[GitHub] spark issue #18851: [SPARK-21644][SQL] LocalLimit.maxRows is defined incorre...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18851 **[Test build #80275 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80275/testReport)** for PR 18851 at commit [`9911176`](https://github.com/apache/spark/commit/9911176d901a6bdfadec3ef68b28e6dd2c82be9e). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #18828: [SPARK-21619][SQL] Fail the execution of canonicalized p...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18828 Merged build finished. Test FAILed.
[GitHub] spark issue #18828: [SPARK-21619][SQL] Fail the execution of canonicalized p...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18828 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80276/ Test FAILed.
[GitHub] spark issue #18828: [SPARK-21619][SQL] Fail the execution of canonicalized p...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18828 **[Test build #80276 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80276/testReport)** for PR 18828 at commit [`8f2f91d`](https://github.com/apache/spark/commit/8f2f91d9dfd0e19e0f60fc70f3f10ef73a2d0019). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #18779: [SPARK-21580][SQL]Integers in aggregation expressions ar...
Github user 10110346 commented on the issue: https://github.com/apache/spark/pull/18779 I learned a lot from you, thanks all.
[GitHub] spark issue #18779: [SPARK-21580][SQL]Integers in aggregation expressions ar...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18779 **[Test build #80277 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80277/testReport)** for PR 18779 at commit [`2bf42b1`](https://github.com/apache/spark/commit/2bf42b11cec794d19ed8f2fd000a9cd7aeb159bb).
[GitHub] spark pull request #18841: [SPARK-21635][SQL] ACOS(2) and ASIN(2) should be ...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18841#discussion_r131512180
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/mathExpressions.scala ---
@@ -170,29 +193,29 @@ case class Pi() extends LeafMathExpression(math.Pi, "PI")
 // scalastyle:off line.size.limit
 @ExpressionDescription(
-  usage = "_FUNC_(expr) - Returns the inverse cosine (a.k.a. arccosine) of `expr` if -1<=`expr`<=1 or NaN otherwise.",
+  usage = "_FUNC_(expr) - Returns the inverse cosine (a.k.a. arccosine) of `expr` if -1<=`expr`<=1 or NULL otherwise.",
   extended = """
     Examples:
       > SELECT _FUNC_(1);
        0.0
       > SELECT _FUNC_(2);
-       NaN
+       NULL
--- End diff --

I checked the Hive patch and saw a comment there that questions the use of NaN in Hive. In Spark SQL, I think NaN is used in some expressions, e.g., IsNaN and NaNvl. Maybe we should keep the existing behavior?
[GitHub] spark issue #18731: [SPARK-20990][SQL] Read all JSON documents in files when...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18731 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80271/ Test PASSed.
[GitHub] spark issue #18731: [SPARK-20990][SQL] Read all JSON documents in files when...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18731 Merged build finished. Test PASSed.
[GitHub] spark issue #18731: [SPARK-20990][SQL] Read all JSON documents in files when...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18731 **[Test build #80271 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80271/testReport)** for PR 18731 at commit [`6071daf`](https://github.com/apache/spark/commit/6071dafb1269f54c9246ebe1cdccaad721365971). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #18849: [SPARK-21617][SQL] Store correct table metadata when alt...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18849 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80270/ Test PASSed.
[GitHub] spark issue #18849: [SPARK-21617][SQL] Store correct table metadata when alt...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18849 Merged build finished. Test PASSed.
[GitHub] spark issue #18849: [SPARK-21617][SQL] Store correct table metadata when alt...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18849 **[Test build #80270 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80270/testReport)** for PR 18849 at commit [`7ccf474`](https://github.com/apache/spark/commit/7ccf4743024a8a447a4b05369f6ebf237cf88c4f). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #18841: [SPARK-21635][SQL] ACOS(2) and ASIN(2) should be ...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18841#discussion_r131511957
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/mathExpressions.scala ---
@@ -170,29 +193,29 @@ case class Pi() extends LeafMathExpression(math.Pi, "PI")
 // scalastyle:off line.size.limit
 @ExpressionDescription(
-  usage = "_FUNC_(expr) - Returns the inverse cosine (a.k.a. arccosine) of `expr` if -1<=`expr`<=1 or NaN otherwise.",
+  usage = "_FUNC_(expr) - Returns the inverse cosine (a.k.a. arccosine) of `expr` if -1<=`expr`<=1 or NULL otherwise.",
   extended = """
     Examples:
       > SELECT _FUNC_(1);
        0.0
       > SELECT _FUNC_(2);
-       NaN
+       NULL
--- End diff --

As we already explicitly define the result value (NaN), I'm not sure we should change it like this. Compatibility issues might arise.
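The disagreement in this thread is whether an out-of-domain argument such as `ACOS(2)` should produce NaN (the currently documented result) or NULL (the change proposed in the diff). On the JVM, `Math.acos` is specified to return NaN when the absolute value of its argument exceeds 1. The hypothetical sketch below puts the two behaviours side by side. `AcosSemantics` and its methods are made-up names, not Spark's `Acos` expression, and Java's `null` stands in for SQL NULL.

```java
// Hypothetical side-by-side of the two ACOS behaviours discussed in this thread.
// Not Spark's Acos implementation; Java null stands in for SQL NULL.
final class AcosSemantics {
  private AcosSemantics() {}

  // Documented behaviour: Math.acos returns NaN when |x| > 1.
  static double acosNaN(double x) {
    return Math.acos(x);
  }

  // Proposed behaviour: map an out-of-domain (NaN) result to null.
  // Note that a NaN input would also map to null with this approach.
  static Double acosNullable(double x) {
    double r = Math.acos(x);
    return Double.isNaN(r) ? null : r;
  }
}
```

With the NaN variant, the out-of-domain result can still be checked with a NaN-aware function afterwards. With the null variant, callers see an ordinary missing value instead, which is the compatibility concern raised above.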
[GitHub] spark issue #18828: [SPARK-21619][SQL] Fail the execution of canonicalized p...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18828 **[Test build #80276 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80276/testReport)** for PR 18828 at commit [`8f2f91d`](https://github.com/apache/spark/commit/8f2f91d9dfd0e19e0f60fc70f3f10ef73a2d0019).
[GitHub] spark issue #18851: [SPARK-21644][SQL] LocalLimit.maxRows is defined incorre...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18851 **[Test build #80275 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80275/testReport)** for PR 18851 at commit [`9911176`](https://github.com/apache/spark/commit/9911176d901a6bdfadec3ef68b28e6dd2c82be9e).
[GitHub] spark issue #18851: [SPARK-21644][SQL] LocalLimit.maxRows is defined incorre...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18851 cc @JoshRosen
[GitHub] spark pull request #18851: [SPARK-21644][SQL] LocalLimit.maxRows is defined ...
GitHub user rxin opened a pull request: https://github.com/apache/spark/pull/18851 [SPARK-21644][SQL] LocalLimit.maxRows is defined incorrectly

## What changes were proposed in this pull request?

The definition of `maxRows` in `LocalLimit` operator was simply wrong. This patch introduces a new `maxRowsPerPartition` method and uses that in pruning. The patch also adds more documentation on why we need local limit vs global limit.

Note that this previously has never been a bug because of the way the code is structured, but future use of the maxRows could lead to bugs.

## How was this patch tested?

Should be covered by existing test cases.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/rxin/spark SPARK-21644

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/18851.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #18851

commit 9911176d901a6bdfadec3ef68b28e6dd2c82be9e
Author: Reynold Xin
Date: 2017-08-05T01:19:26Z

    [SPARK-21644][SQL] LocalLimit.maxRows is defined incorrectly
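The distinction behind `maxRowsPerPartition` is that a local limit of `n` keeps up to `n` rows in each partition independently. Its output can therefore contain up to `n * numPartitions` rows, and only the global limit bounds the total. The hypothetical sketch below demonstrates this with plain Java lists standing in for partitions. It does not model Catalyst's `LocalLimit`/`GlobalLimit` operators, and `LimitSemantics` is a made-up name.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration with plain lists as "partitions" (not Catalyst operators):
// a local limit bounds rows per partition, a global limit bounds rows in total.
final class LimitSemantics {
  private LimitSemantics() {}

  // Local limit: keep at most n rows from each partition independently.
  static <T> List<List<T>> localLimit(List<List<T>> partitions, int n) {
    List<List<T>> out = new ArrayList<>();
    for (List<T> p : partitions) {
      out.add(new ArrayList<>(p.subList(0, Math.min(n, p.size()))));
    }
    return out;
  }

  // Global limit: keep at most n rows overall, taking partitions in order.
  static <T> List<T> globalLimit(List<List<T>> partitions, int n) {
    List<T> out = new ArrayList<>();
    for (List<T> p : partitions) {
      for (T row : p) {
        if (out.size() >= n) {
          return out;
        }
        out.add(row);
      }
    }
    return out;
  }

  // Total number of rows across all partitions.
  static <T> int totalRows(List<List<T>> partitions) {
    int total = 0;
    for (List<T> p : partitions) {
      total += p.size();
    }
    return total;
  }
}
```

With three partitions of five rows and `n = 2`, the local limit leaves six rows (two per partition). So `n` is a correct per-partition bound but not a correct total bound, and the global limit then trims the result to two rows.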
[GitHub] spark issue #18844: [SPARK-21640] Add errorifexists as a valid string for Er...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18844 Actually why do we need this? Can't you just add Error?
[GitHub] spark issue #18749: [SPARK-21485][FOLLOWUP][SQL][DOCS] Describes examples an...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18749 **[Test build #80274 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80274/testReport)** for PR 18749 at commit [`974eab2`](https://github.com/apache/spark/commit/974eab27d77169c7bc7205595e3702b32865).
[GitHub] spark issue #18830: [SPARK-21621][Core] Reset numRecordsWritten after DiskBl...
Github user ConeyLiu commented on the issue: https://github.com/apache/spark/pull/18830 Thanks for reviewing. Hi @jiangxb1987, it seems the test wasn't triggered.
[GitHub] spark issue #18749: [SPARK-21485][FOLLOWUP][SQL][DOCS] Describes examples an...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/18749 retest this please
[GitHub] spark issue #18749: [SPARK-21485][FOLLOWUP][SQL][DOCS] Describes examples an...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/18749 If you are suggesting something, I think it is better to open a discussion on the mailing list. I don't think we should talk about this here.
[GitHub] spark issue #18317: [SPARK-21113][CORE] Read ahead input stream to amortize ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18317 **[Test build #80273 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80273/testReport)** for PR 18317 at commit [`d84269d`](https://github.com/apache/spark/commit/d84269d7208ccacb20795d5a46fcca52584bcbef).
[GitHub] spark issue #18317: [SPARK-21113][CORE] Read ahead input stream to amortize ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18317 **[Test build #80272 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80272/testReport)** for PR 18317 at commit [`89cbac8`](https://github.com/apache/spark/commit/89cbac877eefe0e0dc5d5956331c228be0382f72).
[GitHub] spark issue #18659: [SPARK-21404][PYSPARK][WIP] Simple Python Vectorized UDF...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18659 Merged build finished. Test FAILed.
[GitHub] spark issue #18659: [SPARK-21404][PYSPARK][WIP] Simple Python Vectorized UDF...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18659 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80265/ Test FAILed.
[GitHub] spark issue #18659: [SPARK-21404][PYSPARK][WIP] Simple Python Vectorized UDF...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18659 **[Test build #80265 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80265/testReport)** for PR 18659 at commit [`a01a2d3`](https://github.com/apache/spark/commit/a01a2d3a338083b043e7183957f416e7aefee527). * This patch **fails SparkR unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #18844: [SPARK-21640] Add errorifexists as a valid string for Er...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18844 Merged build finished. Test PASSed.
[GitHub] spark issue #18844: [SPARK-21640] Add errorifexists as a valid string for Er...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18844 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80266/ Test PASSed.
[GitHub] spark issue #18844: [SPARK-21640] Add errorifexists as a valid string for Er...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18844 **[Test build #80266 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80266/testReport)** for PR 18844 at commit [`f8bcf47`](https://github.com/apache/spark/commit/f8bcf477303b48f5a9746ca8cd2a05f1ab6852ea). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #18742: [Spark-21542][ML][Python]Python persistence helper funct...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18742 Merged build finished. Test PASSed.
[GitHub] spark issue #18742: [Spark-21542][ML][Python]Python persistence helper funct...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18742 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80269/ Test PASSed.
[GitHub] spark issue #18742: [Spark-21542][ML][Python]Python persistence helper funct...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18742 **[Test build #80269 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80269/testReport)** for PR 18742 at commit [`e8c3041`](https://github.com/apache/spark/commit/e8c3041ce38c707b854a661b751ed3350e8f0371). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #18317: [SPARK-21113][CORE] Read ahead input stream to amortize ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18317 Merged build finished. Test FAILed.
[GitHub] spark issue #18317: [SPARK-21113][CORE] Read ahead input stream to amortize ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18317 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80268/ Test FAILed.
[GitHub] spark issue #18317: [SPARK-21113][CORE] Read ahead input stream to amortize ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18317 **[Test build #80268 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80268/testReport)** for PR 18317 at commit [`a83a3d2`](https://github.com/apache/spark/commit/a83a3d20694126a1d1b4c4b9c28e8362713b60f8). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #18317: [SPARK-21113][CORE] Read ahead input stream to amortize ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18317 Merged build finished. Test FAILed.
[GitHub] spark issue #18848: [SPARK-21374][CORE] Fix reading globbed paths from S3 in...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18848 Merged build finished. Test PASSed.
[GitHub] spark issue #18848: [SPARK-21374][CORE] Fix reading globbed paths from S3 in...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18848 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80263/ Test PASSed.
[GitHub] spark issue #18317: [SPARK-21113][CORE] Read ahead input stream to amortize ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18317 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80267/ Test FAILed.
[GitHub] spark issue #18317: [SPARK-21113][CORE] Read ahead input stream to amortize ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18317 **[Test build #80267 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80267/testReport)** for PR 18317 at commit [`42740be`](https://github.com/apache/spark/commit/42740be3b174c7b6978e77105084e0d8d70fdaa3). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #18848: [SPARK-21374][CORE] Fix reading globbed paths from S3 in...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18848 **[Test build #80263 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80263/testReport)** for PR 18848 at commit [`9d9b57a`](https://github.com/apache/spark/commit/9d9b57a68d6d0504f4b58fd3fc99d6f678e4f213). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #18664: [SPARK-21375][PYSPARK][SQL][WIP] Add Date and Tim...
Github user BryanCutler commented on a diff in the pull request: https://github.com/apache/spark/pull/18664#discussion_r131506100 --- Diff: python/pyspark/sql/tests.py --- @@ -3036,6 +3052,9 @@ def test_toPandas_arrow_toggle(self): pdf = df.toPandas() self.spark.conf.set("spark.sql.execution.arrow.enable", "true") pdf_arrow = df.toPandas() +# need to remove timezone for comparison +pdf_arrow["7_timestamp_t"] = \ +pdf_arrow["7_timestamp_t"].apply(lambda ts: ts.tz_localize(None)) --- End diff -- Which way do you mean, with or without Arrow?
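The diff in this comment removes the timezone from the Arrow-path timestamps so they can be compared with the non-Arrow `toPandas()` result. The following is a minimal stdlib-only sketch of the same idea. It uses `datetime.replace`, not the pandas `tz_localize(None)` call in the test, and the sample values are made up. A tz-aware value and a naive value never compare equal, but stripping `tzinfo` while keeping the wall-clock time makes them comparable.

```python
from datetime import datetime, timedelta, timezone
from typing import Iterable, List


def strip_tz(values: Iterable[datetime]) -> List[datetime]:
    """Drop tzinfo while keeping each value's wall-clock fields.

    Stdlib analogue of calling pandas' tz_localize(None) on tz-aware timestamps.
    """
    return [v.replace(tzinfo=None) for v in values]


# Made-up sample values: one tz-aware column value and its naive counterpart.
aware = [datetime(2017, 8, 4, 18, 0, tzinfo=timezone(timedelta(hours=-7)))]
naive = [datetime(2017, 8, 4, 18, 0)]
print(aware == naive)            # False: aware and naive values never compare equal
print(strip_tz(aware) == naive)  # True
```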
[GitHub] spark issue #18731: [SPARK-20990][SQL] Read all JSON documents in files when...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18731 **[Test build #80271 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80271/testReport)** for PR 18731 at commit [`6071daf`](https://github.com/apache/spark/commit/6071dafb1269f54c9246ebe1cdccaad721365971).
[GitHub] spark issue #18850: Remove dropping bus logic.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18850 Can one of the admins verify this patch?
[GitHub] spark pull request #18850: Remove dropping bus logic.
GitHub user dirkraft opened a pull request: https://github.com/apache/spark/pull/18850 Remove dropping bus logic. Log warning when event uptake takes too long, but block anyway. Since dropping events is no longer possible, removed that code. ## What changes were proposed in this pull request? LiveListenerBus events are now never dropped. I believe that critical components communicate through this bus. I would also argue that a buggy Spark UI (caused by dropped events) is just as useless, so I am not aware of any kind of event that can safely be lost. ## How was this patch tested? Basically, we ran it in our production environment. Our past couple of Spark runs now complete reliably with this change, since crucial events never get dropped, but it is unclear how much lag this change might be adding to the overall run time. We have partition counts in the thousands and executors in the hundreds. Perhaps the events could be tagged as critical (`queue.put`) or not (`queue.offer`), but this small change is meant to get Spark stable again. We have turned up the eventqueue size to 1,000,000 (and fiddled with all available settings), but it still isn't enough. Occasionally, enough crucial events are dropped to leave the Spark job hung indefinitely, unable to recover (sometimes it does recover). The large eventqueue also maxed out our driver process with 64GB of memory, so that's pretty much untenable. With this change, and with tens (hundreds?) of millions of events flowing through this bus, only 1100 triggered the slow warning, usually around 20ms with a max of 100ms. The current working hypothesis is that the events tend to arrive in bursts, so they quickly overwhelm the queue and then quickly empty out. 
You can merge this pull request into a Git repository by running: $ git pull https://github.com/dirkraft/spark dont-drop-events-ever Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/18850.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #18850 commit 56c4c1d87716b6f629be83fff8197b36a607a9ae Author: Jason Dunkelberger Date: 2017-08-04T23:21:02Z Remove droppy logic. Log warning when event uptake takes too long, but block anyways.
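The PR contrasts Java's blocking `queue.put` with the dropping `queue.offer`. The following is a minimal Python analogue built on the stdlib `queue.Queue`, not Spark's `LiveListenerBus`. The function names and the 20 ms warning threshold are hypothetical. The dropping variant returns `False` when the queue is full. The blocking variant waits for space and logs a warning only when the wait was slow.

```python
import logging
import queue
import threading
import time

log = logging.getLogger("listener_bus_sketch")


def offer_event(q: queue.Queue, event) -> bool:
    """Dropping post (like BlockingQueue.offer): returns False if the queue is full."""
    try:
        q.put_nowait(event)
        return True
    except queue.Full:
        return False


def put_event(q: queue.Queue, event, warn_after_s: float = 0.02) -> float:
    """Blocking post (like BlockingQueue.put): never drops, warns if it waited too long.

    Returns the number of seconds spent waiting for queue space.
    """
    start = time.monotonic()
    q.put(event)  # blocks until a consumer frees a slot
    waited = time.monotonic() - start
    if waited > warn_after_s:
        log.warning("event %r waited %.1f ms for queue space", event, waited * 1000)
    return waited


# Usage: a queue with room for one event and a slow consumer.
q = queue.Queue(maxsize=1)
print(offer_event(q, "a"))  # True
print(offer_event(q, "b"))  # False: the dropping variant loses "b"
threading.Timer(0.05, q.get).start()  # consumer frees the slot after ~50 ms
waited = put_event(q, "c")  # blocks until then and logs a warning; "c" is kept
```

The trade-off matches the one described in the PR. Blocking guarantees delivery but can stall the posting thread, so the sketch measures and reports the wait instead of hiding it.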
[GitHub] spark pull request #18849: [SPARK-21617][SQL] Store correct table metadata w...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/18849#discussion_r131504834 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala --- @@ -908,7 +909,13 @@ private[hive] object HiveClientImpl { } // after SPARK-19279, it is not allowed to create a hive table with an empty schema, // so here we should not add a default col schema -if (schema.isEmpty && DDLUtils.isDatasourceTable(table)) { +// +// Because HiveExternalCatalog sometimes writes back "raw" tables that have not been +// completely translated to Spark's view, the provider information needs to be looked +// up in two places. +val provider = table.provider.orElse( --- End diff -- This change would have fixed the second exception in the bug (about storing an empty schema); but the code was just ending up in that situation because of the other problems this PR is fixing. This change shouldn't be needed for the fix, but I included it for further correctness.
[GitHub] spark issue #18849: [SPARK-21617][SQL] Store correct table metadata when alt...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18849 **[Test build #80270 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80270/testReport)** for PR 18849 at commit [`7ccf474`](https://github.com/apache/spark/commit/7ccf4743024a8a447a4b05369f6ebf237cf88c4f).
[GitHub] spark issue #18849: [SPARK-21617][SQL] Store correct table metadata when alt...
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/18849 This is a corrected version of #18824 after I tracked the actual failure and looked at the suggested code paths in the original review.
[GitHub] spark pull request #18849: [SPARK-21617][SQL] Store correct table metadata w...
GitHub user vanzin opened a pull request: https://github.com/apache/spark/pull/18849 [SPARK-21617][SQL] Store correct table metadata when altering schema in Hive metastore. HiveExternalCatalog.alterTableSchema takes a shortcut by modifying the raw Hive table metadata instead of the full Spark view; that means it needs to be aware of whether the table is Hive-compatible or not. For compatible tables, the current "replace the schema" code is the correct path, except that an exception in that path should result in an error, and not in retrying in a different way. For non-compatible tables, Spark should just update the table properties, and leave the schema stored in the raw table untouched. Because Spark doesn't explicitly store metadata about whether a table is Hive-compatible or not, a new property was added just to make that explicit. The code tries to detect old DS tables that don't have the property and do the right thing. These changes also uncovered a problem with the way case-sensitive DS tables were being saved to the Hive metastore; the metastore is case-insensitive, and the code was treating these tables as Hive-compatible if the data source had a Hive counterpart (e.g. for parquet). In this scenario, the schema could be corrupted when being updated from Spark if conflicting columns existed ignoring case. The change fixes this by making case-sensitive DS-tables not Hive-compatible. Tested with existing and added unit tests (plus internal tests with a 2.1 metastore). 
You can merge this pull request into a Git repository by running: $ git pull https://github.com/vanzin/spark SPARK-21617 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/18849.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #18849 commit aae3abd673adc7ff939d842e49d566fa722403a3 Author: Marcelo Vanzin Date: 2017-08-02T21:47:34Z [SPARK-21617][SQL] Store correct metadata in Hive for altered DS table. This change fixes two issues: - when loading table metadata from Hive, restore the "provider" field of CatalogTable so DS tables can be identified. - when altering a DS table in the Hive metastore, make sure to not alter the table's schema, since the DS table's schema is stored as a table property in those cases. Also added a new unit test for this issue which fails without this change. commit 2350b105a599dde849e44bde50aa6d13812e4f83 Author: Marcelo Vanzin Date: 2017-08-04T22:49:31Z Fix 2.1 DDL suite to not use SparkSession. commit 7ccf4743024a8a447a4b05369f6ebf237cf88c4f Author: Marcelo Vanzin Date: 2017-08-04T22:57:44Z Proper fix. HiveExternalCatalog.alterTableSchema takes a shortcut by modifying the raw Hive table metadata instead of the full Spark view; that means it needs to be aware of whether the table is Hive-compatible or not. For compatible tables, the current "replace the schema" code is the correct path, except that an exception in that path should result in an error, and not in retrying in a different way. For non-compatible tables, Spark should just update the table properties, and leave the schema stored in the raw table untouched. Because Spark doesn't explicitly store metadata about whether a table is Hive-compatible or not, a new property was added just to make that explicit. The code tries to detect old DS tables that don't have the property and do the right thing. 
These changes also uncovered a problem with the way case-sensitive DS tables were being saved to the Hive metastore; the metastore is case-insensitive, and the code was treating these tables as Hive-compatible if the data source had a Hive counterpart (e.g. for parquet). In this scenario, the schema could be corrupted when being updated from Spark if conflicting columns existed ignoring case. The change fixes this by making case-sensitive DS-tables not Hive-compatible.
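The following is a deliberately simplified Python sketch of the decision described in this PR. It is not `HiveExternalCatalog` code, and the function names, the two boolean inputs and the returned action strings are hypothetical; the real compatibility checks involve more than these two flags. Columns that differ only by case collapse into one in a case-insensitive metastore. For that reason, case-sensitive DS tables are treated as not Hive-compatible, and an alter-schema call then only updates their table properties.

```python
from typing import Iterable


def has_case_conflicts(column_names: Iterable[str]) -> bool:
    """True if two names collide once case is ignored, e.g. "id" and "ID"."""
    seen = set()
    for name in column_names:
        key = name.lower()
        if key in seen:
            return True
        seen.add(key)
    return False


def is_hive_compatible(case_sensitive: bool, has_hive_counterpart: bool) -> bool:
    """Simplified rule from the PR description: case-sensitive DS tables are never
    stored as Hive-compatible; otherwise a Hive counterpart (e.g. parquet) is needed."""
    return has_hive_counterpart and not case_sensitive


def alter_schema_action(hive_compatible: bool) -> str:
    """Compatible tables get their raw Hive schema replaced; others only get their
    table properties updated, leaving the raw schema untouched."""
    return "replace_hive_schema" if hive_compatible else "update_properties_only"


print(has_case_conflicts(["id", "ID"]))  # True: a case-insensitive metastore sees one column
print(alter_schema_action(is_hive_compatible(case_sensitive=True, has_hive_counterpart=True)))
# update_properties_only
```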
[GitHub] spark issue #18640: [SPARK-21422][BUILD] Depend on Apache ORC 1.4.0
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18640 I just checked the dependency size. They look pretty reasonable, roughly 2 MBs in total (although I do worry in the future whether ORC would bring in a lot more jars). cc @omalley any guidance on this topic?