[GitHub] spark issue #19136: [SPARK-15689][SQL] data source v2 read path
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19136 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19136: [SPARK-15689][SQL] data source v2 read path
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19136 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81723/ Test PASSed.
[GitHub] spark pull request #19222: [SPARK-10399][CORE][SQL] Introduce multiple Memor...
GitHub user kiszk opened a pull request: https://github.com/apache/spark/pull/19222 [SPARK-10399][CORE][SQL] Introduce multiple MemoryBlocks to choose several types of memory block

## What changes were proposed in this pull request?

This PR allows us to use one of several types of `MemoryBlock`, such as a byte array, an int array, a long array, or a `java.nio.DirectByteBuffer`. Using `java.nio.DirectByteBuffer` provides off-heap memory that is automatically deallocated by the JVM. The `spark.unsafe.Platform` interface is refactored from taking indefinite `Object`s to taking `MemoryBlock`s and arrays of primitives. This PR uses `MemoryBlock` for `OffHeapColumnVector`, `UTF8String`, and other places. For now, this PR does not use `MemoryBlock` for `BufferHolder`, based on @cloud-fan's [suggestion](https://github.com/apache/spark/pull/11494#issuecomment-309694290). Much of the code was ported from #11494, where a lot of effort was invested; I think this PR should also credit @yzotov.

## How was this patch tested?

Existing tests

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kiszk/spark SPARK-10399

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/19222.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #19222

commit c2aa3b0d353cf850a79fb58891a7ad56a25f72cf Author: Kazuaki Ishizaki Date: 2017-09-13T10:16:19Z introduce ByteArrayMemoryBlock, IntArrayMemoryBlock, LongArrayMemoryBlock, and OffHeapMemoryBlock
commit e7fb6593a688dbfedbc9708bc0bf2d297509eb31 Author: Kazuaki Ishizaki Date: 2017-09-13T17:15:25Z OffHeapColumnVector uses UnsafeMemoryAllocator
commit 2307f32e24aa8c4375d5bce4631bdb18fd70659e Author: Kazuaki Ishizaki Date: 2017-09-13T17:27:09Z UTF8String uses UnsafeMemoryAllocator
commit b7ffa10e7fe359dd3efdae3d54d87db215ce0958 Author: Kazuaki Ishizaki Date: 2017-09-13T17:36:57Z Platform.copyMemory() in UnsafeInMemorySorter uses new MemoryBlock
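As a rough, standalone sketch of the off-heap property the PR relies on (plain `java.nio`, not Spark's actual `MemoryBlock` API; the class name is invented for illustration):

```java
import java.nio.ByteBuffer;

public class DirectBufferSketch {
    public static void main(String[] args) {
        // 16 bytes allocated outside the JVM heap; the backing memory is
        // released automatically once the buffer object is garbage-collected,
        // which is the behavior the PR wants for off-heap MemoryBlocks.
        ByteBuffer buf = ByteBuffer.allocateDirect(16);

        buf.putLong(0, 42L);  // absolute write at byte offset 0
        if (buf.getLong(0) != 42L) throw new AssertionError("readback failed");
        if (!buf.isDirect()) throw new AssertionError("expected an off-heap buffer");
        System.out.println("direct buffer round-trip ok");
    }
}
```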
[GitHub] spark issue #19136: [SPARK-15689][SQL] data source v2 read path
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19136 **[Test build #81723 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81723/testReport)** for PR 19136 at commit [`abcc606`](https://github.com/apache/spark/commit/abcc606e006e9975d1507eed379a48a3134165ad).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #19130: [SPARK-21917][CORE][YARN] Supporting adding http(...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/19130#discussion_r138694227

--- Diff: core/src/test/scala/org/apache/spark/deploy/SparkSubmitSuite.scala ---

@@ -897,6 +897,80 @@ class SparkSubmitSuite
     sysProps("spark.submit.pyFiles") should (startWith("/"))
   }

+  test("handle remote http(s) resources in yarn mode") {
+    val hadoopConf = new Configuration()
+    updateConfWithFakeS3Fs(hadoopConf)
+
+    val tmpDir = Utils.createTempDir()
+    val mainResource = File.createTempFile("tmpPy", ".py", tmpDir)
+    val tmpJar = TestUtils.createJarWithFiles(Map("test.resource" -> "USER"), tmpDir)
+    val tmpJarPath = s"s3a://${new File(tmpJar.toURI).getAbsolutePath}"
+    // This assumes UT environment could access external network.

--- End diff --

It would be better if tests could avoid this... you could start a local http server, but that feels like a lot of work. Is there some way to mock the behavior instead?
[GitHub] spark pull request #19130: [SPARK-21917][CORE][YARN] Supporting adding http(...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/19130#discussion_r138692611

--- Diff: docs/running-on-yarn.md ---

@@ -212,6 +212,14 @@ To use a custom metrics.properties for the application master and executors, upd

+ spark.yarn.dist.forceDownloadSchemes
+ (none)
+
+ Comma-separated schemes in which remote resources have to download to local disk and upload

--- End diff --

Better wording: Comma-separated list of schemes for which files will be downloaded to the local disk prior to being added to YARN's distributed cache. For use in cases where the YARN service does not support schemes that are supported by Spark.
[GitHub] spark pull request #19130: [SPARK-21917][CORE][YARN] Supporting adding http(...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/19130#discussion_r138689342

--- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala ---

@@ -367,6 +368,53 @@ object SparkSubmit extends CommandLineUtils with Logging {
     }.orNull
   }

+    // When running in YARN cluster manager,

--- End diff --

"When running in YARN cluster manager, ?"
[GitHub] spark pull request #19130: [SPARK-21917][CORE][YARN] Supporting adding http(...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/19130#discussion_r138694708

--- Diff: core/src/test/scala/org/apache/spark/deploy/SparkSubmitSuite.scala ---

@@ -897,6 +897,80 @@ class SparkSubmitSuite
     sysProps("spark.submit.pyFiles") should (startWith("/"))
   }

+  test("handle remote http(s) resources in yarn mode") {

--- End diff --

It seems you have 3 different tests in this block (at least); could you break them into separate tests?
[GitHub] spark pull request #19130: [SPARK-21917][CORE][YARN] Supporting adding http(...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/19130#discussion_r138689976

--- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala ---

@@ -367,6 +368,53 @@ object SparkSubmit extends CommandLineUtils with Logging {
     }.orNull
   }

+    // When running in YARN cluster manager,
+    if (clusterManager == YARN) {
+      sparkConf.setIfMissing(SecurityManager.SPARK_AUTH_SECRET_CONF, "unused")
+      val secMgr = new SecurityManager(sparkConf)
+      val forceDownloadSchemes = sparkConf.get(FORCE_DOWNLOAD_SCHEMES)
+
+      // Check the scheme list provided by "spark.yarn.dist.forceDownloadSchemes" to see if current
+      // resource's scheme is included in this list, or Hadoop FileSystem doesn't support current
+      // scheme, if so Spark will download the resources to local disk and upload to Hadoop FS.
+      def shouldDownload(scheme: String): Boolean = {
+        val isFsAvailable = Try { FileSystem.getFileSystemClass(scheme, hadoopConf) }
+          .map(_ => true).getOrElse(false)

--- End diff --

`Try { ... }.isSuccess`? You could also avoid this call if the scheme is in the blacklist.
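A standalone sketch of the two simplifications being suggested: check the cheap scheme list before attempting the (potentially costly) FileSystem lookup, and collapse the success/failure dance into a single check. `lookupFileSystem` here is a hypothetical stand-in for Hadoop's `FileSystem.getFileSystemClass`, which throws for unsupported schemes:

```java
import java.util.Set;

public class SchemeCheckSketch {

    // Hypothetical stand-in for Hadoop's FileSystem.getFileSystemClass:
    // it throws when no FileSystem implementation handles the scheme.
    static void lookupFileSystem(String scheme) {
        if (!scheme.equals("hdfs") && !scheme.equals("file")) {
            throw new IllegalArgumentException("No FileSystem for scheme: " + scheme);
        }
    }

    static boolean shouldDownload(Set<String> forceDownloadSchemes, String scheme) {
        if (forceDownloadSchemes.contains(scheme)) {
            return true;   // forced download; skip the FileSystem lookup entirely
        }
        try {
            // Scala equivalent of this try/catch: Try { ... }.isSuccess
            lookupFileSystem(scheme);
            return false;  // a FileSystem supports the scheme: leave it remote
        } catch (RuntimeException e) {
            return true;   // unsupported scheme: download to local disk
        }
    }

    public static void main(String[] args) {
        Set<String> force = Set.of("http", "https");
        if (!shouldDownload(force, "http")) throw new AssertionError();
        if (shouldDownload(force, "hdfs")) throw new AssertionError();
        if (!shouldDownload(force, "s3a")) throw new AssertionError();
        System.out.println("scheme checks ok");
    }
}
```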
[GitHub] spark pull request #19130: [SPARK-21917][CORE][YARN] Supporting adding http(...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/19130#discussion_r138694417

--- Diff: core/src/test/scala/org/apache/spark/deploy/SparkSubmitSuite.scala ---

@@ -897,6 +897,80 @@ class SparkSubmitSuite
     sysProps("spark.submit.pyFiles") should (startWith("/"))
   }

+  test("handle remote http(s) resources in yarn mode") {
+    val hadoopConf = new Configuration()
+    updateConfWithFakeS3Fs(hadoopConf)
+
+    val tmpDir = Utils.createTempDir()
+    val mainResource = File.createTempFile("tmpPy", ".py", tmpDir)
+    val tmpJar = TestUtils.createJarWithFiles(Map("test.resource" -> "USER"), tmpDir)
+    val tmpJarPath = s"s3a://${new File(tmpJar.toURI).getAbsolutePath}"
+    // This assumes UT environment could access external network.
+    val remoteHttpJar =
+      "http://central.maven.org/maven2/io/dropwizard/metrics/metrics-core/" +
+      "3.2.4/metrics-core-3.2.4.jar"
+
+    val args = Seq(
+      "--class", UserClasspathFirstTest.getClass.getName.stripPrefix("$"),
+      "--name", "testApp",
+      "--master", "yarn",
+      "--deploy-mode", "client",
+      "--jars", s"$tmpJarPath,$remoteHttpJar",
+      s"s3a://$mainResource"
+    )
+
+    val appArgs = new SparkSubmitArguments(args)
+    val sysProps = SparkSubmit.prepareSubmitEnvironment(appArgs, Some(hadoopConf))._3
+
+    // Resources in S3 should still be remote path, but remote http resource will be downloaded

--- End diff --

...still are... Also I'm not sure I understand the comment.
[GitHub] spark pull request #19130: [SPARK-21917][CORE][YARN] Supporting adding http(...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/19130#discussion_r138693449

--- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala ---

@@ -367,6 +368,53 @@ object SparkSubmit extends CommandLineUtils with Logging {
     }.orNull
   }

+    // When running in YARN cluster manager,
+    if (clusterManager == YARN) {
+      sparkConf.setIfMissing(SecurityManager.SPARK_AUTH_SECRET_CONF, "unused")
+      val secMgr = new SecurityManager(sparkConf)
+      val forceDownloadSchemes = sparkConf.get(FORCE_DOWNLOAD_SCHEMES)
+
+      // Check the scheme list provided by "spark.yarn.dist.forceDownloadSchemes" to see if current
+      // resource's scheme is included in this list, or Hadoop FileSystem doesn't support current
+      // scheme, if so Spark will download the resources to local disk and upload to Hadoop FS.
+      def shouldDownload(scheme: String): Boolean = {
+        val isFsAvailable = Try { FileSystem.getFileSystemClass(scheme, hadoopConf) }
+          .map(_ => true).getOrElse(false)
+        forceDownloadSchemes.contains(scheme) || !isFsAvailable
+      }
+
+      def downloadResource(resource: String): String = {
+        val uri = Utils.resolveURI(resource)
+        uri.getScheme match {
+          case "local" | "file" => resource
+          case e if shouldDownload(e) =>
+            if (deployMode == CLIENT) {
+              // In client mode, we already download the resources, so figuring out the local one
+              // should be enough.
+              val fileName = new Path(uri).getName
+              new File(targetDir, fileName).toURI.toString
+            } else {
+              downloadFile(resource, targetDir, sparkConf, hadoopConf, secMgr)
+            }
+          case _ => uri.toString
+        }
+      }
+
+      args.primaryResource = Option(args.primaryResource).map { downloadResource }.orNull
+      args.files = Option(args.files).map { files =>
+        files.split(",").map(_.trim).filter(_.nonEmpty).map { downloadResource }.mkString(",")
+      }.orNull
+      args.pyFiles = Option(args.pyFiles).map { files =>
+        files.split(",").map(_.trim).filter(_.nonEmpty).map { downloadResource }.mkString(",")
+      }.orNull
+      args.jars = Option(args.jars).map { files =>
+        files.split(",").map(_.trim).filter(_.nonEmpty).map { downloadResource }.mkString(",")
+      }.orNull
+      args.archives = Option(args.archives).map { files =>
+        files.split(",").map(_.trim).filter(_.nonEmpty).map { downloadResource }.mkString(",")
+      }.orNull

--- End diff --

I was going to say this is missing `spark.yarn.dist.files` and `.jars`, but later those properties seem to be set based on `args.files` and `args.jars`. Which kinda raises the question of what happens when the user sets both. From the documentation it sounds like that should work (both sets of files get added), but from the code it seems `--files` and `--jars` would overwrite the `spark.yarn.*` configs... In any case, that's not the fault of your change.
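The four `args.* = Option(...).map { ... }.orNull` assignments in the diff above repeat the same split/trim/filter/map/join pattern. A hypothetical helper that factors it out could look like this (Java for illustration; `mapCommaList` is not part of Spark's code, just a sketch of the shared logic):

```java
import java.util.Arrays;
import java.util.function.UnaryOperator;
import java.util.stream.Collectors;

public class CommaListSketch {

    // Mirrors the repeated Scala pattern:
    //   Option(list).map { l => l.split(",").map(_.trim)
    //     .filter(_.nonEmpty).map(f).mkString(",") }.orNull
    static String mapCommaList(String list, UnaryOperator<String> f) {
        if (list == null) return null;  // Option(null) -> None -> orNull
        return Arrays.stream(list.split(","))
                .map(String::trim)
                .filter(s -> !s.isEmpty())
                .map(f)
                .collect(Collectors.joining(","));
    }

    public static void main(String[] args) {
        String out = mapCommaList("a.jar, b.jar,", String::toUpperCase);
        if (!out.equals("A.JAR,B.JAR")) throw new AssertionError(out);
        if (mapCommaList(null, s -> s) != null) throw new AssertionError();
        System.out.println("comma-list helper ok");
    }
}
```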
[GitHub] spark issue #19221: [SPARK-4131] Merge HiveTmpFile.scala to SaveAsHiveFile.s...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19221 **[Test build #81730 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81730/testReport)** for PR 19221 at commit [`0b8d47a`](https://github.com/apache/spark/commit/0b8d47a982708839fc83f76b42a3527e66a69da5).
[GitHub] spark issue #19221: [SPARK-4131] Merge HiveTmpFile.scala to SaveAsHiveFile.s...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/19221 ok to test
[GitHub] spark issue #19136: [SPARK-15689][SQL] data source v2 read path
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19136 Merged build finished. Test PASSed.
[GitHub] spark issue #19136: [SPARK-15689][SQL] data source v2 read path
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19136 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81722/ Test PASSed.
[GitHub] spark issue #19136: [SPARK-15689][SQL] data source v2 read path
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19136 **[Test build #81722 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81722/testReport)** for PR 19136 at commit [`1e86d5c`](https://github.com/apache/spark/commit/1e86d5ca445d732af6ac651d49d391d5cd012a92).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #19204: [SPARK-21981][PYTHON][ML] Added Python interface for Clu...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19204 Merged build finished. Test FAILed.
[GitHub] spark issue #19204: [SPARK-21981][PYTHON][ML] Added Python interface for Clu...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19204 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81729/ Test FAILed.
[GitHub] spark issue #19204: [SPARK-21981][PYTHON][ML] Added Python interface for Clu...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19204 **[Test build #81729 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81729/testReport)** for PR 19204 at commit [`cd84d66`](https://github.com/apache/spark/commit/cd84d66151e710f1a262f081d0c578a12453374d).
* This patch **fails PySpark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #19221: [SPARK-4131] Merge HiveTmpFile.scala to SaveAsHiveFile.s...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19221 Can one of the admins verify this patch?
[GitHub] spark pull request #19221: [SPARK-4131] Merge HiveTmpFile.scala to SaveAsHiv...
GitHub user janewangfb opened a pull request: https://github.com/apache/spark/pull/19221 [SPARK-4131] Merge HiveTmpFile.scala to SaveAsHiveFile.scala

## What changes were proposed in this pull request?

The code is already merged to master: https://github.com/apache/spark/pull/18975 This is a follow-up PR to merge HiveTmpFile.scala into SaveAsHiveFile.scala.

## How was this patch tested?

Builds successfully.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/janewangfb/spark merge_savehivefile_hivetmpfile

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/19221.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #19221

commit 0b8d47a982708839fc83f76b42a3527e66a69da5 Author: Jane Wang Date: 2017-09-13T17:35:06Z Merge HiveTmpFile.scala to SaveAsHiveFile.scala
[GitHub] spark issue #19204: [SPARK-21981][PYTHON][ML] Added Python interface for Clu...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19204 **[Test build #81729 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81729/testReport)** for PR 19204 at commit [`cd84d66`](https://github.com/apache/spark/commit/cd84d66151e710f1a262f081d0c578a12453374d).
[GitHub] spark issue #19211: [SPARK-18838][core] Add separate listener queues to Live...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19211 **[Test build #81728 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81728/testReport)** for PR 19211 at commit [`ad6ff49`](https://github.com/apache/spark/commit/ad6ff49de17c204e8d4feb775185a05d7fa9f53b).
[GitHub] spark issue #19211: [SPARK-18838][core] Add separate listener queues to Live...
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/19211 retest this please
[GitHub] spark issue #19211: [SPARK-18838][core] Add separate listener queues to Live...
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/19211 (Nevermind the test failures, I killed the obsolete builds.)
[GitHub] spark issue #19211: [SPARK-18838][core] Add separate listener queues to Live...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19211 Merged build finished. Test FAILed.
[GitHub] spark issue #19211: [SPARK-18838][core] Add separate listener queues to Live...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19211 Merged build finished. Test FAILed.
[GitHub] spark issue #19211: [SPARK-18838][core] Add separate listener queues to Live...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19211 Merged build finished. Test FAILed.
[GitHub] spark issue #19211: [SPARK-18838][core] Add separate listener queues to Live...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19211 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81724/ Test FAILed.
[GitHub] spark issue #19211: [SPARK-18838][core] Add separate listener queues to Live...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19211 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81726/ Test FAILed.
[GitHub] spark issue #19211: [SPARK-18838][core] Add separate listener queues to Live...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19211 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81725/ Test FAILed.
[GitHub] spark pull request #19136: [SPARK-15689][SQL] data source v2 read path
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/19136#discussion_r138681562

--- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/DataReader.java ---

@@ -0,0 +1,36 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.sources.v2.reader;
+
+import java.io.Closeable;
+
+/**
+ * A data reader returned by a read task and is responsible for outputting data for a RDD partition.
+ */
+public interface DataReader extends Closeable {

--- End diff --

The initialization is done when creating this `DataReader` from a `ReadTask`. That ensures that the initialization happens (it is easy to forget to call `open()`) and simplifies the checks that need to be done, because the `DataReader` cannot exist otherwise.
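A minimal sketch of the lifecycle rdblue describes: the factory returns a reader that is already initialized, so there is no separate `open()` call to forget. The types and names below are simplified stand-ins for illustration, not the actual v2 API:

```java
import java.io.Closeable;
import java.util.Iterator;
import java.util.List;

public class ReadTaskSketch {

    // Simplified stand-in for the v2 DataReader interface under discussion.
    interface DataReader<T> extends Closeable {
        boolean next();
        T get();
    }

    // Hypothetical "read task" factory: the returned reader is fully
    // initialized by construction, so callers cannot observe an un-opened one.
    static DataReader<Integer> createDataReader(List<Integer> partition) {
        Iterator<Integer> it = partition.iterator();  // initialization happens here
        return new DataReader<Integer>() {
            private Integer current;

            public boolean next() {
                if (!it.hasNext()) return false;
                current = it.next();
                return true;
            }

            public Integer get() { return current; }

            public void close() {}
        };
    }

    public static void main(String[] args) {
        int sum = 0;
        DataReader<Integer> reader = createDataReader(List.of(1, 2, 3));
        while (reader.next()) sum += reader.get();
        if (sum != 6) throw new AssertionError("expected 6, got " + sum);
        System.out.println("sum=" + sum);
    }
}
```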
[GitHub] spark pull request #19202: [SPARK-21980][SQL]References in grouping function...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/19202
[GitHub] spark pull request #19188: [SPARK-21973][SQL] Add an new option to filter qu...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/19188#discussion_r138681442

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/TPCDSQueryBenchmarkArguments.scala ---

@@ -32,6 +36,10 @@ class TPCDSQueryBenchmarkArguments(val args: Array[String]) {
     dataLocation = value
     args = tail

+      case ("--query-filter") :: value :: tail =>
+        queryFilter = value.toLowerCase(Locale.ROOT).split(",").map(_.trim).toSet

--- End diff --

Could you also make `"--data-location"` case insensitive?
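The `--query-filter` parsing in the diff above can be sketched standalone like this (Java for illustration; the Scala original lower-cases the value, splits on commas, trims, and collects into a set so query names match case-insensitively):

```java
import java.util.Arrays;
import java.util.Locale;
import java.util.Set;
import java.util.stream.Collectors;

public class QueryFilterSketch {

    // Normalize once to lower case, then split/trim/collect, so lookups
    // against the set are effectively case-insensitive.
    static Set<String> parseQueryFilter(String value) {
        return Arrays.stream(value.toLowerCase(Locale.ROOT).split(","))
                .map(String::trim)
                .collect(Collectors.toSet());
    }

    public static void main(String[] args) {
        Set<String> filter = parseQueryFilter("Q1, q2 ,Q3");
        if (!filter.equals(Set.of("q1", "q2", "q3"))) throw new AssertionError(filter);
        System.out.println(filter.size() + " queries selected");
    }
}
```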
[GitHub] spark pull request #18975: [SPARK-4131] Support "Writing data into the files...
Github user janewangfb commented on a diff in the pull request: https://github.com/apache/spark/pull/18975#discussion_r138681313

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/InsertIntoHiveDirCommand.scala ---

@@ -0,0 +1,131 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.hive.execution
+
+import scala.language.existentials
+
+import org.apache.hadoop.fs.{FileSystem, Path}
+import org.apache.hadoop.hive.common.FileUtils
+import org.apache.hadoop.hive.ql.plan.TableDesc
+import org.apache.hadoop.hive.serde.serdeConstants
+import org.apache.hadoop.hive.serde2.`lazy`.LazySimpleSerDe
+import org.apache.hadoop.mapred._
+
+import org.apache.spark.SparkException
+import org.apache.spark.sql.{Row, SparkSession}
+import org.apache.spark.sql.catalyst.TableIdentifier
+import org.apache.spark.sql.catalyst.catalog.{CatalogStorageFormat, CatalogTable}
+import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
+import org.apache.spark.sql.execution.SparkPlan
+import org.apache.spark.sql.hive.client.HiveClientImpl
+
+/**
+ * Command for writing the results of `query` to file system.
+ *
+ * The syntax of using this command in SQL is:
+ * {{{
+ *   INSERT OVERWRITE [LOCAL] DIRECTORY
+ *   path
+ *   [ROW FORMAT row_format]
+ *   [STORED AS file_format]
+ *   SELECT ...
+ * }}}
+ *
+ * @param isLocal whether the path specified in `storage` is a local directory
+ * @param storage storage format used to describe how the query result is stored.
+ * @param query the logical plan representing data to write to
+ * @param overwrite whether overwrites existing directory
+ */
+case class InsertIntoHiveDirCommand(
+    isLocal: Boolean,
+    storage: CatalogStorageFormat,
+    query: LogicalPlan,
+    overwrite: Boolean) extends SaveAsHiveFile with HiveTmpPath {

--- End diff --

@cloud-fan and gatorsmile, I will merge them together and submit a PR.
[GitHub] spark issue #19202: [SPARK-21980][SQL]References in grouping functions shoul...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/19202 Thanks! Merged to master.
[GitHub] spark issue #19202: [SPARK-21980][SQL]References in grouping functions shoul...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/19202 LGTM
[GitHub] spark pull request #18975: [SPARK-4131] Support "Writing data into the files...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/18975#discussion_r138680470

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/InsertIntoHiveDirCommand.scala ---

@@ -0,0 +1,131 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.hive.execution
+
+import scala.language.existentials
+
+import org.apache.hadoop.fs.{FileSystem, Path}
+import org.apache.hadoop.hive.common.FileUtils
+import org.apache.hadoop.hive.ql.plan.TableDesc
+import org.apache.hadoop.hive.serde.serdeConstants
+import org.apache.hadoop.hive.serde2.`lazy`.LazySimpleSerDe
+import org.apache.hadoop.mapred._
+
+import org.apache.spark.SparkException
+import org.apache.spark.sql.{Row, SparkSession}
+import org.apache.spark.sql.catalyst.TableIdentifier
+import org.apache.spark.sql.catalyst.catalog.{CatalogStorageFormat, CatalogTable}
+import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
+import org.apache.spark.sql.execution.SparkPlan
+import org.apache.spark.sql.hive.client.HiveClientImpl
+
+/**
+ * Command for writing the results of `query` to file system.
+ *
+ * The syntax of using this command in SQL is:
+ * {{{
+ *   INSERT OVERWRITE [LOCAL] DIRECTORY
+ *   path
+ *   [ROW FORMAT row_format]
+ *   [STORED AS file_format]
+ *   SELECT ...
+ * }}}
+ *
+ * @param isLocal whether the path specified in `storage` is a local directory
+ * @param storage storage format used to describe how the query result is stored.
+ * @param query the logical plan representing data to write to
+ * @param overwrite whether overwrites existing directory
+ */
+case class InsertIntoHiveDirCommand(
+    isLocal: Boolean,
+    storage: CatalogStorageFormat,
+    query: LogicalPlan,
+    overwrite: Boolean) extends SaveAsHiveFile with HiveTmpPath {

--- End diff --

Sure, will submit a follow-up PR soon.
[GitHub] spark issue #18266: [SPARK-20427][SQL] Read JDBC table use custom schema
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/18266 @wangyum Could you update the example in the PR description?
[GitHub] spark issue #19204: [SPARK-21981][PYTHON][ML] Added Python interface for Clu...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19204 **[Test build #81727 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81727/testReport)** for PR 19204 at commit [`5a6f9b4`](https://github.com/apache/spark/commit/5a6f9b42e34025188b08fa0a0eefa4e2ddc68509). * This patch **fails Python style tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #19204: [SPARK-21981][PYTHON][ML] Added Python interface for Clu...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19204 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81727/ Test FAILed.
[GitHub] spark issue #19204: [SPARK-21981][PYTHON][ML] Added Python interface for Clu...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19204 Merged build finished. Test FAILed.
[GitHub] spark issue #19204: [SPARK-21981][PYTHON][ML] Added Python interface for Clu...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19204 **[Test build #81727 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81727/testReport)** for PR 19204 at commit [`5a6f9b4`](https://github.com/apache/spark/commit/5a6f9b42e34025188b08fa0a0eefa4e2ddc68509).
[GitHub] spark issue #19204: [SPARK-21981][PYTHON][ML] Added Python interface for Clu...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/19204 ok to test
[GitHub] spark issue #18266: [SPARK-20427][SQL] Read JDBC table use custom schema
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18266 Merged build finished. Test PASSed.
[GitHub] spark issue #18266: [SPARK-20427][SQL] Read JDBC table use custom schema
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18266 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81719/ Test PASSed.
[GitHub] spark issue #18266: [SPARK-20427][SQL] Read JDBC table use custom schema
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18266 **[Test build #81719 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81719/testReport)** for PR 18266 at commit [`1fdf002`](https://github.com/apache/spark/commit/1fdf002b64ed381b31b6a4ba721357c647b11772). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #19211: [SPARK-18838][core] Add separate listener queues to Live...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19211 **[Test build #81726 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81726/testReport)** for PR 19211 at commit [`24f5c8d`](https://github.com/apache/spark/commit/24f5c8d0c78a8a362f4690ad03dac9dd07808f85).
[GitHub] spark issue #19216: [SPARK-21990][SQL] QueryPlanConstraints misses some cons...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19216 Merged build finished. Test PASSed.
[GitHub] spark issue #19216: [SPARK-21990][SQL] QueryPlanConstraints misses some cons...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19216 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81720/ Test PASSed.
[GitHub] spark issue #19216: [SPARK-21990][SQL] QueryPlanConstraints misses some cons...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19216 **[Test build #81720 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81720/testReport)** for PR 19216 at commit [`4e85e5f`](https://github.com/apache/spark/commit/4e85e5f6faa7903d72349f0fe69f5ea3d4df6070). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #19211: [SPARK-18838][core] Add separate listener queues to Live...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19211 **[Test build #81725 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81725/testReport)** for PR 19211 at commit [`cf5c6ce`](https://github.com/apache/spark/commit/cf5c6ce74c185ebd90ea0f9040b177c64161).
[GitHub] spark issue #19204: [SPARK-21981][PYTHON][ML] Added Python interface for Clu...
Github user mgaido91 commented on the issue: https://github.com/apache/spark/pull/19204 Thanks @WeichenXu123, I added it.
[GitHub] spark issue #19211: [SPARK-18838][core] Add separate listener queues to Live...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19211 **[Test build #81724 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81724/testReport)** for PR 19211 at commit [`2915a5e`](https://github.com/apache/spark/commit/2915a5ec1bd9d4bc7a40b0ad20ca5b0db8f5382e).
[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/16578 thanks @rxin! let's keep this going then. I'm sure we can get this ready for more folks to review in a couple of weeks. please feel free to ping this - will make sure to follow up.
[GitHub] spark pull request #19136: [SPARK-15689][SQL] data source v2 read path
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/19136#discussion_r138665881 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/DataReader.java --- @@ -0,0 +1,36 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.sources.v2.reader; + +import java.io.Closeable; + +/** + * A data reader returned by a read task and is responsible for outputting data for a RDD partition. + */ +public interface DataReader extends Closeable { --- End diff -- Document this and link it back to whatever method it is. Also I'd still add an explicit init or open. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
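The explicit init/open step rxin suggests could look like the following simplified sketch. This is an illustration only: the interface and class names below are invented for this example, not the API in the PR, and `T` stands in for whatever record type the reader produces.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.Iterator;
import java.util.List;

// Simplified sketch of a per-partition reader with the explicit open() rxin asks
// for; interface and class names here are invented, not Spark's actual API.
interface OpenableDataReader<T> extends Closeable {
    void open() throws IOException;      // acquire resources before the first next()
    boolean next() throws IOException;   // advance; false when the partition is exhausted
    T get();                             // the current record
}

// Toy implementation over an in-memory list, standing in for one RDD partition.
class ListReader<T> implements OpenableDataReader<T> {
    private final List<T> data;
    private Iterator<T> it;
    private T current;

    ListReader(List<T> data) { this.data = data; }

    @Override public void open() { it = data.iterator(); }
    @Override public boolean next() {
        if (it != null && it.hasNext()) { current = it.next(); return true; }
        return false;
    }
    @Override public T get() { return current; }
    @Override public void close() { it = null; }
}
```

An explicit `open()` keeps the constructor cheap and serializable, deferring resource acquisition until the reader actually runs on an executor.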
[GitHub] spark issue #19136: [SPARK-15689][SQL] data source v2 read path
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19136 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81717/ Test PASSed.
[GitHub] spark issue #19136: [SPARK-15689][SQL] data source v2 read path
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19136 Merged build finished. Test PASSed.
[GitHub] spark issue #19136: [SPARK-15689][SQL] data source v2 read path
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19136 **[Test build #81723 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81723/testReport)** for PR 19136 at commit [`abcc606`](https://github.com/apache/spark/commit/abcc606e006e9975d1507eed379a48a3134165ad).
[GitHub] spark issue #19136: [SPARK-15689][SQL] data source v2 read path
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19136 **[Test build #81717 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81717/testReport)** for PR 19136 at commit [`4ff1b18`](https://github.com/apache/spark/commit/4ff1b18d3db9f50ba7f3d31288d0da37736d6b5f). * This patch passes all tests. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `public class DataSourceV2Options ` * `class DataSourceRDDPartition(val index: Int, val readTask: ReadTask[UnsafeRow])` * `class DataSourceRDD(` * `case class DataSourceV2Relation(` * `case class DataSourceV2ScanExec(` * `class RowToUnsafeRowReadTask(rowReadTask: ReadTask[Row], schema: StructType)` * `class RowToUnsafeDataReader(rowReader: DataReader[Row], encoder: ExpressionEncoder[Row])`
[GitHub] spark issue #19136: [SPARK-15689][SQL] data source v2 read path
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/19136 @yueawang these new push-downs are in my [prototype](https://github.com/cloud-fan/spark/pull/10). This PR is the first version of data source v2, so I'd like to cut down the patch size and only implement features that we already have in data source v1.
[GitHub] spark issue #19136: [SPARK-15689][SQL] data source v2 read path
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19136 **[Test build #81722 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81722/testReport)** for PR 19136 at commit [`1e86d5c`](https://github.com/apache/spark/commit/1e86d5ca445d732af6ac651d49d391d5cd012a92).
[GitHub] spark pull request #19136: [SPARK-15689][SQL] data source v2 read path
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/19136#discussion_r138652705 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/DataReader.java --- @@ -0,0 +1,36 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.sources.v2.reader; + +import java.io.Closeable; + +/** + * A data reader returned by a read task and is responsible for outputting data for a RDD partition. + */ +public interface DataReader extends Closeable { --- End diff -- currently it can be `Row`, `UnsafeRow`, `ColumnarBatch`. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18337: [SPARK-21131][GraphX] Fix batch gradient bug in SVDPlusP...
Github user daniellaah commented on the issue: https://github.com/apache/spark/pull/18337 @lxmly deleted. I tested on some private data, and it turns out that the algorithm works well ... accidentally.
[GitHub] spark issue #11494: [SPARK-10399][CORE][SQL] Introduce OffHeapMemoryBlock to...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/11494 Please go ahead. I think the author has gone inactive.
[GitHub] spark issue #19068: [SPARK-21428][SQL][FOLLOWUP]CliSessionState should point...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19068 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81721/ Test FAILed.
[GitHub] spark issue #19068: [SPARK-21428][SQL][FOLLOWUP]CliSessionState should point...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19068 Merged build finished. Test FAILed.
[GitHub] spark issue #19068: [SPARK-21428][SQL][FOLLOWUP]CliSessionState should point...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19068 **[Test build #81721 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81721/testReport)** for PR 19068 at commit [`9682eab`](https://github.com/apache/spark/commit/9682eabd4184340745e54b9eef8ac878ca942ba3). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #19210: Fix Graphite re-connects for Graphite instances behind E...
Github user alexmnyc commented on the issue: https://github.com/apache/spark/pull/19210 @jerryshao thanks. it's done.
[GitHub] spark issue #19218: [SPARK-21786][SQL] The 'spark.sql.parquet.compression.co...
Github user maropu commented on the issue: https://github.com/apache/spark/pull/19218 Could you add tests? Probably you could insert some data, then check whether the data is compressed by listing the files in a temp dir?
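The kind of check maropu describes — write data, then infer the codec from the file names in the output directory — might be sketched as follows. This is only the directory-listing half of such a test (a real test would write Parquet via Spark first), and the file names used are invented for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

// Sketch of the directory-listing half of the suggested test: list the output
// files and check that every Parquet file name carries the expected codec
// suffix (e.g. "part-00000.snappy.parquet"). File names here are invented.
class CompressionCheck {
    static boolean allFilesUseCodec(Path dir, String codec) throws IOException {
        try (Stream<Path> files = Files.list(dir)) {
            return files
                .map(p -> p.getFileName().toString())
                .filter(name -> name.endsWith(".parquet"))
                .allMatch(name -> name.contains("." + codec + "."));
        }
    }
}
```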
[GitHub] spark issue #11494: [SPARK-10399][CORE][SQL] Introduce OffHeapMemoryBlock to...
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/11494 We pinged @yzotov, but @yzotov has not responded for a very long time. As @cloud-fan pointed out, this PR seems to be a good refactoring. I am willing to continue this refactoring on behalf of @yzotov if no one expresses concerns. What do you think? cc: @HyukjinKwon , @cloud-fan , @jiangxb1987
[GitHub] spark issue #19068: [SPARK-21428][SQL][FOLLOWUP]CliSessionState should point...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19068 Merged build finished. Test FAILed.
[GitHub] spark issue #19068: [SPARK-21428][SQL][FOLLOWUP]CliSessionState should point...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19068 **[Test build #81718 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81718/testReport)** for PR 19068 at commit [`267a1b2`](https://github.com/apache/spark/commit/267a1b2f5bb83b4f20810f704105c0d996b71e93). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #19068: [SPARK-21428][SQL][FOLLOWUP]CliSessionState should point...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19068 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81718/ Test FAILed.
[GitHub] spark issue #19068: [SPARK-21428][SQL][FOLLOWUP]CliSessionState should point...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19068 **[Test build #81721 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81721/testReport)** for PR 19068 at commit [`9682eab`](https://github.com/apache/spark/commit/9682eabd4184340745e54b9eef8ac878ca942ba3).
[GitHub] spark issue #19216: [SPARK-21990][SQL] QueryPlanConstraints misses some cons...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19216 **[Test build #81720 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81720/testReport)** for PR 19216 at commit [`4e85e5f`](https://github.com/apache/spark/commit/4e85e5f6faa7903d72349f0fe69f5ea3d4df6070).
[GitHub] spark pull request #19068: [SPARK-21428][SQL][FOLLOWUP]CliSessionState shoul...
Github user yaooqinn commented on a diff in the pull request: https://github.com/apache/spark/pull/19068#discussion_r138625628 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveUtils.scala --- @@ -232,6 +232,54 @@ private[spark] object HiveUtils extends Logging { } /** + * Generate an instance of [[HiveConf]] from [[SparkConf]]& hadoop [[Configuration]] & + * formatted extra time configurations with an isolated classloader needed if isolationOn + * for [[HiveClient]] construction + * @param sparkConf a [[SparkConf]] object specifying Spark parameters + * @param classLoader an isolated classloader needed if isolationOn for [[HiveClient]] + *construction + * @param hadoopConf a hadoop [[Configuration]] object, Optional if we want generated it from + * the sparkConf + * @param extraTimeConfs time configurations in the form of long values from the given hadoopConf + */ + + private[hive] def newHiveConfigurations( + sparkConf: SparkConf = new SparkConf(loadDefaults = true), + classLoader: ClassLoader = null)( + hadoopConf: Configuration = SparkHadoopUtil.get.newConfiguration(sparkConf))( + extraTimeConfs: Map[String, String] = formatTimeVarsForHiveClient(hadoopConf)): HiveConf = { --- End diff -- OK --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19136: [SPARK-15689][SQL] data source v2
Github user yueawang commented on the issue: https://github.com/apache/spark/pull/19136 @cloud-fan, when I saw this PR last week, I remember you had also implemented some new pushdowns like sort and limit. Have they been removed in the latest commit? Any concerns? Thanks!
[GitHub] spark pull request #19068: [SPARK-21428][SQL][FOLLOWUP]CliSessionState shoul...
Github user yaooqinn commented on a diff in the pull request: https://github.com/apache/spark/pull/19068#discussion_r138625678 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveUtils.scala --- @@ -232,6 +232,54 @@ private[spark] object HiveUtils extends Logging { } /** + * Generate an instance of [[HiveConf]] from [[SparkConf]]& hadoop [[Configuration]] & + * formatted extra time configurations with an isolated classloader needed if isolationOn + * for [[HiveClient]] construction + * @param sparkConf a [[SparkConf]] object specifying Spark parameters + * @param classLoader an isolated classloader needed if isolationOn for [[HiveClient]] + *construction + * @param hadoopConf a hadoop [[Configuration]] object, Optional if we want generated it from + * the sparkConf + * @param extraTimeConfs time configurations in the form of long values from the given hadoopConf --- End diff -- OK --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19106: [SPARK-21770][ML] ProbabilisticClassificationModel fix c...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19106 Merged build finished. Test PASSed.
[GitHub] spark issue #19106: [SPARK-21770][ML] ProbabilisticClassificationModel fix c...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19106 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81716/ Test PASSed.
[GitHub] spark issue #19106: [SPARK-21770][ML] ProbabilisticClassificationModel fix c...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19106 **[Test build #81716 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81716/testReport)** for PR 19106 at commit [`d661caa`](https://github.com/apache/spark/commit/d661caae8fbb7e09f7b862a045d7ddf0d086eb89). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #19136: [SPARK-15689][SQL] data source v2
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/19136#discussion_r138624261 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/upward/Statistics.java --- @@ -0,0 +1,28 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.sources.v2.reader.upward; --- End diff -- this package name is really confusing. maybe just put all of them in the v2.reader package. There isn't that many classes ... if you are worried about discoverability, use a common interface, or create a top level class and put the interfaces there. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19136: [SPARK-15689][SQL] data source v2
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/19136#discussion_r138623586 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/downward/ColumnPruningSupport.java --- @@ -0,0 +1,36 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.sources.v2.reader.downward; + +import org.apache.spark.sql.types.StructType; + +/** + * A mix-in interface for `DataSourceV2Reader`. Users can implement this interface to only read the + * required columns/nested fields during scan. + */ +public interface ColumnPruningSupport { + + /** + * Apply column pruning w.r.t. the given requiredSchema. + * + * Implementation should try its best to prune the unnecessary columns/nested fields, but it's + * also OK to do the pruning partially, e.g., a data source may not be able to prune nested + * fields, and only prune top-level columns. + */ + void pruneColumns(StructType requiredSchema); --- End diff -- link this to readSchema function --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
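The contract under discussion — `pruneColumns` narrowing what a later `readSchema` call reports — can be illustrated with a simplified sketch. Plain column-name lists stand in for Spark's `StructType`, and the class name below is invented, not part of the PR.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified illustration of the ColumnPruningSupport contract: after
// pruneColumns is called, readSchema() should report the pruned schema.
// Column names stand in for Spark's StructType; this class is invented.
class PruningReader {
    private final List<String> schema;   // full schema of the underlying source

    PruningReader(List<String> fullSchema) { this.schema = new ArrayList<>(fullSchema); }

    // Best-effort pruning: keep only columns the scan actually needs.
    // Pruning only partially (e.g. keeping top-level columns whose nested
    // fields could not be pruned) would also satisfy the contract.
    void pruneColumns(List<String> required) {
        schema.removeIf(col -> !required.contains(col));
    }

    // What the scan will actually produce after pruning.
    List<String> readSchema() { return schema; }
}
```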
[GitHub] spark pull request #19136: [SPARK-15689][SQL] data source v2
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/19136#discussion_r138622262 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/DataReader.java --- @@ -0,0 +1,36 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.sources.v2.reader; + +import java.io.Closeable; + +/** + * A data reader returned by a read task and is responsible for outputting data for a RDD partition. + */ +public interface DataReader extends Closeable { --- End diff -- what can T be? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19136: [SPARK-15689][SQL] data source v2
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/19136#discussion_r138622067 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/SchemaRequiredDataSourceV2.java --- @@ -0,0 +1,42 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.sources.v2; + +import org.apache.spark.sql.sources.v2.reader.DataSourceV2Reader; +import org.apache.spark.sql.types.StructType; + +/** + * A variant of `DataSourceV2` which requires users to provide a schema when reading data. A data + * source can inherit both `DataSourceV2` and `SchemaRequiredDataSourceV2` if it supports both schema + * inference and user-specified schemas. + */ +public interface SchemaRequiredDataSourceV2 { --- End diff -- I personally find this divergence at the top pretty confusing ... --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19136: [SPARK-15689][SQL] data source v2
Github user rxin commented on a diff in the pull request:
https://github.com/apache/spark/pull/19136#discussion_r138621970

--- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/SchemaRequiredDataSourceV2.java ---
@@ -0,0 +1,42 @@
+/* ... Apache License header (identical to the one quoted above) ... */
+
+package org.apache.spark.sql.sources.v2;
+
+import org.apache.spark.sql.sources.v2.reader.DataSourceV2Reader;
+import org.apache.spark.sql.types.StructType;
+
+/**
+ * A variant of `DataSourceV2` which requires users to provide a schema when reading data. A data
+ * source can inherit both `DataSourceV2` and `SchemaRequiredDataSourceV2` if it supports both schema
+ * inference and user-specified schemas.
+ */
+public interface SchemaRequiredDataSourceV2 {
--- End diff --

what's an example of such a data source?
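[Editor's note] A stdlib-only Scala sketch of the two entry points being discussed. `Schema` and `Reader` stand in for Spark's `StructType` and `DataSourceV2Reader`, and `HeaderlessCsvSource` is a hypothetical example (one plausible answer to rxin's question, since headerless CSV files carry no field names or types of their own), not a real Spark source.

```scala
// Minimal stand-ins for Spark's StructType and DataSourceV2Reader.
final case class Schema(fieldNames: List[String])
trait Reader { def schema: Schema }

// A source that can infer its own schema (e.g. self-describing JSON).
trait DataSourceV2 { def createReader(): Reader }

// A source that cannot infer a schema: the caller must supply one up front.
trait SchemaRequiredDataSourceV2 { def createReader(userSchema: Schema): Reader }

// Headerless CSV is a natural example of a schema-required source:
// only the user can say what the columns mean.
object HeaderlessCsvSource extends SchemaRequiredDataSourceV2 {
  def createReader(userSchema: Schema): Reader = new Reader {
    val schema: Schema = userSchema
  }
}
```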
[GitHub] spark pull request #19136: [SPARK-15689][SQL] data source v2
Github user rxin commented on a diff in the pull request:
https://github.com/apache/spark/pull/19136#discussion_r138621700

--- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/DataSourceV2Options.java ---
@@ -0,0 +1,49 @@
+/* ... Apache License header (identical to the one quoted above) ... */
+
+package org.apache.spark.sql.sources.v2;
+
+import java.util.HashMap;
+import java.util.Locale;
+import java.util.Map;
+import java.util.Optional;
+
+/**
+ * An immutable case-insensitive string-to-string map, which is used to represent data source
--- End diff --

we need to be clear that only the keys are case insensitive. the values are case preserving.
[GitHub] spark pull request #19136: [SPARK-15689][SQL] data source v2
Github user rxin commented on a diff in the pull request:
https://github.com/apache/spark/pull/19136#discussion_r138621506

--- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/DataSourceV2Options.java ---
@@ -0,0 +1,49 @@
+/* ... Apache License header (identical to the one quoted above) ... */
+
+package org.apache.spark.sql.sources.v2;
+
+import java.util.HashMap;
+import java.util.Locale;
+import java.util.Map;
+import java.util.Optional;
+
+/**
+ * An immutable case-insensitive string-to-string map, which is used to represent data source
+ * options.
+ */
+public class DataSourceV2Options {
--- End diff --

add a simple test suite for this
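[Editor's note] A stdlib-only Scala sketch of the semantics rxin asks to document in the previous message: lookups normalize only the *key* to lower case, while the stored *value* is returned exactly as supplied. `CaseInsensitiveOptions` is a hypothetical name invented here, not Spark's actual `DataSourceV2Options` class.

```scala
import java.util.Locale

// Case-insensitive keys, case-preserving values. Simplification: if two
// supplied keys differ only in case, the later one wins.
class CaseInsensitiveOptions(options: Map[String, String]) {
  private val keyMap: Map[String, String] =
    options.map { case (k, v) => k.toLowerCase(Locale.ROOT) -> v }

  def get(key: String): Option[String] =
    keyMap.get(key.toLowerCase(Locale.ROOT))
}

object OptionsDemo {
  def main(args: Array[String]): Unit = {
    val opts = new CaseInsensitiveOptions(Map("Path" -> "/Data/FILES"))
    println(opts.get("PATH")) // key matched case-insensitively
    println(opts.get("path")) // value case is preserved either way
  }
}
```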
[GitHub] spark issue #18266: [SPARK-20427][SQL] Read JDBC table use custom schema
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18266 **[Test build #81719 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81719/testReport)** for PR 18266 at commit [`1fdf002`](https://github.com/apache/spark/commit/1fdf002b64ed381b31b6a4ba721357c647b11772).
[GitHub] spark pull request #18266: [SPARK-20427][SQL] Read JDBC table use custom sch...
Github user wangyum commented on a diff in the pull request:
https://github.com/apache/spark/pull/18266#discussion_r138621092

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCRDD.scala ---
@@ -80,7 +80,7 @@ object JDBCRDD extends Logging {
    * @return A Catalyst schema corresponding to columns in the given order.
    */
   private def pruneSchema(schema: StructType, columns: Array[String]): StructType = {
-    val fieldMap = Map(schema.fields.map(x => x.metadata.getString("name") -> x): _*)
+    val fieldMap = Map(schema.fields.map(x => x.name -> x): _*)
--- End diff --

It seems safe to remove this line.
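[Editor's note] A stdlib-only Scala model of the one-line change in the diff quoted in the message above: fields are now keyed by their own `name` rather than by a `"name"` entry stashed in metadata. `Field` is a stand-in for Spark's `StructField`; this is an illustrative sketch, not the actual `JDBCRDD` code.

```scala
object PruneDemo {
  // Stand-in for Spark's StructField (name + data type only).
  final case class Field(name: String, dataType: String)

  // Prune a schema down to the requested columns, in the requested order.
  def pruneSchema(schema: Seq[Field], columns: Array[String]): Seq[Field] = {
    // After the change, the lookup map is keyed by the field's own name.
    val fieldMap = Map(schema.map(f => f.name -> f): _*)
    columns.toSeq.map(fieldMap) // Map used as String => Field; preserves order
  }
}
```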
[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning
Github user rxin commented on the issue: https://github.com/apache/spark/pull/16578 I tried this and it is definitely super useful! It's a big patch, and most of the people working in this area are either doing something else that's not Spark or working on a few high-priority SPIPs (e.g. vectorized UDFs in Python, data source API v2), so it might take a bit for people to come around to review ...
[GitHub] spark pull request #19068: [SPARK-21428][SQL][FOLLOWUP]CliSessionState shoul...
Github user cloud-fan commented on a diff in the pull request:
https://github.com/apache/spark/pull/19068#discussion_r138619511

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveUtils.scala ---
@@ -232,6 +232,54 @@ private[spark] object HiveUtils extends Logging {
   }

   /**
+   * Generate an instance of [[HiveConf]] from [[SparkConf]] & hadoop [[Configuration]] &
+   * formatted extra time configurations with an isolated classloader needed if isolationOn
+   * for [[HiveClient]] construction
+   * @param sparkConf a [[SparkConf]] object specifying Spark parameters
+   * @param classLoader an isolated classloader needed if isolationOn for [[HiveClient]]
+   *                    construction
+   * @param hadoopConf a hadoop [[Configuration]] object, Optional if we want generated it from
+   *                   the sparkConf
+   * @param extraTimeConfs time configurations in the form of long values from the given hadoopConf
+   */
+
+  private[hive] def newHiveConfigurations(
+      sparkConf: SparkConf = new SparkConf(loadDefaults = true),
+      classLoader: ClassLoader = null)(
+      hadoopConf: Configuration = SparkHadoopUtil.get.newConfiguration(sparkConf))(
+      extraTimeConfs: Map[String, String] = formatTimeVarsForHiveClient(hadoopConf)): HiveConf = {
--- End diff --

How about we remove these default values and explicitly specify them in https://github.com/apache/spark/pull/19068/files#diff-f7aac41bf732c1ba1edbac436d331a55R84?
[GitHub] spark pull request #19068: [SPARK-21428][SQL][FOLLOWUP]CliSessionState shoul...
Github user cloud-fan commented on a diff in the pull request:
https://github.com/apache/spark/pull/19068#discussion_r138615099

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveUtils.scala ---
@@ -232,6 +232,54 @@ private[spark] object HiveUtils extends Logging {
   }

   /**
+   * Generate an instance of [[HiveConf]] from [[SparkConf]] & hadoop [[Configuration]] &
+   * formatted extra time configurations with an isolated classloader needed if isolationOn
+   * for [[HiveClient]] construction
+   * @param sparkConf a [[SparkConf]] object specifying Spark parameters
+   * @param classLoader an isolated classloader needed if isolationOn for [[HiveClient]]
+   *                    construction
+   * @param hadoopConf a hadoop [[Configuration]] object, Optional if we want generated it from
+   *                   the sparkConf
+   * @param extraTimeConfs time configurations in the form of long values from the given hadoopConf
--- End diff --

it's not only time configs, I think we'd better call it `config`, following `IsolatedClientLoader.config`
[GitHub] spark issue #19068: [SPARK-21428][SQL][FOLLOWUP]CliSessionState should point...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19068 **[Test build #81718 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81718/testReport)** for PR 19068 at commit [`267a1b2`](https://github.com/apache/spark/commit/267a1b2f5bb83b4f20810f704105c0d996b71e93).
[GitHub] spark pull request #19182: [SPARK-21970][Core] Fix Redundant Throws Declarat...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/19182
[GitHub] spark issue #19182: [SPARK-21970][Core] Fix Redundant Throws Declarations in...
Github user srowen commented on the issue: https://github.com/apache/spark/pull/19182 Merged to master
[GitHub] spark issue #19068: [SPARK-21428][SQL][FOLLOWUP]CliSessionState should point...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/19068 ok to test
[GitHub] spark issue #19136: [SPARK-15689][SQL] data source v2
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19136 **[Test build #81717 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81717/testReport)** for PR 19136 at commit [`4ff1b18`](https://github.com/apache/spark/commit/4ff1b18d3db9f50ba7f3d31288d0da37736d6b5f).
[GitHub] spark issue #19220: [SPARK-18608][ML][FOLLOWUP] Fix double caching for PySpa...
Github user yanboliang commented on the issue: https://github.com/apache/spark/pull/19220 cc @zhengruifeng @jkbradley @WeichenXu123