[GitHub] spark issue #19136: [SPARK-15689][SQL] data source v2 read path

2017-09-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19136
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19136: [SPARK-15689][SQL] data source v2 read path

2017-09-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19136
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81723/





[GitHub] spark pull request #19222: [SPARK-10399][CORE][SQL] Introduce multiple Memor...

2017-09-13 Thread kiszk
GitHub user kiszk opened a pull request:

https://github.com/apache/spark/pull/19222

[SPARK-10399][CORE][SQL] Introduce multiple MemoryBlocks to choose several 
types of memory block

## What changes were proposed in this pull request?

This PR allows us to use one of several types of `MemoryBlock`, such as 
byte array, int array, long array, or `java.nio.DirectByteBuffer`. Using 
`java.nio.DirectByteBuffer` provides off-heap memory that is automatically 
deallocated by the JVM. The `spark.unsafe.Platform` interface is refactored 
to take `MemoryBlock`s and arrays of primitives instead of arbitrary `Object`s. 
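
To make the idea of interchangeable block types concrete, here is a minimal sketch, assuming a much smaller interface than the one in this PR. The names `MemoryBlockSketch`, `ByteArrayBlock`, `DirectBufferBlock`, and `allocate` are hypothetical and are not the classes introduced here; only `java.nio.ByteBuffer.allocateDirect` is a real JDK call.

```scala
import java.nio.ByteBuffer

object MemoryBlockSketch {
  // Hypothetical, simplified interface; the PR's MemoryBlock is richer.
  sealed trait MemoryBlock {
    def size: Int
    def putByte(offset: Int, value: Byte): Unit
    def getByte(offset: Int): Byte
  }

  // On-heap block backed by a plain byte array.
  final class ByteArrayBlock(bytes: Array[Byte]) extends MemoryBlock {
    override def size: Int = bytes.length
    override def putByte(offset: Int, value: Byte): Unit = { bytes(offset) = value }
    override def getByte(offset: Int): Byte = bytes(offset)
  }

  // Off-heap block backed by a direct buffer; the JVM releases the native
  // memory once the buffer object becomes unreachable.
  final class DirectBufferBlock(buffer: ByteBuffer) extends MemoryBlock {
    override def size: Int = buffer.capacity()
    override def putByte(offset: Int, value: Byte): Unit = { buffer.put(offset, value); () }
    override def getByte(offset: Int): Byte = buffer.get(offset)
  }

  // Callers choose the backing storage without changing how they read or write.
  def allocate(size: Int, offHeap: Boolean): MemoryBlock =
    if (offHeap) new DirectBufferBlock(ByteBuffer.allocateDirect(size))
    else new ByteArrayBlock(new Array[Byte](size))
}
```

Code written against the trait behaves the same whichever block type `allocate` returns, which is the property the refactoring is after.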

This PR uses `MemoryBlock` for `OffHeapColumnVector`, `UTF8String`, and 
other places.

For now, this PR does not use `MemoryBlock` for `BufferHolder` based on 
@cloud-fan's 
[suggestion](https://github.com/apache/spark/pull/11494#issuecomment-309694290).


Much of the code was ported from #11494, where a lot of effort was invested. 
I think this PR should give credit to @yzotov.

## How was this patch tested?

Existing tests


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kiszk/spark SPARK-10399

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/19222.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #19222


commit c2aa3b0d353cf850a79fb58891a7ad56a25f72cf
Author: Kazuaki Ishizaki 
Date:   2017-09-13T10:16:19Z

introduce ByteArrayMemoryBlock, IntArrayMemoryBlock, LongArrayMemoryBlock, 
and OffheaMemoryBlock

commit e7fb6593a688dbfedbc9708bc0bf2d297509eb31
Author: Kazuaki Ishizaki 
Date:   2017-09-13T17:15:25Z

OffHeapColumnVector uses UnsafeMemoryAllocator

commit 2307f32e24aa8c4375d5bce4631bdb18fd70659e
Author: Kazuaki Ishizaki 
Date:   2017-09-13T17:27:09Z

UTF8String uses UnsafeMemoryAllocator

commit b7ffa10e7fe359dd3efdae3d54d87db215ce0958
Author: Kazuaki Ishizaki 
Date:   2017-09-13T17:36:57Z

Platform.copymemory() in UsafeInMemorySorter uses new MemoryBlock







[GitHub] spark issue #19136: [SPARK-15689][SQL] data source v2 read path

2017-09-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19136
  
**[Test build #81723 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81723/testReport)** for PR 19136 at commit [`abcc606`](https://github.com/apache/spark/commit/abcc606e006e9975d1507eed379a48a3134165ad).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request #19130: [SPARK-21917][CORE][YARN] Supporting adding http(...

2017-09-13 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/19130#discussion_r138694227
  
--- Diff: 
core/src/test/scala/org/apache/spark/deploy/SparkSubmitSuite.scala ---
@@ -897,6 +897,80 @@ class SparkSubmitSuite
 sysProps("spark.submit.pyFiles") should (startWith("/"))
   }
 
+  test("handle remote http(s) resources in yarn mode") {
+val hadoopConf = new Configuration()
+updateConfWithFakeS3Fs(hadoopConf)
+
+val tmpDir = Utils.createTempDir()
+val mainResource = File.createTempFile("tmpPy", ".py", tmpDir)
+val tmpJar = TestUtils.createJarWithFiles(Map("test.resource" -> "USER"), tmpDir)
+val tmpJarPath = s"s3a://${new File(tmpJar.toURI).getAbsolutePath}"
+// This assumes UT environment could access external network.
--- End diff --

It would be better if tests could avoid this... you could start a local 
http server, but that feels like a lot of work. Is there some way to mock the 
behavior instead?
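
For a sense of how much work the local-server option involves, here is a minimal sketch, assuming the JDK's built-in `com.sun.net.httpserver` package is available (it ships with OpenJDK and Oracle JDKs). The `LocalHttpFixture` object and its `withLocalHttpServer` and `fetch` helpers are hypothetical names, not part of Spark's test utilities.

```scala
import java.net.InetSocketAddress
import com.sun.net.httpserver.{HttpExchange, HttpHandler, HttpServer}
import scala.io.Source

object LocalHttpFixture {
  /** Serves `payload` at `path` on an ephemeral loopback port, passes the
    * base URL to `body`, and always stops the server afterwards. */
  def withLocalHttpServer[T](path: String, payload: Array[Byte])(body: String => T): T = {
    // Port 0 asks the OS for any free port, so parallel tests do not collide.
    val server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0)
    server.createContext(path, new HttpHandler {
      override def handle(exchange: HttpExchange): Unit = {
        exchange.sendResponseHeaders(200, payload.length.toLong)
        val out = exchange.getResponseBody
        try out.write(payload) finally out.close()
      }
    })
    server.start()
    try body(s"http://127.0.0.1:${server.getAddress.getPort}")
    finally server.stop(0)
  }

  /** Downloads a URL as a UTF-8 string; only used to exercise the fixture. */
  def fetch(url: String): String = {
    val source = Source.fromURL(url, "UTF-8")
    try source.mkString finally source.close()
  }
}
```

A test could wrap its body in `withLocalHttpServer("/test.jar", jarBytes) { base => ... }` and pass `s"$base/test.jar"` wherever the remote Maven URL is used today, which removes the dependency on external network access.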





[GitHub] spark pull request #19130: [SPARK-21917][CORE][YARN] Supporting adding http(...

2017-09-13 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/19130#discussion_r138692611
  
--- Diff: docs/running-on-yarn.md ---
@@ -212,6 +212,14 @@ To use a custom metrics.properties for the application 
master and executors, upd
   
 
 
+  spark.yarn.dist.forceDownloadSchemes
+  (none)
+  
+Comma-separated schemes in which remote resources have to download to 
local disk and upload 
--- End diff --

Better wording:

Comma-separated list of schemes for which files will be downloaded to the 
local disk prior to being added to YARN's distributed cache. For use in cases 
where the YARN service does not support schemes that are supported by Spark.





[GitHub] spark pull request #19130: [SPARK-21917][CORE][YARN] Supporting adding http(...

2017-09-13 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/19130#discussion_r138689342
  
--- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala ---
@@ -367,6 +368,53 @@ object SparkSubmit extends CommandLineUtils with 
Logging {
   }.orNull
 }
 
+// When running in YARN cluster manager,
--- End diff --

"When running in YARN cluster manager, ?"





[GitHub] spark pull request #19130: [SPARK-21917][CORE][YARN] Supporting adding http(...

2017-09-13 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/19130#discussion_r138694708
  
--- Diff: 
core/src/test/scala/org/apache/spark/deploy/SparkSubmitSuite.scala ---
@@ -897,6 +897,80 @@ class SparkSubmitSuite
 sysProps("spark.submit.pyFiles") should (startWith("/"))
   }
 
+  test("handle remote http(s) resources in yarn mode") {
--- End diff --

It seems you have 3 different tests in this block (at least), could you 
break them into separate tests?





[GitHub] spark pull request #19130: [SPARK-21917][CORE][YARN] Supporting adding http(...

2017-09-13 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/19130#discussion_r138689976
  
--- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala ---
@@ -367,6 +368,53 @@ object SparkSubmit extends CommandLineUtils with 
Logging {
   }.orNull
 }
 
+// When running in YARN cluster manager,
+if (clusterManager == YARN) {
+  sparkConf.setIfMissing(SecurityManager.SPARK_AUTH_SECRET_CONF, "unused")
+  val secMgr = new SecurityManager(sparkConf)
+  val forceDownloadSchemes = sparkConf.get(FORCE_DOWNLOAD_SCHEMES)
+
+  // Check the scheme list provided by "spark.yarn.dist.forceDownloadSchemes" to see if current
+  // resource's scheme is included in this list, or Hadoop FileSystem doesn't support current
+  // scheme, if so Spark will download the resources to local disk and upload to Hadoop FS.
+  def shouldDownload(scheme: String): Boolean = {
+val isFsAvailable = Try { FileSystem.getFileSystemClass(scheme, hadoopConf) }
+  .map(_ => true).getOrElse(false)
--- End diff --

`Try { ... }.isSuccess`? You could also avoid this call if the scheme is in 
the blacklist.
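
Both suggestions can be sketched together. This is a minimal, self-contained illustration, assuming a hypothetical `lookupFileSystem` stand-in for `FileSystem.getFileSystemClass(scheme, hadoopConf)` so that it runs without Hadoop on the classpath; the `lookups` counter exists only to show when the lookup is skipped.

```scala
import scala.util.Try

object ShouldDownloadSketch {
  // Hypothetical stand-in for FileSystem.getFileSystemClass(scheme, hadoopConf):
  // it throws when no file system is registered for the scheme.
  private val knownSchemes = Set("hdfs", "s3a", "file")
  var lookups = 0

  def lookupFileSystem(scheme: String): String = {
    lookups += 1
    if (knownSchemes.contains(scheme)) s"fs-for-$scheme"
    else throw new java.io.IOException(s"No FileSystem for scheme: $scheme")
  }

  // `||` short-circuits, so the lookup only runs when the scheme is not
  // already in the force-download list. `.isSuccess` replaces
  // `.map(_ => true).getOrElse(false)`.
  def shouldDownload(scheme: String, forceDownloadSchemes: Seq[String]): Boolean =
    forceDownloadSchemes.contains(scheme) || !Try { lookupFileSystem(scheme) }.isSuccess
}
```

With this ordering, a scheme that appears in the force-download list never triggers the file system lookup.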





[GitHub] spark pull request #19130: [SPARK-21917][CORE][YARN] Supporting adding http(...

2017-09-13 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/19130#discussion_r138694417
  
--- Diff: 
core/src/test/scala/org/apache/spark/deploy/SparkSubmitSuite.scala ---
@@ -897,6 +897,80 @@ class SparkSubmitSuite
 sysProps("spark.submit.pyFiles") should (startWith("/"))
   }
 
+  test("handle remote http(s) resources in yarn mode") {
+val hadoopConf = new Configuration()
+updateConfWithFakeS3Fs(hadoopConf)
+
+val tmpDir = Utils.createTempDir()
+val mainResource = File.createTempFile("tmpPy", ".py", tmpDir)
+val tmpJar = TestUtils.createJarWithFiles(Map("test.resource" -> "USER"), tmpDir)
+val tmpJarPath = s"s3a://${new File(tmpJar.toURI).getAbsolutePath}"
+// This assumes UT environment could access external network.
+val remoteHttpJar =
+  "http://central.maven.org/maven2/io/dropwizard/metrics/metrics-core/" +
+"3.2.4/metrics-core-3.2.4.jar"
+
+val args = Seq(
+  "--class", UserClasspathFirstTest.getClass.getName.stripPrefix("$"),
+  "--name", "testApp",
+  "--master", "yarn",
+  "--deploy-mode", "client",
+  "--jars", s"$tmpJarPath,$remoteHttpJar",
+  s"s3a://$mainResource"
+)
+
+val appArgs = new SparkSubmitArguments(args)
+val sysProps = SparkSubmit.prepareSubmitEnvironment(appArgs, Some(hadoopConf))._3
+
+// Resources in S3 should still be remote path, but remote http resource will be downloaded
--- End diff --

...still are...

Also I'm not sure I understand the comment.





[GitHub] spark pull request #19130: [SPARK-21917][CORE][YARN] Supporting adding http(...

2017-09-13 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/19130#discussion_r138693449
  
--- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala ---
@@ -367,6 +368,53 @@ object SparkSubmit extends CommandLineUtils with 
Logging {
   }.orNull
 }
 
+// When running in YARN cluster manager,
+if (clusterManager == YARN) {
+  sparkConf.setIfMissing(SecurityManager.SPARK_AUTH_SECRET_CONF, "unused")
+  val secMgr = new SecurityManager(sparkConf)
+  val forceDownloadSchemes = sparkConf.get(FORCE_DOWNLOAD_SCHEMES)
+
+  // Check the scheme list provided by "spark.yarn.dist.forceDownloadSchemes" to see if current
+  // resource's scheme is included in this list, or Hadoop FileSystem doesn't support current
+  // scheme, if so Spark will download the resources to local disk and upload to Hadoop FS.
+  def shouldDownload(scheme: String): Boolean = {
+val isFsAvailable = Try { FileSystem.getFileSystemClass(scheme, hadoopConf) }
+  .map(_ => true).getOrElse(false)
+forceDownloadSchemes.contains(scheme) || !isFsAvailable
+  }
+
+  def downloadResource(resource: String): String = {
+val uri = Utils.resolveURI(resource)
+uri.getScheme match {
+  case "local" | "file" => resource
+  case e if shouldDownload(e) =>
+if (deployMode == CLIENT) {
+  // In client mode, we already download the resources, so figuring out the local one
+  // should be enough.
+  val fileName = new Path(uri).getName
+  new File(targetDir, fileName).toURI.toString
+} else {
+  downloadFile(resource, targetDir, sparkConf, hadoopConf, secMgr)
+}
+  case _ => uri.toString
+}
+  }
+
+  args.primaryResource = Option(args.primaryResource).map { downloadResource }.orNull
+  args.files = Option(args.files).map { files =>
+files.split(",").map(_.trim).filter(_.nonEmpty).map { downloadResource }.mkString(",")
+  }.orNull
+  args.pyFiles = Option(args.pyFiles).map { files =>
+files.split(",").map(_.trim).filter(_.nonEmpty).map { downloadResource }.mkString(",")
+  }.orNull
+  args.jars = Option(args.jars).map { files =>
+files.split(",").map(_.trim).filter(_.nonEmpty).map { downloadResource }.mkString(",")
+  }.orNull
+  args.archives = Option(args.archives).map { files =>
+files.split(",").map(_.trim).filter(_.nonEmpty).map { downloadResource }.mkString(",")
+  }.orNull
--- End diff --

I was going to say this is missing `spark.yarn.dist.files` and `.jars`, but 
later those properties seem to be set based on `args.files` and `args.jars`.

Which kinda raises the question of what happens when the user sets both. 
From the documentation it sounds like that should work (both sets of files get 
added), but from the code it seems `--files` and `--jars` would overwrite the 
`spark.yarn.*` configs...

In any case, that's not the fault of your change.
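
For illustration only, the "both sets of files get added" behavior could look like the sketch below, assuming both sources are comma-separated strings as in the diff. `FileListSketch`, `splitList`, and `mergeLists` are hypothetical helpers and say nothing about how `SparkSubmit` actually resolves the two settings.

```scala
object FileListSketch {
  /** Splits a comma-separated list, trimming entries and dropping empty ones.
    * A null input yields an empty sequence. */
  def splitList(list: String): Seq[String] =
    Option(list).toSeq.flatMap(_.split(",")).map(_.trim).filter(_.nonEmpty)

  /** Merges several lists, keeping first-seen order and dropping duplicates.
    * Returns null when nothing remains, mirroring the `.orNull` style above. */
  def mergeLists(lists: String*): String = {
    val merged = lists.flatMap(splitList).distinct
    if (merged.isEmpty) null else merged.mkString(",")
  }
}
```

Under this sketch, a value from `--files` and a value from the `spark.yarn.*` config would be combined rather than one replacing the other.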





[GitHub] spark issue #19221: [SPARK-4131] Merge HiveTmpFile.scala to SaveAsHiveFile.s...

2017-09-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19221
  
**[Test build #81730 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81730/testReport)** for PR 19221 at commit [`0b8d47a`](https://github.com/apache/spark/commit/0b8d47a982708839fc83f76b42a3527e66a69da5).





[GitHub] spark issue #19221: [SPARK-4131] Merge HiveTmpFile.scala to SaveAsHiveFile.s...

2017-09-13 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/19221
  
ok to test





[GitHub] spark issue #19136: [SPARK-15689][SQL] data source v2 read path

2017-09-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19136
  
Merged build finished. Test PASSed.





[GitHub] spark issue #19136: [SPARK-15689][SQL] data source v2 read path

2017-09-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19136
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81722/





[GitHub] spark issue #19136: [SPARK-15689][SQL] data source v2 read path

2017-09-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19136
  
**[Test build #81722 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81722/testReport)** for PR 19136 at commit [`1e86d5c`](https://github.com/apache/spark/commit/1e86d5ca445d732af6ac651d49d391d5cd012a92).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #19204: [SPARK-21981][PYTHON][ML] Added Python interface for Clu...

2017-09-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19204
  
Merged build finished. Test FAILed.





[GitHub] spark issue #19204: [SPARK-21981][PYTHON][ML] Added Python interface for Clu...

2017-09-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19204
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81729/





[GitHub] spark issue #19204: [SPARK-21981][PYTHON][ML] Added Python interface for Clu...

2017-09-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19204
  
**[Test build #81729 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81729/testReport)** for PR 19204 at commit [`cd84d66`](https://github.com/apache/spark/commit/cd84d66151e710f1a262f081d0c578a12453374d).
 * This patch **fails PySpark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #19221: [SPARK-4131] Merge HiveTmpFile.scala to SaveAsHiveFile.s...

2017-09-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19221
  
Can one of the admins verify this patch?





[GitHub] spark pull request #19221: [SPARK-4131] Merge HiveTmpFile.scala to SaveAsHiv...

2017-09-13 Thread janewangfb
GitHub user janewangfb opened a pull request:

https://github.com/apache/spark/pull/19221

[SPARK-4131] Merge HiveTmpFile.scala to SaveAsHiveFile.scala

## What changes were proposed in this pull request?

The code is already merged to master:
https://github.com/apache/spark/pull/18975

This is a follow-up PR to merge HiveTmpFile.scala into SaveAsHiveFile.scala.

## How was this patch tested?

Built successfully.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/janewangfb/spark merge_savehivefile_hivetmpfile

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/19221.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #19221


commit 0b8d47a982708839fc83f76b42a3527e66a69da5
Author: Jane Wang 
Date:   2017-09-13T17:35:06Z

Merge HiveTmpFile.scala to SaveAsHiveFile.scala







[GitHub] spark issue #19204: [SPARK-21981][PYTHON][ML] Added Python interface for Clu...

2017-09-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19204
  
**[Test build #81729 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81729/testReport)** for PR 19204 at commit [`cd84d66`](https://github.com/apache/spark/commit/cd84d66151e710f1a262f081d0c578a12453374d).





[GitHub] spark issue #19211: [SPARK-18838][core] Add separate listener queues to Live...

2017-09-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19211
  
**[Test build #81728 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81728/testReport)** for PR 19211 at commit [`ad6ff49`](https://github.com/apache/spark/commit/ad6ff49de17c204e8d4feb775185a05d7fa9f53b).





[GitHub] spark issue #19211: [SPARK-18838][core] Add separate listener queues to Live...

2017-09-13 Thread vanzin
Github user vanzin commented on the issue:

https://github.com/apache/spark/pull/19211
  
retest this please





[GitHub] spark issue #19211: [SPARK-18838][core] Add separate listener queues to Live...

2017-09-13 Thread vanzin
Github user vanzin commented on the issue:

https://github.com/apache/spark/pull/19211
  
(Never mind the test failures; I killed the obsolete builds.)





[GitHub] spark issue #19211: [SPARK-18838][core] Add separate listener queues to Live...

2017-09-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19211
  
Merged build finished. Test FAILed.





[GitHub] spark issue #19211: [SPARK-18838][core] Add separate listener queues to Live...

2017-09-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19211
  
Merged build finished. Test FAILed.





[GitHub] spark issue #19211: [SPARK-18838][core] Add separate listener queues to Live...

2017-09-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19211
  
Merged build finished. Test FAILed.





[GitHub] spark issue #19211: [SPARK-18838][core] Add separate listener queues to Live...

2017-09-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19211
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81724/





[GitHub] spark issue #19211: [SPARK-18838][core] Add separate listener queues to Live...

2017-09-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19211
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81726/





[GitHub] spark issue #19211: [SPARK-18838][core] Add separate listener queues to Live...

2017-09-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19211
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81725/





[GitHub] spark pull request #19136: [SPARK-15689][SQL] data source v2 read path

2017-09-13 Thread rdblue
Github user rdblue commented on a diff in the pull request:

https://github.com/apache/spark/pull/19136#discussion_r138681562
  
--- Diff: 
sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/DataReader.java 
---
@@ -0,0 +1,36 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.sources.v2.reader;
+
+import java.io.Closeable;
+
+/**
+ * A data reader returned by a read task and is responsible for outputting data for a RDD partition.
+ */
+public interface DataReader extends Closeable {
--- End diff --

The initialization is done when creating this `DataReader` from a 
`ReadTask`. That ensures that the initialization happens (easy to forget 
`open()`) and simplifies the checks that need to be done because `DataReader` 
can't exist otherwise.
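
The design point can be sketched as follows. This is a simplified illustration in Scala, not the Java interfaces from the PR: the method names `createReader`, `next`, and `get`, and the `InMemoryReadTask` class, are assumptions made for the example.

```scala
import java.io.Closeable

object ReaderInitSketch {
  // Hypothetical, simplified shapes; the interfaces in the PR may differ.
  trait DataReader[T] extends Closeable {
    def next(): Boolean
    def get(): T
  }

  trait ReadTask[T] {
    def createReader(): DataReader[T]
  }

  class InMemoryReadTask(rows: Seq[String]) extends ReadTask[String] {
    override def createReader(): DataReader[String] = new DataReader[String] {
      // All setup happens while the reader is constructed, so there is no
      // separate open() call to forget and no "not yet opened" state to check.
      private val iterator = rows.iterator
      private var current: String = null

      override def next(): Boolean =
        if (iterator.hasNext) { current = iterator.next(); true } else false

      override def get(): String = current
      override def close(): Unit = ()
    }
  }

  /** Drains a task's reader and always closes it. */
  def readAll[T](task: ReadTask[T]): List[T] = {
    val reader = task.createReader()
    try {
      val rows = List.newBuilder[T]
      while (reader.next()) rows += reader.get()
      rows.result()
    } finally reader.close()
  }
}
```

Because a reader can only be obtained through `createReader()`, every reader a caller holds is already initialized.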





[GitHub] spark pull request #19202: [SPARK-21980][SQL]References in grouping function...

2017-09-13 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/19202





[GitHub] spark pull request #19188: [SPARK-21973][SQL] Add an new option to filter qu...

2017-09-13 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/19188#discussion_r138681442
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/TPCDSQueryBenchmarkArguments.scala
 ---
@@ -32,6 +36,10 @@ class TPCDSQueryBenchmarkArguments(val args: 
Array[String]) {
   dataLocation = value
   args = tail
 
+case ("--query-filter") :: value :: tail =>
+  queryFilter = 
value.toLowerCase(Locale.ROOT).split(",").map(_.trim).toSet
--- End diff --

Could you also make `"--data-location"` case insensitive?
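
One reading of this request is that the option names themselves should match regardless of case. A minimal sketch of that approach, assuming a simplified parser rather than the real `TPCDSQueryBenchmarkArguments` class (`ArgParseSketch` and `Parsed` are hypothetical names):

```scala
import java.util.Locale

object ArgParseSketch {
  final case class Parsed(
      dataLocation: Option[String] = None,
      queryFilter: Set[String] = Set.empty)

  @scala.annotation.tailrec
  def parse(args: List[String], acc: Parsed = Parsed()): Parsed = args match {
    case opt :: value :: tail =>
      // Lowercase only the option name, so "--Data-Location" and
      // "--data-location" are treated alike while the path keeps its case.
      opt.toLowerCase(Locale.ROOT) match {
        case "--data-location" =>
          parse(tail, acc.copy(dataLocation = Some(value)))
        case "--query-filter" =>
          val names = value.toLowerCase(Locale.ROOT)
            .split(",").map(_.trim).filter(_.nonEmpty).toSet
          parse(tail, acc.copy(queryFilter = names))
        case other =>
          throw new IllegalArgumentException(s"Unknown option: $other")
      }
    case Nil => acc
    case leftover =>
      throw new IllegalArgumentException(s"Missing value for: ${leftover.mkString(" ")}")
  }
}
```

Using `Locale.ROOT` keeps the lowercasing independent of the JVM's default locale, matching what the diff already does for the filter values.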





[GitHub] spark pull request #18975: [SPARK-4131] Support "Writing data into the files...

2017-09-13 Thread janewangfb
Github user janewangfb commented on a diff in the pull request:

https://github.com/apache/spark/pull/18975#discussion_r138681313
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/InsertIntoHiveDirCommand.scala
 ---
@@ -0,0 +1,131 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.hive.execution
+
+import scala.language.existentials
+
+import org.apache.hadoop.fs.{FileSystem, Path}
+import org.apache.hadoop.hive.common.FileUtils
+import org.apache.hadoop.hive.ql.plan.TableDesc
+import org.apache.hadoop.hive.serde.serdeConstants
+import org.apache.hadoop.hive.serde2.`lazy`.LazySimpleSerDe
+import org.apache.hadoop.mapred._
+
+import org.apache.spark.SparkException
+import org.apache.spark.sql.{Row, SparkSession}
+import org.apache.spark.sql.catalyst.TableIdentifier
+import org.apache.spark.sql.catalyst.catalog.{CatalogStorageFormat, CatalogTable}
+import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
+import org.apache.spark.sql.execution.SparkPlan
+import org.apache.spark.sql.hive.client.HiveClientImpl
+
+/**
+ * Command for writing the results of `query` to file system.
+ *
+ * The syntax of using this command in SQL is:
+ * {{{
+ *   INSERT OVERWRITE [LOCAL] DIRECTORY
+ *   path
+ *   [ROW FORMAT row_format]
+ *   [STORED AS file_format]
+ *   SELECT ...
+ * }}}
+ *
+ * @param isLocal whether the path specified in `storage` is a local directory
+ * @param storage storage format used to describe how the query result is stored.
+ * @param query the logical plan representing data to write to
+ * @param overwrite whether overwrites existing directory
+ */
+case class InsertIntoHiveDirCommand(
+isLocal: Boolean,
+storage: CatalogStorageFormat,
+query: LogicalPlan,
+overwrite: Boolean) extends SaveAsHiveFile with HiveTmpPath {
--- End diff --

@cloud-fan and gatorsmile, I will merge them together and submit a PR.





[GitHub] spark issue #19202: [SPARK-21980][SQL]References in grouping functions shoul...

2017-09-13 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/19202
  
Thanks! Merged to master.





[GitHub] spark issue #19202: [SPARK-21980][SQL]References in grouping functions shoul...

2017-09-13 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/19202
  
LGTM





[GitHub] spark pull request #18975: [SPARK-4131] Support "Writing data into the files...

2017-09-13 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/18975#discussion_r138680470
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/InsertIntoHiveDirCommand.scala
 ---
@@ -0,0 +1,131 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.hive.execution
+
+import scala.language.existentials
+
+import org.apache.hadoop.fs.{FileSystem, Path}
+import org.apache.hadoop.hive.common.FileUtils
+import org.apache.hadoop.hive.ql.plan.TableDesc
+import org.apache.hadoop.hive.serde.serdeConstants
+import org.apache.hadoop.hive.serde2.`lazy`.LazySimpleSerDe
+import org.apache.hadoop.mapred._
+
+import org.apache.spark.SparkException
+import org.apache.spark.sql.{Row, SparkSession}
+import org.apache.spark.sql.catalyst.TableIdentifier
+import org.apache.spark.sql.catalyst.catalog.{CatalogStorageFormat, CatalogTable}
+import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
+import org.apache.spark.sql.execution.SparkPlan
+import org.apache.spark.sql.hive.client.HiveClientImpl
+
+/**
+ * Command for writing the results of `query` to file system.
+ *
+ * The syntax of using this command in SQL is:
+ * {{{
+ *   INSERT OVERWRITE [LOCAL] DIRECTORY
+ *   path
+ *   [ROW FORMAT row_format]
+ *   [STORED AS file_format]
+ *   SELECT ...
+ * }}}
+ *
+ * @param isLocal whether the path specified in `storage` is a local directory
+ * @param storage storage format used to describe how the query result is stored.
+ * @param query the logical plan representing data to write to
+ * @param overwrite whether overwrites existing directory
+ */
+case class InsertIntoHiveDirCommand(
+    isLocal: Boolean,
+    storage: CatalogStorageFormat,
+    query: LogicalPlan,
+    overwrite: Boolean) extends SaveAsHiveFile with HiveTmpPath {
--- End diff --

Sure, will submit a follow-up PR soon. 


---




[GitHub] spark issue #18266: [SPARK-20427][SQL] Read JDBC table use custom schema

2017-09-13 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/18266
  
@wangyum Could you update the example in the PR description?


---




[GitHub] spark issue #19204: [SPARK-21981][PYTHON][ML] Added Python interface for Clu...

2017-09-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19204
  
**[Test build #81727 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81727/testReport)**
 for PR 19204 at commit 
[`5a6f9b4`](https://github.com/apache/spark/commit/5a6f9b42e34025188b08fa0a0eefa4e2ddc68509).
 * This patch **fails Python style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #19204: [SPARK-21981][PYTHON][ML] Added Python interface for Clu...

2017-09-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19204
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81727/
Test FAILed.


---




[GitHub] spark issue #19204: [SPARK-21981][PYTHON][ML] Added Python interface for Clu...

2017-09-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19204
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #19204: [SPARK-21981][PYTHON][ML] Added Python interface for Clu...

2017-09-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19204
  
**[Test build #81727 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81727/testReport)**
 for PR 19204 at commit 
[`5a6f9b4`](https://github.com/apache/spark/commit/5a6f9b42e34025188b08fa0a0eefa4e2ddc68509).


---




[GitHub] spark issue #19204: [SPARK-21981][PYTHON][ML] Added Python interface for Clu...

2017-09-13 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/19204
  
ok to test


---




[GitHub] spark issue #18266: [SPARK-20427][SQL] Read JDBC table use custom schema

2017-09-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18266
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #18266: [SPARK-20427][SQL] Read JDBC table use custom schema

2017-09-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18266
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81719/
Test PASSed.


---




[GitHub] spark issue #18266: [SPARK-20427][SQL] Read JDBC table use custom schema

2017-09-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18266
  
**[Test build #81719 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81719/testReport)**
 for PR 18266 at commit 
[`1fdf002`](https://github.com/apache/spark/commit/1fdf002b64ed381b31b6a4ba721357c647b11772).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #19211: [SPARK-18838][core] Add separate listener queues to Live...

2017-09-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19211
  
**[Test build #81726 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81726/testReport)**
 for PR 19211 at commit 
[`24f5c8d`](https://github.com/apache/spark/commit/24f5c8d0c78a8a362f4690ad03dac9dd07808f85).


---




[GitHub] spark issue #19216: [SPARK-21990][SQL] QueryPlanConstraints misses some cons...

2017-09-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19216
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #19216: [SPARK-21990][SQL] QueryPlanConstraints misses some cons...

2017-09-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19216
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81720/
Test PASSed.


---




[GitHub] spark issue #19216: [SPARK-21990][SQL] QueryPlanConstraints misses some cons...

2017-09-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19216
  
**[Test build #81720 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81720/testReport)**
 for PR 19216 at commit 
[`4e85e5f`](https://github.com/apache/spark/commit/4e85e5f6faa7903d72349f0fe69f5ea3d4df6070).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #19211: [SPARK-18838][core] Add separate listener queues to Live...

2017-09-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19211
  
**[Test build #81725 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81725/testReport)**
 for PR 19211 at commit 
[`cf5c6ce`](https://github.com/apache/spark/commit/cf5c6ce74c185ebd90ea0f9040b177c64161).


---




[GitHub] spark issue #19204: [SPARK-21981][PYTHON][ML] Added Python interface for Clu...

2017-09-13 Thread mgaido91
Github user mgaido91 commented on the issue:

https://github.com/apache/spark/pull/19204
  
Thanks @WeichenXu123, I added it.


---




[GitHub] spark issue #19211: [SPARK-18838][core] Add separate listener queues to Live...

2017-09-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19211
  
**[Test build #81724 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81724/testReport)**
 for PR 19211 at commit 
[`2915a5e`](https://github.com/apache/spark/commit/2915a5ec1bd9d4bc7a40b0ad20ca5b0db8f5382e).


---




[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning

2017-09-13 Thread felixcheung
Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/16578
  
thanks @rxin! let's keep this going then. I'm sure we can get this ready 
for more folks to review in a couple of weeks.

please feel free to ping this - will make sure to follow up.



---




[GitHub] spark pull request #19136: [SPARK-15689][SQL] data source v2 read path

2017-09-13 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/19136#discussion_r138665881
  
--- Diff: 
sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/DataReader.java 
---
@@ -0,0 +1,36 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.sources.v2.reader;
+
+import java.io.Closeable;
+
+/**
+ * A data reader returned by a read task and is responsible for outputting data for a RDD partition.
+ */
+public interface DataReader<T> extends Closeable {
--- End diff --

Document this and link it back to whatever method it is.

Also I'd still add an explicit init or open.
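
To illustrate the explicit-open suggestion, here is a minimal sketch only, not code from the PR. The interface `OpenableReader`, its `open`/`next`/`get` methods, and the in-memory `ListReader` are all hypothetical names chosen for this example. The point it shows is that resources are acquired in `open()` rather than in the constructor, so a reader can be created cheaply and fails loudly if it is used before being opened.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical interface for illustration; not the DataReader from the PR.
interface OpenableReader<T> extends Closeable {
  void open() throws IOException;    // acquire resources explicitly
  boolean next() throws IOException; // advance; returns false when exhausted
  T get();                           // the current record
}

// Toy in-memory implementation standing in for a real partition reader.
class ListReader<T> implements OpenableReader<T> {
  private final List<T> data;
  private Iterator<T> it; // stays null until open() is called
  private T current;

  ListReader(List<T> data) {
    this.data = data;
  }

  @Override
  public void open() {
    it = data.iterator();
  }

  @Override
  public boolean next() {
    if (it == null) {
      throw new IllegalStateException("open() must be called first");
    }
    if (!it.hasNext()) {
      return false;
    }
    current = it.next();
    return true;
  }

  @Override
  public T get() {
    return current;
  }

  @Override
  public void close() {
    it = null;
    current = null;
  }
}

class OpenableReaderDemo {
  // Runs the full open / next / get / close lifecycle and counts the records.
  static <T> int drain(OpenableReader<T> reader) {
    int count = 0;
    try (OpenableReader<T> r = reader) {
      r.open();
      while (r.next()) {
        r.get();
        count++;
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return count;
  }

  public static void main(String[] args) {
    System.out.println(drain(new ListReader<String>(Arrays.asList("a", "b", "c")))); // prints 3
  }
}
```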



---




[GitHub] spark issue #19136: [SPARK-15689][SQL] data source v2 read path

2017-09-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19136
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81717/
Test PASSed.


---




[GitHub] spark issue #19136: [SPARK-15689][SQL] data source v2 read path

2017-09-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19136
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #19136: [SPARK-15689][SQL] data source v2 read path

2017-09-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19136
  
**[Test build #81723 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81723/testReport)**
 for PR 19136 at commit 
[`abcc606`](https://github.com/apache/spark/commit/abcc606e006e9975d1507eed379a48a3134165ad).


---




[GitHub] spark issue #19136: [SPARK-15689][SQL] data source v2 read path

2017-09-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19136
  
**[Test build #81717 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81717/testReport)**
 for PR 19136 at commit 
[`4ff1b18`](https://github.com/apache/spark/commit/4ff1b18d3db9f50ba7f3d31288d0da37736d6b5f).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `public class DataSourceV2Options `
  * `class DataSourceRDDPartition(val index: Int, val readTask: ReadTask[UnsafeRow])`
  * `class DataSourceRDD(`
  * `case class DataSourceV2Relation(`
  * `case class DataSourceV2ScanExec(`
  * `class RowToUnsafeRowReadTask(rowReadTask: ReadTask[Row], schema: StructType)`
  * `class RowToUnsafeDataReader(rowReader: DataReader[Row], encoder: ExpressionEncoder[Row])`


---




[GitHub] spark issue #19136: [SPARK-15689][SQL] data source v2 read path

2017-09-13 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/19136
  
@yueawang these new push-downs are in my 
[prototype](https://github.com/cloud-fan/spark/pull/10). This PR is the first 
version of data source v2, so I'd like to cut down the patch size and only 
implement features that we already have in data source v1.


---




[GitHub] spark issue #19136: [SPARK-15689][SQL] data source v2 read path

2017-09-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19136
  
**[Test build #81722 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81722/testReport)**
 for PR 19136 at commit 
[`1e86d5c`](https://github.com/apache/spark/commit/1e86d5ca445d732af6ac651d49d391d5cd012a92).


---




[GitHub] spark pull request #19136: [SPARK-15689][SQL] data source v2 read path

2017-09-13 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/19136#discussion_r138652705
  
--- Diff: 
sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/DataReader.java 
---
@@ -0,0 +1,36 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.sources.v2.reader;
+
+import java.io.Closeable;
+
+/**
+ * A data reader returned by a read task and is responsible for outputting data for a RDD partition.
+ */
+public interface DataReader<T> extends Closeable {
--- End diff --

currently it can be `Row`, `UnsafeRow`, `ColumnarBatch`.
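
As a rough illustration of why the record type is a type parameter, here is a sketch under stated assumptions, not the PR's code. `RecordReader`, `InMemoryReader`, and `ConvertingReader` are hypothetical stand-ins, and `String`/`Integer` stand in for types such as `Row` and `UnsafeRow`. The adapter is similar in spirit to wrapping a reader of one record type to produce another.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Function;

// Simplified stand-in for the reader interface under review; T is the record type.
interface RecordReader<T> extends Closeable {
  boolean next();
  T get();
}

// Reads records from an in-memory list (a stand-in for one partition).
class InMemoryReader<T> implements RecordReader<T> {
  private final Iterator<T> it;
  private T current;

  InMemoryReader(List<T> rows) {
    this.it = rows.iterator();
  }

  @Override
  public boolean next() {
    if (!it.hasNext()) {
      return false;
    }
    current = it.next();
    return true;
  }

  @Override
  public T get() {
    return current;
  }

  @Override
  public void close() {}
}

// Adapts a reader of A into a reader of B by converting each record on access.
class ConvertingReader<A, B> implements RecordReader<B> {
  private final RecordReader<A> underlying;
  private final Function<A, B> convert;

  ConvertingReader(RecordReader<A> underlying, Function<A, B> convert) {
    this.underlying = underlying;
    this.convert = convert;
  }

  @Override
  public boolean next() {
    return underlying.next();
  }

  @Override
  public B get() {
    return convert.apply(underlying.get());
  }

  @Override
  public void close() throws IOException {
    underlying.close();
  }
}

class RecordReaderDemo {
  // Collects every record the reader produces into a list.
  static <T> List<T> collect(RecordReader<T> reader) {
    List<T> out = new ArrayList<>();
    while (reader.next()) {
      out.add(reader.get());
    }
    return out;
  }

  public static void main(String[] args) {
    RecordReader<Integer> lengths = new ConvertingReader<String, Integer>(
        new InMemoryReader<String>(Arrays.asList("a", "bb", "ccc")), String::length);
    System.out.println(collect(lengths)); // prints [1, 2, 3]
  }
}
```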


---




[GitHub] spark issue #18337: [SPARK-21131][GraphX] Fix batch gradient bug in SVDPlusP...

2017-09-13 Thread daniellaah
Github user daniellaah commented on the issue:

https://github.com/apache/spark/pull/18337
  
@lxmly deleted. I tested on some private data, and it turns out that the 
algorithm works well ... accidentally. 😂


---




[GitHub] spark issue #11494: [SPARK-10399][CORE][SQL] Introduce OffHeapMemoryBlock to...

2017-09-13 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/11494
  
Please go ahead. I think the author has gone inactive.


---




[GitHub] spark issue #19068: [SPARK-21428][SQL][FOLLOWUP]CliSessionState should point...

2017-09-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19068
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81721/
Test FAILed.


---




[GitHub] spark issue #19068: [SPARK-21428][SQL][FOLLOWUP]CliSessionState should point...

2017-09-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19068
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #19068: [SPARK-21428][SQL][FOLLOWUP]CliSessionState should point...

2017-09-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19068
  
**[Test build #81721 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81721/testReport)**
 for PR 19068 at commit 
[`9682eab`](https://github.com/apache/spark/commit/9682eabd4184340745e54b9eef8ac878ca942ba3).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #19210: Fix Graphite re-connects for Graphite instances behind E...

2017-09-13 Thread alexmnyc
Github user alexmnyc commented on the issue:

https://github.com/apache/spark/pull/19210
  
@jerryshao thanks. it's done.


---




[GitHub] spark issue #19218: [SPARK-21786][SQL] The 'spark.sql.parquet.compression.co...

2017-09-13 Thread maropu
Github user maropu commented on the issue:

https://github.com/apache/spark/pull/19218
  
Could you add tests? You could probably insert some data and then check whether 
the data is compressed by listing the files in a temp dir.
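
A minimal sketch of the file-listing check, in plain Java without Spark so that it stays self-contained. It assumes, and this is an assumption rather than something stated in this thread, that written data files are named `part-...` and carry the codec in the file name (for example `.snappy.parquet`). The class `CompressionCheckSketch` and its helpers are hypothetical. A file-name check is only a proxy; a stricter test would inspect the Parquet footer metadata.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class CompressionCheckSketch {

  // Creates a temp dir containing empty files with the given names (fixture only).
  static Path tempDirWith(String... names) {
    try {
      Path dir = Files.createTempDirectory("codec-check");
      for (String name : names) {
        Files.createFile(dir.resolve(name));
      }
      return dir;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  // True if the directory holds at least one "part-" file and every such
  // file name contains the codec marker, e.g. ".snappy".
  static boolean allPartFilesUse(Path dir, String codecMarker) {
    try (Stream<Path> files = Files.list(dir)) {
      List<String> parts = files
          .map(p -> p.getFileName().toString())
          .filter(name -> name.startsWith("part-"))
          .collect(Collectors.toList());
      return !parts.isEmpty() && parts.stream().allMatch(n -> n.contains(codecMarker));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    Path ok = tempDirWith("_SUCCESS", "part-00000-abc.snappy.parquet");
    Path bad = tempDirWith("part-00000-abc.snappy.parquet", "part-00001-abc.parquet");
    System.out.println(allPartFilesUse(ok, ".snappy"));  // prints true
    System.out.println(allPartFilesUse(bad, ".snappy")); // prints false
  }
}
```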


---




[GitHub] spark issue #11494: [SPARK-10399][CORE][SQL] Introduce OffHeapMemoryBlock to...

2017-09-13 Thread kiszk
Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/11494
  
Although we pinged @yzotov, there has been no response for a very long 
time. As @cloud-fan pointed out, this PR seems to be a good refactoring. I am 
willing to take over this refactoring from @yzotov if no one has 
concerns.
What do you think? cc: @HyukjinKwon, @cloud-fan, @jiangxb1987



---




[GitHub] spark issue #19068: [SPARK-21428][SQL][FOLLOWUP]CliSessionState should point...

2017-09-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19068
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #19068: [SPARK-21428][SQL][FOLLOWUP]CliSessionState should point...

2017-09-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19068
  
**[Test build #81718 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81718/testReport)**
 for PR 19068 at commit 
[`267a1b2`](https://github.com/apache/spark/commit/267a1b2f5bb83b4f20810f704105c0d996b71e93).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #19068: [SPARK-21428][SQL][FOLLOWUP]CliSessionState should point...

2017-09-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19068
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81718/
Test FAILed.


---




[GitHub] spark issue #19068: [SPARK-21428][SQL][FOLLOWUP]CliSessionState should point...

2017-09-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19068
  
**[Test build #81721 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81721/testReport)**
 for PR 19068 at commit 
[`9682eab`](https://github.com/apache/spark/commit/9682eabd4184340745e54b9eef8ac878ca942ba3).


---




[GitHub] spark issue #19216: [SPARK-21990][SQL] QueryPlanConstraints misses some cons...

2017-09-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19216
  
**[Test build #81720 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81720/testReport)**
 for PR 19216 at commit 
[`4e85e5f`](https://github.com/apache/spark/commit/4e85e5f6faa7903d72349f0fe69f5ea3d4df6070).


---




[GitHub] spark pull request #19068: [SPARK-21428][SQL][FOLLOWUP]CliSessionState shoul...

2017-09-13 Thread yaooqinn
Github user yaooqinn commented on a diff in the pull request:

https://github.com/apache/spark/pull/19068#discussion_r138625628
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveUtils.scala 
---
@@ -232,6 +232,54 @@ private[spark] object HiveUtils extends Logging {
   }
 
   /**
+   * Generate an instance of [[HiveConf]] from [[SparkConf]]& hadoop [[Configuration]] &
+   * formatted extra time configurations with an isolated classloader needed if isolationOn
+   * for [[HiveClient]] construction
+   * @param sparkConf a [[SparkConf]] object specifying Spark parameters
+   * @param classLoader an isolated classloader needed if isolationOn for [[HiveClient]]
+   *                    construction
+   * @param hadoopConf a hadoop [[Configuration]] object, Optional if we want generated it from
+   *                   the sparkConf
+   * @param extraTimeConfs time configurations in the form of long values from the given hadoopConf
+   */
+
+  private[hive] def newHiveConfigurations(
+      sparkConf: SparkConf = new SparkConf(loadDefaults = true),
+      classLoader: ClassLoader = null)(
+      hadoopConf: Configuration = SparkHadoopUtil.get.newConfiguration(sparkConf))(
+      extraTimeConfs: Map[String, String] = formatTimeVarsForHiveClient(hadoopConf)): HiveConf = {
--- End diff --

OK



---




[GitHub] spark issue #19136: [SPARK-15689][SQL] data source v2

2017-09-13 Thread yueawang
Github user yueawang commented on the issue:

https://github.com/apache/spark/pull/19136
  
@cloud-fan, when I looked at this PR last week, I remember that you had also 
implemented some new pushdowns like sort or limit. Were they removed in the 
latest commit? Any concerns? Thanks!


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19068: [SPARK-21428][SQL][FOLLOWUP]CliSessionState shoul...

2017-09-13 Thread yaooqinn
Github user yaooqinn commented on a diff in the pull request:

https://github.com/apache/spark/pull/19068#discussion_r138625678
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveUtils.scala 
---
@@ -232,6 +232,54 @@ private[spark] object HiveUtils extends Logging {
   }
 
   /**
+   * Generate an instance of [[HiveConf]] from [[SparkConf]]& hadoop [[Configuration]] &
+   * formatted extra time configurations with an isolated classloader needed if isolationOn
+   * for [[HiveClient]] construction
+   * @param sparkConf a [[SparkConf]] object specifying Spark parameters
+   * @param classLoader an isolated classloader needed if isolationOn for [[HiveClient]]
+   *                    construction
+   * @param hadoopConf a hadoop [[Configuration]] object, Optional if we want generated it from
+   *                   the sparkConf
+   * @param extraTimeConfs time configurations in the form of long values from the given hadoopConf
--- End diff --

OK



---




[GitHub] spark issue #19106: [SPARK-21770][ML] ProbabilisticClassificationModel fix c...

2017-09-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19106
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #19106: [SPARK-21770][ML] ProbabilisticClassificationModel fix c...

2017-09-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19106
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81716/
Test PASSed.


---




[GitHub] spark issue #19106: [SPARK-21770][ML] ProbabilisticClassificationModel fix c...

2017-09-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19106
  
**[Test build #81716 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81716/testReport)**
 for PR 19106 at commit 
[`d661caa`](https://github.com/apache/spark/commit/d661caae8fbb7e09f7b862a045d7ddf0d086eb89).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark pull request #19136: [SPARK-15689][SQL] data source v2

2017-09-13 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/19136#discussion_r138624261
  
--- Diff: 
sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/upward/Statistics.java
 ---
@@ -0,0 +1,28 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.sources.v2.reader.upward;
--- End diff --

this package name is really confusing. maybe just put all of them in the 
v2.reader package. There aren't that many classes ... if you are worried about 
discoverability, use a common interface, or create a top level class and put 
the interfaces there.
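
For illustration, the "top level class" option could look roughly like the sketch below. Every name here (`ReaderSupports`, `ColumnPruning`, `SizeEstimate`, `ToySource`) is hypothetical, and `List<String>` stands in for a schema type. The idea is only that the nested interfaces share one discoverable namespace without needing separate sub-packages.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical grouping class: a non-instantiable holder for related mix-ins.
final class ReaderSupports {
  private ReaderSupports() {}

  // Mix-in for sources that can drop columns the query does not need.
  interface ColumnPruning {
    void pruneColumns(List<String> requiredColumns);
  }

  // Mix-in for sources that can report a size estimate.
  interface SizeEstimate {
    long sizeInBytes();
  }
}

// A toy source opting in to both mix-ins.
class ToySource implements ReaderSupports.ColumnPruning, ReaderSupports.SizeEstimate {
  private List<String> columns = new ArrayList<>(Arrays.asList("id", "name", "payload"));

  @Override
  public void pruneColumns(List<String> requiredColumns) {
    columns.retainAll(requiredColumns); // keep only what was asked for
  }

  @Override
  public long sizeInBytes() {
    return 8L * columns.size(); // made-up estimate: 8 bytes per remaining column
  }

  List<String> columns() {
    return columns;
  }
}

class ReaderSupportsDemo {
  public static void main(String[] args) {
    ToySource source = new ToySource();
    source.pruneColumns(Arrays.asList("id", "name"));
    System.out.println(source.columns());     // prints [id, name]
    System.out.println(source.sizeInBytes()); // prints 16
  }
}
```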


---




[GitHub] spark pull request #19136: [SPARK-15689][SQL] data source v2

2017-09-13 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/19136#discussion_r138623586
  
--- Diff: 
sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/downward/ColumnPruningSupport.java
 ---
@@ -0,0 +1,36 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.sources.v2.reader.downward;
+
+import org.apache.spark.sql.types.StructType;
+
+/**
+ * A mix-in interface for `DataSourceV2Reader`. Users can implement this 
interface to only read the
+ * required columns/nested fields during scan.
+ */
+public interface ColumnPruningSupport {
+
+  /**
+   * Apply column pruning w.r.t. the given requiredSchema.
+   *
+   * Implementation should try its best to prune the unnecessary columns/nested fields, but it's
+   * also OK to do the pruning partially, e.g., a data source may not be able to prune nested
+   * fields, and only prune top-level columns.
+   */
+  void pruneColumns(StructType requiredSchema);
--- End diff --

Link this to the `readSchema` function.


---




[GitHub] spark pull request #19136: [SPARK-15689][SQL] data source v2

2017-09-13 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/19136#discussion_r138622262
  
--- Diff: 
sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/DataReader.java 
---
@@ -0,0 +1,36 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.sources.v2.reader;
+
+import java.io.Closeable;
+
+/**
+ * A data reader returned by a read task and is responsible for outputting data for a RDD partition.
+ */
+public interface DataReader<T> extends Closeable {
--- End diff --

What can `T` be?


---




[GitHub] spark pull request #19136: [SPARK-15689][SQL] data source v2

2017-09-13 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/19136#discussion_r138622067
  
--- Diff: 
sql/core/src/main/java/org/apache/spark/sql/sources/v2/SchemaRequiredDataSourceV2.java
 ---
@@ -0,0 +1,42 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.sources.v2;
+
+import org.apache.spark.sql.sources.v2.reader.DataSourceV2Reader;
+import org.apache.spark.sql.types.StructType;
+
+/**
+ * A variant of `DataSourceV2` which requires users to provide a schema when reading data. A data
+ * source can inherit both `DataSourceV2` and `SchemaRequiredDataSourceV2` if it supports both schema
+ * inference and user-specified schemas.
+ */
+public interface SchemaRequiredDataSourceV2 {
--- End diff --

I personally find this divergence at the top pretty confusing ...


---




[GitHub] spark pull request #19136: [SPARK-15689][SQL] data source v2

2017-09-13 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/19136#discussion_r138621970
  
--- Diff: 
sql/core/src/main/java/org/apache/spark/sql/sources/v2/SchemaRequiredDataSourceV2.java
 ---
@@ -0,0 +1,42 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.sources.v2;
+
+import org.apache.spark.sql.sources.v2.reader.DataSourceV2Reader;
+import org.apache.spark.sql.types.StructType;
+
+/**
+ * A variant of `DataSourceV2` which requires users to provide a schema when reading data. A data
+ * source can inherit both `DataSourceV2` and `SchemaRequiredDataSourceV2` if it supports both schema
+ * inference and user-specified schemas.
+ */
+public interface SchemaRequiredDataSourceV2 {
--- End diff --

What's an example of such a data source?


---




[GitHub] spark pull request #19136: [SPARK-15689][SQL] data source v2

2017-09-13 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/19136#discussion_r138621700
  
--- Diff: 
sql/core/src/main/java/org/apache/spark/sql/sources/v2/DataSourceV2Options.java 
---
@@ -0,0 +1,49 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.sources.v2;
+
+import java.util.HashMap;
+import java.util.Locale;
+import java.util.Map;
+import java.util.Optional;
+
+/**
+ * An immutable case-insensitive string-to-string map, which is used to represent data source
--- End diff --

We need to be clear that only the keys are case-insensitive; the values are 
case-preserving.
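
To illustrate the distinction, here is a minimal sketch of a map that lower-cases keys on the way in and on lookup while storing values untouched. `CaseInsensitiveOptions` is a hypothetical name used only for this example; it is not the `DataSourceV2Options` class from the PR, whose body is not shown in this thread.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

// Hypothetical illustration of "case-insensitive keys, case-preserving values".
final class CaseInsensitiveOptions {
  private final Map<String, String> keyLowerCasedMap;

  CaseInsensitiveOptions(Map<String, String> originalMap) {
    keyLowerCasedMap = new HashMap<>(originalMap.size());
    for (Map.Entry<String, String> entry : originalMap.entrySet()) {
      // Only the key is normalized; the value is copied exactly as given.
      keyLowerCasedMap.put(entry.getKey().toLowerCase(Locale.ROOT), entry.getValue());
    }
  }

  // Looks up a key regardless of its case; empty if the key is absent.
  Optional<String> get(String key) {
    return Optional.ofNullable(keyLowerCasedMap.get(key.toLowerCase(Locale.ROOT)));
  }
}
```

A lookup with `"PATH"` on a map built from `{"Path": "/Data/MyFile.CSV"}` would return the value with its original casing, which is the behavior the Javadoc should spell out. `Locale.ROOT` is used so that key normalization does not depend on the JVM's default locale.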


---




[GitHub] spark pull request #19136: [SPARK-15689][SQL] data source v2

2017-09-13 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/19136#discussion_r138621506
  
--- Diff: 
sql/core/src/main/java/org/apache/spark/sql/sources/v2/DataSourceV2Options.java 
---
@@ -0,0 +1,49 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.sources.v2;
+
+import java.util.HashMap;
+import java.util.Locale;
+import java.util.Map;
+import java.util.Optional;
+
+/**
+ * An immutable case-insensitive string-to-string map, which is used to represent data source
+ * options.
+ */
+public class DataSourceV2Options {
--- End diff --

add a simple test suite for this


---




[GitHub] spark issue #18266: [SPARK-20427][SQL] Read JDBC table use custom schema

2017-09-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18266
  
**[Test build #81719 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81719/testReport)** for PR 18266 at commit [`1fdf002`](https://github.com/apache/spark/commit/1fdf002b64ed381b31b6a4ba721357c647b11772).


---




[GitHub] spark pull request #18266: [SPARK-20427][SQL] Read JDBC table use custom sch...

2017-09-13 Thread wangyum
Github user wangyum commented on a diff in the pull request:

https://github.com/apache/spark/pull/18266#discussion_r138621092
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCRDD.scala
 ---
@@ -80,7 +80,7 @@ object JDBCRDD extends Logging {
* @return A Catalyst schema corresponding to columns in the given order.
*/
   private def pruneSchema(schema: StructType, columns: Array[String]): StructType = {
-    val fieldMap = Map(schema.fields.map(x => x.metadata.getString("name") -> x): _*)
+    val fieldMap = Map(schema.fields.map(x => x.name -> x): _*)

It seems safe to remove this line.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning

2017-09-13 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/16578
  
I tried this and it is definitely super useful! It's a big patch, and most 
of the people working in this area are either doing something else that's not 
Spark, or working on a few high-priority SPIPs (e.g. vectorized UDFs in Python, 
data source API v2), so it might take a bit for people to come around to review 
...


---




[GitHub] spark pull request #19068: [SPARK-21428][SQL][FOLLOWUP]CliSessionState shoul...

2017-09-13 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/19068#discussion_r138619511
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveUtils.scala 
---
@@ -232,6 +232,54 @@ private[spark] object HiveUtils extends Logging {
   }
 
   /**
+   * Generate an instance of [[HiveConf]] from [[SparkConf]]& hadoop [[Configuration]] &
+   * formatted extra time configurations with an isolated classloader needed if isolationOn
+   * for [[HiveClient]] construction
+   * @param sparkConf a [[SparkConf]] object specifying Spark parameters
+   * @param classLoader an isolated classloader needed if isolationOn for [[HiveClient]]
+   *construction
+   * @param hadoopConf a hadoop [[Configuration]] object, Optional if we want generated it from
+   *   the sparkConf
+   * @param extraTimeConfs time configurations in the form of long values from the given hadoopConf
+   */
+
+  private[hive] def newHiveConfigurations(
+  sparkConf: SparkConf = new SparkConf(loadDefaults = true),
+  classLoader: ClassLoader = null)(
+  hadoopConf: Configuration = SparkHadoopUtil.get.newConfiguration(sparkConf))(
+  extraTimeConfs: Map[String, String] = formatTimeVarsForHiveClient(hadoopConf)): HiveConf = {
--- End diff --

How about we remove these default values and explicitly specify them in 
https://github.com/apache/spark/pull/19068/files#diff-f7aac41bf732c1ba1edbac436d331a55R84?


---




[GitHub] spark pull request #19068: [SPARK-21428][SQL][FOLLOWUP]CliSessionState shoul...

2017-09-13 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/19068#discussion_r138615099
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveUtils.scala 
---
@@ -232,6 +232,54 @@ private[spark] object HiveUtils extends Logging {
   }
 
   /**
+   * Generate an instance of [[HiveConf]] from [[SparkConf]]& hadoop [[Configuration]] &
+   * formatted extra time configurations with an isolated classloader needed if isolationOn
+   * for [[HiveClient]] construction
+   * @param sparkConf a [[SparkConf]] object specifying Spark parameters
+   * @param classLoader an isolated classloader needed if isolationOn for [[HiveClient]]
+   *construction
+   * @param hadoopConf a hadoop [[Configuration]] object, Optional if we want generated it from
+   *   the sparkConf
+   * @param extraTimeConfs time configurations in the form of long values from the given hadoopConf
--- End diff --

It's not only time configs; I think we'd better call it `config`, following 
`IsolatedClientLoader.config`.


---




[GitHub] spark issue #19068: [SPARK-21428][SQL][FOLLOWUP]CliSessionState should point...

2017-09-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19068
  
**[Test build #81718 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81718/testReport)** for PR 19068 at commit [`267a1b2`](https://github.com/apache/spark/commit/267a1b2f5bb83b4f20810f704105c0d996b71e93).


---




[GitHub] spark pull request #19182: [SPARK-21970][Core] Fix Redundant Throws Declarat...

2017-09-13 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/19182


---




[GitHub] spark issue #19182: [SPARK-21970][Core] Fix Redundant Throws Declarations in...

2017-09-13 Thread srowen
Github user srowen commented on the issue:

https://github.com/apache/spark/pull/19182
  
Merged to master


---




[GitHub] spark issue #19068: [SPARK-21428][SQL][FOLLOWUP]CliSessionState should point...

2017-09-13 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/19068
  
ok to test


---




[GitHub] spark issue #19136: [SPARK-15689][SQL] data source v2

2017-09-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19136
  
**[Test build #81717 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81717/testReport)** for PR 19136 at commit [`4ff1b18`](https://github.com/apache/spark/commit/4ff1b18d3db9f50ba7f3d31288d0da37736d6b5f).


---




[GitHub] spark issue #19220: [SPARK-18608][ML][FOLLOWUP] Fix double caching for PySpa...

2017-09-13 Thread yanboliang
Github user yanboliang commented on the issue:

https://github.com/apache/spark/pull/19220
  
cc @zhengruifeng @jkbradley @WeichenXu123 


---



