[GitHub] spark issue #15497: [Test][SPARK-16002][Follow-up] Fix flaky test in Streami...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15497

**[Test build #67006 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67006/consoleFull)** for PR 15497 at commit [`5bc47b6`](https://github.com/apache/spark/commit/5bc47b639ede049f44ad4f47a88d26219fea6193).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.

--- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #15497: [Test][SPARK-16002][Follow-up] Fix flaky test in Streami...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15497 Merged build finished. Test FAILed.
[GitHub] spark issue #15497: [Test][SPARK-16002][Follow-up] Fix flaky test in Streami...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15497 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67006/
[GitHub] spark pull request #14847: [SPARK-17254][SQL] Add StopAfter physical plan fo...
Github user viirya closed the pull request at: https://github.com/apache/spark/pull/14847
[GitHub] spark pull request #15498: [SPARK-17953] [Documentation] Fix typo in SparkSe...
GitHub user tae-jun opened a pull request: https://github.com/apache/spark/pull/15498

[SPARK-17953] [Documentation] Fix typo in SparkSession scaladoc

## What changes were proposed in this pull request?

### Before:
```scala
SparkSession.builder()
  .master("local")
  .appName("Word Count")
  .config("spark.some.config.option", "some-value").
  .getOrCreate()
```

### After:
```scala
SparkSession.builder()
  .master("local")
  .appName("Word Count")
  .config("spark.some.config.option", "some-value")
  .getOrCreate()
```

There was one unexpected dot!

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/tae-jun/spark SPARK-17953

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/15498.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #15498

commit bd4e125dd09a0937f050a4a6a6db18859fc235f0
Author: Jun Kim
Date: 2016-10-15T07:26:15Z

    Fix typo in SparkSession scaladoc

    There was one unexpected dot!
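The stray trailing dot after `.config(...)` leaves the chain syntactically incomplete, so the "Before" snippet would not compile: a dot must be immediately followed by a member selection. As a rough illustration of how such a fluent chain works (a hypothetical stand-in builder written for this note, not Spark's actual `SparkSession.Builder` API):

```java
// Hypothetical minimal fluent builder mimicking the SparkSession.builder()
// chain from the scaladoc; names and behavior are illustrative only.
public class BuilderDemo {
    static final class Builder {
        private final StringBuilder settings = new StringBuilder();
        Builder master(String m) { settings.append("master=").append(m).append(';'); return this; }
        Builder appName(String n) { settings.append("app=").append(n).append(';'); return this; }
        Builder config(String k, String v) { settings.append(k).append('=').append(v).append(';'); return this; }
        String getOrCreate() { return settings.toString(); }
    }

    public static void main(String[] args) {
        String session = new Builder()
            .master("local")
            .appName("Word Count")
            .config("spark.some.config.option", "some-value")  // no trailing dot here
            .getOrCreate();
        System.out.println(session);
        // prints: master=local;app=Word Count;spark.some.config.option=some-value;
    }
}
```

Because each setter returns `this`, the whole chain is a single expression; that is also why the extra dot in the original scaladoc broke it rather than being silently ignored.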
[GitHub] spark issue #15498: [SPARK-17953] [Documentation] Fix typo in SparkSession s...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15498 Can one of the admins verify this patch?
[GitHub] spark issue #15498: [SPARK-17953] [Documentation] Fix typo in SparkSession s...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/15498 Thanks - merging in master/branch-2.0.
[GitHub] spark issue #15497: [Test][SPARK-16002][Follow-up] Fix flaky test in Streami...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15497 **[Test build #3343 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3343/consoleFull)** for PR 15497 at commit [`5bc47b6`](https://github.com/apache/spark/commit/5bc47b639ede049f44ad4f47a88d26219fea6193).
[GitHub] spark pull request #15498: [SPARK-17953] [Documentation] Fix typo in SparkSe...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/15498
[GitHub] spark issue #15480: [SPARK-16845][SQL] `GeneratedClass$SpecificOrdering` gro...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/15480 cc @ueshin want to help review this?
[GitHub] spark issue #15319: [SPARK-17733][SQL] InferFiltersFromConstraints rule neve...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15319

**[Test build #67007 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67007/consoleFull)** for PR 15319 at commit [`1558d4c`](https://github.com/apache/spark/commit/1558d4c2f9190691239e9b27e9517714c2af2bcc).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #15319: [SPARK-17733][SQL] InferFiltersFromConstraints rule neve...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15319 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67007/
[GitHub] spark issue #15319: [SPARK-17733][SQL] InferFiltersFromConstraints rule neve...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15319 Merged build finished. Test FAILed.
[GitHub] spark issue #15471: [WIP][SPARK-17919] Make timeout to RBackend configurable...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15471

**[Test build #67005 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67005/consoleFull)** for PR 15471 at commit [`6f15a15`](https://github.com/apache/spark/commit/6f15a1541f01429ae19237252c600b108722ecb4).
* This patch **fails from timeout after a configured wait of `250m`**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #15471: [WIP][SPARK-17919] Make timeout to RBackend configurable...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15471 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67005/
[GitHub] spark issue #15471: [WIP][SPARK-17919] Make timeout to RBackend configurable...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15471 Merged build finished. Test FAILed.
[GitHub] spark issue #15474: [DO_NOT_MERGE] Test netty
Github user srowen commented on the issue: https://github.com/apache/spark/pull/15474 Did you clear your locally built artifacts first? Maybe that's the difference. The Jenkins test here hits the same problem I was seeing.
[GitHub] spark pull request #15480: [SPARK-16845][SQL] `GeneratedClass$SpecificOrderi...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/15480#discussion_r83528941 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/GenerateOrdering.scala --- @@ -118,7 +118,45 @@ object GenerateOrdering extends CodeGenerator[Seq[SortOrder], Ordering[InternalR } """ }.mkString("\n") -comparisons + +/* --- End diff -- Instead of this implementation, is it possible to use [`this function`](https://github.com/apache/spark/blob/b1b47274bfeba17a9e4e9acebd7385289f31f6c8/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala#L613) by adding `return` as a default argument?
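For context, SPARK-16845 is about generated ordering code growing past the JVM's 64KB-per-method bytecode limit, and both the PR's approach and the `CodeGenerator` helper kiszk points to amount to splitting the generated comparisons across several smaller methods. A toy sketch of that splitting idea (illustrative only; method and class names here are made up, not Spark's actual codegen):

```java
import java.util.ArrayList;
import java.util.List;

public class SplitDemo {
    // Instead of emitting one huge method body, group generated comparison
    // snippets into chunks, each destined for its own small helper method,
    // so no single generated method exceeds the bytecode size limit.
    static List<String> splitIntoMethods(List<String> snippets, int perMethod) {
        List<String> methods = new ArrayList<>();
        StringBuilder body = new StringBuilder();
        for (int i = 0; i < snippets.size(); i++) {
            body.append(snippets.get(i)).append('\n');
            boolean chunkFull = (i + 1) % perMethod == 0;
            boolean last = i == snippets.size() - 1;
            if (chunkFull || last) {
                methods.add("private int compare" + methods.size() + "() {\n" + body + "}");
                body.setLength(0);
            }
        }
        return methods;
    }

    public static void main(String[] args) {
        List<String> snippets = new ArrayList<>();
        for (int i = 0; i < 5; i++) snippets.add("// comparison " + i);
        // 5 snippets at 2 per method -> 3 helper methods
        System.out.println(splitIntoMethods(snippets, 2).size());
    }
}
```

Real codegen splitters also have to thread state (the running comparison result) between helpers, which is where kiszk's question about an early `return` comes in.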
[GitHub] spark pull request #15495: [SPARK-17620][SQL] Determine Serde by hive.defaul...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/15495#discussion_r83528964

--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLQuerySuite.scala ---
@@ -587,6 +594,30 @@ class SQLQuerySuite extends QueryTest with SQLTestUtils with TestHiveSingleton {
     }
   }

+  test("CTAS with default fileformat") {
+    val table = "ctas1"
+    val ctas = s"CREATE TABLE IF NOT EXISTS $table SELECT key k, value FROM src"
+    withSQLConf(SQLConf.CONVERT_CTAS.key -> "true") {
+      withSQLConf("hive.default.fileformat" -> "textfile") {
+        withTable(table) {
+          sql(ctas)
+          // We should use parquet here as that is the default datasource fileformat. The default
+          // datasource file format is controlled by `spark.sql.sources.default` configuration.
+          // This testcase verifies that setting `hive.default.fileformat` has no impact on
+          // the target table's fileformat in case of CTAS.
+          assert(sessionState.conf.defaultDataSourceName === "parquet")
+          checkRelation(tableName = table, isDataSourceTable = true, format = "parquet")
--- End diff --

As I know, we can't trigger it. Maybe @yhuai will know it? You can compile it with Scala 2.10 locally to make sure it passes.
[GitHub] spark issue #15218: [SPARK-17637][Scheduler]Packed scheduling for Spark task...
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/15218 I am assuming @kayousterhout does not have comments on this. Can you please fix the conflict, @zhzhan? I will merge it into master after that.
[GitHub] spark issue #15497: [Test][SPARK-16002][Follow-up] Fix flaky test in Streami...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15497

**[Test build #3343 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3343/consoleFull)** for PR 15497 at commit [`5bc47b6`](https://github.com/apache/spark/commit/5bc47b639ede049f44ad4f47a88d26219fea6193).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #15408: [SPARK-17839][CORE] Use Nio's directbuffer instea...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/15408#discussion_r83529473 --- Diff: core/src/main/java/org/apache/spark/io/NioBufferedFileInputStream.java --- @@ -0,0 +1,142 @@ +/* + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.spark.io; + +import org.apache.spark.storage.StorageUtils; + +import javax.annotation.concurrent.ThreadSafe; +import java.io.File; +import java.io.IOException; +import java.io.InputStream; +import java.nio.ByteBuffer; +import java.nio.channels.FileChannel; +import java.nio.file.StandardOpenOption; + +/** + * {@link InputStream} implementation which uses direct buffer + * to read a file to avoid extra copy of data between Java and + * native memory which happens when using {@link java.io.BufferedInputStream}. + * Unfortunately, this is not something already available in JDK, + * {@link sun.nio.ch.ChannelInputStream} supports reading a file using nio, + * but does not support buffering. + * + * TODO: support {@link #mark(int)}/{@link #reset()} + * + */ +@ThreadSafe --- End diff -- I only see one location in the codebase where we use this annotation, and I think we probably shouldn't use it at all if not used consistently.
[GitHub] spark pull request #15408: [SPARK-17839][CORE] Use Nio's directbuffer instea...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/15408#discussion_r83529504 --- Diff: core/src/main/java/org/apache/spark/io/NioBufferedFileInputStream.java --- @@ -0,0 +1,142 @@ +/* + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.spark.io; + +import org.apache.spark.storage.StorageUtils; + +import javax.annotation.concurrent.ThreadSafe; +import java.io.File; +import java.io.IOException; +import java.io.InputStream; +import java.nio.ByteBuffer; +import java.nio.channels.FileChannel; +import java.nio.file.StandardOpenOption; + +/** + * {@link InputStream} implementation which uses direct buffer + * to read a file to avoid extra copy of data between Java and + * native memory which happens when using {@link java.io.BufferedInputStream}. + * Unfortunately, this is not something already available in JDK, + * {@link sun.nio.ch.ChannelInputStream} supports reading a file using nio, + * but does not support buffering. 
+ * + * TODO: support {@link #mark(int)}/{@link #reset()} + * + */ +@ThreadSafe +public final class NioBufferedFileInputStream extends InputStream { + + private static int DEFAULT_BUFFER_SIZE_BYTES = 8192; + + private final ByteBuffer byteBuffer; + + private final FileChannel fileChannel; + + public NioBufferedFileInputStream(File file, int bufferSizeInBytes) throws IOException { +byteBuffer = ByteBuffer.allocateDirect(bufferSizeInBytes); +fileChannel = FileChannel.open(file.toPath(), StandardOpenOption.READ); +byteBuffer.flip(); + } + + public NioBufferedFileInputStream(File file) throws IOException { +this(file, DEFAULT_BUFFER_SIZE_BYTES); + } + + /** + * Checks weather data is left to be read from the input stream. + * @return true if data is left, false otherwise + * @throws IOException + */ + private boolean refill() throws IOException { +if (!byteBuffer.hasRemaining()) { + byteBuffer.clear(); + int nRead = 0; + while (nRead == 0) { +nRead = fileChannel.read(byteBuffer); + } + if (nRead < 0) { +return false; + } + byteBuffer.flip(); +} +return true; + } + + @Override + public synchronized int read() throws IOException { +if (!refill()) { + return -1; +} +return byteBuffer.get() & 0xFF; + } + + @Override + public synchronized int read(byte[] b, int offset, int len) throws IOException { +if (offset < 0 || len < 0 || (offset + len) < 0 || (b.length - (offset + len)) < 0) { --- End diff -- Hardly matters, but now that this condition has been made more explicit, the final condition is simpler as `offset + len > b.length`
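For reference, the original guard `(b.length - (offset + len)) < 0` and srowen's suggested `offset + len > b.length` are equivalent whenever `offset + len` does not overflow `int`, and the separate `(offset + len) < 0` term is what catches the overflow case. A small standalone sketch of the check (a hypothetical helper written for this note, not the PR's actual class):

```java
public class BoundsCheck {
    // Mirrors the read(byte[], int, int) guard discussed above.
    // The (offset + len) < 0 term catches int overflow; the last term is
    // srowen's simplified range check, valid once overflow is ruled out.
    static boolean isInvalid(byte[] b, int offset, int len) {
        return offset < 0 || len < 0 || (offset + len) < 0 || offset + len > b.length;
    }

    public static void main(String[] args) {
        System.out.println(isInvalid(new byte[8], 0, 8));                 // false: exactly fits
        System.out.println(isInvalid(new byte[8], 4, 5));                 // true: 4 + 5 > 8
        System.out.println(isInvalid(new byte[8], Integer.MAX_VALUE, 2)); // true: sum wraps negative
    }
}
```

Without the overflow term, `Integer.MAX_VALUE + 2` would wrap to a negative value and slip past a naive `offset + len > b.length` comparison.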
[GitHub] spark pull request #15408: [SPARK-17839][CORE] Use Nio's directbuffer instea...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/15408#discussion_r83529508 --- Diff: core/src/main/java/org/apache/spark/io/NioBufferedFileInputStream.java --- @@ -0,0 +1,142 @@ +/* + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.spark.io; + +import org.apache.spark.storage.StorageUtils; + +import javax.annotation.concurrent.ThreadSafe; +import java.io.File; +import java.io.IOException; +import java.io.InputStream; +import java.nio.ByteBuffer; +import java.nio.channels.FileChannel; +import java.nio.file.StandardOpenOption; + +/** + * {@link InputStream} implementation which uses direct buffer + * to read a file to avoid extra copy of data between Java and + * native memory which happens when using {@link java.io.BufferedInputStream}. + * Unfortunately, this is not something already available in JDK, + * {@link sun.nio.ch.ChannelInputStream} supports reading a file using nio, + * but does not support buffering. 
+ * + * TODO: support {@link #mark(int)}/{@link #reset()} + * + */ +@ThreadSafe +public final class NioBufferedFileInputStream extends InputStream { + + private static int DEFAULT_BUFFER_SIZE_BYTES = 8192; + + private final ByteBuffer byteBuffer; + + private final FileChannel fileChannel; + + public NioBufferedFileInputStream(File file, int bufferSizeInBytes) throws IOException { +byteBuffer = ByteBuffer.allocateDirect(bufferSizeInBytes); +fileChannel = FileChannel.open(file.toPath(), StandardOpenOption.READ); +byteBuffer.flip(); + } + + public NioBufferedFileInputStream(File file) throws IOException { +this(file, DEFAULT_BUFFER_SIZE_BYTES); + } + + /** + * Checks weather data is left to be read from the input stream. + * @return true if data is left, false otherwise + * @throws IOException + */ + private boolean refill() throws IOException { +if (!byteBuffer.hasRemaining()) { + byteBuffer.clear(); + int nRead = 0; + while (nRead == 0) { +nRead = fileChannel.read(byteBuffer); + } + if (nRead < 0) { +return false; + } + byteBuffer.flip(); +} +return true; + } + + @Override + public synchronized int read() throws IOException { +if (!refill()) { + return -1; +} +return byteBuffer.get() & 0xFF; + } + + @Override + public synchronized int read(byte[] b, int offset, int len) throws IOException { +if (offset < 0 || len < 0 || (offset + len) < 0 || (b.length - (offset + len)) < 0) { + throw new IndexOutOfBoundsException(); +} +if (!refill()) { + return -1; +} +len = Math.min(len, byteBuffer.remaining()); +byteBuffer.get(b, offset, len); +return len; + } + + @Override + public synchronized int available() throws IOException { +return byteBuffer.remaining(); + } + + @Override + public synchronized long skip(long n) throws IOException { +if (n <= 0L) { + return 0L; +} +if (byteBuffer.remaining() >= n) { + // The buffered content is enough to skip + byteBuffer.position(byteBuffer.position() + (int) n); + return n; +} +long skippedFromBuffer = byteBuffer.remaining(); +long 
toSkipFromFileChannel = n - skippedFromBuffer; +// Discard everything we have read in the buffer. +byteBuffer.position(0); +byteBuffer.flip(); +return skippedFromBuffer + skipFromFileChannel(toSkipFromFileChannel); + } + + private long skipFromFileChannel(long n) throws IOException { +long currentFilePosition = fileChannel.position(); +long size = fileChannel.size(); +if (n > size - currentFilePosition) { + fileChannel.position(size); + return size - currentFilePosition; +} else { + fileChannel.position(currentFilePosition + n); + return n; +} + } + + @Override + public synchronized void close() throws IOException { +fileChannel.close(); +
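The quoted `skip` implementation splits the work in two: bytes still sitting in the buffer are skipped by advancing the buffer's position, and any remainder by moving the file channel's position, clamped at the file size. A self-contained sketch of that two-step logic (simplified, with hypothetical names; a `long[]` cell stands in for `FileChannel.position()`/`size()` so the sketch runs without file I/O):

```java
public class SkipDemo {
    // Two-step skip: consume buffered bytes first, then advance the "channel"
    // position, clamped so we never seek past the end of the file.
    static long skip(java.nio.ByteBuffer buf, long[] channelPos, long fileSize, long n) {
        if (n <= 0L) return 0L;
        if (buf.remaining() >= n) {
            buf.position(buf.position() + (int) n); // buffered data is enough
            return n;
        }
        long fromBuffer = buf.remaining();
        buf.position(buf.limit());                  // discard the rest of the buffer
        long want = n - fromBuffer;
        long canSkip = Math.min(want, fileSize - channelPos[0]);
        channelPos[0] += canSkip;                   // clamp at end of file
        return fromBuffer + canSkip;
    }

    public static void main(String[] args) {
        java.nio.ByteBuffer buf = java.nio.ByteBuffer.wrap(new byte[4]); // 4 buffered bytes
        long[] pos = {4};  // channel has already read past the buffered region
        System.out.println(skip(buf, pos, 10, 8)); // 4 from buffer + 4 from channel = 8
    }
}
```

The clamping mirrors the PR's `skipFromFileChannel`: if fewer than `n` bytes remain in the file, only the available bytes are skipped and the smaller count is returned.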
[GitHub] spark pull request #15408: [SPARK-17839][CORE] Use Nio's directbuffer instea...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/15408#discussion_r83529483 --- Diff: core/src/main/java/org/apache/spark/io/NioBufferedFileInputStream.java --- @@ -0,0 +1,142 @@ +/* + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.spark.io; + +import org.apache.spark.storage.StorageUtils; + +import javax.annotation.concurrent.ThreadSafe; +import java.io.File; +import java.io.IOException; +import java.io.InputStream; +import java.nio.ByteBuffer; +import java.nio.channels.FileChannel; +import java.nio.file.StandardOpenOption; + +/** + * {@link InputStream} implementation which uses direct buffer + * to read a file to avoid extra copy of data between Java and + * native memory which happens when using {@link java.io.BufferedInputStream}. + * Unfortunately, this is not something already available in JDK, + * {@link sun.nio.ch.ChannelInputStream} supports reading a file using nio, + * but does not support buffering. + * + * TODO: support {@link #mark(int)}/{@link #reset()} --- End diff -- I don't really care, but this could be a comment inside the class rather than user-facing. In fact I don't even know it's a to-do.
[GitHub] spark pull request #15408: [SPARK-17839][CORE] Use Nio's directbuffer instea...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/15408#discussion_r83529476 --- Diff: core/src/main/java/org/apache/spark/io/NioBufferedFileInputStream.java --- @@ -0,0 +1,142 @@ +/* + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.spark.io; + +import org.apache.spark.storage.StorageUtils; + +import javax.annotation.concurrent.ThreadSafe; +import java.io.File; +import java.io.IOException; +import java.io.InputStream; +import java.nio.ByteBuffer; +import java.nio.channels.FileChannel; +import java.nio.file.StandardOpenOption; + +/** + * {@link InputStream} implementation which uses direct buffer + * to read a file to avoid extra copy of data between Java and + * native memory which happens when using {@link java.io.BufferedInputStream}. + * Unfortunately, this is not something already available in JDK, + * {@link sun.nio.ch.ChannelInputStream} supports reading a file using nio, + * but does not support buffering. + * + * TODO: support {@link #mark(int)}/{@link #reset()} + * + */ +@ThreadSafe +public final class NioBufferedFileInputStream extends InputStream { + + private static int DEFAULT_BUFFER_SIZE_BYTES = 8192; --- End diff -- Oops, forgot to say this should be `final` --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. 
[GitHub] spark issue #15411: Set master URL configuration in scala example
Github user srowen commented on the issue: https://github.com/apache/spark/pull/15411 Hm, so I see several more examples that get included in the documentation and don't set a master. I am not sure that is a salient difference, because in general, when writing your own app, you would not hard-code a master in the code either. The examples evidently don't set a master for this reason, so I'm not sure we should make this change.
[GitHub] spark issue #15450: [SPARK-3261] [MLLIB] KMeans clusterer can return duplica...
Github user srowen commented on the issue: https://github.com/apache/spark/pull/15450 @sethah I wanted to check how strongly you are against this kind of change, and to continue the discussion here.
[GitHub] spark issue #15450: [SPARK-3261] [MLLIB] KMeans clusterer can return duplica...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15450 **[Test build #67009 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67009/consoleFull)** for PR 15450 at commit [`ab486c1`](https://github.com/apache/spark/commit/ab486c121d759272a7a38b64fa25ec9a8de12647).
[GitHub] spark pull request #15480: [SPARK-16845][SQL] `GeneratedClass$SpecificOrderi...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/15480#discussion_r83530039

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/GenerateOrdering.scala ---

```scala
@@ -118,7 +118,45 @@ object GenerateOrdering extends CodeGenerator[Seq[SortOrder], Ordering[InternalR
         }
       """
     }.mkString("\n")
-    comparisons
+
+    /*
+     * 40 = 7000 bytes / 170 (around 170 bytes per ordering comparison).
+     * The maximum byte code size to be compiled for HotSpot is 8000 bytes.
+     * We should keep less than 8000 bytes.
+     */
+    val numberOfComparisonsThreshold = 40
+
+    if (ordering.size <= numberOfComparisonsThreshold) {
+      s"""
+         | InternalRow ${ctx.INPUT_ROW} = null;  // Holds current row being evaluated.
+         | ${comparisons(ordering)}
+       """.stripMargin
+    } else {
+      val groupedOrderingItr = ordering.grouped(numberOfComparisonsThreshold)
+      var groupedOrderingLength = 0
+      groupedOrderingItr.zipWithIndex.foreach { case (orderingGroup, i) =>
+        groupedOrderingLength += 1
+        val funcName = s"compare_$i"
```

--- End diff --

We need to use a fresh name for `funcName` or its prefix (see [here](https://github.com/apache/spark/blob/b1b47274bfeba17a9e4e9acebd7385289f31f6c8/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala#L634)).
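ueshin's point is that generated helper names must be unique within a compilation unit; Spark's code generator does this by handing out "fresh" names with a per-prefix counter. A rough Java sketch of that idea (a hypothetical class, not Spark's actual `CodeGenerator` implementation):

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of fresh-name generation: each requested prefix
// gets a monotonically increasing suffix, so two call sites asking for the
// same prefix can never collide.
final class FreshNames {
  private final Map<String, Integer> counters = new HashMap<>();

  String fresh(String prefix) {
    // merge() increments the counter, creating it at 1 on first use.
    int n = counters.merge(prefix, 1, Integer::sum);
    return prefix + "_" + (n - 1);
  }
}
```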
[GitHub] spark issue #14936: [SPARK-7877][MESOS] Allow configuration of framework tim...
Github user philipphoffmann commented on the issue: https://github.com/apache/spark/pull/14936 Alright, I changed the implementation to keep the existing defaults.
[GitHub] spark issue #14936: [SPARK-7877][MESOS] Allow configuration of framework tim...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14936 **[Test build #67010 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67010/consoleFull)** for PR 14936 at commit [`dec052c`](https://github.com/apache/spark/commit/dec052cac905697595193e98a1d855ccf0c37704).
[GitHub] spark pull request #15480: [SPARK-16845][SQL] `GeneratedClass$SpecificOrderi...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/15480#discussion_r83530625

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/GenerateOrdering.scala ---

```scala
@@ -118,7 +118,45 @@ object GenerateOrdering extends CodeGenerator[Seq[SortOrder], Ordering[InternalR
      // ... (context as in the previous diff quote) ...
+      groupedOrderingItr.zipWithIndex.foreach { case (orderingGroup, i) =>
+        groupedOrderingLength += 1
+        val funcName = s"compare_$i"
+        val funcCode =
+          s"""
+             |private int $funcName(InternalRow a, InternalRow b) {
+             |  InternalRow ${ctx.INPUT_ROW} = null;  // Holds current row being evaluated.
+             |  ${comparisons(orderingGroup)}
+             |  return 0;
+             |}
+           """.stripMargin
+        ctx.addNewFunction(funcName, funcCode)
+      }
+
+      (0 to groupedOrderingLength - 1).map { i =>
```

--- End diff --

nit: use `(0 until groupedOrderingLength)`.
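The code this diff generates splits one long lexicographic comparison into fixed-size helper functions (`compare_0`, `compare_1`, ...) and chains them, returning at the first nonzero result, so that no single method exceeds HotSpot's JIT-compilation size limit. A hedged Java sketch of the same pattern, hand-written rather than generated:

```java
// Sketch of splitting a long lexicographic comparison into small helper
// methods and chaining them, mirroring the generated compare_i functions.
final class GroupedComparator {
  // Stands in for numberOfComparisonsThreshold (40 in the diff).
  private static final int GROUP_SIZE = 2;

  static int compare(int[] a, int[] b) {
    // Chain the group comparators; stop at the first decisive result.
    for (int start = 0; start < a.length; start += GROUP_SIZE) {
      int result = compareGroup(a, b, start, Math.min(start + GROUP_SIZE, a.length));
      if (result != 0) {
        return result;
      }
    }
    return 0;
  }

  // Plays the role of one generated compare_i helper.
  private static int compareGroup(int[] a, int[] b, int from, int to) {
    for (int i = from; i < to; i++) {
      int c = Integer.compare(a[i], b[i]);
      if (c != 0) {
        return c;
      }
    }
    return 0;
  }
}
```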
[GitHub] spark issue #15450: [SPARK-3261] [MLLIB] KMeans clusterer can return duplica...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15450 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67009/
[GitHub] spark issue #15450: [SPARK-3261] [MLLIB] KMeans clusterer can return duplica...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15450 Merged build finished. Test PASSed.
[GitHub] spark issue #15450: [SPARK-3261] [MLLIB] KMeans clusterer can return duplica...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15450 **[Test build #67009 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67009/consoleFull)** for PR 15450 at commit [`ab486c1`](https://github.com/apache/spark/commit/ab486c121d759272a7a38b64fa25ec9a8de12647). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #14936: [SPARK-7877][MESOS] Allow configuration of framework tim...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14936 **[Test build #67010 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67010/consoleFull)** for PR 14936 at commit [`dec052c`](https://github.com/apache/spark/commit/dec052cac905697595193e98a1d855ccf0c37704). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #14936: [SPARK-7877][MESOS] Allow configuration of framework tim...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14936 Merged build finished. Test PASSed.
[GitHub] spark issue #14936: [SPARK-7877][MESOS] Allow configuration of framework tim...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14936 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67010/
[GitHub] spark pull request #15499: [SPARK-17955][SQL] Make DataFrameReader.jdbc call...
GitHub user HyukjinKwon opened a pull request: https://github.com/apache/spark/pull/15499

[SPARK-17955][SQL] Make DataFrameReader.jdbc call DataFrameReader.format("jdbc").load

## What changes were proposed in this pull request?

This PR proposes to make `DataFrameReader.jdbc` call `DataFrameReader.format("jdbc").load`, consistently with the other APIs in `DataFrameReader`/`DataFrameWriter`, and avoid calling `sparkSession.baseRelationToDataFrame(..)` here and there. The changes were mostly copied from `DataFrameWriter.jdbc()`, which was recently updated.

```
-    val params = extraOptions.toMap ++ connectionProperties.asScala.toMap
-    val options = new JDBCOptions(url, table, params)
-    val relation = JDBCRelation(parts, options)(sparkSession)
-    sparkSession.baseRelationToDataFrame(relation)
+    this.extraOptions = this.extraOptions ++ connectionProperties.asScala
+    // explicit url and dbtable should override all
+    this.extraOptions += ("url" -> url, "dbtable" -> table)
+    format("jdbc").load()
```

## How was this patch tested?

Existing tests should cover this.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/HyukjinKwon/spark SPARK-17955

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/15499.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #15499

commit aa8cd3507d6ae79c0b2c93a077e655880d10a01c
Author: hyukjinkwon
Date: 2016-10-15T12:36:01Z

    Use the same read path in DataFrameReader.jdbc and DataFrameReader.format("jdbc")

commit 0d6e2d1aa6a3348c4fad5256b7580364499d3daf
Author: hyukjinkwon
Date: 2016-10-15T12:39:11Z

    Add missing dots
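The comment in the diff, "explicit url and dbtable should override all", is an ordering guarantee on map merging: the connection properties are folded in first, and the explicitly passed values are put last so they win on duplicate keys. A small Java illustration of that merge order (hypothetical helper, not Spark code):

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Properties;

// Illustrates the merge order in the diff: extra options first, then
// connection properties, then the explicit url/dbtable overwrite any
// duplicates because they are put last.
final class JdbcOptionsMergeSketch {
  static Map<String, String> merge(Map<String, String> extraOptions,
                                   Properties connectionProperties,
                                   String url, String table) {
    Map<String, String> merged = new LinkedHashMap<>(extraOptions);
    for (String name : connectionProperties.stringPropertyNames()) {
      merged.put(name, connectionProperties.getProperty(name));
    }
    // Explicit url and dbtable override everything merged so far.
    merged.put("url", url);
    merged.put("dbtable", table);
    return merged;
  }
}
```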
[GitHub] spark issue #15499: [SPARK-17955][SQL] Make DataFrameReader.jdbc call DataFr...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15499 **[Test build #67011 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67011/consoleFull)** for PR 15499 at commit [`0d6e2d1`](https://github.com/apache/spark/commit/0d6e2d1aa6a3348c4fad5256b7580364499d3daf).
[GitHub] spark issue #15361: [SPARK-17765][SQL] Support for writing out user-defined ...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/15361 Hi @chenghao-intel and @davies, it seems the related code paths were updated by you before. Do you mind taking a look, please?
[GitHub] spark issue #15049: [SPARK-17310][SQL] Add an option to disable record-level...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/15049 ping @liancheng and @yhuai...
[GitHub] spark issue #14947: [SPARK-17388][SQL] Support for inferring type date/times...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/14947 ping @davies ..
[GitHub] spark issue #14660: [SPARK-17071][SQL] Add an option to support for reading ...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/14660 @liancheng Would there be other things maybe I should take care of?
[GitHub] spark issue #15354: [SPARK-17764][SQL] Add `to_json` supporting to convert n...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15354 **[Test build #67012 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67012/consoleFull)** for PR 15354 at commit [`38d89a6`](https://github.com/apache/spark/commit/38d89a6ab04b9181f7be818a7ee6cf0bd77e2c69).
[GitHub] spark pull request #15500: [SPARK-17956][SQL] Fix projection output ordering
GitHub user viirya opened a pull request: https://github.com/apache/spark/pull/15500

[SPARK-17956][SQL] Fix projection output ordering

## What changes were proposed in this pull request?

Currently `ProjectExec` simply takes its child plan's `outputOrdering` as its own `outputOrdering`. In some cases this can lead to an incorrect `outputOrdering`. The same argument applies to `TakeOrderedAndProjectExec`.

## How was this patch tested?

Jenkins tests.

Please review https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark before opening a pull request.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/viirya/spark-1 project-sort-order

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/15500.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #15500

commit 58142f136533fe91956deee9575d7bf48164865b
Author: Liang-Chi Hsieh
Date: 2016-10-15T13:27:27Z

    Fix projection output ordering.
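Why a projection can invalidate the child's ordering: if the projection drops or rewrites the expressions the child was sorted on, the child's `outputOrdering` no longer describes the projected rows. A toy Java illustration of the dropped-sort-key case (illustrative only, not Spark code):

```java
import java.util.ArrayList;
import java.util.List;

// A child relation sorted on column 0 (the key). Projecting only column 1
// yields output that is NOT sorted, so a projection must not blindly claim
// its child's ordering when the ordering expressions are projected away.
final class ProjectionOrderingSketch {
  static List<Integer> projectSecondColumn(List<int[]> rowsSortedByFirst) {
    List<Integer> out = new ArrayList<>();
    for (int[] row : rowsSortedByFirst) {
      out.add(row[1]); // drop the sort key, keep the other column
    }
    return out;
  }

  static boolean isSorted(List<Integer> xs) {
    for (int i = 1; i < xs.size(); i++) {
      if (xs.get(i - 1) > xs.get(i)) {
        return false;
      }
    }
    return true;
  }
}
```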
[GitHub] spark issue #15500: [SPARK-17956][SQL] Fix projection output ordering
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15500 **[Test build #67013 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67013/consoleFull)** for PR 15500 at commit [`58142f1`](https://github.com/apache/spark/commit/58142f136533fe91956deee9575d7bf48164865b).
[GitHub] spark pull request #15501: Branch 2.0
GitHub user lastbus opened a pull request: https://github.com/apache/spark/pull/15501

Branch 2.0

## What changes were proposed in this pull request?

(Please fill in changes proposed in this fix)

## How was this patch tested?

(Please explain how this patch was tested. E.g. unit tests, integration tests, manual tests) (If this patch involves UI changes, please attach a screenshot; otherwise, remove this)

Please review https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark before opening a pull request.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/apache/spark branch-2.0

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/15501.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #15501

commit 0297896119e11f23da4b14f62f50ec72b5fac57f
Author: Junyang Qian
Date: 2016-08-20T13:59:23Z

[SPARK-16508][SPARKR] Fix CRAN undocumented/duplicated arguments warnings.

This PR tries to fix all the remaining "undocumented/duplicated arguments" warnings given by CRAN-check. One left is the doc for R `stats::glm` exported in SparkR. To mute that warning, we have to also provide documentation for all arguments of that non-SparkR function. Some previous conversation is in #14558.

R unit test and `check-cran.sh` script (with no-test).

Author: Junyang Qian
Closes #14705 from junyangq/SPARK-16508-master.
(cherry picked from commit 01401e965b58f7e8ab615764a452d7d18f1d4bf0)
Signed-off-by: Shivaram Venkataraman

commit e62b29f29f44196a1cbe13004ff4abfd8e5be1c1
Author: Dongjoon Hyun
Date: 2016-08-21T20:07:47Z

[SPARK-17098][SQL] Fix `NullPropagation` optimizer to handle `COUNT(NULL) OVER` correctly

## What changes were proposed in this pull request?

Currently, the `NullPropagation` optimizer replaces `COUNT` on null literals in a bottom-up fashion.
During that, `WindowExpression` is not covered properly. This PR adds the missing propagation logic.

**Before**

```scala
scala> sql("SELECT COUNT(1 + NULL) OVER ()").show
java.lang.UnsupportedOperationException: Cannot evaluate expression: cast(0 as bigint) windowspecdefinition(ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
```

**After**

```scala
scala> sql("SELECT COUNT(1 + NULL) OVER ()").show
+-----------------------------------------------------------------------------------------------+
|count((1 + CAST(NULL AS INT))) OVER (ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)|
+-----------------------------------------------------------------------------------------------+
|                                                                                              0|
+-----------------------------------------------------------------------------------------------+
```

## How was this patch tested?

Pass the Jenkins test with a new test case.

Author: Dongjoon Hyun
Closes #14689 from dongjoon-hyun/SPARK-17098.
(cherry picked from commit 91c2397684ab791572ac57ffb2a924ff058bb64f)
Signed-off-by: Herman van Hovell

commit 49cc44de3ad5495b2690633791941aa00a62b553
Author: Davies Liu
Date: 2016-08-22T08:16:03Z

[SPARK-17115][SQL] decrease the threshold when split expressions

## What changes were proposed in this pull request?

In 2.0, we changed the threshold for splitting expressions from 16K to 64K, which causes very bad performance on wide tables, because the generated method can't be JIT-compiled by default (above the limit of 8K bytecode). This PR decreases it to 1K, based on the benchmark results for a wide table with 400 columns of LongType. It also fixes a bug around splitting expressions in whole-stage codegen (it should not split them).

## How was this patch tested?

Added benchmark suite.

Author: Davies Liu
Closes #14692 from davies/split_exprs.
(cherry picked from commit 8d35a6f68d6d733212674491cbf31bed73fada0f) Signed-off-by: Wenchen Fan commit 2add45fabeb0ea4f7b17b5bc4910161370e72627 Author: Jagadeesan Date: 2016-08-22T08:30:31Z [SPARK-17085][STREAMING][DOCUMENTATION AND ACTUAL CODE DIFFERS - UNSUPPORTED OPERATIONS] Changes in Spark Structured Streaming doc in this link https://spark.apache.org/docs/2.0.0/structured-streaming-programming-guide.html#unsupported-operations Author: Jagadeesan Closes #14715 from jagadeesanas2/SPARK-17085. (cherry picked from commit bd9655063bdba8836b4ec96ed115e5653e246b65) Signed-off-by: Sean Owen commit 79195982a4c6f8b1a3e02069dea00049cc806574 Author: Junyang Qian Date: 2016-08-22T
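The SPARK-17098 fix above rests on the SQL semantics that `COUNT(expr)` counts only the rows where `expr` evaluates to non-NULL, so a count over an expression that is always NULL (like `1 + NULL`) can be constant-folded to 0. A hedged Java sketch of that semantics (illustrative only, not the optimizer code):

```java
import java.util.List;

// COUNT(expr) in SQL counts rows where expr is non-NULL; if expr is the
// constant NULL, the count is 0 regardless of how many rows there are,
// which is exactly what NullPropagation folds the expression to.
final class SqlCountSketch {
  static long count(List<Integer> evaluated) {
    long n = 0;
    for (Integer v : evaluated) {
      if (v != null) {
        n++;
      }
    }
    return n;
  }
}
```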
[GitHub] spark issue #15501: Branch 2.0
Github user srowen commented on the issue: https://github.com/apache/spark/pull/15501 @lastbus close this please
[GitHub] spark issue #15501: Branch 2.0
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15501 Can one of the admins verify this patch?
[GitHub] spark pull request #15408: [SPARK-17839][CORE] Use Nio's directbuffer instea...
Github user sitalkedia commented on a diff in the pull request: https://github.com/apache/spark/pull/15408#discussion_r83532883

--- Diff: core/src/main/java/org/apache/spark/io/NioBufferedFileInputStream.java ---

```java
// ... (license header and imports as quoted above) ...
+/**
+ * {@link InputStream} implementation which uses direct buffer
+ * to read a file to avoid extra copy of data between Java and
+ * native memory which happens when using {@link java.io.BufferedInputStream}.
+ * Unfortunately, this is not something already available in JDK,
+ * {@link sun.nio.ch.ChannelInputStream} supports reading a file using nio,
+ * but does not support buffering.
+ *
+ * TODO: support {@link #mark(int)}/{@link #reset()}
+ */
```

--- End diff --

Okay, I removed the TODO here.
[GitHub] spark pull request #15408: [SPARK-17839][CORE] Use Nio's directbuffer instea...
Github user sitalkedia commented on a diff in the pull request: https://github.com/apache/spark/pull/15408#discussion_r83532901

--- Diff: core/src/main/java/org/apache/spark/io/NioBufferedFileInputStream.java ---

```java
// ... (license header, imports, and javadoc as quoted above) ...
+@ThreadSafe
```

--- End diff --

Alright, removed it for consistency.
[GitHub] spark pull request #15408: [SPARK-17839][CORE] Use Nio's directbuffer instea...
Github user sitalkedia commented on a diff in the pull request: https://github.com/apache/spark/pull/15408#discussion_r83532904 --- Diff: core/src/main/java/org/apache/spark/io/NioBufferedFileInputStream.java --- @@ -0,0 +1,142 @@ +/* + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.spark.io; + +import org.apache.spark.storage.StorageUtils; + +import javax.annotation.concurrent.ThreadSafe; +import java.io.File; +import java.io.IOException; +import java.io.InputStream; +import java.nio.ByteBuffer; +import java.nio.channels.FileChannel; +import java.nio.file.StandardOpenOption; + +/** + * {@link InputStream} implementation which uses direct buffer + * to read a file to avoid extra copy of data between Java and + * native memory which happens when using {@link java.io.BufferedInputStream}. + * Unfortunately, this is not something already available in JDK, + * {@link sun.nio.ch.ChannelInputStream} supports reading a file using nio, + * but does not support buffering. + * + * TODO: support {@link #mark(int)}/{@link #reset()} + * + */ +@ThreadSafe +public final class NioBufferedFileInputStream extends InputStream { + + private static int DEFAULT_BUFFER_SIZE_BYTES = 8192; --- End diff -- done --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. 
[GitHub] spark issue #15408: [SPARK-17839][CORE] Use Nio's directbuffer instead of Bu...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15408 **[Test build #67014 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67014/consoleFull)** for PR 15408 at commit [`f1f108f`](https://github.com/apache/spark/commit/f1f108f3bffaa9cecbca37dcb6a818b45174e3d3).
[GitHub] spark pull request #15408: [SPARK-17839][CORE] Use Nio's directbuffer instea...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/15408#discussion_r83532952 --- Diff: core/src/main/java/org/apache/spark/io/NioBufferedFileInputStream.java --- @@ -0,0 +1,138 @@ +/* + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.spark.io; + +import org.apache.spark.storage.StorageUtils; + +import javax.annotation.concurrent.ThreadSafe; +import java.io.File; +import java.io.IOException; +import java.io.InputStream; +import java.nio.ByteBuffer; +import java.nio.channels.FileChannel; +import java.nio.file.StandardOpenOption; + +/** + * {@link InputStream} implementation which uses direct buffer + * to read a file to avoid extra copy of data between Java and + * native memory which happens when using {@link java.io.BufferedInputStream}. + * Unfortunately, this is not something already available in JDK, + * {@link sun.nio.ch.ChannelInputStream} supports reading a file using nio, + * but does not support buffering. 
+ */ +public final class NioBufferedFileInputStream extends InputStream { + + private static final int DEFAULT_BUFFER_SIZE_BYTES = 8192; + + private final ByteBuffer byteBuffer; + + private final FileChannel fileChannel; + + public NioBufferedFileInputStream(File file, int bufferSizeInBytes) throws IOException { +byteBuffer = ByteBuffer.allocateDirect(bufferSizeInBytes); +fileChannel = FileChannel.open(file.toPath(), StandardOpenOption.READ); +byteBuffer.flip(); + } + + public NioBufferedFileInputStream(File file) throws IOException { +this(file, DEFAULT_BUFFER_SIZE_BYTES); + } + + /** + * Checks weather data is left to be read from the input stream. + * @return true if data is left, false otherwise + * @throws IOException + */ + private boolean refill() throws IOException { +if (!byteBuffer.hasRemaining()) { + byteBuffer.clear(); + int nRead = 0; + while (nRead == 0) { +nRead = fileChannel.read(byteBuffer); + } + if (nRead < 0) { +return false; + } + byteBuffer.flip(); +} +return true; + } + + @Override + public synchronized int read() throws IOException { +if (!refill()) { + return -1; +} +return byteBuffer.get() & 0xFF; + } + + @Override + public synchronized int read(byte[] b, int offset, int len) throws IOException { +if (offset < 0 || len < 0 || (b.length - (offset + len)) < 0) { --- End diff -- Ah no I think that condition was needed. I mean: `if (offset < 0 || len < 0 || offset + len < 0 || offset + len > b.length) {` --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
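The overflow case srowen is guarding against here can be seen in a small standalone sketch (the class and method names below are illustrative, not part of the PR). With `int` arithmetic, `offset + len` can wrap around to a negative value, and `b.length - (offset + len)` can then wrap back to a positive one, so the subtraction-based check alone lets an invalid request through:

```java
public class BoundsCheckDemo {

    // Weaker check: relies only on b.length - (offset + len) < 0 to catch
    // out-of-range requests, as discussed in the review thread.
    static boolean rejectsWeak(byte[] b, int offset, int len) {
        return offset < 0 || len < 0 || (b.length - (offset + len)) < 0;
    }

    // Full check with the explicit offset + len < 0 overflow guard.
    static boolean rejectsFull(byte[] b, int offset, int len) {
        return offset < 0 || len < 0 || offset + len < 0 || offset + len > b.length;
    }

    public static void main(String[] args) {
        byte[] b = new byte[8];
        // Integer.MAX_VALUE + Integer.MAX_VALUE overflows to -2, so
        // b.length - (offset + len) == 10, which is not negative.
        int offset = Integer.MAX_VALUE;
        int len = Integer.MAX_VALUE;
        System.out.println(rejectsWeak(b, offset, len)); // false: invalid request slips through
        System.out.println(rejectsFull(b, offset, len)); // true: caught by the overflow guard
    }
}
```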
[GitHub] spark pull request #15408: [SPARK-17839][CORE] Use Nio's directbuffer instea...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/15408#discussion_r83532939 --- Diff: core/src/main/java/org/apache/spark/io/NioBufferedFileInputStream.java --- @@ -0,0 +1,138 @@ +/* + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.spark.io; + +import org.apache.spark.storage.StorageUtils; + +import javax.annotation.concurrent.ThreadSafe; --- End diff -- Nit^2 : no longer needed as an import --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #15408: [SPARK-17839][CORE] Use Nio's directbuffer instea...
Github user sitalkedia commented on a diff in the pull request: https://github.com/apache/spark/pull/15408#discussion_r83532978 --- Diff: core/src/main/java/org/apache/spark/io/NioBufferedFileInputStream.java --- @@ -0,0 +1,142 @@ +/* + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.spark.io; + +import org.apache.spark.storage.StorageUtils; + +import javax.annotation.concurrent.ThreadSafe; +import java.io.File; +import java.io.IOException; +import java.io.InputStream; +import java.nio.ByteBuffer; +import java.nio.channels.FileChannel; +import java.nio.file.StandardOpenOption; + +/** + * {@link InputStream} implementation which uses direct buffer + * to read a file to avoid extra copy of data between Java and + * native memory which happens when using {@link java.io.BufferedInputStream}. + * Unfortunately, this is not something already available in JDK, + * {@link sun.nio.ch.ChannelInputStream} supports reading a file using nio, + * but does not support buffering. 
+ * + * TODO: support {@link #mark(int)}/{@link #reset()} + * + */ +@ThreadSafe +public final class NioBufferedFileInputStream extends InputStream { + + private static int DEFAULT_BUFFER_SIZE_BYTES = 8192; + + private final ByteBuffer byteBuffer; + + private final FileChannel fileChannel; + + public NioBufferedFileInputStream(File file, int bufferSizeInBytes) throws IOException { +byteBuffer = ByteBuffer.allocateDirect(bufferSizeInBytes); +fileChannel = FileChannel.open(file.toPath(), StandardOpenOption.READ); +byteBuffer.flip(); + } + + public NioBufferedFileInputStream(File file) throws IOException { +this(file, DEFAULT_BUFFER_SIZE_BYTES); + } + + /** + * Checks weather data is left to be read from the input stream. + * @return true if data is left, false otherwise + * @throws IOException + */ + private boolean refill() throws IOException { +if (!byteBuffer.hasRemaining()) { + byteBuffer.clear(); + int nRead = 0; + while (nRead == 0) { +nRead = fileChannel.read(byteBuffer); + } + if (nRead < 0) { +return false; + } + byteBuffer.flip(); +} +return true; + } + + @Override + public synchronized int read() throws IOException { +if (!refill()) { + return -1; +} +return byteBuffer.get() & 0xFF; + } + + @Override + public synchronized int read(byte[] b, int offset, int len) throws IOException { +if (offset < 0 || len < 0 || (offset + len) < 0 || (b.length - (offset + len)) < 0) { --- End diff -- We still need to check if `offset` and `len` is less than 0 right? Removed the` offset + len < 0` condition because that is covered in the last condition `(b.length - (offset + len)) < 0` --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. 
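The `refill()` method quoted above follows the standard NIO clear/read/flip cycle: clear the buffer to make it writable, fill it from the channel, then flip it so the bytes just read become readable. A minimal standalone sketch of that cycle (file name, contents, and helper names are illustrative; the nonblocking `nRead == 0` retry loop from the PR is omitted since a regular-file read does not return 0):

```java
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.StandardOpenOption;

public class RefillDemo {

    // Reads a whole file through a small direct buffer using the same
    // clear/read/flip cycle as the refill() method under review.
    static String readAll(File file, int bufferSize) throws IOException {
        ByteBuffer buf = ByteBuffer.allocateDirect(bufferSize);
        buf.flip(); // start with an empty (fully consumed) buffer, as the constructor does
        StringBuilder out = new StringBuilder();
        try (FileChannel channel = FileChannel.open(file.toPath(), StandardOpenOption.READ)) {
            while (true) {
                if (!buf.hasRemaining()) {
                    buf.clear();                   // make the whole buffer writable again
                    int nRead = channel.read(buf); // fill from the channel
                    if (nRead < 0) {
                        break;                     // end of file
                    }
                    buf.flip();                    // expose the bytes just read
                }
                out.append((char) (buf.get() & 0xFF));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) throws IOException {
        File file = File.createTempFile("refill-demo", ".tmp");
        file.deleteOnExit();
        try (FileWriter writer = new FileWriter(file)) {
            writer.write("hello");
        }
        // A 4-byte buffer forces two refills for the 5-byte file.
        System.out.println(readAll(file, 4)); // prints "hello"
    }
}
```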
[GitHub] spark pull request #15408: [SPARK-17839][CORE] Use Nio's directbuffer instea...
Github user sitalkedia commented on a diff in the pull request: https://github.com/apache/spark/pull/15408#discussion_r83532993 --- Diff: core/src/main/java/org/apache/spark/io/NioBufferedFileInputStream.java --- @@ -0,0 +1,142 @@ +/* + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.spark.io; + +import org.apache.spark.storage.StorageUtils; + +import javax.annotation.concurrent.ThreadSafe; +import java.io.File; +import java.io.IOException; +import java.io.InputStream; +import java.nio.ByteBuffer; +import java.nio.channels.FileChannel; +import java.nio.file.StandardOpenOption; + +/** + * {@link InputStream} implementation which uses direct buffer + * to read a file to avoid extra copy of data between Java and + * native memory which happens when using {@link java.io.BufferedInputStream}. + * Unfortunately, this is not something already available in JDK, + * {@link sun.nio.ch.ChannelInputStream} supports reading a file using nio, + * but does not support buffering. 
+ * + * TODO: support {@link #mark(int)}/{@link #reset()} + * + */ +@ThreadSafe +public final class NioBufferedFileInputStream extends InputStream { + + private static int DEFAULT_BUFFER_SIZE_BYTES = 8192; + + private final ByteBuffer byteBuffer; + + private final FileChannel fileChannel; + + public NioBufferedFileInputStream(File file, int bufferSizeInBytes) throws IOException { +byteBuffer = ByteBuffer.allocateDirect(bufferSizeInBytes); +fileChannel = FileChannel.open(file.toPath(), StandardOpenOption.READ); +byteBuffer.flip(); + } + + public NioBufferedFileInputStream(File file) throws IOException { +this(file, DEFAULT_BUFFER_SIZE_BYTES); + } + + /** + * Checks weather data is left to be read from the input stream. + * @return true if data is left, false otherwise + * @throws IOException + */ + private boolean refill() throws IOException { +if (!byteBuffer.hasRemaining()) { + byteBuffer.clear(); + int nRead = 0; + while (nRead == 0) { +nRead = fileChannel.read(byteBuffer); + } + if (nRead < 0) { +return false; + } + byteBuffer.flip(); +} +return true; + } + + @Override + public synchronized int read() throws IOException { +if (!refill()) { + return -1; +} +return byteBuffer.get() & 0xFF; + } + + @Override + public synchronized int read(byte[] b, int offset, int len) throws IOException { +if (offset < 0 || len < 0 || (offset + len) < 0 || (b.length - (offset + len)) < 0) { + throw new IndexOutOfBoundsException(); +} +if (!refill()) { + return -1; +} +len = Math.min(len, byteBuffer.remaining()); +byteBuffer.get(b, offset, len); +return len; + } + + @Override + public synchronized int available() throws IOException { +return byteBuffer.remaining(); + } + + @Override + public synchronized long skip(long n) throws IOException { +if (n <= 0L) { + return 0L; +} +if (byteBuffer.remaining() >= n) { + // The buffered content is enough to skip + byteBuffer.position(byteBuffer.position() + (int) n); + return n; +} +long skippedFromBuffer = byteBuffer.remaining(); +long 
toSkipFromFileChannel = n - skippedFromBuffer; +// Discard everything we have read in the buffer. +byteBuffer.position(0); +byteBuffer.flip(); +return skippedFromBuffer + skipFromFileChannel(toSkipFromFileChannel); + } + + private long skipFromFileChannel(long n) throws IOException { +long currentFilePosition = fileChannel.position(); +long size = fileChannel.size(); +if (n > size - currentFilePosition) { + fileChannel.position(size); + return size - currentFilePosition; +} else { + fileChannel.position(currentFilePosition + n); + return n; +} + } + + @Override + public synchronized void close() throws IOException { +fileChannel.close(); +
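The `skip(long)` implementation quoted above works in two stages: consume whatever is already buffered, then reposition the channel for the remainder, clamping at the file size. That arithmetic can be sketched on its own (the class and parameter names below are illustrative, not from the PR):

```java
public class SkipMathDemo {

    // Mirrors the two-stage skip in the code under review: take what the
    // buffer already holds, then advance the channel position, never past EOF.
    static long skipAmount(long requested, long buffered, long channelPos, long fileSize) {
        if (requested <= 0L) {
            return 0L;                            // skip(n <= 0) skips nothing
        }
        if (buffered >= requested) {
            return requested;                     // satisfied entirely from the buffer
        }
        long fromChannel = requested - buffered;  // remainder to skip in the file
        long available = fileSize - channelPos;   // bytes left past the channel position
        return buffered + Math.min(fromChannel, available);
    }

    public static void main(String[] args) {
        // 3 bytes buffered, channel at byte 8 of a 10-byte file:
        // a request for 7 skips 3 from the buffer and only 2 from the channel.
        System.out.println(skipAmount(7, 3, 8, 10)); // prints 5
    }
}
```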
[GitHub] spark issue #15474: [DO_NOT_MERGE] Test netty
Github user srowen commented on the issue: https://github.com/apache/spark/pull/15474 PS I got this to work by making the pyspark tests specify the "userClassPathFirst" flags. I think that's actually reasonable to set. I will try this in my PR at https://github.com/apache/spark/pull/15436 and hope it works; if so you can close this I think.
[GitHub] spark pull request #15408: [SPARK-17839][CORE] Use Nio's directbuffer instea...
Github user sitalkedia commented on a diff in the pull request: https://github.com/apache/spark/pull/15408#discussion_r83533112 --- Diff: core/src/main/java/org/apache/spark/io/NioBufferedFileInputStream.java --- @@ -0,0 +1,142 @@ +/* + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.spark.io; + +import org.apache.spark.storage.StorageUtils; + +import javax.annotation.concurrent.ThreadSafe; +import java.io.File; +import java.io.IOException; +import java.io.InputStream; +import java.nio.ByteBuffer; +import java.nio.channels.FileChannel; +import java.nio.file.StandardOpenOption; + +/** + * {@link InputStream} implementation which uses direct buffer + * to read a file to avoid extra copy of data between Java and + * native memory which happens when using {@link java.io.BufferedInputStream}. + * Unfortunately, this is not something already available in JDK, + * {@link sun.nio.ch.ChannelInputStream} supports reading a file using nio, + * but does not support buffering. 
+ * + * TODO: support {@link #mark(int)}/{@link #reset()} + * + */ +@ThreadSafe +public final class NioBufferedFileInputStream extends InputStream { + + private static int DEFAULT_BUFFER_SIZE_BYTES = 8192; + + private final ByteBuffer byteBuffer; + + private final FileChannel fileChannel; + + public NioBufferedFileInputStream(File file, int bufferSizeInBytes) throws IOException { +byteBuffer = ByteBuffer.allocateDirect(bufferSizeInBytes); +fileChannel = FileChannel.open(file.toPath(), StandardOpenOption.READ); +byteBuffer.flip(); + } + + public NioBufferedFileInputStream(File file) throws IOException { +this(file, DEFAULT_BUFFER_SIZE_BYTES); + } + + /** + * Checks weather data is left to be read from the input stream. + * @return true if data is left, false otherwise + * @throws IOException + */ + private boolean refill() throws IOException { +if (!byteBuffer.hasRemaining()) { + byteBuffer.clear(); + int nRead = 0; + while (nRead == 0) { +nRead = fileChannel.read(byteBuffer); + } + if (nRead < 0) { +return false; + } + byteBuffer.flip(); +} +return true; + } + + @Override + public synchronized int read() throws IOException { +if (!refill()) { + return -1; +} +return byteBuffer.get() & 0xFF; + } + + @Override + public synchronized int read(byte[] b, int offset, int len) throws IOException { +if (offset < 0 || len < 0 || (offset + len) < 0 || (b.length - (offset + len)) < 0) { --- End diff -- Ignore my previous comments, we still need it. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #15436: [SPARK-17875] [BUILD] Remove unneeded direct dependence ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15436 **[Test build #67015 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67015/consoleFull)** for PR 15436 at commit [`f49f6a6`](https://github.com/apache/spark/commit/f49f6a6ec956b069b4934a0b94450413529c1b93).
[GitHub] spark issue #15408: [SPARK-17839][CORE] Use Nio's directbuffer instead of Bu...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15408 **[Test build #67016 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67016/consoleFull)** for PR 15408 at commit [`5306fb0`](https://github.com/apache/spark/commit/5306fb097ecef7ff69c3281f33f221826879ef04).
[GitHub] spark issue #15499: [SPARK-17955][SQL] Make DataFrameReader.jdbc call DataFr...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15499 **[Test build #67011 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67011/consoleFull)** for PR 15499 at commit [`0d6e2d1`](https://github.com/apache/spark/commit/0d6e2d1aa6a3348c4fad5256b7580364499d3daf). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #15499: [SPARK-17955][SQL] Make DataFrameReader.jdbc call DataFr...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15499 Merged build finished. Test PASSed.
[GitHub] spark issue #15499: [SPARK-17955][SQL] Make DataFrameReader.jdbc call DataFr...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15499 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67011/ Test PASSed.
[GitHub] spark issue #15435: [SPARK-17139][ML] Add model summary for MultinomialLogis...
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/15435 @sethah Good suggestion. code updated, thanks!
[GitHub] spark issue #15435: [SPARK-17139][ML] Add model summary for MultinomialLogis...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15435 **[Test build #67017 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67017/consoleFull)** for PR 15435 at commit [`1bf5aa4`](https://github.com/apache/spark/commit/1bf5aa4b750899cc7a8ea83368d2ff5a66a76b91).
[GitHub] spark pull request #15502: [SPARK-17892] [SQL] [2.0] Do Not Optimize Query i...
GitHub user gatorsmile opened a pull request: https://github.com/apache/spark/pull/15502 [SPARK-17892] [SQL] [2.0] Do Not Optimize Query in CTAS More Than Once #15048

### What changes were proposed in this pull request?

This PR backports https://github.com/apache/spark/pull/15048 and https://github.com/apache/spark/pull/15459. However, in 2.0 we do not have a unified logical node `CreateTable`, and the analyzer rule `PreWriteCheck` is also different. To minimize the code changes, this PR adds a new rule `AnalyzeCreateTableAsSelect`. Please treat it as a new PR to review. Thanks!

As explained in https://github.com/apache/spark/pull/14797:

> Some analyzer rules make assumptions about logical plans, and the optimizer may break those assumptions. We should not pass an optimized query plan into QueryExecution (where it will be analyzed again), otherwise we may hit some weird bugs. For example, we have a rule for decimal calculation that promotes the precision before binary operations, using PromotePrecision as a placeholder to indicate that the rule should not apply twice. But an optimizer rule removes this placeholder; that breaks the assumption, the rule is applied twice, and it produces a wrong result.

We should not optimize the query in CTAS more than once. For example:

```scala
spark.range(99, 101).createOrReplaceTempView("tab1")
val sqlStmt = "SELECT id, cast(id as long) * cast('1.0' as decimal(38, 18)) as num FROM tab1"
sql(s"CREATE TABLE tab2 USING PARQUET AS $sqlStmt")
checkAnswer(spark.table("tab2"), sql(sqlStmt))
```

Before this PR, the results do not match:

```
== Results ==
!== Correct Answer - 2 ==   == Spark Answer - 2 ==
![100,100.00]               [100,null]
 [99,99.00]                 [99,99.00]
```

After this PR, the results match:

```
+---+------+
|id |num   |
+---+------+
|99 |99.00 |
|100|100.00|
+---+------+
```

In this PR, we do not treat the `query` in CTAS as a child. Thus, the `query` will not be optimized when the CTAS statement is optimized. However, we still need to analyze it so that the Analyzer can normalize and verify the CTAS. We do this in the analyzer rule `PreprocessDDL`, because so far only this rule needs the analyzed plan of the `query`.

### How was this patch tested?

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/gatorsmile/spark ctasOptimize2.0

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/15502.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #15502

commit a9931a538912aeed620df216beb355970979c3f0 Author: gatorsmile Date: 2016-10-14T05:16:08Z the first set of changes

commit d5f91871b4329ca9292a9ca129d6c603f4cf47fc Author: gatorsmile Date: 2016-10-15T15:06:21Z 2nd change set

commit a658da47983001260205c97406dbf744fd9abfcd Author: gatorsmile Date: 2016-10-15T15:12:57Z more comment

commit 9cfebc523e4b88c3df3ffae8ca5ea92e98a0a616 Author: gatorsmile Date: 2016-10-15T15:17:26Z rename
[GitHub] spark issue #15502: [SPARK-17892] [SQL] [2.0] Do Not Optimize Query in CTAS ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15502 **[Test build #67018 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67018/consoleFull)** for PR 15502 at commit [`9cfebc5`](https://github.com/apache/spark/commit/9cfebc523e4b88c3df3ffae8ca5ea92e98a0a616).
[GitHub] spark issue #15500: [SPARK-17956][SQL] Fix projection output ordering
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15500 **[Test build #67019 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67019/consoleFull)** for PR 15500 at commit [`029c36d`](https://github.com/apache/spark/commit/029c36d345a8f7042e63f4b586cfeaa4362367fd).
[GitHub] spark issue #15354: [SPARK-17764][SQL] Add `to_json` supporting to convert n...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15354 **[Test build #67012 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67012/consoleFull)** for PR 15354 at commit [`38d89a6`](https://github.com/apache/spark/commit/38d89a6ab04b9181f7be818a7ee6cf0bd77e2c69). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #15354: [SPARK-17764][SQL] Add `to_json` supporting to convert n...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15354 Merged build finished. Test PASSed.
[GitHub] spark issue #15354: [SPARK-17764][SQL] Add `to_json` supporting to convert n...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15354 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67012/
[GitHub] spark issue #15500: [SPARK-17956][SQL] Fix projection output ordering
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15500 **[Test build #67013 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67013/consoleFull)** for PR 15500 at commit [`58142f1`](https://github.com/apache/spark/commit/58142f136533fe91956deee9575d7bf48164865b). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #15500: [SPARK-17956][SQL] Fix projection output ordering
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15500 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67013/
[GitHub] spark issue #15500: [SPARK-17956][SQL] Fix projection output ordering
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15500 Merged build finished. Test PASSed.
[GitHub] spark issue #15435: [SPARK-17139][ML] Add model summary for MultinomialLogis...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15435 **[Test build #67017 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67017/consoleFull)** for PR 15435 at commit [`1bf5aa4`](https://github.com/apache/spark/commit/1bf5aa4b750899cc7a8ea83368d2ff5a66a76b91). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #15435: [SPARK-17139][ML] Add model summary for MultinomialLogis...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15435 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67017/
[GitHub] spark issue #15435: [SPARK-17139][ML] Add model summary for MultinomialLogis...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15435 Merged build finished. Test PASSed.
[GitHub] spark issue #15408: [SPARK-17839][CORE] Use Nio's directbuffer instead of Bu...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15408 **[Test build #67014 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67014/consoleFull)** for PR 15408 at commit [`f1f108f`](https://github.com/apache/spark/commit/f1f108f3bffaa9cecbca37dcb6a818b45174e3d3). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #15408: [SPARK-17839][CORE] Use Nio's directbuffer instead of Bu...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15408 Merged build finished. Test PASSed.
[GitHub] spark issue #15408: [SPARK-17839][CORE] Use Nio's directbuffer instead of Bu...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15408 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67014/
[GitHub] spark issue #15408: [SPARK-17839][CORE] Use Nio's directbuffer instead of Bu...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15408 **[Test build #67016 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67016/consoleFull)** for PR 15408 at commit [`5306fb0`](https://github.com/apache/spark/commit/5306fb097ecef7ff69c3281f33f221826879ef04). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #15408: [SPARK-17839][CORE] Use Nio's directbuffer instead of Bu...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15408 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67016/
[GitHub] spark issue #15408: [SPARK-17839][CORE] Use Nio's directbuffer instead of Bu...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15408 Merged build finished. Test PASSed.
[GitHub] spark issue #15319: [SPARK-17733][SQL] InferFiltersFromConstraints rule neve...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15319 **[Test build #67020 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67020/consoleFull)** for PR 15319 at commit [`388443d`](https://github.com/apache/spark/commit/388443d2886d09fec6a25b8400c6eb9631373135).
[GitHub] spark issue #15494: [SPARK-17947] [SQL] Add Doc and Comment about spark.sql....
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/15494 You can see the failed test cases in the PR: https://github.com/apache/spark/pull/15478 `ANALYZE TABLE` will fail due to the [check](https://github.com/apache/spark/blob/6ce1b675ee9fc9a6034439c3ca00441f9f172f84/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala#L111-L115) that runs when we try to alter the table properties. `SHOW CREATE TABLE` also outputs table properties that should not be part of the generated `CREATE TABLE` statement. `CREATE TABLE LIKE` always excludes all the table properties of the source table; however, we might change that behavior, which is still waiting for your input in another PR. See the [discussion](https://github.com/apache/spark/pull/14531#issuecomment-252147424)
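The failure mode described above can be sketched as follows. This is a simplified, hypothetical stand-in for the `HiveExternalCatalog` check linked in the comment, not Spark's actual code; the `spark.sql.` prefix and function names are illustrative assumptions.

```python
# Hypothetical sketch of a reserved-table-property policy, loosely modeled on
# the HiveExternalCatalog validation referenced above. Not Spark's real API.
SPARK_SQL_PREFIX = "spark.sql."  # assumed reserved prefix, for illustration


def validate_table_properties(props):
    """Reject attempts to persist properties under the reserved prefix
    (the kind of check that makes ANALYZE TABLE's property update fail)."""
    invalid = [k for k in props if k.startswith(SPARK_SQL_PREFIX)]
    if invalid:
        raise ValueError(f"Cannot persist reserved table properties: {invalid}")
    return props


def strip_internal_properties(props):
    """Drop internal properties so SHOW CREATE TABLE-style output does not
    leak them into the generated CREATE TABLE statement."""
    return {k: v for k, v in props.items()
            if not k.startswith(SPARK_SQL_PREFIX)}
```

Under this sketch, `strip_internal_properties` is the kind of filtering `SHOW CREATE TABLE` would need, and `validate_table_properties` is the kind of guard that trips `ANALYZE TABLE`.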
[GitHub] spark issue #15502: [SPARK-17892] [SQL] [2.0] Do Not Optimize Query in CTAS ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15502 **[Test build #67018 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67018/consoleFull)** for PR 15502 at commit [`9cfebc5`](https://github.com/apache/spark/commit/9cfebc523e4b88c3df3ffae8ca5ea92e98a0a616). * This patch passes all tests. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `case class AnalyzeCreateTableAsSelect(sparkSession: SparkSession) extends Rule[LogicalPlan] `
[GitHub] spark issue #15502: [SPARK-17892] [SQL] [2.0] Do Not Optimize Query in CTAS ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15502 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67018/
[GitHub] spark issue #15502: [SPARK-17892] [SQL] [2.0] Do Not Optimize Query in CTAS ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15502 Merged build finished. Test PASSed.
[GitHub] spark issue #15500: [SPARK-17956][SQL] Fix projection output ordering
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15500 **[Test build #67019 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67019/consoleFull)** for PR 15500 at commit [`029c36d`](https://github.com/apache/spark/commit/029c36d345a8f7042e63f4b586cfeaa4362367fd). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #15500: [SPARK-17956][SQL] Fix projection output ordering
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15500 Merged build finished. Test PASSed.
[GitHub] spark issue #15500: [SPARK-17956][SQL] Fix projection output ordering
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15500 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67019/
[GitHub] spark issue #15495: [SPARK-17620][SQL] Determine Serde by hive.default.filef...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/15495 @yhuai Do you think it is good enough to merge? Thank you!
[GitHub] spark issue #15398: [SPARK-17647][SQL][WIP] Fix backslash escaping in 'LIKE'...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/15398 Also cc @yhuai , @JoshRosen and @mengxr
[GitHub] spark issue #14547: [SPARK-16718][MLlib] gbm-style treeboost
Github user vlad17 commented on the issue: https://github.com/apache/spark/pull/14547 @sethah You raise good points. Regarding (1), I don't know if it is actually true. I don't want to speak for @jkbradley, but I was just going off of "software engineering intuition" about backwards compatibility of the algorithm's behavior. But let's consider an analogous example: if LogisticRegression were using regular batch GD and we moved it to L-BFGS, it wouldn't make much sense to offer a new option for "gd". I think the question is whether reverting to the original behavior is common enough to merit a larger, clunkier, and more confusing API. And since the notion of "original" will change over time, I'm starting to see the attractiveness of @sethah's original proposition to get rid of this option entirely and let us do whatever we want under the hood impurity-wise. **TL;DR:** At no point can I see a data scientist saying "you know what will help my L1 error? A mean predictor!" The strongest point in favor of this that comes to mind is the following: the people who would change the impurity metric are those tuning a GBT model, but there is no good reason to use variance-based impurity with mean predictions for a loss that isn't optimized by those choices! If any model tuning that compares `.setImpurity("variance")` vs `.setImpurity("loss-based")` happens to show that you do better choosing variance with CV, then all you've done is grid search over GBT model parameters to overfit to noise in your data.
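The "mean predictor for L1 error" point in the comment above can be made concrete with a minimal numeric sketch (not Spark code): the constant prediction that minimizes absolute (L1) error over a set of values is the median, not the mean, so a tree that splits on variance and predicts leaf means is not tuned for L1 loss.

```python
# Minimal illustration: on data with an outlier, the median is a strictly
# better constant predictor for L1 error than the mean.
def l1_error(prediction, values):
    """Total absolute error of a single constant prediction."""
    return sum(abs(v - prediction) for v in values)

values = [1.0, 2.0, 2.0, 3.0, 100.0]       # one large outlier
mean = sum(values) / len(values)            # dragged up by the outlier (21.6)
median = sorted(values)[len(values) // 2]   # robust to the outlier (2.0)

assert l1_error(median, values) < l1_error(mean, values)
```

This is the asymmetry driving the discussion: variance-based splits with mean leaf predictions optimize squared error, and only a loss-aware impurity (or loss-aware leaf predictions) targets L1.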
[GitHub] spark pull request #15398: [SPARK-17647][SQL][WIP] Fix backslash escaping in...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/15398#discussion_r83537154
--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/RegexpExpressionsSuite.scala ---
@@ -74,6 +107,31 @@ class RegexpExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper {
     checkEvaluation("a\nb" like regEx, true, create_row("a%b"))
     checkEvaluation(Literal.create(null, StringType) like regEx, null, create_row("bc%"))
+
+    checkEvaluation("" like regEx, true, create_row(""))
+    checkEvaluation("a" like regEx, false, create_row(""))
+    checkEvaluation("" like regEx, false, create_row("a"))
+
+    checkEvaluation("""""" like regEx, true, create_row("""%\\%"""))
+    checkEvaluation("""%%""" like regEx, true, create_row("""%%"""))
+    checkEvaluation("""\__""" like regEx, true, create_row("""\\\__"""))
+    checkEvaluation("""\\\__""" like regEx, false, create_row("""%\\%\%"""))
+    checkEvaluation("""_\\\%""" like regEx, false, create_row("""%\\"""))
+
+    // scalastyle:off nonascii
+    checkEvaluation("a\u20ACa" like regEx, true, create_row("_\u20AC_"))
+    checkEvaluation("a€a" like regEx, true, create_row("_€_"))
+    checkEvaluation("a€a" like regEx, true, create_row("_\u20AC_"))
+    checkEvaluation("a\u20ACa" like regEx, true, create_row("_€_"))
+    // scalastyle:on nonascii
+
+    // TODO: should throw an exception?
--- End diff --
To answer your point 3, I tried it in DB2.
```
db2 => select actkwd from act where actkwd like '%A%\a' escape '\'
SQL0130N  The ESCAPE clause is not a single character, or the pattern string
contains an invalid occurrence of the escape character.  SQLSTATE=22025
```
In DB2, our design is normally very conservative: if we think something could be a user error, we stop it with an error. We do not want to give users any surprises.
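The conservative DB2-style policy described above can be sketched as a small LIKE-pattern translator. This is an illustrative stand-in, not Spark's actual implementation: it converts a SQL LIKE pattern to a regex and raises an error when the escape character precedes anything other than `%`, `_`, or the escape character itself, mirroring the SQL0130N behavior.

```python
import re


def like_to_regex(pattern, escape="\\"):
    """Translate a SQL LIKE pattern into an anchored regex string.

    Raises ValueError on an invalid occurrence of the escape character
    (dangling, or escaping anything other than %, _, or itself),
    mimicking DB2's conservative SQL0130N-style rejection.
    """
    out = []
    i = 0
    while i < len(pattern):
        c = pattern[i]
        if c == escape:
            if i + 1 >= len(pattern) or pattern[i + 1] not in ("%", "_", escape):
                raise ValueError(f"invalid escape at position {i} in {pattern!r}")
            out.append(re.escape(pattern[i + 1]))  # escaped char matches literally
            i += 2
        elif c == "%":
            out.append(".*")   # % matches any sequence of characters
            i += 1
        elif c == "_":
            out.append(".")    # _ matches exactly one character
            i += 1
        else:
            out.append(re.escape(c))
            i += 1
    return "^" + "".join(out) + "$"
```

With this sketch, the DB2 pattern from the comment, `%A%\a`, raises a ValueError, while `100\%` matches only the literal string `100%`.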
[GitHub] spark issue #15436: [SPARK-17875] [BUILD] Remove unneeded direct dependence ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15436 **[Test build #67015 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67015/consoleFull)** for PR 15436 at commit [`f49f6a6`](https://github.com/apache/spark/commit/f49f6a6ec956b069b4934a0b94450413529c1b93). * This patch **fails from timeout after a configured wait of \`250m\`**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #15436: [SPARK-17875] [BUILD] Remove unneeded direct dependence ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15436 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67015/