[GitHub] spark issue #15497: [Test][SPARK-16002][Follow-up] Fix flaky test in Streami...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15497

**[Test build #67006 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67006/consoleFull)** for PR 15497 at commit [`5bc47b6`](https://github.com/apache/spark/commit/5bc47b639ede049f44ad4f47a88d26219fea6193).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.

--- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #15497: [Test][SPARK-16002][Follow-up] Fix flaky test in Streami...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15497 Merged build finished. Test FAILed.
[GitHub] spark issue #15497: [Test][SPARK-16002][Follow-up] Fix flaky test in Streami...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15497 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67006/
[GitHub] spark pull request #14847: [SPARK-17254][SQL] Add StopAfter physical plan fo...
Github user viirya closed the pull request at: https://github.com/apache/spark/pull/14847
[GitHub] spark pull request #15498: [SPARK-17953] [Documentation] Fix typo in SparkSe...
GitHub user tae-jun opened a pull request: https://github.com/apache/spark/pull/15498

[SPARK-17953] [Documentation] Fix typo in SparkSession scaladoc

## What changes were proposed in this pull request?

### Before:
```scala
SparkSession.builder()
  .master("local")
  .appName("Word Count")
  .config("spark.some.config.option", "some-value").
  .getOrCreate()
```

### After:
```scala
SparkSession.builder()
  .master("local")
  .appName("Word Count")
  .config("spark.some.config.option", "some-value")
  .getOrCreate()
```

There was one unexpected dot!

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/tae-jun/spark SPARK-17953

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/15498.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #15498

commit bd4e125dd09a0937f050a4a6a6db18859fc235f0
Author: Jun Kim
Date: 2016-10-15T07:26:15Z

    Fix typo in SparkSession scaladoc

    There was one unexpected dot!
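The stray trailing dot after `.config(...)` leaves the chain syntactically incomplete, so the "Before" snippet would not compile: a dot must be immediately followed by a member selection. As a rough illustration of how such a fluent chain works (a hypothetical stand-in builder written for this note, not Spark's actual `SparkSession.Builder` API):

```java
// Hypothetical minimal fluent builder mimicking the SparkSession.builder()
// chain from the scaladoc; names and behavior are illustrative only.
public class BuilderDemo {
    static final class Builder {
        private final StringBuilder settings = new StringBuilder();
        Builder master(String m) { settings.append("master=").append(m).append(';'); return this; }
        Builder appName(String n) { settings.append("app=").append(n).append(';'); return this; }
        Builder config(String k, String v) { settings.append(k).append('=').append(v).append(';'); return this; }
        String getOrCreate() { return settings.toString(); }
    }

    public static void main(String[] args) {
        String session = new Builder()
            .master("local")
            .appName("Word Count")
            .config("spark.some.config.option", "some-value")  // no trailing dot here
            .getOrCreate();
        System.out.println(session);
        // prints: master=local;app=Word Count;spark.some.config.option=some-value;
    }
}
```

Because each setter returns `this`, the whole chain is a single expression; that is also why the extra dot in the original scaladoc broke it rather than being silently ignored.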
[GitHub] spark issue #15498: [SPARK-17953] [Documentation] Fix typo in SparkSession s...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15498 Can one of the admins verify this patch?
[GitHub] spark issue #15498: [SPARK-17953] [Documentation] Fix typo in SparkSession s...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/15498 Thanks - merging in master/branch-2.0.
[GitHub] spark issue #15497: [Test][SPARK-16002][Follow-up] Fix flaky test in Streami...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15497 **[Test build #3343 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3343/consoleFull)** for PR 15497 at commit [`5bc47b6`](https://github.com/apache/spark/commit/5bc47b639ede049f44ad4f47a88d26219fea6193).
[GitHub] spark pull request #15498: [SPARK-17953] [Documentation] Fix typo in SparkSe...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/15498
[GitHub] spark issue #15480: [SPARK-16845][SQL] `GeneratedClass$SpecificOrdering` gro...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/15480 cc @ueshin want to help review this?
[GitHub] spark issue #15319: [SPARK-17733][SQL] InferFiltersFromConstraints rule neve...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15319

**[Test build #67007 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67007/consoleFull)** for PR 15319 at commit [`1558d4c`](https://github.com/apache/spark/commit/1558d4c2f9190691239e9b27e9517714c2af2bcc).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #15319: [SPARK-17733][SQL] InferFiltersFromConstraints rule neve...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15319 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67007/
[GitHub] spark issue #15319: [SPARK-17733][SQL] InferFiltersFromConstraints rule neve...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15319 Merged build finished. Test FAILed.
[GitHub] spark issue #15471: [WIP][SPARK-17919] Make timeout to RBackend configurable...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15471

**[Test build #67005 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67005/consoleFull)** for PR 15471 at commit [`6f15a15`](https://github.com/apache/spark/commit/6f15a1541f01429ae19237252c600b108722ecb4).
* This patch **fails from timeout after a configured wait of `250m`**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #15471: [WIP][SPARK-17919] Make timeout to RBackend configurable...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15471 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67005/
[GitHub] spark issue #15471: [WIP][SPARK-17919] Make timeout to RBackend configurable...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15471 Merged build finished. Test FAILed.
[GitHub] spark issue #15474: [DO_NOT_MERGE] Test netty
Github user srowen commented on the issue: https://github.com/apache/spark/pull/15474 Did you clear your locally built artifacts first? Maybe that's the difference. The Jenkins test here hits the same problem I was seeing.
[GitHub] spark pull request #15480: [SPARK-16845][SQL] `GeneratedClass$SpecificOrderi...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/15480#discussion_r83528941 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/GenerateOrdering.scala --- @@ -118,7 +118,45 @@ object GenerateOrdering extends CodeGenerator[Seq[SortOrder], Ordering[InternalR } """ }.mkString("\n") -comparisons + +/* --- End diff -- Instead of this implementation, is it possible to use [`this function`](https://github.com/apache/spark/blob/b1b47274bfeba17a9e4e9acebd7385289f31f6c8/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala#L613) by adding `return` as a default argument?
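For context, SPARK-16845 is about generated ordering code growing past the JVM's 64KB-per-method bytecode limit, and both the PR's approach and the `CodeGenerator` helper kiszk points to amount to splitting the generated comparisons across several smaller methods. A toy sketch of that splitting idea (illustrative only; method and class names here are made up, not Spark's actual codegen):

```java
import java.util.ArrayList;
import java.util.List;

public class SplitDemo {
    // Instead of emitting one huge method body, group generated comparison
    // snippets into chunks, each destined for its own small helper method,
    // so no single generated method exceeds the bytecode size limit.
    static List<String> splitIntoMethods(List<String> snippets, int perMethod) {
        List<String> methods = new ArrayList<>();
        StringBuilder body = new StringBuilder();
        for (int i = 0; i < snippets.size(); i++) {
            body.append(snippets.get(i)).append('\n');
            boolean chunkFull = (i + 1) % perMethod == 0;
            boolean last = i == snippets.size() - 1;
            if (chunkFull || last) {
                methods.add("private int compare" + methods.size() + "() {\n" + body + "}");
                body.setLength(0);
            }
        }
        return methods;
    }

    public static void main(String[] args) {
        List<String> snippets = new ArrayList<>();
        for (int i = 0; i < 5; i++) snippets.add("// comparison " + i);
        // 5 snippets at 2 per method -> 3 helper methods
        System.out.println(splitIntoMethods(snippets, 2).size());
    }
}
```

Real codegen splitters also have to thread state (the running comparison result) between helpers, which is where kiszk's question about an early `return` comes in.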
[GitHub] spark pull request #15495: [SPARK-17620][SQL] Determine Serde by hive.defaul...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/15495#discussion_r83528964

--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLQuerySuite.scala ---
@@ -587,6 +594,30 @@ class SQLQuerySuite extends QueryTest with SQLTestUtils with TestHiveSingleton {
     }
   }

+  test("CTAS with default fileformat") {
+    val table = "ctas1"
+    val ctas = s"CREATE TABLE IF NOT EXISTS $table SELECT key k, value FROM src"
+    withSQLConf(SQLConf.CONVERT_CTAS.key -> "true") {
+      withSQLConf("hive.default.fileformat" -> "textfile") {
+        withTable(table) {
+          sql(ctas)
+          // We should use parquet here as that is the default datasource fileformat. The default
+          // datasource file format is controlled by `spark.sql.sources.default` configuration.
+          // This testcase verifies that setting `hive.default.fileformat` has no impact on
+          // the target table's fileformat in case of CTAS.
+          assert(sessionState.conf.defaultDataSourceName === "parquet")
+          checkRelation(tableName = table, isDataSourceTable = true, format = "parquet")
--- End diff --

As I know, we can't trigger it. Maybe @yhuai will know it? You can compile it with Scala 2.10 locally to make sure it passes.
[GitHub] spark issue #15218: [SPARK-17637][Scheduler]Packed scheduling for Spark task...
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/15218 I am assuming @kayousterhout does not have comments on this. Can you please fix the conflict, @zhzhan? I will merge it into master after that.
[GitHub] spark issue #15497: [Test][SPARK-16002][Follow-up] Fix flaky test in Streami...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15497

**[Test build #3343 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3343/consoleFull)** for PR 15497 at commit [`5bc47b6`](https://github.com/apache/spark/commit/5bc47b639ede049f44ad4f47a88d26219fea6193).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #15408: [SPARK-17839][CORE] Use Nio's directbuffer instea...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/15408#discussion_r83529473 --- Diff: core/src/main/java/org/apache/spark/io/NioBufferedFileInputStream.java --- @@ -0,0 +1,142 @@ +/* + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.spark.io; + +import org.apache.spark.storage.StorageUtils; + +import javax.annotation.concurrent.ThreadSafe; +import java.io.File; +import java.io.IOException; +import java.io.InputStream; +import java.nio.ByteBuffer; +import java.nio.channels.FileChannel; +import java.nio.file.StandardOpenOption; + +/** + * {@link InputStream} implementation which uses direct buffer + * to read a file to avoid extra copy of data between Java and + * native memory which happens when using {@link java.io.BufferedInputStream}. + * Unfortunately, this is not something already available in JDK, + * {@link sun.nio.ch.ChannelInputStream} supports reading a file using nio, + * but does not support buffering. + * + * TODO: support {@link #mark(int)}/{@link #reset()} + * + */ +@ThreadSafe --- End diff -- I only see one location in the codebase where we use this annotation, and I think we probably shouldn't use it at all if not used consistently.
[GitHub] spark pull request #15408: [SPARK-17839][CORE] Use Nio's directbuffer instea...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/15408#discussion_r83529504 --- Diff: core/src/main/java/org/apache/spark/io/NioBufferedFileInputStream.java --- @@ -0,0 +1,142 @@ +/* + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.spark.io; + +import org.apache.spark.storage.StorageUtils; + +import javax.annotation.concurrent.ThreadSafe; +import java.io.File; +import java.io.IOException; +import java.io.InputStream; +import java.nio.ByteBuffer; +import java.nio.channels.FileChannel; +import java.nio.file.StandardOpenOption; + +/** + * {@link InputStream} implementation which uses direct buffer + * to read a file to avoid extra copy of data between Java and + * native memory which happens when using {@link java.io.BufferedInputStream}. + * Unfortunately, this is not something already available in JDK, + * {@link sun.nio.ch.ChannelInputStream} supports reading a file using nio, + * but does not support buffering. 
+ * + * TODO: support {@link #mark(int)}/{@link #reset()} + * + */ +@ThreadSafe +public final class NioBufferedFileInputStream extends InputStream { + + private static int DEFAULT_BUFFER_SIZE_BYTES = 8192; + + private final ByteBuffer byteBuffer; + + private final FileChannel fileChannel; + + public NioBufferedFileInputStream(File file, int bufferSizeInBytes) throws IOException { +byteBuffer = ByteBuffer.allocateDirect(bufferSizeInBytes); +fileChannel = FileChannel.open(file.toPath(), StandardOpenOption.READ); +byteBuffer.flip(); + } + + public NioBufferedFileInputStream(File file) throws IOException { +this(file, DEFAULT_BUFFER_SIZE_BYTES); + } + + /** + * Checks weather data is left to be read from the input stream. + * @return true if data is left, false otherwise + * @throws IOException + */ + private boolean refill() throws IOException { +if (!byteBuffer.hasRemaining()) { + byteBuffer.clear(); + int nRead = 0; + while (nRead == 0) { +nRead = fileChannel.read(byteBuffer); + } + if (nRead < 0) { +return false; + } + byteBuffer.flip(); +} +return true; + } + + @Override + public synchronized int read() throws IOException { +if (!refill()) { + return -1; +} +return byteBuffer.get() & 0xFF; + } + + @Override + public synchronized int read(byte[] b, int offset, int len) throws IOException { +if (offset < 0 || len < 0 || (offset + len) < 0 || (b.length - (offset + len)) < 0) { --- End diff -- Hardly matters, but now that this condition has been made more explicit, the final condition is simpler as `offset + len > b.length`
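For reference, the original guard `(b.length - (offset + len)) < 0` and srowen's suggested `offset + len > b.length` are equivalent whenever `offset + len` does not overflow `int`, and the separate `(offset + len) < 0` term is what catches the overflow case. A small standalone sketch of the check (a hypothetical helper written for this note, not the PR's actual class):

```java
public class BoundsCheck {
    // Mirrors the read(byte[], int, int) guard discussed above.
    // The (offset + len) < 0 term catches int overflow; the last term is
    // srowen's simplified range check, valid once overflow is ruled out.
    static boolean isInvalid(byte[] b, int offset, int len) {
        return offset < 0 || len < 0 || (offset + len) < 0 || offset + len > b.length;
    }

    public static void main(String[] args) {
        System.out.println(isInvalid(new byte[8], 0, 8));                 // false: exactly fits
        System.out.println(isInvalid(new byte[8], 4, 5));                 // true: 4 + 5 > 8
        System.out.println(isInvalid(new byte[8], Integer.MAX_VALUE, 2)); // true: sum wraps negative
    }
}
```

Without the overflow term, `Integer.MAX_VALUE + 2` would wrap to a negative value and slip past a naive `offset + len > b.length` comparison.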
[GitHub] spark pull request #15408: [SPARK-17839][CORE] Use Nio's directbuffer instea...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/15408#discussion_r83529508 --- Diff: core/src/main/java/org/apache/spark/io/NioBufferedFileInputStream.java --- @@ -0,0 +1,142 @@ +/* + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.spark.io; + +import org.apache.spark.storage.StorageUtils; + +import javax.annotation.concurrent.ThreadSafe; +import java.io.File; +import java.io.IOException; +import java.io.InputStream; +import java.nio.ByteBuffer; +import java.nio.channels.FileChannel; +import java.nio.file.StandardOpenOption; + +/** + * {@link InputStream} implementation which uses direct buffer + * to read a file to avoid extra copy of data between Java and + * native memory which happens when using {@link java.io.BufferedInputStream}. + * Unfortunately, this is not something already available in JDK, + * {@link sun.nio.ch.ChannelInputStream} supports reading a file using nio, + * but does not support buffering. 
+ * + * TODO: support {@link #mark(int)}/{@link #reset()} + * + */ +@ThreadSafe +public final class NioBufferedFileInputStream extends InputStream { + + private static int DEFAULT_BUFFER_SIZE_BYTES = 8192; + + private final ByteBuffer byteBuffer; + + private final FileChannel fileChannel; + + public NioBufferedFileInputStream(File file, int bufferSizeInBytes) throws IOException { +byteBuffer = ByteBuffer.allocateDirect(bufferSizeInBytes); +fileChannel = FileChannel.open(file.toPath(), StandardOpenOption.READ); +byteBuffer.flip(); + } + + public NioBufferedFileInputStream(File file) throws IOException { +this(file, DEFAULT_BUFFER_SIZE_BYTES); + } + + /** + * Checks weather data is left to be read from the input stream. + * @return true if data is left, false otherwise + * @throws IOException + */ + private boolean refill() throws IOException { +if (!byteBuffer.hasRemaining()) { + byteBuffer.clear(); + int nRead = 0; + while (nRead == 0) { +nRead = fileChannel.read(byteBuffer); + } + if (nRead < 0) { +return false; + } + byteBuffer.flip(); +} +return true; + } + + @Override + public synchronized int read() throws IOException { +if (!refill()) { + return -1; +} +return byteBuffer.get() & 0xFF; + } + + @Override + public synchronized int read(byte[] b, int offset, int len) throws IOException { +if (offset < 0 || len < 0 || (offset + len) < 0 || (b.length - (offset + len)) < 0) { + throw new IndexOutOfBoundsException(); +} +if (!refill()) { + return -1; +} +len = Math.min(len, byteBuffer.remaining()); +byteBuffer.get(b, offset, len); +return len; + } + + @Override + public synchronized int available() throws IOException { +return byteBuffer.remaining(); + } + + @Override + public synchronized long skip(long n) throws IOException { +if (n <= 0L) { + return 0L; +} +if (byteBuffer.remaining() >= n) { + // The buffered content is enough to skip + byteBuffer.position(byteBuffer.position() + (int) n); + return n; +} +long skippedFromBuffer = byteBuffer.remaining(); +long 
toSkipFromFileChannel = n - skippedFromBuffer; +// Discard everything we have read in the buffer. +byteBuffer.position(0); +byteBuffer.flip(); +return skippedFromBuffer + skipFromFileChannel(toSkipFromFileChannel); + } + + private long skipFromFileChannel(long n) throws IOException { +long currentFilePosition = fileChannel.position(); +long size = fileChannel.size(); +if (n > size - currentFilePosition) { + fileChannel.position(size); + return size - currentFilePosition; +} else { + fileChannel.position(currentFilePosition + n); + return n; +} + } + + @Override + public synchronized void close() throws IOException { +fileChannel.close(); +
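The quoted `skip` implementation splits the work in two: bytes still sitting in the buffer are skipped by advancing the buffer's position, and any remainder by moving the file channel's position, clamped at the file size. A self-contained sketch of that two-step logic (simplified, with hypothetical names; a `long[]` cell stands in for `FileChannel.position()`/`size()` so the sketch runs without file I/O):

```java
public class SkipDemo {
    // Two-step skip: consume buffered bytes first, then advance the "channel"
    // position, clamped so we never seek past the end of the file.
    static long skip(java.nio.ByteBuffer buf, long[] channelPos, long fileSize, long n) {
        if (n <= 0L) return 0L;
        if (buf.remaining() >= n) {
            buf.position(buf.position() + (int) n); // buffered data is enough
            return n;
        }
        long fromBuffer = buf.remaining();
        buf.position(buf.limit());                  // discard the rest of the buffer
        long want = n - fromBuffer;
        long canSkip = Math.min(want, fileSize - channelPos[0]);
        channelPos[0] += canSkip;                   // clamp at end of file
        return fromBuffer + canSkip;
    }

    public static void main(String[] args) {
        java.nio.ByteBuffer buf = java.nio.ByteBuffer.wrap(new byte[4]); // 4 buffered bytes
        long[] pos = {4};  // channel has already read past the buffered region
        System.out.println(skip(buf, pos, 10, 8)); // 4 from buffer + 4 from channel = 8
    }
}
```

The clamping mirrors the PR's `skipFromFileChannel`: if fewer than `n` bytes remain in the file, only the available bytes are skipped and the smaller count is returned.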
[GitHub] spark pull request #15408: [SPARK-17839][CORE] Use Nio's directbuffer instea...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/15408#discussion_r83529483 --- Diff: core/src/main/java/org/apache/spark/io/NioBufferedFileInputStream.java --- @@ -0,0 +1,142 @@ +/* + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.spark.io; + +import org.apache.spark.storage.StorageUtils; + +import javax.annotation.concurrent.ThreadSafe; +import java.io.File; +import java.io.IOException; +import java.io.InputStream; +import java.nio.ByteBuffer; +import java.nio.channels.FileChannel; +import java.nio.file.StandardOpenOption; + +/** + * {@link InputStream} implementation which uses direct buffer + * to read a file to avoid extra copy of data between Java and + * native memory which happens when using {@link java.io.BufferedInputStream}. + * Unfortunately, this is not something already available in JDK, + * {@link sun.nio.ch.ChannelInputStream} supports reading a file using nio, + * but does not support buffering. + * + * TODO: support {@link #mark(int)}/{@link #reset()} --- End diff -- I don't really care, but this could be a comment inside the class rather than user-facing. In fact I don't even know it's a to-do.
[GitHub] spark pull request #15408: [SPARK-17839][CORE] Use Nio's directbuffer instea...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/15408#discussion_r83529476 --- Diff: core/src/main/java/org/apache/spark/io/NioBufferedFileInputStream.java --- @@ -0,0 +1,142 @@ +/* + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.spark.io; + +import org.apache.spark.storage.StorageUtils; + +import javax.annotation.concurrent.ThreadSafe; +import java.io.File; +import java.io.IOException; +import java.io.InputStream; +import java.nio.ByteBuffer; +import java.nio.channels.FileChannel; +import java.nio.file.StandardOpenOption; + +/** + * {@link InputStream} implementation which uses direct buffer + * to read a file to avoid extra copy of data between Java and + * native memory which happens when using {@link java.io.BufferedInputStream}. + * Unfortunately, this is not something already available in JDK, + * {@link sun.nio.ch.ChannelInputStream} supports reading a file using nio, + * but does not support buffering. + * + * TODO: support {@link #mark(int)}/{@link #reset()} + * + */ +@ThreadSafe +public final class NioBufferedFileInputStream extends InputStream { + + private static int DEFAULT_BUFFER_SIZE_BYTES = 8192; --- End diff -- Oops, forgot to say this should be `final` --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. 
[GitHub] spark issue #15411: Set master URL configuration in scala example
Github user srowen commented on the issue: https://github.com/apache/spark/pull/15411 Hm, so I see several more examples that get included in the documentation and don't set a master. I am not sure that is a salient difference, because in general, when writing your own app, you would not hard-code a master in the code either. The examples evidently don't set a master for this reason, so I'm not sure we should make this change.
[GitHub] spark issue #15450: [SPARK-3261] [MLLIB] KMeans clusterer can return duplica...
Github user srowen commented on the issue: https://github.com/apache/spark/pull/15450 @sethah I wanted to check how strongly you are against this kind of change, and to continue the discussion here.
[GitHub] spark issue #15450: [SPARK-3261] [MLLIB] KMeans clusterer can return duplica...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15450 **[Test build #67009 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67009/consoleFull)** for PR 15450 at commit [`ab486c1`](https://github.com/apache/spark/commit/ab486c121d759272a7a38b64fa25ec9a8de12647).
[GitHub] spark pull request #15480: [SPARK-16845][SQL] `GeneratedClass$SpecificOrderi...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/15480#discussion_r83530039

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/GenerateOrdering.scala ---

```scala
@@ -118,7 +118,45 @@ object GenerateOrdering extends CodeGenerator[Seq[SortOrder], Ordering[InternalR
         }
       """
     }.mkString("\n")
-    comparisons
+
+    /*
+     * 40 = 7000 bytes / 170 (around 170 bytes per ordering comparison).
+     * The maximum byte code size to be compiled for HotSpot is 8000 bytes.
+     * We should keep less than 8000 bytes.
+     */
+    val numberOfComparisonsThreshold = 40
+
+    if (ordering.size <= numberOfComparisonsThreshold) {
+      s"""
+         | InternalRow ${ctx.INPUT_ROW} = null;  // Holds current row being evaluated.
+         | ${comparisons(ordering)}
+       """.stripMargin
+    } else {
+      val groupedOrderingItr = ordering.grouped(numberOfComparisonsThreshold)
+      var groupedOrderingLength = 0
+      groupedOrderingItr.zipWithIndex.foreach { case (orderingGroup, i) =>
+        groupedOrderingLength += 1
+        val funcName = s"compare_$i"
```

--- End diff --

We need to use a fresh name for `funcName` or its prefix (see [here](https://github.com/apache/spark/blob/b1b47274bfeba17a9e4e9acebd7385289f31f6c8/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala#L634)).
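ueshin's point is that generated helper names must be unique within a compilation unit; Spark's code generator does this by handing out "fresh" names with a per-prefix counter. A rough Java sketch of that idea (a hypothetical class, not Spark's actual `CodeGenerator` implementation):

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of fresh-name generation: each requested prefix
// gets a monotonically increasing suffix, so two call sites asking for the
// same prefix can never collide.
final class FreshNames {
  private final Map<String, Integer> counters = new HashMap<>();

  String fresh(String prefix) {
    // merge() increments the counter, creating it at 1 on first use.
    int n = counters.merge(prefix, 1, Integer::sum);
    return prefix + "_" + (n - 1);
  }
}
```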
[GitHub] spark issue #14936: [SPARK-7877][MESOS] Allow configuration of framework tim...
Github user philipphoffmann commented on the issue: https://github.com/apache/spark/pull/14936 Alright, I changed the implementation to keep the existing defaults.
[GitHub] spark issue #14936: [SPARK-7877][MESOS] Allow configuration of framework tim...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14936 **[Test build #67010 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67010/consoleFull)** for PR 14936 at commit [`dec052c`](https://github.com/apache/spark/commit/dec052cac905697595193e98a1d855ccf0c37704).
[GitHub] spark pull request #15480: [SPARK-16845][SQL] `GeneratedClass$SpecificOrderi...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/15480#discussion_r83530625

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/GenerateOrdering.scala ---

```scala
@@ -118,7 +118,45 @@ object GenerateOrdering extends CodeGenerator[Seq[SortOrder], Ordering[InternalR
      // ... (context as in the previous diff quote) ...
+      groupedOrderingItr.zipWithIndex.foreach { case (orderingGroup, i) =>
+        groupedOrderingLength += 1
+        val funcName = s"compare_$i"
+        val funcCode =
+          s"""
+             |private int $funcName(InternalRow a, InternalRow b) {
+             |  InternalRow ${ctx.INPUT_ROW} = null;  // Holds current row being evaluated.
+             |  ${comparisons(orderingGroup)}
+             |  return 0;
+             |}
+           """.stripMargin
+        ctx.addNewFunction(funcName, funcCode)
+      }
+
+      (0 to groupedOrderingLength - 1).map { i =>
```

--- End diff --

nit: use `(0 until groupedOrderingLength)`.
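The code this diff generates splits one long lexicographic comparison into fixed-size helper functions (`compare_0`, `compare_1`, ...) and chains them, returning at the first nonzero result, so that no single method exceeds HotSpot's JIT-compilation size limit. A hedged Java sketch of the same pattern, hand-written rather than generated:

```java
// Sketch of splitting a long lexicographic comparison into small helper
// methods and chaining them, mirroring the generated compare_i functions.
final class GroupedComparator {
  // Stands in for numberOfComparisonsThreshold (40 in the diff).
  private static final int GROUP_SIZE = 2;

  static int compare(int[] a, int[] b) {
    // Chain the group comparators; stop at the first decisive result.
    for (int start = 0; start < a.length; start += GROUP_SIZE) {
      int result = compareGroup(a, b, start, Math.min(start + GROUP_SIZE, a.length));
      if (result != 0) {
        return result;
      }
    }
    return 0;
  }

  // Plays the role of one generated compare_i helper.
  private static int compareGroup(int[] a, int[] b, int from, int to) {
    for (int i = from; i < to; i++) {
      int c = Integer.compare(a[i], b[i]);
      if (c != 0) {
        return c;
      }
    }
    return 0;
  }
}
```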
[GitHub] spark issue #15450: [SPARK-3261] [MLLIB] KMeans clusterer can return duplica...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15450 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67009/
[GitHub] spark issue #15450: [SPARK-3261] [MLLIB] KMeans clusterer can return duplica...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15450 Merged build finished. Test PASSed.
[GitHub] spark issue #15450: [SPARK-3261] [MLLIB] KMeans clusterer can return duplica...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15450 **[Test build #67009 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67009/consoleFull)** for PR 15450 at commit [`ab486c1`](https://github.com/apache/spark/commit/ab486c121d759272a7a38b64fa25ec9a8de12647). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #14936: [SPARK-7877][MESOS] Allow configuration of framework tim...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14936 **[Test build #67010 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67010/consoleFull)** for PR 14936 at commit [`dec052c`](https://github.com/apache/spark/commit/dec052cac905697595193e98a1d855ccf0c37704). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #14936: [SPARK-7877][MESOS] Allow configuration of framework tim...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14936 Merged build finished. Test PASSed.
[GitHub] spark issue #14936: [SPARK-7877][MESOS] Allow configuration of framework tim...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14936 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67010/
[GitHub] spark pull request #15499: [SPARK-17955][SQL] Make DataFrameReader.jdbc call...
GitHub user HyukjinKwon opened a pull request: https://github.com/apache/spark/pull/15499

[SPARK-17955][SQL] Make DataFrameReader.jdbc call DataFrameReader.format("jdbc").load

## What changes were proposed in this pull request?

This PR proposes to make `DataFrameReader.jdbc` call `DataFrameReader.format("jdbc").load`, consistently with the other APIs in `DataFrameReader`/`DataFrameWriter`, and avoid calling `sparkSession.baseRelationToDataFrame(..)` here and there. The changes were mostly copied from `DataFrameWriter.jdbc()`, which was recently updated.

```
-    val params = extraOptions.toMap ++ connectionProperties.asScala.toMap
-    val options = new JDBCOptions(url, table, params)
-    val relation = JDBCRelation(parts, options)(sparkSession)
-    sparkSession.baseRelationToDataFrame(relation)
+    this.extraOptions = this.extraOptions ++ connectionProperties.asScala
+    // explicit url and dbtable should override all
+    this.extraOptions += ("url" -> url, "dbtable" -> table)
+    format("jdbc").load()
```

## How was this patch tested?

Existing tests should cover this.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/HyukjinKwon/spark SPARK-17955

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/15499.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #15499

commit aa8cd3507d6ae79c0b2c93a077e655880d10a01c
Author: hyukjinkwon
Date: 2016-10-15T12:36:01Z

    Use the same read path in DataFrameReader.jdbc and DataFrameReader.format("jdbc")

commit 0d6e2d1aa6a3348c4fad5256b7580364499d3daf
Author: hyukjinkwon
Date: 2016-10-15T12:39:11Z

    Add missing dots
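The comment in the diff, "explicit url and dbtable should override all", is an ordering guarantee on map merging: the connection properties are folded in first, and the explicitly passed values are put last so they win on duplicate keys. A small Java illustration of that merge order (hypothetical helper, not Spark code):

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Properties;

// Illustrates the merge order in the diff: extra options first, then
// connection properties, then the explicit url/dbtable overwrite any
// duplicates because they are put last.
final class JdbcOptionsMergeSketch {
  static Map<String, String> merge(Map<String, String> extraOptions,
                                   Properties connectionProperties,
                                   String url, String table) {
    Map<String, String> merged = new LinkedHashMap<>(extraOptions);
    for (String name : connectionProperties.stringPropertyNames()) {
      merged.put(name, connectionProperties.getProperty(name));
    }
    // Explicit url and dbtable override everything merged so far.
    merged.put("url", url);
    merged.put("dbtable", table);
    return merged;
  }
}
```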
[GitHub] spark issue #15499: [SPARK-17955][SQL] Make DataFrameReader.jdbc call DataFr...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15499 **[Test build #67011 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67011/consoleFull)** for PR 15499 at commit [`0d6e2d1`](https://github.com/apache/spark/commit/0d6e2d1aa6a3348c4fad5256b7580364499d3daf).
[GitHub] spark issue #15361: [SPARK-17765][SQL] Support for writing out user-defined ...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/15361 Hi @chenghao-intel and @davies, it seems the related code paths were updated by you before. Do you mind taking a look, please?
[GitHub] spark issue #15049: [SPARK-17310][SQL] Add an option to disable record-level...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/15049 ping @liancheng and @yhuai...
[GitHub] spark issue #14947: [SPARK-17388][SQL] Support for inferring type date/times...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/14947 ping @davies ..
[GitHub] spark issue #14660: [SPARK-17071][SQL] Add an option to support for reading ...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/14660 @liancheng Would there be other things maybe I should take care of?
[GitHub] spark issue #15354: [SPARK-17764][SQL] Add `to_json` supporting to convert n...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15354 **[Test build #67012 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67012/consoleFull)** for PR 15354 at commit [`38d89a6`](https://github.com/apache/spark/commit/38d89a6ab04b9181f7be818a7ee6cf0bd77e2c69).
[GitHub] spark pull request #15500: [SPARK-17956][SQL] Fix projection output ordering
GitHub user viirya opened a pull request: https://github.com/apache/spark/pull/15500

[SPARK-17956][SQL] Fix projection output ordering

## What changes were proposed in this pull request?

Currently `ProjectExec` simply takes its child plan's `outputOrdering` as its own `outputOrdering`. In some cases this can lead to an incorrect `outputOrdering`. The same argument applies to `TakeOrderedAndProjectExec`.

## How was this patch tested?

Jenkins tests.

Please review https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark before opening a pull request.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/viirya/spark-1 project-sort-order

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/15500.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #15500

commit 58142f136533fe91956deee9575d7bf48164865b
Author: Liang-Chi Hsieh
Date: 2016-10-15T13:27:27Z

    Fix projection output ordering.
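Why a projection can invalidate the child's ordering: if the projection drops or rewrites the expressions the child was sorted on, the child's `outputOrdering` no longer describes the projected rows. A toy Java illustration of the dropped-sort-key case (illustrative only, not Spark code):

```java
import java.util.ArrayList;
import java.util.List;

// A child relation sorted on column 0 (the key). Projecting only column 1
// yields output that is NOT sorted, so a projection must not blindly claim
// its child's ordering when the ordering expressions are projected away.
final class ProjectionOrderingSketch {
  static List<Integer> projectSecondColumn(List<int[]> rowsSortedByFirst) {
    List<Integer> out = new ArrayList<>();
    for (int[] row : rowsSortedByFirst) {
      out.add(row[1]); // drop the sort key, keep the other column
    }
    return out;
  }

  static boolean isSorted(List<Integer> xs) {
    for (int i = 1; i < xs.size(); i++) {
      if (xs.get(i - 1) > xs.get(i)) {
        return false;
      }
    }
    return true;
  }
}
```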
[GitHub] spark issue #15500: [SPARK-17956][SQL] Fix projection output ordering
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15500 **[Test build #67013 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67013/consoleFull)** for PR 15500 at commit [`58142f1`](https://github.com/apache/spark/commit/58142f136533fe91956deee9575d7bf48164865b).
[GitHub] spark pull request #15501: Branch 2.0
GitHub user lastbus opened a pull request: https://github.com/apache/spark/pull/15501

Branch 2.0

## What changes were proposed in this pull request?

(Please fill in changes proposed in this fix)

## How was this patch tested?

(Please explain how this patch was tested. E.g. unit tests, integration tests, manual tests) (If this patch involves UI changes, please attach a screenshot; otherwise, remove this)

Please review https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark before opening a pull request.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/apache/spark branch-2.0

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/15501.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #15501

commit 0297896119e11f23da4b14f62f50ec72b5fac57f
Author: Junyang Qian
Date: 2016-08-20T13:59:23Z

[SPARK-16508][SPARKR] Fix CRAN undocumented/duplicated arguments warnings.

This PR tries to fix all the remaining "undocumented/duplicated arguments" warnings given by CRAN-check. One left is the doc for R `stats::glm` exported in SparkR. To mute that warning, we have to also provide documentation for all arguments of that non-SparkR function. Some previous conversation is in #14558.

R unit test and `check-cran.sh` script (with no-test).

Author: Junyang Qian
Closes #14705 from junyangq/SPARK-16508-master.
(cherry picked from commit 01401e965b58f7e8ab615764a452d7d18f1d4bf0)
Signed-off-by: Shivaram Venkataraman

commit e62b29f29f44196a1cbe13004ff4abfd8e5be1c1
Author: Dongjoon Hyun
Date: 2016-08-21T20:07:47Z

[SPARK-17098][SQL] Fix `NullPropagation` optimizer to handle `COUNT(NULL) OVER` correctly

## What changes were proposed in this pull request?

Currently, the `NullPropagation` optimizer replaces `COUNT` on null literals in a bottom-up fashion.
During that, `WindowExpression` is not covered properly. This PR adds the missing propagation logic.

**Before**

```scala
scala> sql("SELECT COUNT(1 + NULL) OVER ()").show
java.lang.UnsupportedOperationException: Cannot evaluate expression: cast(0 as bigint) windowspecdefinition(ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
```

**After**

```scala
scala> sql("SELECT COUNT(1 + NULL) OVER ()").show
+-----------------------------------------------------------------------------------------------+
|count((1 + CAST(NULL AS INT))) OVER (ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)|
+-----------------------------------------------------------------------------------------------+
|                                                                                              0|
+-----------------------------------------------------------------------------------------------+
```

## How was this patch tested?

Pass the Jenkins test with a new test case.

Author: Dongjoon Hyun
Closes #14689 from dongjoon-hyun/SPARK-17098.
(cherry picked from commit 91c2397684ab791572ac57ffb2a924ff058bb64f)
Signed-off-by: Herman van Hovell

commit 49cc44de3ad5495b2690633791941aa00a62b553
Author: Davies Liu
Date: 2016-08-22T08:16:03Z

[SPARK-17115][SQL] decrease the threshold when split expressions

## What changes were proposed in this pull request?

In 2.0, we changed the threshold for splitting expressions from 16K to 64K, which causes very bad performance on wide tables, because the generated method can't be JIT-compiled by default (above the limit of 8K bytecode). This PR decreases it to 1K, based on the benchmark results for a wide table with 400 columns of LongType. It also fixes a bug around splitting expressions in whole-stage codegen (it should not split them).

## How was this patch tested?

Added benchmark suite.

Author: Davies Liu
Closes #14692 from davies/split_exprs.
(cherry picked from commit 8d35a6f68d6d733212674491cbf31bed73fada0f) Signed-off-by: Wenchen Fan commit 2add45fabeb0ea4f7b17b5bc4910161370e72627 Author: Jagadeesan Date: 2016-08-22T08:30:31Z [SPARK-17085][STREAMING][DOCUMENTATION AND ACTUAL CODE DIFFERS - UNSUPPORTED OPERATIONS] Changes in Spark Structured Streaming doc in this link https://spark.apache.org/docs/2.0.0/structured-streaming-programming-guide.html#unsupported-operations Author: Jagadeesan Closes #14715 from jagadeesanas2/SPARK-17085. (cherry picked from commit bd9655063bdba8836b4ec96ed115e5653e246b65) Signed-off-by: Sean Owen commit 79195982a4c6f8b1a3e02069dea00049cc806574 Author: Junyang Qian Date: 2016-08-22T
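The SPARK-17098 fix above rests on the SQL semantics that `COUNT(expr)` counts only the rows where `expr` evaluates to non-NULL, so a count over an expression that is always NULL (like `1 + NULL`) can be constant-folded to 0. A hedged Java sketch of that semantics (illustrative only, not the optimizer code):

```java
import java.util.List;

// COUNT(expr) in SQL counts rows where expr is non-NULL; if expr is the
// constant NULL, the count is 0 regardless of how many rows there are,
// which is exactly what NullPropagation folds the expression to.
final class SqlCountSketch {
  static long count(List<Integer> evaluated) {
    long n = 0;
    for (Integer v : evaluated) {
      if (v != null) {
        n++;
      }
    }
    return n;
  }
}
```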
[GitHub] spark issue #15501: Branch 2.0
Github user srowen commented on the issue: https://github.com/apache/spark/pull/15501 @lastbus close this please
[GitHub] spark issue #15501: Branch 2.0
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15501 Can one of the admins verify this patch?
[GitHub] spark pull request #15408: [SPARK-17839][CORE] Use Nio's directbuffer instea...
Github user sitalkedia commented on a diff in the pull request: https://github.com/apache/spark/pull/15408#discussion_r83532883

--- Diff: core/src/main/java/org/apache/spark/io/NioBufferedFileInputStream.java ---

```java
// ... (license header and imports as quoted above) ...
+/**
+ * {@link InputStream} implementation which uses direct buffer
+ * to read a file to avoid extra copy of data between Java and
+ * native memory which happens when using {@link java.io.BufferedInputStream}.
+ * Unfortunately, this is not something already available in JDK,
+ * {@link sun.nio.ch.ChannelInputStream} supports reading a file using nio,
+ * but does not support buffering.
+ *
+ * TODO: support {@link #mark(int)}/{@link #reset()}
+ */
```

--- End diff --

Okay, I removed the TODO here.
[GitHub] spark pull request #15408: [SPARK-17839][CORE] Use Nio's directbuffer instea...
Github user sitalkedia commented on a diff in the pull request: https://github.com/apache/spark/pull/15408#discussion_r83532901

--- Diff: core/src/main/java/org/apache/spark/io/NioBufferedFileInputStream.java ---

```java
// ... (license header, imports, and javadoc as quoted above) ...
+@ThreadSafe
```

--- End diff --

Alright, removed it for consistency.
[GitHub] spark pull request #15408: [SPARK-17839][CORE] Use Nio's directbuffer instea...
Github user sitalkedia commented on a diff in the pull request: https://github.com/apache/spark/pull/15408#discussion_r83532904 --- Diff: core/src/main/java/org/apache/spark/io/NioBufferedFileInputStream.java --- @@ -0,0 +1,142 @@ +/* + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.spark.io; + +import org.apache.spark.storage.StorageUtils; + +import javax.annotation.concurrent.ThreadSafe; +import java.io.File; +import java.io.IOException; +import java.io.InputStream; +import java.nio.ByteBuffer; +import java.nio.channels.FileChannel; +import java.nio.file.StandardOpenOption; + +/** + * {@link InputStream} implementation which uses direct buffer + * to read a file to avoid extra copy of data between Java and + * native memory which happens when using {@link java.io.BufferedInputStream}. + * Unfortunately, this is not something already available in JDK, + * {@link sun.nio.ch.ChannelInputStream} supports reading a file using nio, + * but does not support buffering. + * + * TODO: support {@link #mark(int)}/{@link #reset()} + * + */ +@ThreadSafe +public final class NioBufferedFileInputStream extends InputStream { + + private static int DEFAULT_BUFFER_SIZE_BYTES = 8192; --- End diff -- done --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. 
[GitHub] spark issue #15408: [SPARK-17839][CORE] Use Nio's directbuffer instead of Bu...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15408 **[Test build #67014 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67014/consoleFull)** for PR 15408 at commit [`f1f108f`](https://github.com/apache/spark/commit/f1f108f3bffaa9cecbca37dcb6a818b45174e3d3).
[GitHub] spark pull request #15408: [SPARK-17839][CORE] Use Nio's directbuffer instea...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/15408#discussion_r83532952 --- Diff: core/src/main/java/org/apache/spark/io/NioBufferedFileInputStream.java --- @@ -0,0 +1,138 @@ +/* + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.spark.io; + +import org.apache.spark.storage.StorageUtils; + +import javax.annotation.concurrent.ThreadSafe; +import java.io.File; +import java.io.IOException; +import java.io.InputStream; +import java.nio.ByteBuffer; +import java.nio.channels.FileChannel; +import java.nio.file.StandardOpenOption; + +/** + * {@link InputStream} implementation which uses direct buffer + * to read a file to avoid extra copy of data between Java and + * native memory which happens when using {@link java.io.BufferedInputStream}. + * Unfortunately, this is not something already available in JDK, + * {@link sun.nio.ch.ChannelInputStream} supports reading a file using nio, + * but does not support buffering. 
+ */ +public final class NioBufferedFileInputStream extends InputStream { + + private static final int DEFAULT_BUFFER_SIZE_BYTES = 8192; + + private final ByteBuffer byteBuffer; + + private final FileChannel fileChannel; + + public NioBufferedFileInputStream(File file, int bufferSizeInBytes) throws IOException { +byteBuffer = ByteBuffer.allocateDirect(bufferSizeInBytes); +fileChannel = FileChannel.open(file.toPath(), StandardOpenOption.READ); +byteBuffer.flip(); + } + + public NioBufferedFileInputStream(File file) throws IOException { +this(file, DEFAULT_BUFFER_SIZE_BYTES); + } + + /** + * Checks weather data is left to be read from the input stream. + * @return true if data is left, false otherwise + * @throws IOException + */ + private boolean refill() throws IOException { +if (!byteBuffer.hasRemaining()) { + byteBuffer.clear(); + int nRead = 0; + while (nRead == 0) { +nRead = fileChannel.read(byteBuffer); + } + if (nRead < 0) { +return false; + } + byteBuffer.flip(); +} +return true; + } + + @Override + public synchronized int read() throws IOException { +if (!refill()) { + return -1; +} +return byteBuffer.get() & 0xFF; + } + + @Override + public synchronized int read(byte[] b, int offset, int len) throws IOException { +if (offset < 0 || len < 0 || (b.length - (offset + len)) < 0) { --- End diff -- Ah no I think that condition was needed. I mean: `if (offset < 0 || len < 0 || offset + len < 0 || offset + len > b.length) {` --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
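The overflow case srowen is guarding against here can be seen in a small standalone sketch (the class and method names below are illustrative, not part of the PR). With `int` arithmetic, `offset + len` can wrap around to a negative value, and `b.length - (offset + len)` can then wrap back to a positive one, so the subtraction-based check alone lets an invalid request through:

```java
public class BoundsCheckDemo {

    // Weaker check: relies only on b.length - (offset + len) < 0 to catch
    // out-of-range requests, as discussed in the review thread.
    static boolean rejectsWeak(byte[] b, int offset, int len) {
        return offset < 0 || len < 0 || (b.length - (offset + len)) < 0;
    }

    // Full check with the explicit offset + len < 0 overflow guard.
    static boolean rejectsFull(byte[] b, int offset, int len) {
        return offset < 0 || len < 0 || offset + len < 0 || offset + len > b.length;
    }

    public static void main(String[] args) {
        byte[] b = new byte[8];
        // Integer.MAX_VALUE + Integer.MAX_VALUE overflows to -2, so
        // b.length - (offset + len) == 10, which is not negative.
        int offset = Integer.MAX_VALUE;
        int len = Integer.MAX_VALUE;
        System.out.println(rejectsWeak(b, offset, len)); // false: invalid request slips through
        System.out.println(rejectsFull(b, offset, len)); // true: caught by the overflow guard
    }
}
```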
[GitHub] spark pull request #15408: [SPARK-17839][CORE] Use Nio's directbuffer instea...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/15408#discussion_r83532939 --- Diff: core/src/main/java/org/apache/spark/io/NioBufferedFileInputStream.java --- @@ -0,0 +1,138 @@ +/* + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.spark.io; + +import org.apache.spark.storage.StorageUtils; + +import javax.annotation.concurrent.ThreadSafe; --- End diff -- Nit^2 : no longer needed as an import --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #15408: [SPARK-17839][CORE] Use Nio's directbuffer instea...
Github user sitalkedia commented on a diff in the pull request: https://github.com/apache/spark/pull/15408#discussion_r83532978 --- Diff: core/src/main/java/org/apache/spark/io/NioBufferedFileInputStream.java --- @@ -0,0 +1,142 @@ +/* + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.spark.io; + +import org.apache.spark.storage.StorageUtils; + +import javax.annotation.concurrent.ThreadSafe; +import java.io.File; +import java.io.IOException; +import java.io.InputStream; +import java.nio.ByteBuffer; +import java.nio.channels.FileChannel; +import java.nio.file.StandardOpenOption; + +/** + * {@link InputStream} implementation which uses direct buffer + * to read a file to avoid extra copy of data between Java and + * native memory which happens when using {@link java.io.BufferedInputStream}. + * Unfortunately, this is not something already available in JDK, + * {@link sun.nio.ch.ChannelInputStream} supports reading a file using nio, + * but does not support buffering. 
+ * + * TODO: support {@link #mark(int)}/{@link #reset()} + * + */ +@ThreadSafe +public final class NioBufferedFileInputStream extends InputStream { + + private static int DEFAULT_BUFFER_SIZE_BYTES = 8192; + + private final ByteBuffer byteBuffer; + + private final FileChannel fileChannel; + + public NioBufferedFileInputStream(File file, int bufferSizeInBytes) throws IOException { +byteBuffer = ByteBuffer.allocateDirect(bufferSizeInBytes); +fileChannel = FileChannel.open(file.toPath(), StandardOpenOption.READ); +byteBuffer.flip(); + } + + public NioBufferedFileInputStream(File file) throws IOException { +this(file, DEFAULT_BUFFER_SIZE_BYTES); + } + + /** + * Checks weather data is left to be read from the input stream. + * @return true if data is left, false otherwise + * @throws IOException + */ + private boolean refill() throws IOException { +if (!byteBuffer.hasRemaining()) { + byteBuffer.clear(); + int nRead = 0; + while (nRead == 0) { +nRead = fileChannel.read(byteBuffer); + } + if (nRead < 0) { +return false; + } + byteBuffer.flip(); +} +return true; + } + + @Override + public synchronized int read() throws IOException { +if (!refill()) { + return -1; +} +return byteBuffer.get() & 0xFF; + } + + @Override + public synchronized int read(byte[] b, int offset, int len) throws IOException { +if (offset < 0 || len < 0 || (offset + len) < 0 || (b.length - (offset + len)) < 0) { --- End diff -- We still need to check if `offset` and `len` is less than 0 right? Removed the` offset + len < 0` condition because that is covered in the last condition `(b.length - (offset + len)) < 0` --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. 
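The `refill()` method quoted above follows the standard NIO clear/read/flip cycle: clear the buffer to make it writable, fill it from the channel, then flip it so the bytes just read become readable. A minimal standalone sketch of that cycle (file name, contents, and helper names are illustrative; the nonblocking `nRead == 0` retry loop from the PR is omitted since a regular-file read does not return 0):

```java
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.StandardOpenOption;

public class RefillDemo {

    // Reads a whole file through a small direct buffer using the same
    // clear/read/flip cycle as the refill() method under review.
    static String readAll(File file, int bufferSize) throws IOException {
        ByteBuffer buf = ByteBuffer.allocateDirect(bufferSize);
        buf.flip(); // start with an empty (fully consumed) buffer, as the constructor does
        StringBuilder out = new StringBuilder();
        try (FileChannel channel = FileChannel.open(file.toPath(), StandardOpenOption.READ)) {
            while (true) {
                if (!buf.hasRemaining()) {
                    buf.clear();                   // make the whole buffer writable again
                    int nRead = channel.read(buf); // fill from the channel
                    if (nRead < 0) {
                        break;                     // end of file
                    }
                    buf.flip();                    // expose the bytes just read
                }
                out.append((char) (buf.get() & 0xFF));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) throws IOException {
        File file = File.createTempFile("refill-demo", ".tmp");
        file.deleteOnExit();
        try (FileWriter writer = new FileWriter(file)) {
            writer.write("hello");
        }
        // A 4-byte buffer forces two refills for the 5-byte file.
        System.out.println(readAll(file, 4)); // prints "hello"
    }
}
```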
[GitHub] spark pull request #15408: [SPARK-17839][CORE] Use Nio's directbuffer instea...
Github user sitalkedia commented on a diff in the pull request: https://github.com/apache/spark/pull/15408#discussion_r83532993 --- Diff: core/src/main/java/org/apache/spark/io/NioBufferedFileInputStream.java --- @@ -0,0 +1,142 @@ +/* + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.spark.io; + +import org.apache.spark.storage.StorageUtils; + +import javax.annotation.concurrent.ThreadSafe; +import java.io.File; +import java.io.IOException; +import java.io.InputStream; +import java.nio.ByteBuffer; +import java.nio.channels.FileChannel; +import java.nio.file.StandardOpenOption; + +/** + * {@link InputStream} implementation which uses direct buffer + * to read a file to avoid extra copy of data between Java and + * native memory which happens when using {@link java.io.BufferedInputStream}. + * Unfortunately, this is not something already available in JDK, + * {@link sun.nio.ch.ChannelInputStream} supports reading a file using nio, + * but does not support buffering. 
+ * + * TODO: support {@link #mark(int)}/{@link #reset()} + * + */ +@ThreadSafe +public final class NioBufferedFileInputStream extends InputStream { + + private static int DEFAULT_BUFFER_SIZE_BYTES = 8192; + + private final ByteBuffer byteBuffer; + + private final FileChannel fileChannel; + + public NioBufferedFileInputStream(File file, int bufferSizeInBytes) throws IOException { +byteBuffer = ByteBuffer.allocateDirect(bufferSizeInBytes); +fileChannel = FileChannel.open(file.toPath(), StandardOpenOption.READ); +byteBuffer.flip(); + } + + public NioBufferedFileInputStream(File file) throws IOException { +this(file, DEFAULT_BUFFER_SIZE_BYTES); + } + + /** + * Checks weather data is left to be read from the input stream. + * @return true if data is left, false otherwise + * @throws IOException + */ + private boolean refill() throws IOException { +if (!byteBuffer.hasRemaining()) { + byteBuffer.clear(); + int nRead = 0; + while (nRead == 0) { +nRead = fileChannel.read(byteBuffer); + } + if (nRead < 0) { +return false; + } + byteBuffer.flip(); +} +return true; + } + + @Override + public synchronized int read() throws IOException { +if (!refill()) { + return -1; +} +return byteBuffer.get() & 0xFF; + } + + @Override + public synchronized int read(byte[] b, int offset, int len) throws IOException { +if (offset < 0 || len < 0 || (offset + len) < 0 || (b.length - (offset + len)) < 0) { + throw new IndexOutOfBoundsException(); +} +if (!refill()) { + return -1; +} +len = Math.min(len, byteBuffer.remaining()); +byteBuffer.get(b, offset, len); +return len; + } + + @Override + public synchronized int available() throws IOException { +return byteBuffer.remaining(); + } + + @Override + public synchronized long skip(long n) throws IOException { +if (n <= 0L) { + return 0L; +} +if (byteBuffer.remaining() >= n) { + // The buffered content is enough to skip + byteBuffer.position(byteBuffer.position() + (int) n); + return n; +} +long skippedFromBuffer = byteBuffer.remaining(); +long 
toSkipFromFileChannel = n - skippedFromBuffer; +// Discard everything we have read in the buffer. +byteBuffer.position(0); +byteBuffer.flip(); +return skippedFromBuffer + skipFromFileChannel(toSkipFromFileChannel); + } + + private long skipFromFileChannel(long n) throws IOException { +long currentFilePosition = fileChannel.position(); +long size = fileChannel.size(); +if (n > size - currentFilePosition) { + fileChannel.position(size); + return size - currentFilePosition; +} else { + fileChannel.position(currentFilePosition + n); + return n; +} + } + + @Override + public synchronized void close() throws IOException { +fileChannel.close(); +
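The `skip(long)` implementation quoted above works in two stages: consume whatever is already buffered, then reposition the channel for the remainder, clamping at the file size. That arithmetic can be sketched on its own (the class and parameter names below are illustrative, not from the PR):

```java
public class SkipMathDemo {

    // Mirrors the two-stage skip in the code under review: take what the
    // buffer already holds, then advance the channel position, never past EOF.
    static long skipAmount(long requested, long buffered, long channelPos, long fileSize) {
        if (requested <= 0L) {
            return 0L;                            // skip(n <= 0) skips nothing
        }
        if (buffered >= requested) {
            return requested;                     // satisfied entirely from the buffer
        }
        long fromChannel = requested - buffered;  // remainder to skip in the file
        long available = fileSize - channelPos;   // bytes left past the channel position
        return buffered + Math.min(fromChannel, available);
    }

    public static void main(String[] args) {
        // 3 bytes buffered, channel at byte 8 of a 10-byte file:
        // a request for 7 skips 3 from the buffer and only 2 from the channel.
        System.out.println(skipAmount(7, 3, 8, 10)); // prints 5
    }
}
```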
[GitHub] spark issue #15474: [DO_NOT_MERGE] Test netty
Github user srowen commented on the issue: https://github.com/apache/spark/pull/15474 PS I got this to work by making the pyspark tests specify the "userClassPathFirst" flags. I think that's actually reasonable to set. I will try this in my PR at https://github.com/apache/spark/pull/15436 and hope it works; if so you can close this I think.
[GitHub] spark pull request #15408: [SPARK-17839][CORE] Use Nio's directbuffer instea...
Github user sitalkedia commented on a diff in the pull request: https://github.com/apache/spark/pull/15408#discussion_r83533112 --- Diff: core/src/main/java/org/apache/spark/io/NioBufferedFileInputStream.java --- @@ -0,0 +1,142 @@ +/* + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.spark.io; + +import org.apache.spark.storage.StorageUtils; + +import javax.annotation.concurrent.ThreadSafe; +import java.io.File; +import java.io.IOException; +import java.io.InputStream; +import java.nio.ByteBuffer; +import java.nio.channels.FileChannel; +import java.nio.file.StandardOpenOption; + +/** + * {@link InputStream} implementation which uses direct buffer + * to read a file to avoid extra copy of data between Java and + * native memory which happens when using {@link java.io.BufferedInputStream}. + * Unfortunately, this is not something already available in JDK, + * {@link sun.nio.ch.ChannelInputStream} supports reading a file using nio, + * but does not support buffering. 
+ * + * TODO: support {@link #mark(int)}/{@link #reset()} + * + */ +@ThreadSafe +public final class NioBufferedFileInputStream extends InputStream { + + private static int DEFAULT_BUFFER_SIZE_BYTES = 8192; + + private final ByteBuffer byteBuffer; + + private final FileChannel fileChannel; + + public NioBufferedFileInputStream(File file, int bufferSizeInBytes) throws IOException { +byteBuffer = ByteBuffer.allocateDirect(bufferSizeInBytes); +fileChannel = FileChannel.open(file.toPath(), StandardOpenOption.READ); +byteBuffer.flip(); + } + + public NioBufferedFileInputStream(File file) throws IOException { +this(file, DEFAULT_BUFFER_SIZE_BYTES); + } + + /** + * Checks weather data is left to be read from the input stream. + * @return true if data is left, false otherwise + * @throws IOException + */ + private boolean refill() throws IOException { +if (!byteBuffer.hasRemaining()) { + byteBuffer.clear(); + int nRead = 0; + while (nRead == 0) { +nRead = fileChannel.read(byteBuffer); + } + if (nRead < 0) { +return false; + } + byteBuffer.flip(); +} +return true; + } + + @Override + public synchronized int read() throws IOException { +if (!refill()) { + return -1; +} +return byteBuffer.get() & 0xFF; + } + + @Override + public synchronized int read(byte[] b, int offset, int len) throws IOException { +if (offset < 0 || len < 0 || (offset + len) < 0 || (b.length - (offset + len)) < 0) { --- End diff -- Ignore my previous comments, we still need it. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #15436: [SPARK-17875] [BUILD] Remove unneeded direct dependence ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15436 **[Test build #67015 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67015/consoleFull)** for PR 15436 at commit [`f49f6a6`](https://github.com/apache/spark/commit/f49f6a6ec956b069b4934a0b94450413529c1b93).
[GitHub] spark issue #15408: [SPARK-17839][CORE] Use Nio's directbuffer instead of Bu...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15408 **[Test build #67016 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67016/consoleFull)** for PR 15408 at commit [`5306fb0`](https://github.com/apache/spark/commit/5306fb097ecef7ff69c3281f33f221826879ef04).
[GitHub] spark issue #15499: [SPARK-17955][SQL] Make DataFrameReader.jdbc call DataFr...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15499 **[Test build #67011 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67011/consoleFull)** for PR 15499 at commit [`0d6e2d1`](https://github.com/apache/spark/commit/0d6e2d1aa6a3348c4fad5256b7580364499d3daf). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #15499: [SPARK-17955][SQL] Make DataFrameReader.jdbc call DataFr...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15499 Merged build finished. Test PASSed.
[GitHub] spark issue #15499: [SPARK-17955][SQL] Make DataFrameReader.jdbc call DataFr...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15499 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67011/ Test PASSed.
[GitHub] spark issue #15435: [SPARK-17139][ML] Add model summary for MultinomialLogis...
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/15435 @sethah Good suggestion. code updated, thanks!
[GitHub] spark issue #15435: [SPARK-17139][ML] Add model summary for MultinomialLogis...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15435 **[Test build #67017 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67017/consoleFull)** for PR 15435 at commit [`1bf5aa4`](https://github.com/apache/spark/commit/1bf5aa4b750899cc7a8ea83368d2ff5a66a76b91).
[GitHub] spark pull request #15502: [SPARK-17892] [SQL] [2.0] Do Not Optimize Query i...
GitHub user gatorsmile opened a pull request: https://github.com/apache/spark/pull/15502 [SPARK-17892] [SQL] [2.0] Do Not Optimize Query in CTAS More Than Once #15048

### What changes were proposed in this pull request?

This PR backports https://github.com/apache/spark/pull/15048 and https://github.com/apache/spark/pull/15459. However, in 2.0 we do not have a unified logical node `CreateTable`, and the analyzer rule `PreWriteCheck` is also different. To minimize the code changes, this PR adds a new rule `AnalyzeCreateTableAsSelect`. Please treat it as a new PR to review. Thanks!

As explained in https://github.com/apache/spark/pull/14797:

> Some analyzer rules make assumptions about logical plans, and the optimizer may break those assumptions. We should not pass an optimized query plan into QueryExecution (where it will be analyzed again), otherwise we may hit some weird bugs. For example, we have a rule for decimal calculation that promotes the precision before binary operations, using PromotePrecision as a placeholder to indicate that the rule should not apply twice. But an optimizer rule removes this placeholder; that breaks the assumption, the rule is applied twice, and it produces a wrong result.

We should not optimize the query in CTAS more than once. For example:

```scala
spark.range(99, 101).createOrReplaceTempView("tab1")
val sqlStmt = "SELECT id, cast(id as long) * cast('1.0' as decimal(38, 18)) as num FROM tab1"
sql(s"CREATE TABLE tab2 USING PARQUET AS $sqlStmt")
checkAnswer(spark.table("tab2"), sql(sqlStmt))
```

Before this PR, the results do not match:

```
== Results ==
!== Correct Answer - 2 ==   == Spark Answer - 2 ==
![100,100.00]               [100,null]
 [99,99.00]                 [99,99.00]
```

After this PR, the results match:

```
+---+------+
|id |num   |
+---+------+
|99 |99.00 |
|100|100.00|
+---+------+
```

In this PR, we do not treat the `query` in CTAS as a child. Thus, the `query` will not be optimized when the CTAS statement is optimized. However, we still need to analyze it so that the Analyzer can normalize and verify the CTAS. We do this in the analyzer rule `PreprocessDDL`, because so far only this rule needs the analyzed plan of the `query`.

### How was this patch tested?

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/gatorsmile/spark ctasOptimize2.0

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/15502.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #15502

commit a9931a538912aeed620df216beb355970979c3f0 Author: gatorsmile Date: 2016-10-14T05:16:08Z the first set of changes

commit d5f91871b4329ca9292a9ca129d6c603f4cf47fc Author: gatorsmile Date: 2016-10-15T15:06:21Z 2nd change set

commit a658da47983001260205c97406dbf744fd9abfcd Author: gatorsmile Date: 2016-10-15T15:12:57Z more comment

commit 9cfebc523e4b88c3df3ffae8ca5ea92e98a0a616 Author: gatorsmile Date: 2016-10-15T15:17:26Z rename
[GitHub] spark issue #15502: [SPARK-17892] [SQL] [2.0] Do Not Optimize Query in CTAS ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15502 **[Test build #67018 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67018/consoleFull)** for PR 15502 at commit [`9cfebc5`](https://github.com/apache/spark/commit/9cfebc523e4b88c3df3ffae8ca5ea92e98a0a616).
[GitHub] spark issue #15500: [SPARK-17956][SQL] Fix projection output ordering
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15500 **[Test build #67019 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67019/consoleFull)** for PR 15500 at commit [`029c36d`](https://github.com/apache/spark/commit/029c36d345a8f7042e63f4b586cfeaa4362367fd).
[GitHub] spark issue #15354: [SPARK-17764][SQL] Add `to_json` supporting to convert n...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15354 **[Test build #67012 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67012/consoleFull)** for PR 15354 at commit [`38d89a6`](https://github.com/apache/spark/commit/38d89a6ab04b9181f7be818a7ee6cf0bd77e2c69). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #15354: [SPARK-17764][SQL] Add `to_json` supporting to convert n...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15354 Merged build finished. Test PASSed.
[GitHub] spark issue #15354: [SPARK-17764][SQL] Add `to_json` supporting to convert n...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15354 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67012/
[GitHub] spark issue #15500: [SPARK-17956][SQL] Fix projection output ordering
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15500 **[Test build #67013 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67013/consoleFull)** for PR 15500 at commit [`58142f1`](https://github.com/apache/spark/commit/58142f136533fe91956deee9575d7bf48164865b). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #15500: [SPARK-17956][SQL] Fix projection output ordering
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15500 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67013/
[GitHub] spark issue #15500: [SPARK-17956][SQL] Fix projection output ordering
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15500 Merged build finished. Test PASSed.
[GitHub] spark issue #15435: [SPARK-17139][ML] Add model summary for MultinomialLogis...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15435 **[Test build #67017 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67017/consoleFull)** for PR 15435 at commit [`1bf5aa4`](https://github.com/apache/spark/commit/1bf5aa4b750899cc7a8ea83368d2ff5a66a76b91). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #15435: [SPARK-17139][ML] Add model summary for MultinomialLogis...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15435 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67017/
[GitHub] spark issue #15435: [SPARK-17139][ML] Add model summary for MultinomialLogis...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15435 Merged build finished. Test PASSed.
[GitHub] spark issue #15408: [SPARK-17839][CORE] Use Nio's directbuffer instead of Bu...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15408 **[Test build #67014 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67014/consoleFull)** for PR 15408 at commit [`f1f108f`](https://github.com/apache/spark/commit/f1f108f3bffaa9cecbca37dcb6a818b45174e3d3). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #15408: [SPARK-17839][CORE] Use Nio's directbuffer instead of Bu...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15408 Merged build finished. Test PASSed.
[GitHub] spark issue #15408: [SPARK-17839][CORE] Use Nio's directbuffer instead of Bu...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15408 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67014/
[GitHub] spark issue #15408: [SPARK-17839][CORE] Use Nio's directbuffer instead of Bu...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15408 **[Test build #67016 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67016/consoleFull)** for PR 15408 at commit [`5306fb0`](https://github.com/apache/spark/commit/5306fb097ecef7ff69c3281f33f221826879ef04). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #15408: [SPARK-17839][CORE] Use Nio's directbuffer instead of Bu...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15408 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67016/
[GitHub] spark issue #15408: [SPARK-17839][CORE] Use Nio's directbuffer instead of Bu...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15408 Merged build finished. Test PASSed.
[GitHub] spark issue #15319: [SPARK-17733][SQL] InferFiltersFromConstraints rule neve...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15319 **[Test build #67020 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67020/consoleFull)** for PR 15319 at commit [`388443d`](https://github.com/apache/spark/commit/388443d2886d09fec6a25b8400c6eb9631373135).
[GitHub] spark issue #15494: [SPARK-17947] [SQL] Add Doc and Comment about spark.sql....
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/15494 You can see the failed test cases in the PR: https://github.com/apache/spark/pull/15478 `ANALYZE TABLE` will fail due to the [check](https://github.com/apache/spark/blob/6ce1b675ee9fc9a6034439c3ca00441f9f172f84/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala#L111-L115) that runs when we try to alter the table properties. `SHOW CREATE TABLE` also outputs table properties that should not be part of the generated `CREATE TABLE` statement. `CREATE TABLE LIKE` always excludes all the table properties of the source table; however, we might change that behavior, which is still waiting for your input in another PR. See the [discussion](https://github.com/apache/spark/pull/14531#issuecomment-252147424)
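The failure mode described above can be sketched as follows. This is a simplified, hypothetical stand-in for the `HiveExternalCatalog` check linked in the comment, not Spark's actual code; the `spark.sql.` prefix and function names are illustrative assumptions.

```python
# Hypothetical sketch of a reserved-table-property policy, loosely modeled on
# the HiveExternalCatalog validation referenced above. Not Spark's real API.
SPARK_SQL_PREFIX = "spark.sql."  # assumed reserved prefix, for illustration


def validate_table_properties(props):
    """Reject attempts to persist properties under the reserved prefix
    (the kind of check that makes ANALYZE TABLE's property update fail)."""
    invalid = [k for k in props if k.startswith(SPARK_SQL_PREFIX)]
    if invalid:
        raise ValueError(f"Cannot persist reserved table properties: {invalid}")
    return props


def strip_internal_properties(props):
    """Drop internal properties so SHOW CREATE TABLE-style output does not
    leak them into the generated CREATE TABLE statement."""
    return {k: v for k, v in props.items()
            if not k.startswith(SPARK_SQL_PREFIX)}
```

Under this sketch, `strip_internal_properties` is the kind of filtering `SHOW CREATE TABLE` would need, and `validate_table_properties` is the kind of guard that trips `ANALYZE TABLE`.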
[GitHub] spark issue #15502: [SPARK-17892] [SQL] [2.0] Do Not Optimize Query in CTAS ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15502 **[Test build #67018 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67018/consoleFull)** for PR 15502 at commit [`9cfebc5`](https://github.com/apache/spark/commit/9cfebc523e4b88c3df3ffae8ca5ea92e98a0a616). * This patch passes all tests. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `case class AnalyzeCreateTableAsSelect(sparkSession: SparkSession) extends Rule[LogicalPlan] `
[GitHub] spark issue #15502: [SPARK-17892] [SQL] [2.0] Do Not Optimize Query in CTAS ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15502 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67018/
[GitHub] spark issue #15502: [SPARK-17892] [SQL] [2.0] Do Not Optimize Query in CTAS ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15502 Merged build finished. Test PASSed.
[GitHub] spark issue #15500: [SPARK-17956][SQL] Fix projection output ordering
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15500 **[Test build #67019 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67019/consoleFull)** for PR 15500 at commit [`029c36d`](https://github.com/apache/spark/commit/029c36d345a8f7042e63f4b586cfeaa4362367fd). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #15500: [SPARK-17956][SQL] Fix projection output ordering
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15500 Merged build finished. Test PASSed.
[GitHub] spark issue #15500: [SPARK-17956][SQL] Fix projection output ordering
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15500 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67019/
[GitHub] spark issue #15495: [SPARK-17620][SQL] Determine Serde by hive.default.filef...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/15495 @yhuai Do you think it is good enough to merge? Thank you!
[GitHub] spark issue #15398: [SPARK-17647][SQL][WIP] Fix backslash escaping in 'LIKE'...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/15398 Also cc @yhuai , @JoshRosen and @mengxr
[GitHub] spark issue #14547: [SPARK-16718][MLlib] gbm-style treeboost
Github user vlad17 commented on the issue: https://github.com/apache/spark/pull/14547 @sethah You raise good points. Regarding (1), I don't know if it is actually true. I don't want to speak for @jkbradley, but I was just going off of "software engineering intuition" about backwards compatibility of the algorithm's behavior. But let's consider an analogous example: if LogisticRegression were using regular batch GD and we moved it to L-BFGS, it wouldn't make much sense to offer a new option for "gd". I think the question is whether reverting to the original behavior is common enough to merit a larger, clunkier, and more confusing API. And since the notion of "original" will change over time, I'm starting to see the attractiveness of @sethah's original proposition to get rid of this option entirely and let us do whatever we want under the hood impurity-wise. **TL;DR:** At no point can I see a data scientist saying "you know what will help my L1 error? A mean predictor!" The strongest point in favor of this that comes to mind is the following: the people who would change the impurity metric are those tuning a GBT model, but there is no good reason to use variance-based impurity with mean predictions for a loss that isn't optimized by those choices! If any model tuning that compares `.setImpurity("variance")` vs `.setImpurity("loss-based")` happens to show that you do better choosing variance with CV, then all you've done is grid search over GBT model parameters to overfit to noise in your data.
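The "mean predictor for L1 error" point in the comment above can be made concrete with a minimal numeric sketch (not Spark code): the constant prediction that minimizes absolute (L1) error over a set of values is the median, not the mean, so a tree that splits on variance and predicts leaf means is not tuned for L1 loss.

```python
# Minimal illustration: on data with an outlier, the median is a strictly
# better constant predictor for L1 error than the mean.
def l1_error(prediction, values):
    """Total absolute error of a single constant prediction."""
    return sum(abs(v - prediction) for v in values)

values = [1.0, 2.0, 2.0, 3.0, 100.0]       # one large outlier
mean = sum(values) / len(values)            # dragged up by the outlier (21.6)
median = sorted(values)[len(values) // 2]   # robust to the outlier (2.0)

assert l1_error(median, values) < l1_error(mean, values)
```

This is the asymmetry driving the discussion: variance-based splits with mean leaf predictions optimize squared error, and only a loss-aware impurity (or loss-aware leaf predictions) targets L1.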
[GitHub] spark pull request #15398: [SPARK-17647][SQL][WIP] Fix backslash escaping in...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/15398#discussion_r83537154
--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/RegexpExpressionsSuite.scala ---
@@ -74,6 +107,31 @@ class RegexpExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper {
     checkEvaluation("a\nb" like regEx, true, create_row("a%b"))
     checkEvaluation(Literal.create(null, StringType) like regEx, null, create_row("bc%"))
+
+    checkEvaluation("" like regEx, true, create_row(""))
+    checkEvaluation("a" like regEx, false, create_row(""))
+    checkEvaluation("" like regEx, false, create_row("a"))
+
+    checkEvaluation("""""" like regEx, true, create_row("""%\\%"""))
+    checkEvaluation("""%%""" like regEx, true, create_row("""%%"""))
+    checkEvaluation("""\__""" like regEx, true, create_row("""\\\__"""))
+    checkEvaluation("""\\\__""" like regEx, false, create_row("""%\\%\%"""))
+    checkEvaluation("""_\\\%""" like regEx, false, create_row("""%\\"""))
+
+    // scalastyle:off nonascii
+    checkEvaluation("a\u20ACa" like regEx, true, create_row("_\u20AC_"))
+    checkEvaluation("a€a" like regEx, true, create_row("_€_"))
+    checkEvaluation("a€a" like regEx, true, create_row("_\u20AC_"))
+    checkEvaluation("a\u20ACa" like regEx, true, create_row("_€_"))
+    // scalastyle:on nonascii
+
+    // TODO: should throw an exception?
--- End diff --
To answer your point 3, I tried it in DB2.
```
db2 => select actkwd from act where actkwd like '%A%\a' escape '\'
SQL0130N  The ESCAPE clause is not a single character, or the pattern string
contains an invalid occurrence of the escape character.  SQLSTATE=22025
```
In DB2, our design is normally very conservative: if we think something could be a user error, we stop it with an error. We do not want to give users any surprises.
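The conservative DB2-style policy described above can be sketched as a small LIKE-pattern translator. This is an illustrative stand-in, not Spark's actual implementation: it converts a SQL LIKE pattern to a regex and raises an error when the escape character precedes anything other than `%`, `_`, or the escape character itself, mirroring the SQL0130N behavior.

```python
import re


def like_to_regex(pattern, escape="\\"):
    """Translate a SQL LIKE pattern into an anchored regex string.

    Raises ValueError on an invalid occurrence of the escape character
    (dangling, or escaping anything other than %, _, or itself),
    mimicking DB2's conservative SQL0130N-style rejection.
    """
    out = []
    i = 0
    while i < len(pattern):
        c = pattern[i]
        if c == escape:
            if i + 1 >= len(pattern) or pattern[i + 1] not in ("%", "_", escape):
                raise ValueError(f"invalid escape at position {i} in {pattern!r}")
            out.append(re.escape(pattern[i + 1]))  # escaped char matches literally
            i += 2
        elif c == "%":
            out.append(".*")   # % matches any sequence of characters
            i += 1
        elif c == "_":
            out.append(".")    # _ matches exactly one character
            i += 1
        else:
            out.append(re.escape(c))
            i += 1
    return "^" + "".join(out) + "$"
```

With this sketch, the DB2 pattern from the comment, `%A%\a`, raises a ValueError, while `100\%` matches only the literal string `100%`.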
[GitHub] spark issue #15436: [SPARK-17875] [BUILD] Remove unneeded direct dependence ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15436 **[Test build #67015 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67015/consoleFull)** for PR 15436 at commit [`f49f6a6`](https://github.com/apache/spark/commit/f49f6a6ec956b069b4934a0b94450413529c1b93). * This patch **fails from timeout after a configured wait of \`250m\`**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #15436: [SPARK-17875] [BUILD] Remove unneeded direct dependence ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15436 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67015/