[GitHub] spark pull request: [SPARK-3121] Wrong implementation of implicit ...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/2712#issuecomment-58845062 This looks great; thanks for adding the comments. I'm going to merge this into master and backport it to `branch-1.0` and `branch-1.1`. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-3121] Wrong implementation of implicit ...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/2712
[GitHub] spark pull request: [SPARK-3121] Wrong implementation of implicit ...
Github user james64 commented on the pull request: https://github.com/apache/spark/pull/2712#issuecomment-58743254 Could the Flume test have failed due to upstream changes? It is passing for me locally now.
[GitHub] spark pull request: [SPARK-3121] Wrong implementation of implicit ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2712#issuecomment-58743306 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21641/consoleFull) for PR 2712 at commit [`1b20d51`](https://github.com/apache/spark/commit/1b20d5193fa149347f9c8c05bb25298992324d4a). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-3121] Wrong implementation of implicit ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2712#issuecomment-58744782 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21641/consoleFull) for PR 2712 at commit [`1b20d51`](https://github.com/apache/spark/commit/1b20d5193fa149347f9c8c05bb25298992324d4a). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-3121] Wrong implementation of implicit ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2712#issuecomment-58744783 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21641/ Test PASSed.
[GitHub] spark pull request: [SPARK-3121] Wrong implementation of implicit ...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/2712#issuecomment-58760768 That particular Flume test is known to be flaky; I think that TD is working on a rewrite / fix for that test suite.
[GitHub] spark pull request: [SPARK-3121] Wrong implementation of implicit ...
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/2712#discussion_r18743728

    --- Diff: core/src/test/scala/org/apache/spark/SparkContextSuite.scala ---
    @@ -0,0 +1,39 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark
    +
    +import org.scalatest.FunSuite
    +
    +import org.apache.hadoop.io.BytesWritable
    +
    +class SparkContextSuite extends FunSuite {
    +  test("test of writing spark scala test") {
    --- End diff --

This test could use a better name. I'd also add a comment, like `// Regression test for SPARK-3121` to help readers link this back to the JIRA.
[GitHub] spark pull request: [SPARK-3121] Wrong implementation of implicit ...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/2712#issuecomment-58760950 It looks like the original implementation of this converter was added in 2604939f643bca125f5e2fb53e3221202996d41b, all the way back in 2011, so I believe that this would affect every released version of Spark. How does this error manifest itself in the wild? Does it lead to silent corruption when reading / writing binary data?
[GitHub] spark pull request: [SPARK-3121] Wrong implementation of implicit ...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/2712#issuecomment-58761280

Actually, I don't think that this is a bug. Instead, I think that the behavior that you're seeing could be an instance of [SPARK-1018](https://issues.apache.org/jira/browse/SPARK-1018), where calling `take()` or `collect()` on a non-transformed HadoopRDD returns the same element several times because the same `Writable` object is re-used. There's actually a note about this in the `sequenceFile()` Java/Scaladoc (added by https://github.com/apache/spark/commit/7101017803a70f3267381498594c0e8c604f932c):

```scala
/** Get an RDD for a Hadoop SequenceFile with given key and value types.
  *
  * '''Note:''' Because Hadoop's RecordReader class re-uses the same Writable object for each
  * record, directly caching the returned RDD will create many references to the same object.
  * If you plan to directly cache Hadoop writable objects, you should first copy them using
  * a `map` function.
  * */
```
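The object re-use described in that Scaladoc note can be reproduced without Spark or Hadoop. The following is only a minimal sketch: `MutableRecord` and `reader` are hypothetical stand-ins for a `Writable` and a `RecordReader`, not real Hadoop or Spark APIs. It shows why holding references to a re-used object yields repeated elements, and why copying inside a `map` avoids that.

```scala
// Hypothetical mutable record, standing in for a Hadoop Writable.
final class MutableRecord(var value: Int)

object ReuseDemo {
  // Mimics a record reader: yields the SAME object each time, mutated in place.
  def reader(values: Seq[Int]): Iterator[MutableRecord] = {
    val shared = new MutableRecord(0)
    values.iterator.map { v => shared.value = v; shared }
  }

  // Keeps references to the shared object, then reads them after iteration ends.
  def aliased(values: Seq[Int]): List[Int] =
    reader(values).toList.map(_.value)

  // Copies the payload out of the shared object while iterating.
  def copied(values: Seq[Int]): List[Int] =
    reader(values).map(_.value).toList

  def main(args: Array[String]): Unit = {
    println(aliased(Seq(1, 2, 3))) // List(3, 3, 3)
    println(copied(Seq(1, 2, 3)))  // List(1, 2, 3)
  }
}
```

In `aliased`, every list element points at the same object, so all of them show the last value written. In `copied`, the value is extracted before the next record overwrites it.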
[GitHub] spark pull request: [SPARK-3121] Wrong implementation of implicit ...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/2712#issuecomment-58761660 Actually, ignore my earlier (deleted) comments; this looks like a valid issue (see [HADOOP-6298: BytesWritable#getBytes is a bad name that leads to programming mistakes](https://issues.apache.org/jira/browse/HADOOP-6298)).
[GitHub] spark pull request: [SPARK-3121] Wrong implementation of implicit ...
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/2712#discussion_r18743890

    --- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
    @@ -1409,7 +1410,9 @@ object SparkContext extends Logging {
         simpleWritableConverter[Boolean, BooleanWritable](_.get)

       implicit def bytesWritableConverter(): WritableConverter[Array[Byte]] = {
    -    simpleWritableConverter[Array[Byte], BytesWritable](_.getBytes)
    +    simpleWritableConverter[Array[Byte], BytesWritable](bw =>
    +      Arrays.copyOfRange(bw.getBytes, 0, bw.getLength)
    --- End diff --

Could you add a one-line comment here that explains why we need to make this copy?
[GitHub] spark pull request: [SPARK-3121] Wrong implementation of implicit ...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/2712#issuecomment-58761838 This looks good to me; sorry for my earlier confusion. If you add a comment and change the name of the test, I'll merge this and cherry-pick it back into `branch-1.1` and `branch-1.0`.
[GitHub] spark pull request: [SPARK-3121] Wrong implementation of implicit ...
Github user james64 commented on the pull request: https://github.com/apache/spark/pull/2712#issuecomment-58767976 Sorry for the test name. Now it should all be fine, including the comments.
[GitHub] spark pull request: [SPARK-3121] Wrong implementation of implicit ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2712#issuecomment-58768088 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21649/consoleFull) for PR 2712 at commit [`f85d24c`](https://github.com/apache/spark/commit/f85d24c954be419045236cfabc5613aab4c2a169). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-3121] Wrong implementation of implicit ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2712#issuecomment-58769371 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21649/consoleFull) for PR 2712 at commit [`f85d24c`](https://github.com/apache/spark/commit/f85d24c954be419045236cfabc5613aab4c2a169). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-3121] Wrong implementation of implicit ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2712#issuecomment-58769373 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21649/ Test PASSed.
[GitHub] spark pull request: [SPARK-3121] Wrong implementation of implicit ...
Github user dbtsai commented on the pull request: https://github.com/apache/spark/pull/2712#issuecomment-58629065 Jenkins, test this please.
[GitHub] spark pull request: [SPARK-3121] Wrong implementation of implicit ...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/2712#issuecomment-58717548 Jenkins, this is ok to test.
[GitHub] spark pull request: [SPARK-3121] Wrong implementation of implicit ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2712#issuecomment-58718323 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21602/consoleFull) for PR 2712 at commit [`f92ffa6`](https://github.com/apache/spark/commit/f92ffa64c9587057dce5a4b51d32074b68aeab23). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-3121] Wrong implementation of implicit ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2712#issuecomment-58718438 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21602/ Test FAILed.
[GitHub] spark pull request: [SPARK-3121] Wrong implementation of implicit ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2712#issuecomment-58718433 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21602/consoleFull) for PR 2712 at commit [`f92ffa6`](https://github.com/apache/spark/commit/f92ffa64c9587057dce5a4b51d32074b68aeab23). * This patch **fails Scala style tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-3121] Wrong implementation of implicit ...
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/2712#discussion_r18732020

    --- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
    @@ -1409,7 +1411,7 @@ object SparkContext extends Logging {
         simpleWritableConverter[Boolean, BooleanWritable](_.get)

       implicit def bytesWritableConverter(): WritableConverter[Array[Byte]] = {
    -    simpleWritableConverter[Array[Byte], BytesWritable](_.getBytes)
    +    simpleWritableConverter[Array[Byte], BytesWritable](bw => Arrays.copyOfRange(bw.getBytes, 0, bw.getLength))
    --- End diff --

It looks like this goes past 100 characters.
[GitHub] spark pull request: [SPARK-3121] Wrong implementation of implicit ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2712#issuecomment-58722604 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21603/consoleFull) for PR 2712 at commit [`406e26c`](https://github.com/apache/spark/commit/406e26c5aa9a28b76a69565e939dc1dc557ccc4d). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-3121] Wrong implementation of implicit ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2712#issuecomment-58726788 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21603/consoleFull) for PR 2712 at commit [`406e26c`](https://github.com/apache/spark/commit/406e26c5aa9a28b76a69565e939dc1dc557ccc4d). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-3121] Wrong implementation of implicit ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2712#issuecomment-58726794 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21603/ Test FAILed.
[GitHub] spark pull request: [SPARK-3121] Wrong implementation of implicit ...
Github user dbtsai commented on the pull request: https://github.com/apache/spark/pull/2712#issuecomment-58732030 It's failing at FlumeStreamSuite.scala:109, which seems to be unrelated to this patch.
[GitHub] spark pull request: [SPARK-3121] Wrong implementation of implicit ...
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/2712#issuecomment-58732168 One more nit: the added Java import should go with the other Java imports.
[GitHub] spark pull request: [SPARK-3121] Wrong implementation of implicit ...
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/2712#issuecomment-58732173 Otherwise, LGTM
[GitHub] spark pull request: [SPARK-3121] Wrong implementation of implicit ...
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/2712#issuecomment-58473731 Hmm, yeah, copyBytes is no good if it doesn't appear in Hadoop 1. My suggestion would be to use copyOfRange from java.util.Arrays.
[GitHub] spark pull request: [SPARK-3121] Wrong implementation of implicit ...
GitHub user james64 opened a pull request: https://github.com/apache/spark/pull/2712

[SPARK-3121] Wrong implementation of implicit bytesWritableConverter

    val path = ... //path to seq file with BytesWritable as type of both key and value
    val file = sc.sequenceFile[Array[Byte],Array[Byte]](path)
    file.take(1)(0)._1

This prints incorrect content of the byte array. The actual content starts with the correct bytes, but some random bytes and zeros are appended.

BytesWritable has two methods:

getBytes() - returns the content of the whole internal array, which is often longer than the actual value stored. It usually contains the rest of previous longer values.

copyBytes() - returns just the beginning of the internal array, as determined by the internal length property.

It looks like the implicit conversion between BytesWritable and Array[Byte] uses getBytes instead of the correct copyBytes.

@dbtsai

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/james64/spark 3121-bugfix

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/2712.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #2712

commit 480f9cdaf69254dd429b949d9ccc6d0b2c617ad0
Author: Dubovsky Jakub dubov...@avast.com
Date: 2014-10-08T13:49:41Z

    Bug 3121 fixed
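The stale-tail behavior described in this pull request can be illustrated without Hadoop on the classpath. This is a rough sketch under stated assumptions: `ReusableBuffer` is a hypothetical, simplified stand-in for `BytesWritable` (its backing array grows only as needed and never shrinks; the real class's growth policy may differ), and `validBytes` is an illustrative helper, not part of Spark.

```scala
import java.util.Arrays

// Hypothetical stand-in for Hadoop's BytesWritable (simplified; not the real class).
final class ReusableBuffer {
  private var bytes: Array[Byte] = Array.empty[Byte]
  private var size: Int = 0

  // Overwrite the logical content; the backing array only ever grows.
  def set(data: Array[Byte]): Unit = {
    if (data.length > bytes.length) bytes = new Array[Byte](data.length)
    System.arraycopy(data, 0, bytes, 0, data.length)
    size = data.length
  }

  def getBytes: Array[Byte] = bytes // whole backing array, possibly with a stale tail
  def getLength: Int = size         // number of valid bytes
}

object GetBytesDemo {
  // Copy only the valid prefix, in the style of the copyOfRange-based fix.
  def validBytes(buf: ReusableBuffer): Array[Byte] =
    Arrays.copyOfRange(buf.getBytes, 0, buf.getLength)

  def main(args: Array[String]): Unit = {
    val buf = new ReusableBuffer
    buf.set(Array[Byte](1, 2, 3, 4, 5)) // a longer value first
    buf.set(Array[Byte](9, 9))          // then a shorter one re-using the buffer
    println(buf.getBytes.mkString(","))    // 9,9,3,4,5
    println(validBytes(buf).mkString(",")) // 9,9
  }
}
```

Returning the backing array directly exposes bytes left over from the earlier, longer value. Bounding the copy by the logical length returns only the current value and also decouples the result from later writes to the buffer.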
[GitHub] spark pull request: [SPARK-3121] Wrong implementation of implicit ...
Github user dbtsai commented on the pull request: https://github.com/apache/spark/pull/2712#issuecomment-58361701 Jenkins, please start the test.
[GitHub] spark pull request: [SPARK-3121] Wrong implementation of implicit ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2712#issuecomment-58361679 Can one of the admins verify this patch?
[GitHub] spark pull request: [SPARK-3121] Wrong implementation of implicit ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2712#issuecomment-58386907 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/287/consoleFull) for PR 2712 at commit [`480f9cd`](https://github.com/apache/spark/commit/480f9cdaf69254dd429b949d9ccc6d0b2c617ad0). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-3121] Wrong implementation of implicit ...
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/2712#issuecomment-58387498 Great catch. A concern is that calling Array#take requires an implicit conversion, which has some performance impact that might be unacceptable for this method that can get called in a tight loop. http://villane.wordpress.com/2008/02/02/learning-scala-performance-impact-of-implicit-conversions/
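To make the two options in this comment concrete, here is a small sketch comparing `take` on an array with `java.util.Arrays.copyOfRange`. The object and method names (`TakeVsCopy`, `viaTake`, `viaCopyOfRange`) are hypothetical and chosen only for illustration; whether the implicit conversion costs anything measurable depends on the Scala version and JIT, and would need benchmarking.

```scala
import java.util.Arrays

object TakeVsCopy {
  // Array has no `take` method of its own; this call goes through an
  // implicit conversion to ArrayOps supplied by scala.Predef.
  def viaTake(a: Array[Byte], n: Int): Array[Byte] = a.take(n)

  // Plain static Java call, no implicit conversion involved.
  def viaCopyOfRange(a: Array[Byte], n: Int): Array[Byte] =
    Arrays.copyOfRange(a, 0, n)

  def main(args: Array[String]): Unit = {
    val a = Array[Byte](1, 2, 3)
    println(viaTake(a, 2).mkString(","))        // 1,2
    println(viaCopyOfRange(a, 2).mkString(",")) // 1,2
  }
}
```

Both return a fresh array for `n` within bounds. They differ when `n` exceeds the array length: `take` clamps to the available elements, while `copyOfRange` pads the result with zeros.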
[GitHub] spark pull request: [SPARK-3121] Wrong implementation of implicit ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2712#issuecomment-58396199 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/287/consoleFull) for PR 2712 at commit [`480f9cd`](https://github.com/apache/spark/commit/480f9cdaf69254dd429b949d9ccc6d0b2c617ad0). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-3121] Wrong implementation of implicit ...
Github user james64 commented on the pull request: https://github.com/apache/spark/pull/2712#issuecomment-58441095 Originally I wanted to just replace the getBytes method with copyBytes. It is available in newer versions of the API, but I found that an older version is imported in Spark. I am not yet very familiar with which Hadoop API Spark uses. So do you suggest implementing it without using the take method?