[GitHub] spark pull request: [Spark 2557] fix LOCAL_N_REGEX in createTaskSc...

2014-07-21 Thread advancedxy
Github user advancedxy commented on the pull request: https://github.com/apache/spark/pull/1464#issuecomment-49617743 Well, could someone review this pr? Or should I close this pr? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub

[GitHub] spark pull request: [SPARK-2603][SQL] Remove unnecessary toMap and...

2014-07-21 Thread advancedxy
Github user advancedxy commented on a diff in the pull request: https://github.com/apache/spark/pull/1504#discussion_r15208849 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/json/JsonRDD.scala --- @@ -239,9 +239,9 @@ private[sql] object JsonRDD extends Logging

[GitHub] spark pull request: [SPARK-2325] Utils.getLocalDir had better chec...

2014-07-26 Thread advancedxy
Github user advancedxy commented on the pull request: https://github.com/apache/spark/pull/1281#issuecomment-50255139 Hi @mateiz, I think ignoring bad dir is needed in production cluster. In production, there is a good chance for disk failures. I always love the idea that we could

[GitHub] spark pull request: [SPARK-2325] Utils.getLocalDir had better chec...

2014-07-27 Thread advancedxy
Github user advancedxy commented on the pull request: https://github.com/apache/spark/pull/1281#issuecomment-50295974 HI @mateiz , I'd love to make my contribution for spark. However, I believe it's more than one pr work. There must be a lot of details to be considered. I will make

[GitHub] spark pull request: [SPARK-4033][Examples]Input of the SparkPi too...

2014-10-22 Thread advancedxy
Github user advancedxy commented on a diff in the pull request: https://github.com/apache/spark/pull/2874#discussion_r19196544 --- Diff: examples/src/main/scala/org/apache/spark/examples/SparkPi.scala --- @@ -27,7 +27,7 @@ object SparkPi { val conf = new SparkConf

[GitHub] spark pull request: [SPARK-4033][Examples]Input of the SparkPi too...

2014-10-22 Thread advancedxy
Github user advancedxy commented on a diff in the pull request: https://github.com/apache/spark/pull/2874#discussion_r19196652 --- Diff: examples/src/main/scala/org/apache/spark/examples/SparkPi.scala --- @@ -27,7 +27,7 @@ object SparkPi { val conf = new SparkConf

[GitHub] spark pull request: [Spark 2557] fix LOCAL_N_REGEX in createTaskSc...

2014-07-17 Thread advancedxy
GitHub user advancedxy opened a pull request: https://github.com/apache/spark/pull/1464 [Spark 2557] fix LOCAL_N_REGEX in createTaskScheduler and make local-n and local-n-failures consistent [SPARK-2557](https://issues.apache.org/jira/browse/SPARK-2557) You can merge

[GitHub] spark pull request: [Spark 2557] fix LOCAL_N_REGEX in createTaskSc...

2014-07-18 Thread advancedxy
Github user advancedxy commented on the pull request: https://github.com/apache/spark/pull/1464#issuecomment-49416016 ping. @aarondav what do you think? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project

[GitHub] spark pull request: [SPARK-1511] use Files.move instead of renameT...

2014-04-16 Thread advancedxy
Github user advancedxy commented on a diff in the pull request: https://github.com/apache/spark/pull/427#discussion_r11688928 --- Diff: core/src/main/scala/org/apache/spark/TestUtils.scala --- @@ -102,7 +102,12 @@ private[spark] object TestUtils { val result = new File

[GitHub] spark pull request: [SPARK-1527] change rootDir*.getName to rootDi...

2014-04-17 Thread advancedxy
GitHub user advancedxy opened a pull request: https://github.com/apache/spark/pull/436 [SPARK-1527] change rootDir*.getName to rootDir*.getAbsolutePath JIRA issue: [SPARK-1527](https://issues.apache.org/jira/browse/SPARK-1527) getName() only gets the last component

[GitHub] spark pull request: [SPARK-1527] change rootDir*.getName to rootDi...

2014-04-17 Thread advancedxy
Github user advancedxy commented on the pull request: https://github.com/apache/spark/pull/436#issuecomment-40737449 As discussed with @srowen on JIRA, I think maybe we should review that relative paths and absolute paths are used appropriately. --- If your project is set up

[GitHub] spark pull request: SPARK-1438 RDD make seed optional in RDD metho...

2014-04-21 Thread advancedxy
Github user advancedxy commented on a diff in the pull request: https://github.com/apache/spark/pull/462#discussion_r11806419 --- Diff: core/src/main/scala/org/apache/spark/api/java/JavaPairRDD.scala --- @@ -119,7 +119,13 @@ class JavaPairRDD[K, V](val rdd: RDD[(K, V

[GitHub] spark pull request: SPARK-1438 RDD make seed optional in RDD metho...

2014-04-21 Thread advancedxy
Github user advancedxy commented on the pull request: https://github.com/apache/spark/pull/462#issuecomment-40930731 About the seed selection, maybe we should keep all the implementations in sync. That is to say: if we use system.nanotime in scala, we should use it in java

[GitHub] spark pull request: SPARK-1438 RDD make seed optional in RDD metho...

2014-04-21 Thread advancedxy
Github user advancedxy commented on a diff in the pull request: https://github.com/apache/spark/pull/462#discussion_r11808574 --- Diff: core/src/main/scala/org/apache/spark/api/java/JavaPairRDD.scala --- @@ -119,7 +119,13 @@ class JavaPairRDD[K, V](val rdd: RDD[(K, V

[GitHub] spark pull request: SPARK-1438 RDD make seed optional in RDD metho...

2014-04-21 Thread advancedxy
Github user advancedxy commented on a diff in the pull request: https://github.com/apache/spark/pull/462#discussion_r11835682 --- Diff: core/src/test/scala/org/apache/spark/rdd/RDDSuite.scala --- @@ -466,6 +466,12 @@ class RDDSuite extends FunSuite with SharedSparkContext

[GitHub] spark pull request: SPARK-1438 RDD.sample() make seed param option...

2014-04-22 Thread advancedxy
Github user advancedxy commented on the pull request: https://github.com/apache/spark/pull/477#issuecomment-41007103 ```seed: Int = (math.random * 1000).toInt)``` hi, @mateiz should we use Long instead of Int to avoid collision. --- If your project is set up for it, you can reply

[GitHub] spark pull request: SPARK-1438 RDD.sample() make seed param option...

2014-04-22 Thread advancedxy
Github user advancedxy commented on a diff in the pull request: https://github.com/apache/spark/pull/477#discussion_r11840021 --- Diff: python/pyspark/rddsampler.py --- @@ -19,7 +19,7 @@ import random class RDDSampler(object): -def __init__(self

[GitHub] spark pull request: [SPARK-3040] pick up a more proper local ip ad...

2014-08-14 Thread advancedxy
GitHub user advancedxy opened a pull request: https://github.com/apache/spark/pull/1946 [SPARK-3040] pick up a more proper local ip address for Utils.findLocalIpAddress method Short version: NetworkInterface.getNetworkInterfaces returns ifs in reverse order compared to ifconfig

[GitHub] spark pull request: [SPARK-3040] pick up a more proper local ip ad...

2014-08-15 Thread advancedxy
Github user advancedxy commented on the pull request: https://github.com/apache/spark/pull/1946#issuecomment-52280562 Sorry, I didn't realize windows is supported. In that case, I believe a check is necessary. I will update the pr. --- If your project is set up for it, you can reply

[GitHub] spark pull request: [SPARK-3040] pick up a more proper local ip ad...

2014-08-15 Thread advancedxy
Github user advancedxy commented on the pull request: https://github.com/apache/spark/pull/1946#issuecomment-52283168 @pwendell, would you look at this? It's a fairly simple fix. I don't have windows for primary use, so it's not confirmed on windows. I hope someone who uses windows

[GitHub] spark pull request: [SPARK-3040] pick up a more proper local ip ad...

2014-08-15 Thread advancedxy
Github user advancedxy commented on the pull request: https://github.com/apache/spark/pull/1946#issuecomment-52382756 @pwendell @mridulm no, this behavior is not documented. The java SE doc doesn't say anything about ordering.. So before filing the JIRA, I looked up the OpenJDK

[GitHub] spark pull request: [SPARK-3040] pick up a more proper local ip ad...

2014-08-15 Thread advancedxy
Github user advancedxy commented on the pull request: https://github.com/apache/spark/pull/1946#issuecomment-52383319 And @pwendell, I don't think this is risky. It's a minor change and at least it don't do worse. The current implementation doesn't pick the more proper ip on my

[GitHub] spark pull request: [SPARK-3040] pick up a more proper local ip ad...

2014-08-16 Thread advancedxy
Github user advancedxy commented on the pull request: https://github.com/apache/spark/pull/1946#issuecomment-52412555 @mridulm, could you give me the link to report a jdk bug? It's very confusing that jdk has many different jira place (OpenJDK jira, Oracle etc.). After some

[GitHub] spark pull request: [SPARK-3040] pick up a more proper local ip ad...

2014-09-09 Thread advancedxy
Github user advancedxy commented on the pull request: https://github.com/apache/spark/pull/1946#issuecomment-55064727 I'd love to see this get merged into 1.2. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your

[GitHub] spark pull request: SPARK-1438 RDD.sample() make seed param option...

2014-04-23 Thread advancedxy
Github user advancedxy commented on the pull request: https://github.com/apache/spark/pull/477#issuecomment-41244424 hi, @smartnut007, I don't think I could make the call. So, I didn't reply the question which one to use. should ask @mateiz ! --- If your project is set up

[GitHub] spark pull request: SPARK-1438 RDD.sample() make seed param option...

2014-04-23 Thread advancedxy
Github user advancedxy commented on a diff in the pull request: https://github.com/apache/spark/pull/477#discussion_r11936467 --- Diff: python/pyspark/rddsampler.py --- @@ -38,17 +38,15 @@ def initRandomGenerator(self, split): if self._use_numpy

[GitHub] spark pull request: SPARK-1623. Broadcast cleaner should use getCa...

2014-05-10 Thread advancedxy
Github user advancedxy commented on the pull request: https://github.com/apache/spark/pull/546#issuecomment-42525924 Sorry for being late for this conversation. I am busy with my own project. First, I agree with @srowen. SPARK-1527 was discovered when there were untracked

[GitHub] spark pull request: [SPARK-5201][CORE] deal with int overflow in t...

2015-01-15 Thread advancedxy
GitHub user advancedxy reopened a pull request: https://github.com/apache/spark/pull/4002 [SPARK-5201][CORE] deal with int overflow in the ParallelCollectionRDD.slice method There is an int overflow in the ParallelCollectionRDD.slice method. That's originally reported

[GitHub] spark pull request: [SPARK-5201][CORE] deal with int overflow in t...

2015-01-15 Thread advancedxy
Github user advancedxy closed the pull request at: https://github.com/apache/spark/pull/4002 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature

[GitHub] spark pull request: [SPARK-5201][CORE] deal with int overflow in t...

2015-01-15 Thread advancedxy
Github user advancedxy commented on the pull request: https://github.com/apache/spark/pull/4002#issuecomment-70087483 ping @andrewor14 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have

[GitHub] spark pull request: [SPARK-4033][Examples]Input of the SparkPi too...

2015-01-11 Thread advancedxy
Github user advancedxy commented on the pull request: https://github.com/apache/spark/pull/2874#issuecomment-69500291 Hi @SaintBacchus, I get the idea that there is a limit in the seq size. But I don't think the buggy code is from implementation you listed. The actually buggy code

[GitHub] spark pull request: [SPARK-4033][Examples]Input of the SparkPi too...

2015-01-11 Thread advancedxy
Github user advancedxy commented on the pull request: https://github.com/apache/spark/pull/2874#issuecomment-69521432 @andrewor14, of course, I will file a jira and send a pr when I get some spare time later today. --- If your project is set up for it, you can reply to this email

[GitHub] spark pull request: [Spark-5201] deal with int overflow in the Par...

2015-01-11 Thread advancedxy
GitHub user advancedxy opened a pull request: https://github.com/apache/spark/pull/4002 [Spark-5201] deal with int overflow in the ParallelCollectionRDD.slice method There is an int overflow in the ParallelCollectionRDD.slice method. That's originally reported by SaintBacchus

[GitHub] spark pull request: [SPARK-5201][CORE] deal with int overflow in t...

2015-01-12 Thread advancedxy
Github user advancedxy commented on a diff in the pull request: https://github.com/apache/spark/pull/4002#discussion_r22787018 --- Diff: core/src/main/scala/org/apache/spark/rdd/ParallelCollectionRDD.scala --- @@ -127,18 +127,12 @@ private object ParallelCollectionRDD

[GitHub] spark pull request: [SPARK-5201][CORE] deal with int overflow in t...

2015-01-12 Thread advancedxy
Github user advancedxy commented on a diff in the pull request: https://github.com/apache/spark/pull/4002#discussion_r22787610 --- Diff: core/src/main/scala/org/apache/spark/rdd/ParallelCollectionRDD.scala --- @@ -127,18 +127,12 @@ private object ParallelCollectionRDD

[GitHub] spark pull request: [SPARK-5201][CORE] deal with int overflow in t...

2015-01-12 Thread advancedxy
Github user advancedxy commented on a diff in the pull request: https://github.com/apache/spark/pull/4002#discussion_r22838579 --- Diff: core/src/main/scala/org/apache/spark/rdd/ParallelCollectionRDD.scala --- @@ -127,18 +127,15 @@ private object ParallelCollectionRDD

[GitHub] spark pull request: [SPARK-5201][CORE] deal with int overflow in t...

2015-01-13 Thread advancedxy
Github user advancedxy commented on the pull request: https://github.com/apache/spark/pull/4002#issuecomment-69860273 @andrewor14 would you mind to take a look again? ``` positions(r.length, numSlices).zipWithIndex.map({ case ((start, end), index) = // the redundant

[GitHub] spark pull request: [SPARK-5201][CORE] deal with int overflow in t...

2015-01-14 Thread advancedxy
Github user advancedxy commented on a diff in the pull request: https://github.com/apache/spark/pull/4002#discussion_r22933020 --- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala --- @@ -520,10 +520,12 @@ class SparkContext(config: SparkConf) extends Logging

[GitHub] spark pull request: [SPARK-6030][CORE] Using simulated field layou...

2015-03-19 Thread advancedxy
Github user advancedxy commented on the pull request: https://github.com/apache/spark/pull/4783#issuecomment-83410489 @shivaram @srowen Tuple2(Int, Int) got specialized to Tuple2$mcII$sp class. But the Tuple2$mcII$sp is a subclass of Tuple2. So in our implementation, the specialized

[GitHub] spark pull request: [SPARK-6030][CORE] Using simulated field layou...

2015-03-19 Thread advancedxy
Github user advancedxy commented on the pull request: https://github.com/apache/spark/pull/4783#issuecomment-83878165 @rxin as I mentioned in the previous comment, I use the method in this article. http://www.javaworld.com/article/2077496/testing-debugging/java-tip-130--do-you

[GitHub] spark pull request: [SPARK-6030][CORE] Don't alignSize the compute...

2015-03-06 Thread advancedxy
Github user advancedxy commented on a diff in the pull request: https://github.com/apache/spark/pull/4783#discussion_r25992824 --- Diff: core/src/main/scala/org/apache/spark/util/SizeEstimator.scala --- @@ -49,6 +49,11 @@ private[spark] object SizeEstimator extends Logging

[GitHub] spark pull request: [SPARK-6030][CORE] Don't alignSize the compute...

2015-03-06 Thread advancedxy
Github user advancedxy commented on the pull request: https://github.com/apache/spark/pull/4783#issuecomment-77671093 @shivaram I update the comment, please review again. And still unrelated test failure. --- If your project is set up for it, you can reply to this email and have

[GitHub] spark pull request: [SPARK-6030][CORE] Using simulated field layou...

2015-03-12 Thread advancedxy
Github user advancedxy commented on the pull request: https://github.com/apache/spark/pull/4783#issuecomment-78435831 @shivaram The reason why ExternalSorter failed is that It doesn't spilling files for these two failure tests. ( If we increase the input from `0 until 10

[GitHub] spark pull request: [SPARK-6030][CORE] Using simulated field layou...

2015-03-11 Thread advancedxy
Github user advancedxy commented on the pull request: https://github.com/apache/spark/pull/4783#issuecomment-78223640 ping @shivaram, @srowen and @mateiz --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project

[GitHub] spark pull request: [SPARK-6030][CORE] Using simulated field layou...

2015-03-11 Thread advancedxy
Github user advancedxy commented on a diff in the pull request: https://github.com/apache/spark/pull/4783#discussion_r26209062 --- Diff: core/src/main/scala/org/apache/spark/util/SizeEstimator.scala --- @@ -279,8 +315,8 @@ private[spark] object SizeEstimator extends Logging

[GitHub] spark pull request: [SPARK-6030][CORE] Using simulated field layou...

2015-03-11 Thread advancedxy
Github user advancedxy commented on a diff in the pull request: https://github.com/apache/spark/pull/4783#discussion_r26208786 --- Diff: core/src/main/scala/org/apache/spark/util/SizeEstimator.scala --- @@ -257,21 +262,52 @@ private[spark] object SizeEstimator extends Logging

[GitHub] spark pull request: [SPARK-6030][CORE] Using simulated field layou...

2015-03-11 Thread advancedxy
Github user advancedxy commented on the pull request: https://github.com/apache/spark/pull/4783#issuecomment-78410311 @shivaram ExternalSorterSuite failed on my local machine too. Do you have any thought why ExternalSortedSuite is failing. --- If your project is set up for it, you

[GitHub] spark pull request: [SPARK-6030][CORE] Using simulated field layou...

2015-03-11 Thread advancedxy
Github user advancedxy commented on the pull request: https://github.com/apache/spark/pull/4783#issuecomment-78407837 I just run test-only for SizeEstimatorSuite. It do pass the tests. I thought the ExternalSorter was unrelated. I will test the ExternalSorter. --- If your project

[GitHub] spark pull request: [SPARK-6030][CORE] Don't alignSize the compute...

2015-02-26 Thread advancedxy
Github user advancedxy commented on the pull request: https://github.com/apache/spark/pull/4783#issuecomment-76352556 @shivaram I updated the gist, you can look at the result. [gist](https://gist.github.com/advancedxy/2ae7c9cc7629f3aeb679) Also you can just download the shell

[GitHub] spark pull request: [SPARK-6030][CORE] Don't alignSize the compute...

2015-03-04 Thread advancedxy
Github user advancedxy commented on the pull request: https://github.com/apache/spark/pull/4783#issuecomment-77308814 @shivaram, I add a bunch of comments in the code. I think it's time for you to review it now. And the previous failures are not related to this code change. Wonder

[GitHub] spark pull request: [SPARK-6030][CORE] Don't alignSize the compute...

2015-03-04 Thread advancedxy
Github user advancedxy commented on a diff in the pull request: https://github.com/apache/spark/pull/4783#discussion_r25840206 --- Diff: core/src/main/scala/org/apache/spark/util/SizeEstimator.scala --- @@ -257,21 +262,56 @@ private[spark] object SizeEstimator extends Logging

[GitHub] spark pull request: [SPARK-6030][CORE] Don't alignSize the compute...

2015-03-04 Thread advancedxy
Github user advancedxy commented on a diff in the pull request: https://github.com/apache/spark/pull/4783#discussion_r25840258 --- Diff: core/src/main/scala/org/apache/spark/util/SizeEstimator.scala --- @@ -283,4 +323,7 @@ private[spark] object SizeEstimator extends Logging

[GitHub] spark pull request: [SPARK-6030][CORE] Don't alignSize the compute...

2015-02-26 Thread advancedxy
Github user advancedxy commented on the pull request: https://github.com/apache/spark/pull/4783#issuecomment-76190779 @shivaram -- I followed the [JDK-8024912](https://bugs.openjdk.java.net/browse/JDK-8024912), [JDK-8024913](https://bugs.openjdk.java.net/browse/JDK-8024913) and [JDK

[GitHub] spark pull request: [SPARK-6030][CORE] Don't alignSize the compute...

2015-02-26 Thread advancedxy
Github user advancedxy commented on the pull request: https://github.com/apache/spark/pull/4783#issuecomment-76139869 Ok, Thanks for the update. I will look into the jol and the example when I get some spare time. --- If your project is set up for it, you can reply to this email

[GitHub] spark pull request: [SPARK-6030][CORE] Don't alignSize the compute...

2015-02-25 Thread advancedxy
Github user advancedxy commented on a diff in the pull request: https://github.com/apache/spark/pull/4783#discussion_r25408258 --- Diff: core/src/test/scala/org/apache/spark/util/SizeEstimatorSuite.scala --- @@ -59,6 +59,13 @@ class SizeEstimatorSuite assertResult(24

[GitHub] spark pull request: [SPARK-6030][CORE] Don't alignSize the compute...

2015-02-25 Thread advancedxy
GitHub user advancedxy opened a pull request: https://github.com/apache/spark/pull/4783 [SPARK-6030][CORE] Don't alignSize the computed shellSize for classes in the getClassInfo method SizeEstimator gives wrong result for Integer on 64bit JVM with UseCompressedOops on, this pr

[GitHub] spark pull request: [SPARK-6030][CORE] Using simulated field layou...

2015-04-27 Thread advancedxy
Github user advancedxy commented on the pull request: https://github.com/apache/spark/pull/4783#issuecomment-96685117 Ping @shivaram, @rxin. I finally got some time to look at the code. The ExternalSorterSuite kept failing because there was no spilling when we just insert 10

[GitHub] spark pull request: [SPARK-6030][CORE] Using simulated field layou...

2015-05-01 Thread advancedxy
Github user advancedxy commented on the pull request: https://github.com/apache/spark/pull/4783#issuecomment-98257433 @JoshRosen @srowen, I already tested in the test suite. Just want to make sure if I don't miss anything obvious. Here is the standalone reproduce sample. https

[GitHub] spark pull request: [SPARK-6030][CORE] Using simulated field layou...

2015-05-01 Thread advancedxy
Github user advancedxy commented on the pull request: https://github.com/apache/spark/pull/4783#issuecomment-98093859 ping @rxin or @srowen . --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have

[GitHub] spark pull request: [SPARK-6030][CORE] Using simulated field layou...

2015-05-01 Thread advancedxy
Github user advancedxy commented on the pull request: https://github.com/apache/spark/pull/4783#issuecomment-98126019 A closer look at the `ResetSystemProperties` and `SizeEstimatorSuite`, the `beforeEach` in `ResetSystemProperties` is overriden by `SizeEstimatorSuite`. Should call

[GitHub] spark pull request: [SPARK-6030][CORE] Using simulated field layou...

2015-05-01 Thread advancedxy
Github user advancedxy commented on a diff in the pull request: https://github.com/apache/spark/pull/4783#discussion_r29498301 --- Diff: core/src/test/scala/org/apache/spark/util/SizeEstimatorSuite.scala --- @@ -59,6 +68,22 @@ class SizeEstimatorSuite assertResult(24

[GitHub] spark pull request: [SPARK-6030][CORE] Using simulated field layou...

2015-05-01 Thread advancedxy
Github user advancedxy commented on the pull request: https://github.com/apache/spark/pull/4783#issuecomment-98116848 `initialize()` is invoked multiple times in the test suite. Maybe it's a hook for tests --- If your project is set up for it, you can reply to this email

[GitHub] spark pull request: [SPARK-6030][CORE] Using simulated field layou...

2015-05-01 Thread advancedxy
Github user advancedxy commented on a diff in the pull request: https://github.com/apache/spark/pull/4783#discussion_r29498601 --- Diff: core/src/test/scala/org/apache/spark/util/SizeEstimatorSuite.scala --- @@ -135,5 +160,20 @@ class SizeEstimatorSuite assertResult(64

[GitHub] spark pull request #21165: [Spark-20087][CORE] Attach accumulators / metrics...

2018-05-14 Thread advancedxy
Github user advancedxy commented on a diff in the pull request: https://github.com/apache/spark/pull/21165#discussion_r187984186 --- Diff: core/src/main/scala/org/apache/spark/executor/Executor.scala --- @@ -287,6 +287,28 @@ private[spark] class Executor( notifyAll

[GitHub] spark issue #21165: [Spark-20087][CORE] Attach accumulators / metrics to 'Ta...

2018-05-14 Thread advancedxy
Github user advancedxy commented on the issue: https://github.com/apache/spark/pull/21165 Looks like that simply add fields with default values into case class will break binary compatibility. How should we deal with that? Add to MimaExcludes or add missing methods? @cloud-fan

[GitHub] spark pull request #21165: [Spark-20087][CORE] Attach accumulators / metrics...

2018-05-14 Thread advancedxy
Github user advancedxy commented on a diff in the pull request: https://github.com/apache/spark/pull/21165#discussion_r188005420 --- Diff: core/src/main/scala/org/apache/spark/executor/Executor.scala --- @@ -287,6 +287,28 @@ private[spark] class Executor( notifyAll

[GitHub] spark issue #21165: [Spark-20087][CORE] Attach accumulators / metrics to 'Ta...

2018-05-08 Thread advancedxy
Github user advancedxy commented on the issue: https://github.com/apache/spark/pull/21165 ping @cloud-fan --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews

[GitHub] spark pull request #21165: [Spark-20087][CORE] Attach accumulators / metrics...

2018-05-09 Thread advancedxy
Github user advancedxy commented on a diff in the pull request: https://github.com/apache/spark/pull/21165#discussion_r187074691 --- Diff: core/src/main/scala/org/apache/spark/TaskEndReason.scala --- @@ -212,9 +212,15 @@ case object TaskResultLost extends TaskFailedReason

[GitHub] spark issue #21165: [Spark-20087][CORE] Attach accumulators / metrics to 'Ta...

2018-04-27 Thread advancedxy
Github user advancedxy commented on the issue: https://github.com/apache/spark/pull/21165 @jiangxb1987 @cloud-fan I think it's ready for review. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

[GitHub] spark pull request #21165: [Spark-20087][CORE] Attach accumulators / metrics...

2018-04-27 Thread advancedxy
Github user advancedxy commented on a diff in the pull request: https://github.com/apache/spark/pull/21165#discussion_r184840246 --- Diff: core/src/main/scala/org/apache/spark/TaskEndReason.scala --- @@ -212,9 +212,15 @@ case object TaskResultLost extends TaskFailedReason

[GitHub] spark pull request #21165: [Spark-20087][CORE] Attach accumulators / metrics...

2018-05-07 Thread advancedxy
Github user advancedxy commented on a diff in the pull request: https://github.com/apache/spark/pull/21165#discussion_r186468071 --- Diff: core/src/main/scala/org/apache/spark/TaskEndReason.scala --- @@ -212,9 +212,15 @@ case object TaskResultLost extends TaskFailedReason

[GitHub] spark pull request #21327: [SPARK-24107][CORE][followup] ChunkedByteBuffer.w...

2018-05-16 Thread advancedxy
Github user advancedxy commented on a diff in the pull request: https://github.com/apache/spark/pull/21327#discussion_r188532109 --- Diff: core/src/main/scala/org/apache/spark/util/io/ChunkedByteBuffer.scala --- @@ -63,15 +63,18 @@ private[spark] class ChunkedByteBuffer(var chunks

[GitHub] spark pull request #21327: [SPARK-24107][CORE][followup] ChunkedByteBuffer.w...

2018-05-16 Thread advancedxy
Github user advancedxy commented on a diff in the pull request: https://github.com/apache/spark/pull/21327#discussion_r188525716 --- Diff: core/src/main/scala/org/apache/spark/util/io/ChunkedByteBuffer.scala --- @@ -63,15 +63,18 @@ private[spark] class ChunkedByteBuffer(var chunks

[GitHub] spark issue #21165: [Spark-20087][CORE] Attach accumulators / metrics to 'Ta...

2018-05-16 Thread advancedxy
Github user advancedxy commented on the issue: https://github.com/apache/spark/pull/21165 Gently ping @cloud-fan again. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e

[GitHub] spark issue #21165: [Spark-20087][CORE] Attach accumulators / metrics to 'Ta...

2018-05-21 Thread advancedxy
Github user advancedxy commented on the issue: https://github.com/apache/spark/pull/21165 retest this please --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail

[GitHub] spark pull request #21399: [SPARK-22269][BUILD] Run Java linter via SBT for ...

2018-05-22 Thread advancedxy
Github user advancedxy commented on a diff in the pull request: https://github.com/apache/spark/pull/21399#discussion_r189963174 --- Diff: project/SparkBuild.scala --- @@ -740,6 +741,16 @@ object Unidoc { ) } +object CheckStyle { --- End diff

[GitHub] spark pull request #21397: [SPARK-24334] Fix race condition in ArrowPythonRu...

2018-05-22 Thread advancedxy
Github user advancedxy commented on a diff in the pull request: https://github.com/apache/spark/pull/21397#discussion_r189973566 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/python/ArrowPythonRunner.scala --- @@ -94,8 +88,18 @@ class ArrowPythonRunner

[GitHub] spark pull request #21399: [SPARK-22269][BUILD] Run Java linter via SBT for ...

2018-05-22 Thread advancedxy
Github user advancedxy commented on a diff in the pull request: https://github.com/apache/spark/pull/21399#discussion_r189960217 --- Diff: project/SparkBuild.scala --- @@ -740,6 +741,16 @@ object Unidoc { ) } +object CheckStyle { --- End diff

[GitHub] spark issue #21369: [SPARK-22713][CORE] ExternalAppendOnlyMap leaks when spi...

2018-05-22 Thread advancedxy
Github user advancedxy commented on the issue: https://github.com/apache/spark/pull/21369 > @advancedxy , using jvisualvm+heap dump I could see that the second introduced test case ("drop all references to the underlying map once the iterator is exhausted") eliminated a

[GitHub] spark pull request #21369: [SPARK-22713][CORE] ExternalAppendOnlyMap leaks w...

2018-05-21 Thread advancedxy
Github user advancedxy commented on a diff in the pull request: https://github.com/apache/spark/pull/21369#discussion_r189768116 --- Diff: core/src/main/scala/org/apache/spark/util/collection/ExternalAppendOnlyMap.scala --- @@ -585,17 +591,24 @@ class ExternalAppendOnlyMap[K, V

[GitHub] spark pull request #21369: [SPARK-22713][CORE] ExternalAppendOnlyMap leaks w...

2018-05-21 Thread advancedxy
Github user advancedxy commented on a diff in the pull request: https://github.com/apache/spark/pull/21369#discussion_r189768301 --- Diff: core/src/main/scala/org/apache/spark/util/collection/ExternalAppendOnlyMap.scala --- @@ -630,7 +643,7 @@ private[spark] object

[GitHub] spark pull request #21369: [SPARK-22713][CORE] ExternalAppendOnlyMap leaks w...

2018-05-21 Thread advancedxy
Github user advancedxy commented on a diff in the pull request: https://github.com/apache/spark/pull/21369#discussion_r189768075 --- Diff: core/src/main/scala/org/apache/spark/util/collection/ExternalAppendOnlyMap.scala --- @@ -305,8 +310,8 @@ class ExternalAppendOnlyMap[K, V, C

[GitHub] spark pull request #21369: [SPARK-22713][CORE] ExternalAppendOnlyMap leaks w...

2018-05-21 Thread advancedxy
Github user advancedxy commented on a diff in the pull request: https://github.com/apache/spark/pull/21369#discussion_r189768335 --- Diff: core/src/test/scala/org/apache/spark/util/collection/ExternalAppendOnlyMapSuite.scala --- @@ -23,8 +23,9 @@ import org.apache.spark

[GitHub] spark pull request #21369: [SPARK-22713][CORE] ExternalAppendOnlyMap leaks w...

2018-05-21 Thread advancedxy
Github user advancedxy commented on a diff in the pull request: https://github.com/apache/spark/pull/21369#discussion_r189768271 --- Diff: core/src/main/scala/org/apache/spark/util/collection/ExternalAppendOnlyMap.scala --- @@ -585,17 +591,24 @@ class ExternalAppendOnlyMap[K, V

[GitHub] spark pull request #21165: [Spark-20087][CORE] Attach accumulators / metrics...

2018-05-21 Thread advancedxy
Github user advancedxy commented on a diff in the pull request: https://github.com/apache/spark/pull/21165#discussion_r189525864 --- Diff: core/src/test/scala/org/apache/spark/scheduler/DAGSchedulerSuite.scala --- @@ -1868,15 +1868,26 @@ class DAGSchedulerSuite extends

[GitHub] spark pull request #21165: [Spark-20087][CORE] Attach accumulators / metrics...

2018-05-21 Thread advancedxy
Github user advancedxy commented on a diff in the pull request: https://github.com/apache/spark/pull/21165#discussion_r189530510 --- Diff: core/src/test/scala/org/apache/spark/scheduler/DAGSchedulerSuite.scala --- @@ -1868,15 +1868,26 @@ class DAGSchedulerSuite extends

[GitHub] spark pull request #21165: [Spark-20087][CORE] Attach accumulators / metrics...

2018-05-21 Thread advancedxy
Github user advancedxy commented on a diff in the pull request: https://github.com/apache/spark/pull/21165#discussion_r189530671 --- Diff: core/src/test/scala/org/apache/spark/scheduler/DAGSchedulerSuite.scala --- @@ -1868,15 +1868,26 @@ class DAGSchedulerSuite extends

[GitHub] spark issue #21445: [SPARK-24404][SS] Increase currentEpoch when meet a Epoc...

2018-05-30 Thread advancedxy
Github user advancedxy commented on the issue: https://github.com/apache/spark/pull/21445 > I think the best way to do it is to make the shuffle writer responsible for incrementing the epoch within its task, the same way the data source writer does currently. Y

[GitHub] spark pull request #21165: [Spark-20087][CORE] Attach accumulators / metrics...

2018-05-02 Thread advancedxy
Github user advancedxy commented on a diff in the pull request: https://github.com/apache/spark/pull/21165#discussion_r185690094 --- Diff: core/src/main/scala/org/apache/spark/TaskEndReason.scala --- @@ -212,9 +212,15 @@ case object TaskResultLost extends TaskFailedReason

[GitHub] spark issue #21165: [Spark 20087][CORE] Attach accumulators / metrics to 'Ta...

2018-04-26 Thread advancedxy
Github user advancedxy commented on the issue: https://github.com/apache/spark/pull/21165 > we should document the changes in a migration document or something, I think documentation is necessary, will update the documentation tomorrow (Beijing t

[GitHub] spark pull request #21165: Spark 20087: Attach accumulators / metrics to 'Ta...

2018-04-26 Thread advancedxy
Github user advancedxy commented on a diff in the pull request: https://github.com/apache/spark/pull/21165#discussion_r184440128 --- Diff: core/src/main/scala/org/apache/spark/TaskEndReason.scala --- @@ -212,9 +212,15 @@ case object TaskResultLost extends TaskFailedReason

[GitHub] spark pull request #21165: Spark 20087: Attach accumulators / metrics to 'Ta...

2018-04-26 Thread advancedxy
Github user advancedxy commented on a diff in the pull request: https://github.com/apache/spark/pull/21165#discussion_r184441076 --- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala --- @@ -1417,10 +1417,13 @@ class DAGScheduler( case

[GitHub] spark issue #21165: [Spark-20087][CORE] Attach accumulators / metrics to 'Ta...

2018-04-26 Thread advancedxy
Github user advancedxy commented on the issue: https://github.com/apache/spark/pull/21165 > It should be [Spark-20087] instead of [Spark 20087] in the title. --- - To unsubscribe, e-mail: reviews-unsub

[GitHub] spark pull request #21165: Spark 20087: Attach accumulators / metrics to 'Ta...

2018-04-26 Thread advancedxy
Github user advancedxy commented on a diff in the pull request: https://github.com/apache/spark/pull/21165#discussion_r184429082 --- Diff: core/src/main/scala/org/apache/spark/executor/Executor.scala --- @@ -287,6 +287,28 @@ private[spark] class Executor( notifyAll

[GitHub] spark pull request #21165: Spark 20087: Attach accumulators / metrics to 'Ta...

2018-04-26 Thread advancedxy
GitHub user advancedxy opened a pull request: https://github.com/apache/spark/pull/21165 Spark 20087: Attach accumulators / metrics to 'TaskKilled' end reason ## What changes were proposed in this pull request? The ultimate goal is for listeners to onTaskEnd to receive metrics

[GitHub] spark issue #21165: Spark 20087: Attach accumulators / metrics to 'TaskKille...

2018-04-26 Thread advancedxy
Github user advancedxy commented on the issue: https://github.com/apache/spark/pull/21165 cc @squito @jiangxb1987 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail

[GitHub] spark issue #21165: [Spark-20087][CORE] Attach accumulators / metrics to 'Ta...

2018-04-26 Thread advancedxy
Github user advancedxy commented on the issue: https://github.com/apache/spark/pull/21165 > We should not do these 2 things together, and to me the second one is way simpler to get in and we should do it first. Agreed. For the scope of this pr, let's get killed task

[GitHub] spark issue #21165: [Spark-20087][CORE] Attach accumulators / metrics to 'Ta...

2018-04-26 Thread advancedxy
Github user advancedxy commented on the issue: https://github.com/apache/spark/pull/21165 I add a note for accumulator update. Please comment if more document is needed. --- - To unsubscribe, e-mail: reviews

[GitHub] spark issue #21165: [Spark-20087][CORE] Attach accumulators / metrics to 'Ta...

2018-04-26 Thread advancedxy
Github user advancedxy commented on the issue: https://github.com/apache/spark/pull/21165 > However, I don't agree user side accumulators should get updates from killed tasks, that changes the semantic of accumulators. And I don't think end-users need to care about killed ta

[GitHub] spark pull request #21165: [Spark-20087][CORE] Attach accumulators / metrics...

2018-05-01 Thread advancedxy
Github user advancedxy commented on a diff in the pull request: https://github.com/apache/spark/pull/21165#discussion_r185226136 --- Diff: core/src/main/scala/org/apache/spark/TaskEndReason.scala --- @@ -212,9 +212,15 @@ case object TaskResultLost extends TaskFailedReason

  1   2   >