[GitHub] spark pull request: [SPARK-12425][STREAMING] DStream union optimis...

2016-03-21 Thread gpoulin
GitHub user gpoulin commented on the pull request: https://github.com/apache/spark/pull/10382#issuecomment-199326833 Sorry for the delay, I have been quite busy lately. I should have some time over Easter to look at this. --- If your project is set up for it, you can reply to this email

[GitHub] spark pull request: [SPARK-12678][CORE] MapPartitionsRDD clearDepe...

2016-01-06 Thread gpoulin
GitHub user gpoulin commented on the pull request: https://github.com/apache/spark/pull/10623#issuecomment-169431019 I'm not a JVM so I'm not sure how to do a reliable GC unit test apart from calling `System.gc()` and hoping it actually worked. If somebody knows how, let m

[GitHub] spark pull request: [SPARK-12678][CORE] MapPartitionsRDD clearDepe...

2016-01-06 Thread gpoulin
GitHub user gpoulin opened a pull request: https://github.com/apache/spark/pull/10623 [SPARK-12678][CORE] MapPartitionsRDD clearDependencies MapPartitionsRDD was keeping a reference to `prev` after a call to `clearDependencies`, which could lead to a memory leak. You can merge
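The leak described above can be illustrated with a minimal sketch. The classes below (`SketchRDD`, `SourceRDD`, `MappedRDD`) are hypothetical stand-ins written for illustration only; they are not Spark's classes and not the code from the pull request. The sketch assumes the fix amounts to dropping the parent reference when dependencies are cleared, so that the parent can be garbage collected.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical base type standing in for an RDD-like node in a lineage graph.
abstract class SketchRDD<T> {
    abstract List<T> compute();

    // Subclasses override this to release references to their parents.
    void clearDependencies() {}
}

// Hypothetical leaf node that simply holds its data.
final class SourceRDD<T> extends SketchRDD<T> {
    private final List<T> data;

    SourceRDD(List<T> data) {
        this.data = data;
    }

    @Override
    List<T> compute() {
        return data;
    }
}

// Hypothetical mapped node. `prev` is deliberately non-final so it can be released.
final class MappedRDD<T, U> extends SketchRDD<U> {
    private SketchRDD<T> prev;
    private final Function<T, U> f;

    MappedRDD(SketchRDD<T> prev, Function<T, U> f) {
        this.prev = prev;
        this.f = f;
    }

    @Override
    List<U> compute() {
        // Only valid while the parent is still referenced.
        List<U> out = new ArrayList<>();
        for (T item : prev.compute()) {
            out.add(f.apply(item));
        }
        return out;
    }

    @Override
    void clearDependencies() {
        super.clearDependencies();
        // Without this line the parent (and everything it references) stays
        // reachable for as long as this node is reachable.
        prev = null;
    }

    boolean holdsParent() {
        return prev != null;
    }
}
```

In this sketch, calling `compute()` after `clearDependencies()` would fail; a real system would read from saved (for example checkpointed) data instead, which is outside the scope of the example.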

[GitHub] spark pull request: [SPARK-12425][STREAMING] DStream union optimis...

2015-12-22 Thread gpoulin
GitHub user gpoulin commented on the pull request: https://github.com/apache/spark/pull/10382#issuecomment-166722515 @zsxwing based on your comment, I took the liberty of deduplicating the logic that determines whether to use a `UnionRDD` or a `PartitionerAwareUnionRDD`

[GitHub] spark pull request: DStream union optimisation

2015-12-18 Thread gpoulin
GitHub user gpoulin opened a pull request: https://github.com/apache/spark/pull/10382 DStream union optimisation Use PartitionerAwareUnionRDD when possible to optimize shuffling and preserve the partitioner. You can merge this pull request into a Git repository by running
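As a rough illustration of "when possible", the sketch below picks a partitioner-aware union only if every input has a partitioner and all of them are equal. The `UnionChoice` class, the `Kind` enum, and the `choose` method are hypothetical names introduced for this example; the exact condition used in the pull request is an assumption here, not a quotation of its code.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical helper: decides which kind of union to build from the
// (optional) partitioners of the inputs. Not Spark's API.
final class UnionChoice {
    enum Kind { UNION, PARTITIONER_AWARE_UNION }

    // partitioners.get(i) is the partitioner of the i-th input, if it has one.
    static <P> Kind choose(List<Optional<P>> partitioners) {
        if (partitioners.isEmpty()) {
            return Kind.UNION;
        }
        Optional<P> first = partitioners.get(0);
        if (!first.isPresent()) {
            // An input without a partitioner rules out the aware variant.
            return Kind.UNION;
        }
        for (Optional<P> p : partitioners) {
            // Optional.equals compares the wrapped values, so this also
            // rejects inputs whose partitioner is missing or different.
            if (!p.equals(first)) {
                return Kind.UNION;
            }
        }
        return Kind.PARTITIONER_AWARE_UNION;
    }
}
```

The point of the check is that a plain union discards partitioning information, while a union over identically partitioned inputs can keep it and so avoid a later shuffle.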

[GitHub] spark pull request: FIX: rememberDuration reassignment error messa...

2015-10-02 Thread gpoulin
GitHub user gpoulin commented on the pull request: https://github.com/apache/spark/pull/8966#issuecomment-145159436 @srowen I applied this style and also added a `require` to the `start` method to be more consistent. I don't know if you want me to squash those commits?

[GitHub] spark pull request: FIX: rememberDuration reassignment error messa...

2015-10-02 Thread gpoulin
GitHub user gpoulin commented on the pull request: https://github.com/apache/spark/pull/8966#issuecomment-145139938 I fixed the lines with more than 100 columns. I wasn't sure of the guideline there.

[GitHub] spark pull request: FIX: rememberDuration reassignment error messa...

2015-10-02 Thread gpoulin
GitHub user gpoulin commented on the pull request: https://github.com/apache/spark/pull/8966#issuecomment-145138145 @srowen done. I took the liberty of doing the same for `batchDuration`.

[GitHub] spark pull request: FIX: rememberDuration reassignment error messa...

2015-10-02 Thread gpoulin
GitHub user gpoulin opened a pull request: https://github.com/apache/spark/pull/8966 FIX: rememberDuration reassignment error message I was reading through the scheduler and found this small mistake. You can merge this pull request into a Git repository by running: $ git pull