[GitHub] spark pull request: SPARK-2519 part 2. Remove pattern matching on ...

2014-07-16 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/1447#discussion_r15037134 --- Diff: core/src/main/scala/org/apache/spark/rdd/PairRDDFunctions.scala --- @@ -571,12 +571,7 @@ class PairRDDFunctions[K, V](self: RDD[(K, V

[GitHub] spark pull request: SPARK-2519 part 2. Remove pattern matching on ...

2014-07-16 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/1447#discussion_r15037164 --- Diff: core/src/main/scala/org/apache/spark/rdd/PairRDDFunctions.scala --- @@ -712,8 +701,8 @@ class PairRDDFunctions[K, V](self: RDD[(K, V

[GitHub] spark pull request: [SPARK-2534] Avoid pulling in the entire RDD i...

2014-07-16 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/1450#discussion_r15038311 --- Diff: core/src/main/scala/org/apache/spark/rdd/PairRDDFunctions.scala --- @@ -361,11 +361,11 @@ class PairRDDFunctions[K, V](self: RDD[(K, V

[GitHub] spark pull request: SPARK-2519 part 2. Remove pattern matching on ...

2014-07-16 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/1447#discussion_r15038340 --- Diff: core/src/main/scala/org/apache/spark/rdd/PairRDDFunctions.scala --- @@ -216,17 +216,17 @@ class PairRDDFunctions[K, V](self: RDD[(K, V

[GitHub] spark pull request: [SPARK-2534] Avoid pulling in the entire RDD i...

2014-07-16 Thread rxin
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/1450#issuecomment-49251632 Pushed a new version. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have

[GitHub] spark pull request: [SPARK-2534] Avoid pulling in the entire RDD i...

2014-07-16 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/1450#discussion_r15040336 --- Diff: core/src/main/scala/org/apache/spark/rdd/PairRDDFunctions.scala --- @@ -214,7 +214,7 @@ class PairRDDFunctions[K, V](self: RDD[(K, V

[GitHub] spark pull request: [SPARK-2521] Broadcast RDD object (instead of ...

2014-07-16 Thread rxin
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/1452#issuecomment-49259394 That was actually my main concern from the beginning with this change. From my initial observation everything does seem to work. I intentionally avoided keeping references

[GitHub] spark pull request: [SPARK-2521] Broadcast RDD object (instead of ...

2014-07-16 Thread rxin
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/1452#issuecomment-49259482 Yes, actions were intentionally not broadcast for now. It makes things more complicated, so let's do that in a separate PR.

[GitHub] spark pull request: [SPARK-2534] Avoid pulling in the entire RDD i...

2014-07-17 Thread rxin
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/1450#issuecomment-49261722 Eh, the binary checker is really failing me. Is there a way to disable the binary checker for inner functions? @pwendell

[GitHub] spark pull request: SPARK-2519 part 2. Remove pattern matching on ...

2014-07-17 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/1447#discussion_r15042897 --- Diff: core/src/main/scala/org/apache/spark/rdd/PairRDDFunctions.scala --- @@ -216,17 +216,17 @@ class PairRDDFunctions[K, V](self: RDD[(K, V

[GitHub] spark pull request: [SPARK-2299] Consolidate various stageIdTo* ha...

2014-07-17 Thread rxin
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/1262#issuecomment-49263949 I pushed a new version. I'd first merge this and then have a separate PR to index the hash table by stageId + attempt. Now it includes @kayousterhout's change
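For illustration only: indexing a table by stage ID plus attempt, as proposed above, amounts to using a composite key, since a resubmitted stage reuses its stage ID. The sketch below is hypothetical (`StageAttempt`, `StageData`, and `StageTable` are not Spark classes) and shows only the keying idea, not the code from #1262 or any follow-up PR.

```scala
import scala.collection.mutable

// Hypothetical key: a stage may be resubmitted, so stageId alone is not unique.
final case class StageAttempt(stageId: Int, attemptId: Int)

// Hypothetical payload standing in for whatever per-stage data is tracked.
final case class StageData(name: String, numTasks: Int)

final class StageTable {
  private val table = mutable.HashMap.empty[StageAttempt, StageData]

  def put(stageId: Int, attemptId: Int, data: StageData): Unit =
    table(StageAttempt(stageId, attemptId)) = data

  def get(stageId: Int, attemptId: Int): Option[StageData] =
    table.get(StageAttempt(stageId, attemptId))

  // Every recorded attempt of one stage, ordered by attempt number.
  def attemptsOf(stageId: Int): Seq[(Int, StageData)] =
    table.toSeq
      .collect { case (StageAttempt(`stageId`, attempt), data) => (attempt, data) }
      .sortBy(_._1)
}
```

Because the case class supplies structural `equals` and `hashCode`, the two IDs behave as a single hash key without any manual string or bit packing.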

[GitHub] spark pull request: [SPARK-2534] Avoid pulling in the entire RDD i...

2014-07-17 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/1450#discussion_r15043414 --- Diff: core/src/main/scala/org/apache/spark/rdd/PairRDDFunctions.scala --- @@ -214,7 +214,7 @@ class PairRDDFunctions[K, V](self: RDD[(K, V

[GitHub] spark pull request: SPARK-2519 part 2. Remove pattern matching on ...

2014-07-17 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/1447#discussion_r15044062 --- Diff: core/src/main/scala/org/apache/spark/rdd/PairRDDFunctions.scala --- @@ -712,8 +701,8 @@ class PairRDDFunctions[K, V](self: RDD[(K, V

[GitHub] spark pull request: [SPARK-2534] Avoid pulling in the entire RDD i...

2014-07-17 Thread rxin
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/1450#issuecomment-49350307 Merged in master.

[GitHub] spark pull request: [SPARK-2534] Avoid pulling in the entire RDD i...

2014-07-17 Thread rxin
GitHub user rxin opened a pull request: https://github.com/apache/spark/pull/1469 [SPARK-2534] Avoid pulling in the entire RDD in various operators (branch-1.0 backport) This backports #1450 into branch-1.0. You can merge this pull request into a Git repository by running

[GitHub] spark pull request: [SPARK-2534] Avoid pulling in the entire RDD i...

2014-07-17 Thread rxin
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/1469#issuecomment-49373136 Jenkins, retest this please.

[GitHub] spark pull request: [SPARK-2534] Avoid pulling in the entire RDD i...

2014-07-17 Thread rxin
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/1469#issuecomment-49379863 Merging in master.

[GitHub] spark pull request: [SPARK-2534] Avoid pulling in the entire RDD i...

2014-07-17 Thread rxin
Github user rxin closed the pull request at: https://github.com/apache/spark/pull/1469

[GitHub] spark pull request: [SPARK-2299] Consolidate various stageIdTo* ha...

2014-07-17 Thread rxin
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/1262#issuecomment-49387996 Merging in master. Thanks for reviewing.

[GitHub] spark pull request: Reservoir sampling implementation.

2014-07-17 Thread rxin
GitHub user rxin opened a pull request: https://github.com/apache/spark/pull/1478 Reservoir sampling implementation. You can merge this pull request into a Git repository by running: $ git pull https://github.com/rxin/spark reservoirSample Alternatively you can review
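Reservoir sampling draws k items uniformly at random from a stream of unknown length in one pass and O(k) memory. The sketch below is the textbook Algorithm R written as a standalone helper; the object and method names and the fixed default seed are assumptions for the example, not the API added in #1478.

```scala
import scala.reflect.ClassTag
import scala.util.Random

object ReservoirSampling {
  /** Algorithm R: picks up to k items uniformly at random from `input` in a single pass.
    * Returns the sample and the total number of items consumed. The default seed is
    * fixed only to make the example reproducible. */
  def sampleAndCount[T: ClassTag](input: Iterator[T], k: Int, seed: Long = 42L): (Array[T], Int) = {
    require(k >= 0, "k must be non-negative")
    val reservoir = new Array[T](k)
    var seen = 0
    // Fill the reservoir with the first k items.
    while (seen < k && input.hasNext) {
      reservoir(seen) = input.next()
      seen += 1
    }
    if (seen < k) {
      // The input had fewer than k items: return all of them.
      (reservoir.take(seen), seen)
    } else {
      val rand = new Random(seed)
      while (input.hasNext) {
        val item = input.next()
        seen += 1
        // Item number `seen` replaces a uniformly chosen slot with probability k / seen.
        val j = rand.nextInt(seen)
        if (j < k) reservoir(j) = item
      }
      (reservoir, seen)
    }
  }
}
```

Returning the number of items seen alongside the sample is a design choice of this sketch: it lets a caller judge how much of the input the sample represents without a second pass.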

[GitHub] spark pull request: put 'curRequestSize = 0' after 'logDebug' it

2014-07-17 Thread rxin
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/1477#issuecomment-49397603 Jenkins, test this please.

[GitHub] spark pull request: [SPARK-2570] [SQL] Fix the bug of ClassCastExc...

2014-07-18 Thread rxin
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/1475#issuecomment-49399326 Merging. Thanks.

[GitHub] spark pull request: Fixed a typo in the comments in RangePartition...

2014-07-18 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/1473#discussion_r15098078 --- Diff: core/src/main/scala/org/apache/spark/Partitioner.scala --- @@ -135,7 +135,7 @@ class RangePartitioner[K : Ordering : ClassTag, V]( val k

[GitHub] spark pull request: Fixed a typo in the comments in RangePartition...

2014-07-18 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/1473#discussion_r15098185 --- Diff: core/src/main/scala/org/apache/spark/Partitioner.scala --- @@ -135,7 +135,7 @@ class RangePartitioner[K : Ordering : ClassTag, V]( val k

[GitHub] spark pull request: SPARK-2553. Fix compile error

2014-07-18 Thread rxin
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/1479#issuecomment-49404215 Merging this ...

[GitHub] spark pull request: Reservoir sampling implementation.

2014-07-18 Thread rxin
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/1478#issuecomment-49404318 Jenkins, retest this please.

[GitHub] spark pull request: Reservoir sampling implementation.

2014-07-18 Thread rxin
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/1478#issuecomment-49471598 Merging in master.

[GitHub] spark pull request: put 'curRequestSize = 0' after 'logDebug' it

2014-07-19 Thread rxin
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/1477#issuecomment-49501450 Jenkins, test this please.

[GitHub] spark pull request: [SPARK-2521] Broadcast RDD object (instead of ...

2014-07-19 Thread rxin
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/1452#issuecomment-49501642 Thanks for taking a look. I'm merging this one as is, and will submit a small PR to fix the issues.

[GitHub] spark pull request: [SPARK-2521] Broadcast RDD object (instead of ...

2014-07-19 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/1452#discussion_r15142372 --- Diff: core/src/main/scala/org/apache/spark/scheduler/ResultTask.scala --- @@ -17,134 +17,68 @@ package org.apache.spark.scheduler

[GitHub] spark pull request: put 'curRequestSize = 0' after 'logDebug' it

2014-07-19 Thread rxin
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/1477#issuecomment-49503209 Thanks. Merging in master.

[GitHub] spark pull request: [SPARK-2583] ConnectionManager cannot distingu...

2014-07-19 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/1490#discussion_r15145612 --- Diff: core/src/main/scala/org/apache/spark/network/MessageChunkHeader.scala --- @@ -41,6 +42,13 @@ private[spark] class MessageChunkHeader

[GitHub] spark pull request: [SPARK-2583] ConnectionManager cannot distingu...

2014-07-19 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/1490#discussion_r15145614 --- Diff: core/src/main/scala/org/apache/spark/network/MessageChunkHeader.scala --- @@ -67,13 +75,20 @@ private[spark] object MessageChunkHeader { val

[GitHub] spark pull request: [SPARK-2521] Broadcast RDD object (instead of ...

2014-07-19 Thread rxin
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/1452#issuecomment-49532568 Apparently this broke the build. Reverting and will work on a fix.

[GitHub] spark pull request: [SPARK-2521] Broadcast RDD object (instead of ...

2014-07-20 Thread rxin
GitHub user rxin opened a pull request: https://github.com/apache/spark/pull/1498 [SPARK-2521] Broadcast RDD object (instead of sending it along with every task) This is a resubmission of #1452. It was reverted because it broke the build. Currently (as of Spark 1.0.1

[GitHub] spark pull request: Fixed a typo in the comments in RangePartition...

2014-07-20 Thread rxin
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/1473#issuecomment-49539328 I filed a JIRA: https://issues.apache.org/jira/browse/SPARK-2598

[GitHub] spark pull request: [SPARK-2598] RangePartitioner's binary search ...

2014-07-20 Thread rxin
GitHub user rxin opened a pull request: https://github.com/apache/spark/pull/1500 [SPARK-2598] RangePartitioner's binary search does not use the given Ordering We should fix this in branch-1.0 as well. You can merge this pull request into a Git repository by running: $ git
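The bug named in the title is that a binary search over the partition bounds compared keys with their natural ordering instead of the `Ordering` passed to the partitioner. A minimal sketch of an ordering-aware search, assuming the bounds are already sorted by that ordering (the name `lowerBound` is hypothetical and this is not the patch in #1500):

```scala
object OrderedSearch {
  /** Index of the first element of `sorted` that is not less than `key` under `ord`,
    * or sorted.length if every element is less. `sorted` must be sorted by `ord`. */
  def lowerBound[K](sorted: Array[K], key: K)(implicit ord: Ordering[K]): Int = {
    var lo = 0
    var hi = sorted.length
    while (lo < hi) {
      val mid = (lo + hi) >>> 1 // unsigned shift avoids overflow on very large arrays
      // Every comparison goes through the supplied Ordering, never the natural one.
      if (ord.lt(sorted(mid), key)) lo = mid + 1 else hi = mid
    }
    lo
  }
}
```

With bounds sorted in descending order, `lowerBound(Array(30, 20, 10), 25)(Ordering[Int].reverse)` returns 1, whereas `java.util.Arrays.binarySearch`, which uses the natural ordering, has undefined results on that array.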

[GitHub] spark pull request: Fixed a typo in the comments in RangePartition...

2014-07-20 Thread rxin
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/1473#issuecomment-49539660 @dorx can you close this PR? #1500 includes the change here.

[GitHub] spark pull request: SPARK-2226: transform HAVING clauses with aggr...

2014-07-20 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/1497#discussion_r15147996 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -152,6 +155,37 @@ class Analyzer(catalog: Catalog, registry

[GitHub] spark pull request: SPARK-2226: transform HAVING clauses with aggr...

2014-07-20 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/1497#discussion_r15148000 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -152,6 +155,37 @@ class Analyzer(catalog: Catalog, registry

[GitHub] spark pull request: SPARK-2226: transform HAVING clauses with aggr...

2014-07-20 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/1497#discussion_r15148004 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -152,6 +155,37 @@ class Analyzer(catalog: Catalog, registry

[GitHub] spark pull request: SPARK-2226: transform HAVING clauses with aggr...

2014-07-20 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/1497#discussion_r15148009 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -152,6 +155,37 @@ class Analyzer(catalog: Catalog, registry

[GitHub] spark pull request: SPARK-2226: transform HAVING clauses with aggr...

2014-07-20 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/1497#discussion_r15148014 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -152,6 +155,37 @@ class Analyzer(catalog: Catalog, registry

[GitHub] spark pull request: (WIP) SPARK-2045 Sort-based shuffle

2014-07-20 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/1499#discussion_r15148079 --- Diff: core/src/main/scala/org/apache/spark/util/collection/ExternalSorter.scala --- @@ -0,0 +1,390 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: (WIP) SPARK-2045 Sort-based shuffle

2014-07-20 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/1499#discussion_r15148086 --- Diff: core/src/main/scala/org/apache/spark/util/collection/ExternalSorter.scala --- @@ -0,0 +1,390 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [SPARK-2552][MLLIB] stabilize logistic functio...

2014-07-20 Thread rxin
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/1493#issuecomment-49540207 Jenkins, retest this please.

[GitHub] spark pull request: [SPARK-2495][MLLIB] remove private[mllib] from...

2014-07-20 Thread rxin
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/1492#issuecomment-49540212 Jenkins, retest this please.

[GitHub] spark pull request: SPARK-2083 Add support for spark.local.maxFail...

2014-07-20 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/1465#discussion_r15148111 --- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala --- @@ -1463,12 +1463,13 @@ object SparkContext extends Logging { // Regular

[GitHub] spark pull request: SPARK-2083 Add support for spark.local.maxFail...

2014-07-20 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/1465#discussion_r15148112 --- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala --- @@ -1477,7 +1478,8 @@ object SparkContext extends Logging { def localCpuCount

[GitHub] spark pull request: SPARK-2083 Add support for spark.local.maxFail...

2014-07-20 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/1465#discussion_r15148114 --- Diff: docs/configuration.md --- @@ -599,6 +599,15 @@ Apart from these, the following properties are also available, and may be useful td

[GitHub] spark pull request: SPARK-2519 part 2. Remove pattern matching on ...

2014-07-20 Thread rxin
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/1447#issuecomment-49540356 Merging this in master.

[GitHub] spark pull request: [SPARK-2490] Change recursive visiting on RDD ...

2014-07-20 Thread rxin
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/1418#issuecomment-49540404 Thanks for submitting this. I think we can still hit a stack overflow in serialization, but I agree it's better to do this non-recursively.

[GitHub] spark pull request: [SPARK-2490] Change recursive visiting on RDD ...

2014-07-20 Thread rxin
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/1418#issuecomment-49540416 Actually it's late. I will review this tomorrow.
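Replacing recursion with an explicit work list is the usual way to walk a deep dependency graph without growing the JVM call stack. A minimal sketch over a hypothetical `Node` type standing in for an RDD and its parents (not the code from #1418); as the comment above notes, this only makes the traversal iterative and does nothing about recursion during serialization.

```scala
import scala.collection.mutable

// Hypothetical stand-in for an RDD: only the parent links matter here.
final class Node(val id: Int, val parents: Seq[Node])

object Lineage {
  /** Ids of `root` and all of its ancestors, found with an explicit stack instead of recursion. */
  def ancestorIds(root: Node): Set[Int] = {
    val visited = mutable.HashSet.empty[Int]
    var stack: List[Node] = List(root)
    while (stack.nonEmpty) {
      val node = stack.head
      stack = stack.tail
      // add returns false if the id was already seen (shared ancestors in a DAG).
      if (visited.add(node.id)) {
        stack = node.parents.toList ::: stack
      }
    }
    visited.toSet
  }
}
```

A lineage 100,000 levels deep is walked here with heap-allocated list cells only, where a naive recursive visit would likely throw `StackOverflowError` at default thread stack sizes.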

[GitHub] spark pull request: [WIP][SPARK-2595:]The driver run garbage colle...

2014-07-20 Thread rxin
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/1387#issuecomment-49540472 I've talked to many JVM developers (engineers who work on the JVM) and while System.gc is advisory in the spec, it is actually a pretty reliable way of triggering GC

[GitHub] spark pull request: [SPARK-2598] RangePartitioner's binary search ...

2014-07-20 Thread rxin
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/1500#issuecomment-49554380 Merged in master and branch-1.0.

[GitHub] spark pull request: [SPARK-2495][MLLIB] remove private[mllib] from...

2014-07-20 Thread rxin
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/1492#issuecomment-49557834 Merging in master.

[GitHub] spark pull request: SPARK-2047: Introduce an in-mem Sorter, and us...

2014-07-20 Thread rxin
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/1502#issuecomment-49559170 Cool. What about P^3 sort? :)

[GitHub] spark pull request: [SPARK-2598] RangePartitioner's binary search ...

2014-07-20 Thread rxin
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/1500#issuecomment-49565928 0.9.x doesn't have this problem because there was no binary search.

[GitHub] spark pull request: [SPARK-2521] Broadcast RDD object (instead of ...

2014-07-20 Thread rxin
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/1498#issuecomment-49571549 Jenkins, retest this please.

[GitHub] spark pull request: [SPARK-2470] PEP8 fixes to PySpark

2014-07-20 Thread rxin
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/1505#issuecomment-49572014 @davies can you take a look?

[GitHub] spark pull request: SPARK-2226: transform HAVING clauses with aggr...

2014-07-20 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/1497#discussion_r15155562 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -152,6 +155,34 @@ class Analyzer(catalog: Catalog, registry

[GitHub] spark pull request: SPARK-2047: Introduce an in-mem Sorter, and us...

2014-07-21 Thread rxin
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/1502#issuecomment-49575156 He did it!

[GitHub] spark pull request: SPARK-1236 - Upgrade Jetty to 9.1.3.v20140225.

2014-03-18 Thread rxin
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/113#issuecomment-37904304 We are reverting this pull request in #167

[GitHub] spark pull request: Revert SPARK-1236 - Upgrade Jetty to 9.1.3.v2...

2014-03-18 Thread rxin
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/167#issuecomment-37906590 Ok I merged this.

[GitHub] spark pull request: [SPARK-1268] Adding XOR and AND-NOT operations...

2014-03-18 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/172#discussion_r10718417 --- Diff: core/src/main/scala/org/apache/spark/util/collection/BitSet.scala --- @@ -88,6 +88,53 @@ class BitSet(numBits: Int) extends Serializable

[GitHub] spark pull request: [SPARK-1268] Adding XOR and AND-NOT operations...

2014-03-18 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/172#discussion_r10718405 --- Diff: core/src/main/scala/org/apache/spark/util/collection/BitSet.scala --- @@ -88,6 +88,53 @@ class BitSet(numBits: Int) extends Serializable

[GitHub] spark pull request: [SPARK-1268] Adding XOR and AND-NOT operations...

2014-03-18 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/172#discussion_r10718473 --- Diff: core/src/main/scala/org/apache/spark/util/collection/BitSet.scala --- @@ -88,6 +88,53 @@ class BitSet(numBits: Int) extends Serializable

[GitHub] spark pull request: [SPARK-1268] Adding XOR and AND-NOT operations...

2014-03-18 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/172#discussion_r10718582 --- Diff: core/src/main/scala/org/apache/spark/util/collection/BitSet.scala --- @@ -88,6 +88,53 @@ class BitSet(numBits: Int) extends Serializable

[GitHub] spark pull request: [SPARK-1268] Adding XOR and AND-NOT operations...

2014-03-18 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/172#discussion_r10718634 --- Diff: core/src/test/scala/org/apache/spark/util/collection/BitSetSuite.scala --- @@ -69,4 +69,45 @@ class BitSetSuite extends FunSuite { assert

[GitHub] spark pull request: [SPARK-1268] Adding XOR and AND-NOT operations...

2014-03-18 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/172#discussion_r10718623 --- Diff: core/src/test/scala/org/apache/spark/util/collection/BitSetSuite.scala --- @@ -69,4 +69,45 @@ class BitSetSuite extends FunSuite { assert

[GitHub] spark pull request: [SPARK-1268] Adding XOR and AND-NOT operations...

2014-03-18 Thread rxin
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/172#issuecomment-37970799 Hi @petko-nikolov, thanks a lot for contributing this patch! I left some comments to help the code conform to Spark coding style, and on test coverage. It would
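The two operations named in the PR title are word-wise bit operations: XOR keeps bits set in exactly one operand, and AND-NOT keeps bits set in the left operand but not the right. A minimal sketch on a hypothetical `SimpleBitSet` backed by an `Array[Long]` (not Spark's `BitSet`, whose method names and size handling may differ):

```scala
/** Hypothetical, minimal fixed-size bit set backed by 64-bit words. */
final class SimpleBitSet(val numBits: Int) {
  val words = new Array[Long]((numBits + 63) >> 6)

  def set(i: Int): Unit = words(i >> 6) |= (1L << (i & 63))
  def get(i: Int): Boolean = (words(i >> 6) & (1L << (i & 63))) != 0

  /** Bits set in exactly one of the two sets. */
  def ^(other: SimpleBitSet): SimpleBitSet = combine(other)(_ ^ _)

  /** Bits set in this set but not in `other`. */
  def andNot(other: SimpleBitSet): SimpleBitSet = combine(other)((a, b) => a & ~b)

  // Applies `op` word by word; a word missing from the shorter set is treated as all zeros.
  private def combine(other: SimpleBitSet)(op: (Long, Long) => Long): SimpleBitSet = {
    val result = new SimpleBitSet(math.max(numBits, other.numBits))
    var i = 0
    while (i < result.words.length) {
      val a = if (i < words.length) words(i) else 0L
      val b = if (i < other.words.length) other.words(i) else 0L
      result.words(i) = op(a, b)
      i += 1
    }
    result
  }
}
```

Working on whole 64-bit words means each operation costs one pass over `numBits / 64` longs rather than one step per bit.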

[GitHub] spark pull request: Principal Component Analysis

2014-03-19 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/88#discussion_r10738878 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/PCA.scala --- @@ -0,0 +1,129 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under

[GitHub] spark pull request: Principal Component Analysis

2014-03-19 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/88#discussion_r10738906 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/PCA.scala --- @@ -0,0 +1,129 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under

[GitHub] spark pull request: Principal Component Analysis

2014-03-19 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/88#discussion_r10738948 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/PCA.scala --- @@ -0,0 +1,129 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under

[GitHub] spark pull request: Principal Component Analysis

2014-03-19 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/88#discussion_r10738993 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/util/LAUtils.scala --- @@ -0,0 +1,67 @@ +/* + * Licensed to the Apache Software Foundation (ASF

[GitHub] spark pull request: Principal Component Analysis

2014-03-19 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/88#discussion_r10739033 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/util/LAUtils.scala --- @@ -0,0 +1,67 @@ +/* + * Licensed to the Apache Software Foundation (ASF

[GitHub] spark pull request: Principal Component Analysis

2014-03-19 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/88#discussion_r10739070 --- Diff: mllib/src/test/scala/org/apache/spark/mllib/linalg/PCASuite.scala --- @@ -0,0 +1,125 @@ +/* + * Licensed to the Apache Software Foundation (ASF

[GitHub] spark pull request: Principal Component Analysis

2014-03-19 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/88#discussion_r10739083 --- Diff: mllib/src/test/scala/org/apache/spark/mllib/linalg/PCASuite.scala --- @@ -0,0 +1,125 @@ +/* + * Licensed to the Apache Software Foundation (ASF

[GitHub] spark pull request: Principal Component Analysis

2014-03-19 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/88#discussion_r10739114 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/SVD.scala --- @@ -142,17 +172,189 @@ object SVD { val vsirdd = sc.makeRDD(Array.tabulate

[GitHub] spark pull request: Principal Component Analysis

2014-03-19 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/88#discussion_r10739138 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/SVD.scala --- @@ -38,18 +40,49 @@ class SVD

[GitHub] spark pull request: Principal Component Analysis

2014-03-19 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/88#discussion_r10739162 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/SVD.scala --- @@ -142,17 +172,189 @@ object SVD { val vsirdd = sc.makeRDD(Array.tabulate

[GitHub] spark pull request: Principal Component Analysis

2014-03-19 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/88#discussion_r10739195 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/SVD.scala --- @@ -142,17 +172,189 @@ object SVD { val vsirdd = sc.makeRDD(Array.tabulate

[GitHub] spark pull request: Principal Component Analysis

2014-03-19 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/88#discussion_r10739236 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/SVD.scala --- @@ -142,17 +172,189 @@ object SVD { val vsirdd = sc.makeRDD(Array.tabulate

[GitHub] spark pull request: Principal Component Analysis

2014-03-19 Thread rxin
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/88#issuecomment-38024310 Hi @rezazadeh, thanks for working on this! I can't wait for this to be merged and improve the coverage of common ML algorithms in MLlib. I am not really

[GitHub] spark pull request: Principal Component Analysis

2014-03-19 Thread rxin
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/88#issuecomment-38024368 Oh, and I didn't go through all files for style and readability. I'm sure you can look at the rest and figure them out yourself. Thanks!

[GitHub] spark pull request: Principal Component Analysis

2014-03-19 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/88#discussion_r10739371 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/PCA.scala --- @@ -0,0 +1,129 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under

[GitHub] spark pull request: Principal Component Analysis

2014-03-19 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/88#discussion_r10739380 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/PCA.scala --- @@ -0,0 +1,129 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under

[GitHub] spark pull request: bugfix: Wrong Duration in Active Stages in...

2014-03-19 Thread rxin
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/170#issuecomment-38028082 Thanks. I merged this in master branch-0.9 (fyi @pwendell)

[GitHub] spark pull request: Make SQL keywords case-insensitive

2014-03-21 Thread rxin
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/193#issuecomment-38336273 Thanks. I've merged this.

[GitHub] spark pull request: Add asCode function for dumping raw tree repre...

2014-03-21 Thread rxin
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/200#issuecomment-38336303 Thanks. Merged.

[GitHub] spark pull request: Fixed coding style issues in Spark SQL

2014-03-24 Thread rxin
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/208#issuecomment-38415798

```scala
package org.apache.spark.sql
package catalyst
```

vs

```scala
package org.apache.spark.sql.catalyst
```

There are three reasons I

[GitHub] spark pull request: Use Guava's top k implementation rather than o...

2014-03-25 Thread rxin
GitHub user rxin opened a pull request: https://github.com/apache/spark/pull/229 Use Guava's top k implementation rather than our BoundedPriorityQueue based implementation Also updated the documentation for top and takeOrdered. On my simple test of sorting 100 million (Int

[GitHub] spark pull request: StopAfter / TopK related changes

2014-03-25 Thread rxin
GitHub user rxin opened a pull request: https://github.com/apache/spark/pull/233 StopAfter / TopK related changes 1. Renamed StopAfter to Limit to be more consistent with naming in other relational databases. 2. Renamed TopK to TakeOrdered to be more consistent with Spark RDD

[GitHub] spark pull request: StopAfter / TopK related changes

2014-03-25 Thread rxin
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/233#issuecomment-38648471 Jenkins, retest this please.

[GitHub] spark pull request: SPARK-1321 Use Guava's top k implementation ra...

2014-03-25 Thread rxin
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/229#issuecomment-38648982 weird i missed that. fixed.

[GitHub] spark pull request: StopAfter / TopK related changes

2014-03-26 Thread rxin
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/233#issuecomment-38653186 Jenkins, retest this please.

[GitHub] spark pull request: [SPARK-1133] add small files input in MLlib

2014-03-26 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/164#discussion_r10965746 --- Diff: mllib/src/main/java/org/apache/spark/mllib/input/WholeTextFileRecordReader.java --- @@ -0,0 +1,103 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [SPARK-1133] add small files input in MLlib

2014-03-26 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/164#discussion_r10965749 --- Diff: mllib/src/main/java/org/apache/spark/mllib/input/WholeTextFileRecordReader.java --- @@ -0,0 +1,103 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [SPARK-1133] add small files input in MLlib

2014-03-26 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/164#discussion_r10965782 --- Diff: project/SparkBuild.scala --- @@ -358,7 +358,7 @@ object SparkBuild extends Build { def mllibSettings = sharedSettings ++ Seq( name

[GitHub] spark pull request: [SPARK-1133] add small files input in MLlib

2014-03-26 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/164#discussion_r10965793 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/MLContext.scala --- @@ -0,0 +1,55 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under
