[GitHub] spark pull request: [Minor] Trivial fix to make codes more readabl...

2014-10-04 Thread viirya
GitHub user viirya opened a pull request: https://github.com/apache/spark/pull/2654 [Minor] Trivial fix to make codes more readable. It should just use `maxResults` there. You can merge this pull request into a Git repository by running: $ git pull https://github.com/viirya

[GitHub] spark pull request: [SPARK-3801] More efficient app dir cleanup

2014-10-05 Thread viirya
GitHub user viirya opened a pull request: https://github.com/apache/spark/pull/2660 [SPARK-3801] More efficient app dir cleanup The newly merged, more conservative app directory cleanup can be made more efficient. Since it is not necessary to store the newer files, using `exists` to check
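
The description suggests a timestamp check at cleanup time rather than bookkeeping of newer files. A minimal sketch of that idea, assuming a TTL-style retention policy; `cleanupOldAppDirs` and its parameters are illustrative, not the actual Spark method:

```scala
// Hypothetical sketch of timestamp-based cleanup: check each app
// directory's modification time at cleanup time and remove only those
// older than the retention window, instead of tracking newer files.
import java.io.File

def cleanupOldAppDirs(workDir: File, ttlMs: Long, now: Long): Seq[File] = {
  val entries = Option(workDir.listFiles()).getOrElse(Array.empty[File])
  val expired = entries.filter(d => d.isDirectory && now - d.lastModified() > ttlMs)
  expired.foreach { d =>
    Option(d.listFiles()).getOrElse(Array.empty[File]).foreach(_.delete())
    d.delete()  // shallow delete is enough for this sketch
  }
  expired.toSeq
}
```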

[GitHub] spark pull request: [SPARK-3801] More efficient app dir cleanup

2014-10-05 Thread viirya
Github user viirya commented on the pull request: https://github.com/apache/spark/pull/2660#issuecomment-57939614 @srowen, I think so. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have

[GitHub] spark pull request: [SPARK-3801] More efficient app dir cleanup

2014-10-05 Thread viirya
Github user viirya closed the pull request at: https://github.com/apache/spark/pull/2660

[GitHub] spark pull request: [SPARK-2490] Change recursive visiting on RDD ...

2014-07-26 Thread viirya
Github user viirya commented on the pull request: https://github.com/apache/spark/pull/1418#issuecomment-50225517 Thanks for commenting. How about the review?

[GitHub] spark pull request: [SPARK-2490] Change recursive visiting on RDD ...

2014-07-27 Thread viirya
Github user viirya commented on the pull request: https://github.com/apache/spark/pull/1418#issuecomment-50257662 @mateiz Thanks for the suggestion. I will leave the PageRank example as-is. These braces are added to comply with the code style.

[GitHub] spark pull request: [SPARK-2490] Change recursive visiting on RDD ...

2014-07-30 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/1418#discussion_r15590707 --- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala --- @@ -275,18 +286,51 @@ class DAGScheduler( case shufDep

[GitHub] spark pull request: [SPARK-2490] Change recursive visiting on RDD ...

2014-07-30 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/1418#discussion_r15591064 --- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala --- @@ -265,6 +275,7 @@ class DAGScheduler( private def getParentStages(rdd

[GitHub] spark pull request: [SPARK-2490] Change recursive visiting on RDD ...

2014-07-30 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/1418#discussion_r15592346 --- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala --- @@ -195,11 +195,21 @@ class DAGScheduler( shuffleToMapStage.get

[GitHub] spark pull request: [SPARK-2490] Change recursive visiting on RDD ...

2014-07-31 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/1418#discussion_r15648333 --- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala --- @@ -275,18 +287,53 @@ class DAGScheduler( case shufDep

[GitHub] spark pull request: Considering the ordering of qualifiers when co...

2014-10-13 Thread viirya
GitHub user viirya opened a pull request: https://github.com/apache/spark/pull/2783 Considering the ordering of qualifiers when comparison The orderings should be considered during the comparison between old qualifiers and new qualifiers. You can merge this pull request

[GitHub] spark pull request: [SPARK-3925] Do not consider the ordering of q...

2014-10-13 Thread viirya
Github user viirya commented on the pull request: https://github.com/apache/spark/pull/2783#issuecomment-58896121 @srowen Yes. Thanks for correcting it.

[GitHub] spark pull request: [SPARK-3925] Do not consider the ordering of q...

2014-10-14 Thread viirya
Github user viirya commented on the pull request: https://github.com/apache/spark/pull/2783#issuecomment-59070633 Please test again.

[GitHub] spark pull request: [SPARK-3925][SQL] Do not consider the ordering...

2014-10-14 Thread viirya
Github user viirya commented on the pull request: https://github.com/apache/spark/pull/2783#issuecomment-59149334 The title is modified. Thanks.

[GitHub] spark pull request: Remove duplicate removal of local dirs

2014-10-16 Thread viirya
GitHub user viirya opened a pull request: https://github.com/apache/spark/pull/2826 Remove duplicate removal of local dirs The shutdown hook of `DiskBlockManager` already removes localDirs, so there is no need to register them with `Utils.registerShutdownDeleteDir`; doing so causes duplicate

[GitHub] spark pull request: [SPARK-3970] Remove duplicate removal of local...

2014-10-16 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/2826#discussion_r18966331 --- Diff: core/src/main/scala/org/apache/spark/storage/DiskBlockManager.scala --- @@ -140,7 +140,6 @@ private[spark] class DiskBlockManager(blockManager

[GitHub] spark pull request: [SPARK-3970] Remove duplicate removal of local...

2014-10-16 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/2826#discussion_r18999421 --- Diff: core/src/main/scala/org/apache/spark/storage/DiskBlockManager.scala --- @@ -140,7 +140,6 @@ private[spark] class DiskBlockManager(blockManager

[GitHub] spark pull request: [SPARK-3925][SQL] Do not consider the ordering...

2014-10-20 Thread viirya
Github user viirya commented on the pull request: https://github.com/apache/spark/pull/2783#issuecomment-59772318 Other comments?

[GitHub] spark pull request: [SPARK-3925][SQL] Do not consider the ordering...

2014-10-26 Thread viirya
Github user viirya commented on the pull request: https://github.com/apache/spark/pull/2783#issuecomment-60514766 Seems the patch is ok to merge?

[GitHub] spark pull request: [SPARK-3970] Remove duplicate removal of local...

2014-10-26 Thread viirya
Github user viirya commented on the pull request: https://github.com/apache/spark/pull/2826#issuecomment-60514830 Is this patch ok to merge?

[GitHub] spark pull request: [SPARK-2355] Add checker for the number of clu...

2014-07-03 Thread viirya
GitHub user viirya opened a pull request: https://github.com/apache/spark/pull/1293 [SPARK-2355] Add checker for the number of clusters When the number of clusters given to org.apache.spark.mllib.clustering.KMeans under the parallel initialization mode is greater than the data

[GitHub] spark pull request: [SPARK-2355] Add checker for the number of clu...

2014-07-03 Thread viirya
Github user viirya commented on the pull request: https://github.com/apache/spark/pull/1293#issuecomment-47961640 The problem lies in `initKMeansParallel`, the implementation of the k-means|| algorithm. Since it selects at most as many centers as there are data points, when calling
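
The checker this thread describes can be sketched roughly as below. This is a hypothetical guard, not the actual PR code; whether to cap `k` or throw is one possible design choice:

```scala
// Hypothetical sketch of the checker described above: k-means|| can
// produce at most as many initial centers as there are data points,
// so validate (or cap) k before running initialization.
def checkNumClusters(k: Int, numPoints: Long): Int = {
  require(k > 0, s"Number of clusters must be positive but got $k")
  if (k > numPoints) numPoints.toInt else k  // cap k at the data count
}
```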

[GitHub] spark pull request: [SPARK-2490] Change recursive visiting on RDD ...

2014-07-15 Thread viirya
GitHub user viirya opened a pull request: https://github.com/apache/spark/pull/1418 [SPARK-2490] Change recursive visiting on RDD dependencies to iterative approach When performing some transformations on RDDs after many iterations, the dependencies of RDDs could be very long
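
The recursive-to-iterative idea can be illustrated on a toy dependency graph; `Node` here is a stand-in for RDD, not Spark's actual types:

```scala
// Sketch of the recursive-to-iterative idea: an explicit stack replaces
// the JVM call stack, so arbitrarily long lineage chains cannot trigger
// a StackOverflowError the way a recursive visit can.
import scala.collection.mutable

case class Node(id: Int, deps: Seq[Node])

def collectAncestors(root: Node): Set[Int] = {
  val visited = mutable.HashSet[Int]()
  val stack = mutable.Stack[Node](root)
  while (stack.nonEmpty) {
    val node = stack.pop()
    if (visited.add(node.id)) {
      node.deps.foreach(stack.push)
    }
  }
  visited.toSet
}
```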

[GitHub] spark pull request: [SPARK-2490] Change recursive visiting on RDD ...

2014-07-16 Thread viirya
Github user viirya commented on the pull request: https://github.com/apache/spark/pull/1418#issuecomment-49184593 Another example of this problem is the PageRank example bundled with Spark. At this time, since the problem with the Java serializer still exists, to avoid causing

[GitHub] spark pull request: [SPARK-3077] fix unnecessarily removing sendin...

2014-08-16 Thread viirya
GitHub user viirya opened a pull request: https://github.com/apache/spark/pull/1985 [SPARK-3077] fix unnecessarily removing sendingConnection when closing connections Currently in `ConnectionManager`, when a `ReceivingConnection` is closing, the corresponding `SendingConnection

[GitHub] spark pull request: [Minor] fix typo

2014-08-23 Thread viirya
GitHub user viirya opened a pull request: https://github.com/apache/spark/pull/2105 [Minor] fix typo Fix a typo in comment. You can merge this pull request into a Git repository by running: $ git pull https://github.com/viirya/spark-1 fix_typo Alternatively you can review

[GitHub] spark pull request: [SPARK-3252] Add missing condition for test

2014-08-27 Thread viirya
GitHub user viirya opened a pull request: https://github.com/apache/spark/pull/2159 [SPARK-3252] Add missing condition for test According to the text message, both relations should be tested. So add the missing condition. You can merge this pull request into a Git repository

[GitHub] spark pull request: [SPARK-3300][SQL] Should clean old buffer afte...

2014-08-29 Thread viirya
GitHub user viirya opened a pull request: https://github.com/apache/spark/pull/2195 [SPARK-3300][SQL] Should clean old buffer after copying its content The function `ensureFreeSpace` in object `ColumnBuilder` clears the old buffer before copying its content to the new buffer. This PR fixes

[GitHub] spark pull request: [SPARK-3300][SQL] Should clean old buffer afte...

2014-08-29 Thread viirya
Github user viirya commented on the pull request: https://github.com/apache/spark/pull/2195#issuecomment-53854049 I just noticed that `clear()` would not actually erase the data in the buffer. So you can close this PR if you think it is not necessary to make the change.
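
The observation about `clear()` matches `java.nio.ByteBuffer` semantics: it resets the position and limit for reuse but leaves the bytes in place. A small demonstration:

```scala
// ByteBuffer.clear() only resets position/limit; it does not zero out
// previously written bytes, which is the point noted above.
import java.nio.ByteBuffer

val buf = ByteBuffer.allocate(4)
buf.put(Array[Byte](1, 2, 3, 4))
buf.clear()                    // position = 0, limit = capacity
val firstByte = buf.get(0)     // old data is still readable
```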

[GitHub] spark pull request: [SQL] Directly use currentTable without unnece...

2014-08-29 Thread viirya
GitHub user viirya opened a pull request: https://github.com/apache/spark/pull/2203 [SQL] Directly use currentTable without unnecessary implicit conversion We can directly use currentTable there without unnecessary implicit conversion. You can merge this pull request into a Git

[GitHub] spark pull request: [SPARK-2355][MLlib] Add checker for the number...

2014-08-29 Thread viirya
Github user viirya commented on the pull request: https://github.com/apache/spark/pull/1293#issuecomment-53945559 @mengxr OK. Thanks for informing me. SPARK-3218 looks promising.

[GitHub] spark pull request: [SPARK-2355][MLlib] Add checker for the number...

2014-08-29 Thread viirya
Github user viirya closed the pull request at: https://github.com/apache/spark/pull/1293

[GitHub] spark pull request: [SPARK-3327] Make broadcasted value mutable fo...

2014-08-30 Thread viirya
GitHub user viirya opened a pull request: https://github.com/apache/spark/pull/2217 [SPARK-3327] Make broadcasted value mutable for caching useful information This PR makes broadcasted value mutable for caching useful information when implementing some algorithms that iteratively

[GitHub] spark pull request: [SPARK-3300][SQL] No need to call clear() and ...

2014-08-30 Thread viirya
Github user viirya commented on the pull request: https://github.com/apache/spark/pull/2195#issuecomment-53967657 ok to test please.

[GitHub] spark pull request: [SPARK-3327] Make broadcasted value mutable fo...

2014-08-31 Thread viirya
Github user viirya commented on the pull request: https://github.com/apache/spark/pull/2217#issuecomment-53983398 Thanks for your comments. Regarding the possibility of causing an exception on an executor: it happens when `synchronized` is not there. As `setValue` is wrapped

[GitHub] spark pull request: [SPARK-3345] Do correct parameters for Shuffle...

2014-09-02 Thread viirya
GitHub user viirya opened a pull request: https://github.com/apache/spark/pull/2235 [SPARK-3345] Do correct parameters for ShuffleFileGroup In the method `newFileGroup` of class `FileShuffleBlockManager`, the parameters for creating a new `ShuffleFileGroup` object are in the wrong order

[GitHub] spark pull request: [SPARK-3327] Make broadcasted value mutable fo...

2014-09-03 Thread viirya
Github user viirya commented on the pull request: https://github.com/apache/spark/pull/2217#issuecomment-54278826 @rxin. I need a way to modify broadcasted variables locally and keep those variables for later use. The locally modified variables are used to store some values

[GitHub] spark pull request: [SPARK-3327] Make broadcasted value mutable fo...

2014-09-03 Thread viirya
Github user viirya commented on the pull request: https://github.com/apache/spark/pull/2217#issuecomment-54279202 @rxin, I can get the idea that immutability makes the whole thing safer for broadcasted variables. So I am just wondering if it is worth providing such mutability

[GitHub] spark pull request: [SPARK-3327] Make broadcasted value mutable fo...

2014-09-03 Thread viirya
Github user viirya commented on the pull request: https://github.com/apache/spark/pull/2217#issuecomment-54295028 @srowen Thanks for the comment. In fact I want some persistent mutable state per data partition. I just achieve that goal with mutable broadcasted variables. I know

[GitHub] spark pull request: [SPARK-3327] Make broadcasted value mutable fo...

2014-09-03 Thread viirya
Github user viirya commented on the pull request: https://github.com/apache/spark/pull/2217#issuecomment-54396114 Thanks for your comments. I agree with Reynold that it is a rough idea, and it would need careful consideration and a clear design doc if we really need

[GitHub] spark pull request: [SPARK-3327] Make broadcasted value mutable fo...

2014-09-03 Thread viirya
Github user viirya closed the pull request at: https://github.com/apache/spark/pull/2217

[GitHub] spark pull request: [SPARK-3310][SQL] Directly use currentTable wi...

2014-09-04 Thread viirya
Github user viirya commented on the pull request: https://github.com/apache/spark/pull/2203#issuecomment-54501755 please test again.

[GitHub] spark pull request: [SQL][Minor] Let BigDecimal do checking type c...

2014-11-11 Thread viirya
GitHub user viirya opened a pull request: https://github.com/apache/spark/pull/3208 [SQL][Minor] Let BigDecimal do checking type compatibility Remove hardcoded max and min values for types. Let BigDecimal do the type-compatibility checking. You can merge this pull request into a Git

[GitHub] spark pull request: [SPARK-4358][SQL] Let BigDecimal do checking t...

2014-11-11 Thread viirya
Github user viirya commented on the pull request: https://github.com/apache/spark/pull/3208#issuecomment-62676407 When parsing NumericLiteral, using more specific numeric types, such as Byte and Short, may improve memory efficiency slightly.
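
The idea of letting BigDecimal do the range checks can be sketched as below. This is a hedged illustration, not Catalyst's actual code; the returned type names are placeholders:

```scala
// Hedged sketch: rather than hardcoding Byte.MaxValue and friends, use
// BigDecimal's own range checks to pick the narrowest integral type
// for a parsed numeric literal.
def narrowestIntegralType(v: BigDecimal): String = {
  if (v.isValidByte) "ByteType"
  else if (v.isValidShort) "ShortType"
  else if (v.isValidInt) "IntegerType"
  else if (v.isValidLong) "LongType"
  else "DecimalType"
}
```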

[GitHub] spark pull request: Add locations parameter to Twitter Stream

2014-11-13 Thread viirya
GitHub user viirya opened a pull request: https://github.com/apache/spark/pull/3246 Add locations parameter to Twitter Stream When we request the Tweet stream, geo-location is one of the most important parameters. In addition to the track parameter, the locations parameter is widely

[GitHub] spark pull request: [SPARK-4358][SQL] Let BigDecimal do checking t...

2014-11-15 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/3208#discussion_r20401128 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypes.scala --- @@ -55,7 +55,11 @@ case class GetItem(child

[GitHub] spark pull request: [SPARK-4358][SQL] Let BigDecimal do checking t...

2014-11-15 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/3208#discussion_r20401148 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringOperations.scala --- @@ -257,9 +257,16 @@ case class Substring(str

[GitHub] spark pull request: [SPARK-4358][SQL] Let BigDecimal do checking t...

2014-11-15 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/3208#discussion_r20401183 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/literals.scala --- @@ -47,6 +47,8 @@ object Literal { object

[GitHub] spark pull request: [SPARK-4358][SQL] Let BigDecimal do checking t...

2014-11-18 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/3208#discussion_r20493215 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/HiveTypeCoercion.scala --- @@ -460,6 +460,20 @@ trait HiveTypeCoercion

[GitHub] spark pull request: [SPARK-4358][SQL] Let BigDecimal do checking t...

2014-11-18 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/3208#discussion_r20493243 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/HiveTypeCoercion.scala --- @@ -460,6 +460,20 @@ trait HiveTypeCoercion

[GitHub] spark pull request: [SPARK-4358][SQL] Let BigDecimal do checking t...

2014-11-21 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/3208#discussion_r20705901 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/SqlParser.scala --- @@ -339,18 +339,15 @@ class SqlParser extends AbstractSparkSQLParser

[GitHub] spark pull request: [SPARK-4382] Add locations parameter to Twitte...

2014-11-21 Thread viirya
Github user viirya commented on the pull request: https://github.com/apache/spark/pull/3246#issuecomment-64071312 Any idea about this PR?

[GitHub] spark pull request: [SPARK-4358][SQL] Let BigDecimal do checking t...

2014-11-23 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/3208#discussion_r20775962 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/SqlParser.scala --- @@ -339,18 +339,15 @@ class SqlParser extends AbstractSparkSQLParser

[GitHub] spark pull request: [SPARK-4597] Use proper exception and reset va...

2014-11-25 Thread viirya
GitHub user viirya opened a pull request: https://github.com/apache/spark/pull/3449 [SPARK-4597] Use proper exception and reset variable `File.exists()` and `File.mkdirs()` only throw `SecurityException`, not `IOException`. Then, when an exception is thrown, `dir` should
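
The pattern being fixed can be sketched as a retry loop; the names, retry limit, and structure below are illustrative, not the exact `Utils.createTempDir` code:

```scala
// Sketch of the pattern described: catch the exception mkdirs() can
// actually throw (SecurityException) and reset `dir` to null so a
// failed attempt is retried rather than returned.
import java.io.{File, IOException}
import java.util.UUID

def createTempDir(root: String, maxAttempts: Int = 10): File = {
  var attempts = 0
  var dir: File = null
  while (dir == null) {
    attempts += 1
    if (attempts > maxAttempts) {
      throw new IOException(s"Failed to create a temp directory under $root")
    }
    try {
      dir = new File(root, "spark-" + UUID.randomUUID.toString)
      if (dir.exists() || !dir.mkdirs()) {
        dir = null  // name collision or creation failure: retry
      }
    } catch {
      case _: SecurityException => dir = null  // reset on failure too
    }
  }
  dir
}
```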

[GitHub] spark pull request: [SPARK-4597] Use proper exception and reset va...

2014-11-25 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/3449#discussion_r20849117 --- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala --- @@ -262,7 +262,7 @@ private[spark] object Utils extends Logging

[GitHub] spark pull request: [SPARK-4358][SQL] Let BigDecimal do checking t...

2014-11-26 Thread viirya
Github user viirya commented on the pull request: https://github.com/apache/spark/pull/3208#issuecomment-64729341 Hi @marmbrus, is this ok to be merged?

[GitHub] spark pull request: [SPARK-4597] Use proper exception and reset va...

2014-11-26 Thread viirya
Github user viirya commented on the pull request: https://github.com/apache/spark/pull/3449#issuecomment-64738171 @JoshRosen @srowen Any other comments? Is this ok to be merged?

[GitHub] spark pull request: [SPARK-4674] Refactor getCallSite

2014-12-01 Thread viirya
GitHub user viirya opened a pull request: https://github.com/apache/spark/pull/3532 [SPARK-4674] Refactor getCallSite The current version of `getCallSite` visits the collection of `StackTraceElement` twice. However, it is unnecessary since we can perform our work with a single
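
The single-pass idea can be sketched as below. This is a simplified illustration of the refactoring goal, not the actual `getCallSite` implementation; the prefix check and return shape are assumptions:

```scala
// Single-pass sketch: walk the stack trace once, tracking the last
// framework-internal frame and stopping at the first user frame,
// instead of traversing the element collection twice.
def callSite(trace: Seq[StackTraceElement], internalPrefix: String): (String, String) = {
  var lastInternal = "<unknown>"
  var firstUser = "<unknown>"
  var done = false
  val it = trace.iterator
  while (!done && it.hasNext) {
    val el = it.next()
    if (el.getClassName.startsWith(internalPrefix)) {
      lastInternal = el.getClassName + "." + el.getMethodName
    } else {
      firstUser = el.getFileName + ":" + el.getLineNumber
      done = true
    }
  }
  (lastInternal, firstUser)
}
```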

[GitHub] spark pull request: [SPARK-4674] Refactor getCallSite

2014-12-01 Thread viirya
Github user viirya commented on the pull request: https://github.com/apache/spark/pull/3532#issuecomment-65106880 Style is fixed. Please test again.

[GitHub] spark pull request: Replace breezeSquaredDistance

2014-12-09 Thread viirya
GitHub user viirya opened a pull request: https://github.com/apache/spark/pull/3643 Replace breezeSquaredDistance This PR replaces the slow breezeSquaredDistance. A simple calculation involving 4 vectors of 2 dims shows:
* breezeSquaredDistance: ~12 secs
* This PR
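
A minimal dense-vector sketch of the replacement idea follows; it illustrates the hand-rolled-loop approach, not the PR's exact code, and the mixed sparse/dense cases discussed later in the thread need separate index handling:

```scala
// A tight while loop over raw value arrays avoids the generic
// per-element overhead of the breeze implementation.
def squaredDistance(v1: Array[Double], v2: Array[Double]): Double = {
  require(v1.length == v2.length, "Vector dimensions must match")
  var sum = 0.0
  var i = 0
  while (i < v1.length) {
    val d = v1(i) - v2(i)
    sum += d * d
    i += 1
  }
  sum
}
```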

[GitHub] spark pull request: [SPARK-4797] Replace breezeSquaredDistance

2014-12-09 Thread viirya
Github user viirya commented on the pull request: https://github.com/apache/spark/pull/3643#issuecomment-66411144 Thanks. I added handling for the different combinations of SparseVector and DenseVector.

[GitHub] spark pull request: [SPARK-4741] Do not destroy FileInputStream an...

2014-12-10 Thread viirya
Github user viirya commented on the pull request: https://github.com/apache/spark/pull/3600#issuecomment-66419204 Thanks. However, I cannot see why this is a breaking change. Please let me know where it causes problems, as it seems to pass the tests now. In fact, this PR does

[GitHub] spark pull request: [SPARK-4741] Do not destroy FileInputStream an...

2014-12-10 Thread viirya
Github user viirya commented on the pull request: https://github.com/apache/spark/pull/3600#issuecomment-66435647 I agree with you that the saved operation here is a cheap one. :-) However, the problem you mentioned would not happen with the current version of `DeserializationStream

[GitHub] spark pull request: [SPARK-4741] Do not destroy FileInputStream an...

2014-12-10 Thread viirya
Github user viirya commented on the pull request: https://github.com/apache/spark/pull/3600#issuecomment-66446898 I do know that `finalize` can close the wrapped stream. I did not say it would not. But it only can if you implement it that way. There is no such implicit contract

[GitHub] spark pull request: [SPARK-4741] Do not destroy FileInputStream an...

2014-12-10 Thread viirya
Github user viirya commented on the pull request: https://github.com/apache/spark/pull/3600#issuecomment-66448444 Aside from streams associated with files and network connections, not all streams must always be closed when you're done with them. That is what I know. Maybe

[GitHub] spark pull request: [SPARK-4741] Do not destroy FileInputStream an...

2014-12-10 Thread viirya
Github user viirya closed the pull request at: https://github.com/apache/spark/pull/3600

[GitHub] spark pull request: [SPARK-4741] Do not destroy FileInputStream an...

2014-12-10 Thread viirya
Github user viirya commented on the pull request: https://github.com/apache/spark/pull/3600#issuecomment-66453822 Thanks. But in the end, you still cannot provide a rational explanation for why it fails. At least, it is not convincing to me. :-) Anyway, still thanks

[GitHub] spark pull request: [SPARK-4741] Do not destroy FileInputStream an...

2014-12-10 Thread viirya
Github user viirya commented on the pull request: https://github.com/apache/spark/pull/3600#issuecomment-66455203 Anyway, thanks again for your comments and the time spent replying to this.

[GitHub] spark pull request: [SPARK-4797] Replace breezeSquaredDistance

2014-12-11 Thread viirya
Github user viirya commented on the pull request: https://github.com/apache/spark/pull/3643#issuecomment-66587933 Thanks for that. I added a new commit to make the methods private.

[GitHub] spark pull request: [SPARK-4797] Replace breezeSquaredDistance

2014-12-11 Thread viirya
Github user viirya commented on the pull request: https://github.com/apache/spark/pull/3643#issuecomment-66614419 Hi, intersect, diff, and foreach are all replaced with while loops in the new commit, following the BLAS.dot pattern. Please see if there is any problem. Thanks.
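
The "BLAS.dot pattern" mentioned here can be sketched as a two-pointer merge over sorted sparse indices; this is a hypothetical illustration of the technique, not the PR's actual code:

```scala
// Walk both sorted index arrays with while loops instead of building
// intermediate index sets via intersect/diff.
def sparseSquaredDistance(
    idx1: Array[Int], val1: Array[Double],
    idx2: Array[Int], val2: Array[Double]): Double = {
  var sum = 0.0
  var i = 0
  var j = 0
  while (i < idx1.length && j < idx2.length) {
    if (idx1(i) == idx2(j)) {
      val d = val1(i) - val2(j); sum += d * d; i += 1; j += 1
    } else if (idx1(i) < idx2(j)) {
      sum += val1(i) * val1(i); i += 1   // index present only in v1
    } else {
      sum += val2(j) * val2(j); j += 1   // index present only in v2
    }
  }
  while (i < idx1.length) { sum += val1(i) * val1(i); i += 1 }
  while (j < idx2.length) { sum += val2(j) * val2(j); j += 1 }
  sum
}
```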

[GitHub] spark pull request: [SPARK-4797] Replace breezeSquaredDistance

2014-12-16 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/3643#discussion_r21885069 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/util/MLUtils.scala --- @@ -264,6 +263,92 @@ object MLUtils { } Vectors.fromBreeze

[GitHub] spark pull request: [SPARK-4797] Replace breezeSquaredDistance

2014-12-16 Thread viirya
Github user viirya commented on the pull request: https://github.com/apache/spark/pull/3643#issuecomment-67137308 @jkbradley Thanks. The code has been modified per your comments. The test is also expanded to cover the case from the major comment you mentioned. Please check it again

[GitHub] spark pull request: [SPARK-4797] Replace breezeSquaredDistance

2014-12-16 Thread viirya
Github user viirya commented on the pull request: https://github.com/apache/spark/pull/3643#issuecomment-67282437 @jkbradley, thank you. A new commit has been added to address these comments. Please let me know if there are any problems.

[GitHub] spark pull request: [SPARK-4797] Replace breezeSquaredDistance

2014-12-17 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/3643#discussion_r22025570 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/util/MLUtils.scala --- @@ -264,6 +263,84 @@ object MLUtils { } Vectors.fromBreeze

[GitHub] spark pull request: [SPARK-4797] Replace breezeSquaredDistance

2014-12-17 Thread viirya
Github user viirya commented on the pull request: https://github.com/apache/spark/pull/3643#issuecomment-67447322 Yes. The additional tests are used to test these bugs we found in this PR. For example, `fastSquaredDistance(v2, norm2, v4, norm4, precision)` is used to test the case

[GitHub] spark pull request: [SPARK-4797] Replace breezeSquaredDistance

2014-12-17 Thread viirya
Github user viirya commented on the pull request: https://github.com/apache/spark/pull/3643#issuecomment-67450454 Calculating 2 squared distances between vectors of 2 dims:
* `DenseVector` vs. `SparseVector`
* breezeSquaredDistance: ~25 secs
* This PR

[GitHub] spark pull request: [SPARK-4797] Replace breezeSquaredDistance

2014-12-18 Thread viirya
Github user viirya commented on the pull request: https://github.com/apache/spark/pull/3643#issuecomment-67593772 The indices gradually increase from 1 to 2. In the `SparseVector` vs. `SparseVector` case (same indices length), the indices fully overlap. In the other case

[GitHub] spark pull request: [SPARK-4674] Refactor getCallSite

2014-12-19 Thread viirya
Github user viirya commented on the pull request: https://github.com/apache/spark/pull/3532#issuecomment-67613691 Thanks. I found that `getCallSite` is called in many places, so I was curious about its implementation details. Then I thought it could be more efficient. I have not really

[GitHub] spark pull request: [SPARK-4797] Replace breezeSquaredDistance

2014-12-19 Thread viirya
Github user viirya commented on the pull request: https://github.com/apache/spark/pull/3643#issuecomment-67620316 Thanks @mengxr. I am not quite familiar with breeze, but as I roughly went through breeze's distance metric implementations, they follow the same pattern that employs

[GitHub] spark pull request: [Minor] Fix scala doc

2014-12-19 Thread viirya
GitHub user viirya opened a pull request: https://github.com/apache/spark/pull/3751 [Minor] Fix scala doc Minor fix for an obvious scala doc error. You can merge this pull request into a Git repository by running: $ git pull https://github.com/viirya/spark-1 fix_scaladoc

[GitHub] spark pull request: [SPARK-4913] Fix incorrect event log path

2014-12-21 Thread viirya
GitHub user viirya opened a pull request: https://github.com/apache/spark/pull/3755 [SPARK-4913] Fix incorrect event log path SPARK-2261 uses a single file to log events for an app. `eventLogDir` in `ApplicationDescription` is replaced with `eventLogFile`. However

[GitHub] spark pull request: [SPARK-4913] Fix incorrect event log path

2014-12-21 Thread viirya
Github user viirya commented on the pull request: https://github.com/apache/spark/pull/3755#issuecomment-67795433 Looks like the test failed for reasons unrelated to this PR. Please test again.

[GitHub] spark pull request: [SPARK-3083] fix unnecessarily removing sendin...

2014-12-21 Thread viirya
Github user viirya closed the pull request at: https://github.com/apache/spark/pull/1985

[GitHub] spark pull request: [SPARK-4913] Fix incorrect event log path

2014-12-22 Thread viirya
Github user viirya commented on the pull request: https://github.com/apache/spark/pull/3755#issuecomment-67927237 Thanks @andrewor14 @vanzin. I made the corresponding revisions.

[GitHub] spark pull request: [SPARK-4797] Replace breezeSquaredDistance

2014-12-23 Thread viirya
Github user viirya commented on the pull request: https://github.com/apache/spark/pull/3643#issuecomment-67938403 I have not run the pyspark tests. Fixed in the update.

[GitHub] spark pull request: [SPARK-4797] Replace breezeSquaredDistance

2014-12-23 Thread viirya
Github user viirya commented on the pull request: https://github.com/apache/spark/pull/3643#issuecomment-67938482 Please test it again.

[GitHub] spark pull request: [SPARK-4913] Fix incorrect event log path

2014-12-24 Thread viirya
Github user viirya commented on the pull request: https://github.com/apache/spark/pull/3755#issuecomment-68040630 Thanks for your suggestion too.

[GitHub] spark pull request: [SPARK-4797] Replace breezeSquaredDistance

2014-12-28 Thread viirya
Github user viirya commented on the pull request: https://github.com/apache/spark/pull/3643#issuecomment-68237654 @jkbradley Do you have any remaining concerns? Is this ready to merge? Thanks.

[GitHub] spark pull request: [SPARK-4382] Add locations parameter to Twitte...

2014-12-30 Thread viirya
Github user viirya commented on the pull request: https://github.com/apache/spark/pull/3246#issuecomment-68342204 Would anyone like to review this PR?

[GitHub] spark pull request: [SPARK-4382] Add locations parameter to Twitte...

2014-12-30 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/3246#discussion_r22343534 --- Diff: external/twitter/src/main/scala/org/apache/spark/streaming/twitter/TwitterInputDStream.scala --- @@ -60,6 +61,7 @@ private[streaming] class

[GitHub] spark pull request: [SPARK-4382] Add locations parameter to Twitte...

2014-12-30 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/3246#discussion_r22344303 --- Diff: external/twitter/src/main/scala/org/apache/spark/streaming/twitter/TwitterInputDStream.scala --- @@ -60,6 +61,7 @@ private[streaming] class

[GitHub] spark pull request: [SPARK-4797] Replace breezeSquaredDistance

2014-12-31 Thread viirya
Github user viirya commented on the pull request: https://github.com/apache/spark/pull/3643#issuecomment-68434376 @mengxr The implementation is renamed and moved to `linalg.Vectors`. Would you like to test it again?

[GitHub] spark pull request: [SPARK-5050][Mllib] Add unit test for sqdist

2015-01-03 Thread viirya
Github user viirya commented on the pull request: https://github.com/apache/spark/pull/3869#issuecomment-68590587 @jkbradley Thanks. I made a more proper unit test with a random sparsity pattern.
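The "random sparsity pattern" testing idea mentioned above can be illustrated with a minimal sketch. This is not the actual MLlib Scala test (which exercises `Vectors.sqdist`); it is a hypothetical Python analogue, where `sqdist_sparse` and `random_sparse` are illustrative names: pairs of sparse vectors with randomly chosen densities are generated, and a sparse-aware squared-distance computation is checked against a dense baseline.

```python
import random

def sqdist_sparse(idx_a, val_a, idx_b, val_b):
    """Squared Euclidean distance between two sparse vectors given as
    sorted (indices, values) lists, touching only the nonzero entries."""
    i = j = 0
    dist = 0.0
    while i < len(idx_a) and j < len(idx_b):
        if idx_a[i] == idx_b[j]:
            dist += (val_a[i] - val_b[j]) ** 2
            i += 1
            j += 1
        elif idx_a[i] < idx_b[j]:
            dist += val_a[i] ** 2  # entry only in vector a
            i += 1
        else:
            dist += val_b[j] ** 2  # entry only in vector b
            j += 1
    # leftover tail of whichever vector is longer
    dist += sum(v * v for v in val_a[i:])
    dist += sum(v * v for v in val_b[j:])
    return dist

def random_sparse(n, density, rng):
    """A length-n sparse vector with roughly `density` fraction of nonzeros."""
    idx = sorted(rng.sample(range(n), max(1, int(n * density))))
    return idx, [rng.uniform(-1.0, 1.0) for _ in idx]

rng = random.Random(42)
n = 50
for _ in range(100):
    # random sparsity pattern: density itself is drawn at random each trial
    ia, va = random_sparse(n, rng.uniform(0.05, 0.9), rng)
    ib, vb = random_sparse(n, rng.uniform(0.05, 0.9), rng)
    # dense baseline for comparison
    da, db = [0.0] * n, [0.0] * n
    for k, v in zip(ia, va):
        da[k] = v
    for k, v in zip(ib, vb):
        db[k] = v
    expected = sum((x - y) ** 2 for x, y in zip(da, db))
    assert abs(sqdist_sparse(ia, va, ib, vb) - expected) < 1e-9
print("ok")
```

Varying the density per trial is the point of the test: it covers the sparse/sparse branch with many different overlap patterns between the two index sets, which a fixed sparsity pattern would not.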

[GitHub] spark pull request: [SPARK-5050][Mllib] Add unit test for sqdist

2015-01-03 Thread viirya
Github user viirya commented on the pull request: https://github.com/apache/spark/pull/3869#issuecomment-68592833 Please test again.

[GitHub] spark pull request: [Minor] Fix incorrect warning log

2015-02-04 Thread viirya
GitHub user viirya opened a pull request: https://github.com/apache/spark/pull/4360 [Minor] Fix incorrect warning log The warning log looks incorrect. Just fix it. You can merge this pull request into a Git repository by running: $ git pull https://github.com/viirya/spark-1

[GitHub] spark pull request: [SPARK-4382] Add locations parameter to Twitte...

2015-02-04 Thread viirya
Github user viirya commented on the pull request: https://github.com/apache/spark/pull/3246#issuecomment-73000140 @tdas Can you have a quick look at this PR? Thanks!

[GitHub] spark pull request: [SPARK-5512][Mllib] Run the PIC algorithm with...

2015-02-02 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/4301#discussion_r23976526 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/PowerIterationClustering.scala --- @@ -149,6 +165,18 @@ private[clustering] object

[GitHub] spark pull request: [SPARK-5512][Mllib] Run the PIC algorithm with...

2015-02-02 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/4301#discussion_r23976829 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/PowerIterationClustering.scala --- @@ -149,6 +165,18 @@ private[clustering] object

[GitHub] spark pull request: [SPARK-5212][SQL] Add support of schema-less, ...

2015-02-02 Thread viirya
Github user viirya commented on the pull request: https://github.com/apache/spark/pull/4014#issuecomment-72465789 @marmbrus I did some refactoring based on the comments. It should be better now.

[GitHub] spark pull request: [SPARK-5212][SQL] Add support of schema-less, ...

2015-02-02 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/4014#discussion_r23932681 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/ScriptTransformation.scala --- @@ -25,9 +25,18 @@ import
