[GitHub] spark issue #22790: [SPARK-25793][ML]call SaveLoadV2_0.load for classNameV2_...

2018-10-23 Thread mengxr
Github user mengxr commented on the issue: https://github.com/apache/spark/pull/22790 This shouldn't block the 2.4.0 release. Based on the code, it doesn't introduce a regression to existing features (it just uses the V1 format and ignores trainingCost and distanceMeasure). Correctness issue

[GitHub] spark issue #22756: [SPARK-25758][ML] Deprecate computeCost on BisectingKMea...

2018-10-19 Thread mengxr
Github user mengxr commented on the issue: https://github.com/apache/spark/pull/22756 We have to revert this PR in branch-2.4. It is not a blocker and we shouldn't merge it to branch-2.4 this late in this already delayed release

[GitHub] spark pull request #22675: [SPARK-25347][ML][DOC] Spark datasource for image...

2018-10-08 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/22675#discussion_r223566032 --- Diff: docs/ml-datasource.md --- @@ -0,0 +1,51 @@ +--- +layout: global +title: Data sources +displayTitle: Data sources

[GitHub] spark issue #22492: [SPARK-25321][ML] Revert SPARK-14681 to avoid API breaki...

2018-10-02 Thread mengxr
Github user mengxr commented on the issue: https://github.com/apache/spark/pull/22492 @cloud-fan said above the next version is very likely to be 2.5.0 instead of 3.0. Well the next version number is not fully discussed yet. For that reason, I think we should revert the changes

[GitHub] spark issue #22492: [SPARK-25321][ML] Revert SPARK-14681 to avoid API breaki...

2018-10-02 Thread mengxr
Github user mengxr commented on the issue: https://github.com/apache/spark/pull/22492 @WeichenXu123 @cloud-fan I made https://github.com/apache/spark/pull/22618 to revert the change in master. --- - To unsubscribe

[GitHub] spark pull request #22618: [SPARK-25321][ML] Revert SPARK-14681 to avoid API...

2018-10-02 Thread mengxr
GitHub user mengxr opened a pull request: https://github.com/apache/spark/pull/22618 [SPARK-25321][ML] Revert SPARK-14681 to avoid API breaking change ## What changes were proposed in this pull request? This is the same as #22492 but for master branch. Revert SPARK-14681

[GitHub] spark issue #22492: [SPARK-25321][ML] Revert SPARK-14681 to avoid API breaki...

2018-09-21 Thread mengxr
Github user mengxr commented on the issue: https://github.com/apache/spark/pull/22492 @WeichenXu123 Please close this PR manually. Thanks! --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

[GitHub] spark issue #22510: [SPARK-25321][ML] Fix local LDA model constructor

2018-09-21 Thread mengxr
Github user mengxr commented on the issue: https://github.com/apache/spark/pull/22510 LGTM. Merged into master and branch 2.4. Thanks for checking compatibility with MLeap.

[GitHub] spark issue #22492: [SPARK-25321][ML] Revert SPARK-14681 to avoid API breaki...

2018-09-21 Thread mengxr
Github user mengxr commented on the issue: https://github.com/apache/spark/pull/22492 LGTM. Merged into branch-2.4. @WeichenXu123 Next time please create dedicated JIRAs for each QA task PR. Thanks

[GitHub] spark issue #22492: [SPARK-25321][ML] Revert SPARK-14681 to avoid API breaki...

2018-09-21 Thread mengxr
Github user mengxr commented on the issue: https://github.com/apache/spark/pull/22492 We can keep it in master if the next release is 3.0.

[GitHub] spark issue #22449: [SPARK-22666][ML][FOLLOW-UP] Improve testcase to tolerat...

2018-09-19 Thread mengxr
Github user mengxr commented on the issue: https://github.com/apache/spark/pull/22449 LGTM. Merged into master and branch-2.4. Thanks!

[GitHub] spark issue #22449: [SPARK-22666][ML][FOLLOW-UP] Return a correctly formatte...

2018-09-18 Thread mengxr
Github user mengxr commented on the issue: https://github.com/apache/spark/pull/22449 @WeichenXu123 I think we should fix the test instead of removing "//" from URI if authority is empty. Because both "scheme:/" and "scheme:///" are valid. ~~~sca
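The point about "scheme:/" and "scheme:///" can be checked with a small sketch. It is written in Python with the standard `urllib.parse` module rather than the JVM URI classes, and the `file:` paths and the `same_location` helper are hypothetical. It suggests that a test could compare parsed components instead of raw URI strings.

```python
from urllib.parse import urlparse

# Hypothetical paths: the same file written with and without the empty "//" authority.
short_raw = "file:/tmp/images/cat.png"
long_raw = "file:///tmp/images/cat.png"


def same_location(a, b):
    """Compare two URI strings by parsed components instead of raw text."""
    pa, pb = urlparse(a), urlparse(b)
    return (pa.scheme, pa.netloc, pa.path) == (pb.scheme, pb.netloc, pb.path)


# The raw strings differ, yet both forms parse to the same scheme,
# an empty authority, and the same path.
print(short_raw == long_raw)
print(same_location(short_raw, long_raw))
```

A string comparison rejects one of the two valid forms, while the component-wise comparison accepts both.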

[GitHub] spark pull request #22449: [SPARK-22666][ML][FOLLOW-UP] Return a correctly f...

2018-09-18 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/22449#discussion_r218498363 --- Diff: mllib/src/main/scala/org/apache/spark/ml/source/image/ImageFileFormat.scala --- @@ -85,7 +85,9 @@ private[image] class ImageFileFormat extends

[GitHub] spark issue #22349: [SPARK-25345][ML] Deprecate public APIs from ImageSchema

2018-09-08 Thread mengxr
Github user mengxr commented on the issue: https://github.com/apache/spark/pull/22349 LGTM. Merged into master and branch-2.4. Thanks!

[GitHub] spark issue #22349: [SPARK-25345][ML] Deprecate public APIs from ImageSchema

2018-09-08 Thread mengxr
Github user mengxr commented on the issue: https://github.com/apache/spark/pull/22349 @WeichenXu123 Could you address the comments?

[GitHub] spark pull request #22349: [SPARK-25345][ML] Deprecate public APIs from Imag...

2018-09-06 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/22349#discussion_r215840879 --- Diff: mllib/src/main/scala/org/apache/spark/ml/image/ImageSchema.scala --- @@ -35,6 +35,8 @@ import org.apache.spark.sql.types

[GitHub] spark issue #22328: [SPARK-22666][ML][SQL] Spark datasource for image format

2018-09-05 Thread mengxr
Github user mengxr commented on the issue: https://github.com/apache/spark/pull/22328 Merged into master. Thanks @WeichenXu123 for the implementation and everyone for the review! I created the following JIRAs as follow-ups: * deprecate ImageSchema: https://issues.apache.org

[GitHub] spark issue #22328: [SPARK-22666][ML][SQL] Spark datasource for image format

2018-09-05 Thread mengxr
Github user mengxr commented on the issue: https://github.com/apache/spark/pull/22328 The image data source tests passed but JVM crashed at the end. Triggered another test.

[GitHub] spark issue #22328: [SPARK-22666][ML][SQL] Spark datasource for image format

2018-09-05 Thread mengxr
Github user mengxr commented on the issue: https://github.com/apache/spark/pull/22328 test this please

[GitHub] spark issue #22328: [SPARK-22666][ML][SQL] Spark datasource for image format

2018-09-05 Thread mengxr
Github user mengxr commented on the issue: https://github.com/apache/spark/pull/22328 LGTM pending tests.

[GitHub] spark issue #22165: [SPARK-25017][Core] Add test suite for BarrierCoordinato...

2018-09-05 Thread mengxr
Github user mengxr commented on the issue: https://github.com/apache/spark/pull/22165 I didn't make a full pass over the tests. @jiangxb1987 let me know if you need me to take a pass.

[GitHub] spark pull request #22165: [SPARK-25017][Core] Add test suite for BarrierCoo...

2018-09-05 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/22165#discussion_r215326727 --- Diff: core/src/test/scala/org/apache/spark/scheduler/BarrierCoordinatorSuite.scala --- @@ -0,0 +1,153 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #22165: [SPARK-25017][Core] Add test suite for BarrierCoo...

2018-09-05 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/22165#discussion_r215326394 --- Diff: core/src/test/scala/org/apache/spark/scheduler/BarrierCoordinatorSuite.scala --- @@ -0,0 +1,153 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #22165: [SPARK-25017][Core] Add test suite for BarrierCoo...

2018-09-05 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/22165#discussion_r215324595 --- Diff: core/src/main/scala/org/apache/spark/BarrierCoordinator.scala --- @@ -65,7 +65,7 @@ private[spark] class BarrierCoordinator( // Record

[GitHub] spark pull request #22328: [SPARK-22666][ML][SQL] Spark datasource for image...

2018-09-05 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/22328#discussion_r215322021 --- Diff: mllib/src/main/scala/org/apache/spark/ml/source/image/ImageDataSource.scala --- @@ -0,0 +1,54 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #22328: [SPARK-22666][ML][SQL] Spark datasource for image...

2018-09-05 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/22328#discussion_r215322673 --- Diff: mllib/src/main/scala/org/apache/spark/ml/source/image/ImageOptions.scala --- @@ -0,0 +1,28 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #22328: [SPARK-22666][ML][SQL] Spark datasource for image...

2018-09-05 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/22328#discussion_r215320762 --- Diff: mllib/src/main/scala/org/apache/spark/ml/source/image/ImageDataSource.scala --- @@ -0,0 +1,54 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #22328: [SPARK-22666][ML][SQL] Spark datasource for image...

2018-09-05 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/22328#discussion_r215321353 --- Diff: mllib/src/main/scala/org/apache/spark/ml/source/image/ImageDataSource.scala --- @@ -0,0 +1,54 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #22328: [SPARK-22666][ML][SQL] Spark datasource for image...

2018-09-05 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/22328#discussion_r215320923 --- Diff: mllib/src/main/scala/org/apache/spark/ml/source/image/ImageDataSource.scala --- @@ -0,0 +1,54 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #22328: [SPARK-22666][ML][SQL] Spark datasource for image...

2018-09-05 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/22328#discussion_r215323149 --- Diff: mllib/src/test/scala/org/apache/spark/ml/source/image/ImageFileFormatSuite.scala --- @@ -0,0 +1,119 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request #22328: [SPARK-22666][ML][SQL] Spark datasource for image...

2018-09-05 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/22328#discussion_r215179601 --- Diff: mllib/src/main/scala/org/apache/spark/ml/source/image/ImageDataSource.scala --- @@ -0,0 +1,53 @@ +/* + * Licensed to the Apache Software

[GitHub] spark issue #22328: [SPARK-22666][ML][SQL] Spark datasource for image format

2018-09-04 Thread mengxr
Github user mengxr commented on the issue: https://github.com/apache/spark/pull/22328 That doesn't work for Java, if I remember the issue correctly. On Tue, Sep 4, 2018, 10:31 PM Wenchen Fan wrote: > *@cloud-fan* commented on this pull requ

[GitHub] spark issue #22328: [SPARK-22666][ML][SQL] Spark datasource for image format

2018-09-04 Thread mengxr
Github user mengxr commented on the issue: https://github.com/apache/spark/pull/22328 @mhamilton723 I thought about that option too. Loading general binary files is a useful feature but I don't feel it is necessary to pull it into the current scope. No matter whether the image data

[GitHub] spark issue #22240: [SPARK-25248] [CORE] Audit barrier Scala APIs for 2.4

2018-09-04 Thread mengxr
Github user mengxr commented on the issue: https://github.com/apache/spark/pull/22240 Merged into master. Thanks for review!

[GitHub] spark pull request #22328: [SPARK-22666][ML][SQL] Spark datasource for image...

2018-09-04 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/22328#discussion_r214981991 --- Diff: mllib/src/main/scala/org/apache/spark/ml/source/image/ImageFileFormat.scala --- @@ -0,0 +1,109 @@ +/* + * Licensed to the Apache Software

[GitHub] spark issue #22328: [SPARK-22666][ML][SQL] Spark datasource for image format

2018-09-04 Thread mengxr
Github user mengxr commented on the issue: https://github.com/apache/spark/pull/22328 Yes, the ImageSchema implementation is used by the data source, which we cannot remove :) We are only going to mark the public APIs there as deprecated. The goal is to provide users a unified

[GitHub] spark issue #22328: [SPARK-22666][ML][SQL] Spark datasource for image format

2018-09-04 Thread mengxr
Github user mengxr commented on the issue: https://github.com/apache/spark/pull/22328 @imatiach-msft @HyukjinKwon The plan is to mark `ImageSchema` deprecated in 2.4 and remove it in 3.0. So loading images will be the same as loading data from other sources. The gaps

[GitHub] spark pull request #22328: [SPARK-22666][ML][SQL] Spark datasource for image...

2018-09-04 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/22328#discussion_r214969542 --- Diff: mllib/src/test/scala/org/apache/spark/ml/image/ImageSchemaSuite.scala --- @@ -28,7 +28,7 @@ import org.apache.spark.sql.types._ class

[GitHub] spark pull request #22328: [SPARK-22666][ML][SQL] Spark datasource for image...

2018-09-04 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/22328#discussion_r214967994 --- Diff: mllib/src/main/scala/org/apache/spark/ml/source/image/ImageDataSource.scala --- @@ -0,0 +1,51 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #22328: [SPARK-22666][ML][SQL] Spark datasource for image...

2018-09-04 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/22328#discussion_r214969782 --- Diff: mllib/src/test/scala/org/apache/spark/ml/source/image/ImageFileFormatSuite.scala --- @@ -0,0 +1,119 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request #22328: [SPARK-22666][ML][SQL] Spark datasource for image...

2018-09-04 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/22328#discussion_r214967452 --- Diff: mllib/src/main/scala/org/apache/spark/ml/source/image/ImageDataSource.scala --- @@ -0,0 +1,51 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #22328: [SPARK-22666][ML][SQL] Spark datasource for image...

2018-09-04 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/22328#discussion_r214968664 --- Diff: mllib/src/main/scala/org/apache/spark/ml/source/image/ImageFileFormat.scala --- @@ -0,0 +1,109 @@ +/* + * Licensed to the Apache Software

[GitHub] spark issue #22271: [SPARK-25268][GraphX]run Parallel Personalized PageRank ...

2018-08-31 Thread mengxr
Github user mengxr commented on the issue: https://github.com/apache/spark/pull/22271 test this please

[GitHub] spark issue #22261: [SPARK-25248.1][PYSPARK] update barrier Python API

2018-08-29 Thread mengxr
Github user mengxr commented on the issue: https://github.com/apache/spark/pull/22261 Merged into master.

[GitHub] spark issue #22261: [SPARK-25248.1][PYSPARK] update barrier Python API

2018-08-29 Thread mengxr
Github user mengxr commented on the issue: https://github.com/apache/spark/pull/22261 There are two PRs from that JIRA, one for Scala APIs and one for Python APIs

[GitHub] spark issue #22258: [SPARK-25266][CORE] Fix memory leak in Barrier Execution...

2018-08-29 Thread mengxr
Github user mengxr commented on the issue: https://github.com/apache/spark/pull/22258 Merged into master. Thanks!

[GitHub] spark issue #22247: [SPARK-25253][PYSPARK] Refactor local connection & auth ...

2018-08-28 Thread mengxr
Github user mengxr commented on the issue: https://github.com/apache/spark/pull/22247 @squito Thanks for the refactor!

[GitHub] spark pull request #22240: [SPARK-25248] [CORE] Audit barrier Scala APIs for...

2018-08-28 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/22240#discussion_r213537490 --- Diff: core/src/main/scala/org/apache/spark/BarrierTaskContext.scala --- @@ -68,7 +74,7 @@ class BarrierTaskContext( * * CAUTION

[GitHub] spark pull request #22261: [SPARK-25248.1][PYSPARK] update barrier Python AP...

2018-08-28 Thread mengxr
GitHub user mengxr opened a pull request: https://github.com/apache/spark/pull/22261 [SPARK-25248.1][PYSPARK] update barrier Python API ## What changes were proposed in this pull request? I made one pass over the Python APIs for barrier mode and updated them to match

[GitHub] spark issue #22258: [SPARK-25266][CORE] Fix memory leak in Barrier Execution...

2018-08-28 Thread mengxr
Github user mengxr commented on the issue: https://github.com/apache/spark/pull/22258 LGTM pending test

[GitHub] spark issue #22240: [SPARK-25248] [CORE] Audit barrier Scala APIs for 2.4

2018-08-28 Thread mengxr
Github user mengxr commented on the issue: https://github.com/apache/spark/pull/22240 test this please

[GitHub] spark pull request #22240: [WIP] [SPARK-25248] [CORE] Audit barrier APIs for...

2018-08-26 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/22240#discussion_r212863571 --- Diff: core/src/main/scala/org/apache/spark/rdd/RDDBarrier.scala --- @@ -22,15 +22,22 @@ import scala.reflect.ClassTag import

[GitHub] spark pull request #22240: [WIP] [SPARK-25248] [CORE] Audit barrier APIs for...

2018-08-26 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/22240#discussion_r212863543 --- Diff: core/src/main/scala/org/apache/spark/BarrierTaskInfo.scala --- @@ -28,4 +28,4 @@ import org.apache.spark.annotation.{Experimental, Since

[GitHub] spark pull request #22240: [WIP] [SPARK-25248] [CORE] Audit barrier APIs for...

2018-08-26 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/22240#discussion_r212863507 --- Diff: core/src/main/scala/org/apache/spark/BarrierTaskContext.scala --- @@ -21,25 +21,31 @@ import java.util.{Properties, Timer, TimerTask

[GitHub] spark pull request #22240: [WIP] [SPARK-25248] [CORE] Audit barrier APIs for...

2018-08-26 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/22240#discussion_r212863444 --- Diff: core/src/main/scala/org/apache/spark/BarrierTaskContext.scala --- @@ -68,7 +74,7 @@ class BarrierTaskContext( * * CAUTION

[GitHub] spark pull request #22240: [WIP] [SPARK-25248] [CORE] Audit barrier APIs for...

2018-08-26 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/22240#discussion_r212863381 --- Diff: core/src/main/scala/org/apache/spark/BarrierTaskContext.scala --- @@ -21,25 +21,31 @@ import java.util.{Properties, Timer, TimerTask

[GitHub] spark pull request #22240: [WIP] [SPARK-25248] [CORE] Audit barrier APIs for...

2018-08-26 Thread mengxr
GitHub user mengxr opened a pull request: https://github.com/apache/spark/pull/22240 [WIP] [SPARK-25248] [CORE] Audit barrier APIs for 2.4 ## What changes were proposed in this pull request? I made one pass over barrier APIs added to Spark 2.4 and updated some scopes

[GitHub] spark issue #22225: [SPARK-25234][SPARKR] avoid integer overflow in parallel...

2018-08-24 Thread mengxr
Github user mengxr commented on the issue: https://github.com/apache/spark/pull/22225 Merged into master and branch-2.3.

[GitHub] spark issue #22225: [SPARK-25234][SPARKR] avoid integer overflow in parallel...

2018-08-24 Thread mengxr
Github user mengxr commented on the issue: https://github.com/apache/spark/pull/22225 cc @falaki

[GitHub] spark pull request #22225: [SPARK-25234][SPARKR] avoid integer overflow in p...

2018-08-24 Thread mengxr
GitHub user mengxr opened a pull request: https://github.com/apache/spark/pull/22225 [SPARK-25234][SPARKR] avoid integer overflow in parallelize ## What changes were proposed in this pull request? `parallelize` uses integer multiplication to determine the split indices
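To make the overflow concrete, here is a minimal sketch in Python, not the SparkR code itself. Python integers have arbitrary precision, so the 32-bit limit is checked explicitly. The function names and the index formula `i * length // num_slices` are assumptions for illustration only.

```python
INT32_MAX = 2**31 - 1  # largest value of a 32-bit signed integer


def naive_split_starts(length, num_slices):
    """Start index of each slice via multiply-then-divide (assumed formula).

    Also returns the largest intermediate product, which is the value that
    would overflow in a language with 32-bit integers.
    """
    products = [i * length for i in range(num_slices)]
    starts = [p // num_slices for p in products]
    return starts, max(products)


def safe_split_starts(length, num_slices):
    """Same start indices, computed by dividing before multiplying."""
    base, extra = divmod(length, num_slices)
    # i * base + (i * extra) // num_slices == floor(i * length / num_slices),
    # and the intermediates stay below length and num_slices ** 2.
    return [i * base + (i * extra) // num_slices for i in range(num_slices)]


length, num_slices = 50_000_000, 100
starts, biggest = naive_split_starts(length, num_slices)
print(biggest > INT32_MAX)                               # product exceeds 32 bits
print(starts == safe_split_starts(length, num_slices))   # indices still agree
```

Dividing first is only one way to avoid the large product; switching to a wider numeric type for the multiplication is another.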

[GitHub] spark pull request #22171: [SPARK-25177][SQL] When dataframe decimal type co...

2018-08-22 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/22171#discussion_r211867603 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/types/Decimal.scala --- @@ -197,7 +197,7 @@ final class Decimal extends Ordered[Decimal

[GitHub] spark issue #22085: [SPARK-25095][PySpark] Python support for BarrierTaskCon...

2018-08-21 Thread mengxr
Github user mengxr commented on the issue: https://github.com/apache/spark/pull/22085 LGTM. I'm merging this into master. We might need a minor refactor for readability. But it shouldn't block developers testing this new feature. Thanks

[GitHub] spark issue #22158: [SPARK-25161][Core] Fix several bugs in failure handling...

2018-08-21 Thread mengxr
Github user mengxr commented on the issue: https://github.com/apache/spark/pull/22158 Merged into master. Thanks!

[GitHub] spark issue #22112: [SPARK-23243][Core] Fix RDD.repartition() data correctne...

2018-08-21 Thread mengxr
Github user mengxr commented on the issue: https://github.com/apache/spark/pull/22112 Then it doesn't meet the requirements for those operations used by MLlib: * sampling * zipWithIndex, zipWithUniqueId * we also use zip, assuming the ordering from the source RDD
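Why these operations depend on ordering can be seen in a plain-Python sketch that stands in for a single RDD partition. No Spark is involved, and `zip_with_index` and the sample data are hypothetical stand-ins. If a recomputed partition yields the same elements in a different order, the assigned indices no longer match the first run.

```python
def zip_with_index(partition):
    """Pair each element with its position, as an index-assigning step would."""
    return [(value, index) for index, value in enumerate(partition)]


first_run = ["a", "b", "c"]
# Same elements in a different order, as might happen when a
# nondeterministic upstream step is recomputed after a failure.
rerun = ["b", "a", "c"]

first_ids = dict(zip_with_index(first_run))
rerun_ids = dict(zip_with_index(rerun))

# Elements whose assigned index changed between the two runs.
changed = sorted(v for v in first_ids if first_ids[v] != rerun_ids[v])
print(changed)
```

The same reasoning applies to zipping two collections derived from one source: the pairing is only meaningful if both sides see the source in the same order.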

[GitHub] spark issue #22158: [SPARK-25161][Core] Fix several bugs in failure handling...

2018-08-21 Thread mengxr
Github user mengxr commented on the issue: https://github.com/apache/spark/pull/22158 LGTM pending Jenkins. Thanks for finding those corner cases!

[GitHub] spark issue #22112: [SPARK-23243][Core] Fix RDD.repartition() data correctne...

2018-08-20 Thread mengxr
Github user mengxr commented on the issue: https://github.com/apache/spark/pull/22112 If "always return the same result with same order when rerun." is the definition of "idempotent", then yes, MLlib RDD closures always return the same result if the input do

[GitHub] spark pull request #22085: [SPARK-25095][PySpark] Python support for Barrier...

2018-08-20 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/22085#discussion_r211360719 --- Diff: python/pyspark/taskcontext.py --- @@ -95,3 +99,126 @@ def getLocalProperty(self, key): Get a local property set upstream in the driver

[GitHub] spark pull request #22085: [SPARK-25095][PySpark] Python support for Barrier...

2018-08-20 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/22085#discussion_r211356245 --- Diff: python/pyspark/taskcontext.py --- @@ -95,3 +99,126 @@ def getLocalProperty(self, key): Get a local property set upstream in the driver

[GitHub] spark pull request #22085: [SPARK-25095][PySpark] Python support for Barrier...

2018-08-20 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/22085#discussion_r211359959 --- Diff: core/src/main/scala/org/apache/spark/api/python/PythonRunner.scala --- @@ -381,6 +465,20 @@ private[spark] abstract class BasePythonRunner

[GitHub] spark pull request #22085: [SPARK-25095][PySpark] Python support for Barrier...

2018-08-20 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/22085#discussion_r211358615 --- Diff: core/src/main/scala/org/apache/spark/api/python/PythonRunner.scala --- @@ -180,7 +190,61 @@ private[spark] abstract class BasePythonRunner

[GitHub] spark pull request #22085: [SPARK-25095][PySpark] Python support for Barrier...

2018-08-20 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/22085#discussion_r211356840 --- Diff: core/src/main/scala/org/apache/spark/api/python/PythonRunner.scala --- @@ -76,6 +77,15 @@ private[spark] abstract class BasePythonRunner[IN, OUT

[GitHub] spark pull request #22085: [SPARK-25095][PySpark] Python support for Barrier...

2018-08-20 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/22085#discussion_r211355022 --- Diff: core/src/main/scala/org/apache/spark/api/python/PythonRunner.scala --- @@ -20,15 +20,16 @@ package org.apache.spark.api.python import java.io

[GitHub] spark pull request #22085: [SPARK-25095][PySpark] Python support for Barrier...

2018-08-20 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/22085#discussion_r211359028 --- Diff: core/src/main/scala/org/apache/spark/api/python/PythonRunner.scala --- @@ -180,7 +190,61 @@ private[spark] abstract class BasePythonRunner

[GitHub] spark pull request #22085: [SPARK-25095][PySpark] Python support for Barrier...

2018-08-20 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/22085#discussion_r211358743 --- Diff: core/src/main/scala/org/apache/spark/api/python/PythonRunner.scala --- @@ -180,7 +190,61 @@ private[spark] abstract class BasePythonRunner

[GitHub] spark pull request #22085: [SPARK-25095][PySpark] Python support for Barrier...

2018-08-20 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/22085#discussion_r211357983 --- Diff: core/src/main/scala/org/apache/spark/api/python/PythonRunner.scala --- @@ -180,7 +190,61 @@ private[spark] abstract class BasePythonRunner

[GitHub] spark issue #22137: [MINOR][DOC][SQL] use one line for annotation arg value

2018-08-17 Thread mengxr
Github user mengxr commented on the issue: https://github.com/apache/spark/pull/22137 cc: @gatorsmile

[GitHub] spark pull request #22137: [MINOR][DOC][SQL] use one line for annotation arg...

2018-08-17 Thread mengxr
GitHub user mengxr opened a pull request: https://github.com/apache/spark/pull/22137 [MINOR][DOC][SQL] use one line for annotation arg value ## What changes were proposed in this pull request? Put annotation args in one line, or API doc generation will fail

[GitHub] spark issue #22085: [SPARK-25095][PySpark] Python support for BarrierTaskCon...

2018-08-15 Thread mengxr
Github user mengxr commented on the issue: https://github.com/apache/spark/pull/22085 @HyukjinKwon Thanks for the feedback! We will replace the py4j route with a special implementation that can only trigger "context.barrier()"

[GitHub] spark issue #22001: [SPARK-24819][CORE] Fail fast when no enough slots to la...

2018-08-15 Thread mengxr
Github user mengxr commented on the issue: https://github.com/apache/spark/pull/22001 LGTM. Merged into master. Thanks!

[GitHub] spark issue #22001: [SPARK-24819][CORE] Fail fast when no enough slots to la...

2018-08-14 Thread mengxr
Github user mengxr commented on the issue: https://github.com/apache/spark/pull/22001 @kiszk Thanks for the note! I reverted the change in DAGSchedulerSuite. Let's try Jenkins again.

[GitHub] spark pull request #22085: [SPARK-25095][PySpark] Python support for Barrier...

2018-08-14 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/22085#discussion_r209846054 --- Diff: python/pyspark/taskcontext.py --- @@ -95,3 +95,92 @@ def getLocalProperty(self, key): Get a local property set upstream in the driver

[GitHub] spark pull request #22085: [SPARK-25095][PySpark] Python support for Barrier...

2018-08-14 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/22085#discussion_r209846015 --- Diff: python/pyspark/taskcontext.py --- @@ -95,3 +95,92 @@ def getLocalProperty(self, key): Get a local property set upstream in the driver

[GitHub] spark pull request #22085: [SPARK-25095][PySpark] Python support for Barrier...

2018-08-13 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/22085#discussion_r209830941 --- Diff: core/src/main/scala/org/apache/spark/api/python/PythonRunner.scala --- @@ -180,7 +183,42 @@ private[spark] abstract class BasePythonRunner

[GitHub] spark issue #22001: [SPARK-24819][CORE] Fail fast when no enough slots to la...

2018-08-13 Thread mengxr
Github user mengxr commented on the issue: https://github.com/apache/spark/pull/22001 test this please

[GitHub] spark issue #22001: [SPARK-24819][CORE] Fail fast when no enough slots to la...

2018-08-13 Thread mengxr
Github user mengxr commented on the issue: https://github.com/apache/spark/pull/22001 @shaneknapp Maybe we could scan the test history and move some super stable tests to nightly. Apparently, it is not a solution for now. I'm giving another try

[GitHub] spark issue #22001: [SPARK-24819][CORE] Fail fast when no enough slots to la...

2018-08-13 Thread mengxr
Github user mengxr commented on the issue: https://github.com/apache/spark/pull/22001 test this please

[GitHub] spark issue #22001: [SPARK-24819][CORE] Fail fast when no enough slots to la...

2018-08-13 Thread mengxr
Github user mengxr commented on the issue: https://github.com/apache/spark/pull/22001 @shaneknapp Is the timeout due to concurrent workload on Jenkins workers? If so, shall we reduce the concurrency (more wait in the queue but more robust test result

[GitHub] spark issue #22001: [SPARK-24819][CORE] Fail fast when no enough slots to la...

2018-08-13 Thread mengxr
Github user mengxr commented on the issue: https://github.com/apache/spark/pull/22001 test this please

[GitHub] spark pull request #22085: [SPARK-25095][PySpark] Python support for Barrier...

2018-08-12 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/22085#discussion_r209473946 --- Diff: core/src/main/scala/org/apache/spark/api/python/PythonRunner.scala --- @@ -180,7 +183,42 @@ private[spark] abstract class BasePythonRunner

[GitHub] spark pull request #22085: [SPARK-25095][PySpark] Python support for Barrier...

2018-08-12 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/22085#discussion_r209473919 --- Diff: python/pyspark/taskcontext.py --- @@ -95,3 +96,33 @@ def getLocalProperty(self, key): Get a local property set upstream in the driver

[GitHub] spark pull request #22085: [SPARK-25095][PySpark] Python support for Barrier...

2018-08-12 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/22085#discussion_r209473887 --- Diff: python/pyspark/taskcontext.py --- @@ -95,3 +96,33 @@ def getLocalProperty(self, key): Get a local property set upstream in the driver

[GitHub] spark pull request #22001: [SPARK-24819][CORE] Fail fast when no enough slot...

2018-08-12 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/22001#discussion_r209460397 --- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala --- @@ -929,11 +963,38 @@ class DAGScheduler( // HadoopRDD whose

[GitHub] spark pull request #22001: [SPARK-24819][CORE] Fail fast when no enough slot...

2018-08-12 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/22001#discussion_r209460279 --- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala --- @@ -929,11 +963,38 @@ class DAGScheduler( // HadoopRDD whose

[GitHub] spark pull request #22001: [SPARK-24819][CORE] Fail fast when no enough slot...

2018-08-12 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/22001#discussion_r209460309 --- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala --- @@ -929,11 +963,38 @@ class DAGScheduler( // HadoopRDD whose

[GitHub] spark issue #22001: [SPARK-24819][CORE] Fail fast when no enough slots to la...

2018-08-12 Thread mengxr
Github user mengxr commented on the issue: https://github.com/apache/spark/pull/22001 test this please

[GitHub] spark pull request #22001: [SPARK-24819][CORE] Fail fast when no enough slot...

2018-08-10 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/22001#discussion_r209304798 --- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala --- @@ -929,6 +955,28 @@ class DAGScheduler( // HadoopRDD whose

[GitHub] spark pull request #22001: [SPARK-24819][CORE] Fail fast when no enough slot...

2018-08-10 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/22001#discussion_r209294774 --- Diff: resource-managers/mesos/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosFineGrainedSchedulerBackend.scala --- @@ -453,4 +453,8

[GitHub] spark pull request #22001: [SPARK-24819][CORE] Fail fast when no enough slot...

2018-08-10 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/22001#discussion_r209276818 --- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala --- @@ -929,6 +955,28 @@ class DAGScheduler( // HadoopRDD whose

[GitHub] spark pull request #22001: [SPARK-24819][CORE] Fail fast when no enough slot...

2018-08-10 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/22001#discussion_r209277357 --- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala --- @@ -929,6 +955,28 @@ class DAGScheduler( // HadoopRDD whose

[GitHub] spark pull request #22001: [SPARK-24819][CORE] Fail fast when no enough slot...

2018-08-10 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/22001#discussion_r209274833 --- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala --- @@ -929,6 +955,28 @@ class DAGScheduler( // HadoopRDD whose
