Github user shahidki31 commented on the issue:
https://github.com/apache/spark/pull/21248
cc @mengxr @WeichenXu123 @felixcheung. Can you please verify this patch?
---
-
To unsubscribe, e-mail: reviews-unsubscr
GitHub user shahidki31 opened a pull request:
https://github.com/apache/spark/pull/21270
Power Iteration Clustering in SparkML throws exception when the ID is
IntType
While running the following code, PIC throws an exception.
```
val data = spark.createDataFrame(Seq
```
Github user shahidki31 commented on the issue:
https://github.com/apache/spark/pull/21270
@WeichenXu123 Thanks for the comment. I have created another Jira and I
have raised a PR for that. That PR will fix this issue as well. Can you please
review the PR?
Jira : https
GitHub user shahidki31 opened a pull request:
https://github.com/apache/spark/pull/21277
[ML]Power Iteration Clustering is not displaying cluster indices
corresponding to some nodes.
## What changes were proposed in this pull request?
1) Currently PIC in ML displays cluster
Github user shahidki31 commented on the issue:
https://github.com/apache/spark/pull/21270
Thank you @jkbradley. Actually, there is one more issue. Currently we are
skipping nodes that are absent from the ID column but present in
the neighbor column. Spark MLLib
GitHub user shahidki31 opened a pull request:
https://github.com/apache/spark/pull/21283
Java example code for Power Iteration Clustering in spark.ml
## What changes were proposed in this pull request?
Java example code for Power Iteration Clustering in spark.ml
Github user shahidki31 commented on a diff in the pull request:
https://github.com/apache/spark/pull/21274#discussion_r187234165
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/clustering/PowerIterationClustering.scala
---
@@ -231,8 +231,12 @@ class PowerIterationClustering
GitHub user shahidki31 reopened a pull request:
https://github.com/apache/spark/pull/21277
[SPARK-24217][ML]Power Iteration Clustering is not displaying cluster
indices corresponding to some vertices
## What changes were proposed in this pull request?
1) Currently PIC
Github user shahidki31 closed the pull request at:
https://github.com/apache/spark/pull/21277
Github user shahidki31 closed the pull request at:
https://github.com/apache/spark/pull/21270
Github user shahidki31 commented on the issue:
https://github.com/apache/spark/pull/21277
Based on the comments in the JIRA
(https://issues.apache.org/jira/browse/SPARK-24217), I am closing the issue
GitHub user shahidki31 opened a pull request:
https://github.com/apache/spark/pull/21248
Example code for Power Iteration Clustering
## What changes were proposed in this pull request?
Added example code for Power Iteration Clustering in Spark ML examples
## How
Github user shahidki31 commented on a diff in the pull request:
https://github.com/apache/spark/pull/21283#discussion_r189692625
--- Diff:
examples/src/main/java/org/apache/spark/examples/ml/JavaPowerIterationClusteringExample.java
---
@@ -0,0 +1,85 @@
+/*
+ * Licensed
Github user shahidki31 commented on a diff in the pull request:
https://github.com/apache/spark/pull/21283#discussion_r189692779
--- Diff:
examples/src/main/java/org/apache/spark/examples/ml/JavaPowerIterationClusteringExample.java
---
@@ -0,0 +1,85 @@
+/*
+ * Licensed
Github user shahidki31 commented on a diff in the pull request:
https://github.com/apache/spark/pull/21283#discussion_r189692700
--- Diff:
examples/src/main/java/org/apache/spark/examples/ml/JavaPowerIterationClusteringExample.java
---
@@ -0,0 +1,85 @@
+/*
+ * Licensed
Github user shahidki31 commented on a diff in the pull request:
https://github.com/apache/spark/pull/21283#discussion_r189693083
--- Diff:
examples/src/main/java/org/apache/spark/examples/ml/JavaPowerIterationClusteringExample.java
---
@@ -0,0 +1,85 @@
+/*
+ * Licensed
Github user shahidki31 commented on a diff in the pull request:
https://github.com/apache/spark/pull/21283#discussion_r189693033
--- Diff:
examples/src/main/java/org/apache/spark/examples/ml/JavaPowerIterationClusteringExample.java
---
@@ -0,0 +1,85 @@
+/*
+ * Licensed
Github user shahidki31 commented on the issue:
https://github.com/apache/spark/pull/21283
Thanks @hhbyyh for the review comments. I have modified the example code
based on the review.
Github user shahidki31 commented on the issue:
https://github.com/apache/spark/pull/21248
Thank you @hhbyyh for the review. I have created a new dataset for the
example code, instead of a function-generated dataset. I have addressed your
review comments
Github user shahidki31 commented on a diff in the pull request:
https://github.com/apache/spark/pull/21248#discussion_r189704384
--- Diff:
examples/src/main/scala/org/apache/spark/examples/ml/PowerIterationClusteringExample.scala
---
@@ -0,0 +1,114 @@
+/*
+ * Licensed
Github user shahidki31 commented on a diff in the pull request:
https://github.com/apache/spark/pull/21248#discussion_r189704094
--- Diff:
examples/src/main/scala/org/apache/spark/examples/ml/PowerIterationClusteringExample.scala
---
@@ -0,0 +1,114 @@
+/*
+ * Licensed
Github user shahidki31 commented on a diff in the pull request:
https://github.com/apache/spark/pull/21248#discussion_r189704948
--- Diff:
examples/src/main/scala/org/apache/spark/examples/ml/PowerIterationClusteringExample.scala
---
@@ -0,0 +1,114 @@
+/*
+ * Licensed
Github user shahidki31 closed the pull request at:
https://github.com/apache/spark/pull/21277
Github user shahidki31 commented on the issue:
https://github.com/apache/spark/pull/21277
Closing the PR due to the discussions in the JIRA,
https://issues.apache.org/jira/browse/SPARK-15784 and the PR
https://github.com/apache/spark/pull/21493
Github user shahidki31 commented on the issue:
https://github.com/apache/spark/pull/21248
Thanks @srowen
Github user shahidki31 commented on a diff in the pull request:
https://github.com/apache/spark/pull/21509#discussion_r194144115
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/clustering/PowerIterationClustering.scala
---
@@ -166,6 +166,7 @@ class PowerIterationClustering
Github user shahidki31 commented on the issue:
https://github.com/apache/spark/pull/21283
Hi @hhbyyh @srowen ,
The PowerIterationClustering API has been modified based on the
discussion in the JIRA, https://issues.apache.org/jira/browse/SPARK-15784.
The examples also
Github user shahidki31 commented on the issue:
https://github.com/apache/spark/pull/21283
Thanks @srowen
GitHub user shahidki31 opened a pull request:
https://github.com/apache/spark/pull/21509
Check for invalid input type of weight data in ml.PowerIterationClustering
## What changes were proposed in this pull request?
The test case will result in the following failure. Currently
Github user shahidki31 commented on the issue:
https://github.com/apache/spark/pull/21248
Hi @hhbyyh @srowen ,
The PowerIterationClustering API has had some modifications based on the discussion
in the JIRA, https://issues.apache.org/jira/browse/SPARK-15784.
The examples also have
Github user shahidki31 commented on the issue:
https://github.com/apache/spark/pull/21283
Hi @srowen , Thanks for the comment. I have modified the code.
Github user shahidki31 commented on the issue:
https://github.com/apache/spark/pull/21627
Jenkins, test this please.
GitHub user shahidki31 opened a pull request:
https://github.com/apache/spark/pull/21627
[SPARK-24484][MLLIB]Power Iteration Clustering is giving incorrect
clustering results when there are multiple leading eigenvalues.
## What changes were proposed in this pull request
GitHub user shahidki31 opened a pull request:
https://github.com/apache/spark/pull/21689
Minor correction in the powerIterationSuite
## What changes were proposed in this pull request?
Currently the power iteration clustering test in ml maps the results to the
labels 0
Github user shahidki31 commented on a diff in the pull request:
https://github.com/apache/spark/pull/21740#discussion_r202534469
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/recommendation/MatrixFactorizationModel.scala
---
@@ -75,10 +75,22 @@ class
Github user shahidki31 commented on a diff in the pull request:
https://github.com/apache/spark/pull/21740#discussion_r202534384
--- Diff:
mllib/src/test/scala/org/apache/spark/mllib/recommendation/MatrixFactorizationModelSuite.scala
---
@@ -72,6 +72,22 @@ class
Github user shahidki31 commented on a diff in the pull request:
https://github.com/apache/spark/pull/21740#discussion_r202547781
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/recommendation/MatrixFactorizationModel.scala
---
@@ -75,11 +75,29 @@ class
Github user shahidki31 commented on the issue:
https://github.com/apache/spark/pull/21740
Hi @srowen. The build has passed.
Github user shahidki31 commented on a diff in the pull request:
https://github.com/apache/spark/pull/21740#discussion_r202547705
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/recommendation/MatrixFactorizationModel.scala
---
@@ -75,11 +75,29 @@ class
GitHub user shahidki31 opened a pull request:
https://github.com/apache/spark/pull/21842
[Minor][ML]Added UT for checking maximum number of features for
GeneralizedLinearRegression and WeightedLeastSquares
Currently in the GeneralizedLinearRegression and WeightedLeastSquare
Github user shahidki31 commented on the issue:
https://github.com/apache/spark/pull/21740
Thanks @srowen. Yes, my JIRA handle is "shahid".
Github user shahidki31 commented on the issue:
https://github.com/apache/spark/pull/22271
@SparkQA Test this please
Github user shahidki31 commented on the issue:
https://github.com/apache/spark/pull/22271
Thank you @jkbradley for merging.
GitHub user shahidki31 opened a pull request:
https://github.com/apache/spark/pull/22271
[SPARK-25268][GraphX]runParallelPersonalizedPageRank throws serialization
Exception
## What changes were proposed in this pull request?
mapValues in Scala is currently not serializable
Github user shahidki31 commented on a diff in the pull request:
https://github.com/apache/spark/pull/21689#discussion_r200039891
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/clustering/PowerIterationClusteringSuite.scala
---
@@ -76,23 +78,25 @@ class
Github user shahidki31 commented on a diff in the pull request:
https://github.com/apache/spark/pull/21689#discussion_r200040108
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/clustering/PowerIterationClusteringSuite.scala
---
@@ -76,23 +78,25 @@ class
Github user shahidki31 commented on the issue:
https://github.com/apache/spark/pull/21740
@jianran please refer to the PR, https://github.com/apache/spark/pull/15809.
In this PR, I am checking whether 'userFeatures.lookup(user)' is empty
GitHub user shahidki31 opened a pull request:
https://github.com/apache/spark/pull/21740
[SPARK-18230][MLLib]Throw better exception for a non-existing user/product
When invoking MatrixFactorizationModel.recommendProducts(Int, Int) with a
non-existing user
Github user shahidki31 commented on the issue:
https://github.com/apache/spark/pull/21689
Thank you @srowen for merging.
Github user shahidki31 commented on the issue:
https://github.com/apache/spark/pull/21689
Hi, @srowen . I have modified the code based on your suggestions.
Github user shahidki31 commented on a diff in the pull request:
https://github.com/apache/spark/pull/21689#discussion_r200136002
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/clustering/PowerIterationClusteringSuite.scala
---
@@ -76,23 +78,31 @@ class
GitHub user shahidki31 opened a pull request:
https://github.com/apache/spark/pull/22645
[SPARK-25566][SPARK-25567][WEBUI][SQL]Support pagination for SQL tab to
avoid OOM
## What changes were proposed in this pull request?
Currently SQL tab in the WEBUI doesn't have pagination
Github user shahidki31 commented on the issue:
https://github.com/apache/spark/pull/22659
The test "multinomial logistic regression with intercept with
elasticnet regularization" in the "LogisticRegressionSuite" takes around 1
minute to train 2 logis
Github user shahidki31 commented on the issue:
https://github.com/apache/spark/pull/22659
Before the changes:
Running time of logistic regression suite: **4min 35 sec**
After the changes:
Running time of logistic regression suite: **3min 22 sec**
cc @srowen
Github user shahidki31 commented on the issue:
https://github.com/apache/spark/pull/22660
Thanks for the suggestion. I will close this and amend in the other PR.
Github user shahidki31 commented on the issue:
https://github.com/apache/spark/pull/22659
In Jenkins CI, the testing time of logisticRegressionSuite without the PR is 5
min 10 sec, and with the PR it is 4 min 21 sec
Github user shahidki31 closed the pull request at:
https://github.com/apache/spark/pull/22660
Github user shahidki31 commented on the issue:
https://github.com/apache/spark/pull/22659
The test "binary logistic regression with intercept with ElasticNet
regularization" takes around 30 sec to run. But we can reduce the time to 15
sec by reducing the
Github user shahidki31 commented on the issue:
https://github.com/apache/spark/pull/22645
cc @vanzin @srowen @cloud-fan @dongjoon-hyun . Kindly review the PR.
Github user shahidki31 commented on the issue:
https://github.com/apache/spark/pull/22645
Hi @srowen , There is one behavior change this PR introduces, which is
correct. Sorting of job IDs in previous versions of Spark was incorrect.
After the PR, the sorting is correct
Github user shahidki31 commented on the issue:
https://github.com/apache/spark/pull/22689
Yes. Thank you @srowen .
Github user shahidki31 commented on the issue:
https://github.com/apache/spark/pull/22645
Thanks a lot @srowen
Github user shahidki31 commented on the issue:
https://github.com/apache/spark/pull/22689
Thanks a lot @srowen
Github user shahidki31 commented on the issue:
https://github.com/apache/spark/pull/22645
Hi @cloud-fan , Other web UI tabs such as jobs and stages embed the
JavaScript code in Scala code, so I followed the same approach. It would be
great if we rewrite the Spark UI with some modern
Github user shahidki31 commented on the issue:
https://github.com/apache/spark/pull/22714
@gengliangwang Sorry, I didn't see the PR. Yes, that PR is also for refresh
functionality in the web UI.
I have taken the patch and checked the functionality, and it seems fine.
Below
Github user shahidki31 commented on the issue:
https://github.com/apache/spark/pull/22714
I am closing the PR, since there is already a PR for the web UI auto
refresh. Thanks.
Github user shahidki31 closed the pull request at:
https://github.com/apache/spark/pull/22714
Github user shahidki31 commented on the issue:
https://github.com/apache/spark/pull/22784
Hi @kiszk ,
I think INT_MAX is 2147483647, so n ~= sqrt(2*2147483647) ~= 65536.
Thanks
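The estimate above can be checked exactly. This is a sketch under an assumption not stated in the message: that the limit comes from storing the upper triangle of an n x n Gramian matrix, n(n+1)/2 entries, in an array indexed by a 32-bit signed integer.

```latex
\begin{aligned}
\frac{n(n+1)}{2} &\le 2^{31}-1 = 2147483647 \\
n &\le \frac{-1+\sqrt{1+8\cdot 2147483647}}{2} \approx 65535.5 \\
n = 65535:&\quad \frac{65535\cdot 65536}{2} = 2147450880 \le 2147483647 \\
n = 65536:&\quad \frac{65536\cdot 65537}{2} = 2147516416 > 2147483647
\end{aligned}
```

Under that assumption the largest admissible n is 65535, one below the rounded value sqrt(2*2147483647) ~= 65536.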
Github user shahidki31 commented on the issue:
https://github.com/apache/spark/pull/22784
Hi @srowen , Thanks for the comment. To my knowledge, PCA/SVD is not
limited by row size.
1) Currently row size is not a constraint. Ultimately we need to compute
Gramian matrix
Github user shahidki31 commented on the issue:
https://github.com/apache/spark/pull/22784
Hi @kiszk At most, it can go up to the following limit.
https://github.com/apache/spark/blob/23cfda1547355a823a3b2b2d374e64608c9ce175/mllib/src/main/scala/org/apache/spark/mllib/linalg
Github user shahidki31 commented on a diff in the pull request:
https://github.com/apache/spark/pull/22784#discussion_r226847504
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Vectors.scala
---
@@ -424,6 +424,28 @@ object Vectors
Github user shahidki31 commented on the issue:
https://github.com/apache/spark/pull/22784
retest this please
Github user shahidki31 commented on the issue:
https://github.com/apache/spark/pull/22784
All the UTs are passing locally. Seems to be a random error.
retest this please.
Github user shahidki31 commented on the issue:
https://github.com/apache/spark/pull/22714
Thank you @srowen for the comment. Yes, we should not hard-code the refresh
interval, and we should let the user enable the parameter. I will update the code
accordingly
GitHub user shahidki31 opened a pull request:
https://github.com/apache/spark/pull/22714
[SPARK-25720][WEBUI] Support auto refresh page for the WEBUI
## What changes were proposed in this pull request?
Currently the Spark web UI doesn't have an option to auto-refresh the page. Because
Github user shahidki31 commented on the issue:
https://github.com/apache/spark/pull/22714
cc @srowen @cloud-fan . Kindly review.
Github user shahidki31 commented on a diff in the pull request:
https://github.com/apache/spark/pull/22784#discussion_r228788331
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/PCA.scala ---
@@ -49,7 +50,16 @@ class PCA @Since("1.4.0") (@Since("1.4
Github user shahidki31 commented on the issue:
https://github.com/apache/spark/pull/22914
Hi @srowen , The current behavior of the web UI is:
1) When the user enters a page size more than the current page size, it falls back
to the first page (in all the pages)
2) When the user enters a page
Github user shahidki31 commented on the issue:
https://github.com/apache/spark/pull/22864
Jenkins, retest this please
Github user shahidki31 commented on the issue:
https://github.com/apache/spark/pull/22864
I will fix the build error.
Github user shahidki31 commented on the issue:
https://github.com/apache/spark/pull/22864
retest this please
Github user shahidki31 commented on the issue:
https://github.com/apache/spark/pull/22914
cc @srowen . Kindly review
Github user shahidki31 commented on the issue:
https://github.com/apache/spark/pull/22914
The code here (and in all the other pages) is intended to check for an
out-of-bounds exception and fall back to the first page.
https://github.com/apache/spark/blob
Github user shahidki31 commented on the issue:
https://github.com/apache/spark/pull/22914
Yes. Thank you @srowen
GitHub user shahidki31 opened a pull request:
https://github.com/apache/spark/pull/22914
[SPARK-25900][WEBUI]When the page number is more than the total page
size, then fall back to the first page
## What changes were proposed in this pull request?
When we enter
Github user shahidki31 commented on the issue:
https://github.com/apache/spark/pull/22914
@gengliangwang IMHO, we should try to avoid exceptions in the web UI.
The user will know which page they are on from the page navigation bar
Github user shahidki31 commented on the issue:
https://github.com/apache/spark/pull/22914
The current behavior is: if we enter a value more than the maximum page
number, the page navigation bar shows the user is on the first page and throws an
exception. So, if we really want to throw
Github user shahidki31 commented on the issue:
https://github.com/apache/spark/pull/22864
Seems to be a random error.
Jenkins, retest this please.
Github user shahidki31 commented on the issue:
https://github.com/apache/spark/pull/22914
I have modified based on @gengliangwang 's suggestion. Also, we can remove
the check for page size in all the page class, because we handle the same in
the pagedTable class
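A minimal sketch of the fall-back behavior discussed in this thread, assuming 1-based page numbers. `PageFallback`, `resolvePage`, and the parameter names are hypothetical and chosen for illustration; this is not Spark's paged table code.

```scala
object PageFallback {
  /**
   * Hypothetical helper: returns the page to display. Any request outside
   * 1..totalPages falls back to the first page instead of throwing.
   */
  def resolvePage(requestedPage: Int, totalItems: Int, pageSize: Int): Int = {
    require(pageSize > 0, "pageSize must be positive")
    require(totalItems >= 0, "totalItems must be non-negative")
    // Ceiling division in Long to avoid Int overflow; an empty table
    // is treated as having a single (empty) page.
    val totalPages = math.max(1L, (totalItems.toLong + pageSize - 1) / pageSize)
    if (requestedPage >= 1 && requestedPage <= totalPages) requestedPage else 1
  }
}
```

With 25 items and a page size of 10 there are 3 pages, so a request for page 4 resolves to page 1, while a request for page 2 is returned unchanged.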
Github user shahidki31 commented on the issue:
https://github.com/apache/spark/pull/22914
Hi @gengliangwang , Yes. That is also doable. Currently when the
IndexOutOfBound Exception occurs, the page navigates to the first page. Also
to make the behavior consistent with the page size
Github user shahidki31 commented on the issue:
https://github.com/apache/spark/pull/22914
Hi @srowen , After the PR, the following check can also be removed, because
the check is for preventing the OutOfBoundException, but no OutOfBound
exception happens after the PR
Github user shahidki31 commented on the issue:
https://github.com/apache/spark/pull/22784
Thanks a lot @srowen
Github user shahidki31 commented on the issue:
https://github.com/apache/spark/pull/22864
Hi @gengliangwang , Yes, the parameter has not been used from the beginning.
Github user shahidki31 commented on a diff in the pull request:
https://github.com/apache/spark/pull/22784#discussion_r228719122
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/RowMatrix.scala
---
@@ -384,18 +384,28 @@ class RowMatrix @Since("
Github user shahidki31 commented on a diff in the pull request:
https://github.com/apache/spark/pull/22784#discussion_r228718969
--- Diff:
mllib/src/test/scala/org/apache/spark/mllib/feature/PCASuite.scala ---
@@ -54,4 +55,14 @@ class PCASuite extends SparkFunSuite
Github user shahidki31 commented on a diff in the pull request:
https://github.com/apache/spark/pull/22784#discussion_r228719009
--- Diff:
mllib/src/test/scala/org/apache/spark/mllib/feature/PCASuite.scala ---
@@ -54,4 +55,14 @@ class PCASuite extends SparkFunSuite
Github user shahidki31 commented on the issue:
https://github.com/apache/spark/pull/22784
Thank you @srowen for the review. I have addressed the comments.
> I wonder if the SVD should be used at even smaller scales? as you point
out, it's pretty hard to compute a gram
Github user shahidki31 commented on the issue:
https://github.com/apache/spark/pull/22914
Jenkins, retest this please
Github user shahidki31 commented on the issue:
https://github.com/apache/spark/pull/22864
Thank you @srowen
Github user shahidki31 commented on the issue:
https://github.com/apache/spark/pull/22914
retest this please