[jira] [Commented] (SPARK-16473) BisectingKMeans Algorithm failing with java.util.NoSuchElementException: key not found

2017-10-05 Thread Ilya Matiach (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16193675#comment-16193675
 ] 

Ilya Matiach commented on SPARK-16473:
--

[~podongfeng] Interesting - it looks like the dataset representation is somehow 
changing when it is cached. My guess is that either the row order or the 
numeric values are changing. The test failure itself is OK if the number of 
clusters is equal to k (which is perfectly fine for the algorithm); it just 
means that the dataset was not generated correctly to hit the very special 
edge case I was looking for, where one cluster is empty after a split in 
bisecting k-means. I can't see the test failure error message in your PR - 
could you run another build and post it here? We may need to add some 
debugging/print statements to determine how the data changes when you cache 
it. This doesn't mean there is any bug in the algorithm; it just means the 
test needs to be changed so that the test data, even after caching, is the 
same as the original.
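
The caching effect described above can be reproduced outside Spark. The sketch below is a hypothetical plain-Python analogy (not Spark code): a lazily generated dataset that involves unseeded randomness is re-materialized on every traversal, so each pass can see different values, while caching freezes one realization - which may no longer hit the edge case a test was built around. Seeding the generator is the usual way to make such a test robust.

```python
import random

def lazy_dataset(seed=None):
    """Simulates an uncached RDD: the data is regenerated on every action."""
    def generate():
        # seed=None -> a fresh random state each time, like an unseeded
        # data generator re-evaluated on every Spark action
        rng = random.Random(seed)
        return [rng.gauss(0.0, 1.0) for _ in range(5)]
    return generate

uncached = lazy_dataset()
first_pass, second_pass = uncached(), uncached()   # two "actions" disagree

cached = lazy_dataset()()   # "caching": materialize once, reuse the snapshot
assert cached == cached     # a cached snapshot is stable across uses

# A fixed seed makes every materialization identical, so cached and
# uncached runs of the test see the same data.
seeded = lazy_dataset(seed=42)
assert seeded() == seeded()
```
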

> BisectingKMeans Algorithm failing with java.util.NoSuchElementException: key 
> not found
> --
>
> Key: SPARK-16473
> URL: https://issues.apache.org/jira/browse/SPARK-16473
> Project: Spark
>  Issue Type: Bug
>  Components: ML, MLlib
>Affects Versions: 1.6.1, 2.0.0
> Environment: AWS EC2 linux instance. 
>Reporter: Alok Bhandari
>Assignee: Ilya Matiach
> Fix For: 2.1.1, 2.2.0
>
>
> Hello,
> I am using Apache Spark 1.6.1.
> I am executing the bisecting k-means algorithm on a specific dataset.
> Dataset details:
> K=100,
> input vector = 100K*100K,
> memory assigned: 16GB per node,
> number of nodes = 2.
> Up to K=75 it works fine, but when I set K=100 it fails with
> java.util.NoSuchElementException: key not found.
> *I suspect it is failing for lack of some resources, but the exception
> does not convey why this Spark job failed.*
> Can someone please point me to the root cause of this exception and why
> it is failing?
> This is the exception stack trace:
> {code}
> java.util.NoSuchElementException: key not found: 166 
> at scala.collection.MapLike$class.default(MapLike.scala:228) 
> at scala.collection.AbstractMap.default(Map.scala:58) 
> at scala.collection.MapLike$class.apply(MapLike.scala:141) 
> at scala.collection.AbstractMap.apply(Map.scala:58) 
> at 
> org.apache.spark.mllib.clustering.BisectingKMeans$$anonfun$org$apache$spark$mllib$clustering$BisectingKMeans$$updateAssignments$1$$anonfun$2.apply$mcDJ$sp(BisectingKMeans.scala:338)
> at 
> org.apache.spark.mllib.clustering.BisectingKMeans$$anonfun$org$apache$spark$mllib$clustering$BisectingKMeans$$updateAssignments$1$$anonfun$2.apply(BisectingKMeans.scala:337)
> at 
> org.apache.spark.mllib.clustering.BisectingKMeans$$anonfun$org$apache$spark$mllib$clustering$BisectingKMeans$$updateAssignments$1$$anonfun$2.apply(BisectingKMeans.scala:337)
> at 
> scala.collection.TraversableOnce$$anonfun$minBy$1.apply(TraversableOnce.scala:231)
>  
> at 
> scala.collection.LinearSeqOptimized$class.foldLeft(LinearSeqOptimized.scala:111)
>  
> at scala.collection.immutable.List.foldLeft(List.scala:84) 
> at 
> scala.collection.LinearSeqOptimized$class.reduceLeft(LinearSeqOptimized.scala:125)
>  
> at scala.collection.immutable.List.reduceLeft(List.scala:84) 
> at 
> scala.collection.TraversableOnce$class.minBy(TraversableOnce.scala:231) 
> at scala.collection.AbstractTraversable.minBy(Traversable.scala:105) 
> at 
> org.apache.spark.mllib.clustering.BisectingKMeans$$anonfun$org$apache$spark$mllib$clustering$BisectingKMeans$$updateAssignments$1.apply(BisectingKMeans.scala:337)
>  
> at 
> org.apache.spark.mllib.clustering.BisectingKMeans$$anonfun$org$apache$spark$mllib$clustering$BisectingKMeans$$updateAssignments$1.apply(BisectingKMeans.scala:334)
>  
> at scala.collection.Iterator$$anon$11.next(Iterator.scala:328) 
> at scala.collection.Iterator$$anon$14.hasNext(Iterator.scala:389) 
> {code}
> The issue is that it fails without giving any explicit message as to why.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-16473) BisectingKMeans Algorithm failing with java.util.NoSuchElementException: key not found

2017-08-16 Thread zhengruifeng (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16129818#comment-16129818
 ] 

zhengruifeng commented on SPARK-16473:
--

If the {{sparseDataset}} in {{BisectingKMeansSuite}} is cached, then the test 
case {{SPARK-16473: Verify Bisecting K-Means does not fail in edge case where 
one cluster is empty after split}} always fails.
See PR 
https://github.com/apache/spark/pull/16763/files#diff-beaf4409631709a875704e6a4d0a1c13R37
 







[jira] [Commented] (SPARK-16473) BisectingKMeans Algorithm failing with java.util.NoSuchElementException: key not found

2016-12-20 Thread Ilya Matiach (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15765027#comment-15765027
 ] 

Ilya Matiach commented on SPARK-16473:
--

Do you have a smaller dataset than the one in the description that can 
reproduce the bug?







[jira] [Commented] (SPARK-16473) BisectingKMeans Algorithm failing with java.util.NoSuchElementException: key not found

2016-12-20 Thread Ilya Matiach (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15764964#comment-15764964
 ] 

Ilya Matiach commented on SPARK-16473:
--

If you could put the sample dataset on Google Drive or OneDrive and send me 
the link, that would be great. Putting the dataset on GitHub would work too. 
How large is the dataset?







[jira] [Commented] (SPARK-16473) BisectingKMeans Algorithm failing with java.util.NoSuchElementException: key not found

2016-12-20 Thread Ilya Matiach (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15764909#comment-15764909
 ] 

Ilya Matiach commented on SPARK-16473:
--

I've added a pull request here:
https://github.com/apache/spark/pull/16355

It would be nice to add a test case in Spark itself to verify the code fix.







[jira] [Commented] (SPARK-16473) BisectingKMeans Algorithm failing with java.util.NoSuchElementException: key not found

2016-12-20 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15764905#comment-15764905
 ] 

Apache Spark commented on SPARK-16473:
--

User 'imatiach-msft' has created a pull request for this issue:
https://github.com/apache/spark/pull/16355







[jira] [Commented] (SPARK-16473) BisectingKMeans Algorithm failing with java.util.NoSuchElementException: key not found

2016-12-20 Thread Ilya Matiach (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15764892#comment-15764892
 ] 

Ilya Matiach commented on SPARK-16473:
--

I will start a pull request for the change. I would like to add a test case 
that verifies the bug is fixed, though. Maybe you can share the sample 
dataset through GitHub and I can take a look?







[jira] [Commented] (SPARK-16473) BisectingKMeans Algorithm failing with java.util.NoSuchElementException: key not found

2016-12-19 Thread Alok Bhandari (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15763324#comment-15763324
 ] 

Alok Bhandari commented on SPARK-16473:
---

[~imatiach], thanks for showing interest in this issue. I will try to share 
the dataset with you - can you suggest where I should share it? Would GitHub 
be fine?

Also, I have tried to diagnose this issue on my own. From my analysis it 
looks like it fails when it tries to bisect a node that does not have any 
children. I have also written a code fix, but I am not sure it is the 
correct solution:

*Suggested solution*
{code:scala}
  private def updateAssignments(
      assignments: RDD[(Long, VectorWithNorm)],
      divisibleIndices: Set[Long],
      newClusterCenters: Map[Long, VectorWithNorm]): RDD[(Long, VectorWithNorm)] = {
    assignments.map { case (index, v) =>
      if (divisibleIndices.contains(index)) {
        val children = Seq(leftChildIndex(index), rightChildIndex(index))
        if (children.length > 0) {
          val selected = children.minBy { child =>
            KMeans.fastSquaredDistance(newClusterCenters(child), v)
          }
          (selected, v)
        } else {
          (index, v)
        }
      } else {
        (index, v)
      }
    }
  }
{code}

*Original code*
{code:scala}
  private def updateAssignments(
      assignments: RDD[(Long, VectorWithNorm)],
      divisibleIndices: Set[Long],
      newClusterCenters: Map[Long, VectorWithNorm]): RDD[(Long, VectorWithNorm)] = {
    assignments.map { case (index, v) =>
      if (divisibleIndices.contains(index)) {
        val children = Seq(leftChildIndex(index), rightChildIndex(index))
        val selected = children.minBy { child =>
          KMeans.fastSquaredDistance(newClusterCenters(child), v)
        }
        (selected, v)
      } else {
        (index, v)
      }
    }
  }
{code}
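
To make the failure mode concrete, here is a hypothetical plain-Python sketch (not the Spark code; names and 1-D points are illustrative). The original `updateAssignments` picks the nearer of a node's two children, but if a child ended up as an empty cluster its center is missing from the centers map, and the lookup raises `KeyError` - the Python analogue of Scala's `NoSuchElementException: key not found`. Note that a length check on the children list cannot catch this, since the list always has two entries; guarding on whether each child's *center* exists does.

```python
def left_child(index):
    return 2 * index

def right_child(index):
    return 2 * index + 1

def update_assignment(index, point, divisible, new_cluster_centers):
    """Reassign one 1-D point to the nearest child center, if its node split."""
    if index not in divisible:
        return index
    children = [left_child(index), right_child(index)]
    # Guard: only consider children whose centers actually exist. An empty
    # cluster after a split has no center, and without this filter the
    # min() below would raise KeyError for the missing child.
    valid = [c for c in children if c in new_cluster_centers]
    if not valid:
        return index  # keep the point where it is
    return min(valid, key=lambda c: (new_cluster_centers[c] - point) ** 2)

centers = {2: 0.0}  # node 1 split, but its right child (index 3) came up empty
assert update_assignment(1, 0.4, {1}, centers) == 2      # falls back to left child
assert update_assignment(1, 0.4, set(), centers) == 1    # node did not split
```
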


[jira] [Commented] (SPARK-16473) BisectingKMeans Algorithm failing with java.util.NoSuchElementException: key not found

2016-12-19 Thread Ilya Matiach (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15762532#comment-15762532
 ] 

Ilya Matiach commented on SPARK-16473:
--

I'm interested in looking into this issue. Would it be possible to get a 
dataset (either the original one or a mock dataset) that can be used to 
reproduce this error?

> BisectingKMeans Algorithm failing with java.util.NoSuchElementException: key 
> not found
> --
>
> Key: SPARK-16473
> URL: https://issues.apache.org/jira/browse/SPARK-16473
> Project: Spark
>  Issue Type: Bug
>  Components: ML, MLlib
>Affects Versions: 1.6.1, 2.0.0
> Environment: AWS EC2 linux instance. 
>Reporter: Alok Bhandari
>
> Hello,
> I am using Apache Spark 1.6.1.
> I am executing the bisecting k-means algorithm on a specific dataset.
> Dataset details:
> k = 100,
> input vector = 100K * 100K,
> memory assigned = 16 GB per node,
> number of nodes = 2.
> Up to k = 75 it works fine, but when I set k = 100 it fails with
> java.util.NoSuchElementException: key not found.
> *I suspect it is failing because of a lack of some resource, but the
> exception does not convey why this Spark job failed.*
> Can someone please point me to the root cause of this exception and why it
> is failing?
> This is the exception stack trace:
> This is the exception stack-trace:- 
> {code}
> java.util.NoSuchElementException: key not found: 166 
> at scala.collection.MapLike$class.default(MapLike.scala:228) 
> at scala.collection.AbstractMap.default(Map.scala:58) 
> at scala.collection.MapLike$class.apply(MapLike.scala:141) 
> at scala.collection.AbstractMap.apply(Map.scala:58) 
> at 
> org.apache.spark.mllib.clustering.BisectingKMeans$$anonfun$org$apache$spark$mllib$clustering$BisectingKMeans$$updateAssignments$1$$anonfun$2.apply$mcDJ$sp(BisectingKMeans.scala:338)
> at 
> org.apache.spark.mllib.clustering.BisectingKMeans$$anonfun$org$apache$spark$mllib$clustering$BisectingKMeans$$updateAssignments$1$$anonfun$2.apply(BisectingKMeans.scala:337)
> at 
> org.apache.spark.mllib.clustering.BisectingKMeans$$anonfun$org$apache$spark$mllib$clustering$BisectingKMeans$$updateAssignments$1$$anonfun$2.apply(BisectingKMeans.scala:337)
> at 
> scala.collection.TraversableOnce$$anonfun$minBy$1.apply(TraversableOnce.scala:231)
>  
> at 
> scala.collection.LinearSeqOptimized$class.foldLeft(LinearSeqOptimized.scala:111)
>  
> at scala.collection.immutable.List.foldLeft(List.scala:84) 
> at 
> scala.collection.LinearSeqOptimized$class.reduceLeft(LinearSeqOptimized.scala:125)
>  
> at scala.collection.immutable.List.reduceLeft(List.scala:84) 
> at 
> scala.collection.TraversableOnce$class.minBy(TraversableOnce.scala:231) 
> at scala.collection.AbstractTraversable.minBy(Traversable.scala:105) 
> at 
> org.apache.spark.mllib.clustering.BisectingKMeans$$anonfun$org$apache$spark$mllib$clustering$BisectingKMeans$$updateAssignments$1.apply(BisectingKMeans.scala:337)
>  
> at 
> org.apache.spark.mllib.clustering.BisectingKMeans$$anonfun$org$apache$spark$mllib$clustering$BisectingKMeans$$updateAssignments$1.apply(BisectingKMeans.scala:334)
>  
> at scala.collection.Iterator$$anon$11.next(Iterator.scala:328) 
> at scala.collection.Iterator$$anon$14.hasNext(Iterator.scala:389) 
> {code}
> The issue is that it fails without giving any explicit message as to why
> it failed.
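
For context on the failure mode: the stack trace points at `updateAssignments` choosing the nearer of two child clusters via `minBy` and looking each candidate child index up in a map of cluster centers. A pure-Python analogue of that lookup (illustrative only, not Spark's actual code; the indices and centers below are hypothetical) shows how a missing child center turns into a "key not found" error:

```python
# Hypothetical children of cluster 1 (bisecting k-means indexes children 2*i, 2*i+1).
centers = {2: [0.0, 0.0], 3: [10.0, 10.0]}

def assign(point, child_indices, centers):
    """Pick the child cluster with the nearest center, like minBy in the trace."""
    def sq_dist(idx):
        c = centers[idx]  # a missing key here mirrors "NoSuchElementException: key not found"
        return sum((p - q) ** 2 for p, q in zip(point, c))
    return min(child_indices, key=sq_dist)

print(assign([1.0, 1.0], [2, 3], centers))  # 2: the nearer child center

# If one child ended up without a center (e.g. an empty cluster was dropped
# from the map), the same assignment step fails on the lookup:
del centers[3]
try:
    assign([1.0, 1.0], [2, 3], centers)
except KeyError as exc:
    print("lookup failed for child", exc)
```

This is consistent with the fix eventually made for the empty-cluster-after-split edge case, though the toy above is only a sketch of the mechanism.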



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-16473) BisectingKMeans Algorithm failing with java.util.NoSuchElementException: key not found

2016-11-08 Thread Alok Bhandari (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15647165#comment-15647165
 ] 

Alok Bhandari commented on SPARK-16473:
---

[~josephkb], I see that you have worked on MLlib; could you please help me 
with some input on this issue?




[jira] [Commented] (SPARK-16473) BisectingKMeans Algorithm failing with java.util.NoSuchElementException: key not found

2016-10-27 Thread Alok Bhandari (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15611340#comment-15611340
 ] 

Alok Bhandari commented on SPARK-16473:
---

This issue continues to exist in the Spark 2.0 "ml" library. Is this going 
to get any attention?




[jira] [Commented] (SPARK-16473) BisectingKMeans Algorithm failing with java.util.NoSuchElementException: key not found

2016-07-25 Thread Alok Bhandari (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15391713#comment-15391713
 ] 

Alok Bhandari commented on SPARK-16473:
---

After reducing maxIterations for BisectingKMeans, it finished successfully. 
Does that mean the appropriate maxIterations value is dataset-specific?
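
It plausibly can be: each extra iteration moves the centers based on the data, so whether a center is ever stranded with zero assigned points depends on the dataset. A 1-D pure-Python toy of a single assignment step (illustrative only, not Spark's implementation) shows a center receiving no points at all:

```python
def assign_points(points, centers):
    """One k-means assignment step (1-D toy): center index -> assigned points."""
    clusters = {i: [] for i in range(len(centers))}
    for p in points:
        nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
        clusters[nearest].append(p)
    return clusters

# Every point is closer to center 0, so center 1 is stranded with an empty
# cluster; a naive mean update for it would then have nothing to average.
clusters = assign_points([0.0, 1.0, 2.0], centers=[1.0, 9.0])
print(clusters)  # {0: [0.0, 1.0, 2.0], 1: []}
```

In bisecting k-means the analogous situation is a child cluster left empty after a split, which appears to be the edge case behind the "key not found" error, so fewer iterations reducing the chance of hitting it on a given dataset seems consistent.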
