Re: IllegalAccessError in GraphX (Spark 1.3.0 LDA)
Hi Xiangrui,

thank you a lot for the hint! I just tried on another machine with a clean project, and there it worked like a charm. I will retry on the other machine tomorrow.

Regards,
Jeff

2015-03-17 19:57 GMT+01:00 Xiangrui Meng:
> Please check your classpath and make sure you don't have multiple
> Spark versions deployed. If the classpath looks correct, please create
> a JIRA for this issue. Thanks! -Xiangrui
>
> On Tue, Mar 17, 2015 at 2:03 AM, Jeffrey Jedele wrote:
> > Hi all,
> > I'm trying to use the new LDA in mllib, but when trying to train the
> > model, I'm getting the following error:
> >
> > java.lang.IllegalAccessError: tried to access class
> > org.apache.spark.util.collection.Sorter from class
> > org.apache.spark.graphx.impl.EdgePartitionBuilder
> >   at org.apache.spark.graphx.impl.EdgePartitionBuilder.toEdgePartition(EdgePartitionBuilder.scala:39)
> >   at org.apache.spark.graphx.EdgeRDD$$anonfun$1.apply(EdgeRDD.scala:109)
> >
> > Has anyone seen this yet and has an idea what might be the problem?
> > It happens both with the provided sample data and with my own corpus.
> >
> > Full code + more stack below.
> > [quoted code and stack trace snipped; see the original message below]
Re: IllegalAccessError in GraphX (Spark 1.3.0 LDA)
Please check your classpath and make sure you don't have multiple Spark versions deployed. If the classpath looks correct, please create a JIRA for this issue.

Thanks!
-Xiangrui

On Tue, Mar 17, 2015 at 2:03 AM, Jeffrey Jedele wrote:
> Hi all,
> I'm trying to use the new LDA in mllib, but when trying to train the
> model, I'm getting the following error:
>
> java.lang.IllegalAccessError: tried to access class
> org.apache.spark.util.collection.Sorter from class
> org.apache.spark.graphx.impl.EdgePartitionBuilder
>   at org.apache.spark.graphx.impl.EdgePartitionBuilder.toEdgePartition(EdgePartitionBuilder.scala:39)
>   at org.apache.spark.graphx.EdgeRDD$$anonfun$1.apply(EdgeRDD.scala:109)
>
> Has anyone seen this yet and has an idea what might be the problem?
> It happens both with the provided sample data and with my own corpus.
>
> Full code + more stack below.
> [quoted code and stack trace snipped; see the original message below]
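One quick way to check for a mixed deployment from inside the JVM is to ask the class loader where each of the two conflicting classes was actually loaded from. Below is a minimal diagnostic sketch (the `ClasspathCheck` object and its `locate` helper are hypothetical names, not part of Spark); if the two classes resolve to different jars, the classpath mixes Spark versions:

```scala
// Hypothetical diagnostic: report which jar (or classpath entry) a class
// was loaded from. Two different locations for the two classes named in
// the stack trace would confirm a mixed Spark deployment.
object ClasspathCheck {
  def locate(className: String): String =
    try {
      val cls = Class.forName(className)
      Option(cls.getProtectionDomain.getCodeSource)
        .map(_.getLocation.toString)
        .getOrElse("(bootstrap classpath)")
    } catch {
      case _: ClassNotFoundException => "(not on classpath)"
    }

  def main(args: Array[String]): Unit = {
    // The two classes involved in the IllegalAccessError:
    println(locate("org.apache.spark.util.collection.Sorter"))
    println(locate("org.apache.spark.graphx.impl.EdgePartitionBuilder"))
  }
}
```

`Class.forName` loads a class regardless of its access modifiers, so this works even though `Sorter` is package-private in Spark.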
IllegalAccessError in GraphX (Spark 1.3.0 LDA)
Hi all,

I'm trying to use the new LDA in mllib, but when trying to train the model, I'm getting the following error:

java.lang.IllegalAccessError: tried to access class org.apache.spark.util.collection.Sorter from class org.apache.spark.graphx.impl.EdgePartitionBuilder
  at org.apache.spark.graphx.impl.EdgePartitionBuilder.toEdgePartition(EdgePartitionBuilder.scala:39)
  at org.apache.spark.graphx.EdgeRDD$$anonfun$1.apply(EdgeRDD.scala:109)

Has anyone seen this yet and has an idea what might be the problem? It happens both with the provided sample data and with my own corpus.

Full code + more stack below.

Thx and Regards,
Jeff

Code:
--
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.clustering.LDA
import org.apache.spark.mllib.linalg.Vectors

object LdaTest {

  def main(args: Array[String]) = {
    val conf = new SparkConf().setAppName("LDA").setMaster("local[4]")
    val sc = new SparkContext(conf)

    //val data = scala.io.Source.fromFile("/home/jeff/nmf_compare/scikit_v.txt").getLines().toList
    //val parsedData = data.map(s => Vectors.dense(s.trim().split(" ").map(_.toDouble)))
    //val corpus = parsedData.zipWithIndex.map( t => (t._2.toLong, t._1) )

    //val data = sc.textFile("/home/jeff/nmf_compare/scikit_v.txt")
    val data = sc.textFile("/home/jeff/Downloads/spark-1.3.0-bin-hadoop2.4/data/mllib/sample_lda_data.txt")
    val parsedData = data.map(s => Vectors.dense(s.trim().split(" ").map(_.toDouble)))
    // LDA expects an RDD[(Long, Vector)]: document id -> term-count vector
    val corpus = parsedData.zipWithIndex.map(_.swap).cache()

    //val parCorpus = sc.parallelize(corpus)
    //println(parCorpus)

    val ldaModel = new LDA().setK(10).run(corpus)

    println(ldaModel)
  }

}

Stack:

...
15/03/17 09:48:50 INFO spark.CacheManager: Partition rdd_8_0 not found, computing it
15/03/17 09:48:50 INFO spark.CacheManager: Partition rdd_8_1 not found, computing it
15/03/17 09:48:50 INFO spark.CacheManager: Another thread is loading rdd_8_0, waiting for it to finish...
15/03/17 09:48:50 INFO storage.BlockManager: Found block rdd_4_0 locally
15/03/17 09:48:50 INFO spark.CacheManager: Partition rdd_4_1 not found, computing it
15/03/17 09:48:50 INFO spark.CacheManager: Another thread is loading rdd_8_1, waiting for it to finish...
15/03/17 09:48:50 INFO rdd.HadoopRDD: Input split: file:/home/jeff/Downloads/spark-1.3.0-bin-hadoop2.4/data/mllib/sample_lda_data.txt:132+132
15/03/17 09:48:50 INFO storage.MemoryStore: ensureFreeSpace(1048) called with curMem=47264, maxMem=1965104824
15/03/17 09:48:50 INFO spark.CacheManager: Finished waiting for rdd_8_0
15/03/17 09:48:50 ERROR executor.Executor: Exception in task 0.0 in stage 3.0 (TID 3)
java.lang.IllegalAccessError: tried to access class org.apache.spark.util.collection.Sorter from class org.apache.spark.graphx.impl.EdgePartitionBuilder
  at org.apache.spark.graphx.impl.EdgePartitionBuilder.toEdgePartition(EdgePartitionBuilder.scala:39)
  at org.apache.spark.graphx.EdgeRDD$$anonfun$1.apply(EdgeRDD.scala:109)
  at org.apache.spark.graphx.EdgeRDD$$anonfun$1.apply(EdgeRDD.scala:104)
  at org.apache.spark.rdd.RDD$$anonfun$14.apply(RDD.scala:609)
  at org.apache.spark.rdd.RDD$$anonfun$14.apply(RDD.scala:609)
  at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
  at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
  at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:61)
  at org.apache.spark.rdd.RDD.iterator(RDD.scala:227)
  at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
  at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
  at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
  at org.apache.spark.graphx.EdgeRDD.compute(EdgeRDD.scala:49)
  at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
  at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
  at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
  at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
  at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
  at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
  at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
  at org.apache.spark.scheduler.Task.run(Task.scala:54)
  at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:177)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
  at java.lang.Thread.run(Thread.java:745)
15/03/17 09:48:50 INFO spark.CacheManager: Whoever was loading rdd_8_0 failed; we'll try it ourselves
15/03/17 09:48:50 INFO storage.MemoryStore: Block rdd_4_1 stored as values in memory (estimated size 1048.0 B, free 1874.0 MB)
15/03/17 09:48:50 INFO spark.CacheManager: Partition rdd_8_0 not found, computing it
15/03/17 09:48:50 INFO storage.BlockManagerInfo: Added rdd_4_1 in memory on 10.2.200.66:51465 (size: 1048.0 B, free: 1874.1 MB)
15/03/17 09:48:50 INFO storage.BlockManager: Found block rdd_4_0 locally
15/03/17 09
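For reference, the usual way to keep a second Spark version off the classpath is to depend on exactly one Spark release in the application build and mark it "provided", so the deployment environment supplies Spark rather than a copy bundled inside the application jar. A sketch of such a build.sbt follows (the file contents are illustrative, not taken from this thread):

```scala
// build.sbt (illustrative): pin a single Spark version and mark it
// "provided", so the jars shipped with spark-submit supply Spark at
// runtime instead of a second copy packaged into the application jar.
// A bundled second copy, compiled against a different Spark release, is
// a classic cause of IllegalAccessError between Spark-internal classes.
scalaVersion := "2.10.4"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core"  % "1.3.0" % "provided",
  "org.apache.spark" %% "spark-mllib" % "1.3.0" % "provided"
)
```

When running locally (e.g. with setMaster("local[4]") under sbt), the provided scope can be added back to the run classpath; the point is simply that only one Spark version appears anywhere on it.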