Re: IllegalAccessError in GraphX (Spark 1.3.0 LDA)

2015-03-17 Thread Jeffrey Jedele
Hi Xiangrui,
thanks a lot for the hint!

I just tried it on another machine with a clean project, and there it worked
like a charm. I'll retry on the other machine tomorrow.
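
In case it helps anyone else: a minimal sketch of a single-version build along
the lines of what worked for me, assuming sbt and the Scala 2.10 artifacts of
Spark 1.3.0 (illustrative, not my actual build file):

// build.sbt -- single-version setup (assumed names). spark-mllib pulls in
// spark-core and spark-graphx at the same version transitively, so Sorter
// and EdgePartitionBuilder come from matching jars.
name := "lda-test"

scalaVersion := "2.10.4"

libraryDependencies += "org.apache.spark" %% "spark-mllib" % "1.3.0"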

Regards,
Jeff

2015-03-17 19:57 GMT+01:00 Xiangrui Meng :

> Please check your classpath and make sure you don't have multiple
> Spark versions deployed. If the classpath looks correct, please create
> a JIRA for this issue. Thanks! -Xiangrui

Re: IllegalAccessError in GraphX (Spark 1.3.0 LDA)

2015-03-17 Thread Xiangrui Meng
Please check your classpath and make sure you don't have multiple
Spark versions deployed. If the classpath looks correct, please create
a JIRA for this issue. Thanks! -Xiangrui
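
A quick way to verify is to ask the JVM where it actually loaded the two
classes from the stack trace; if the printed locations are two different
jars, you have mixed Spark versions on the classpath. A minimal sketch (the
helper below is hypothetical, not part of Spark; run it with the same
classpath as the failing job):

// ClasspathCheck.scala -- prints the jar each class was loaded from.
object ClasspathCheck {
  def main(args: Array[String]): Unit = {
    Seq(
      "org.apache.spark.util.collection.Sorter",
      "org.apache.spark.graphx.impl.EdgePartitionBuilder"
    ).foreach { name =>
      // getCodeSource can be null for bootstrap classes; jar-loaded classes
      // report their jar URL.
      val src = Class.forName(name).getProtectionDomain.getCodeSource
      println(s"$name -> ${Option(src).map(_.getLocation).getOrElse("<bootstrap>")}")
    }
  }
}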

On Tue, Mar 17, 2015 at 2:03 AM, Jeffrey Jedele wrote:
> Hi all,
> I'm trying to use the new LDA in mllib, but when trying to train the model,
> I'm getting the following error:
>
> java.lang.IllegalAccessError: tried to access class org.apache.spark.util.collection.Sorter from class org.apache.spark.graphx.impl.EdgePartitionBuilder
>     at org.apache.spark.graphx.impl.EdgePartitionBuilder.toEdgePartition(EdgePartitionBuilder.scala:39)
>     at org.apache.spark.graphx.EdgeRDD$$anonfun$1.apply(EdgeRDD.scala:109)
> ...

IllegalAccessError in GraphX (Spark 1.3.0 LDA)

2015-03-17 Thread Jeffrey Jedele
Hi all,
I'm trying to use the new LDA in mllib, but when trying to train the model,
I'm getting the following error:

java.lang.IllegalAccessError: tried to access class org.apache.spark.util.collection.Sorter from class org.apache.spark.graphx.impl.EdgePartitionBuilder
    at org.apache.spark.graphx.impl.EdgePartitionBuilder.toEdgePartition(EdgePartitionBuilder.scala:39)
    at org.apache.spark.graphx.EdgeRDD$$anonfun$1.apply(EdgeRDD.scala:109)

Has anyone seen this yet, and does anyone have an idea what the problem might be?
It happens both with the provided sample data and with my own corpus.

Full code + more stack below.

Thanks and regards,
Jeff

Code:
--
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.clustering.LDA
import org.apache.spark.mllib.linalg.Vectors

object LdaTest {

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("LDA").setMaster("local[4]")
    val sc = new SparkContext(conf)

    // The error also occurs with my own corpus:
    //val data = sc.textFile("/home/jeff/nmf_compare/scikit_v.txt")
    // (A non-RDD variant reading the file via scala.io.Source and feeding it
    // through sc.parallelize showed the same failure.)
    val data = sc.textFile(
      "/home/jeff/Downloads/spark-1.3.0-bin-hadoop2.4/data/mllib/sample_lda_data.txt")

    // Each line is a whitespace-separated vector of term counts.
    val parsedData = data.map(s => Vectors.dense(s.trim().split(" ").map(_.toDouble)))

    // LDA expects an RDD[(Long, Vector)]: (document ID, term-count vector).
    val corpus = parsedData.zipWithIndex.map(_.swap).cache()

    val ldaModel = new LDA().setK(10).run(corpus)

    println(ldaModel)
  }
}

Stack:
--
...
15/03/17 09:48:50 INFO spark.CacheManager: Partition rdd_8_0 not found, computing it
15/03/17 09:48:50 INFO spark.CacheManager: Partition rdd_8_1 not found, computing it
15/03/17 09:48:50 INFO spark.CacheManager: Another thread is loading rdd_8_0, waiting for it to finish...
15/03/17 09:48:50 INFO storage.BlockManager: Found block rdd_4_0 locally
15/03/17 09:48:50 INFO spark.CacheManager: Partition rdd_4_1 not found, computing it
15/03/17 09:48:50 INFO spark.CacheManager: Another thread is loading rdd_8_1, waiting for it to finish...
15/03/17 09:48:50 INFO rdd.HadoopRDD: Input split: file:/home/jeff/Downloads/spark-1.3.0-bin-hadoop2.4/data/mllib/sample_lda_data.txt:132+132
15/03/17 09:48:50 INFO storage.MemoryStore: ensureFreeSpace(1048) called with curMem=47264, maxMem=1965104824
15/03/17 09:48:50 INFO spark.CacheManager: Finished waiting for rdd_8_0
15/03/17 09:48:50 ERROR executor.Executor: Exception in task 0.0 in stage 3.0 (TID 3)
java.lang.IllegalAccessError: tried to access class org.apache.spark.util.collection.Sorter from class org.apache.spark.graphx.impl.EdgePartitionBuilder
    at org.apache.spark.graphx.impl.EdgePartitionBuilder.toEdgePartition(EdgePartitionBuilder.scala:39)
    at org.apache.spark.graphx.EdgeRDD$$anonfun$1.apply(EdgeRDD.scala:109)
    at org.apache.spark.graphx.EdgeRDD$$anonfun$1.apply(EdgeRDD.scala:104)
    at org.apache.spark.rdd.RDD$$anonfun$14.apply(RDD.scala:609)
    at org.apache.spark.rdd.RDD$$anonfun$14.apply(RDD.scala:609)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
    at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:61)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:227)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
    at org.apache.spark.graphx.EdgeRDD.compute(EdgeRDD.scala:49)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
    at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
    at org.apache.spark.scheduler.Task.run(Task.scala:54)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:177)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
15/03/17 09:48:50 INFO spark.CacheManager: Whoever was loading rdd_8_0 failed; we'll try it ourselves
15/03/17 09:48:50 INFO storage.MemoryStore: Block rdd_4_1 stored as values in memory (estimated size 1048.0 B, free 1874.0 MB)
15/03/17 09:48:50 INFO spark.CacheManager: Partition rdd_8_0 not found, computing it
15/03/17 09:48:50 INFO storage.BlockManagerInfo: Added rdd_4_1 in memory on 10.2.200.66:51465 (size: 1048.0 B, free: 1874.1 MB)
15/03/17 09:48:50 INFO storage.BlockManager: Found block rdd_4_0 locally
15/03/17 09