Re: spark-itemsimilarity can't launch on a Spark cluster?

chepoo Tue, 14 Oct 2014 08:32:57 -0700

Hi Pat,
        Sure. CPU usage is not very high, so I will reduce the number of 
cores.
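
To make the trade-off concrete, here is a rough sketch of the arithmetic, using the figures quoted in this thread (16g of memory and 16 cores per machine). The function name is made up for illustration, and the calculation ignores memory needed by the OS and other processes. In Spark standalone mode the core count is normally limited with SPARK_WORKER_CORES or spark.cores.max; whether the mahout driver exposes its own option for this is not confirmed here.

```python
def memory_per_core_gb(node_memory_gb: float, cores_in_use: int) -> float:
    """Memory available to each concurrently running task on one node, in GB.

    Hypothetical helper: assumes all cores in use share the node's memory
    evenly and ignores OS and daemon overhead.
    """
    if cores_in_use <= 0:
        raise ValueError("cores_in_use must be positive")
    return node_memory_gb / cores_in_use


# Figures from this thread: 16g of memory and 16 cores per machine.
for cores in (16, 8, 4, 2):
    print(f"{cores:2d} cores -> {memory_per_core_gb(16, cores):.1f}g per core")
```

With all 16 cores in use each task gets about 1g; dropping to 4 cores gives each task about 4g from the same machine.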

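For readers unfamiliar with the dictionaries Pat describes further down the thread (a bi-directional hashmap per ID space, one copy per node), here is a minimal sketch of the idea. It is not Mahout's implementation; the class and method names are invented for illustration. It only shows why memory grows with the number of distinct user and item IDs: every ID is held once for the forward lookup and once for the reverse lookup.

```python
class BiDictionary:
    """Illustrative two-way mapping between external string IDs and dense
    integer indices. A hypothetical stand-in, not the Mahout class."""

    def __init__(self):
        self._to_index = {}   # external ID -> integer index
        self._to_id = []      # integer index -> external ID

    def add(self, external_id: str) -> int:
        """Return the index for external_id, assigning a new one if needed."""
        index = self._to_index.get(external_id)
        if index is None:
            index = len(self._to_id)
            self._to_index[external_id] = index
            self._to_id.append(external_id)
        return index

    def index_of(self, external_id: str) -> int:
        return self._to_index[external_id]

    def id_of(self, index: int) -> str:
        return self._to_id[index]

    def __len__(self) -> int:
        return len(self._to_id)


# Reading: translate external IDs to matrix indices.
items = BiDictionary()
rows = [items.add(item_id) for item_id in ["iphone", "ipad", "iphone", "nexus"]]

# Writing: translate matrix indices back to external IDs.
names = [items.id_of(i) for i in rows]
```

With roughly two million users and one million items, as mentioned in this thread, each node would hold about three million entries across the two dictionaries, on top of the memory the tasks themselves need.
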

On Oct 14, 2014, at 21:03, Pat Ferrel <[email protected]> wrote:

> So that is 1g per core? That doesn’t sound like enough. Look for a way to use 
> fewer cores and allocate more memory per core, maybe.
> 
> On Oct 13, 2014, at 8:01 PM, chepoo <[email protected]> wrote:
> 
> Hi Pat,
>       I do not have enough memory. There are three machines in total, each 
> with only 16g of memory. There will be about two million users and about one 
> million items, so the history data is about 2g.
> 
> On Oct 13, 2014, at 23:34, Pat Ferrel <[email protected]> wrote:
> 
>> You have 256G of memory in each node machine partitioned to 16g per core?
>> 
>> If so you should have -sem 256g or a little less since that is how much 
>> memory per node to allocate. All cores of a node will share this memory.
>> 
>> The only unusual memory consideration is the dictionaries, which are 
>> broadcast to each node and shared by each task on the node during read and 
>> write. So there needs to be enough memory to store one copy of each 
>> dictionary per node. A dictionary is a bi-directional hashmap. At most one 
>> item ID dictionary and one user ID dictionary are broadcast for the 
>> duration of the read and write tasks. If a problem is occurring during 
>> reading or writing it might be the dictionaries but with 256g per node this 
>> seems unlikely. How many users and items?
>> 
>> 
>> On Oct 13, 2014, at 2:30 AM, pol <[email protected]> wrote:
>> 
>> Hi Pat,
>>      Yes, I stopped it manually. Something is wrong, maybe a configuration 
>> error or insufficient memory; I have asked the Spark mailing list for 
>> help.
>>      I will ask about the other spark-itemsimilarity problem in a separate 
>> mail. Thank you.
>> 
>> 
>> On Oct 11, 2014, at 09:22, Pat Ferrel <[email protected]> wrote:
>> 
>>> Did you stop the 1.6g job or did it fail?
>>> 
>>> I see task failures but no stage failures.
>>> 
>>> 
>>> On Oct 10, 2014, at 8:49 AM, pol <[email protected]> wrote:
>>> 
>>> Hi Pat,
>>>     Yes, spark-itemsimilarity works now; it finished the calculation on a 
>>> 150m dataset.
>>> 
>>>     The problem above is that the 1.6g dataset can’t finish the 
>>> calculation. I have three machines (16 cores and 16g memory each) for this 
>>> test; is this environment unable to finish the calculation?
>>>     The dataset was archived into one file with the hadoop archive tool, so 
>>> only one machine is in the processing state. I did this because without 
>>> archiving some errors occur; for details please refer to the attachments.
>>>     <spark1.png>
>>> 
>>> <spark2.png>
>>> 
>>> <spark3.png>
>>> 
>>> 
>>>     If you can, I will provide the test dataset to you. 
>>> 
>>>     Thank you again.
>>> 
>>> 
>>> On Oct 10, 2014, at 22:07, Pat Ferrel <[email protected]> wrote:
>>> 
>>>> So it is completing some of the spark-itemsimilarity jobs now? That is 
>>>> better at least.
>>>> 
>>>> Yes. More data means you may need more memory or more nodes in your 
>>>> cluster. This is how to scale Spark and Hadoop. Spark in particular needs 
>>>> core memory since it tries to avoid disk read/write.
>>>> 
>>>> Try increasing -sem as far as you can first, then you may need to add 
>>>> machines to your cluster to speed it up. Do you need results faster than 
>>>> 15 hours?
>>>> 
>>>> Remember the way the Solr recommender works allows you to make 
>>>> recommendations to new users and train less often. The new user data does 
>>>> not have to be in the training/indicator data. You train partly based on 
>>>> how many new users but partly based on how many new items are added to the 
>>>> catalog.
>>>> 
>>>> On Oct 10, 2014, at 1:47 AM, pol <[email protected]> wrote:
>>>> 
>>>> Hi Pat,
>>>>    Because of a holiday, I am only replying now.
>>>> 
>>>>    I changed 1.0.2 to 1.0.1 for mahout-1.0-SNAPSHOT, and with Spark 1.0.1 
>>>> and Hadoop 2.4.0, spark-itemsimilarity works. But I have a new question:
>>>>    mahout spark-itemsimilarity -i /view_input,/purchase_input -o /output 
>>>> -os -ma spark://recommend1:7077 -sem 15g -f1 purchase -f2 view -ic 2 -fc 1 
>>>> -m 36
>>>> 
>>>>    With "view" data of 1.6g and "purchase" data of 60m, this command has 
>>>> not finished after 15 hours (the "indicator-matrix" has been computed and 
>>>> the "cross-indicator-matrix" is still computing), but with "view" data of 
>>>> 100m it finished in 2 minutes. Is the data size the reason?
>>>> 
>>>> 
>>>> On Oct 1, 2014, at 01:10, Pat Ferrel <[email protected]> wrote:
>>>> 
>>>>> This will not be fixed in Mahout 1.0 unless we can find a problem in 
>>>>> Mahout now. I am the one who would fix it. At present it looks to me like 
>>>>> a Spark version or setup problem.
>>>>> 
>>>>> These errors seem to indicate that the build or setup has a problem. It 
>>>>> seems that you cannot use Spark 1.1.0. Set up your cluster to use 
>>>>> mahout-1.0-SNAPSHOT with the pom set back to spark-1.0.1, Spark 1.0.1 
>>>>> built for Hadoop 2.4, and Hadoop 2.4. This is the only combination that 
>>>>> is supposed to work together.
>>>>> 
>>>>> If this still fails it may be a setup problem since I can run on a 
>>>>> cluster just fine with my setup. When you get an error from this config 
>>>>> send it to me and the Spark user list to see if they can give us a clue.
>>>>> 
>>>>> Question: Do you have mahout-1.0-SNAPSHOT and spark installed on all your 
>>>>> cluster machines, with the correct environment variables and path?
>>>>> 
>>>>> 
>>>>> On Sep 30, 2014, at 12:47 AM, pol <[email protected]> wrote:
>>>>> 
>>>>> Hi Pat, 
>>>>>   It was a Spark version problem, but spark-itemsimilarity still can't 
>>>>> complete normally.
>>>>> 
>>>>> 1. After changing 1.0.1 to 1.1.0 in mahout-1.0-SNAPSHOT/pom.xml, Spark 
>>>>> version compatibility is no longer a problem, but the program fails:
>>>>> --------------------------------------------------------------
>>>>> 14/09/30 11:26:04 WARN scheduler.TaskSetManager: Lost task 1.0 in stage 
>>>>> 10.1 (TID 31, Hadoop.Slave1): java.lang.NoClassDefFoundError:  
>>>>>    org/apache/commons/math3/random/RandomGenerator
>>>>>    org.apache.mahout.common.RandomUtils.getRandom(RandomUtils.java:65)
>>>>>    
>>>>> org.apache.mahout.math.cf.SimilarityAnalysis$$anonfun$3.apply(SimilarityAnalysis.scala:228)
>>>>>    
>>>>> org.apache.mahout.math.cf.SimilarityAnalysis$$anonfun$3.apply(SimilarityAnalysis.scala:223)
>>>>>    
>>>>> org.apache.mahout.sparkbindings.blas.MapBlock$$anonfun$1.apply(MapBlock.scala:33)
>>>>>    
>>>>> org.apache.mahout.sparkbindings.blas.MapBlock$$anonfun$1.apply(MapBlock.scala:32)
>>>>>    scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
>>>>>    scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
>>>>>    
>>>>> org.apache.spark.storage.MemoryStore.unrollSafely(MemoryStore.scala:235)
>>>>>    org.apache.spark.CacheManager.putInBlockManager(CacheManager.scala:163)
>>>>>    org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:70)
>>>>>    org.apache.spark.rdd.RDD.iterator(RDD.scala:227)
>>>>>    org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
>>>>>    org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
>>>>>    org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
>>>>>    
>>>>> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
>>>>>    org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
>>>>>    org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
>>>>>    org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62)
>>>>>    org.apache.spark.scheduler.Task.run(Task.scala:54)
>>>>>    org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:177)
>>>>>    
>>>>> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>>>>>    
>>>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>>>>>    java.lang.Thread.run(Thread.java:662)
>>>>> --------------------------------------------------------------
>>>>> I tried adding commons-math3-3.2.jar to mahout-1.0-SNAPSHOT/lib, but 
>>>>> the error is still the same. (RandomGenerator is not used directly at 
>>>>> RandomUtils.java:65.)
>>>>> 
>>>>> 
>>>>> 2. After changing 1.0.1 to 1.0.2 in mahout-1.0-SNAPSHOT/pom.xml, there 
>>>>> are still other errors:
>>>>> --------------------------------------------------------------
>>>>> 14/09/30 14:36:57 WARN scheduler.TaskSetManager: Lost TID 427 (task 
>>>>> 7.0:51)
>>>>> 14/09/30 14:36:57 WARN scheduler.TaskSetManager: Loss was due to 
>>>>> java.lang.ClassCastException
>>>>> java.lang.ClassCastException: scala.Tuple1 cannot be cast to scala.Tuple2
>>>>>    at 
>>>>> org.apache.mahout.drivers.TDIndexedDatasetReader$$anonfun$4.apply(TextDelimitedReaderWriter.scala:75)
>>>>>    at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
>>>>>    at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
>>>>>    at org.apache.spark.Aggregator.combineValuesByKey(Aggregator.scala:59)
>>>>>    at 
>>>>> org.apache.spark.rdd.PairRDDFunctions$$anonfun$1.apply(PairRDDFunctions.scala:96)
>>>>>    at 
>>>>> org.apache.spark.rdd.PairRDDFunctions$$anonfun$1.apply(PairRDDFunctions.scala:95)
>>>>>    at org.apache.spark.rdd.RDD$$anonfun$15.apply(RDD.scala:594)
>>>>>    at org.apache.spark.rdd.RDD$$anonfun$15.apply(RDD.scala:594)
>>>>>    at 
>>>>> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
>>>>>    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
>>>>>    at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
>>>>>    at 
>>>>> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:158)
>>>>>    at 
>>>>> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
>>>>>    at org.apache.spark.scheduler.Task.run(Task.scala:51)
>>>>>    at 
>>>>> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:183)
>>>>>    at 
>>>>> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>>>>>    at 
>>>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>>>>>    at java.lang.Thread.run(Thread.java:662)
>>>>> --------------------------------------------------------------
>>>>> Please refer to the attachment for full log.
>>>>> <screenlog_bash.log>
>>>>> 
>>>>> 
>>>>> 
>>>>> In addition, I used 66 files on HDFS, each 20 to 30 MB; I will provide 
>>>>> the data if necessary.
>>>>> The command is: mahout spark-itemsimilarity -i 
>>>>> /rec/input/ss/others,/rec/input/ss/weblog -o /rec/output/ss -os -ma 
>>>>> spark://recommend1:7077 -sem 4g -f1 purchase -f2 view -ic 2 -fc 1
>>>>> Spark cluster: 8 workers, 32 cores total, 32G memory total, on two 
>>>>> machines.
>>>>> 
>>>>> After a few days without a solution, I feel it may be better to wait 
>>>>> for the Mahout 1.0 release or to use Mahout item similarity.
>>>>> 
>>>>> 
>>>>> Thank you again, Pat.
>>>>> 
>>>>> 
>>>>> On Sep 29, 2014, at 00:02, Pat Ferrel <[email protected]> wrote:
>>>>> 
>>>>>> It looks like the cluster version of spark-itemsimilarity is never 
>>>>>> accepted by the Spark master. It fails in 
>>>>>> TextDelimitedReaderWriter.scala because all work is using “lazy” 
>>>>>> evaluation and until the write no actual work is done on the Spark 
>>>>>> cluster.
>>>>>> 
>>>>>> However your cluster seems to be working with the Pi example. Therefore 
>>>>>> there must be something wrong with the Mahout build or config. Some 
>>>>>> ideas:
>>>>>> 
>>>>>> 1) Mahout 1.0-SNAPSHOT is targeted for Spark 1.0.1.  However I use 1.0.2 
>>>>>> and it seems to work. You might try changing the version in the pom.xml 
>>>>>> and do a clean build of Mahout. Change the version number in 
>>>>>> mahout/pom.xml
>>>>>> 
>>>>>> mahout/pom.xml
>>>>>> -     <spark.version>1.0.1</spark.version>
>>>>>> +    <spark.version>1.1.0</spark.version>
>>>>>> 
>>>>>> This may not be needed but it is easier than installing Spark 1.0.1.
>>>>>> 
>>>>>> 2) Try installing and building Mahout on all cluster machines. I do this 
>>>>>> so I can run the Mahout spark-shell on any machine but it may be needed. 
>>>>>> The Mahout jars, path setup, and directory structure should be the same 
>>>>>> on all cluster machines.
>>>>>> 
>>>>>> 3) Try making -sem larger. I usually make it as large as I can on the 
>>>>>> cluster and try smaller until it affects performance. The epinions 
>>>>>> dataset that I use for testing on my cluster requires -sem 6g.
>>>>>> 
>>>>>> My cluster has 3 machines with Hadoop 1.2.1 and Spark 1.0.2.  I can try 
>>>>>> running your data through spark-itemsimilarity on my cluster if you can 
>>>>>> share it. I will sign an NDA and destroy it after the test.
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> On Sep 27, 2014, at 5:28 AM, pol <[email protected]> wrote:
>>>>>> 
>>>>>> Hi Pat,
>>>>>>  Thanks for your reply. It still doesn't work normally. I tested it on a 
>>>>>> Spark standalone cluster; I have not tested it on a YARN cluster.
>>>>>> 
>>>>>> First, I tested that the cluster configuration is correct. Info from 
>>>>>> http://Hadoop.Master:8080:
>>>>>> -----------------------------------
>>>>>> URL: spark://Hadoop.Master:7077
>>>>>> Workers: 2
>>>>>> Cores: 4 Total, 0 Used
>>>>>> Memory: 2.0 GB Total, 0.0 B Used
>>>>>> Applications: 0 Running, 1 Completed
>>>>>> Drivers: 0 Running, 0 Completed
>>>>>> Status: ALIVE
>>>>>> ----------------------------------
>>>>>> 
>>>>>> Environment
>>>>>> ----------------------------------
>>>>>> OS: CentOS release 6.5 (Final)
>>>>>> JDK: 1.6.0_45
>>>>>> Mahout: mahout-1.0-SNAPSHOT(mvn -Dhadoop2.version=2.4.1 -DskipTests 
>>>>>> clean package)
>>>>>> Hadoop: 2.4.1
>>>>>> Spark: spark-1.1.0-bin-2.4.1(mvn -Pyarn -Phadoop-2.4 
>>>>>> -Dhadoop.version=2.4.1 -Phive -DskipTests clean package)
>>>>>> ----------------------------------
>>>>>> 
>>>>>> Shell:
>>>>>> spark-submit --class org.apache.spark.examples.SparkPi --master 
>>>>>> spark://Hadoop.Master:7077 --executor-memory 1g --total-executor-cores 2 
>>>>>> /root/spark-examples_2.10-1.1.0.jar 1000
>>>>>> 
>>>>>> It works OK; part of the log for the command:
>>>>>> ----------------------------------
>>>>>> 14/09/19 19:48:00 INFO scheduler.TaskSetManager: Finished task 995.0 in 
>>>>>> stage 0.0 (TID 995) in 17 ms on Hadoop.Slave1 (996/1000)
>>>>>> 14/09/19 19:48:00 INFO scheduler.TaskSetManager: Starting task 998.0 in 
>>>>>> stage 0.0 (TID 998, Hadoop.Slave2, PROCESS_LOCAL, 1225 bytes)
>>>>>> 14/09/19 19:48:00 INFO scheduler.TaskSetManager: Finished task 996.0 in 
>>>>>> stage 0.0 (TID 996) in 20 ms on Hadoop.Slave2 (997/1000)
>>>>>> 14/09/19 19:48:00 INFO scheduler.TaskSetManager: Starting task 999.0 in 
>>>>>> stage 0.0 (TID 999, Hadoop.Slave1, PROCESS_LOCAL, 1225 bytes)
>>>>>> 14/09/19 19:48:00 INFO scheduler.TaskSetManager: Finished task 997.0 in 
>>>>>> stage 0.0 (TID 997) in 27 ms on Hadoop.Slave1 (998/1000)
>>>>>> 14/09/19 19:48:00 INFO scheduler.TaskSetManager: Finished task 998.0 in 
>>>>>> stage 0.0 (TID 998) in 31 ms on Hadoop.Slave2 (999/1000)
>>>>>> 14/09/19 19:48:00 INFO scheduler.TaskSetManager: Finished task 999.0 in 
>>>>>> stage 0.0 (TID 999) in 20 ms on Hadoop.Slave1 (1000/1000)
>>>>>> 14/09/19 19:48:00 INFO scheduler.DAGScheduler: Stage 0 (reduce at 
>>>>>> SparkPi.scala:35) finished in 25.109 s
>>>>>> 14/09/19 19:48:00 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 0.0, 
>>>>>> whose tasks have all completed, from pool
>>>>>> 14/09/19 19:48:00 INFO spark.SparkContext: Job finished: reduce at 
>>>>>> SparkPi.scala:35, took 26.156022565 s
>>>>>> Pi is roughly 3.14156112
>>>>>> ----------------------------------
>>>>>> 
>>>>>> Second, I tested spark-itemsimilarity on "local"; it works OK. Command:
>>>>>> mahout spark-itemsimilarity -i /test/ss/input/data.txt -o 
>>>>>> /test/ss/output -os -ma local[2] -sem 512m -f1 purchase -f2 view -ic 2 
>>>>>> -fc 1
>>>>>> 
>>>>>> Third, I tested spark-itemsimilarity on the cluster. Command:
>>>>>> mahout spark-itemsimilarity -i /test/ss/input/data.txt -o 
>>>>>> /test/ss/output -os -ma spark://Hadoop.Master:7077 -sem 512m -f1 
>>>>>> purchase -f2 view -ic 2 -fc 1
>>>>>> 
>>>>>> It doesn’t work; full logs:
>>>>>> ----------------------------------
>>>>>> MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath.
>>>>>> SLF4J: Class path contains multiple SLF4J bindings.
>>>>>> SLF4J: Found binding in 
>>>>>> [jar:file:/usr/mahout-1.0-SNAPSHOT/mrlegacy/target/mahout-mrlegacy-1.0-SNAPSHOT-job.jar!/org/slf4j/impl/StaticLoggerBinder.class]
>>>>>> SLF4J: Found binding in 
>>>>>> [jar:file:/usr/mahout-1.0-SNAPSHOT/spark/target/mahout-spark_2.10-1.0-SNAPSHOT-job.jar!/org/slf4j/impl/StaticLoggerBinder.class]
>>>>>> SLF4J: Found binding in 
>>>>>> [jar:file:/usr/spark-1.1.0-bin-2.4.1/lib/spark-assembly-1.1.0-hadoop2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
>>>>>> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an 
>>>>>> explanation.
>>>>>> SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
>>>>>> 14/09/19 20:31:07 INFO spark.SecurityManager: Changing view acls to: root
>>>>>> 14/09/19 20:31:07 INFO spark.SecurityManager: SecurityManager: 
>>>>>> authentication disabled; ui acls disabled; users with view permissions: 
>>>>>> Set(root)
>>>>>> 14/09/19 20:31:08 INFO slf4j.Slf4jLogger: Slf4jLogger started
>>>>>> 14/09/19 20:31:08 INFO Remoting: Starting remoting
>>>>>> 14/09/19 20:31:08 INFO Remoting: Remoting started; listening on 
>>>>>> addresses :[akka.tcp://[email protected]:47597]
>>>>>> 14/09/19 20:31:08 INFO Remoting: Remoting now listens on addresses: 
>>>>>> [akka.tcp://[email protected]:47597]
>>>>>> 14/09/19 20:31:08 INFO spark.SparkEnv: Registering MapOutputTracker
>>>>>> 14/09/19 20:31:08 INFO spark.SparkEnv: Registering BlockManagerMaster
>>>>>> 14/09/19 20:31:08 INFO storage.DiskBlockManager: Created local directory 
>>>>>> at /tmp/spark-local-20140919203108-e4e3
>>>>>> 14/09/19 20:31:08 INFO storage.MemoryStore: MemoryStore started with 
>>>>>> capacity 2.3 GB.
>>>>>> 14/09/19 20:31:08 INFO network.ConnectionManager: Bound socket to port 
>>>>>> 47186 with id = ConnectionManagerId(Hadoop.Master,47186)
>>>>>> 14/09/19 20:31:08 INFO storage.BlockManagerMaster: Trying to register 
>>>>>> BlockManager
>>>>>> 14/09/19 20:31:08 INFO storage.BlockManagerInfo: Registering block 
>>>>>> manager Hadoop.Master:47186 with 2.3 GB RAM
>>>>>> 14/09/19 20:31:08 INFO storage.BlockManagerMaster: Registered 
>>>>>> BlockManager
>>>>>> 14/09/19 20:31:08 INFO spark.HttpServer: Starting HTTP Server
>>>>>> 14/09/19 20:31:08 INFO server.Server: jetty-8.y.z-SNAPSHOT
>>>>>> 14/09/19 20:31:08 INFO server.AbstractConnector: Started 
>>>>>> [email protected]:41116
>>>>>> 14/09/19 20:31:08 INFO broadcast.HttpBroadcast: Broadcast server started 
>>>>>> at http://192.168.204.128:41116
>>>>>> 14/09/19 20:31:08 INFO spark.HttpFileServer: HTTP File server directory 
>>>>>> is /tmp/spark-10744709-bbeb-4d79-8bfe-d64d77799fb3
>>>>>> 14/09/19 20:31:08 INFO spark.HttpServer: Starting HTTP Server
>>>>>> 14/09/19 20:31:08 INFO server.Server: jetty-8.y.z-SNAPSHOT
>>>>>> 14/09/19 20:31:08 INFO server.AbstractConnector: Started 
>>>>>> [email protected]:59137
>>>>>> 14/09/19 20:31:09 INFO server.Server: jetty-8.y.z-SNAPSHOT
>>>>>> 14/09/19 20:31:09 INFO server.AbstractConnector: Started 
>>>>>> [email protected]:4040
>>>>>> 14/09/19 20:31:09 INFO ui.SparkUI: Started SparkUI at 
>>>>>> http://Hadoop.Master:4040
>>>>>> 14/09/19 20:31:10 WARN util.NativeCodeLoader: Unable to load 
>>>>>> native-hadoop library for your platform... using builtin-java classes 
>>>>>> where applicable
>>>>>> 14/09/19 20:31:10 INFO spark.SparkContext: Added JAR 
>>>>>> /usr/mahout-1.0-SNAPSHOT/math-scala/target/mahout-math-scala_2.10-1.0-SNAPSHOT.jar
>>>>>>  at 
>>>>>> http://192.168.204.128:59137/jars/mahout-math-scala_2.10-1.0-SNAPSHOT.jar
>>>>>>  with timestamp 1411129870562
>>>>>> 14/09/19 20:31:10 INFO spark.SparkContext: Added JAR 
>>>>>> /usr/mahout-1.0-SNAPSHOT/mrlegacy/target/mahout-mrlegacy-1.0-SNAPSHOT.jar
>>>>>>  at http://192.168.204.128:59137/jars/mahout-mrlegacy-1.0-SNAPSHOT.jar 
>>>>>> with timestamp 1411129870588
>>>>>> 14/09/19 20:31:10 INFO spark.SparkContext: Added JAR 
>>>>>> /usr/mahout-1.0-SNAPSHOT/math/target/mahout-math-1.0-SNAPSHOT.jar at 
>>>>>> http://192.168.204.128:59137/jars/mahout-math-1.0-SNAPSHOT.jar with 
>>>>>> timestamp 1411129870612
>>>>>> 14/09/19 20:31:10 INFO spark.SparkContext: Added JAR 
>>>>>> /usr/mahout-1.0-SNAPSHOT/spark/target/mahout-spark_2.10-1.0-SNAPSHOT.jar 
>>>>>> at http://192.168.204.128:59137/jars/mahout-spark_2.10-1.0-SNAPSHOT.jar 
>>>>>> with timestamp 1411129870618
>>>>>> 14/09/19 20:31:10 INFO spark.SparkContext: Added JAR 
>>>>>> /usr/mahout-1.0-SNAPSHOT/math-scala/target/mahout-math-scala_2.10-1.0-SNAPSHOT.jar
>>>>>>  at 
>>>>>> http://192.168.204.128:59137/jars/mahout-math-scala_2.10-1.0-SNAPSHOT.jar
>>>>>>  with timestamp 1411129870620
>>>>>> 14/09/19 20:31:10 INFO spark.SparkContext: Added JAR 
>>>>>> /usr/mahout-1.0-SNAPSHOT/mrlegacy/target/mahout-mrlegacy-1.0-SNAPSHOT.jar
>>>>>>  at http://192.168.204.128:59137/jars/mahout-mrlegacy-1.0-SNAPSHOT.jar 
>>>>>> with timestamp 1411129870631
>>>>>> 14/09/19 20:31:10 INFO spark.SparkContext: Added JAR 
>>>>>> /usr/mahout-1.0-SNAPSHOT/math/target/mahout-math-1.0-SNAPSHOT.jar at 
>>>>>> http://192.168.204.128:59137/jars/mahout-math-1.0-SNAPSHOT.jar with 
>>>>>> timestamp 1411129870644
>>>>>> 14/09/19 20:31:10 INFO spark.SparkContext: Added JAR 
>>>>>> /usr/mahout-1.0-SNAPSHOT/spark/target/mahout-spark_2.10-1.0-SNAPSHOT.jar 
>>>>>> at http://192.168.204.128:59137/jars/mahout-spark_2.10-1.0-SNAPSHOT.jar 
>>>>>> with timestamp 1411129870647
>>>>>> 14/09/19 20:31:10 INFO client.AppClient$ClientActor: Connecting to 
>>>>>> master spark://Hadoop.Master:7077...
>>>>>> 14/09/19 20:31:13 INFO storage.MemoryStore: ensureFreeSpace(86126) 
>>>>>> called with curMem=0, maxMem=2491102003
>>>>>> 14/09/19 20:31:13 INFO storage.MemoryStore: Block broadcast_0 stored as 
>>>>>> values to memory (estimated size 84.1 KB, free 2.3 GB)
>>>>>> 14/09/19 20:31:13 INFO mapred.FileInputFormat: Total input paths to 
>>>>>> process : 1
>>>>>> 14/09/19 20:31:13 INFO spark.SparkContext: Starting job: collect at 
>>>>>> TextDelimitedReaderWriter.scala:74
>>>>>> 14/09/19 20:31:13 INFO scheduler.DAGScheduler: Registering RDD 7 
>>>>>> (distinct at TextDelimitedReaderWriter.scala:74)
>>>>>> 14/09/19 20:31:13 INFO scheduler.DAGScheduler: Got job 0 (collect at 
>>>>>> TextDelimitedReaderWriter.scala:74) with 2 output partitions 
>>>>>> (allowLocal=false)
>>>>>> 14/09/19 20:31:13 INFO scheduler.DAGScheduler: Final stage: Stage 
>>>>>> 0(collect at TextDelimitedReaderWriter.scala:74)
>>>>>> 14/09/19 20:31:13 INFO scheduler.DAGScheduler: Parents of final stage: 
>>>>>> List(Stage 1)
>>>>>> 14/09/19 20:31:13 INFO scheduler.DAGScheduler: Missing parents: 
>>>>>> List(Stage 1)
>>>>>> 14/09/19 20:31:14 INFO scheduler.DAGScheduler: Submitting Stage 1 
>>>>>> (MapPartitionsRDD[7] at distinct at TextDelimitedReaderWriter.scala:74), 
>>>>>> which has no missing parents
>>>>>> 14/09/19 20:31:14 INFO scheduler.DAGScheduler: Submitting 2 missing 
>>>>>> tasks from Stage 1 (MapPartitionsRDD[7] at distinct at 
>>>>>> TextDelimitedReaderWriter.scala:74)
>>>>>> 14/09/19 20:31:14 INFO scheduler.TaskSchedulerImpl: Adding task set 1.0 
>>>>>> with 2 tasks
>>>>>> 14/09/19 20:31:29 WARN scheduler.TaskSchedulerImpl: Initial job has not 
>>>>>> accepted any resources; check your cluster UI to ensure that workers are 
>>>>>> registered and have sufficient memory
>>>>>> 14/09/19 20:31:30 INFO client.AppClient$ClientActor: Connecting to 
>>>>>> master spark://Hadoop.Master:7077...
>>>>>> 14/09/19 20:31:44 WARN scheduler.TaskSchedulerImpl: Initial job has not 
>>>>>> accepted any resources; check your cluster UI to ensure that workers are 
>>>>>> registered and have sufficient memory
>>>>>> 14/09/19 20:31:50 INFO client.AppClient$ClientActor: Connecting to 
>>>>>> master spark://Hadoop.Master:7077...
>>>>>> 14/09/19 20:31:59 WARN scheduler.TaskSchedulerImpl: Initial job has not 
>>>>>> accepted any resources; check your cluster UI to ensure that workers are 
>>>>>> registered and have sufficient memory
>>>>>> 14/09/19 20:32:10 ERROR cluster.SparkDeploySchedulerBackend: Application 
>>>>>> has been killed. Reason: All masters are unresponsive! Giving up.
>>>>>> 14/09/19 20:32:10 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 1.0, 
>>>>>> whose tasks have all completed, from pool
>>>>>> 14/09/19 20:32:10 INFO scheduler.TaskSchedulerImpl: Cancelling stage 1
>>>>>> 14/09/19 20:32:10 INFO scheduler.DAGScheduler: Failed to run collect at 
>>>>>> TextDelimitedReaderWriter.scala:74
>>>>>> Exception in thread "main" org.apache.spark.SparkException: Job aborted 
>>>>>> due to stage failure: All masters are unresponsive! Giving up.
>>>>>> at 
>>>>>> org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1044)
>>>>>> at 
>>>>>> org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1028)
>>>>>> at 
>>>>>> org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1026)
>>>>>> at 
>>>>>> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>>>>>> at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>>>>>> at 
>>>>>> org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1026)
>>>>>> at 
>>>>>> org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:634)
>>>>>> at 
>>>>>> org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:634)
>>>>>> at scala.Option.foreach(Option.scala:236)
>>>>>> at 
>>>>>> org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:634)
>>>>>> at 
>>>>>> org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1229)
>>>>>> at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
>>>>>> at akka.actor.ActorCell.invoke(ActorCell.scala:456)
>>>>>> at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
>>>>>> at akka.dispatch.Mailbox.run(Mailbox.scala:219)
>>>>>> at 
>>>>>> akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
>>>>>> at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>>>>>> at 
>>>>>> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>>>>>> at 
>>>>>> scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>>>>>> at 
>>>>>> scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
>>>>>> 14/09/19 20:32:10 INFO handler.ContextHandler: stopped 
>>>>>> o.e.j.s.ServletContextHandler{/metrics/json,null}
>>>>>> 14/09/19 20:32:10 INFO handler.ContextHandler: stopped 
>>>>>> o.e.j.s.ServletContextHandler{/stages/stage/kill,null}
>>>>>> 14/09/19 20:32:10 INFO handler.ContextHandler: stopped 
>>>>>> o.e.j.s.ServletContextHandler{/,null}
>>>>>> 14/09/19 20:32:10 INFO handler.ContextHandler: stopped 
>>>>>> o.e.j.s.ServletContextHandler{/static,null}
>>>>>> 14/09/19 20:32:10 INFO handler.ContextHandler: stopped 
>>>>>> o.e.j.s.ServletContextHandler{/executors/json,null}
>>>>>> 14/09/19 20:32:10 INFO handler.ContextHandler: stopped 
>>>>>> o.e.j.s.ServletContextHandler{/executors,null}
>>>>>> 14/09/19 20:32:10 INFO handler.ContextHandler: stopped 
>>>>>> o.e.j.s.ServletContextHandler{/environment/json,null}
>>>>>> 14/09/19 20:32:10 INFO handler.ContextHandler: stopped 
>>>>>> o.e.j.s.ServletContextHandler{/environment,null}
>>>>>> 14/09/19 20:32:10 INFO handler.ContextHandler: stopped 
>>>>>> o.e.j.s.ServletContextHandler{/storage/rdd/json,null}
>>>>>> 14/09/19 20:32:10 INFO handler.ContextHandler: stopped 
>>>>>> o.e.j.s.ServletContextHandler{/storage/rdd,null}
>>>>>> 14/09/19 20:32:10 INFO handler.ContextHandler: stopped 
>>>>>> o.e.j.s.ServletContextHandler{/storage/json,null}
>>>>>> 14/09/19 20:32:10 INFO handler.ContextHandler: stopped 
>>>>>> o.e.j.s.ServletContextHandler{/storage,null}
>>>>>> 14/09/19 20:32:10 INFO handler.ContextHandler: stopped 
>>>>>> o.e.j.s.ServletContextHandler{/stages/pool/json,null}
>>>>>> 14/09/19 20:32:10 INFO handler.ContextHandler: stopped 
>>>>>> o.e.j.s.ServletContextHandler{/stages/pool,null}
>>>>>> 14/09/19 20:32:10 INFO handler.ContextHandler: stopped 
>>>>>> o.e.j.s.ServletContextHandler{/stages/stage/json,null}
>>>>>> 14/09/19 20:32:10 INFO handler.ContextHandler: stopped 
>>>>>> o.e.j.s.ServletContextHandler{/stages/stage,null}
>>>>>> 14/09/19 20:32:10 INFO handler.ContextHandler: stopped 
>>>>>> o.e.j.s.ServletContextHandler{/stages/json,null}
>>>>>> 14/09/19 20:32:10 INFO handler.ContextHandler: stopped 
>>>>>> o.e.j.s.ServletContextHandler{/stages,null}
>>>>>> ----------------------------------
>>>>>> 
>>>>>> Thanks.
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> On Sep 27, 2014, at 01:05, Pat Ferrel <[email protected]> wrote:
>>>>>> 
>>>>>>> Any luck with this?
>>>>>>> 
>>>>>>> If not could you send a full stack trace and check on the cluster 
>>>>>>> machines for other logs that might help.
>>>>>>> 
>>>>>>> 
>>>>>>> On Sep 25, 2014, at 6:34 AM, Pat Ferrel <[email protected]> wrote:
>>>>>>> 
>>>>>>> Looks like a Spark error as far as I can tell. This error is very 
>>>>>>> generic and indicates that the job was not accepted for execution so 
>>>>>>> Spark may be configured wrong. This looks like a question for the Spark 
>>>>>>> people.
>>>>>>> 
>>>>>>> My Spark sanity check:
>>>>>>> 
>>>>>>> 1) In the Spark UI at http://Hadoop.Master:8080 does everything look 
>>>>>>> correct?
>>>>>>> 2) Have you tested your spark *cluster* with one of their examples? 
>>>>>>> Have you run *any non-Mahout* code on the cluster to check that it is 
>>>>>>> configured properly? 
>>>>>>> 3) Are you using exactly the same Spark and Hadoop locally as on the 
>>>>>>> cluster? 
>>>>>>> 4) Did you launch both local and cluster jobs from the same cluster 
>>>>>>> machine? The only difference being the master URL (local[2] vs. 
>>>>>>> spark://Hadoop.Master:7077)?
>>>>>>> 
>>>>>>> 14/09/22 04:12:47 WARN scheduler.TaskSchedulerImpl: Initial job has not 
>>>>>>> accepted any resources; check your cluster UI to ensure that workers 
>>>>>>> are registered and have sufficient memory
>>>>>>> 14/09/22 04:12:49 INFO client.AppClient$ClientActor: Connecting to 
>>>>>>> master spark://Hadoop.Master:7077...
>>>>>>> 
>>>>>>> 
>>>>>>> On Sep 24, 2014, at 8:18 PM, pol <[email protected]> wrote:
>>>>>>> 
>>>>>>> Hi, Pat
>>>>>>>         The dataset is the same, and it is very small, just for testing. 
>>>>>>> Is this a bug?
>>>>>>> 
>>>>>>> 
>>>>>>> On Sep 25, 2014, at 02:57, Pat Ferrel <[email protected]> wrote:
>>>>>>> 
>>>>>>>> Are you using different data sets on the local and cluster?
>>>>>>>> 
>>>>>>>> Try increasing spark memory with -sem, I use -sem 6g for the epinions 
>>>>>>>> data set.
>>>>>>>> 
>>>>>>>> The ID dictionaries are kept in-memory on each cluster machine so a 
>>>>>>>> large number of user or item IDs will need more memory.
>>>>>>>> 
>>>>>>>> 
>>>>>>>> On Sep 24, 2014, at 9:31 AM, pol <[email protected]> wrote:
>>>>>>>> 
>>>>>>>> Hi, All
>>>>>>>>        
>>>>>>>>        I’m sure that launching on the Spark standalone cluster itself 
>>>>>>>> works, but it doesn’t work for spark-itemsimilarity.
>>>>>>>> 
>>>>>>>>        Launching on 'local' works:
>>>>>>>> mahout spark-itemsimilarity -i /user/root/test/input/data.txt -o 
>>>>>>>> /user/root/test/output -os -ma local[2] -f1 purchase -f2 view -ic 2 
>>>>>>>> -fc 1 -sem 1g
>>>>>>>> 
>>>>>>>>        but launching on a standalone cluster produces an error:
>>>>>>>> mahout spark-itemsimilarity -i /user/root/test/input/data.txt -o 
>>>>>>>> /user/root/test/output -os -ma spark://Hadoop.Master:7077 -f1 purchase 
>>>>>>>> -f2 view -ic 2 -fc 1 -sem 1g
>>>>>>>> ------------
>>>>>>>> 14/09/22 04:12:47 WARN scheduler.TaskSchedulerImpl: Initial job has 
>>>>>>>> not accepted any resources; check your cluster UI to ensure that 
>>>>>>>> workers are registered and have sufficient memory
>>>>>>>> 14/09/22 04:12:49 INFO client.AppClient$ClientActor: Connecting to 
>>>>>>>> master spark://Hadoop.Master:7077...
>>>>>>>> 14/09/22 04:13:02 WARN scheduler.TaskSchedulerImpl: Initial job has 
>>>>>>>> not accepted any resources; check your cluster UI to ensure that 
>>>>>>>> workers are registered and have sufficient memory
>>>>>>>> 14/09/22 04:13:09 INFO client.AppClient$ClientActor: Connecting to 
>>>>>>>> master spark://Hadoop.Master:7077...
>>>>>>>> 14/09/22 04:13:17 WARN scheduler.TaskSchedulerImpl: Initial job has 
>>>>>>>> not accepted any resources; check your cluster UI to ensure that 
>>>>>>>> workers are registered and have sufficient memory
>>>>>>>> 14/09/22 04:13:29 ERROR cluster.SparkDeploySchedulerBackend: 
>>>>>>>> Application has been killed. Reason: All masters are unresponsive! 
>>>>>>>> Giving up.
>>>>>>>> 14/09/22 04:13:29 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 
>>>>>>>> 1.0, whose tasks have all completed, from pool 
>>>>>>>> 14/09/22 04:13:29 INFO scheduler.TaskSchedulerImpl: Cancelling stage 1
>>>>>>>> 14/09/22 04:13:29 INFO scheduler.DAGScheduler: Failed to run collect 
>>>>>>>> at TextDelimitedReaderWriter.scala:74
>>>>>>>> Exception in thread "main" org.apache.spark.SparkException: Job 
>>>>>>>> aborted due to stage failure: All masters are unresponsive! Giving up.
>>>>>>>>        at 
>>>>>>>> org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1044)
>>>>>>>>        at 
>>>>>>>> org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1028)
>>>>>>>>        at 
>>>>>>>> org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1026)
>>>>>>>>        at 
>>>>>>>> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>>>>>>>>        at 
>>>>>>>> scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>>>>>>>>        at 
>>>>>>>> org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1026)
>>>>>>>>        at 
>>>>>>>> org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:634)
>>>>>>>>        at 
>>>>>>>> org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:634)
>>>>>>>>        at scala.Option.foreach(Option.scala:236)
>>>>>>>>        at 
>>>>>>>> org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:634)
>>>>>>>>        at 
>>>>>>>> org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1229)
>>>>>>>>        at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
>>>>>>>>        at akka.actor.ActorCell.invoke(ActorCell.scala:456)
>>>>>>>>        at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
>>>>>>>>        at akka.dispatch.Mailbox.run(Mailbox.scala:219)
>>>>>>>>        at 
>>>>>>>> akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
>>>>>>>>        at 
>>>>>>>> scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>>>>>>>>        at 
>>>>>>>> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>>>>>>>>        at 
>>>>>>>> scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>>>>>>>>        at 
>>>>>>>> scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
>>>>>>>> ------------
>>>>>>>> 
>>>>>>>> Thanks.
>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>>> 
>>>>> 
>>>>> 
>>>> 
>>>> 
>>> 
>>> 
>> 
>> 
>> 
> 
> 
>
