Min Li created SPARK-1967:
-----------------------------

             Summary: Using parallelize method to create RDD, wordcount app 
just hanging there without errors or warnings
                 Key: SPARK-1967
                 URL: https://issues.apache.org/jira/browse/SPARK-1967
             Project: Spark
          Issue Type: Bug
    Affects Versions: 0.9.1
         Environment: Ubuntu-12.04, single machine spark standalone, 8 core, 8G 
mem, spark 0.9.1, java-1.7
            Reporter: Min Li


I was trying the parallelize method to create an RDD, using Java. It's a 
simple wordcount program, except that I first read the input into memory and 
then use the parallelize method to create the RDD, rather than the textFile 
method used in the given example. 
Pseudo code:
JavaSparkContext ctx = new JavaSparkContext($SparkMasterURL, $NAME, $SparkHome, 
$jars);
List<String> input = // read lines from input file into an ArrayList<String>
JavaRDD<String> lines = ctx.parallelize(input);
// followed by wordcount
----above is not working.
JavaRDD<String> lines = ctx.textFile(file);
// followed by wordcount
----this is working
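For reference, a minimal standalone sketch of the failing program along the lines of the pseudo code above (class name, master URL, paths, and the inline input are placeholders; written against the pre-1.0 Java API, where map with a PairFunction yields a JavaPairRDD):

```java
import java.util.Arrays;
import java.util.List;

import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.FlatMapFunction;
import org.apache.spark.api.java.function.Function2;
import org.apache.spark.api.java.function.PairFunction;

import scala.Tuple2;

public class ParallelizeWordCount {
    public static void main(String[] args) {
        // Master URL, app name, Spark home, and jar path are placeholders
        JavaSparkContext ctx = new JavaSparkContext(
                "spark://spark:7077", "ParallelizeWordCount",
                "/path/to/spark", new String[]{"/path/to/app.jar"});

        // In the real app these lines are read from a file into an ArrayList
        List<String> input = Arrays.asList("hello world", "hello spark");

        JavaRDD<String> lines = ctx.parallelize(input);        // hangs
        // JavaRDD<String> lines = ctx.textFile("input.txt");  // works

        // Split each line into words
        JavaRDD<String> words = lines.flatMap(
                new FlatMapFunction<String, String>() {
                    public Iterable<String> call(String s) {
                        return Arrays.asList(s.split(" "));
                    }
                });

        // Pair each word with 1, then sum the counts per word
        JavaPairRDD<String, Integer> counts = words.map(
                new PairFunction<String, String, Integer>() {
                    public Tuple2<String, Integer> call(String w) {
                        return new Tuple2<String, Integer>(w, 1);
                    }
                }).reduceByKey(
                new Function2<Integer, Integer, Integer>() {
                    public Integer call(Integer a, Integer b) {
                        return a + b;
                    }
                });

        System.out.println(counts.collect());
    }
}
```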

The log is:
14/05/29 16:18:43 INFO Slf4jLogger: Slf4jLogger started
14/05/29 16:18:43 INFO Remoting: Starting remoting
14/05/29 16:18:43 INFO Remoting: Remoting started; listening on addresses 
:[akka.tcp://spark@spark:55224]
14/05/29 16:18:43 INFO Remoting: Remoting now listens on addresses: 
[akka.tcp://spark@spark:55224]
14/05/29 16:18:43 INFO SparkEnv: Registering BlockManagerMaster
14/05/29 16:18:43 INFO DiskBlockManager: Created local directory at 
/tmp/spark-local-20140529161843-836a
14/05/29 16:18:43 INFO MemoryStore: MemoryStore started with capacity 1056.0 MB.
14/05/29 16:18:43 INFO ConnectionManager: Bound socket to port 42942 with id = 
ConnectionManagerId(spark,42942)
14/05/29 16:18:43 INFO BlockManagerMaster: Trying to register BlockManager
14/05/29 16:18:43 INFO BlockManagerMasterActor$BlockManagerInfo: Registering 
block manager spark:42942 with 1056.0 MB RAM
14/05/29 16:18:43 INFO BlockManagerMaster: Registered BlockManager
14/05/29 16:18:43 INFO HttpServer: Starting HTTP Server
14/05/29 16:18:43 INFO HttpBroadcast: Broadcast server started at 
http://10.227.119.185:43522
14/05/29 16:18:43 INFO SparkEnv: Registering MapOutputTracker
14/05/29 16:18:43 INFO HttpFileServer: HTTP File server directory is 
/tmp/spark-3704a621-789c-4d97-b1fc-9654236dba3e
14/05/29 16:18:43 INFO HttpServer: Starting HTTP Server
14/05/29 16:18:43 INFO SparkUI: Started Spark Web UI at http://spark:4040
14/05/29 16:18:44 INFO SparkContext: Added JAR 
/home/maxmin/tmp/spark-test-1.0-SNAPSHOT-jar-with-dependencies.jar at 
http://10.227.119.185:55286/jars/spark-test-1.0-SNAPSHOT-jar-with-dependencies.jar
 with timestamp 1401394724045
14/05/29 16:18:44 INFO AppClient$ClientActor: Connecting to master 
spark://spark:7077...
14/05/29 16:18:44 INFO SparkDeploySchedulerBackend: Connected to Spark cluster 
with app ID app-20140529161844-0001
14/05/29 16:18:44 INFO AppClient$ClientActor: Executor added: 
app-20140529161844-0001/0 on worker-20140529155406-spark-59658 (spark:59658) 
with 8 cores

The app hangs here forever, and spark:8080 and spark:4040 don't show anything 
strange. The Spark Stages page shows the active stage is reduceByKey, with 
tasks Succeeded/Total at 0/2. I've also tried calling lines.count() directly 
after parallelize, and the app gets stuck at the count stage.

I used spark-0.9.1 with the default spark-env.sh. The slaves file contains 
only one host. I used Maven to compile a fat jar with Spark marked as 
provided, and I modified the run-example script to submit the jar.



--
This message was sent by Atlassian JIRA
(v6.2#6252)