Re: How to submit a job to Spark cluster?

2014-02-20 Thread Nan Zhu
I’m not sure I understand your question correctly. Do you mean you didn’t see the application information in the Spark Web UI even though it generates the expected results? Best, -- Nan Zhu On Thursday, February 20, 2014 at 10:13 AM, Tao Xiao wrote: My application source file

Re: ADD_JARS not working on 0.9

2014-02-20 Thread Nan Zhu
Does someone want to review the fix? https://github.com/apache/incubator-spark/pull/614 Best, -- Nan Zhu On Sunday, February 16, 2014 at 8:37 PM, Nan Zhu wrote: I’m interested in fixing this Can anyone assign the JIRA to me? Best, -- Nan Zhu On Sunday, February

Re: Building Spark reports Could not create directory /usr/local/ims/spark/spark-0.9.0-incubating-bin-hadoop1/assembly/target/streams/compile/$global/$global

2014-02-19 Thread Nan Zhu
Do you have write permission to the directory? -- Nan Zhu On Wednesday, February 19, 2014 at 10:44 AM, Tao Xiao wrote: I downloaded spark-0.9.0-incubating-bin-hadoop1.tgz and extracted it to /usr/local/ims/spark/spark-0.9.0-incubating-bin-hadoop1, then I tried to build it using

Re: question about compiling SimpleApp

2014-02-18 Thread Nan Zhu
Oh, sorry, Andrew, I just made the PR; it’s an error in the site config.yml. Best, -- Nan Zhu On Tuesday, February 18, 2014 at 7:16 PM, Andrew Ash wrote: Dachuan, Where did you find that faulty documentation? I'd like to get it fixed. Thanks! Andrew On Tue, Feb 18, 2014

Re: Question on web UI

2014-02-18 Thread Nan Zhu
The driver runs on the machine where you run a command like ./spark-shell, but in 0.9 you can run an in-cluster driver: http://spark.incubator.apache.org/docs/latest/spark-standalone.html#launching-applications-inside-the-cluster Best, -- Nan Zhu On Tuesday, February 18, 2014 at 10:06 PM

Re: Setting serializer in Spark shell

2014-02-16 Thread Nan Zhu
by myself, but I just modified something similar, I guess it will work Best, -- Nan Zhu On Sunday, February 16, 2014 at 4:22 PM, David Thomas wrote: How can I set the default serializer to Kryo when using the Spark shell?
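
For the 0.8/0.9-era shell, one common approach (a sketch, not verified against this exact thread; the property mechanism is `SPARK_JAVA_OPTS`, and the class name is the 0.9 Kryo serializer) is to set the serializer as a Java system property before launching:

```shell
# Hedged sketch: select Kryo as the default serializer for spark-shell
# via a system property; run from your SPARK_HOME (path assumed).
export SPARK_JAVA_OPTS="-Dspark.serializer=org.apache.spark.serializer.KryoSerializer"
# ./spark-shell   # then launch the shell as usual
```

In later Spark versions the same property is set through `SparkConf` or `spark-defaults.conf` instead.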

Re: ADD_JARS not working on 0.9

2014-02-16 Thread Nan Zhu
I’m interested in fixing this Can anyone assign the JIRA to me? Best, -- Nan Zhu On Sunday, February 16, 2014 at 6:17 PM, Andrew Ash wrote: // cc Patrick, who I think helps with the Amplab Jira Amplab Jira admins, can we make sure that newly-created users have comment

Re: Standalone cluster setup: binding to private IP

2014-02-15 Thread Nan Zhu
by default, so you can either ensure that your default NIC is the one bound to the private IP, or set SPARK_LOCAL_IP to the private address on the worker nodes Best, -- Nan Zhu On Saturday, February 15, 2014 at 3:27 PM, David Thomas wrote: That didn't work. I listed private IPs of the worker
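
The second option can be sketched as a spark-env.sh entry (the address below is a placeholder for each worker's actual private IP):

```shell
# conf/spark-env.sh on each worker node (sketch; 10.0.0.5 is a placeholder).
# SPARK_LOCAL_IP forces the worker to bind this address instead of whatever
# the default NIC resolves to.
export SPARK_LOCAL_IP=10.0.0.5
```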

Re: Standalone cluster setup: binding to private IP

2014-02-15 Thread Nan Zhu
Hi, which NIC is the default depends on your default gateway setup Best, -- Nan Zhu On Saturday, February 15, 2014 at 3:55 PM, David Thomas wrote: Thanks for your prompt reply. ensure that your default NIC is the one binding with private IP Can you give me some pointers on how

Re: Standalone cluster setup: binding to private IP

2014-02-15 Thread Nan Zhu
Did you set any env variable to the public IP, like SPARK_LOCAL_IP? The public IP appears again…. Best, -- Nan Zhu On Saturday, February 15, 2014 at 7:46 PM, David Thomas wrote: I've finally been able to set up the cluster. I can access the cluster URL and see all the workers registered

Re: spark 0.9.0 sbt build [error] Nonzero exit code (128)

2014-02-14 Thread Nan Zhu
Can you run this command successfully? "git clone git://github.com/ijuma/junit_xml_listener.git" -- Nan Zhu On Friday, February 14, 2014 at 4:24 PM, srikanth wrote: Hi, here is the log. I get the same error for cleanup also. Thanks

Yarn configuration file doesn't work when run with yarn-client mode

2014-02-11 Thread Nan Zhu
) is it a bug? Best, -- Nan Zhu

Re: Which of the hadoop file formats are supported by Spark ?

2014-01-18 Thread Nan Zhu
Hi, text and seq are definitely supported; you can check http://spark.incubator.apache.org/docs/latest/api/core/index.html#org.apache.spark.SparkContext I don't think other types have been considered...anyone correct me? Best, Nan On Sun, Jan 19, 2014 at 12:47 AM, Manoj Samel

Re: master attempted to re-register the worker and then took all workers as unregistered

2014-01-15 Thread Nan Zhu
sending the heartbeat but gets no reply; finally all workers are dead… but obviously it should not work this way: the problematic application code should not make all workers dead. I’m checking the source code to find the reason Best, -- Nan Zhu On Tuesday, January 14, 2014 at 8:53 PM, Nan

Re: Please help: change $SPARK_HOME/work directory for spark applications

2014-01-15 Thread Nan Zhu
Hi, Jin It’s SPARK_WORKER_DIR. Line 48 of WorkerArguments.scala: if (System.getenv("SPARK_WORKER_DIR") != null) { workDir = System.getenv("SPARK_WORKER_DIR") } Best, -- Nan Zhu On Wednesday, January 15, 2014 at 2:03 PM, Chen Jin wrote: Hi, Currently my application jars and logs
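
A minimal spark-env.sh sketch for this (the path is a placeholder; SPARK_WORKER_DIR replaces the default $SPARK_HOME/work on each worker):

```shell
# conf/spark-env.sh on each worker (sketch; /data/spark-work is a placeholder).
# SPARK_WORKER_DIR is where the worker puts each application's jars and logs.
export SPARK_WORKER_DIR=/data/spark-work
```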

Re: Master and worker nodes in standalone deployment

2014-01-15 Thread Nan Zhu
you can start a worker process on the master node so that all nodes in your cluster can participate in the computation Best, -- Nan Zhu On Wednesday, January 15, 2014 at 11:32 PM, Manoj Samel wrote: When spark is deployed on cluster in standalone deployment mode (V 0.81), one
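
One way to do this (a sketch; hostnames are placeholders) is to list the master's own hostname in conf/slaves, since the start-all.sh script launches a worker on every host listed there:

```
# conf/slaves (sketch; hostnames are placeholders). Including the master's
# hostname here makes start-all.sh start a worker on the master node too.
master.example.com
worker1.example.com
worker2.example.com
```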

Re: Master and worker nodes in standalone deployment

2014-01-15 Thread Nan Zhu
it maintains the running worker processes, creates executors for the tasks on the worker nodes, communicates with the driver program, etc. -- Nan Zhu On Wednesday, January 15, 2014 at 11:37 PM, Manoj Samel wrote: Thanks, Could you still explain what the master process does? On Wed, Jan 15

master attempted to re-register the worker and then took all workers as unregistered

2014-01-14 Thread Nan Zhu
Thank you very much! Best, -- Nan Zhu

confusion on RDD usage in MatrixFactorizationModel (master branch)

2014-01-08 Thread Nan Zhu
(productVector)) } } it seems that the author can directly call join on an RDD object? Is it a new feature in the next version? I’m used to creating a PairRDDFunctions from the current RDD and then calling join, etc. Did I misunderstand something? Best, -- Nan Zhu

Re: confusion on RDD usage in MatrixFactorizationModel (master branch)

2014-01-08 Thread Nan Zhu
ignore that: These operations are automatically available on any RDD of the right type (e.g. RDD[(Int, Int)]) through implicit conversions when you `import org.apache.spark.SparkContext._`. -- Nan Zhu On Wednesday, January 8, 2014 at 10:38 AM, Nan Zhu wrote: Hi, all I’m
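
The mechanism behind that doc comment can be sketched in plain Scala without Spark (all names here — MiniRDD, MiniPairFunctions — are made up for illustration; the real counterparts are RDD, PairRDDFunctions, and SparkContext.rddToPairRDDFunctions):

```scala
import scala.language.implicitConversions

// A toy container standing in for RDD[T].
class MiniRDD[T](val data: Seq[T])

// Analogous to PairRDDFunctions: extra operations defined only for
// containers whose elements are key-value tuples.
class MiniPairFunctions[K, V](self: MiniRDD[(K, V)]) {
  def join[W](other: MiniRDD[(K, W)]): MiniRDD[(K, (V, W))] = {
    val rightByKey = other.data.groupBy(_._1)
    new MiniRDD(for {
      (k, v) <- self.data
      (_, w) <- rightByKey.getOrElse(k, Seq.empty)
    } yield (k, (v, w)))
  }
}

// The implicit conversion, analogous to what
// `import org.apache.spark.SparkContext._` brings into scope.
implicit def toPairFunctions[K, V](rdd: MiniRDD[(K, V)]): MiniPairFunctions[K, V] =
  new MiniPairFunctions(rdd)

val users  = new MiniRDD(Seq((1, "alice"), (2, "bob")))
val scores = new MiniRDD(Seq((1, 0.5), (2, 0.9)))
// join is not defined on MiniRDD itself; the implicit conversion supplies it.
println(users.join(scores).data)
```

So `join` appears to be a method of the RDD only because the compiler silently wraps the RDD when the element type matches `(K, V)`.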

Re: is it forgotten to document how to set SPARK_WORKER_DIR?

2014-01-07 Thread Nan Zhu
In worker machines, you will find $SPARK_HOME/work/; that’s the worker dir -- Nan Zhu On Tuesday, January 7, 2014 at 7:29 AM, Archit Thakur wrote: What do you mean by worker dir? On Tue, Jan 7, 2014 at 11:43 AM, Nan Zhu zhunanmcg...@gmail.com wrote

Re: How to make Spark merge the output file?

2014-01-07 Thread Nan Zhu
Hi, all Thanks for the reply. I actually need to provide a single file to an external system to process it…seems that I have to make the consumer of the file support multiple inputs Best, -- Nan Zhu On Tuesday, January 7, 2014 at 12:37 PM, Aaron Davidson wrote: HDFS, since 0.21
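
When the consumer really cannot take multiple inputs, the partition files can simply be concatenated after the job finishes. A minimal sketch with fabricated paths and contents (for HDFS output, `hadoop fs -getmerge <dir> <localfile>` does the same; inside the job, `rdd.coalesce(1)` forces a single partition at the cost of parallelism):

```shell
# A Spark job writes one part-NNNNN file per partition; the directory and
# file contents below are fabricated for illustration.
mkdir -p out
printf 'line1\n' > out/part-00000
printf 'line2\n' > out/part-00001

# Concatenate the partition files into the single file the consumer expects.
cat out/part-* > merged.txt
cat merged.txt
```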

ship MatrixFactorizationModel with each partition?

2014-01-07 Thread Nan Zhu
= str += (userMoviePair._1 + "," + userMoviePair._2 + "," + result.predict(userMoviePair._1, userMoviePair._2)) + "\n" str }) so the exception seems to be related to how to share the MatrixFactorizationModel in each partition? Can anyone give me a hint Thank you very much! -- Nan

shared variable and ALS in mllib

2014-01-06 Thread Nan Zhu
the product id to each line? Best, -- Nan Zhu

How to make Spark merge the output file?

2014-01-06 Thread Nan Zhu
Hi, all maybe a stupid question, but is there any way to make Spark write a single file instead of partitioned files? Best, -- Nan Zhu

Re: shared variable and ALS in mllib

2014-01-06 Thread Nan Zhu
Thanks Jason, yes, that’s true, but how do I finish the first step? It seems that sc.textFile() has no parameter to achieve the goal; I stored the file on S3 Best, -- Nan Zhu On Monday, January 6, 2014 at 11:27 PM, Jason Dai wrote: If you assign each file to a standalone partition

is it forgotten to document how to set SPARK_WORKER_DIR?

2014-01-06 Thread Nan Zhu
) } is it forgotten to be documented? Best, -- Nan Zhu

how to bind spark-master to the public IP of EC2

2014-01-05 Thread Nan Zhu
Hi, all How can I bind spark-master to the public IP of EC2? I tried to set it in spark-env.sh, but it failed Thank you Best, -- Nan Zhu

debug standalone Spark jobs?

2014-01-05 Thread Nan Zhu
WARNs from log4j; I received the same WARNING when I ran spark-shell, but there I can see detailed information like which task is running, etc. Best, -- Nan Zhu

Re: debug standalone Spark jobs?

2014-01-05 Thread Nan Zhu
Ah, yes, I think application logs really help Thank you -- Nan Zhu On Sunday, January 5, 2014 at 10:13 AM, Sriram Ramachandrasekaran wrote: Did you get to look at the spark worker logs? They would be at SPARK_HOME/logs/ Also, you should look at the application logs itself

Re: how to bind spark-master to the public IP of EC2

2014-01-05 Thread Nan Zhu
, -- Nan Zhu On Sunday, January 5, 2014 at 11:10 AM, Prashant Sharma wrote: Are you sure you did what is said here ? http://spark.incubator.apache.org/docs/latest/spark-standalone.html in case yes, please more details which env you have set ? On Sun, Jan 5, 2014 at 7:15 PM, Nan Zhu
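
A hedged spark-env.sh sketch for this situation (hostnames are placeholders): on EC2 the public IP is NAT-mapped, so the process usually cannot bind it directly, which would explain the failure described above; the common pattern is to bind the internal address and advertise the public hostname:

```shell
# conf/spark-env.sh on the master (sketch; hostnames are placeholders).
export SPARK_MASTER_IP=ip-10-0-0-1.ec2.internal            # bind address (private)
export SPARK_PUBLIC_DNS=ec2-54-0-0-1.compute-1.amazonaws.com  # advertised hostname
```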

www.spark-project.org down?

2014-01-02 Thread Nan Zhu
cannot access from Montreal, Canada Best, -- Nan Zhu

Re: failed to compile spark because of the missing packages

2013-12-23 Thread Nan Zhu
/.gitignore) 3. push files to git repo 4. pull files in the desktop 5. sbt/sbt assembly/assembly, failed with the same error as my last email any further comments? Best, -- Nan Zhu On Monday, December 23, 2013 at 12:22 PM, Patrick Wendell wrote: Hey Nan, You shouldn't copy lib_managed

Re: failed to compile spark because of the missing packages

2013-12-23 Thread Nan Zhu
…. Best, -- Nan Zhu On Monday, December 23, 2013 at 4:12 PM, Nan Zhu wrote: Hi, Patrick Thanks for the reply I still failed to compile the code, even though I made the following attempts 1. download spark-0.8.1.tgz, 2. decompress, and copy the files to the github local repo directory

Re: which line in SparkBuild.scala specifies hadoop-core-xxx.jar?

2013-12-21 Thread Nan Zhu
Hi, Azuryy, I’m working on a MacBook Pro, so it is indeed “Users” Best, -- Nan Zhu On Saturday, December 21, 2013 at 9:31 AM, Azuryy wrote: Hi Nan I think there is a typo here: file:///Users/nanzhu/.m2/repository”), It should be lowercase. Sent from my iPhone

Re: which line in SparkBuild.scala specifies hadoop-core-xxx.jar?

2013-12-20 Thread Nan Zhu
”)) doesn’t work…. b. in 4, the client.jar dependency cannot download core.jar automatically (why?), so I have to add an explicit dependency on core.jar Best, -- Nan Zhu On Monday, December 16, 2013 at 2:41 PM, Gary Malouf wrote: Check out the dependencies for the version of hadoop-client you

Re: which line in SparkBuild.scala specifies hadoop-core-xxx.jar?

2013-12-16 Thread Nan Zhu
-core.jar, but I didn’t find any line specifying hadoop-core-1.0.4.jar in pom.xml or SparkBuild.scala. Can you explain a bit to me? Best, -- Nan Zhu School of Computer Science, McGill University On Monday, December 16, 2013 at 3:58 AM, Azuryy Yu wrote: Hi Nan, I am also using our

Re: which line in SparkBuild.scala specifies hadoop-core-xxx.jar?

2013-12-16 Thread Nan Zhu
Hi, Gary, The page says Spark uses hadoop-client.jar to interact with HDFS, but why does it also download hadoop-core? Do I just need to change the dependency on hadoop-client to my local repo? Best, -- Nan Zhu School of Computer Science, McGill University On Monday, December 16, 2013

Re: set fs.default.name to s3:// but spark still tries to find namenode?

2013-12-15 Thread Nan Zhu
Finally understood it; solved -- Nan Zhu School of Computer Science, McGill University On Sunday, December 15, 2013 at 1:43 AM, Nan Zhu wrote: Hi, all I’m trying to run Spark on EC2 and using S3 as the data storage service, I set fs.default.name

Re: set fs.default.name to s3:// but spark still tries to find namenode?

2013-12-15 Thread Nan Zhu
daemons (? but what’s that, the default port should be 7070?) I didn’t step into the details of spark-ec2, just manually set up a cluster in EC2, and directly passed s3n:// as the input and output paths; everything works now Best, -- Nan Zhu School of Computer Science, McGill University On Sunday
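
As an alternative to embedding credentials in the s3n:// URI, the keys can be supplied through Hadoop configuration. A hedged sketch (the property names are Hadoop's s3n filesystem properties; the values are placeholders):

```xml
<!-- core-site.xml sketch; values are placeholders. With these set, an
     s3n://bucket/path URI can omit the embedded access/secret keys. -->
<configuration>
  <property>
    <name>fs.s3n.awsAccessKeyId</name>
    <value>YOUR_ACCESS_KEY</value>
  </property>
  <property>
    <name>fs.s3n.awsSecretAccessKey</name>
    <value>YOUR_SECRET_KEY</value>
  </property>
</configuration>
```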

which line in SparkBuild.scala specifies hadoop-core-xxx.jar?

2013-12-14 Thread Nan Zhu
lines about yarn jars in SparkBuild.scala. Can you tell me which line I should modify to achieve the goal? Best, -- Nan Zhu School of Computer Science, McGill University

set fs.default.name to s3:// but spark still tries to find namenode?

2013-12-14 Thread Nan Zhu
Hi, all I’m trying to run Spark on EC2 using S3 as the data storage service. I set fs.default.name to s3://myaccessid:mysecreteid@bucketid, and I tried to load a local file with textFile. I found that Spark still tries to find http://mymasterip:9000 I also tried to load a file stored in

some wrong link in Spark Summit web page

2013-12-13 Thread Nan Zhu
Hi, I'm not sure if this is the right place to talk about this; if not, I'm very sorry about that - 9-9:30am The State of Spark, and Where We’re Going Next (http://spark-summit.org/talk/zaharia-the-state-of-spark-and-where-were-going/) – pptx

Re: solution to write data to S3?

2013-10-23 Thread Nan Zhu
Great!!! On Wed, Oct 23, 2013 at 9:21 PM, Matei Zaharia matei.zaha...@gmail.com wrote: Yes, take a look at http://spark.incubator.apache.org/docs/latest/ec2-scripts.html#accessing-data-in-s3 Matei On Oct 23, 2013, at 6:17 PM, Nan Zhu zhunanmcg...@gmail.com wrote: Hi, all Is there any

dynamically resizing Spark cluster

2013-10-23 Thread Nan Zhu
on there, what will happen in Spark? Will it recover those tasks with something like speculative execution, or will the job unfortunately fail? Best, -- Nan Zhu School of Computer Science, McGill University