Re: Apache spark on 27gb wikipedia data

2014-05-05 Thread Ajay Nair
Hi, is there any way to overcome this error? I am running this from the spark-shell; could that be the cause? -- View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/Apache-spark-on-27gb-wikipedia-data-tp6487p6490.html Sent from the Apache Spark Develop

Apache spark on 27gb wikipedia data

2014-05-05 Thread Ajay Nair
Hi, I am using 1 master and 3 worker nodes to process 27 GB of tab-separated Wikipedia data, in which every line holds the information for one page: the page title followed by the page contents. I am using a regular expression to extract links as mentioned in the sit
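
The record layout described in this message (page title, a tab, then page contents) can be sketched outside Spark as below. This is only an illustration under stated assumptions: the regular expression actually used in the post is not shown, so the MediaWiki-style `[[Target|label]]` pattern, the `parse_line` helper, and the sample line are all hypothetical stand-ins.

```python
import re

# Assumed MediaWiki-style link pattern: [[Target]] or [[Target|label]].
# The regular expression from the original post is not shown; this is a stand-in.
LINK_RE = re.compile(r"\[\[([^\]|#]+)")


def parse_line(line):
    """Split one tab-separated record into (title, list of link targets)."""
    # partition() splits on the first tab only, so tabs inside the page
    # contents stay part of the body.
    title, _, body = line.rstrip("\n").partition("\t")
    links = [target.strip() for target in LINK_RE.findall(body)]
    return title, links


# Hypothetical sample record, not taken from the real dataset.
sample = "Apache Spark\tSpark runs on [[Apache Hadoop|Hadoop]] and [[Apache Mesos]]."
title, links = parse_line(sample)
print(title, links)
```

A per-line function of this shape could then be mapped over the lines of the input file in Spark; whether that matches the setup in the post is not known from the snippet.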

Re: Apache Spark running out of the spark shell

2014-05-04 Thread Ajay Nair
I got it to work now, well, almost. I had to copy the project/ folder into the spark-standalone folder because the package build was failing: it could not find the build properties. After the copy, the build succeeded. However, when I run it I get errors, although it still gives me the output.

Re: Apache Spark running out of the spark shell

2014-05-04 Thread Ajay Nair
Thank you. I am trying this now.

Re: Apache Spark running out of the spark shell

2014-05-03 Thread Ajay Nair
Quick question: where should I place your folder? Inside the Spark directory? My Spark directory is /root/spark, so I tried pulling your GitHub code into /root/spark/spark-examples and modified the Spark home directory in the Scala code. I copied the sbt folder within the spark-examples fo

Re: Apache Spark running out of the spark shell

2014-05-03 Thread Ajay Nair
Thank you. Let me try this quickly!

Re: Apache Spark running out of the spark shell

2014-05-03 Thread Ajay Nair
Thank you for the reply. Have you posted a link with steps I can follow?

Apache Spark running out of the spark shell

2014-05-03 Thread Ajay Nair
Hi, I have written code that works just about fine in the Spark shell on EC2. The EC2 script helped me configure my master and worker nodes. Now I want to run the Scala Spark code outside the interactive shell. How do I go about doing that? I was referring to the instructions mentioned here: htt

Parsing wikipedia xml data in Spark

2014-04-26 Thread Ajay Nair
Is there a way in Spark to parse the Wikipedia XML dump? It seems the Freebase dump is no longer available. Also, does the Spark shell support the SAX-based XML file loader that is present in Scala? Thanks, AJ

Spark on wikipedia dataset

2014-04-22 Thread Ajay Nair
I am going to run some experiments on the Wikipedia dataset using the Spark framework. I know the Wikipedia dataset may already have been analyzed, but which explored or unexplored aspects of Spark can be tested and benchmarked on it? Thanks, AJ