Hi,
Is there any way to overcome this error? I am running this from the
spark-shell; could that be the cause of the problem?
--
View this message in context:
http://apache-spark-developers-list.1001551.n3.nabble.com/Apache-spark-on-27gb-wikipedia-data-tp6487p6490.html
Sent from the Apache Spark Developers List mailing list archive at Nabble.com.
Hi,
I am using 1 master and 3 slave workers to process 27 GB of Wikipedia
data. The data is tab separated, and every line contains one Wikipedia
page: the title of the page followed by the page contents. I am using a
regular expression to extract links, as mentioned on the site.
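The extraction described above can be sketched in plain Scala. The [[...]] link syntax, the regex, and the helper name are illustrative assumptions, not the poster's actual code; in the spark-shell the helper would typically be applied per record via flatMap over sc.textFile.

```scala
// Sketch: extracting MediaWiki-style [[Target]] / [[Target|label]] links
// from the contents field of one tab-separated line.
val linkPattern = """\[\[([^\]|]+)(?:\|[^\]]*)?\]\]""".r

def extractLinks(tsvLine: String): List[(String, String)] = {
  val fields = tsvLine.split("\t", 2)   // title <TAB> contents
  if (fields.length != 2) List.empty
  else linkPattern.findAllMatchIn(fields(1))
                  .map(m => (fields(0), m.group(1)))
                  .toList
}

val line = "Apache Spark\tSee [[Scala (programming language)|Scala]] and [[Hadoop]]."
extractLinks(line).foreach(println)
```

In the shell this would then be something like `sc.textFile("...").flatMap(extractLinks)` to get an RDD of (page, link) pairs.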
Now I got it to work... well, almost. However, I needed to copy the project/
folder into the spark-standalone folder, because the package build was
failing: it could not find the build properties. After the copy the build
was successful. However, when I run it I get errors, but it still gives me
the output.
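For reference, the missing file was presumably sbt's project/build.properties, which for sbt builds of that era typically just pins the sbt version. A minimal example (the exact version number is an assumption):

```
# project/build.properties -- minimal sketch; the version is illustrative
sbt.version=0.13.5
```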
Thank you. I am trying this now
--
View this message in context:
http://apache-spark-developers-list.1001551.n3.nabble.com/Apache-Spark-running-out-of-the-spark-shell-tp6459p6472.html
Quick question: where should I place your folder? Inside the Spark directory?
My Spark directory is /root/spark, so currently I tried pulling your GitHub
code into /root/spark/spark-examples and modified my home Spark directory in
the Scala code. I copied the sbt folder within the spark-examples folder.
Thank you. Let me try this quickly !
--
View this message in context:
http://apache-spark-developers-list.1001551.n3.nabble.com/Apache-Spark-running-out-of-the-spark-shell-tp6459p6463.html
Thank you for the reply. Have you posted a link where I can follow the
steps?
--
View this message in context:
http://apache-spark-developers-list.1001551.n3.nabble.com/Apache-Spark-running-out-of-the-spark-shell-tp6459p6462.html
Hi,
I have written code that works just about fine in the spark-shell on EC2.
The EC2 script helped me configure my master and worker nodes. Now I want to
run the Scala Spark code outside the interactive shell. How do I go about
doing it?
I was referring to the instructions mentioned here:
htt
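Whatever the linked instructions say, the usual Spark 1.x flow for running outside the shell is to package the app with sbt and launch it with spark-submit. All paths, the class name, and the master URL below are placeholders, not values from this thread:

```shell
# Package the standalone app (assumes a project with build.sbt and
# src/main/scala; Spark 1.x builds were compiled against Scala 2.10).
sbt package

# Submit the resulting jar to the standalone cluster started by the EC2
# scripts; master host, class, and jar path are illustrative.
/root/spark/bin/spark-submit \
  --class com.example.WikipediaJob \
  --master spark://your-ec2-master-host:7077 \
  target/scala-2.10/wikipedia-job_2.10-1.0.jar
```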
Is there a way in Spark to parse the Wikipedia XML dump? It seems like the
Freebase dump is no longer available. Also, does the spark-shell support the
XML loadFile SAX parser that is present in Scala?
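On the scala.xml question: in the Scala 2.10 era scala.xml shipped with the standard library, so it is usable from the spark-shell. Note, though, that scala.xml.XML.loadFile reads the whole document into memory, which is a poor fit for a 27 GB dump; streaming (e.g. scala.xml.pull.XMLEventReader) or splitting the dump into per-page records first is the usual workaround. A minimal sketch with an inline sample, since no real dump path appears in the thread, and the sample structure is a simplified stand-in for the actual <mediawiki> schema:

```scala
import scala.xml.XML

// Tiny stand-in for the Wikipedia dump structure; the real dump has far
// more attributes and nesting than this illustrative skeleton.
val sample =
  """<mediawiki>
    |  <page><title>Apache Spark</title><text>See [[Scala]].</text></page>
    |  <page><title>Scala</title><text>A JVM language.</text></page>
    |</mediawiki>""".stripMargin

val doc    = XML.loadString(sample)
val titles = (doc \\ "page").map(p => (p \ "title").text)

titles.foreach(println)
```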
Thanks
AJ
I am going to perform some test experiments on the Wikipedia dataset using
the Spark framework. I know the Wikipedia dataset might already have been
analyzed, but what are the potential explored/unexplored aspects of Spark
that can be tested and benchmarked on the Wikipedia dataset?
Thanks
AJ