Awesome, thanks
On Sunday, June 22, 2014, Matei Zaharia matei.zaha...@gmail.com wrote:
Alright, added you.
On Jun 20, 2014, at 2:52 PM, Ricky Thomas ri...@truedash.io wrote:
Hi,
Would like to add ourselves to the user list if possible
Right, problem solved, in a most disgraceful manner: just add a package
relocation in the Maven shade config.
The downside is that it is not compatible with my IDE (IntelliJ IDEA); it
causes:
Error:scala.reflect.internal.MissingRequirementError: object scala.runtime
in compiler mirror not found.
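A relocation like the one mentioned can be declared in the maven-shade-plugin configuration. This is a hedged sketch only: the package names (`com.google.common`, `myapp.shaded`) are illustrative placeholders, not necessarily the conflicting package from this thread.

```xml
<!-- Hypothetical example: relocate a conflicting package inside the shaded jar
     so the application's copy cannot clash with Spark's copy. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <executions>
    <execution>
      <phase>package</phase>
      <goals><goal>shade</goal></goals>
      <configuration>
        <relocations>
          <relocation>
            <pattern>com.google.common</pattern>
            <shadedPattern>myapp.shaded.com.google.common</shadedPattern>
          </relocation>
        </relocations>
      </configuration>
    </execution>
  </executions>
</plugin>
```

The shade plugin rewrites the bytecode of the relocated classes, which is why the shaded jar works at runtime while an IDE building from source may still see the original package names.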
Hello, I am new to Scala and Spark. Yesterday I compiled Spark 1.0.0 from
source and ran the tests, and one test case failed.
For example, running this command in the shell: sbt/sbt "testOnly
org.apache.spark.streaming.InputStreamsSuite"
the test case test("socket input stream") would
Hi folks,
I was looking at the benchmark provided by Cloudera at
http://blog.cloudera.com/blog/2014/05/new-sql-choices-in-the-apache-hadoop-ecosystem-why-impala-continues-to-lead/
.
Is it true that Shark cannot execute some queries if you don't have enough
memory?
And is it true/reliable that Impala
For the second question, I would say it is mainly because the projects do
not have the same aim. Impala does have a cost-based optimizer and predicate
propagation capability, which is natural because it interprets
pseudo-SQL queries. In the realm of relational databases, it is often not a
good idea
I've just benchmarked Spark and Impala. Same data (in s3), same query,
same cluster.
Impala has a long load time, since it cannot load directly from s3. I have
to create a Hive table on s3, then insert from that into an Impala table.
This takes a long time; Spark took about 600s for the query,
600s for Spark vs 5s for Redshift... The numbers look very different from
the AMPLab benchmark...
https://amplab.cs.berkeley.edu/benchmark/
Is it something like SSDs that's helping Redshift, or is the whole data
in memory when you run the query? Could you publish the query?
Also after
Hello,
I am looking into a couple of MLlib data files in
https://github.com/apache/spark/tree/master/data/mllib, but I cannot find
any explanation for these files. Does anyone know if they are documented?
Thanks.
Justin
Hi Shuo,
Yes. I was reading the guide as well as the sample code.
For example, in
http://spark.apache.org/docs/latest/mllib-linear-methods.html#linear-support-vector-machine-svm,
nowhere in the GitHub repository can I find the file loaded by sc.textFile(
"mllib/data/ridge-data/lpsa.data").
Thanks.
Justin
These files follow the libsvm format, where each line is a record: the first
column is a label, and the remaining fields are offset:value pairs, where offset
is the offset into the feature vector and value is the value of the input
feature.
This is a fairly efficient representation for sparse
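As a sketch of the format described above, a single record can be parsed like this. The helper name `parse_libsvm_line` is hypothetical (MLlib ships its own loader, `MLUtils.loadLibSVMFile`), and it assumes the conventional 1-based libsvm offsets:

```python
def parse_libsvm_line(line, num_features):
    """Parse 'label offset:value offset:value ...' into (label, dense vector)."""
    parts = line.split()
    label = float(parts[0])
    vec = [0.0] * num_features
    for field in parts[1:]:
        offset, value = field.split(":")
        # libsvm offsets are conventionally 1-based
        vec[int(offset) - 1] = float(value)
    return label, vec

print(parse_libsvm_line("1 1:0.5 3:2.0", 3))  # → (1.0, [0.5, 0.0, 2.0])
```

Only the non-zero features are stored in the file, which is what makes the representation compact for sparse data.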
Hi
Can someone help me with the following error that I faced while setting
up a single-node Spark framework?
karthik@karthik-OptiPlex-9020:~/spark-1.0.0$ MASTER=spark://localhost:7077
sbin/spark-shell
bash: sbin/spark-shell: No such file or directory
karthik@karthik-OptiPlex-9020:~/spark-1.0.0$
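One likely cause of the "No such file or directory" error above: in Spark 1.0.0 the interactive shell lives under bin/, not sbin/ (sbin/ holds cluster daemon scripts such as start-master.sh), so the command would be:

```shell
# spark-shell is under bin/, not sbin/, in the Spark 1.0.0 layout
MASTER=spark://localhost:7077 bin/spark-shell
```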
TL;DR: I want to run a pre-processing step on the data from each partition
(such as parsing) and retain the parsed object on each node for future
processing calls, to avoid repeated parsing.
More detail:
I have a server and two nodes in my cluster, and data partitioned using
hdfs.
I am trying
I see. That's good. Thanks.
Justin
On Sun, Jun 22, 2014 at 4:59 PM, Evan Sparks evan.spa...@gmail.com wrote:
Oh, and the MovieLens one is userid::movieid::rating
- Evan
On Jun 22, 2014, at 3:35 PM, Justin Yip yipjus...@gmail.com wrote:
Hello,
I am looking into a couple of MLLib data
Will using mapPartitions and creating a new RDD of ParsedData objects avoid
multiple parsing?
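Conceptually, yes: mapPartitions hands you each partition's iterator exactly once, so parsing can happen once per partition, and persisting the resulting RDD keeps the parsed objects around for later jobs. A minimal pure-Python analogue of the idea (not the actual Spark API; `map_partitions` and `parse` are illustrative names):

```python
# Pure-Python sketch of RDD.mapPartitions, to illustrate parse-once-per-partition.
def map_partitions(partitions, func):
    """Apply func once to each partition's iterator, like RDD.mapPartitions."""
    return [list(func(iter(part))) for part in partitions]

def parse(line):
    """Parse one raw comma-separated line into a tuple of ints."""
    return tuple(int(x) for x in line.split(","))

partitions = [["1,2", "3,4"], ["5,6"]]
# Each partition's lines are parsed exactly once; in Spark you would call
# persist() on the resulting RDD so later actions reuse the parsed objects
# instead of re-parsing the raw text.
parsed = map_partitions(partitions, lambda lines: (parse(l) for l in lines))
```

In Spark itself the pattern would be `rdd.mapPartitions(parseAll).persist()`; without persist, each action recomputes the lineage and the parsing runs again.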
Open the web UI in your browser, find the Spark URL in the top-left corner
of the page, and use that when starting your Spark shell instead of
localhost:7077.
Thanks
Best Regards
On Mon, Jun 23, 2014 at 10:56 AM, rapelly kartheek kartheek.m...@gmail.com
wrote:
Hi
Can someone help me with