i was under the impression that running jobs could not evict cached rdds
from memory as long as the total cached data stays below
spark.storage.memoryFraction. however, what i observe seems to indicate the
opposite. did anything change?
thanks! koert
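for reference, the fraction in question is set as a system property before the
context is created; a minimal sketch, assuming the 0.8-era configuration style
(the value and master url are arbitrary, not recommendations):

    import org.apache.spark.SparkContext

    // assumption: 0.8-era configuration via system properties, set before
    // the SparkContext is constructed; 0.6 is just an example value
    System.setProperty("spark.storage.memoryFraction", "0.6")
    val sc = new SparkContext("spark://master:7077", "cache-test")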
it is possible to run multiple queries against a shared SparkContext (which
holds the shared RDDs). however, i believe this is not easily done from
spark-shell.
alternatively, tachyon can be used to share (serialized) RDDs across
applications.
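a rough sketch of the shared-context approach (paths and names are made up for
illustration):

    import org.apache.spark.SparkContext
    import org.apache.spark.SparkContext._

    // one long-lived context holds the cached rdd; every incoming query is
    // simply another job submitted against it
    val sc = new SparkContext("spark://master:7077", "shared-rdd-server")
    val shared = sc.textFile("hdfs://namenode/data/events").cache()

    // each "query" reuses the cached rdd instead of re-reading from hdfs
    def countFor(pattern: String): Long = shared.filter(_.contains(pattern)).count()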
On Mon, Feb 17, 2014 at 11:41 AM, David Thomas dt5434...@gmail.com wrote:
i just managed to upgrade my 0.9-SNAPSHOT from the last scala 2.9.x version
to the latest.
everything seems good except that my default parallelism is now set to 2
for jobs instead of some smart number based on the number of cores (i think
that is what it used to do). is this change on purpose?
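for what it's worth, it can be pinned explicitly, along these lines (a sketch
assuming the 0.9 system-property style; 16 is an arbitrary value):

    // assumption: set before the SparkContext is created
    System.setProperty("spark.default.parallelism", "16")
    val sc = new org.apache.spark.SparkContext("spark://master:7077", "my-job")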
since we are still on scala 2.9.x and trunk has migrated to 2.10.x, i hope
graphx will get merged into the 0.8.x series at some point and not just the
0.9.x series (which is now scala 2.10), since the latter would be hard for
us to use in the near future.
best, koert
the master branch. This patch makes it possible to access HDFS as the user
who starts the Spark application, rather than the one who starts the Spark
service.
Thanks
Jerry
From: Koert Kuipers [mailto:ko...@tresata.com]
Sent: Friday, December 13, 2013 8:39 AM
To: user@spark.incubator.apache.org
Hey Philip,
how do you get spark to write to hdfs with your user name? When i use spark
it writes to hdfs as the user that runs the spark services... i wish it
read and wrote as me.
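for what it's worth, with plain (non-kerberos) hadoop auth, hadoop's
UserGroupInformation honors the HADOOP_USER_NAME environment variable; a
sketch of forwarding it to executors via the old constructor's environment
map (user name and paths are illustrative, not the thread's actual setup):

    import org.apache.spark.SparkContext

    // assumption: simple (non-kerberos) auth; the environment map passed to
    // the SparkContext is set on the worker-side executor processes
    val sc = new SparkContext("spark://master:7077", "write-as-me",
      "/opt/spark", Nil, Map("HADOOP_USER_NAME" -> "koert"))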
On Thu, Dec 12, 2013 at 6:37 PM, Philip Ogren philip.og...@oracle.com wrote:
When I call
message as to why the calculation failed (as opposed to: fetch failed more
than 4 times).
On Fri, Nov 29, 2013 at 3:09 PM, Koert Kuipers ko...@tresata.com wrote:
in 0.9-SNAPSHOT, StageInfo has been changed so that the stage itself is no
longer accessible.
however, the stage contains the rdd, which
we use PartitionBy a lot to keep multiple datasets co-partitioned before
caching.
it works well.
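for the archives, the pattern looks roughly like this (rdd names are made up
and the partition count is arbitrary):

    import org.apache.spark.HashPartitioner
    import org.apache.spark.SparkContext._

    // usersRdd and eventsRdd are assumed pair rdds, e.g. RDD[(String, ...)]
    val part = new HashPartitioner(32)
    val users  = usersRdd.partitionBy(part).cache()
    val events = eventsRdd.partitionBy(part).cache()

    // both sides now share the same partitioner, so this join needs no shuffle
    val joined = users.join(events)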
On Sat, Nov 16, 2013 at 5:10 AM, guojc guoj...@gmail.com wrote:
After looking at the api more carefully, I just found I overlooked the
partitionBy function on PairRDDFunctions. It's the function
in fact, co-partitioning was one of the main reasons we started using spark.
in map-reduce it's a giant pain to implement
On Sat, Nov 16, 2013 at 3:05 PM, Koert Kuipers ko...@tresata.com wrote:
we use PartitionBy a lot to keep multiple datasets co-partitioned before
caching.
it works well
scrapco...@gmail.com wrote:
Hey Koert,
Can you give me steps to reproduce this ?
On Tue, Oct 29, 2013 at 10:06 AM, Koert Kuipers ko...@tresata.com wrote:
Matei,
We have some jobs where even the input for a single key in a groupBy
would not fit in the task's memory. We rely on mapred to stream
it.
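for the archives: where the per-key work is an associative aggregation rather
than a true group, the usual way around that limit is to avoid grouping at
all; a minimal sketch (pairs is an assumed rdd, this is not a substitute for
streaming arbitrary groups):

    import org.apache.spark.SparkContext._

    // pairs is an assumed RDD[(String, Long)]. unlike groupByKey, reduceByKey
    // combines values map-side and never materializes a whole group in memory
    val totals = pairs.reduceByKey(_ + _)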
Matei
On Oct 28, 2013, at 5:32 PM, Koert Kuipers ko...@tresata.com wrote:
no problem :) i am actually not familiar with what oscar has said on this.
can you share or point me to the conversation thread?
it is my opinion based on the little experimenting i have done, but i am
willing
after upgrading from spark 0.7 to spark 0.8 i can no longer access any
files on HDFS.
i see the error below. any ideas?
i am running spark standalone on a cluster that also has CDH4.3.0 and
rebuilt spark accordingly. the jars in lib_managed look good to me.
i noticed similar errors in the
now, it works.
On Thu, Oct 17, 2013 at 6:05 PM, Koert Kuipers ko...@tresata.com wrote:
after upgrading from spark 0.7 to spark 0.8 i can no longer access any
files on HDFS.
i see the error below. any ideas?
i am running spark standalone on a cluster that also has CDH4.3.0 and
rebuilt spark
for your version of Hadoop. See
http://spark.incubator.apache.org/docs/latest/quick-start.html#a-standalone-app-in-scala
for example.
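from that quick-start, the relevant bit is matching the hadoop-client
dependency to the cluster; a sketch for the CDH4.3.0 mentioned above (the
mr1 version string is an assumption, check the cluster's actual build):

    // sbt build excerpt: link the standalone app against the cdh build
    // of hadoop-client, pulled from cloudera's public repository
    resolvers += "cloudera" at "https://repository.cloudera.com/artifactory/cloudera-repos/"
    libraryDependencies += "org.apache.hadoop" % "hadoop-client" % "2.0.0-mr1-cdh4.3.0"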
Matei
On Oct 17, 2013, at 4:38 PM, Koert Kuipers ko...@tresata.com wrote:
i got the job a little further along by also setting this:
System.setProperty
i have my spark and hadoop related dependencies marked as provided for my
spark job. this used to work with previous versions. are these now supposed
to be compile/runtime/default scope dependencies?
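for comparison, this is roughly what the provided scoping looks like in sbt
(version strings are assumptions based on the versions mentioned in these
threads):

    // sketch: spark and hadoop come from the cluster at runtime, so they
    // are marked provided and left out of the assembled job jar
    libraryDependencies ++= Seq(
      "org.apache.spark" % "spark-core_2.9.3" % "0.8.0-incubating" % "provided",
      "org.apache.hadoop" % "hadoop-client" % "2.0.0-mr1-cdh4.3.0" % "provided"
    )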
On Thu, Oct 17, 2013 at 8:04 PM, Koert Kuipers ko...@tresata.com wrote:
yes i did that and i can see