Hi All,
I'm using Spark 1.4.1 to analyze a largish data set (several gigabytes
of data). The RDD is partitioned into 2048 partitions which are more or
less equal and entirely cached in RAM.
I evaluated the performance on several cluster sizes, and am witnessing
a non-linear (power)
Additional missing relevant information:
I'm running a transformation; there are no shuffles occurring, and at the
end I'm performing a lookup of 4 partitions on the driver.
On 10/7/15 11:26 AM, Yadid Ayzenberg wrote:
Thanks
Best Regards
On Sun, Aug 23, 2015 at 1:27 AM, Yadid Ayzenberg <ya...@media.mit.edu> wrote:
Hi All,
We have a Spark standalone cluster running 1.4.1 and we are setting
spark.io.compression.codec to lzf.
I have a long-running interactive application which behaves normally,
but after a few days I get the following exception in multiple jobs. Any
ideas on what could be causing this?
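For reference, a minimal sketch of how that codec setting is typically applied (the property name and value are from this thread; the file placement is the standard Spark convention):

```
# conf/spark-defaults.conf -- switch Spark's internal compression to LZF
spark.io.compression.codec  lzf
```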
... on the Row objects that are returned.
For example, if you'd rather the delimiter was '|':
sql("SELECT * FROM src").map(_.mkString("|")).collect()
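As a quick illustration of the mkString approach (a plain-Scala sketch; the object and helper names are hypothetical, and plain Seqs stand in for the Row objects a query returns):

```scala
object DelimiterDemo {
  // Join one row's fields with a chosen delimiter, mirroring
  // .map(_.mkString("|")) over the Rows a query returns.
  def joinRow(fields: Seq[Any], delim: String): String =
    fields.mkString(delim)

  def main(args: Array[String]): Unit = {
    // Stand-ins for two Row objects; note the embedded commas.
    val rows = Seq(Seq("alice", "new york, ny", 34), Seq("bob", "boston, ma", 28))
    // A comma delimiter would be ambiguous here; '|' keeps fields separable.
    rows.map(joinRow(_, "|")).foreach(println)
  }
}
```

The point is simply that the delimiter is whatever you pass to mkString, so fields containing commas stay parseable.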
On Thu, Aug 28, 2014 at 7:58 AM, yadid ayzenberg ya...@media.mit.edu
wrote:
Hi All,
Is there any way to change the delimiter from being a comma?
Some of the strings in my data contain commas as well, making it very
difficult to parse the results.
Yadid
Hi all,
I have a Spark cluster of 30 machines, 16GB RAM / 8 cores on each, running in
standalone mode. Previously my application was working well (several
RDDs, the largest being around 50G).
When I started processing larger amounts of data (RDDs of 100G) my app
started losing executors. I'm currently
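Executor loss at larger RDD sizes is often a memory-pressure symptom. A configuration sketch for conf/spark-defaults.conf (the specific values below are illustrative assumptions, not settings from this thread):

```
# Illustrative values only -- tune for 16GB machines with 8 cores each
spark.executor.memory        12g
# Spark 1.x: fraction of executor heap reserved for cached RDDs
spark.storage.memoryFraction 0.5
```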
Yep, I just issued a pull request.
Yadid
On 5/31/14, 1:25 PM, Patrick Wendell wrote:
1. ctx is an instance of JavaSQLContext but the textFile method is called as
a member of ctx.
According to the API, JavaSQLContext does not have such a member, so I'm
guessing this should be sc instead.
Yeah,
Congrats on the new 1.0 release. Amazing work!
It looks like there may be some typos in the latest
http://spark.apache.org/docs/latest/sql-programming-guide.html
in the Running SQL on RDDs section when choosing the java example:
1. ctx is an instance of JavaSQLContext but the textFile method
An additional option: 4) Use SparkContext.addJar() and have the
application ship your jar to all the nodes.
Yadid
On 5/4/14, 4:07 PM, DB Tsai wrote:
If you add the breeze dependency in your build.sbt project, it will
not be available to all the workers.
There are a couple of options: 1) use sbt
Dear Sparkers,
Has anyone got any insight on this? I am really stuck.
Yadid
On 4/28/14, 11:28 AM, Yadid Ayzenberg wrote:
Thanks for your answer.
I tried running on a single machine - master and worker on one host. I
get exactly the same results.
Very little CPU activity on the machine
If you still see the issue, I'd check whether the task has really
completed. What do you see on the web UI? Is the executor using CPU?
Good luck.
On Mon, Apr 28, 2014 at 2:35 AM, Yadid Ayzenberg <ya...@media.mit.edu> wrote:
Can someone please suggest how I can
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
On 4/28/14 11:28 AM, Yadid Ayzenberg wrote:
Very little CPU activity on the machine in question. The web UI shows
a single
:37 PM, Yadid Ayzenberg wrote:
Some additional information - maybe this rings a bell with someone:
I suspect this happens when the lookup returns more than one value.
For 0 and 1 values, the function behaves as you would expect.
Anyone?
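For what it's worth, the multi-value behavior described above can be sketched in plain Scala (this only models lookup's contract of returning every value stored under a key; it is not Spark code, and the names are illustrative):

```scala
object LookupDemo {
  // Models JavaPairRDD.lookup(key): return ALL values stored under the key.
  def lookup[K, V](pairs: Seq[(K, V)], key: K): Seq[V] =
    pairs.collect { case (k, v) if k == key => v }

  def main(args: Array[String]): Unit = {
    val pairs = Seq(("a", 1), ("b", 2), ("a", 3))
    println(lookup(pairs, "a")) // two values for "a"
    println(lookup(pairs, "b")) // one value
    println(lookup(pairs, "c")) // none: empty result
  }
}
```

The 0- and 1-value cases behave as described in the thread; the interesting case for the hang is when one key maps to several values.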
On 4/25/14, 1:55 PM, Yadid Ayzenberg wrote:
Hi All,
I'm running a lookup on a JavaPairRDD<String, Tuple2>.
When running on the local machine, the lookup is successful. However, when
running on a standalone cluster with the exact same dataset, one of the
tasks never ends (constantly in RUNNING status).
When viewing the worker log, it seems that