Hi,
I'm looking for a way to estimate the amount of memory a task will need by
looking at the size of its input data. It clearly depends on what the task
is doing, but is there somewhere in the logs exported by Spark where this
information can be found?
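For what it's worth, per-task metrics (input size, shuffle read/write bytes, memory and disk bytes spilled) do appear in Spark's JSON event log once event logging is turned on; a minimal spark-defaults.conf fragment (the directory path is a placeholder):

```properties
# Each task-end event in the log then carries Task Metrics:
# input bytes, shuffle read/write bytes, memory/disk bytes spilled.
spark.eventLog.enabled  true
# Placeholder path; point it at a directory the driver can write to.
spark.eventLog.dir      hdfs:///tmp/spark-events
```

Correlating those per-task metrics with input split sizes across a few runs is probably the closest thing to an empirical memory estimate.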
Thanks
Does the pre-build come with hive support?
Namely, has it been built with -Phive and -Phive-thriftserver?
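For reference, the documented way to build a distribution with those profiles yourself, run from the root of the Spark source tree, looks like:

```shell
# Build Spark with Hive support and the JDBC/Thrift server
build/mvn -Phive -Phive-thriftserver -DskipTests clean package
```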
On Fri, Jun 12, 2015, 9:32 AM ayan guha wrote:
> Thanks guys, my question must look like a stupid one today :) Looking
> forward to test out 1.4.0, just downloaded it.
>
> Congrats to the t
I'm running PageRank on datasets of different sizes (from 1 GB to 100 GB).
Sometimes my job is aborted with this error:
Job aborted due to stage failure: Task 0 in stage 4.1 failed 4 times,
most recent failure: Lost task 0.3 in stage 4.1 (TID 2051,
9.12.247.250): java.io.FileNotFoundException:
/
Hi,
I'm trying to run an application that uses a Hive context to perform some
queries over JSON files.
The code of the application is here:
https://github.com/GiovanniPaoloGibilisco/spark-log-processor/tree/fca93d95a227172baca58d51a4d799594a0429a1
I can run it on Spark 1.3.1 after rebuilding it wi
Hi,
I'm trying to build the DAG of an application from the logs.
I've had a look at SparkReplayDebugger, but it doesn't operate offline on
logs. I also looked at the one in this pull request:
https://github.com/apache/spark/pull/2077, which seems to operate only on
logs but doesn't clearly show the depende
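In case it helps, the event log itself seems to contain enough to recover stage dependencies: each SparkListenerJobStart event carries "Stage Infos" entries with "Stage ID" and "Parent IDs" fields (field names as I've seen them in 1.x event logs; treat this as a sketch, not a spec). A minimal Python pass over a log could look like:

```python
import json

def stage_edges(log_lines):
    """Collect (parent_stage_id, stage_id) edges from JobStart events."""
    edges = set()
    for line in log_lines:
        event = json.loads(line)
        if event.get("Event") != "SparkListenerJobStart":
            continue
        for info in event.get("Stage Infos", []):
            for parent in info.get("Parent IDs", []):
                edges.add((parent, info["Stage ID"]))
    return sorted(edges)

# A toy two-stage job (hand-written, not a real log excerpt):
sample = [
    json.dumps({"Event": "SparkListenerJobStart",
                "Stage Infos": [{"Stage ID": 0, "Parent IDs": []},
                                {"Stage ID": 1, "Parent IDs": [0]}]}),
]
print(stage_edges(sample))  # [(0, 1)]
```

Feeding every line of an application's event log through this gives the stage-level DAG without re-running anything.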
Hi, I'm trying to parse log files generated by Spark using SparkSQL.
In the JSON elements related to the StageCompleted event there is a nested
structure containing an array of elements with RDD Info (see the log below
as an example, with some parts omitted):
{
"Event": "SparkListenerStageComplete
Hi,
I would like to collect some metrics from Spark and plot them with
Graphite. I managed to do that with the metrics provided by
org.apache.spark.metrics.source.JvmSource, but I would like to know whether
other sources are available besides this one.
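As far as I understand from the monitoring docs, most other sources (scheduler, block manager, executor metrics) are registered automatically per instance, and JvmSource is the main one you opt into; on the Graphite side, conf/metrics.properties supports a GraphiteSink alongside it (host below is a placeholder):

```properties
# conf/metrics.properties: ship all instances' metrics to Graphite
*.sink.graphite.class=org.apache.spark.metrics.sink.GraphiteSink
*.sink.graphite.host=graphite.example.com
*.sink.graphite.port=2003
*.sink.graphite.period=10
*.sink.graphite.unit=seconds
# Enable the JVM source on every instance (master, worker, driver, executor)
*.source.jvm.class=org.apache.spark.metrics.source.JvmSource
```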
Best,
Giovanni
Hi,
I would like to know if it is possible to build the DAG before actually
executing the application. My guess is that in the scheduler the DAG is
built dynamically at runtime since it might depend on the data, but I was
wondering if there is a way (and maybe an existing tool) to analyze the code
an
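On the "built dynamically" point: since transformations are lazy, the lineage graph of an RDD exists before any action runs (RDD.toDebugString will print it without executing the job). A toy pure-Python model of that idea, nothing Spark-specific:

```python
class Node:
    """Toy stand-in for an RDD: records its parents, performs no work."""
    def __init__(self, name, parents=()):
        self.name = name
        self.parents = list(parents)

    def map(self, label):
        # Lazy: just extend the graph, compute nothing.
        return Node(f"{label}({self.name})", [self])

def lineage(node):
    """Walk parents depth-first, sources first, like toDebugString."""
    out = []
    for parent in node.parents:
        out.extend(lineage(parent))
    out.append(node.name)
    return out

source = Node("textFile")
result = source.map("map").map("filter")
print(lineage(result))
# ['textFile', 'map(textFile)', 'filter(map(textFile))']
```

What cannot be known statically is anything data-dependent (e.g. how many iterations a convergence loop like PageRank performs), which is presumably why the real scheduler materializes stages only as jobs are submitted.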