Hi Yang!
I don't know exactly why this happens, but I think either the GC can't keep
up, or the size of the data plus the additional objects created during the
computation is too big for the executor.
I also found that this problem occurs only if you do some data manipulations.
You can cache your data first, and after that, write it out.
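Roughly this pattern, where the function name and the output path are just
placeholders:

// "transform" stands in for your expensive manipulations
val df = transform(input)
df.cache()
df.count()                     // materialize the cache before writing
df.write.parquet("/tmp/out")   // placeholder output path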
Hello spark users,
I do have the same question as Daniel.
I would like to save the state in Cassandra and on failure recover using
the initialState. If someone has already tried this, please share your
experience and sample code.
Thanks.
On Thu, Nov 17, 2016 at 9:45 AM, Daniel Haviv <
Hi
I am trying to write a DataFrame into a Salesforce table:

facWriteBack.write.mode("append").jdbc(
  s"jdbc:salesforce:UseSandbox=True;User=;Password=;Security Token=;Timeout=0",
  "SFORCE.",
  new java.util.Properties())  // jdbc() also expects a connection-properties argument

Am I doing it correctly?
Could you also suggest how to upsert a record?
Thanks in advance
Niraj
Hi there,
From the Spark UI, we can monitor the following two metrics:
• Processing Time - The time to process each batch of data.
• Scheduling Delay - the time a batch waits in a queue for the
processing of previous batches to finish.
However, what is the best way to monitor
+1 to Ryan's suggestion of setting maxOffsetsPerTrigger. This way you can
at least see how quickly it is making progress towards catching up.
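Something like this sketch; the broker address and topic name are
placeholders, and spark is your SparkSession:

val stream = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "broker:9092")  // placeholder broker
  .option("subscribe", "events")                     // placeholder topic
  .option("maxOffsetsPerTrigger", "10000")           // cap offsets read per trigger
  .load()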
On Sun, Jan 22, 2017 at 7:02 PM, Timothy Chan wrote:
> I'm using version 2.02.
>
> The difference I see between using latest and
No. Each app has its own UI which runs (starting on) port 4040.
On Mon, Jan 23, 2017 at 12:05 PM, kant kodali wrote:
> I am using standalone mode, so wouldn't 8080 be the port for my app web UI
> as well? There is nothing running on 4040 in my cluster.
>
>
I am using standalone mode, so wouldn't 8080 be the port for my app web UI as
well? There is nothing running on 4040 in my cluster.
http://spark.apache.org/docs/latest/security.html#standalone-mode-only
On Mon, Jan 23, 2017 at 11:51 AM, Marcelo Vanzin
wrote:
> That's the Master,
Hmm.. I guess in that case my assumption of "app" is wrong. I thought the
app is a client jar that you submit, no? If so, say I submit multiple jobs,
do I then get two UIs?
On Mon, Jan 23, 2017 at 12:07 PM, Marcelo Vanzin
wrote:
> No. Each app has its own UI which runs
As I said, each app gets its own UI. Look at the logs printed to the output.
The port will depend on whether they're running on the same host at
the same time.
This is irrespective of how they are run.
On Mon, Jan 23, 2017 at 12:40 PM, kant kodali wrote:
> yes I meant
Depends on what you mean by "job". Which is why I prefer "app", which
is clearer (something you submit using "spark-submit", for example).
But really, I'm not sure what you're asking now.
On Mon, Jan 23, 2017 at 12:15 PM, kant kodali wrote:
> Hmm.. I guess in that case my
That's the Master, whose default port is 8080 (not 4040). The default
port for the app's UI is 4040.
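If you want the app UI on a fixed port instead, spark.ui.port can pin it.
A minimal sketch, with 4050 as an example value:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("my-app")                // placeholder name
  .config("spark.ui.port", "4050")  // pin the app UI port; Spark probes upward if taken
  .getOrCreate()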
On Mon, Jan 23, 2017 at 11:47 AM, kant kodali wrote:
> I am not sure why the Spark web UI keeps changing its port every time I
> restart a cluster. How can I make it always run on
I am not sure why the Spark web UI keeps changing its port every time I
restart a cluster. How can I make it always run on one port? I did make sure
there is no process running on 4040 (Spark's default web UI port), yet it
still starts at 8080. Any ideas?
MasterWebUI: Bound MasterWebUI to 0.0.0.0,
Hi everyone,
I have a spark (1.6.0-cdh5.7.1) streaming job which receives data from
multiple Kafka topics. After starting the job, everything works fine at first
(around 700 req/sec), but after a while (a couple of days or a week) it starts
processing only part of the data (around 350 req/sec). When
Hi,
I think a Spark UDF can only handle `java.sql.Date`.
So you need to change the return type in your UDF.
// maropu
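For example, a minimal sketch of such a UDF; the "yyyy-MM-dd" pattern and
the column names are assumptions:

import java.sql.Date
import java.text.SimpleDateFormat
import org.apache.spark.sql.functions.udf

val toSqlDate = udf { s: String =>
  // adjust the pattern to match your data
  new Date(new SimpleDateFormat("yyyy-MM-dd").parse(s).getTime)
}
// df stands in for your DataFrame with a string date column
val withDate = df.withColumn("event_date", toSqlDate(df("date_string")))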
On Tue, Jan 24, 2017 at 8:18 AM, Marco Mistroni wrote:
> Hi all,
> I am trying to convert a string column in a DataFrame to a
> java.util.Date, but I am
Hi all,
I am trying to convert a string column in a DataFrame to a java.util.Date,
but I am getting this exception:
[dispatcher-event-loop-0] INFO org.apache.spark.storage.BlockManagerInfo -
Removed broadcast_0_piece0 on 169.254.2.140:53468 in memory (size: 14.3 KB,
free: 767.4 MB)
Exception
In general, Java processes fail with an OutOfMemoryError when your code and
data do not fit into the memory allocated to the runtime. In Spark, that
memory is controlled through the --executor-memory flag.
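For example, the equivalent programmatic setting; the 4g figure is only
illustrative:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .config("spark.executor.memory", "4g")  // same effect as --executor-memory 4g
  .getOrCreate()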
If you are running Spark on YARN, then YARN configuration will dictate the
maximum memory
This article recommends setting spark.locality.wait to 10 (milliseconds) in
the case of using Spark Streaming and gives an explanation of why they
chose that value. If using batch Spark, that value should still be a good
starting place.
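A minimal sketch of applying it, assuming your Spark version accepts a time
suffix for this setting:

import org.apache.spark.SparkConf

// lower the locality wait from the 3s default to the article's suggested 10ms
val conf = new SparkConf().set("spark.locality.wait", "10ms")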
In a word-count application that stores the count of all the words input so
far using mapWithState, how do I make sure my counts are not messed up if I
happen to read a line more than once?
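For context, the counting logic is essentially this sketch (stream setup
elided; wordPairs stands in for my DStream[(String, Int)]):

import org.apache.spark.streaming.{State, StateSpec}

val spec = StateSpec.function { (word: String, one: Option[Int], state: State[Long]) =>
  val sum = state.getOption.getOrElse(0L) + one.getOrElse(0)
  state.update(sum)   // running count per word
  (word, sum)
}
val counts = wordPairs.mapWithState(spec)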
Appreciate your response.
Thanks
Hi,
I am planning to go to production using Spark standalone mode with the
following configuration, and I would like to know if I am missing something;
any other suggestions are welcome.
1) Three Spark Standalone Masters deployed on different nodes, using Apache
ZooKeeper for leader election.
Yes, I meant submitting through spark-submit.
So if I run spark-submit A.jar and then spark-submit A.jar again, do I get
two UIs or one? And which ports do they run on when using standalone mode?
On Mon, Jan 23, 2017 at 12:19 PM, Marcelo Vanzin
wrote:
> Depends on what
Are you using receiver-based or direct stream?
Are you doing 1 stream per topic, or 1 stream for all topics?
If you're using the direct stream, the actual topics and offset ranges
should be visible in the logs, so you should be able to see more
detail about what's happening (e.g. all topics are
ok
On Tue, Jan 24, 2017 at 7:13 AM, Matthew Dailey
wrote:
> In general, Java processes fail with an OutOfMemoryError when your code
> and data do not fit into the memory allocated to the runtime. In Spark,
> that memory is controlled through the --executor-memory
I'm using DirectStream as one stream for all topics. I check the offset
ranges from Kafka Manager and don't see any significant deltas.
On Tue, Jan 24, 2017 at 4:42 AM, Cody Koeninger wrote:
> Are you using receiver-based or direct stream?
>
> Are you doing 1 stream per
Hi Keith,
Can you try including a clean-up step at the end of the job, before the
driver is out of the SparkContext, to clean up the necessary files (matched
by some regex patterns or so) on all nodes in your cluster? If the files are
not available on a few nodes, that should not be a problem, should it?
On
Hi Team,
I am trying to keep the code below in a get method, calling that get method
from another Hive UDF, and running the Hive UDF via HiveContext.sql:

switch (f) {
  case "double": return ((DoubleWritable) obj).get();
  case "bigint": return ((LongWritable) obj).get();
Even if you don't need the checkpointing data for recovery, "mapWithState"
still needs to use "checkpoint" to cut off the RDD lineage.
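That is, something like the following; the directory is a placeholder:

// mapWithState requires a checkpoint directory even when you don't
// need it for recovery; ssc is your StreamingContext
ssc.checkpoint("hdfs:///tmp/checkpoints")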
On Mon, Jan 23, 2017 at 12:30 AM, shyla deshpande
wrote:
> Hello spark users,
>
> I do have the same question as Daniel.
>
> I would
looking at the docs for org.apache.spark.sql.expressions.Aggregator it says
for reduce method: "For performance, the function may modify `b` and return
it instead of constructing new object for b.".
it makes no such comment for the merge method.
this is surprising to me because i know that for
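for reference, a minimal sketch of the kind of buffer-mutating aggregator i
mean (names are made up):

import org.apache.spark.sql.expressions.Aggregator
import org.apache.spark.sql.{Encoder, Encoders}

object SumAgg extends Aggregator[Int, Array[Long], Long] {
  def zero: Array[Long] = Array(0L)
  // mutates b and returns it, as the scaladoc explicitly permits
  def reduce(b: Array[Long], a: Int): Array[Long] = { b(0) += a; b }
  // mutates b1 the same way -- whether this is safe is exactly the question
  def merge(b1: Array[Long], b2: Array[Long]): Array[Long] = { b1(0) += b2(0); b1 }
  def finish(b: Array[Long]): Long = b(0)
  def bufferEncoder: Encoder[Array[Long]] = Encoders.kryo[Array[Long]]
  def outputEncoder: Encoder[Long] = Encoders.scalaLong
}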
i get the same error using latest spark master branch
On Tue, Jan 17, 2017 at 6:24 PM, Koert Kuipers wrote:
> and to be clear, this is not in the REPL or with Hive (both well known
> situations in which these errors arise)
>
> On Mon, Jan 16, 2017 at 11:51 PM, Koert Kuipers