Re: physical memory usage keep increasing for spark app on Yarn

2017-01-23 Thread Pavel Plotnikov
Hi Yang! I don't know exactly why this happens, but I think the GC can't work fast enough, or the size of the data plus the additional objects created during computation is too big for the executor. And I found that this problem occurs only when you make some data manipulations. You can cache your data first; after that, write
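
A minimal sketch of the caching workaround being suggested (the input, transformation, and output path below are all hypothetical):

    // Materialize the computed data before writing, so the expensive
    // manipulation is not interleaved with the output stage.
    val computed = input.map(expensiveTransform)   // some data manipulation
    computed.cache()                               // keep results in memory
    computed.count()                               // force materialization
    computed.saveAsTextFile("hdfs:///tmp/output")  // then write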

Re: Using mapWithState without a checkpoint

2017-01-23 Thread shyla deshpande
Hello spark users, I do have the same question as Daniel. I would like to save the state in Cassandra and, on failure, recover using the initialState. If someone has already tried this, please share your experience and sample code. Thanks. On Thu, Nov 17, 2016 at 9:45 AM, Daniel Haviv <
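
A sketch of the idea being described, assuming a hypothetical helper `loadSavedStateFromCassandra` that reads the previously saved counts back as an RDD, and a keyed DStream `keyedStream`:

    import org.apache.spark.streaming.{State, StateSpec}

    // Seed mapWithState from state previously persisted to Cassandra.
    val initial = loadSavedStateFromCassandra(sc)  // RDD[(String, Long)], assumed helper

    val spec = StateSpec.function(
      (key: String, value: Option[Long], state: State[Long]) => {
        val sum = value.getOrElse(0L) + state.getOption.getOrElse(0L)
        state.update(sum)
        (key, sum)
      }).initialState(initial)

    val stateful = keyedStream.mapWithState(spec)  // keyedStream: DStream[(String, Long)]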

How to write Spark DataFrame to salesforce table

2017-01-23 Thread Niraj Kumar
Hi, I am trying to write a dataframe into a Salesforce table: facWriteBack.write.mode("append").jdbc(s"jdbc:salesforce:UseSandbox=True;User=;Password=;Security Token=;Timeout=0", "SFORCE.") Am I doing it correctly? Also, please suggest how to upsert a record. Thanks in advance Niraj Disclaimer :

Spark Streaming proactive monitoring

2017-01-23 Thread Lan Jiang
Hi there, From the Spark UI, we can monitor the following two metrics: • Processing Time - the time to process each batch of data. • Scheduling Delay - the time a batch waits in a queue for the processing of previous batches to finish. However, what is the best way to monitor
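
One way to watch these metrics outside the UI (a sketch, assuming an existing StreamingContext `ssc` and an arbitrary 10-second alert threshold):

    import org.apache.spark.streaming.scheduler._

    class DelayMonitor extends StreamingListener {
      override def onBatchCompleted(batch: StreamingListenerBatchCompleted): Unit = {
        val info = batch.batchInfo
        val schedulingDelay = info.schedulingDelay.getOrElse(0L)  // millis
        val processingTime = info.processingDelay.getOrElse(0L)   // millis
        if (schedulingDelay > 10000L)
          println(s"ALERT: batch ${info.batchTime} waited $schedulingDelay ms " +
            s"(processing took $processingTime ms)")
      }
    }
    ssc.addStreamingListener(new DelayMonitor)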

Re: Setting startingOffsets to earliest in structured streaming never catches up

2017-01-23 Thread Michael Armbrust
+1 to Ryan's suggestion of setting maxOffsetsPerTrigger. This way you can at least see how quickly it is making progress towards catching up. On Sun, Jan 22, 2017 at 7:02 PM, Timothy Chan wrote: > I'm using version 2.02. > > The difference I see between using latest and
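
A sketch of the setting being suggested, on a structured-streaming Kafka source (broker, topic, and cap are hypothetical; `spark` is an existing SparkSession):

    val stream = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker1:9092")
      .option("subscribe", "mytopic")
      .option("startingOffsets", "earliest")
      .option("maxOffsetsPerTrigger", 100000L)  // cap records per trigger
      .load()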

Re: why does spark web UI keeps changing its port?

2017-01-23 Thread Marcelo Vanzin
No. Each app has its own UI, which runs on port 4040 (or the next free port up from it). On Mon, Jan 23, 2017 at 12:05 PM, kant kodali wrote: > I am using standalone mode so wouldn't be 8080 for my app web ui as well? > There is nothing running on 4040 in my cluster. > >

Re: why does spark web UI keeps changing its port?

2017-01-23 Thread kant kodali
I am using standalone mode, so wouldn't 8080 be for my app web UI as well? There is nothing running on 4040 in my cluster. http://spark.apache.org/docs/latest/security.html#standalone-mode-only On Mon, Jan 23, 2017 at 11:51 AM, Marcelo Vanzin wrote: > That's the Master,

Re: why does spark web UI keeps changing its port?

2017-01-23 Thread kant kodali
Hmm.. I guess in that case my assumption of "app" is wrong. I thought the app is a client jar that you submit, no? If so, say I submit multiple jobs, do I get two UIs? On Mon, Jan 23, 2017 at 12:07 PM, Marcelo Vanzin wrote: > No. Each app has its own UI which runs

Re: why does spark web UI keeps changing its port?

2017-01-23 Thread Marcelo Vanzin
As I said, each app gets its own UI. Look at the logs printed to the output. The port will depend on whether they're running on the same host at the same time. This is irrespective of how they are run. On Mon, Jan 23, 2017 at 12:40 PM, kant kodali wrote: > yes I meant

Re: why does spark web UI keeps changing its port?

2017-01-23 Thread Marcelo Vanzin
Depends on what you mean by "job". That's why I prefer "app", which is clearer (something you submit using "spark-submit", for example). But really, I'm not sure what you're asking now. On Mon, Jan 23, 2017 at 12:15 PM, kant kodali wrote: > hmm..I guess in that case my

Re: why does spark web UI keeps changing its port?

2017-01-23 Thread Marcelo Vanzin
That's the Master, whose default port is 8080 (not 4040). The default port for the app's UI is 4040. On Mon, Jan 23, 2017 at 11:47 AM, kant kodali wrote: > I am not sure why Spark web UI keeps changing its port every time I restart > a cluster? how can I make it run always on

why does spark web UI keeps changing its port?

2017-01-23 Thread kant kodali
I am not sure why the Spark web UI keeps changing its port every time I restart a cluster. How can I make it always run on one port? I did make sure there is no process running on 4040 (Spark's default web UI port), however it still starts at 8080. Any ideas? MasterWebUI: Bound MasterWebUI to 0.0.0.0,
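
For reference, a sketch of pinning the app UI port (the Master web UI defaults to 8080, controlled by SPARK_MASTER_WEBUI_PORT, while each application UI defaults to 4040 and is controlled by spark.ui.port):

    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .setAppName("my-app")           // hypothetical app name
      .set("spark.ui.port", "4040")   // pin the application UI port
    val sc = new SparkContext(conf)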

Spark streaming multiple kafka topic doesn't work at-least-once

2017-01-23 Thread hakanilter
Hi everyone, I have a Spark (1.6.0-cdh5.7.1) streaming job which receives data from multiple Kafka topics. After starting the job, everything works fine at first (like 700 req/sec), but after a while (a couple of days or a week) it starts processing only part of the data (like 350 req/sec). When

Re: converting timestamp column to a java.util.Date

2017-01-23 Thread Takeshi Yamamuro
Hi, I think Spark UDFs can only handle `java.sql.Date`. So, you need to change the return type in your UDF. // maropu On Tue, Jan 24, 2017 at 8:18 AM, Marco Mistroni wrote: > HI all > i am trying to convert a string column, in a Dataframe , to a > java.util.Date but i am
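
A minimal sketch of the fix being suggested: have the UDF return java.sql.Date instead of java.util.Date (the DataFrame `df`, column name, and date pattern are assumptions):

    import java.text.SimpleDateFormat
    import org.apache.spark.sql.functions.{col, udf}

    val toSqlDate = udf { (s: String) =>
      val parsed = new SimpleDateFormat("yyyy-MM-dd").parse(s)  // java.util.Date
      new java.sql.Date(parsed.getTime)                         // what Spark maps to DateType
    }
    val withDate = df.withColumn("date", toSqlDate(col("dateString")))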

converting timestamp column to a java.util.Date

2017-01-23 Thread Marco Mistroni
Hi all, I am trying to convert a string column, in a Dataframe, to a java.util.Date but I am getting this exception: [dispatcher-event-loop-0] INFO org.apache.spark.storage.BlockManagerInfo - Removed broadcast_0_piece0 on 169.254.2.140:53468 in memory (size: 14.3 KB, free: 767.4 MB) Exception

Re: Do jobs fail because of other users of a cluster?

2017-01-23 Thread Matthew Dailey
In general, Java processes fail with an OutOfMemoryError when your code and data do not fit into the memory allocated to the runtime. In Spark, that memory is controlled through the --executor-memory flag. If you are running Spark on YARN, then YARN configuration will dictate the maximum memory
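
The knob in question, shown both ways (a sketch; "4g" is an arbitrary example value):

    // spark-submit ... --executor-memory 4g
    // or equivalently, in code:
    import org.apache.spark.SparkConf
    val conf = new SparkConf().set("spark.executor.memory", "4g")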

Re: tuning the spark.locality.wait

2017-01-23 Thread Matthew Dailey
This article recommends setting spark.locality.wait to 10 (milliseconds) when using Spark Streaming, and explains why they chose that value. If using batch Spark, that value should still be a good starting place
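
The recommendation translated to configuration (a sketch):

    import org.apache.spark.SparkConf
    // Give up waiting for data locality after 10 ms instead of the 3 s default.
    val conf = new SparkConf().set("spark.locality.wait", "10ms")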

How to make the state in a streaming application idempotent?

2017-01-23 Thread shyla deshpande
In a word count application that stores the count of all the words input so far using mapWithState, how do I make sure my counts are not messed up if I happen to read a line more than once? Appreciate your response. Thanks

what would be the recommended production requirements?

2017-01-23 Thread kant kodali
Hi, I am planning to go to production using Spark standalone mode with the following configuration, and I would like to know if I am missing something; any other suggestions are welcome. 1) Three Spark Standalone Masters deployed on different nodes, using Apache Zookeeper for leader election.
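
A sketch of the client side of such an HA setup: list every master in the URL so the driver can fail over (hostnames are hypothetical; the masters themselves are configured with spark.deploy.recoveryMode=ZOOKEEPER and spark.deploy.zookeeper.url):

    import org.apache.spark.SparkConf

    val conf = new SparkConf()
      .setMaster("spark://master1:7077,master2:7077,master3:7077")
      .setAppName("prod-app")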

Re: why does spark web UI keeps changing its port?

2017-01-23 Thread kant kodali
Yes, I meant submitting through spark-submit. So if I do spark-submit A.jar and then spark-submit A.jar again, do I get two UIs or one? And which ports do they run on when using standalone mode? On Mon, Jan 23, 2017 at 12:19 PM, Marcelo Vanzin wrote: > Depends on what

Re: Spark streaming multiple kafka topic doesn't work at-least-once

2017-01-23 Thread Cody Koeninger
Are you using receiver-based or direct stream? Are you doing 1 stream per topic, or 1 stream for all topics? If you're using the direct stream, the actual topics and offset ranges should be visible in the logs, so you should be able to see more detail about what's happening (e.g. all topics are
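
For a direct stream, a sketch of printing those per-topic offset ranges each batch (Spark 1.6 Kafka integration; `stream` is the direct stream, and this must run before any transformations):

    import org.apache.spark.streaming.kafka.{HasOffsetRanges, OffsetRange}

    stream.foreachRDD { rdd =>
      val ranges: Array[OffsetRange] = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
      ranges.foreach { r =>
        println(s"${r.topic} partition ${r.partition}: ${r.fromOffset} -> ${r.untilOffset}")
      }
    }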

Re: Do jobs fail because of other users of a cluster?

2017-01-23 Thread Sirisha Cheruvu
ok On Tue, Jan 24, 2017 at 7:13 AM, Matthew Dailey wrote: > In general, Java processes fail with an OutOfMemoryError when your code > and data does not fit into the memory allocated to the runtime. In Spark, > that memory is controlled through the --executor-memory

Re: Spark streaming multiple kafka topic doesn't work at-least-once

2017-01-23 Thread Hakan İlter
I'm using DirectStream as one stream for all topics. I check the offset ranges from Kafka Manager and don't see any significant deltas. On Tue, Jan 24, 2017 at 4:42 AM, Cody Koeninger wrote: > Are you using receiver-based or direct stream? > > Are you doing 1 stream per

Re:

2017-01-23 Thread Naveen
Hi Keith, Can you try including a clean-up step at the end of the job, before the driver is out of the SparkContext, to clean the necessary files through some regex patterns or so, on all nodes in your cluster by default? If files are not available on a few nodes, that should not be a problem, isn't it? On

Fwd: hc.sql("hive udf query") not able to convert Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.ClassCastException: java.lang.Long cannot be cast to org.apache.hadoop.io.LongW

2017-01-23 Thread Sirisha Cheruvu
Hi Team, I am trying to keep the below code in a get method, calling that get method from another Hive UDF, and running the Hive UDF using the HiveContext.sql procedure: switch (f) { case "double": return ((DoubleWritable) obj).get(); case "bigint": return ((LongWritable) obj).get();

hc.sql("hive udf query") not able to convert Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.ClassCastException: java.lang.Long cannot be cast to org.apache.hadoop.io.LongWritab

2017-01-23 Thread Sirisha Cheruvu
Hi Team, I am trying to keep the below code in a get method, calling that get method from another Hive UDF, and running the Hive UDF using the HiveContext.sql procedure: switch (f) { case "double": return ((DoubleWritable) obj).get(); case "bigint": return ((LongWritable) obj).get();

Re: Using mapWithState without a checkpoint

2017-01-23 Thread Shixiong(Ryan) Zhu
Even if you don't need the checkpointing data for recovery, "mapWithState" still needs to use "checkpoint" to cut off the RDD lineage. On Mon, Jan 23, 2017 at 12:30 AM, shyla deshpande wrote: > Hello spark users, > > I do have the same question as Daniel. > > I would
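
In practice that means a checkpoint directory must still be set even when recovery is not wanted, e.g. (the path is hypothetical):

    ssc.checkpoint("hdfs:///tmp/streaming-checkpoints")  // required by mapWithState to cut lineage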

Aggregator mutate b1 in place in merge

2017-01-23 Thread Koert Kuipers
Looking at the docs for org.apache.spark.sql.expressions.Aggregator, it says for the reduce method: "For performance, the function may modify `b` and return it instead of constructing new object for b." It makes no such comment for the merge method. This is surprising to me because I know that for
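
A sketch of the pattern under discussion, with a mutable buffer so the in-place updates are visible (reduce mutating `b` is documented as allowed; whether merge may do the same to `b1` is exactly the open question):

    import org.apache.spark.sql.{Encoder, Encoders}
    import org.apache.spark.sql.expressions.Aggregator

    object SumAgg extends Aggregator[Long, Array[Long], Long] {
      def zero: Array[Long] = Array(0L)
      def reduce(b: Array[Long], a: Long): Array[Long] = { b(0) += a; b }               // documented as safe
      def merge(b1: Array[Long], b2: Array[Long]): Array[Long] = { b1(0) += b2(0); b1 } // the open question
      def finish(b: Array[Long]): Long = b(0)
      def bufferEncoder: Encoder[Array[Long]] = Encoders.kryo[Array[Long]]
      def outputEncoder: Encoder[Long] = Encoders.scalaLong
    }
    // usage: ds.select(SumAgg.toColumn) on a Dataset[Long]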

Re: ScalaReflectionException (class not found) error for user class in spark 2.1.0

2017-01-23 Thread Koert Kuipers
I get the same error using the latest Spark master branch. On Tue, Jan 17, 2017 at 6:24 PM, Koert Kuipers wrote: > and to be clear, this is not in the REPL or with Hive (both well known > situations in which these errors arise) > > On Mon, Jan 16, 2017 at 11:51 PM, Koert Kuipers