Re: Creating HiveContext within Spark streaming

2016-09-08 Thread Mich Talebzadeh
Hi Todd,

Thanks for the hint.

As it happens, this works:

// Create the SparkConf for streaming as usual
val sparkConf = new SparkConf().
  setAppName(sparkAppName).
  set("spark.driver.allowMultipleContexts", "true").
  set("spark.hadoop.validateOutputSpecs", "false")
// Change the values accordingly. Note these must be the standard Spark
// configuration keys (e.g. "spark.default.parallelism"); Spark silently
// ignores names it does not recognise.
sparkConf.set("spark.default.parallelism", sparkDefaultParallelismValue)
sparkConf.set("spark.serializer", sparkSerializerValue)
sparkConf.set("spark.network.timeout", sparkNetworkTimeOutValue)
// If you want to see more details of batches in the UI, increase these values.
sparkConf.set("spark.streaming.ui.retainedBatches", sparkStreamingUiRetainedBatchesValue)
sparkConf.set("spark.worker.ui.retainedDrivers", sparkWorkerUiRetainedDriversValue)
sparkConf.set("spark.worker.ui.retainedExecutors", sparkWorkerUiRetainedExecutorsValue)
sparkConf.set("spark.ui.retainedStages", sparkWorkerUiRetainedStagesValue)
sparkConf.set("spark.ui.retainedJobs", sparkUiRetainedJobsValue)

// "enableHiveSupport" is not a standard SparkConf key; in Spark 1.x Hive
// support comes from HiveContext itself (in Spark 2.x it is enabled via
// SparkSession.builder().enableHiveSupport()).
sparkConf.set("enableHiveSupport", "true")
if (memorySet == "T") {
  // Only effective if set before the driver JVM starts (e.g. via spark-submit)
  sparkConf.set("spark.driver.memory", "18432M")
}

sparkConf.set("spark.streaming.stopGracefullyOnShutdown", "true")
sparkConf.set("spark.streaming.receiver.writeAheadLog.enable", "true")
sparkConf.set("spark.streaming.driver.writeAheadLog.closeFileAfterWrite", "true")
sparkConf.set("spark.streaming.receiver.writeAheadLog.closeFileAfterWrite", "true")

val batchInterval = 2

// Create the StreamingContext
val streamingContext = new StreamingContext(sparkConf, Seconds(batchInterval))

// Get the SparkContext that the StreamingContext created
val sparkContext = streamingContext.sparkContext

// Create a HiveContext from that same SparkContext
val hiveContext = new HiveContext(streamingContext.sparkContext)


And that works, although sometimes it feels like a black art to make it work :)
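For reference, here is a minimal, self-contained sketch of the pattern above against the Spark 1.x API used in this thread. The socket source, checkpoint directory, and table name are illustrative assumptions, not part of the original code:

import org.apache.spark.SparkConf
import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.streaming.{Seconds, StreamingContext}

object StreamingWithHive {
  def main(args: Array[String]): Unit = {
    val sparkConf = new SparkConf()
      .setAppName("StreamingWithHive")
      .set("spark.streaming.stopGracefullyOnShutdown", "true")

    // Let the StreamingContext create the one and only SparkContext.
    val streamingContext = new StreamingContext(sparkConf, Seconds(2))
    // The write-ahead-log settings shown earlier also require checkpointing;
    // the directory here is a hypothetical example.
    streamingContext.checkpoint("hdfs:///tmp/streaming_checkpoints")

    // Reuse the SparkContext the StreamingContext owns; never build a second one.
    val hiveContext = new HiveContext(streamingContext.sparkContext)

    // Hypothetical source; substitute Kafka or whatever feeds your job.
    val lines = streamingContext.socketTextStream("localhost", 9999)
    lines.foreachRDD { rdd =>
      if (!rdd.isEmpty()) {
        // "staging_lines" is a hypothetical Hive table name.
        val df = hiveContext.createDataFrame(rdd.map(Tuple1(_))).toDF("line")
        df.write.mode("append").saveAsTable("staging_lines")
      }
    }

    streamingContext.start()
    streamingContext.awaitTermination()
  }
}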

Regards



Dr Mich Talebzadeh



LinkedIn: https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com


*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.



On 8 September 2016 at 15:08, Todd Nist wrote:

> [quoted text trimmed; see Todd's post below in this thread]

Re: Creating HiveContext within Spark streaming

2016-09-08 Thread Todd Nist
Hi Mich,

Perhaps the issue is having multiple SparkContexts in the same JVM
(https://issues.apache.org/jira/browse/SPARK-2243). While it is possible,
I don't think it is encouraged.

As you know, the call you're currently invoking to create the
StreamingContext also creates a SparkContext:

/**
 * Create a StreamingContext by providing the configuration necessary
 * for a new SparkContext.
 * @param conf a org.apache.spark.SparkConf object specifying Spark parameters
 * @param batchDuration the time interval at which streaming data will be
 *   divided into batches
 */
def this(conf: SparkConf, batchDuration: Duration) = {
  this(StreamingContext.createNewSparkContext(conf), null, batchDuration)
}


Could you rearrange the code slightly? Either create the SparkContext
first and pass that to the creation of the StreamingContext, like below:

val sc = new SparkContext(sparkConf)
val streamingContext = new StreamingContext(sc, Seconds(batchInterval))

*val hiveContext = new HiveContext(sc)*

Or remove or replace the line in red from your code and just set
val sparkContext = streamingContext.sparkContext:

val streamingContext = new StreamingContext(sparkConf, Seconds(batchInterval))
*val sparkContext = new SparkContext(sparkConf)*  // <- the line to remove
val hiveContext = new HiveContext(streamingContext.sparkContext)
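So, after removing the highlighted line, the second option would read (a sketch of the rearrangement, not a tested change):

val streamingContext = new StreamingContext(sparkConf, Seconds(batchInterval))
// Reuse the SparkContext that the StreamingContext just created,
// rather than constructing a second one:
val sparkContext = streamingContext.sparkContext
val hiveContext = new HiveContext(sparkContext)

As a further guard, later Spark 1.x releases provide SparkContext.getOrCreate(sparkConf), which returns the already-running context instead of constructing a duplicate by accident.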

HTH.

-Todd


On Thu, Sep 8, 2016 at 9:11 AM, Mich Talebzadeh wrote:

> [quoted text trimmed; see Mich's post below in this thread]

Re: Creating HiveContext within Spark streaming

2016-09-08 Thread Mich Talebzadeh
OK, I managed to sort that one out.

This is what I am facing:

val sparkConf = new SparkConf().
  setAppName(sparkAppName).
  set("spark.driver.allowMultipleContexts", "true").
  set("spark.hadoop.validateOutputSpecs", "false")
// Change the values accordingly (using the standard Spark configuration
// keys; Spark silently ignores names it does not recognise).
sparkConf.set("spark.default.parallelism", sparkDefaultParallelismValue)
sparkConf.set("spark.serializer", sparkSerializerValue)
sparkConf.set("spark.network.timeout", sparkNetworkTimeOutValue)
// If you want to see more details of batches in the UI, increase these values.
sparkConf.set("spark.streaming.ui.retainedBatches", sparkStreamingUiRetainedBatchesValue)
sparkConf.set("spark.worker.ui.retainedDrivers", sparkWorkerUiRetainedDriversValue)
sparkConf.set("spark.worker.ui.retainedExecutors", sparkWorkerUiRetainedExecutorsValue)
sparkConf.set("spark.ui.retainedStages", sparkWorkerUiRetainedStagesValue)
sparkConf.set("spark.ui.retainedJobs", sparkUiRetainedJobsValue)
sparkConf.set("enableHiveSupport", enableHiveSupportValue)
sparkConf.set("spark.streaming.stopGracefullyOnShutdown", "true")
sparkConf.set("spark.streaming.receiver.writeAheadLog.enable", "true")
sparkConf.set("spark.streaming.driver.writeAheadLog.closeFileAfterWrite", "true")
sparkConf.set("spark.streaming.receiver.writeAheadLog.closeFileAfterWrite", "true")

var sqltext = ""
val batchInterval = 2
val streamingContext = new StreamingContext(sparkConf, Seconds(batchInterval))

With the above settings, Spark Streaming works fine. *However, after
adding the first line below (in red):*

*val sparkContext = new SparkContext(sparkConf)*
val hiveContext = new HiveContext(streamingContext.sparkContext)

I get the following errors:

16/09/08 14:02:32 ERROR JobScheduler: Error running job streaming job 1473339752000 ms.0
org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 0.0 failed 4 times, most recent failure: Lost task 1.3 in stage 0.0 (TID 7, 50.140.197.217): java.io.IOException: *org.apache.spark.SparkException: Failed to get broadcast_0_piece0 of broadcast_0*
        at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1260)
        at org.apache.spark.broadcast.TorrentBroadcast.readBroadcastBlock(TorrentBroadcast.scala:174)
        at org.apache.spark.broadcast.TorrentBroadcast._value$lzycompute(TorrentBroadcast.scala:65)
        at org.apache.spark.broadcast.TorrentBroadcast._value(TorrentBroadcast.scala:65)
        at org.apache.spark.broadcast.TorrentBroadcast.getValue(TorrentBroadcast.scala:89)
        at org.apache.spark.broadcast.Broadcast.value(Broadcast.scala:70)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:67)
        at org.apache.spark.scheduler.Task.run(Task.scala:85)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.spark.SparkException: Failed to get broadcast_0_piece0 of broadcast_0


Hmm, any ideas?

Thanks





Dr Mich Talebzadeh






On 8 September 2016 at 12:28, Mich Talebzadeh wrote:

> [quoted text trimmed; see the original post below in this thread]

Creating HiveContext within Spark streaming

2016-09-08 Thread Mich Talebzadeh
Hi,

This may not be feasible in Spark streaming.

I am trying to create a HiveContext in Spark Streaming, within the
streaming context:

// Create a local StreamingContext with two worker threads and a batch
// interval of 2 seconds.

val sparkConf = new SparkConf().
  setAppName(sparkAppName).
  set("spark.driver.allowMultipleContexts", "true").
  set("spark.hadoop.validateOutputSpecs", "false")
  ...

Now I try to create a SparkContext (sc):

val sc = new SparkContext(sparkConf)
val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)

This is accepted, but it creates two Spark jobs:


[image: Inline images 1]

And basically it goes to a waiting state.

Any ideas how one can create a HiveContext within Spark Streaming?

Thanks






Dr Mich Talebzadeh


