smrosenberry commented on issue #23926: [SPARK-26872][STREAMING] Use a 
configurable value for final termination in the JobScheduler.stop() method
URL: https://github.com/apache/spark/pull/23926#issuecomment-468943623
 
 
   Basically, I found I could process a single batch of file input data through 
a streaming pipeline by:
    
   1. Preloading the streaming context queue with an RDD of the records from 
the file(s): `StreamingContext.queueStream(queue,false)`
   2. Starting the streaming context: `StreamingContext.start()`
   3. Immediately and gracefully stopping the streaming context: 
`StreamingContext.stop(true,true)`
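
   A minimal sketch of the three steps above. The `queueStream`, `start`, and `stop` calls are real Spark Streaming APIs; the input path, app name, and the `foreachRDD` body are placeholders for illustration:

   ```scala
   import scala.collection.mutable
   import org.apache.spark.SparkConf
   import org.apache.spark.rdd.RDD
   import org.apache.spark.streaming.{Milliseconds, StreamingContext}

   object SingleBatchPipeline {
     def main(args: Array[String]): Unit = {
       val conf = new SparkConf().setAppName("single-batch").setMaster("local[*]")
       // 1-ms batch interval so the preloaded batch is picked up immediately
       val ssc = new StreamingContext(conf, Milliseconds(1))

       // 1. Preload the queue with a single RDD holding the file's records
       val queue = mutable.Queue[RDD[String]](
         ssc.sparkContext.textFile("/path/to/input"))  // placeholder path
       val stream = ssc.queueStream(queue, oneAtATime = false)
       stream.foreachRDD(rdd => rdd.foreach(println))  // placeholder pipeline

       // 2. Start the streaming context
       ssc.start()

       // 3. Immediately request a graceful stop:
       //    stopSparkContext = true, stopGracefully = true
       ssc.stop(stopSparkContext = true, stopGracefully = true)
     }
   }
   ```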
   
   The batch interval, not unexpectedly, determines when the first (and in my 
case only) batch actually begins processing.  Since I'm impatient (and who 
among us isn't?), my batch interval is 1 millisecond. Processing begins 
immediately.
   
   Based on the size of the input file, my expectation is to set the new 
spark.streaming.jobTimeout value to twice the guesstimated run time.
   
   I expect my jobs to run for hours, not days.  While specifying the 
jobTimeout in units of hours would be acceptable for my case, it may not be 
granular enough for other potential use cases.  Specifying the timeout in 
minutes feels like the right compromise between flexibility and awkwardly 
large numbers.
   
