Github user harishreedharan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5918#discussion_r29877536
  
    --- Diff: external/flume/src/test/scala/org/apache/spark/streaming/flume/FlumePollingStreamSuite.scala ---
    @@ -167,26 +167,24 @@ class FlumePollingStreamSuite extends FunSuite with BeforeAndAfter with Logging
         }
       }
     
    -  def writeAndVerify(channels: Seq[MemoryChannel], ssc: StreamingContext,
    +  def writeAndVerify(sinks: Seq[SparkSink], channels: Seq[MemoryChannel], ssc: StreamingContext,
         outputBuffer: ArrayBuffer[Seq[SparkFlumeEvent]]) {
         val clock = ssc.scheduler.clock.asInstanceOf[ManualClock]
         val executor = Executors.newCachedThreadPool()
         val executorCompletion = new ExecutorCompletionService[Void](executor)
    -    channels.map(channel => {
    +
    +    val latch = new CountDownLatch(batchCount * channels.size)
    +    sinks.foreach(_.countdownWhenBatchReceived(latch))
    +
    +    channels.foreach(channel => {
           executorCompletion.submit(new TxnSubmitter(channel, clock))
         })
    +
         for (i <- 0 until channels.size) {
           executorCompletion.take()
         }
    -    val startTime = System.currentTimeMillis()
    -    while (outputBuffer.size < batchCount * channels.size &&
    -      System.currentTimeMillis() - startTime < 15000) {
    -      logInfo("output.size = " + outputBuffer.size)
    -      Thread.sleep(100)
    -    }
    -    val timeTaken = System.currentTimeMillis() - startTime
    -    assert(timeTaken < 15000, "Operation timed out after " + timeTaken + " ms")
    -    logInfo("Stopping context")
    +
    +    latch.await(15, TimeUnit.SECONDS)
    --- End diff --
    
    So in the old version, we use the manual clock yet we don't actually move the batches forward after sending the data (only after we add to the channel), which is a bug in itself. So we need to do two things: ensure the data is sent, using the latches, and then move the batches forward via the clock. The bug was basically caused by us moving the batches forward without ensuring the data was actually sent (we ensured the data was being added to the channel, not actually pulled off the sink).
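
    The latch pattern in the diff can be sketched in isolation. This is a minimal, hypothetical example (not the actual Spark test code): each simulated sink counts the latch down once per batch it pulls, and the test thread blocks on `await` with a timeout instead of polling an output buffer in a sleep loop. Only once `await` returns `true` is it safe to advance the manual clock.

    ```java
    import java.util.concurrent.CountDownLatch;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.TimeUnit;

    public class LatchDemo {
        // Hypothetical stand-in for writeAndVerify: one latch count per
        // expected batch across all channels, counted down by the "sinks".
        static boolean waitForBatches(int batchCount, int channelCount)
                throws InterruptedException {
            CountDownLatch latch = new CountDownLatch(batchCount * channelCount);
            ExecutorService executor = Executors.newCachedThreadPool();
            for (int c = 0; c < channelCount; c++) {
                executor.submit(() -> {
                    for (int b = 0; b < batchCount; b++) {
                        // Simulates the sink actually pulling a batch off the
                        // channel, then signalling receipt
                        // (cf. countdownWhenBatchReceived in the diff).
                        latch.countDown();
                    }
                });
            }
            // Block until every batch has been pulled, bounded by a timeout,
            // rather than sleeping and re-checking outputBuffer.size.
            boolean allReceived = latch.await(15, TimeUnit.SECONDS);
            executor.shutdown();
            return allReceived;
        }

        public static void main(String[] args) throws InterruptedException {
            System.out.println(waitForBatches(5, 2));
        }
    }
    ```

    The key difference from the old polling loop is that the latch signals the event we actually care about (the sink received the batch), so advancing the clock afterwards cannot race ahead of the data.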

