Hi Bhavesh, I will collect the dump and send it to you.
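As an aside, a thread dump of a JVM producer is usually taken with `jstack <pid>`; it can also be captured from inside the process with the standard `Thread.getAllStackTraces()`. A minimal sketch (class name hypothetical):

```java
import java.util.Map;

public class ThreadDump {
    // Print a stack trace for every live thread in this JVM, similar in
    // spirit to `jstack <pid>` output; returns the number of threads seen.
    public static int dump() {
        Map<Thread, StackTraceElement[]> all = Thread.getAllStackTraces();
        for (Map.Entry<Thread, StackTraceElement[]> e : all.entrySet()) {
            Thread t = e.getKey();
            System.out.printf("\"%s\" state=%s%n", t.getName(), t.getState());
            for (StackTraceElement frame : e.getValue()) {
                System.out.println("    at " + frame);
            }
        }
        return all.size();
    }

    public static void main(String[] args) {
        System.out.println(dump() + " threads dumped");
    }
}
```

Threads stuck in BLOCKED or long WAITED states on the producer's send path are the kind of contention Bhavesh is asking about.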
I am using a program that I took from https://github.com/edenhill/librdkafka/tree/master/examples and modified for my tests. I have attached the files.

> On Nov 5, 2014, at 04:45, Bhavesh Mistry <mistry.p.bhav...@gmail.com> wrote:
>
> Hi Eduardo,
>
> Can you please take a thread dump and see if there are blocking issues on
> the producer side? Do you have a single instance of the producer and
> multiple threads?
>
> Are you using the Scala producer or the new Java producer? Also, what are
> your producer properties?
>
> Thanks,
>
> Bhavesh
>
> On Tue, Nov 4, 2014 at 12:40 AM, Eduardo Alfaia <e.costaalf...@unibs.it>
> wrote:
>
>> Hi Gwen,
>> I have changed the Java code JavaKafkaWordCount to use
>> reduceByKeyAndWindow in Spark.
>>
>> ----- Original message -----
>> From: "Gwen Shapira" <gshap...@cloudera.com>
>> Sent: 03/11/2014 21:08
>> To: "users@kafka.apache.org" <users@kafka.apache.org>
>> Cc: "u...@spark.incubator.apache.org" <u...@spark.incubator.apache.org>
>> Subject: Re: Spark Kafka Performance
>>
>> Not sure about the throughput, but:
>>
>> "I mean that the words counted in spark should grow up" - the Spark
>> word-count example doesn't accumulate.
>> It gets an RDD every n seconds and counts the words in that RDD, so we
>> don't expect the count to go up.
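Gwen's point can be sketched without a cluster: a plain-Java simulation (class and method names hypothetical, not the Spark API) contrasting a per-batch count with a reduceByKeyAndWindow-style sum over the last few batches:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class WindowedCountSketch {
    // Count words in one micro-batch, like the stock word-count example:
    // each batch is counted in isolation, so totals never accumulate.
    static Map<String, Integer> countBatch(List<String> batch) {
        Map<String, Integer> counts = new HashMap<>();
        for (String w : batch) counts.merge(w, 1, Integer::sum);
        return counts;
    }

    // Sum word counts over the last `window` batches, roughly what
    // reduceByKeyAndWindow with an additive reducer computes.
    static Map<String, Integer> countWindow(List<List<String>> batches,
                                            int window) {
        Map<String, Integer> counts = new HashMap<>();
        int from = Math.max(0, batches.size() - window);
        for (List<String> batch : batches.subList(from, batches.size()))
            for (String w : batch) counts.merge(w, 1, Integer::sum);
        return counts;
    }

    public static void main(String[] args) {
        List<List<String>> batches = List.of(
                List.of("kafka", "spark"),
                List.of("kafka"),
                List.of("kafka", "kafka"));
        // The per-batch count sees only the latest batch.
        System.out.println(countBatch(batches.get(2)).get("kafka"));  // 2
        // A 3-batch window sees all four occurrences of "kafka".
        System.out.println(countWindow(batches, 3).get("kafka"));     // 4
    }
}
```

Note that even with reduceByKeyAndWindow the reported counts only grow until the window is full; a true running total across the whole stream would need stateful counting (updateStateByKey in Spark Streaming) instead.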
>>
>> On Mon, Nov 3, 2014 at 6:57 AM, Eduardo Costa Alfaia <
>> e.costaalf...@unibs.it> wrote:
>>
>>> Hi Guys,
>>> Could anyone explain to me how Kafka works with Spark? I am using
>>> JavaKafkaWordCount.java as a test, and the command line is:
>>>
>>> ./run-example org.apache.spark.streaming.examples.JavaKafkaWordCount
>>> spark://192.168.0.13:7077 computer49:2181 test-consumer-group unibs.it 3
>>>
>>> As a producer I am using this command:
>>>
>>> rdkafka_cachesender -t unibs.nec -p 1 -b 192.168.0.46:9092 -f output.txt
>>> -l 100 -n 10
>>>
>>> rdkafka_cachesender is a program I developed which sends output.txt's
>>> content to Kafka, where -l is the length of each send (upper bound) and
>>> -n is the number of lines to send in a row. Below is the throughput
>>> calculated by the program:
>>>
>>> File is 2235755 bytes
>>> throughput (b/s) = 699751388
>>> throughput (b/s) = 723542382
>>> throughput (b/s) = 662989745
>>> throughput (b/s) = 505028200
>>> throughput (b/s) = 471263416
>>> throughput (b/s) = 446837266
>>> throughput (b/s) = 409856716
>>> throughput (b/s) = 373994467
>>> throughput (b/s) = 366343097
>>> throughput (b/s) = 373240017
>>> throughput (b/s) = 386139016
>>> throughput (b/s) = 373802209
>>> throughput (b/s) = 369308515
>>> throughput (b/s) = 366935820
>>> throughput (b/s) = 365175388
>>> throughput (b/s) = 362175419
>>> throughput (b/s) = 358356633
>>> throughput (b/s) = 357219124
>>> throughput (b/s) = 352174125
>>> throughput (b/s) = 348313093
>>> throughput (b/s) = 355099099
>>> throughput (b/s) = 348069777
>>> throughput (b/s) = 348478302
>>> throughput (b/s) = 340404276
>>> throughput (b/s) = 339876031
>>> throughput (b/s) = 339175102
>>> throughput (b/s) = 327555252
>>> throughput (b/s) = 324272374
>>> throughput (b/s) = 322479222
>>> throughput (b/s) = 319544906
>>> throughput (b/s) = 317201853
>>> throughput (b/s) = 317351399
>>> throughput (b/s) = 315027978
>>> throughput (b/s) = 313831014
>>> throughput (b/s) = 310050384
>>> throughput (b/s) = 307654601
>>> throughput (b/s) = 305707061
>>> throughput (b/s) = 307961102
>>> throughput (b/s) = 296898200
>>> throughput (b/s) = 296409904
>>> throughput (b/s) = 294609332
>>> throughput (b/s) = 293397843
>>> throughput (b/s) = 293194876
>>> throughput (b/s) = 291724886
>>> throughput (b/s) = 290031314
>>> throughput (b/s) = 289747022
>>> throughput (b/s) = 289299632
>>>
>>> The throughput drops after a few seconds and does not hold its initial
>>> values:
>>>
>>> throughput (b/s) = 699751388
>>> throughput (b/s) = 723542382
>>> throughput (b/s) = 662989745
>>>
>>> Another question is about Spark: after I start the Spark command, after
>>> 15 seconds Spark keeps repeating the same word counts, but my program
>>> keeps sending words to Kafka, so I would expect the counts in Spark to
>>> grow. I have attached the log from Spark.
>>>
>>> My setup is:
>>>
>>> ComputerA (rdkafka_cachesender) -> ComputerB (Kafka brokers + ZooKeeper) ->
>>> ComputerC (Spark)
>>>
>>> If I haven't explained this well, please reply to me.
>>>
>>> Thanks Guys
>>> --
>>> Privacy Notice: http://www.unibs.it/node/8155
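The b/s figures above look like cumulative bytes sent divided by elapsed time. A minimal plain-Java sketch of that bookkeeping (class and method names hypothetical; the actual rdkafka_cachesender is a C program whose source is not shown in this thread):

```java
public class ThroughputMeter {
    private final long startNanos = System.nanoTime();
    private long bytesSent = 0;

    // Record a completed send of n bytes.
    public void record(long n) { bytesSent += n; }

    // Cumulative throughput in bytes/second since construction.
    public double bytesPerSecond() {
        double elapsedSec = (System.nanoTime() - startNanos) / 1e9;
        return elapsedSec > 0 ? bytesSent / elapsedSec : 0;
    }

    public static void main(String[] args) throws InterruptedException {
        ThroughputMeter meter = new ThroughputMeter();
        for (int i = 0; i < 10; i++) {
            meter.record(100);  // pretend each send moved 100 bytes
            Thread.sleep(10);
            System.out.printf("throughput (b/s) = %.0f%n",
                              meter.bytesPerSecond());
        }
    }
}
```

One caveat with this kind of measurement: an asynchronous producer such as librdkafka acknowledges a "send" when the message is enqueued in its local buffer, not when it reaches the broker, so the earliest readings can mostly reflect in-memory enqueueing. That is one plausible reason the first values sit far above the sustained rate before settling toward what the network and broker can absorb.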