Hi Harsha and Andrew,



I’ve written a custom producer with randomly generated keys, and now the data is 
distributed evenly among the partitions. Thank you so much for your support.



Cheers,

Huy, Le Van








On Friday, Dec 5, 2014 at 5:02 a.m., Harsha <[email protected]>, wrote:

Using kafka-console-producer is a bad idea. It should only be used for testing 
a topic. I highly recommend writing your own producer. KafkaSpout uses Kafka's 
simple (low-level) consumer API, which doesn't have consumer groups. But you can 
try using


bin/kafka-run-class.sh kafka.tools.ConsumerOffsetChecker

to check the partition size for a topic. 



https://cwiki.apache.org/confluence/display/KAFKA/System+Tools#SystemTools-ConsumerOffsetChecker


 

 

On Thu, Dec 4, 2014, at 05:50 PM, Andrew Neilson wrote:




Over the long term the partitions would be used evenly, but unless you change 
the partitioning scheme or message key, only one partition will be receiving 
*new* messages at any given time.


 


If you want to test that your topology properly distributes the work at the 
spout level, you could try loading from the beginning of your topic rather than 
from the end.


 

To do that, set these values in your TridentKafkaConfig:


 


spoutConf.forceFromStart = true;

// actually the default, so you don't necessarily need this line
spoutConf.startOffsetTime = kafka.api.OffsetRequest.EarliestTime();







 


On Thu, Dec 4, 2014 at 3:28 PM, Huy Le Van <[email protected]> wrote:








 


I just piped text files directly into Kafka using 
bin/kafka-console-producer.sh, so I guess the keys were all null. I’ll write a 
producer and see. By the way, what is the command to show the distribution of my 
data in Kafka?


 

 




 


Best regards,



Huy, Le Van


 







 



On Thursday, Dec 4, 2014 at 11:23 p.m., Harsha <[email protected]>, wrote:




 



 



It doesn't look like your Kafka producer is distributing data across the 
partitions. What does your producer look like? Are you sending a key with each 
message, or using null? If you are using null, then what Andrew describes might 
be the problem. I would recommend using a random UUID as the key for each 
message so that messages are spread across your partitions.
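
To illustrate why a random key helps, here is a minimal Java sketch. It is not Kafka's actual partitioner: it assumes a hash-based scheme in which the partition is the key's hash modulo the partition count. The names KeyedPartitioningSketch, partitionFor and distribute are made up for this illustration.

```java
import java.util.UUID;

final class KeyedPartitioningSketch {

    // Assumption: partition = hash(key) mod numPartitions.
    // This mirrors the general idea of hash partitioning, not Kafka's own code.
    static int partitionFor(String key, int numPartitions) {
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    // Sends `messages` simulated messages, each with a fresh random UUID key,
    // and counts how many land on each partition.
    static int[] distribute(int messages, int numPartitions) {
        int[] counts = new int[numPartitions];
        for (int i = 0; i < messages; i++) {
            String key = UUID.randomUUID().toString();
            counts[partitionFor(key, numPartitions)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        int[] counts = distribute(160_000, 16);
        for (int p = 0; p < counts.length; p++) {
            System.out.println("partition " + p + ": " + counts[p]);
        }
    }
}
```

With 16 partitions, each one should end up with roughly one sixteenth of the messages. A real producer would attach the same kind of UUID string as the message key.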


 

 


On Thu, Dec 4, 2014, at 03:12 PM, Huy Le Van wrote:



 


Hi Harsha,



I’ve attached 2 images below. You can see that I assigned 16 executors, but only 
one seemed to work. The other screenshot is the partition table.


 


Hi Andrew,



That’s interesting. I’m quite new to Kafka. Could you take a look at the 
second screenshot to see whether the data was distributed evenly? Let’s say it 
was written to one partition at a time (yes, this is the case where I used only 
one producer), would it be rebalanced afterward?


 





 

 





 





Best regards,



Huy, Le Van


 








On Thursday, Dec 4, 2014 at 10:00 p.m., Andrew Neilson <[email protected]>, 
wrote:





How is the Kafka topic you are reading from partitioned? By default, when 
messages have no key, the Kafka producer writes to a single random partition for 
10 minutes before switching to another. So if you are looking at live data, you 
would only see data in one partition at a time unless you use a different 
partitioning scheme.
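
The behaviour described above can be modelled roughly as follows. This is a simplified sketch, not the Kafka producer's code: it assumes an unkeyed message goes to a randomly chosen partition, and that the choice is re-made only once a fixed interval (10 minutes here) has elapsed. StickyPartitionSketch and partitionAt are hypothetical names.

```java
import java.util.Random;

final class StickyPartitionSketch {
    private final int numPartitions;
    private final long refreshIntervalMs;
    private final Random random;
    private long lastPickMs;
    private int current = -1; // -1 means no partition has been chosen yet

    StickyPartitionSketch(int numPartitions, long refreshIntervalMs, long seed) {
        this.numPartitions = numPartitions;
        this.refreshIntervalMs = refreshIntervalMs;
        this.random = new Random(seed);
    }

    // Returns the partition used for an unkeyed message sent at time nowMs.
    // A new random partition is picked only when the interval has elapsed.
    int partitionAt(long nowMs) {
        if (current < 0 || nowMs - lastPickMs >= refreshIntervalMs) {
            current = random.nextInt(numPartitions);
            lastPickMs = nowMs;
        }
        return current;
    }

    public static void main(String[] args) {
        long tenMinutes = 10 * 60 * 1000L;
        StickyPartitionSketch chooser = new StickyPartitionSketch(16, tenMinutes, 42L);
        // One message per minute for an hour: the partition is re-picked
        // only at each 10-minute boundary.
        for (long t = 0; t < 6 * tenMinutes; t += 60_000L) {
            System.out.println("minute " + (t / 60_000L) + " -> partition " + chooser.partitionAt(t));
        }
    }
}
```

Within any one interval every message lands on the same partition, which is why a spout reading live data would see only one busy partition at a time.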


 


See the Kafka FAQ for details on this 
https://cwiki.apache.org/confluence/display/KAFKA/FAQ#FAQ-Whyisdatanotevenlydistributedamongpartitionswhenapartitioningkeyisnotspecified?





 

 



On Thu, Dec 4, 2014 at 1:51 PM, Harsha <[email protected]> wrote:


 








Can you post an image of your Storm UI executors page? If there are 16 executors 
but only 1 seems to be fetching data, can you please check whether your Kafka 
producer is distributing your data among all of your partitions?
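
A small sketch of why an unevenly filled topic leaves most executors idle. It assumes, purely for illustration, that partition p is read by spout task p % numTasks. This is not Storm's actual assignment code, and SpoutLoadSketch, loadPerTask and busyTasks are hypothetical names.

```java
final class SpoutLoadSketch {

    // Assumption: partition p is read by task (p % numTasks).
    // Returns the number of messages each task would have to fetch.
    static long[] loadPerTask(long[] messagesPerPartition, int numTasks) {
        long[] load = new long[numTasks];
        for (int p = 0; p < messagesPerPartition.length; p++) {
            load[p % numTasks] += messagesPerPartition[p];
        }
        return load;
    }

    // Counts the tasks that have at least one message to fetch.
    static int busyTasks(long[] load) {
        int busy = 0;
        for (long messages : load) {
            if (messages > 0) {
                busy++;
            }
        }
        return busy;
    }

    public static void main(String[] args) {
        // All data in one partition, as happens when every key is null.
        long[] skewed = new long[16];
        skewed[3] = 1_000_000L;

        // The same volume spread evenly over 16 partitions.
        long[] even = new long[16];
        java.util.Arrays.fill(even, 62_500L);

        System.out.println("busy tasks (skewed): " + busyTasks(loadPerTask(skewed, 16)));
        System.out.println("busy tasks (even):   " + busyTasks(loadPerTask(even, 16)));
    }
}
```

Under this assumption, a parallelism hint of 16 only pays off once the producer actually spreads messages over all 16 partitions.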




 

 


On Thu, Dec 4, 2014, at 12:32 PM, Huy Le Van wrote:



 


Could someone help me please?





 


Best regards,



Huy, Le Van








 


On Thursday, Dec 4, 2014 at 3:35 p.m., Huy Le Van 
<[email protected]>, wrote:



 


Hi,


 


I’m trying to tune Kafka Trident (Transactional) and seeing that the ‘spout0’ 
bolt uses only one executor. The problem is exactly as described in 
https://groups.google.com/forum/#!msg/storm-user/bI7976v9R5g/fulzpnPmzkEJ



However, my Kafka topic has 16 partitions and I already set parallelismHint of 
TransactionalTridentKafkaSpout to 16. What am I doing wrong here? Please advise.




 


Many thanks,



Huy, Le Van






 


 


 










 




 

Email had 2 attachments:


storm01.png
  165k (image/png)

storm02.png
  476k (image/png)
