gaborgsomogyi commented on issue #25911: [SPARK-29223][SQL][SS] Enable global 
timestamp per topic while specifying offset by timestamp in Kafka source
URL: https://github.com/apache/spark/pull/25911#issuecomment-536978378
 
 
   What I've seen so far is that in cases where the partition number is huge, the list is generated with code.
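   To illustrate what "generated with code" means here, a minimal sketch (assuming the `startingOffsetsByTimestamp` JSON format from the Spark Kafka source, with a hypothetical topic name and timestamp):
   
   ```python
   import json
   
   def timestamp_offsets(topic, num_partitions, timestamp_ms):
       """Build the per-partition JSON that startingOffsetsByTimestamp expects,
       mapping every partition of the topic to the same timestamp."""
       return json.dumps({topic: {str(p): timestamp_ms for p in range(num_partitions)}})
   
   # e.g. 1000 partitions, all starting from the same timestamp
   offsets = timestamp_offsets("events", 1000, 1569024000000)
   ```
   
   So even with a huge partition count, producing the per-partition map is a one-liner, which is why a global per-topic timestamp saves little.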
   
   Where there may be potential is the second use case you've mentioned. A common pattern in the Kafka area is to over-allocate the number of partitions initially (since Kafka is not easy to scale when huge data volumes are handled). In such a case maybe 1000 partitions are created initially but only 200 are used. When data volume increases, the additional sleeping partitions can be put to use without doing a heavy re-partitioning. Even in this quite common use case I don't see how it could help.
   What I can imagine is debugging scenarios where not much is scripted and one doesn't care if the topic is re-created with a different number of partitions. Adding ~200 lines of code for that reason alone is questionable.
