Hi all, I'm looking at setting up a (small) Kafka cluster for streaming microscope data to Spark Streaming.
The producer would be a single Windows 7 machine with a 1Gb or 10Gb ethernet connection, making HTTP POSTs from MATLAB (this bit is a little fuzzy; I'm the admin, not the user). The consumers would be 10-60 (or more) Linux nodes running Spark Streaming, each with a 10Gb ethernet connection. The target data rate, per the user, is <200MB/sec, although I can see this scaling in the future.

Based on the documentation, my initial thoughts were as follows:

3 nodes, all running ZK and the broker:
- Dell R620
- 2x8 core 2.6GHz Xeon
- 256GB RAM
- 8x300GB 15K SAS drives (OS runs on 2, ZK on 1, broker on the last 5)
- 10Gb ethernet (single port)

Do these specs make sense? Am I over- or under-speccing in any of the areas? It made sense to me to make the filesystem cache as large as possible, particularly since I'm dealing with a small number of brokers.

Thanks,
Ken Carlile
Senior Unix Engineer, Scientific Computing Systems
Janelia Farm Research Campus, HHMI
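P.S. For anyone sanity-checking the numbers, here's my rough back-of-envelope for per-broker bandwidth. The replication factor of 3 and the single consumer group are assumptions on my part (nothing is decided yet), and the function is just illustrative arithmetic, not anything from the Kafka docs:

```python
def broker_load_mb_s(ingest_mb_s, replication, brokers, consumer_fanout=1):
    """Approximate steady-state per-broker bandwidth in MB/s.

    Assumes partitions (and their replicas) are spread evenly across
    brokers, so replicated writes and consumer reads divide by the
    broker count. Ignores replication fetch traffic between brokers
    beyond the write amplification itself.
    """
    write = ingest_mb_s * replication / brokers   # leader + follower writes, spread out
    read = ingest_mb_s * consumer_fanout / brokers  # consumer reads, spread out
    return write, read

# Assumed scenario: 200 MB/s ingest, replication factor 3, 3 brokers,
# one consumer group (the Spark Streaming job).
write, read = broker_load_mb_s(200, replication=3, brokers=3)
print(f"per-broker writes: {write:.1f} MB/s, reads: {read:.1f} MB/s")
```

With those assumptions each broker absorbs the full 200 MB/s in writes (replication factor equals broker count, so every broker holds a copy), which is comfortably inside a 10Gb link but worth remembering when weighing the 5 SAS spindles per broker against the large filesystem cache.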