2018-01-04 16:24:57 UTC - Daniel Ferreira Jorge: What is the proxy component? I cannot find any kind of documentation for it (<https://github.com/apache/incubator-pulsar/blob/master/kubernetes/generic/proxy.yaml>). Why is it only used in the generic Kubernetes deployment and not in the GKE version? ---- 2018-01-04 16:27:13 UTC - Matteo Merli: It’s a component that was introduced recently. Essentially it’s a stateless proxy that speaks the Pulsar binary protocol. The motivation is to avoid the need for (or overcome the impossibility of) direct connections between clients and brokers. ---- 2018-01-04 16:28:35 UTC - Matteo Merli: being a stateless service, it can be exposed through a regular load balancer (e.g. ElasticLoadBalancer or clusterIP/nodePort in Kubernetes) ---- 2018-01-04 16:29:09 UTC - Matteo Merli: (documentation for that is not really “complete”) ---- 2018-01-04 16:30:52 UTC - Matteo Merli: the way it works is to point the clients to the proxy rather than the brokers, and the proxy will make sure to route all the connections through itself ---- 2018-01-04 16:33:57 UTC - Daniel Ferreira Jorge: so instead of having the clients connect directly to the brokers, they should connect to the proxies, right? ---- 2018-01-04 16:34:52 UTC - Matteo Merli: correct, there’s the overhead of the extra network hop, but it can simplify the deployments, especially in terms of network ACLs ---- 2018-01-04 16:35:11 UTC - Matteo Merli: or to expose the service outside of a Kubernetes cluster ---- 2018-01-04 16:36:56 UTC - Matteo Merli: because normally, brokers advertise their own address to clients. That could be either the `podIP` or the `nodeIP` but it needs to be accessible from clients (in the absence of a proxy) ---- 2018-01-04 16:38:40 UTC - Daniel Ferreira Jorge: great, thanks ---- 2018-01-05 01:27:25 UTC - Daniel Ferreira Jorge: I think it would be really nice if Pulsar implemented a multi-tiered storage system like Pravega. 
Pravega uses BookKeeper for fresher data and some sort of slower, more cost-efficient storage like HDFS, S3 or GCS for older data. From the clients' perspective everything looks the same, but historical data comes from cold storage. This is just an idea for the future... <http://pravega.io/docs/pravega-concepts/#a-note-on-tiered-storage> ----
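To make the tiered-storage idea above concrete, here is a minimal Python sketch of a log whose recent entries live in a "hot" tier and whose older entries are moved to a "cold" tier, while reads go through one call either way. Everything in it (`TieredLog`, `hot_capacity`, the in-memory dicts standing in for BookKeeper and for S3/HDFS/GCS) is hypothetical and only illustrates the concept; it is not how Pravega or Pulsar implement it.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class TieredLog:
    """Hypothetical two-tier log: fresh entries stay hot, older ones are offloaded."""

    hot_capacity: int                                      # max entries kept in the hot tier
    hot: Dict[int, bytes] = field(default_factory=dict)    # stand-in for BookKeeper
    cold: Dict[int, bytes] = field(default_factory=dict)   # stand-in for S3/HDFS/GCS
    next_offset: int = 0

    def append(self, payload: bytes) -> int:
        """Write to the hot tier, then offload the oldest entries if it is over capacity."""
        offset = self.next_offset
        self.hot[offset] = payload
        self.next_offset += 1
        self._offload()
        return offset

    def _offload(self) -> None:
        # Move the oldest hot entries to the cold tier until the hot tier fits.
        while len(self.hot) > self.hot_capacity:
            oldest = min(self.hot)
            self.cold[oldest] = self.hot.pop(oldest)

    def read(self, offset: int) -> bytes:
        """Same call for callers regardless of which tier holds the entry."""
        if offset in self.hot:
            return self.hot[offset]
        return self.cold[offset]  # raises KeyError for an unknown offset
```

With `hot_capacity=2`, appending five entries leaves the two newest in `hot` and the three oldest in `cold`, yet `read(offset)` returns each payload without the caller knowing which tier served it.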