Hi Kenneth,
Thanks for your reply. You are right, I also believe that this has more to
do with Dataflow than Elasticsearch. I don't think the problem is in
classpath or beam unable to find file in classpath. The problem is how to.
set worker VM's keystore and truststore with self signed root ca. Usually
they contain all trusted root CAs like letsencrypt, symantec etc. KafkaIO
provide that via withConsumerFactorFn(loadKafkaConfig) where you can do
something like following:
private void loadKafkaConfig(Map<String, Object> config) {
readJKSFileFromGCS(this.gcsTruststorePath,
"/tmp/kafka.client.truststore.jks");
readJKSFileFromGCS(this.gcsKeystorePath, "/tmp/kafka.client.keystore.jks");
config.put("ssl.truststore.location","/tmp/kafka.client.truststore.jks");
config.put("ssl.truststore.password","clientsecret");
config.put("ssl.keystore.location","/tmp/kafka.client.keystore.jks");
config.put("ssl.keystore.password","clientsecret");
config.put("ssl.key.password","clientsecret");
}
I was wondering if ElasticIO can also provide similar support where we can
provide our self signed root ca.
Thanks and Regards
Mohil
On Tue, Apr 7, 2020 at 8:14 AM Kenneth Knowles <[email protected]> wrote:
> Hi Mohil,
>
> Thanks for the detailed report. I think most people are reduced capacity
> right now. Filing a Jira would be helpful for tracking this.
>
> Since I am writing, I will add a quick guess, but we should move to Jira.
> It seems this has more to do with Dataflow than ElasticSearch. The default
> for staging files is to scan the classpath. To do more, or to fix any
> problem with the autodetection, you will need to specify --filesToStage on
> the command line or setFilesToStage in Java code. Am I correct that this
> symptom is confirmed?
>
> Kenn
>
> On Mon, Apr 6, 2020 at 5:04 PM Mohil Khare <[email protected]> wrote:
>
>> Any update on this? Shall I open a jira for this support ?
>>
>> Thanks and regards
>> Mohil
>>
>> On Sun, Mar 22, 2020 at 9:36 PM Mohil Khare <[email protected]> wrote:
>>
>>> Hi,
>>> This is Mohil from Prosimo, a small bay area based stealth mode startup.
>>> We use Beam (on version 2.19) with google dataflow in our analytics
>>> pipeline with Kafka and PubSub as source while GCS, BigQuery and
>>> ElasticSearch as our sink.
>>>
>>> We want to use our private self signed root ca for tls connections
>>> between our internal services viz kafka, ElasticSearch, beam etc. We are
>>> able to setup secure tls connection between beam and kafka using self
>>> signed root certificate in keystore.jks and truststore.jks and transferring
>>> it to worker VMs running kafkaIO using KafkaIO's read via
>>> withConsumerFactorFn().
>>>
>>> We want to do similar things with elasticseachIO where we want to update
>>> its worker VM's truststore with our self signed root certificate so that
>>> when elasticsearchIO connects using HTTPS, it can connect successfully
>>> without ssl handshake failure. Currently we couldn't find any way to do so
>>> with ElasticsearchIO. We tried various possible workarounds like:
>>>
>>> 1. Trying JvmInitializer to initialise Jvm with truststore using
>>> System.setproperty for javax.net.ssl.trustStore,
>>> 2. Transferring our jar to GCP's appengine where we start jar using
>>> Djavax.net.ssl.trustStore and then triggering beam job from there.
>>> 3. Setting elasticsearchIO flag withTrustSelfSigned to true (I don't
>>> think it will work because looking at the source code, it looks like it has
>>> dependency with keystorePath)
>>>
>>> But nothing worked. When we logged in to worker VMs, it looked like our
>>> trustStore never made it to worker VM. All elasticsearchIO connections
>>> failed with the following exception:
>>>
>>> sun.security.validator.ValidatorException: PKIX path building failed:
>>> sun.security.provider.certpath.SunCertPathBuilderException: unable to find
>>> valid certification path to requested target
>>> sun.security.validator.PKIXValidator.doBuild(PKIXValidator.java:387)
>>>
>>>
>>> Right now to unblock ourselves, we have added proxy with letsencrypt
>>> root ca between beam and Elasticsearch cluster and beam's elasticsearchIO
>>> connect successfully to proxy using letsencrypt root certificate. We won't
>>> want to use Letsencrypt root certificater for internal services as it
>>> expires every three months. Is there a way, just like kafkaIO, to use
>>> selfsigned root certificate with elasticsearchIO? Or is there a way to
>>> update java cacerts on worker VMs where beam job is running?
>>>
>>> Looking forward for some suggestions soon.
>>>
>>> Thanks and Regards
>>> Mohil Khare
>>>
>>