You should be able to use a JvmInitializer[1] to set any system properties/configure the JVM trust store. Just make sure it's properly set up in your META-INF/services.
This is supported by Dataflow and all PortableRunners that use a separate process/container for the worker. 1: https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/harness/JvmInitializer.java On Thu, Apr 9, 2020 at 10:02 AM Mohil Khare <[email protected]> wrote: > Hi Kenneth, > > Thanks for your reply. You are right, I also believe that this has more to > do with Dataflow than Elasticsearch. I don't think the problem is in > classpath or beam unable to find file in classpath. The problem is how to. > set worker VM's keystore and truststore with self signed root ca. Usually > they contain all trusted root CAs like letsencrypt, symantec etc. KafkaIO > provide that via withConsumerFactorFn(loadKafkaConfig) where you can do > something like following: > > private void loadKafkaConfig(Map<String, Object> config) { > > readJKSFileFromGCS(this.gcsTruststorePath, > "/tmp/kafka.client.truststore.jks"); > readJKSFileFromGCS(this.gcsKeystorePath, > "/tmp/kafka.client.keystore.jks"); > > > config.put("ssl.truststore.location","/tmp/kafka.client.truststore.jks"); > config.put("ssl.truststore.password","clientsecret"); > config.put("ssl.keystore.location","/tmp/kafka.client.keystore.jks"); > config.put("ssl.keystore.password","clientsecret"); > config.put("ssl.key.password","clientsecret"); > } > > > I was wondering if ElasticIO can also provide similar support where we can > provide our self signed root ca. > > Thanks and Regards > Mohil > > > On Tue, Apr 7, 2020 at 8:14 AM Kenneth Knowles <[email protected]> wrote: > >> Hi Mohil, >> >> Thanks for the detailed report. I think most people are reduced capacity >> right now. Filing a Jira would be helpful for tracking this. >> >> Since I am writing, I will add a quick guess, but we should move to Jira. >> It seems this has more to do with Dataflow than ElasticSearch. The default >> for staging files is to scan the classpath. To do more, or to fix any >> problem with the autodetection, you will need to specify --filesToStage on >> the command line or setFilesToStage in Java code. Am I correct that this >> symptom is confirmed? >> >> Kenn >> >> On Mon, Apr 6, 2020 at 5:04 PM Mohil Khare <[email protected]> wrote: >> >>> Any update on this? Shall I open a jira for this support ? >>> >>> Thanks and regards >>> Mohil >>> >>> On Sun, Mar 22, 2020 at 9:36 PM Mohil Khare <[email protected]> wrote: >>> >>>> Hi, >>>> This is Mohil from Prosimo, a small bay area based stealth mode >>>> startup. We use Beam (on version 2.19) with google dataflow in our >>>> analytics pipeline with Kafka and PubSub as source while GCS, BigQuery and >>>> ElasticSearch as our sink. >>>> >>>> We want to use our private self signed root ca for tls connections >>>> between our internal services viz kafka, ElasticSearch, beam etc. We are >>>> able to setup secure tls connection between beam and kafka using self >>>> signed root certificate in keystore.jks and truststore.jks and transferring >>>> it to worker VMs running kafkaIO using KafkaIO's read via >>>> withConsumerFactorFn(). >>>> >>>> We want to do similar things with elasticseachIO where we want to >>>> update its worker VM's truststore with our self signed root certificate so >>>> that when elasticsearchIO connects using HTTPS, it can connect successfully >>>> without ssl handshake failure. Currently we couldn't find any way to do so >>>> with ElasticsearchIO. We tried various possible workarounds like: >>>> >>>> 1. Trying JvmInitializer to initialise Jvm with truststore using >>>> System.setproperty for javax.net.ssl.trustStore, >>>> 2. Transferring our jar to GCP's appengine where we start jar using >>>> Djavax.net.ssl.trustStore and then triggering beam job from there. >>>> 3. Setting elasticsearchIO flag withTrustSelfSigned to true (I don't >>>> think it will work because looking at the source code, it looks like it has >>>> dependency with keystorePath) >>>> >>>> But nothing worked. When we logged in to worker VMs, it looked like our >>>> trustStore never made it to worker VM. All elasticsearchIO connections >>>> failed with the following exception: >>>> >>>> sun.security.validator.ValidatorException: PKIX path building failed: >>>> sun.security.provider.certpath.SunCertPathBuilderException: unable to find >>>> valid certification path to requested target >>>> sun.security.validator.PKIXValidator.doBuild(PKIXValidator.java:387) >>>> >>>> >>>> Right now to unblock ourselves, we have added proxy with letsencrypt >>>> root ca between beam and Elasticsearch cluster and beam's elasticsearchIO >>>> connect successfully to proxy using letsencrypt root certificate. We won't >>>> want to use Letsencrypt root certificater for internal services as it >>>> expires every three months. Is there a way, just like kafkaIO, to use >>>> selfsigned root certificate with elasticsearchIO? Or is there a way to >>>> update java cacerts on worker VMs where beam job is running? >>>> >>>> Looking forward for some suggestions soon. >>>> >>>> Thanks and Regards >>>> Mohil Khare >>>> >>>
