Thanks Abhishek for the confirmation. I am not able to see the images in the GaaS wiki; the images seem to be served from Google Docs, and I could make out that my id does not have access. Maybe making the images public would help. Can you please check why I am not able to see the images in the wiki?
Regards,
Vicky

On Thu, Jul 27, 2017 at 7:41 PM, Abhishek Tiwari <a...@apache.org> wrote:

> Hi Vicky,
>
> My responses are inlined in blue. You are on the right track.
>
> Also, here is the design doc of Gobblin as a Service for your reference:
> https://cwiki.apache.org/confluence/display/GOBBLIN/Gobblin+as+a+Service
>
> Regards,
> Abhishek
>
> On Wed, Jul 26, 2017 at 5:45 AM, Vicky Kak <vicky....@gmail.com> wrote:
>
>> Hi,
>>
>> I spent more time looking at the code details and have the following to share.
>>
>> I see that GobblinServiceManager (the bootstrap class for the Gobblin service) performs the following:
>> 1) Initializing the TopologyCatalog, FlowCatalog, Helix, ServiceScheduler, EmbeddedRestliServer, and finally the Orchestrator/TopologySpecFactory.
>> 2) The FlowConfigClient seems to create the FlowConfig, and then the FlowSpec via FlowConfigResource (the REST endpoint).
>> 3) The JobSpec gets added to the FlowCatalog, after which the Orchestrator pushes the JobSpec to Kafka via SimpleKafkaStepExecutionProducer.
>>
>> I have been looking for the code which uses the SimpleKafkaStepExecutionConsumer, but could not find how it is hooked into a running instance of Gobblin.
>
> Look at gobblin-cluster and the default config for the classes being loaded for listeners, JobConfigurationManager, etc.
>
>> Here is how the Gobblin service will invoke the jobs on the slaves (Gobblin instances):
>>
>> 1) We should have the REST endpoint information so that we can send the JobSpec via FlowConfigClient or via an HTTP GET (REST call; I have not yet tried this). I don't see a way to get the port when the REST server is started.
>
> We should make it configurable; right now it chooses a random port.
>
>> 2) The JobSpec is passed to Kafka via the SimpleKafkaStepExecutionProducer from the Gobblin service via the Orchestrator.
>> 3) There could be multiple instances of Gobblin listening to Kafka using the SimpleKafkaStepExecutionConsumer; all the Gobblin instances should get the JobSpecs. The one instance which matches the JobSpec should trigger the job.
>
> Yes, we can make this a bit less ambiguous though.
>
>> The Gobblin service acts as a master and provides the REST endpoint to read/create the JobSpecs which will get triggered on the slaves (the Gobblin instances).
>> I have not yet been able to run the flow since there are some build issues when building Gobblin from master; the tests are failing right now.
>>
>> Can someone from the development team validate whether I am on the right track in terms of understanding the implementation and flows?
>
> You are on the right track.
>
>> I have more questions which I will post after I confirm that I am not missing anything.
>>
>> Thanks,
>> Vicky
>>
>> On Tue, Jul 25, 2017 at 5:03 PM, Vicky Kak <vicky....@gmail.com> wrote:
>>
>>> To my surprise, after I looked at the code and referred to the presentation that Shirshanka had sent, my ignorance about Gobblin as a Service was removed.
>>>
>>> Gobblin as a Service: it is a global orchestrator which helps in submitting logical flow specifications that are further compiled into physical pipelines.
>>>
>>> We have been triggering the Gobblin jobs using the REST endpoint; this is done by implementing a custom service as explained here:
>>> https://groups.google.com/forum/#!topic/gobblin-users/kHrWh6lfGJM
>>>
>>> I have the following questions:
>>>
>>> 1) What is the use case for Gobblin as a Service? I don't see the Orchestrator's REST endpoint port being configurable. If we have to add a FlowSpec from a different machine we need to know the Orchestrator's host and port details; how do we do that?
>>> >> We use d2 registry internally for it (if you dont already know about it - > search for RESTLI D2) > > >> >>> 2) Does FlowSpec creation creates a new Job deployment which can also by >>> copying the corresponding .pull or .job file in the gobblin distribution? >>> >> If you are saying that if you bundle a pull file in gobblin distribution > and create the same via FlowSpec would it mean the same thing, then yes. > Else I didnt understand the question. > > >> >>> 3) Since the master.out log gets created when starting a service, I >>> assume there could be a way to add more Orchestrators to the master that is >>> started. However I am not sure how to do that, can this be clarified? >>> >> Only one node acts as orchestrator and scheduler. Rest of the nodes > receive requests and pass them to master for scheduling and orchestrating > via Helix messages. > > >> >>> Please note that I have been looking at the older code, the git log is >>> follow. >>> ************************************************************ >>> *********************************** >>> commit 755da9160cd91ea5ebcc752603ce1bffb74a75a1 (HEAD -> master, >>> origin/master, origin/HEAD) >>> Author: Kuai Yu <yukuai...@gmail.com> >>> Date: Tue Apr 11 19:10:53 2017 -0700 >>> ************************************************************ >>> *********************************** >>> >>> >>> Thanks, >>> Vicky >>> >> >> >