Hi Vicky,

I have fixed the images. Please check again.
Regards,

On Thu, Jul 27, 2017 at 8:20 PM, Vicky Kak <vicky....@gmail.com> wrote:

> Thanks Abhishek for the confirmation.
>
> I am not able to see the images in the GAAS wiki. The images seem to be
> coming from Google Docs, and as far as I can make out, my ID does not have
> access. Maybe making the images public would help. Can you please check why
> I am not able to see the images in the wiki?
>
> Regards,
> Vicky
>
> On Thu, Jul 27, 2017 at 7:41 PM, Abhishek Tiwari <a...@apache.org> wrote:
>
>> Hi Vicky,
>>
>> My responses are inlined in blue. You are on the right track.
>>
>> Also, here is the design doc of Gobblin as a Service for your reference:
>> https://cwiki.apache.org/confluence/display/GOBBLIN/Gobblin+as+a+Service
>>
>> Regards,
>> Abhishek
>>
>> On Wed, Jul 26, 2017 at 5:45 AM, Vicky Kak <vicky....@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> I spent more time looking at the code details and have the following to
>>> share.
>>>
>>> I see that GobblinServiceManager (the bootstrap class for the Gobblin
>>> service) performs these steps:
>>> 1) Initialising the TopologyCatalog, FlowCatalog, Helix, ServiceScheduler,
>>> EmbeddedLiServer and finally the Orchestrator/TopologySpecFactory.
>>> 2) The FlowConfigClient seems to be creating the FlowConfig, and then the
>>> FlowSpec, via FlowConfigResource (via the REST endpoint).
>>> 3) The JobSpec gets added to the FlowCatalog, after which the
>>> Orchestrator pushes the JobSpec to Kafka via the
>>> SimpleKafkaStepExecutionProducer.
>>>
>>> I have been looking for the code that uses the
>>> SimpleKafkaStepExecutionConsumer, but could not find how it is hooked
>>> into a running Gobblin instance.
>>>
>> Look at gobblin-cluster and the default config for the classes being
>> loaded for listeners, JobConfigurationManager, etc.
>>
>>> Here is how the Gobblin service will invoke the jobs on the slaves
>>> (Gobblin instances):
>>>
>>> 1) We need the REST endpoint information so that we can send the
>>> JobSpec via FlowConfigClient or via an HTTP GET (a REST call; I have not
>>> yet tried this). I don't see a way to get the port when the REST server
>>> is started.
>>>
>> We should make it configurable; right now it chooses a random port.
>>
>>> 2) The JobSpec is passed from the Gobblin service to Kafka via the
>>> SimpleKafkaStepExecutionProducer, through the Orchestrator.
>>> 3) There could be multiple Gobblin instances listening to Kafka using
>>> the SimpleKafkaStepExecutionConsumer, and all of them should get the
>>> JobSpecs. The one instance that matches the JobSpec should trigger the
>>> job.
>>>
>> Yes, we can make this a bit less ambiguous though.
>>
>>> The Gobblin service acts as a master and provides the REST endpoint to
>>> read/create the JobSpecs, which get triggered on the slaves (the Gobblin
>>> instances).
>>> I have not yet been able to run the flow because of build issues when
>>> building Gobblin from master; the tests are failing right now.
>>>
>>> Can someone from the development team validate whether I am on the right
>>> track in my understanding of the implementation and flows?
>>>
>> You are on the right track.
>>
>>> I have more questions, which I will post after I confirm that I am not
>>> missing anything.
>>>
>>> Thanks,
>>> Vicky
>>>
>>> On Tue, Jul 25, 2017 at 5:03 PM, Vicky Kak <vicky....@gmail.com> wrote:
>>>
>>>> To my surprise, after I looked at the code and referred to the
>>>> presentation that Shrishanka had sent, my confusion about Gobblin as a
>>>> Service cleared up.
>>>>
>>>> Gobblin as a Service: a global orchestrator that accepts logical flow
>>>> specifications, which are then compiled into physical pipelines.
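The three-step dispatch flow described in the Jul 26 message works like this. The orchestrator publishes a JobSpec, every Gobblin instance receives it, and only the matching instance triggers the job. The plain-Java simulation that follows illustrates that flow. It is a hypothetical sketch, not Gobblin or Kafka code:

* `Topic` stands in for a Kafka topic.
* `JobSpecMessage` stands in for a JobSpec.
* The `targetInstance` field used for matching is an assumption made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/**
 * Hypothetical in-memory sketch of "broadcast and filter" dispatch.
 * None of these types are Gobblin or Kafka APIs.
 */
public class BroadcastDispatchSketch {

    /** Stand-in for a JobSpec; targetInstance is an assumed field used for matching. */
    record JobSpecMessage(String jobName, String targetInstance) {}

    /** Stand-in for a topic where every subscriber sees every published message. */
    static final class Topic {
        private final List<Consumer<JobSpecMessage>> subscribers = new ArrayList<>();

        void subscribe(Consumer<JobSpecMessage> subscriber) {
            subscribers.add(subscriber);
        }

        void publish(JobSpecMessage message) {
            for (Consumer<JobSpecMessage> subscriber : subscribers) {
                subscriber.accept(message);
            }
        }
    }

    /** Stand-in for a Gobblin instance that only launches specs addressed to it. */
    static final class WorkerInstance {
        final String id;
        final List<String> launchedJobs = new ArrayList<>();

        WorkerInstance(String id, Topic topic) {
            this.id = id;
            topic.subscribe(this::onJobSpec);
        }

        private void onJobSpec(JobSpecMessage spec) {
            // Every instance receives the spec; only the matching one "triggers" the job.
            if (id.equals(spec.targetInstance())) {
                launchedJobs.add(spec.jobName());
            }
        }
    }

    public static void main(String[] args) {
        Topic topic = new Topic();
        WorkerInstance a = new WorkerInstance("instance-a", topic);
        WorkerInstance b = new WorkerInstance("instance-b", topic);

        // Producer side: the orchestrator publishes one spec aimed at instance-b.
        topic.publish(new JobSpecMessage("copy-job", "instance-b"));

        System.out.println(a.id + " launched " + a.launchedJobs); // instance-a launched []
        System.out.println(b.id + " launched " + b.launchedJobs); // instance-b launched [copy-job]
    }
}
```

With real Kafka, whether every instance sees every message depends on consumer groups. Consumers in different groups each receive all messages on a topic. Consumers that share a group split the topic's partitions between them. That setting is one place where the "every instance gets the JobSpec" behaviour could be made less ambiguous.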
>>>>
>>>> We have been triggering Gobblin jobs through a REST endpoint, by
>>>> implementing a custom service as explained here:
>>>> https://groups.google.com/forum/#!topic/gobblin-users/kHrWh6lfGJM
>>>>
>>>> I have the following questions:
>>>>
>>>> 1) What is the use case for Gobblin as a Service? I don't see the
>>>> Orchestrator's REST endpoint port being configurable. If we have to add
>>>> a FlowSpec from a different machine, we need to know the Orchestrator's
>>>> host and port. How do we do that?
>>>>
>> We use the D2 registry internally for it (if you don't already know about
>> it, search for Rest.li D2).
>>
>>>> 2) Does creating a FlowSpec create a new job deployment, which could
>>>> also be done by copying the corresponding .pull or .job file into the
>>>> Gobblin distribution?
>>>>
>> If you are asking whether bundling a pull file in the Gobblin distribution
>> and creating the same job via a FlowSpec mean the same thing, then yes.
>> Otherwise, I didn't understand the question.
>>
>>>> 3) Since the master.out log gets created when starting a service, I
>>>> assume there could be a way to add more Orchestrators to the master that
>>>> is started. However, I am not sure how to do that. Can this be
>>>> clarified?
>>>>
>> Only one node acts as the orchestrator and scheduler. The rest of the
>> nodes receive requests and pass them to the master for scheduling and
>> orchestration via Helix messages.
>>
>>>> Please note that I have been looking at older code; the git log is as
>>>> follows:
>>>> ***********************************************************************************************
>>>> commit 755da9160cd91ea5ebcc752603ce1bffb74a75a1 (HEAD -> master, origin/master, origin/HEAD)
>>>> Author: Kuai Yu <yukuai...@gmail.com>
>>>> Date: Tue Apr 11 19:10:53 2017 -0700
>>>> ***********************************************************************************************
>>>>
>>>> Thanks,
>>>> Vicky
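Abhishek's answer to question 3 describes a single orchestrator/scheduler node, with the other nodes forwarding requests to it via Helix messages. The routing decision can be pictured with the plain-Java simulation that follows. It is a hypothetical sketch, not the Helix API:

* "Leader election" here simply picks the first node. Helix runs its own leader-election protocol.
* An in-memory queue stands in for Helix messaging.
* The class and method names are invented for illustration.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

/**
 * Hypothetical sketch: one leader schedules, other nodes forward requests to it.
 * Not Gobblin or Helix code.
 */
public class LeaderForwardingSketch {

    static final class Node {
        final String name;
        final Queue<String> inbox = new ArrayDeque<>();      // stand-in for Helix messages
        final List<String> scheduledFlows = new ArrayList<>();
        Node leader;                                          // set by electLeader(...)

        Node(String name) {
            this.name = name;
        }

        boolean isLeader() {
            return leader == this;
        }

        /** A client request (e.g. a new FlowSpec) can arrive at any node. */
        void receiveRequest(String flowName) {
            if (isLeader()) {
                schedule(flowName);
            } else {
                // Non-leaders do not schedule; they forward to the leader's inbox.
                leader.inbox.add(flowName);
            }
        }

        /** The leader drains forwarded requests and schedules them. */
        void processInbox() {
            String flow;
            while ((flow = inbox.poll()) != null) {
                schedule(flow);
            }
        }

        private void schedule(String flowName) {
            scheduledFlows.add(flowName);
        }
    }

    /** Hypothetical election: the first node in the list becomes the leader. */
    static void electLeader(List<Node> nodes) {
        Node leader = nodes.get(0);
        for (Node node : nodes) {
            node.leader = leader;
        }
    }

    public static void main(String[] args) {
        Node n1 = new Node("node-1"), n2 = new Node("node-2"), n3 = new Node("node-3");
        electLeader(List.of(n1, n2, n3));

        n1.receiveRequest("flow-a");   // handled directly by the leader
        n2.receiveRequest("flow-b");   // forwarded
        n3.receiveRequest("flow-c");   // forwarded
        n1.processInbox();

        System.out.println("Leader " + n1.name + " scheduled " + n1.scheduledFlows);
        System.out.println(n2.name + " scheduled " + n2.scheduledFlows);
    }
}
```

Keeping scheduling on one node avoids the same flow being scheduled twice. The cost is that non-leader nodes add a forwarding hop.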