Hi Abhishek,

Here are some review points after going through the wiki:

1) There is no component by the name of "FlowManager". It seems the
FlowManager is basically the FlowConfigsResource+RestLi handling the
user invocation.

2) There is no explicit mention of how an existing Flow is triggered. It
seems to be triggered via the POST call, as mentioned in the documentation:

curli http://localhost:8080/flowconfigs -X POST \
  -H 'X-RestLi-Method: create' \
  -H 'X-RestLi-Protocol-Version: 2.0.0' \
  --data '{"flowName" : "myflow1", "flowGroup" : "mygroup", "templateNames" : "FS:///mytemplate.template", "schedule" : "", "properties" : {"prop1" : "value1"}}'
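
For anyone scripting this rather than using curli, a rough Python
equivalent of the request above might look like the following. It only
mirrors the URL, headers and payload from the wiki command. The helper name
build_flowconfig_request and the Content-Type header are assumptions of this
sketch, not something taken from the Gobblin code or the wiki.

```python
import json
import urllib.request


def build_flowconfig_request(base_url, flow_name, flow_group,
                             template_uri, schedule, properties):
    """Build (but do not send) the POST request that creates a flow config."""
    payload = {
        "flowName": flow_name,
        "flowGroup": flow_group,
        "templateNames": template_uri,
        "schedule": schedule,
        "properties": properties,
    }
    headers = {
        # Both headers are copied from the curli command in the wiki.
        "X-RestLi-Method": "create",
        "X-RestLi-Protocol-Version": "2.0.0",
        # Assumption: the service accepts a JSON body; not in the wiki command.
        "Content-Type": "application/json",
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/flowconfigs",
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )


if __name__ == "__main__":
    req = build_flowconfig_request(
        "http://localhost:8080", "myflow1", "mygroup",
        "FS:///mytemplate.template", "", {"prop1": "value1"},
    )
    # Print the request instead of sending it, so this runs without a service.
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
```

Passing the returned object to urllib.request.urlopen would actually send
it. Host and port here are just the ones from the wiki example.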


3) You can see the typo in the wiki in 2, check the curli part.

4) I am not able to see any Monitoring-related code in the
GobblinServiceManager. Where is the monitoring piece?

5) The Appendix section refers to components which do not seem to be
present, like SimpleRESTSpecExecutor and OrchestratorModule (module
name should be removed), and there may be more. Also I am not able to
find GobblinRestFlowMonitor etc. I have build errors in
Eclipse; maybe that is the reason I am not able to see these classes.

Also, I see GAAS sending the Jobs to the SpecExecutorInstance via
Kafka/git etc. However, I am not yet able to find how the
SpecExecutorInstance is configured in the Gobblin instances where the Jobs
should be constructed and triggered. How and where do we configure the
SpecExecutorInstance for the Gobblin instances for which the Jobs can be
configured/triggered via GAAS?


Thanks,
Vicky

On Fri, Jul 28, 2017 at 9:07 AM, Vicky Kak <vicky....@gmail.com> wrote:

> I can see the images now.
>
> Thanks,
> Vicky
>
> On Fri, Jul 28, 2017 at 9:05 AM, Abhishek Tiwari <
> abhishektiwari.bt...@gmail.com> wrote:
>
>> Hi Vicky,
>>
>> I have fixed the images, please check again.
>>
>> Regards,
>> Abhishek
>>
>>
>> On Thu, Jul 27, 2017 at 8:20 PM, Vicky Kak <vicky....@gmail.com> wrote:
>>
>>> Thanks Abhishek for the confirmation.
>>>
>>> I am not able to see the images in the GAAS wiki. The images seem to be
>>> coming from Google Docs and I could make out that my id does not have
>>> access. Maybe making the images public would help. Can you please check why
>>> I am not able to see the images in the wiki?
>>>
>>> Regards,
>>> Vicky
>>>
>>>
>>>
>>>
>>>
>>> On Thu, Jul 27, 2017 at 7:41 PM, Abhishek Tiwari <a...@apache.org>
>>> wrote:
>>>
>>>> Hi Vicky,
>>>>
>>>> My responses are inlined in blue. You are on the right track.
>>>>
>>>> Also the design doc of Gobblin as a Service for your reference:
>>>> https://cwiki.apache.org/confluence/display/GOBBLIN/Gobblin+as+a+Service
>>>>
>>>> Regards,
>>>> Abhishek
>>>>
>>>> On Wed, Jul 26, 2017 at 5:45 AM, Vicky Kak <vicky....@gmail.com> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I spent more time looking at the code details and have the following
>>>>> to share.
>>>>>
>>>>> I see that GobblinServiceManager (the bootstrap class for the
>>>>> Gobblin service) performs these steps:
>>>>> 1) Initialising the TopologyCatalog, FlowCatalog, Helix,
>>>>> ServiceScheduler, EmbeddedLiServer and finally
>>>>> Orchestrator/TopologySpecFactory.
>>>>> 2) The FlowConfigClient seems to create the FlowConfig, then the
>>>>> FlowSpec via FlowConfigResource (via RestEndpoint).
>>>>> 3) The JobSpec gets added to the FlowCatalog, after which the
>>>>> Orchestrator pushes the JobSpec to Kafka via
>>>>> SimpleKafkaStepExecutionProducer.
>>>>>
>>>>> I have been looking for code which uses the
>>>>> SimpleKafkaStepExecutionConsumer, but could not find how it is
>>>>> hooked into the running Gobblin instance.
>>>>>
>>>> Look at gobblin-cluster and default config for classes being loaded for
>>>> listeners, JobConfigurationManager, etc.
>>>>
>>>>
>>>>>
>>>>> Here is how the Gobblin service will invoke the Jobs on the slaves
>>>>> (Gobblin instances):
>>>>>
>>>>> 1) We should have the rest endpoint information so that we can send
>>>>> the JobSpec via FlowConfigClient or via the HTTP GET (rest call, I have
>>>>> not yet tried this). I don't see a way to get the port when the rest
>>>>> server is started.
>>>>>
>>>> We should make it configurable; right now it chooses a random port.
>>>>
>>>>
>>>>> 2) The JobSpec is passed to Kafka via the
>>>>> SimpleKafkaStepExecutionProducer from the Gobblin service via the
>>>>> Orchestrator.
>>>>> 3) There could be multiple Gobblin instances listening to Kafka
>>>>> using the SimpleKafkaStepExecutionConsumer, and
>>>>> all the Gobblin instances should get the JobSpecs. The one instance which
>>>>> matches the job specs should trigger the Job.
>>>>>
>>>> Yes, we can make this a bit less ambiguous though.
>>>>
>>>>
>>>>>
>>>>> The Gobblin service acts as a master and provides the rest endpoint to
>>>>> read/create the JobSpecs which will get triggered on the slaves (which are
>>>>> the Gobblin instances).
>>>>> I have not yet been able to run the flow since I am getting some build
>>>>> issues when building Gobblin from master; the tests are
>>>>> failing right now.
>>>>>
>>>>> Can someone from the development team validate whether I am on the right
>>>>> track in terms of understanding the implementation and flows?
>>>>>
>>>> You are on the right track.
>>>>
>>>>>
>>>>> I have got more questions which I will post after I confirm that I am
>>>>> not missing anything.
>>>>>
>>>>> Thanks,
>>>>> Vicky
>>>>>
>>>>> On Tue, Jul 25, 2017 at 5:03 PM, Vicky Kak <vicky....@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> To my surprise, after I looked at the code and referred to the
>>>>>> presentation that Shrishanka had sent, my ignorance about Gobblin As A
>>>>>> Service was removed.
>>>>>>
>>>>>> Gobblin As a service : It is a Global Orchestrator which helps in
>>>>>> submitting the logical flow specifications which are further compiled to
>>>>>> the physical pipelines.
>>>>>>
>>>>>> We have been triggering the Gobblin Jobs using a REST endpoint, and
>>>>>> it is done by implementing a custom service as explained here:
>>>>>> https://groups.google.com/forum/#!topic/gobblin-users/kHrWh6lfGJM
>>>>>>
>>>>>> I have got the following questions
>>>>>>
>>>>>> 1) What is the use case for Gobblin as a Service? I don't see the
>>>>>> Orchestrator's rest endpoint port being configurable. If we have to add a
>>>>>> FlowSpec from a different machine we need to know the Orchestrator's
>>>>>> host and port details; how do we do it?
>>>>>>
>>>> We use d2 registry internally for it (if you don't already know about
>>>> it - search for RESTLI D2)
>>>>
>>>>
>>>>>
>>>>>> 2) Does FlowSpec creation create a new Job deployment, which can also
>>>>>> be done by copying the corresponding .pull or .job file into the Gobblin
>>>>>> distribution?
>>>>>>
>>>> If you are asking whether bundling a pull file in the gobblin
>>>> distribution and creating the same via FlowSpec would mean the same thing,
>>>> then yes. Else I didn't understand the question.
>>>>
>>>>
>>>>>
>>>>>> 3) Since the master.out log gets created when starting a service, I
>>>>>> assume there could be a way to add more Orchestrators to the master that
>>>>>> is started. However I am not sure how to do that, can this be clarified?
>>>>>>
>>>> Only one node acts as orchestrator and scheduler. The rest of the nodes
>>>> receive requests and pass them to the master for scheduling and orchestrating
>>>> via Helix messages.
>>>>
>>>>
>>>>>
>>>>>> Please note that I have been looking at the older code; the git log
>>>>>> is as follows.
>>>>>> ************************************************************
>>>>>> commit 755da9160cd91ea5ebcc752603ce1bffb74a75a1 (HEAD -> master,
>>>>>> origin/master, origin/HEAD)
>>>>>> Author: Kuai Yu <yukuai...@gmail.com>
>>>>>> Date:   Tue Apr 11 19:10:53 2017 -0700
>>>>>> ************************************************************
>>>>>>
>>>>>>
>>>>>> Thanks,
>>>>>> Vicky
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>
