Hi,

On 10/02/17 10:09, "Francesco Mari" <mari.france...@gmail.com> wrote:

> As much as I like the proposal of slimming down oak-run, I think that
> dividing oak-run in oak-operations and oak-development is the wrong
> way to go. This kind of division is horizontal, since commands
> pertaining to different persistence backends are grouped together
> according to their roles. This division will not solve the problem of
> feature bloat. These two modules will grow over time in the same way
> that oak-run did.


I fully agree here with Francesco. An artificial division into two parts won’t 
help, and some code would still be shared between them.


> 
> I'm more in favour of a vertical separation of oak-run. I explained
> part of this idea in OAK-5437. I think it's more effective to split
> oak-run in vertical slices, where each slice pertains to a persistence
> layer (segment, mongo, etc.) or a well defined functional area
> (indexing, security, etc.). This kind of separation would bring the
> CLI code close to the main code they are working with. Changes in the
> main code are more easily reflected in the CLI code, and the other way
> around. It would also be easier to figure out which individual or
> group of individuals is actively maintaining a certain piece of code.


I think that the above approach is more flexible. 

What is even better for me, as a developer or a user, is having one tool that 
keeps all these things in one place with convenient access (please look at git 
or docker as examples). Git in fact consists of multiple separate binaries, but 
they are integrated so tightly that the split is invisible to the user (leaving 
aside some hard-to-understand parts of git).

On the developer side, I think the more important property is loose coupling 
between the different modules/components, so that they are easily testable (in 
isolation) and can work independently with minimal communication. I know this 
might be obvious, but CLI tools don’t use any frameworks that could help with 
that. CLI tools should be fast and simple, like commands in the UNIX world.

Sorry to be elaborate here, but I was recently working on a command-line tool 
with multiple stages and multiple options that relate to each other. I didn’t 
want to wire the parts together directly, as that would make them hard to test 
and understand, so I decided to wrap them in a simple abstraction that 
separates those layers (stages, options, commands, etc.).

I borrowed the UNIX philosophy for my tool’s internals: “do one thing and do 
it well”, the same way UNIX gives us many small commands. 
I divided the different fragments (in my case, in Java) into completely 
independent components.

In my case it was a dynamic pipeline, constructed when the tool starts:

userInput > initializationOfTool | component1 | component2 | component3 > output

where `userInput` is a set of options and switches, plus environment variables 
if needed.
The output might be just an exit code, or something important that needs to be 
displayed at the end. In the case of oak-run, most operations work on the 
repository (as side effects).


In UNIX terms you might implement something like this:

cat user-input.properties | pipelineComponent1 | pipelineComponent2 > resultsForFurtherProcessing

Obviously, each component might cause side effects, but what I’m showing here 
is a communication model for a simple CLI tool that has multiple routines and 
options.
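To make the pipeline model above concrete, here is a minimal Java sketch (all class and key names are illustrative, nothing here is actual oak-run code): the pipeline is an ordered list of components, each a function from pipe data to pipe data, threaded through left to right.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

// Illustrative sketch: a pipeline is an ordered list of independent
// components, each transforming the shared pipe data and passing it on.
public class Pipeline {

    private final List<UnaryOperator<Map<String, Object>>> components = new ArrayList<>();

    public Pipeline then(UnaryOperator<Map<String, Object>> component) {
        components.add(component);
        return this;
    }

    // Run the components in order, threading the pipe data through each one.
    public Map<String, Object> run(Map<String, Object> userInput) {
        Map<String, Object> data = userInput;
        for (UnaryOperator<Map<String, Object>> component : components) {
            data = component.apply(data);
        }
        return data;
    }

    public static void main(String[] args) {
        // userInput > initializationOfTool | component1 > output
        Map<String, Object> userInput = new HashMap<>();
        userInput.put("option", "--verbose");
        Map<String, Object> result = new Pipeline()
            .then(d -> { d.put("initialized", true); return d; }) // initializationOfTool
            .then(d -> { d.put("exitCode", 0); return d; })       // component1
            .run(userInput);
        System.out.println(result.get("exitCode")); // prints 0
    }
}
```

The components here are trivial lambdas, but each slot could just as well hold a full class owning one well-defined stage.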

The contract here is that: 

• each pipeline component can output on stderr (to the user); this channel is 
only for logging and debugging purposes,
• the second channel is for inter-communication between components (more 
structured pipe data), which I describe below in more detail.

In my Java tool, I constructed a very simple structured, type-safe map that is 
passed in from the previous component, processed by the current component, and 
then passed on for further processing by the remaining components. The main 
benefit of this approach, which might apply here too, is that the components 
are completely independent from each other. They pass along a map (which 
represents the different communication channels), and each component can 
validate, before processing, that the map contains everything needed at that 
stage. This allows you to divide such a CLI tool into fragments of any size, 
and later to decompose bigger parts into smaller ones if needed. 

You can imagine, as an example, that one component initializes or opens the 
repository, and the next one picks it up and does something else with it. 
Other components might handle different options or arguments, assuming that 
one of the communication channels is the list of CLI options.
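A hedged sketch of what such independent components might look like in Java (all component names, map keys, and values here are hypothetical stand-ins, not oak-run code), showing the two channels from the contract above: stderr for user-facing logging, and the map for structured pipe data, validated before processing:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of independent pipeline components that communicate
// only through a shared map; none of these names come from oak-run itself.
public class Components {

    interface Component {
        Map<String, Object> process(Map<String, Object> pipeData);
    }

    // First component: "opens" a repository and puts it on the pipe.
    static class OpenRepository implements Component {
        public Map<String, Object> process(Map<String, Object> d) {
            d.put("repository", "repo-handle"); // stand-in for a real session
            return d;
        }
    }

    // Second component: validates its inputs before doing its own work.
    static class CountNodes implements Component {
        public Map<String, Object> process(Map<String, Object> d) {
            if (!d.containsKey("repository")) {
                throw new IllegalStateException("CountNodes needs an open repository");
            }
            System.err.println("counting nodes..."); // stderr channel: logging to the user
            d.put("nodeCount", 42L);                 // pipe channel: structured data (fake value)
            return d;
        }
    }

    public static void main(String[] args) {
        Map<String, Object> d = new HashMap<>();
        d.put("options", List.of("--count")); // one channel: the parsed CLI options
        List<Component> pipeline = List.of(new OpenRepository(), new CountNodes());
        for (Component c : pipeline) {
            d = c.process(d);
        }
        System.out.println(d.get("nodeCount")); // prints 42
    }
}
```

Dropping OpenRepository from the list makes CountNodes fail fast with a clear message, which is exactly the kind of per-stage validation I mean.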

The proper division and granularity is a matter of the concrete domain, but 
the general approach stays the same. 
The pipeline can also be variable, with different elements depending on the 
user input. Elements might even be added along the way.

All of this should be invisible to users and developers when changes are 
needed, but at the same time the whole process is traceable when necessary 
(you can inspect the state of the pipe data between components, which is 
especially useful for developers).

I think this idea might be useful here, at least up to the point where the 
number of components stays below, say, 100, although I haven’t personally 
reached such a limit yet.


> 
> 2017-02-10 9:44 GMT+01:00 Angela Schreiber <anch...@adobe.com>:
…

>> i would rather suggest to move out code that doesn't belong there and keep
>> stuff that more naturally fits into 'run': so, only one additional module
>> and no deletion.
>>

Yes, for users it is especially important to keep all operations together, no 
matter whether they are dev-related or ops-related. It needs to be well 
organized, with a good taxonomy, etc. 

I would also not break users’ habits too much, especially since they are used 
to invoking oak-run for all these different tasks.

Kind regards,
Arek
