Re: oak-run, diet and new module

2017-02-14 Thread Davide Giannella
On 10/02/2017 14:01, Thomas Mueller wrote:
> Hi,
>
> What about moving benchmarks to a new module, oak-benchmark? I'm not sure
> if it's feasible, and not sure if it would reduce the size a lot. But it
> is easy to understand, whereas the split into oak-operations and
> oak-development is quite fuzzy.

Reflecting on the original proposal we came up with in Jira in 2015 [0], I
thought it would be better to keep oak-run for production operations
(compaction, etc.) while moving benchmarks and similar tools to oak-devel
(or whatever other name we fancy).

[0] https://issues.apache.org/jira/browse/OAK-3134

The story behind this is that, after an initial investigation a year ago,
we discovered that the largest part of oak-run's size was taken up by
libraries that we use in a "development phase", such as, but not limited
to, benchmarks [1].

[1] https://issues.apache.org/jira/browse/OAK-3766

By simply moving out such code, which doesn't need to be deployed to
Maven, we would considerably cut the size of oak-run.

Still, the pain point around backports remains.

Davide




Re: oak-run, diet and new module

2017-02-10 Thread Arek Kita
Hi,

On 10/02/17 10:09, "Francesco Mari" wrote:

> As much as I like the proposal of slimming down oak-run, I think that
> dividing oak-run in oak-operations and oak-development is the wrong
> way to go. This kind of division is horizontal, since commands
> pertaining to different persistence backends are grouped together
> according to their roles. This division will not solve the problem of
> feature bloat. These two modules will grow over time in the same way
> that oak-run did.


I fully agree with Francesco here. An artificial division into two parts won’t
help, and some parts might still be shared between them.


> 
> I'm more in favour of a vertical separation of oak-run. I explained
> part of this idea in OAK-5437. I think it's more effective to split
> oak-run in vertical slices, where each slice pertains to a persistence
> layer (segment, mongo, etc.) or a well defined functional area
> (indexing, security, etc.). This kind of separation would bring the
> CLI code close to the main code they are working with. Changes in the
> main code are more easily reflected in the CLI code, and the other way
> around. It would also be easier to figure out which individual or
> group of individuals is actively maintaining a certain piece of code.


I think that the above approach is more flexible. 

What is even better for me, as a developer or user, is having one tool that
has all such things in one place with convenient access (look at the git or
docker tools, for example). Git in fact has multiple separate binaries, but
they are integrated so that this is not visible to the user (setting aside
some hard-to-understand parts of git).
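The "one tool, many integrated commands" idea can be sketched as a single entry point that dispatches on the first argument. This is similar in spirit to how `git` routes `git <name>` to a separate implementation. It is a minimal illustration under our own assumptions: `OakTool`, `Command`, and the sample command names are hypothetical and are not part of oak-run's actual code.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;

// Single entry point that dispatches "tool <name> [args...]" to a named command,
// similar in spirit to how git routes "git <name>" to a separate implementation.
final class OakTool {
    private final Map<String, Command> commands = new TreeMap<>();

    OakTool register(String name, Command command) {
        if (commands.putIfAbsent(name, command) != null) {
            throw new IllegalArgumentException("Duplicate command: " + name);
        }
        return this;
    }

    int dispatch(String[] argv) {
        if (argv.length == 0 || !commands.containsKey(argv[0])) {
            System.err.println("Usage: tool <command> [args...]; commands: " + commands.keySet());
            return 1;
        }
        // Hand only the remaining arguments to the selected command.
        String[] rest = Arrays.copyOfRange(argv, 1, argv.length);
        return commands.get(argv[0]).run(rest);
    }

    public static void main(String[] args) {
        // Hypothetical commands, for illustration only.
        OakTool tool = new OakTool()
                .register("echo", a -> { System.out.println(String.join(" ", a)); return 0; })
                .register("count", a -> { System.out.println(a.length); return 0; });
        System.exit(tool.dispatch(args));
    }
}

// Hypothetical contract each subcommand implements; not an existing oak-run API.
interface Command {
    /** Runs the command with its own arguments and returns a process exit code. */
    int run(String[] args);
}
```

Each `Command` could live in a different module, while the user only ever sees one executable.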

On the developer side, I think the more important thing is loose coupling
between modules/components, so that they are easily testable in isolation
and can work independently with minimal communication. I know this might be
obvious, but CLI tools don’t use any frameworks that help with that. CLI tools
should be fast and simple, like commands in the UNIX world.

Sorry to be so elaborate here, but I was recently working on a command-line
tool which has multiple stages and multiple options that relate to each other,
so I didn’t want to connect the parts directly, as that would make them hard to
test and understand. So I decided to wrap them in a simple abstraction that
separates those layers (stages, options, commands, etc.).

Internally, I borrowed the UNIX philosophy for my tool: “do one thing and do
it well”, the same way UNIX has many little commands. In my case (in Java),
I divided the different fragments into completely independent components.

In my case it was a dynamic pipeline constructed when the tool started:

userInput > initializationOfTool | component1 | component2 | component3 > output

where `userInput` is a set of options and switches, plus environment variables
if needed.
The output might be just an exit code, or something important that needs to be
displayed at the end. In the case of oak-run, most operations act on the
repository (i.e. through side effects).
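The pipeline above can be sketched in Java as an ordered list of components. The components share one context and together produce an exit code. This is a minimal sketch under our own assumptions: `Pipeline`, `Component`, and the plain `Map<String, Object>` context are hypothetical names, not Arek's actual tool and not anything in oak-run.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Runs components in order; models "userInput > init | component1 | component2 > output".
final class Pipeline {
    // Hypothetical stage contract: reads from and writes to a shared context.
    interface Component {
        void process(Map<String, Object> context) throws Exception;
    }

    private final List<Component> components = new ArrayList<>();

    Pipeline then(Component component) {
        components.add(component);
        return this;
    }

    /** Returns 0 on success, 1 if any component fails; later components are then skipped. */
    int run(Map<String, Object> userInput) {
        Map<String, Object> context = new HashMap<>(userInput);
        for (Component c : components) {
            try {
                c.process(context);
            } catch (Exception e) {
                System.err.println("Pipeline stopped: " + e.getMessage());
                return 1;
            }
        }
        return 0;
    }

    public static void main(String[] args) {
        Map<String, Object> input = new HashMap<>();
        input.put("name", "oak");
        int exit = new Pipeline()
                .then(ctx -> ctx.put("greeting", "hello " + ctx.get("name")))
                .then(ctx -> System.out.println(ctx.get("greeting")))
                .run(input);
        System.exit(exit);
    }
}
```

Because each component only sees the context, a component can be unit-tested by handing it a prepared map.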


In UNIX you might implement something like this:

cat user-input.properties | pipelineComponent1 | pipelineComponent2 > resultsForFurtherProcessing

Obviously, each component might cause side effects, but what I’m showing here
is a communication model for simple CLI tooling that has multiple routines and
options.

The contract here is:

• each pipeline component can write to stderr (for the user); this channel is
just for logging and debugging purposes,
• the second channel is for communication between components (more
structured pipe data), which I describe in more detail below.
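Here is one way to keep the two channels from the contract above apart. Every stage receives a human-facing log stream (stderr by default) and a separate structured data channel. The `Channels`, `Stage`, and `CountArgs` types are hypothetical and exist only for illustration. Injecting the log stream instead of writing to `System.err` directly also makes each stage testable in isolation.

```java
import java.io.PrintStream;
import java.util.HashMap;
import java.util.Map;

final class ChannelsDemo {
    public static void main(String[] args) {
        Channels channels = Channels.standard();
        new CountArgs(args).process(channels);
        System.out.println(channels.data.get("argCount"));
    }
}

// Hypothetical pair of channels handed to each stage.
final class Channels {
    final PrintStream log;                             // channel 1: human-readable diagnostics
    final Map<String, Object> data = new HashMap<>();  // channel 2: structured inter-stage data

    Channels(PrintStream log) {
        this.log = log;
    }

    static Channels standard() {
        return new Channels(System.err);
    }
}

interface Stage {
    void process(Channels channels);
}

// Example stage: logs what it does, and publishes its result on the data channel.
final class CountArgs implements Stage {
    private final String[] args;

    CountArgs(String[] args) {
        this.args = args;
    }

    @Override
    public void process(Channels channels) {
        channels.log.println("counting " + args.length + " argument(s)"); // diagnostics only
        channels.data.put("argCount", args.length);                       // result for the next stage
    }
}
```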

In my Java tool, I constructed a very simple structure, a type-safe map, that
is passed on from the previous component, processed by the current one, and
then handed to the next components for further processing. What I think is
most beneficial about this approach here is that components are completely
independent of each other. They pass a map (which represents the different
communication channels), and components can of course validate before
processing that it contains everything needed at that stage. This allows
dividing such a CLI tool into separate fragments (no matter how big) and
later decomposing bigger parts into smaller ones if needed.
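The "type-safe map" described above resembles the typesafe heterogeneous container pattern. In that pattern, each key carries its value type, so components read values without casts and can check up front that everything they need is present. This is a sketch under our own assumptions: `Key`, `Context`, and `require` are hypothetical names, not Arek's actual code.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Optional;
import java.util.stream.Collectors;

final class ContextDemo {
    // Keys are shared constants: Key relies on identity equality.
    static final Key<String> REPO_PATH = new Key<>("repoPath", String.class);
    static final Key<Integer> THREADS = new Key<>("threads", Integer.class);

    public static void main(String[] args) {
        Context ctx = new Context().put(REPO_PATH, "/tmp/repo");
        System.out.println(ctx.get(REPO_PATH).orElse("none")); // prints /tmp/repo
        try {
            ctx.require(REPO_PATH, THREADS); // a component needing both inputs stops here
        } catch (IllegalStateException e) {
            System.err.println(e.getMessage()); // Missing inputs: [threads]
        }
    }
}

// A key that carries the type of the value stored under it.
final class Key<T> {
    final String name;
    final Class<T> type;

    Key(String name, Class<T> type) {
        this.name = Objects.requireNonNull(name);
        this.type = Objects.requireNonNull(type);
    }
}

// Typesafe heterogeneous map passed from one component to the next.
final class Context {
    private final Map<Key<?>, Object> values = new HashMap<>();

    <T> Context put(Key<T> key, T value) {
        // Class.cast rejects values of the wrong type even if generics were bypassed.
        values.put(key, key.type.cast(Objects.requireNonNull(value, key.name)));
        return this;
    }

    <T> Optional<T> get(Key<T> key) {
        return Optional.ofNullable(key.type.cast(values.get(key)));
    }

    /** Lets a component validate its inputs before processing; reports every missing key. */
    void require(Key<?>... keys) {
        List<String> missing = Arrays.stream(keys)
                .filter(k -> !values.containsKey(k))
                .map(k -> k.name)
                .collect(Collectors.toList());
        if (!missing.isEmpty()) {
            throw new IllegalStateException("Missing inputs: " + missing);
        }
    }
}
```

Keeping validation in `require` means a component fails fast with a clear message, rather than failing halfway through its work.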

For example, one component might initialize or open a repository, and the
next might pick it up and do something else with it.
Other components might handle different options or arguments, assuming that
one of the communication channels is the list of CLI options.
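To make this example concrete, here is a sketch of three such components that share a context:

- one turns CLI options into an "options" channel,
- one "opens" a repository from those options,
- one picks the repository up and uses it.

`FakeRepository` and the `--path=` option are hypothetical stand-ins, not an Oak API or an oak-run flag. A real component would construct a repository from the chosen persistence backend instead.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

final class ComponentsDemo {
    // Hypothetical stand-in for a repository handle; not an Oak API.
    static final class FakeRepository {
        final String path;
        final List<String> log = new ArrayList<>();

        FakeRepository(String path) {
            this.path = path;
        }
    }

    // Component 1: parses "--key=value" arguments into an "options" channel.
    static Consumer<Map<String, Object>> parseOptions(String[] args) {
        return ctx -> {
            Map<String, String> options = new HashMap<>();
            for (String arg : args) {
                int eq = arg.indexOf('=');
                if (arg.startsWith("--") && eq > 2) {
                    options.put(arg.substring(2, eq), arg.substring(eq + 1));
                }
            }
            ctx.put("options", options);
        };
    }

    // Component 2: opens the repository named by the "path" option.
    static final Consumer<Map<String, Object>> OPEN_REPOSITORY = ctx -> {
        @SuppressWarnings("unchecked")
        Map<String, String> options = (Map<String, String>) ctx.get("options");
        ctx.put("repository", new FakeRepository(options.getOrDefault("path", "repository")));
    };

    // Component 3: picks up the opened repository and does something with it.
    static final Consumer<Map<String, Object>> TOUCH_REPOSITORY = ctx -> {
        FakeRepository repo = (FakeRepository) ctx.get("repository");
        repo.log.add("touched " + repo.path);
    };

    static Map<String, Object> run(String[] args) {
        Map<String, Object> ctx = new HashMap<>();
        List<Consumer<Map<String, Object>>> pipeline =
                List.of(parseOptions(args), OPEN_REPOSITORY, TOUCH_REPOSITORY);
        pipeline.forEach(component -> component.accept(ctx));
        return ctx;
    }

    public static void main(String[] args) {
        FakeRepository repo = (FakeRepository) run(args).get("repository");
        System.out.println(repo.log);
    }
}
```

Because the pipeline is just a list, it could equally be assembled from different components depending on user input.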

The proper division and granularity depend on the concrete domain, but the
general approach is the same.
The pipeline might also be variable, with different elements depending on
user input. The elements might be added in the

Re: oak-run, diet and new module

2017-02-10 Thread Francesco Mari
As much as I like the proposal of slimming down oak-run, I think that
dividing oak-run in oak-operations and oak-development is the wrong
way to go. This kind of division is horizontal, since commands
pertaining to different persistence backends are grouped together
according to their roles. This division will not solve the problem of
feature bloat. These two modules will grow over time in the same way
that oak-run did.

I'm more in favour of a vertical separation of oak-run. I explained
part of this idea in OAK-5437. I think it's more effective to split
oak-run in vertical slices, where each slice pertains to a persistence
layer (segment, mongo, etc.) or a well defined functional area
(indexing, security, etc.). This kind of separation would bring the
CLI code close to the main code they are working with. Changes in the
main code are more easily reflected in the CLI code, and the other way
around. It would also be easier to figure out which individual or
group of individuals is actively maintaining a certain piece of code.
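The vertical split can be sketched as follows. Each slice (for example a persistence layer or a functional area) contributes its own namespace of commands, and a thin launcher resolves `<area> <command>`. Two slices can then each offer a `check` without clashing, and each area maps to one owning module. This is a sketch under our own assumptions: `VerticalSlices`, `Slice`, and the area and command names are hypothetical and are not taken from OAK-5437 or from oak-run.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;
import java.util.function.Function;

final class VerticalSlices {
    // Hypothetical: what each slice module (segment, mongo, indexing, ...) would contribute.
    static final class Slice {
        final String area;
        final Map<String, Function<List<String>, Integer>> commands;

        Slice(String area, Map<String, Function<List<String>, Integer>> commands) {
            this.area = area;
            this.commands = Map.copyOf(commands);
        }
    }

    private final Map<String, Slice> slices = new TreeMap<>();

    VerticalSlices add(Slice slice) {
        if (slices.putIfAbsent(slice.area, slice) != null) {
            throw new IllegalArgumentException("Area already registered: " + slice.area);
        }
        return this;
    }

    /** Resolves "<area> <command>", so different slices may reuse the same command name. */
    Optional<Function<List<String>, Integer>> resolve(String area, String command) {
        return Optional.ofNullable(slices.get(area)).map(s -> s.commands.get(command));
    }

    public static void main(String[] args) {
        VerticalSlices tool = new VerticalSlices()
                .add(new Slice("segment", Map.of("check", a -> { System.out.println("segment check " + a); return 0; })))
                .add(new Slice("mongo", Map.of("check", a -> { System.out.println("mongo check " + a); return 0; })));
        int exit = tool.resolve("mongo", "check").map(c -> c.apply(List.of())).orElse(1);
        System.exit(exit);
    }
}
```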



Re: oak-run, diet and new module

2017-02-10 Thread Angela Schreiber
hi davide

could you elaborate a bit on your proposal? from the names (oak-operations
and oak-development) it's not clear to me what code would go into which
module... also i am not sure about deleting oak-run. for the sake of
limiting impact (also when it comes to the backport you mention later on)
i would rather suggest moving out code that doesn't belong there and keeping
stuff that more naturally fits into 'run': so, only one additional module
and no deletion.

as far as backporting to all branches is concerned: that's for sure not
feasible for the benchmarks i have been putting into oak-run when
introducing new features and improvements.

kind regards
angela




oak-run, diet and new module

2017-02-09 Thread Davide Giannella
hello team,

while having a bit of time I resumed the topic grouped in the epic 
https://issues.apache.org/jira/browse/OAK-5599.

Part of the discussion we have already had over the past one or two years is
that oak-run is big and is beginning to be a challenge during releases, and
that splitting development functionality from production tooling would allow
us to remove quite a few libraries from the jar deployed to Maven for
production tooling, while leaving the development one undeployed.

The main itch I have now is: assuming we proceed, what about backports?
So I thought the following:

- main goal: create oak-operations and oak-development modules.
Eventually delete oak-run.
- backport these to all the branches. Up to what version? Can we blindly
backport all of the stuff?
- what are the differences nowadays in oak-run between branches?
Repository construction? others?

Thoughts?

Cheers
Davide