Re: oak-run, diet and new module
On 10/02/2017 14:01, Thomas Mueller wrote:
> Hi,
>
> What about moving benchmarks to a new module oak-benchmark? I'm not sure
> if it's feasible, and not sure if it would reduce the size a lot. But it
> is easy to understand, where oak-operations and oak-development is quite
> fuzzy.

Reflecting on the original proposal we came up with in Jira in 2015 [0], I thought it would be better to keep oak-run for production operations (compaction, etc.) while moving benchmarks and similar code to oak-devel (or whatever other name we may fancy).

[0] https://issues.apache.org/jira/browse/OAK-3134

The story behind this is that, after an initial investigation a year ago, we discovered that the largest part of oak-run's size is taken up by libraries that we actually use in a "development phase", such as, but not limited to, benchmarks [1].

[1] https://issues.apache.org/jira/browse/OAK-3766

By simply moving away such code, which doesn't need to be deployed to Maven, we are going to cut the size of oak-run considerably.

Still, the pain point around backports remains.

Davide
Re: oak-run, diet and new module
Hi,

On 10/02/17 10:09, "Francesco Mari" wrote:
> As much as I like the proposal of slimming down oak-run, I think that
> dividing oak-run in oak-operations and oak-development is the wrong
> way to go. This kind of division is horizontal, since commands
> pertaining to different persistence backends are grouped together
> according to their roles. This division will not solve the problem of
> feature bloat. These two modules will grow over time in the same way
> that oak-run did.

I fully agree with Francesco here. An artificial division into two parts won’t help, and some code might still be common to both.

>
> I'm more in favour of a vertical separation of oak-run. I explained
> part of this idea in OAK-5437. I think it's more effective to split
> oak-run in vertical slices, where each slice pertains to a persistence
> layer (segment, mongo, etc.) or a well defined functional area
> (indexing, security, etc.). This kind of separation would bring the
> CLI code close to the main code they are working with. Changes in the
> main code are more easily reflected in the CLI code, and the other way
> around. It would also be easier to figure out which individual or
> group of individuals is actively maintaining a certain piece of code.

I think that the above approach is more flexible. What is even better for me, as a developer or user, is having one tool that keeps all such things in one place with convenient access (look at the git or docker tools, for example). Git in fact consists of multiple separate binaries, but they are integrated so that this is not visible to the user (leaving aside some hard-to-understand parts of git).

On the developer side, I think what matters more is loose coupling between the different modules/components, so that they are easily testable in isolation and can work independently with minimal communication. I know this might be obvious, but CLI tools aren’t using any frameworks that can help with that.
CLI tools should be fast and simple, like commands in the UNIX world. Sorry to elaborate here, but I was recently working on a command line tool that has multiple stages and multiple options that relate to each other. I didn’t want to connect the parts directly, as that would make them hard to test and understand. So I decided to wrap them in a simple abstraction that separates those layers (stages, options, commands, etc.). Internally I borrowed the UNIX philosophy for my tool: “do one thing but do it the best”, the same way UNIX has many little commands.

I divided the different fragments (in my case in Java) into completely independent components. In my case it was a dynamic pipeline constructed when the tool started:

    userInput > initializationOfTool | component1 | component2 | component3 > output

where `userInput` is a set of options and switches, plus environment variables if needed. The output might be just an exit code, or something important that needs to be displayed at the end. In the case of oak-run, most operations act on the repository (that is, they work through side effects).

In real UNIX you might implement something like this:

    cat user-input.properties | pipelineComponent1 | pipelineComponent2 > resultsForFurtherProcessing

Obviously, each component might cause side effects, but what I’m showing here is a communication model for simple CLI tooling that has multiple routines and options. The contract is that:

• each pipeline component can write to stderr (to the user); this channel is just for logging and debugging,
• the second channel is for communication between components (more structured pipe data), which I describe in more detail below.

In my Java tool, I constructed a very simple structure, a type-safe map, that is received from the previous component, processed by the current component, and then passed on to the next components for further processing.
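To make the idea concrete, here is a minimal Java sketch of such a type-safe map and a pipeline built at startup. It is only an illustration under my own assumptions: `PipelineSketch`, `Key`, `Context`, `Component`, the `--check` switch and the string standing in for a repository are all hypothetical names, not part of oak-run and not the code of the tool described above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: none of these types exist in oak-run.
public class PipelineSketch {

    /** Typed key for one communication channel; T is the type of the value stored under it. */
    static final class Key<T> {
        final String name;
        Key(String name) { this.name = name; }
    }

    /** The type-safe map that is handed from one component to the next. */
    static final class Context {
        private final Map<Key<?>, Object> values = new HashMap<>();

        <T> Context put(Key<T> key, T value) {
            values.put(key, value);
            return this;
        }

        boolean has(Key<?> key) { return values.containsKey(key); }

        /** Validation step: fail early if an earlier stage did not provide the channel. */
        @SuppressWarnings("unchecked")
        <T> T require(Key<T> key) {
            Object value = values.get(key);
            if (value == null) {
                throw new IllegalStateException("missing channel: " + key.name);
            }
            return (T) value; // safe: put() only accepts values of type T for a Key<T>
        }
    }

    /** One independent pipeline stage. */
    interface Component { Context process(Context in); }

    static final Key<List<String>> OPTIONS = new Key<>("options");
    static final Key<String> REPOSITORY = new Key<>("repository");
    static final Key<Integer> EXIT_CODE = new Key<>("exitCode");

    /** The pipeline is assembled at startup and varies with the user input. */
    static List<Component> build(List<String> options) {
        List<Component> pipeline = new ArrayList<>();
        pipeline.add(ctx -> {
            System.err.println("opening repository"); // stderr is the logging channel
            return ctx.put(REPOSITORY, "stand-in-for-a-repository");
        });
        if (options.contains("--check")) { // hypothetical switch
            pipeline.add(ctx -> {
                String repository = ctx.require(REPOSITORY);
                System.err.println("checking " + repository);
                return ctx.put(EXIT_CODE, 0);
            });
        }
        return pipeline;
    }

    static Context run(List<String> options) {
        Context ctx = new Context().put(OPTIONS, options);
        for (Component component : build(options)) {
            ctx = component.process(ctx);
        }
        return ctx;
    }

    public static void main(String[] args) {
        Context result = run(Arrays.asList(args));
        System.out.println(result.has(EXIT_CODE)
                ? "exit code " + result.require(EXIT_CODE)
                : "nothing to do");
    }
}
```

Because each stage only sees the `Context`, a stage can be unit-tested by handing it a hand-built map, and a large stage can later be split into smaller ones without touching its neighbours.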
What I think is most beneficial about this approach is that the components are completely independent of each other. They pass along a map (which represents the different communication channels) and, obviously, a component can validate before processing that the map contains everything needed at that stage. This allows you to divide such a CLI tool into fragments (no matter how big) and later to decompose bigger parts into smaller ones if needed.

You can imagine, as an example, that one component initializes or opens the repository and a second one picks it up and does something else with it. Other components might handle different options or arguments, assuming that one of the communication channels is a list of CLI options. The proper division and granularity depend on the concrete domain, but the general approach is the same. The pipeline might also be variable and have different elements depending on user input. The elements might be added in the
Re: oak-run, diet and new module
As much as I like the proposal of slimming down oak-run, I think that dividing oak-run into oak-operations and oak-development is the wrong way to go. This kind of division is horizontal, since commands pertaining to different persistence backends are grouped together according to their roles. This division will not solve the problem of feature bloat. These two modules will grow over time in the same way that oak-run did.

I'm more in favour of a vertical separation of oak-run. I explained part of this idea in OAK-5437. I think it's more effective to split oak-run into vertical slices, where each slice pertains to a persistence layer (segment, mongo, etc.) or a well-defined functional area (indexing, security, etc.). This kind of separation would bring the CLI code close to the main code it works with. Changes in the main code are more easily reflected in the CLI code, and the other way around. It would also be easier to figure out which individual or group of individuals is actively maintaining a certain piece of code.

2017-02-10 9:44 GMT+01:00 Angela Schreiber:
> hi davide
>
> could you elaborate a bit on your proposal? from the names (oak-operations
> and oak-development) it's not clear to me what code would go into which
> module... also i am not sure about deleting oak-run. for the sake of
> limiting impact (also when it comes to the backport you mention later on)
> i would rather suggest to move out code that doesn't belong there and keep
> stuff that more naturally fits into 'run': so, only one additional module
> and no deletion.
>
> as far as backporting to all branches is concerned: that's for sure not
> feasible for the benchmarks i have been putting into oak-run when
> introducing new features and improvements.
>
> kind regards
> angela
>
> On 09/02/17 20:28, "Davide Giannella" wrote:
>
>>hello team,
>>
>>while having a bit of time I resumed the topic grouped in the epic
>>https://issues.apache.org/jira/browse/OAK-5599.
>>
>>Part of the discussion we already had in the past 1 or two years is that
>>oak-run is big and begin to be a challenge during releases and the fact
>>that we could split development functionalities from production tooling
>>would allow us to remove quite a bunch of libraries from the jar
>>deployed on mvn for production tooling and will leave the development
>>one not deployed.
>>
>>Main scratching I have now is: assuming we proceed what about backports?
>>So i thought the following:
>>
>>- main goal: create oak-operations and oak-development modules.
>>Eventually delete oak-run.
>>- backport these on all the branches. Up to what version? Can we blindly
>>backport all of the stuff?
>>- what are the differences nowadays in oak-run between branches?
>>Repository construction? others?
>>
>>Thoughts?
>>
>>Cheers
>>Davide
>
Re: oak-run, diet and new module
hi davide

could you elaborate a bit on your proposal? from the names (oak-operations and oak-development) it's not clear to me what code would go into which module... also i am not sure about deleting oak-run. for the sake of limiting impact (also when it comes to the backport you mention later on) i would rather suggest moving out code that doesn't belong there and keeping stuff that more naturally fits into 'run': so, only one additional module and no deletion.

as far as backporting to all branches is concerned: that's for sure not feasible for the benchmarks i have been putting into oak-run when introducing new features and improvements.

kind regards
angela

On 09/02/17 20:28, "Davide Giannella" wrote:
>hello team,
>
>while having a bit of time I resumed the topic grouped in the epic
>https://issues.apache.org/jira/browse/OAK-5599.
>
>Part of the discussion we already had in the past 1 or two years is that
>oak-run is big and begin to be a challenge during releases and the fact
>that we could split development functionalities from production tooling
>would allow us to remove quite a bunch of libraries from the jar
>deployed on mvn for production tooling and will leave the development
>one not deployed.
>
>Main scratching I have now is: assuming we proceed what about backports?
>So i thought the following:
>
>- main goal: create oak-operations and oak-development modules.
>Eventually delete oak-run.
>- backport these on all the branches. Up to what version? Can we blindly
>backport all of the stuff?
>- what are the differences nowadays in oak-run between branches?
>Repository construction? others?
>
>Thoughts?
>
>Cheers
>Davide
oak-run, diet and new module
hello team,

having a bit of time, I resumed the topic grouped in the epic https://issues.apache.org/jira/browse/OAK-5599.

Part of the discussion we have already had over the past one or two years is that oak-run is big and is beginning to be a challenge during releases. Splitting the development functionality from the production tooling would allow us to remove quite a bunch of libraries from the jar deployed on mvn for production tooling, and would leave the development one not deployed.

The main thing I'm scratching my head over now is: assuming we proceed, what about backports? So I thought the following:

- main goal: create oak-operations and oak-development modules. Eventually delete oak-run.
- backport these on all the branches. Up to what version? Can we blindly backport all of the stuff?
- what are the differences nowadays in oak-run between branches? Repository construction? Others?

Thoughts?

Cheers
Davide