Re: [DISCUSSION] Cassandra's code style and source code analysis

2022-11-28 Thread Łukasz Dywicki
I can add a few sentences in this context, because my involvement in the
Cassandra mailing list started with looking at the build system. That was 8 or
maybe 9 years ago.


The simple fact that the Cassandra core team understands its chain of Ant
task executions doesn't automatically mean that all other contributors and
interested parties understand it. Moreover, a person from outside the
present core team, even one with prior knowledge of Ant, will need some
time to dive into the task definitions and how they are invoked.
There is a reason why the use of Ant faded away over time: maintaining
builds such as Cassandra's takes more and more effort as time goes on.
Maven is built around convention over configuration, so most external
contributors who have worked with it before will come with enough
knowledge to understand what your build is doing and how. A lot of the hate
people feel towards Maven is caused by its limitations, which they try to
overcome for some reason. Gradle is closer to Ant in its flexibility and
gives much more power, but requires a bit of thought to avoid complicating
things too much.
Switching to a standardized build requires work, but it usually improves
architecture and separation of concerns. The system you build might have
complex logic because of business, standards or technology requirements,
but your build is not part of that complexity. If you assume it is, then
you just double the amount of trouble, as your technology will depend on an
over-engineered pipeline. One of the major pain points I found back in 2013
or 2014 was that Cassandra had a lot of cross dependencies between
packages, making it impossible to simplify the build in the first place. Today I
am even scared to look at it. Whether these cross dependencies are
needed or not, I was not able to answer myself, yet most properly
designed systems tend to have directed dependencies free of cycles.
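To illustrate why acyclic dependencies matter for a build: a build tool can only derive a build order when the module graph has no cycles. The Java sketch below is a minimal, hypothetical example using Kahn's topological sort over a map from module name to its dependencies; the module names in the usage are made up and do not describe Cassandra's real layout. When a cycle exists no complete order can be produced, so the sketch fails and lists the modules that are in the cycle or depend on it.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class BuildOrder {

    /**
     * Returns the modules in an order where every module comes after all of
     * its dependencies (Kahn's algorithm). deps maps a module to the modules
     * it depends on. Throws if the graph contains a cycle.
     */
    static List<String> of(Map<String, List<String>> deps) {
        // Number of not-yet-built dependencies per module (TreeMap: stable order).
        Map<String, Integer> pending = new TreeMap<>();
        // Reverse edges: dependency -> modules that need it.
        Map<String, List<String>> dependents = new HashMap<>();
        for (Map.Entry<String, List<String>> e : deps.entrySet()) {
            pending.putIfAbsent(e.getKey(), 0);
            for (String dep : e.getValue()) {
                pending.putIfAbsent(dep, 0);
                pending.merge(e.getKey(), 1, Integer::sum);
                dependents.computeIfAbsent(dep, k -> new ArrayList<>()).add(e.getKey());
            }
        }
        // Start with the modules that have no dependencies at all.
        Deque<String> ready = new ArrayDeque<>();
        for (Map.Entry<String, Integer> e : pending.entrySet()) {
            if (e.getValue() == 0) {
                ready.add(e.getKey());
            }
        }
        List<String> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            String module = ready.poll();
            order.add(module);
            // Building this module may unblock the modules that depend on it.
            for (String d : dependents.getOrDefault(module, List.of())) {
                if (pending.merge(d, -1, Integer::sum) == 0) {
                    ready.add(d);
                }
            }
        }
        // Anything left unbuilt is in a cycle or depends on one.
        if (order.size() < pending.size()) {
            List<String> stuck = new ArrayList<>();
            pending.forEach((module, left) -> {
                if (left > 0) {
                    stuck.add(module);
                }
            });
            throw new IllegalStateException("dependency cycle involving " + stuck);
        }
        return order;
    }
}
```

For example, with a hypothetical `server` module depending on `thrift` and `clientutil`, and `thrift` depending on `clientutil`, the sketch returns `[clientutil, thrift, server]`.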


Looking at the history of the root build.xml, I see 75 contributors who
touched it. That is not a small number; I would say that for a project with
such a small number of external dependencies it is a lot.
It is still a fair question whether you will benefit from using another build
tool. Yet I could pose the opposite question: where would you be if you
had made the change 7 years ago? How many hours of tweaking Ant would you
have saved?


With regard to linters, I saw them used in openHAB, and I know Apache Maven is
looking to adopt a unified code style as well. In both cases the ecosystem
of these projects is quite large and the number of people making
contributions is larger than the core team. Yet even if Cassandra is
smaller in terms of people contributing patches, it has serious
enterprise use cases. Spotless, the-last-npe and other build-time
plugins can ensure that you never get a PR or patch with amended
whitespace or code formatting. Unnecessary parts will be fixed by the tool,
so a patch will contain only the needed change. Whether you enforce it through
CI or by hand is another question. For sure staying with Ant doesn't


Best regards,
Łukasz


On 28.11.2022 22:19, Benedict wrote:
Scott makes some valid points about legitimate benefits. I personally 
doubt the high upfront cost of migration will take less than a decade to 
pay back in time saved managing shims. But, it’s a tangible 
justification. Conveniently the bulk of my contributions are also at 
Scott’s prerogative, so if he’s fine with me (and others) wasting their 
time battling Gradle or Maven, or losing time to the migration, then I 
think my complaint is functionally neutered.


I think, though, that those pushing such a disruptive change onto others
had better work very darn hard to deliver a smooth experience.


> I tend to find that maintaining our current Ant build is a big waste
> of my time, and that every time I need to go to this layer it's far
> more brittle than it should be.


Whereas I find Ant an absolute pleasure, and Gradle a nightmare, and 
already regret using it for Accord. I wasted more than a day just trying 
to get some test artefacts in one module exposed in another, and 
eventually gave up. I have made dozens of forays into our Ant build and 
*never* abandoned my goal because I couldn’t accomplish it. I think 
people are really glossing over all the pain other build systems bring - 
even without our complex build requirements.


If you’re going to make the project adopt your preferred build system, 
you become responsible for the experience of everyone using it. Make 
sure you’re ready for that.







On 28 Nov 2022, at 20:18, David Capwell  wrote:

I am a strong +1 on new linters; I have been working on SpotBugs but
have not sent a patch yet due to sickness and holidays…


About Checkstyle as the source of truth for the style guide, I
am +1 on this as well… I feel that the wiki is a bad place for this, and we
can use the Checkstyle file to generate the wiki text (no idea

Re: DataStax role in Cassandra and the ASF

2016-11-05 Thread Łukasz Dywicki
her
> a gatekeeper or shooting down your proposal. I'm just attempting to explain
> my perception *of the view of the existing contributors*.”
I copied links to a threaded copy of the mailing list so everyone could easily go
over the message exchange and check it out. I quoted the parts which were relevant from
my point of view, after already having had a discussion with an exchange of arguments
similar to the above. If you selectively choose parts of mails to talk about, why
won't you now defend a second thread with a question about mavenizing the build, which
ended with just "No"? Isn't that shutting down the entire discussion?

> You indicate that the decisions made by the PMC force other companies to
> run forks (citing Stratio as an example). Here, again, history doesn't just
> find this unsupportable, but patently untrue. Time and time again the PMC
> made the decision to include code specifically so that Stratio wouldn't
> need to fork.
I don't track all discussions happening on the mailing lists on a daily basis.
Aleksey already made a point about that, so I privately asked Andreas, who was CCed
in his reply, to also put on record his obviously positive collaboration
with the project. You also share your positive feedback about collaboration with
the project in a later part of your mail, which is great. After all, I might be just
one frustrated guy who has been incubating this for over a year, working hard to provoke a
board reaction just to show off some badly planned rebellion.

> You indicate that discontinuation of thrift was seen by outsiders as
> marketing driven. The discontinuation of thrift is technical in nature -
> its implementation has a ton of edge cases, its existence introduces
> risk. It's more code to maintain, and it's now less performant than the
> native CQL. The preference for CQL over thrift evolved over time, it's
> easier for newcomers, it's easier for most people to reason about, and the
> 3.0 engine (ticket 8099) optimized storage for CQL, moving thrift to second
> class status. This isn't marketing, this is tech. The communication may
> have been poor (though to be fair, it was discussed in detail on various
> JIRA tickets, which is sent to various mailing lists, so it "happened" in
> the Apache sense).
You bring up CQL as an example, saying that it is now more performant than Thrift.
This means that you made a great investment in it over the past years, and I believe
you spent a great amount of time on that. Wouldn't it be great if people who are not
dedicated to the more important parts could focus on infrastructure-related changes
while others worked on the project core and performance?

> If this is really an issue you brought to "a friend at ASF" as evidence of
> misconduct by the PMC at the time, which is hinted at in the fact that you
> felt called out by that insinuation in Kelly's original post, the fact that
> it's wrong on so many levels AND the fact that I see no evidence that
> anyone did any meaningful research to understand such a gross
> mischaracterization of "control" is really troubling.
I don't know who among the Apache board members could be my friend, but the whole
discussion had already started long before I saw it on Twitter. I won't
play any finger-pointing game in this case, because the dirty linen was already
being washed in public before I could even send (per Chris's suggestion) my mail to
the appropriate private mailing lists. It had already gone viral here without
me even showing up.

I understand that there might be tension between the board and project PMCs; I've
seen that before, but I can't get rid of the impression that some people are trying
to present it as a personal conflict, while it's not like that. I believe that a
common language will be found, as it always has been, once this attitude is
stripped away.

Cheers,
Łukasz


> 
> On Fri, Nov 4, 2016 at 5:03 PM, Łukasz Dywicki <l...@code-house.org> wrote:
> 
>> Good evening,
>> I feel a bit called to the table by both Kelly and Chris. The thing is, I
>> don't know either of you personally, nor do I have any relationship with you.
>> I'm not even an ASF member. My tweet was simply a reaction to Kelly's complaints
>> about the ASF punishing DataStax. Kelly's timeline also contained a statement such
>> as "forming a long term strategy to grow diversity around", which reminded me of
>> my attempts to collaborate on the Cassandra and TinkerPop projects to grow such
>> diversity. I collected message links and quotes and put them into a gist which
>> can be read by anyone:
>> https://gist.github.com/splatch/aebe4ad4d127922642bee0dc9a8b1ec1
>> 
>> I don't want to bring these topics back now and discuss the technical stuff
>> over again. It has happened to me in the past to refuse (or vote against) some
>> change proposals in other Apache projects I am involved in. I was on the other
>> ("bad guy") side multip

Re: [discuss] Modernization of Cassandra build system

2015-04-13 Thread Łukasz Dywicki
 with no stress, making
people use it for tests even if in production they use a different messaging
provider, because it's dead easy.
Taking a look at things such as netty or the Jackson JSON processor, which consisted of
just two or three modules in version 1.x, you can see fasterxml-jackson 2.x
continuing the library's evolution in a much wider way. It provides a more
customizable approach, supports pluggable data formats, data types and so on.
Library users did suffer a bit from the changes, package renaming and all the crazy
stuff that was going on, but now only legacy projects depend on the old 1.x
version.
Please don't get me wrong: I don't want to compare a library with a database; I
am just showing an approach which is affecting popular software. Also, as
mentioned above, even entire systems which are older and have a similar
complexity level to Cassandra are doing better these days than you, all
because they have several more jars. From an assembly point of view, for users
who just download a ZIP and unpack it, it doesn't change anything whether you have
cassandra-all only or divided it into 10 pieces, but from a developer's point of
view it makes a huge difference, because these people can decide which parts of
Cassandra they actually need and in which configuration.

Kind regards,
Lukasz

 On Sat, Apr 11, 2015 at 11:12 AM, Łukasz Dywicki l...@code-house.org
 wrote:
 
 Sorry for not coming back to the topic for a long time.
 
 You are right that what the Cassandra project has currently does work, but
 keeping package scoping discipline in such a big development community as
 Cassandra is clearly impossible without tool support (if you insist on keeping
 Ant, please try to separate the javac tasks for the logical parts of the current
 build to verify that). I clearly pointed out that it doesn't work in a reliable
 way, causing trouble with artifacts uploaded to Maven Central. As I briefly
 counted in my earlier mail, there were 116 issues related to artifacts
 published by the build process. That is a lot, and these issues require additional
 maintenance releases to fix, for example one or another bytecode-level
 dependency causing NoClassDefFoundErrors with invalid artifacts. According to
 some recordings from DataStax, there is a plan to support multiple kinds of
 store in Cassandra (document, graph), so it won't get easier with
 time but rather harder. Ask yourself: do you really want to mess all these
 things together?
 
 Starting from 2.x, Cassandra supports triggers, but writing even the simplest
 trigger, one which drops a log message or publishes a UDP packet, requires the
 entire Cassandra jar and all its dependencies to be present during development.
 The fact that everything sits in one big Ant build.xml is caused by the trouble
 Ant itself has supporting multiple build modules, placeholders and
 so on, not because it is handsome to do so.
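A hypothetical sketch of what a separate, small trigger API artifact could allow: the trigger classes below compile against nothing but a few API types. `Trigger`, `Mutation`, `LoggingTrigger` and `AuditTrigger` are made-up, simplified names for illustration only; they are not Cassandra's actual `ITrigger` interface, whose real signature takes Cassandra's internal types and differs between versions.

```java
import java.util.Collection;
import java.util.Collections;
import java.util.List;
import java.util.logging.Logger;

// ---- Hypothetical "trigger API" artifact: the only types a trigger
// ---- author would need on the compile classpath.

/** Simplified stand-in for a mutation: just a target table and partition key. */
final class Mutation {
    final String table;
    final String partitionKey;

    Mutation(String table, String partitionKey) {
        this.table = table;
        this.partitionKey = partitionKey;
    }
}

/** Called for each update; returns extra mutations to apply (possibly none). */
interface Trigger {
    Collection<Mutation> augment(String table, String partitionKey);
}

// ---- Trigger implementations, compiled against the API types above only.

/** Only logs the update and adds no mutations. */
final class LoggingTrigger implements Trigger {
    private static final Logger LOG = Logger.getLogger(LoggingTrigger.class.getName());

    @Override
    public Collection<Mutation> augment(String table, String partitionKey) {
        LOG.info("update on " + table + ", key " + partitionKey);
        return Collections.emptyList();
    }
}

/** Mirrors every update into a companion table whose name ends in "_audit". */
final class AuditTrigger implements Trigger {
    @Override
    public Collection<Mutation> augment(String table, String partitionKey) {
        return List.of(new Mutation(table + "_audit", partitionKey));
    }
}
```

In this arrangement, only the small API artifact would need to be published and kept stable for trigger authors, while the server would implement the actual calls behind it.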
 
 Modernizing the build and internal dependencies is not something which
 brings a huge benefit in the first run, because your frontend is now CQL; however,
 it gives a real boost when it comes to community contributions, tool development,
 or even debugging. Sadly, keeping the current Ant build is a silent agreement to
 keep the internal mess and the rickety architecture of the project. Ant was already
 a legacy tool when Cassandra was launched. The longer you stay with it, the
 more trouble you will get over time.
 
 Kind regards,
 Lukasz
 
 
 On 2 Apr 2015, at 14:51, Robert Stupp sn...@snazy.de wrote:
 
 TL;DR - Benedict is right.
 
 IMO Maven is a nice, straight-forward tool if you know what you’re doing
 and start on a _new_ project.
 But Maven easily becomes a pita if you want to do something that’s not
 supported out-of-the-box.
 I bet that Maven would just not work for C* source tree with all the
 little nice features that C*’s build.xml offers (just look at the scripted
 stuff in build.xml).
 
 Eventually gradle would be an option; I proposed to switch to gradle
 several months ago. Same story (although gradle is better than Maven ;) ).
 But… you need to know that build.xml is not just used to build the code
 and artifacts. It is also used in CI, ccm, cstar-perf and some other
 custom systems that exist and just work. So if we were to exchange Ant for
 something else, it would force a lot of effort to change several tools and
 systems. And there must be a guarantee that everything works like it did
 before.
 
 Regarding IDEs: I'm using IDEA every day and it works like a charm with
 C*. Eclipse is "supported natively" by "ant generate-eclipse-files". TBH I
 don't know NetBeans.
 
 As Benedict pointed out, the code has improved and still improves a lot
 - in structure, in inline-doc, in nomenclature and whatever else. As soon
 as we can get rid of Thrift in the tree, there’s another big opportunity to
 cleanup more stuff.
 
 TBH I don't think that (besides the tools) there would be a need to
 generate multiple artifacts for the C* daemon - you can do "separation of
 concerns" (via packages) even with discipline

Re: [discuss] Modernization of Cassandra build system

2015-04-11 Thread Łukasz Dywicki
 a
 recent bootcamp we've run for both internal and external contributors
 http://www.datastax.com/dev/blog/deep-into-cassandra-internals.
 
 The code structure would be great to modularise, but the reality is that it
 is not currently modular. There are no good clear dividing lines for much
 of the project. The problem with refactoring the entire codebase to create
 separate projects is that it is a significant undertaking that makes
 maintenance of the project across versions significantly more costly. This
 creates a net drag on all productivity in the project. Such a major change
 requires strong consensus, and strong evidence justifying it. So the
 question is: would this create more new work than it loses? The evidence
 isn't there that it would. It might, but I personally guess that it would
 not, judging by the results of our other attempts to drive up contributions
 to the project. Perhaps we can have a wider dialogue about the endeavour,
 though, and see if a consensus can in fact be built.
 
 
 
 On Thu, Apr 2, 2015 at 9:31 AM, Pierre Devops pierredev...@gmail.com
 wrote:
 
 Hi all,
 
 Not a Cassandra contributor here, but I'm working on the Cassandra sources
 too.
 
 This big Cassandra source root caused me trouble too. First, it was not
 easy to import into an IDE; try to import the Cassandra sources in NetBeans,
 it's a headache.
 
 It would be great if we had more small modules/projects in separate POMs. It
 would be easier to work on a small part of the project, and as a
 consequence, I'm sure you would get more external contributions to this
 project.
 
 I know Cassandra devs are used to the Ant build model, but it's like a thread I
 opened about updated and more complete documentation on sstable
 structures. I got the answer that it was not needed to understand how to use
 Cassandra, and that the only way to learn about it is to rtfcode. Because
 people working on Cassandra already know what the sstable structure is, it's
 not considered necessary to provide up-to-date documentation.
 So it will take me a very long time to read and understand all the
 serialization code in Cassandra to understand the sstable structure before
 I can work on the code. Up-to-date documentation about internals would have
 given me the knowledge I need to contribute much more quickly.
 
 Here we have the same problem: we have a complex, non-modular build system,
 and core Cassandra devs are used to it, so it's not considered necessary to make
 something more flexible, even if it could facilitate external contributions.
 
 
 
 2015-03-31 23:42 GMT+02:00 Benedict Elliott Smith 
 belliottsm...@datastax.com:
 
 I think the problem is everyone currently contributing is comfortable with
 ant, and as much as it is imperfect, it isn't clear maven is going to be
 better. Having the requisite maven functionality linked under the hood
 doesn't seem particularly preferable to the inverse. The status quo has the
 bonus of zero upheaval for the project and its contributors, though, so it
 would have to be a very clear win to justify the change in my opinion.
 
 

Re: [discuss] Modernization of Cassandra build system

2015-03-31 Thread Łukasz Dywicki
Hey Tyler,
Thank you very much for getting back to me. I had already lost faith that I would
get a reply. :-) I am fine with code relocations. Moving constants into one place
where they cause no circular dependencies is cool; I'm all for doing such a thing.

Currently Cassandra uses Ant for some Maven functionality (such as deploying
the POM into repositories with dependency information), and it also uses
Maven-style artifact repositories. This can easily be flipped: Maven can call
Ant tasks for the parts which cannot be done with existing Maven plugins.
Here is the simplest example:
http://docs.codehaus.org/display/MAVENUSER/Antrun+Plugin - you can see an Ant
task definition embedded in a Maven pom.xml.

Most things can be done at this moment via Maven plugins:
apache-rat-plugin: http://mvnrepository.com/artifact/org.apache.rat/apache-rat-plugin/0.11
maven-thrift-plugin: http://mvnrepository.com/artifact/org.apache.thrift.tools/maven-thrift-plugin/0.1.11
antlr4-maven-plugin: http://mvnrepository.com/artifact/org.antlr/antlr4-maven-plugin/4.5 or
antlr3-maven-plugin: http://mvnrepository.com/artifact/org.antlr/antlr3-maven-plugin/3.5.2
maven-gpg-plugin: http://mvnrepository.com/artifact/org.apache.maven.plugins/maven-gpg-plugin/1.6
maven-cobertura-plugin: http://mojo.codehaus.org/cobertura-maven-plugin/ (but these days
JaCoCo with Java agent instrumentation performs better)
... and so on

I have already made some evaluation of the impact, and it is big. The code has to
be separated into different source roots. That's not easy even when keeping the
current artifact structure: cassandra-all, cassandra-thrift and clientutil (because
of cyclic dependencies). What I can do is prepare these source roots with the
dependencies declared for them and push that to my Cassandra fork, so you will be
able to verify it and continue with relocations if you like the new build.
Creating new modules (source roots) with Maven is simple, so you could possibly
extract more than these 3 predefined artifacts/package roots.
Just let me know if you are interested.

Kind regards,
Lukasz


 On 31 Mar 2015, at 21:57, Tyler Hobbs ty...@datastax.com wrote:
 
 Hi Łukasz,
 
 I'm not very familiar with the build system, but I'll try to respond.
 
 The Serializer dependencies on org.apache.cassandra.transport are almost
 certainly uses of Server.CURRENT_VERSION and Server.VERSION_3.  These are
 constants that represent the native protocol version in use, which affects
 how certain types are serialized.  These constants could easily be moved.
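A minimal sketch of the relocation described here, assuming a new dependency-free holder class; the name `ProtocolVersions` and the constant values are assumptions, not Cassandra's code. The serializer then depends only on that holder rather than on a server class. The encoding uses the fact that native protocol v3 widened collection sizes from 2 to 4 bytes; `CollectionSizeCodec` is a hypothetical helper.

```java
import java.nio.ByteBuffer;

/** Hypothetical dependency-free home for the native protocol version constants. */
final class ProtocolVersions {
    static final int VERSION_3 = 3;                 // illustrative value
    static final int CURRENT_VERSION = VERSION_3;   // illustrative value

    private ProtocolVersions() {
    }
}

/** Simplified serializer helper that depends only on ProtocolVersions. */
final class CollectionSizeCodec {

    /** Encodes a collection element count: 2 bytes before v3, 4 bytes from v3 on. */
    static ByteBuffer writeSize(int size, int protocolVersion) {
        ByteBuffer out;
        if (protocolVersion >= ProtocolVersions.VERSION_3) {
            out = ByteBuffer.allocate(4).putInt(size);
        } else {
            // Older versions use an unsigned short; the cast keeps the low 16 bits.
            out = ByteBuffer.allocate(2).putShort((short) size);
        }
        out.flip(); // make the written bytes readable
        return out;
    }
}
```

With this layout, the client-side code can be compiled and shipped without anything from `org.apache.cassandra.transport`.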
 
 The o.a.c.marshal dependency in MapSerializer is on AbstractType, but could
 easily be replaced with java.util.Comparator.
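A hypothetical, simplified map serializer showing the `java.util.Comparator` idea: entries are written in the order defined by a plain comparator, so the class needs nothing from `org.apache.cassandra.db.marshal`. `SimpleMapSerializer` is a made-up name, keys and values are assumed to be already serialized, and the v3-style layout with 4-byte counts and lengths is an assumption for illustration.

```java
import java.nio.ByteBuffer;
import java.util.Comparator;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical map serializer ordered by a plain Comparator, not a server-side type. */
final class SimpleMapSerializer {
    private final Comparator<ByteBuffer> keyComparator;

    SimpleMapSerializer(Comparator<ByteBuffer> keyComparator) {
        this.keyComparator = keyComparator;
    }

    /** Layout: [int count] then, per entry, [int len][key bytes][int len][value bytes]. */
    ByteBuffer serialize(Map<ByteBuffer, ByteBuffer> map) {
        // Sort the entries with the supplied comparator.
        TreeMap<ByteBuffer, ByteBuffer> sorted = new TreeMap<>(keyComparator);
        sorted.putAll(map);
        int size = 4;
        for (Map.Entry<ByteBuffer, ByteBuffer> e : sorted.entrySet()) {
            size += 8 + e.getKey().remaining() + e.getValue().remaining();
        }
        ByteBuffer out = ByteBuffer.allocate(size);
        out.putInt(sorted.size());
        for (Map.Entry<ByteBuffer, ByteBuffer> e : sorted.entrySet()) {
            // duplicate() so the caller's buffers keep their positions
            out.putInt(e.getKey().remaining()).put(e.getKey().duplicate());
            out.putInt(e.getValue().remaining()).put(e.getValue().duplicate());
        }
        out.flip();
        return out;
    }
}
```

On the server side, a type class could simply hand its own comparator to such a serializer, so the dependency would point from the server to the client code and not the other way round.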
 
 In any case, I'm not necessarily opposed to improving the build system to
 make these errors more apparent.  Would your proposal still allow us to
 build with ant (and just change the way those artifacts are built)?
 

[discuss] Modernization of Cassandra build system

2015-03-24 Thread Łukasz Dywicki
Dear Cassandra committers and development process followers,
I would like to bring up an important topic: the build process of Cassandra. I am
an external user from the community's point of view; however, I have been active in
various projects close to Cassandra over the past year or even more. What worries
me a lot is how Cassandra publishes artifacts and how many problems are reported
due to that.

First of all, I want to note that I am not a born enemy of Ant itself; I never
used it. I am also aware of problems with custom builds made with Maven;
however, I don't really want to discuss any particular replacement. Yet I want
to note that the Cassandra JIRA project contains about 116 issues related somehow
to Maven (http://bit.ly/1GRoXl5, project=CASSANDRA, text ~ maven). Depending on
the point of view, it might be a lot or a little. By simple statistics, it is
around 21 issues a year, or almost 2 issues a month, many of them breaking
maintenance/major releases from the user's point of view. On the other hand,
it's not bad considering how the project is being built.

The current structure has a very big disadvantage: ONE source root for multiple
artifacts published in Maven repositories, with classes copied into jars AFTER
they are compiled. Obviously the Ant copy task doesn't follow import statements
and does not include dependent classes. For example, just by making test
relocations and extracting the clientutil jar on the master branch into a
separate source root, I found a bug where ListSerializer depends on the
org.apache.cassandra.transport package. Moreover, clientutil (MapSerializer)
depends on the org.apache.cassandra.db.marshal package, so it cannot be
used without cassandra-all present on the classpath.
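The Ant copy task cannot notice such references, but even a crude check can. The sketch below is a hypothetical, regex-based illustration, not the tool used to find the ListSerializer problem: it reports `import` statements that point outside a list of allowed package prefixes. It ignores fully qualified names used without an import, so compiling each source root on its own, as separate Maven modules would, remains the reliable check.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ImportBoundaryCheck {
    // Matches "import a.b.C;", "import a.b.*;" and "import static a.b.C.member;"
    // and captures the imported name without a trailing ".*".
    private static final Pattern IMPORT = Pattern.compile(
            "^\\s*import\\s+(?:static\\s+)?([\\w.]+)(?:\\.\\*)?\\s*;", Pattern.MULTILINE);

    /**
     * Returns the imports in the given source text that do not start with any
     * of the allowed prefixes (JDK packages must be listed explicitly too).
     */
    static List<String> violations(String source, List<String> allowedPrefixes) {
        List<String> result = new ArrayList<>();
        Matcher m = IMPORT.matcher(source);
        while (m.find()) {
            String imported = m.group(1);
            boolean allowed = false;
            for (String prefix : allowedPrefixes) {
                if (imported.startsWith(prefix)) {
                    allowed = true;
                    break;
                }
            }
            if (!allowed) {
                result.add(imported);
            }
        }
        return result;
    }
}
```

With the allowed prefixes `java.` and `org.apache.cassandra.serializers` (chosen here only for illustration), a source file importing `org.apache.cassandra.transport.Server` would be reported.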
Luckily for Cassandra, CQL as a new interface reduces Thrift and clientutil
usage, reducing the number of issues reported around these; however, this just
hides the real problem described in the previous paragraph. I found a handy tool
and made a graph of circular dependencies in cassandra-all.jar. The resulting
graph can be found here: http://grab.by/FRnO. As you can see, this graph has
multiple levels and solving it is not a simple task. I am afraid the current way
of building and packaging Cassandra can create huge hiccups when it comes
to code refactorings, because the entire codebase will become a house of cards.
Restructuring the project into smaller pieces is also beneficial for the community,
since solving bugs in smaller units is definitely easier.
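One way to break a graph like that into workable pieces is to compute its strongly connected components. Every component with more than one package is a group of packages that all depend on each other, directly or transitively, and must be untangled before they can live in separate modules. The sketch below applies Tarjan's algorithm to a hypothetical package dependency map. It is not the tool used to draw the linked graph, and the example edges in the usage are invented, only loosely named after packages mentioned in this thread.

```java
import java.util.*;

final class PackageCycles {
    private final Map<String, List<String>> deps;
    private final Map<String, Integer> index = new HashMap<>();
    private final Map<String, Integer> lowLink = new HashMap<>();
    private final Deque<String> stack = new ArrayDeque<>();
    private final Set<String> onStack = new HashSet<>();
    private final List<Set<String>> cycles = new ArrayList<>();
    private int counter = 0;

    private PackageCycles(Map<String, List<String>> deps) {
        this.deps = deps;
    }

    /** Returns every strongly connected component containing more than one package. */
    static List<Set<String>> find(Map<String, List<String>> deps) {
        PackageCycles t = new PackageCycles(deps);
        for (String pkg : deps.keySet()) {
            if (!t.index.containsKey(pkg)) {
                t.visit(pkg);
            }
        }
        return t.cycles;
    }

    private void visit(String pkg) {
        index.put(pkg, counter);
        lowLink.put(pkg, counter);
        counter++;
        stack.push(pkg);
        onStack.add(pkg);
        for (String dep : deps.getOrDefault(pkg, List.of())) {
            if (!index.containsKey(dep)) {
                visit(dep); // tree edge: recurse, then propagate the low-link
                lowLink.put(pkg, Math.min(lowLink.get(pkg), lowLink.get(dep)));
            } else if (onStack.contains(dep)) {
                // back edge into the current component
                lowLink.put(pkg, Math.min(lowLink.get(pkg), index.get(dep)));
            }
        }
        // pkg is the root of a component: pop the whole component off the stack.
        if (lowLink.get(pkg).equals(index.get(pkg))) {
            Set<String> component = new TreeSet<>();
            String member;
            do {
                member = stack.pop();
                onStack.remove(member);
                component.add(member);
            } while (!member.equals(pkg));
            if (component.size() > 1) {
                cycles.add(component);
            }
        }
    }
}
```

For invented edges where `db.marshal` and `serializers` depend on each other, the sketch reports the single group `[db.marshal, serializers]`, which points to exactly the spot that needs untangling before those packages can be split apart.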

At the end of this mail, I would like to propose moving the Cassandra build system
forward, regardless of which tool will be chosen for it. Personally, I can
volunteer for the Maven-related changes to extract cassandra-thrift,
cassandra-clientutil and cassandra-all into a regular Maven build. It might be
seen as a switch from one big XML file into a couple of smaller ones. :-) All this
depends on the Cassandra developers' decision whether to divide the source roots or not.

Kind regards,
Łukasz Dywicki
—
l...@code-house.org
Twitter: ldywicki
Blog: http://dywicki.pl
Code-House - http://code-house.org



Configuration of network connectors

2013-07-09 Thread Łukasz Dywicki
Hello,
First of all, I would like to say hello to the Cassandra user and developer
community. :)

I am writing because we are using Cassandra in our unit tests and we have some
trouble with network connectivity. We cannot run multiple Cassandra instances
during tests because we would need to randomize the port configuration and so
on. For now, if we try to fork our tests, we get "address already in use" on one
of the two ports: native or Thrift. In other Apache projects (ActiveMQ, Camel,
Mina) we can use VM connectors based on an in-memory queue. I took some time
to see how CassandraDaemon starts servers, and it's kind of hardcoded. I
thought about changing the configuration to be more like:

servers:
  - class: org.apache.cassandra.thrift.ThriftServer
  - class: org.apache.cassandra.transport.Server

Then we would be able to replace these servers for unit tests:
servers:
  - class: org.apache.cassandra.vm.VmServer

This requires some small changes in the daemon code and client libraries. I'm not
deeply involved in Cassandra, so I don't know the internal architecture and
implications; thus I look forward to discussing this topic with you.
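A hypothetical sketch of how the proposed `servers` list could be wired up: the daemon would read class names from configuration and instantiate each one reflectively, so a test configuration can list an in-memory server instead of the network ones. `NetworkServer`, `InMemoryServer` and `ServerLoader` are made-up names, not existing Cassandra API.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical contract that each configurable server would implement. */
interface NetworkServer {
    void start();

    void stop();

    boolean isRunning();
}

/** Hypothetical in-memory stand-in used by unit tests instead of real sockets. */
class InMemoryServer implements NetworkServer {
    private volatile boolean running;

    public InMemoryServer() {
    }

    @Override
    public void start() {
        running = true;
    }

    @Override
    public void stop() {
        running = false;
    }

    @Override
    public boolean isRunning() {
        return running;
    }
}

final class ServerLoader {

    /**
     * Instantiates each configured class through its public no-arg constructor.
     * asSubclass fails with ClassCastException if a class is not a NetworkServer.
     */
    static List<NetworkServer> load(List<String> classNames) throws ReflectiveOperationException {
        List<NetworkServer> servers = new ArrayList<>();
        for (String name : classNames) {
            Class<?> type = Class.forName(name);
            servers.add(type.asSubclass(NetworkServer.class)
                    .getDeclaredConstructor()
                    .newInstance());
        }
        return servers;
    }
}
```

A test setup would pass `InMemoryServer`'s fully qualified class name, while the production configuration would list the Thrift and native servers. The daemon itself would only start and stop whatever it was given.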

Cheers,
Łukasz Dywicki



Re: Configuration of network connectors

2013-07-09 Thread Łukasz Dywicki
Jeremy,
Sadly, it does not cover our case. We have unit tests and we want to test really
basic things like the mapping of data contained in Cassandra to our model. For
that we don't need a cluster at all, because in unit tests we don't want to test
data distribution. We would also like to run everything in the JVM, so CCM,
written in Python, is not really what we need.
What we are looking for is a minimal Cassandra setup which could be embedded and
used concurrently multiple times. For example, we now use CassandraUnit:

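import org.junit.Rule;
import org.junit.Test;
// CassandraUnit and EmptyDataSet come from the cassandra-unit library
// (their imports are omitted here).

public class MappingTest {

    // Launches an embedded CassandraDaemon configured by embedded-cassandra.yaml
    @Rule
    public CassandraUnit unit = new CassandraUnit(new EmptyDataSet(),
            "embedded-cassandra.yaml");

    @Test
    public void firstTest() {
        // do something with data
    }

    @Test
    public void secondTest() {
        // do something else
    }
}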

In this setup JUnit will launch a new CassandraDaemon for every test. If we set
FORK_MODE per test, then we may have two Cassandra instances running at the same
time. The first test which launches CassandraDaemon will pass; the second may fail
due to a port conflict. That's why we thought about testing without the network
layer. This could also save some time, which would be great, because on some older
hardware used by our developers it takes up to 9 minutes to run the build with all
unit tests. Some of this time is consumed by the startup and shutdown of Cassandra.
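Until the network layer can be switched off, a common workaround for such port conflicts is to let the operating system pick free ports and write them into each forked test's configuration. The sketch below is a minimal illustration: `FreePorts` is a hypothetical helper, and it assumes the YAML file uses the standard `native_transport_port` and `rpc_port` settings as top-level `key: value` lines. There is a small race between closing the probe socket and Cassandra binding the port.

```java
import java.io.IOException;
import java.net.ServerSocket;

final class FreePorts {

    /** Binds to port 0 so the operating system assigns a currently unused TCP port. */
    static int pick() throws IOException {
        // The socket is closed again before returning, so another process
        // could still take the port before Cassandra binds it.
        try (ServerSocket socket = new ServerSocket(0)) {
            return socket.getLocalPort();
        }
    }

    /** Rewrites the two port settings in a cassandra.yaml-style text. */
    static String withPorts(String yaml, int nativePort, int rpcPort) {
        return yaml
                .replaceAll("(?m)^native_transport_port:\\s*\\d+",
                        "native_transport_port: " + nativePort)
                .replaceAll("(?m)^rpc_port:\\s*\\d+", "rpc_port: " + rpcPort);
    }
}
```

Each forked test JVM would then write its own copy of the YAML file with freshly picked ports before starting the embedded daemon. This avoids the "address already in use" failures but not the startup cost.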

Cheers,
Łukasz Dywicki

On 9 Jul 2013, at 15:22, Jeremy Hanna jeremy.hanna1...@gmail.com wrote:

 Have you seen https://github.com/pcmanus/ccm as described in 
 http://www.datastax.com/dev/blog/ccm-a-development-tool-for-creating-local-cassandra-clusters
  or does that not fit your use case?
 