Hi everyone, Jorn and I have had a little discussion about a topic I brought up with him that I'd like to get everyone's thoughts on. I'm including our conversation below, but the gist of it is this:
- I've been switching to development in Scala. At this point, I personally see little point in coding in Java given that Scala is available (and very very nice) and it plays very well with existing Java -- I'm very happy with this for several projects I'm working on, including TextGrounder <http://code.google.com/p/textgrounder/> and Junto <http://code.google.com/p/junto/>. So, I'd like to see Scala making its way into OpenNLP.
- We need to reorganize the maxent code into the new package opennlp.ml
- I'd like to create the new package, retaining the Java code as is, make a first release, and then allow Scala code to mix in with the Java from that point on
- A number of issues come up with this, including using another build tool like SBT instead of Maven and ensuring we are Apache compliant and so on.

So, this is really just a feeler to see what you all think and see if you have any enthusiasm, reservations or suggestions.

Thanks!
Jason

Forwarded conversation
Subject: opennlp.ml + Scala?
------------------------

From: *Jason Baldridge* <[email protected]>
Date: Mon, Mar 21, 2011 at 1:28 PM
To: Jörn Kottmann <[email protected]>

Hi Jorn,

I've changed over to doing nearly all my coding in Scala, generally transitioning Java codebases to Scala by writing everything new in Scala and using the existing Java classes as they are. I would like to do this as part of the new opennlp.ml, as I'm not inclined to write any new Java code unless absolutely necessary, and I would very much like to create that new and improved package.

What do you think of this?
Jason

--
Jason Baldridge
Assistant Professor, Department of Linguistics
The University of Texas at Austin
http://www.jasonbaldridge.com

----------
From: *Jörn Kottmann* <[email protected]>
Date: Mon, Mar 21, 2011 at 2:24 PM
To: Jason Baldridge <[email protected]>

Hmm, yeah, if we were rewriting it I think it is something we could consider, but in our case we just need to do some reshaping of the existing code and a little refactoring here and there. That is one reason I believe we should be conservative and not use it in this case.

Another issue I see is that it will send a message to the Mahout people that we do not want to collaborate, when in fact I believe collaborating is something we should do to get map reduce training support one day.

The people on the team might not be familiar with Scala, which could further limit the manpower available for the refactoring.

Just my 2 cents.

I believe we should also do the maxent refactoring slowly and first do everything inside the current structures, and then, when everything is in place, make the last changes which break backward compatibility.

Anyway, we should start a discussion about the future of OpenNLP: which features do we want to implement for the next few versions? Which new components would be nice to have? I believe there are quite a few people who are willing to pick up tasks but are simply not aware of the possibility.

Jörn

----------
From: *Jason Baldridge* <[email protected]>
Date: Mon, Mar 21, 2011 at 3:29 PM
To: Jörn Kottmann <[email protected]>

Hmm... what if we did the first refactoring into opennlp.ml with pure Java but the new package structure, then made a first release and then started bringing in Scala?

Good points. However, I'm finding that Scala plays *very* nicely with Java (including allowing Java to use Scala classes), so that could be mostly transparent to users of the package, maintaining the API pretty much as it is. So, I *think* we could continue to play nicely with Mahout folks.
Also, after coding for a while in Scala, I can't help but feel that Java the language is dead, while the JVM lives gloriously on. :) I think there is a lot of momentum to Scala in general, and my feeling is that it is very friendly for Java programmers. (Though I had experience in functional programming before, so a lot of concepts came easily to me that could be more unusual for others.)

What do you mean by "current structures"? Do you mean to keep the classes as they are now, but just switch the package organization first?

Yes, perhaps we should do that once the release is all done? (Thanks for all your hard work on that, btw!)

Also, perhaps we should bring up the Scala question on the mailing list? I wanted to ask you first to see if you had strong objections, but since you don't, it might be good to sound out the community.

Jason

----------
From: *Jörn Kottmann* <[email protected]>
Date: Mon, Mar 21, 2011 at 3:38 PM
To: Jason Baldridge <[email protected]>

I actually think just doing it for maxent/ml doesn't really make sense; if we want to switch the programming language, it's for the entire code base. Then we are talking about migrating something like 400 classes from Java to Scala. Does that really make sense?

Just doing a little Scala doesn't sound reasonable to me.

Sure, move it to the mailing list.

Jörn

----------
From: *Jason Baldridge* <[email protected]>
Date: Mon, Mar 21, 2011 at 5:44 PM
To: Jörn Kottmann <[email protected]>

But the great thing about Scala is that you can mix Scala and Java and not have to do one or the other -- so I don't think we'd need to do a full migration.

Anyway, I'll bring it up on the list!

----------
From: *Jörn Kottmann* <[email protected]>
Date: Mon, Mar 21, 2011 at 5:54 PM
To: Jason Baldridge <[email protected]>

Yeah, but then most of the code will still remain pure Java mixed with a little Scala, and you have to deal with the extra complexity of having a little Scala, e.g.
more complex build tooling, the need for extra IDE support, more complicated compatibility issues, etc.

Jörn

----------
From: *Jason Baldridge* <[email protected]>
Date: Mon, Mar 21, 2011 at 7:39 PM
To: Jörn Kottmann <[email protected]>

The build is *really* easy with SBT (which can incorporate Maven and Ivy dependency declarations).

The idea would be to transition to Scala so that it would eventually be mostly Scala, if not all Scala. A standard jar is still distributed.

----------
From: *Jörn Kottmann* <[email protected]>
Date: Tue, Mar 22, 2011 at 4:33 AM
To: Jason Baldridge <[email protected]>

We are using Maven right now, and it does a lot more than just put together a jar file, e.g.:
- Making a release, with code signing, tagging in our SCM, producing RAT reports, etc.
- Deploying artifacts to the Apache repository
- Building our documentation
- Testing
- Optionally running code quality tools like FindBugs or test coverage tools

Jörn

----------
From: *Jason Baldridge* <[email protected]>
Date: Tue, Mar 22, 2011 at 9:11 AM
To: Jörn Kottmann <[email protected]>

These might need some looking into, but are probably doable.

These are built-in targets for SBT.

-j

----------
From: *Jörn Kottmann* <[email protected]>
Date: Tue, Mar 22, 2011 at 9:20 AM
To: Jason Baldridge <[email protected]>

Our entire build system was just rewritten to meet Apache rules and standards. If we do that again now, it will set the project back by a month or so.

Jörn

----------
From: *Jason Baldridge* <[email protected]>
Date: Tue, Mar 22, 2011 at 9:33 AM
To: Jörn Kottmann <[email protected]>

Fair enough. I will still bring it up, as it now actually pains me to code in Java. ;)

Oh, here is how to deploy artifacts:

http://henkelmann.eu/2010/11/14/sbt_hudson_with_test_integration

I think the others would be straightforward.
Possibly one of the bigger sticking points would be IDE integration -- I use Emacs and it all works very well for me, but I don't know how it is for Eclipse and NetBeans folks.

----------
From: *Jörn Kottmann* <[email protected]>
Date: Tue, Mar 22, 2011 at 9:40 AM
To: Jason Baldridge <[email protected]>

I didn't say it's not possible to rewrite our build with SBT, but I strongly believe that it is an effort which will take quite some time, e.g. a month, just to get a build which is as good as the Maven build we just finished.

Everyone would have to install the Scala plugins into their IDEs to get proper support, which is of course also possible.

Yeah, bring it up on the mailing list.

Jörn

----------
From: *Jason Baldridge* <[email protected]>
Date: Tue, Mar 22, 2011 at 9:46 AM
To: Jörn Kottmann <[email protected]>

Sounds good. And I find that it is often straightforward to take Maven specifications and either use them directly from SBT or translate them into SBT definitions.

Perhaps we could start this with opennlp.ml and then see how it goes before doing it in the main OpenNLP code.

--
Jason Baldridge
Assistant Professor, Department of Linguistics
The University of Texas at Austin
http://www.jasonbaldridge.com
