Jason,

Don't worry, hindsight is always 20/20. But it takes very good planning and a lot of time to do it right the first time.
It always gets better the more you work it.

James

On 4/11/2011 11:13 PM, Jason Baldridge wrote:
> Thanks everyone for your thoughts. I think the first step is to refactor the
> package sticking with Java and then we'll see about moving to a Scala/Java
> mix after that (but only for the opennlp machine learning package, currently
> opennlp-maxent).
>
> I was actually sort of appalled looking through the code yesterday and
> seeing so many global variables used all over the place, making it hard to
> know exactly what every method had access to. I think this was sort of an
> artifact of how I used Trove functions a loooong time ago to enable quick
> iteration over the data structures (which required some objects to be
> global). That is obviously gone now, but the global variables didn't go
> away... hope I'll find time to improve things over the next 5-6 months.
>
> Jason
>
> On Mon, Apr 11, 2011 at 7:27 AM, Tommaso Teofili
> <[email protected]> wrote:
>
>> Hi Jason,
>> I personally have some Scala experience while working with Clerezza [1]
>> which uses both Java and Scala but what I think is that, while Scala is
>> perfectly ok with existing Java standards and allowing functional/dynamic
>> programming, it raises the barrier for new users/devs a little bit.
>> So I am not so sure that a Scala implementation should totally replace an
>> existing one, maybe a graceful introduction would be more welcome.
>> My 2 cents,
>> Tommaso
>>
>>
>> [1] : http://incubator.apache.org/clerezza
>>
>> 2011/4/10 Jason Baldridge <[email protected]>
>>
>>> It's been a while since I posted this request for input... Does anyone have
>>> any thoughts on it? Is anyone else interested in Scala being part of
>>> OpenNLP?
>>>
>>> Jason
>>>
>>> On Tue, Mar 22, 2011 at 10:16 AM, Jason Baldridge
>>> <[email protected]> wrote:
>>>
>>>> Hi everyone,
>>>>
>>>> Jorn and I have had a little discussion about a topic I brought up with him
>>>> that I'd like to get everyone's thoughts on.
>>>> I'm including our conversation
>>>> below, but the gist of it is this:
>>>>
>>>> - I've been switching to development in Scala. At this point, I personally
>>>> see little point in coding in Java given that Scala is available (and very
>>>> very nice) and it plays very well with existing Java -- I'm very happy with
>>>> this for several projects I'm working on, including TextGrounder
>>>> <http://code.google.com/p/textgrounder/> and
>>>> Junto <http://code.google.com/p/junto/>. So, I'd like to see Scala making
>>>> its way into OpenNLP.
>>>> - We need to reorganize the maxent code into the new package opennlp.ml
>>>> - I'd like to create the new package, retaining the Java code as is, make
>>>> a first release, and then allow Scala code to mix in with the Java from that
>>>> point on
>>>> - A number of issues come up with this, including using another build tool
>>>> like SBT instead of Maven and ensuring we are Apache compliant and so on.
>>>> So, this is really just a feeler to see what you all think and see if you
>>>> have any enthusiasm, reservations or suggestions. Thanks!
>>>>
>>>> Jason
>>>>
>>>>
>>>> Forwarded conversation
>>>> Subject: opennlp.ml + Scala?
>>>> ------------------------
>>>>
>>>> From: *Jason Baldridge* <[email protected]>
>>>> Date: Mon, Mar 21, 2011 at 1:28 PM
>>>> To: Jörn Kottmann <[email protected]>
>>>>
>>>>
>>>> Hi Jorn,
>>>>
>>>> I've changed over to doing nearly all my coding in Scala, generally
>>>> transitioning Java codebases to Scala by writing everything new in Scala and
>>>> using the existing Java classes as they are. I would like to do this as part
>>>> of the new opennlp.ml, as I'm not inclined to write any new Java code
>>>> unless absolutely necessary, and I would very much like to create that new
>>>> and improved package. What do you think of this?
>>>>
>>>> Jason
>>>>
>>>> --
>>>> Jason Baldridge
>>>> Assistant Professor, Department of Linguistics
>>>> The University of Texas at Austin
>>>> http://www.jasonbaldridge.com
>>>>
>>>> ----------
>>>> From: *Jörn Kottmann* <[email protected]>
>>>> Date: Mon, Mar 21, 2011 at 2:24 PM
>>>> To: Jason Baldridge <[email protected]>
>>>>
>>>>
>>>> Hmm, yeah, if we would rewrite it I think it is something we could
>>>> consider, but in our case we just need to do some reshaping of the
>>>> existing code and a little refactoring here and there. That is one reason
>>>> I believe we should be conservative and not use it in this case.
>>>>
>>>> Another issue I see is that it will be a message to the mahout people that
>>>> we do not want to collaborate, which in fact I believe is something we
>>>> should do to get map reduce training support one day.
>>>> The people in the team might not be familiar with scala, which could
>>>> further limit the manpower which is available for the re-factoring.
>>>> Just my 2 cents.
>>>>
>>>> I believe we should also do the maxent refactoring slowly and first do
>>>> everything inside the current structures, and then when everything is in
>>>> place do the last changes which break backward compatibility.
>>>>
>>>> Anyway we should start a discussion about the future of OpenNLP, which
>>>> features do we want to implement for the next few versions? Which new
>>>> components would be nice to have?
>>>> I believe there are quite a few people who are willing to pick up tasks but
>>>> are simply not aware of the possibility.
>>>>
>>>> Jörn
>>>>
>>>> ----------
>>>> From: *Jason Baldridge* <[email protected]>
>>>> Date: Mon, Mar 21, 2011 at 3:29 PM
>>>> To: Jörn Kottmann <[email protected]>
>>>>
>>>>
>>>> Hmm... what if we did the first refactoring into opennlp.ml with pure Java
>>>> but the new package structure, then make a first release and then start
>>>> bringing in Scala?
>>>>
>>>>
>>>> Good points. However, I'm finding that Scala plays *very* nicely with Java
>>>> (including allowing Java to use Scala classes), so that could be mostly
>>>> transparent to users of the package, maintaining the API pretty much as it
>>>> is. So, I *think* we could continue to play nicely with Mahout folks.
>>>>
>>>> Also, after coding for a while in Scala, I can't help but feel that Java
>>>> the language is dead, while the JVM lives gloriously on. :) I think there is
>>>> a lot of momentum to Scala in general, and my feeling is that it is very
>>>> friendly for Java programmers. (Though I had experience in functional
>>>> programming before, so a lot of concepts came easily to me that could be
>>>> more unusual for others.)
>>>>
>>>>
>>>> What do you mean by "current structures"? Do you mean to keep the classes
>>>> as they are now, but just switch the package organization first?
>>>>
>>>>
>>>> Yes, perhaps we should do that once the release is all done? (Thanks for
>>>> all your hard work on that, btw!)
>>>>
>>>> Also, perhaps we should bring up the Scala question on the mailing list? I
>>>> wanted to ask you first to see if you had strong objections, but since
>>>> you don't it might be good to sound out the community.
>>>>
>>>> Jason
>>>>
>>>>
>>>> ----------
>>>> From: *Jörn Kottmann* <[email protected]>
>>>> Date: Mon, Mar 21, 2011 at 3:38 PM
>>>> To: Jason Baldridge <[email protected]>
>>>>
>>>>
>>>> I actually think just doing it for maxent/ml doesn't really make sense, if
>>>> we want to switch the programming language it's for the entire code base.
>>>> Then we speak about the migration of like 400 classes from java to scala,
>>>> does that really make sense? Just doing a little scala doesn't
>>>> sound reasonable to me.
>>>>
>>>> Sure move it to the mailing list.
>>>>
>>>> Jörn
>>>>
>>>> ----------
>>>> From: *Jason Baldridge* <[email protected]>
>>>> Date: Mon, Mar 21, 2011 at 5:44 PM
>>>> To: Jörn Kottmann <[email protected]>
>>>>
>>>>
>>>> But, the great thing about Scala is that you can mix Scala and Java and not
>>>> have to do one or the other -- so I don't think we'd need to do a full
>>>> migration. Anyway, I'll bring it up on the list!
>>>>
>>>> ----------
>>>> From: *Jörn Kottmann* <[email protected]>
>>>> Date: Mon, Mar 21, 2011 at 5:54 PM
>>>> To: Jason Baldridge <[email protected]>
>>>>
>>>>
>>>> Yeah, but then still most of the code will remain pure java mixed
>>>> with a little scala, but you have to deal with the extra complexity of
>>>> having a little scala, e.g. more complex build tooling, you need
>>>> extra IDE support, more complicated compatibility issues, etc.
>>>>
>>>> Jörn
>>>>
>>>> ----------
>>>> From: *Jason Baldridge* <[email protected]>
>>>> Date: Mon, Mar 21, 2011 at 7:39 PM
>>>> To: Jörn Kottmann <[email protected]>
>>>>
>>>>
>>>> The build is *really* easy with SBT (which can incorporate maven and ivy
>>>> dependency declarations). The idea would be to transition to Scala so that
>>>> it would eventually be mostly scala, if not all scala. A standard jar is
>>>> still distributed.
>>>>
>>>> ----------
>>>> From: *Jörn Kottmann* <[email protected]>
>>>> Date: Tue, Mar 22, 2011 at 4:33 AM
>>>> To: Jason Baldridge <[email protected]>
>>>>
>>>>
>>>> We are using maven right now, and it does a lot more than just putting
>>>> together a jar file, e.g.:
>>>> - Making a release, with code signing, tagging in our SCM, producing RAT
>>>> reports, etc.
>>>> - Deploying artifacts to the Apache repository
>>>> - Building our documentation
>>>> - Testing
>>>> - Optionally it can run code quality tools like FindBugs or a test
>>>> coverage tool
>>>>
>>>> Jörn
>>>>
>>>> ----------
>>>> From: *Jason Baldridge* <[email protected]>
>>>> Date: Tue, Mar 22, 2011 at 9:11 AM
>>>> To: Jörn Kottmann <[email protected]>
>>>>
>>>>
>>>> These might need some looking into, but are probably doable.
>>>>
>>>>
>>>> These are built-in targets for SBT.
>>>>
>>>> -j
>>>>
>>>> ----------
>>>> From: *Jörn Kottmann* <[email protected]>
>>>> Date: Tue, Mar 22, 2011 at 9:20 AM
>>>> To: Jason Baldridge <[email protected]>
>>>>
>>>>
>>>> Our entire build system was just rewritten to meet Apache rules and
>>>> standards, if we do that again now it will set the project back for like
>>>> a month or so.
>>>>
>>>> Jörn
>>>>
>>>> ----------
>>>> From: *Jason Baldridge* <[email protected]>
>>>> Date: Tue, Mar 22, 2011 at 9:33 AM
>>>> To: Jörn Kottmann <[email protected]>
>>>>
>>>>
>>>> Fair enough. I will still bring it up as it now actually pains me to code
>>>> in Java. ;)
>>>>
>>>> Oh, here is how to deploy artifacts:
>>>>
>>>> http://henkelmann.eu/2010/11/14/sbt_hudson_with_test_integration
>>>>
>>>> I think the others would be straightforward. Possibly one of the bigger
>>>> sticking points would be IDE integration -- I use Emacs and it all works
>>>> very well for me, but I don't know how it is for Eclipse and NetBeans folks.
>>>>
>>>> ----------
>>>> From: *Jörn Kottmann* <[email protected]>
>>>> Date: Tue, Mar 22, 2011 at 9:40 AM
>>>> To: Jason Baldridge <[email protected]>
>>>>
>>>>
>>>> I didn't say it's not possible to rewrite our build with SBT, but I strongly
>>>> believe that is an effort which will take quite some time, e.g. a month
>>>> just to get a build which is as good as our maven build we just
>>>> finished.
>>>> All the people have to install the scala plugins into their IDEs to get
>>>> proper support, which is of course also possible.
>>>>
>>>> Yeah bring it up on the mailing list.
>>>>
>>>> Jörn
>>>>
>>>> ----------
>>>> From: *Jason Baldridge* <[email protected]>
>>>> Date: Tue, Mar 22, 2011 at 9:46 AM
>>>> To: Jörn Kottmann <[email protected]>
>>>>
>>>>
>>>> Sounds good. And I find that it is often straightforward to take Maven
>>>> specifications and either use them directly from SBT or translate them into
>>>> the SBT definitions. Perhaps we could start this with opennlp.ml and then
>>>> see how it goes before doing it in the main OpenNLP code.
>>>>
>>>>
>>>>
>>>> --
>>>> Jason Baldridge
>>>> Assistant Professor, Department of Linguistics
>>>> The University of Texas at Austin
>>>> http://www.jasonbaldridge.com
>>>>
>>>
>>>
>>> --
>>> Jason Baldridge
>>> Assistant Professor, Department of Linguistics
>>> The University of Texas at Austin
>>> http://www.jasonbaldridge.com
>>>
>>
>
