I also wrote a lot of that code back in 1999/2000, and have, ahem, learned a lot since then. :)
Jason

On Tue, Apr 12, 2011 at 11:41 PM, James Kosin <[email protected]> wrote:
> Jason,
>
> Don't worry, hindsight is always 20/20. But it takes very good planning and a lot of time to do it right the first time.
>
> It always gets better the more you work on it.
>
> James
>
> On 4/11/2011 11:13 PM, Jason Baldridge wrote:
> > Thanks, everyone, for your thoughts. I think the first step is to refactor the package sticking with Java, and then we'll see about moving to a Scala/Java mix after that (but only for the OpenNLP machine learning package, currently opennlp-maxent).
> >
> > I was actually sort of appalled looking through the code yesterday and seeing so many global variables used all over the place, making it hard to know exactly what every method had access to. I think this was sort of an artifact of how I used Trove functions a loooong time ago to enable quick iteration over the data structures (which required some objects to be global). That is obviously gone now, but the global variables didn't go away... I hope I'll find time to improve things over the next 5-6 months.
> >
> > Jason
> >
> > On Mon, Apr 11, 2011 at 7:27 AM, Tommaso Teofili <[email protected]> wrote:
> >
> >> Hi Jason,
> >> I personally have some Scala experience from working with Clerezza [1], which uses both Java and Scala. What I think is that, while Scala works perfectly well with existing Java standards and allows functional/dynamic programming, it raises the barrier for new users/devs a little bit.
> >> So I am not so sure that a Scala implementation should totally replace an existing one; maybe a gradual introduction would be more welcome.
> >> My 2 cents,
> >> Tommaso
> >>
> >>
> >> [1] : http://incubator.apache.org/clerezza
> >>
> >> 2011/4/10 Jason Baldridge <[email protected]>
> >>
> >>> It's been a while since I posted this request for input... Does anyone have any thoughts on it?
> >>> Is anyone else interested in Scala being part of OpenNLP?
> >>>
> >>> Jason
> >>>
> >>> On Tue, Mar 22, 2011 at 10:16 AM, Jason Baldridge <[email protected]> wrote:
> >>>
> >>>> Hi everyone,
> >>>>
> >>>> Jorn and I have had a little discussion about a topic I brought up with him that I'd like to get everyone's thoughts on. I'm including our conversation below, but the gist of it is this:
> >>>>
> >>>> - I've been switching to development in Scala. At this point, I personally see little point in coding in Java given that Scala is available (and very very nice) and it plays very well with existing Java. I'm very happy with this for several projects I'm working on, including TextGrounder <http://code.google.com/p/textgrounder/> and Junto <http://code.google.com/p/junto/>. So, I'd like to see Scala making its way into OpenNLP.
> >>>> - We need to reorganize the maxent code into the new package opennlp.ml.
> >>>> - I'd like to create the new package, retaining the Java code as is, make a first release, and then allow Scala code to mix in with the Java from that point on.
> >>>> - A number of issues come up with this, including using another build tool like SBT instead of Maven and ensuring we are Apache compliant and so on.
> >>>>
> >>>> So, this is really just a feeler to see what you all think and see if you have any enthusiasm, reservations or suggestions. Thanks!
> >>>>
> >>>> Jason
> >>>>
> >>>>
> >>>> Forwarded conversation
> >>>> Subject: opennlp.ml + Scala?
> >>>> ------------------------
> >>>>
> >>>> From: *Jason Baldridge* <[email protected]>
> >>>> Date: Mon, Mar 21, 2011 at 1:28 PM
> >>>> To: Jörn Kottmann <[email protected]>
> >>>>
> >>>>
> >>>> Hi Jorn,
> >>>>
> >>>> I've changed over to doing nearly all my coding in Scala, generally transitioning Java codebases to Scala by writing everything new in Scala and using the existing Java classes as they are. I would like to do this as part of the new opennlp.ml, as I'm not inclined to write any new Java code unless absolutely necessary, and I would very much like to create that new and improved package. What do you think of this?
> >>>>
> >>>> Jason
> >>>>
> >>>> --
> >>>> Jason Baldridge
> >>>> Assistant Professor, Department of Linguistics
> >>>> The University of Texas at Austin
> >>>> http://www.jasonbaldridge.com
> >>>>
> >>>> ----------
> >>>> From: *Jörn Kottmann* <[email protected]>
> >>>> Date: Mon, Mar 21, 2011 at 2:24 PM
> >>>> To: Jason Baldridge <[email protected]>
> >>>>
> >>>>
> >>>> Hmm, yeah, if we were to rewrite it, I think it is something we could consider, but in our case we just need to do some reshaping of the existing code and a little refactoring here and there. That is one reason I believe we should be conservative and not use it in this case.
> >>>>
> >>>> Another issue I see is that it will send a message to the Mahout people that we do not want to collaborate, which in fact I believe is something we should do to get MapReduce training support one day.
> >>>> The people on the team might not be familiar with Scala, which could further limit the manpower available for the refactoring. Just my 2 cents.
> >>>>
> >>>> I believe we should also do the maxent refactoring slowly: first do everything inside the current structures, and then, when everything is in place, make the last changes, which break backward compatibility.
> >>>>
> >>>> Anyway, we should start a discussion about the future of OpenNLP. Which features do we want to implement for the next few versions? Which new components would be nice to have?
> >>>> I believe there are quite a few people who are willing to pick up tasks but are simply not aware of the possibility.
> >>>>
> >>>> Jörn
> >>>>
> >>>> ----------
> >>>> From: *Jason Baldridge* <[email protected]>
> >>>> Date: Mon, Mar 21, 2011 at 3:29 PM
> >>>> To: Jörn Kottmann <[email protected]>
> >>>>
> >>>>
> >>>> Hmm... what if we did the first refactoring into opennlp.ml with pure Java but the new package structure, then made a first release, and then started bringing in Scala?
> >>>>
> >>>>
> >>>> Good points. However, I'm finding that Scala plays *very* nicely with Java (including allowing Java to use Scala classes), so that could be mostly transparent to users of the package, maintaining the API pretty much as it is. So, I *think* we could continue to play nicely with the Mahout folks.
> >>>>
> >>>> Also, after coding for a while in Scala, I can't help but feel that Java the language is dead, while the JVM lives gloriously on. :) I think there is a lot of momentum behind Scala in general, and my feeling is that it is very friendly for Java programmers. (Though I had experience in functional programming before, so a lot of concepts came easily to me that could be more unusual for others.)
> >>>>
> >>>>
> >>>> What do you mean by "current structures"? Do you mean to keep the classes as they are now, but just switch the package organization first?
> >>>>
> >>>>
> >>>> Yes, perhaps we should do that once the release is all done? (Thanks for all your hard work on that, btw!)
> >>>>
> >>>> Also, perhaps we should bring up the Scala question on the mailing list? I wanted to ask you first to see if you had strong objections, but since you don't, it might be good to sound out the community.
> >>>>
> >>>> Jason
> >>>>
> >>>>
> >>>> ----------
> >>>> From: *Jörn Kottmann* <[email protected]>
> >>>> Date: Mon, Mar 21, 2011 at 3:38 PM
> >>>> To: Jason Baldridge <[email protected]>
> >>>>
> >>>>
> >>>> I actually think just doing it for maxent/ml doesn't really make sense; if we want to switch the programming language, it's for the entire code base. Then we are talking about migrating something like 400 classes from Java to Scala. Does that really make sense? Just doing a little Scala doesn't sound reasonable to me.
> >>>>
> >>>> Sure, move it to the mailing list.
> >>>>
> >>>> Jörn
> >>>>
> >>>> ----------
> >>>> From: *Jason Baldridge* <[email protected]>
> >>>> Date: Mon, Mar 21, 2011 at 5:44 PM
> >>>> To: Jörn Kottmann <[email protected]>
> >>>>
> >>>>
> >>>> But the great thing about Scala is that you can mix Scala and Java and not have to do one or the other, so I don't think we'd need to do a full migration. Anyway, I'll bring it up on the list!
> >>>>
> >>>> ----------
> >>>> From: *Jörn Kottmann* <[email protected]>
> >>>> Date: Mon, Mar 21, 2011 at 5:54 PM
> >>>> To: Jason Baldridge <[email protected]>
> >>>>
> >>>>
> >>>> Yeah, but then most of the code will still be pure Java mixed with a little Scala, and you have to deal with the extra complexity of having a little Scala, e.g. more complex build tooling, extra IDE support, more complicated compatibility issues, etc.
> >>>>
> >>>> Jörn
> >>>>
> >>>> ----------
> >>>> From: *Jason Baldridge* <[email protected]>
> >>>> Date: Mon, Mar 21, 2011 at 7:39 PM
> >>>> To: Jörn Kottmann <[email protected]>
> >>>>
> >>>>
> >>>> The build is *really* easy with SBT (which can incorporate Maven and Ivy dependency declarations). The idea would be to transition to Scala so that it would eventually be mostly Scala, if not all Scala. A standard jar is still distributed.
> >>>>
> >>>> ----------
> >>>> From: *Jörn Kottmann* <[email protected]>
> >>>> Date: Tue, Mar 22, 2011 at 4:33 AM
> >>>> To: Jason Baldridge <[email protected]>
> >>>>
> >>>>
> >>>> We are using Maven right now, and it does a lot more than just putting together a jar file, e.g.:
> >>>> - Making a release, with code signing, tagging in our SCM, producing RAT reports, etc.
> >>>> - Deploying artifacts to the Apache repository
> >>>> - Building our documentation
> >>>> - Testing
> >>>> - Optionally, running code quality tools like FindBugs or test coverage tools
> >>>>
> >>>> Jörn
> >>>>
> >>>> ----------
> >>>> From: *Jason Baldridge* <[email protected]>
> >>>> Date: Tue, Mar 22, 2011 at 9:11 AM
> >>>> To: Jörn Kottmann <[email protected]>
> >>>>
> >>>>
> >>>> These might need some looking into, but are probably doable.
> >>>>
> >>>>
> >>>> These are built-in targets for SBT.
> >>>>
> >>>> -j
> >>>>
> >>>> ----------
> >>>> From: *Jörn Kottmann* <[email protected]>
> >>>> Date: Tue, Mar 22, 2011 at 9:20 AM
> >>>> To: Jason Baldridge <[email protected]>
> >>>>
> >>>>
> >>>> Our entire build system was just rewritten to meet Apache rules and standards; if we do that again now, it will set the project back by a month or so.
> >>>>
> >>>> Jörn
> >>>>
> >>>> ----------
> >>>> From: *Jason Baldridge* <[email protected]>
> >>>> Date: Tue, Mar 22, 2011 at 9:33 AM
> >>>> To: Jörn Kottmann <[email protected]>
> >>>>
> >>>>
> >>>> Fair enough. I will still bring it up, as it now actually pains me to code in Java. ;)
> >>>>
> >>>> Oh, here is how to deploy artifacts:
> >>>>
> >>>> http://henkelmann.eu/2010/11/14/sbt_hudson_with_test_integration
> >>>>
> >>>> I think the others would be straightforward. Possibly one of the bigger sticking points would be IDE integration: I use Emacs and it all works very well for me, but I don't know how it is for Eclipse and NetBeans folks.
> >>>>
> >>>> ----------
> >>>> From: *Jörn Kottmann* <[email protected]>
> >>>> Date: Tue, Mar 22, 2011 at 9:40 AM
> >>>> To: Jason Baldridge <[email protected]>
> >>>>
> >>>>
> >>>> I didn't say it's not possible to rewrite our build with SBT, but I strongly believe that is an effort which will take quite some time, e.g. a month just to get a build which is as good as the Maven build we just finished.
> >>>> Everyone would have to install the Scala plugins into their IDEs to get proper support, which is of course also possible.
> >>>>
> >>>> Yeah, bring it up on the mailing list.
> >>>>
> >>>> Jörn
> >>>>
> >>>> ----------
> >>>> From: *Jason Baldridge* <[email protected]>
> >>>> Date: Tue, Mar 22, 2011 at 9:46 AM
> >>>> To: Jörn Kottmann <[email protected]>
> >>>>
> >>>>
> >>>> Sounds good. And I find that it is often straightforward to take Maven specifications and either use them directly from SBT or translate them into SBT definitions. Perhaps we could start this with opennlp.ml and then see how it goes before doing it in the main OpenNLP code.
-- 
Jason Baldridge
Assistant Professor, Department of Linguistics
The University of Texas at Austin
http://www.jasonbaldridge.com
http://twitter.com/jasonbaldridge
