Re: Mahout as TLP

2010-02-13 Thread Kay Kay
As a lurker in this community and an active user myself, I am expressing my view for whatever it is worth. I am happy with the decoupling of ML from Search, with the former warranting separate attention of its own. So, +1 on this happening eventually to be more independent, but my reservation

Logging format

2010-02-13 Thread Robin Anil
Where can I change the log file format for mvn? The current one is too verbose when running mvn.

Re: Mahout as TLP

2010-02-13 Thread Grant Ingersoll
All valid points by the many who have responded. Thanks! When I woke up this morning, I thought maybe we should postpone until 0.3 is out, so it is good to see this expressed here as well. As for concerns about overhead, infra@ will take care of most of the heavy lifting (new mailing lists,

Mass Code Cleanup

2010-02-13 Thread Robin Anil
I just did a mass code cleanup, mainly comprising:
- Extra blank line removal
- Organize Imports across all packages
- Making local variables final

No reordering of methods or code style changes are applied. Any objections, or any particular class to withhold from committing?

Robin

Re: Mass Code Cleanup

2010-02-13 Thread Robin Anil
And yes all tests pass. No problems there. Robin On Sat, Feb 13, 2010 at 8:25 PM, Robin Anil robin.a...@gmail.com wrote: I just did a mass code cleanup. Mainly comprising of -Extra blank line removal -Organize Imports across all packages. -Making local variables final No reordering of

Re: Mass Code Cleanup

2010-02-13 Thread Grant Ingersoll
It's fine to go with this one, but in general massive reformatting is not necessarily a good thing b/c it likely breaks a whole bunch of perfectly valid patches in JIRA, thus making more work for you later, not less. Instead, I think the way to handle this stuff is to make sure that, when

Re: Mass Code Cleanup

2010-02-13 Thread Robin Anil
Yeah, I thought so. We are just before a release, and very few issues are left with patches submitted. That's why I wanted that list of files, so that I can commit the rest. On Sat, Feb 13, 2010 at 10:21 PM, Grant Ingersoll gsing...@apache.org wrote: It's fine to go with this one, but in general

Confidence estimation in a beam decoder

2010-02-13 Thread Benson Margulies
Folks, Here's one of my occasional questions in which I am, in essence, bartering my code wrangling efforts for expertise on hard stuff. Consider a sequence problem addressed with a perceptron model with an ordinary Viterbi decoder. There's a standard confidence estimation technique borrowed

[jira] Created: (MAHOUT-290) Make SequenceFileFromDirectory input args consistent with others

2010-02-13 Thread Grant Ingersoll (JIRA)
Make SequenceFileFromDirectory input args consistent with others
Key: MAHOUT-290
URL: https://issues.apache.org/jira/browse/MAHOUT-290
Project: Mahout
Issue Type: Bug

[jira] Created: (MAHOUT-291) Mahout Code Cleanup

2010-02-13 Thread Robin Anil (JIRA)
Mahout Code Cleanup
Key: MAHOUT-291
URL: https://issues.apache.org/jira/browse/MAHOUT-291
Project: Mahout
Issue Type: Improvement
Components: Classification, Clustering, Collaborative Filtering, Frequent

Re: Logging format

2010-02-13 Thread Grant Ingersoll
Probably easiest to point you at http://www.slf4j.org/. I usually switch to Log4J and then configure it that way. On Feb 13, 2010, at 4:57 AM, Robin Anil wrote: Where can I change the log file format for mvn? The current one is too verbose when doing mvn execution

Re: Mahout as TLP

2010-02-13 Thread Ted Dunning
+1 to waiting. On Sat, Feb 13, 2010 at 4:45 AM, Grant Ingersoll gsing...@apache.org wrote: In the end, I still am +1, but think it makes sense to wait until after 0.3. Besides, since the next board meeting is Wednesday, this will give us more time to think about it. -- Ted Dunning, CTO

Re: Mass Code Cleanup

2010-02-13 Thread Sean Owen
I'm late on this but have a question about the final business. I understand the style it is promoting and even like it and used to do it. I stopped because it does get harder to read and it's not usual in Java code. Any thoughts on that? On Feb 13, 2010 2:55 PM, Robin Anil robin.a...@gmail.com

Re: Mass Code Cleanup

2010-02-13 Thread Robin Anil
Which part is not usual? On Sun, Feb 14, 2010 at 12:33 AM, Sean Owen sro...@gmail.com wrote: I'm late on this but have a question about the final business. I understand the style it is promoting and even like it and used to do it. I stopped because it does get harder to read and it's not

Re: Mass Code Cleanup

2010-02-13 Thread Sean Owen
Declaring every local variable final. On Feb 13, 2010 7:05 PM, Robin Anil robin.a...@gmail.com wrote: Which part is not usual? On Sun, Feb 14, 2010 at 12:33 AM, Sean Owen sro...@gmail.com wrote: I'm late on this but hav...

Re: Mass Code Cleanup

2010-02-13 Thread Robin Anil
Not everything is changed. Only at a few places. On Sun, Feb 14, 2010 at 12:42 AM, Sean Owen sro...@gmail.com wrote: Declaring every local variable final. On Feb 13, 2010 7:05 PM, Robin Anil robin.a...@gmail.com wrote: Which part is not usual? On Sun, Feb 14, 2010 at 12:33 AM, Sean Owen

Re: Logging format

2010-02-13 Thread Robin Anil
Each log entry is 2 lines, for starters. Just point me to the XML where the configuration is. If there isn't any, I will have to explore a bit more. On Sun, Feb 14, 2010 at 12:59 AM, Drew Farris drew.far...@gmail.com wrote: Which output are you trying to get rid of? Have you tried 'mvn -q'?

Re: Logging format

2010-02-13 Thread Grant Ingersoll
There isn't one; that is the default JUL configuration. On Feb 13, 2010, at 2:33 PM, Robin Anil wrote: Each log entry is 2 lines, for starters. Just point me to the XML where the configuration is. If there isn't any, I will have to explore a bit more. On Sun, Feb 14, 2010 at 12:59 AM,
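The two-line entries mentioned above are what java.util.logging's default SimpleFormatter produces. One way to get compact output is to install a custom formatter on the root logger's handlers. The following is only a minimal sketch: the class name OneLineFormatter and the chosen layout are hypothetical and not part of Mahout or SLF4J.

```java
import java.util.logging.Formatter;
import java.util.logging.Handler;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

// Hypothetical helper, not part of Mahout: writes each JUL record on one line.
class OneLineFormatter extends Formatter {
  @Override
  public String format(LogRecord record) {
    // formatMessage() resolves any {0}-style parameters carried by the record.
    return record.getLevel() + " " + record.getLoggerName() + ": "
        + formatMessage(record) + System.lineSeparator();
  }

  // Replaces the formatter on every handler attached to the root logger.
  static void install() {
    for (Handler handler : Logger.getLogger("").getHandlers()) {
      handler.setFormatter(new OneLineFormatter());
    }
  }

  public static void main(String[] args) {
    install();
    Logger.getLogger("demo").info("compact entry");
  }
}
```

On Java 7 and later, a similar effect can be had without code by setting the java.util.logging.SimpleFormatter.format property; switching the SLF4J binding to Log4J, as suggested earlier in the thread, is the other route.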

Re: Mahout as TLP

2010-02-13 Thread Drew Farris
I can't say that I really understand the issues (if there are any) of the Mahout project running under Lucene's PMC vs. a Mahout PMC, but it sounds like that would be a big factor in deciding whether the project should be migrated to its own TLP, e.g., if Mahout discussions took up a significant

CosineDistanceMeasure

2010-02-13 Thread Robin Anil
I have modified the faster function to use centroidLengthSquare. Any reason this wasn't there before?

Robin

earlier:

  @Override
  public double distance(double centroidLengthSquare, Vector centroid, Vector v) {
    return distance(centroid, v);
  }

now:

  @Override
  public double
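The message above is cut off before the new method body, so the actual change is not shown here. The idea it describes, reusing a precomputed squared centroid length instead of recomputing it, can be sketched as follows. This is an illustration only: plain double[] arrays stand in for Mahout's Vector, the class name is hypothetical, and the zero-vector handling is an assumption of this sketch.

```java
// Illustrative only: plain arrays stand in for Mahout's Vector type.
class CosineDistanceSketch {
  static double dot(double[] a, double[] b) {
    double sum = 0.0;
    for (int i = 0; i < a.length; i++) {
      sum += a[i] * b[i];
    }
    return sum;
  }

  // Baseline: recomputes the centroid's squared length on every call.
  static double distance(double[] centroid, double[] v) {
    return distance(dot(centroid, centroid), centroid, v);
  }

  // Faster variant: the caller supplies the centroid's squared length,
  // so it can be computed once per centroid and reused for many vectors.
  static double distance(double centroidLengthSquare, double[] centroid, double[] v) {
    double denominator = Math.sqrt(centroidLengthSquare) * Math.sqrt(dot(v, v));
    if (denominator == 0.0) {
      return 1.0; // assumption of this sketch: a zero vector is maximally distant
    }
    return 1.0 - dot(centroid, v) / denominator;
  }
}
```

In a clustering loop the saving comes from computing dot(centroid, centroid) once per centroid per iteration rather than once per point.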

Re: Mahout as TLP

2010-02-13 Thread Benson Margulies
The ongoing admin is really no big deal. The PMC has to report to the board once a month. As Grant noted, the initial work is mostly a gift from infra. I don't see any harm in getting 0.3 out first if that makes folks more comfortable. On Sat, Feb 13, 2010 at 2:42 PM, Drew Farris

Re: Mass Code Cleanup

2010-02-13 Thread Robin Anil
Only done for local fields. And handpicked ones in bayes for local variables due to openXXYYHashMap.foreach() requiring final objects On Sun, Feb 14, 2010 at 1:47 AM, Ted Dunning ted.dunn...@gmail.com wrote: I find final slightly helpful on fields, very helpful on static fields, but not very

Re: Mahout as TLP

2010-02-13 Thread Grant Ingersoll
On Feb 13, 2010, at 3:20 PM, Benson Margulies wrote: The ongoing admin is really no big deal. The PMC has to report to the board once a month. Once a quarter normally. As Grant noted, the initial work is mostly a gift from infra. I don't see any harm in getting 0.3 out first if that

Re: Mass Code Cleanup

2010-02-13 Thread Benson Margulies
I'm not very fond of a plague of finals. Here's why. Consider final int[] x = new int[10]; That doesn't, sadly, prevent x[2] = 1; So, to me, final is too weak to be useful. I put them in code when required due to the rules about anonymous functions capturing locals, but never otherwise.
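The point about final and arrays can be checked with a few lines of Java. The sketch below is self-contained; the class and method names are hypothetical. It also shows, for contrast, an unmodifiable list view, which does reject writes at run time.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class FinalDemo {
  // final fixes the reference held by x, not the contents of the array.
  static int mutateFinalArray() {
    final int[] x = new int[10];
    x[2] = 1;           // compiles and runs: the array itself is still mutable
    // x = new int[5];  // would not compile: x is final and cannot be reassigned
    return x[2];
  }

  // An unmodifiable view blocks writes, independently of the final keyword.
  static boolean unmodifiableListRejectsWrites() {
    final List<Integer> values = Collections.unmodifiableList(Arrays.asList(1, 2, 3));
    try {
      values.set(0, 99);
      return false;
    } catch (UnsupportedOperationException e) {
      return true;
    }
  }
}
```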

Re: Mass Code Cleanup

2010-02-13 Thread Robin Anil
final doesn't necessarily mean mutable, right? On Sun, Feb 14, 2010 at 2:35 AM, Benson Margulies bimargul...@gmail.com wrote: I'm not very fond of a plague of finals. Here's why. Consider final int[] x = new int[10]; That doesn't, sadly, prevent x[2] = 1; So, to me, final is too weak

Re: Mass Code Cleanup

2010-02-13 Thread Benson Margulies
You meant immutable? I wouldn't disagree. I appreciate what it does, I just don't much like it :-) On Sat, Feb 13, 2010 at 4:07 PM, Robin Anil robin.a...@gmail.com wrote: final doesn't necessarily mean mutable, right? On Sun, Feb 14, 2010 at 2:35 AM, Benson Margulies

Re: Mass Code Cleanup

2010-02-13 Thread Ted Dunning
I find it mildly expressive to be able to say: public static final String KEYWORD_CONSTANT = "foobledy foo"; My reader will know that this is a constant that is safe to use as such. Other than this and the syntactic captured variable case, I don't use final. On Sat, Feb 13, 2010 at 1:05 PM,

Re: Mass Code Cleanup

2010-02-13 Thread Robin Anil
On the topic of code cleanup: the current OpenXXYYHashMaps have to throw a runtime exception on an IOException in Hadoop. This will make that statement clear:

  void map(Text key, Text value, final OutputCollector output){
    forEachPair(function(){
      @Override
      bool apply(key, value){
        try {
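The snippet above is truncated, but the situation it describes can be reproduced in self-contained form. In the sketch below, PairProcedure and Collector are hypothetical stand-ins for the collections callback and for Hadoop's OutputCollector. Unwrapping the IOException again in the caller is one possible choice and is not something the thread prescribes.

```java
import java.io.IOException;
import java.util.Map;

class CallbackExceptionSketch {
  // Hypothetical stand-in for a collections callback with no throws clause.
  interface PairProcedure {
    boolean apply(String key, String value);
  }

  // Hypothetical stand-in for a Hadoop-style collector that throws IOException.
  interface Collector {
    void collect(String key, String value) throws IOException;
  }

  static void forEachPair(Map<String, String> map, PairProcedure procedure) {
    for (Map.Entry<String, String> entry : map.entrySet()) {
      if (!procedure.apply(entry.getKey(), entry.getValue())) {
        return; // the callback asked to stop iterating
      }
    }
  }

  static void emitAll(Map<String, String> map, final Collector output) throws IOException {
    try {
      forEachPair(map, new PairProcedure() {
        @Override
        public boolean apply(String key, String value) {
          try {
            output.collect(key, value);
          } catch (IOException e) {
            // apply() cannot declare IOException, so it is wrapped here.
            throw new RuntimeException(e);
          }
          return true;
        }
      });
    } catch (RuntimeException e) {
      if (e.getCause() instanceof IOException) {
        throw (IOException) e.getCause(); // restore the checked exception
      }
      throw e;
    }
  }
}
```

Note that output must be final to be captured by the anonymous class, which is the "captured variable" case for final mentioned elsewhere in the thread.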

Re: Mass Code Cleanup

2010-02-13 Thread Benson Margulies
Ted: I agree. I forgot about that case. Robin: There is a strange general habit in Java of this situation. Consider Thread. I'm perfectly happy to add the throws clause, but one might wonder why the 'classic' examples all have the problem you cite. On Sat, Feb 13, 2010 at 4:20 PM, Robin Anil

Re: Mass Code Cleanup

2010-02-13 Thread Robin Anil
In our particular case, Hadoop MapReduce creates the issue. On Sun, Feb 14, 2010 at 3:25 AM, Benson Margulies bimargul...@gmail.com wrote: Ted: I agree. I forgot about that case. Robin: There is a strange general habit in Java of this situation. Consider Thread. I'm perfectly happy to add the

Re: Mass Code Cleanup

2010-02-13 Thread Jake Mannix
I've got to say, in the future, I really agree with Grant - we can't go and do mass code changes, we've always got little experimental patches and checkouts going on, and it's hard enough making sure to keep all of those svn up'ed as needed, let alone worry about sudden piles of incoming conflicts

Re: Mass Code Cleanup

2010-02-13 Thread Robin Anil
Sorry about that, Jake. At one point we had to do this. My feeling was that this was a better time (with most of the issues closed and all) to do this, and it would have proved impossible later. Robin PS: I volunteer to help you in getting all the conflicts resolved and checking it in (Offer goes only

Re: Mass Code Cleanup

2010-02-13 Thread Jake Mannix
On Sat, Feb 13, 2010 at 3:18 PM, Robin Anil robin.a...@gmail.com wrote: Sorry about that, Jake. At one point we had to do this. My feeling was that this was a better time (with most of the issues closed and all) to do this, and it would have proved impossible later. I guess... but it may come a

Re: Mass Code Cleanup

2010-02-13 Thread Benson Margulies
I'm a little bit confused by the process. I thought that a hallmark of the Lucene TLP, and the Mahout project inside of it, was that we do not use lazy consensus. Everything is offered for review in advance. If at all possible, it is posted as a patch. If something can't be posted as a patch on

Re: Mass Code Cleanup

2010-02-13 Thread Ted Dunning
In my own opinion, it is a very rare case that I will let the idiom throws Exception appear in any code that I write. It is almost always either a bug or a bad design. Letting apply throw Exception almost requires that forEachPair throws the same excessively general exception. That means that

Re: Mass Code Cleanup

2010-02-13 Thread Jake Mannix
On Sat, Feb 13, 2010 at 3:37 PM, Benson Margulies bimargul...@gmail.com wrote: I'm a little bit confused by the process. I think you're a little confused because many of us here in Mahout are not also longtime committers on other ASF projects, let alone ones in the Lucene ecosystem. And so

Re: Mass Code Cleanup

2010-02-13 Thread Ted Dunning
Whether or not we adhere to patch and review process, I would like it if we did. On Sat, Feb 13, 2010 at 3:37 PM, Benson Margulies bimargul...@gmail.com wrote: I thought that a hallmark of the Lucene TLP, and the Mahout project inside of it, was that we do not use lazy consensus. Everything is

New thread on 'throws Exception' for 'apply' methods

2010-02-13 Thread Benson Margulies
I'm getting double vision from the multiple discussions on the 'mass checkin' thread, so here's a new one. So, we have a function F that applies another function to some number of things. What should it do about exceptions? Answer 1: (currently in place) No throws clause at all on the signature.
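The rest of this message is cut off. Going by the replies, the alternative under discussion is to declare throws Exception on apply. The sketch below only illustrates the consequence raised in the earlier thread: once apply declares the broad clause, the applying function and its callers must declare it too (or catch Exception wholesale). All names are hypothetical.

```java
class ThrowsExceptionSketch {
  // Hypothetical callback type whose apply() declares the broad clause.
  interface Procedure {
    boolean apply(int value) throws Exception;
  }

  // Because apply() may throw anything, forEach has to declare the same clause.
  static void forEach(int[] values, Procedure procedure) throws Exception {
    for (int value : values) {
      if (!procedure.apply(value)) {
        return;
      }
    }
  }

  // The clause keeps spreading: this caller never throws in practice,
  // yet it must still declare (or catch) Exception.
  static int sum(int[] values) throws Exception {
    final int[] total = {0};
    forEach(values, v -> {
      total[0] += v;
      return true;
    });
    return total[0];
  }
}
```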

Re: New thread on 'throws Exception' for 'apply' methods

2010-02-13 Thread Ted Dunning
Assuming you meant to say that I didn't want to change from 1 to 2, I agree. (for clarity, I don't like throws Exception even in higher order functions). :-) On Sat, Feb 13, 2010 at 6:42 PM, Benson Margulies bimargul...@gmail.com wrote: I read Ted's views as a resounding -1 for switching from

Re: Confidence estimation in a beam decoder

2010-02-13 Thread Ted Dunning
Benson, Are you using techniques related to this: http://www.it.usyd.edu.au/~james/pubs/pdf/dlp07perc.pdf ? On Sat, Feb 13, 2010 at 9:38 AM, Benson Margulies bimargul...@gmail.com wrote: Folks, Here's one of my occasional questions in which I am, in essence, bartering my code wrangling