Call for Presentations FOSS Backstage open

2018-01-11 Thread Isabel Drost-Fromm
Hi,

As announced at Berlin Buzzwords, we (that is Isabel Drost-Fromm, Stefan
Rudnitzki and the events team at newthinking communications GmbH)
are working on a new conference taking place in Berlin this summer. The new
conference will be called "FOSS Backstage". Backstage covers all things
FOSS governance, open collaboration, and building and managing communities
within the open source space.


Submission URL: https://foss-backstage.de/call-papers 

The event will comprise presentations on all things FOSS governance,
decentralised decision making and open collaboration. We invite you to submit talks
on the following topics: FOSS project governance, collaboration and community management;
asynchronous/decentralised decision making; vendor neutrality in FOSS;
sustainable FOSS; cross-team collaboration; dealing with poisonous people;
project growth and hand-over; trademarks; strategic licensing. While the event is
primarily targeted at contributions from FOSS people, we would also love to
learn more about how typical FOSS collaboration models work well within
enterprises. Closely related topics not explicitly listed above are welcome.

Important Dates (all dates in GMT +2)

Submission deadline: February 18th, 2018.

Conference: June 13th/14th, 2018


High-quality talks are called for, ranging from principles to practice. We are
looking for real-world case studies, background on the social architecture of
specific projects and deep dives into cross-community collaboration.
Acceptance notifications will be sent out soon after the submission deadline.
Please include your name, bio and email, the title of the talk and a brief abstract
in English.

We have drafted the submission form to allow for regular talks, each 45 min in
length. However, you are free to submit your own ideas on how to support the
event: if you would like to take our attendees out to show them your favourite
bar in Berlin, please submit this offer through the CfP form. If you are
interested in sponsoring the event (e.g. with your support we would be happy to
provide videos after the event, free drinks for attendees and an after-show party),
please contact us.

The schedule and further updates on the event will be published soon on the event
web page.

Please re-distribute this CfP to people who might be interested.

 Contact us at:
 newthinking communications GmbH
 Schoenhauser Allee 6/7
 10119 Berlin, Germany
 i...@foss-backstage.de


Looking forward to meeting you all in person in summer :) I would love to see
all those tracks filled with lots of valuable talks on the Apache Way, on how we work,
on how the Incubator works, on how being a 501(c)(3) influences how people get
involved and how projects are run, on how being a member-run organisation is different,
on merit for life, on growing communities, on things that went great - and things
that went entirely wrong in the ASF's history, on how to interact with Apache
projects as a corporation and everything else you can think of.


Isabel


-- 
Sorry for any typos: Mail was typed in vim, written in mutt, via ssh (most 
likely involving some kind of mobile connection only.)




Re: [GSOC] 2010 Timelines

2010-04-09 Thread Isabel Drost

Timeline including Apache internal deadlines:

http://cwiki.apache.org/confluence/display/COMDEVxSITE/GSoC

Mentors, please also follow the link to the ranking explanation [1]
for more information on how to rank student proposals.

Isabel

[1] http://cwiki.apache.org/confluence/display/COMDEVxSITE/Mentee+Ranking+Process




Re: Javadocs?

2010-03-30 Thread Isabel Drost
On Tue Grant Ingersoll gsing...@apache.org wrote:
 If we want, we can keep move aside the old ones and update the
 website to refer to each version.

I think that would be great. Now that we are slowly getting to a point
where the APIs seem to stabilise at least a bit, it would be great for users
who don't upgrade to still be able to find the docs for their
favourite version online.

Isabel



Re: Javadocs?

2010-03-30 Thread Isabel Drost
On Tue Jake Mannix jake.man...@gmail.com wrote:
 (ie can't we also have daily updates of the 0.4-SNAPSHOT javadocs
 automagically posted up there too?)

Yes - Maven can do such a thing. I have configured a job on Hudson to
generate code reports for Mahout with Maven - Javadocs are one part of
these reports.

Linked to from the code quality reports page:
http://lucene.apache.org/mahout/quality.html
Click on core, then project reports, then java doc to get to the following example:
http://hudson.zones.apache.org/hudson/userContent/lucene-mahout/core-reports/apidocs/index.html

Unfortunately I currently do not have the spare cycles to look into
why the job broke recently. Anyone with more time than myself and a
tiny little bit of Maven knowledge would be more than welcome to help
out.

The link to the hudson job:
http://hudson.zones.apache.org/hudson/job/MahoutQM/


I would argue against publishing SNAPSHOT Javadocs too close to the
Javadocs of official releases - it might mislead users into thinking that
the SNAPSHOT is an official release as well... Just my two cents - the
community may feel otherwise.

Isabel



Re: not a lot of mentors for GSoC

2010-03-30 Thread Isabel Drost
On Mon Grant Ingersoll gsing...@apache.org wrote:

 Mentoring sign up is on the GSOC site.  You need to be a committer to
 be a mentor, at least for the ASF anyway.

Please also identify yourself with your GsocLinkId at

https://svn.apache.org/repos/private/committers/GsocLinkId.txt

so Noirin knows who you are.

Isabel


Re: Javadocs?

2010-03-30 Thread Isabel Drost
On Tue Grant Ingersoll gsing...@apache.org wrote:
 We're probably to the point now that we could start doing a nightly
 on Hudson if we aren't already.

http://hudson.zones.apache.org/hudson/job/Mahout%20nightly/

;) (At least this one tracks whether the project still builds and all
unit tests pass.)

The job for building reports, Javadocs et al. was configured to run
less often - which I think is fine, since it can be triggered manually
in case of major changes anyway.

Isabel


Fw: Mentors for GSoC

2010-03-22 Thread Isabel Drost

Potential GSoC mentors: if you want to mentor a student for Mahout, please
tell Noirin who you are. More details below. If you have not done
so already, please also subscribe to code-awa...@apache.org for more
information on GSoC at Apache.


Begin forwarded message:

Date: Mon, 22 Mar 2010 15:48:17 +0100
From: Noirin Shirley noi...@apache.org
To: code-awa...@apache.org
Subject: Mentors for GSoC


Thanks to all those who've already signed up to be mentors at
http://socghop.appspot.com/ !

Unfortunately, the ASF is a big Foundation, and I don't know who all
those who've signed up are. All I see is whatever's set as your LinkID
and Public Name in your profile on the webapp.

I can work out who Grant Ingersoll(gsingers) is, and I can even give
a reasonable guess as to who isabel(isabel) might be, but relying on
me to know the names of all the people who might mentor, and to be
able to tell who's a student who's clicked the wrong button, isn't
really going to scale!

So please, it would make my job much easier if you could drop a mail
to this list with your LinkID when you sign up to be a mentor :-)

Thanks a million!

Noirin


Re: Look! No more ISSUES

2010-02-23 Thread Isabel Drost
On Tue Sean Owen sro...@gmail.com wrote:
 I'm happy to play release engineer.

Great - Thanks, Sean.

Isabel


Re: 0.3 release issues

2010-02-23 Thread Isabel Drost
On Tue Sean Owen sro...@gmail.com wrote:
 Er, how do we do that? Is it something you can describe, I can
 document and do?

It has already been described - and documented in our wiki:

http://cwiki.apache.org/MAHOUT/thirdpartydependencies.html

Hope that helps,
Isabel



Re: 0.3 release issues

2010-02-23 Thread Isabel Drost
On Tue Grant Ingersoll gsing...@apache.org wrote:
 On Feb 23, 2010, at 9:18 AM, Sean Owen wrote:
 
  It does look imminent. As much as I don't like holding out longer,
  and indefinitely, for this release, somehow I'd also really like to
  link to the latest/greatest and official Hadoop release.
  
  Let's try to be good about sticking to the code freeze -- good
  chance to focus on polish -- and if 0.20.2 isn't out by end of
  week, revisit this.
 
 +1.   We might as well upgrade to the RC, too, by adding it as a
 dependency.

+1 (to both proposals)

Isabel


Re: Welcome Drew Farris

2010-02-20 Thread Isabel Drost
On 18.02.2010 Drew Farris wrote: 
 I'm looking forward to working with you all,

Welcome to the Mahout community, Drew. Looking forward to working with you.

Isabel




Re: Mass Code Cleanup

2010-02-19 Thread Isabel Drost
On 14.02.2010 Grant Ingersoll wrote:
 I don't object to good style.  I object to sweeping changes that break a
  lot of patches.  Maybe not the case here, but it will be in the future and
  unless the whole thing is automated as part of committing (as Hadoop
  does), the code will always have formatting issues causing this exact same
  thing to happen.

I kind of like the automatic patch checks that are activated for patches in JIRA
over at the Hadoop projects. They highlight trivial problems with submitted code
without the need for a code review by a committer.

Does anyone here at Mahout know what is needed for such checks?

Isabel




Re: Mass Code Cleanup

2010-02-19 Thread Isabel Drost
On 15.02.2010 Robin Anil wrote:
 SGD kmeans++ pegasus seems fine. Isabel can you check with the latest trunk
 if the perceptron is alright?

Any code I had is already checked in. Any examples I am working on should be
easy to adapt.

Isabel




Re: Mahout as TLP

2010-02-15 Thread Isabel Drost
On Sat Grant Ingersoll gsing...@apache.org wrote:
  I don't see any harm in getting 0.3 out first if that makes folks
  more comfortable.
 
 Yeah, this feels better to me the more I think about it.

+1 from me as well: I really like the idea of Mahout becoming a TLP -
even before a 1.0 release is available.

However, I think it makes sense to sort out the 0.3 release first. If I
am counting correctly, that would make three occasions for press
releases: a new release, Mahout becoming a TLP and later on a 1.0
release. ;)

Isabel


Re: Mahout 0.3 Plan and other changes

2010-02-10 Thread Isabel Drost
On Thu deneche abdelhakim adene...@gmail.com wrote:
 although I maintain two versions of Decision Forests, one with the old
 api and with the new one, the differences between the two APIs are so
 important that I can't just keep working on the two versions. Thus all
 the new stuff is being committed using the new API and as far as I can
 say it seems to work great.

If I understand you correctly, there is code in Mahout that still works
with the old API but also bits and pieces that depend on the new API.

Do we have some documentation we can include in the release that tells
users for which algorithms/implementations they need to make sure they
are running a Hadoop version that provides the new API?

Isabel



Re: Mahout 0.3 Plan and other changes

2010-02-10 Thread Isabel Drost
On Wed, 10 Feb 2010 11:10:41 +
Sean sro...@gmail.com wrote:

 For simplicity, I'd document that Mahout works on 0.19 and 0.20, and
 may work on 0.18

+1

Assuming that the majority of the algorithms work on e.g. 0.19, we
could tell users something along the lines of "works with Hadoop 0.19,
except $algorithms_for_20; may work with 0.18, no guarantee given."

Isabel


Re: Some more dependencies

2010-02-10 Thread Isabel Drost
On Wed Jake Mannix jake.man...@gmail.com wrote:
  May I kick them out?
 
 
 +1

+1 from me as well.

Isabel


Re: Mahout 0.3 Plan and other changes

2010-02-10 Thread Isabel Drost
On Wed Sean Owen sro...@gmail.com wrote:

 I'd say we recommend 0.20, since that's what we develop against and
 it's the current stable release, and everything we have works on it.
 
 We can also say it should work on 0.19 and 0.18, but we don't
 guarantee or support that. (Slightly different than my last suggestion
 -- we don't actually know how it all goes on 0.19)

Sounds good to me.

Isabel


[jira] Updated: (MAHOUT-281) scm urls are wrong in the poms

2010-02-10 Thread Isabel Drost (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Isabel Drost updated MAHOUT-281:


Status: Patch Available  (was: Open)

 scm urls are wrong in the poms
 --

 Key: MAHOUT-281
 URL: https://issues.apache.org/jira/browse/MAHOUT-281
 Project: Mahout
  Issue Type: Bug
Affects Versions: 0.3
Reporter: Benson Margulies
Assignee: Benson Margulies
 Fix For: 0.3

 Attachments: MAHOUT-281.diff


 The scm urls in the poms are wrong. This must be fixed before running the 
 release plugin to make an 0.3 release.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAHOUT-281) scm urls are wrong in the poms

2010-02-10 Thread Isabel Drost (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Isabel Drost updated MAHOUT-281:


Attachment: MAHOUT-281.diff

Changed scm connection strings. (Needed a comparatively simple example to show
students at HPI how svn diff, patch and JIRA work.)

 scm urls are wrong in the poms
 --

 Key: MAHOUT-281
 URL: https://issues.apache.org/jira/browse/MAHOUT-281
 Project: Mahout
  Issue Type: Bug
Affects Versions: 0.3
Reporter: Benson Margulies
Assignee: Benson Margulies
 Fix For: 0.3

 Attachments: MAHOUT-281.diff


 The scm urls in the poms are wrong. This must be fixed before running the 
 release plugin to make an 0.3 release.




Re: GSOC 2010 is here

2010-02-02 Thread Isabel Drost
On Mon Robin Anil robin.a...@gmail.com wrote:
 2. UIMA Integration with Mahout? (Maybe a good project if UIMA folks
 are taking in GSOC students)

I guess one could easily split this one in two:

a) Using UIMA (whole pipeline or just the analysers if that is possible)
for data pre-processing before Mahout algorithms are run.

b) Making it easy to integrate Mahout algorithms (classification models
etc.) as UIMA annotators.

Isabel


Re: Release thinking

2010-02-01 Thread Isabel Drost
On Mon Grant Ingersoll gsing...@apache.org wrote:
  MAHOUT-231  Upgrade QM reports to use Clover 2.6
  
  
  No idea on this one.
 
 That should be independent of a release, I would think.

It is. What would be needed is adjusting our pom and the Hudson job
that builds the reports.

Isabel


Re: Release thinking

2010-02-01 Thread Isabel Drost
On Mon Jake Mannix jake.man...@gmail.com wrote:
 On Mon, Jan 25, 2010 at 10:55 AM, Sean Owen sro...@gmail.com wrote:
 
  Agree that we should start planning 0.3, as it will take over a
  month I bet to actually be ready.
 
 
 +1 to releasing within a month or so.

+1 here as well. I think it would be great to reach a shorter
release cycle for Mahout.

Isabel


Re: GSOC 2010 is here

2010-02-01 Thread Isabel Drost
On Wed Robin Anil robin.a...@gmail.com wrote:
 Greetings! Fellow GSOC alums, administrators and dear mentors, the
 next edition is right here. Details are given in the link below.
 
 https://groups.google.com/group/google-summer-of-code-discuss/browse_thread/thread/d839c0b02ac15b3f

Some additional notes to committers: 

First of all, mentoring a GSoC student is a great experience, so if
you do have some cycles left, I would highly recommend participating in
GSoC as a mentor (thanks, Grant, for convincing me last year...).

We had several successful students here at Mahout in past GSoC years.
Each year there were strong proposals for projects within Mahout. As a
result, projects usually turn out to be interesting for both mentor
and student.

One final note: if there is anyone on this list who might be interested
in helping with general ASF GSoC logistics and administration tasks,
please have a look at the newly founded community development project
(d...@community.apache.org).

 
 Maybe we could identify key areas in Mahout which we need to develop
 apart from the ML implementations and list it down for students to
 see before they start trickling in.

And motivate students to come up with their own ideas and discuss them
on-list before submitting their proposals.


 Some ideas:
 Benchmarking Framework with EC2 wrappers

+1 I would love to see that.


 Commandline Console+Launcher like Hbase and hadoop

+1


 Online Tool/Query UI for Algorithms in Mahout(like CF)
 
 
 Possible ideas(I have no idea what i am talking here but there are
 nice problems to solve)
 Improvements in Math?
 How to tackle management of datasets?
 Error Recovery if a job fails?

How to tackle management of learned classification models?

Better tooling for Mahout integration? (Lucene for tokenization and
analysers? Data import and export?)



Isabel


Re: [jira] Commented: (MAHOUT-238) Further Dependency Cleanup

2010-01-25 Thread Isabel Drost
On Mon Grant Ingersoll gsing...@apache.org wrote:

 We put it up there.

http://www.lucidimagination.com/search/document/621471200d2182bb/dependencies_outside_maven_central_was_oh_joy#621471200d2182bb

is the link to the posting by Jukka explaining exactly how it was done.

Isabel


[jira] Commented: (MAHOUT-262) Writable for labeled vectors for supervised learning algorithms

2010-01-22 Thread Isabel Drost (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12803690#action_12803690
 ] 

Isabel Drost commented on MAHOUT-262:
-

It should be possible to apply the patch with -p1 instead of -p0 to strip the
a/ and b/ path prefixes.
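For readers unfamiliar with the flag: GNU patch's -pN strips the smallest prefix containing N slashes from each file name in the diff, so a git-style header path such as a/core/pom.xml resolves to core/pom.xml with -p1, while -p0 looks for a literal a/ directory. The Java sketch below only mimics that rule for illustration; PatchPathStrip and stripComponents are hypothetical names, and it ignores patch's treatment of repeated adjacent slashes.

```java
/** Hypothetical illustration of how patch -pN rewrites file names. */
final class PatchPathStrip {
    private PatchPathStrip() {}

    /**
     * Drops the smallest prefix that contains n slashes; n == 0 keeps the path.
     * Returns null when the path has fewer than n slashes (nothing sensible to strip).
     */
    static String stripComponents(String path, int n) {
        if (n < 0) {
            throw new IllegalArgumentException("n must be >= 0");
        }
        int start = 0;
        for (int i = 0; i < n; i++) {
            int slash = path.indexOf('/', start);
            if (slash < 0) {
                return null;
            }
            start = slash + 1; // continue after the slash just consumed
        }
        return path.substring(start);
    }
}
```

For example, stripComponents("a/core/pom.xml", 1) yields "core/pom.xml", which is why -p1 makes the a/ and b/ prefixes disappear.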

 Writable for labeled vectors for supervised learning algorithms
 ---

 Key: MAHOUT-262
 URL: https://issues.apache.org/jira/browse/MAHOUT-262
 Project: Mahout
  Issue Type: New Feature
  Components: Classification
Affects Versions: 0.2
Reporter: Olivier Grisel
 Fix For: 0.3

 Attachments: MAHOUT-262-1.patch


 Implement two new classes:
  - SingleLabelVectorWritable for singly classified vectorized data item (one 
 and only one label index per instance)
  - MultiLabelVectorWritable for multi categorized vectorized data item (0 or 
 more category indexes per instance)
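A minimal sketch of the serialisation such a class needs, assuming a dense double[] payload instead of Mahout's Vector types. Hadoop's Writable contract consists of write(DataOutput) and readFields(DataInput); to keep the example self-contained it only mirrors those two method shapes rather than implementing the real interface. MultiLabelVectorRecord is a hypothetical name; the single-label case is simply a record with exactly one label.

```java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

/** Hypothetical multi-label record; mirrors the shape of Hadoop's Writable. */
final class MultiLabelVectorRecord {
    private int[] labels = new int[0];
    private double[] values = new double[0];

    /** No-arg constructor so a framework can instantiate, then call readFields(). */
    MultiLabelVectorRecord() {
    }

    MultiLabelVectorRecord(int[] labels, double[] values) {
        this.labels = labels.clone();
        this.values = values.clone();
    }

    int[] getLabels() {
        return labels.clone();
    }

    double[] getValues() {
        return values.clone();
    }

    /** Length-prefixed encoding: label count, labels, value count, values. */
    void write(DataOutput out) throws IOException {
        out.writeInt(labels.length);
        for (int label : labels) {
            out.writeInt(label);
        }
        out.writeInt(values.length);
        for (double v : values) {
            out.writeDouble(v);
        }
    }

    /** Reads back exactly what write() produced, replacing the current state. */
    void readFields(DataInput in) throws IOException {
        int numLabels = in.readInt();
        labels = new int[numLabels];
        for (int i = 0; i < numLabels; i++) {
            labels[i] = in.readInt();
        }
        int numValues = in.readInt();
        values = new double[numValues];
        for (int i = 0; i < numValues; i++) {
            values[i] = in.readDouble();
        }
    }
}
```

Because readFields() overwrites all state, one instance can be reused across records, which is the usual reason Writable types provide a no-arg constructor.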




[jira] Commented: (MAHOUT-217) Tidy up generated data after unit tests are run

2010-01-21 Thread Isabel Drost (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12803276#action_12803276
 ] 

Isabel Drost commented on MAHOUT-217:
-


The test files I found that create, but do not delete, data in the tmp directory:

./utils/src/test/java/org/apache/mahout/utils/vectors/io/VectorWriterTest.java
./utils/src/test/java/org/apache/mahout/utils/vectors/SequenceFileVectorIterableTest.java
./core/src/test/java/org/apache/mahout/classifier/bayes/BayesFileFormatterTest.java
./core/src/test/java/org/apache/mahout/cf/taste/impl/model/file/FileDataModelTest.java



 Tidy up generated data after unit tests are run
 ---

 Key: MAHOUT-217
 URL: https://issues.apache.org/jira/browse/MAHOUT-217
 Project: Mahout
  Issue Type: Improvement
Affects Versions: 0.3
Reporter: Isabel Drost
 Fix For: 0.3


 I tried to compile Mahout on people.apache.org yesterday: The build failed at 
 first, because tests could not generate test data. The reason: Some tests 
 tried to generate test data at /tmp/mahout-dir/... - but those directories 
 already existed and belonged to Sean. Why? Probably because Sean had run 
 the build earlier this year - and the tests did not remove the data they 
 generated.
 Proposed solution: Tests come with setup and teardown hooks. We should 
 remove any data a test generated once that test has finished.
 Any thoughts?
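The proposed solution can be sketched with plain java.nio.file (Java 7 or later, newer than what Mahout targeted at the time): the setup hook creates a fresh, uniquely named directory under java.io.tmpdir, so runs by different users never collide on a fixed path, and the teardown hook deletes it recursively. TempTestDir is a hypothetical helper; in a real test its two methods would be called from the test framework's setup and teardown methods.

```java
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;

/** Hypothetical helper: per-test temp data that is always cleaned up. */
final class TempTestDir {
    private Path dir;

    /** Setup hook: a unique directory owned by the user running the tests. */
    Path setUp() throws IOException {
        dir = Files.createTempDirectory("mahout-test-");
        return dir;
    }

    /** Teardown hook: delete files first, then their parent directories. */
    void tearDown() throws IOException {
        if (dir == null || !Files.exists(dir)) {
            return;
        }
        Files.walkFileTree(dir, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) throws IOException {
                Files.delete(file);
                return FileVisitResult.CONTINUE;
            }

            @Override
            public FileVisitResult postVisitDirectory(Path d, IOException exc) throws IOException {
                if (exc != null) {
                    throw exc; // propagate errors encountered while visiting the directory
                }
                Files.delete(d); // directory is empty at this point
                return FileVisitResult.CONTINUE;
            }
        });
        dir = null;
    }
}
```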




[jira] Commented: (MAHOUT-264) Make mahout-math compatible with Java 1.5 (bytecode and standard library).

2010-01-21 Thread Isabel Drost (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12803281#action_12803281
 ] 

Isabel Drost commented on MAHOUT-264:
-

The changes to the pom look good.

But why are the changes to Sorting.java and Arrays.java needed?

 Make mahout-math compatible with Java 1.5 (bytecode and standard library).
 --

 Key: MAHOUT-264
 URL: https://issues.apache.org/jira/browse/MAHOUT-264
 Project: Mahout
  Issue Type: Wish
  Components: Math
Reporter: Dawid Weiss
Assignee: Benson Margulies
Priority: Minor
 Attachments: MAHOUT-264.patch







Re: Status, IoC, Random numbers, etc.

2010-01-21 Thread Isabel Drost
On Mon Jake Mannix jake.man...@gmail.com wrote:
 I'm down with IoC, it's a great way to program to interfaces and
 abstract away your deep coupling, but open-source libraries I think
 aren't the best place for it.

+1 I agree with your assessment of DI containers: Spring is very
powerful and can simplify wiring large applications together,
especially with the right tools - despite the pain of reading XML
files.

However, I do not think we should tie Mahout users to a specific DI
framework. It should be easy to customise the wiring of Mahout
if you are already using DI, but the choice should be up to the user.
Mahout should run out of the box without one.

I am wondering whether providing convenience constructors that set up
the default wiring, alongside those that get dependencies injected, might
help our case. This is not a proposal to heavily refactor all existing
code, just an idea to keep in mind when touching or reviewing code
anyway.
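A sketch of the idea, with hypothetical class names: the injecting constructor accepts a collaborator from whatever DI container (or plain code) the user prefers, while a convenience constructor delegates to it with the default wiring, so no container is needed at all.

```java
/** Hypothetical collaborator a user might want to swap out. */
interface PointDistance {
    double distance(double[] a, double[] b);
}

/** Default implementation used when nothing is injected. */
final class EuclideanPointDistance implements PointDistance {
    @Override
    public double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }
}

/** Hypothetical component offering both constructors. */
final class NearestCentroid {
    private final PointDistance measure;

    /** Convenience constructor: default wiring, no DI framework required. */
    NearestCentroid() {
        this(new EuclideanPointDistance());
    }

    /** Injecting constructor: Spring, Guice or plain code can pass any measure. */
    NearestCentroid(PointDistance measure) {
        if (measure == null) {
            throw new IllegalArgumentException("measure must not be null");
        }
        this.measure = measure;
    }

    /** Returns the index of the centroid closest to the given point. */
    int closest(double[] point, double[][] centroids) {
        int best = -1;
        double bestDistance = Double.POSITIVE_INFINITY;
        for (int i = 0; i < centroids.length; i++) {
            double d = measure.distance(point, centroids[i]);
            if (d < bestDistance) {
                bestDistance = d;
                best = i;
            }
        }
        return best;
    }
}
```

Chaining the convenience constructor to the injecting one keeps a single place where the object is actually initialised.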


 p.s. two other open source projects I work on - bobo-browse
 for faceted search, and zoie for realtime search, both
 *optionally* couple to Spring, in the sense that they both
 have their example apps that live with their source tree
 use them, but it's just for *apps* built on top of the libraries,
 not for the wiring of anything done inside.

Sounds like a nice approach to me: using Spring (or Guice or whatever
your favourite may be) in some of the examples or demo applications
makes perfect sense.

Isabel


[jira] Commented: (MAHOUT-242) LLR Collocation Identifier

2010-01-21 Thread Isabel Drost (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12803381#action_12803381
 ] 

Isabel Drost commented on MAHOUT-242:
-

{quote}
I am not worried about them at this point.
{quote}

I am not very worried either - I probably should have indicated that basically
everything I found could be filed as trivial, minor or a style question only...

 LLR Collocation Identifier
 --

 Key: MAHOUT-242
 URL: https://issues.apache.org/jira/browse/MAHOUT-242
 Project: Mahout
  Issue Type: New Feature
Affects Versions: 0.3
Reporter: Drew Farris
Priority: Minor
 Attachments: MAHOUT-242.patch, mahout-colloc.tar.gz, 
 mahout-colloc.tar.gz


 Identifies interesting Collocations in text using ngrams scored via the 
 LogLikelihoodRatio calculation. 
 As discussed in: 
 * 
 http://www.lucidimagination.com/search/document/d051123800ab6ce7/collocations_in_mahout#26634d6364c2c0d2
 * 
 http://www.lucidimagination.com/search/document/b8d5bb0745eef6e8/n_grams_for_terms#f16fa54417697d8e
 Current form is a tar of a maven project that depends on mahout. Build as 
 usual with 'mvn clean install', can be executed using:
 {noformat}
 mvn -e exec:java -Dexec.mainClass=org.apache.mahout.colloc.CollocDriver \
   -Dexec.args="--input src/test/resources/article --colloc target/colloc --output target/output -w"
 {noformat}
 Output will be placed in target/output and can be viewed nicely using:
 {noformat}
 sort -rn -k1 target/output/part-0
 {noformat}
 Includes rudimentary unit tests. Please review and comment. Needs more work 
 to get this into patch state and integrate with Robin's document vectorizer 
 work in MAHOUT-237
 Some basic TODO/FIXME's include:
 * use mahout math's ObjectInt map implementation when available
 * make the analyzer configurable
 * better input validation + negative unit tests.
 * more flexible ways to generate units of analysis (n-1)grams.
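LLR scoring of a candidate bigram starts from a 2x2 contingency table of counts. The sketch below, with hypothetical names and working on an in-memory token list rather than MapReduce output, shows one common way to fill it: k11 counts the bigram itself, k12 bigrams with the same first word but a different second word, k21 the reverse, and k22 all remaining bigrams.

```java
import java.util.List;

/** Hypothetical helper: 2x2 contingency counts for one bigram (first, second). */
final class BigramContingency {
    private BigramContingency() {}

    /** Returns {k11, k12, k21, k22} computed over all adjacent token pairs. */
    static long[] counts(List<String> tokens, String first, String second) {
        long k11 = 0; // first followed by second
        long k12 = 0; // first followed by some other word
        long k21 = 0; // some other word followed by second
        long total = 0;
        for (int i = 0; i + 1 < tokens.size(); i++) {
            boolean a = tokens.get(i).equals(first);
            boolean b = tokens.get(i + 1).equals(second);
            if (a && b) {
                k11++;
            } else if (a) {
                k12++;
            } else if (b) {
                k21++;
            }
            total++;
        }
        long k22 = total - k11 - k12 - k21; // neither word in its position
        return new long[] {k11, k12, k21, k22};
    }
}
```

For the tokens "new york is in new york state", the bigram (new, york) yields {2, 0, 0, 4}; these four counts are what the log-likelihood ratio is computed from.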




[jira] Updated: (MAHOUT-246) upgrade to new lucene TokenStream API to cleanup deprecation warnings

2010-01-21 Thread Isabel Drost (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Isabel Drost updated MAHOUT-246:


Resolution: Fixed
  Assignee: Olivier Grisel
Status: Resolved  (was: Patch Available)

Patch applies cleanly with -p1, all tests still pass and the changes look good.
Committed in revision 901791.

 upgrade to new lucene TokenStream API to cleanup deprecation warnings
 -

 Key: MAHOUT-246
 URL: https://issues.apache.org/jira/browse/MAHOUT-246
 Project: Mahout
  Issue Type: Improvement
Affects Versions: 0.2
Reporter: Olivier Grisel
Assignee: Olivier Grisel
Priority: Minor
 Fix For: 0.3

 Attachments: MAHOUT-246-2.patch


 The attached patch use the new ts.incrementToken() / TermAttribute API 
 instead of the deprecated manual Token handling.
 It also replaces to occurrences of the deprecated new StandardAnalyzer() to 
 the more explicit new StandardAnalyzer(Version.LUCENE_CURRENT).




Re: Tapioca anyone (fisheye)

2010-01-20 Thread Isabel Drost
On Sun Benson Margulies bimargul...@gmail.com wrote:
 http://fisheye6.atlassian.com/browse/mahout

Thanks for the Fisheye integration.

Isabel


[jira] Commented: (MAHOUT-153) Implement kmeans++ for initial cluster selection in kmeans

2010-01-16 Thread Isabel Drost (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12801280#action_12801280
 ] 

Isabel Drost commented on MAHOUT-153:
-

Welcome to Mahout. Thanks for stepping up and volunteering to take over the 
work for this issue.

 Implement kmeans++ for initial cluster selection in kmeans
 --

 Key: MAHOUT-153
 URL: https://issues.apache.org/jira/browse/MAHOUT-153
 Project: Mahout
  Issue Type: New Feature
  Components: Clustering
Affects Versions: 0.2
 Environment: OS Independent
Reporter: Panagiotis Papadimitriou
 Fix For: 0.3

   Original Estimate: 336h
  Remaining Estimate: 336h

 The current implementation of k-means includes the following algorithms for 
 initial cluster selection (seed selection): 1) random selection of k points, 
 2) use of canopy clusters.
 I plan to implement k-means++. The details of the algorithm are available 
 here: http://www.stanford.edu/~darthur/kMeansPlusPlus.pdf.
 Design Outline: I will create an abstract class SeedGenerator and a subclass 
 KMeansPlusPlusSeedGenerator. The existing class RandomSeedGenerator will 
 become a subclass of SeedGenerator.
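The design outline can be sketched in plain Java, assuming in-memory double[] points and squared Euclidean distance; Mahout's real classes work on its own Vector types and on data in HDFS, so the class names below follow the outline but the signatures are invented for this sketch, and only the k-means++ subclass is shown. k-means++ picks the first seed uniformly at random and each further seed with probability proportional to D(x)^2, the squared distance from x to its nearest already chosen seed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** Hypothetical base class from the design outline. */
abstract class SeedGenerator {
    abstract List<double[]> chooseSeeds(List<double[]> points, int k, Random random);

    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }
}

/** k-means++ seeding: uniform first pick, then D(x)^2-weighted sampling. */
final class KMeansPlusPlusSeedGenerator extends SeedGenerator {
    @Override
    List<double[]> chooseSeeds(List<double[]> points, int k, Random random) {
        if (k <= 0 || k > points.size()) {
            throw new IllegalArgumentException("need 0 < k <= number of points");
        }
        List<double[]> seeds = new ArrayList<>();
        seeds.add(points.get(random.nextInt(points.size())));
        // minDist[i] = squared distance from point i to its nearest seed so far
        double[] minDist = new double[points.size()];
        for (int i = 0; i < points.size(); i++) {
            minDist[i] = squaredDistance(points.get(i), seeds.get(0));
        }
        while (seeds.size() < k) {
            double total = 0.0;
            for (double d : minDist) {
                total += d;
            }
            int next;
            if (total == 0.0) {
                // Every point coincides with a seed; fall back to a uniform pick (may repeat).
                next = random.nextInt(points.size());
            } else {
                double target = random.nextDouble() * total;
                double cumulative = 0.0;
                next = -1;
                for (int i = 0; i < minDist.length; i++) {
                    if (minDist[i] == 0.0) {
                        continue; // zero weight: can never be sampled
                    }
                    next = i; // remembers the last positive-weight index in case of rounding
                    cumulative += minDist[i];
                    if (cumulative > target) {
                        break;
                    }
                }
            }
            double[] seed = points.get(next);
            seeds.add(seed);
            for (int i = 0; i < points.size(); i++) {
                minDist[i] = Math.min(minDist[i], squaredDistance(points.get(i), seed));
            }
        }
        return seeds;
    }
}
```

Keeping minDist up to date incrementally means each new seed costs one pass over the data, which is also the shape a distributed version would need.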




Re: Welcome Benson Marguiles as Mahout Committer

2010-01-14 Thread Isabel Drost
On Wed Grant Ingersoll gsing...@apache.org wrote:
 The Lucene PMC is pleased to welcome the addition of Benson Marguiles
 as a committer on Mahout.

Welcome Benson - thanks for all the great work you have done so far on
the mahout-math stuff. Looking forward to working together with you.

Isabel


Re: Fisheye?

2010-01-14 Thread Isabel Drost
On Wed Benson Margulies bimargul...@gmail.com wrote:
 Are we set up?

If we are, then at least I am not aware of it.

Isabel


Re: [math] no-such-integer value

2010-01-14 Thread Isabel Drost
On Mon Grant Ingersoll gsing...@apache.org wrote:

 I'm sensing a theme.  I think for this stuff we should prune fairly
 aggressively, then add back in places once we have a need.

+1

Isabel


[jira] Assigned: (MAHOUT-244) Add root log-likelihood method to LogLikehood class.

2010-01-14 Thread Isabel Drost (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Isabel Drost reassigned MAHOUT-244:
---

Assignee: Drew Farris

 Add root log-likelihood method to LogLikehood class.
 

 Key: MAHOUT-244
 URL: https://issues.apache.org/jira/browse/MAHOUT-244
 Project: Mahout
  Issue Type: Improvement
  Components: Math
Affects Versions: 0.3
Reporter: Drew Farris
Assignee: Drew Farris
Priority: Minor
 Fix For: 0.3

 Attachments: MAHOUT-244.patch


 Per discussion at: 
 http://www.lucidimagination.com/search/document/6dc8709e65a7ced1/llr_scoring_question
 This patch adds a method for root log-likelihood calculation to the existing 
 LogLikelihood class + provides a unit test based on Shashi's numbers.
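For reference, a self-contained sketch of Dunning's log-likelihood ratio for a 2x2 contingency table, written in entropy form as 2 * (row entropy + column entropy - matrix entropy) with unnormalised entropies, plus the signed square root discussed in this issue. The sign is negative when k11/(k11+k12) < k21/(k21+k22), i.e. when the pair co-occurs less often than the rest of the table suggests. LlrSketch is a hypothetical name; this is not the committed patch.

```java
/** Hypothetical helper: log-likelihood ratio (G^2) and its signed root. */
final class LlrSketch {
    private LlrSketch() {}

    private static double xLogX(long x) {
        return x == 0 ? 0.0 : x * Math.log(x);
    }

    /** Unnormalised Shannon entropy of a list of counts. */
    private static double entropy(long... counts) {
        long sum = 0;
        double result = 0.0;
        for (long c : counts) {
            result += xLogX(c);
            sum += c;
        }
        return xLogX(sum) - result;
    }

    static double logLikelihoodRatio(long k11, long k12, long k21, long k22) {
        double rowEntropy = entropy(k11 + k12, k21 + k22);
        double columnEntropy = entropy(k11 + k21, k12 + k22);
        double matrixEntropy = entropy(k11, k12, k21, k22);
        // Clamp tiny negative values caused by floating-point rounding.
        return Math.max(0.0, 2.0 * (rowEntropy + columnEntropy - matrixEntropy));
    }

    /** Signed square root; assumes both row sums are non-zero. */
    static double rootLogLikelihoodRatio(long k11, long k12, long k21, long k22) {
        double root = Math.sqrt(logLikelihoodRatio(k11, k12, k21, k22));
        double rateWith = (double) k11 / (k11 + k12);
        double rateWithout = (double) k21 / (k21 + k22);
        return rateWith < rateWithout ? -root : root;
    }
}
```

An independent table such as (1, 1, 1, 1) scores 0, while the perfectly associated (10, 0, 0, 10) scores 40 ln 2; the root version makes such scores comparable on a roughly normal scale and distinguishes attraction from repulsion.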




[jira] Updated: (MAHOUT-244) Add root log-likelihood method to LogLikehood class.

2010-01-14 Thread Isabel Drost (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Isabel Drost updated MAHOUT-244:


Resolution: Fixed
Status: Resolved  (was: Patch Available)

Patch applies cleanly and looks good, project builds with it, unit test is 
included. Committed at revision 899157.

 Add root log-likelihood method to LogLikehood class.
 

 Key: MAHOUT-244
 URL: https://issues.apache.org/jira/browse/MAHOUT-244
 Project: Mahout
  Issue Type: Improvement
  Components: Math
Affects Versions: 0.3
Reporter: Drew Farris
Assignee: Drew Farris
Priority: Minor
 Fix For: 0.3

 Attachments: MAHOUT-244.patch


 Per discussion at: 
 http://www.lucidimagination.com/search/document/6dc8709e65a7ced1/llr_scoring_question
 This patch adds a method for root log-likelihood calculation to the existing 
 LogLikelihood class + provides a unit test based on Shashi's numbers.




Re: New MEAP: Mahout in Action

2010-01-14 Thread Isabel Drost
On 15.01.2010 Grant Ingersoll wrote:
 (BTW, great read so far, I've got 3 more chapters to go in the first
  6!)

Can second that: Great book indeed.


 We should state up front, just like in Lucene land, that anyone who has a
  book on Mahout is welcome to link it on the page.  The more books on
  Mahout the merrier!

+1 (and we should probably encourage people who publish articles or give talks on
Mahout to add links to their publications on the Books, Articles, Talks wiki
page as well).

Isabel




[jira] Resolved: (MAHOUT-85) Perceptron/Winnow Trainer

2010-01-10 Thread Isabel Drost (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-85?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Isabel Drost resolved MAHOUT-85.


Resolution: Fixed

Finally committed.

 Perceptron/Winnow Trainer
 -

 Key: MAHOUT-85
 URL: https://issues.apache.org/jira/browse/MAHOUT-85
 Project: Mahout
  Issue Type: New Feature
  Components: Classification
Affects Versions: 0.1
Reporter: Isabel Drost
Assignee: Isabel Drost
 Fix For: 0.3

 Attachments: MAHOUT-85.patch, MAHOUT-85.patch, 
 perceptronWinnowTrainer.diff


 Please find attached a first sketch of perceptron and winnow training.
 Please look very, very carefully at the patch, as I wrote the heart of the
 algorithms in the emergency room at Charité Berlin (after I broke my leg
 cycling to the Hadoop Get Together ;) ).
 The patch does not yet include unit tests, nor is it parallelised. Currently
 my plan is to set up an example with the webKb dataset, add unit tests to the
 code and after that go parallel. I would like to get some feedback early on.
 In addition, I would feel a lot better if a second and third pair of eyes
 looked at the code to make sure all obvious mistakes are caught as early as
 possible.
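As background for reviewers, here is a minimal sketch of the textbook online update rules this issue refers to: the perceptron's additive update on mistakes, and Winnow's multiplicative promotion/demotion of active features (the Winnow2 variant, with all weights starting at 1 and a fixed threshold). The class and method names are hypothetical and this is not the attached patch. Labels are assumed to be in {-1, +1} for the perceptron and in {0, 1} for Winnow, whose features are assumed binary.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch of the perceptron and Winnow (Winnow2) online updates.
 * Not the code from the MAHOUT-85 patch.
 */
final class LinearTrainers {

  private LinearTrainers() {}

  /** Perceptron prediction in {-1, +1}; w[w.length - 1] is the bias. */
  static int predictPerceptron(double[] w, double[] x) {
    int n = w.length - 1;
    double score = w[n];
    for (int i = 0; i < n; i++) {
      score += w[i] * x[i];
    }
    return score > 0 ? 1 : -1;
  }

  /** Perceptron training: additive update, applied only on mistakes. */
  static double[] trainPerceptron(double[][] xs, int[] ys, double learningRate, int epochs) {
    int n = xs[0].length;
    double[] w = new double[n + 1]; // n feature weights plus a bias, all starting at 0
    for (int epoch = 0; epoch < epochs; epoch++) {
      for (int j = 0; j < xs.length; j++) {
        if (predictPerceptron(w, xs[j]) != ys[j]) {
          for (int i = 0; i < n; i++) {
            w[i] += learningRate * ys[j] * xs[j][i];
          }
          w[n] += learningRate * ys[j];
        }
      }
    }
    return w;
  }

  /** Winnow prediction in {0, 1} for binary features. */
  static int predictWinnow(double[] w, int[] x, double threshold) {
    double score = 0.0;
    for (int i = 0; i < w.length; i++) {
      if (x[i] == 1) {
        score += w[i];
      }
    }
    return score >= threshold ? 1 : 0;
  }

  /** Winnow training: multiplicative update of the active features on mistakes. */
  static double[] trainWinnow(int[][] xs, int[] ys, double alpha, double threshold, int epochs) {
    double[] w = new double[xs[0].length];
    Arrays.fill(w, 1.0); // Winnow starts with all weights at 1
    for (int epoch = 0; epoch < epochs; epoch++) {
      for (int j = 0; j < xs.length; j++) {
        if (predictWinnow(w, xs[j], threshold) == ys[j]) {
          continue;
        }
        // Promote after a false negative, demote after a false positive.
        double factor = ys[j] == 1 ? alpha : 1.0 / alpha;
        for (int i = 0; i < w.length; i++) {
          if (xs[j][i] == 1) {
            w[i] *= factor;
          }
        }
      }
    }
    return w;
  }
}
```

The key design difference is visible in the update lines: the perceptron adds the (scaled) example to its weights, while Winnow multiplies only the weights of active features, which tends to work well when few of many features are relevant.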




[jira] Created: (MAHOUT-240) Parallel version of Perceptron

2010-01-10 Thread Isabel Drost (JIRA)
Parallel version of Perceptron
--

 Key: MAHOUT-240
 URL: https://issues.apache.org/jira/browse/MAHOUT-240
 Project: Mahout
  Issue Type: Improvement
  Components: Classification
Affects Versions: 0.3
Reporter: Isabel Drost
 Fix For: 0.3


So far, Perceptron (as well as Winnow) training still runs without
parallelization. The goal of this issue is to explore ways to parallelize it
and, if possible, to provide a parallel version, that is, one based on
MapReduce.
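One commonly used way to make perceptron training fit MapReduce is iterative parameter mixing: each map task runs one training epoch on its data shard, starting from the current shared weights, and the reduce step averages the resulting weight vectors before the next round. The sketch below simulates this in plain Java, with a parallel stream standing in for map tasks. It is an assumption about one possible approach, not a description of what Mahout eventually implemented, and all names are hypothetical.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

/**
 * Hypothetical sketch of iterative parameter mixing for the perceptron.
 * A parallel stream stands in for map tasks; the averaging loop plays the reducer.
 * Labels are in {-1, +1}; the last weight is the bias.
 */
final class ParameterMixingPerceptron {

  private ParameterMixingPerceptron() {}

  static int predict(double[] w, double[] x) {
    int n = w.length - 1;
    double score = w[n];
    for (int i = 0; i < n; i++) {
      score += w[i] * x[i];
    }
    return score > 0 ? 1 : -1;
  }

  /** "Map": one perceptron epoch over a shard, starting from a copy of the shared weights. */
  static double[] trainShard(double[] start, double[][] xs, int[] ys) {
    double[] w = start.clone();
    int n = w.length - 1;
    for (int j = 0; j < xs.length; j++) {
      if (predict(w, xs[j]) != ys[j]) {
        for (int i = 0; i < n; i++) {
          w[i] += ys[j] * xs[j][i];
        }
        w[n] += ys[j];
      }
    }
    return w;
  }

  /** Runs the given number of map/average rounds and returns the mixed weights. */
  static double[] train(List<double[][]> shardXs, List<int[]> shardYs, int dims, int rounds) {
    double[] w = new double[dims + 1];
    for (int round = 0; round < rounds; round++) {
      final double[] current = w;
      List<double[]> locals = IntStream.range(0, shardXs.size())
          .parallel()
          .mapToObj(s -> trainShard(current, shardXs.get(s), shardYs.get(s)))
          .collect(Collectors.toList());
      // "Reduce": uniform average of the per-shard weight vectors.
      double[] mixed = new double[dims + 1];
      for (double[] local : locals) {
        for (int i = 0; i < mixed.length; i++) {
          mixed[i] += local[i];
        }
      }
      for (int i = 0; i < mixed.length; i++) {
        mixed[i] /= locals.size();
      }
      w = mixed;
    }
    return w;
  }
}
```

Averaging once at the very end (one-shot mixing) would need only a single MapReduce pass, but restarting every round from the mixed weights is what lets the shards correct each other's mistakes.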




[jira] Created: (MAHOUT-241) Example for perceptron

2010-01-10 Thread Isabel Drost (JIRA)
Example for perceptron
--

 Key: MAHOUT-241
 URL: https://issues.apache.org/jira/browse/MAHOUT-241
 Project: Mahout
  Issue Type: Improvement
  Components: Classification
Affects Versions: 0.3
Reporter: Isabel Drost
 Fix For: 0.3


The goal is to provide an end-to-end example based on the 20-newsgroups dataset 
to show how to get from a set of labelled training examples to a trained model 
that can later be reused.




Re: How to apply these patches

2009-12-28 Thread Isabel Drost
On Saturday 19 December 2009 16:15:46 Drew Farris wrote:
 Gang, should the wiki
 (http://cwiki.apache.org/MAHOUT/howtocontribute.html) be updated to
 include -E?

Sure*.

Isabel


* The wiki is open for edits by anyone. All you need is a wiki account which 
you can create without being a committer.

-- 
  |\  _,,,---,,_   Web:   http://www.isabel-drost.de
  /,`.-'`'-.  ;-;;,_  
 |,4-  ) )-,_..;\ (  `'-' 
'---''(_/--'  `-'\_) (fL)  IM:  xmpp://main...@spaceboyz.net





Re: Eclipse and checkstyle

2009-12-28 Thread Isabel Drost
On Saturday 19 December 2009 16:30:31 Benson Margulies wrote:
 Since you've got a checkstyle set that you like, can I go ahead and
 build the profile for setting up eclipse to use it?

Sure. There should already be a checkstyle file checked in (maven module);
feel free to use that or replace it with one that matches the style used for
Lucene as well.

Isabel






Re: [math]: how to test sorts

2009-12-28 Thread Isabel Drost
On Wednesday 23 December 2009 22:09:48 Grant Ingersoll wrote:
 Beyond that, we could start implementing Clover test coverage, I suppose.

It comes with the code quality reports added earlier. They are generated on a
daily basis through Hudson and are linked to in the dev section of our web
page. (The maven options to generate the reports locally are documented in
MAHOUT-210.) However, we should update report generation to Clover version
2.6.


Isabel






[jira] Created: (MAHOUT-231) Upgrade QM reports to use Clover 2.6

2009-12-27 Thread Isabel Drost (JIRA)
Upgrade QM reports to use Clover 2.6


 Key: MAHOUT-231
 URL: https://issues.apache.org/jira/browse/MAHOUT-231
 Project: Mahout
  Issue Type: Task
  Components: Website
Affects Versions: 0.3
Reporter: Isabel Drost
Priority: Minor
 Fix For: 0.3


Atlassian has donated a license for a new Clover version. Its reports provide
more information and are easier to read. We should upgrade the site reports to
use that version.




[jira] Updated: (MAHOUT-85) Perceptron/Winnow Trainer

2009-12-26 Thread Isabel Drost (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-85?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Isabel Drost updated MAHOUT-85:
---

Attachment: MAHOUT-85.patch

The patch adds tests for the implementation. The additional abstraction
proposed earlier is integrated. The distance measure is not configurable but
corresponds to what was defined in the original algorithm formulations.

The implementation is currently sequential-only. I am still evaluating whether
and how it might be possible to parallelize it.

Missing so far: an example showing how to run training, how to store the
resulting model and how to apply the model. That should probably be done in a
new issue to keep this one focused on the algorithm itself. In addition, I
still have to add at least links from our wiki to the Wikipedia pages on both
algorithms.

(Had some time left during the past few days: Screws in my knee are out now ;) )

 Perceptron/Winnow Trainer
 -

 Key: MAHOUT-85
 URL: https://issues.apache.org/jira/browse/MAHOUT-85
 Project: Mahout
  Issue Type: New Feature
  Components: Classification
Affects Versions: 0.1
Reporter: Isabel Drost
Assignee: Isabel Drost
 Fix For: 0.3

 Attachments: MAHOUT-85.patch, perceptronWinnowTrainer.diff


 Please find attached a first sketch of perceptron and winnow training.
 Please look very, very carefully at the patch, as I wrote the heart of the
 algorithms in the emergency room at Charité Berlin (after I broke my leg
 cycling to the Hadoop Get Together ;) ).
 The patch does not yet include unit tests, nor is it parallelised. Currently
 my plan is to set up an example with the webKb dataset, add unit tests to the
 code and after that go parallel. I would like to get some feedback early on.
 In addition, I would feel a lot better if a second and third pair of eyes
 looked at the code to make sure all obvious mistakes are caught as early as
 possible.




[jira] Commented: (MAHOUT-210) Publish code quality reports through maven

2009-12-18 Thread Isabel Drost (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12792449#action_12792449
 ] 

Isabel Drost commented on MAHOUT-210:
-

Forgot to include what I changed to make it work:

The workspace directory on Hudson seems to be accessible only to users logged
in to Hudson. So I changed the job to stage the generated site to a publicly
accessible directory and adjusted the links accordingly.

To get Clover to work, I gave maven the path to the Clover license on Hudson and
triggered report generation and aggregation before the site is generated.

The maven parameters used for building:

-Dmaven.clover.license=$PATH - path to the Clover license file
clean install - cleans the target directories, then builds and locally installs the artifacts
clover:instrument clover:aggregate - generates the Clover reports
site:site - generates the maven site report files and stores them under $module/target/site for review
site:stage -DstagingDirectory=/export/home/hudson/hudson/jobs/MahoutQM/site - stages the maven report files in a publicly readable directory


 Publish code quality reports through maven
 --

 Key: MAHOUT-210
 URL: https://issues.apache.org/jira/browse/MAHOUT-210
 Project: Mahout
  Issue Type: New Feature
  Components: Website
Affects Versions: 0.1, 0.2
Reporter: Isabel Drost
Assignee: Isabel Drost
 Fix For: 0.3

 Attachments: MAHOUT-210.patch


 We should use mvn site:site to generate code reports and publish them online 
 for users to review and developers to easily spot problems.
 A first version, whose checks still need to be adjusted to our needs, is
 available online at:
 http://people.apache.org/~isabel/mahout_site/mahout-core/project-reports.html
 Further discussion on-list at
 http://www.lucidimagination.com/search/document/a13aa5127b47fda3/publish_code_quality_reports_on_web_site##a13aa5127b47fda3




[jira] Commented: (MAHOUT-210) Publish code quality reports through maven

2009-12-17 Thread Isabel Drost (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12791887#action_12791887
 ] 

Isabel Drost commented on MAHOUT-210:
-

Checked in the current status of the report configuration files. Feel free to 
adjust any configuration that does not quite fit our standards yet. I tried to 
address those issues mentioned by Sean earlier in the mail thread.

I set up a Hudson job that builds the documentation and linked its output so
that it gets published through Hudson. The URLs for that:

http://hudson.zones.apache.org/hudson/userContent/lucene-mahout/core-reports/index.html
http://hudson.zones.apache.org/hudson/userContent/lucene-mahout/examples-reports/index.html
http://hudson.zones.apache.org/hudson/userContent/lucene-mahout/matrix-reports/index.html
http://hudson.zones.apache.org/hudson/userContent/lucene-mahout/maven-reports/index.html
http://hudson.zones.apache.org/hudson/userContent/lucene-mahout/taste-web-reports/index.html
http://hudson.zones.apache.org/hudson/userContent/lucene-mahout/utils-reports/index.html

These URLs were activated following the instructions Bhuvaneswaran A posted on
infrastruct...@apache:

 1) Set up a Hudson job to generate the reports.
 2) Log in to hud...@hudson.zones.apache.org and create a symbolic link:
{code}
  $ sudo su - hudson
  $ cd hudson/userContent
  $ ln -s /export/home/hudson/hudson/jobs/Mahout\ QM/$PATH_TO_DOCS ./lucene-mahout/$MODULE-reports
{code}
 3) Access the reports via
 http://hudson.zones.apache.org/hudson/userContent/lucene-mahout/$MODULE-reports/index.html

The site should be regenerated once a day. Once today's run is done, the pages
available on Hudson should match those I already published on people.apache.org.

I am about to add links to the reports to our project page (they are going to be
on a separate page in the developers section).

Missing: the Clover test coverage reports are not yet being generated. I need
to change the Hudson job to pick up the Clover license file for that.

 Publish code quality reports through maven
 --

 Key: MAHOUT-210
 URL: https://issues.apache.org/jira/browse/MAHOUT-210
 Project: Mahout
  Issue Type: New Feature
  Components: Website
Affects Versions: 0.1, 0.2
Reporter: Isabel Drost
Assignee: Isabel Drost
 Fix For: 0.3

 Attachments: MAHOUT-210.patch


 We should use mvn site:site to generate code reports and publish them online 
 for users to review and developers to easily spot problems.
 A first version, whose checks still need to be adjusted to our needs, is
 available online at:
 http://people.apache.org/~isabel/mahout_site/mahout-core/project-reports.html
 Further discussion on-list at
 http://www.lucidimagination.com/search/document/a13aa5127b47fda3/publish_code_quality_reports_on_web_site##a13aa5127b47fda3




[jira] Commented: (MAHOUT-220) Mahout Bayes Code cleanup

2009-12-15 Thread Isabel Drost (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12790653#action_12790653
 ] 

Isabel Drost commented on MAHOUT-220:
-

Before reorganizing code: could someone who is more familiar with the specific
rules of the code style used at Lucene double-check the exact checkstyle rules
used for site generation? I reused the checkstyle configuration that was
already in Mahout trunk (relaxing some of its rules) but am not sure whether it
really reflects our rules.

 Mahout Bayes Code cleanup
 -

 Key: MAHOUT-220
 URL: https://issues.apache.org/jira/browse/MAHOUT-220
 Project: Mahout
  Issue Type: Improvement
  Components: Classification
Affects Versions: 0.3
Reporter: Robin Anil
Assignee: Robin Anil
 Fix For: 0.2

 Attachments: MAHOUT-BAYES.patch


 Following Isabel's checkstyle, I am adding a whole slew of code cleanup with
 the following exceptions:
 1.  Line length used is 120 instead of 80.
 2.  The static final logger is kept as "log", not renamed to "LOG".




[jira] Commented: (MAHOUT-224) Dependency Cleanup

2009-12-15 Thread Isabel Drost (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12790658#action_12790658
 ] 

Isabel Drost commented on MAHOUT-224:
-

Maven supports marking dependencies as needed for tests only (scope "test",
appropriate for junit) or as provided by the runtime environment (scope
"provided", which might be appropriate for the Hadoop jars: as far as I can
tell they are needed only at compile time, since Hadoop is already available
on the cluster when Mahout is deployed, right?). This should also reduce the
number of jars that need to be distributed. But that can be addressed in a
separate issue.

 Dependency Cleanup
 --

 Key: MAHOUT-224
 URL: https://issues.apache.org/jira/browse/MAHOUT-224
 Project: Mahout
  Issue Type: Improvement
Affects Versions: 0.2
Reporter: Drew Farris
Assignee: Drew Farris
Priority: Minor
 Attachments: mahout-224.patch


 In preparation for the binary release work described in MAHOUT-215, here's a
 minor patch that does some cleanup on the poms.
 The hadoop and junit dependency versions are now established using the
 dependencyManagement section of the parent pom in mahout/maven/pom.xml.
 A large number of transitive dependencies from the hadoop pom are now
 excluded there as well -- these exclusions were not necessary previously
 because the hadoop dependency was hand-rolled and did not include them. With
 the update to hadoop 0.20.2-SNAPSHOT, they are now required.
 Also, the parent pom no longer has mahout/pom.xml as its parent; this allows
 binary packaging to be performed in mahout/pom.xml after the build of all of
 the other sub-modules is complete.
 Also, I removed the javamail dependency -- was there a reason it was present?
 Verified that the build and unit tests complete.




[jira] Commented: (MAHOUT-85) Perceptron/Winnow Trainer

2009-12-11 Thread Isabel Drost (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-85?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12789312#action_12789312
 ] 

Isabel Drost commented on MAHOUT-85:


I am currently adding tests. I will probably commit once those are done and
continue with a parallel version from there.

 Perceptron/Winnow Trainer
 -

 Key: MAHOUT-85
 URL: https://issues.apache.org/jira/browse/MAHOUT-85
 Project: Mahout
  Issue Type: New Feature
  Components: Classification
Affects Versions: 0.1
Reporter: Isabel Drost
Assignee: Isabel Drost
 Fix For: 0.3

 Attachments: perceptronWinnowTrainer.diff


 Please find attached a first sketch of perceptron and winnow training.
 Please look very, very carefully at the patch, as I wrote the heart of the
 algorithms in the emergency room at Charité Berlin (after I broke my leg
 cycling to the Hadoop Get Together ;) ).
 The patch does not yet include unit tests, nor is it parallelised. Currently
 my plan is to set up an example with the webKb dataset, add unit tests to the
 code and after that go parallel. I would like to get some feedback early on.
 In addition, I would feel a lot better if a second and third pair of eyes
 looked at the code to make sure all obvious mistakes are caught as early as
 possible.




[jira] Updated: (MAHOUT-210) Publish code quality reports through maven

2009-12-11 Thread Isabel Drost (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Isabel Drost updated MAHOUT-210:


Attachment: MAHOUT-210.patch

The patch adds Clover, FindBugs, PMD, CPD and maven dependency reports as well
as Javadoc generation.

After applying it, the site can be generated through mvn site:site. I have
thrown out all general project information that is already available through
our Forrest site.

The plan is to run mvn clean install site:site site:deploy on a daily (maybe
weekly?) basis on people.apache.org and publish the results there so they can
be linked to from our site.

 Publish code quality reports through maven
 --

 Key: MAHOUT-210
 URL: https://issues.apache.org/jira/browse/MAHOUT-210
 Project: Mahout
  Issue Type: New Feature
  Components: Website
Affects Versions: 0.1, 0.2
Reporter: Isabel Drost
Assignee: Isabel Drost
 Fix For: 0.3

 Attachments: MAHOUT-210.patch


 We should use mvn site:site to generate code reports and publish them online 
 for users to review and developers to easily spot problems.
 A first version, whose checks still need to be adjusted to our needs, is
 available online at:
 http://people.apache.org/~isabel/mahout_site/mahout-core/project-reports.html
 Further discussion on-list at
 http://www.lucidimagination.com/search/document/a13aa5127b47fda3/publish_code_quality_reports_on_web_site##a13aa5127b47fda3




[jira] Updated: (MAHOUT-11) Static fields used throughout clustering code (Canopy, K-Means).

2009-12-10 Thread Isabel Drost (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-11?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Isabel Drost updated MAHOUT-11:
---

Resolution: Fixed
Status: Resolved  (was: Patch Available)

Committed. Thanks Drew for your help.

 Static fields used throughout clustering code (Canopy, K-Means).
 

 Key: MAHOUT-11
 URL: https://issues.apache.org/jira/browse/MAHOUT-11
 Project: Mahout
  Issue Type: Bug
  Components: Clustering
Affects Versions: 0.1
Reporter: Dawid Weiss
 Fix For: 0.3

 Attachments: MAHOUT-11-all-cleanup-20091128.patch, 
 MAHOUT-11-kmeans-cleanup.patch, MAHOUT-11-RandomSeedGenerator.patch, 
 MAHOUT-11.patch


 I file this as a bug, even though I'm not 100% sure it is one. In the current
 code, information is exchanged via static fields (for example, the distance
 measure and thresholds for Canopies are static fields). Is it always true in
 Hadoop that one job runs inside one JVM with exclusive access? I haven't seen 
 it anywhere in Hadoop documentation and my impression was that everything 
 uses JobConf to pass configuration to jobs, but jobs are configured on a 
 per-object basis (a job is an object, a mapper is an object and everything 
 else is basically an object).
 If it's possible for two jobs to run in parallel inside one JVM then this is 
 a limitation and bug in our code that needs to be addressed.
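The concern can be illustrated with a small plain-Java sketch: a static field holds a single value per JVM (more precisely, per class loader), so if two jobs were ever configured in the same JVM, the second would overwrite the first one's threshold, whereas a value read from the configuration object handed to each instance stays isolated. A Map of strings stands in for Hadoop's JobConf here; the class names and the key "canopy.t1" are hypothetical.

```java
import java.util.Map;

/** Problematic pattern: one threshold shared by every job in the JVM. */
final class StaticCanopySettings {
  static double t1; // overwritten by whichever job configures last

  private StaticCanopySettings() {}
}

/**
 * Safer pattern: each instance reads its own threshold from the
 * configuration object it is given (a Map stands in for JobConf).
 */
final class CanopyMapperSketch {
  static final String T1_KEY = "canopy.t1"; // hypothetical key name

  private final double t1;

  CanopyMapperSketch(Map<String, String> conf) {
    String value = conf.get(T1_KEY);
    if (value == null) {
      throw new IllegalArgumentException("missing " + T1_KEY);
    }
    this.t1 = Double.parseDouble(value);
  }

  double t1() {
    return t1;
  }

  /** Whether a point at this distance from a canopy center falls within T1. */
  boolean withinT1(double distance) {
    return distance < t1;
  }
}
```

With the instance-based variant, two jobs configured with different thresholds in one JVM each keep their own value; with the static variant, the last assignment wins for everyone.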




[jira] Assigned: (MAHOUT-11) Static fields used throughout clustering code (Canopy, K-Means).

2009-12-10 Thread Isabel Drost (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-11?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Isabel Drost reassigned MAHOUT-11:
--

Assignee: Isabel Drost

 Static fields used throughout clustering code (Canopy, K-Means).
 

 Key: MAHOUT-11
 URL: https://issues.apache.org/jira/browse/MAHOUT-11
 Project: Mahout
  Issue Type: Bug
  Components: Clustering
Affects Versions: 0.1
Reporter: Dawid Weiss
Assignee: Isabel Drost
 Fix For: 0.3

 Attachments: MAHOUT-11-all-cleanup-20091128.patch, 
 MAHOUT-11-kmeans-cleanup.patch, MAHOUT-11-RandomSeedGenerator.patch, 
 MAHOUT-11.patch


 I file this as a bug, even though I'm not 100% sure it is one. In the current
 code, information is exchanged via static fields (for example, the distance
 measure and thresholds for Canopies are static fields). Is it always true in
 Hadoop that one job runs inside one JVM with exclusive access? I haven't seen 
 it anywhere in Hadoop documentation and my impression was that everything 
 uses JobConf to pass configuration to jobs, but jobs are configured on a 
 per-object basis (a job is an object, a mapper is an object and everything 
 else is basically an object).
 If it's possible for two jobs to run in parallel inside one JVM then this is 
 a limitation and bug in our code that needs to be addressed.




[jira] Assigned: (MAHOUT-11) Static fields used throughout clustering code (Canopy, K-Means).

2009-12-10 Thread Isabel Drost (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-11?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Isabel Drost reassigned MAHOUT-11:
--

Assignee: Drew Farris  (was: Isabel Drost)

Thanks.

 Static fields used throughout clustering code (Canopy, K-Means).
 

 Key: MAHOUT-11
 URL: https://issues.apache.org/jira/browse/MAHOUT-11
 Project: Mahout
  Issue Type: Bug
  Components: Clustering
Affects Versions: 0.1
Reporter: Dawid Weiss
Assignee: Drew Farris
 Fix For: 0.3

 Attachments: MAHOUT-11-all-cleanup-20091128.patch, 
 MAHOUT-11-kmeans-cleanup.patch, MAHOUT-11-RandomSeedGenerator.patch, 
 MAHOUT-11.patch


 I file this as a bug, even though I'm not 100% sure it is one. In the current
 code, information is exchanged via static fields (for example, the distance
 measure and thresholds for Canopies are static fields). Is it always true in
 Hadoop that one job runs inside one JVM with exclusive access? I haven't seen 
 it anywhere in Hadoop documentation and my impression was that everything 
 uses JobConf to pass configuration to jobs, but jobs are configured on a 
 per-object basis (a job is an object, a mapper is an object and everything 
 else is basically an object).
 If it's possible for two jobs to run in parallel inside one JVM then this is 
 a limitation and bug in our code that needs to be addressed.




Re: [jira] Assigned: (MAHOUT-11) Static fields used throughout clustering code (Canopy, K-Means).

2009-12-10 Thread Isabel Drost
On Thu Sean Owen sro...@gmail.com wrote:

 Looks like Hudson is saying that broke the build but looks like easily
 addressable stuff.

Fixed it - but only shortly *after* Hudson had already started building
the project :/

Triggered the build on Hudson manually a few minutes ago - now it runs
successfully again.

Isabel



[jira] Assigned: (MAHOUT-210) Publish code quality reports through maven

2009-12-10 Thread Isabel Drost (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Isabel Drost reassigned MAHOUT-210:
---

Assignee: Isabel Drost

 Publish code quality reports through maven
 --

 Key: MAHOUT-210
 URL: https://issues.apache.org/jira/browse/MAHOUT-210
 Project: Mahout
  Issue Type: New Feature
  Components: Website
Affects Versions: 0.1, 0.2
Reporter: Isabel Drost
Assignee: Isabel Drost
 Fix For: 0.3


 We should use mvn site:site to generate code reports and publish them online 
 for users to review and developers to easily spot problems.
 A first version, whose checks still need to be adjusted to our needs, is
 available online at:
 http://people.apache.org/~isabel/mahout_site/mahout-core/project-reports.html
 Further discussion on-list at
 http://www.lucidimagination.com/search/document/a13aa5127b47fda3/publish_code_quality_reports_on_web_site##a13aa5127b47fda3




[jira] Commented: (MAHOUT-11) Static fields used throughout clustering code (Canopy, K-Means).

2009-12-09 Thread Isabel Drost (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-11?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12788129#action_12788129
 ] 

Isabel Drost commented on MAHOUT-11:


I'll make the changes before committing - no need to submit a new patch version.

 Static fields used throughout clustering code (Canopy, K-Means).
 

 Key: MAHOUT-11
 URL: https://issues.apache.org/jira/browse/MAHOUT-11
 Project: Mahout
  Issue Type: Bug
  Components: Clustering
Affects Versions: 0.1
Reporter: Dawid Weiss
 Fix For: 0.3

 Attachments: MAHOUT-11-all-cleanup-20091128.patch, 
 MAHOUT-11-kmeans-cleanup.patch, MAHOUT-11-RandomSeedGenerator.patch, 
 MAHOUT-11.patch


 I file this as a bug, even though I'm not 100% sure it is one. In the current
 code, information is exchanged via static fields (for example, the distance
 measure and thresholds for Canopies are static fields). Is it always true in
 Hadoop that one job runs inside one JVM with exclusive access? I haven't seen 
 it anywhere in Hadoop documentation and my impression was that everything 
 uses JobConf to pass configuration to jobs, but jobs are configured on a 
 per-object basis (a job is an object, a mapper is an object and everything 
 else is basically an object).
 If it's possible for two jobs to run in parallel inside one JVM then this is 
 a limitation and bug in our code that needs to be addressed.




[jira] Resolved: (MAHOUT-90) Adding all scripts (for nightly build) to SVN repository.

2009-12-07 Thread Isabel Drost (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-90?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Isabel Drost resolved MAHOUT-90.


Resolution: Later

Marked as Later: snapshots are currently published to the Apache maven
repository. For now that should be enough for users to play around with the
latest code.

 Adding all scripts (for nightly build) to SVN repository.
 -

 Key: MAHOUT-90
 URL: https://issues.apache.org/jira/browse/MAHOUT-90
 Project: Mahout
  Issue Type: New Feature
Reporter: Edward J. Yoon
Priority: Minor
 Fix For: 0.3

 Attachments: mahout.tgz


 I wrote the scripts below for the Hudson continuous integration service on my
 Hudson account:
 mahout/hudsonBuildMahoutPatch.sh
 mahout/processMahoutPatchEmail.sh
 mahout/hudsonPatchQueueAdmin.sh
 As things stand, only I can modify them, so they should be managed via SVN.




[jira] Commented: (MAHOUT-90) Adding all scripts (for nightly build) to SVN repository.

2009-12-06 Thread Isabel Drost (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-90?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12786678#action_12786678
 ] 

Isabel Drost commented on MAHOUT-90:


I did add a Hudson job that uploads maven snapshots of our projects to the
Apache repository on a nightly basis. However, I have no idea how building and
publishing nightly releases should work at Apache.

 Adding all scripts (for nightly build) to SVN repository.
 -

 Key: MAHOUT-90
 URL: https://issues.apache.org/jira/browse/MAHOUT-90
 Project: Mahout
  Issue Type: New Feature
Reporter: Edward J. Yoon
Assignee: Isabel Drost
Priority: Minor
 Fix For: 0.3

 Attachments: mahout.tgz


 I wrote the scripts below for the Hudson continuous integration service on my
 Hudson account:
 mahout/hudsonBuildMahoutPatch.sh
 mahout/processMahoutPatchEmail.sh
 mahout/hudsonPatchQueueAdmin.sh
 As things stand, only I can modify them, so they should be managed via SVN.




[jira] Assigned: (MAHOUT-90) Adding all scripts (for nightly build) to SVN repository.

2009-12-06 Thread Isabel Drost (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-90?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Isabel Drost reassigned MAHOUT-90:
--

Assignee: (was: Isabel Drost)

 Adding all scripts (for nightly build) to SVN repository.
 -

 Key: MAHOUT-90
 URL: https://issues.apache.org/jira/browse/MAHOUT-90
 Project: Mahout
  Issue Type: New Feature
Reporter: Edward J. Yoon
Priority: Minor
 Fix For: 0.3

 Attachments: mahout.tgz


 I wrote the scripts below for the Hudson continuous integration service on my
 Hudson account:
 mahout/hudsonBuildMahoutPatch.sh
 mahout/processMahoutPatchEmail.sh
 mahout/hudsonPatchQueueAdmin.sh
 As things stand, only I can modify them, so they should be managed via SVN.




[jira] Commented: (MAHOUT-11) Static fields used throughout clustering code (Canopy, K-Means).

2009-12-04 Thread Isabel Drost (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-11?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12785985#action_12785985
 ] 

Isabel Drost commented on MAHOUT-11:


Applies cleanly and builds without unit test failures here.

All the changes look good to me. Great work, Drew.

One question though: in the TestMeanShift test (lines 301 and 304) you removed
the canopyId adjustments. Could you please explain why this was necessary?

I would like to commit this patch next week if no one objects.

 Static fields used throughout clustering code (Canopy, K-Means).
 

 Key: MAHOUT-11
 URL: https://issues.apache.org/jira/browse/MAHOUT-11
 Project: Mahout
  Issue Type: Bug
  Components: Clustering
Affects Versions: 0.1
Reporter: Dawid Weiss
 Fix For: 0.3

 Attachments: MAHOUT-11-all-cleanup-20091128.patch, 
 MAHOUT-11-kmeans-cleanup.patch, MAHOUT-11-RandomSeedGenerator.patch, 
 MAHOUT-11.patch


 I file this as a bug, even though I'm not 100% sure it is one. In the current
 code, information is exchanged via static fields (for example, the distance
 measure and thresholds for Canopies are static fields). Is it always true in
 Hadoop that one job runs inside one JVM with exclusive access? I haven't seen 
 it anywhere in Hadoop documentation and my impression was that everything 
 uses JobConf to pass configuration to jobs, but jobs are configured on a 
 per-object basis (a job is an object, a mapper is an object and everything 
 else is basically an object).
 If it's possible for two jobs to run in parallel inside one JVM then this is 
 a limitation and bug in our code that needs to be addressed.




Re: Publish code quality reports on web-site?

2009-12-03 Thread Isabel Drost
On Sun deneche abdelhakim adene...@gmail.com wrote:

 df/mapred works with the old hadoop API
 df/mapreduce works with hadoop 0.20 API

Hmm. Maybe it would still be possible to factor out the code that is
common to both implementations? That step would also make migrating to a
future Hadoop version easier, as only the API-dependent code would have
to be changed.
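A minimal sketch of that split, in plain Java so it runs without Hadoop: the shared logic lives in one API-independent class, and each API flavour only gets a thin adapter. LabelCounter, OldApiAdapter and NewApiAdapter are hypothetical names; in real code the adapters would wrap Hadoop's org.apache.hadoop.mapred.OutputCollector and the org.apache.hadoop.mapreduce Mapper.Context respectively, which are simulated here with a list and a StringBuilder.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.BiConsumer;

class SharedCoreDemo {
  public static void main(String[] args) {
    List<String> labels = List.of("spam", "ham", "spam");
    OldApiAdapter oldApi = new OldApiAdapter();
    oldApi.map(labels);
    NewApiAdapter newApi = new NewApiAdapter();
    newApi.map(labels);
    System.out.println(oldApi.collected);
    System.out.println(newApi.written);
  }
}

// API-independent core: pure logic, no Hadoop types. Only this class holds the algorithm.
final class LabelCounter {
  private LabelCounter() {}

  static void countLabels(List<String> labels, BiConsumer<String, Integer> emit) {
    Map<String, Integer> counts = new TreeMap<>(); // sorted keys for deterministic output
    for (String label : labels) {
      counts.merge(label, 1, Integer::sum);
    }
    counts.forEach(emit);
  }
}

// Stand-in for a mapper written against the old (mapred) API; it only adapts the output.
final class OldApiAdapter {
  final List<String> collected = new ArrayList<>(); // simulates OutputCollector.collect(key, value)

  void map(List<String> labels) {
    LabelCounter.countLabels(labels, (k, v) -> collected.add(k + "\t" + v));
  }
}

// Stand-in for a mapper written against the new (mapreduce) API; it only adapts the output.
final class NewApiAdapter {
  final StringBuilder written = new StringBuilder(); // simulates Context.write(key, value)

  void map(List<String> labels) {
    LabelCounter.countLabels(labels, (k, v) -> written.append(k).append('=').append(v).append(';'));
  }
}
```

When the API changes, only the adapter classes need rewriting; LabelCounter and its tests stay untouched.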

Isabel


Re: Publish code quality reports on web-site?

2009-12-03 Thread Isabel Drost
On Thu Sean Owen sro...@gmail.com wrote:

 I suggest our current stance be that we use 0.20.x, with the old APIs.
 When 0.21 comes out and stabilizes, we move. So I suggest keeping
 these and deleting 'mapred' at that point.

Sounds good to me.

Isabel


Re: Packaging target + dependencies in one .jar with Maven?

2009-12-03 Thread Isabel Drost
On Thu Sean Owen sro...@gmail.com wrote:

 Anyone know if there is an easy way to package a build target with all
 its dependencies with Maven? I can't find the formula with the
 assembly plugin but guess it is there.

Hmm, judging from the poms in our repo, we are currently doing that
through an Ant script. Just look at the sections that generate *.job
files in the examples and core modules.

Isabel


Re: [OT] who are jteam ?

2009-12-03 Thread Isabel Drost
On Thu, 3 Dec 2009 07:44:06 -0800
patrick o'leary pj...@pjaol.com wrote:

 Got a google alert from a very interesting / confusing page,
 http://blog.jteam.nl/2009/08/03/geo-location-search-with-solr-and-lucene/
 
 Anyone know who these guys are?

They gave a rather good talk on what they are doing with Solr at
this year's ApacheCon EU in Amsterdam:

http://eu.apachecon.com/c/aceu2009/sessions/251

They are using Solr for customer search projects. Back then they were
planning to contribute back (bug fixes immediately, larger extensions
after some time).

Isabel


Re: Publish code quality reports on web-site?

2009-11-28 Thread Isabel Drost
On Saturday 28 November 2009 08:30:26 Sean Owen wrote:
 I'm all for generating and publishing this.

Great. Then I will go and tweak the checks to match our guidelines, fiddle a 
bit with the output format and then integrate the reports into our nightly 
build.


 I didn't see anything big flagged, good, but we should all have a look
 at the results and tweak accordingly. In some cases it had a good
 small point, or I was indifferent about the approach it was suggesting
 versus what was in the code, so I changed to comply with the check.

The generated reports are just examples. I am all for adjusting any checks 
(or adding new ones) that do not fit our needs. I will go through your list, 
make the proposed changes and re-upload the site so everyone can have a look.

Isabel


-- 
  |\  _,,,---,,_   Web:   http://www.isabel-drost.de
  /,`.-'`'-.  ;-;;,_  
 |,4-  ) )-,_..;\ (  `'-' 
'---''(_/--'  `-'\_) (fL)  IM:  xmpp://main...@spaceboyz.net



signature.asc
Description: This is a digitally signed message part.


Re: Publish code quality reports on web-site?

2009-11-28 Thread Isabel Drost
On Saturday 28 November 2009 21:29:05 Drew Farris wrote:
 It will be interesting to see the reports for the other modules as
 well: examples, utils, matrix.

As a little preview: just replace mahout-core with mahout-<modulename> in 
the URL below:

http://people.apache.org/~isabel/mahout_site/mahout-core/project-reports.html

Fixing the report links is on my list already ;)

Isabel




Publish code quality reports on web-site?

2009-11-27 Thread Isabel Drost

Hello,

I just ran several code analysis reports over the Mahout source code.
Results are published at

http://people.apache.org/~isabel/mahout_site/mahout-core/project-reports.html

It includes several reports on code quality, test coverage, Javadocs
and the like. If generated regularly, say on Hudson, I think it could
benefit both us (by giving a quick impression of where cleanup is most
needed) and potential users.

I would like to see a third tab added to our homepage that points to
a page containing the reports for each of our modules. I would first try
to clean up the generated site a little: we certainly do not need the
Project Information pages in there, as most of that is already generated
through Forrest. In addition, I can take care of setting up a Hudson
job to regenerate the site on a regular schedule.

Cheers,
Isabel



Re: SVM algo, code, etc.

2009-11-25 Thread Isabel Drost
On Fri Grant Ingersoll gsing...@apache.org wrote:
 On Nov 19, 2009, at 1:15 PM, Sean Owen wrote:
  Post a patch if you'd like to proceed, IMHO.
 +1

+1 from me as well. I would love to see solid SVM support in Mahout.

Isabel


[jira] Commented: (MAHOUT-11) Static fields used throughout clustering code (Canopy, K-Means).

2009-11-25 Thread Isabel Drost (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-11?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12782470#action_12782470
 ] 

Isabel Drost commented on MAHOUT-11:


Drew, go ahead then.

 Static fields used throughout clustering code (Canopy, K-Means).
 

 Key: MAHOUT-11
 URL: https://issues.apache.org/jira/browse/MAHOUT-11
 Project: Mahout
  Issue Type: Bug
  Components: Clustering
Affects Versions: 0.1
Reporter: Dawid Weiss
 Fix For: 0.3

 Attachments: MAHOUT-11-kmeans-cleanup.patch, 
 MAHOUT-11-RandomSeedGenerator.patch, MAHOUT-11.patch






[jira] Updated: (MAHOUT-11) Static fields used throughout clustering code (Canopy, K-Means).

2009-11-19 Thread Isabel Drost (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-11?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Isabel Drost updated MAHOUT-11:
---

Attachment: MAHOUT-11.patch

I am not the original author of the code, but I still managed to get the static 
fields out of the k-means clustering code. All unit tests still pass. 
However, I would feel a lot better if someone else double-checked the changes.

Looking at the code, I spotted some more places that could benefit from being 
revisited (e.g. the use of deprecated MapReduce APIs and the introduction of 
status reports). But that should be done in a separate issue.

 Static fields used throughout clustering code (Canopy, K-Means).
 

 Key: MAHOUT-11
 URL: https://issues.apache.org/jira/browse/MAHOUT-11
 Project: Mahout
  Issue Type: Bug
  Components: Clustering
Affects Versions: 0.1
Reporter: Dawid Weiss
 Fix For: 0.3

 Attachments: MAHOUT-11.patch






[jira] Commented: (MAHOUT-11) Static fields used throughout clustering code (Canopy, K-Means).

2009-11-19 Thread Isabel Drost (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-11?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12780476#action_12780476
 ] 

Isabel Drost commented on MAHOUT-11:


First of all, thanks for the review.

Passing the output collector directly: yep, makes sense. I will change that and 
resubmit the patch.
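For readers unfamiliar with the review point, a minimal sketch of the idea, assuming the alternative is keeping the collector in a (possibly static) field: here the collector is handed to the helper as a method argument, so the helper holds no per-job state and can be tested with a simple in-memory collector. OutputSink, ListSink and PointEmitter are hypothetical stand-ins, not Hadoop's or Mahout's actual classes.

```java
import java.util.ArrayList;
import java.util.List;

class SinkDemo {
  public static void main(String[] args) {
    ListSink out = new ListSink();
    PointEmitter emitter = new PointEmitter();
    emitter.emitPoint("C-1", 42, out);
    emitter.emitPoint("C-2", 7, out);
    System.out.println(out.emitted); // [C-1=42, C-2=7]
  }
}

// Hypothetical minimal stand-in for a MapReduce output collector.
interface OutputSink<K, V> {
  void collect(K key, V value);
}

// Records emitted pairs in memory, which makes it a convenient test double.
class ListSink implements OutputSink<String, Integer> {
  final List<String> emitted = new ArrayList<>();

  @Override
  public void collect(String key, Integer value) {
    emitted.add(key + "=" + value);
  }
}

// Hypothetical helper: the sink is passed in on every call instead of being stored in a
// (possibly static) field, so the helper itself carries no per-job state.
class PointEmitter {
  void emitPoint(String clusterId, int pointId, OutputSink<String, Integer> out) {
    out.collect(clusterId, pointId);
  }
}
```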

Tests with real data: Big thanks for that.

Isabel

 Static fields used throughout clustering code (Canopy, K-Means).
 

 Key: MAHOUT-11
 URL: https://issues.apache.org/jira/browse/MAHOUT-11
 Project: Mahout
  Issue Type: Bug
  Components: Clustering
Affects Versions: 0.1
Reporter: Dawid Weiss
 Fix For: 0.3

 Attachments: MAHOUT-11.patch






[jira] Resolved: (MAHOUT-200) Update information on Mahout site

2009-11-18 Thread Isabel Drost (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Isabel Drost resolved MAHOUT-200.
-

   Resolution: Fixed
Fix Version/s: (was: 0.3)
   0.2

Updated the web page and fixed a typo in the release announcement.

 Update information on Mahout site
 -

 Key: MAHOUT-200
 URL: https://issues.apache.org/jira/browse/MAHOUT-200
 Project: Mahout
  Issue Type: Improvement
  Components: Website
Reporter: Isabel Drost
Assignee: Isabel Drost
Priority: Minor
 Fix For: 0.2

 Attachments: update_site.patch


 After several people had trouble finding the docs we provide in the wiki, I 
 have created a slightly updated version of our website. I added a few links 
 to wiki pages that might be of interest to potential Mahout users.
 I have uploaded the updated version to http://people.apache.org/~isabel/site 
 so all of you can have a look. I will commit on Tuesday next week if no one 
 objects.




Re: Trunk is now open

2009-11-18 Thread Isabel Drost
On Wed Grant Ingersoll gsing...@apache.org wrote:

 Trunk is now open for commits.

Yeah!


 Seems like we have some good things in store for 0.3, so have at it!

+1

Isabel


Re: [jira] Commented: (MAHOUT-18) Embrace interoperability with other softwares

2009-11-17 Thread Isabel Drost
On Tue Andrew Wang andrew.wang.1...@gmail.com wrote:
 As you know, i am new guy about the Mahout. suppose i have one model
 trained in WEKA using distinct classifiers, if the Mahout have some
 port to import the model, and using the model in the up-coming
 process, it will be very cool.

Could you please explain exactly which models you would like to import
and why?

Assuming we are talking about naive Bayes: what is really expensive
about it is training the classifier. I wonder why you would want to do
that within Weka.

With most classification algorithms I am familiar with, training is
expensive, but applying the model to new instances is cheap. That is why
I currently do not really understand why you would want to run the
training in Weka and use the model in Mahout. However, I could imagine
use cases where you might want to train the model with Mahout and use
it as part of a processing chain within Weka.
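To illustrate the cost argument, here is a minimal multinomial naive Bayes sketch in plain Java. It is not Mahout's (or Weka's) implementation; the class names and the add-one smoothing are assumptions chosen for brevity. Training has to visit every token of every training document, while classifying one document only needs a few table lookups per token and label.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class NaiveBayesDemo {
  public static void main(String[] args) {
    NaiveBayesSketch nb = new NaiveBayesSketch();
    nb.train("spam", List.of("cheap", "pills", "cheap"));
    nb.train("ham", List.of("meeting", "tomorrow", "agenda"));
    System.out.println(nb.classify(List.of("cheap", "pills"))); // spam
  }
}

class NaiveBayesSketch {
  private final Map<String, Map<String, Integer>> tokenCounts = new HashMap<>(); // label -> token -> count
  private final Map<String, Integer> tokensPerLabel = new HashMap<>();           // label -> total tokens
  private final Map<String, Integer> docsPerLabel = new HashMap<>();             // label -> documents
  private final Set<String> vocabulary = new HashSet<>();
  private int totalDocs = 0;

  // Expensive part: over a whole corpus this loop runs once per token of every document.
  void train(String label, List<String> tokens) {
    totalDocs++;
    docsPerLabel.merge(label, 1, Integer::sum);
    Map<String, Integer> counts = tokenCounts.computeIfAbsent(label, k -> new HashMap<>());
    for (String token : tokens) {
      counts.merge(token, 1, Integer::sum);
      tokensPerLabel.merge(label, 1, Integer::sum);
      vocabulary.add(token);
    }
  }

  // Cheap part: cost is (tokens in this document) x (number of labels) lookups.
  String classify(List<String> tokens) {
    String best = null;
    double bestScore = Double.NEGATIVE_INFINITY;
    int vocabSize = vocabulary.size();
    for (String label : docsPerLabel.keySet()) {
      double score = Math.log(docsPerLabel.get(label) / (double) totalDocs); // log prior
      Map<String, Integer> counts = tokenCounts.get(label);
      int labelTotal = tokensPerLabel.getOrDefault(label, 0);
      for (String token : tokens) {
        // Add-one (Laplace) smoothing keeps unseen tokens from zeroing the score.
        score += Math.log((counts.getOrDefault(token, 0) + 1.0) / (labelTotal + vocabSize));
      }
      if (score > bestScore) {
        bestScore = score;
        best = label;
      }
    }
    return best;
  }
}
```

Once trained, the count tables are all a consumer needs, which is why exporting a trained model is cheap compared with producing it.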


Isabel


Re: [VOTE] Release 0.2

2009-11-16 Thread Isabel Drost
On Monday 16 November 2009 19:44:38 Ted Dunning wrote:
 Congrats.  

Congratulations from me as well!

Isabel



[jira] Assigned: (MAHOUT-200) Update information on Mahout site

2009-11-13 Thread Isabel Drost (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Isabel Drost reassigned MAHOUT-200:
---

Assignee: Isabel Drost

 Update information on Mahout site
 -

 Key: MAHOUT-200
 URL: https://issues.apache.org/jira/browse/MAHOUT-200
 Project: Mahout
  Issue Type: Improvement
  Components: Website
Reporter: Isabel Drost
Assignee: Isabel Drost
Priority: Minor
 Fix For: 0.3

 Attachments: update_site.patch






[jira] Created: (MAHOUT-200) Update information on Mahout site

2009-11-13 Thread Isabel Drost (JIRA)
Update information on Mahout site
-

 Key: MAHOUT-200
 URL: https://issues.apache.org/jira/browse/MAHOUT-200
 Project: Mahout
  Issue Type: Improvement
  Components: Website
Reporter: Isabel Drost
Priority: Minor
 Fix For: 0.3
 Attachments: update_site.patch

After several people had trouble finding the docs we provide in the wiki, I 
have created a slightly updated version of our website. I added a few links 
to wiki pages that might be of interest to potential Mahout users.

I have uploaded the updated version to http://people.apache.org/~isabel/site so 
all of you can have a look. I will commit on Tuesday next week if no one objects.




Re: 0.2 status

2009-11-12 Thread Isabel Drost

Adding and revising a little:

Apache Mahout 0.2 has been released and is now available for public
download at http://www.apache.org/dyn/closer.cgi/lucene/mahout

Up-to-date Maven artifacts can be found in the Apache repository at
https://repository.apache.org/content/repositories/releases/org/apache/mahout/


Apache Mahout is a subproject of Apache Lucene with the goal
of delivering scalable machine learning algorithm implementations
under the Apache license. http://www.apache.org/licenses/LICENSE-2.0

 Mahout is a machine learning library meant to scale to the size of
 data we manage today. Built on top of the powerful map/reduce
 paradigm of Apache Hadoop project, Mahout lets you run popular
 machine learning methods like clustering, collaborative filtering,
 classification over Terabytes of data over thousands of computers.

 - We may want to emphasize that using Mahout also makes sense for
 people who do not have clusters with thousands of nodes?

Mahout is a machine learning library meant to scale: Scale in terms of
community to support anyone interested in using machine learning. Scale
in terms of business by providing the library under a commercially
friendly, free software license. Scale in terms of computation to the
size of data we manage today.

Built on top of the powerful map/reduce paradigm of the Apache Hadoop
project, Mahout lets you solve popular machine learning problems
such as clustering, collaborative filtering and classification
over terabytes of data on thousands of computers.

Implemented with scalability in mind, the latest release brings many
performance optimizations so that the library performs well even in a
single-node setup.

 - As mentioned earlier by Grant, we do need performance benchmarks, at
 least for the next release, to prove that.


The complete changelist can be found here:
http://issues.apache.org/jira/browse/MAHOUT/fixforversion/12313278

New Mahout 0.2 features include
 
- Major performance enhancements in Collaborative Filtering,
Classification and Clustering
- New: Latent Dirichlet Allocation (LDA) implementation for topic
modelling
- New: Frequent Itemset Mining for mining top-k patterns from a list
of transactions
- New: Decision Forests implementation for Decision Tree classification
(In Memory and Partial Data)
- New: HBase storage support for Naive Bayes model building and
classification
- New: Generation of vectors from Text documents for use with Mahout
Algorithms
- Performance improvements in various Vector implementations
- Tons of bug fixes and code cleanup

Getting started: New to Mahout? 

1) Download Mahout at http://www.apache.org/dyn/closer.cgi/lucene/mahout
2) Check out the Quick start:
http://cwiki.apache.org/MAHOUT/quickstart.html
3) Read the Mahout Wiki: http://cwiki.apache.org/MAHOUT
4) Join the community by subscribing to mahout-u...@lucene.apache.org
5) Give back: http://www.apache.org/foundation/getinvolved.html
6) Consider adding yourself to the Powered By wiki page:
http://cwiki.apache.org/MAHOUT/poweredby.html

For more information on Apache Mahout, see
http://lucene.apache.org/mahout


Additional comment: I suppose I will copy this over to my personal
blog once the release is out. I would like to invite those interested
in or using Mahout to do the same.




Re: Informal Mahout MeetUp at ApacheCon Friday

2009-11-05 Thread Isabel Drost
On Friday 06 November 2009 04:27:03 Ted Dunning wrote:
 Pacific Coast Brewery is just down the street.  I am already meeting some
 folks there at about 5 (halfway related to Mahout, but only halfway).

+1

Isabel



Re: Feedback on release candidate for 0.2

2009-11-03 Thread Isabel Drost
On Tuesday 03 November 2009 15:45:08 Grant Ingersoll wrote:
 I agree, in general, we need to be able to get releases out faster and
 more reliable.  People also should, especially when it is near release
 time, be encouraged to try trunk, as we aren't going to be making
 drastic changes at that point and it is much better to get the testing
 out of the way up front.

I would hope that publishing nightly snapshots through Hudson on 
repository.apache.org lowers the bar for trying out trunk. Checking out 
and compiling trunk still involves far more work than simply switching the 
version of your Mahout dependency to a snapshot.

Isabel



Re: Feedback on release candidate for 0.2

2009-10-31 Thread Isabel Drost
On Friday 30 October 2009 22:16:59 Grant Ingersoll wrote:
 Hopefully, some of us Mahouts can carve out some time at ApacheCon to
 work.

I will arrive Monday afternoon and stay until the following Sunday morning, so 
I would guess there should be some time in between to work on the release.

Isabel

-- 
QOTD: Political history is far too criminal a subject to be a fit thing to 
teach children.   -- W. H. Auden 


Re: Feedback on release candidate for 0.2

2009-10-31 Thread Isabel Drost
On Sat, Oct 31, 2009 at 10:36:29AM -0700, Jake Mannix wrote:
 Speaking of which, I didn't see a Mahout meetup anywhere - are we planning
 on having  an informal one sometime?  Tuesday is the Lucene MeetUp, and Thurs 
 is Hadoop, we could go out for drinks or something after one of those two, or
 Friday night?

Friday night sounds good; after the Lucene or Hadoop meetup would be fine with 
me as well.

If it is Friday: should I include a little advertisement for the 
informal meetup in my talk on Mahout?

 Any interest?

+1

Isabel



Re: Success

2009-10-28 Thread Isabel Drost
On Tue Grant Ingersoll gsing...@apache.org wrote:
 Isabel, any idea where those things actually go?  That URL is not  
 browseable.

http://maven.apache.org/developers/release/releasing.html (5th and 6th
points)

These say that, for others to be able to view the artifacts, you first need
to log into Nexus and close the repository containing the release
candidate to further deployments:

 Right click on this repository and select Close. This will close the
 repository from future deployments and make it available for others to
 view. 

Currently Nexus does not let me log in, so I cannot verify whether I
can see your release :(



Isabel


Re: Success

2009-10-28 Thread Isabel Drost
On Wed Grant Ingersoll gsing...@apache.org wrote:

 Please look them over and give your thoughts on them, then if that  
 looks good, we can call a vote.

First of all, a big thanks from me as well to all who helped get through
the issues!

Looks good at first sight; I will have to dig deeper tomorrow. One
thing I noticed: the third-party dependencies (hadoop, commons, kosmosfs
and the like) are not signed.


  Currently Nexus does not let me login, so I cannot verify whether I
  might see your release :(
 
 
 It should be your SVN creds.

Just found out: Nexus does not like Konqueror (at least not the version
currently installed on my machine). Any other browser works.

Isabel


Re: Feedback on release candidate for 0.2

2009-10-28 Thread Isabel Drost
On Wed Sean Owen sro...@gmail.com wrote:

 Ran into this --

Currently, when I try to build, one of the tests fails for me.

 
 [INFO] [remote-resources:process {execution: default}]
 [ERROR] Error loading supplemental data models: Could not find
 resource 'supplemental-models.xml'.
 org.codehaus.plexus.resource.loader.ResourceNotFoundException: Could
 not find resource 'supplemental-models.xml'.
 
 I know we solved this by adding a file,
 src/main/appended-resources/supplemental-models.xml. I guess it just
 needs to be packaged. I'll look at that -- Isabel you might know more
 about this.

That file should contain licensing information for all artifacts that
we depend on through Maven and that are not described by Apache's
deployed resources. However, I do see it when unpacking the tar.gz file:
it is located under mahout-0.2/src/main/appended-resources/

More information on that:

http://maven.apache.org/plugins/maven-remote-resources-plugin/supplemental-models.html

Isabel


Re: Feedback on release candidate for 0.2

2009-10-28 Thread Isabel Drost
On Wed, 28 Oct 2009 16:03:51 +0100
Isabel Drost isa...@apache.org wrote:

 On Wed Sean Owen sro...@gmail.com wrote:
 
  Ran into this --
 
 Currently, when I try to build, one of the tests fails for me.

Sorry, I forgot to mention the failing test in my last mail:

(org.apache.mahout.clustering.kmeans.TestKmeansClustering) Time
elapsed: 18.9 sec   FAILURE!

Will test on my own laptop to see whether this is simply an environment
issue.

Isabel


Re: Release help, stuck on gpg-sign?

2009-10-26 Thread Isabel Drost
On Fri Grant Ingersoll gsing...@apache.org wrote:

 Why was gpg-plugin just added to the core pom and not higher up?
 All the artifacts produced need to be signed.

The gpg-plugin is part of the apache-root-pom. See also: 

http://svn.apache.org/viewvc/maven/pom/trunk/asf/pom.xml?revision=766951&view=markup

http://maven.apache.org/developers/release/releasing.html

If all our artifacts inherit from that, they should all get signed,
right?

Isabel


Re: TAR problems

2009-10-26 Thread Isabel Drost
On Mon Sean Owen sro...@gmail.com wrote:
 I don't know enough about GPG to know whether I should be seeing this
 at all (since my passphrase is already in settings.xml?) or how else
 this is supposed to work? does anyone see this?

You shouldn't be asked for the passphrase if you set it in your
settings.xml and build with the profile in which you set it.

So, if your settings.xml says:

<profiles>
  <profile>
    <id>apache-release</id>
    <properties>
      <gpg.passphrase>*</gpg.passphrase>
    </properties>
  </profile>
</profiles>

and you are building with mvn -Papache-release <goal>, you shouldn't be
asked for the passphrase.

Isabel


Re: Release help, stuck on gpg-sign?

2009-10-21 Thread Isabel Drost
On Tue Sean Owen sro...@gmail.com wrote:
 I wonder, could whoever did the 0.1 release give it a shot? to see if
 it's just me? and, to perhaps just do the deployment? the legwork is
 done, it's ready to publish.

mvn -Papache-release deploy

did the trick for me. Are you sure that gpg is on your path?

Although signing does work for me, the build fails as soon as it tries to
upload our Hadoop etc. jars to the Apache repo. I could not figure out
a way to make that work, so I am checking with Infra how that is intended
to be done with the Apache repository.

https://repository.apache.org/content/repositories/snapshots/org/apache/mahout/


Isabel


Re: Release help, stuck on gpg-sign?

2009-10-21 Thread Isabel Drost
On Wed Grant Ingersoll gsing...@apache.org wrote:

 Are you following: http://cwiki.apache.org/MAHOUT/how-to-release.html ?
 What step are you stuck on?

http://maven.apache.org/developers/release/releasing.html (it was sent to
mahout-dev by Jukka some weeks ago and is linked from
https://issues.apache.org/jira/browse/INFRA-1896, the INFRA JIRA issue
that deals with releasing to repository.apache.org). Perhaps I
just misunderstood some of the steps mentioned therein?

Isabel


Re: Release help, stuck on gpg-sign?

2009-10-21 Thread Isabel Drost
On Wed Grant Ingersoll gsing...@apache.org wrote:

 I'd like to make sure our Wiki properly reflects the steps, so once
 it is figured out, then our Wiki should be updated.

+1

Isabel


Re: Release help, stuck on gpg-sign?

2009-10-20 Thread Isabel Drost
On Tuesday 20 October 2009 17:11:59 Sean Owen wrote:
 release:prepare is hanging for me at...

 [INFO] [INFO] [gpg:sign {execution: default}]
 I don't think this has to do with the GPG signing I just added, as it
 shows up even if I remove that bit. Anyone more familiar with this? is
 my settings.xml OK?

Jukka gave me the following guide for releasing according to the new Apache 
parent pom:

http://maven.apache.org/developers/release/releasing.html

It has some additional hints on prerequisites, troubleshooting etc. Not sure 
whether that helps in your case.

Isabel



Re: Where is CHANGES.txt, and what are your banner changes for 0.2?

2009-10-19 Thread Isabel Drost
On Fri, 16 Oct 2009 14:23:52 -0400
Grant Ingersoll gsing...@apache.org wrote:

 We haven't been keeping a CHANGES, as we're just relying on JIRA's  
 ability to generate a list of what is in a version.

When using mvn site site:deploy to generate an HTML project report, you
can generate a changes report as well. Maven can be configured to talk to
JIRA to retrieve the current changes:

http://maven.apache.org/plugins/maven-changes-plugin/jira-report-mojo.html

Isabel


[jira] Resolved: (MAHOUT-171) Move deployment to repository.apache.org

2009-10-19 Thread Isabel Drost (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Isabel Drost resolved MAHOUT-171.
-

Resolution: Fixed

Checked in.

 Move deployment to repository.apache.org
 

 Key: MAHOUT-171
 URL: https://issues.apache.org/jira/browse/MAHOUT-171
 Project: Mahout
  Issue Type: Improvement
Affects Versions: 0.1
Reporter: Isabel Drost
Assignee: Isabel Drost
 Fix For: 0.2

 Attachments: MAHOUT-171.patch


 Opening a JIRA task to collect what has to be done to move over to the 
 Apache version 5 parent pom (see also 
 http://markmail.org/thread/ld26m3xxzoztqsk6 ).
* Link Apache parent pom into our pom.
* Update hudson to build via maven ( ? ).
* File subtask at INFRA-1896 to include mahout in repository.apache.org




[jira] Commented: (MAHOUT-171) Move deployment to repository.apache.org

2009-10-19 Thread Isabel Drost (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12767710#action_12767710
 ] 

Isabel Drost commented on MAHOUT-171:
-

It was my own fault: I forgot to svn add the file after applying and building 
with my own patch. Sorry :/

 Move deployment to repository.apache.org
 

 Key: MAHOUT-171
 URL: https://issues.apache.org/jira/browse/MAHOUT-171
 Project: Mahout
  Issue Type: Improvement
Affects Versions: 0.1
Reporter: Isabel Drost
Assignee: Isabel Drost
 Fix For: 0.2

 Attachments: MAHOUT-171.patch






[jira] Commented: (MAHOUT-157) Frequent Pattern Mining using Parallel FP-Growth

2009-10-15 Thread Isabel Drost (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12766030#action_12766030
 ] 

Isabel Drost commented on MAHOUT-157:
-

The patch looks good to me. Good work, Robin.

 Frequent Pattern Mining using Parallel FP-Growth
 

 Key: MAHOUT-157
 URL: https://issues.apache.org/jira/browse/MAHOUT-157
 Project: Mahout
  Issue Type: New Feature
  Components: Frequent Itemset/Association Rule Mining
Affects Versions: 0.2
Reporter: Robin Anil
Assignee: Robin Anil
 Fix For: 0.2

 Attachments: MAHOUT-157-August-17.patch, MAHOUT-157-August-24.patch, 
 MAHOUT-157-August-31.patch, MAHOUT-157-August-6.patch, 
 MAHOUT-157-codecleanup-javadocs.patch, 
 MAHOUT-157-Combinations-BSD-License.patch, 
 MAHOUT-157-Combinations-BSD-License.patch, 
 MAHOUT-157-CompactTransactionMapperFormat.patch, MAHOUT-157-final.patch, 
 MAHOUT-157-inProgress-August-5.patch, MAHOUT-157-Oct-1.patch, 
 MAHOUT-157-Oct-10.pfpgrowth.patch, MAHOUT-157-Oct-8.pfpgrowth.patch, 
 MAHOUT-157-Oct-8.TestedMapReducePipeline.patch, 
 MAHOUT-157-Oct-9.StreamingDBRead-Inprogress.patch, 
 MAHOUT-157-September-10.patch, MAHOUT-157-September-18.patch, 
 MAHOUT-157-September-5.patch


 Implement: http://infolab.stanford.edu/~echang/recsys08-69.pdf




[jira] Resolved: (MAHOUT-138) Convert main() methods to use Commons CLI for argument processing

2009-10-15 Thread Isabel Drost (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Isabel Drost resolved MAHOUT-138.
-

   Resolution: Fixed
Fix Version/s: (was: 0.3)
   0.2

The last commit changed the remaining classes, so at least grep no longer finds 
any usages of 'args\[' anywhere in our source code.
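The grep check mentioned above can also be expressed as a small stand-alone Java sketch, for example to run it as part of a build. The class name ArgsIndexScanner is hypothetical; it walks a source tree, looks only at *.java files, and reports every file:line that still matches the same 'args\[' pattern.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class ArgsIndexScanner {
  // Same expression as the grep check: the literal text "args" followed by '['.
  private static final Pattern ARGS_INDEX = Pattern.compile("args\\[");

  // Returns "relative/path/File.java:lineNumber" for every matching line, sorted by path.
  // Assumes UTF-8 encoded sources (Files.readAllLines fails on malformed input).
  static List<String> scan(Path root) throws IOException {
    List<Path> javaFiles;
    try (Stream<Path> paths = Files.walk(root)) {
      javaFiles = paths
          .filter(Files::isRegularFile)
          .filter(p -> p.toString().endsWith(".java"))
          .sorted()
          .collect(Collectors.toList());
    }
    List<String> hits = new ArrayList<>();
    for (Path file : javaFiles) {
      List<String> lines = Files.readAllLines(file);
      for (int i = 0; i < lines.size(); i++) {
        if (ARGS_INDEX.matcher(lines.get(i)).find()) {
          hits.add(root.relativize(file) + ":" + (i + 1));
        }
      }
    }
    return hits;
  }

  public static void main(String[] args) throws IOException {
    // Takes the source root as the first argument (default: current directory);
    // written without indexing into args so the scanner does not flag itself.
    Path root = Paths.get(Arrays.stream(args).findFirst().orElse("."));
    List<String> hits = scan(root);
    hits.forEach(System.out::println);
    System.out.println(hits.isEmpty() ? "no matches" : hits.size() + " match(es)");
  }
}
```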

 Convert main() methods to use Commons CLI for argument processing
 -

 Key: MAHOUT-138
 URL: https://issues.apache.org/jira/browse/MAHOUT-138
 Project: Mahout
  Issue Type: Improvement
Affects Versions: 0.2
Reporter: Grant Ingersoll
Assignee: Grant Ingersoll
Priority: Minor
 Fix For: 0.2

 Attachments: MAHOUT-138.patch, MAHOUT-138_fuzzyKMeansJob.patch


 Commons CLI is on the classpath and makes it much easier to handle command 
 line args, which are also more self-documenting when done right. We should 
 convert our main methods to use CLI.



