Call for Presentations FOSS Backstage open
Hi,

As announced at Berlin Buzzwords, we (that is Isabel Drost-Fromm, Stefan Rudnitzki as well as the eventing team over at newthinking communications GmbH) are working on a new conference in Berlin this summer. The name of this new conference will be "FOSS Backstage". Backstage comprises all things FOSS governance, open collaboration and how to build and manage communities within the open source space.

Submission URL: https://foss-backstage.de/call-papers

The event will comprise presentations on all things FOSS governance, decentralised decision making and open collaboration. We invite you to submit talks on the following topics:

* FOSS project governance, collaboration, community management.
* Asynchronous/decentralised decision making.
* Vendor neutrality in FOSS, sustainable FOSS, cross-team collaboration.
* Dealing with poisonous people.
* Project growth and hand-over.
* Trademarks.
* Strategic licensing.

While the event is primarily targeted at contributions from FOSS people, we would love to also learn more about how typical FOSS collaboration models work within enterprises. Closely related topics not explicitly listed above are welcome.

Important Dates (all dates in GMT +2)

Submission deadline: February 18th, 2018
Conference: June 13th/14th, 2018

High quality talks are called for, ranging from principles to practice. We are looking for real world case studies, background on the social architecture of specific projects and deep dives into cross community collaboration. Acceptance notifications will be sent out soon after the submission deadline. Please include your name, bio and email, the title of the talk and a brief abstract in English.

We have drafted the submission form to allow for regular talks, each 45 min in length. However, you are free to submit your own ideas on how to support the event: If you would like to take our attendees out to show them your favourite bar in Berlin, please submit this offer through the CfP form.

If you are interested in sponsoring the event (e.g. we would be happy to provide videos after the event, free drinks for attendees as well as an after-show party), please contact us.

Schedule and further updates on the event will be published soon on the event web page.

Please re-distribute this CfP to people who might be interested.

Contact us at:
newthinking communications GmbH
Schoenhauser Allee 6/7
10119 Berlin, Germany
i...@foss-backstage.de

Looking forward to meeting you all in person in summer :)

I would love to see all those tracks filled with lots of valuable talks on the Apache Way, on how we work, on how the incubator works, on how being a 501(c)(3) influences how people get involved and how projects are being run, on how being a member-run organisation is different, on merit for life, on growing communities, on things gone great - and things gone entirely wrong - in the ASF's history, on how to interact with Apache projects as a corporation and everything else you can think of.

Isabel

--
Sorry for any typos: Mail was typed in vim, written in mutt, via ssh (most likely involving some kind of mobile connection only.)
Re: [GSOC] 2010 Timelines
Timeline including Apache internal deadlines: http://cwiki.apache.org/confluence/display/COMDEVxSITE/GSoC Mentors, please also follow the link to the ranking explanation [1] for more information on how to rank student proposals. Isabel [1] http://cwiki.apache.org/confluence/display/COMDEVxSITE/Mentee+Ranking+Process
Re: Javadocs?
On Tue Grant Ingersoll gsing...@apache.org wrote: If we want, we can keep move aside the old ones and update the website to refer to each version. I think that would be great - now that we are slowly getting to a point where APIs seem to stabilise at least a bit, it would be great for users who don't upgrade to still be able to find the docs for their favourite version online. Isabel
Re: Javadocs?
On Tue Jake Mannix jake.man...@gmail.com wrote: (ie can't we also have daily updates of the 0.4-SNAPSHOT javadocs automagically posted up there too?) Yes - maven can do such a thing. I have configured a job on hudson to generate code reports for Mahout with maven - Javadocs are one part of these reports. They are linked to from the code quality reports page: http://lucene.apache.org/mahout/quality.html Click on core - project reports - java doc, for example: http://hudson.zones.apache.org/hudson/userContent/lucene-mahout/core-reports/apidocs/index.html Unfortunately I currently do not have the spare cycles to look into why the job broke lately. Anyone with more time than me and a tiny little bit of maven knowledge would be more than welcome to help out. The link to the hudson job: http://hudson.zones.apache.org/hudson/job/MahoutQM/ I would argue against publishing SNAPSHOT JavaDocs right next to the JavaDocs of official releases - it might trick users into thinking that the SNAPSHOT is an official release as well... Just my two cents - the community may feel otherwise. Isabel
Re: not a lot of mentors for GSoC
On Mon Grant Ingersoll gsing...@apache.org wrote: Mentoring sign up is on the GSOC site. You need to be a committer to be a mentor, at least for the ASF anyway. Please also identify yourself with your GsocLinkId at https://svn.apache.org/repos/private/committers/GsocLinkId.txt so Noirin knows who you are. Isabel
Re: Javadocs?
On Tue Grant Ingersoll gsing...@apache.org wrote: We're probably to the point now that we could start doing a nightly on Hudson if we aren't already. http://hudson.zones.apache.org/hudson/job/Mahout%20nightly/ ;) (At least this one tracks whether the project still builds and all unit tests pass.) The one for building reports, java docs etc. was configured to build less often - which would be fine, I think - and can be triggered manually in case of major changes anyway. Isabel
Fw: Mentors for GSoC
Potential GSoC mentors - please tell Noirin who you are, if you want to mentor a student for Mahout. More details below. If you have not done so already, please also subscribe to code-awa...@apache.org for more information on GSoC at Apache. Begin forwarded message: Date: Mon, 22 Mar 2010 15:48:17 +0100 From: Noirin Shirley noi...@apache.org To: code-awa...@apache.org Subject: Mentors for GSoC Thanks to all those who've already signed up to be mentors at http://socghop.appspot.com/ ! Unfortunately, the ASF is a big Foundation, and I don't know who all those who've signed up are. All I see is whatever's set as your LinkID and Public Name in your profile on the webapp. I can work out who Grant Ingersoll(gsingers) is, and I can even give a reasonable guess as to who isabel(isabel) might be, but relying on me to know the names of all the people who might mentor, and to be able to tell who's a student who's clicked the wrong button, isn't really going to scale! So please, it would make my job much easier if you could drop a mail to this list with your LinkID when you sign up to be a mentor :-) Thanks a million! Noirin
Re: Look! No more ISSUES
On Tue Sean Owen sro...@gmail.com wrote: I'm happy to play release engineer. Great - Thanks, Sean. Isabel
Re: 0.3 release issues
On Tue Sean Owen sro...@gmail.com wrote: Er, how do we do that? Is it something you can describe, I can document and do? It already has been described - and documented in our wiki: http://cwiki.apache.org/MAHOUT/thirdpartydependencies.html Hope that helps, Isabel
Re: 0.3 release issues
On Tue Grant Ingersoll gsing...@apache.org wrote:
> On Feb 23, 2010, at 9:18 AM, Sean Owen wrote:
>> It does look imminent. As much as I don't like holding out longer, and indefinitely, for this release, somehow I'd also really like to link to the latest/greatest and official Hadoop release. Let's try to be good about sticking to the code freeze -- good chance to focus on polish -- and if 0.20.2 isn't out by end of week, revisit this.
> +1. We might as well upgrade to the RC, too, by adding it as a dependency.

+1 (to both proposals)

Isabel
Re: Welcome Drew Farris
On 18.02.2010 Drew Farris wrote: I'm looking forward to working with you all, Welcome to the Mahout community, Drew. Looking forward to working with you. Isabel
Re: Mass Code Cleanup
On 14.02.2010 Grant Ingersoll wrote: I don't object to good style. I object to sweeping changes that break a lot of patches. Maybe not the case here, but it will be in the future and unless the whole thing is automated as part of committing (as Hadoop does), the code will always have formatting issues causing this exact same thing to happen. I kind of like the automatic patch checks that are activated for patches in jira over at Hadoop projects. It does highlight trivial problems with the code submitted w/o the need for a code review by a committer. Does anyone here at Mahout know what is needed for such checks? Isabel
Re: Mass Code Cleanup
On 15.02.2010 Robin Anil wrote: SGD kmeans++ pegasus seems fine. Isabel can you check with the latest trunk if the perceptron is alright? Any code I had is already checked in. Any examples I am working on should be easy to adopt. Isabel
Re: Mahout as TLP
On Sat Grant Ingersoll gsing...@apache.org wrote: I don't see any harm in getting 0.3 out first if that makes folks more comfortable. Yeah, this feels better to me the more I think about it. +1 from me as well: I really like the idea of Mahout becoming a TLP - even before a 1.0 release is available. However I think it makes sense to sort out the 0.3 release first. If I am counting correctly, that would make for three reasons for press releases: A new release, Mahout becoming a TLP and later on a 1.0 release. ;) Isabel
Re: Mahout 0.3 Plan and other changes
On Thu deneche abdelhakim adene...@gmail.com wrote: although I maintain two versions of Decision Forests, one with the old api and with the new one, the differences between the two APIs are so important that I can't just keep working on the two versions. Thus all the new stuff is being committed using the new API and as far as I can say it seems to work great. If I understand you correctly, there is code in Mahout that still works with the old API but also bits and pieces that depend on the new API. Do we have some documentation we can include in the release that tells users for which algorithms/ implementations they need to make sure they are running a Hadoop version that provides the new API? Isabel
Re: Mahout 0.3 Plan and other changes
On Wed, 10 Feb 2010 11:10:41 + Sean sro...@gmail.com wrote: For simplicity, I'd document that Mahout works on 0.19 and 0.20, and may work on 0.18 +1 Assuming that the majority of the algorithms work on e.g. 0.19, we could tell users something along the lines of: works with Hadoop 0.19, except $algorithms_for_20; may work with 0.18, no guarantee given. Isabel
Re: Some more dependencies
On Wed Jake Mannix jake.man...@gmail.com wrote: May I kick them out? +1 +1 from me as well. Isabel
Re: Mahout 0.3 Plan and other changes
On Wed Sean Owen sro...@gmail.com wrote: I'd say we recommend 0.20, since that's what we develop against and it's the current stable release, and everything we have works on it. We can also say it should work on 0.19 and 0.18, but we don't guarantee or support that. (Slightly different than my last suggestion -- we don't actually know how it all goes on 0.19) Sounds good to me. Isabel
[jira] Updated: (MAHOUT-281) scm urls are wrong in the poms
[ https://issues.apache.org/jira/browse/MAHOUT-281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Isabel Drost updated MAHOUT-281: Status: Patch Available (was: Open) scm urls are wrong in the poms -- Key: MAHOUT-281 URL: https://issues.apache.org/jira/browse/MAHOUT-281 Project: Mahout Issue Type: Bug Affects Versions: 0.3 Reporter: Benson Margulies Assignee: Benson Margulies Fix For: 0.3 Attachments: MAHOUT-281.diff The scm urls in the poms are wrong. This must be fixed before running the release plugin to make an 0.3 release. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAHOUT-281) scm urls are wrong in the poms
[ https://issues.apache.org/jira/browse/MAHOUT-281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Isabel Drost updated MAHOUT-281: Attachment: MAHOUT-281.diff Changed scm connection strings. (Needed a comparably simple example to show students at HPI how svn diff, patch and jira work together.) scm urls are wrong in the poms -- Key: MAHOUT-281 URL: https://issues.apache.org/jira/browse/MAHOUT-281 Project: Mahout Issue Type: Bug Affects Versions: 0.3 Reporter: Benson Margulies Assignee: Benson Margulies Fix For: 0.3 Attachments: MAHOUT-281.diff The scm urls in the poms are wrong. This must be fixed before running the release plugin to make an 0.3 release.
Re: GSOC 2010 is here
On Mon Robin Anil robin.a...@gmail.com wrote: 2. UIMA Integration with Mahout? (Maybe a good project if UIMA folks are taking in GSOC students) I guess one could easily split this one in two: a) Using UIMA (whole pipeline or just the analysers if that is possible) for data pre-processing before Mahout algorithms are run. b) Making it easy to integrate Mahout algorithms (classification models etc.) as UIMA annotators. Isabel
Re: Release thinking
On Mon Grant Ingersoll gsing...@apache.org wrote: MAHOUT-231 Upgrade QM reports to use Clover 2.6 No idea on this one. That should be independent of a release, I would think. It is. What would be needed is adjusting our pom and the Hudson job that builds the reports. Isabel
Re: Release thinking
On Mon Jake Mannix jake.man...@gmail.com wrote: On Mon, Jan 25, 2010 at 10:55 AM, Sean Owen sro...@gmail.com wrote: Agree that we should start planning 0.3, as it will take over a month I bet to actually be ready. +1 to releasing within a month or so. +1 here as well. I think it would be great to reach a shorter release cycle for Mahout. Isabel
Re: GSOC 2010 is here
On Wed Robin Anil robin.a...@gmail.com wrote: Greetings! Fellow GSOC alums, administrators and dear mentors, the next edition is right here. Details are given in the link below. https://groups.google.com/group/google-summer-of-code-discuss/browse_thread/thread/d839c0b02ac15b3f Some additional notes to committers: First of all, mentoring a GSoC student is a great experience, so if you do have some cycles left, I would highly recommend participating in GSoC as a mentor (thanks Grant for convincing me last year...). We had several successful students here at Mahout in past GSoC years. Each year there were strong proposals for projects within Mahout. As a result, projects usually turn out to be interesting for both mentor and student. One final note: If there is anyone on this list who might be interested in helping with general ASF GSoC logistics and administration tasks, please have a look at the newly founded community development project (d...@community.apache.org). Maybe we could identify key areas in Mahout which we need to develop apart from the ML implementations and list it down for students to see before they start trickling in. And motivate students to come up with their own ideas and discuss them on-list before submitting their submission. Some ideas: Benchmarking Framework with EC2 wrappers +1 I would love to see that. Commandline Console+Launcher like Hbase and hadoop +1 Online Tool/Query UI for Algorithms in Mahout (like CF) Possible ideas (I have no idea what I am talking here but there are nice problems to solve) Improvements in Math? How to tackle management of datasets? Error Recovery if a job fails? How to tackle management of learned classification models? Better tooling for Mahout integration? (Lucene for tokenization and analysers?, data import and export?) Isabel
Re: [jira] Commented: (MAHOUT-238) Further Dependency Cleanup
On Mon Grant Ingersoll gsing...@apache.org wrote: We put it up there. http://www.lucidimagination.com/search/document/621471200d2182bb/dependencies_outside_maven_central_was_oh_joy#621471200d2182bb is the link to the posting by Jukka explaining exactly how it was done. Isabel
[jira] Commented: (MAHOUT-262) Writable for labeled vectors for supervised learning algorithms
[ https://issues.apache.org/jira/browse/MAHOUT-262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12803690#action_12803690 ] Isabel Drost commented on MAHOUT-262: - Should be possible to apply the patch with -p1 instead of -p0 to remove the a/b directories. Writable for labeled vectors for supervised learning algorithms --- Key: MAHOUT-262 URL: https://issues.apache.org/jira/browse/MAHOUT-262 Project: Mahout Issue Type: New Feature Components: Classification Affects Versions: 0.2 Reporter: Olivier Grisel Fix For: 0.3 Attachments: MAHOUT-262-1.patch Implement two new classes: - SingleLabelVectorWritable for singly classified vectorized data item (one and only one label index per instance) - MultiLabelVectorWritable for multi categorized vectorized data item (0 or more category indexes per instance)
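The -p0 versus -p1 remark refers to how many leading path components patch strips from the file names in a diff; diffs whose paths start with a/ and b/ need one level stripped. The following is a minimal sketch of that stripping rule only, not of patch itself. The class and method names are hypothetical, and it ignores the detail that patch treats runs of adjacent slashes as a single slash.

```java
/** Hypothetical helper illustrating the path stripping behind patch -pN. */
final class PatchPaths {
    private PatchPaths() {}

    /**
     * Removes the smallest prefix of path that contains the given number
     * of slashes, which is how -p0, -p1, ... shorten file names.
     */
    static String stripComponents(String path, int strip) {
        int start = 0;
        for (int i = 0; i < strip; i++) {
            int slash = path.indexOf('/', start);
            if (slash < 0) {
                // Assumption of this sketch: too few components is an error.
                throw new IllegalArgumentException(
                    "cannot strip " + strip + " components from " + path);
            }
            start = slash + 1;
        }
        return path.substring(start);
    }

    public static void main(String[] args) {
        // -p0 keeps the a/ prefix, -p1 removes it.
        System.out.println(stripComponents("a/core/pom.xml", 0));
        System.out.println(stripComponents("a/core/pom.xml", 1));
    }
}
```

With a strip level of 1 the name a/core/pom.xml becomes core/pom.xml, which matches the layout of the working copy; with 0 it is left unchanged.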
[jira] Commented: (MAHOUT-217) Tidy up generated data after unit tests are run
[ https://issues.apache.org/jira/browse/MAHOUT-217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12803276#action_12803276 ]

Isabel Drost commented on MAHOUT-217:

The test files I found creating but not deleting data in the tmp directory:

./utils/src/test/java/org/apache/mahout/utils/vectors/io/VectorWriterTest.java
./utils/src/test/java/org/apache/mahout/utils/vectors/SequenceFileVectorIterableTest.java
./core/src/test/java/org/apache/mahout/classifier/bayes/BayesFileFormatterTest.java
./core/src/test/java/org/apache/mahout/cf/taste/impl/model/file/FileDataModelTest.java

Tidy up generated data after unit tests are run
Key: MAHOUT-217
URL: https://issues.apache.org/jira/browse/MAHOUT-217
Project: Mahout
Issue Type: Improvement
Affects Versions: 0.3
Reporter: Isabel Drost
Fix For: 0.3

I tried to compile Mahout on people.apache.org yesterday: The build failed at first, because tests could not generate test data. The reason: Some tests tried to generate test data at /tmp/mahout-dir/... - but those directories did exist already and belonged to Sean. Why? Probably because Sean had run the build earlier this year - but tests did not remove the data they generated. Proposed solution: Tests come with setup and with shutdown hooks. We should remove any data when a test is finished and shut down. Any thoughts?
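The clean-up proposed in this issue could look roughly like the sketch below: create the test data in a fresh directory during setup and delete it again on teardown, even if the test body fails. This is only an illustration under assumptions. The class and method names are hypothetical, the try/finally stands in for a test framework's setup and teardown hooks, and it relies on java.nio.file, which is newer than the Java versions the affected tests were written for. Using a unique directory per run instead of a fixed /tmp/mahout-dir path is an addition of this sketch, not part of the proposal.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical sketch of a test that tidies up the data it generates. */
final class TempDirCleanup {
    private TempDirCleanup() {}

    /** Deletes dir and everything below it, children before parents. */
    static void deleteRecursively(Path dir) throws IOException {
        if (!Files.exists(dir)) {
            return;
        }
        List<Path> paths;
        try (Stream<Path> walk = Files.walk(dir)) {
            // Reverse order puts deeper paths first, so directories are empty
            // by the time they are deleted.
            paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path path : paths) {
            Files.delete(path);
        }
    }

    /** Simulates one test run; returns true if nothing is left behind. */
    static boolean runAndCleanUp() {
        try {
            // "setup": a unique directory, so runs by different users cannot clash
            Path workDir = Files.createTempDirectory("mahout-test-");
            try {
                // "test body": generate some data
                Files.write(workDir.resolve("vectors.txt"),
                    "1.0 2.0\n".getBytes(StandardCharsets.UTF_8));
            } finally {
                // "teardown": always remove generated data
                deleteRecursively(workDir);
            }
            return !Files.exists(workDir);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("clean after run: " + runAndCleanUp());
    }
}
```

In a real test class the body of the finally block would move into the teardown hook, so that every test in the class gets the same guarantee.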
[jira] Commented: (MAHOUT-264) Make mahout-math compatible with Java 1.5 (bytecode and standard library).
[ https://issues.apache.org/jira/browse/MAHOUT-264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12803281#action_12803281 ] Isabel Drost commented on MAHOUT-264: - The changes to the pom look good. But why are the changes to Sorting.java and Arrays.java needed? Make mahout-math compatible with Java 1.5 (bytecode and standard library). -- Key: MAHOUT-264 URL: https://issues.apache.org/jira/browse/MAHOUT-264 Project: Mahout Issue Type: Wish Components: Math Reporter: Dawid Weiss Assignee: Benson Margulies Priority: Minor Attachments: MAHOUT-264.patch
Re: Status, IoC, Random numbers, etc.
On Mon Jake Mannix jake.man...@gmail.com wrote:
> I'm down with IoC, it's a great way to program to interfaces and abstract away your deep coupling, but open-source libraries I think aren't the best place for it.

+1 I agree with your assessment of DI containers: Spring is very powerful and can simplify wiring large applications together, especially with the right tools - despite the pain of reading xml files. However, I do not think we should tie Mahout users to a specific DI framework. I think it should be easy to customise the wiring of Mahout if you are already using DI. But the choice should be up to the user. Mahout should run w/o one out of the box.

I am wondering whether providing convenience constructors that set up the default wiring beside those that get dependencies injected might help our case? This is no proposal to heavily refactor all existing code, just an idea one might want to keep in mind when touching code anyway, when reviewing code etc.

> p.s. two other open source projects I work on - bobo-browse for faceted search, and zoie for realtime search, both *optionally* couple to Spring, in the sense that they both have their example apps that live with their source tree use them, but it's just for *apps* built on top of the libraries, not for the wiring of anything done inside.

Sounds like a nice approach to me: Using Spring (or Guice or whatever your favourite may be) in some of the examples or demo applications makes perfect sense.

Isabel
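The idea of offering a convenience constructor next to an injection constructor can be sketched as follows. All names here (Metric, EuclideanMetric, NearestPointFinder) are hypothetical and chosen for illustration; they are not Mahout classes. The point is only that the no-argument constructor supplies a default wiring, while the other constructor lets a caller or any DI container pass the dependency in.

```java
/** Hypothetical dependency: some way to measure distance between vectors. */
interface Metric {
    double distance(double[] a, double[] b);
}

/** Default implementation used when nothing is injected. */
final class EuclideanMetric implements Metric {
    @Override
    public double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }
}

/** Hypothetical component that works with or without a DI framework. */
final class NearestPointFinder {
    private final Metric metric;

    /** Convenience constructor: default wiring, usable out of the box. */
    NearestPointFinder() {
        this(new EuclideanMetric());
    }

    /** Injection constructor: the caller (or a DI container) chooses the metric. */
    NearestPointFinder(Metric metric) {
        this.metric = metric;
    }

    /** Returns the index of the candidate closest to query, or -1 if there are none. */
    int nearest(double[] query, double[][] candidates) {
        int best = -1;
        double bestDistance = Double.POSITIVE_INFINITY;
        for (int i = 0; i < candidates.length; i++) {
            double d = metric.distance(query, candidates[i]);
            if (d < bestDistance) {
                bestDistance = d;
                best = i;
            }
        }
        return best;
    }
}
```

Plain users would write new NearestPointFinder(), while a Spring or Guice configuration would target the one-argument constructor; the library itself depends on neither framework.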
[jira] Commented: (MAHOUT-242) LLR Collocation Identifier
[ https://issues.apache.org/jira/browse/MAHOUT-242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12803381#action_12803381 ]

Isabel Drost commented on MAHOUT-242:

{quote} I am not worried about them at this point. {quote}

Also not very worried - probably should have indicated that basically everything I found could be filed as trivial, minor or style question only...

LLR Collocation Identifier
Key: MAHOUT-242
URL: https://issues.apache.org/jira/browse/MAHOUT-242
Project: Mahout
Issue Type: New Feature
Affects Versions: 0.3
Reporter: Drew Farris
Priority: Minor
Attachments: MAHOUT-242.patch, mahout-colloc.tar.gz, mahout-colloc.tar.gz

Identifies interesting Collocations in text using ngrams scored via the LogLikelihoodRatio calculation. As discussed in:
* http://www.lucidimagination.com/search/document/d051123800ab6ce7/collocations_in_mahout#26634d6364c2c0d2
* http://www.lucidimagination.com/search/document/b8d5bb0745eef6e8/n_grams_for_terms#f16fa54417697d8e

Current form is a tar of a maven project that depends on mahout. Build as usual with 'mvn clean install', can be executed using:

{noformat}
mvn -e exec:java -Dexec.mainClass=org.apache.mahout.colloc.CollocDriver -Dexec.args="--input src/test/resources/article --colloc target/colloc --output target/output -w"
{noformat}

Output will be placed in target/output and can be viewed nicely using:

{noformat}
sort -rn -k1 target/output/part-0
{noformat}

Includes rudimentary unit tests. Please review and comment. Needs more work to get this into patch state and integrate with Robin's document vectorizer work in MAHOUT-237.

Some basic TODO/FIXME's include:
* use mahout math's ObjectInt map implementation when available
* make the analyzer configurable
* better input validation + negative unit tests.
* more flexible ways to generate units of analysis (n-1)grams.
[jira] Updated: (MAHOUT-246) upgrade to new lucene TokenStream API to cleanup deprecation warnings
[ https://issues.apache.org/jira/browse/MAHOUT-246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Isabel Drost updated MAHOUT-246: Resolution: Fixed Assignee: Olivier Grisel Status: Resolved (was: Patch Available) Patch applies cleanly with -p1, all tests still work, changes look good. Committed in revision 901791. upgrade to new lucene TokenStream API to cleanup deprecation warnings - Key: MAHOUT-246 URL: https://issues.apache.org/jira/browse/MAHOUT-246 Project: Mahout Issue Type: Improvement Affects Versions: 0.2 Reporter: Olivier Grisel Assignee: Olivier Grisel Priority: Minor Fix For: 0.3 Attachments: MAHOUT-246-2.patch The attached patch uses the new ts.incrementToken() / TermAttribute API instead of the deprecated manual Token handling. It also replaces two occurrences of the deprecated new StandardAnalyzer() with the more explicit new StandardAnalyzer(Version.LUCENE_CURRENT).
Re: Tapioca anyone (fisheye)
On Sun Benson Margulies bimargul...@gmail.com wrote: http://fisheye6.atlassian.com/browse/mahout Thanks for fisheye integration. Isabel
[jira] Commented: (MAHOUT-153) Implement kmeans++ for initial cluster selection in kmeans
[ https://issues.apache.org/jira/browse/MAHOUT-153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12801280#action_12801280 ] Isabel Drost commented on MAHOUT-153: - Welcome to Mahout. Thanks for stepping up and volunteering to take over the work for this issue. Implement kmeans++ for initial cluster selection in kmeans -- Key: MAHOUT-153 URL: https://issues.apache.org/jira/browse/MAHOUT-153 Project: Mahout Issue Type: New Feature Components: Clustering Affects Versions: 0.2 Environment: OS Independent Reporter: Panagiotis Papadimitriou Fix For: 0.3 Original Estimate: 336h Remaining Estimate: 336h The current implementation of k-means includes the following algorithms for initial cluster selection (seed selection): 1) random selection of k points, 2) use of canopy clusters. I plan to implement k-means++. The details of the algorithm are available here: http://www.stanford.edu/~darthur/kMeansPlusPlus.pdf. Design Outline: I will create an abstract class SeedGenerator and a subclass KMeansPlusPlusSeedGenerator. The existing class RandomSeedGenerator will become a subclass of SeedGenerator.
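To make the design outline more concrete, here is a small in-memory sketch of k-means++ seeding: the first seed is picked uniformly at random, and each further seed is drawn with probability proportional to its squared distance from the nearest seed chosen so far. The class names follow the outline in the issue, but the method signature, the use of plain double[] points and the single-machine setting are assumptions of this sketch; it is not the code that was contributed to Mahout.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Random;

/** Strategy for picking k initial cluster centres (hypothetical signature). */
abstract class SeedGenerator {
    abstract List<double[]> chooseSeeds(List<double[]> points, int k, Random random);
}

/** k-means++ seeding: new seeds are drawn proportionally to D(x)^2. */
final class KMeansPlusPlusSeedGenerator extends SeedGenerator {

    @Override
    List<double[]> chooseSeeds(List<double[]> points, int k, Random random) {
        if (k < 1 || k > points.size()) {
            throw new IllegalArgumentException("k must be between 1 and the number of points");
        }
        List<double[]> seeds = new ArrayList<>();
        // First seed: uniformly at random.
        seeds.add(points.get(random.nextInt(points.size())));

        // minSq[i] is the squared distance from point i to its nearest seed so far.
        double[] minSq = new double[points.size()];
        Arrays.fill(minSq, Double.POSITIVE_INFINITY);

        while (seeds.size() < k) {
            double[] latest = seeds.get(seeds.size() - 1);
            double total = 0.0;
            for (int i = 0; i < minSq.length; i++) {
                minSq[i] = Math.min(minSq[i], squaredDistance(points.get(i), latest));
                total += minSq[i];
            }
            seeds.add(points.get(sampleProportional(minSq, total, random)));
        }
        return seeds;
    }

    private static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return sum;
    }

    /** Draws an index with probability weights[i] / total. */
    private static int sampleProportional(double[] weights, double total, Random random) {
        if (total <= 0.0) {
            // Every point coincides with a seed already: fall back to uniform.
            return random.nextInt(weights.length);
        }
        double target = random.nextDouble() * total;
        double cumulative = 0.0;
        int lastPositive = -1;
        for (int i = 0; i < weights.length; i++) {
            if (weights[i] > 0.0) {
                cumulative += weights[i];
                lastPositive = i;
                if (cumulative > target) {
                    return i;
                }
            }
        }
        return lastPositive; // guards against floating-point rounding
    }
}
```

Because an already chosen point has distance zero to itself, it gets weight zero and cannot be drawn again as long as other distinct points remain. A RandomSeedGenerator subclass would implement the same method with uniform sampling only.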
Re: Welcome Benson Margulies as Mahout Committer
On Wed Grant Ingersoll gsing...@apache.org wrote: The Lucene PMC is pleased to welcome the addition of Benson Margulies as a committer on Mahout. Welcome Benson - thanks for all the great work you have done so far for the mahout-math stuff. Looking forward to working together with you. Isabel
Re: Fisheye?
On Wed Benson Margulies bimargul...@gmail.com wrote: Are we set up? If we are, then at least I am not aware of it. Isabel
Re: [math] no-such-integer value
On Mon Grant Ingersoll gsing...@apache.org wrote: I'm sensing a theme. I think for this stuff we should prune fairly aggressively, then add back in places once we have a need. +1 Isabel
[jira] Assigned: (MAHOUT-244) Add root log-likelihood method to LogLikehood class.
[ https://issues.apache.org/jira/browse/MAHOUT-244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Isabel Drost reassigned MAHOUT-244: --- Assignee: Drew Farris Add root log-likelihood method to LogLikehood class. Key: MAHOUT-244 URL: https://issues.apache.org/jira/browse/MAHOUT-244 Project: Mahout Issue Type: Improvement Components: Math Affects Versions: 0.3 Reporter: Drew Farris Assignee: Drew Farris Priority: Minor Fix For: 0.3 Attachments: MAHOUT-244.patch Per discussion at: http://www.lucidimagination.com/search/document/6dc8709e65a7ced1/llr_scoring_question This patch adds a method for root log-likelihood calculation to the existing LogLikelihood class + provides a unit test based on Shashi's numbers.
[jira] Updated: (MAHOUT-244) Add root log-likelihood method to LogLikehood class.
[ https://issues.apache.org/jira/browse/MAHOUT-244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Isabel Drost updated MAHOUT-244: Resolution: Fixed Status: Resolved (was: Patch Available) Patch applies cleanly and looks good, project builds with it, unit test is included. Committed at revision 899157. Add root log-likelihood method to LogLikehood class. Key: MAHOUT-244 URL: https://issues.apache.org/jira/browse/MAHOUT-244 Project: Mahout Issue Type: Improvement Components: Math Affects Versions: 0.3 Reporter: Drew Farris Assignee: Drew Farris Priority: Minor Fix For: 0.3 Attachments: MAHOUT-244.patch Per discussion at: http://www.lucidimagination.com/search/document/6dc8709e65a7ced1/llr_scoring_question This patch adds a method for root log-likelihood calculation to the existing LogLikelihood class + provides a unit test based on Shashi's numbers.
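For readers unfamiliar with the calculation, the sketch below shows one common way to compute the log-likelihood ratio for a 2x2 table of counts, written in its entropy form 2 * (H(rows) + H(columns) - H(cells)) with unnormalised entropies, together with a root variant. It assumes that "root log-likelihood" means the signed square root of that score, negative when the first row shows the event relatively less often than the second. The class name is hypothetical and the code is not taken from the attached patch.

```java
/** Hypothetical sketch of LLR scoring for a 2x2 contingency table. */
final class LogLikelihoodSketch {
    private LogLikelihoodSketch() {}

    private static double xLogX(long x) {
        return x == 0 ? 0.0 : x * Math.log(x);
    }

    /** Unnormalised entropy of a set of counts: N*log(N) - sum of x*log(x). */
    private static double entropy(long... counts) {
        long sum = 0;
        double result = 0.0;
        for (long count : counts) {
            sum += count;
            result += xLogX(count);
        }
        return xLogX(sum) - result;
    }

    /**
     * k11: both events together, k12: first without second,
     * k21: second without first, k22: neither.
     */
    static double logLikelihoodRatio(long k11, long k12, long k21, long k22) {
        double rowEntropy = entropy(k11 + k12, k21 + k22);
        double columnEntropy = entropy(k11 + k21, k12 + k22);
        double matrixEntropy = entropy(k11, k12, k21, k22);
        // Rounding can push the result slightly below zero.
        return Math.max(0.0, 2.0 * (rowEntropy + columnEntropy - matrixEntropy));
    }

    /** Signed square root of the LLR score (assumed definition, see lead-in). */
    static double rootLogLikelihoodRatio(long k11, long k12, long k21, long k22) {
        double root = Math.sqrt(logLikelihoodRatio(k11, k12, k21, k22));
        if ((double) k11 / (k11 + k12) < (double) k21 / (k21 + k22)) {
            root = -root;
        }
        return root;
    }

    public static void main(String[] args) {
        System.out.println(logLikelihoodRatio(1, 0, 0, 1));
        System.out.println(rootLogLikelihoodRatio(0, 1, 1, 0));
    }
}
```

For the table (1, 0, 0, 1) the score is 4 ln 2, roughly 2.77, and a perfectly independent table such as (10, 10, 10, 10) scores zero. The signed root keeps the direction of the association, which the plain LLR score discards.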
Re: New MEAP: Mahout in Action
On 15.01.2010 Grant Ingersoll wrote: (BTW, great read so far, I've got 3 more chapters to go in the first 6!) Can second that: Great book indeed. We should state up front, just like in Lucene land, that anyone who has a book on Mahout is welcome to link it on the page. The more books on Mahout the merrier! +1 (and we should probably motivate people who are publishing articles or giving talks on Mahout to add links to their publications on the Books, Articles, Talks wiki page as well). Isabel
[jira] Resolved: (MAHOUT-85) Perceptron/Winnow Trainer
[ https://issues.apache.org/jira/browse/MAHOUT-85?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Isabel Drost resolved MAHOUT-85. Resolution: Fixed Finally committed. Perceptron/Winnow Trainer - Key: MAHOUT-85 URL: https://issues.apache.org/jira/browse/MAHOUT-85 Project: Mahout Issue Type: New Feature Components: Classification Affects Versions: 0.1 Reporter: Isabel Drost Assignee: Isabel Drost Fix For: 0.3 Attachments: MAHOUT-85.patch, MAHOUT-85.patch, perceptronWinnowTrainer.diff Please find attached a first sketch for perceptron and winnow training. Please look very, very carefully at the patch, as I added the heart of the algorithms in the emergency room at Charite Berlin (after I broke my leg when cycling to the Hadoop Get Together ;) ). The patch does not yet feature unit tests nor is it parallelised. Currently my plan is to set up an example with the webKb dataset, add unit tests to the code and after that go parallel. I would like to get some feedback early on, in addition I would feel a lot better, if a second and third pair of eyes had a look at the code to make sure all obvious mistakes are out as early as possible.
[jira] Created: (MAHOUT-240) Parallel version of Perceptron
Parallel version of Perceptron -- Key: MAHOUT-240 URL: https://issues.apache.org/jira/browse/MAHOUT-240 Project: Mahout Issue Type: Improvement Components: Classification Affects Versions: 0.3 Reporter: Isabel Drost Fix For: 0.3 So far Perceptron (as well as Winnow) training is still implemented to run w/o parallelization. The goal of this issue is to explore ways for parallelization and if possible to provide a parallel version, that is one that is based on map reduce.
[jira] Created: (MAHOUT-241) Example for perceptron
Example for perceptron -- Key: MAHOUT-241 URL: https://issues.apache.org/jira/browse/MAHOUT-241 Project: Mahout Issue Type: Improvement Components: Classification Affects Versions: 0.3 Reporter: Isabel Drost Fix For: 0.3 The goal is to provide an end-to-end example based on the 20-newsgroups dataset to show how to get from a set of labelled training examples to a trained model that can later be reused.
Re: How to apply these patches
On Saturday 19 December 2009 16:15:46 Drew Farris wrote: Gang, should the wiki (http://cwiki.apache.org/MAHOUT/howtocontribute.html) be updated to include -E? Sure*. Isabel * The wiki is open for edits by anyone. All you need is a wiki account which you can create without being a committer. -- |\ _,,,---,,_ Web: http://www.isabel-drost.de /,`.-'`'-. ;-;;,_ |,4- ) )-,_..;\ ( `'-' '---''(_/--' `-'\_) (fL) IM: xmpp://main...@spaceboyz.net signature.asc Description: This is a digitally signed message part.
Re: Eclipse and checkstyle
On Saturday 19 December 2009 16:30:31 Benson Margulies wrote: Since you've got a checkstyle set that you like, can I go ahead and build the profile for setting up eclipse to use it? Sure. There should already be a checkstyle file checked in (maven module) - feel free to use that or replace it with one that matches the style used for Lucene as well. Isabel
Re: [math]: how to test sorts
On Wednesday 23 December 2009 22:09:48 Grant Ingersoll wrote: Beyond that, we could start implementing Clover test coverage, I suppose. It comes with the code quality reports added earlier. They are generated on a daily basis through Hudson and are linked to in the dev section of our web page. (The maven options to generate the reports locally are documented in MAHOUT-210). However we should update report generation to Clover version 2.6. Isabel
[jira] Created: (MAHOUT-231) Upgrade QM reports to use Clover 2.6
Upgrade QM reports to use Clover 2.6 Key: MAHOUT-231 URL: https://issues.apache.org/jira/browse/MAHOUT-231 Project: Mahout Issue Type: Task Components: Website Affects Versions: 0.3 Reporter: Isabel Drost Priority: Minor Fix For: 0.3 Atlassian has donated a license for a new Clover version. Its reports provide more information and are easier to read. We should upgrade the site reports to use that version.
[jira] Updated: (MAHOUT-85) Perceptron/Winnow Trainer
[ https://issues.apache.org/jira/browse/MAHOUT-85?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Isabel Drost updated MAHOUT-85: --- Attachment: MAHOUT-85.patch The patch adds tests to the implementation. The additional abstraction proposed earlier is integrated. The distance measure is not configurable but corresponds to what was defined in the original algorithm formulations. The implementation is currently sequential-only. I am still evaluating whether and how it might be possible to parallelize. Missing so far: an example showing how to use training, how to store the resulting model and how to apply the model. This should probably be done in a new issue to keep this one focused on the algorithm itself. In addition, I still have to at least add links from our wiki to the Wikipedia pages on both algorithms. (Had some time left during the past few days: Screws in my knee are out now ;) ) Perceptron/Winnow Trainer - Key: MAHOUT-85 URL: https://issues.apache.org/jira/browse/MAHOUT-85 Project: Mahout Issue Type: New Feature Components: Classification Affects Versions: 0.1 Reporter: Isabel Drost Assignee: Isabel Drost Fix For: 0.3 Attachments: MAHOUT-85.patch, perceptronWinnowTrainer.diff
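For readers unfamiliar with the second algorithm named in this issue, the following is a small Winnow sketch in plain Java. It assumes boolean features, a promotion factor alpha and division by alpha on false positives (the variant usually called Winnow2); the class name, the threshold choice and the toy data are hypothetical and say nothing about how the MAHOUT-85 patch is written.

```java
import java.util.Arrays;

/** Minimal Winnow learner over boolean features (illustrative only). */
final class WinnowSketch {
    private final double[] weights;
    private final double threshold;
    private final double alpha;

    WinnowSketch(int dimension, double threshold, double alpha) {
        this.weights = new double[dimension];
        Arrays.fill(this.weights, 1.0); // Winnow starts from all-ones weights
        this.threshold = threshold;
        this.alpha = alpha;
    }

    /** Predicts true when the summed weight of the active features exceeds the threshold. */
    boolean classify(boolean[] x) {
        double sum = 0.0;
        for (int i = 0; i < weights.length; i++) {
            if (x[i]) {
                sum += weights[i];
            }
        }
        return sum > threshold;
    }

    /**
     * Multiplicative, mistake-driven updates: active weights are multiplied by
     * alpha after a false negative and divided by alpha after a false positive.
     * Returns the number of mistakes in the last epoch run.
     */
    int train(boolean[][] examples, boolean[] labels, int maxEpochs) {
        int mistakes = 0;
        for (int epoch = 0; epoch < maxEpochs; epoch++) {
            mistakes = 0;
            for (int n = 0; n < examples.length; n++) {
                if (classify(examples[n]) == labels[n]) {
                    continue;
                }
                double factor = labels[n] ? alpha : 1.0 / alpha;
                for (int i = 0; i < weights.length; i++) {
                    if (examples[n][i]) {
                        weights[i] *= factor;
                    }
                }
                mistakes++;
            }
            if (mistakes == 0) {
                break;
            }
        }
        return mistakes;
    }

    public static void main(String[] args) {
        // Target concept: x0 OR x1; the third feature is irrelevant.
        boolean[][] xs = {
            {false, false, false}, {false, false, true}, {false, true, false}, {false, true, true},
            {true, false, false}, {true, false, true}, {true, true, false}, {true, true, true}
        };
        boolean[] ys = {false, false, true, true, true, true, true, true};
        WinnowSketch model = new WinnowSketch(3, 1.5, 2.0);
        System.out.println("mistakes in last epoch: " + model.train(xs, ys, 50));
    }
}
```

The contrast with the perceptron is the update rule: additive there, multiplicative here, which is why Winnow is often preferred when many features are irrelevant.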
[jira] Commented: (MAHOUT-210) Publish code quality reports through maven
[ https://issues.apache.org/jira/browse/MAHOUT-210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12792449#action_12792449 ] Isabel Drost commented on MAHOUT-210: - Forgot to include what I changed to make it work: It seems the workspace directory on Hudson is only accessible to users logged in to Hudson. So I changed the job to stage the generated site to a publicly accessible directory and adjusted the links accordingly. To get Clover to work I gave maven the path to the Clover license on Hudson and issued report generation and aggregation before the site is generated. The maven parameters used for building:
-Dmaven.clover.license=$PATH - path to the clover license file
clean install - cleans the target directories, then builds and locally installs the artifacts
clover:instrument clover:aggregate - generates the clover reports
site:site - generates the maven site report files and stores them under $module/target/site for review
site:stage -DstagingDirectory=/export/home/hudson/hudson/jobs/MahoutQM/site - stages the maven report files in a publicly readable directory
Publish code quality reports through maven -- Key: MAHOUT-210 URL: https://issues.apache.org/jira/browse/MAHOUT-210 Project: Mahout Issue Type: New Feature Components: Website Affects Versions: 0.1, 0.2 Reporter: Isabel Drost Assignee: Isabel Drost Fix For: 0.3 Attachments: MAHOUT-210.patch We should use mvn site:site to generate code reports and publish them online for users to review and developers to easily spot problems. First version that still needs checks adjusted to our needs is available online at: http://people.apache.org/~isabel/mahout_site/mahout-core/project-reports.html Further discussion on-list at http://www.lucidimagination.com/search/document/a13aa5127b47fda3/publish_code_quality_reports_on_web_site##a13aa5127b47fda3
[jira] Commented: (MAHOUT-210) Publish code quality reports through maven
[ https://issues.apache.org/jira/browse/MAHOUT-210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12791887#action_12791887 ] Isabel Drost commented on MAHOUT-210: - Checked in the current status of the report configuration files. Feel free to adjust any configuration that does not quite fit our standards yet. I tried to address the issues mentioned by Sean earlier in the mail thread. I set up a Hudson job to build the documentation and linked it such that it gets published through Hudson. The URLs for that:
http://hudson.zones.apache.org/hudson/userContent/lucene-mahout/core-reports/index.html
http://hudson.zones.apache.org/hudson/userContent/lucene-mahout/examples-reports/index.html
http://hudson.zones.apache.org/hudson/userContent/lucene-mahout/matrix-reports/index.html
http://hudson.zones.apache.org/hudson/userContent/lucene-mahout/maven-reports/index.html
http://hudson.zones.apache.org/hudson/userContent/lucene-mahout/taste-web-reports/index.html
http://hudson.zones.apache.org/hudson/userContent/lucene-mahout/utils-reports/index.html
Those URLs were activated according to the description of Bhuvaneswaran A on infrastruct...@apache:
1) Set up a Hudson job to generate the reports.
2) Log in to hud...@hudson.zones.apache.org and create a symbolic link:
{code}
$ sudo su - hudson
$ cd hudson/userContent
$ ln -s /export/home/hudson/hudson/jobs/Mahout\ QM/$PATH_TO_DOCS ./lucene-mahout/$MODULE-reports
{code}
3) Access via http://hudson.zones.apache.org/hudson/userContent/lucene-mahout/$MODULE-reports/index.html
The site should be regenerated once a day. Once that is done today, the pages available on Hudson should match those I already published on people.apache.org. I am about to add links to the reports to our project page (this is going to be a separate page in the developers section). Missing: the Clover test coverage reports are not yet being generated - I need to change the Hudson job to pick up the Clover license file for that.
Publish code quality reports through maven -- Key: MAHOUT-210 URL: https://issues.apache.org/jira/browse/MAHOUT-210 Project: Mahout Issue Type: New Feature Components: Website Affects Versions: 0.1, 0.2 Reporter: Isabel Drost Assignee: Isabel Drost Fix For: 0.3 Attachments: MAHOUT-210.patch
[jira] Commented: (MAHOUT-220) Mahout Bayes Code cleanup
[ https://issues.apache.org/jira/browse/MAHOUT-220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12790653#action_12790653 ] Isabel Drost commented on MAHOUT-220: - Before reorganizing code - could someone who is more familiar with the specific rules of the code-style used at Lucene double-check the exact checkstyle rules used for site-generation? I reused the checkstyle configuration that was already in Mahout-trunk (relaxing some of its rules) but am in doubt whether it really reflects our rules. Mahout Bayes Code cleanup - Key: MAHOUT-220 URL: https://issues.apache.org/jira/browse/MAHOUT-220 Project: Mahout Issue Type: Improvement Components: Classification Affects Versions: 0.3 Reporter: Robin Anil Assignee: Robin Anil Fix For: 0.2 Attachments: MAHOUT-BAYES.patch Following isabel's checkstyle, I am adding a whole slew of code cleanup with the following exceptions 1. Line length used is 120 instead of 80. 2. static final log is kept as is. not LOG. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAHOUT-224) Dependency Cleanup
[ https://issues.apache.org/jira/browse/MAHOUT-224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12790658#action_12790658 ] Isabel Drost commented on MAHOUT-224: - Maven supports marking dependencies as needed for tests only (would be appropriate for junit), or as provided by the user (might be appropriate for the Hadoop stuff that I think is needed only at compile time but is available on the Hadoop cluster when deploying Mahout, right?). This should reduce the number of jars that need to be distributed as well. But that can be addressed in a separate issue. Dependency Cleanup -- Key: MAHOUT-224 URL: https://issues.apache.org/jira/browse/MAHOUT-224 Project: Mahout Issue Type: Improvement Affects Versions: 0.2 Reporter: Drew Farris Assignee: Drew Farris Priority: Minor Attachments: mahout-224.patch In preparation for the binary release work described in MAHOUT-215, here's a minor patch that does some cleanup on the poms. The hadoop and junit dependency versions are now established using the dependencyManagement section of the parent pom in mahout/maven/pom.xml. A large number of transitive dependencies from the hadoop pom are now excluded there as well -- these were not necessary previously because the hadoop dependency was hand-rolled and did not include them. With the update to the hadoop 0.20.2-SNAPSHOT, they now become required. Also, the parent pom no longer has mahout/pom.xml as its parent; this allows binary packaging to be performed in mahout/pom.xml after the build of all of the other sub-modules is complete. Also, removed the javamail dependency -- was there a reason this was present? Verified that build and unit tests complete.
[jira] Commented: (MAHOUT-85) Perceptron/Winnow Trainer
[ https://issues.apache.org/jira/browse/MAHOUT-85?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12789312#action_12789312 ] Isabel Drost commented on MAHOUT-85: I am currently adding tests. I guess I will commit once I have those done and go on with a parallel version from there. Perceptron/Winnow Trainer - Key: MAHOUT-85 URL: https://issues.apache.org/jira/browse/MAHOUT-85 Project: Mahout Issue Type: New Feature Components: Classification Affects Versions: 0.1 Reporter: Isabel Drost Assignee: Isabel Drost Fix For: 0.3 Attachments: perceptronWinnowTrainer.diff
[jira] Updated: (MAHOUT-210) Publish code quality reports through maven
[ https://issues.apache.org/jira/browse/MAHOUT-210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Isabel Drost updated MAHOUT-210: Attachment: MAHOUT-210.patch The patch adds clover, findbugs, pmd, cpd and maven dependency reports as well as javadoc generation. After applying it, the site can be generated through mvn site:site - I have thrown out all general project information that is already available through our Forrest site. The plan is to run mvn clean install site:site site:deploy on a daily (maybe weekly?) basis on people.apache.org and publish the results there so they can be linked to from our site. Publish code quality reports through maven -- Key: MAHOUT-210 URL: https://issues.apache.org/jira/browse/MAHOUT-210 Project: Mahout Issue Type: New Feature Components: Website Affects Versions: 0.1, 0.2 Reporter: Isabel Drost Assignee: Isabel Drost Fix For: 0.3 Attachments: MAHOUT-210.patch
[jira] Updated: (MAHOUT-11) Static fields used throughout clustering code (Canopy, K-Means).
[ https://issues.apache.org/jira/browse/MAHOUT-11?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Isabel Drost updated MAHOUT-11: --- Resolution: Fixed Status: Resolved (was: Patch Available) Committed. Thanks Drew for your help. Static fields used throughout clustering code (Canopy, K-Means). Key: MAHOUT-11 URL: https://issues.apache.org/jira/browse/MAHOUT-11 Project: Mahout Issue Type: Bug Components: Clustering Affects Versions: 0.1 Reporter: Dawid Weiss Fix For: 0.3 Attachments: MAHOUT-11-all-cleanup-20091128.patch, MAHOUT-11-kmeans-cleanup.patch, MAHOUT-11-RandomSeedGenerator.patch, MAHOUT-11.patch I file this as a bug, even though I'm not 100% sure it is one. In the current code the information is exchanged via static fields (for example, distance measure and thresholds for Canopies are static fields). Is it always true in Hadoop that one job runs inside one JVM with exclusive access? I haven't seen it anywhere in Hadoop documentation and my impression was that everything uses JobConf to pass configuration to jobs, but jobs are configured on a per-object basis (a job is an object, a mapper is an object and everything else is basically an object). If it's possible for two jobs to run in parallel inside one JVM then this is a limitation and bug in our code that needs to be addressed.
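The concern raised in this issue can be illustrated without Hadoop. The sketch below is plain Java with hypothetical names throughout: a Map stands in for a job configuration such as JobConf, and "canopy.t1" is a made-up key. It shows how a threshold held in a static field leaks between two logical jobs in the same JVM, while a threshold read into an instance field at construction time does not.

```java
import java.util.HashMap;
import java.util.Map;

/** Anti-pattern: the threshold lives in a static field shared by the whole JVM. */
final class StaticThresholdClusterer {
    static double t1; // one value for every instance, whichever job set it last

    static void configure(double threshold) {
        t1 = threshold;
    }

    boolean covers(double distance) {
        return distance < t1;
    }
}

/** Alternative: each instance copies its settings from the configuration it is given. */
final class InstanceThresholdClusterer {
    private final double t1;

    InstanceThresholdClusterer(Map<String, String> conf) {
        // "canopy.t1" is a hypothetical key used only in this sketch.
        this.t1 = Double.parseDouble(conf.get("canopy.t1"));
    }

    boolean covers(double distance) {
        return distance < t1;
    }
}

final class StaticFieldDemo {
    static Map<String, String> conf(double t1) {
        Map<String, String> conf = new HashMap<>();
        conf.put("canopy.t1", Double.toString(t1));
        return conf;
    }

    public static void main(String[] args) {
        // Two "jobs" configured one after the other in the same JVM.
        StaticThresholdClusterer.configure(3.0);
        StaticThresholdClusterer jobA = new StaticThresholdClusterer();
        StaticThresholdClusterer.configure(1.0);
        StaticThresholdClusterer jobB = new StaticThresholdClusterer();
        // jobA was meant to use 3.0 but now sees jobB's 1.0.
        System.out.println("static, jobA covers 2.0: " + jobA.covers(2.0));   // false
        System.out.println("static, jobB covers 2.0: " + jobB.covers(2.0));   // false

        InstanceThresholdClusterer a = new InstanceThresholdClusterer(conf(3.0));
        InstanceThresholdClusterer b = new InstanceThresholdClusterer(conf(1.0));
        System.out.println("instance, jobA covers 2.0: " + a.covers(2.0));    // true
        System.out.println("instance, jobB covers 2.0: " + b.covers(2.0));    // false
    }
}
```

Whether two Hadoop jobs can in fact share a JVM is exactly the open question in the issue; the sketch only shows why the per-instance style is safe either way.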
[jira] Assigned: (MAHOUT-11) Static fields used throughout clustering code (Canopy, K-Means).
[ https://issues.apache.org/jira/browse/MAHOUT-11?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Isabel Drost reassigned MAHOUT-11: -- Assignee: Isabel Drost Static fields used throughout clustering code (Canopy, K-Means). Key: MAHOUT-11 URL: https://issues.apache.org/jira/browse/MAHOUT-11 Project: Mahout Issue Type: Bug Components: Clustering Affects Versions: 0.1 Reporter: Dawid Weiss Assignee: Isabel Drost Fix For: 0.3 Attachments: MAHOUT-11-all-cleanup-20091128.patch, MAHOUT-11-kmeans-cleanup.patch, MAHOUT-11-RandomSeedGenerator.patch, MAHOUT-11.patch
[jira] Assigned: (MAHOUT-11) Static fields used throughout clustering code (Canopy, K-Means).
[ https://issues.apache.org/jira/browse/MAHOUT-11?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Isabel Drost reassigned MAHOUT-11: -- Assignee: Drew Farris (was: Isabel Drost) Thanks. Static fields used throughout clustering code (Canopy, K-Means). Key: MAHOUT-11 URL: https://issues.apache.org/jira/browse/MAHOUT-11 Project: Mahout Issue Type: Bug Components: Clustering Affects Versions: 0.1 Reporter: Dawid Weiss Assignee: Drew Farris Fix For: 0.3 Attachments: MAHOUT-11-all-cleanup-20091128.patch, MAHOUT-11-kmeans-cleanup.patch, MAHOUT-11-RandomSeedGenerator.patch, MAHOUT-11.patch
Re: [jira] Assigned: (MAHOUT-11) Static fields used throughout clustering code (Canopy, K-Means).
On Thu Sean Owen sro...@gmail.com wrote: Looks like Hudson is saying that broke the build but looks like easily addressable stuff. Fixed it - but only shortly *after* Hudson had already started building the project :/ Triggered the build on Hudson manually a few minutes ago - now it runs successfully again. Isabel
[jira] Assigned: (MAHOUT-210) Publish code quality reports through maven
[ https://issues.apache.org/jira/browse/MAHOUT-210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Isabel Drost reassigned MAHOUT-210: --- Assignee: Isabel Drost Publish code quality reports through maven -- Key: MAHOUT-210 URL: https://issues.apache.org/jira/browse/MAHOUT-210 Project: Mahout Issue Type: New Feature Components: Website Affects Versions: 0.1, 0.2 Reporter: Isabel Drost Assignee: Isabel Drost Fix For: 0.3
[jira] Commented: (MAHOUT-11) Static fields used throughout clustering code (Canopy, K-Means).
[ https://issues.apache.org/jira/browse/MAHOUT-11?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12788129#action_12788129 ] Isabel Drost commented on MAHOUT-11: I'll make the changes before committing - no need to submit a new patch version. Static fields used throughout clustering code (Canopy, K-Means). Key: MAHOUT-11 URL: https://issues.apache.org/jira/browse/MAHOUT-11 Project: Mahout Issue Type: Bug Components: Clustering Affects Versions: 0.1 Reporter: Dawid Weiss Fix For: 0.3 Attachments: MAHOUT-11-all-cleanup-20091128.patch, MAHOUT-11-kmeans-cleanup.patch, MAHOUT-11-RandomSeedGenerator.patch, MAHOUT-11.patch
[jira] Resolved: (MAHOUT-90) Adding all scripts (for nightly build) to SVN repository.
[ https://issues.apache.org/jira/browse/MAHOUT-90?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Isabel Drost resolved MAHOUT-90. Resolution: Later Marked as Later - currently snapshots are published to the apache maven repository. At the moment that should be enough for users to play around with latest code. Adding all scripts (for nightly build) to SVN repository. - Key: MAHOUT-90 URL: https://issues.apache.org/jira/browse/MAHOUT-90 Project: Mahout Issue Type: New Feature Reporter: Edward J. Yoon Priority: Minor Fix For: 0.3 Attachments: mahout.tgz I made below scripts for the hudson continuous integration service on my hudson account. mahout/hudsonBuildMahoutPatch.sh mahout/processMahoutPatchEmail.sh mahout/hudsonPatchQueueAdmin.sh They will be modified by only me, so It should be handled via SVN. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAHOUT-90) Adding all scripts (for nightly build) to SVN repository.
[ https://issues.apache.org/jira/browse/MAHOUT-90?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12786678#action_12786678 ] Isabel Drost commented on MAHOUT-90: I did add a hudson job to upload maven snapshots of our projects to the apache repository on a nightly basis. No idea however how building and publishing nightly releases should work at Apache. Adding all scripts (for nightly build) to SVN repository. - Key: MAHOUT-90 URL: https://issues.apache.org/jira/browse/MAHOUT-90 Project: Mahout Issue Type: New Feature Reporter: Edward J. Yoon Assignee: Isabel Drost Priority: Minor Fix For: 0.3 Attachments: mahout.tgz
[jira] Assigned: (MAHOUT-90) Adding all scripts (for nightly build) to SVN repository.
[ https://issues.apache.org/jira/browse/MAHOUT-90?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Isabel Drost reassigned MAHOUT-90: -- Assignee: (was: Isabel Drost) Adding all scripts (for nightly build) to SVN repository. - Key: MAHOUT-90 URL: https://issues.apache.org/jira/browse/MAHOUT-90 Project: Mahout Issue Type: New Feature Reporter: Edward J. Yoon Priority: Minor Fix For: 0.3 Attachments: mahout.tgz
[jira] Commented: (MAHOUT-11) Static fields used throughout clustering code (Canopy, K-Means).
[ https://issues.apache.org/jira/browse/MAHOUT-11?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12785985#action_12785985 ] Isabel Drost commented on MAHOUT-11: Applies cleanly and builds w/o unit test failures here. The changes all look good to me. Great work, Drew. One question though: In the TestMeanShift test (lines 301 and 304) you removed the canopyId adjustments - could you please explain why this was necessary? I would like to commit this patch next week if no one objects. Static fields used throughout clustering code (Canopy, K-Means). Key: MAHOUT-11 URL: https://issues.apache.org/jira/browse/MAHOUT-11 Project: Mahout Issue Type: Bug Components: Clustering Affects Versions: 0.1 Reporter: Dawid Weiss Fix For: 0.3 Attachments: MAHOUT-11-all-cleanup-20091128.patch, MAHOUT-11-kmeans-cleanup.patch, MAHOUT-11-RandomSeedGenerator.patch, MAHOUT-11.patch
Re: Publish code quality reports on web-site?
On Sun deneche abdelhakim adene...@gmail.com wrote: df/mapred works with the old hadoop API; df/mapreduce works with the hadoop 0.20 API. Hmm. Maybe it would still be possible to factor out the code that is common to both implementations? That step might also make migrating to a future Hadoop version easier, as only the API-dependent code would have to be changed. Isabel
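One way to read the suggestion above is as a thin-adapter layout: the shared logic is written once against a neutral callback, and each API-specific class only forwards to it. The sketch below is plain Java with hypothetical names; OldStyleCollector and NewStyleContext are stand-ins that merely imitate the collect(key, value) and write(key, value) styles of the two Hadoop APIs, and the token-length logic is a placeholder for whatever the df code actually computes.

```java
import java.util.function.BiConsumer;

/** API-independent logic: knows nothing about either output mechanism. */
final class SharedMapLogic {
    /** Emits (token, length) pairs through whatever sink the caller provides. */
    static void map(String line, BiConsumer<String, Integer> sink) {
        for (String token : line.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                sink.accept(token, token.length());
            }
        }
    }
}

/** Stand-in for an old-API style output collector (hypothetical). */
interface OldStyleCollector {
    void collect(String key, int value);
}

/** Stand-in for a new-API style context object (hypothetical). */
interface NewStyleContext {
    void write(String key, int value);
}

/** Thin adapter: the only code that would depend on the old API. */
final class OldApiMapper {
    void map(String line, OldStyleCollector out) {
        SharedMapLogic.map(line, out::collect);
    }
}

/** Thin adapter: the only code that would depend on the new API. */
final class NewApiMapper {
    void map(String line, NewStyleContext context) {
        SharedMapLogic.map(line, context::write);
    }
}

final class AdapterDemo {
    public static void main(String[] args) {
        new OldApiMapper().map("canopy kmeans", (k, v) -> System.out.println("old: " + k + "=" + v));
        new NewApiMapper().map("canopy kmeans", (k, v) -> System.out.println("new: " + k + "=" + v));
    }
}
```

With this split, moving to a later API would mean replacing one adapter class and leaving SharedMapLogic untouched.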
Re: Publish code quality reports on web-site?
On Thu Sean Owen sro...@gmail.com wrote: I suggest our current stance be that we use 0.20.x, with the old APIs. When 0.21 comes out and stabilizes, we move. So I suggest keeping these and deleting 'mapred' at that point. Sounds good to me. Isabel
Re: Packaging target + dependencies in one .jar with Maven?
On Thu Sean Owen sro...@gmail.com wrote: Anyone know if there is an easy way to package a build target with all its dependencies with Maven? I can't find the formula with the assembly plugin but guess it is there. Hmm, judging from the poms in our repo, we are currently doing that through an ant-script. Just look at the passages that generate *.job files in the examples and core modules. Isabel
Re: [OT] who are jteam ?
On Thu, 3 Dec 2009 07:44:06 -0800 patrick o'leary pj...@pjaol.com wrote: Got a google alert from a very interesting / confusing page, http://blog.jteam.nl/2009/08/03/geo-location-search-with-solr-and-lucene/ Anyone know who these guys are? They did give a rather good talk on what they are doing with Solr at this year's Apache Con EU in Amsterdam: http://eu.apachecon.com/c/aceu2009/sessions/251 They are using Solr for customer search projects. Back then they were planning to contribute back (bug fixes immediately, larger extensions after some time). Isabel
Re: Publish code quality reports on web-site?
On Saturday 28 November 2009 08:30:26 Sean Owen wrote: I'm all for generating and publishing this. Great. Then I will go and tweak the checks to match our guidelines, twiddle a bit with the output format and then integrate the stuff into our nightly build. I didn't see anything big flagged, good, but we should all have a look at the results and tweak accordingly. In some cases it had a good small point, or I was indifferent about the approach it was suggesting versus what was in the code, so I changed to comply with the check. The reports generated are just examples - I am all for adjusting all checks (or adding new ones) that do not fit our needs. I will go through your list, make the proposed changes, and re-upload the site so everyone can have a look. Isabel
Re: Publish code quality reports on web-site?
On Saturday 28 November 2009 21:29:05 Drew Farris wrote: It will be interesting to see the reports for the other modules as well. examples, utils, matrix. As a little preview: Just substitute mahout-core with mahout-modulename in the url below: http://people.apache.org/~isabel/mahout_site/mahout-core/project-reports.html Fixing the report links is on my list already ;) Isabel
Publish code quality reports on web-site?
Hello, I just ran several code analysis reports over the Mahout source code. Results are published at http://people.apache.org/~isabel/mahout_site/mahout-core/project-reports.html It includes several reports on code quality, test coverage, javadocs and the like. When generated regularly, say on Hudson, I think it could be beneficial both for us (for getting a quick impression of where cleanup is most necessary) and for potential users. I would like to see a third tab added to our homepage that points to a page containing reports for each of our modules. I would try to clean up the generated site a little beforehand - we certainly do not need the Project information stuff in there, as most of this is already generated through Forrest. In addition I can take care of setting up a Hudson job to recreate the site on a regular schedule. Cheers, Isabel
Re: SVM algo, code, etc.
On Fri Grant Ingersoll gsing...@apache.org wrote: On Nov 19, 2009, at 1:15 PM, Sean Owen wrote: Post a patch if you'd like to proceed, IMHO. +1 +1 from me as well. I would love to see solid svm support in Mahout. Isabel
[jira] Commented: (MAHOUT-11) Static fields used throughout clustering code (Canopy, K-Means).
[ https://issues.apache.org/jira/browse/MAHOUT-11?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12782470#action_12782470 ] Isabel Drost commented on MAHOUT-11: Drew, go ahead then. Static fields used throughout clustering code (Canopy, K-Means). Key: MAHOUT-11 URL: https://issues.apache.org/jira/browse/MAHOUT-11 Project: Mahout Issue Type: Bug Components: Clustering Affects Versions: 0.1 Reporter: Dawid Weiss Fix For: 0.3 Attachments: MAHOUT-11-kmeans-cleanup.patch, MAHOUT-11-RandomSeedGenerator.patch, MAHOUT-11.patch I file this as a bug, even though I'm not 100% sure it is one. In the current code the information is exchanged via static fields (for example, distance measure and thresholds for Canopies are static fields). Is it always true in Hadoop that one job runs inside one JVM with exclusive access? I haven't seen it anywhere in Hadoop documentation and my impression was that everything uses JobConf to pass configuration to jobs, but jobs are configured on a per-object basis (a job is an object, a mapper is an object and everything else is basically an object). If it's possible for two jobs to run in parallel inside one JVM then this is a limitation and bug in our code that needs to be addressed. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAHOUT-11) Static fields used throughout clustering code (Canopy, K-Means).
[ https://issues.apache.org/jira/browse/MAHOUT-11?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Isabel Drost updated MAHOUT-11: --- Attachment: MAHOUT-11.patch I am not the original author of the source, but still managed to get the static fields out of the k-means clustering code. All unit tests are still passing. However, I would feel a lot better if someone else double-checked the changes made. Looking at the code, I spotted some more points that could benefit from being revisited (e.g. usage of deprecated MapReduce APIs and introduction of status reports). But this should be done in a separate issue. Static fields used throughout clustering code (Canopy, K-Means). Key: MAHOUT-11 URL: https://issues.apache.org/jira/browse/MAHOUT-11 Project: Mahout Issue Type: Bug Components: Clustering Affects Versions: 0.1 Reporter: Dawid Weiss Fix For: 0.3 Attachments: MAHOUT-11.patch I file this as a bug, even though I'm not 100% sure it is one. In the current code the information is exchanged via static fields (for example, distance measure and thresholds for Canopies are static fields). Is it always true in Hadoop that one job runs inside one JVM with exclusive access? I haven't seen it anywhere in Hadoop documentation and my impression was that everything uses JobConf to pass configuration to jobs, but jobs are configured on a per-object basis (a job is an object, a mapper is an object and everything else is basically an object). If it's possible for two jobs to run in parallel inside one JVM then this is a limitation and bug in our code that needs to be addressed.
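The concern raised in MAHOUT-11 can be illustrated with a minimal sketch. It is not the Mahout or Hadoop code: a plain `Map<String, String>` stands in for Hadoop's JobConf, and the class names, the `conf` helper and the key name `t1` are hypothetical. It only shows why configuration held in static fields can leak between two jobs that happen to share a JVM, while configuration held per object cannot.

```java
import java.util.HashMap;
import java.util.Map;

public class StaticConfigSketch {

    /** Problematic variant: the threshold lives in a static field shared by the whole JVM. */
    public static class StaticCanopy {
        static double t1;

        public static void configure(Map<String, String> conf) {
            t1 = Double.parseDouble(conf.get("t1"));
        }

        public static boolean covers(double distance) {
            return distance < t1;
        }
    }

    /** Per-object variant: every instance reads its own configuration once and keeps it. */
    public static class InstanceCanopy {
        private final double t1;

        public InstanceCanopy(Map<String, String> conf) {
            this.t1 = Double.parseDouble(conf.get("t1"));
        }

        public boolean covers(double distance) {
            return distance < t1;
        }
    }

    /** Builds a stand-in configuration; "t1" is a hypothetical key name. */
    public static Map<String, String> conf(double t1) {
        Map<String, String> conf = new HashMap<>();
        conf.put("t1", Double.toString(t1));
        return conf;
    }

    public static void main(String[] args) {
        // Two "jobs" configured one after the other inside the same JVM.
        StaticCanopy.configure(conf(3.0));
        StaticCanopy.configure(conf(1.0)); // overwrites the first job's threshold
        System.out.println("static, job A covers 2.0: " + StaticCanopy.covers(2.0));

        // With per-object state each job keeps the threshold it was given.
        InstanceCanopy jobA = new InstanceCanopy(conf(3.0));
        InstanceCanopy jobB = new InstanceCanopy(conf(1.0));
        System.out.println("instance, job A covers 2.0: " + jobA.covers(2.0));
        System.out.println("instance, job B covers 2.0: " + jobB.covers(2.0));
    }
}
```

In the static variant the second call to `configure` replaces the threshold for everyone, so the first job no longer sees the value it was configured with. Whether Hadoop ever runs two jobs in one JVM is exactly the open question in the issue; the per-object variant is safe either way.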
[jira] Commented: (MAHOUT-11) Static fields used throughout clustering code (Canopy, K-Means).
[ https://issues.apache.org/jira/browse/MAHOUT-11?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12780476#action_12780476 ] Isabel Drost commented on MAHOUT-11: First of all, thanks for the review. Passing the output collector directly - Jepp, makes sense. Will change and resubmit the patch. Tests with real data: Big thanks for that. Isabel Static fields used throughout clustering code (Canopy, K-Means). Key: MAHOUT-11 URL: https://issues.apache.org/jira/browse/MAHOUT-11 Project: Mahout Issue Type: Bug Components: Clustering Affects Versions: 0.1 Reporter: Dawid Weiss Fix For: 0.3 Attachments: MAHOUT-11.patch I file this as a bug, even though I'm not 100% sure it is one. In the current code the information is exchanged via static fields (for example, distance measure and thresholds for Canopies are static fields). Is it always true in Hadoop that one job runs inside one JVM with exclusive access? I haven't seen it anywhere in Hadoop documentation and my impression was that everything uses JobConf to pass configuration to jobs, but jobs are configured on a per-object basis (a job is an object, a mapper is an object and everything else is basically an object). If it's possible for two jobs to run in parallel inside one JVM then this is a limitation and bug in our code that needs to be addressed.
[jira] Resolved: (MAHOUT-200) Update information on Mahout site
[ https://issues.apache.org/jira/browse/MAHOUT-200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Isabel Drost resolved MAHOUT-200. - Resolution: Fixed Fix Version/s: (was: 0.3) 0.2 Updated web page and fixed typo in release announcement. Update information on Mahout site - Key: MAHOUT-200 URL: https://issues.apache.org/jira/browse/MAHOUT-200 Project: Mahout Issue Type: Improvement Components: Website Reporter: Isabel Drost Assignee: Isabel Drost Priority: Minor Fix For: 0.2 Attachments: update_site.patch After several people had trouble finding the docs we provide in the wiki, I have created a slightly updated version of our website. I added a few links to wiki pages that might be of interest to potential Mahout users. I have uploaded the updated version to http://people.apache.org/~isabel/site so all of you can have a look. Will commit on Tuesday next week if no one objects.
Re: Trunk is now open
On Wed Grant Ingersoll gsing...@apache.org wrote: Trunk is now open for commits. Yeah! Seems like we have some good things in store for 0.3, so have at it! +1 Isabel
Re: [jira] Commented: (MAHOUT-18) Embrace interoperability with other softwares
On Tue Andrew Wang andrew.wang.1...@gmail.com wrote: As you know, i am new guy about the Mahout. suppose i have one model trained in WEKA using distinct classifiers, if the Mahout have some port to import the model, and using the model in the up-coming process, it will be very cool. Could you please explain exactly which models you would like to import and why? Assuming we are talking about naive bayes: What is really expensive about it is training the classifier. I wonder why you would want to do that within Weka. With most classification algorithms I am familiar with, training is expensive, but application to new instances is cheap. That is why currently I do not really understand, why you would want to run the training in Weka and use the model in Mahout. However, I could imagine use cases where you might want to train the model with Mahout and use it as part of a processing chain within Weka. Isabel
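The cost asymmetry mentioned above (training is expensive, applying the model is cheap) can be sketched with a toy multinomial naive Bayes classifier. This is a generic textbook-style sketch with add-one smoothing, not Mahout's or Weka's implementation; the class and method names are hypothetical. Training has to touch every word of every training document, while classifying a new instance only needs a few lookups per word and label.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class NaiveBayesSketch {
    // Model state: per-label word counts, per-label word totals, per-label document counts.
    private final Map<String, Map<String, Integer>> wordCounts = new HashMap<>();
    private final Map<String, Integer> totalWords = new HashMap<>();
    private final Map<String, Integer> docCounts = new HashMap<>();
    private final Set<String> vocabulary = new HashSet<>();
    private int totalDocs = 0;

    /** Training: called once per training document, so cost grows with the whole corpus. */
    public void train(String label, String[] words) {
        docCounts.merge(label, 1, Integer::sum);
        totalDocs++;
        Map<String, Integer> counts = wordCounts.computeIfAbsent(label, k -> new HashMap<>());
        for (String w : words) {
            counts.merge(w, 1, Integer::sum);
            totalWords.merge(label, 1, Integer::sum);
            vocabulary.add(w);
        }
    }

    /** Application: cost depends only on the length of the new instance and the number of labels. */
    public String classify(String[] words) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (String label : docCounts.keySet()) {
            // Log prior of the label.
            double score = Math.log(docCounts.get(label) / (double) totalDocs);
            Map<String, Integer> counts = wordCounts.get(label);
            int total = totalWords.getOrDefault(label, 0);
            for (String w : words) {
                // Add-one smoothed log likelihood of the word given the label.
                int c = counts.getOrDefault(w, 0);
                score += Math.log((c + 1.0) / (total + vocabulary.size()));
            }
            if (score > bestScore) {
                bestScore = score;
                best = label;
            }
        }
        return best;
    }
}
```

Once `train` has been run over the corpus, the model is just a handful of count tables, which is why exporting a model trained on a cluster for use inside a lighter-weight processing chain is the more plausible direction.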
Re: [VOTE] Release 0.2
On Monday 16 November 2009 19:44:38 Ted Dunning wrote: Congrats. Congratulations from me as well! Isabel
[jira] Assigned: (MAHOUT-200) Update information on Mahout site
[ https://issues.apache.org/jira/browse/MAHOUT-200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Isabel Drost reassigned MAHOUT-200: --- Assignee: Isabel Drost Update information on Mahout site - Key: MAHOUT-200 URL: https://issues.apache.org/jira/browse/MAHOUT-200 Project: Mahout Issue Type: Improvement Components: Website Reporter: Isabel Drost Assignee: Isabel Drost Priority: Minor Fix For: 0.3 Attachments: update_site.patch After several people had trouble finding the docs we provide in the wiki, I have created a slightly updated version of our website. I added a few links to wiki pages that might be of interest to potential Mahout users. I have uploaded the updated version to http://people.apache.org/~isabel/site so all of you can have a look. Will commit on Tuesday next week if no one objects.
[jira] Created: (MAHOUT-200) Update information on Mahout site
Update information on Mahout site - Key: MAHOUT-200 URL: https://issues.apache.org/jira/browse/MAHOUT-200 Project: Mahout Issue Type: Improvement Components: Website Reporter: Isabel Drost Priority: Minor Fix For: 0.3 Attachments: update_site.patch After several people had trouble finding the docs we provide in the wiki, I have created a slightly updated version of our website. I added a few links to wiki pages that might be of interest to potential Mahout users. I have uploaded the updated version to http://people.apache.org/~isabel/site so all of you can have a look. Will commit on Tuesday next week if no one objects.
Re: 0.2 status
Adding and revising a little: Apache Mahout 0.2 has been released and is now available for public download at http://www.apache.org/dyn/closer.cgi/lucene/mahout Up to date maven artifacts can be found in the Apache repository at https://repository.apache.org/content/repositories/releases/org/apache/mahout/ Apache Mahout is a subproject of Apache Lucene with the goal of delivering scalable machine learning algorithm implementations under the Apache license. http://www.apache.org/licenses/LICENSE-2.0 Mahout is a machine learning library meant to scale to the size of data we manage today. Built on top of the powerful map/reduce paradigm of Apache Hadoop project, Mahout lets you run popular machine learning methods like clustering, collaborative filtering, classification over Terabytes of data over thousands of computers. - We may want to emphasize that using Mahout makes sense also for those people that do not have clusters with thousands of nodes? Mahout is a machine learning library meant to scale: Scale in terms of community to support anyone interested in using machine learning. Scale in terms of business by providing the library under a commercially friendly, free software license. Scale in terms of computation to the size of data we manage today. Built on top of the powerful map/reduce paradigm of the Apache Hadoop project, Mahout lets you solve popular machine learning problem settings like clustering, collaborative filtering and classification over Terabytes of data over thousands of computers. Implemented with scalability in mind the latest release brings many performance optimizations so that even in a single node setup the library performs well. - As mentioned earlier by Grant, we do need performance benchmarks at least for the next release to prove that.
The complete changelist can be found here: http://issues.apache.org/jira/browse/MAHOUT/fixforversion/12313278

New Mahout 0.2 features include
- Major performance enhancements in Collaborative Filtering, Classification and Clustering
- New: Latent Dirichlet Allocation (LDA) implementation for topic modelling
- New: Frequent Itemset Mining for mining top-k patterns from a list of transactions
- New: Decision Forests implementation for Decision Tree classification (In Memory Partial Data)
- New: HBase storage support for Naive Bayes model building and classification
- New: Generation of vectors from Text documents for use with Mahout Algorithms
- Performance improvements in various Vector implementations
- Tons of bug fixes and code cleanup

Getting started: New to Mahout?
1) Download Mahout at http://www.apache.org/dyn/closer.cgi/lucene/mahout
2) Check out the Quick start: http://cwiki.apache.org/MAHOUT/quickstart.html
3) Read the Mahout Wiki: http://cwiki.apache.org/MAHOUT
4) Join the community by subscribing to mahout-u...@lucene.apache.org
5) Give back: http://www.apache.org/foundation/getinvolved.html
6) Consider adding yourself to the powered by Wiki page: http://cwiki.apache.org/MAHOUT/poweredby.html

For more information on Apache Mahout, see http://lucene.apache.org/mahout

Additional comment: I suppose I will copy this over to my personal blog once the release is out. I would like to invite those interested in or using Mahout to do so as well.
Re: Informal Mahout MeetUp at ApacheCon Friday
On Friday 06 November 2009 04:27:03 Ted Dunning wrote: Pacific Coast Brewery is just down the street. I am already meeting some folks there at about 5 (halfway related to Mahout, but only halfway). +1 Isabel
Re: Feedback on release candidate for 0.2
On Tuesday 03 November 2009 15:45:08 Grant Ingersoll wrote: I agree, in general, we need to be able to get releases out faster and more reliably. People also should, especially when it is near release time, be encouraged to try trunk, as we aren't going to be making drastic changes at that point and it is much better to get the testing out of the way up front. I would hope that putting up nightly snapshots through Hudson on repository.apache.org should lower the bar to try out trunk. Checking out trunk and compiling still involves far more work than simply switching the version of your mahout dependency to a snapshot. Isabel
Re: Feedback on release candidate for 0.2
On Friday 30 October 2009 22:16:59 Grant Ingersoll wrote: Hopefully, some of us Mahouts can carve out some time at ApacheCon to work. I will arrive Monday afternoon and stay until the following Sunday morning - I would guess that there should be some time in between to work on the release. Isabel
Re: Feedback on release candidate for 0.2
On Sat, Oct 31, 2009 at 10:36:29AM -0700, Jake Mannix wrote: Speaking of which, I didn't see a Mahout meetup anywhere - are we planning on having an informal one sometime? Tuesday is the Lucene MeetUp, and Thurs is Hadoop, we could go out for drinks or something after one of those two, or Friday night? Friday night sounds good - after the Lucene or Hadoop meetup would be fine with me as well. In case of Friday: Should I include a tiny little bit of advertisement for the informal meetup in my talk on Mahout? Any interest? +1 Isabel
Re: Success
On Tue Grant Ingersoll gsing...@apache.org wrote: Isabel, any idea where those things actually go? That URL is not browseable. http://maven.apache.org/developers/release/releasing.html (5th and 6th point) - says that for others to be able to view the artifacts you first need to log into Nexus, and close the repository containing the release candidate for further deployments: Right click on this repository and select Close. This will close the repository from future deployments and make it available for others to view. Currently Nexus does not let me log in, so I cannot verify whether I might see your release :( Isabel
Re: Success
On Wed Grant Ingersoll gsing...@apache.org wrote: Please look them over and give your thoughts on them, then if that looks good, we can call a vote. First of all - a big Thanks to all who helped get through the issues from me as well! Looks good on first sight - will have to dig deeper tomorrow. One thing I noticed - the 3rd party dependencies (hadoop, commons, kosmofs and the like) are not signed. Currently Nexus does not let me login, so I cannot verify whether I might see your release :( It should be your SVN creds. Just found out: Nexus does not like Konqueror (at least not the version currently installed on my machine). Any other browser works. Isabel
Re: Feedback on release candidate for 0.2
On Wed Sean Owen sro...@gmail.com wrote: Ran into this -- Currently when trying to build one of the tests fails for me. [INFO] [remote-resources:process {execution: default}] [ERROR] Error loading supplemental data models: Could not find resource 'supplemental-models.xml'. org.codehaus.plexus.resource.loader.ResourceNotFoundException: Could not find resource 'supplemental-models.xml'. I know we solved this by adding a file, src/main/appended-resources/supplemental-models.xml. I guess it just needs to be packaged. I'll look at that -- Isabel you might know more about this. That file should contain licensing information for all artifacts that we depend on through maven that have no description through apache deployed resources. However I do see it when unpacking the tar.gz file - it is located under mahout-0.2/src/main/appended-resources/ More information on that: http://maven.apache.org/plugins/maven-remote-resources-plugin/supplemental-models.html Isabel
Re: Feedback on release candidate for 0.2
On Wed, 28 Oct 2009 16:03:51 +0100 Isabel Drost isa...@apache.org wrote: On Wed Sean Owen sro...@gmail.com wrote: Ran into this -- Currently when trying to build one of the tests fails for me. Sorry - forgot to mention the failing test in my last mail: (org.apache.mahout.clustering.kmeans.TestKmeansClustering) Time elapsed: 18.9 sec FAILURE! Will test on my own laptop to see whether this is simply an environment issue. Isabel
Re: Release help, stuck on gpg-sign?
On Fri Grant Ingersoll gsing...@apache.org wrote: Why was gpg-plugin just added to the core pom and not higher up? All the artifacts produced need to be signed. The gpg-plugin is part of the apache-root-pom. See also: http://svn.apache.org/viewvc/maven/pom/trunk/asf/pom.xml?revision=766951&view=markup http://maven.apache.org/developers/release/releasing.html If all our artifacts inherit from that, they should all get signed, right? Isabel
Re: TAR problems
On Mon Sean Owen sro...@gmail.com wrote: I don't know enough about GPG to know whether I should be seeing this at all (since my passphrase is already in settings.xml?) or how else this is supposed to work? does anyone see this? You shouldn't be asked for the password if you set it in your settings.xml and are using the profile you set it in for building. So, if your settings.xml says:

<profiles>
  <profile>
    <id>apache-release</id>
    <properties>
      <gpg.passphrase>*</gpg.passphrase>
    </properties>
  </profile>
</profiles>

and you are building with mvn -Papache-release <goal> you shouldn't be asked for the password. Isabel
Re: Release help, stuck on gpg-sign?
On Tue Sean Owen sro...@gmail.com wrote: I wonder, could whoever did the 0.1 release give it a shot? to see if it's just me? and, to perhaps just do the deployment? the legwork is done, it's ready to publish. mvn -Papache-release deploy did the trick for me. Are you sure that gpg is on your path? Though signing does work for me, the build fails as soon as it tries to upload our hadoop etc. jars to the Apache repo - I could not figure out a way to make that work - checking with infra how that is intended to be done with the apache repository. https://repository.apache.org/content/repositories/snapshots/org/apache/mahout/ Isabel
Re: Release help, stuck on gpg-sign?
On Wed Grant Ingersoll gsing...@apache.org wrote: Are you following: http://cwiki.apache.org/MAHOUT/how-to-release.html ? What step are you stuck on? http://maven.apache.org/developers/release/releasing.html (was sent to mahout-dev by Jukka some weeks ago and is linked to from https://issues.apache.org/jira/browse/INFRA-1896 - the jira issue in INFRA that deals with releasing to repository.apache.org) - probably I just misunderstood some of the steps mentioned therein? Isabel
Re: Release help, stuck on gpg-sign?
On Wed Grant Ingersoll gsing...@apache.org wrote: I'd like to make sure our Wiki properly reflects the steps, so once it is figured out, then our Wiki should be updated. +1 Isabel
Re: Release help, stuck on gpg-sign?
On Tuesday 20 October 2009 17:11:59 Sean Owen wrote: release:prepare is hanging for me at... [INFO] [INFO] [gpg:sign {execution: default}] I don't think this is to do with the GPG signing I just added, as it shows up even if I remove that bit. Anyone more familiar with this? is my settings.xml OK? Jukka gave me the following guide for releasing according to the new Apache parent pom: http://maven.apache.org/developers/release/releasing.html It has some additional hints on prerequisites, troubleshooting etc. Not sure if that helps in your case. Isabel
Re: Where is CHANGES.txt, and what are your banner changes for 0.2?
On Fri, 16 Oct 2009 14:23:52 -0400 Grant Ingersoll gsing...@apache.org wrote: We haven't been keeping a CHANGES, as we're just relying on JIRA's ability to generate a list of what is in a version. When using mvn site site:deploy to generate a project html-report, you can generate a changes report as well. It is possible to teach maven to talk to JIRA to retrieve the current changes: http://maven.apache.org/plugins/maven-changes-plugin/jira-report-mojo.html Isabel
[jira] Resolved: (MAHOUT-171) Move deployment to repository.apache.org
[ https://issues.apache.org/jira/browse/MAHOUT-171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Isabel Drost resolved MAHOUT-171. - Resolution: Fixed Checked in. Move deployment to repository.apache.org Key: MAHOUT-171 URL: https://issues.apache.org/jira/browse/MAHOUT-171 Project: Mahout Issue Type: Improvement Affects Versions: 0.1 Reporter: Isabel Drost Assignee: Isabel Drost Fix For: 0.2 Attachments: MAHOUT-171.patch Opening a JIRA task to collect what has to be done for moving over to using apache version 5 parent pom (see also http://markmail.org/thread/ld26m3xxzoztqsk6 ). * Link Apache parent pom into our pom. * Update hudson to build via maven ( ? ). * File subtask at INFRA-1896 to include mahout in repository.apache.org
[jira] Commented: (MAHOUT-171) Move deployment to repository.apache.org
[ https://issues.apache.org/jira/browse/MAHOUT-171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12767710#action_12767710 ] Isabel Drost commented on MAHOUT-171: - It was my own fault - I forgot to svn add the file after I applied and built with my own patch. Sorry :/ Move deployment to repository.apache.org Key: MAHOUT-171 URL: https://issues.apache.org/jira/browse/MAHOUT-171 Project: Mahout Issue Type: Improvement Affects Versions: 0.1 Reporter: Isabel Drost Assignee: Isabel Drost Fix For: 0.2 Attachments: MAHOUT-171.patch Opening a JIRA task to collect what has to be done for moving over to using apache version 5 parent pom (see also http://markmail.org/thread/ld26m3xxzoztqsk6 ). * Link Apache parent pom into our pom. * Update hudson to build via maven ( ? ). * File subtask at INFRA-1896 to include mahout in repository.apache.org
[jira] Commented: (MAHOUT-157) Frequent Pattern Mining using Parallel FP-Growth
[ https://issues.apache.org/jira/browse/MAHOUT-157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12766030#action_12766030 ] Isabel Drost commented on MAHOUT-157: - The patch looks good to me. Good work Robin. Frequent Pattern Mining using Parallel FP-Growth Key: MAHOUT-157 URL: https://issues.apache.org/jira/browse/MAHOUT-157 Project: Mahout Issue Type: New Feature Components: Frequent Itemset/Association Rule Mining Affects Versions: 0.2 Reporter: Robin Anil Assignee: Robin Anil Fix For: 0.2 Attachments: MAHOUT-157-August-17.patch, MAHOUT-157-August-24.patch, MAHOUT-157-August-31.patch, MAHOUT-157-August-6.patch, MAHOUT-157-codecleanup-javadocs.patch, MAHOUT-157-Combinations-BSD-License.patch, MAHOUT-157-Combinations-BSD-License.patch, MAHOUT-157-CompactTransactionMapperFormat.patch, MAHOUT-157-final.patch, MAHOUT-157-inProgress-August-5.patch, MAHOUT-157-Oct-1.patch, MAHOUT-157-Oct-10.pfpgrowth.patch, MAHOUT-157-Oct-8.pfpgrowth.patch, MAHOUT-157-Oct-8.TestedMapReducePipeline.patch, MAHOUT-157-Oct-9.StreamingDBRead-Inprogress.patch, MAHOUT-157-September-10.patch, MAHOUT-157-September-18.patch, MAHOUT-157-September-5.patch Implement: http://infolab.stanford.edu/~echang/recsys08-69.pdf
[jira] Resolved: (MAHOUT-138) Convert main() methods to use Commons CLI for argument processing
[ https://issues.apache.org/jira/browse/MAHOUT-138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Isabel Drost resolved MAHOUT-138. - Resolution: Fixed Fix Version/s: (was: 0.3) 0.2 The last check-in changed the remaining classes - so at least grep does not find any usages of 'args\[' anywhere in our source code. Convert main() methods to use Commons CLI for argument processing - Key: MAHOUT-138 URL: https://issues.apache.org/jira/browse/MAHOUT-138 Project: Mahout Issue Type: Improvement Affects Versions: 0.2 Reporter: Grant Ingersoll Assignee: Grant Ingersoll Priority: Minor Fix For: 0.2 Attachments: MAHOUT-138.patch, MAHOUT-138_fuzzyKMeansJob.patch Commons CLI is in the classpath and makes it much easier to handle command line args and they are more self-documenting when done right. We should convert our main methods to use CLI