This project is mostly a text search project. You can get basic functionality without doing any math of this sort. (The Lucene search algorithms do a simplified and very fast version of one of the recommender algorithms in Mahout.)
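To make the "text search" point concrete for the expert-finding question quoted below, here is a minimal sketch in plain Python. It is not Lucene's scoring formula and not Mahout code: it scores each document against a query with a simple TF-IDF sum and then adds the scores up per author. The sample data, the author names, and the function names `tokenize` and `rank_experts` are all hypothetical; in a real setup the documents would come out of a search index such as Solr.

```python
from collections import Counter, defaultdict
import math
import re

# Hypothetical sample data: (author, text) pairs standing in for commit
# messages, wiki pages, or issue comments pulled from the real systems.
DOCS = [
    ("alice", "Software requirements specification for the mobile payments module"),
    ("bob", "Fix null pointer in the payments gateway retry logic"),
    ("alice", "SRS template update for requirements engineering process"),
    ("carol", "Wiki page on deployment of the reporting server"),
]


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def rank_experts(query, docs):
    """Score each document with a simple TF-IDF sum, then total the scores per author."""
    n = len(docs)
    tokenized = [tokenize(text) for _, text in docs]

    # Document frequency: in how many documents each term appears.
    df = Counter()
    for toks in tokenized:
        df.update(set(toks))

    scores = defaultdict(float)
    for (author, _), toks in zip(docs, tokenized):
        tf = Counter(toks)
        for term in tokenize(query):
            if term in tf:
                # Rare terms weigh more than common ones.
                scores[author] += tf[term] * math.log(1 + n / df[term])

    # Highest total score first; authors with no matching document are omitted.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    for author, score in rank_experts("requirements specification", DOCS):
        print(f"{author}\t{score:.3f}")
```

The only design choice here is to treat "expert" as "author of the documents that match best", which is an assumption for illustration; a search engine would do the per-document scoring for you, leaving just the per-author aggregation.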

On Wed, Nov 16, 2011 at 6:38 AM, Burcu Buyukkagnici <[email protected]> wrote:

> Hi,
>
> Thanks for the resources. They, especially the blogs and their links, are
> very helpful for me to understand things. I might have skipped the
> things related to expert finding in the docs, because I haven't read
> everything yet. Regarding expert finding, do I need a social engine to
> create, keep and relate profiles, or do Lucene/Solr or Apache's other
> projects have this kind of functionality?
> I want people and the organization to be able to identify the experts
> relating to a topic, something like maven7:
> http://www.maven7.com/index_en.php?page=organizational
> The experts can be found from their products. For example, from Subversion
> annotations I can learn who previously worked on a similar subject. I want
> to see the related developers, test specialists and related bugs. Also,
> based on dependency of code, I want to identify the people who might be
> affected by the changes that I am doing.
> I hope I can explain what I'm thinking. So, profiling experts based on text
> files and database records mostly, can it be done with Mahout, Lucene etc.?
>
> Thanks again,
>
> On Tue, Nov 15, 2011 at 9:34 AM, Yuval Feinstein <[email protected]> wrote:
>
> > My 2c: Start with getting all the relevant texts into one place, namely a
> > search index.
> > A good prototyping tool would be Solr.
> > You will need something like ManifoldCF:
> > http://incubator.apache.org/connectors/
> > for collecting documents from the various environments.
> > Here is Erik Hatcher's "Rapid Prototyping With Solr":
> > http://www.slideshare.net/erikhatcher/rapid-prototyping-with-solr-4312681
> > Once you get enough stuff into Solr, you will be able to search it easily.
> > Next, you can start using Mahout:
> > http://www.lucidimagination.com/blog/2010/03/16/integrating-apache-mahout-with-apache-lucene-and-solr-part-i-of-3/
> > I would go for an iterative design, first taking a small sample of
> > documents from each environment,
> > trying the systems out, and then scaling.
> > Good luck,
> > Yuval
> >
> > On Tue, Nov 15, 2011 at 9:12 AM, Burcu Buyukkagnici <[email protected]> wrote:
> >
> > > Hi,
> > > I'm new to this community. I want to use Mahout as a component of an
> > > enterprise search project. The project is at the conceptual phase. My
> > > business need is to be able to find everything about a related task and
> > > reorganize the output as a new view. The results should be actionable.
> > > Also, the system should be integrated with software development
> > > environment tools: Subversion, JIRA and Redmine, SharePoint blogs, wikis
> > > and people (Active Directory).
> > > "Everything" means files, tools and people. Files are mostly text based
> > > (Word, PDF, source files); searching audio and video files is a further
> > > need.
> > >
> > > Where do Mahout, Lucene/Solr and the UIMA framework fit in the following
> > > scenario? And what are the system requirements to set up a development
> > > environment?
> > >
> > > X is a new project team member in a software development firm. Her
> > > project is mainly a 10-year-old maintenance project; however, customers
> > > want small development requests on that platform. Her boss wants her to
> > > prepare a software requirements specification document for a new
> > > request. Since she hasn't prepared an SRS before, she wants to find
> > > previously prepared documents, and asks her colleagues to give her a
> > > sample.
> > > Her friend gives her a sample based on a very ancient version of the SRS
> > > from her local computer.
> > > The company has a Windows file server and a new content management
> > > system (portal); also, some projects use Subversion to store the docs,
> > > and also wikis.
> > >
> > > 1. There should be a platform that can search files in all these
> > >    environments.
> > > 2. The system should understand that an SRS is an outcome of the
> > >    software requirements engineering or analysis process. The system
> > >    should understand that SRS, software requirements specification and
> > >    functional design description are similar terms.
> > > 3. The company has manuals, templates and process definitions about
> > >    requirements engineering and has an SRS template which supersedes
> > >    other versions. While searching, the system should list
> > >    organizational docs and then project docs related to SRSes.
> > > 4. The project has different SRSes written over 10 years. So the
> > >    system should list that specific project's SRS templates, indicating
> > >    version conflicts between org. document templates and projects...
> > > 5. Also, the system should list the people who were previously involved
> > >    in the requirements engineering process, in that project first, then
> > >    in other projects.
> > > 6. Also, the system should have a suggestion mechanism. The system
> > >    should know the domain of the project X is working on and its sub
> > >    parts. For example, X is working on an e-commerce project, and the
> > >    new request is about mobile payments. In the same company but in a
> > >    different project, a project team is working on e-wallet projects
> > >    for a bank. Based on her profile, the system should be able to
> > >    suggest people, tools and outcomes from the other project relating
> > >    to the payments domain.
> > >
> > > Identifying the domain and grouping the related docs, tools and people
> > > in an existing system is nearly impossible to do manually.
> > > I want the system to identify and cluster the related things itself
> > > and also learn and improve the results from user feedback. Also, some
> > > people should give input to the system by classifying the concepts for
> > > the system. For example: I have organizational assets: documents,
> > > tools, people. The documents are project docs and organizational docs,
> > > and they are related. This can be guidance for the system.
> > >
> > > I think Carrot2 is doing something very similar to what I describe, but
> > > it has a file limitation. Anyway, I need a roadmap to initiate a
> > > project like this. Where should I start?
> > >
> > > Thanks,

-- 
Lance Norskog
[email protected]
