Re: mahout for enterprise search project

Burcu Buyukkagnici Wed, 16 Nov 2011 06:39:33 -0800

Hi,

Thanks for the resources. They, especially the blogs and their links, are
very helpful for understanding things. I may have skipped the parts about
expert finding in the docs, because I haven't read everything yet.
Regarding expert finding, do I need a social engine to create, keep and
relate profiles, or do Lucene/Solr or other Apache projects have this kind
of functionality?
I want people and the organization to be able to identify the experts on a
topic, something like maven7:
http://www.maven7.com/index_en.php?page=organizational
Experts can be found from what they produce. For example, from Subversion
annotations I can learn who previously worked on a similar subject. I want
to see the related developers, test specialists and related bugs. Also,
based on code dependencies, I want to identify the people who might be
affected by the changes that I am making.
I hope I have explained what I'm thinking. So, can profiling experts, based
mostly on text files and database records, be done with Mahout, Lucene etc.?
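
To make the idea concrete, here is a minimal sketch of what I mean by finding
experts from what they produce. It is plain Python, not Mahout or Lucene code.
The function name rank_experts and the (author, text) input format are
assumptions for illustration only; in practice the pairs might come from
Subversion annotate output or issue tracker records.

```python
from collections import Counter


def rank_experts(authored_texts, topic_terms):
    """Rank authors by how often topic terms occur in text they wrote.

    authored_texts: iterable of (author, text) pairs (hypothetical format).
    topic_terms: words that describe the topic of interest.
    """
    terms = {term.lower() for term in topic_terms}
    scores = Counter()
    for author, text in authored_texts:
        # Naive whitespace tokenization; a real system would use an analyzer.
        hits = sum(1 for token in text.lower().split() if token in terms)
        if hits:
            scores[author] += hits
    # Highest-scoring authors first.
    return scores.most_common()


if __name__ == "__main__":
    records = [
        ("alice", "mobile payment gateway"),
        ("bob", "login page layout"),
        ("alice", "payment refund"),
        ("carol", "payment"),
    ]
    print(rank_experts(records, ["payment", "mobile"]))
```

A search index would replace the raw term counting with proper relevance
scoring, but the shape is the same: match the topic against authored content,
then aggregate the matches per person.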


Thanks again,

On Tue, Nov 15, 2011 at 9:34 AM, Yuval Feinstein <[email protected]> wrote:

> My 2c: Start with getting all the relevant texts into one place, namely a
> search index.
> A good prototyping tool would be Solr.
> You will need something like ManifoldCF:
> http://incubator.apache.org/connectors/
> for collecting documents from the various environments.
> Here is Erik Hatcher's "Rapid Prototyping With Solr":
> http://www.slideshare.net/erikhatcher/rapid-prototyping-with-solr-4312681
> Once you get enough stuff into Solr, you will be able to search it easily.
> Next, you can start using Mahout:
>
> http://www.lucidimagination.com/blog/2010/03/16/integrating-apache-mahout-with-apache-lucene-and-solr-part-i-of-3/
> I would go for an iterative design, first taking a small sample of
> documents from each environment,
> trying the systems out, and then scaling.
> Good luck,
> Yuval
>
>
> On Tue, Nov 15, 2011 at 9:12 AM, Burcu Buyukkagnici <[email protected]> wrote:
>
> > Hi,
> > I'm new to this community. I want to use Mahout as a component of an
> > enterprise search project. The project is at the conceptual phase. My
> > business need is to be able to find everything about a related task and
> > reorganize the output as a new view. The results should be actionable.
> > Also, the system should be integrated with software development
> > environment tools: Subversion, JIRA and Redmine, SharePoint blogs, wikis
> > and people (Active Directory).
> > "Everything" means files, tools and people. Files are mostly text based
> > (Word, PDF, source files); searching audio and video files is a future
> > need.
> >
> > Where do Mahout, Lucene/Solr and the UIMA framework fit in the following
> > scenario? And what are the system requirements to set up a development
> > environment?
> >
> > X is a new project team member in a software development firm. Her
> > project is mainly a 10-year-old maintenance project; however, customers
> > want small development requests on that platform. Her boss wants her to
> > prepare a software requirements specification (SRS) document for a new
> > request. Since she hasn't prepared an SRS before, she wants to find
> > previously prepared documents, and asks her colleagues to give her a
> > sample. Her friend gives her a sample based on a very old version of the
> > SRS from her local computer. The company has a Windows file server and a
> > new content management system (portal); some projects also use
> > Subversion, and also wikis, to store the docs.
> >
> >
> >   1. There should be a platform that can search files in all these
> >   environments.
> >   2. The system should understand that an SRS is an outcome of the
> >   software requirements engineering or analysis process. The system
> >   should understand that SRS, software requirements specification and
> >   functional design description are similar terms.
> >   3. The company has manuals, templates and process definitions about
> >   requirements engineering, and has an SRS template which supersedes
> >   other versions. While searching, the system should list organizational
> >   docs first and then project docs related to SRSes.
> >   4. The project has different SRSes written over 10 years. So the
> >   system should list that specific project's SRS templates, indicating
> >   version conflicts between org. document templates and projects...
> >   5. Also, the system should list the people who were previously
> >   involved in the requirements engineering process, first in that
> >   project, then in other projects.
> >   6. Also, the system should have a suggestion mechanism. The system
> >   should know the domain of the project X is working on and its
> >   sub-parts. For example, X is working on an e-commerce project, and the
> >   new request is about mobile payments. In the same company but in a
> >   different project, a project team is working on e-wallet projects for
> >   a bank. Based on her profile, the system should be able to suggest
> >   people, tools and outcomes from the other project relating to the
> >   payments domain.
> >
> > Identifying the domain and grouping the related docs, tools and people
> > in an existing system is nearly impossible to do manually. I want the
> > system to identify and cluster the related things itself, and also to
> > learn and improve the results from user feedback. Also, some people
> > should give input to the system by classifying the concepts for it. For
> > example: I have organizational assets: documents, tools, people. The
> > documents are project docs and organizational docs, and they are
> > related. This can be guidance for the system.
> >
> > I think Carrot2 is doing something very similar to what I describe, but
> > it has a file limitation. Anyway, I need a roadmap to initiate a project
> > like this. Where should I start?
> >
> > Thanks,
> >
>
