My 2c: Start with getting all the relevant texts into one place, namely a
search index.
A good prototyping tool would be Solr.
You will need something like ManifoldCF:
http://incubator.apache.org/connectors/
for collecting documents from the various environments.
Here is Erik Hatcher's "Rapid Prototyping With Solr":
http://www.slideshare.net/erikhatcher/rapid-prototyping-with-solr-4312681
Once you get enough stuff into Solr, you will be able to search it easily.
Next, you can start using Mahout:
http://www.lucidimagination.com/blog/2010/03/16/integrating-apache-mahout-with-apache-lucene-and-solr-part-i-of-3/
I would go for an iterative design: first take a small sample of
documents from each environment, try the systems out, and then scale up.
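To make the "small sample from each environment" step concrete, here is a
minimal Python sketch. It is only an illustration under assumptions: the
helper names (sample_per_source, to_solr_update_json), the field names
(id, source, text) and the in-memory input format are all hypothetical, and
actually fetching the documents (e.g. via ManifoldCF) and posting them to
Solr is left out. It builds a JSON array of documents, which is the shape
Solr's JSON update handler accepts in recent versions.

```python
import json
import random


def sample_per_source(docs_by_source, k, seed=0):
    """Pick up to k documents from each source (environment).

    docs_by_source maps a source name (e.g. "svn", "wiki") to a list of
    dicts with hypothetical keys "path" and "text".
    """
    rng = random.Random(seed)  # fixed seed so the sample is repeatable
    sample = []
    for source, docs in sorted(docs_by_source.items()):
        picked = rng.sample(docs, min(k, len(docs)))
        for doc in picked:
            sample.append({
                "id": f"{source}:{doc['path']}",  # unique key across sources
                "source": source,                 # lets you filter by environment
                "text": doc["text"],
            })
    return sample


def to_solr_update_json(sample):
    """Serialize the sample as a JSON array of documents."""
    return json.dumps(sample)


if __name__ == "__main__":
    docs = {
        "svn": [{"path": "srs_v1.doc", "text": "old SRS"},
                {"path": "srs_v2.doc", "text": "newer SRS"}],
        "wiki": [{"path": "RequirementsProcess", "text": "process notes"}],
    }
    print(to_solr_update_json(sample_per_source(docs, k=1)))
```

Keeping the source name on every document means that, once the sample is
indexed, you can check search quality per environment before scaling up.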
Good luck,
Yuval


On Tue, Nov 15, 2011 at 9:12 AM, Burcu Buyukkagnici <[email protected]> wrote:

> Hi,
> I'm new to this community. I want to use Mahout as a component of an
> enterprise search project. The project is in the conceptual phase. My business
> need is to find everything related to a task and reorganize
> the output as a new view. The results should be actionable. The system
> should also be integrated with software development environment tools:
> Subversion, JIRA and Redmine, SharePoint blogs, wikis, and people (Active
> Directory).
> "Everything" means files, tools and people. Files are mostly text-based
> (Word, PDF, source files); searching audio and video files is a later
> need.
>
> Where do Mahout, Lucene/Solr and the UIMA framework fit in the following
> scenario? And what are the system requirements to set up a development
> environment?
>
> X is a new project team member in a software development firm. Her project
> is mainly a 10-year-old maintenance project; however, customers still make
> small development requests on that platform. Her boss wants her to prepare a
> software requirements specification (SRS) document for a new request. Since she
> hasn't prepared an SRS before, she wants to find previously prepared
> documents, and asks her colleagues to give her a sample.
> Her friend gives her a sample, based on a very old version of the SRS, from
> her local computer. The company has a Windows file server and a new content
> management system (portal); some projects also use Subversion and wikis to
> store the docs.
>
>
>   1. There should be a platform that can search files in all these
>   environments.
>   2. The system should understand that an SRS is an outcome of the software
>   requirements engineering or analysis process. The system should understand
>   that "SRS", "software requirements specification" and "functional design
>   description" are similar terms.
>   3. The company has manuals, templates and process definitions about
>   requirements engineering, and has an SRS template which supersedes other
>   versions. When searching, the system should list organizational docs first
>   and then project docs related to SRSes.
>   4. The project has different SRSes written over 10 years. So the
>   system should list that specific project's SRS templates, indicating version
>   conflicts between org. document templates and projects...
>   5. The system should also list the people who were previously involved in
>   the requirements engineering process, first in that project and then in
>   other projects.
>   6. The system should also have a suggestion mechanism. The system should
>   know the domain of the project X is working on and its subparts. For example,
>   X is working on an e-commerce project, and the new request is about mobile
>   payments. In the same company but in a different project, a team is
>   working on e-wallet projects for a bank. Based on her profile, the system
>   should be able to suggest people, tools and outcomes from the other project
>   relating to the payments domain.
>
> Identifying the domain and grouping the related docs, tools and people
> in an existing system is nearly impossible to do manually. I want the system
> to identify and cluster the related things itself, and also to learn and
> improve the results from user feedback. Also, some people should give input
> to the system by classifying the concepts for it. For example:
> I have organizational assets: documents, tools, people. The documents are
> project docs and organizational docs, and they are related. This can be
> guidance for the system.
>
> I think Carrot2 does something very similar to what I describe, but it has
> file limitations. Anyway, I need a roadmap to initiate a project like
> this. Where should I start?
>
> Thanks,
>
