Hello,

I would like to get some more information about OpenNLP and whether it may be a fit for a project I am working on. Unfortunately, I am not a very technical person, so any direction and correction of my assumptions would be appreciated.

I am trying to put together text analysis software for the staffing industry that narrows down candidate possibilities based on resume information. The first challenge has been finding a program that will extract the information from an assortment of document types. I understand from several users that Tika would be a good place to start and that it returns the extracted information in an XML format. The second challenge is to then rearrange that information based on a set of rules.
For example, people write dates on a resume in a variety of ways (7/7/15, 07-07-2015, July 7, 2015, etc.). They also put their title, current employer, skills, education, and the like in different areas of the resume. Although there is some commonality in the way people write their resumes, I need to find software that will allow me to pull the information needed and reformat it. I have been told that OpenNLP will allow you to "teach" it what information to pull and where to put it, based on a standard set of rules. Since I have been looking at resumes for a long time, I would be able to show it where and when to move data.

Do you feel, based on the description of what I am trying to do, that OpenNLP would be a viable solution?

Thank you for any assistance that you may be able to provide. Have a great day.

Scott