Hi folks,

We have been using UIMA to mine data points from plain-text documents, and our AEs worked fine. Recently, however, those documents started arriving in HTML format (i.e. with a bunch of HTML tags mixed in), and our AEs can no longer mine the data correctly. Is there an HTML Collection Reader component or library already available, so that we do not need to reinvent the wheel?
We tried an HTMLCommon collection reader, but it does not seem to parse tables correctly. It often adds many blank lines between table cells and rows, which confuses our AEs. Any help is highly appreciated.

Thanks,
Chengmin
