Hi,

here are some quick rules. The task could be solved with fewer rules, and
also with better or faster ones. Essentially, you need one rule for
detecting the table structure and one rule for assigning the semantics.
The rules also work for a plain text table with more rows.


Let me know if you have questions about any part.


Best,

Peter

TYPESYSTEM utils.PlainTextTypeSystem;
ENGINE utils.PlainTextAnnotator;

DECLARE Header;
DECLARE ColumnDelimiter;
DECLARE Cell(INT column);

DECLARE Keyword (STRING label);
DECLARE Keyword UnderWriterNameKeyword, AppraiserNameLicenseKeyword,
    AppraisalCompanyNameKeyword;

"Underwriter's Name" -> UnderWriterNameKeyword ( "label" = "UnderWriter Name");
"Appraiser's Name/License" -> AppraiserNameLicenseKeyword ( "label" = "Appraiser Name");
"Appraisal Company Name" -> AppraisalCompanyNameKeyword ( "label" = "Appraisal Company Name");

DECLARE Entry(Keyword keyword);

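// Let the PlainTextAnnotator create Line and Paragraph annotations.
EXEC(PlainTextAnnotator, {Line,Paragraph});

// Make whitespace visible to the rules, then trim it from the borders.
ADDRETAINTYPE(WS);
Line{->TRIM(WS)};
Paragraph{->TRIM(WS)};

// A run of at least three spaces separates two columns.
SPACE[3,100]{-PARTOF(ColumnDelimiter) -> ColumnDelimiter};
// Everything in a line between the delimiters becomes a Cell.
Line -> {ANY+{-PARTOF(Cell),-PARTOF(ColumnDelimiter) -> Cell};};
REMOVERETAINTYPE(WS);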

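INT index = 0;
// Per line: mark the first line of a paragraph as Header and number the cells.
BLOCK(structure) Line{}{
    ASSIGN(index, 0);
    Line{STARTSWITH(Paragraph) -> Header};
    c:Cell{-> c.column = index, index = index + 1};
}

// For each header cell with a keyword, every later cell in the same column
// (outside the header) becomes an Entry that points to that keyword.
Header<-{hc:Cell{hc.column == c.column}<-{k:Keyword;};}
    # c:@Cell{-PARTOF(Header) -> e:Entry, e.keyword = k};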

DECLARE Entity (STRING label, STRING value);
DECLARE Entity UnderWriterName, AppraiserNameLicense, AppraisalCompanyName;

FOREACH(entry) Entry{}{
    entry{ -> CREATE(UnderWriterName, "label" = k.label, "value" = entry.ct)}
        <-{k:entry.keyword{PARTOF(UnderWriterNameKeyword)};};
    entry{ -> CREATE(AppraiserNameLicense, "label" = k.label, "value" = entry.ct)}
        <-{k:entry.keyword{PARTOF(AppraiserNameLicenseKeyword)};};
    entry{ -> CREATE(AppraisalCompanyName, "label" = k.label, "value" = entry.ct)}
        <-{k:entry.keyword{PARTOF(AppraisalCompanyNameKeyword)};};
}
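
To check the logic outside Ruta, the two steps (detect the structure,
then assign the semantics) can be sketched in plain Python. This is only
an illustration of the idea, not a translation of the script above: the
names split_cells, extract_entries and KEYWORD_LABELS are hypothetical,
and the threshold of three spaces is carried over from the
ColumnDelimiter rule.

```python
import re

# Hypothetical label table mirroring the keyword rules of the Ruta script.
KEYWORD_LABELS = {
    "Underwriter's Name": "UnderWriter Name",
    "Appraiser's Name/License": "Appraiser Name",
    "Appraisal Company Name": "Appraisal Company Name",
}

# Assumption: a run of three or more spaces separates two columns.
COLUMN_DELIMITER = re.compile(r" {3,}")


def split_cells(line):
    """Step 1 (structure): split one line into its non-empty cells."""
    return [cell for cell in COLUMN_DELIMITER.split(line.strip()) if cell]


def extract_entries(text):
    """Step 2 (semantics): pair each body cell with the keyword of its column."""
    lines = [line for line in text.splitlines() if line.strip()]
    if not lines:
        return []
    header = split_cells(lines[0])  # the first non-empty line is the header
    entries = []
    for line in lines[1:]:
        for column, value in enumerate(split_cells(line)):
            # Keep only cells whose header cell is a known keyword.
            if column < len(header) and header[column] in KEYWORD_LABELS:
                entries.append((KEYWORD_LABELS[header[column]], value))
    return entries


sample = (
    "Underwriter's Name          Appraiser's Name/License          Appraisal Company Name\n"
    "Alice Wheaton               Bruce Banner                      Stark Industries\n"
)
entries = extract_entries(sample)
```

Like the index-based rules, this sketch assumes that every row has a
value in each column; an empty cell would shift the column index of the
cells that follow it.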



On 06.11.2019 at 12:45, Shashank Pathak wrote:
> Hi Peter,
>
> I am trying to get information from an indented text file.
>
> Input file text:
> Underwriter's Name          Appraiser's Name/License          Appraisal Company Name
> Alice Wheaton               Bruce Banner                      Stark Industries
>
> Approach:
>        I am trying to annotate fixed keywords such as "Underwriter's Name"
> and then move to the line below the annotated keyword.
>        However, I am not able to fetch only the underwriter's name. The rule
> returns every matching span (Alice Wheaton  Bruce, Wheaton Bruce Banner,
> etc.).
>
>
> Code :
>
> TYPESYSTEM utils.PlainTextTypeSystem;
> ENGINE utils.PlainTextAnnotator;
>
> EXEC(PlainTextAnnotator, {Line});
> ADDRETAINTYPE(WS);
> Line{->TRIM(WS)};
> REMOVERETAINTYPE(WS);
> Document{->FILTERTYPE(SPECIAL)};
>
> DECLARE UnderWriterKeyword, NameKeyword, UnderWriterNameKeyword;
> DECLARE UnderWriterName(String label, String value);
>
> CW{REGEXP("\\bUnderwriter") -> UnderWriterKeyword};
> CW{REGEXP("Name")->NameKeyword};
> (UnderWriterKeyword SW NameKeyword){->UnderWriterNameKeyword};
> Line{CONTAINS(UnderWriterNameKeyword)} Line -> {
>    n:CW[1,3]{-> CREATE(UnderWriterName, "label"="UnderWriter Name",
> "value"=n.ct)};
>    };
>
> Please tell me whether this can be achieved with Ruta.
> Please also share the steps to get the Underwriter's Name, Appraiser's
> Name/License and Appraisal Company Name.
> I have already posted a similar question on Stack Overflow:
> https://stackoverflow.com/questions/58726610/using-ruta-get-a-data-present-in-next-line-of-annotated-keyword/58728364#58728364
>
> Thanks,
>
> Shashank Pathak
>
-- 
Dr. Peter Klügl
R&D Text Mining/Machine Learning

