Mauro,

Try also the code from svn trunk. I think I also fixed a bug there with back-to-back namefinder tags that may also be causing this behavior.
James

On 4/27/2012 12:06 PM, mauro fraboni wrote:
> I have tried to train NER for Italian addresses using the following training
> data; this is just an extract, because I used a training file of 50,000 records.
>
> VIA <START:street> FRANCESCO ZANARDI <END> <START:number> 985 <END> <START:zip> 40131 <END> <START:town> BOLOGNA <END> <START:province> BO <END>
> VIA <START:street> STEFANO BORGIA <END> <START:number> 151 <END> <START:zip> 00168 <END> <START:town> ROMA <END> <START:province> RM <END>
> VIALE <START:street> ITALIA <END> <START:number> 40 <END> <START:zip> 83100 <END> <START:town> AVELLINO <END> <START:province> AV <END>
> PIAZZA <START:street> ROMA <END> <START:number> 15 <END> <START:zip> 63100 <END> <START:town> ASCOLI PICENO <END> <START:province> AP <END>
>
> I used the following command line to train:
>
> C:\Programmi\apache-opennlp-1.5.2-incubating\bin>opennlp.bat TokenNameFinderTrainer -encoding UTF-8 -lang it -data ../traindata/it-ner-address.train -model ../models/it/it-ner-address.bin
>
> Then I ran the Name Finder tool with the following command:
>
> C:\Programmi\apache-opennlp-1.5.2-incubating\bin>opennlp.bat TokenNameFinder ../models/it/it-ner-address.bin < ../input/it-ner-address.txt > ../output/it-ner-address.txt
>
> using a small file of 100 records, and I received the following results
> (again, this is just an extract):
>
> PZA <START:number> GIOVANNI FONTANA <END> <START:zip> 1 <END> <START:town> 60125 <END> <START:province> ANCONA <END> <START:province> AN <END>
> VIA <START:number> A. GARIBALDI <END> <START:zip> 56 <END> <START:town> 60019 <END> <START:province> SENIGALLIA <END> <START:province> AN <END>
> VIA <START:number> A. GARIBALDI <END> <START:zip> 56 <END> <START:town> 60019 <END> <START:province> SENIGALLIA <END> <START:province> AN <END>
> VIA <START:zip> ACHILLE GRANDI <END> <START:zip> 21 <END> <START:street> INT <END> <START:number> INT A <END> <START:street> 23891 BARZANO' <END> <START:street> LC <END>
> VIA <START:number> AGRARIA <END> <START:zip> 2 <END> <START:town> 60035 <END> <START:province> JESI <END> <START:province> AN <END>
> VIA <START:number> AGRARIA <END> <START:zip> 2 <END> <START:town> 60035 <END> <START:province> JESI <END> <START:province> AN <END>
> VIA <START:street> ALBERTO DA GIUSSANO <END> <START:number> 39 INT <END> <START:zip> I <END> <START:town> 20030 <END> <START:street> SEVESO <END> <START:street> MB <END>
> VIA <START:number> AMEDEO <END> <START:zip> 51A <END> <START:town> 24040 <END> <START:province> VERDELLINO <END> <START:province> BG <END>
> VIA <START:street> AMEDEO DI SAVOIA 15 INT <END> <START:zip> INT <END> <START:town> 46040 <END> <START:street> CASALROMANO <END> <START:street> MN <END>
> VIA <START:number> ANTONIO GRAMSCI <END> <START:zip> 14 <END> <START:town> 61040 <END> <START:town> MONDAVIO PU <END>
> VIA <START:town> ARNETTA <END> <START:zip> 20 <END> <START:street> INT <END> <START:number> INT <END> <START:zip> B <END> <START:town> 21045 <END> <START:province> GAZZADA SCHIANNO <END> <START:province> VA <END>
> VIA <START:number> BRESCIA <END> <START:zip> 31 <END> <START:town> 26013 <END> <START:province> CREMA <END> <START:province> CR <END>
> VIA <START:zip> C. CAVOUR <END> <START:zip> 6 <END> <START:street> PRESSO <END> <START:number> INT <END> <START:zip> FARMA <END> <START:town> 60033 <END> <START:province> CHIARAVALLE <END> <START:province> AN <END>
> VIA <START:number> CAMERANO <END> <START:zip> 7 <END> <START:town> 62019 <END> <START:province> RECANATI <END> <START:province> MC <END>
> VIA <START:town> CANDIA <END> <START:street> 350 <END> <START:street> INT <END> <START:zip> INT E <END> <START:town> 60131 <END> <START:province> ANCONA <END> <START:province> AN <END>
> VIA <START:number> CESARE BECCARIA <END> <START:zip> 49 <END> <START:town> 60019 <END> <START:province> SENIGALLIA <END> <START:province> AN <END>
> VIA <START:zip> CESARE PAVESE <END> <START:zip> 28 <END> <START:street> INT <END> <START:zip> INT INT <END> <START:town> 46030 <END> <START:town> BIGARELLO MN <END>
>
> The results are clearly not good. Do you have any idea how I could
> improve them? I am new to OpenNLP. Is there any parameter that I should use
> when running the training?
>
> Mauro
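
To see how often the back-to-back case mentioned in the reply occurs in data like the extract above, a small check can be run over the training file. This is only a rough sketch in Python and not part of OpenNLP: parse_spans is a hypothetical helper, and it assumes whitespace-separated tokens with <START:type> ... <END> markers, as in the quoted records. It returns the tagged spans and counts how many spans start immediately after the previous <END>, with no untagged token in between.

```python
def parse_spans(line):
    """Parse one annotated line (hypothetical helper, not an OpenNLP API).

    Returns (spans, back_to_back), where spans is a list of
    (type, text) pairs and back_to_back counts the spans whose
    <START:...> tag directly follows the previous <END> tag.
    """
    spans = []
    back_to_back = 0
    current_type = None      # type of the span being read, if any
    current_tokens = []      # tokens collected for the open span
    prev_was_end = False     # True if the previous token was <END>

    for tok in line.split():
        if tok.startswith("<START:") and tok.endswith(">"):
            if current_type is not None:
                raise ValueError("nested <START> tag in: " + line)
            if prev_was_end:
                back_to_back += 1
            current_type = tok[len("<START:"):-1]
            current_tokens = []
            prev_was_end = False
        elif tok == "<END>":
            if current_type is None:
                raise ValueError("<END> without <START> in: " + line)
            spans.append((current_type, " ".join(current_tokens)))
            current_type = None
            prev_was_end = True
        else:
            if current_type is not None:
                current_tokens.append(tok)
            prev_was_end = False

    if current_type is not None:
        raise ValueError("unclosed <START> tag in: " + line)
    return spans, back_to_back


# First training record from the quoted message.
SAMPLE = ("VIA <START:street> FRANCESCO ZANARDI <END> <START:number> 985 <END> "
          "<START:zip> 40131 <END> <START:town> BOLOGNA <END> "
          "<START:province> BO <END>")

spans, adjacent = parse_spans(SAMPLE)
```

For the first training record, every span after the street name starts right after the previous <END>, so almost all of the training data in this format exercises the back-to-back case. The same helper can be applied to the tool output to compare predicted types against the expected ones record by record.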
