Mauro,

My guess is that these addresses do not carry enough context.  The
model-based approach usually does best when the addresses appear inside
a larger body of text.  From what you have posted, it looks like you
want to extract the individual fields from phone-book-style entries.
For that, a dictionary or pattern-matching approach would probably work
better than a model.
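
As a rough illustration of the pattern-matching idea, here is a minimal
Python sketch.  It assumes each record sits on a single upper-case line
in the order: street type, street name, house number, optional extras
(INT, PRESSO ...), 5-digit zip, town, 2-letter province.  The function
name parse_address and the field names are only illustrative; none of
this is part of OpenNLP.

```python
import re

# Assumed layout (hypothetical, based on the posted samples):
#   <type> <street name> <number> [extras] <zip> <town> <province>
ADDRESS_RE = re.compile(
    r"""^(?P<kind>\S+)\s+          # street type: VIA, VIALE, PIAZZA, PZA
        (?P<street>.+?)\s+         # street name, as short as possible
        (?P<number>\d+[A-Z]?)      # house number, optional letter (51A)
        (?:\s+(?P<extra>.+?))?     # optional extras such as "INT A"
        \s+(?P<zip>\d{5})\s+       # 5-digit postal code
        (?P<town>.+?)\s+           # town name, may contain spaces
        (?P<province>[A-Z]{2})$    # 2-letter province code
    """,
    re.VERBOSE,
)


def parse_address(line):
    """Return a dict of address fields, or None if the line does not fit."""
    match = ADDRESS_RE.match(line.strip())
    return match.groupdict() if match else None


if __name__ == "__main__":
    samples = [
        "VIA FRANCESCO ZANARDI 985 40131 BOLOGNA BO",
        "PIAZZA ROMA 15 63100 ASCOLI PICENO AP",
        "VIA ACHILLE GRANDI 21 INT A 23891 BARZANO' LC",
    ]
    for sample in samples:
        print(parse_address(sample))
```

The pattern anchors on the two most regular fields, the 5-digit zip and
the 2-letter province at the end of the line, and lets the street and
town names absorb whatever lies between them.  Records that do not fit
return None, so they can be collected and handled separately, for
example by checking the town against a dictionary of known names.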

James

On 4/27/2012 12:06 PM, mauro fraboni wrote:
> I have tried to train NER for Italian addresses using the following training
> data; this is just an extract, because I used a training file of 50,000 records.
>
> VIA <START:street> FRANCESCO ZANARDI <END> <START:number> 985 <END>
> <START:zip> 40131 <END> <START:town> BOLOGNA <END> <START:province> BO <END>
> VIA <START:street> STEFANO BORGIA <END> <START:number> 151 <END>
> <START:zip> 00168 <END> <START:town> ROMA <END> <START:province> RM <END>
> VIALE <START:street> ITALIA <END> <START:number> 40 <END> <START:zip> 83100
> <END> <START:town> AVELLINO <END> <START:province> AV <END>
> PIAZZA <START:street> ROMA <END> <START:number> 15 <END> <START:zip> 63100
> <END> <START:town> ASCOLI PICENO <END> <START:province> AP <END>
>
>
> I used the following command line to train:
>
> C:\Programmi\apache-opennlp-1.5.2-incubating\bin>opennlp.bat
> TokenNameFinderTrainer -encoding UTF-8 -lang it -data
> ../traindata/it-ner-address.train -model ../models/it/it-ner-address.bin
>
>
> Then I ran the Name Finder tool with the following command:
> C:\Programmi\apache-opennlp-1.5.2-incubating\bin>opennlp.bat
> TokenNameFinder ../models/it/it-ner-address.bin <
> ../input/it-ner-address.txt > ../output/it-ner-address.txt
> on a small file of 100 records, and I received the following results
> (again, this is just an extract):
>
> PZA <START:number> GIOVANNI FONTANA <END> <START:zip> 1 <END> <START:town>
> 60125 <END> <START:province> ANCONA <END> <START:province> AN <END>
> VIA <START:number> A. GARIBALDI <END> <START:zip> 56 <END> <START:town>
> 60019 <END> <START:province> SENIGALLIA <END> <START:province> AN <END>
> VIA <START:number> A. GARIBALDI <END> <START:zip> 56 <END> <START:town>
> 60019 <END> <START:province> SENIGALLIA <END> <START:province> AN <END>
> VIA <START:zip> ACHILLE GRANDI <END> <START:zip> 21 <END> <START:street>
> INT <END> <START:number> INT A <END> <START:street> 23891 BARZANO' <END>
> <START:street> LC <END>
> VIA <START:number> AGRARIA <END> <START:zip> 2 <END> <START:town> 60035
> <END> <START:province> JESI <END> <START:province> AN <END>
> VIA <START:number> AGRARIA <END> <START:zip> 2 <END> <START:town> 60035
> <END> <START:province> JESI <END> <START:province> AN <END>
> VIA <START:street> ALBERTO DA GIUSSANO <END> <START:number> 39 INT <END>
> <START:zip> I <END> <START:town> 20030 <END> <START:street> SEVESO <END>
> <START:street> MB <END>
> VIA <START:number> AMEDEO <END> <START:zip> 51A <END> <START:town> 24040
> <END> <START:province> VERDELLINO <END> <START:province> BG <END>
> VIA <START:street> AMEDEO DI SAVOIA 15 INT <END> <START:zip> INT <END>
> <START:town> 46040 <END> <START:street> CASALROMANO <END> <START:street> MN
> <END>
> VIA <START:number> ANTONIO GRAMSCI <END> <START:zip> 14 <END> <START:town>
> 61040 <END> <START:town> MONDAVIO PU <END>
> VIA <START:town> ARNETTA <END> <START:zip> 20 <END> <START:street> INT
> <END> <START:number> INT <END> <START:zip> B <END> <START:town> 21045 <END>
> <START:province> GAZZADA SCHIANNO <END> <START:province> VA <END>
> VIA <START:number> BRESCIA <END> <START:zip> 31 <END> <START:town> 26013
> <END> <START:province> CREMA <END> <START:province> CR <END>
> VIA <START:zip> C. CAVOUR <END> <START:zip> 6 <END> <START:street> PRESSO
> <END> <START:number> INT <END> <START:zip> FARMA <END> <START:town> 60033
> <END> <START:province> CHIARAVALLE <END> <START:province> AN <END>
> VIA <START:number> CAMERANO <END> <START:zip> 7 <END> <START:town> 62019
> <END> <START:province> RECANATI <END> <START:province> MC <END>
> VIA <START:town> CANDIA <END> <START:street> 350 <END> <START:street> INT
> <END> <START:zip> INT E <END> <START:town> 60131 <END> <START:province>
> ANCONA <END> <START:province> AN <END>
> VIA <START:number> CESARE BECCARIA <END> <START:zip> 49 <END> <START:town>
> 60019 <END> <START:province> SENIGALLIA <END> <START:province> AN <END>
> VIA <START:zip> CESARE PAVESE <END> <START:zip> 28 <END> <START:street> INT
> <END> <START:zip> INT INT <END> <START:town> 46030 <END> <START:town>
> BIGARELLO MN <END>
>
>
>
>
> The results are clearly not good. Do you have any idea how I could
> improve them? I am new to OpenNLP; is there any parameter that I should use
> when running the training?
>
> Mauro
>
