Re: Post Address Parsing and OpenNLP

mauro fraboni Fri, 20 Apr 2012 07:19:13 -0700

I was thinking of creating a few hundred training records, maybe a
thousand.


The fact that I could use dictionaries for feature generation is quite
interesting, but how would I do it? I read the Custom Feature Generation
section of the Developer Documentation, but it is not very clear. Do you
have any examples of how to add dictionaries?

thanks.

On Fri, Apr 20, 2012 at 3:53 PM, Jörn Kottmann <[email protected]> wrote:

> That should work; you might want to include street and square if it is
> part of the name.
> The tags need to be separated by whitespace, otherwise the parser fails
> to recognize them.
>
> You will need quite a few samples to train it; 10 or 20 will not be
> enough.
>
> Jörn
>
>
> On 04/20/2012 03:45 PM, mauro fraboni wrote:
>
>> I was thinking to train with a file made in this way:
>>
>> via<START:street>massarenti<END><START:number>300<END>,<START:town>Bologna<END>,<START:province>BO<END>,<START:Country>IT<END>.
>> piazza<START:street>maggiore<END><START:number>3<END>,<START:town>Trento<END>,<START:province>TN<END>,<START:Country>IT<END>
>> ............
>>
>>
>> via (meaning street) and piazza (meaning square) are two descriptors
>> that, in my opinion, do not need to be classified.
>>
>> ciao
>>
>> On Fri, Apr 20, 2012 at 3:29 PM, Jim - FooBar(); <[email protected]> wrote:
>>
>>> On 20/04/12 14:16, mauro fraboni wrote:
>>>
>>>> I am investigating whether it is possible to use OpenNLP to parse
>>>> Italian postal addresses.
>>>> I do not want to validate the input address against an official address
>>>> database; I just need to split a single address string into its
>>>> individual component parts, and I thought of using NameFinder.
>>>> My idea was to train NameFinder on some Italian addresses, marking the
>>>> parts such as Street, Town, Province, Post Code and Country in the
>>>> training data.
>>>> Do you think this can work? Does anyone have experience with it?
>>>>
>>>> Thanks and ciao.
>>>>
>>> Hmmm, that sounds like it should work... however, you don't want to
>>> separate your entities into Street, Town, Province, Post Code, Country
>>> etc., because then how are you going to join them to get your 'real'
>>> entity (address)? I would say keep the whole address as one entity and
>>> produce some training data that marks the whole thing... of course, if
>>> you already have some training data, that is better; otherwise you will
>>> spend a bit of time creating your annotated corpus...
>>>
>>> My logic says that this is the way to go - maybe I'm wrong in some
>>> way...
>>> Any different opinions, anyone?
>>>
>>> Jim
>>>
>>> ps. In your first sentence did you by any chance mean to say "recognise"
>>> instead of "parse"?
>>>
>>>
>
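
Jörn's two points above (put whitespace around every tag, and optionally
keep "via"/"piazza" inside the street name) can be illustrated with a small
helper that writes one training line. This is only a sketch under
assumptions: the function name annotate and the lowercase label names are
made up for illustration, and the output follows the whitespace-separated
<START:type> ... <END> layout described in the thread; it is not code from
OpenNLP itself.

```python
# Sketch: build one whitespace-separated NameFinder training line from
# (text, label) pairs. A label of None leaves the text untagged.
# The helper name and the label names are illustrative assumptions.

def annotate(parts):
    """Return a training line with every tag surrounded by spaces."""
    out = []
    for text, label in parts:
        if label is None:
            out.append(text)  # plain token, e.g. punctuation
        else:
            out.append(f"<START:{label}> {text} <END>")
    return " ".join(out)


# "via" is kept inside the street entity, as suggested in the thread.
line = annotate([
    ("via massarenti", "street"),
    ("300", "number"),
    (",", None),
    ("Bologna", "town"),
    (",", None),
    ("BO", "province"),
    (",", None),
    ("IT", "country"),
    (".", None),
])
print(line)
```

Writing one such line per address to a text file gives training data in
which no tag touches a neighbouring token, which is the condition Jörn
describes for the parser to recognize the tags.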
