> Couldn't find anything in Smalltalk, but this should give you ideas, 
> inspire you, or get you started...
> 
> https://github.com/search?q=contact+scraping&type=Repositories
> 
> I guess we have all that's needed in Pharo : parsers (HTML, XML, 
> PetitParser), Soup & regex !

Yes for markup. I played quite a lot with Soup at first, but then moved to 
XPath, which is far more convenient and direct.

PetitParser(2) is also a pure gem. I finally understand it better. Grammar and 
Parser are interesting, but maybe more appropriate for 
structured/semi-structured texts.

Plain text is nicely handled with tools like those Richard just showed.
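
For plain text, a rough sketch of the regex route could look like the 
following. It is in Python only for illustration (not Pharo), and the 
patterns, the sample sentence, and the extract_contacts helper are all 
hypothetical and deliberately simplistic, not a complete solution:

```python
import re

# Deliberately simple patterns (assumptions; not RFC-complete, and phone
# formats vary a lot by country).
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE = re.compile(r"\+?\d[\d .-]{7,}\d")


def extract_contacts(text, known_names=()):
    """Return emails, phone-like strings, and known names found in text."""
    return {
        "emails": EMAIL.findall(text),
        "phones": PHONE.findall(text),
        # Names are matched against a list of known persons (plain substring test).
        "names": [name for name in known_names if name in text],
    }


sample = "Call Alice Martin on +33 1 23 45 67 89 or write to alice@example.org."
result = extract_contacts(sample, known_names=["Alice Martin", "Bob"])
print(result)
```

Addresses and events are much harder to catch with regexes alone, which is 
probably where a grammar-based approach becomes more interesting.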

I'm continuing to play with PetitParser2 today (finishing the tutorial…).

Cheers,

Cédrick


> 
> On 2019-03-07 04:52, Cédrick Béler wrote:
>> Hi all,
>> 
>> I often need to analyse random unstructured text to discover 
>> (structured) information (in emails, for instance), and to extract :
>> - emails
>> - telephone numbers
>> - addresses
>> - events
>> - person names (according to a list of known persons),
>> - etc…
>> 
>> Apple does it in email, for instance (strangely, this is not generalized).
>> 
>> 
>> So my questions are :
>> - do we have something equivalent in Smalltalk/Pharo ? (I didn’t find anything)
>> - if not, what strategy would you use ?
>> => I do really stupid text analysis (substrings, finding @, …, parsing 
>> according to the text structure when there is one… kind of Soup parsing…)
>> => I feel this is a job for PetitParser ? And it would be a nice fit for the 
>> new GToolkit.
>> 
>> All ideas or suggestions are welcome ;-)
>> 
>> 
>> TIA,
>> 
>> Cédrick
>> 
>> 
>> 
> -- 
> -----------------
> Benoît St-Jean
> Yahoo! Messenger: bstjean
> Twitter: @BenLeChialeux
> Pinterest: benoitstjean
> Instagram: Chef_Benito
> IRC: lamneth
> Blogue: endormitoire.wordpress.com
> "A standpoint is an intellectual horizon of radius zero".  (A. Einstein)
> 
> 
> 
> 

