Great idea, +1.

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Chief Architect
Instrument Software and Science Data Systems Section (398)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 168-519, Mailstop: 168-527
Email: [email protected]
WWW:  http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++





-----Original Message-----
From: Julien Nioche <[email protected]>
Reply-To: "[email protected]" <[email protected]>
Date: Friday, February 12, 2016 at 7:46 AM
To: "[email protected]" <[email protected]>
Subject: Re: [MASSMAIL]Extract Contact Information - Custom Parser

>We could create an account for the project at SO, give the user list as the
>email address, and set up an alert so that any question tagged as [nutch]
>gets sent to [email protected].
>That should work, shouldn't it?
>
>On 12 February 2016 at 15:11, Mattmann, Chris A (3980) <
>[email protected]> wrote:
>
>> That’s a cool idea, but how would we set up the redirect? Wouldn’t
>> that have to occur at SO?
>>
>> -----Original Message-----
>> From: Julien Nioche <[email protected]>
>> Reply-To: "[email protected]" <[email protected]>
>> Date: Wednesday, February 10, 2016 at 6:48 AM
>> To: "[email protected]" <[email protected]>
>> Subject: Re: [MASSMAIL]Extract Contact Information - Custom Parser
>>
>> >See SO =>
>> >
>> >http://stackoverflow.com/questions/35299744/nutch-parser-plugin-collect-contact-information
>> >
>> >There seem to be more and more people sending their questions to both the
>> >ML and SO. I am wondering whether we should set up a redirect so that any
>> >question asked there lands automatically on the user list. Any thoughts?
>> >
>> >On 10 February 2016 at 14:43, Markus Jelsma <[email protected]> wrote:
>> >
>> >> Yes, I would also implement an HtmlParseFilter plugin, but execute the
>> >> regex on the parseText, because that is where you are going to find
>> >> phone numbers etc.
>> >> Markus
>> >>
>> >>
>> >>
>> >> -----Original message-----
>> >> > From:Jorge Luis Betancourt González <[email protected]>
>> >> > Sent: Tuesday 9th February 2016 19:59
>> >> > To: [email protected]
>> >> > Subject: Re: [MASSMAIL]Extract Contact Information - Custom Parser
>> >> >
>> >> > Is there any particular requirement that prevents you from implementing
>> >> > your logic as an HtmlParseFilter plugin? Essentially, the parsing will be
>> >> > done for you (by parse-html or parse-tika), and all you need to do is
>> >> > find the right nodes and extract the desired information (see [1]).
>> >> >
>> >> > Regards,
>> >> >
>> >> > [1] http://svn.apache.org/repos/asf/nutch/trunk/src/plugin/headings/
>> >> >
>> >> > ----- Original message -----
>> >> > From: "Bin Wang" <[email protected]>
>> >> > To: "Apache.Nutch.User" <[email protected]>
>> >> > Sent: Tuesday, 9 February 2016 13:19:35
>> >> > Subject: [MASSMAIL]Extract Contact Information - Custom Parser
>> >> >
>> >> > Hi there,
>> >> >
>> >> > I am working on a project that needs to identify contact points on
>> >> > company websites, for the purpose of enhancing security.
>> >> >
>> >> > Right now, I have managed to crawl several rounds of sites. The next
>> >> > step will be to parse the HTML pages and locate where the contact
>> >> > information is. In this case, I am only interested in email addresses
>> >> > and phone numbers.
>> >> >
>> >> > Here is what I am planning to do: write a MapReduce job to parse the
>> >> > HTML files, using regular expressions in combination with an HTML
>> >> > parser such as Jsoup or BeautifulSoup to find the matches.
>> >> >
>> >> > However, I am wondering whether there is any parser plugin that has
>> >> > already been implemented, and maybe tested, for this purpose?
>> >> >
>> >> > Also, any feedback on how to achieve this is much appreciated!
>> >> >
>> >> > Best regards,
>> >> >
>> >> > Bin
>> >> >
>> >>
>> >
>> >
>> >
>> >--
>> >
>> >*Open Source Solutions for Text Engineering*
>> >
>> >http://www.digitalpebble.com
>> >http://digitalpebble.blogspot.com/
>> >#digitalpebble <http://twitter.com/digitalpebble>
>>
>>
>
>
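
For anyone landing on this thread later, here is a minimal sketch of the regex step suggested above: running patterns for email addresses and phone numbers over the plain text of a parsed page. It is plain Java with no Nutch dependency; in a real plugin the input string would be the page's parse text, and the extraction would be called from inside the HtmlParseFilter. The class name ContactExtractor, its method names, and both patterns are assumptions made for illustration, not anything taken from Nutch or from the posts in this thread. The phone pattern in particular is deliberately loose and will also match things like ISO dates, so it would need tightening for real data.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: extracts contact details from already-parsed page text.
class ContactExtractor {

    // Simplified email pattern (assumption): local part, '@', domain, TLD of 2+ letters.
    private static final Pattern EMAIL =
            Pattern.compile("[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}");

    // Loose phone pattern (assumption): optional '+', then at least 8 digits
    // in total positions, allowing spaces, dots, hyphens and parentheses between them.
    private static final Pattern PHONE =
            Pattern.compile("\\+?\\d[\\d ().-]{6,}\\d");

    static List<String> extractEmails(String text) {
        return findAll(EMAIL, text);
    }

    static List<String> extractPhones(String text) {
        return findAll(PHONE, text);
    }

    // Collects every match of the pattern, dropping duplicates but keeping
    // the order in which matches first appear in the text.
    private static List<String> findAll(Pattern pattern, String text) {
        Set<String> found = new LinkedHashSet<>();
        Matcher matcher = pattern.matcher(text);
        while (matcher.find()) {
            found.add(matcher.group());
        }
        return new ArrayList<>(found);
    }
}
```

Calling ContactExtractor.extractEmails(text) and ContactExtractor.extractPhones(text) on the text of each page gives two de-duplicated lists that a plugin could then store as parse metadata. Matching on the extracted text rather than the raw HTML avoids having to deal with markup inside the patterns, which is the point made earlier in the thread.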
