aye!

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Chief Architect
Instrument Software and Science Data Systems Section (398)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 168-519, Mailstop: 168-527
Email: [email protected]
WWW: http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
-----Original Message-----
From: Lewis John Mcgibbney <[email protected]>
Reply-To: "[email protected]" <[email protected]>
Date: Saturday, January 10, 2015 at 3:31 PM
To: "[email protected]" <[email protected]>
Subject: Re: Differences between parse-html and parse-tika for generation of parse metadata

>BOOM
>https://issues.apache.org/jira/browse/NUTCH-1815
>
>
>On Sat, Jan 10, 2015 at 10:15 AM, Lewis John Mcgibbney <
>[email protected]> wrote:
>
>> Hi Folks,
>> Is the aim to have identical output from parse-tika and parse-html for
>> rendering of parse metadata?
>> With Nutch 1.10-SNAPSHOT with no local source code modifications, if we
>> take the following page [0], and turn metatags.names to wildcard *, with
>> parse-tika I get
>>
>> Parse Metadata:
>> og:type=article
>> test=displayAbstract
>> metatag.og:image=http://journals.cambridge.org/cover_images/OPL/OPL.jpg
>> metatag.og:image=http://journals.cambridge.org/cover_images/OPL/OPL.jpg
>> fb:app_id=102729586536954
>> Cache-Control=no-store
>> Pragma=no-cache
>> description=We present a comparative study of optical absorption, photoluminescence (PL), and photoconductivity in bulk heterojunctions comprising a high performance functionalized anthradithiophene (ADT) derivative or the benchmark polymer P3HT as donor and functionalized pentacene (Pn) derivative or PCBM as acceptor. Of all D/A blends studied, the ADT/PCBM blend exhibited the highest charge photogeneration efficiencies under 532 nm excitation, leading to the highest amplitudes of time-resolved and continuous wave (cw) photocurrents. At nanosecond time scales after photoexcitation, both ADT-TES-F-based blends and the P3HT/Pn-TIPS-F8 blend exhibited photocurrents which were higher by a factor of 2-10, depending on the blend, than that in the P3HT/PCBM blend. However, cw photocurrents showed a different trend, with the ADT-TES-F/PCBM blend exhibiting only a factor of 1.5-2.5 lower photoresponse than that in the P3HT/PCBM blend, due to other contributions, such as that of charge trap-limited transport, to cw photoresponse.
>> verify-v1=P40xFgT/ywJlpV7zP/etM8pJVJZ4CjdOId2dmmiCb+4=
>> metatag.og:type=article
>> metatag.og:type=article
>> metatag.expires=-1
>> metatag.expires=-1
>> format-detection=telephone=no
>> metatag.verify-v1=P40xFgT/ywJlpV7zP/etM8pJVJZ4CjdOId2dmmiCb+4=
>> metatag.verify-v1=P40xFgT/ywJlpV7zP/etM8pJVJZ4CjdOId2dmmiCb+4=
>> metatag.description=We present a comparative study of optical absorption, photoluminescence (PL), and photoconductivity in bulk heterojunctions comprising a high performance functionalized anthradithiophene (ADT) derivative or the benchmark polymer P3HT as donor and functionalized pentacene (Pn) derivative or PCBM as acceptor. Of all D/A blends studied, the ADT/PCBM blend exhibited the highest charge photogeneration efficiencies under 532 nm excitation, leading to the highest amplitudes of time-resolved and continuous wave (cw) photocurrents. At nanosecond time scales after photoexcitation, both ADT-TES-F-based blends and the P3HT/Pn-TIPS-F8 blend exhibited photocurrents which were higher by a factor of 2-10, depending on the blend, than that in the P3HT/PCBM blend. However, cw photocurrents showed a different trend, with the ADT-TES-F/PCBM blend exhibiting only a factor of 1.5-2.5 lower photoresponse than that in the P3HT/PCBM blend, due to other contributions, such as that of charge trap-limited transport, to cw photoresponse.
>> metatag.description=We present a comparative study of optical absorption, photoluminescence (PL), and photoconductivity in bulk heterojunctions comprising a high performance functionalized anthradithiophene (ADT) derivative or the benchmark polymer P3HT as donor and functionalized pentacene (Pn) derivative or PCBM as acceptor. Of all D/A blends studied, the ADT/PCBM blend exhibited the highest charge photogeneration efficiencies under 532 nm excitation, leading to the highest amplitudes of time-resolved and continuous wave (cw) photocurrents. At nanosecond time scales after photoexcitation, both ADT-TES-F-based blends and the P3HT/Pn-TIPS-F8 blend exhibited photocurrents which were higher by a factor of 2-10, depending on the blend, than that in the P3HT/PCBM blend. However, cw photocurrents showed a different trend, with the ADT-TES-F/PCBM blend exhibiting only a factor of 1.5-2.5 lower photoresponse than that in the P3HT/PCBM blend, due to other contributions, such as that of charge trap-limited transport, to cw photoresponse.
>> dc:title=Cambridge Journals Online - MRS Online Proceedings Library - Abstract - Charge carrier dynamics in small-molecule- and polymer-based donor-acceptor blends
>> metatag.og:url=http://journals.cambridge.org/abstract_S1946427414009567
>> metatag.og:url=http://journals.cambridge.org/abstract_S1946427414009567
>> metatag.test=displayAbstract
>> metatag.test=displayAbstract
>> metatag.content-encoding=UTF-8
>> metatag.content-encoding=UTF-8
>> Expires=-1
>> metatag.pragma=no-cache
>> metatag.pragma=no-cache
>> metatag.dc:title=Cambridge Journals Online - MRS Online Proceedings Library - Abstract - Charge carrier dynamics in small-molecule- and polymer-based donor-acceptor blends
>> metatag.dc:title=Cambridge Journals Online - MRS Online Proceedings Library - Abstract - Charge carrier dynamics in small-molecule- and polymer-based donor-acceptor blends
>> metatag.cache-control=no-store
>> metatag.cache-control=no-store
>> metatag.format-detection=telephone=no
>> metatag.format-detection=telephone=no
>> og:image=http://journals.cambridge.org/cover_images/OPL/OPL.jpg
>> og:url=http://journals.cambridge.org/abstract_S1946427414009567
>> metatag.content-type=text/html; charset=UTF-8
>> metatag.content-type=text/html; charset=UTF-8
>> Content-Encoding=UTF-8
>> metatag.fb:app_id=102729586536954
>> metatag.fb:app_id=102729586536954
>> Content-Type=text/html; charset=UTF-8
>>
>> with parse-html, I get
>>
>> Parse Metadata:
>> metatag.test=displayAbstract
>> caching.forbidden=content
>> metatag.pragma=no-cache
>> metatag.cache-control=no-store
>> metatag.title=Charge carrier dynamics in small-molecule- and polymer-based donor-acceptor blends
>> metatag.format-detection=telephone=no
>> metatag.content-type=text/html; charset=UTF-8
>> CharEncodingForConversion=utf-8
>> OriginalCharEncoding=utf-8
>> metatag.expires=-1
>> metatag.originalcharencoding=utf-8
>> metatag.verify-v1=P40xFgT/ywJlpV7zP/etM8pJVJZ4CjdOId2dmmiCb+4=
>> metatag.description=We present a comparative study of optical absorption, photoluminescence (PL), and photoconductivity in bulk heterojunctions comprising a high performance functionalized anthradithiophene (ADT) derivative or the benchmark polymer P3HT as donor and functionalized pentacene (Pn) derivative or PCBM as acceptor. Of all D/A blends studied, the ADT/PCBM blend exhibited the highest charge photogeneration efficiencies under 532 nm excitation, leading to the highest amplitudes of time-resolved and continuous wave (cw) photocurrents. At nanosecond time scales after photoexcitation, both ADT-TES-F-based blends and the P3HT/Pn-TIPS-F8 blend exhibited photocurrents which were higher by a factor of 2-10, depending on the blend, than that in the P3HT/PCBM blend. However, cw photocurrents showed a different trend, with the ADT-TES-F/PCBM blend exhibiting only a factor of 1.5-2.5 lower photoresponse than that in the P3HT/PCBM blend, due to other contributions, such as that of charge trap-limited transport, to cw photoresponse.
>> metatag.charencodingforconversion=utf-8
>>
>> Immediate observation is that parse-tika seems to be duplicating a lot of
>> the fields... take for example the description. This is repeated thrice.
>>
>> If we could get conversation going on this it would be ideal.
>> Thanks folks
>> Lewis
>>
>> [0]
>> http://journals.cambridge.org/action/displayAbstract?fromPage=online&aid=9493586&fulltextType=RA&fileId=S1946427414009567
>>
>> --
>> *Lewis*
>>
>
>
>
>--
>*Lewis*
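
For anyone wanting to check the duplication noted in the quoted message without reading the dumps by eye, here is a minimal sketch in Python. It is not part of Nutch: the helper name `duplicated_entries` and the hand-built lists are assumptions for illustration, and the sample is a shortened subset of the key/value pairs copied from the quoted parse-tika and parse-html output.

```python
from collections import Counter


def duplicated_entries(pairs):
    """Return {(key, value): count} for every key/value pair seen more than once.

    Hypothetical helper for illustration; not a Nutch or Tika API.
    """
    counts = Counter(pairs)
    return {pair: n for pair, n in counts.items() if n > 1}


# Shortened sample taken from the parse-tika output quoted above.
tika_pairs = [
    ("og:type", "article"),
    ("metatag.og:type", "article"),
    ("metatag.og:type", "article"),
    ("metatag.expires", "-1"),
    ("metatag.expires", "-1"),
    ("Expires", "-1"),
]

# Shortened sample taken from the parse-html output quoted above.
html_pairs = [
    ("metatag.expires", "-1"),
    ("metatag.pragma", "no-cache"),
]

tika_dupes = duplicated_entries(tika_pairs)
html_dupes = duplicated_entries(html_pairs)

# Keys that appear in the parse-tika sample but not in the parse-html sample.
only_in_tika = sorted({k for k, _ in tika_pairs} - {k for k, _ in html_pairs})

print("parse-tika duplicates:", tika_dupes)
print("parse-html duplicates:", html_dupes)
print("keys only in parse-tika sample:", only_in_tika)
```

On this subset the sketch flags the repeated `metatag.*` pairs from parse-tika and nothing from parse-html. Running it on the full output would first require splitting the dump into key/value pairs, which is left out here because values such as the description contain spaces.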

