HansBrende edited a comment on issue #104: Any23 295: Implement ability to use 
librdfa
URL: https://github.com/apache/any23/pull/104#issuecomment-531068423
 
 
   @lewismc My first thought is: if the performance of this module is not as 
good as that of our current implementation, then what value does it add in its 
current form?
   
   My second thought is: the benchmarks test only the underlying parsers, not 
the Any23 `Extractor` wrappers around these rdf4j parsers. However, in Any23's 
`BaseRDFExtractor`, to work around numerous bugs in the semargl HTML parser, we 
had to preprocess the input stream with jsoup before passing "clean HTML" to the 
underlying parser. I am curious whether the `librdfa` parser has any of those 
same HTML parsing bugs. If it does _not_, and if I can move the preprocessing 
logic out of `BaseRDFExtractor` and into the semargl parser specifically, and 
**if** the librdfa parser can still pass the entire test suite without the 
jsoup-preprocessed stream, then there would be a much stronger case for 
including it: without the preprocessing overhead, its performance would likely 
eclipse that of our current RDFa implementation.
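   To make the proposed refactoring concrete, here is a minimal Java sketch. 
It assumes a simple template-method design in which the base extractor exposes a 
no-op `preprocess` hook and only the semargl-backed extractor overrides it. All 
class names (`SketchRdfExtractor`, `SketchSemarglExtractor`, 
`SketchLibrdfaExtractor`), the `preprocess` hook, and the `cleanHtml` helper are 
hypothetical. They are not Any23's actual `BaseRDFExtractor` API, and 
`cleanHtml` is only a placeholder for the real jsoup round trip.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical base class: owns the generic extraction flow,
// but no parser-specific HTML cleanup.
abstract class SketchRdfExtractor {
    // Hook: by default the raw input stream is passed through untouched.
    protected InputStream preprocess(InputStream in) throws IOException {
        return in;
    }

    // Stand-in for handing the stream to the underlying rdf4j parser.
    // Here it simply returns the text the parser would have received.
    public final String extract(InputStream in) {
        try (InputStream prepared = preprocess(in)) {
            return new String(prepared.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}

// Hypothetical semargl extractor: the only place that pays for cleanup.
class SketchSemarglExtractor extends SketchRdfExtractor {
    @Override
    protected InputStream preprocess(InputStream in) throws IOException {
        String raw = new String(in.readAllBytes(), StandardCharsets.UTF_8);
        byte[] cleaned = cleanHtml(raw).getBytes(StandardCharsets.UTF_8);
        return new ByteArrayInputStream(cleaned);
    }

    // Hypothetical placeholder for the jsoup parse-and-reserialize step.
    // It only normalizes CRLF line endings and trims whitespace here.
    static String cleanHtml(String html) {
        return html.replace("\r\n", "\n").trim();
    }
}

// Hypothetical librdfa extractor: inherits the no-op hook, so the raw
// bytes reach the parser with no preprocessing overhead.
class SketchLibrdfaExtractor extends SketchRdfExtractor {
}
```

   With this split, dropping the preprocessing for librdfa costs nothing 
extra: its extractor never overrides the hook, so whether it can pass the test 
suite on the raw stream becomes the deciding question.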

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services
