Hi all,

I started using Any23 recently and ran into an issue extracting information from one website (IMDB.com).
I want to extract triples from web pages, and I ran into the following problem: even when there is an IRI that could serve as the identifier for a concept, it is not used, and a blank node is used instead. In the example below, the actor Marco Nanini is represented by a blank node (_:nodec984d7c9ee5436ea92571ccd94b946) even though he has an IRI that could be used as his identifier (file:/name/nm0620847/?ref_=tt_cl_t1). That blank node is then used to link him to a Movie, which is itself identified by a blank node. In this specific case, it seems I could use the value of the /Person/url property as the unique identifier (IRI) for the entity.

I suppose this is not a problem with the extractor but with how the page was authored. But since many people use schema.org, I was wondering whether there is any solution for this case. I would be very glad if anyone has an idea.

<file:index.html%3Fref_=fn_al_tt_4> <http://purl.org/dc/terms/title> "Copacabana (2001) - IMDb" .
_:nodee59ff091c1fa911a94a42244c38ab99a <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://schema.org/Movie> .
_:nodec984d7c9ee5436ea92571ccd94b946 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://schema.org/Person> .
_:nodec984d7c9ee5436ea92571ccd94b946 <http://schema.org/Person/name> "Marco Nanini" .
_:nodec984d7c9ee5436ea92571ccd94b946 <http://schema.org/Person/url> <file:/name/nm0620847/?ref_=tt_cl_t1> .
_:nodee59ff091c1fa911a94a42244c38ab99a <http://schema.org/Movie/actor> _:nodec984d7c9ee5436ea92571ccd94b946 .
_:nodebf90e351418e786432aede35cceb807 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://schema.org/Person> .
_:nodebf90e351418e786432aede35cceb807 <http://schema.org/Person/name> "Walderez de Barros" .
_:nodebf90e351418e786432aede35cceb807 <http://schema.org/Person/url> <file:/name/nm0207281/?ref_=tt_cl_t2> .
_:nodee59ff091c1fa911a94a42244c38ab99a <http://schema.org/Movie/actor> _:nodebf90e351418e786432aede35cceb807 .

Best regards,
Bianca Pereira
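One possible workaround, sketched here under stated assumptions rather than as an Any23 feature, is to post-process the extractor's N-Triples output: every blank node that has a `<http://schema.org/Person/url>` value is replaced, wherever it appears, by that IRI. The helpers `parse` and `skolemize_by_url` are hypothetical names introduced for illustration. The regex parser only handles the simple statement shapes in the sample above (IRIs, blank nodes, plain literals without escaped quotes); a real pipeline would more likely use a proper RDF library.

```python
import re

# Hypothetical post-processing step, NOT part of Any23's API.
URL_PREDICATE = "<http://schema.org/Person/url>"

# Matches a simple N-Triples statement: subject, predicate, object, final dot.
# Assumption: objects are IRIs, blank nodes, or plain quoted literals without
# escaped quotes; language tags and datatypes are not handled.
TRIPLE_RE = re.compile(r'^(\S+)\s+(<[^>]*>)\s+(<[^>]*>|_:\S+|"[^"]*")\s*\.\s*$')


def parse(ntriples):
    """Split N-Triples text into (subject, predicate, object) string tuples."""
    triples = []
    for line in ntriples.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = TRIPLE_RE.match(line)
        if match is None:
            raise ValueError(f"unsupported line: {line}")
        triples.append(match.groups())
    return triples


def skolemize_by_url(triples, url_predicate=URL_PREDICATE):
    """Replace each blank node that has a url IRI with that IRI."""
    # First pass: map each blank node to the first url IRI it points to.
    mapping = {}
    for s, p, o in triples:
        if s.startswith("_:") and p == url_predicate and o.startswith("<"):
            mapping.setdefault(s, o)

    # Second pass: rewrite subjects and objects; nodes without a url stay blank.
    return [(mapping.get(s, s), p, mapping.get(o, o)) for s, p, o in triples]


if __name__ == "__main__":
    sample = """\
_:nodee59ff091c1fa911a94a42244c38ab99a <http://schema.org/Movie/actor> _:nodec984d7c9ee5436ea92571ccd94b946 .
_:nodec984d7c9ee5436ea92571ccd94b946 <http://schema.org/Person/name> "Marco Nanini" .
_:nodec984d7c9ee5436ea92571ccd94b946 <http://schema.org/Person/url> <file:/name/nm0620847/?ref_=tt_cl_t1> .
"""
    for s, p, o in skolemize_by_url(parse(sample)):
        print(s, p, o, ".")
```

Run on the sample, the actor is now identified by `<file:/name/nm0620847/?ref_=tt_cl_t1>`, while the Movie keeps its blank node because it has no url triple. Treating the url as an identifier is itself an assumption: it only makes sense if each url denotes exactly one entity, and any blank nodes that share a url are merged into a single resource.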
