Hi Paul,
If you could make those extracted valid triples (the 716 million valid triples from Freebase) available for download, it would be very helpful. Thanks.

________________________________
From: Kingsley Idehen <[email protected]>
To: "[email protected]" <[email protected]>
Sent: Thursday, February 14, 2013 4:03 AM
Subject: Re: [Freebase-discuss] [BULK] 13 Million triples are invalid in the Freebase Quad Dump

FYI

On 2/13/13 5:26 PM, [email protected] wrote:
> A system called parallelSuperEyeball has been added to the Freebase processing
> chain. I took apart the parser from the Jena framework to extract something that
> parses individual nodes in N-Triples files, so that invalid triples do not stop
> the triple parsing process. The earlier stage, partitionFreebaseRDF, removes
> superfluous information and reformats the data for scalable parallel processing.
>
> I call the resulting product, which partitions valid and invalid facts from
> Freebase, “:BaseKB Lime”, and it’s a refreshing alternative to the
> difficulties that people have with off-brand Linked Data products that don’t
> conform to industry standards.
>
> You can confirm these claims for yourself by downloading
>
> https://github.com/paulhoule/infovore/archive/t20130213.tar.gz
>
> cd infovore
> mvn clean install
> cd hydroxide-apps
> mvn appassembler:assemble
> cd ..
> source ./hydroxide-apps/path.sh
> export INFOVORE_BASE=/freebase/
> export INFOVORE_FREEBASE_FILE=/freebase/freebase-rdf-2013-01-27-00-00.gz
> export INFOVORE_INSTANCE=2013-01-27
> mkdir /freebase/data/$INFOVORE_INSTANCE
>
> partitionFreebaseRDF
> superParallelEyeball
>
> And then in /freebase/data/2013-01-27/work you’ll find
>
> baseKBLime – 716 million valid triples to load into your RDF store or
> otherwise use
> baseKBLimeRejected – 13 million invalid “triples”
> freebase-raw-rejected.tsv – quite literally a handful of completely broken
> lines from the quad dump that don’t even end with a period.
>
> I’m planning on fine-tuning the rules on what the first stage accepts,
> getting a newer version of the quad dump, and publishing :BaseKB Lime for
> download soon.

_______________________________________________
You are receiving this message because you are subscribed to the Freebase-discuss mailing list.
To post a message to the list: [email protected]
To unsubscribe, view archives, etc: http://lists.freebase.com/mailman/listinfo/freebase-discuss

--
Regards,

Kingsley Idehen
Founder & CEO
OpenLink Software
Company Web: http://www.openlinksw.com
Personal Weblog: http://www.openlinksw.com/blog/~kidehen
Twitter/Identi.ca handle: @kidehen
Google+ Profile: https://plus.google.com/112399767740508618350/about
LinkedIn Profile: http://www.linkedin.com/in/kidehen
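For anyone who wants a rough feel for the crudest rejection category above (broken lines "that don't even end with a period"), here is a minimal, hypothetical shell sketch. It is not infovore code and does not reproduce what partitionFreebaseRDF or the Jena-derived node parser actually do: it only checks whether each line ends with whitespace followed by a terminating period. The function name split_by_terminator and the output names accepted.nt and rejected.txt are illustrative assumptions.

```shell
#!/usr/bin/env bash
# Hypothetical sketch (not infovore code): split a gzipped, line-oriented RDF
# dump into lines that end with a terminating " ." and lines that do not.
# It checks only the statement terminator; it does NOT validate the nodes
# inside each line the way a real N-Triples parser would.
split_by_terminator() {
  local dump="$1" outdir="$2"
  mkdir -p "$outdir"
  # Create both outputs up front so they exist even if one ends up empty.
  : > "$outdir/accepted.nt"
  : > "$outdir/rejected.txt"
  # gzip -dc streams the decompressed dump to stdout without writing it to disk.
  gzip -dc "$dump" | awk -v ok="$outdir/accepted.nt" -v bad="$outdir/rejected.txt" '
    # Accept lines ending in a space or tab, a period, then optional blanks.
    /[ \t]\.[ \t]*$/ { print > ok; next }
    # Everything else goes to the rejected file.
    { print > bad }
  '
}

# Example usage (paths are placeholders):
#   split_by_terminator freebase-rdf-2013-01-27-00-00.gz work
#   wc -l work/accepted.nt work/rejected.txt
```

A line-at-a-time filter like this streams a multi-gigabyte dump in constant memory, which is why the real tool can afford to set bad lines aside instead of aborting; separating "syntactically valid" from merely "terminated" lines still needs a proper node-level parser.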
