FYI

On 2/13/13 5:26 PM, [email protected] wrote:
A system called parallelSuperEyeball has been added to the Freebase processing chain. I took apart the parser from the Jena framework to extract something that parses individual nodes in N-Triples files, so that invalid triples do not stop the triple parsing process. The earlier stage, partitionFreebaseRDF, removes superfluous information and reformats the data for scalable parallel processing. I call the resulting product, which partitions valid and invalid facts from Freebase, ":BaseKB Lime", and it's a refreshing alternative to the difficulties that people have with off-brand Linked Data products that don't conform to industry standards.

You can confirm these claims for yourself by downloading

https://github.com/paulhoule/infovore/archive/t20130213.tar.gz

and running

    cd infovore
    mvn clean install
    cd hydroxide-apps
    mvn appassembler:assemble
    cd ..
    source ./hydroxide-apps/path.sh
    export INFOVORE_BASE=/freebase/
    export INFOVORE_FREEBASE_FILE=/freebase/freebase-rdf-2013-01-27-00-00.gz
    export INFOVORE_INSTANCE=2013-01-27
    mkdir /freebase/data/$INFOVORE_INSTANCE
    partitionFreebaseRDF
    superParallelEyeball

And then in /freebase/data/2013-01-27/work you'll find

    baseKBLime -- 716 million valid triples to load in your RDF store or otherwise use
    baseKBLimeRejected -- 13 million invalid "triples"
    freebase-raw-rejected.tsv -- quite literally a handful of completely broken lines from the quad dump that don't even end with a period

I'm planning on fine-tuning the rules on what the first stage accepts, getting a newer version of the quad dump, and publishing :BaseKB Lime for download soon.

_______________________________________________
You are receiving this message because you are subscribed to the Freebase-discuss mailing list.
To post a message to the list: [email protected]
To unsubscribe, view archives, etc: http://lists.freebase.com/mailman/listinfo/freebase-discuss
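The core idea, judging each line on its own so that one bad triple is set aside instead of aborting the whole parse, can be sketched in a few lines. This is a minimal Python sketch, assuming a simplified regular-expression approximation of the N-Triples grammar; it is not the Jena-derived parser used in infovore, it runs single-threaded rather than in parallel, and the names `partition_triples`, `IRI`, `BNODE`, `LITERAL` and `TRIPLE` are hypothetical.

```python
import re
from typing import Iterable, List, Tuple

# Hypothetical, simplified term patterns approximating N-Triples; NOT a full grammar.
IRI = r'<[^<>"{}|^`\\\s]*>'                       # IRI in angle brackets, no spaces
BNODE = r'_:[A-Za-z0-9_][A-Za-z0-9_.\-]*'          # blank node label
LITERAL = (r'"(?:[^"\\\n\r]|\\.)*"'                # quoted string with escapes
           r'(?:@[A-Za-z]+(?:-[A-Za-z0-9]+)*|\^\^' + IRI + r')?')  # lang tag or datatype

TRIPLE = re.compile(
    r'^\s*(' + IRI + '|' + BNODE + r')\s+'               # subject
    r'(' + IRI + r')\s+'                                 # predicate
    r'(' + IRI + '|' + BNODE + '|' + LITERAL + r')\s*'   # object
    r'\.\s*$'                                            # terminating period
)


def partition_triples(
        lines: Iterable[str]) -> Tuple[List[Tuple[str, ...]], List[str]]:
    """Split lines into parsed (s, p, o) triples and rejected raw lines.

    A line that fails to parse is recorded and skipped; it never stops the run.
    """
    accepted: List[Tuple[str, ...]] = []
    rejected: List[str] = []
    for line in lines:
        stripped = line.strip()
        if not stripped or stripped.startswith('#'):
            continue  # blank lines and comments are not facts at all
        match = TRIPLE.match(line)
        if match:
            accepted.append(match.groups())
        else:
            rejected.append(line.rstrip('\n'))
    return accepted, rejected
```

For example, feeding it a line with no terminating period or an IRI containing a space puts that line in `rejected` while the well-formed lines around it still land in `accepted`. In the pipeline described above the two outputs would correspond loosely to baseKBLime and baseKBLimeRejected, but the real acceptance rules are whatever infovore implements, not this regex.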
--

Regards,

Kingsley Idehen
Founder & CEO
OpenLink Software
Company Web: http://www.openlinksw.com
Personal Weblog: http://www.openlinksw.com/blog/~kidehen
Twitter/Identi.ca handle: @kidehen
Google+ Profile: https://plus.google.com/112399767740508618350/about
LinkedIn Profile: http://www.linkedin.com/in/kidehen
