FYI

On 2/13/13 5:26 PM, [email protected] wrote:
A system called parallelSuperEyeball has been added to the Freebase processing chain. I took apart the parser from the Jena framework to extract something that parses individual nodes in N-Triples files, so that invalid triples do not stop the triple parsing process. The earlier partitionFreebaseRDF stage removes superfluous information and reformats the data for scalable parallel processing. I call the resulting product, which partitions valid and invalid facts from Freebase, ":BaseKB Lime", and it's a refreshing alternative to the difficulties that people have with off-brand Linked Data products that don't conform to industry standards.
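To make the idea of tolerant, line-at-a-time parsing concrete, here is a minimal sketch of splitting input into accepted and rejected lines. It is not the infovore code and does not use the Jena parser described above: it swaps in a deliberately simplified regular expression for the N-Triples grammar, and the names `TRIPLE` and `partition` are hypothetical, chosen only for this illustration.

```python
import re

# Simplified N-Triples term patterns. These are assumptions for illustration:
# the real grammar is stricter about IRI characters and escape sequences.
IRI = r'<[^<>"\s]*>'
BNODE = r'_:[A-Za-z0-9]+'
LITERAL = (
    r'"(?:[^"\\]|\\.)*"'                                  # quoted string with escapes
    r'(?:@[A-Za-z]+(?:-[A-Za-z0-9]+)*|\^\^' + IRI + r')?'  # optional language tag or datatype
)

# subject predicate object, terminated by a period
TRIPLE = re.compile(
    r'^\s*(?:' + IRI + r'|' + BNODE + r')'
    r'\s+' + IRI +
    r'\s+(?:' + IRI + r'|' + BNODE + r'|' + LITERAL + r')'
    r'\s*\.\s*$'
)


def partition(lines):
    """Split lines into (accepted, rejected) without stopping on bad input."""
    accepted, rejected = [], []
    for line in lines:
        line = line.rstrip("\n")
        # Skip blank lines and comments; they are neither valid nor invalid facts.
        if not line.strip() or line.lstrip().startswith("#"):
            continue
        if TRIPLE.match(line):
            accepted.append(line)
        else:
            # A bad line is set aside rather than aborting the whole run.
            rejected.append(line)
    return accepted, rejected
```

The point of the design is that each line is judged on its own, so one malformed line costs exactly one rejected record, and the work can be divided across many processes because no parser state carries over between lines.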
You can confirm these claims for yourself by downloading
https://github.com/paulhoule/infovore/archive/t20130213.tar.gz
cd infovore
mvn clean install
cd hydroxide-apps
mvn appassembler:assemble
cd ..
source ./hydroxide-apps/path.sh
export INFOVORE_BASE=/freebase/
export INFOVORE_FREEBASE_FILE=/freebase/freebase-rdf-2013-01-27-00-00.gz
export INFOVORE_INSTANCE=2013-01-27
mkdir -p /freebase/data/$INFOVORE_INSTANCE
partitionFreebaseRDF
superParallelEyeball
And then in /freebase/data/2013-01-27/work you'll find
baseKBLime -- 716 million valid triples to load in your RDF store or otherwise use
baseKBLimeRejected -- 13 million invalid "triples"
freebase-raw-rejected.tsv -- quite literally a handful of completely broken lines from the quad dump that don't even end with a period.

I'm planning on fine-tuning the rules for what the first stage accepts, getting a newer version of the quad dump, and publishing :BaseKB Lime for download soon.


_______________________________________________
You are receiving this message because you are subscribed to the 
Freebase-discuss mailing list.
To post a message to the list: [email protected]
To unsubscribe, view archives, etc: 
http://lists.freebase.com/mailman/listinfo/freebase-discuss


--

Regards,

Kingsley Idehen 
Founder & CEO
OpenLink Software
Company Web: http://www.openlinksw.com
Personal Weblog: http://www.openlinksw.com/blog/~kidehen
Twitter/Identi.ca handle: @kidehen
Google+ Profile: https://plus.google.com/112399767740508618350/about
LinkedIn Profile: http://www.linkedin.com/in/kidehen



