Hi Kingsley,

Thanks for your advice. I think there's a problem with my copy of Virtuoso: previously there were no cartridges to select on the crawler's setup page. Only after I went to RDF > Sponger > Cartridges/Meta cartridges, unchecked everything, and checked them again did they start to appear on the crawler's setup page.

Now, after crawling with only Freebase as the cartridge and Freebase NYTC and NYTCF as meta cartridges, I only get one page crawled, though it is correctly sponged and inserted into the quad store via rdf_sink. I haven't activated 'Single page download', and I've tried various forms of http://rdf.freebase.com/ as 'Follow links matching', but still only one page is crawled.

Even after activating all the cartridges again, I still don't get more than one page, which seems odd to me: before the list of cartridges appeared on the crawler's setup page, it crawled perfectly, and the only problems were with sponging and noise. Now I can't get it to crawl more than one page with any setup.

There's one more problem: every time I set up a new crawl job, the entry for the cartridge gets duplicated in the RDF Cartridge section. Please refer to this image: http://i.imgur.com/3KLUx.png

Please note that:

1- I had previously removed this instance's virtuoso.db file from disk, but it was regenerated perfectly (as it seems).
2- By all of the cartridges, I mean all except the first one (HTTP in RDF).

So, any ideas on where to start debugging the problem? I have already tried enabling the Sponger's logging feature, but I get nothing in the log. Maybe I have to reset the crawler's settings or something like that.

Thanks for your time and sorry for my wall of text.

Best,
Parsa

On Thu, Sep 23, 2010 at 7:18 PM, Parsa Ghaffari <[email protected]> wrote:
> Dear all,
>
> I'm trying to make a mashup of DBPedia and Freebase. At a query level I
> know I can use SPARQL pragmas for owl:sameAs but I think it limits me to 2.4
> million interlinked concepts and on the Virtuoso's built-in crawler side, I
> think I'll get some noise (i.e. data other than rdf.freebase.com) and I
> still don't understand the crawler enough to know that if it crawls the
> whole Freebase or not. Can someone address me on these 2 issues please ?
> Thanks a lot.
>
> Best,
> Parsa
>

--
Parsa B. Ghaffari
