Rupert, any clues on this problem? The resources below use http://zh.dbpedia.org, which does not exist. Does that cause any problems? I ran

curl http://downloads.dbpedia.org/3.6/zh/page_links_zh.nt.bz2 \
| bzcat \
| sed -e 's/.*<http\:\/\/dbpedia\.org\/resource\/\([^>]*\)> ./\1/' \
| sort \
| uniq -c \
| sort -nr > incoming_links.txt

to generate the Chinese incoming_links.txt.

-harish

On Thu, Aug 23, 2012 at 2:15 PM, harish suvarna <[email protected]> wrote:

> OK. Great. It may be easy to fix then. Here are a few lines.
>
> 1192 <http://zh.dbpedia.org/resource/\u7121\u7DAB\u96FB\u8996\u5916\u8CFC\u7F8E\u570B\u96FB\u5F71\u5217\u8868> <http://dbpedia.org/ontology/wikiPageWikiLink> <http://zh.dbpedia.org/resource/\u660E\u73E0\u53F0> .
> 876 <http://zh.dbpedia.org/resource/NGC\u5929\u4F53\u5217\u8868_(1000-1999)> <http://dbpedia.org/ontology/wikiPageWikiLink> <http://zh.dbpedia.org/resource/\u661F\u7CFB> .
> 781 <http://zh.dbpedia.org/resource/\u7121\u7DAB\u96FB\u8996\u7BC0\u76EE\u5217\u8868> <http://dbpedia.org/ontology/wikiPageWikiLink> <http://zh.dbpedia.org/resource/\u7FE1\u7FE0\u53F0> .
> 611 <http://zh.dbpedia.org/resource/\u7121\u7DAB\u96FB\u8996\u5916\u8CFC\u52D5\u756B\u5217\u8868> <http://dbpedia.org/ontology/wikiPageWikiLink> <http://zh.dbpedia.org/resource/\u7FE1\u7FE0\u53F0> .
> 573 <http://zh.dbpedia.org/resource/NGC\u5929\u4F53\u5217\u8868_(1-999)> <http://dbpedia.org/ontology/wikiPageWikiLink> <http://zh.dbpedia.org/resource/\u661F\u7CFB> .
> 519 <http://zh.dbpedia.org/resource/\u540D\u5075\u63A2\u67EF\u5357\u52D5\u756B\u96C6\u6578\u5217\u8868> <http://dbpedia.org/ontology/wikiPageWikiLink> <http://zh.dbpedia.org/resource/\u540D\u5075\u63A2\u67EF\u5357\u6F2B\u756B\u5217\u8868> .
> 384 <http://zh.dbpedia.org/resource/2006\u5E74\u9999\u6E2F\u9078\u8209\u59D4\u54E1\u6703\u754C\u5225\u5206\u7D44\u9078\u8209> <http://dbpedia.org/ontology/wikiPageWikiLink> <http://zh.dbpedia.org/resource/File:Black_check.svg> .
> 366 <http://zh.dbpedia.org/resource/\u5A1B\u6A02\u767E\u5206\u767E\u7BC0\u76EE\u5217\u8868_(2007\u5E74)> <http://dbpedia.org/ontology/wikiPageWikiLink> <http://zh.dbpedia.org/resource/\u5C0F\u9B3C> .
> 365 <http://zh.dbpedia.org/resource/\u7C21\u7E41\u8F49\u63DB\u4E00\u5C0D\u591A\u5217\u8868> <http://dbpedia.org/ontology/wikiPageWikiLink> <http://zh.dbpedia.org/resource/File:Cmbox_move.png> .
> 355 <http://zh.dbpedia.org/resource/\u5A1B\u6A02\u767E\u5206\u767E\u7BC0\u76EE\u5217\u8868_(2007\u5E74)> <http://dbpedia.org/ontology/wikiPageWikiLink> <http://zh.dbpedia.org/resource/\u5C0F\u8C6C> .
> 7 <http://zh.dbpedia.org/resource/\u8523\u5EF7\u9EFB> <http://dbpedia.org/ontology/wikiPageWikiLink> <http://zh.dbpedia.org/resource/Category:\u90B5\u9633\u4EBA> .
> 7 <http://zh.dbpedia.org/resource/\u8523\u5EF7\u9EFB> <http://dbpedia.org/ontology/wikiPageWikiLink> <http://zh.dbpedia.org/resource/Category:\u8523\u59D3> .
> 7 <http://zh.dbpedia.org/resource/\u8523\u5EF7\u9EFB> <http://dbpedia.org/ontology/wikiPageWikiLink> <http://zh.dbpedia.org/resource/Category:\u806F\u5408\u570B\u5B89\u5168\u7406\u4E8B\u6703\u4E3B\u5E2D> .
> 7 <http://zh.dbpedia.org/resource/\u8523\u5EF7\u9EFB> <http://dbpedia.org/ontology/wikiPageWikiLink> <http://zh.dbpedia.org/resource/Category:\u570B\u7ACB\u6E05\u83EF\u5927\u5B78\u6559\u6388> .
> 7 <http://zh.dbpedia.org/resource/\u8523\u5EF7\u9EFB> <http://dbpedia.org/ontology/wikiPageWikiLink> <http://zh.dbpedia.org/resource/Category:\u54E5\u502B\u6BD4\u4E9E\u5927\u5B78\u6821\u53CB> .
> 7 <http://zh.dbpedia.org/resource/\u8523\u5EF7\u9EFB> <http://dbpedia.org/ontology/wikiPageWikiLink> <http://zh.dbpedia.org/resource/Category:\u53F0\u7063\u5916\u7701\u4EBA> .
> 7 <http://zh.dbpedia.org/resource/\u8523\u5EF7\u9EFB> <http://dbpedia.org/ontology/wikiPageWikiLink> <http://zh.dbpedia.org/resource/Category:\u5357\u958B\u5927\u5B78\u6559\u6388> .
> 7 <http://zh.dbpedia.org/resource/\u8523\u5EF7\u9EFB> <http://dbpedia.org/ontology/wikiPageWikiLink> <http://zh.dbpedia.org/resource/Category:\u4E2D\u83EF\u6C11\u570B\u99D0\u8607\u806F\u5927\u4F7F> .
> 7 <http://zh.dbpedia.org/resource/\u8523\u5EF7\u9EFB> <http://dbpedia.org/ontology/wikiPageWikiLink> <http://zh.dbpedia.org/resource/Category:\u4E2D\u83EF\u6C11\u570B\u99D0\u7F8E\u570B\u5927\u4F7F> .
>
> On Thu, Aug 23, 2012 at 1:37 PM, Rupert Westenthaler <[email protected]> wrote:
>
>> Hi,
>>
>> one more thing. Can you please post the first few lines of
>>
>> {indexing-source}/indexing/resource/incoming_links.txt
>>
>> so that I can check the data against the configuration of the
>> iditerator.properties file
>>
>> best
>> Rupert
>>
>> On Thu, Aug 23, 2012 at 10:31 PM, Rupert Westenthaler
>> <[email protected]> wrote:
>> > Hi
>> >
>> > The log clearly shows that you only import the triples from the dumps
>> > into the Jena TDB triple store used as Source for the indexing.
>> >
>> > See all the lines such as
>> >
>> > 8:14:08,196 [Thread-5] INFO tdb.loader - Add: 50,000 triples (Batch: 3,256 / Avg: 3,256)
>> > 08:14:12,802 [Thread-5] INFO tdb.loader - Add: 100,000 triples (Batch: 10,855 / Avg: 5,009)
>> >
>> > BTW: this only needs to be done once. After this initialization step
>> > completes you can remove the RDF files from
>> > "{indexing-root}/indexing/resources/rdfdata/" (I usually just rename
>> > the rdfdata folder to imported-rdfdata).
>> >
>> > The ~1.5hrs are just the time needed to import the data from the RDF
>> > dumps into the Jena TDB store.
>> >
>> > With
>> >
>> > 08:18:04,242 [main] INFO impl.IndexerImpl - Indexing started ...
>> >
>> > the indexing starts and
>> >
>> > 08:21:03,176 [Indexing: Finished Entity Logger Deamon] INFO impl.IndexerImpl - Indexed 0 items in 1410320sec (Infinityms/item): processing: -1.000ms/item | queue: -1.000ms
>> >
>> > states clearly that not a single Entity was indexed.
>> >
>> > I guess this has to do with the configuration. I will have a look at
>> > it tomorrow morning.
>> >
>> > best
>> > Rupert
>> >
>> > On Thu, Aug 23, 2012 at 9:53 PM, harish suvarna <[email protected]> wrote:
>> >> I am attaching the zip of the config folder. The indexing takes quite some
>> >> time (~1.5hrs). The number of triples it generates is high.
>> >> I am attaching the English indexing output also. I used 10 files (except
>> >> long_abstracts_en.nt; it is 2.5 GB and I could not save it in utf8 on my
>> >> mac). But for Chinese I had all files.
>> >> -harish
>> >>
>> >>
>> >> On Thu, Aug 23, 2012 at 12:27 PM, Rupert Westenthaler
>> >> <[email protected]> wrote:
>> >>>
>> >>> I would expect the dbpedia.solrindex.zip file to be several hundred
>> >>> MByte in size (if not gigabytes).
>> >>>
>> >>> The only explanation for this file being so small is that something is
>> >>> going wrong during indexing.
>> >>>
>> >>> Can you maybe provide the {indexing-root}/indexing/config folder so
>> >>> that I can have a look at your configuration
>> >>>
>> >>> best
>> >>> Rupert
>> >>>
>> >>> On Thu, Aug 23, 2012 at 5:49 PM, harish suvarna <[email protected]>
>> >>> wrote:
>> >>> >
>> >>> > Rupert,
>> >>> > I generated the index for dbpedia3.8 English files only.
>> >>> > One thing that intrigues me is that the dbpedia.solrindex.zip file size
>> >>> > is 53kb, the same as when I generated it for Chinese. The English files
>> >>> > are much bigger.
>> >>> > In the English zip also, I can't find paris.
>> >>> > I am attaching the English dbpedia.solrindex.zip for any clues.
>> >>> > Do I need to load the bundle jar file created by the dbpedia indexing?
>> >>> >
>> >>> > -harish
>> >>>
>> >>> --
>> >>> | Rupert Westenthaler [email protected]
>> >>> | Bodenlehenstraße 11 ++43-699-11108907
>> >>> | A-5500 Bischofshofen
>> >>
>> >> --
>> >> Thanks
>> >> Harish
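
A note on the pipeline at the top of the thread: the sample lines quoted from incoming_links.txt are whole triples rather than resource names. That is what one would expect if the sed pattern, written for http://dbpedia.org/resource/, never matches the http://zh.dbpedia.org/resource/ URIs in the Chinese dump, since sed passes unmatched lines through unchanged. Below is a minimal sketch of a variant that also accepts a language subdomain. The function name count_incoming_links and the optional-subdomain pattern are assumptions made for illustration; they do not come from the thread or from Stanbol, and whether the indexer accepts this output has not been checked here.

```shell
#!/bin/sh
# Hypothetical helper (the name is an assumption): reads N-Triples on stdin
# and prints "count resource-name" pairs, most-linked resources first.
count_incoming_links() {
  # Keep only the local name of the object URI. The optional group
  # ([a-z-]+\.)? accepts both http://dbpedia.org and http://zh.dbpedia.org.
  # -n together with the p flag drops lines that do not match instead of
  # passing them through unchanged.
  sed -E -n 's#.*<http://([a-z-]+\.)?dbpedia\.org/resource/([^>]*)> \.[[:space:]]*$#\2#p' \
    | sort \
    | uniq -c \
    | sort -nr
}

# Possible usage with the dump from the thread (not executed here):
#   curl http://downloads.dbpedia.org/3.6/zh/page_links_zh.nt.bz2 \
#     | bzcat | count_incoming_links > incoming_links.txt
```

Dropping unmatched lines is a deliberate choice in this sketch: a pattern mismatch then shows up as an empty or short output file rather than as whole triples in the counts.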
