Rupert, any clues on this problem?

The resource URIs below start with http://zh.dbpedia.org, a host that does
not exist. Does that cause any problems? I ran

curl http://downloads.dbpedia.org/3.6/zh/page_links_zh.nt.bz2 \
        | bzcat \
        | sed -e 's/.*<http\:\/\/dbpedia\.org\/resource\/\([^>]*\)> ./\1/' \
        | sort \
        | uniq -c  \
        | sort -nr > incoming_links.txt

to generate the Chinese incoming_links.txt.
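
A note on the sed expression above: it only matches URIs that begin with http://dbpedia.org/resource/, so lines whose link target is in the http://zh.dbpedia.org/resource/ namespace pass through unchanged. That would explain why the file contains whole triples rather than entity names. Below is a minimal Python sketch of the same counting step with the language prefix made explicit. The function names, the `lang` parameter and the sample constants are made up for illustration, and whether the indexer expects local names or full URIs in incoming_links.txt is an assumption that should be checked against iditerator.properties.

```python
import re
from collections import Counter

# Hypothetical helper constants, used only to build sample input lines.
PRED = "<http://dbpedia.org/ontology/wikiPageWikiLink>"
ZH = "http://zh.dbpedia.org/resource/"


def triple(subj, obj):
    """Build one N-Triples line in the zh namespace (sample data only)."""
    return "<" + ZH + subj + "> " + PRED + " <" + ZH + obj + "> ."


# Three page-link triples taken from the quoted sample in this thread.
SAMPLE_LINES = [
    triple(r"\u7121\u7DAB\u96FB\u8996\u7BC0\u76EE\u5217\u8868", r"\u7FE1\u7FE0\u53F0"),
    triple(r"\u7121\u7DAB\u96FB\u8996\u5916\u8CFC\u52D5\u756B\u5217\u8868", r"\u7FE1\u7FE0\u53F0"),
    triple(r"NGC\u5929\u4F53\u5217\u8868_(1000-1999)", r"\u661F\u7CFB"),
]


def count_incoming_links(lines, lang="zh"):
    """Count how often each local name appears as the link target (object)."""
    # Assumption: targets live under http://<lang>.dbpedia.org/resource/.
    prefix = re.escape(f"http://{lang}.dbpedia.org/resource/")
    target = re.compile(r"<" + prefix + r"([^>]*)>\s*\.\s*$")
    counts = Counter()
    for line in lines:
        match = target.search(line)
        if match:  # lines in another namespace are skipped, not passed through
            counts[match.group(1)] += 1
    return counts


def format_counts(counts):
    """Render 'count name' rows, highest count first (like sort -nr)."""
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [f"{count} {name}" for name, count in ordered]


for row in format_counts(count_incoming_links(SAMPLE_LINES)):
    print(row)
```

For the real dump, the same two functions could be fed the decompressed lines of page_links_zh.nt.bz2 instead of the three sample triples.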

-harish

On Thu, Aug 23, 2012 at 2:15 PM, harish suvarna <[email protected]> wrote:

> OK. Great. It may be easy to fix then. Here are a few lines.
>
> 1192 <http://zh.dbpedia.org/resource/\u7121\u7DAB\u96FB\u8996\u5916\u8CFC\u7F8E\u570B\u96FB\u5F71\u5217\u8868> <http://dbpedia.org/ontology/wikiPageWikiLink> <http://zh.dbpedia.org/resource/\u660E\u73E0\u53F0> .
>  876 <http://zh.dbpedia.org/resource/NGC\u5929\u4F53\u5217\u8868_(1000-1999)> <http://dbpedia.org/ontology/wikiPageWikiLink> <http://zh.dbpedia.org/resource/\u661F\u7CFB> .
>  781 <http://zh.dbpedia.org/resource/\u7121\u7DAB\u96FB\u8996\u7BC0\u76EE\u5217\u8868> <http://dbpedia.org/ontology/wikiPageWikiLink> <http://zh.dbpedia.org/resource/\u7FE1\u7FE0\u53F0> .
>  611 <http://zh.dbpedia.org/resource/\u7121\u7DAB\u96FB\u8996\u5916\u8CFC\u52D5\u756B\u5217\u8868> <http://dbpedia.org/ontology/wikiPageWikiLink> <http://zh.dbpedia.org/resource/\u7FE1\u7FE0\u53F0> .
>  573 <http://zh.dbpedia.org/resource/NGC\u5929\u4F53\u5217\u8868_(1-999)> <http://dbpedia.org/ontology/wikiPageWikiLink> <http://zh.dbpedia.org/resource/\u661F\u7CFB> .
>  519 <http://zh.dbpedia.org/resource/\u540D\u5075\u63A2\u67EF\u5357\u52D5\u756B\u96C6\u6578\u5217\u8868> <http://dbpedia.org/ontology/wikiPageWikiLink> <http://zh.dbpedia.org/resource/\u540D\u5075\u63A2\u67EF\u5357\u6F2B\u756B\u5217\u8868> .
>  384 <http://zh.dbpedia.org/resource/2006\u5E74\u9999\u6E2F\u9078\u8209\u59D4\u54E1\u6703\u754C\u5225\u5206\u7D44\u9078\u8209> <http://dbpedia.org/ontology/wikiPageWikiLink> <http://zh.dbpedia.org/resource/File:Black_check.svg> .
>  366 <http://zh.dbpedia.org/resource/\u5A1B\u6A02\u767E\u5206\u767E\u7BC0\u76EE\u5217\u8868_(2007\u5E74)> <http://dbpedia.org/ontology/wikiPageWikiLink> <http://zh.dbpedia.org/resource/\u5C0F\u9B3C> .
>  365 <http://zh.dbpedia.org/resource/\u7C21\u7E41\u8F49\u63DB\u4E00\u5C0D\u591A\u5217\u8868> <http://dbpedia.org/ontology/wikiPageWikiLink> <http://zh.dbpedia.org/resource/File:Cmbox_move.png> .
>  355 <http://zh.dbpedia.org/resource/\u5A1B\u6A02\u767E\u5206\u767E\u7BC0\u76EE\u5217\u8868_(2007\u5E74)> <http://dbpedia.org/ontology/wikiPageWikiLink> <http://zh.dbpedia.org/resource/\u5C0F\u8C6C> .
>    7 <http://zh.dbpedia.org/resource/\u8523\u5EF7\u9EFB> <http://dbpedia.org/ontology/wikiPageWikiLink> <http://zh.dbpedia.org/resource/Category:\u90B5\u9633\u4EBA> .
>    7 <http://zh.dbpedia.org/resource/\u8523\u5EF7\u9EFB> <http://dbpedia.org/ontology/wikiPageWikiLink> <http://zh.dbpedia.org/resource/Category:\u8523\u59D3> .
>    7 <http://zh.dbpedia.org/resource/\u8523\u5EF7\u9EFB> <http://dbpedia.org/ontology/wikiPageWikiLink> <http://zh.dbpedia.org/resource/Category:\u806F\u5408\u570B\u5B89\u5168\u7406\u4E8B\u6703\u4E3B\u5E2D> .
>    7 <http://zh.dbpedia.org/resource/\u8523\u5EF7\u9EFB> <http://dbpedia.org/ontology/wikiPageWikiLink> <http://zh.dbpedia.org/resource/Category:\u570B\u7ACB\u6E05\u83EF\u5927\u5B78\u6559\u6388> .
>    7 <http://zh.dbpedia.org/resource/\u8523\u5EF7\u9EFB> <http://dbpedia.org/ontology/wikiPageWikiLink> <http://zh.dbpedia.org/resource/Category:\u54E5\u502B\u6BD4\u4E9E\u5927\u5B78\u6821\u53CB> .
>    7 <http://zh.dbpedia.org/resource/\u8523\u5EF7\u9EFB> <http://dbpedia.org/ontology/wikiPageWikiLink> <http://zh.dbpedia.org/resource/Category:\u53F0\u7063\u5916\u7701\u4EBA> .
>    7 <http://zh.dbpedia.org/resource/\u8523\u5EF7\u9EFB> <http://dbpedia.org/ontology/wikiPageWikiLink> <http://zh.dbpedia.org/resource/Category:\u5357\u958B\u5927\u5B78\u6559\u6388> .
>    7 <http://zh.dbpedia.org/resource/\u8523\u5EF7\u9EFB> <http://dbpedia.org/ontology/wikiPageWikiLink> <http://zh.dbpedia.org/resource/Category:\u4E2D\u83EF\u6C11\u570B\u99D0\u8607\u806F\u5927\u4F7F> .
>    7 <http://zh.dbpedia.org/resource/\u8523\u5EF7\u9EFB> <http://dbpedia.org/ontology/wikiPageWikiLink> <http://zh.dbpedia.org/resource/Category:\u4E2D\u83EF\u6C11\u570B\u99D0\u7F8E\u570B\u5927\u4F7F> .
>
>
>
> On Thu, Aug 23, 2012 at 1:37 PM, Rupert Westenthaler <
> [email protected]> wrote:
>
>> Hi,
>>
>> One more thing: can you please post the first few lines of
>>
>>  {indexing-root}/indexing/resources/incoming_links.txt
>>
>> so that I can check the data against the configuration in the
>> iditerator.properties file?
>>
>> best
>> Rupert
>>
>> On Thu, Aug 23, 2012 at 10:31 PM, Rupert Westenthaler
>> <[email protected]> wrote:
>> > Hi
>> >
>> > The log clearly shows that you only imported the triples from the dumps
>> > into the Jena TDB triple store that is used as the source for the indexing.
>> >
>> > See all the lines such as
>> >
>> >     08:14:08,196 [Thread-5] INFO  tdb.loader - Add: 50,000 triples (Batch: 3,256 / Avg: 3,256)
>> >     08:14:12,802 [Thread-5] INFO  tdb.loader - Add: 100,000 triples (Batch: 10,855 / Avg: 5,009)
>> >
>> > BTW: this only needs to be done once. After this initialization step
>> > completes you can remove the RDF files from
>> > "{indexing-root}/indexing/resources/rdfdata/" (I usually just rename
>> > the rdfdata folder to imported-rdfdata).
>> >
>> > The ~1.5 hrs is just the time needed to import the data from the RDF
>> > dumps into the Jena TDB store.
>> >
>> > With
>> >
>> >     08:18:04,242 [main] INFO  impl.IndexerImpl - Indexing started ...
>> >
>> > the indexing starts and
>> >
>> >     08:21:03,176 [Indexing: Finished Entity Logger Deamon] INFO  impl.IndexerImpl - Indexed 0 items in 1410320sec (Infinityms/item): processing:  -1.000ms/item | queue:  -1.000ms
>> >
>> > states clearly that not a single entity was indexed.
>> >
>> > I guess this has to do with the configuration. I will have a look at
>> > it tomorrow morning.
>> >
>> > best
>> > Rupert
>> >
>> > On Thu, Aug 23, 2012 at 9:53 PM, harish suvarna <[email protected]>
>> wrote:
>> >> I am attaching the zip of the config folder. The indexing takes quite some
>> >> time (~1.5 hrs). The number of triples it generates is high.
>> >> I am also attaching the English indexing output. I used 10 files (all
>> >> except long_abstracts_en.nt, which is 2.5 GB and which I could not save as
>> >> UTF-8 on my Mac). For Chinese I used all files.
>> >> -harish
>> >>
>> >>
>> >> On Thu, Aug 23, 2012 at 12:27 PM, Rupert Westenthaler
>> >> <[email protected]> wrote:
>> >>>
>> >>> I would expect the dbpedia.solrindex.zip file to be several hundred
>> >>> MBytes in size (if not gigabytes).
>> >>>
>> >>> The only explanation for this file to be so small is that something is
>> >>> going wrong during indexing.
>> >>>
>> >>> Can you maybe provide the {indexing-root}/indexing/config folder so
>> >>> that I can have a look at your configuration?
>> >>>
>> >>> best
>> >>> Rupert
>> >>>
>> >>> On Thu, Aug 23, 2012 at 5:49 PM, harish suvarna <[email protected]>
>> >>> wrote:
>> >>> >
>> >>> > Rupert,
>> >>> > I generated the index for the DBpedia 3.8 English files only.
>> >>> > One thing that intrigues me is that the dbpedia.solrindex.zip file
>> >>> > size is 53 KB, the same as when I generated it for Chinese, even
>> >>> > though the English files are much bigger.
>> >>> > In the English zip I also can't find Paris.
>> >>> > I am attaching the English dbpedia.solrindex.zip for any clues.
>> >>> > Do I need to load the bundle jar file created by the DBpedia indexing?
>> >>> >
>> >>> > -harish
>> >>>
>> >>>
>> >>>
>> >>> --
>> >>> | Rupert Westenthaler             [email protected]
>> >>> | Bodenlehenstraße 11                             ++43-699-11108907
>> >>> | A-5500 Bischofshofen
>> >>
>> >>
>> >>
>> >>
>> >> --
>> >> Thanks
>> >> Harish
>> >>
>> >
>> >
>> >
>>
>>
>>
>>
>
>
>
>
>


-- 
Thanks
Harish
