I removed the ~/stanbol folder. It is not helping. Let me clear the trunk/stanbol folder and see what happens. I suspect some cache clearance problem.
-harish

On Wed, Aug 1, 2012 at 10:48 AM, Walter Kasper <[email protected]> wrote:
> harish suvarna wrote:
>> I did 'mvn clean install'.
>> Which stanbol folder is this?
>>
>> $HOME/stanbol, where it stores some user/config prefs, or trunk/stanbol?
>> You mean remove the entire folder?
>
> I guess it is $HOME/stanbol where the runtime config data are stored. I
> usually clear the complete folder for a clean restart.
>
>> I restarted the machine and am doing another mvn clean install now. I
>> will post you in another 30 mins.
>>
>> -harish
>>
>> On Wed, Aug 1, 2012 at 10:36 AM, Walter Kasper <[email protected]> wrote:
>>
>>> Hi again,
>>>
>>> It came to my mind that you should also clear the 'stanbol' folder of
>>> the Stanbol runtime system and restart the system. The folder might
>>> contain old bundle configuration data that don't get updated
>>> automatically.
>>>
>>> Best regards,
>>>
>>> Walter
>>>
>>> harish suvarna wrote:
>>>
>>>> Did a fresh build, and inside Stanbol at localhost:8080 it is
>>>> installed but not activated. I still see the com.google.inject errors.
>>>> I do see the pom.xml update from you.
>>>>
>>>> -harish
>>>>
>>>> On Wed, Aug 1, 2012 at 12:55 AM, Walter Kasper <[email protected]> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> The OSGi bundle declared some package imports that indeed are usually
>>>>> neither available nor required. I fixed that. Just check out the
>>>>> corrected pom.xml. On a fresh, clean Stanbol installation langdetect
>>>>> worked fine for me.
>>>>>
>>>>> Best regards,
>>>>>
>>>>> Walter
>>>>>
>>>>> harish suvarna wrote:
>>>>>
>>>>>> Thanks Dr Walter. langdetect is very useful. I could successfully
>>>>>> compile it, but I am unable to load it into Stanbol as I get the error
>>>>>> ======
>>>>>> ERROR: Bundle org.apache.stanbol.enhancer.engines.langdetect [177]:
>>>>>> Error starting/stopping bundle.
>>>>>> (org.osgi.framework.BundleException: Unresolved constraint in bundle
>>>>>> org.apache.stanbol.enhancer.engines.langdetect [177]: Unable to
>>>>>> resolve 177.0: missing requirement [177.0] package;
>>>>>> (package=com.google.inject))
>>>>>> org.osgi.framework.BundleException: Unresolved constraint in bundle
>>>>>> org.apache.stanbol.enhancer.engines.langdetect [177]: Unable to
>>>>>> resolve 177.0: missing requirement [177.0] package;
>>>>>> (package=com.google.inject)
>>>>>>     at org.apache.felix.framework.Felix.resolveBundle(Felix.java:3443)
>>>>>>     at org.apache.felix.framework.Felix.startBundle(Felix.java:1727)
>>>>>>     at org.apache.felix.framework.Felix.setBundleStartLevel(Felix.java:1333)
>>>>>>     at org.apache.felix.framework.StartLevelImpl.run(StartLevelImpl.java:270)
>>>>>>     at java.lang.Thread.run(Thread.java:680)
>>>>>>
>>>>>> ==============
>>>>>>
>>>>>> I added the dependency
>>>>>> <dependency>
>>>>>>   <groupId>com.google.inject</groupId>
>>>>>>   <artifactId>guice</artifactId>
>>>>>>   <version>3.0</version>
>>>>>> </dependency>
>>>>>>
>>>>>> but it looks like it is looking for version 1.3.0, which I can't find
>>>>>> in repo1.maven.org. I am not sure what is needing the inject library.
>>>>>> The entire source of the langdetect plugin does not contain the word
>>>>>> inject. Only the manifest file in target/classes has it listed.
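[An unresolved constraint like the one above is typically caused by an overly broad Import-Package calculation at build time rather than by code actually using the package. A minimal sketch of how such a phantom import could be suppressed with the Apache Felix maven-bundle-plugin; the plugin version and exact instruction are assumptions for illustration, not the actual fix Walter committed:]

```xml
<!-- Sketch: tell the bundle plugin NOT to import com.google.inject,
     so the OSGi resolver no longer requires it at install time.
     Plugin version and placement in the engine pom.xml are assumed. -->
<plugin>
  <groupId>org.apache.felix</groupId>
  <artifactId>maven-bundle-plugin</artifactId>
  <extensions>true</extensions>
  <configuration>
    <instructions>
      <Import-Package>
        !com.google.inject.*,
        *
      </Import-Package>
    </instructions>
  </configuration>
</plugin>
```

[The leading `!` excludes the package; the trailing `*` keeps the plugin's normal automatic import calculation for everything else.]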
>>>>>> -harish
>>>>>>
>>>>>> On Tue, Jul 31, 2012 at 1:32 AM, Walter Kasper <[email protected]> wrote:
>>>>>>
>>>>>>> Hi Harish,
>>>>>>>
>>>>>>> I checked in a new language identifier for Stanbol based on
>>>>>>> http://code.google.com/p/language-detection/ .
>>>>>>> Just check it out from Stanbol trunk, install, and try it out.
>>>>>>>
>>>>>>> Best regards,
>>>>>>>
>>>>>>> Walter
>>>>>>>
>>>>>>> harish suvarna wrote:
>>>>>>>
>>>>>>>> Rupert,
>>>>>>>>
>>>>>>>> My initial debugging for Chinese text told me that the language
>>>>>>>> identification done by the langid enhancer using Apache Tika does
>>>>>>>> not recognize Chinese. The Tika language detection does not seem to
>>>>>>>> support the CJK languages. As a result, the Chinese language is
>>>>>>>> identified as the Lithuanian language 'lt'.
>>>>>>>> The Apache Tika group has an enhancement item, 856, registered for
>>>>>>>> detecting CJK languages
>>>>>>>>
>>>>>>>> https://issues.apache.org/jira/browse/TIKA-856
>>>>>>>>
>>>>>>>> in Feb 2012. I am not sure about the use of language identification
>>>>>>>> in Stanbol yet. Is the language id used to select the dbpedia index
>>>>>>>> (appropriate dbpedia language dump) for entity lookups?
>>>>>>>>
>>>>>>>> I am just thinking that, for my purpose, I could pick option 3,
>>>>>>>> make sure the text is of the language of my interest, and then call
>>>>>>>> the paoding segmenter. Then iterate over the ngrams and do an
>>>>>>>> entityhub lookup. I still need to understand the code around how
>>>>>>>> the whole entity lookup for dbpedia works.
>>>>>>>> I find that the language detection library
>>>>>>>> http://code.google.com/p/language-detection/
>>>>>>>> is very good at language detection. It supports 53 languages out of
>>>>>>>> the box and the quality seems good. It is Apache 2.0 licensed. I
>>>>>>>> could volunteer to create a new langid engine based on this, with
>>>>>>>> the Stanbol community's approval. So if anyone sheds some light on
>>>>>>>> how to add a new Java library into Stanbol, that would be great. I
>>>>>>>> am a maven beginner now.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> harish
>>>>>>>>
>>>>>>>> On Thu, Jul 26, 2012 at 9:46 PM, Rupert Westenthaler <
>>>>>>>> [email protected]> wrote:
>>>>>>>>
>>>>>>>>> Hi harish,
>>>>>>>>>
>>>>>>>>> Note: Sorry I forgot to include the stanbol-dev mailing list in my
>>>>>>>>> last answer.
>>>>>>>>>
>>>>>>>>> On Fri, Jul 27, 2012 at 3:33 AM, harish suvarna <
>>>>>>>>> [email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> Thanks a lot Rupert.
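[On the question of adding a new Java library to Stanbol: in an OSGi build a common pattern is to declare the jar as a Maven dependency and embed it into the engine bundle with the maven-bundle-plugin. A hedged sketch only; the Maven coordinates and version shown for language-detection are assumptions, and the real engine pom may be organized differently:]

```xml
<!-- Sketch: depend on the language-detection jar (coordinates assumed)
     and embed it into the engine bundle, so OSGi can resolve it without
     a separate bundle for the library. -->
<dependency>
  <groupId>com.cybozu.labs</groupId>
  <artifactId>langdetect</artifactId>
  <version>1.1-20120112</version>
</dependency>

<!-- ...and in the build section: -->
<plugin>
  <groupId>org.apache.felix</groupId>
  <artifactId>maven-bundle-plugin</artifactId>
  <configuration>
    <instructions>
      <Embed-Dependency>langdetect</Embed-Dependency>
    </instructions>
  </configuration>
</plugin>
```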
>>>>>>>>>> I am weighing between options 2 and 3. What is the difference?
>>>>>>>>>> Option 2 sounds like enhancing the KeywordLinkingEngine to deal
>>>>>>>>>> with Chinese text. It may be like paoding is hardcoded into the
>>>>>>>>>> KeywordLinkingEngine. Option 3 is like a separate engine.
>>>>>>>>>
>>>>>>>>> Option (2) will require some improvements on the Stanbol side.
>>>>>>>>> However, there were already discussions on how to create a "text
>>>>>>>>> processing chain" that allows splitting up things like tokenizing,
>>>>>>>>> POS tagging, lemmatizing ... into different Enhancement Engines
>>>>>>>>> without suffering from the disadvantages of creating high amounts
>>>>>>>>> of RDF triples. One idea was to base this on the Apache Lucene
>>>>>>>>> TokenStream [1] API and share the data as a ContentPart [2] of the
>>>>>>>>> ContentItem.
>>>>>>>>>
>>>>>>>>> Option (3) indeed means that you will create your own
>>>>>>>>> EnhancementEngine - a similar one to the KeywordLinkingEngine.
>>>>>>>>>
>>>>>>>>>> But will I be able to use the stanbol dbpedia lookup using
>>>>>>>>>> option 3?
>>>>>>>>>
>>>>>>>>> Yes.
>>>>>>>>> You only need to obtain an Entityhub "ReferencedSite" and use the
>>>>>>>>> "FieldQuery" interface to search for Entities (see [3] for an
>>>>>>>>> example).
>>>>>>>>>
>>>>>>>>> best
>>>>>>>>> Rupert
>>>>>>>>>
>>>>>>>>> [1] http://blog.mikemccandless.com/2012/04/lucenes-tokenstreams-are-actually.html
>>>>>>>>> [2] http://incubator.apache.org/stanbol/docs/trunk/components/enhancer/contentitem.html#content-parts
>>>>>>>>> [3] http://svn.apache.org/repos/asf/incubator/stanbol/trunk/enhancer/engines/keywordextraction/src/main/java/org/apache/stanbol/enhancer/engines/keywordextraction/linking/impl/EntitySearcherUtils.java
>>>>>>>>>
>>>>>>>>>> Btw, I created my own enhancement engine chains and I could see
>>>>>>>>>> them yesterday in localhost:8080. But today all of them have
>>>>>>>>>> vanished and only the default chain shows up. Can I dig them up
>>>>>>>>>> somewhere in the stanbol directory?
>>>>>>>>>>
>>>>>>>>>> -harish
>>>>>>>>>>
>>>>>>>>>> I just created the eclipse project.
>>>>>>>>>> On Thu, Jul 26, 2012 at 5:04 AM, Rupert Westenthaler
>>>>>>>>>> <[email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>> There are no NER (Named Entity Recognition) models for Chinese
>>>>>>>>>>> text available via OpenNLP. So the default configuration of
>>>>>>>>>>> Stanbol will not process Chinese text. What you can do is
>>>>>>>>>>> configure a KeywordLinking Engine for Chinese text, as this
>>>>>>>>>>> engine can also process texts in unknown languages (see [1] for
>>>>>>>>>>> details).
>>>>>>>>>>>
>>>>>>>>>>> However, the KeywordLinking Engine also requires at least a
>>>>>>>>>>> tokenizer for looking up words.
>>>>>>>>>>> As there is no OpenNLP Tokenizer specific to Chinese text, it
>>>>>>>>>>> will use the default one, which uses a fixed set of chars to
>>>>>>>>>>> split words (white spaces, hyphens ...). You may know better how
>>>>>>>>>>> well this would work with Chinese texts. My assumption would be
>>>>>>>>>>> that it is not sufficient - so results will be sub-optimal.
>>>>>>>>>>>
>>>>>>>>>>> To apply Chinese optimization I see three possibilities:
>>>>>>>>>>>
>>>>>>>>>>> 1. add support for Chinese to OpenNLP (Tokenizer, Sentence
>>>>>>>>>>> detection, POS tagging, Named Entity Detection)
>>>>>>>>>>> 2. allow the KeywordLinkingEngine to use other already available
>>>>>>>>>>> tools for text processing (e.g. stuff that is already available
>>>>>>>>>>> for Solr/Lucene [2], or the paoding Chinese segmenter referenced
>>>>>>>>>>> in your mail). Currently the KeywordLinkingEngine is hardwired
>>>>>>>>>>> to OpenNLP, because representing Tokens, POS ... as RDF would be
>>>>>>>>>>> too much of an overhead.
>>>>>>>>>>> 3. implement a new EnhancementEngine for processing Chinese text.
>>>>>>>>>>>
>>>>>>>>>>> Hope this helps to get you started.
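[The limitation of a tokenizer that splits on a fixed character set is easy to see with a minimal, standalone Java sketch (not Stanbol code): splitting on whitespace yields word tokens for English but returns unsegmented Chinese text as a single token, leaving nothing useful to look up against dbpedia labels.]

```java
public class WhitespaceSplitDemo {
    public static void main(String[] args) {
        // English text separates words with spaces, so a whitespace
        // split yields one token per word.
        String en = "Beijing is the capital of China";
        System.out.println(en.split("\\s+").length); // prints 6

        // Chinese text carries no spaces between words, so the same
        // split returns the whole sentence as one "token" - useless
        // for dictionary or entity lookups.
        String zh = "北京是中国的首都";
        System.out.println(zh.split("\\s+").length); // prints 1
    }
}
```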
>>>>>>>>>>> best
>>>>>>>>>>> Rupert
>>>>>>>>>>>
>>>>>>>>>>> [1] http://incubator.apache.org/stanbol/docs/trunk/multilingual.html
>>>>>>>>>>> [2] http://wiki.apache.org/solr/LanguageAnalysis#Chinese.2C_Japanese.2C_Korean
>>>>>>>>>>>
>>>>>>>>>>> On Thu, Jul 26, 2012 at 2:00 AM, harish suvarna <
>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi Rupert,
>>>>>>>>>>>>
>>>>>>>>>>>> Finally I am getting some time to work on Stanbol. My job is to
>>>>>>>>>>>> demonstrate Stanbol annotations for Chinese text.
>>>>>>>>>>>> I am just starting on it. I am following the instructions to
>>>>>>>>>>>> build an enhancement engine from Anuj's blog. dbpedia has some
>>>>>>>>>>>> Chinese data dump too.
>>>>>>>>>>>> We may have to depend on the ngrams as keys and look them up in
>>>>>>>>>>>> the dbpedia labels.
>>>>>>>>>>>>
>>>>>>>>>>>> I am planning to use the paoding Chinese segmenter
>>>>>>>>>>>> (http://code.google.com/p/paoding/) for word breaking.
>>>>>>>>>>>>
>>>>>>>>>>>> Just curious.
>>>>>>>>>>>> I pasted some Chinese text into the default engine of Stanbol.
>>>>>>>>>>>> It kind of finished the processing in no time at all. This gave
>>>>>>>>>>>> me the suspicion that maybe, if the language is Chinese, no
>>>>>>>>>>>> further processing is done. Is that right? Any more tips for
>>>>>>>>>>>> making all this work in Stanbol?
>>>>>>>>>>>>
>>>>>>>>>>>> -harish
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> | Rupert Westenthaler [email protected]
>>>>>>>>>>> | Bodenlehenstraße 11    ++43-699-11108907
>>>>>>>>>>> | A-5500 Bischofshofen
>>>>>>>
>>>>>>> --
>>>>>>> Dr. Walter Kasper
>>>>>>> DFKI GmbH
>>>>>>> Stuhlsatzenhausweg 3
>>>>>>> D-66123 Saarbrücken
>>>>>>> Tel.: +49-681-85775-5300
>>>>>>> Fax: +49-681-85775-5338
>>>>>>> Email: [email protected]
>>>>>>> -------------------------------------------------------------
>>>>>>> Deutsches Forschungszentrum fuer Kuenstliche Intelligenz GmbH
>>>>>>> Firmensitz: Trippstadter Strasse 122, D-67663 Kaiserslautern
>>>>>>>
>>>>>>> Geschaeftsfuehrung:
>>>>>>> Prof. Dr. Dr. h.c. mult. Wolfgang Wahlster (Vorsitzender)
>>>>>>> Dr. Walter Olthoff
>>>>>>>
>>>>>>> Vorsitzender des Aufsichtsrats:
>>>>>>> Prof. Dr. h.c. Hans A. Aukes
>>>>>>>
>>>>>>> Amtsgericht Kaiserslautern, HRB 2313
>>>>>>> -------------------------------------------------------------
