I really appreciate your help and thank you very much Rupert. All the steps helped me and I am up and debugging in Eclipse.
Thanks,
Harish

On Thu, Jul 26, 2012 at 10:30 PM, Rupert Westenthaler <[email protected]> wrote:

> Hi,
>
> I do use Eclipse and I usually do not care about classpath related
> build problems in Eclipse as long as code suggestions do still work.
>
> If I have problems
>
> 1. mvn eclipse:clean eclipse:eclipse
> 2. refreshing all projects in eclipse
> 3. full project > clean
>
> usually solves those problems. NOTE that only calling "mvn
> eclipse:eclipse" may not solve problems, as it only adds new stuff to
> the project files but does not remove old entries. Note that I do
> prefer to NOT use any Eclipse maven plugin as I had bad experiences
> with those. However those cases were about two years ago, so such
> tools might have improved in the meantime.
>
> For Debugging I do use Eclipse:
>
> Unit tests work fine within eclipse. If I want to debug a component
> within a Stanbol Server I do the following
>
> 1. Start the Stanbol Server in debug mode
>
>     java -Xmx1024m -XX:MaxPermSize=256m \
>         -Xdebug -Xrunjdwp:transport=dt_socket,address=8787,server=y,suspend=n \
>         -jar org.apache.stanbol.launchers.full-0.10.0-incubating-SNAPSHOT.jar
>
> 2. connect Eclipse to the Stanbol Server:
>     * Debug Configurations > Remote Java Application > create new
>     * Socket Attach
>     * Host: localhost and Port as specified with address (8787 in the
>       example above)
>
> 3. using the sling installer maven plugin to install/update the module
>    with the component I am working on
>
>     mvn clean install -PinstallBundle -Dsling.url=http://localhost:8080/system/console
>
>     * Make sure to "disconnect" the debugger before calling this, as
>       the debugging might interfere with the update process of the module
>
> hope this helps
> best
> Rupert
>
> On Fri, Jul 27, 2012 at 3:56 AM, harish suvarna <[email protected]> wrote:
> > Hi,
> > I am trying to add Chinese language processing using some opensource
> > segmenters. I had some communication with Rupert. I am attaching
> > Rupert's suggestions. This way I may get some more suggestions and
> > help, and Rupert's ideas get distributed to all.
> >
> > I am also following Anuj's blog to learn about Stanbol content
> > enhancement engine development.
> >
> > I can successfully build Stanbol and play with the default chain.
> >
> > I am trying to create the eclipse project now. mvn eclipse:eclipse was
> > successful too. Then I imported the stanbol directory into the eclipse
> > workspace.
> > In eclipse certain Stanbol projects are in red.
> >
> > Description Resource Path Location Type
> >
> > The project cannot be built until its prerequisite
> > org.apache.stanbol.enhancer.servicesapi is built. Cleaning and
> > building all projects is recommended
> > org.apache.stanbol.enhancer.ldpath
> > Unknown Java Problem
> >
> > The project cannot be built until its prerequisite
> > org.apache.stanbol.entityhub.indexing.core is built. Cleaning and
> > building all projects is recommended
> > org.apache.stanbol.entityhub.indexing.destination.solryard
> > Unknown Java Problem
> >
> > The project cannot be built until its prerequisite
> > org.apache.stanbol.entityhub.core is built. Cleaning and building all
> > projects is recommended
> > org.apache.stanbol.entityhub.query.clerezza
> > Unknown Java Problem
> >
> > The project cannot be built until its prerequisite
> > org.apache.stanbol.entityhub.core is built. Cleaning and building all
> > projects is recommended
> > org.apache.stanbol.entityhub.ldpath
> > Unknown Java Problem
> >
> > The project cannot be built until its prerequisite
> > org.apache.stanbol.enhancer.servicesapi is built. Cleaning and
> > building all projects is recommended
> > org.apache.stanbol.enhancer.rdfentities
> > Unknown Java Problem
> >
> > The project cannot be built until its prerequisite
> > org.apache.stanbol.enhancer.servicesapi is built. Cleaning and
> > building all projects is recommended
> > org.apache.stanbol.enhancer.test
> > Unknown Java Problem
> >
> > The project cannot be built until its prerequisite
> > org.apache.stanbol.entityhub.core is built. Cleaning and building all
> > projects is recommended
> > org.apache.stanbol.entityhub.site.managed
> > Unknown Java Problem
> > ....
> > ...
> >
> > Are any extra steps needed?
> > Should I try to build and debug inside eclipse, or build using mvn and
> > debug in eclipse? What do developers commonly do?
> >
> > -harish
> >
> > ================ Previous communication ================
> > Hi,
> >
> > There are no NER (Named Entity Recognition) models for Chinese text
> > available via OpenNLP. So the default configuration of Stanbol will
> > not process Chinese text. What you can do is to configure a
> > KeywordLinking Engine for Chinese text, as this engine can also
> > process text in unknown languages (see [1] for details).
> >
> > However, the KeywordLinking Engine also requires at least a tokenizer
> > for looking up words. As there is no specific OpenNLP Tokenizer for
> > Chinese text, it will use the default one that uses a fixed set of
> > chars to split words (white spaces, hyphens ...). You may know better
> > how well this would work with Chinese texts. My assumption would be
> > that it is not sufficient, so results will be sub-optimal.
> >
> > To apply Chinese optimization I see three possibilities:
> >
> > 1. add support for Chinese to OpenNLP (Tokenizer, Sentence detection,
> >    POS tagging, Named Entity Detection)
> > 2. allow the KeywordLinkingEngine to use other already available tools
> >    for text processing (e.g. stuff that is already available for
> >    Solr/Lucene [2] or the paoding Chinese segmentor referenced in your
> >    mail). Currently the KeywordLinkingEngine is hardwired with OpenNLP,
> >    because representing Tokens, POS ... as RDF would be too much of an
> >    overhead.
> > 3. implement a new EnhancementEngine for processing Chinese text.
> >
> > Hope this helps to get you started.
> >
> > best
> > Rupert
> >
> > [1] http://incubator.apache.org/stanbol/docs/trunk/multilingual.html
> > [2] http://wiki.apache.org/solr/LanguageAnalysis#Chinese.2C_Japanese.2C_Korean
> >
> > harish suvarna
> > 6:33 PM (22 minutes ago)
> > to Rupert
> >
> > Thanks a lot Rupert.
> >
> > I am weighing between options 2 and 3. What is the difference? Option 2
> > sounds like enhancing the KeywordLinkingEngine to deal with Chinese
> > text. It may be like paoding is hardcoded into the
> > KeywordLinkingEngine. Option 3 is like a separate engine. But will I
> > be able to use the stanbol dbpedia lookup using option 3?
> >
> > Btw, I created my own enhancement engine chains and I could see them
> > yesterday in localhost:8080. But today all of them have vanished and
> > only the default chain shows up. Can I dig them up somewhere in the
> > stanbol directory?
> >
> > -harish
> >
> > I just created the eclipse project
>
> --
> | Rupert Westenthaler [email protected]
> | Bodenlehenstraße 11 ++43-699-11108907
> | A-5500 Bischofshofen
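
Rupert's point in the quoted thread, that a tokenizer splitting on a fixed set of characters (white spaces, hyphens ...) is probably not sufficient for Chinese, can be illustrated with a small sketch. This is only an illustration under stated assumptions: `simple_tokenize` and `SPLIT_CHARS` are hypothetical names, the code is not OpenNLP's or Stanbol's actual tokenizer, and the sample sentences are made up.

```python
import re

# Hypothetical stand-in for a "fixed set of chars" tokenizer.
# NOT the OpenNLP implementation; it only splits on whitespace and hyphens.
SPLIT_CHARS = re.compile(r"[\s\-]+")


def simple_tokenize(text):
    """Split text on whitespace and hyphens, dropping empty tokens."""
    return [tok for tok in SPLIT_CHARS.split(text) if tok]


# Made-up sample inputs for illustration.
english = "Stanbol is an open-source framework"
chinese = "我爱北京天安门"

# English words are separated by the split characters, so this yields
# ['Stanbol', 'is', 'an', 'open', 'source', 'framework'].
print(simple_tokenize(english))

# Chinese text has no such separators between words, so the whole
# sentence comes back as a single token.
print(simple_tokenize(chinese))
```

Because the Chinese sentence is returned as one token, a word lookup against a vocabulary would have nothing useful to match, which is why a dedicated segmenter (options 2 and 3 in the thread) seems to be needed.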
