Re: Integrate solr with openNLP
Actually we dropped integrating OpenNLP with Solr, but we took away two different ideas:

* we're using NLP separately, not with Solr
* we're taking the help of UIMA for Solr. It's more advanced.

If you have a specific question, you can ask me. I'll tell you if I know. -Vivek

On Wed, Sep 10, 2014 at 3:46 PM, Aman Tandon amantandon...@gmail.com wrote:

Hi, What is the progress of the integration of NLP with Solr? If you have achieved this integration successfully, then please share it with us. With Regards Aman Tandon

On Tue, Jun 10, 2014 at 11:04 AM, Vivekanand Ittigi vi...@biginfolabs.com wrote:

Hi Aman, Yeah, we are also thinking the same. Using UIMA is better. And thanks to everyone. You guys really showed us the way (UIMA). We'll work on it. Thanks, Vivek

On Fri, Jun 6, 2014 at 5:54 PM, Aman Tandon amantandon...@gmail.com wrote:

Hi Vivek, As everybody on the mailing list mentioned, you should go for UIMA: OpenNLP issues are not tracked properly, which can get your development stuck in the near future if any issue comes up, so it's better to start investigating UIMA. With Regards Aman Tandon

On Fri, Jun 6, 2014 at 11:00 AM, Vivekanand Ittigi vi...@biginfolabs.com wrote:

Can anyone please reply..? Thanks, Vivek

-- Forwarded message --
From: Vivekanand Ittigi vi...@biginfolabs.com
Date: Wed, Jun 4, 2014 at 4:38 PM
Subject: Re: Integrate solr with openNLP
To: Tommaso Teofili tommaso.teof...@gmail.com
Cc: solr-user@lucene.apache.org, Ahmet Arslan iori...@yahoo.com

Hi Tommaso, Yes, you are right. The 4.4 version will work.. I'm able to compile now. I'm trying to apply the named-entity recognition (person name) token filter, but I'm not seeing any change.
My schema.xml looks like this:

<field name="text" type="text_opennlp_pos_ner" indexed="true" stored="true" multiValued="true"/>

<fieldType name="text_opennlp_pos_ner" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.OpenNLPTokenizerFactory" tokenizerModel="opennlp/en-token.bin"/>
    <filter class="solr.OpenNLPFilterFactory" nerTaggerModels="opennlp/en-ner-person.bin"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

Please guide..? Thanks, Vivek

On Wed, Jun 4, 2014 at 1:27 PM, Tommaso Teofili tommaso.teof...@gmail.com wrote:

Hi all, Ahmet was suggesting to eventually use the UIMA integration because OpenNLP already has an integration with Apache UIMA, so you would just have to use that [1]. And that's one of the main reasons the UIMA integration was done: it's a framework that you can easily hook into in order to plug in your NLP algorithm. If you want to use just OpenNLP, then it's up to you to either write your own UpdateRequestProcessor plugin [2] to add metadata extracted by OpenNLP to your documents, or write a dedicated analyzer / tokenizer / token filter. For the OpenNLP integration (LUCENE-2899), the patch is not up to date with the latest APIs in trunk; however, you should be able to apply it (if I recall correctly) to the 4.4 version or so, and adapting it to the latest API shouldn't be too hard either. Regards, Tommaso

[1] : http://opennlp.apache.org/documentation/1.5.3/manual/opennlp.html#org.apche.opennlp.uima
[2] : http://wiki.apache.org/solr/UpdateRequestProcessor

2014-06-03 15:34 GMT+02:00 Ahmet Arslan iori...@yahoo.com.invalid :

Can you extract names, locations, etc. using OpenNLP in a plain/straight Java program? If yes, here are two separate options:

1) Use http://searchhub.org/2012/02/14/indexing-with-solrj/ as an example, integrate your NER code into it, and write your own indexing code. You have the full power here. No Solr plugins are involved.
2) Use 'Implementing a conditional copyField' given here: http://wiki.apache.org/solr/UpdateRequestProcessor as an example and integrate your NER code into it.

Please note that these are separate ways to enrich your incoming documents; choose either (1) or (2).

On Tuesday, June 3, 2014 3:30 PM, Vivekanand Ittigi vi...@biginfolabs.com wrote:

Okay, but I didn't understand what you said. Can you please elaborate? Thanks, Vivek

On Tue, Jun 3, 2014 at 5:36 PM, Ahmet Arslan iori...@yahoo.com wrote:

Hi Vivekanand, I have never used UIMA+Solr before. Personally I think it takes more time to learn how to configure/use
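Both options above boil down to the same enrichment pattern: run NER over a document's text and attach the extracted entities as an extra field before the document is indexed. Here is a minimal plain-Java sketch of that pattern; the regex "NER" and the field names (`text`, `person`) are toy stand-ins for illustration, not the actual OpenNLP NameFinderME API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NerEnricher {
    // Toy "NER": pairs of capitalized words stand in for a real model call.
    static final Pattern NAME = Pattern.compile("\\b[A-Z][a-z]+ [A-Z][a-z]+\\b");

    // Enrich a document (field -> value map) with a "person" field before
    // sending it to Solr; the same shape applies inside a SolrJ indexing
    // loop (option 1) or an UpdateRequestProcessor (option 2).
    static Map<String, Object> enrich(Map<String, Object> doc) {
        List<String> names = new ArrayList<>();
        Matcher m = NAME.matcher((String) doc.get("text"));
        while (m.find()) {
            names.add(m.group());
        }
        doc.put("person", names);
        return doc;
    }

    public static void main(String[] args) {
        Map<String, Object> doc = new HashMap<>();
        doc.put("text", "Report filed by John Smith last week");
        System.out.println(enrich(doc).get("person")); // [John Smith]
    }
}
```

In a real setup you would replace the regex with an OpenNLP name-finder call and index the enriched map with SolrJ.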
Retrieving multivalued field elements
Hi, I have a multivalued field, and I want to display all of its array elements using a SolrJ command. I used the command mentioned below, but I'm able to retrieve only the 1st element of the array.

response.getResults().get(0).getFieldValueMap().get("discussions")

Output: Creation Time - 2014-06-12 17:37:53.0

NOTE: discussions is a multivalued field in Solr which contains

<arr name="discussions">
  <str>Creation Time - 2014-06-12 17:37:53.0</str>
  <str>Last modified Time - 2014-06-12 17:42:09.0</str>
  <str>Comment - posting bug from risk flows ...posting comment from risk flows ...syncing comments ...</str>
</arr>

Is there any SolrJ API for retrieving multivalued elements, or is it not possible..? -Vivek
Re: Retrieving multivalued field elements
Yes, you are right. It worked ! -Vivek

On Mon, Aug 25, 2014 at 7:39 PM, Ahmet Arslan iori...@yahoo.com.invalid wrote:

Hi Vivek, how about this?

Iterator<SolrDocument> iter = queryResponse.getResults().iterator();
while (iter.hasNext()) {
  SolrDocument resultDoc = iter.next();
  Collection<Object> content = resultDoc.getFieldValues("discussions");
}

On Monday, August 25, 2014 4:55 PM, Vivekanand Ittigi vi...@biginfolabs.com wrote:

Hi, I have a multivalued field, and I want to display all of its array elements using a SolrJ command. I used the command mentioned below, but I'm able to retrieve only the 1st element of the array.

response.getResults().get(0).getFieldValueMap().get("discussions")

Output: Creation Time - 2014-06-12 17:37:53.0

NOTE: discussions is a multivalued field in Solr which contains

<arr name="discussions">
  <str>Creation Time - 2014-06-12 17:37:53.0</str>
  <str>Last modified Time - 2014-06-12 17:42:09.0</str>
  <str>Comment - posting bug from risk flows ...posting comment from risk flows ...syncing comments ...</str>
</arr>

Is there any SolrJ API for retrieving multivalued elements, or is it not possible..? -Vivek
Unable to read HBase data from solr
I'm trying to read specific HBase data and index it into Solr using a Groovy script in the /update handler of the solrconfig file, but I'm getting the error mentioned below. I'm placing the same HBase jar that I'm running against into Solr's lib. Many articles suggested workarounds:

1. First I thought that the classpath has two default XMLs and it's throwing the error because one of the two is from some older version of the HBase jar. But the classpath has no HBase jar.
2. Setting hbase.default.for.version.skip to true in hbase-site.xml and adding that to the classpath.

But I'm still getting the same error. I think Solr internally reads the hbase-site.xml file, but I do not know from where..? Please help me.. If further info is needed, I'm ready to provide it.

SEVERE: org.apache.solr.common.SolrException: Unable to invoke function processAdd in script: update-script.groovy: java.lang.RuntimeException: hbase-default.xml file seems to be for and old version of HBase (null), this version is 0.94.10
  at org.apache.solr.update.processor.StatelessScriptUpdateProcessorFactory$ScriptUpdateProcessor.invokeFunction(StatelessScriptUpdateProcessorFactory.java:433)
  at org.apache.solr.update.processor.StatelessScriptUpdateProcessorFactory$ScriptUpdateProcessor.processAdd(StatelessScriptUpdateProcessorFactory.java:374)
  at org.apache.solr.handler.loader.XMLLoader.processUpdate(XMLLoader.java:246)
  at org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:173)
  at org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92)
  at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
  at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
  at org.apache.solr.core.SolrCore.execute(SolrCore.java:1797)
  at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:637)
  at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:343)
  at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:141)
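One thing worth double-checking in workaround (2): the HBase property is named hbase.defaults.for.version.skip (plural "defaults"), not hbase.default.for.version.skip, so a misspelled entry would be silently ignored. A hedged hbase-site.xml fragment:

```xml
<!-- in the hbase-site.xml that ends up on Solr's classpath;
     note the plural "defaults" in the property name -->
<property>
  <name>hbase.defaults.for.version.skip</name>
  <value>true</value>
</property>
```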
Unable to get the class from external jar in Update handler
Hi, I've made a jar which contains a class called ConcatClass, and I've put this jar under Solr's lib. I'm trying to access this class in update-script.groovy in the /update handler, but Groovy is not picking up the ConcatClass class, giving the following error:

SEVERE: Unable to create core: collection1
org.apache.solr.common.SolrException: Unable to initialize scripts: Unable to evaluate script: update-script.groovy
  at org.apache.solr.core.SolrCore.init(SolrCore.java:806)
  at org.apache.solr.core.SolrCore.init(SolrCore.java:619)
  at org.apache.solr.core.CoreContainer.createFromZk(CoreContainer.java:967)
  at org.apache.solr.core.CoreContainer.create(CoreContainer.java:1049)
  at org.apache.solr.core.CoreContainer$3.call(CoreContainer.java:634)
  at org.apache.solr.core.CoreContainer$3.call(CoreContainer.java:629)
  at java.util.concurrent.FutureTask.run(FutureTask.java:262)
  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
  at java.util.concurrent.FutureTask.run(FutureTask.java:262)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
  at java.lang.Thread.run(Thread.java:744)
Caused by: org.apache.solr.common.SolrException: Unable to initialize scripts: Unable to evaluate script: update-script.groovy
  at org.apache.solr.update.processor.StatelessScriptUpdateProcessorFactory.inform(StatelessScriptUpdateProcessorFactory.java:232)
  at org.apache.solr.core.SolrResourceLoader.inform(SolrResourceLoader.java:592)
  at org.apache.solr.core.SolrCore.init(SolrCore.java:801)
  ... 11 more
Caused by: org.apache.solr.common.SolrException: Unable to evaluate script: update-script.groovy
  at org.apache.solr.update.processor.StatelessScriptUpdateProcessorFactory.initEngines(StatelessScriptUpdateProcessorFactory.java:314)
  at org.apache.solr.update.processor.StatelessScriptUpdateProcessorFactory.inform(StatelessScriptUpdateProcessorFactory.java:228)
  ... 13 more
Caused by: javax.script.ScriptException: org.codehaus.groovy.control.MultipleCompilationErrorsException: startup failed: Script1.groovy: 1: unable to resolve class com.biginfolabs.openNLP.ConcatClass @ line 1, column 1. import com.biginfolabs.openNLP.ConcatClass; ^ 1 error
  at org.codehaus.groovy.jsr223.GroovyScriptEngineImpl.eval(GroovyScriptEngineImpl.java:151)
  at org.codehaus.groovy.jsr223.GroovyScriptEngineImpl.eval(GroovyScriptEngineImpl.java:122)
  at javax.script.AbstractScriptEngine.eval(AbstractScriptEngine.java:249)
  at org.apache.solr.update.processor.StatelessScriptUpdateProcessorFactory.initEngines(StatelessScriptUpdateProcessorFactory.java:312)
  ... 14 more

NOTE: I've put this jar under Solr's lib. The Groovy file is not able to recognize ConcatClass.. does anyone have an idea? I wasted the whole day trying to fix this. Thanks, Vivek
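One thing that may be worth checking: whether the jar's directory is actually declared to the core's resource loader via a <lib/> directive in solrconfig.xml, since that is the classloader the script engine compiles against. A hedged sketch (the dir path here is an assumption; adjust it to wherever the jar actually lives):

```xml
<!-- in solrconfig.xml, alongside any existing <lib/> directives -->
<lib dir="../../lib" regex=".*\.jar" />
```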
crawling all links of same domain in nutch in solr
Hi, Can anyone tell me how to crawl all the other pages of the same domain? For example, I'm feeding the website http://www.techcrunch.com/ in seed.txt. The following property is added in nutch-site.xml:

<property>
  <name>db.ignore.internal.links</name>
  <value>false</value>
  <description>If true, when adding new links to a page, links from
  the same host are ignored. This is an effective way to limit the
  size of the link database, keeping only the highest quality
  links.</description>
</property>

And the following is added in regex-urlfilter.txt:

# accept anything else
+.

Note: if I add http://www.tutorialspoint.com/ in seed.txt, I'm able to crawl all the other pages, but not techcrunch.com's pages, even though it has many other pages too. Please help..? Thanks, Vivek
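A common way to keep a crawl inside one domain is an include rule in regex-urlfilter.txt instead of the catch-all "+.". The exact rule below is an illustrative assumption (adjust the host to your seed); the Java sketch just tests the same regex so you can see which URLs it would accept:

```java
import java.util.regex.Pattern;

public class DomainFilter {
    // Hypothetical include rule for regex-urlfilter.txt, e.g.:
    //   +^https?://([a-z0-9-]+\.)*techcrunch\.com/
    // Tested here as a plain Java regex.
    static final Pattern SAME_DOMAIN =
        Pattern.compile("^https?://([a-z0-9-]+\\.)*techcrunch\\.com/.*");

    static boolean accept(String url) {
        return SAME_DOMAIN.matcher(url).matches();
    }

    public static void main(String[] args) {
        System.out.println(accept("http://www.techcrunch.com/startups/")); // true
        System.out.println(accept("http://www.tutorialspoint.com/java/")); // false
    }
}
```

Remember that regex-urlfilter.txt rules are evaluated top-down, so an include rule like this must appear before any catch-all reject rule.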
Integrating Solr with HBase Using Lily Project
Hi, I tried to integrate Solr with HBase using the HBase Indexer project https://github.com/NGDATA/hbase-indexer/wiki (one of the subprojects of Lily). I used Apache HBase running on HDFS and Solr 4.8.0, but I started getting the error mentioned below.

14/07/18 11:55:38 WARN impl.SepConsumer: Error processing a batch of SEP events, the error will be forwarded to HBase for retry
java.util.concurrent.ExecutionException: java.lang.RuntimeException: org.apache.solr.common.SolrException: Unknown document router '{name=implicit}'
  at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:252)
  at java.util.concurrent.FutureTask.get(FutureTask.java:111)
  at com.ngdata.sep.impl.SepConsumer.waitOnSepEventCompletion(SepConsumer.java:235)
  at com.ngdata.sep.impl.SepConsumer.replicateLogEntries(SepConsumer.java:220)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:622)
  at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:320)
  at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1428)
Caused by: java.lang.RuntimeException: org.apache.solr.common.SolrException: Unknown document router '{name=implicit}'
  at com.ngdata.hbaseindexer.indexer.IndexingEventListener.processEvents(IndexingEventListener.java:90)
  at com.ngdata.sep.impl.SepEventExecutor$1.run(SepEventExecutor.java:97)
  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
  at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
  at java.util.concurrent.FutureTask.run(FutureTask.java:166)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
  at java.lang.Thread.run(Thread.java:701)
Caused by: org.apache.solr.common.SolrException: Unknown document router '{name=implicit}'
  at org.apache.solr.common.cloud.DocRouter.getDocRouter(DocRouter.java:46)
  at org.apache.solr.common.cloud.ClusterState.collectionFromObjects(ClusterState.java:263)
  at org.apache.solr.common.cloud.ClusterState.load(ClusterState.java:231)
  at org.apache.solr.common.cloud.ClusterState.load(ClusterState.java:207)
  at org.apache.solr.common.cloud.ZkStateReader.createClusterStateWatchersAndUpdate(ZkStateReader.java:299)

I googled for the cause; this link https://groups.google.com/a/cloudera.org/forum/#!msg/search-user/p5xoeU194BM/XVdsyVpDjVUJ says this works on 4.2.0. So I switched to 4.2.0 and it worked. What I'm worried about is why it's not working in 4.8.0..? Is there anything extra I should add for it to work..? Now I have to index all my data in 4.2 and pretty much re-work it, so..? Thanks, Vivek
Re: About Query Parser
That's an impressive answer. I actually wanted to know how exactly the query parser works. I'm supposed to collect some fields, values, and other related info and build a Solr query. I wanted to know whether I should use this query parser or Java code to build the Solr query. Anyway, it looks like I have to go with Java code to build it, and I'm on it. Thanks, Vivek

On Fri, Jun 20, 2014 at 6:06 PM, Daniel Collins danwcoll...@gmail.com wrote:

I would say *:* is a human-readable/writable query, as is inStock:false. The former will be converted by the query parser into a MatchAllDocsQuery, which is what Lucene understands. The latter will be converted (again by the query parser) into some query. Now this is where *which* query parser you are using is important. Is inStock a word to be queried, or a field in your schema? Probably the latter, but the query parser has to determine that using the Solr schema. So I would expect that query to be converted to a TermQuery(Term(inStock, false)), i.e. a query for the value false in the field inStock.

This is all interesting, but what are you really trying to find out? If you just want to run queries and see what they translate to, you can use the debug options when you send the query in, and then Solr will return to you both the raw query (with any other options that the query handler might have added to your query) as well as the Lucene Query generated from it, e.g. from running *:* on a Solr instance:

"rawquerystring": "*:*",
"querystring": "*:*",
"parsedquery": "MatchAllDocsQuery(*:*)",
"parsedquery_toString": "*:*",
"QParser": "LuceneQParser",

Or (this shows the difference between raw query syntax and parsed query syntax):

"rawquerystring": "body_en:test AND headline_en:hello",
"querystring": "body_en:test AND headline_en:hello",
"parsedquery": "+body_en:test +headline_en:hello",
"parsedquery_toString": "+body_en:test +headline_en:hello",
"QParser": "LuceneQParser",

On 20 June 2014 13:05, Vivekanand Ittigi vi...@biginfolabs.com wrote:

All right, let me put this.
http://192.168.1.78:8983/solr/collection1/select?q=inStock:false&facet=true&facet.field=popularity&wt=xml&indent=true

I just want to know what form this is. Is it a Lucene query, or should this query go through a query parser to get converted to a Lucene query? Thanks, Vivek

On Fri, Jun 20, 2014 at 5:19 PM, Alexandre Rafalovitch arafa...@gmail.com wrote:

That's *:* and a special case. There is no scoring here, nor searching. Just a dump of documents. Not even filtering or faceting. I sure hope you have more interesting examples. Regards, Alex

On 20/06/2014 6:40 pm, Vivekanand Ittigi vi...@biginfolabs.com wrote:

Hi Daniel, You said inputs are human-generated and outputs are Lucene objects. So my question is: what does the below query mean? Does this fall under the human-generated one, or Lucene?

http://localhost:8983/solr/collection1/select?q=*%3A*&wt=xml&indent=true

Thanks, Vivek

On Fri, Jun 20, 2014 at 3:55 PM, Daniel Collins danwcoll...@gmail.com wrote:

Alexandre's response is very thorough, so I'm really simplifying things, I confess, but here's my "query parsers for dummies". :) In terms of inputs/outputs, a QueryParser takes a string (generally assumed to be human-generated, i.e. something a user might type in, so maybe a sentence, a set of words; the format can vary) and outputs a Lucene Query object ( http://lucene.apache.org/core/4_8_1/core/org/apache/lucene/search/Query.html ), which in fact is a kind of tree (again, I'm simplifying, I know) since a query can contain nested expressions. So, very loosely, it's a translator from a human-generated query into the structure that Lucene can handle. There are several different query parsers, since they all use different input syntax and ways of handling different constructs (to handle A and B, should the user type "+A +B" or "A and B" or just "A B", for example), and they have different levels of support for the various Query structures that Lucene can handle: SpanQuery, FuzzyQuery, PhraseQuery, etc. We, for example, use an XML-based query parser. Why (you might well ask!)? Well, we had an already used and supported query syntax of our own, which our users understood, so we couldn't use an off-the-shelf query parser. We could have built our own in Java, but for a variety of reasons we parse our queries in a front-end system ahead of Solr (which is C++-based), so we needed an interim format to pass queries to Solr that was as near to a Lucene Query object as we could get (and there was an existing XML parser to save us starting from square one!). As part of that Query construction (but independent of which QueryParser you use), Solr will also
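A side note on the *%3A* form that keeps appearing in these URLs: it is simply *:* with the colon percent-encoded for the URL, so the query parser never sees the encoded form. A quick sketch (the Charset overload of URLEncoder.encode requires Java 10+):

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class EncodeQuery {
    // Percent-encode a q parameter value the way a browser or HTTP client does.
    // URLEncoder leaves '*' alone and turns ':' into %3A.
    static String encode(String q) {
        return URLEncoder.encode(q, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(encode("*:*")); // *%3A*
    }
}
```

So q=*%3A* in the URL and q=*:* typed into the admin UI are the same human-written query; the conversion to a Lucene Query happens afterwards, inside Solr.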
About Query Parser
Hi, I think this might be a silly question, but I want to make it clear. What is a query parser...? What does it do? I know it's used for converting a query, but from what to what? What is the input and what is the output of a query parser? And where exactly can this feature be used? If possible, please explain with an example. It would really help a lot. Thanks, Vivek
Re: About Query Parser
Hi Daniel, You said inputs are human-generated and outputs are Lucene objects. So my question is: what does the below query mean? Does this fall under the human-generated one, or Lucene?

http://localhost:8983/solr/collection1/select?q=*%3A*&wt=xml&indent=true

Thanks, Vivek

On Fri, Jun 20, 2014 at 3:55 PM, Daniel Collins danwcoll...@gmail.com wrote:

Alexandre's response is very thorough, so I'm really simplifying things, I confess, but here's my "query parsers for dummies". :) In terms of inputs/outputs, a QueryParser takes a string (generally assumed to be human-generated, i.e. something a user might type in, so maybe a sentence, a set of words; the format can vary) and outputs a Lucene Query object ( http://lucene.apache.org/core/4_8_1/core/org/apache/lucene/search/Query.html ), which in fact is a kind of tree (again, I'm simplifying, I know) since a query can contain nested expressions. So, very loosely, it's a translator from a human-generated query into the structure that Lucene can handle. There are several different query parsers, since they all use different input syntax and ways of handling different constructs (to handle A and B, should the user type "+A +B" or "A and B" or just "A B", for example), and they have different levels of support for the various Query structures that Lucene can handle: SpanQuery, FuzzyQuery, PhraseQuery, etc. We, for example, use an XML-based query parser. Why (you might well ask!)? Well, we had an already used and supported query syntax of our own, which our users understood, so we couldn't use an off-the-shelf query parser. We could have built our own in Java, but for a variety of reasons we parse our queries in a front-end system ahead of Solr (which is C++-based), so we needed an interim format to pass queries to Solr that was as near to a Lucene Query object as we could get (and there was an existing XML parser to save us starting from square one!).

As part of that Query construction (but independent of which QueryParser you use), Solr will also make use of a set of Tokenizers and Filters ( https://cwiki.apache.org/confluence/display/solr/Understanding+Analyzers,+Tokenizers,+and+Filters ), but that's more to do with dealing with the terms in the query (so in my examples above: is A a real word, does it need stemming, lowercasing, or removing because it's a stopword, etc.).
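The tokenizer/filter side of that can be pictured with a toy stand-in: lowercase, split on whitespace, drop stopwords. This is not Solr's actual analyzer API, just a minimal illustration of what an analysis chain does to the terms of a query before matching:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

public class ToyAnalyzer {
    // A tiny stopword list for the sake of the example.
    static final Set<String> STOPWORDS = Set.of("a", "the", "is");

    // Lowercase, tokenize on whitespace, remove stopwords -- a stand-in for
    // a Solr tokenizer + filter chain.
    static List<String> analyze(String query) {
        return Arrays.stream(query.toLowerCase().split("\\s+"))
            .filter(t -> !t.isEmpty() && !STOPWORDS.contains(t))
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(analyze("The Query IS a Test")); // [query, test]
    }
}
```

In Solr the equivalent steps are configured per field type in schema.xml (tokenizer plus a chain of filters), and the same chain is applied at both index time and query time so the terms line up.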
Re: About Query Parser
All right, let me put this.

http://192.168.1.78:8983/solr/collection1/select?q=inStock:false&facet=true&facet.field=popularity&wt=xml&indent=true

I just want to know what form this is. Is it a Lucene query, or should this query go through a query parser to get converted to a Lucene query? Thanks, Vivek

On Fri, Jun 20, 2014 at 5:19 PM, Alexandre Rafalovitch arafa...@gmail.com wrote:

That's *:* and a special case. There is no scoring here, nor searching. Just a dump of documents. Not even filtering or faceting. I sure hope you have more interesting examples. Regards, Alex

On 20/06/2014 6:40 pm, Vivekanand Ittigi vi...@biginfolabs.com wrote:

Hi Daniel, You said inputs are human-generated and outputs are Lucene objects. So my question is: what does the below query mean? Does this fall under the human-generated one, or Lucene?

http://localhost:8983/solr/collection1/select?q=*%3A*&wt=xml&indent=true

Thanks, Vivek

On Fri, Jun 20, 2014 at 3:55 PM, Daniel Collins danwcoll...@gmail.com wrote:

Alexandre's response is very thorough, so I'm really simplifying things, I confess, but here's my "query parsers for dummies". :) In terms of inputs/outputs, a QueryParser takes a string (generally assumed to be human-generated, i.e. something a user might type in, so maybe a sentence, a set of words; the format can vary) and outputs a Lucene Query object ( http://lucene.apache.org/core/4_8_1/core/org/apache/lucene/search/Query.html ), which in fact is a kind of tree (again, I'm simplifying, I know) since a query can contain nested expressions. So, very loosely, it's a translator from a human-generated query into the structure that Lucene can handle. There are several different query parsers, since they all use different input syntax and ways of handling different constructs (to handle A and B, should the user type "+A +B" or "A and B" or just "A B", for example), and they have different levels of support for the various Query structures that Lucene can handle: SpanQuery, FuzzyQuery, PhraseQuery, etc. We, for example, use an XML-based query parser. Why (you might well ask!)? Well, we had an already used and supported query syntax of our own, which our users understood, so we couldn't use an off-the-shelf query parser. We could have built our own in Java, but for a variety of reasons we parse our queries in a front-end system ahead of Solr (which is C++-based), so we needed an interim format to pass queries to Solr that was as near to a Lucene Query object as we could get (and there was an existing XML parser to save us starting from square one!). As part of that Query construction (but independent of which QueryParser you use), Solr will also make use of a set of Tokenizers and Filters ( https://cwiki.apache.org/confluence/display/solr/Understanding+Analyzers,+Tokenizers,+and+Filters ), but that's more to do with dealing with the terms in the query (so in my examples above: is A a real word, does it need stemming, lowercasing, or removing because it's a stopword, etc.).
making solr to understand English
Hi, I'm trying to set up Solr so that it understands English. For example, I've indexed our company website (www.biginfolabs.com), or it could be any other website or our own data. If I put in some English-like queries, I should get the one-word answer, just like Google does; queries are:

* Where is India located?
* Who is the father of Obama?

Workaround:
* Integrated UIMA and Mahout with Solr.
* I read the book called Taming Text and implemented https://github.com/tamingtext/book. But I did not get what I want.

Can anyone please tell me how to move further? It can be anything; our team is ready to do it. Thanks, Vivek
VelocityResponseWriter in solr
Hi, I want to use the VelocityResponseWriter in Solr. I've indexed a website (for example http://www.biginfolabs.com/). If I type a query

http://localhost:8983/solr/collection1/select?q=santhosh&wt=xml&indent=true

I will get all the fields related to that document (content, host, title, url, etc.), but if I put the query through Velocity

http://localhost:8983/solr/collection1/browse?q=santhosh

I will see only 3 fields (id, url, content) instead of all the other fields. How can I display all the fields??

This is in solrconfig.xml:

<requestHandler name="/browse" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="echoParams">explicit</str>

    <!-- VelocityResponseWriter settings -->
    <str name="wt">velocity</str>
    <str name="v.template">browse</str>
    <str name="v.layout">layout</str>
    <str name="title">Solritas</str>

    <!-- Query settings -->
    <str name="defType">edismax</str>
    <str name="qf">
      text^0.5 features^1.0 name^1.2 sku^1.5 id^10.0 manu^1.1 cat^1.4
      title^10.0 description^5.0 keywords^5.0 author^2.0 resourcename^1.0
    </str>
    <str name="df">text</str>
    <str name="mm">100%</str>
    <str name="q.alt">*:*</str>
    <str name="rows">10</str>
    <str name="fl">*,score</str>
    <str name="mlt.qf">
      text^0.5 features^1.0 name^1.2 sku^1.5 id^10.0 manu^1.1 cat^1.4
      title^10.0 description^5.0 keywords^5.0 author^2.0 resourcename^1.0
    </str>
    <str name="mlt.fl">text,features,name,sku,id,manu,cat,title,description,keywords,author,resourcename</str>
    <int name="mlt.count">3</int>

    <!-- Faceting defaults -->
    <str name="facet">on</str>
    <str name="facet.field">cat</str>
    <str name="facet.field">manu_exact</str>
    <str name="facet.field">content_type</str>
    <str name="facet.field">author_s</str>
    <str name="facet.query">ipod</str>
    <str name="facet.query">GB</str>
    <str name="facet.mincount">1</str>
    <str name="facet.pivot">cat,inStock</str>
    <str name="facet.range.other">after</str>
    <str name="facet.range">price</str>
    <int name="f.price.facet.range.start">0</int>
    <int name="f.price.facet.range.end">600</int>
    <int name="f.price.facet.range.gap">50</int>
    <str name="facet.range">popularity</str>
    <int name="f.popularity.facet.range.start">0</int>
    <int name="f.popularity.facet.range.end">10</int>
    <int name="f.popularity.facet.range.gap">3</int>
    <str name="facet.range">manufacturedate_dt</str>
    <str name="f.manufacturedate_dt.facet.range.start">NOW/YEAR-10YEARS</str>
    <str name="f.manufacturedate_dt.facet.range.end">NOW</str>
    <str name="f.manufacturedate_dt.facet.range.gap">+1YEAR</str>
    <str name="f.manufacturedate_dt.facet.range.other">before</str>
    <str name="f.manufacturedate_dt.facet.range.other">after</str>

    <!-- Highlighting defaults -->
    <str name="hl">on</str>
    <str name="hl.fl">content features title name</str>
    <str name="hl.encoder">html</str>
    <str name="hl.simple.pre">&lt;b&gt;</str>
    <str name="hl.simple.post">&lt;/b&gt;</str>
    <str name="f.title.hl.fragsize">0</str>
    <str name="f.title.hl.alternateField">title</str>
    <str name="f.name.hl.fragsize">0</str>
    <str name="f.name.hl.alternateField">name</str>
    <str name="f.content.hl.snippets">3</str>
    <str name="f.content.hl.fragsize">200</str>
    <str name="f.content.hl.alternateField">content</str>
    <str name="f.content.hl.maxAlternateFieldLength">750</str>

    <!-- Spell checking defaults -->
    <str name="spellcheck">on</str>
    <str name="spellcheck.extendedResults">false</str>
    <str name="spellcheck.count">5</str>
    <str name="spellcheck.alternativeTermCount">2</str>
    <str name="spellcheck.maxResultsForSuggest">5</str>
    <str name="spellcheck.collate">true</str>
    <str name="spellcheck.collateExtendedResults">true</str>
    <str name="spellcheck.maxCollationTries">5</str>
    <str name="spellcheck.maxCollations">3</str>
  </lst>

  <!-- append spellchecking to our list of components -->
  <arr name="last-components">
    <str>spellcheck</str>
  </arr>
</requestHandler>

Thanks, Vivek
Re: Implementing Hive query in Solr
Hi Erick, We are actually comparing the speed of search. We are trying to run these few Hive queries in Solr. We think that if we can implement this in Solr, we can definitely migrate our system to Solr. Can you please look at this issue also: http://stackoverflow.com/questions/24202798/sum-and-groupby-in-solr Here we are removing the collect_set() concept. Thanks, Vivek

On Thu, Jun 12, 2014 at 7:57 PM, Erick Erickson erickerick...@gmail.com wrote:

Any time I see a question like this, I break out in hives (little pun there). Solr is _not_ a replacement for Hive, or any other SQL or SQL-like engine. Trying to make it into one is almost always a mistake. First I'd ask why you have to form this query. Now, while I have very little knowledge of Hive, collect_set removes duplicates. Why do you have duplicates in the first place? Best, Erick

On Thu, Jun 12, 2014 at 7:12 AM, Vivekanand Ittigi vi...@biginfolabs.com wrote:

Hi, Can anyone please look into this issue? I want to implement this query in Solr. Thanks, Vivek

-- Forwarded message --
From: Vivekanand Ittigi vi...@biginfolabs.com
Date: Thu, Jun 12, 2014 at 11:08 AM
Subject: Implementing Hive query in Solr
To: solr-user@lucene.apache.org

Hi, My requirement is to execute this (Hive) query in Solr:

select SUM(Primary_cause_vaR), collect_set(skuType), RiskType, market, collect_set(primary_cause)
from bil_tos
where skuType = 'Product'
group by RiskType, market;

I can implement the sum and group-by operations in Solr using the StatsComponent concept, but I have no idea how to implement collect_set() in Solr. collect_set() is used in Hive queries. Please provide me an equivalent function for collect_set in Solr, or links, or how to achieve it. It would be a great help. Thanks, Vivek
SUM and groupBy in solr
Hi, How do I execute this query:

select SUM(Primary_cause_vaR), RiskType, market
from bil_tos
where skuType = 'Product'
group by RiskType, market;

I've used http://wiki.apache.org/solr/StatsComponent for this:
* I see only the sum with the respective group-by fields, but I want to see the RiskType and market fields also in the result.

Thanks, Vivek
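For reference, the result the query above asks for can be sketched client-side in plain Java: filter on skuType, group by the (RiskType, market) pair, and sum the VaR values. The field names come from the query itself; the row-as-map modeling is just an illustration, not a Solr API.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class GroupBySum {
    // Client-side equivalent of:
    //   select SUM(Primary_cause_vaR), RiskType, market from bil_tos
    //   where skuType='Product' group by RiskType, market;
    static Map<List<String>, Double> sumByGroup(List<Map<String, Object>> rows) {
        return rows.stream()
            .filter(r -> "Product".equals(r.get("skuType")))
            .collect(Collectors.groupingBy(
                // composite group key: (RiskType, market)
                r -> Arrays.asList((String) r.get("RiskType"), (String) r.get("market")),
                Collectors.summingDouble(
                    r -> ((Number) r.get("Primary_cause_vaR")).doubleValue())));
    }

    public static void main(String[] args) {
        List<Map<String, Object>> rows = List.of(
            Map.of("skuType", "Product", "RiskType", "credit", "market", "EU", "Primary_cause_vaR", 2.5),
            Map.of("skuType", "Product", "RiskType", "credit", "market", "EU", "Primary_cause_vaR", 1.5),
            Map.of("skuType", "Service", "RiskType", "credit", "market", "EU", "Primary_cause_vaR", 9.0));
        System.out.println(sumByGroup(rows)); // {[credit, EU]=4.0}
    }
}
```

With StatsComponent the group values do come back, but as the keys of the facet buckets (e.g. stats.facet on RiskType and market) rather than as columns in a row, which is likely why they seem to be "missing" from the result.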
Fwd: Implementing Hive query in Solr
Hi, Can anyone please look into this issue? I want to implement this query in Solr. Thanks, Vivek

-- Forwarded message --
From: Vivekanand Ittigi vi...@biginfolabs.com
Date: Thu, Jun 12, 2014 at 11:08 AM
Subject: Implementing Hive query in Solr
To: solr-user@lucene.apache.org

Hi, My requirement is to execute this (Hive) query in Solr:

select SUM(Primary_cause_vaR), collect_set(skuType), RiskType, market, collect_set(primary_cause)
from bil_tos
where skuType = 'Product'
group by RiskType, market;

I can implement the sum and group-by operations in Solr using the StatsComponent concept, but I have no idea how to implement collect_set() in Solr. collect_set() is used in Hive queries. Please provide me an equivalent function for collect_set in Solr, or links, or how to achieve it. It would be a great help. Thanks, Vivek
Implementing Hive query in Solr
Hi, My requirement is to execute this Hive query in Solr:

select SUM(Primary_cause_vaR), collect_set(skuType), RiskType, market, collect_set(primary_cause) from bil_tos where skuType='Product' group by RiskType, market;

I can implement the sum and group-by operations in Solr using the StatsComponent, but I have no idea how to implement collect_set() (a Hive function). Please point me to an equivalent function for collect_set in Solr, or to links on how to achieve it. It'd be a great help. Thanks, Vivek
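There is no direct collect_set() in Solr 4.x. In later Solr releases (5.0+), the JSON Facet API can approximate it: the set of distinct values of a field within each group is just a nested terms facet. A sketch of a json.facet parameter for the query above, sent together with q=skuType:Product&rows=0 (field names come from the Hive query; the bucket labels "var_sum" and "primary_causes" are made up for the example):

```json
{
  "RiskType": {
    "type": "terms",
    "field": "RiskType",
    "facet": {
      "market": {
        "type": "terms",
        "field": "market",
        "facet": {
          "var_sum": "sum(Primary_cause_vaR)",
          "primary_causes": { "type": "terms", "field": "primary_cause" }
        }
      }
    }
  }
}
```

Each (RiskType, market) bucket then carries the summed VaR plus, via the nested terms facet, the distinct primary_cause values, which is what collect_set() returns in Hive. (collect_set(skuType) is trivial here, since the filter restricts skuType to a single value.)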
Re: Integrate solr with openNLP
Hi Aman, Yeah, we are also thinking the same: using UIMA is better. And thanks to everyone, you guys really showed us the way (UIMA). We'll work on it. Thanks, Vivek

On Fri, Jun 6, 2014 at 5:54 PM, Aman Tandon amantandon...@gmail.com wrote: Hi Vivek, As everybody on the mailing list mentioned, you should go for UIMA: the OpenNLP issues are not tracked properly, which could leave your development stuck if any issue comes up in the near future, so it's better to start investigating UIMA. With Regards, Aman Tandon

On Fri, Jun 6, 2014 at 11:00 AM, Vivekanand Ittigi vi...@biginfolabs.com wrote: Can anyone please reply? Thanks, Vivek

-- Forwarded message --
From: Vivekanand Ittigi vi...@biginfolabs.com
Date: Wed, Jun 4, 2014 at 4:38 PM
Subject: Re: Integrate solr with openNLP
To: Tommaso Teofili tommaso.teof...@gmail.com
Cc: solr-user@lucene.apache.org, Ahmet Arslan iori...@yahoo.com

Hi Tommaso, Yes, you are right, the 4.4 version works; I'm able to compile now. I'm trying to apply the named-entity recognition (person name) token filter, but I'm not seeing any change. My schema.xml looks like this:

<field name="text" type="text_opennlp_pos_ner" indexed="true" stored="true" multiValued="true"/>
<fieldType name="text_opennlp_pos_ner" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.OpenNLPTokenizerFactory" tokenizerModel="opennlp/en-token.bin"/>
    <filter class="solr.OpenNLPFilterFactory" nerTaggerModels="opennlp/en-ner-person.bin"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

Please guide..? Thanks, Vivek

On Wed, Jun 4, 2014 at 1:27 PM, Tommaso Teofili tommaso.teof...@gmail.com wrote: Hi all, Ahmet was suggesting to eventually use the UIMA integration because OpenNLP already has an integration with Apache UIMA, so you would just have to use that [1]. And that's one of the main reasons the UIMA integration was done: it's a framework that you can easily hook into in order to plug in your NLP algorithm.
If you want to use just OpenNLP, then it's up to you to either write your own UpdateRequestProcessor plugin [2] to add metadata extracted by OpenNLP to your documents, or write a dedicated analyzer / tokenizer / token filter. For the OpenNLP integration (LUCENE-2899), the patch is not up to date with the latest APIs in trunk; however, you should be able to apply it (if I recall correctly) to the 4.4 version or so, and adapting it to the latest API shouldn't be too hard either. Regards, Tommaso
[1] : http://opennlp.apache.org/documentation/1.5.3/manual/opennlp.html#org.apche.opennlp.uima
[2] : http://wiki.apache.org/solr/UpdateRequestProcessor

2014-06-03 15:34 GMT+02:00 Ahmet Arslan iori...@yahoo.com.invalid: Can you extract names, locations, etc. using OpenNLP in a plain/straight Java program? If yes, here are two separate options: 1) Use http://searchhub.org/2012/02/14/indexing-with-solrj/ as an example, integrate your NER code into it, and write your own indexing code. You have full power here; no Solr plugins are involved. 2) Use 'Implementing a conditional copyField', given here: http://wiki.apache.org/solr/UpdateRequestProcessor as an example, and integrate your NER code into it. Please note that these are separate ways to enrich your incoming documents; choose either (1) or (2).

On Tuesday, June 3, 2014 3:30 PM, Vivekanand Ittigi vi...@biginfolabs.com wrote: Okay, but I didn't understand what you said. Can you please elaborate? Thanks, Vivek

On Tue, Jun 3, 2014 at 5:36 PM, Ahmet Arslan iori...@yahoo.com wrote: Hi Vivekanand, I have never used UIMA+Solr before. Personally I think it takes more time to learn how to configure/use this UIMA stuff. If you are familiar with Java, write a class that extends UpdateRequestProcessor(Factory). Use OpenNLP for NER and add these new fields (organisation, city, person name, etc.) to your document. This phase is usually called 'enrichment'. Does that make sense?
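Ahmet's "enrichment" suggestion boils down to: before (or while) indexing, run NER over the document's content and add the hits as extra fields. The sketch below is a minimal, stdlib-only illustration of that pattern; a naive capitalized-word matcher stands in for a real OpenNLP NameFinderME, and the class name and the "person_ss" field name are made up for the example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch of the "enrichment" phase: extract entities from a document's
// content and attach them as new fields before the document is indexed.
// A real implementation would call OpenNLP's NameFinderME here; the regex
// below (runs of capitalized words) is only a stand-in so the example runs.
public class NerEnricher {

    private static final Pattern NAME_RUN =
            Pattern.compile("[A-Z][a-z]+(?: [A-Z][a-z]+)*");

    // Naive entity extraction: consecutive capitalized words.
    public static List<String> extractNames(String text) {
        List<String> names = new ArrayList<>();
        Matcher m = NAME_RUN.matcher(text);
        while (m.find()) {
            names.add(m.group());
        }
        return names;
    }

    // Adds a hypothetical multi-valued "person_ss" field holding the
    // extracted names; this mirrors what an UpdateRequestProcessor
    // would do to the SolrInputDocument passed to processAdd().
    public static Map<String, Object> enrich(Map<String, Object> doc) {
        Object content = doc.get("content");
        if (content instanceof String) {
            doc.put("person_ss", extractNames((String) content));
        }
        return doc;
    }

    public static void main(String[] args) {
        Map<String, Object> doc = new HashMap<>();
        doc.put("id", "1");
        doc.put("content", "met Vivekanand Ittigi in Bangalore");
        System.out.println(enrich(doc).get("person_ss"));
    }
}
```

In a real UpdateRequestProcessor you would make the same put() calls on the SolrInputDocument inside processAdd(), then chain to super.processAdd(); the factory is registered in an updateRequestProcessorChain in solrconfig.xml.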
On Tuesday, June 3, 2014 2:57 PM, Vivekanand Ittigi vi...@biginfolabs.com wrote: Hi Ahmet, I followed what you said: https://cwiki.apache.org/confluence/display/solr/UIMA+Integration. But how can I achieve my goal? I mean extracting only the name of the organization or person from the content field. I guess I'm almost there, but something is missing? Please guide me. Thanks, Vivek

On Tue, Jun 3, 2014 at 2:50 PM, Vivekanand Ittigi vi...@biginfolabs.com wrote: The entire goal can't be stated, but one of those tasks can be like
Fwd: Integrate solr with openNLP
Can anyone please reply? Thanks, Vivek

-- Forwarded message --
From: Vivekanand Ittigi vi...@biginfolabs.com
Date: Wed, Jun 4, 2014 at 4:38 PM
Subject: Re: Integrate solr with openNLP
To: Tommaso Teofili tommaso.teof...@gmail.com
Cc: solr-user@lucene.apache.org, Ahmet Arslan iori...@yahoo.com

Hi Tommaso, Yes, you are right, the 4.4 version works; I'm able to compile now. I'm trying to apply the named-entity recognition (person name) token filter, but I'm not seeing any change. My schema.xml looks like this:

<field name="text" type="text_opennlp_pos_ner" indexed="true" stored="true" multiValued="true"/>
<fieldType name="text_opennlp_pos_ner" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.OpenNLPTokenizerFactory" tokenizerModel="opennlp/en-token.bin"/>
    <filter class="solr.OpenNLPFilterFactory" nerTaggerModels="opennlp/en-ner-person.bin"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

Please guide..? Thanks, Vivek

On Wed, Jun 4, 2014 at 1:27 PM, Tommaso Teofili tommaso.teof...@gmail.com wrote: Hi all, Ahmet was suggesting to eventually use the UIMA integration because OpenNLP already has an integration with Apache UIMA, so you would just have to use that [1]. And that's one of the main reasons the UIMA integration was done: it's a framework that you can easily hook into in order to plug in your NLP algorithm. If you want to use just OpenNLP, then it's up to you to either write your own UpdateRequestProcessor plugin [2] to add metadata extracted by OpenNLP to your documents, or write a dedicated analyzer / tokenizer / token filter. For the OpenNLP integration (LUCENE-2899), the patch is not up to date with the latest APIs in trunk; however, you should be able to apply it (if I recall correctly) to the 4.4 version or so, and adapting it to the latest API shouldn't be too hard either.
Regards, Tommaso
[1] : http://opennlp.apache.org/documentation/1.5.3/manual/opennlp.html#org.apche.opennlp.uima
[2] : http://wiki.apache.org/solr/UpdateRequestProcessor

2014-06-03 15:34 GMT+02:00 Ahmet Arslan iori...@yahoo.com.invalid: Can you extract names, locations, etc. using OpenNLP in a plain/straight Java program? If yes, here are two separate options: 1) Use http://searchhub.org/2012/02/14/indexing-with-solrj/ as an example, integrate your NER code into it, and write your own indexing code. You have full power here; no Solr plugins are involved. 2) Use 'Implementing a conditional copyField', given here: http://wiki.apache.org/solr/UpdateRequestProcessor as an example, and integrate your NER code into it. Please note that these are separate ways to enrich your incoming documents; choose either (1) or (2).

On Tuesday, June 3, 2014 3:30 PM, Vivekanand Ittigi vi...@biginfolabs.com wrote: Okay, but I didn't understand what you said. Can you please elaborate? Thanks, Vivek

On Tue, Jun 3, 2014 at 5:36 PM, Ahmet Arslan iori...@yahoo.com wrote: Hi Vivekanand, I have never used UIMA+Solr before. Personally I think it takes more time to learn how to configure/use this UIMA stuff. If you are familiar with Java, write a class that extends UpdateRequestProcessor(Factory). Use OpenNLP for NER and add these new fields (organisation, city, person name, etc.) to your document. This phase is usually called 'enrichment'. Does that make sense?

On Tuesday, June 3, 2014 2:57 PM, Vivekanand Ittigi vi...@biginfolabs.com wrote: Hi Ahmet, I followed what you said: https://cwiki.apache.org/confluence/display/solr/UIMA+Integration. But how can I achieve my goal? I mean extracting only the name of the organization or person from the content field. I guess I'm almost there, but something is missing? Please guide me. Thanks, Vivek

On Tue, Jun 3, 2014 at 2:50 PM, Vivekanand Ittigi vi...@biginfolabs.com wrote: The entire goal can't be stated, but one of those tasks can be like this..
we have a big document (it can be a website, a PDF, etc.) indexed into Solr. Let's say the field name=content stores the contents of the document. All I want to do is pick the names of persons and places out of it, using OpenNLP or some other means. Those names should be reflected in Solr itself. Thanks, Vivek

On Tue, Jun 3, 2014 at 1:33 PM, Ahmet Arslan iori...@yahoo.com wrote: Hi, Please tell us what you are trying to do, in a new thread. Your high-level goal. There may be some other ways/tools, such as https://stanbol.apache.org, other than OpenNLP.

On Tuesday, June 3, 2014 8:31 AM, Vivekanand Ittigi vi...@biginfolabs.com wrote: We'll surely look into the UIMA integration. But before moving on: is this (https://wiki.apache.org/solr/OpenNLP) the only link we've got for the integration? Isn't there any other article or link which may help us to do
Re: Integrate solr with openNLP
Hi Tommaso, Yes, you are right, the 4.4 version works; I'm able to compile now. I'm trying to apply the named-entity recognition (person name) token filter, but I'm not seeing any change. My schema.xml looks like this:

<field name="text" type="text_opennlp_pos_ner" indexed="true" stored="true" multiValued="true"/>
<fieldType name="text_opennlp_pos_ner" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.OpenNLPTokenizerFactory" tokenizerModel="opennlp/en-token.bin"/>
    <filter class="solr.OpenNLPFilterFactory" nerTaggerModels="opennlp/en-ner-person.bin"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

Please guide..? Thanks, Vivek

On Wed, Jun 4, 2014 at 1:27 PM, Tommaso Teofili tommaso.teof...@gmail.com wrote: Hi all, Ahmet was suggesting to eventually use the UIMA integration because OpenNLP already has an integration with Apache UIMA, so you would just have to use that [1]. And that's one of the main reasons the UIMA integration was done: it's a framework that you can easily hook into in order to plug in your NLP algorithm. If you want to use just OpenNLP, then it's up to you to either write your own UpdateRequestProcessor plugin [2] to add metadata extracted by OpenNLP to your documents, or write a dedicated analyzer / tokenizer / token filter. For the OpenNLP integration (LUCENE-2899), the patch is not up to date with the latest APIs in trunk; however, you should be able to apply it (if I recall correctly) to the 4.4 version or so, and adapting it to the latest API shouldn't be too hard either. Regards, Tommaso
[1] : http://opennlp.apache.org/documentation/1.5.3/manual/opennlp.html#org.apche.opennlp.uima
[2] : http://wiki.apache.org/solr/UpdateRequestProcessor

2014-06-03 15:34 GMT+02:00 Ahmet Arslan iori...@yahoo.com.invalid: Can you extract names, locations, etc. using OpenNLP in a plain/straight Java program?
If yes, here are two separate options: 1) Use http://searchhub.org/2012/02/14/indexing-with-solrj/ as an example, integrate your NER code into it, and write your own indexing code. You have full power here; no Solr plugins are involved. 2) Use 'Implementing a conditional copyField', given here: http://wiki.apache.org/solr/UpdateRequestProcessor as an example, and integrate your NER code into it. Please note that these are separate ways to enrich your incoming documents; choose either (1) or (2).

On Tuesday, June 3, 2014 3:30 PM, Vivekanand Ittigi vi...@biginfolabs.com wrote: Okay, but I didn't understand what you said. Can you please elaborate? Thanks, Vivek

On Tue, Jun 3, 2014 at 5:36 PM, Ahmet Arslan iori...@yahoo.com wrote: Hi Vivekanand, I have never used UIMA+Solr before. Personally I think it takes more time to learn how to configure/use this UIMA stuff. If you are familiar with Java, write a class that extends UpdateRequestProcessor(Factory). Use OpenNLP for NER and add these new fields (organisation, city, person name, etc.) to your document. This phase is usually called 'enrichment'. Does that make sense?

On Tuesday, June 3, 2014 2:57 PM, Vivekanand Ittigi vi...@biginfolabs.com wrote: Hi Ahmet, I followed what you said: https://cwiki.apache.org/confluence/display/solr/UIMA+Integration. But how can I achieve my goal? I mean extracting only the name of the organization or person from the content field. I guess I'm almost there, but something is missing? Please guide me. Thanks, Vivek

On Tue, Jun 3, 2014 at 2:50 PM, Vivekanand Ittigi vi...@biginfolabs.com wrote: The entire goal can't be stated, but one of those tasks can be like this.. we have a big document (it can be a website, a PDF, etc.) indexed into Solr. Let's say the field name=content stores the contents of the document. All I want to do is pick the names of persons and places out of it, using OpenNLP or some other means. Those names should be reflected in Solr itself.
Thanks, Vivek

On Tue, Jun 3, 2014 at 1:33 PM, Ahmet Arslan iori...@yahoo.com wrote: Hi, Please tell us what you are trying to do, in a new thread. Your high-level goal. There may be some other ways/tools, such as https://stanbol.apache.org, other than OpenNLP.

On Tuesday, June 3, 2014 8:31 AM, Vivekanand Ittigi vi...@biginfolabs.com wrote: We'll surely look into the UIMA integration. But before moving on: is this (https://wiki.apache.org/solr/OpenNLP) the only link we've got for the integration? Isn't there any other article or link which may help us to fix this problem. Thanks, Vivek

On Tue, Jun 3, 2014 at 2:50 AM, Ahmet Arslan iori...@yahoo.com wrote: Hi, I believe I answered it. Let me re-try: There is no committed code for OpenNLP. There is an open ticket with patches; they may not work with the current trunk. Confluence is the official documentation.
Re: Integrate solr with openNLP
The entire goal can't be stated, but one of those tasks is like this: we have a big document (it can be a website, a PDF, etc.) indexed into Solr. Let's say the field name=content stores the contents of the document. All I want to do is pick the names of persons and places out of it, using OpenNLP or some other means. Those names should be reflected in Solr itself. Thanks, Vivek

On Tue, Jun 3, 2014 at 1:33 PM, Ahmet Arslan iori...@yahoo.com wrote: Hi, Please tell us what you are trying to do, in a new thread. Your high-level goal. There may be some other ways/tools, such as https://stanbol.apache.org, other than OpenNLP.

On Tuesday, June 3, 2014 8:31 AM, Vivekanand Ittigi vi...@biginfolabs.com wrote: We'll surely look into the UIMA integration. But before moving on: is this (https://wiki.apache.org/solr/OpenNLP) the only link we've got for the integration? Isn't there any other article or link which may help us to fix this problem. Thanks, Vivek

On Tue, Jun 3, 2014 at 2:50 AM, Ahmet Arslan iori...@yahoo.com wrote: Hi, I believe I answered it. Let me re-try: There is no committed code for OpenNLP. There is an open ticket with patches; they may not work with the current trunk. Confluence is the official documentation. The wiki is maintained by the community, meaning the wiki can talk about some uncommitted features/stuff, like this one: https://wiki.apache.org/solr/OpenNLP What I am suggesting is: have a look at https://cwiki.apache.org/confluence/display/solr/UIMA+Integration and search for how to use OpenNLP inside UIMA. Maybe LUCENE-2899 is already doable with solr-uima. I am adding Tommaso (sorry for this, but we need an authoritative answer here) to clarify this. Also consider indexing with SolrJ and doing the OpenNLP enrichment outside Solr: use OpenNLP from plain Java, enrich your documents, and index them with SolrJ. You don't have to do everything inside Solr as Solr plugins. Hope this helps, Ahmet

On Monday, June 2, 2014 11:15 PM, Vivekanand Ittigi vi...@biginfolabs.com wrote: Thanks, I will check with the jira..
but you didn't answer my first question? Is there no way to integrate Solr with OpenNLP? Or is there any committed code, using which I can go ahead? Thanks, Vivek

On Mon, Jun 2, 2014 at 10:30 PM, Ahmet Arslan iori...@yahoo.com wrote: Hi, Here is the jira issue: https://issues.apache.org/jira/browse/LUCENE-2899 Anyone can create an account. I didn't use UIMA myself and I have little knowledge about it, but I believe it is possible to use OpenNLP inside UIMA; you need to dig into the UIMA documentation. Solr UIMA integration already exists; that's why I questioned whether your requirement is possible with UIMA or not. I don't know the answer myself. Ahmet

On Monday, June 2, 2014 7:42 PM, Vivekanand Ittigi vi...@biginfolabs.com wrote: Hi Arslan, If not uncommitted code, then which code should be used to integrate? If I have to comment on my problems, which jira, and how do I put it? And why are you suggesting the UIMA integration? My requirement is integrating with OpenNLP. You mean we can do all the activities through UIMA as we do using OpenNLP, like name and location finding, etc.? Thanks, Vivek

On Mon, Jun 2, 2014 at 8:40 PM, Ahmet Arslan iori...@yahoo.com.invalid wrote: Hi, Uncommitted code can have these kinds of problems; it is not guaranteed to work with the latest trunk. You could comment on the problem you face on the jira ticket. By the way, maybe you are after something doable with the already committed UIMA stuff? https://cwiki.apache.org/confluence/display/solr/UIMA+Integration Ahmet

On Monday, June 2, 2014 5:07 PM, Vivekanand Ittigi vi...@biginfolabs.com wrote: I followed this link to integrate: https://wiki.apache.org/solr/OpenNLP

Installation (for English language testing), until LUCENE-2899 is committed:
1. pull the latest trunk or 4.0 branch
2. apply the latest LUCENE-2899 patch
3. do 'ant compile'
cd solr/contrib/opennlp/src/test-files/training . . .
I followed the first two steps but got the following error while executing the 3rd:

common.compile-core:
    [javac] Compiling 10 source files to /home/biginfolabs/solrtest/solr-lucene-trunk3/lucene/build/analysis/opennlp/classes/java
    [javac] warning: [path] bad path element /home/biginfolabs/solrtest/solr-lucene-trunk3/lucene/analysis/opennlp/lib/jwnl-1.3.3.jar: no such file or directory
    [javac] /home/biginfolabs/solrtest/solr-lucene-trunk3/lucene/analysis/opennlp/src/java/org/apache/lucene/analysis/opennlp/FilterPayloadsFilter.java:43: error: cannot find symbol
    [javac]     super(Version.LUCENE_44, input);
    [javac]           ^
    [javac]   symbol:   variable LUCENE_44
Re: Integrate solr with openNLP
Hi Ahmet, I followed what you said: https://cwiki.apache.org/confluence/display/solr/UIMA+Integration. But how can I achieve my goal? I mean extracting only the name of the organization or person from the content field. I guess I'm almost there, but something is missing? Please guide me. Thanks, Vivek

On Tue, Jun 3, 2014 at 2:50 PM, Vivekanand Ittigi vi...@biginfolabs.com wrote: The entire goal can't be stated, but one of those tasks is like this: we have a big document (it can be a website, a PDF, etc.) indexed into Solr. Let's say the field name=content stores the contents of the document. All I want to do is pick the names of persons and places out of it, using OpenNLP or some other means. Those names should be reflected in Solr itself. Thanks, Vivek

On Tue, Jun 3, 2014 at 1:33 PM, Ahmet Arslan iori...@yahoo.com wrote: Hi, Please tell us what you are trying to do, in a new thread. Your high-level goal. There may be some other ways/tools, such as https://stanbol.apache.org, other than OpenNLP.

On Tuesday, June 3, 2014 8:31 AM, Vivekanand Ittigi vi...@biginfolabs.com wrote: We'll surely look into the UIMA integration. But before moving on: is this (https://wiki.apache.org/solr/OpenNLP) the only link we've got for the integration? Isn't there any other article or link which may help us to fix this problem. Thanks, Vivek

On Tue, Jun 3, 2014 at 2:50 AM, Ahmet Arslan iori...@yahoo.com wrote: Hi, I believe I answered it. Let me re-try: There is no committed code for OpenNLP. There is an open ticket with patches; they may not work with the current trunk. Confluence is the official documentation. The wiki is maintained by the community, meaning the wiki can talk about some uncommitted features/stuff, like this one: https://wiki.apache.org/solr/OpenNLP What I am suggesting is: have a look at https://cwiki.apache.org/confluence/display/solr/UIMA+Integration and search for how to use OpenNLP inside UIMA. Maybe LUCENE-2899 is already doable with solr-uima.
I am adding Tommaso (sorry for this, but we need an authoritative answer here) to clarify this. Also consider indexing with SolrJ and doing the OpenNLP enrichment outside Solr: use OpenNLP from plain Java, enrich your documents, and index them with SolrJ. You don't have to do everything inside Solr as Solr plugins. Hope this helps, Ahmet

On Monday, June 2, 2014 11:15 PM, Vivekanand Ittigi vi...@biginfolabs.com wrote: Thanks, I will check with the jira.. but you didn't answer my first question? Is there no way to integrate Solr with OpenNLP? Or is there any committed code, using which I can go ahead? Thanks, Vivek

On Mon, Jun 2, 2014 at 10:30 PM, Ahmet Arslan iori...@yahoo.com wrote: Hi, Here is the jira issue: https://issues.apache.org/jira/browse/LUCENE-2899 Anyone can create an account. I didn't use UIMA by myself and I have little knowledge about it, but I believe it is possible to use OpenNLP inside UIMA; you need to dig into the UIMA documentation. Solr UIMA integration already exists; that's why I questioned whether your requirement is possible with UIMA or not. I don't know the answer myself. Ahmet

On Monday, June 2, 2014 7:42 PM, Vivekanand Ittigi vi...@biginfolabs.com wrote: Hi Arslan, If not uncommitted code, then which code should be used to integrate? If I have to comment on my problems, which jira, and how do I put it? And why are you suggesting the UIMA integration? My requirement is integrating with OpenNLP. You mean we can do all the activities through UIMA as we do using OpenNLP, like name and location finding, etc.? Thanks, Vivek

On Mon, Jun 2, 2014 at 8:40 PM, Ahmet Arslan iori...@yahoo.com.invalid wrote: Hi, Uncommitted code can have these kinds of problems; it is not guaranteed to work with the latest trunk. You could comment on the problem you face on the jira ticket. By the way, maybe you are after something doable with the already committed UIMA stuff?
https://cwiki.apache.org/confluence/display/solr/UIMA+Integration Ahmet

On Monday, June 2, 2014 5:07 PM, Vivekanand Ittigi vi...@biginfolabs.com wrote: I followed this link to integrate: https://wiki.apache.org/solr/OpenNLP

Installation (for English language testing), until LUCENE-2899 is committed:
1. pull the latest trunk or 4.0 branch
2. apply the latest LUCENE-2899 patch
3. do 'ant compile'
cd solr/contrib/opennlp/src/test-files/training . . .

I followed the first two steps but got the following error while executing the 3rd:

common.compile-core:
    [javac] Compiling 10 source files to /home/biginfolabs/solrtest/solr-lucene-trunk3/lucene/build/analysis/opennlp/classes/java
    [javac] warning: [path] bad path element /home/biginfolabs/solrtest/solr-lucene-trunk3/lucene/analysis/opennlp/lib/jwnl-1.3.3.jar: no such file or directory
Re: Integrate solr with openNLP
Okay, but I didn't understand what you said. Can you please elaborate? Thanks, Vivek

On Tue, Jun 3, 2014 at 5:36 PM, Ahmet Arslan iori...@yahoo.com wrote: Hi Vivekanand, I have never used UIMA+Solr before. Personally I think it takes more time to learn how to configure/use this UIMA stuff. If you are familiar with Java, write a class that extends UpdateRequestProcessor(Factory). Use OpenNLP for NER and add these new fields (organisation, city, person name, etc.) to your document. This phase is usually called 'enrichment'. Does that make sense?

On Tuesday, June 3, 2014 2:57 PM, Vivekanand Ittigi vi...@biginfolabs.com wrote: Hi Ahmet, I followed what you said: https://cwiki.apache.org/confluence/display/solr/UIMA+Integration. But how can I achieve my goal? I mean extracting only the name of the organization or person from the content field. I guess I'm almost there, but something is missing? Please guide me. Thanks, Vivek

On Tue, Jun 3, 2014 at 2:50 PM, Vivekanand Ittigi vi...@biginfolabs.com wrote: The entire goal can't be stated, but one of those tasks is like this: we have a big document (it can be a website, a PDF, etc.) indexed into Solr. Let's say the field name=content stores the contents of the document. All I want to do is pick the names of persons and places out of it, using OpenNLP or some other means. Those names should be reflected in Solr itself. Thanks, Vivek

On Tue, Jun 3, 2014 at 1:33 PM, Ahmet Arslan iori...@yahoo.com wrote: Hi, Please tell us what you are trying to do, in a new thread. Your high-level goal. There may be some other ways/tools, such as https://stanbol.apache.org, other than OpenNLP.

On Tuesday, June 3, 2014 8:31 AM, Vivekanand Ittigi vi...@biginfolabs.com wrote: We'll surely look into the UIMA integration. But before moving on: is this (https://wiki.apache.org/solr/OpenNLP) the only link we've got for the integration? Isn't there any other article or link which may help us to fix this problem.
Thanks, Vivek

On Tue, Jun 3, 2014 at 2:50 AM, Ahmet Arslan iori...@yahoo.com wrote: Hi, I believe I answered it. Let me re-try: There is no committed code for OpenNLP. There is an open ticket with patches; they may not work with the current trunk. Confluence is the official documentation. The wiki is maintained by the community, meaning the wiki can talk about some uncommitted features/stuff, like this one: https://wiki.apache.org/solr/OpenNLP What I am suggesting is: have a look at https://cwiki.apache.org/confluence/display/solr/UIMA+Integration and search for how to use OpenNLP inside UIMA. Maybe LUCENE-2899 is already doable with solr-uima. I am adding Tommaso (sorry for this, but we need an authoritative answer here) to clarify this. Also consider indexing with SolrJ and doing the OpenNLP enrichment outside Solr: use OpenNLP from plain Java, enrich your documents, and index them with SolrJ. You don't have to do everything inside Solr as Solr plugins. Hope this helps, Ahmet

On Monday, June 2, 2014 11:15 PM, Vivekanand Ittigi vi...@biginfolabs.com wrote: Thanks, I will check with the jira.. but you didn't answer my first question? Is there no way to integrate Solr with OpenNLP? Or is there any committed code, using which I can go ahead? Thanks, Vivek

On Mon, Jun 2, 2014 at 10:30 PM, Ahmet Arslan iori...@yahoo.com wrote: Hi, Here is the jira issue: https://issues.apache.org/jira/browse/LUCENE-2899 Anyone can create an account. I didn't use UIMA by myself and I have little knowledge about it, but I believe it is possible to use OpenNLP inside UIMA; you need to dig into the UIMA documentation. Solr UIMA integration already exists; that's why I questioned whether your requirement is possible with UIMA or not. I don't know the answer myself. Ahmet

On Monday, June 2, 2014 7:42 PM, Vivekanand Ittigi vi...@biginfolabs.com wrote: Hi Arslan, If not uncommitted code, then which code should be used to integrate? If I have to comment on my problems, which jira, and how do I put it?
And why are you suggesting the UIMA integration? My requirement is integrating with OpenNLP. You mean we can do all the activities through UIMA as we do using OpenNLP, like name and location finding, etc.? Thanks, Vivek

On Mon, Jun 2, 2014 at 8:40 PM, Ahmet Arslan iori...@yahoo.com.invalid wrote: Hi, Uncommitted code can have these kinds of problems; it is not guaranteed to work with the latest trunk. You could comment on the problem you face on the jira ticket. By the way, maybe you are after something doable with the already committed UIMA stuff? https://cwiki.apache.org/confluence/display/solr/UIMA+Integration Ahmet

On Monday, June 2, 2014 5:07 PM, Vivekanand Ittigi
Unable to use OpenCalais Annotator in UIMA+solr
I followed this link https://cwiki.apache.org/confluence/display/solr/UIMA+Integration to integrate Solr+UIMA, and I succeeded in integrating. SentenceAnnotation is working fine, but I want to use the OpenCalais annotator so that I can fetch person, place, and organization names. Nowhere is it mentioned which annotation should be used, the way org.apache.uima.SentenceAnnotation is used for producing sentences. Please guide me on which annotation to use. Thanks, Vivek
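For context: the solr-uima integration maps UIMA annotation types to Solr fields through the fieldMappings section of the processor's uimaConfig in solrconfig.xml, so "which annotation to use" comes down to which type names you list there. The sketch below only shows the shape of such a mapping; the OpenCalais type name, the analysis engine path, and the target field names are assumptions. Check the type system descriptor of the OpenCalais annotator you actually deploy for the real fully qualified type names.

```xml
<updateRequestProcessorChain name="uima">
  <processor class="org.apache.solr.uima.processor.UIMAUpdateRequestProcessorFactory">
    <lst name="uimaConfig">
      <lst name="runtimeParameters">
        <!-- e.g. a license key parameter, if the OpenCalais AE requires one -->
      </lst>
      <str name="analysisEngine">/path/to/your/AggregateAnalysisEngine.xml</str>
      <bool name="ignoreErrors">true</bool>
      <lst name="analyzeFields">
        <bool name="merge">false</bool>
        <arr name="fields"><str>content</str></arr>
      </lst>
      <lst name="fieldMappings">
        <lst name="type">
          <!-- assumed type name; take it from the annotator's type system -->
          <str name="name">org.apache.uima.calais.Person</str>
          <lst name="mapping">
            <str name="feature">coveredText</str>
            <str name="field">person</str>
          </lst>
        </lst>
      </lst>
    </lst>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
```

Each additional entity kind (place, organization) would get its own type/mapping block pointing at its own Solr field.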
Integrate solr with openNLP
I followed this link to integrate: https://wiki.apache.org/solr/OpenNLP

Installation (for English language testing), until LUCENE-2899 is committed:
1. pull the latest trunk or 4.0 branch
2. apply the latest LUCENE-2899 patch
3. do 'ant compile'
cd solr/contrib/opennlp/src/test-files/training . . .

I followed the first two steps but got the following error while executing the 3rd:

common.compile-core:
    [javac] Compiling 10 source files to /home/biginfolabs/solrtest/solr-lucene-trunk3/lucene/build/analysis/opennlp/classes/java
    [javac] warning: [path] bad path element /home/biginfolabs/solrtest/solr-lucene-trunk3/lucene/analysis/opennlp/lib/jwnl-1.3.3.jar: no such file or directory
    [javac] /home/biginfolabs/solrtest/solr-lucene-trunk3/lucene/analysis/opennlp/src/java/org/apache/lucene/analysis/opennlp/FilterPayloadsFilter.java:43: error: cannot find symbol
    [javac]     super(Version.LUCENE_44, input);
    [javac]           ^
    [javac]   symbol:   variable LUCENE_44
    [javac]   location: class Version
    [javac] /home/biginfolabs/solrtest/solr-lucene-trunk3/lucene/analysis/opennlp/src/java/org/apache/lucene/analysis/opennlp/OpenNLPTokenizer.java:56: error: no suitable constructor found for Tokenizer(Reader)
    [javac]     super(input);
    [javac]     ^
    [javac]   constructor Tokenizer.Tokenizer(AttributeFactory) is not applicable
    [javac]     (actual argument Reader cannot be converted to AttributeFactory by method invocation conversion)
    [javac]   constructor Tokenizer.Tokenizer() is not applicable
    [javac]     (actual and formal argument lists differ in length)
    [javac] 2 errors
    [javac] 1 warning

I'm really stuck on how to get past this step. I wasted my entire day trying to fix this but couldn't move a bit. Can someone please help? Thanks, Vivek
Re: Integrate solr with openNLP
Hi Arslan, If not uncommitted code, then which code should be used to integrate? If I have to comment on my problems, which jira, and how do I put it? And why are you suggesting the UIMA integration? My requirement is integrating with OpenNLP. You mean we can do all the activities through UIMA as we do using OpenNLP, like name and location finding, etc.? Thanks, Vivek

On Mon, Jun 2, 2014 at 8:40 PM, Ahmet Arslan iori...@yahoo.com.invalid wrote: Hi, Uncommitted code can have these kinds of problems; it is not guaranteed to work with the latest trunk. You could comment on the problem you face on the jira ticket. By the way, maybe you are after something doable with the already committed UIMA stuff? https://cwiki.apache.org/confluence/display/solr/UIMA+Integration Ahmet

On Monday, June 2, 2014 5:07 PM, Vivekanand Ittigi vi...@biginfolabs.com wrote: I followed this link to integrate: https://wiki.apache.org/solr/OpenNLP

Installation (for English language testing), until LUCENE-2899 is committed:
1. pull the latest trunk or 4.0 branch
2. apply the latest LUCENE-2899 patch
3. do 'ant compile'
cd solr/contrib/opennlp/src/test-files/training . . .
I followed the first two steps but got the following error while executing the 3rd:

common.compile-core:
    [javac] Compiling 10 source files to /home/biginfolabs/solrtest/solr-lucene-trunk3/lucene/build/analysis/opennlp/classes/java
    [javac] warning: [path] bad path element /home/biginfolabs/solrtest/solr-lucene-trunk3/lucene/analysis/opennlp/lib/jwnl-1.3.3.jar: no such file or directory
    [javac] /home/biginfolabs/solrtest/solr-lucene-trunk3/lucene/analysis/opennlp/src/java/org/apache/lucene/analysis/opennlp/FilterPayloadsFilter.java:43: error: cannot find symbol
    [javac]     super(Version.LUCENE_44, input);
    [javac]           ^
    [javac]   symbol:   variable LUCENE_44
    [javac]   location: class Version
    [javac] /home/biginfolabs/solrtest/solr-lucene-trunk3/lucene/analysis/opennlp/src/java/org/apache/lucene/analysis/opennlp/OpenNLPTokenizer.java:56: error: no suitable constructor found for Tokenizer(Reader)
    [javac]     super(input);
    [javac]     ^
    [javac]   constructor Tokenizer.Tokenizer(AttributeFactory) is not applicable
    [javac]     (actual argument Reader cannot be converted to AttributeFactory by method invocation conversion)
    [javac]   constructor Tokenizer.Tokenizer() is not applicable
    [javac]     (actual and formal argument lists differ in length)
    [javac] 2 errors
    [javac] 1 warning

I'm really stuck on how to get past this step. I wasted my entire day trying to fix this but couldn't move a bit. Can someone please help? Thanks, Vivek
Thanks, I will check with the JIRA, but you didn't answer my first question: is there no way to integrate Solr with OpenNLP? Or is there any committed code I can go ahead with?

Thanks,
Vivek

On Mon, Jun 2, 2014 at 10:30 PM, Ahmet Arslan iori...@yahoo.com wrote:

Hi,

Here is the JIRA issue: https://issues.apache.org/jira/browse/LUCENE-2899 Anyone can create an account.

I didn't use UIMA myself and have little knowledge about it, but I believe it is possible to use OpenNLP inside UIMA; you need to dig into the UIMA documentation. The Solr UIMA integration already exists; that's why I questioned whether your requirement is possible with UIMA. I don't know the answer myself.

Ahmet
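For readers weighing the committed UIMA route that Ahmet mentions: the solr-uima contrib is wired in through an update request processor chain in solrconfig.xml. A minimal, abridged sketch is below; the descriptor path, type name, and field names are assumptions for illustration (the real config needs an analysis-engine descriptor for the annotators you want, e.g. the ones shipped by the OpenNLP UIMA integration), so consult the cwiki page for the full set of required elements:

```xml
<updateRequestProcessorChain name="uima">
  <processor class="org.apache.solr.uima.processor.UIMAUpdateRequestProcessorFactory">
    <lst name="uimaConfig">
      <!-- Hypothetical: a UIMA analysis-engine descriptor wrapping the
           OpenNLP annotators. -->
      <str name="analysisEngine">/path/to/OpenNLPAnnotatorDescriptor.xml</str>
      <bool name="ignoreErrors">true</bool>
      <lst name="analyzeFields">
        <bool name="merge">false</bool>
        <arr name="fields"><str>text</str></arr>
      </lst>
      <lst name="fieldMappings">
        <lst name="type">
          <!-- Hypothetical mapping: copy person annotations produced by the
               engine into a Solr field named "person". -->
          <str name="name">opennlp.uima.Person</str>
          <lst name="mapping">
            <str name="feature">coveredText</str>
            <str name="field">person</str>
          </lst>
        </lst>
      </lst>
    </lst>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
```

The chain then has to be enabled on the update handler (via the update.chain parameter) so that documents pass through the UIMA processor before being indexed.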
We'll surely look into the UIMA integration. But before moving on, is this (https://wiki.apache.org/solr/OpenNLP) the only link we've got for the integration? Isn't there any other article or link that may help us fix this problem?

Thanks,
Vivek

On Tue, Jun 3, 2014 at 2:50 AM, Ahmet Arslan iori...@yahoo.com wrote:

Hi,

I believe I answered it. Let me re-try: there is no committed code for OpenNLP. There is an open ticket with patches, and they may not work with the current trunk.

Confluence is the official documentation; the wiki is maintained by the community, meaning the wiki can describe uncommitted features, like this one: https://wiki.apache.org/solr/OpenNLP

What I am suggesting is: have a look at https://cwiki.apache.org/confluence/display/solr/UIMA+Integration and search for how to use OpenNLP inside UIMA. Maybe what LUCENE-2899 does is already doable with solr-uima. I am adding Tommaso (sorry for this, but we need an authoritative answer here) to clarify this.

Also consider indexing with SolrJ and doing the OpenNLP enrichment outside Solr: use OpenNLP in a plain Java program, enrich your documents, and index them with SolrJ. You don't have to do everything inside Solr as Solr plugins.

Hope this helps,
Ahmet
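Ahmet's last suggestion, running OpenNLP outside Solr and indexing the enriched documents with SolrJ, can be sketched roughly as below. This is untested and assumes SolrJ 4.x and OpenNLP 1.5.x on the classpath; the Solr URL, model paths, and field names ("text", "person") are made-up examples, and the "person" field would need to be a multiValued string field in the schema:

```java
// Untested sketch: OpenNLP NER in plain Java, then indexing with SolrJ.
import java.io.FileInputStream;
import opennlp.tools.namefind.NameFinderME;
import opennlp.tools.namefind.TokenNameFinderModel;
import opennlp.tools.tokenize.TokenizerME;
import opennlp.tools.tokenize.TokenizerModel;
import opennlp.tools.util.Span;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class EnrichAndIndex {
    public static void main(String[] args) throws Exception {
        // Plain-Java OpenNLP, exactly as in a standalone program (the same
        // en-token.bin / en-ner-person.bin models as in the schema above).
        TokenizerME tokenizer = new TokenizerME(
                new TokenizerModel(new FileInputStream("opennlp/en-token.bin")));
        NameFinderME personFinder = new NameFinderME(
                new TokenNameFinderModel(new FileInputStream("opennlp/en-ner-person.bin")));

        String text = "Some document text mentioning a person name.";
        String[] tokens = tokenizer.tokenize(text);
        Span[] spans = personFinder.find(tokens);
        String[] persons = Span.spansToStrings(spans, tokens);

        // Index the original text plus the extracted names with SolrJ.
        HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1");
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "doc1");
        doc.addField("text", text);
        for (String p : persons) {
            doc.addField("person", p);
        }
        server.add(doc);
        server.commit();
    }
}
```

The appeal of this approach is the one Ahmet names: no Solr plugins, no uncommitted patches, and the NLP code stays an ordinary Java program you can test on its own.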