Re: Integrate solr with openNLP

2014-09-10 Thread Vivekanand Ittigi
Actually we dropped integrating OpenNLP directly with Solr, and we took two
different approaches instead:

* we're using NLP separately, not with Solr
* we're taking the help of UIMA for Solr. It's more advanced.

If you've a specific question, you can ask me. I'll tell you what I know.

-Vivek

On Wed, Sep 10, 2014 at 3:46 PM, Aman Tandon amantandon...@gmail.com
wrote:

 Hi,

 What is the progress of the integration of NLP with Solr? If you have
 achieved this integration successfully, then please share it with us.

 With Regards
 Aman Tandon

 On Tue, Jun 10, 2014 at 11:04 AM, Vivekanand Ittigi vi...@biginfolabs.com
 
 wrote:

  Hi Aman,
 
  Yeah, We are also thinking the same. Using UIMA is better. And thanks to
  everyone. You guys really showed us the way(UIMA).
 
  We'll work on it.
 
  Thanks,
  Vivek
 
 
  On Fri, Jun 6, 2014 at 5:54 PM, Aman Tandon amantandon...@gmail.com
  wrote:
 
   Hi Vivek,
  
    As everybody on the mailing list mentioned, you should go with UIMA. The
    OpenNLP issues are not being tracked properly, which could get your
    development stuck in the near future if a problem comes up, so it's
    better to start investigating UIMA.
  
  
   With Regards
   Aman Tandon
  
  
   On Fri, Jun 6, 2014 at 11:00 AM, Vivekanand Ittigi 
  vi...@biginfolabs.com
   wrote:
  
 Can anyone please reply?
   
Thanks,
Vivek
   
-- Forwarded message --
From: Vivekanand Ittigi vi...@biginfolabs.com
Date: Wed, Jun 4, 2014 at 4:38 PM
Subject: Re: Integrate solr with openNLP
To: Tommaso Teofili tommaso.teof...@gmail.com
Cc: solr-user@lucene.apache.org solr-user@lucene.apache.org,
 Ahmet
Arslan iori...@yahoo.com
   
   
Hi Tommaso,
   
 Yes, you are right. The 4.4 version works; I'm able to compile now. I'm
 trying to apply the named-entity recognition (person name) filter, but
 I'm not seeing any change. My schema.xml looks like this:
   
 <field name="text" type="text_opennlp_pos_ner" indexed="true"
        stored="true" multiValued="true"/>

 <fieldType name="text_opennlp_pos_ner" class="solr.TextField"
            positionIncrementGap="100">
   <analyzer>
     <tokenizer class="solr.OpenNLPTokenizerFactory"
                tokenizerModel="opennlp/en-token.bin"/>
     <filter class="solr.OpenNLPFilterFactory"
             nerTaggerModels="opennlp/en-ner-person.bin"/>
     <filter class="solr.LowerCaseFilterFactory"/>
   </analyzer>
 </fieldType>
   
 Please guide?
   
Thanks,
Vivek
   
   
On Wed, Jun 4, 2014 at 1:27 PM, Tommaso Teofili 
   tommaso.teof...@gmail.com

wrote:
   
 Hi all,

 Ahmet was suggesting to eventually use the UIMA integration because
 OpenNLP already has an integration with Apache UIMA, so you would just
 have to use that [1].
 And that's one of the main reasons the UIMA integration was done: it's a
 framework that you can easily hook into in order to plug in your NLP
 algorithm.

 If you want to just use OpenNLP, then it's up to you to either write your
 own UpdateRequestProcessor plugin [2] to add metadata extracted by
 OpenNLP to your documents, or to write a dedicated analyzer /
 tokenizer / token filter.

 For the OpenNLP integration (LUCENE-2899), the patch is not up to date
 with the latest APIs in trunk; however, you should be able to apply it
 to (if I recall correctly) the 4.4 version or so, and adapting it to
 the latest API shouldn't be too hard either.

 Regards,
 Tommaso

 [1] :
 http://opennlp.apache.org/documentation/1.5.3/manual/opennlp.html#org.apche.opennlp.uima
 [2] : http://wiki.apache.org/solr/UpdateRequestProcessor



 2014-06-03 15:34 GMT+02:00 Ahmet Arslan iori...@yahoo.com.invalid
 :

 Can you extract names, locations, etc. using OpenNLP in a plain/straight
 Java program?

 If yes, here are two separate options:

 1) Use http://searchhub.org/2012/02/14/indexing-with-solrj/ as an
 example to integrate your NER code into it and write your own
  indexing
 code. You have the full power here. No solr-plugins are involved.

 2) Use 'Implementing a conditional copyField' given here :
 http://wiki.apache.org/solr/UpdateRequestProcessor
 as an example and integrate your NER code into it.


 Please note that these are separate ways to enrich your incoming
 documents, choose either (1) or (2).
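 To make option (1) concrete, here is a minimal sketch of the enrichment
 step in plain Java. The regex is only a toy stand-in for a real OpenNLP
 NameFinderME call, and the field names (text, person_ss) are invented for
 illustration:

```java
import java.util.*;
import java.util.regex.*;

public class EnrichSketch {
    // Toy "NER": grab capitalized word pairs. A real implementation would
    // call OpenNLP's NameFinderME here instead.
    public static List<String> extractNames(String text) {
        List<String> names = new ArrayList<>();
        Matcher m = Pattern.compile("\\b[A-Z][a-z]+ [A-Z][a-z]+\\b").matcher(text);
        while (m.find()) names.add(m.group());
        return names;
    }

    public static void main(String[] args) {
        // Build the document as a plain map; with SolrJ you would put the
        // same fields on a SolrInputDocument and send it to Solr.
        Map<String, Object> doc = new HashMap<>();
        doc.put("text", "Barack Obama visited New Delhi last week.");
        doc.put("person_ss", extractNames((String) doc.get("text")));
        System.out.println(doc.get("person_ss")); // prints [Barack Obama, New Delhi]
    }
}
```

 The same extraction call would also work for option (2) inside an
 UpdateRequestProcessor; only where you hook it in changes.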



 On Tuesday, June 3, 2014 3:30 PM, Vivekanand Ittigi 
 vi...@biginfolabs.com wrote:
 Okay, but I didn't understand what you said. Can you please elaborate?

 Thanks,
 Vivek





 On Tue, Jun 3, 2014 at 5:36 PM, Ahmet Arslan iori...@yahoo.com
   wrote:

  Hi Vivekanand,
 
  I have never used UIMA+Solr before.
 
  Personally I think it takes more time to learn how to
  configure/use

Retrieving multivalued field elements

2014-08-25 Thread Vivekanand Ittigi
Hi,

I've a multivalued field and I want to display all of its array elements
using SolrJ.

I used the command mentioned below, but I'm able to retrieve only the first
element of the array.

response.getResults().get(0).getFieldValueMap().get("discussions")
Output: Creation Time - 2014-06-12 17:37:53.0

NOTE: discussions is a multivalued field in Solr which contains

<arr name="discussions">
  <str>Creation Time - 2014-06-12 17:37:53.0</str>
  <str>Last modified Time - 2014-06-12 17:42:09.0</str>
  <str>Comment - posting bug from risk flows ...posting comment from
risk flows ...syncing comments ...</str>
</arr>

Is there any SolrJ API for retrieving multivalued elements, or is it not
possible?

-Vivek


Re: Retrieving multivalued field elements

2014-08-25 Thread Vivekanand Ittigi
Yes, you are right. It worked!

-Vivek


On Mon, Aug 25, 2014 at 7:39 PM, Ahmet Arslan iori...@yahoo.com.invalid
wrote:

 Hi Vivek,

 how about this?

 Iterator<SolrDocument> iter = queryResponse.getResults().iterator();

 while (iter.hasNext()) {
   SolrDocument resultDoc = iter.next();

   Collection<Object> content = resultDoc.getFieldValues("discussions");
 }



 On Monday, August 25, 2014 4:55 PM, Vivekanand Ittigi 
 vi...@biginfolabs.com wrote:
 Hi,

 I've a multivalued field and I want to display all of its array elements
 using SolrJ.

 I used the command mentioned below, but I'm able to retrieve only the first
 element of the array.

 response.getResults().get(0).getFieldValueMap().get("discussions")
 Output: Creation Time - 2014-06-12 17:37:53.0

 NOTE: discussions is a multivalued field in Solr which contains

 <arr name="discussions">
   <str>Creation Time - 2014-06-12 17:37:53.0</str>
   <str>Last modified Time - 2014-06-12 17:42:09.0</str>
   <str>Comment - posting bug from risk flows ...posting comment from
 risk flows ...syncing comments ...</str>
 </arr>

 Is there any SolrJ API for retrieving multivalued elements, or is it not
 possible?

 -Vivek




Unable to read HBase data from solr

2014-08-13 Thread Vivekanand Ittigi
I'm trying to read specific HBase data and index it into Solr using a
Groovy script in the /update handler of solrconfig.xml, but I'm getting the
error mentioned below.

I'm placing the same HBase jar that I'm running against into Solr's lib
directory. Several articles suggested workarounds:

1. First I thought that the classpath had two default xmls and it was
throwing the error because one of the two came from some older version of
the HBase jar. But the classpath has no HBase jar.
2. Setting hbase.defaults.for.version.skip to true in hbase-site.xml and
adding that to the classpath.

But I'm still getting the same error. I think Solr internally reads an
hbase-site.xml file, but I don't know from where.

Please help. If further info is needed, I'm ready to provide it.

SEVERE: org.apache.solr.common.SolrException: Unable to invoke function
processAdd in script: update-script.groovy: java.lang.RuntimeException:
hbase-default.xml file seems to be for and old version of HBase (null),
this version is 0.94.10
at
org.apache.solr.update.processor.StatelessScriptUpdateProcessorFactory$ScriptUpdateProcessor.invokeFunction(StatelessScriptUpdateProcessorFactory.java:433)
at
org.apache.solr.update.processor.StatelessScriptUpdateProcessorFactory$ScriptUpdateProcessor.processAdd(StatelessScriptUpdateProcessorFactory.java:374)
at
org.apache.solr.handler.loader.XMLLoader.processUpdate(XMLLoader.java:246)
at org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:173)
at
org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92)
at
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1797)
at
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:637)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:343)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:141)


Unable to get the class from external jar in Update handler

2014-08-06 Thread Vivekanand Ittigi
Hi,

I've made a jar which contains a class called ConcatClass, and I've put
this jar under Solr's lib directory. I'm trying to access this class from
update-script.groovy in the /update handler.

But Groovy is not picking up the ConcatClass class, giving the following
error:

SEVERE: Unable to create core: collection1
org.apache.solr.common.SolrException: Unable to initialize scripts: Unable
to evaluate script: update-script.groovy
at org.apache.solr.core.SolrCore.init(SolrCore.java:806)
at org.apache.solr.core.SolrCore.init(SolrCore.java:619)
at
org.apache.solr.core.CoreContainer.createFromZk(CoreContainer.java:967)
at org.apache.solr.core.CoreContainer.create(CoreContainer.java:1049)
at org.apache.solr.core.CoreContainer$3.call(CoreContainer.java:634)
at org.apache.solr.core.CoreContainer$3.call(CoreContainer.java:629)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
Caused by: org.apache.solr.common.SolrException: Unable to initialize
scripts: Unable to evaluate script: update-script.groovy
at
org.apache.solr.update.processor.StatelessScriptUpdateProcessorFactory.inform(StatelessScriptUpdateProcessorFactory.java:232)
at
org.apache.solr.core.SolrResourceLoader.inform(SolrResourceLoader.java:592)
at org.apache.solr.core.SolrCore.init(SolrCore.java:801)
... 11 more
Caused by: org.apache.solr.common.SolrException: Unable to evaluate script:
update-script.groovy
at
org.apache.solr.update.processor.StatelessScriptUpdateProcessorFactory.initEngines(StatelessScriptUpdateProcessorFactory.java:314)
at
org.apache.solr.update.processor.StatelessScriptUpdateProcessorFactory.inform(StatelessScriptUpdateProcessorFactory.java:228)
... 13 more
Caused by: javax.script.ScriptException:
org.codehaus.groovy.control.MultipleCompilationErrorsException: startup
failed:
Script1.groovy: 1: unable to resolve class
com.biginfolabs.openNLP.ConcatClass
 @ line 1, column 1.
   import com.biginfolabs.openNLP.ConcatClass;
   ^

1 error

at
org.codehaus.groovy.jsr223.GroovyScriptEngineImpl.eval(GroovyScriptEngineImpl.java:151)
at
org.codehaus.groovy.jsr223.GroovyScriptEngineImpl.eval(GroovyScriptEngineImpl.java:122)
at javax.script.AbstractScriptEngine.eval(AbstractScriptEngine.java:249)
at
org.apache.solr.update.processor.StatelessScriptUpdateProcessorFactory.initEngines(StatelessScriptUpdateProcessorFactory.java:312)
... 14 more

NOTE: I've put this jar under Solr's lib directory.

The Groovy file is not able to resolve ConcatClass. Does anyone have an
idea? I've wasted a whole day trying to fix this.

Thanks,
Vivek


crawling all links of same domain in nutch in solr

2014-07-28 Thread Vivekanand Ittigi
Hi,

Can anyone tell me how to crawl all the other pages of the same domain?
For example, I'm feeding the website http://www.techcrunch.com/ in seed.txt.

Following property is added in nutch-site.xml

<property>
  <name>db.ignore.internal.links</name>
  <value>false</value>
  <description>If true, when adding new links to a page, links from
  the same host are ignored.  This is an effective way to limit the
  size of the link database, keeping only the highest quality
  links.
  </description>
</property>

And following is added in regex-urlfilter.txt

# accept anything else
+.

Note: if I add http://www.tutorialspoint.com/ in seed.txt, I'm able to
crawl all of its other pages, but not techcrunch.com's pages, even though
it has many other pages too.

Please help?

Thanks,
Vivek


Integrating Solr with HBase Using Lily Project

2014-07-18 Thread Vivekanand Ittigi
Hi,

I tried to integrate Solr with HBase using the HBase Indexer project
https://github.com/NGDATA/hbase-indexer/wiki (one of the sub-projects of
Lily).

I used Apache HBase running on HDFS and Solr 4.8.0, but I started getting
the error mentioned below.

14/07/18 11:55:38 WARN impl.SepConsumer: Error processing a batch of SEP
events, the error will be forwarded to HBase for retry
java.util.concurrent.ExecutionException: java.lang.RuntimeException:
org.apache.solr.common.SolrException: Unknown document router
'{name=implicit}'
at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:252)
at java.util.concurrent.FutureTask.get(FutureTask.java:111)
at com.ngdata.sep.impl.SepConsumer.waitOnSepEventCompletion(
SepConsumer.java:235)
at com.ngdata.sep.impl.SepConsumer.replicateLogEntries(SepConsumer.java:220)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(
NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(
DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:622)
at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(
WritableRpcEngine.java:320)
at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(
HBaseServer.java:1428)
Caused by: java.lang.RuntimeException: org.apache.solr.common.SolrException:
Unknown document router '{name=implicit}'
at com.ngdata.hbaseindexer.indexer.IndexingEventListener.processEvents(
IndexingEventListener.java:90)
at com.ngdata.sep.impl.SepEventExecutor$1.run(SepEventExecutor.java:97)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at java.util.concurrent.ThreadPoolExecutor.runWorker(
ThreadPoolExecutor.java:1146)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(
ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:701)
Caused by: org.apache.solr.common.SolrException: Unknown document router
'{name=implicit}'
at org.apache.solr.common.cloud.DocRouter.getDocRouter(DocRouter.java:46)
at org.apache.solr.common.cloud.ClusterState.collectionFromObjects(
ClusterState.java:263)
at org.apache.solr.common.cloud.ClusterState.load(ClusterState.java:231)
at org.apache.solr.common.cloud.ClusterState.load(ClusterState.java:207)
at org.apache.solr.common.cloud.ZkStateReader.createClusterStateWatchersAndU
pdate(ZkStateReader.java:299)

I googled for the cause; this link
https://groups.google.com/a/cloudera.org/forum/#!msg/search-user/p5xoeU194BM/XVdsyVpDjVUJ
says this works on 4.2.0. So I switched to 4.2.0 and it worked.

What I'm worried about is why it isn't working in 4.8.0. Is there anything
extra I should add to make it work?

And do I now have to re-index all my data in 4.2 and pretty much re-do the
work?

Thanks,
Vivek


Re: About Query Parser

2014-06-27 Thread Vivekanand Ittigi
That's an impressive answer. I actually wanted to know how exactly a query
parser works. I'm supposed to collect some fields, values, and other
related info and build a Solr query from them. I wanted to know whether I
should use a query parser or Java code to build the Solr query. Anyway, it
looks like I have to build it with Java code, and I'm on it.

Thanks,
Vivek


On Fri, Jun 20, 2014 at 6:06 PM, Daniel Collins danwcoll...@gmail.com
wrote:

 I would say *:* is a human-readable/writable query. as is
 inStock:false.  The former will be converted by the query parser into a
 MatchAllDocsQuery which is what Lucene understands.  The latter will be
 converted (again by the query parser) into some query.  Now this is where
 *which* query parser you are using is important.  Is inStock a word to be
 queried, or a field in your schema?  Probably the latter, but the query
 parser has to determine that using the Solr schema.  So I would expect that
 query to be converted to a TermQuery(Term(inStock, false)), so a query
 for the value false in the field inStock.

 This is all interesting but what are you really trying to find out?  If you
 just want to run queries and see what they translate to, you can use the
 debug options when you send the query in, and then Solr will return to you
 both the raw query (with any other options that the query handler might
 have added to your query) as well as the Lucene Query generated from it.

 e.g. from running *:* on a Solr instance:

 "rawquerystring": "*:*", "querystring": "*:*", "parsedquery":
 "MatchAllDocsQuery(*:*)", "parsedquery_toString": "*:*", "QParser":
 "LuceneQParser",
 Or (this shows the difference between raw query syntax and parsed query
 syntax): "rawquerystring": "body_en:test AND headline_en:hello",
 "querystring": "body_en:test AND headline_en:hello", "parsedquery":
 "+body_en:test +headline_en:hello", "parsedquery_toString": "+body_en:test
 +headline_en:hello", "QParser": "LuceneQParser",


 On 20 June 2014 13:05, Vivekanand Ittigi vi...@biginfolabs.com wrote:

  All right let me put this.
 
 
 
  http://192.168.1.78:8983/solr/collection1/select?q=inStock:false&facet=true&facet.field=popularity&wt=xml&indent=true

   I just want to know what form this is. Is it a Lucene query, or should
   this query go through a query parser to get converted to a Lucene query?
 
 
  Thanks,
  Vivek
 
 
  On Fri, Jun 20, 2014 at 5:19 PM, Alexandre Rafalovitch 
 arafa...@gmail.com
  
  wrote:
 
   That's *:* and a special case. There is no scoring here, nor searching.
   Just a dump of documents. Not even filtering or faceting. I sure hope
 you
   have more interesting examples.
  
   Regards,
   Alex
   On 20/06/2014 6:40 pm, Vivekanand Ittigi vi...@biginfolabs.com
  wrote:
  
Hi Daniel,
   
You said inputs are human-generated and outputs are lucene
 objects.
   So
my question is what does the below query mean. Does this fall under
human-generated one or lucene.?
   
   
   http://localhost:8983/solr/collection1/select?q=*%3A*&wt=xml&indent=true
   
Thanks,
Vivek
   
   
   
On Fri, Jun 20, 2014 at 3:55 PM, Daniel Collins 
 danwcoll...@gmail.com
  
wrote:
   
 Alexandre's response is very thorough, so I'm really simplifying
   things,
I
 confess but here's my query parsers for dummies. :)

 In terms of inputs/outputs, a QueryParser takes a string (generally
assumed
 to be human generated i.e. something a user might type in, so
  maybe a
 sentence, a set of words, the format can vary) and outputs a Lucene
   Query
 object (


   
  
 
 http://lucene.apache.org/core/4_8_1/core/org/apache/lucene/search/Query.html
 ),
 which in fact is a kind of tree (again, I'm simplifying I know)
   since a
 query can contain nested expressions.

 So very loosely its a translator from a human-generated query into
  the
 structure that Lucene can handle.  There are several different
 query
 parsers since they all use different input syntax, and ways of
  handling
 different constructs (to handle A and B, should the user type +A
 +B
   or
A
 and B or just A B for example), and have different levels of
  support
for
 the various Query structures that Lucene can handle: SpanQuery,
FuzzyQuery,
 PhraseQuery, etc.

 We for example use an XML-based query parser.  Why (you might well
   ask!),
 well we had an already used and supported query syntax of our own,
   which
 our users understood, so we couldn't use an off the shelf query
  parser.
 We
 could have built our own in Java, but for a variety of reasons we
  parse
our
 queries in a front-end system ahead of Solr (which is C++-based),
 so
  we
 needed an interim format to pass queries to Solr that was as near
 to
  a
 Lucene Query object as we could get (and there was an existing XML
   parser
 to save us starting from square one!).

 As part of that Query construction (but independent of which
   QueryParser
 you use), Solr will also

About Query Parser

2014-06-20 Thread Vivekanand Ittigi
Hi,

I think this might be a silly question, but I want to make it clear.

What is a query parser? What does it do? I know it's used for converting a
query, but from what to what? What is the input and what is the output of a
query parser? And where exactly can this feature be used?

If possible, please explain with an example. It would really help a lot.

Thanks,
Vivek


Re: About Query Parser

2014-06-20 Thread Vivekanand Ittigi
Hi Daniel,

You said inputs are human-generated and outputs are Lucene objects. So
my question is: what does the below query mean? Does this fall under the
human-generated kind, or Lucene?

http://localhost:8983/solr/collection1/select?q=*%3A*&wt=xml&indent=true

Thanks,
Vivek



On Fri, Jun 20, 2014 at 3:55 PM, Daniel Collins danwcoll...@gmail.com
wrote:

 Alexandre's response is very thorough, so I'm really simplifying things, I
 confess but here's my query parsers for dummies. :)

 In terms of inputs/outputs, a QueryParser takes a string (generally assumed
 to be human generated i.e. something a user might type in, so maybe a
 sentence, a set of words, the format can vary) and outputs a Lucene Query
 object (

 http://lucene.apache.org/core/4_8_1/core/org/apache/lucene/search/Query.html
 ),
 which in fact is a kind of tree (again, I'm simplifying I know) since a
 query can contain nested expressions.

 So very loosely it's a translator from a human-generated query into the
 structure that Lucene can handle.  There are several different query
 parsers, since they all use different input syntax and different ways of
 handling constructs (to handle "A and B", should the user type "+A +B" or
 "A and B" or just "A B", for example), and they have different levels of
 support for the various Query structures that Lucene can handle:
 SpanQuery, FuzzyQuery, PhraseQuery, etc.

 We for example use an XML-based query parser.  Why (you might well ask!),
 well we had an already used and supported query syntax of our own, which
 our users understood, so we couldn't use an off the shelf query parser.  We
 could have built our own in Java, but for a variety of reasons we parse our
 queries in a front-end system ahead of Solr (which is C++-based), so we
 needed an interim format to pass queries to Solr that was as near to a
 Lucene Query object as we could get (and there was an existing XML parser
 to save us starting from square one!).

 As part of that Query construction (but independent of which QueryParser
 you use), Solr will also make use of a set of Tokenizers and Filters (

 https://cwiki.apache.org/confluence/display/solr/Understanding+Analyzers,+Tokenizers,+and+Filters
 )
 but that's more to do with dealing with the terms in the query (so in my
 examples above, is A a real word, does it need stemming, lowercasing,
 removing because its a stopword, etc).
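As a toy illustration of the role described above (none of this is Lucene
code; it only mimics the shape of the transformation), a parser that turns
the AND syntax from the debug examples in this thread into the "+" prefix
form might look like:

```java
public class ToyQueryParser {
    // Turns a human-readable query string into a "parsed" form, mimicking
    // what LuceneQParser's debug output shows: "a AND b" -> "+a +b",
    // with the special case "*:*" -> MatchAllDocsQuery.
    public static String parse(String q) {
        if (q.equals("*:*")) return "MatchAllDocsQuery(*:*)";
        String[] clauses = q.split("\\s+AND\\s+");
        if (clauses.length == 1) return clauses[0];
        StringBuilder sb = new StringBuilder();
        for (String clause : clauses) {
            if (sb.length() > 0) sb.append(' ');
            sb.append('+').append(clause);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Same inputs as the debug output quoted earlier in the thread.
        System.out.println(parse("body_en:test AND headline_en:hello"));
        // prints +body_en:test +headline_en:hello
        System.out.println(parse("*:*")); // prints MatchAllDocsQuery(*:*)
    }
}
```

A real query parser additionally tokenizes and analyzes each term, as
Daniel notes above, and builds Query objects rather than strings.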



Re: About Query Parser

2014-06-20 Thread Vivekanand Ittigi
All right, let me put it this way:

http://192.168.1.78:8983/solr/collection1/select?q=inStock:false&facet=true&facet.field=popularity&wt=xml&indent=true

I just want to know what form this is. Is it a Lucene query, or should this
query go through a query parser to get converted to a Lucene query?


Thanks,
Vivek


On Fri, Jun 20, 2014 at 5:19 PM, Alexandre Rafalovitch arafa...@gmail.com
wrote:

 That's *:* and a special case. There is no scoring here, nor searching.
 Just a dump of documents. Not even filtering or faceting. I sure hope you
 have more interesting examples.

 Regards,
 Alex
 On 20/06/2014 6:40 pm, Vivekanand Ittigi vi...@biginfolabs.com wrote:

  Hi Daniel,
 
  You said inputs are human-generated and outputs are lucene objects.
 So
  my question is what does the below query mean. Does this fall under
  human-generated one or lucene.?
 
   http://localhost:8983/solr/collection1/select?q=*%3A*&wt=xml&indent=true
 
  Thanks,
  Vivek
 
 
 
  On Fri, Jun 20, 2014 at 3:55 PM, Daniel Collins danwcoll...@gmail.com
  wrote:
 
   Alexandre's response is very thorough, so I'm really simplifying
 things,
  I
   confess but here's my query parsers for dummies. :)
  
   In terms of inputs/outputs, a QueryParser takes a string (generally
  assumed
   to be human generated i.e. something a user might type in, so maybe a
   sentence, a set of words, the format can vary) and outputs a Lucene
 Query
   object (
  
  
 
 http://lucene.apache.org/core/4_8_1/core/org/apache/lucene/search/Query.html
   ),
   which in fact is a kind of tree (again, I'm simplifying I know)
 since a
   query can contain nested expressions.
  
   So very loosely its a translator from a human-generated query into the
   structure that Lucene can handle.  There are several different query
   parsers since they all use different input syntax, and ways of handling
   different constructs (to handle A and B, should the user type +A +B
 or
  A
   and B or just A B for example), and have different levels of support
  for
   the various Query structures that Lucene can handle: SpanQuery,
  FuzzyQuery,
   PhraseQuery, etc.
  
   We for example use an XML-based query parser.  Why (you might well
 ask!),
   well we had an already used and supported query syntax of our own,
 which
   our users understood, so we couldn't use an off the shelf query parser.
   We
   could have built our own in Java, but for a variety of reasons we parse
  our
   queries in a front-end system ahead of Solr (which is C++-based), so we
   needed an interim format to pass queries to Solr that was as near to a
   Lucene Query object as we could get (and there was an existing XML
 parser
   to save us starting from square one!).
  
   As part of that Query construction (but independent of which
 QueryParser
   you use), Solr will also make use of a set of Tokenizers and Filters (
  
  
 
 https://cwiki.apache.org/confluence/display/solr/Understanding+Analyzers,+Tokenizers,+and+Filters
   )
   but that's more to do with dealing with the terms in the query (so in
 my
   examples above, is A a real word, does it need stemming, lowercasing,
   removing because its a stopword, etc).
  
 



making solr to understand English

2014-06-19 Thread Vivekanand Ittigi
Hi,

I'm trying to set up Solr so that it understands English. For example, I've
indexed our company website (www.biginfolabs.com), but it could be any
other website or our own data.

If I put in some English-like queries, I should get a one-word answer, just
like Google does; example queries:

* Where is India located?
* Who is the father of Obama?

What we've tried:
* Integrated UIMA and Mahout with Solr.
* I read the book Taming Text and implemented
https://github.com/tamingtext/book, but did not get what I want.

Can anyone please tell us how to move further? It can be anything; our team
is ready to do it.

Thanks,
Vivek


VelocityResponseWriter in solr

2014-06-18 Thread Vivekanand Ittigi
Hi,

I want to use the VelocityResponseWriter in Solr.

I've indexed a website (for example http://www.biginfolabs.com/). If I type
a query like
http://localhost:8983/solr/collection1/select?q=santhos&wt=xml&indent=true or
http://localhost:8983/solr/collection1/select?q=*%3A*&wt=xml&indent=true
I get all the fields related to that document (content, host, title, url,
etc.), but if I put the query through Velocity,
http://localhost:8983/solr/collection1/browse?q=santhosh, I see only 3
fields (id, url, content) instead of all the other fields.

How can I display all the fields?

This is in solrconfig.xml

<requestHandler name="/browse" class="solr.SearchHandler">
 <lst name="defaults">
   <str name="echoParams">explicit</str>

   <!-- VelocityResponseWriter settings -->
   <str name="wt">velocity</str>
   <str name="v.template">browse</str>
   <str name="v.layout">layout</str>
   <str name="title">Solritas</str>

   <!-- Query settings -->
   <str name="defType">edismax</str>
   <str name="qf">
      text^0.5 features^1.0 name^1.2 sku^1.5 id^10.0 manu^1.1 cat^1.4
      title^10.0 description^5.0 keywords^5.0 author^2.0 resourcename^1.0
   </str>
   <str name="df">text</str>
   <str name="mm">100%</str>
   <str name="q.alt">*:*</str>
   <str name="rows">10</str>
   <str name="fl">*,score</str>

   <str name="mlt.qf">
      text^0.5 features^1.0 name^1.2 sku^1.5 id^10.0 manu^1.1 cat^1.4
      title^10.0 description^5.0 keywords^5.0 author^2.0 resourcename^1.0
   </str>
   <str name="mlt.fl">text,features,name,sku,id,manu,cat,title,description,keywords,author,resourcename</str>
   <int name="mlt.count">3</int>

   <!-- Faceting defaults -->
   <str name="facet">on</str>
   <str name="facet.field">cat</str>
   <str name="facet.field">manu_exact</str>
   <str name="facet.field">content_type</str>
   <str name="facet.field">author_s</str>
   <str name="facet.query">ipod</str>
   <str name="facet.query">GB</str>
   <str name="facet.mincount">1</str>
   <str name="facet.pivot">cat,inStock</str>
   <str name="facet.range.other">after</str>
   <str name="facet.range">price</str>
   <int name="f.price.facet.range.start">0</int>
   <int name="f.price.facet.range.end">600</int>
   <int name="f.price.facet.range.gap">50</int>
   <str name="facet.range">popularity</str>
   <int name="f.popularity.facet.range.start">0</int>
   <int name="f.popularity.facet.range.end">10</int>
   <int name="f.popularity.facet.range.gap">3</int>
   <str name="facet.range">manufacturedate_dt</str>
   <str name="f.manufacturedate_dt.facet.range.start">NOW/YEAR-10YEARS</str>
   <str name="f.manufacturedate_dt.facet.range.end">NOW</str>
   <str name="f.manufacturedate_dt.facet.range.gap">+1YEAR</str>
   <str name="f.manufacturedate_dt.facet.range.other">before</str>
   <str name="f.manufacturedate_dt.facet.range.other">after</str>

   <!-- Highlighting defaults -->
   <str name="hl">on</str>
   <str name="hl.fl">content features title name</str>
   <str name="hl.encoder">html</str>
   <str name="hl.simple.pre">&lt;b&gt;</str>
   <str name="hl.simple.post">&lt;/b&gt;</str>
   <str name="f.title.hl.fragsize">0</str>
   <str name="f.title.hl.alternateField">title</str>
   <str name="f.name.hl.fragsize">0</str>
   <str name="f.name.hl.alternateField">name</str>
   <str name="f.content.hl.snippets">3</str>
   <str name="f.content.hl.fragsize">200</str>
   <str name="f.content.hl.alternateField">content</str>
   <str name="f.content.hl.maxAlternateFieldLength">750</str>

   <!-- Spell checking defaults -->
   <str name="spellcheck">on</str>
   <str name="spellcheck.extendedResults">false</str>
   <str name="spellcheck.count">5</str>
   <str name="spellcheck.alternativeTermCount">2</str>
   <str name="spellcheck.maxResultsForSuggest">5</str>
   <str name="spellcheck.collate">true</str>
   <str name="spellcheck.collateExtendedResults">true</str>
   <str name="spellcheck.maxCollationTries">5</str>
   <str name="spellcheck.maxCollations">3</str>
 </lst>

 <!-- append spellchecking to our list of components -->
 <arr name="last-components">
   <str>spellcheck</str>
 </arr>
</requestHandler>


Thanks,
Vivek


Re: Implementing Hive query in Solr

2014-06-15 Thread Vivekanand Ittigi
Hi Erick,

We are actually comparing search speed. We are trying to run these few Hive
queries in Solr; if we can implement them in Solr, we can definitely
migrate our system to Solr.

Can you please also look at this issue:
http://stackoverflow.com/questions/24202798/sum-and-groupby-in-solr

Here we are removing collect_set() concept.

Thanks,
Vivek


On Thu, Jun 12, 2014 at 7:57 PM, Erick Erickson erickerick...@gmail.com
wrote:

 Any time I see a question like this I break out in hives (little pun
 there).
 Solr is _not_ a replacement for Hive. Or any other SQL or SQL-like
 engine. Trying to make it into one is almost always a mistake. First I'd
 ask
 why you have to form this query.

 Now, while I have very little knowledge of Hive, collect_set removes
 duplicates. Why do you have duplicates in the first place?

 Best,
 Erick

 On Thu, Jun 12, 2014 at 7:12 AM, Vivekanand Ittigi
 vi...@biginfolabs.com wrote:
  Hi,
 
  Can anyone please look into this issue. I want to implement this query in
  solr.
 
  Thanks,
  Vivek
 
  -- Forwarded message --
  From: Vivekanand Ittigi vi...@biginfolabs.com
  Date: Thu, Jun 12, 2014 at 11:08 AM
  Subject: Implementing Hive query in Solr
  To: solr-user@lucene.apache.org solr-user@lucene.apache.org
 
 
  Hi,
 
  My requirements is to execute this query(hive) in solr:
 
  select SUM(Primary_cause_vaR),collect_set(skuType),RiskType,market,
  collect_set(primary_cause) from bil_tos Where skuType='Product' group by
  RiskType,market;
 
  I can implement sum and groupBy operations in solr using StatsComponent
  concept but i've no idea to implement collect_set() in solr.
 
  Collect_set() is used in Hive queries.
  Please provide me equivalent function for collect_set in solr or links or
  how to achieve it. It'd be a great help.
 
 
  Thanks,
  Vivek



SUM and groupBy in solr

2014-06-13 Thread Vivekanand Ittigi
Hi,

How to execute this query:

select SUM(Primary_cause_vaR),
RiskType,market from bil_tos Where skuType='Product' group by
RiskType,market;

I've used http://wiki.apache.org/solr/StatsComponent for this:

* I see only the sum for the respective group-by fields, but I also want the
RiskType and market field values in the result

Thanks,
Vivek
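
For readers unfamiliar with Hive, here is a small self-contained Java sketch
(plain JDK, hypothetical data; the Row fields mirror the bil_tos columns) of
the semantics the query above asks for: per (RiskType, market) group, a SUM of
one column plus collect_set(), i.e. a de-duplicated set of another column's
values:

```java
import java.util.*;
import java.util.stream.*;

public class CollectSetDemo {
    // One row of the hypothetical bil_tos table.
    record Row(String riskType, String market, String skuType, double primaryCauseVaR) {}

    static final List<Row> ROWS = List.of(
        new Row("credit", "US", "Product", 10.0),
        new Row("credit", "US", "Product", 5.0),
        new Row("market", "EU", "Product", 7.5),
        new Row("credit", "US", "Service", 99.0)); // excluded by WHERE skuType='Product'

    // SUM(Primary_cause_vaR) for one (RiskType, market) group, skuType='Product'.
    static double sumFor(String riskType, String market) {
        return ROWS.stream()
            .filter(r -> r.skuType().equals("Product"))
            .filter(r -> r.riskType().equals(riskType) && r.market().equals(market))
            .mapToDouble(Row::primaryCauseVaR)
            .sum();
    }

    // collect_set(skuType) for the same group: a de-duplicated set of values.
    static Set<String> skuSetFor(String riskType, String market) {
        return ROWS.stream()
            .filter(r -> r.skuType().equals("Product"))
            .filter(r -> r.riskType().equals(riskType) && r.market().equals(market))
            .map(Row::skuType)
            .collect(Collectors.toSet());
    }

    public static void main(String[] args) {
        System.out.println(sumFor("credit", "US"));    // 15.0
        System.out.println(skuSetFor("credit", "US")); // [Product]
    }
}
```

In Solr terms, the per-group SUM maps onto the StatsComponent with a
stats.facet on the group fields, while collect_set() roughly corresponds to
faceting on the field within each group; there is no single built-in Solr
function with identical semantics.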


Fwd: Implementing Hive query in Solr

2014-06-12 Thread Vivekanand Ittigi
Hi,

Can anyone please look into this issue. I want to implement this query in
solr.

Thanks,
Vivek

-- Forwarded message --
From: Vivekanand Ittigi vi...@biginfolabs.com
Date: Thu, Jun 12, 2014 at 11:08 AM
Subject: Implementing Hive query in Solr
To: solr-user@lucene.apache.org solr-user@lucene.apache.org


Hi,

My requirement is to execute this Hive query in Solr:

select SUM(Primary_cause_vaR),collect_set(skuType),RiskType,market,
collect_set(primary_cause) from bil_tos Where skuType='Product' group by
RiskType,market;

I can implement the SUM and group-by operations in Solr using the
StatsComponent, but I have no idea how to implement collect_set() in Solr.

collect_set() is a Hive function. Please point me to an equivalent function in
Solr, or to links describing how to achieve it. That would be a great help.


Thanks,
Vivek


Implementing Hive query in Solr

2014-06-11 Thread Vivekanand Ittigi
Hi,

My requirement is to execute this Hive query in Solr:

select SUM(Primary_cause_vaR),collect_set(skuType),RiskType,market,
collect_set(primary_cause) from bil_tos Where skuType='Product' group by
RiskType,market;

I can implement the SUM and group-by operations in Solr using the
StatsComponent, but I have no idea how to implement collect_set() in Solr.

collect_set() is a Hive function. Please point me to an equivalent function in
Solr, or to links describing how to achieve it. That would be a great help.


Thanks,
Vivek


Re: Integrate solr with openNLP

2014-06-09 Thread Vivekanand Ittigi
Hi Aman,

Yeah, we are thinking the same: using UIMA is better. And thanks to
everyone; you really showed us the way (UIMA).

We'll work on it.

Thanks,
Vivek


On Fri, Jun 6, 2014 at 5:54 PM, Aman Tandon amantandon...@gmail.com wrote:

 Hi Vikek,

 As everybody in the mail list mentioned to use UIMA you should go for it,
 as opennlp issues are not tracking properly, it can make stuck your
 development in near future if any issue comes, so its better to start
 investigate with uima.


 With Regards
 Aman Tandon


 On Fri, Jun 6, 2014 at 11:00 AM, Vivekanand Ittigi vi...@biginfolabs.com
 wrote:

  Can anyone pleas reply..?
 
  Thanks,
  Vivek
 
  -- Forwarded message --
  From: Vivekanand Ittigi vi...@biginfolabs.com
  Date: Wed, Jun 4, 2014 at 4:38 PM
  Subject: Re: Integrate solr with openNLP
  To: Tommaso Teofili tommaso.teof...@gmail.com
  Cc: solr-user@lucene.apache.org solr-user@lucene.apache.org, Ahmet
  Arslan iori...@yahoo.com
 
 
  Hi Tommaso,
 
  Yes, you are right. 4.4 version will work.. I'm able to compile now. I'm
  trying to apply named recognition(person name) token but im not seeing
 any
  change. my schema.xml looks like this:
 
  field name=text type=text_opennlp_pos_ner indexed=true
 stored=true
  multiValued=true/
 
  fieldType name=text_opennlp_pos_ner class=solr.TextField
  positionIncrementGap=100
analyzer
  tokenizer class=solr.OpenNLPTokenizerFactory
tokenizerModel=opennlp/en-token.bin
  /
  filter class=solr.OpenNLPFilterFactory
nerTaggerModels=opennlp/en-ner-person.bin
  /
  filter class=solr.LowerCaseFilterFactory/
/analyzer
 
  /fieldType
 
  Please guide..?
 
  Thanks,
  Vivek
 
 
  On Wed, Jun 4, 2014 at 1:27 PM, Tommaso Teofili 
 tommaso.teof...@gmail.com
  
  wrote:
 
   Hi all,
  
   Ahment was suggesting to eventually use UIMA integration because
 OpenNLP
   has already an integration with Apache UIMA and so you would just have
 to
   use that [1].
   And that's one of the main reason UIMA integration was done: it's a
   framework that you can easily hook into in order to plug your NLP
  algorithm.
  
   If you want to just use OpenNLP then it's up to you if either write
 your
   own UpdateRequestProcessor plugin [2] to add metadata extracted by
  OpenNLP
   to your documents or either you can write a dedicated analyzer /
  tokenizer
   / token filter.
  
   For the OpenNLP integration (LUCENE-2899), the patch is not up to date
   with the latest APIs in trunk, however you should be able to apply it
 to
   (if I recall correctly) to 4.4 version or so, and also adapting it to
 the
   latest API shouldn't be too hard.
  
   Regards,
   Tommaso
  
   [1] :
  
 
 http://opennlp.apache.org/documentation/1.5.3/manual/opennlp.html#org.apche.opennlp.uima
   [2] : http://wiki.apache.org/solr/UpdateRequestProcessor
  
  
  
   2014-06-03 15:34 GMT+02:00 Ahmet Arslan iori...@yahoo.com.invalid:
  
   Can you extract names, locations etc using OpenNLP in plain/straight
 java
   program?
  
   If yes, here are two seperate options :
  
   1) Use http://searchhub.org/2012/02/14/indexing-with-solrj/ as an
   example to integrate your NER code into it and write your own indexing
   code. You have the full power here. No solr-plugins are involved.
  
   2) Use 'Implementing a conditional copyField' given here :
   http://wiki.apache.org/solr/UpdateRequestProcessor
   as an example and integrate your NER code into it.
  
  
   Please note that these are separate ways to enrich your incoming
   documents, choose either (1) or (2).
  
  
  
   On Tuesday, June 3, 2014 3:30 PM, Vivekanand Ittigi 
   vi...@biginfolabs.com wrote:
   Okay, but i dint understand what you said. Can you please elaborate.
  
   Thanks,
   Vivek
  
  
  
  
  
   On Tue, Jun 3, 2014 at 5:36 PM, Ahmet Arslan iori...@yahoo.com
 wrote:
  
Hi Vivekanand,
   
I have never use UIMA+Solr before.
   
Personally I think it takes more time to learn how to configure/use
   these
uima stuff.
   
   
If you are familiar with java, write a class that extends
UpdateRequestProcessor(Factory). Use OpenNLP for NER, add these new
   fields
(organisation, city, person name, etc, to your document. This phase
 is
usually called 'enrichment'.
   
Does that makes sense?
   
   
   
On Tuesday, June 3, 2014 2:57 PM, Vivekanand Ittigi 
   vi...@biginfolabs.com
wrote:
Hi Ahmet,
   
I followed what you said
https://cwiki.apache.org/confluence/display/solr/UIMA+Integration.
  But
   how
can i achieve my goal? i mean extracting only name of the
 organization
   or
person from the content field.
   
I guess i'm almost there but something is missing? please guide me
   
Thanks,
Vivek
   
   
   
   
   
On Tue, Jun 3, 2014 at 2:50 PM, Vivekanand Ittigi 
   vi...@biginfolabs.com
wrote:
   
 Entire goal cant be said but one of those tasks can be like

Fwd: Integrate solr with openNLP

2014-06-05 Thread Vivekanand Ittigi
Can anyone please reply?

Thanks,
Vivek

-- Forwarded message --
From: Vivekanand Ittigi vi...@biginfolabs.com
Date: Wed, Jun 4, 2014 at 4:38 PM
Subject: Re: Integrate solr with openNLP
To: Tommaso Teofili tommaso.teof...@gmail.com
Cc: solr-user@lucene.apache.org solr-user@lucene.apache.org, Ahmet
Arslan iori...@yahoo.com


Hi Tommaso,

Yes, you are right: version 4.4 works, and I'm able to compile now. I'm
trying to apply named-entity recognition (person names), but I'm not seeing
any change. My schema.xml looks like this:

<field name="text" type="text_opennlp_pos_ner" indexed="true" stored="true"
       multiValued="true"/>

<fieldType name="text_opennlp_pos_ner" class="solr.TextField"
           positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.OpenNLPTokenizerFactory"
               tokenizerModel="opennlp/en-token.bin"/>
    <filter class="solr.OpenNLPFilterFactory"
            nerTaggerModels="opennlp/en-ner-person.bin"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

Please guide me.

Thanks,
Vivek


On Wed, Jun 4, 2014 at 1:27 PM, Tommaso Teofili tommaso.teof...@gmail.com
wrote:

 Hi all,

 Ahment was suggesting to eventually use UIMA integration because OpenNLP
 has already an integration with Apache UIMA and so you would just have to
 use that [1].
 And that's one of the main reason UIMA integration was done: it's a
 framework that you can easily hook into in order to plug your NLP algorithm.

 If you want to just use OpenNLP then it's up to you if either write your
 own UpdateRequestProcessor plugin [2] to add metadata extracted by OpenNLP
 to your documents or either you can write a dedicated analyzer / tokenizer
 / token filter.

 For the OpenNLP integration (LUCENE-2899), the patch is not up to date
 with the latest APIs in trunk, however you should be able to apply it to
 (if I recall correctly) to 4.4 version or so, and also adapting it to the
 latest API shouldn't be too hard.

 Regards,
 Tommaso

 [1] :
 http://opennlp.apache.org/documentation/1.5.3/manual/opennlp.html#org.apche.opennlp.uima
 [2] : http://wiki.apache.org/solr/UpdateRequestProcessor



 2014-06-03 15:34 GMT+02:00 Ahmet Arslan iori...@yahoo.com.invalid:

 Can you extract names, locations etc using OpenNLP in plain/straight java
 program?

 If yes, here are two seperate options :

 1) Use http://searchhub.org/2012/02/14/indexing-with-solrj/ as an
 example to integrate your NER code into it and write your own indexing
 code. You have the full power here. No solr-plugins are involved.

 2) Use 'Implementing a conditional copyField' given here :
 http://wiki.apache.org/solr/UpdateRequestProcessor
 as an example and integrate your NER code into it.


 Please note that these are separate ways to enrich your incoming
 documents, choose either (1) or (2).



 On Tuesday, June 3, 2014 3:30 PM, Vivekanand Ittigi 
 vi...@biginfolabs.com wrote:
 Okay, but i dint understand what you said. Can you please elaborate.

 Thanks,
 Vivek





 On Tue, Jun 3, 2014 at 5:36 PM, Ahmet Arslan iori...@yahoo.com wrote:

  Hi Vivekanand,
 
  I have never use UIMA+Solr before.
 
  Personally I think it takes more time to learn how to configure/use
 these
  uima stuff.
 
 
  If you are familiar with java, write a class that extends
  UpdateRequestProcessor(Factory). Use OpenNLP for NER, add these new
 fields
  (organisation, city, person name, etc, to your document. This phase is
  usually called 'enrichment'.
 
  Does that makes sense?
 
 
 
  On Tuesday, June 3, 2014 2:57 PM, Vivekanand Ittigi 
 vi...@biginfolabs.com
  wrote:
  Hi Ahmet,
 
  I followed what you said
  https://cwiki.apache.org/confluence/display/solr/UIMA+Integration. But
 how
  can i achieve my goal? i mean extracting only name of the organization
 or
  person from the content field.
 
  I guess i'm almost there but something is missing? please guide me
 
  Thanks,
  Vivek
 
 
 
 
 
  On Tue, Jun 3, 2014 at 2:50 PM, Vivekanand Ittigi 
 vi...@biginfolabs.com
  wrote:
 
   Entire goal cant be said but one of those tasks can be like this.. we
  have
   big document(can be website or pdf etc) indexed to the solr.
   Lets say field name=content will sore store the contents of
 document.
   All i want to do is pick name of persons,places from it using openNLP
 or
   some other means.
  
   Those names should be reflected in solr itself.
  
   Thanks,
   Vivek
  
  
   On Tue, Jun 3, 2014 at 1:33 PM, Ahmet Arslan iori...@yahoo.com
 wrote:
  
   Hi,
  
   Please tell us what you are trying to in a new treat. Your high level
   goal. There may be some other ways/tools such as (
   https://stanbol.apache.org ) other than OpenNLP.
  
  
  
   On Tuesday, June 3, 2014 8:31 AM, Vivekanand Ittigi 
   vi...@biginfolabs.com wrote:
  
  
  
   We'll surely look into UIMA integration.
  
   But before moving, is this( https://wiki.apache.org/solr/OpenNLP )
 the
   only link we've got to integrate?isn't there any other article or
 link
   which may help us to do

Re: Integrate solr with openNLP

2014-06-04 Thread Vivekanand Ittigi
Hi Tommaso,

Yes, you are right: version 4.4 works, and I'm able to compile now. I'm
trying to apply named-entity recognition (person names), but I'm not seeing
any change. My schema.xml looks like this:

<field name="text" type="text_opennlp_pos_ner" indexed="true" stored="true"
       multiValued="true"/>

<fieldType name="text_opennlp_pos_ner" class="solr.TextField"
           positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.OpenNLPTokenizerFactory"
               tokenizerModel="opennlp/en-token.bin"/>
    <filter class="solr.OpenNLPFilterFactory"
            nerTaggerModels="opennlp/en-ner-person.bin"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

Please guide me.

Thanks,
Vivek


On Wed, Jun 4, 2014 at 1:27 PM, Tommaso Teofili tommaso.teof...@gmail.com
wrote:

 Hi all,

 Ahment was suggesting to eventually use UIMA integration because OpenNLP
 has already an integration with Apache UIMA and so you would just have to
 use that [1].
 And that's one of the main reason UIMA integration was done: it's a
 framework that you can easily hook into in order to plug your NLP algorithm.

 If you want to just use OpenNLP then it's up to you if either write your
 own UpdateRequestProcessor plugin [2] to add metadata extracted by OpenNLP
 to your documents or either you can write a dedicated analyzer / tokenizer
 / token filter.

 For the OpenNLP integration (LUCENE-2899), the patch is not up to date
 with the latest APIs in trunk, however you should be able to apply it to
 (if I recall correctly) to 4.4 version or so, and also adapting it to the
 latest API shouldn't be too hard.

 Regards,
 Tommaso

 [1] :
 http://opennlp.apache.org/documentation/1.5.3/manual/opennlp.html#org.apche.opennlp.uima
 [2] : http://wiki.apache.org/solr/UpdateRequestProcessor



 2014-06-03 15:34 GMT+02:00 Ahmet Arslan iori...@yahoo.com.invalid:

 Can you extract names, locations etc using OpenNLP in plain/straight java
 program?

 If yes, here are two seperate options :

 1) Use http://searchhub.org/2012/02/14/indexing-with-solrj/ as an
 example to integrate your NER code into it and write your own indexing
 code. You have the full power here. No solr-plugins are involved.

 2) Use 'Implementing a conditional copyField' given here :
 http://wiki.apache.org/solr/UpdateRequestProcessor
 as an example and integrate your NER code into it.


 Please note that these are separate ways to enrich your incoming
 documents, choose either (1) or (2).



 On Tuesday, June 3, 2014 3:30 PM, Vivekanand Ittigi 
 vi...@biginfolabs.com wrote:
 Okay, but i dint understand what you said. Can you please elaborate.

 Thanks,
 Vivek





 On Tue, Jun 3, 2014 at 5:36 PM, Ahmet Arslan iori...@yahoo.com wrote:

  Hi Vivekanand,
 
  I have never use UIMA+Solr before.
 
  Personally I think it takes more time to learn how to configure/use
 these
  uima stuff.
 
 
  If you are familiar with java, write a class that extends
  UpdateRequestProcessor(Factory). Use OpenNLP for NER, add these new
 fields
  (organisation, city, person name, etc, to your document. This phase is
  usually called 'enrichment'.
 
  Does that makes sense?
 
 
 
  On Tuesday, June 3, 2014 2:57 PM, Vivekanand Ittigi 
 vi...@biginfolabs.com
  wrote:
  Hi Ahmet,
 
  I followed what you said
  https://cwiki.apache.org/confluence/display/solr/UIMA+Integration. But
 how
  can i achieve my goal? i mean extracting only name of the organization
 or
  person from the content field.
 
  I guess i'm almost there but something is missing? please guide me
 
  Thanks,
  Vivek
 
 
 
 
 
  On Tue, Jun 3, 2014 at 2:50 PM, Vivekanand Ittigi 
 vi...@biginfolabs.com
  wrote:
 
   Entire goal cant be said but one of those tasks can be like this.. we
  have
   big document(can be website or pdf etc) indexed to the solr.
   Lets say field name=content will sore store the contents of
 document.
   All i want to do is pick name of persons,places from it using openNLP
 or
   some other means.
  
   Those names should be reflected in solr itself.
  
   Thanks,
   Vivek
  
  
   On Tue, Jun 3, 2014 at 1:33 PM, Ahmet Arslan iori...@yahoo.com
 wrote:
  
   Hi,
  
   Please tell us what you are trying to in a new treat. Your high level
   goal. There may be some other ways/tools such as (
   https://stanbol.apache.org ) other than OpenNLP.
  
  
  
   On Tuesday, June 3, 2014 8:31 AM, Vivekanand Ittigi 
   vi...@biginfolabs.com wrote:
  
  
  
   We'll surely look into UIMA integration.
  
   But before moving, is this( https://wiki.apache.org/solr/OpenNLP )
 the
   only link we've got to integrate?isn't there any other article or
 link
   which may help us to do fix this problem.
  
   Thanks,
   Vivek
  
  
  
  
   On Tue, Jun 3, 2014 at 2:50 AM, Ahmet Arslan iori...@yahoo.com
 wrote:
  
   Hi,
   
   I believe I answered it. Let me re-try,
   
   There is no committed code for OpenNLP. There is an open ticket with
   patches. They may not work with current trunk.
   
   Confluence is the official

Re: Integrate solr with openNLP

2014-06-03 Thread Vivekanand Ittigi
I can't describe the entire goal, but one of the tasks is like this: we have
a big document (a website, PDF, etc.) indexed into Solr. Let's say the field
named "content" stores the contents of the document. All I want to do is
pick out the names of persons and places from it using OpenNLP or some
other means.

Those names should then be reflected in Solr itself.

Thanks,
Vivek


On Tue, Jun 3, 2014 at 1:33 PM, Ahmet Arslan iori...@yahoo.com wrote:

 Hi,

 Please tell us what you are trying to in a new treat. Your high level
 goal. There may be some other ways/tools such as (
 https://stanbol.apache.org ) other than OpenNLP.



 On Tuesday, June 3, 2014 8:31 AM, Vivekanand Ittigi vi...@biginfolabs.com
 wrote:



 We'll surely look into UIMA integration.

 But before moving, is this( https://wiki.apache.org/solr/OpenNLP ) the
 only link we've got to integrate?isn't there any other article or link
 which may help us to do fix this problem.

 Thanks,
 Vivek




 On Tue, Jun 3, 2014 at 2:50 AM, Ahmet Arslan iori...@yahoo.com wrote:

 Hi,
 
 I believe I answered it. Let me re-try,
 
 There is no committed code for OpenNLP. There is an open ticket with
 patches. They may not work with current trunk.
 
 Confluence is the official documentation. Wiki is maintained by
 community. Meaning wiki can talk about some uncommitted features/stuff.
 Like this one : https://wiki.apache.org/solr/OpenNLP
 
 What I am suggesting is, have a look at
 https://cwiki.apache.org/confluence/display/solr/UIMA+Integration
 
 
 And search how to use OpenNLP inside UIMA. May be LUCENE-2899 is already
 doable with solr-uima. I am adding Tommaso (sorry for this but we need an
 authoritative answer here) to clarify this.
 
 
 Also consider indexing with SolrJ and use OpenNLP enrichment outside the
 solr. Use openNLP with plain java, enrich your documents and index them
 with SolJ. You don't have to too everything inside solr as solr-plugins.
 
 Hope this helps,
 
 Ahmet
 
 
 
 On Monday, June 2, 2014 11:15 PM, Vivekanand Ittigi 
 vi...@biginfolabs.com wrote:
 Thanks, I will check with the jira.. but you dint answe my first
 question..? And there's no way to integrate solr with openNLP?or is there
 any committed code, using which i can go head.
 
 Thanks,
 Vivek
 
 
 
 
 
 On Mon, Jun 2, 2014 at 10:30 PM, Ahmet Arslan iori...@yahoo.com wrote:
 
  Hi,
 
  Here is the jira issue :
 https://issues.apache.org/jira/browse/LUCENE-2899
 
 
  Anyone can create an account.
 
  I didn't use UIMA by myself and I have little knowledge about it. But I
  believe it is possible to use OpenNLP inside UIMA.
  You need to dig into UIMA documentation.
 
  Solr UIMA integration already exists, thats why I questioned whether
 your
  requirement is possible with uima or not. I don't know the answer
 myself.
 
  Ahmet
 
 
 
  On Monday, June 2, 2014 7:42 PM, Vivekanand Ittigi 
 vi...@biginfolabs.com
  wrote:
  Hi Arslan,
 
  If not uncommitted code, then which code to be used to integrate?
 
  If i have to comment my problems, which jira and how to put it?
 
  And why you are suggesting UIMA integration. My requirements is
 integrating
  with openNLP.? You mean we can do all the acitivties through UIMA as we
 do
  it using openNLP..?like name,location finder etc?
 
  Thanks,
  Vivek
 
 
 
 
 
  On Mon, Jun 2, 2014 at 8:40 PM, Ahmet Arslan iori...@yahoo.com.invalid
 
  wrote:
 
   Hi,
  
   Uncommitted code could have these kind of problems. It is not
 guaranteed
   to work with latest trunk.
  
   You could commend the problem you face on the jira ticket.
  
   By the way, may be you are after something doable with already
 committed
   UIMA stuff?
  
   https://cwiki.apache.org/confluence/display/solr/UIMA+Integration
  
   Ahmet
  
  
  
   On Monday, June 2, 2014 5:07 PM, Vivekanand Ittigi 
  vi...@biginfolabs.com
   wrote:
   I followed this link to integrate
 https://wiki.apache.org/solr/OpenNLP
  to
   integrate
  
   Installation
  
   For English language testing: Until LUCENE-2899 is committed:
  
   1.pull the latest trunk or 4.0 branch
  
   2.apply the latest LUCENE-2899 patch
   3.do 'ant compile'
   cd solr/contrib/opennlp/src/test-files/training
   .
   .
   .
   i followed first two steps but got the following error while executing
  3rd
   point
  
   common.compile-core:
   [javac] Compiling 10 source files to
  
  
 
 /home/biginfolabs/solrtest/solr-lucene-trunk3/lucene/build/analysis/opennlp/classes/java
  
   [javac] warning: [path] bad path element
  
  
 
 /home/biginfolabs/solrtest/solr-lucene-trunk3/lucene/analysis/opennlp/lib/jwnl-1.3.3.jar:
   no such file or directory
  
   [javac]
  
  
 
 /home/biginfolabs/solrtest/solr-lucene-trunk3/lucene/analysis/opennlp/src/java/org/apache/lucene/analysis/opennlp/FilterPayloadsFilter.java:43:
   error: cannot find symbol
  
   [javac] super(Version.LUCENE_44, input);
  
   [javac]  ^
   [javac]   symbol:   variable LUCENE_44
   [javac

Re: Integrate solr with openNLP

2014-06-03 Thread Vivekanand Ittigi
Hi Ahmet,

I followed what you said
(https://cwiki.apache.org/confluence/display/solr/UIMA+Integration). But how
can I achieve my goal, i.e. extracting only the names of organizations or
persons from the content field?

I guess I'm almost there but something is missing. Please guide me.

Thanks,
Vivek


On Tue, Jun 3, 2014 at 2:50 PM, Vivekanand Ittigi vi...@biginfolabs.com
wrote:

 Entire goal cant be said but one of those tasks can be like this.. we have
 big document(can be website or pdf etc) indexed to the solr.
 Lets say field name=content will sore store the contents of document.
 All i want to do is pick name of persons,places from it using openNLP or
 some other means.

 Those names should be reflected in solr itself.

 Thanks,
 Vivek


 On Tue, Jun 3, 2014 at 1:33 PM, Ahmet Arslan iori...@yahoo.com wrote:

 Hi,

 Please tell us what you are trying to in a new treat. Your high level
 goal. There may be some other ways/tools such as (
 https://stanbol.apache.org ) other than OpenNLP.



 On Tuesday, June 3, 2014 8:31 AM, Vivekanand Ittigi 
 vi...@biginfolabs.com wrote:



 We'll surely look into UIMA integration.

 But before moving, is this( https://wiki.apache.org/solr/OpenNLP ) the
 only link we've got to integrate?isn't there any other article or link
 which may help us to do fix this problem.

 Thanks,
 Vivek




 On Tue, Jun 3, 2014 at 2:50 AM, Ahmet Arslan iori...@yahoo.com wrote:

 Hi,
 
 I believe I answered it. Let me re-try,
 
 There is no committed code for OpenNLP. There is an open ticket with
 patches. They may not work with current trunk.
 
 Confluence is the official documentation. Wiki is maintained by
 community. Meaning wiki can talk about some uncommitted features/stuff.
 Like this one : https://wiki.apache.org/solr/OpenNLP
 
 What I am suggesting is, have a look at
 https://cwiki.apache.org/confluence/display/solr/UIMA+Integration
 
 
 And search how to use OpenNLP inside UIMA. May be LUCENE-2899 is already
 doable with solr-uima. I am adding Tommaso (sorry for this but we need an
 authoritative answer here) to clarify this.
 
 
 Also consider indexing with SolrJ and use OpenNLP enrichment outside the
 solr. Use openNLP with plain java, enrich your documents and index them
 with SolJ. You don't have to too everything inside solr as solr-plugins.
 
 Hope this helps,
 
 Ahmet
 
 
 
 On Monday, June 2, 2014 11:15 PM, Vivekanand Ittigi 
 vi...@biginfolabs.com wrote:
 Thanks, I will check with the jira.. but you dint answe my first
 question..? And there's no way to integrate solr with openNLP?or is there
 any committed code, using which i can go head.
 
 Thanks,
 Vivek
 
 
 
 
 
 On Mon, Jun 2, 2014 at 10:30 PM, Ahmet Arslan iori...@yahoo.com wrote:
 
  Hi,
 
  Here is the jira issue :
 https://issues.apache.org/jira/browse/LUCENE-2899
 
 
  Anyone can create an account.
 
  I didn't use UIMA by myself and I have little knowledge about it. But I
  believe it is possible to use OpenNLP inside UIMA.
  You need to dig into UIMA documentation.
 
  Solr UIMA integration already exists, thats why I questioned whether
 your
  requirement is possible with uima or not. I don't know the answer
 myself.
 
  Ahmet
 
 
 
  On Monday, June 2, 2014 7:42 PM, Vivekanand Ittigi 
 vi...@biginfolabs.com
  wrote:
  Hi Arslan,
 
  If not uncommitted code, then which code to be used to integrate?
 
  If i have to comment my problems, which jira and how to put it?
 
  And why you are suggesting UIMA integration. My requirements is
 integrating
  with openNLP.? You mean we can do all the acitivties through UIMA as
 we do
  it using openNLP..?like name,location finder etc?
 
  Thanks,
  Vivek
 
 
 
 
 
  On Mon, Jun 2, 2014 at 8:40 PM, Ahmet Arslan iori...@yahoo.com.invalid
 
  wrote:
 
   Hi,
  
   Uncommitted code could have these kind of problems. It is not
 guaranteed
   to work with latest trunk.
  
   You could commend the problem you face on the jira ticket.
  
   By the way, may be you are after something doable with already
 committed
   UIMA stuff?
  
   https://cwiki.apache.org/confluence/display/solr/UIMA+Integration
  
   Ahmet
  
  
  
   On Monday, June 2, 2014 5:07 PM, Vivekanand Ittigi 
  vi...@biginfolabs.com
   wrote:
   I followed this link to integrate
 https://wiki.apache.org/solr/OpenNLP
  to
   integrate
  
   Installation
  
   For English language testing: Until LUCENE-2899 is committed:
  
   1.pull the latest trunk or 4.0 branch
  
   2.apply the latest LUCENE-2899 patch
   3.do 'ant compile'
   cd solr/contrib/opennlp/src/test-files/training
   .
   .
   .
   i followed first two steps but got the following error while
 executing
  3rd
   point
  
   common.compile-core:
   [javac] Compiling 10 source files to
  
  
 
 /home/biginfolabs/solrtest/solr-lucene-trunk3/lucene/build/analysis/opennlp/classes/java
  
   [javac] warning: [path] bad path element
  
  
 
 /home/biginfolabs/solrtest/solr-lucene-trunk3/lucene/analysis/opennlp/lib/jwnl-1.3.3
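
Ahmet's alternative quoted in this thread — run OpenNLP in plain Java, enrich
the documents, and index them with SolrJ, avoiding Solr plugins entirely —
might, in outline, look like the following sketch. This is a hypothetical
illustration, not committed code: the Solr URL, the field names (`person_s`),
and the model path are assumptions, and it needs the solrj and opennlp-tools
jars plus the en-ner-person.bin model on disk.

```java
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;

import opennlp.tools.namefind.NameFinderME;
import opennlp.tools.namefind.TokenNameFinderModel;
import opennlp.tools.tokenize.SimpleTokenizer;
import opennlp.tools.util.Span;

import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class EnrichAndIndex {
    public static void main(String[] args) throws Exception {
        String content = "Barack Obama visited Paris last week.";

        // 1. Extract person names with OpenNLP (model path is an assumption).
        try (InputStream in = Files.newInputStream(Paths.get("opennlp/en-ner-person.bin"))) {
            NameFinderME finder = new NameFinderME(new TokenNameFinderModel(in));
            String[] tokens = SimpleTokenizer.INSTANCE.tokenize(content);

            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", "doc-1");
            doc.addField("content", content);
            for (Span s : finder.find(tokens)) {
                // Join the tokens covered by the NER span into one name value.
                doc.addField("person_s", String.join(" ",
                        Arrays.copyOfRange(tokens, s.getStart(), s.getEnd())));
            }

            // 2. Index the enriched document with SolrJ (URL is an assumption).
            HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr/collection1");
            solr.add(doc);
            solr.commit();
        }
    }
}
```

The enrichment ("extract names, add them as new fields, then index") thus
happens entirely client-side, which is exactly the trade-off discussed above.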

Re: Integrate solr with openNLP

2014-06-03 Thread Vivekanand Ittigi
Okay, but I didn't understand what you said. Can you please elaborate?

Thanks,
Vivek


On Tue, Jun 3, 2014 at 5:36 PM, Ahmet Arslan iori...@yahoo.com wrote:

 Hi Vivekanand,

 I have never use UIMA+Solr before.

 Personally I think it takes more time to learn how to configure/use these
 uima stuff.


 If you are familiar with java, write a class that extends
 UpdateRequestProcessor(Factory). Use OpenNLP for NER, add these new fields
 (organisation, city, person name, etc, to your document. This phase is
 usually called 'enrichment'.

 Does that makes sense?



 On Tuesday, June 3, 2014 2:57 PM, Vivekanand Ittigi vi...@biginfolabs.com
 wrote:
 Hi Ahmet,

 I followed what you said
 https://cwiki.apache.org/confluence/display/solr/UIMA+Integration. But how
 can i achieve my goal? i mean extracting only name of the organization or
 person from the content field.

 I guess i'm almost there but something is missing? please guide me

 Thanks,
 Vivek





 On Tue, Jun 3, 2014 at 2:50 PM, Vivekanand Ittigi vi...@biginfolabs.com
 wrote:

  Entire goal cant be said but one of those tasks can be like this.. we
 have
  big document(can be website or pdf etc) indexed to the solr.
  Lets say field name=content will sore store the contents of document.
  All i want to do is pick name of persons,places from it using openNLP or
  some other means.
 
  Those names should be reflected in solr itself.
 
  Thanks,
  Vivek
 
 
  On Tue, Jun 3, 2014 at 1:33 PM, Ahmet Arslan iori...@yahoo.com wrote:
 
  Hi,
 
  Please tell us what you are trying to in a new treat. Your high level
  goal. There may be some other ways/tools such as (
  https://stanbol.apache.org ) other than OpenNLP.
 
 
 
  On Tuesday, June 3, 2014 8:31 AM, Vivekanand Ittigi 
  vi...@biginfolabs.com wrote:
 
 
 
  We'll surely look into UIMA integration.
 
  But before moving, is this( https://wiki.apache.org/solr/OpenNLP ) the
  only link we've got to integrate?isn't there any other article or link
  which may help us to do fix this problem.
 
  Thanks,
  Vivek
 
 
 
 
  On Tue, Jun 3, 2014 at 2:50 AM, Ahmet Arslan iori...@yahoo.com wrote:
 
  Hi,
  
  I believe I answered it. Let me re-try,
  
  There is no committed code for OpenNLP. There is an open ticket with
  patches. They may not work with current trunk.
  
  Confluence is the official documentation. Wiki is maintained by
  community. Meaning wiki can talk about some uncommitted features/stuff.
  Like this one : https://wiki.apache.org/solr/OpenNLP
  
  What I am suggesting is, have a look at
  https://cwiki.apache.org/confluence/display/solr/UIMA+Integration
  
  
  And search how to use OpenNLP inside UIMA. May be LUCENE-2899 is
 already
  doable with solr-uima. I am adding Tommaso (sorry for this but we need
 an
  authoritative answer here) to clarify this.
  
  
  Also consider indexing with SolrJ and use OpenNLP enrichment outside
 the
  solr. Use openNLP with plain java, enrich your documents and index them
  with SolJ. You don't have to too everything inside solr as solr-plugins.
  
  Hope this helps,
  
  Ahmet
  
  
  
  On Monday, June 2, 2014 11:15 PM, Vivekanand Ittigi 
  vi...@biginfolabs.com wrote:
  Thanks, I will check with the jira.. but you dint answe my first
  question..? And there's no way to integrate solr with openNLP?or is
 there
  any committed code, using which i can go head.
  
  Thanks,
  Vivek
  
  
  
  
  
  On Mon, Jun 2, 2014 at 10:30 PM, Ahmet Arslan iori...@yahoo.com
 wrote:
  
   Hi,
  
   Here is the jira issue :
  https://issues.apache.org/jira/browse/LUCENE-2899
  
  
   Anyone can create an account.
  
   I didn't use UIMA by myself and I have little knowledge about it.
 But I
   believe it is possible to use OpenNLP inside UIMA.
   You need to dig into UIMA documentation.
  
   Solr UIMA integration already exists, thats why I questioned whether
  your
   requirement is possible with uima or not. I don't know the answer
  myself.
  
   Ahmet
  
  
  
   On Monday, June 2, 2014 7:42 PM, Vivekanand Ittigi 
  vi...@biginfolabs.com
   wrote:
   Hi Arslan,
  
   If not uncommitted code, then which code to be used to integrate?
  
   If i have to comment my problems, which jira and how to put it?
  
   And why you are suggesting UIMA integration. My requirements is
  integrating
   with openNLP.? You mean we can do all the acitivties through UIMA as
  we do
   it using openNLP..?like name,location finder etc?
  
   Thanks,
   Vivek
  
  
  
  
  
   On Mon, Jun 2, 2014 at 8:40 PM, Ahmet Arslan
 iori...@yahoo.com.invalid
  
   wrote:
  
Hi,
   
    Uncommitted code could have these kinds of problems. It is not
    guaranteed to work with the latest trunk.
   
    You could comment on the problem you face on the JIRA ticket.
   
    By the way, maybe you are after something doable with the already
    committed UIMA stuff?
   
https://cwiki.apache.org/confluence/display/solr/UIMA+Integration
   
Ahmet
   
   
   
On Monday, June 2, 2014 5:07 PM, Vivekanand Ittigi

Unable to use OpenCalais Annotator in UIMA+solr

2014-06-03 Thread Vivekanand Ittigi
I followed this link,
https://cwiki.apache.org/confluence/display/solr/UIMA+Integration, to
integrate Solr and UIMA.

I succeeded in integrating. SentenceAnnotation is working fine, but I want to
use the OpenCalais annotator so that I can fetch person, place, and
organization names. Nowhere is it mentioned which annotation type to use, the
way org.apache.uima.SentenceAnnotation is used for producing sentences.

Please guide me on which annotation to use.

Thanks,
Vivek
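
For entity annotations the pattern is the same as the SentenceAnnotation case: one `fieldMappings` type entry per annotation type in the `uimaConfig` of solrconfig.xml. A sketch under stated assumptions: the `org.apache.uima.calais.Person` type name and the `oc_licenseID` parameter name are my guesses at the OpenCalais annotator's conventions; verify both against the annotator's type system descriptor before use.

```xml
<updateRequestProcessorChain name="uima">
  <processor class="org.apache.solr.uima.processor.UIMAUpdateRequestProcessorFactory">
    <lst name="uimaConfig">
      <lst name="runtimeParameters">
        <!-- assumption: parameter name for the OpenCalais API key -->
        <str name="oc_licenseID">VALID_OPENCALAIS_KEY</str>
      </lst>
      <str name="analysisEngine">/org/apache/uima/desc/OverridingParamsExtServicesAE.xml</str>
      <bool name="ignoreErrors">true</bool>
      <lst name="analyzeFields">
        <bool name="merge">false</bool>
        <arr name="fields"><str>text</str></arr>
      </lst>
      <lst name="fieldMappings">
        <!-- assumption: the OpenCalais annotator emits this type; check its
             type system descriptor for the real name -->
        <lst name="type">
          <str name="name">org.apache.uima.calais.Person</str>
          <lst name="mapping">
            <str name="feature">coveredText</str>
            <str name="field">person</str>
          </lst>
        </lst>
      </lst>
    </lst>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
```

Add further `type` entries for place and organization annotations the same way.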


Integrate solr with openNLP

2014-06-02 Thread Vivekanand Ittigi
I followed this link to integrate: https://wiki.apache.org/solr/OpenNLP

Installation

For English language testing, until LUCENE-2899 is committed:

1. pull the latest trunk or 4.0 branch
2. apply the latest LUCENE-2899 patch
3. do 'ant compile'
cd solr/contrib/opennlp/src/test-files/training
.
.
.
I followed the first two steps but got the following error while executing the
3rd step:

common.compile-core:
[javac] Compiling 10 source files to
/home/biginfolabs/solrtest/solr-lucene-trunk3/lucene/build/analysis/opennlp/classes/java

[javac] warning: [path] bad path element
/home/biginfolabs/solrtest/solr-lucene-trunk3/lucene/analysis/opennlp/lib/jwnl-1.3.3.jar:
no such file or directory

[javac]
/home/biginfolabs/solrtest/solr-lucene-trunk3/lucene/analysis/opennlp/src/java/org/apache/lucene/analysis/opennlp/FilterPayloadsFilter.java:43:
error: cannot find symbol

[javac] super(Version.LUCENE_44, input);

[javac]  ^
[javac]   symbol:   variable LUCENE_44
[javac]   location: class Version
[javac]
/home/biginfolabs/solrtest/solr-lucene-trunk3/lucene/analysis/opennlp/src/java/org/apache/lucene/analysis/opennlp/OpenNLPTokenizer.java:56:
error: no suitable constructor found for Tokenizer(Reader)
[javac] super(input);
[javac] ^
[javac] constructor Tokenizer.Tokenizer(AttributeFactory) is not
applicable
[javac]   (actual argument Reader cannot be converted to
AttributeFactory by method invocation conversion)
[javac] constructor Tokenizer.Tokenizer() is not applicable
[javac]   (actual and formal argument lists differ in length)
[javac] 2 errors
[javac] 1 warning

I'm really stuck on how to get past this step. I wasted my entire day trying
to fix this but couldn't make any progress. Can someone please help me?

Thanks,
Vivek
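
Both compile errors come from API drift on trunk: `Version.LUCENE_44` was removed, and `Tokenizer` no longer has a `Reader`-taking constructor (the `Reader` is injected after construction via `setReader`), which is why the LUCENE-2899 patch only compiles against the 4.4-era branch. A minimal stand-in showing the shape involved (assumption: these are not the real Lucene classes, just a mirror of the constructor change):

```java
import java.io.Reader;
import java.io.StringReader;

// Stand-in mirroring the trunk Tokenizer shape: no Tokenizer(Reader) and no
// Tokenizer(Version, Reader) constructors, so `super(Version.LUCENE_44, input)`
// and `super(input)` in the patch cannot compile against it.
abstract class TrunkTokenizer {
    protected Reader input;                  // set later, not in the constructor
    protected TrunkTokenizer() {}            // only no-arg (or factory) ctors remain
    public final void setReader(Reader r) {  // the Reader is injected after construction
        this.input = r;
    }
}

// How a patched tokenizer would have to be declared against the trunk shape:
class MyOpenNLPTokenizer extends TrunkTokenizer {
    MyOpenNLPTokenizer() {
        super();                             // no Reader argument here any more
    }
    boolean hasInput() { return input != null; }
}

public class Main {
    public static void main(String[] args) {
        MyOpenNLPTokenizer t = new MyOpenNLPTokenizer();
        t.setReader(new StringReader("some text"));
        System.out.println(t.hasInput());    // true once a reader is set
    }
}
```

This matches Tommaso's answer in the thread: checking out the 4.4 branch (where the old constructors still exist) lets the patch compile, while trunk requires reworking the patch to the new constructor pattern.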


Re: Integrate solr with openNLP

2014-06-02 Thread Vivekanand Ittigi
Hi Arslan,

If not uncommitted code, then which code should be used to integrate?

If I have to comment on my problems, which JIRA should I use, and how do I
post to it?

And why are you suggesting the UIMA integration? My requirement is integrating
with OpenNLP. Do you mean we can do through UIMA all the activities we do
using OpenNLP, like name and location finding, etc.?

Thanks,
Vivek


On Mon, Jun 2, 2014 at 8:40 PM, Ahmet Arslan iori...@yahoo.com.invalid
wrote:

 Hi,

 Uncommitted code could have these kinds of problems. It is not guaranteed
 to work with the latest trunk.

 You could comment on the problem you face on the JIRA ticket.

 By the way, maybe you are after something doable with the already committed
 UIMA stuff?

 https://cwiki.apache.org/confluence/display/solr/UIMA+Integration

 Ahmet







Re: Integrate solr with openNLP

2014-06-02 Thread Vivekanand Ittigi
Thanks, I will check the JIRA, but you didn't answer my first question:
is there really no way to integrate Solr with OpenNLP? Or is there any
committed code I can go ahead with?

Thanks,
Vivek


On Mon, Jun 2, 2014 at 10:30 PM, Ahmet Arslan iori...@yahoo.com wrote:

 Hi,

 Here is the jira issue : https://issues.apache.org/jira/browse/LUCENE-2899


 Anyone can create an account.

 I haven't used UIMA myself and have little knowledge of it, but I
 believe it is possible to use OpenNLP inside UIMA.
 You need to dig into the UIMA documentation.

 The Solr UIMA integration already exists; that's why I questioned whether your
 requirement is possible with UIMA or not. I don't know the answer myself.

 Ahmet






Re: Integrate solr with openNLP

2014-06-02 Thread Vivekanand Ittigi
We'll surely look into the UIMA integration.

But before moving on, is this ( https://wiki.apache.org/solr/OpenNLP ) the only
link we've got for the integration? Isn't there any other article or link that
might help us fix this problem?

Thanks,
Vivek


On Tue, Jun 3, 2014 at 2:50 AM, Ahmet Arslan iori...@yahoo.com wrote:

 Hi,

 I believe I answered it. Let me retry:

 There is no committed code for OpenNLP. There is an open ticket with
 patches. They may not work with current trunk.

 Confluence is the official documentation; the wiki is maintained by the
 community, meaning the wiki can describe uncommitted features, like this one:
 https://wiki.apache.org/solr/OpenNLP

 What I am suggesting is, have a look at
 https://cwiki.apache.org/confluence/display/solr/UIMA+Integration


 And search how to use OpenNLP inside UIMA. Maybe LUCENE-2899 is already
 doable with solr-uima. I am adding Tommaso (sorry for this, but we need an
 authoritative answer here) to clarify this.


 Also consider indexing with SolrJ and doing the OpenNLP enrichment outside
 Solr: use OpenNLP from plain Java, enrich your documents, and index them
 with SolrJ. You don't have to do everything inside Solr as Solr plugins.

 Hope this helps,

 Ahmet

