dynamic field sorting

2017-03-20 Thread Midas A
Hi ,

How can I improve the performance of sorting on dynamic fields?

The index size is 20 GB.

Regards,
Midas


Re: Storing index of different collections in different location

2017-03-20 Thread Zheng Lin Edwin Yeo
Hi Mikhail,

Thanks for the information.
I'll try it out.

Regards,
Edwin


On 20 March 2017 at 02:50, Mikhail Khludnev  wrote:

> Hello, Edwin.
>
> https://cwiki.apache.org/confluence/display/solr/Collections+API#
> CollectionsAPI-CREATESHARD:CreateaShard
> mentions the optional parameter property.*name*=*value* (string): "Set core
> property *name* to *value*. See the section Defining core.properties for
> details on supported properties and values."
> https://cwiki.apache.org/confluence/display/solr/Defining+core.properties
> mentions
>
> dataDir
>
> The core's data directory (where indexes are stored) as either an absolute
> pathname, or a path relative to the value of instanceDir.  This is data by
> default.
> Probably you can adjust index location if you create shards manually.
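>
> A rough sketch of what that might look like (hypothetical collection, shard
> name and path; note that CREATESHARD only applies to collections using the
> implicit router):
>
> http://localhost:8983/solr/admin/collections?action=CREATESHARD
>   &collection=mycollection&shard=shardB
>   &property.dataDir=D:/solrdata/mycollection_shardB
>
> The property.dataDir value should end up in the new core's core.properties,
> so that shard's index gets written under the given path.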
>
> On Sun, Mar 19, 2017 at 5:46 PM, Zheng Lin Edwin Yeo wrote:
>
> > Hi,
> >
> > Is it possible to store the index of different collections of the same
> > shard under different directory or even different hard disk?
> >
> > For example, I want to store the index of collection1 in C drive, and the
> > index of collection2 in D drive.
> >
> > I'm using SolrCloud in Solr 6.4.2
> >
> > Regards,
> > Edwin
> >
>
>
>
> --
> Sincerely yours
> Mikhail Khludnev
>


Re: SOLR Data Locality

2017-03-20 Thread Shawn Heisey
On 3/17/2017 11:14 AM, Imad Qureshi wrote:
> I understand that but unfortunately that's not an option right now. We 
> already have 16 TB of index in HDFS. 
>
> So let me rephrase this question. How important is data locality for SOLR. Is 
> performance impacted if SOLR data is on a remote node?

What's going to matter is how fast the data can be retrieved.  With
standard local filesystems, the operating system will use unallocated
memory to cache the data, so if you have enough available memory for
that caching to be effective, access is lightning fast -- the most
requested index data will be in memory, and pulled directly from there
into the application.  If the disk has to be read to obtain the needed
data, it will be slow.  If data has to be transferred over a network
that's gigabit or slower, that is also slow.  Faster network
technologies are available for a price premium, but if a disk has to be
read to get the data, the network speed won't matter.  Good performance
means avoiding going to the disk or transferring over the network.

SSD storage is faster than regular disks, but still not as fast as main
memory, and increased storage speed probably won't matter if the network
can't keep up.

If I'm not mistaken, I think an HDFS client can allocate system memory
for caching purposes to avoid the slow transfer for frequently requested
data.  If my understanding is correct, then enough memory allocated to
the HDFS client MIGHT avoid network/disk transfer for the important data
in the index ... but whether this works in practice is a question I
cannot answer.
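
The knob in question, for reference, is the block cache in Solr's
HdfsDirectoryFactory.  A minimal solrconfig.xml sketch (the sizing numbers are
placeholders, not recommendations; how much off-heap memory to give it depends
entirely on your hardware):

<directoryFactory name="DirectoryFactory" class="solr.HdfsDirectoryFactory">
  <str name="solr.hdfs.home">hdfs://namenode:8020/solr</str>
  <bool name="solr.hdfs.blockcache.enabled">true</bool>
  <bool name="solr.hdfs.blockcache.direct.memory.allocation">true</bool>
  <!-- roughly slab.count * blocksperbank * 8KB of off-heap memory -->
  <int name="solr.hdfs.blockcache.slab.count">4</int>
  <int name="solr.hdfs.blockcache.blocksperbank">16384</int>
  <bool name="solr.hdfs.blockcache.read.enabled">true</bool>
</directoryFactory>

With direct memory allocation enabled, the JVM also needs a large enough
MaxDirectMemorySize to back the cache.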

Unless your 16TB of index data is being utilized by MANY Solr servers
that each use a very small part of the data and have the ability to
cache a significant percentage of the data they're using, it's highly
unlikely that you're going to have enough memory for good caching. 
Indexes that large are typically slow unless you can afford a LOT of
hardware, which means a lot of money.

Thanks,
Shawn



Re: model building

2017-03-20 Thread Joel Bernstein
I've only tested with the training data in its own collection, but it was
designed for multiple training sets in the same collection.

I suspect your training set is too small to get a reliable model from.
The training sets we tested with were considerably larger.

All the idfs_ds values being the same seems odd though. The idfs_ds in
particular were designed to be accurate when there are multiple training
sets in the same collection.
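
A quick way to eyeball what actually got stored (a sketch; adjust the host and
port, and the field names are the ones already mentioned above) is to pull a
few model documents straight out of the models collection:

http://localhost:8983/solr/models2/select?q=*:*&fl=terms_ss,idfs_ds,weights_ds&rows=5

If every idfs_ds entry is identical there as well, the problem happened at
training time rather than when the model is read back.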

Joel Bernstein
http://joelsolr.blogspot.com/

On Mon, Mar 20, 2017 at 5:41 PM, Joe Obernberger <
joseph.obernber...@gmail.com> wrote:

> If I put the training data into its own collection and use q="*:*", then
> it works correctly.  Is that a requirement?
> Thank you.
>
> -Joe
>
>
>
> On 3/20/2017 3:47 PM, Joe Obernberger wrote:
>
>> I'm trying to build a model using tweets.  I've manually tagged 30 tweets
>> as threatening, and 50 random tweets as non-threatening.  When I build the
>> model with:
>>
>> update(models2, batchSize="50",
>>  train(UNCLASS,
>>   features(UNCLASS,
>>  q="ProfileID:PROFCLUST1",
>>  featureSet="threatFeatures3",
>>  field="ClusterText",
>>  outcome="out_i",
>>  positiveLabel=1,
>>  numTerms=250),
>>   q="ProfileID:PROFCLUST1",
>>   name="threatModel3",
>>   field="ClusterText",
>>   outcome="out_i",
>>   maxIterations="100"))
>>
>> It appears to work, but all the idfs_ds values are identical. The
>> terms_ss values look reasonable, but nearly all the weights_ds are 1.0.
>> For out_i it is either -1 for non-threatening tweets, and +1 for
>> threatening tweets.  I'm trying to follow along with Joel Bernstein's
>> excellent post here:
>> http://joelsolr.blogspot.com/2017/01/deploying-ai-alerting-s
>> ystem-with-solrs.html
>>
>> Tips?
>>
>> Thank you!
>>
>> -Joe
>>
>>
>


Re: model building

2017-03-20 Thread Joe Obernberger
If I put the training data into its own collection and use q="*:*", then 
it works correctly.  Is that a requirement?

Thank you.

-Joe


On 3/20/2017 3:47 PM, Joe Obernberger wrote:
I'm trying to build a model using tweets.  I've manually tagged 30 
tweets as threatening, and 50 random tweets as non-threatening.  When 
I build the model with:


update(models2, batchSize="50",
 train(UNCLASS,
  features(UNCLASS,
 q="ProfileID:PROFCLUST1",
 featureSet="threatFeatures3",
 field="ClusterText",
 outcome="out_i",
 positiveLabel=1,
 numTerms=250),
  q="ProfileID:PROFCLUST1",
  name="threatModel3",
  field="ClusterText",
  outcome="out_i",
  maxIterations="100"))

It appears to work, but all the idfs_ds values are identical. The 
terms_ss values look reasonable, but nearly all the weights_ds are 
1.0.  For out_i it is either -1 for non-threatening tweets, and +1 for 
threatening tweets.  I'm trying to follow along with Joel Bernstein's 
excellent post here:
http://joelsolr.blogspot.com/2017/01/deploying-ai-alerting-system-with-solrs.html 



Tips?

Thank you!

-Joe





Re: ChildDocTransformerFactory and returning only parents with children

2017-03-20 Thread David Kramer
I’ll be honest, I didn’t understand most of what you wrote (like I said, we’re 
just getting started with this).  We will most certainly need to do faceted 
search in future iterations so thanks for the “json.facets” reference.  And I 
do understand that the ChildDocTransformer is really for controlling what gets 
output and not for finding or filtering rows.

Your answer started me thinking about solving different parts of the problem in 
different parts of the query.  I got something that works now:
   q=title:"Under Armour" OR description:"Under Armour"
fq={!parent which=docType:Product}color:*Blue*
   fl=title, description, brand,id,[child parentFilter="docType:Product" 
childFilter="color:*Blue*"]  
This does show me only Under Armour products with blue items, and returns just 
the blue items nested inside the products.  That will work. There may be a more 
efficient/direct way of doing it, but at least we can move forward.  Is this a 
good approach?
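
If you ever want that as a single parameter instead of q plus fq, a nested-query
sketch along the same lines (same fields as above, not tested here) would be:

   q=+(title:"Under Armour" OR description:"Under Armour")
     +_query_:"{!parent which=docType:Product}color:*Blue*"
   fl=title,description,brand,id,[child parentFilter="docType:Product" childFilter="color:*Blue*"]

That said, keeping the block-join clause in fq as you have it lets Solr cache
that filter separately, so your version is arguably the better one.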

With respect to multiple levels, it’s not a matter of trying to query more than 
two nested documents deep, it’s that I haven’t seen a single example of how to 
query more than two levels.  The documentation and every example I found for 
ChildDocTransformer and Block Join just show parents and children.  A few hours 
ago Mikhail graciously sent me a link off-list to an article that basically says 
grandchildren are children too, so you can search/filter on them as if they were 
children, and I understood most of it. Will have to dig into it more.

Thanks!

On 3/20/17, 1:20 PM, "Alexandre Rafalovitch"  wrote:

You should be able to nest things multiple levels deep. What happens
when you try?

For trying to find parents where children satisfy some criteria, the
[child] result transformer is probably a bit too late. You may want to
look into json.facets instead and search against the children, shifting
the domain up to the parents afterwards. Then, you also apply the [child]
transformer to get the expanded children (if you need them).

Regards,
   Alex.



http://www.solr-start.com/ - Resources for Solr users, new and experienced


On 20 March 2017 at 11:58, David Kramer  wrote:
> Hi.  We’re just ramping up a product search engine for our eCommerce 
site, so this is all new development and we are slowly building up our Solr 
knowledgebase, so thanks in advance for any guidance.
>
> Our catalog (mostly shoes and apparel) has three objects nested: Products 
(title, description, etc), items (color, price, etc), and SKU (size, etc).  
Since Solr doesn’t do documents nested three deep, the SKUs and items both get 
retrieved as children of products.  That has not bit us yet…  Also, our search 
results page expects a list of Item objects, then groups them (rolls them up) 
by their parent object.  Right now we are returning just the items, and that’s 
great, but we want to implement pagination of the products, so we need to 
return the items nested in products, then paginate on the products.
>
> If I send ‘q=docType:Product description:Armour&fl=title, 
description,id,[child parentFilter="docType:Product" 
childFilter="docType:Item"]’ I get a nice list of products with items nested 
inside them. Woot.
>
> The problem is, if we want to filter on item attributes, I get back 
products that have no children, which means we can’t paginate on the results if 
we remove those parents.  For instance, send ‘q=docType:Product 
description:Armour&fl=title, description,id,[child 
parentFilter="docType:Product" childFilter="docType:Item AND price:49.99"]’, we 
get the products and their items nicely nested, and only items with a price of 
49.99 are shown, but so are parents that have no matching items.
>
> How can I build a query that will not return parents without children? I 
haven’t figured out a way to reference the children in the query.
>
> Since we’re not in production yet, I can change lots of things here.  I 
would PREFER not to denormalize the documents into one document per SKU with 
all the item and product information too, as our catalog is quite large and 
that would lead to a huge import file and lots of duplicated content between 
documents in the index.  If that’s the only way, though, it is possible.
>
> Thanks in advance.




Solr Split not working

2017-03-20 Thread Azazel K
Hi, We have a solr index running in 4.5.0 that we are trying to upgrade to 
4.7.2 and split the shard.

The uniqueKey is a TrieLongField, and its values are always negative:

Max : -9223372035490849922
Min : -9223372036854609508


When we copy the Solr 4.5.0 index to the new cluster running 4.7.2 and split the 
index, the data is duplicated in both new shards.


Any ideas why?


model building

2017-03-20 Thread Joe Obernberger
I'm trying to build a model using tweets.  I've manually tagged 30 
tweets as threatening, and 50 random tweets as non-threatening.  When I 
build the model with:


update(models2, batchSize="50",
 train(UNCLASS,
  features(UNCLASS,
 q="ProfileID:PROFCLUST1",
 featureSet="threatFeatures3",
 field="ClusterText",
 outcome="out_i",
 positiveLabel=1,
 numTerms=250),
  q="ProfileID:PROFCLUST1",
  name="threatModel3",
  field="ClusterText",
  outcome="out_i",
  maxIterations="100"))

It appears to work, but all the idfs_ds values are identical. The 
terms_ss values look reasonable, but nearly all the weights_ds are 1.0.  
For out_i it is either -1 for non-threatening tweets, and +1 for 
threatening tweets.  I'm trying to follow along with Joel Bernstein's 
excellent post here:

http://joelsolr.blogspot.com/2017/01/deploying-ai-alerting-system-with-solrs.html

Tips?

Thank you!

-Joe



Re: About editing managed-schema by hand

2017-03-20 Thread Shawn Heisey
On 3/20/2017 9:22 AM, Issei Nishigata wrote:
> Is my understanding correct that managed-schema is not limited to being
> modified only via the Schema API, but rather that we usually modify it via
> the Schema API and can also hand-edit what the Schema API cannot do?
> Needless to say, I understand that there is an assumption that we do not
> use the Schema API and hand-editing at the same time.

Hand-editing is perfectly acceptable if it is done right.  The reason
that editing is discouraged is that mixing it with API usage can
result in hand edits getting lost.

The online Schema API reference now includes a note about why
hand-editing is discouraged.  The managed-schema comment could include a
link to that.
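
For completeness, anyone who wants to rule out the mixed-edit problem entirely
can switch to the classic factory and keep hand-editing schema.xml; a minimal
solrconfig.xml sketch:

<schemaFactory class="ClassicIndexSchemaFactory"/>

With that factory in place the Schema API cannot make modifications, so hand
edits cannot be silently overwritten.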

Thanks,
Shawn



Re: ChildDocTransformerFactory and returning only parents with children

2017-03-20 Thread Alexandre Rafalovitch
You should be able to nest things multiple levels deep. What happens
when you try?

For trying to find parents where children satisfy some criteria, the
[child] result transformer is probably a bit too late. You may want to
look into json.facets instead and search against the children, shifting
the domain up to the parents afterwards. Then, you also apply the [child]
transformer to get the expanded children (if you need them).
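
A rough sketch of that json.facet idea, borrowing field names from this thread
(brand is assumed to be a product-level field, and the blockParent domain
switch needs a Solr version whose JSON Facet API supports block-join domain
changes):

q=docType:Item AND price:49.99
json.facet={
  matchingProducts: {
    type: terms,
    field: brand,
    domain: { blockParent: "docType:Product" }
  }
}

The query matches children, and the blockParent domain switch makes the facet
count the parent products that own those matching children.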

Regards,
   Alex.



http://www.solr-start.com/ - Resources for Solr users, new and experienced


On 20 March 2017 at 11:58, David Kramer  wrote:
> Hi.  We’re just ramping up a product search engine for our eCommerce site, so 
> this is all new development and we are slowly building up our Solr 
> knowledgebase, so thanks in advance for any guidance.
>
> Our catalog (mostly shoes and apparel) has three objects nested: Products 
> (title, description, etc), items (color, price, etc), and SKU (size, etc).  
> Since Solr doesn’t do documents nested three deep, the SKUs and items both 
> get retrieved as children of products.  That has not bit us yet…  Also, our 
> search results page expects a list of Item objects, then groups them (rolls 
> them up) by their parent object.  Right now we are returning just the items, 
> and that’s great, but we want to implement pagination of the products, so we 
> need to return the items nested in products, then paginate on the products.
>
> If I send ‘q=docType:Product description:Armour&fl=title, 
> description,id,[child parentFilter="docType:Product" 
> childFilter="docType:Item"]’ I get a nice list of products with items nested 
> inside them. Woot.
>
> The problem is, if we want to filter on item attributes, I get back products 
> that have no children, which means we can’t paginate on the results if we 
> remove those parents.  For instance, send ‘q=docType:Product 
> description:Armour&fl=title, description,id,[child 
> parentFilter="docType:Product" childFilter="docType:Item AND price:49.99"]’, 
> we get the products and their items nicely nested, and only items with a 
> price of 49.99 are shown, but so are parents that have no matching items.
>
> How can I build a query that will not return parents without children? I 
> haven’t figured out a way to reference the children in the query.
>
> Since we’re not in production yet, I can change lots of things here.  I would 
> PREFER not to denormalize the documents into one document per SKU with all 
> the item and product information too, as our catalog is quite large and that 
> would lead to a huge import file and lots of duplicated content between 
> documents in the index.  If that’s the only way, though, it is possible.
>
> Thanks in advance.


Re: [Migration Solr5 to Solr6] Unwanted deleted files references

2017-03-20 Thread Elodie Sannier

We have found a workaround to close the searchers by checking the current
index version.
And now the SolrCore does not have many open searchers.

However, we have fewer unwanted references to deleted files, but we still
have some.

We have two collections fr_blue, fr_green with aliases:
fr -> fr_blue
fr_temp -> fr_green

The fr collection receives the queries, the fr_temp collection does not
receive the queries.

The problem occurs when we are doing the following sequence:
1- swap aliases (create alias fr -> fr_green and fr_temp -> fr_blue for
example)
2- reload collection with fr_temp alias (fr_blue for example)

We suspect that there is a problem with the reload of a collection that
received traffic up to the alias swap but no longer receives any since then.
A problem with the increment / decrement of the searcher reference count, perhaps?

Elodie

On 03/14/2017 06:42 PM, Shawn Heisey wrote:

On 3/14/2017 10:23 AM, Elodie Sannier wrote:

The request close() method decrements the reference count on the
searcher.

 From what I could tell, that method decrements the reference counter,
but does not actually close the searcher object.  I cannot tell you what
the correct procedure is to make sure that all resources are properly
closed at the proper time.  This might be a bug, or there might be
something missing from your code.  I do not know which.
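
For reference, a minimal sketch of the usual borrow/release pattern around a
SolrIndexSearcher in custom plugin code (an illustration of the reference
counting contract, not a statement about where the bug is):

// assumes custom code running inside Solr with a SolrCore reference in hand
RefCounted<SolrIndexSearcher> ref = core.getSearcher();  // increments the count
try {
  SolrIndexSearcher searcher = ref.get();
  // ... use the searcher only within this block ...
} finally {
  ref.decref();  // release; the searcher is closed once the count drops to zero
}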

Thanks,
Shawn






ChildDocTransformerFactory and returning only parents with children

2017-03-20 Thread David Kramer
Hi.  We’re just ramping up a product search engine for our eCommerce site, so 
this is all new development and we are slowly building up our Solr 
knowledgebase, so thanks in advance for any guidance.

Our catalog (mostly shoes and apparel) has three objects nested: Products 
(title, description, etc), items (color, price, etc), and SKU (size, etc).  
Since Solr doesn’t do documents nested three deep, the SKUs and items both get 
retrieved as children of products.  That has not bit us yet…  Also, our search 
results page expects a list of Item objects, then groups them (rolls them up) 
by their parent object.  Right now we are returning just the items, and that’s 
great, but we want to implement pagination of the products, so we need to 
return the items nested in products, then paginate on the products.

If I send ‘q=docType:Product description:Armour&fl=title, description,id,[child 
parentFilter="docType:Product" childFilter="docType:Item"]’ I get a nice list 
of products with items nested inside them. Woot.

The problem is, if we want to filter on item attributes, I get back products 
that have no children, which means we can’t paginate on the results if we 
remove those parents.  For instance, send ‘q=docType:Product 
description:Armour&fl=title, description,id,[child 
parentFilter="docType:Product" childFilter="docType:Item AND price:49.99"]’, we 
get the products and their items nicely nested, and only items with a price of 
49.99 are shown, but so are parents that have no matching items.

How can I build a query that will not return parents without children? I 
haven’t figured out a way to reference the children in the query.

Since we’re not in production yet, I can change lots of things here.  I would 
PREFER not to denormalize the documents into one document per SKU with all the 
item and product information too, as our catalog is quite large and that would 
lead to a huge import file and lots of duplicated content between documents in 
the index.  If that’s the only way, though, it is possible.

Thanks in advance.


Re: About editing managed-schema by hand

2017-03-20 Thread Issei Nishigata
Thank you for this information.

But I am still confused about the specification of managed-schema.

I understand that I currently cannot modify the "unique id" or "Similarity" via
the Schema API.
* https://issues.apache.org/jira/browse/SOLR-7242

In this particular case, is there any way other than hand-editing?


Is my understanding correct that managed-schema is not limited to being
modified only via the Schema API, but rather that we usually modify it via the
Schema API and can also hand-edit what the Schema API cannot do?

Needless to say, I understand that there is an assumption that we do not use
the Schema API and hand-editing at the same time.



Thanks,
Issei

2017-03-02 10:15 GMT+09:00 Shawn Heisey :

> 2/27/2017 4:46 AM, Issei Nishigata wrote:
> > Thank you for your reply. If I was to say which one, I'd maybe be
> > talking about the concept for Solr. I understand we should use
> > "ClassicSchemaFactory" when we want to hand-edit, but why are there
> > two files, schema.xml and managed-schema, in spite that we can
> > hand-edit managed-schema? If we can modify the schema.xml through
> > Schema API, I think we won't need the managed-schema, but is there any
> > reason why that can't be done? Could you please let me know if there
> > is any information that can clear things up around those details?
>
> The default filename with the Managed Schema factory is managed-schema
> -- no extension.  I'm pretty sure that the reason the extension was
> removed was to discourage hand-editing.  If you use both hand-editing
> and API modification, you can lose some (or maybe all) of your hand edits.
>
> The default filename for the schema with the classic factory is
> schema.xml.  With this factory, API modification is not possible.
>
> If the managed factory is in use, and a schema.xml file is found during
> startup, the system will rename managed-schema (or whatever the config
> says to use) to something else, then rename schema.xml to managed-schema
> -- basically this is a startup-only way to support a legacy config.
>
> I personally don't ever plan to use the managed schema API, but I will
> leave the default factory in place, and hand-edit managed-schema, just
> like I did in previous versions with schema.xml.
>
> Thanks,
> Shawn
>
>


Count Dates Given A Range in a Multivalued Field

2017-03-20 Thread Furkan KAMACI
Hi All,

I have a multivalued date field i.e.:

[2017-02-06T00:00:00Z,2017-02-09T00:00:00Z,2017-03-04T00:00:00Z]

I want to count how many dates exist within such a field, given a date range,
e.g.:

start: 2017-02-01T00:00:00Z
end: 2017-02-28T00:00:00Z

The result should be 2 (2017-02-06T00:00:00Z and 2017-02-09T00:00:00Z). I want
to do this with the JSON Facet API.

How can I do it?


Re: Exception during integration of Solr with UIMA

2017-03-20 Thread aruninfo100
Hi Tommaso,

Thanks for the reply.
In the UIMAUpdateRequestProcessor I have only the OpenCalais license entry and
no other entries, so I need to remove that one, right?
   <str name="oc_licenseID">VALID_OPENCALAIS_KEY</str>

Do I need to make modifications to (or remove anything from) the
OverridingParamsExtServicesAE.xml file for OpenCalais?

Thanks and Regards,
Arun





Re: Exception during integration of Solr with UIMA

2017-03-20 Thread Tommaso Teofili
Hi,

the UIMA OpenCalais Annotator you're using refers to an old endpoint which
is no longer available; see log line [1].
I would suggest simply removing the OpenCalaisAnnotator entry from your
UIMAUpdateRequestProcessor configuration in solrconfig.xml.
More generally, you should only put the UIMA components you know you want to
use within the UIMAUpdateRequestProcessor configuration.
I'll log an issue to adjust the documentation there.
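
Concretely, that would mean dropping the OpenCalais key from the
runtimeParameters section of the uimaConfig (a sketch based on the wiki example
being followed):

<!-- remove this entry if the OpenCalais annotator is not used -->
<str name="oc_licenseID">VALID_OPENCALAIS_KEY</str>

If you also maintain your own copy of the aggregate analysis engine descriptor,
the OpenCalais annotator would need to be taken out of its delegate list as
well, so it is no longer invoked.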

Regards,
Tommaso

[1] : *Caused by: java.net.UnknownHostException: api.opencalais.com*


On Mon, 20 Mar 2017 at 08:42, aruninfo100 <
arunabraham...@gmail.com> wrote:

> Hi All,
>
> I am trying to integrate UIMA with Solr. I am following the steps mentioned
> in https://cwiki.apache.org/confluence/display/solr/UIMA+Integration . But
> when I try to index the documents, exceptions are thrown in the terminal and
> error traces are also logged in the Solr log. I have been trying to work
> around this for some time, but have been unable to get a proper solution for
> the issue.
>
> I am using solr 6.1.0
>
> I have included all the jars mentioned in the document.
> analyze field:
>
>   <arr name="fields">
>     <str>content</str>
>   </arr>
>
> This field holds all the extracted content (text content) from the
> respective documents indexed (some are large documents).
> The content field is of field type text_general. It is not a copy field.
> The field holds the respective document contents.
>
>   <field name="content" type="text_general" indexed="true" termOffsets="true"
>    stored="true" termPositions="true" termVectors="true" multiValued="true"
>    required="true"/>
>
> I have created the three fields too in the config file (as referred to in
> the document).
>
> I have generated valid keys for the APIs.  An Internet connection is
> available.
>
> solrconfig.xml:
>
> <updateRequestProcessorChain name="uima">
>   <processor class="org.apache.solr.uima.processor.UIMAUpdateRequestProcessorFactory">
>     <lst name="uimaConfig">
>       <lst name="runtimeParameters">
>         <str name="keyword_apikey">VALID_ALCHEMYAPI_KEY</str>
>         <str name="concept_apikey">VALID_ALCHEMYAPI_KEY</str>
>         <str name="lang_apikey">VALID_ALCHEMYAPI_KEY</str>
>         <str name="cat_apikey">VALID_ALCHEMYAPI_KEY</str>
>         <str name="entities_apikey">VALID_ALCHEMYAPI_KEY</str>
>         <str name="oc_licenseID">VALID_OPENCALAIS_KEY</str>
>       </lst>
>       <str name="analysisEngine">/org/apache/uima/desc/OverridingParamsExtServicesAE.xml</str>
>       <bool name="ignoreErrors">true</bool>
>       <str name="logField">fileName</str>
>       <lst name="analyzeFields">
>         <bool name="merge">false</bool>
>         <arr name="fields">
>           <str>content</str>
>         </arr>
>       </lst>
>       <lst name="fieldMappings">
>         <lst name="type">
>           <str name="name">org.apache.uima.alchemy.ts.concept.ConceptFS</str>
>           <lst name="mapping">
>             <str name="feature">text</str>
>             <str name="field">concept</str>
>           </lst>
>         </lst>
>         <lst name="type">
>           <str name="name">org.apache.uima.alchemy.ts.language.LanguageFS</str>
>           <lst name="mapping">
>             <str name="feature">language</str>
>             <str name="field">language</str>
>           </lst>
>         </lst>
>         <lst name="type">
>           <str name="name">org.apache.uima.SentenceAnnotation</str>
>           <lst name="mapping">
>             <str name="feature">coveredText</str>
>             <str name="field">sentence</str>
>           </lst>
>         </lst>
>       </lst>
>     </lst>
>   </processor>
>   <processor class="solr.LogUpdateProcessorFactory" />
>   <processor class="solr.RunUpdateProcessorFactory" />
> </updateRequestProcessorChain>
>
> <requestHandler name="/update" class="solr.UpdateRequestHandler">
>   <lst name="defaults">
>     <str name="update.chain">uima</str>
>   </lst>
> </requestHandler>
>
>
>
> terminal error trace:
>
> Mar 19, 2017 10:46:16 AM WhitespaceTokenizer typeSystemInit
> INFO: "Whitespace tokenizer typesystem initialized"
> Mar 19, 2017 10:46:16 AM WhitespaceTokenizer process
> INFO: "Whitespace tokenizer starts processing"
> Mar 19, 2017 10:46:16 AM WhitespaceTokenizer process
> INFO: "Whitespace tokenizer finished processing"
> Mar 19, 2017 10:46:16 AM
> org.apache.uima.analysis_engine.impl.PrimitiveAnalysisE
> ngine_impl callAnalysisComponentProcess(405)
> SEVERE: Exception occurred
> org.apache.uima.analysis_engine.AnalysisEngineProcessException
> at
> org.apache.uima.annotator.calais.OpenCalaisAnnotator.process(OpenCala
> isAnnotator.java:206)
> at
> org.apache.uima.analysis_component.CasAnnotator_ImplBase.process(CasA
> nnotator_ImplBase.java:56)
> at
> org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.cal
> lAnalysisComponentProcess(PrimitiveAnalysisEngine_impl.java:377)
> at
> org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.pro
> cessAndOutputNewCASes(PrimitiveAnalysisEngine_impl.java:295)
> at
> org.apache.uima.analysis_engine.asb.impl.ASB_impl$AggregateCasIterato
> r.processUntilNextOutputCas(ASB_impl.java:567)
> at
> org.apache.uima.analysis_engine.asb.impl.ASB_impl$AggregateCasIterato
> r.<init>(ASB_impl.java:409)
> at
> org.apache.uima.analysis_engine.asb.impl.ASB_impl.process(ASB_impl.ja
> va:342)
> at
> org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.pro
> cessAndOutputNewCASes(AggregateAnalysisEngine_impl.java:267)
> at
> org.apache.uima.analysis_engine.impl.AnalysisEngineImplBase.process(A
> nalysisEngineImplBase.java:267)
> at
> org.apache.uima.analysis_engine.impl.AnalysisEngineImplBase.process(A
> nalysisEngineImplBase.java:280)
> at
> 

Exception during integration of Solr with UIMA

2017-03-20 Thread aruninfo100
Hi All, 

I am trying to integrate UIMA with Solr. I am following the steps mentioned
in https://cwiki.apache.org/confluence/display/solr/UIMA+Integration . But
when I try to index the documents, exceptions are thrown in the terminal and
error traces are also logged in the Solr log. I have been trying to work around
this for some time, but have been unable to get a proper solution for the issue.

I am using solr 6.1.0 

I have included all the jars mentioned in the document. 
analyze field:

  <arr name="fields">
    <str>content</str>
  </arr>

This field holds all the extracted content (text content) from the respective
documents indexed (some are large documents).
The content field is of field type text_general. It is not a copy field. The
field holds the respective document contents.

  <field name="content" type="text_general" indexed="true" stored="true"
   termVectors="true" termPositions="true" termOffsets="true"
   multiValued="true" required="true"/>

I have created the three fields too in the config file (as referred to in the
document).

I have generated valid keys for the APIs.  An Internet connection is
available.

solrconfig.xml:

<updateRequestProcessorChain name="uima">
  <processor class="org.apache.solr.uima.processor.UIMAUpdateRequestProcessorFactory">
    <lst name="uimaConfig">
      <lst name="runtimeParameters">
        <str name="keyword_apikey">VALID_ALCHEMYAPI_KEY</str>
        <str name="concept_apikey">VALID_ALCHEMYAPI_KEY</str>
        <str name="lang_apikey">VALID_ALCHEMYAPI_KEY</str>
        <str name="cat_apikey">VALID_ALCHEMYAPI_KEY</str>
        <str name="entities_apikey">VALID_ALCHEMYAPI_KEY</str>
        <str name="oc_licenseID">VALID_OPENCALAIS_KEY</str>
      </lst>
      <str name="analysisEngine">/org/apache/uima/desc/OverridingParamsExtServicesAE.xml</str>
      <bool name="ignoreErrors">true</bool>
      <str name="logField">fileName</str>
      <lst name="analyzeFields">
        <bool name="merge">false</bool>
        <arr name="fields">
          <str>content</str>
        </arr>
      </lst>
      <lst name="fieldMappings">
        <lst name="type">
          <str name="name">org.apache.uima.alchemy.ts.concept.ConceptFS</str>
          <lst name="mapping">
            <str name="feature">text</str>
            <str name="field">concept</str>
          </lst>
        </lst>
        <lst name="type">
          <str name="name">org.apache.uima.alchemy.ts.language.LanguageFS</str>
          <lst name="mapping">
            <str name="feature">language</str>
            <str name="field">language</str>
          </lst>
        </lst>
        <lst name="type">
          <str name="name">org.apache.uima.SentenceAnnotation</str>
          <lst name="mapping">
            <str name="feature">coveredText</str>
            <str name="field">sentence</str>
          </lst>
        </lst>
      </lst>
    </lst>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>

<requestHandler name="/update" class="solr.UpdateRequestHandler">
  <lst name="defaults">
    <str name="update.chain">uima</str>
  </lst>
</requestHandler>



terminal error trace: 

Mar 19, 2017 10:46:16 AM WhitespaceTokenizer typeSystemInit 
INFO: "Whitespace tokenizer typesystem initialized" 
Mar 19, 2017 10:46:16 AM WhitespaceTokenizer process 
INFO: "Whitespace tokenizer starts processing" 
Mar 19, 2017 10:46:16 AM WhitespaceTokenizer process 
INFO: "Whitespace tokenizer finished processing" 
Mar 19, 2017 10:46:16 AM
org.apache.uima.analysis_engine.impl.PrimitiveAnalysisE 
ngine_impl callAnalysisComponentProcess(405) 
SEVERE: Exception occurred 
org.apache.uima.analysis_engine.AnalysisEngineProcessException 
at
org.apache.uima.annotator.calais.OpenCalaisAnnotator.process(OpenCala 
isAnnotator.java:206) 
at
org.apache.uima.analysis_component.CasAnnotator_ImplBase.process(CasA 
nnotator_ImplBase.java:56) 
at
org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.cal 
lAnalysisComponentProcess(PrimitiveAnalysisEngine_impl.java:377) 
at
org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.pro 
cessAndOutputNewCASes(PrimitiveAnalysisEngine_impl.java:295) 
at
org.apache.uima.analysis_engine.asb.impl.ASB_impl$AggregateCasIterato 
r.processUntilNextOutputCas(ASB_impl.java:567) 
at
org.apache.uima.analysis_engine.asb.impl.ASB_impl$AggregateCasIterato 
r.<init>(ASB_impl.java:409) 
at
org.apache.uima.analysis_engine.asb.impl.ASB_impl.process(ASB_impl.ja 
va:342) 
at
org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.pro 
cessAndOutputNewCASes(AggregateAnalysisEngine_impl.java:267) 
at
org.apache.uima.analysis_engine.impl.AnalysisEngineImplBase.process(A 
nalysisEngineImplBase.java:267) 
at
org.apache.uima.analysis_engine.impl.AnalysisEngineImplBase.process(A 
nalysisEngineImplBase.java:280) 
at
org.apache.solr.uima.processor.UIMAUpdateRequestProcessor.processText 
(UIMAUpdateRequestProcessor.java:176) 
at
org.apache.solr.uima.processor.UIMAUpdateRequestProcessor.processAdd( 
UIMAUpdateRequestProcessor.java:78) 
at
org.apache.solr.handler.loader.JavabinLoader$1.update(JavabinLoader.j 
ava:97) 
at
org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.read 
OuterMostDocIterator(JavaBinUpdateRequestCodec.java:179) 
at
org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.read 
Iterator(JavaBinUpdateRequestCodec.java:135) 
at
org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:27 
4) 
at
org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.read 
NamedList(JavaBinUpdateRequestCodec.java:121) 
at
org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:23 
9) 
at
org.apache.solr.common.util.JavaBinCodec.unmarshal(JavaBinCodec.java: 
157) 
at
org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec.unmars