Re: Facet pivot and distributed search

2014-02-07 Thread Geert Van Huychem
Thx!

Geert Van Huychem
IT Consultant
iFrameWorx BVBA

Mobile: +32 497 27 69 03
E-mail: ge...@iframeworx.be
Site: http://www.iframeworx.be
LinkedIn: http://www.linkedin.com/in/geertvanhuychem


On Fri, Feb 7, 2014 at 8:55 AM, Shalin Shekhar Mangar 
shalinman...@gmail.com wrote:

 Yes, this is an open issue.

 https://issues.apache.org/jira/browse/SOLR-2894

 On Fri, Feb 7, 2014 at 1:13 PM, Geert Van Huychem ge...@iframeworx.be
 wrote:
  Hi
 
  I'm using Solr 4.5 in a multi-core environment.
 
  I've setup
  - one core per document type: text, rss, tweet and external documents.
  - one distrib core which basically distributes the query to the 4 cores
  mentioned above.
 
  Facet pivot works on each core individually, but when I send the exact
 same
  query to the distrib core, I get no results.
 
  Anyone? Bug? Open issue?
 
  Best
 
  Geert Van Huychem



 --
 Regards,
 Shalin Shekhar Mangar.



Re: Tf-Idf for a specific query

2014-02-07 Thread Mikhail Khludnev
Hello Dave,
You can get the DF from http://wiki.apache.org/solr/TermsComponent (invert it
yourself); then, for a certain term, you can get the number of occurrences per
document via http://wiki.apache.org/solr/FunctionQuery#tf
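
A minimal SolrJ sketch of that combination, assuming the /terms handler is
registered as in the example solrconfig.xml; the core URL, the field name
body and the term lucene are just placeholders:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.client.solrj.response.TermsResponse;

public class TfDfSketch {
    public static void main(String[] args) throws Exception {
        HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr/collection1");

        // 1) Corpus-wide DF for the top terms of a field, via the TermsComponent.
        SolrQuery terms = new SolrQuery();
        terms.setRequestHandler("/terms");
        terms.set("terms", true);
        terms.set("terms.fl", "body");
        terms.set("terms.limit", 10);
        QueryResponse termsRsp = solr.query(terms);
        for (TermsResponse.Term t : termsRsp.getTermsResponse().getTerms("body")) {
            System.out.println(t.getTerm() + " df=" + t.getFrequency());
        }

        // 2) Per-document raw term frequency for one term, returned as a pseudo-field.
        SolrQuery q = new SolrQuery("body:lucene");
        q.setFields("id", "termfreq(body,'lucene')");
        System.out.println(solr.query(q).getResults());
    }
}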



On Fri, Feb 7, 2014 at 3:58 AM, David Miller davthehac...@gmail.com wrote:

 Hi Guys..

 I need to obtain the tf-idf score from Solr for a certain set of documents.
 The catch is that I need the IDF (or DF) to be calculated on the
 documents returned by the specific query and not the entire corpus.

 Please give me some hints on whether Solr has this feature or whether I can
 use the Lucene API directly to achieve this.


 Thanks in advance,
 Dave




-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

http://www.griddynamics.com
 mkhlud...@griddynamics.com


Re: Intercept updates and cascade loading of Index.

2014-02-07 Thread soodyogesh
Thanks for the insights. This helps indeed; however, I'm not sure how I get the
delta on commit. I guess I need to run some custom query to find out what has
changed since the last update, or something like that.

I'll experiment with that; if anyone has done it, please share.






--
View this message in context: 
http://lucene.472066.n3.nabble.com/Intercept-updates-and-cascade-loading-of-Index-tp4115833p4116010.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: MoreLikeThis

2014-02-07 Thread rubenboada
Hi iorixxx,

Sorry for the delay in replying.

The code is below:

public void performSearch(DSpaceObject dso) throws SearchServiceException {

    if (queryResults != null) {
        return;
    }

    this.queryArgs = prepareDefaultFilters(getView());
    this.queryArgs.setRows(1);
    this.queryArgs.add("fl", "dc.contributor.author,handle");
    this.queryArgs.add("mlt", "true");
    this.queryArgs.add("mlt.fl", "dc.contributor.author,handle");
    this.queryArgs.add("mlt.mindf", "1");
    this.queryArgs.add("mlt.mintf", "1");
    this.queryArgs.setQuery("handle:" + dso.getHandle());
    this.queryArgs.setRows(1);

    queryResults = getSearchService().search(queryArgs);
}

I use dc.contributor.author for similarity (mlt.fl). I don't know what the
mlt.qf parameter is; the code above is the default, and I only changed the
dc.contributor.author part.

Thanks

 



--
View this message in context: 
http://lucene.472066.n3.nabble.com/MoreLikeThis-tp4114605p4116022.html
Sent from the Solr - User mailing list archive at Nabble.com.


SolrCloud: problems with delete

2014-02-07 Thread ku3ia
Hi all!
Does SolrCloud delete documents correctly? When I send many requests via POST
with a small number of ids, some documents are left in the index that are not
deleted.

Thanks.




--
View this message in context: 
http://lucene.472066.n3.nabble.com/SolrCloud-problems-with-delete-tp4116026.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: solr joins

2014-02-07 Thread Mikhail Khludnev
 Basically, I am trying to understand where and how Solr joins differ from
 Lucene joins. Any pointers, much appreciated?


Hello Anand,

I'm keen on index-time joins (aka block joins), so I've never looked
into query-time ones.
I didn't even know that there are two different query-time joins; this
divergence might have caused the fabulous drama: segmented vs. top-level.
OK. It seems that Solr's query-time join never scores:
https://github.com/apache/lucene-solr/blob/trunk/solr/core/src/java/org/apache/solr/search/JoinQParserPlugin.java#L535
But Lucene's JoinUtil does:
https://github.com/apache/lucene-solr/blob/trunk/lucene/join/src/java/org/apache/lucene/search/join/JoinUtil.java?source=cc#L78
Also, Solr's join can join across different cores.
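
To make the contrast concrete, a rough sketch (the index path and the field
names id/parent_id/name below are made up, not from the thread): Solr's
query-time join is invoked through the {!join} query parser and returns
constant scores, while Lucene's JoinUtil.createJoinQuery takes a ScoreMode and
can carry scores over from the "from" side.

import java.io.File;

import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.join.JoinUtil;
import org.apache.lucene.search.join.ScoreMode;
import org.apache.lucene.store.FSDirectory;
import org.apache.solr.client.solrj.SolrQuery;

public class JoinSketch {
    public static void main(String[] args) throws Exception {
        // Solr query-time join: children whose parent_id matches parents with name:foo.
        // Scores of the "from" side are not carried over.
        SolrQuery solrJoin = new SolrQuery("{!join from=id to=parent_id}name:foo");
        System.out.println(solrJoin);

        // Lucene query-time join: ScoreMode controls how "from" scores propagate.
        IndexSearcher searcher = new IndexSearcher(
                DirectoryReader.open(FSDirectory.open(new File("/path/to/index"))));
        Query fromQuery = new TermQuery(new Term("name", "foo"));
        Query joinQuery = JoinUtil.createJoinQuery(
                "id",        // fromField
                false,       // multipleValuesPerDocument
                "parent_id", // toField
                fromQuery,
                searcher,
                ScoreMode.Max);
        System.out.println(joinQuery);
    }
}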

-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

http://www.griddynamics.com
 mkhlud...@griddynamics.com


Re: solr joins

2014-02-07 Thread anand chandak
Thanks Mikhail. Curious: why was scoring left out of Solr? And is there any
plan to port it?

Also, could you please elaborate on segmented vs. top-level?

Thanks,

Anand


On 2/7/2014 4:53 PM, Mikhail Khludnev wrote:

Basically, i am trying to understand where and how solr joins differ from
lucene joins. Any pointers, much appreciated ?



Hello Anand,

I'm keen for index-time joins (aka block joins), thus I've never looked
into query-time ones.
I even didn't ever know that there are two different query-time joins. This
diverging might caused the fabulous drama: segmented vs top-level
Ok. It seems like, Solr's query time join never scores
https://github.com/apache/lucene-solr/blob/trunk/solr/core/src/java/org/apache/solr/search/JoinQParserPlugin.java#L535
But Lucene's JoinUtils does
https://github.com/apache/lucene-solr/blob/trunk/lucene/join/src/java/org/apache/lucene/search/join/JoinUtil.java?source=cc#L78
Also Solr's join joins across different cores.





Re: solr joins

2014-02-07 Thread Mikhail Khludnev
On Fri, Feb 7, 2014 at 3:32 PM, anand chandak anand.chan...@oracle.comwrote:

 Thanks Mikhail, curious why was scoring left out of solr ?

I have no idea; I've never been involved in the query-time join.


 And if there's any plan to port it ?

I suppose there is no plan until someone raises a JIRA.


 Also, if you can please elaborate on the segmented vs toplevel


http://vimeo.com/44113003
Is Your Index Reader Really Atomic or Maybe Slow? by Uwe Schindler


 Thanks,

 Anand



 On 2/7/2014 4:53 PM, Mikhail Khludnev wrote:

  Basically, i am trying to understand where and how solr joins differ from
 lucene joins. Any pointers, much appreciated ?


  Hello Anand,

 I'm keen for index-time joins (aka block joins), thus I've never looked
 into query-time ones.
 I even didn't ever know that there are two different query-time joins.
 This
 diverging might caused the fabulous drama: segmented vs top-level
 Ok. It seems like, Solr's query time join never scores
 https://github.com/apache/lucene-solr/blob/trunk/solr/
 core/src/java/org/apache/solr/search/JoinQParserPlugin.java#L535
 But Lucene's JoinUtils does
 https://github.com/apache/lucene-solr/blob/trunk/lucene/
 join/src/java/org/apache/lucene/search/join/JoinUtil.java?source=cc#L78
 Also Solr's join joins across different cores.





-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

http://www.griddynamics.com
 mkhlud...@griddynamics.com


Solr Composite Unique key from existing fields in schema

2014-02-07 Thread Anurag Verma
Hi,
 I am developing a search application using Solr. I don't have a primary
key in any table; a composite key is used in my application. How do I
implement a composite key as the unique key in this case? Please help; I am
struggling.
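
One common approach, sketched below with made-up field names, is to build the
composite key on the client side by concatenating its parts into the uniqueKey
field before indexing; Solr's update processors (e.g.
CloneFieldUpdateProcessorFactory plus ConcatFieldUpdateProcessorFactory) can do
something similar on the server side.

import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class CompositeKeySketch {
    public static void main(String[] args) throws Exception {
        HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr/collection1");

        // Hypothetical composite-key parts taken from one database row.
        String orderId = "1001";
        String lineNo = "7";

        SolrInputDocument doc = new SolrInputDocument();
        // Build the uniqueKey ("id" in schema.xml) by concatenating the parts
        // with a separator that cannot appear in the values themselves.
        doc.addField("id", orderId + "!" + lineNo);
        doc.addField("order_id", orderId);
        doc.addField("line_no", lineNo);

        solr.add(doc);
        solr.commit();
        solr.shutdown();
    }
}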

-- 
Thanks & Regards
Anurag Verma
Arise! Awake! And stop not till the goal is reached!


Re: SolrCloud: problems with delete

2014-02-07 Thread Furkan KAMACI
Hi;

What is your commit policy? Check whether this works or not:
solr/update?commit=true. If it works, could you also post your autocommit
configuration?

Thanks;
Furkan KAMACI


2014-02-07 13:23 GMT+02:00 ku3ia dem...@gmail.com:

 Hi all!
 Does SolrCloud correct delete documents? When I send many requests via POST
 with small number of ids - there are some documents left in index, which
 not
 deleted.

 Thanks.




 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/SolrCloud-problems-with-delete-tp4116026.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: SolrCloud: problems with delete

2014-02-07 Thread ku3ia
Autocommit and /update/?commit=true work fine. What I mean is, for example, I
send 807632 docs to index to my 3-shard cluster and everything is fine, but
when I try to remove them using POST requests with a small number of ids, let's
say 100 per request, some docs are still in the index that should not be. When
I send a POST request with a larger number of ids, like 1K or 2K, all docs are
deleted from the index.
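
For reference, a minimal SolrJ sketch of that delete pattern (small id batches
followed by one explicit hard commit); the ZooKeeper hosts, collection name,
ids and batch size are placeholders:

import java.util.Arrays;
import java.util.List;

import org.apache.solr.client.solrj.impl.CloudSolrServer;

public class BatchDeleteSketch {
    public static void main(String[] args) throws Exception {
        CloudSolrServer solr = new CloudSolrServer("zk1:2181,zk2:2181,zk3:2181");
        solr.setDefaultCollection("collection1");

        List<String> ids = Arrays.asList("doc1", "doc2", "doc3"); // ids to delete
        int batchSize = 100;
        for (int i = 0; i < ids.size(); i += batchSize) {
            List<String> batch = ids.subList(i, Math.min(i + batchSize, ids.size()));
            solr.deleteById(batch);
        }

        // One explicit hard commit at the end, waiting for a new searcher,
        // so counts checked right afterwards reflect the deletes.
        solr.commit(true, true, false);
        solr.shutdown();
    }
}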



--
View this message in context: 
http://lucene.472066.n3.nabble.com/SolrCloud-problems-with-delete-tp4116026p4116038.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Need help for integrating solr-4.5.1 with UIMA

2014-02-07 Thread rashi gandhi
Hi,

I tried almost all combinations in solrconfig.xml for using UIMA with Solr,
but each time I index data into Solr I get this exception:

113701 [http-bio-8080-exec-1] ERROR org.apache.solr.core.SolrCore -
org.apache.solr.common.SolrException:
org.apache.uima.resource.ResourceInitializationException

at
org.apache.solr.uima.processor.UIMAUpdateRequestProcessorFactory.getInstance(UIMAUpdateRequestProcessorFactory.java:64)

at
org.apache.solr.update.processor.UpdateRequestProcessorChain.createProcessor(UpdateRequestProcessorChain.java:204)

at
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:60)

at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)

at org.apache.solr.core.SolrCore.execute(SolrCore.java:1859)

at
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:703)

at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:406)

at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:195)

at
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)

at
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)

at
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:222)

at
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:123)

at
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:171)

at
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:99)

at
org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:953)

at
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)

at
org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:408)

at
org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1008)

at
org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:589)

at
org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:310)

at
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)

at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)

at java.lang.Thread.run(Thread.java:619)

Caused by: org.apache.uima.resource.ResourceInitializationException

at
org.apache.lucene.analysis.uima.ae.BasicAEProvider.getAE(BasicAEProvider.java:58)

at
org.apache.solr.uima.processor.UIMAUpdateRequestProcessorFactory.getInstance(UIMAUpdateRequestProcessorFactory.java:61)

... 22 more

Caused by: java.lang.NullPointerException

at
org.apache.uima.util.XMLInputSource.<init>(XMLInputSource.java:118)

at
org.apache.lucene.analysis.uima.ae.BasicAEProvider.getInputSource(BasicAEProvider.java:84)

at
org.apache.lucene.analysis.uima.ae.BasicAEProvider.getAE(BasicAEProvider.java:50)

... 23 more



113873 [http-bio-8080-exec-1] INFO  org.apache.solr.core.SolrCore -
[collection1] webapp=/solr path=/update params={version=2.2} status=500
QTime=203



113888 [http-bio-8080-exec-1] ERROR
org.apache.solr.servlet.SolrDispatchFilter -
null:org.apache.solr.common.SolrException: org.apache.uima.resource.

ResourceInitializationException

at
org.apache.solr.uima.processor.UIMAUpdateRequestProcessorFactory.getInstance(UIMAUpdateRequestProcessorFactory.java:64)

at
org.apache.solr.update.processor.UpdateRequestProcessorChain.createProcessor(UpdateRequestProcessorChain.java:204)

at
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:60)

at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)

at org.apache.solr.core.SolrCore.execute(SolrCore.java:1859)

at
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:703)

at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:406)

at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:195)

at
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)

at
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)

at
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:222)

at
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:123)

at
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:171)

at
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:99)

at

Re: Facet pivot and distributed search

2014-02-07 Thread Trey Grainger
FYI, the last distributed pivot facet patch functionally works, but there
are some sub-optimal data structures being used and some unnecessary
duplicate processing of values. As a result, we found that for certain
worst-case scenarios (i.e. data is not randomly distributed across Solr
cores and requires significant refinement) pivot facets with multiple
levels could take over a minute to aggregate and process results. This was
using a dataset of several hundred million documents and dozens of pivot
facets across 120 Solr cores distributed over 20 servers, so it is a more
extreme use-case than most will encounter.

Nevertheless, we've refactored the code and data structures and brought the
processing time from over a minute down to less than a second using the
above configuration. We plan to post the patch within the next week.


On Fri, Feb 7, 2014 at 3:08 AM, Geert Van Huychem ge...@iframeworx.bewrote:

 Thx!

 Geert Van Huychem
 IT Consultant
 iFrameWorx BVBA

 Mobile: +32 497 27 69 03
 E-mail: ge...@iframeworx.be
 Site: http://www.iframeworx.be
 LinkedIn: http://www.linkedin.com/in/geertvanhuychem


 On Fri, Feb 7, 2014 at 8:55 AM, Shalin Shekhar Mangar 
 shalinman...@gmail.com wrote:

  Yes this is a open issue.
 
  https://issues.apache.org/jira/browse/SOLR-2894
 
  On Fri, Feb 7, 2014 at 1:13 PM, Geert Van Huychem ge...@iframeworx.be
  wrote:
   Hi
  
   I'm using Solr 4.5 in a multi-core environment.
  
   I've setup
   - one core per documenttype: text, rss, tweet and external documents.
   - one distrib core which basically distributes the query to the 4 cores
   mentioned hereabove.
  
   Facet pivot works on each core individually, but when I send the exact
  same
   query to the distrib core, I get no results.
  
   Anyone? Bug? Open issue?
  
   Best
  
   Geert Van Huychem
 
 
 
  --
  Regards,
  Shalin Shekhar Mangar.
 



ExtendedDismax and NOT operator

2014-02-07 Thread Geert Van Huychem
Hi

This is my config:

  <requestHandler name="edismax_basic" class="solr.SearchHandler">
    <lst name="defaults">
      <str name="defType">edismax</str>
      <str name="qf">body</str>
      <str name="pf">title^30 introduction^15 body^10</str>
      <str name="ps">0</str>
    </lst>
  </requestHandler>

Executing the following link:
http://localhost:8983/solr/distrib/select?q=term1 NOT
term2&start=0&rows=0&qt=edismax_basic&debugQuery=true

gives me as debuginfo:

<str name="parsedquery">
(+(DisjunctionMaxQuery((body:term1)) -DisjunctionMaxQuery((body:term2)))
DisjunctionMaxQuery((title:"term1 term2"^30.0))
DisjunctionMaxQuery((introduction:"term1 term2"^15.0))
DisjunctionMaxQuery((body:"term1 term2"^10.0)))/no_coord
</str>

My question is: why is term2 included in the phrase query part?

Best
Geert Van Huychem


Re: Solr and Polygon/Radius based spatial searches

2014-02-07 Thread leevduhl
David,  Thanks for the response, the info should be very helpful!

Lee



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-and-Polygon-Radius-based-spatial-searches-tp4115121p4116068.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: ExtendedDismax and NOT operator

2014-02-07 Thread Jack Krupansky
I suspect that's a bug. The phrase boost code should have the logic to 
exclude negated terms.
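
Until that is fixed, one possible workaround (just a sketch, not the official
fix) is to keep only the positive terms in q, so the pf phrase boost is built
from them alone, and to push the exclusion into a filter query:

import org.apache.solr.client.solrj.SolrQuery;

public class EdismaxNotWorkaround {
    public static void main(String[] args) {
        SolrQuery q = new SolrQuery();
        q.set("defType", "edismax");
        q.set("qf", "body");
        q.set("pf", "title^30 introduction^15 body^10");
        // Positive terms only in q, so the phrase boost is built from "term1" alone...
        q.setQuery("term1");
        // ...and the negation is applied as a non-scoring filter.
        q.addFilterQuery("-body:term2");
        System.out.println(q);
    }
}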


File a Jira.

Thanks for reporting this.

-- Jack Krupansky

-Original Message- 
From: Geert Van Huychem

Sent: Friday, February 7, 2014 9:40 AM
To: solr-user@lucene.apache.org
Subject: ExtendedDismax and NOT operator

Hi

This is my config:

 <requestHandler name="edismax_basic" class="solr.SearchHandler">
   <lst name="defaults">
     <str name="defType">edismax</str>
     <str name="qf">body</str>
     <str name="pf">title^30 introduction^15 body^10</str>
     <str name="ps">0</str>
   </lst>
 </requestHandler>

Executing the following link:
http://localhost:8983/solr/distrib/select?q=term1 NOT
term2&start=0&rows=0&qt=edismax_basic&debugQuery=true

gives me as debuginfo:

<str name="parsedquery">
(+(DisjunctionMaxQuery((body:term1)) -DisjunctionMaxQuery((body:term2)))
DisjunctionMaxQuery((title:"term1 term2"^30.0))
DisjunctionMaxQuery((introduction:"term1 term2"^15.0))
DisjunctionMaxQuery((body:"term1 term2"^10.0)))/no_coord
</str>

My question is: why is term2 included in the phrase query part?

Best
Geert Van Huychem 



Group.Facet issue in Sharded Solr Setup

2014-02-07 Thread rks_lucene
I am facing an issue with counts when using group.facet in my sharded Solr
setup. (Groups do not overlap across shards, and for various reasons I cannot
use group.truncate.)

Now, the problem is that for items ranking lower in the faceted list sorted by
count, the grouped facet counts come out *higher* than the actual values.

So on doing an online search I came across details of sharded faceting at
this link:

http://lucene.472066.n3.nabble.com/At-a-high-level-how-does-faceting-in-SolrCloud-work-td4009897.html

From the above link it appears there is a *third corrective step* wherein the
coordinator node, after getting individual results and building a final list,
asks each shard to compute its exact count for the selected constraints.

I wanted to ask whether the group.facet implementation in 4.x has factored in
this step, i.e. whether the coordinator node asks for grouped facet counts
instead of ungrouped facet counts during the corrective step?

I'm asking because the counts come out right for the more popular 50% of the
items but are incorrect (and always higher) for the less popular items.

Also has anyone else faced this ?

Ritesh



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Group-Facet-issue-in-Sharded-Solr-Setup-tp4116077.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: ExtendedDismax and NOT operator

2014-02-07 Thread Alexei Martchenko
Just to clarify: the actual url is properly space-escaped?

http://localhost:8983/solr/distrib/select?q=term1%20NOT%20
term2&start=0&rows=0&qt=edismax_basic&debugQuery=true



alexei martchenko
Facebook http://www.facebook.com/alexeiramone |
Linkedin http://br.linkedin.com/in/alexeimartchenko |
Steam http://steamcommunity.com/id/alexeiramone/ |
4sq https://pt.foursquare.com/alexeiramone | Skype: alexeiramone |
Github https://github.com/alexeiramone | (11) 9 7613.0966 |


2014-02-07 12:40 GMT-02:00 Geert Van Huychem ge...@iframeworx.be:

 Hi

 This is my config:

  <requestHandler name="edismax_basic" class="solr.SearchHandler">
    <lst name="defaults">
      <str name="defType">edismax</str>
      <str name="qf">body</str>
      <str name="pf">title^30 introduction^15 body^10</str>
      <str name="ps">0</str>
    </lst>
  </requestHandler>

 Executing the following link:
 http://localhost:8983/solr/distrib/select?q=term1 NOT
 term2&start=0&rows=0&qt=edismax_basic&debugQuery=true

 gives me as debuginfo:

 <str name="parsedquery">
 (+(DisjunctionMaxQuery((body:term1)) -DisjunctionMaxQuery((body:term2)))
 DisjunctionMaxQuery((title:"term1 term2"^30.0))
 DisjunctionMaxQuery((introduction:"term1 term2"^15.0))
 DisjunctionMaxQuery((body:"term1 term2"^10.0)))/no_coord
 </str>

 My question is: why is term2 included in the phrase query part?

 Best
 Geert Van Huychem



Re: ExtendedDismax and NOT operator

2014-02-07 Thread Geert Van Huychem
Yes, it is.

Geert Van Huychem
IT Consultant
iFrameWorx BVBA

Mobile: +32 497 27 69 03
E-mail: ge...@iframeworx.be
Site: http://www.iframeworx.be
LinkedIn: http://www.linkedin.com/in/geertvanhuychem


On Fri, Feb 7, 2014 at 6:44 PM, Alexei Martchenko
ale...@martchenko.com.brwrote:

 Just to clarify: the actual url is properly space-escaped?

 http://localhost:8983/solr/distrib/select?q=term1%20NOT%20
 term2&start=0&rows=0&qt=edismax_basic&debugQuery=true



 alexei martchenko
 Facebook http://www.facebook.com/alexeiramone |
 Linkedinhttp://br.linkedin.com/in/alexeimartchenko|
 Steam http://steamcommunity.com/id/alexeiramone/ |
 4sqhttps://pt.foursquare.com/alexeiramone| Skype: alexeiramone |
 Github https://github.com/alexeiramone | (11) 9 7613.0966 |


 2014-02-07 12:40 GMT-02:00 Geert Van Huychem ge...@iframeworx.be:

  Hi
 
  This is my config:
 
   <requestHandler name="edismax_basic" class="solr.SearchHandler">
     <lst name="defaults">
       <str name="defType">edismax</str>
       <str name="qf">body</str>
       <str name="pf">title^30 introduction^15 body^10</str>
       <str name="ps">0</str>
     </lst>
   </requestHandler>
 
  Executing the following link:
  http://localhost:8983/solr/distrib/select?q=term1 NOT
  term2&start=0&rows=0&qt=edismax_basic&debugQuery=true
 
  gives me as debuginfo:
 
  <str name="parsedquery">
  (+(DisjunctionMaxQuery((body:term1)) -DisjunctionMaxQuery((body:term2)))
  DisjunctionMaxQuery((title:"term1 term2"^30.0))
  DisjunctionMaxQuery((introduction:"term1 term2"^15.0))
  DisjunctionMaxQuery((body:"term1 term2"^10.0)))/no_coord
  </str>
 
  My question is: why is term2 included in the phrase query part?
 
  Best
  Geert Van Huychem
 



Re: Need help for integrating solr-4.5.1 with UIMA

2014-02-07 Thread Jack Krupansky
The UIMA component is not very error-friendly - NPE gets thrown for missing 
or misspelled parameter names. Basically, you have to look at the source 
code based on that stack trace to find out which parameter was missing.


-- Jack Krupansky

-Original Message- 
From: rashi gandhi

Sent: Friday, February 7, 2014 8:32 AM
To: solr-user@lucene.apache.org ; u...@uima.apache.org
Subject: Re: Need help for integrating solr-4.5.1 with UIMA

Hi,

I tried almost all combinations in solrconfig.xml for using UIMA with solr.
But each time i am indexing data to solr, getting this excpetion

113701 [http-bio-8080-exec-1] ERROR org.apache.solr.core.SolrCore  û
org.apache.solr.common.SolrException:
org.apache.uima.resource.ResourceInitializationException

   at
org.apache.solr.uima.processor.UIMAUpdateRequestProcessorFactory.getInstance(UIMAUpdateRequestProcessorFactory.java:64)

   at
org.apache.solr.update.processor.UpdateRequestProcessorChain.createProcessor(UpdateRequestProcessorChain.java:204)

   at
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:60)

   at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)

   at org.apache.solr.core.SolrCore.execute(SolrCore.java:1859)

   at
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:703)

   at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:406)

   at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:195)

   at
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)

   at
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)

   at
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:222)

   at
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:123)

   at
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:171)

   at
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:99)

   at
org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:953)

   at
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)

   at
org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:408)

   at
org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1008)

   at
org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:589)

   at
org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:310)

   at
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)

   at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)

   at java.lang.Thread.run(Thread.java:619)

Caused by: org.apache.uima.resource.ResourceInitializationException

   at
org.apache.lucene.analysis.uima.ae.BasicAEProvider.getAE(BasicAEProvider.java:58)

   at
org.apache.solr.uima.processor.UIMAUpdateRequestProcessorFactory.getInstance(UIMAUpdateRequestProcessorFactory.java:61)

   ... 22 more

Caused by: java.lang.NullPointerException

   at
org.apache.uima.util.XMLInputSource.init(XMLInputSource.java:118)

   at
org.apache.lucene.analysis.uima.ae.BasicAEProvider.getInputSource(BasicAEProvider.java:84)

   at
org.apache.lucene.analysis.uima.ae.BasicAEProvider.getAE(BasicAEProvider.java:50)

   ... 23 more



113873 [http-bio-8080-exec-1] INFO  org.apache.solr.core.SolrCore  û
[collection1] webapp=/solr path=/update params={version=2.2} status=500
QTime=203



113888 [http-bio-8080-exec-1] ERROR
org.apache.solr.servlet.SolrDispatchFilter  û
null:org.apache.solr.common.SolrException: org.apache.uima.resource.

ResourceInitializationException

   at
org.apache.solr.uima.processor.UIMAUpdateRequestProcessorFactory.getInstance(UIMAUpdateRequestProcessorFactory.java:64)

   at
org.apache.solr.update.processor.UpdateRequestProcessorChain.createProcessor(UpdateRequestProcessorChain.java:204)

   at
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:60)

   at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)

   at org.apache.solr.core.SolrCore.execute(SolrCore.java:1859)

   at
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:703)

   at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:406)

   at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:195)

   at
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)

   at
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)

   at

SOLR 4.6 and Highlight Snippets with spannear

2014-02-07 Thread Puneet Pawaia
Hi.
I am using Solr 4.6 with the XmlQueryParser from Jira. I have noticed that if I
have a SpanNear query, then no highlight snippets are returned. I have tried
both the regular highlighter and the fast vector highlighter.
Is there any limitation of the highlighters with respect to SpanNear
queries?
Regards
Puneet


Swap space,JVM-Memory,Physical memory on Solr Admin UI explanation

2014-02-07 Thread Vijay Balakrishnan
Hi,

I am using Solr 4.6.1 on a Windows 7 server right now with 32 GB RAM. I have a
SolrCloud with 3 shards, 2 replicas and an embedded ZooKeeper on the one box.
I have allocated 5 GB of heap (-Xmx) to each Solr instance, starting up with
-XX:MaxNewSize=1636m.

I see Swap space (32.5 GB/64 GB), JVM-Memory (521.1 MB/4.73 GB) and Physical
memory (11.07 GB/32 GB) on the Solr Admin UI. That usage is confusing me. The
swap space was going up when indexing 15 million documents, but not the JVM
memory (which went up to a max of 1.1 GB or so). So does that mean I don't
need to allocate that much RAM for each Solr instance?

Could someone explain the three terms clearly in terms of their use in
indexing and querying: Swap space, JVM memory and Physical memory?


TIA,
Vijay


Re: Tf-Idf for a specific query

2014-02-07 Thread Mikhail Khludnev
David,

I can imagine that DF for a result set is exactly what facets give you!
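
Concretely, a rough SolrJ sketch of that idea: run the user's query with rows=0
and facet on the tokenized field; each facet count is then the number of
documents within that result set containing the term, i.e. the query-restricted
DF. The query, field name and limits below are placeholders, and note that
faceting on a high-cardinality tokenized field can be expensive.

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.FacetField;
import org.apache.solr.client.solrj.response.QueryResponse;

public class ResultSetDfSketch {
    public static void main(String[] args) throws Exception {
        HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr/collection1");

        SolrQuery q = new SolrQuery("category:news"); // the query defining the 50K subset
        q.setRows(0);                                 // only the facet counts are needed
        q.setFacet(true);
        q.addFacetField("body");                      // tokenized text field
        q.setFacetMinCount(1);
        q.setFacetLimit(100);                         // top 100 terms by in-result-set DF

        QueryResponse rsp = solr.query(q);
        FacetField body = rsp.getFacetField("body");
        for (FacetField.Count c : body.getValues()) {
            // c.getCount() = number of matching documents that contain the term
            System.out.println(c.getName() + " df-in-resultset=" + c.getCount());
        }
    }
}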


On Fri, Feb 7, 2014 at 11:26 PM, David Miller davthehac...@gmail.comwrote:

 Hi Mikhail,

 The DF seems to be based on the entire document set. What I require is
 based on a the results of a single query.

 Suppose my Solr query returns a set of 50K documents from a superset of
 10Million documents, I require to calculate the DF just based on the 50K
 documents. But currently it seems to be calculated on the entire doc set.

 So, is there any way to get the DF or IDF just on basis of the docs
 returned by the query?

 Regards,
 Dave







 On Fri, Feb 7, 2014 at 5:15 AM, Mikhail Khludnev 
 mkhlud...@griddynamics.com
  wrote:

  Hello Dave
  you can get DF from http://wiki.apache.org/solr/TermsComponent (invert
 it
  yourself)
  then, for certain term you can get number of occurrences per document by
  http://wiki.apache.org/solr/FunctionQuery#tf
 
 
 
  On Fri, Feb 7, 2014 at 3:58 AM, David Miller davthehac...@gmail.com
  wrote:
 
   Hi Guys..
  
   I require to obtain Tf-idf score from Solr for a certain set of
  documents.
   But the catch is that, I needs the IDF (or DF) to be calculated on the
   documents returned by the specific query and not the entire corpus.
  
   Please provide me some hint on whether Solr has this feature or if I
 can
   use the Lucene Api directly to achieve this.
  
  
   Thanks in advance,
   Dave
  
 
 
 
  --
  Sincerely yours
  Mikhail Khludnev
  Principal Engineer,
  Grid Dynamics
 
  http://www.griddynamics.com
   mkhlud...@griddynamics.com
 




-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

http://www.griddynamics.com
 mkhlud...@griddynamics.com


Inconsistent results in a distributed configuration

2014-02-07 Thread felipefrbr
I'm getting inconsistent results in a distributed configuration.

Using the stats command over a single core containing about 3 million docs I
got 452660794509326.7 (a double-type field).

On the other hand, when partitioning the data into 2 or 4 cores I am getting
a different result: 452660794509325.4.

Has anyone faced the same problem? Is it a misconfiguration or a bug? Any
hints?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Inconsistent-results-in-a-distributed-configuration-tp4116061.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Tf-Idf for a specific query

2014-02-07 Thread David Miller
Hi Mikhail,

The DF seems to be based on the entire document set. What I require is
based on the results of a single query.

Suppose my Solr query returns a set of 50K documents from a superset of
10Million documents, I require to calculate the DF just based on the 50K
documents. But currently it seems to be calculated on the entire doc set.

So, is there any way to get the DF or IDF just on basis of the docs
returned by the query?

Regards,
Dave







On Fri, Feb 7, 2014 at 5:15 AM, Mikhail Khludnev mkhlud...@griddynamics.com
 wrote:

 Hello Dave
 you can get DF from http://wiki.apache.org/solr/TermsComponent (invert it
 yourself)
 then, for certain term you can get number of occurrences per document by
 http://wiki.apache.org/solr/FunctionQuery#tf



 On Fri, Feb 7, 2014 at 3:58 AM, David Miller davthehac...@gmail.com
 wrote:

  Hi Guys..
 
  I require to obtain Tf-idf score from Solr for a certain set of
 documents.
  But the catch is that, I needs the IDF (or DF) to be calculated on the
  documents returned by the specific query and not the entire corpus.
 
  Please provide me some hint on whether Solr has this feature or if I can
  use the Lucene Api directly to achieve this.
 
 
  Thanks in advance,
  Dave
 



 --
 Sincerely yours
 Mikhail Khludnev
 Principal Engineer,
 Grid Dynamics

 http://www.griddynamics.com
  mkhlud...@griddynamics.com



After upgrading indexer to SolrJ 4.6.1: o.a.solr.servlet.SolrDispatchFilter - Unknown type 19

2014-02-07 Thread Brett Hoerner
I have Solr 4.6.1 on the server and just upgraded my indexer app to SolrJ
4.6.1, and indexing ceased (the indexer returned "No live servers" for the
shard, but the real root cause from the Solr servers is below). Note that
SolrJ 4.6.1 is fine for the query side, just not for adding documents.



21:35:21.508 [qtp1418442930-22296231] ERROR
o.a.solr.servlet.SolrDispatchFilter - null:java.lang.RuntimeException:
Unknown type 19
at
org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:232)
at
org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readOuterMostDocIterator(JavaBinUpdateRequestCodec.java:139)
at
org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readIterator(JavaBinUpdateRequestCodec.java:131)
at
org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:223)
at
org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readNamedList(JavaBinUpdateRequestCodec.java:116)
at
org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:188)
at
org.apache.solr.common.util.JavaBinCodec.unmarshal(JavaBinCodec.java:114)
at
org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec.unmarshal(JavaBinUpdateRequestCodec.java:158)
at
org.apache.solr.handler.loader.JavabinLoader.parseAndLoadDocs(JavabinLoader.java:99)
at
org.apache.solr.handler.loader.JavabinLoader.load(JavabinLoader.java:58)
at
org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92)
at
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1859)
at
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:721)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:417)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:201)
at
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
at
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
at
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
at
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
at
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
at
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
at
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
at
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
at
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
at
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
at
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
at org.eclipse.jetty.server.Server.handle(Server.java:368)
at
org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
at
org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
at
org.eclipse.jetty.server.AbstractHttpConnection.content(AbstractHttpConnection.java:953)
at
org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.content(AbstractHttpConnection.java:1014)
at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:953)
at
org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:240)
at
org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
at
org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
at
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
at
org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
at java.lang.Thread.run(Thread.java:724)


Re: After upgrading indexer to SolrJ 4.6.1: o.a.solr.servlet.SolrDispatchFilter - Unknown type 19

2014-02-07 Thread Mark Miller
Hey, yeah, blew it on this one. Someone just reported it the other day - the 
way that a bug was fixed was not back and forward compatible. The first 
implementation was wrong.  

You have to update the other nodes to 4.6.1 as well.  

I’m going to look at some scripting test that can help check for this type of 
thing.

- Mark  

http://about.me/markrmiller



On Feb 7, 2014, 7:01:24 PM, Brett Hoerner br...@bretthoerner.com wrote:
I have Solr 4.6.1 on the server and just upgraded my indexer app to SolrJ
4.6.1 and indexing ceased (indexer returned No live servers for shard but
the real root from the Solr servers is below). Note that SolrJ 4.6.1 is
fine for the query side, just not adding documents.



21:35:21.508 [qtp1418442930-22296231] ERROR
o.a.solr.servlet.SolrDispatchFilter - null:java.lang.RuntimeException:
Unknown type 19
at
org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:232)
at
org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readOuterMostDocIterator(JavaBinUpdateRequestCodec.java:139)
at
org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readIterator(JavaBinUpdateRequestCodec.java:131)
at
org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:223)
at
org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readNamedList(JavaBinUpdateRequestCodec.java:116)
at
org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:188)
at
org.apache.solr.common.util.JavaBinCodec.unmarshal(JavaBinCodec.java:114)
at
org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec.unmarshal(JavaBinUpdateRequestCodec.java:158)
at
org.apache.solr.handler.loader.JavabinLoader.parseAndLoadDocs(JavabinLoader.java:99)
at
org.apache.solr.handler.loader.JavabinLoader.load(JavabinLoader.java:58)
at
org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92)
at
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1859)
at
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:721)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:417)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:201)
at
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
at
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
at
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
at
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
at
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
at
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
at
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
at
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
at
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
at
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
at
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
at org.eclipse.jetty.server.Server.handle(Server.java:368)
at
org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
at
org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
at
org.eclipse.jetty.server.AbstractHttpConnection.content(AbstractHttpConnection.java:953)
at
org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.content(AbstractHttpConnection.java:1014)
at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:953)
at
org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:240)
at
org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
at
org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
at
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
at
org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
at java.lang.Thread.run(Thread.java:724)


Re: After upgrading indexer to SolrJ 4.6.1: o.a.solr.servlet.SolrDispatchFilter - Unknown type 19

2014-02-07 Thread Brett Hoerner
On Fri, Feb 7, 2014 at 6:15 PM, Mark Miller markrmil...@gmail.com wrote:

 You have to update the other nodes to 4.6.1 as well.


I'm not sure I follow, all of the Solr instances in the cluster are 4.6.1
to my knowledge?

Thanks,
Brett


Index a new record in MySQL

2014-02-07 Thread PeriS
Hi,

How do I approach the issue of firing the DIH without having it index the whole
DB when adding a new record? It appears that when a new record is added, the
delta query on the DIH doesn't pick up the record, and I don't want to run a
full index of the DB when adding a single row. Any suggestions, please?
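
For what it's worth, the delta import can be fired from code without a full
rebuild by sending command=delta-import to the /dataimport handler; whether it
picks up the new row still depends on the deltaQuery in the DIH config
(typically a WHERE clause on a last-modified column compared against
${dataimporter.last_index_time}). A rough SolrJ sketch, assuming the usual
default handler path and params:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;

public class DeltaImportTrigger {
    public static void main(String[] args) throws Exception {
        HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr/collection1");

        SolrQuery req = new SolrQuery();
        req.setRequestHandler("/dataimport"); // DIH handler as registered in solrconfig.xml
        req.set("command", "delta-import");   // only rows matched by deltaQuery/deltaImportQuery
        req.set("clean", "false");            // don't wipe the index first
        req.set("commit", "true");

        System.out.println(solr.query(req));
    }
}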

Thanks
-Peri





*** DISCLAIMER *** This is a PRIVATE message. If you are not the intended 
recipient, please delete without copying and kindly advise us by e-mail of the 
mistake in delivery.
NOTE: Regardless of content, this e-mail shall not operate to bind HTC Global 
Services to any order or other contract unless pursuant to explicit written 
agreement or government initiative expressly permitting the use of e-mail for 
such purpose.




Re: Tf-Idf for a specific query

2014-02-07 Thread David Miller
Thanks Mikhail,

It seems that this was what I was looking for. Being new to this, I wasn't
aware of such a use of facets.

Now I can probably combine the term vectors and facets to fit my scenario.

Regards,
Dave


On Fri, Feb 7, 2014 at 2:43 PM, Mikhail Khludnev mkhlud...@griddynamics.com
 wrote:

 David,

 I can imagine that DF for resultset is facets!


 On Fri, Feb 7, 2014 at 11:26 PM, David Miller davthehac...@gmail.com
 wrote:

  Hi Mikhail,
 
  The DF seems to be based on the entire document set. What I require is
  based on a the results of a single query.
 
  Suppose my Solr query returns a set of 50K documents from a superset of
  10Million documents, I require to calculate the DF just based on the 50K
  documents. But currently it seems to be calculated on the entire doc set.
 
  So, is there any way to get the DF or IDF just on basis of the docs
  returned by the query?
 
  Regards,
  Dave
 
 
 
 
 
 
 
  On Fri, Feb 7, 2014 at 5:15 AM, Mikhail Khludnev 
  mkhlud...@griddynamics.com
   wrote:
 
   Hello Dave
   you can get DF from http://wiki.apache.org/solr/TermsComponent (invert
  it
   yourself)
   then, for certain term you can get number of occurrences per document
 by
   http://wiki.apache.org/solr/FunctionQuery#tf
  
  
  
   On Fri, Feb 7, 2014 at 3:58 AM, David Miller davthehac...@gmail.com
   wrote:
  
Hi Guys..
   
I require to obtain Tf-idf score from Solr for a certain set of
   documents.
But the catch is that, I needs the IDF (or DF) to be calculated on
 the
documents returned by the specific query and not the entire corpus.
   
Please provide me some hint on whether Solr has this feature or if I
  can
use the Lucene Api directly to achieve this.
   
   
Thanks in advance,
Dave
   
  
  
  
   --
   Sincerely yours
   Mikhail Khludnev
   Principal Engineer,
   Grid Dynamics
  
   http://www.griddynamics.com
mkhlud...@griddynamics.com
  
 



 --
 Sincerely yours
 Mikhail Khludnev
 Principal Engineer,
 Grid Dynamics

 http://www.griddynamics.com
  mkhlud...@griddynamics.com