Is the term~ effect available as an eDisMax param or a TokenFilter?

2014-07-02 Thread Alexandre Rafalovitch
Hello,

I am trying to match names. In the UI, I can do it by typing name~ or
name~2, but I can't expect users to do that, and I don't want to do
pre-tokenization in the middleware to inject it. Also, only specific
fields are names; people can also enter phone numbers, which I don't
want to fuzz when matching their fields.

I thought eDisMax allowed this to be specified as part of 'fl' (fl=SURNAME~1
FIRSTNAME~1), but that does not seem to work. I know there are other
parameters that do take that syntax, but they all seem to be for phrase
distance, not fuzziness.

So, the question is: is the same algorithm (Levenshtein distance?)
available in some other way, like a TokenFilter? I know there are other
name-munging filters there (like Metaphone), but I was curious
specifically about the equivalent one.
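
(For concreteness, what works today is a hand-written lucene-syntax query
with per-field fuzzy terms - the field and term values below are only
hypothetical examples:

    q=SURNAME:smith~1 OR FIRSTNAME:jon~1

The goal is to get that same Levenshtein-style expansion applied
automatically, without users or middleware having to append the ~ operator.)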

Regards,
   Alex.
Personal website: http://www.outerthoughts.com/
Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency


Re: Integrating solr with Hadoop

2014-07-02 Thread gurunath
Thanks Eric,

I will watch out for the MapReduce option. It would be helpful if I could get
any links on setting up Hadoop with Solr.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Integrating-solr-with-Hadoop-tp4144715p4145157.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: NPE when using facets with the MLT handler.

2014-07-02 Thread Markus Jelsma
Hi, I don't think this is ever going to work with the MLT handler; you should
use the regular SearchHandler instead.
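
For example, a sketch using the MoreLikeThis search component on a regular
/select handler (the field names come from your query below; mlt.count is
just an illustrative value):

    q=id:XXX&mlt=true&mlt.fl=mlt_field&mlt.count=10&facet=true&facet.field=id

There, faceting runs through the normal SearchComponent pipeline rather than
the MLT handler's own code path.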
 
 
-Original message-
 From:SafeJava T t...@safejava.com
 Sent: Monday 30th June 2014 17:52
 To: solr-user@lucene.apache.org
 Subject: NPE when using facets with the MLT handler.
 
 I am getting an NPE when using facets with the MLT handler. I googled for
 other NPE errors with facets, but this trace looked different from the ones
 I found. We are using Solr 4.9-SNAPSHOT.
 
 I have reduced the query to the most basic form I can:
 
 q=id:XXX&mlt.fl=mlt_field&facet=true&facet.field=id
 I changed it to facet on id, to ensure that the field was present in all
 results.
 
 Any ideas on how to work around this?
 
 
 java.lang.NullPointerException at
 org.apache.solr.search.facet.SimpleFacets.addFacets(SimpleFacets.java:375)
 at
 org.apache.solr.handler.MoreLikeThisHandler.handleRequestBody(MoreLikeThisHandler.java:211)
 at
 org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
 at org.apache.solr.core.SolrCore.execute(SolrCore.java:1955) at
 org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:769)
 at
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:418)
 at
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:207)
 at
 org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
 at
 org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
 at
 org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
 at
 org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
 at
 org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
 at
 org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
 at
 org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
 at
 org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
 at
 org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
 at
 org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
 at
 org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
 at
 org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
 at
 org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
 at org.eclipse.jetty.server.Server.handle(Server.java:368) at
 org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
 at
 org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
 at
 org.eclipse.jetty.server.AbstractHttpConnection.content(AbstractHttpConnection.java:953)
 at
 org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.content(AbstractHttpConnection.java:1014)
 at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:861) at
 org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:240) at
 org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
 at
 org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
 at
 org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
 at
 org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
 at java.lang.Thread.run(Thread.java:744)
 
 Thanks,
 Tom
 


RE: Memory Leaks in solr 4.8.1

2014-07-02 Thread Markus Jelsma
Hi, you can safely ignore this, it is shutting down anyway. Just don't reload 
the app a lot of times without actually restarting Tomcat. 
 
-Original message-
 From:Aman Tandon amantandon...@gmail.com
 Sent: Wednesday 2nd July 2014 7:22
 To: solr-user@lucene.apache.org
 Subject: Memory Leaks in solr 4.8.1
 
 Hi,
 
 When I am shutting down Solr I am getting the memory leak error in the logs.
 
 Jul 02, 2014 10:49:10 AM org.apache.catalina.loader.WebappClassLoader
  checkThreadLocalMapForLeaks
  SEVERE: The web application [/solr] created a ThreadLocal with key of type
  [org.apache.solr.schema.DateField.ThreadLocalDateFormat] (value
  [org.apache.solr.schema.DateField$ThreadLocalDateFormat@1d987b2]) and a
  value of type [org.apache.solr.schema.DateField.ISO8601CanonicalDateFormat]
  (value 
  [org.apache.solr.schema.DateField$ISO8601CanonicalDateFormat@6b2ed43a])
  but failed to remove it when the web application was stopped. Threads are
  going to be renewed over time to try and avoid a probable memory leak.
 
 
 Please check.
 With Regards
 Aman Tandon
 


Re: Understanding fieldNorm differences between 3.6.1 and 4.9 solrs

2014-07-02 Thread Aaron Daubman
Wow - so apparently I have terrible recall and should re-read the thread I
started on this same topic when upgrading from 1.4 to 3.6, where I hit a very
similar fieldNorm issue almost two years ago! =)
http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201207.mbox/%3CCALyTvnpwZMj4zxPbK0abVpnyRJny=qauijdqmj7e3zgnv7u...@mail.gmail.com%3E

In the meantime, I'm still happy to hear any new thoughts / suggestions on
keeping similarity consistent across upgrades.
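
(For reference, DefaultSimilarity computes the length norm as 1/sqrt(numTerms)
and then quantizes it into a single byte, so only coarse steps are
representable: 0.125 = 1/sqrt(64), and 0.109375 is the adjacent encodable step
below it. So the two versions are effectively seeing slightly different field
lengths for the same field - one plausible cause being a change in whether
same-position (overlapping) tokens count toward the length, i.e. the
discountOverlaps setting.)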

Thanks again,
   Aaron


On Tue, Jul 1, 2014 at 11:14 PM, Aaron Daubman daub...@gmail.com wrote:

 In trying to determine some subtle scoring differences (causing
 occasionally significant ordering differences) among search results, I
 wrote a parser to normalize debug.explain.structured JSON output.

 It appears that every score that differs comes down to a difference
 in fieldNorm, where the 3.6.1 Solr uses 0.109375 as the fieldNorm and
 the 4.9 Solr uses 0.125. [1]

 What would be causing the different versions to use different field norms
 (and rather infrequently, as the majority of scores are identical as
 desired)?

 Thanks,
   Aaron

 [1] Here is one such difference, from the output of my
 debug.explain.structured normalizer, condensed into a side-by-side view:

                                             3.6.1          4.9
 06808040cd523a296abaf26025148c85:
   product of:                               0.839616605    0.854748135
     sum of:                                 2.623802       2.67108801
       weight(t_style:alternative ...):      0.0644619693   0.0736708307
         queryWeight:                        0.0629802298   0.0629802298
           idf(137871):                      4.18500798     4.18500798
         fieldWeight:                        1.02352709     1.1697453
           tf(freq=5):                       2.23606799     2.23606799
           idf(137871):                      4.18500798     4.18500798
           fieldNorm:                        0.109375       0.125



Re: How to integrate nlp in solr

2014-07-02 Thread parnab kumar
Aman,

  I feel focusing on the Question-Answering and Information Extraction
components of NLP should help you achieve what you are looking for. Go
through the book *Taming Text* (http://www.manning.com/ingersoll/).
Most of your questions should be answered there, including details on
implementation and sample source code.



To state it naively: NLP tools give you the power to extract or interpret
knowledge from text, which you then store in the Lucene index in the form of
fields, or store along with the terms using payloads. At query-processing
time, you similarly gather additional knowledge from the query (using
techniques like query expansion, relevance feedback, or ontologies) and map
that knowledge onto the knowledge gained from the text. It's an effort to
move to semantic retrieval rather than simple term matching.
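
As a tiny hypothetical illustration (all field names invented): an extraction
step might turn the product text "blue college bag" into structured fields on
the document,

    {id: p1, title: "blue college bag", item_s: "bag", color_s: "blue"}

and query-side processing could then map "I want blue color college bags" to
something like

    q=item_s:bag AND color_s:blue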

Thanks,
Parnab


On Wed, Jul 2, 2014 at 6:29 AM, Aman Tandon amantandon...@gmail.com wrote:

 Hi Alex,

 Thanks alex, one more thing i want to ask that so do we need to add the
 extra fields for those entities, e.g. Item (bags), color (blue), etc.

 If some how i managed to implement this nlp then i will definitely publish
 it on my blog :)

 With Regards
 Aman Tandon


 On Wed, Jul 2, 2014 at 10:34 AM, Alexandre Rafalovitch arafa...@gmail.com
 
 wrote:

  Not from me, no. I don't have any real examples for this ready. I
  suspect the path beyond the basics is VERY dependent on your data and
  your business requirements.
 
  I would start from thinking how would YOU (as a human) do that match.
  Where does the 'blue' and 'color' and 'college' and 'bags' come from.
  Then, figuring out what is required for Solr to know to look there.
 
  NLP is not magic, just advanced technology. You need to know where you
  are going to get there.
 
  Regards,
 Alex.
  Personal website: http://www.outerthoughts.com/
  Current project: http://www.solr-start.com/ - Accelerating your Solr
  proficiency
 
 
  On Wed, Jul 2, 2014 at 11:35 AM, Aman Tandon amantandon...@gmail.com
  wrote:
   Any help here
  
   With Regards
   Aman Tandon
  
  
   On Mon, Jun 30, 2014 at 11:00 PM, Aman Tandon amantandon...@gmail.com
 
   wrote:
  
   Hi Alex,
  
   I was try to get knowledge from these tutorials
   http://www.slideshare.net/teofili/natural-language-search-in-solr 
   https://wiki.apache.org/solr/OpenNLP: this one is kinda bit
 explaining
   but the real demo is not present.
   e.g. query: I want blue color college bags, then how using nlp it will
   work and how it will search, there is no such brief explanation out
  there,
   i will be thankful to you if you can help me in this.
  
   With Regards
   Aman Tandon
  
  
   On Mon, Jun 30, 2014 at 6:38 AM, Alexandre Rafalovitch 
  arafa...@gmail.com
wrote:
  
   On Sun, Jun 29, 2014 at 10:19 PM, Aman Tandon 
 amantandon...@gmail.com
  
   wrote:
the appropriate results
   What are those specifically? You need to be a bit more precise about
   what you are trying to achieve. Otherwise, there are too many NLP
   branches and too many approaches.
  
   Regards,
  Alex.
   Personal website: http://www.outerthoughts.com/
   Current project: http://www.solr-start.com/ - Accelerating your Solr
   proficiency
  
  
  
 



OCR - Saving multi-term position

2014-07-02 Thread Manuel Le Normand
Hello,
Many of our indexed documents are scanned, OCR'ed documents.
Unfortunately we were not able to improve the OCR quality much (less than
80% word accuracy) for various reasons, a fact which badly hurts
retrieval quality.

As we use an open-source OCR, we are thinking of expanding every scanned term
in the output into its most likely variations to get a higher level of
confidence.

Is there any analyser that supports this kind of need, or should I make up a
syntax and analyser of my own, i.e. the payload syntax?

The quick brown fox --> The|1 Tlne|1 quick|2 quiok|2 browm|3 brown|3 fox|4

Thanks,
Manuel


RE: Endeca to Solr Migration

2014-07-02 Thread Dyer, James
We migrated a big application from Endeca (6.0, I think) several years ago.
We were not using any of the business UI tools, but we found that Solr is a lot
more flexible and performant than Endeca. But with more flexibility comes more
you need to know.

The hardest thing was to migrate the Endeca dimensions to Solr facets. We had
endeca-api-specific dependencies throughout the application, even in the
presentation layer. We ended up writing a bridge API that allowed us to keep
our Endeca-specific code and translate the queries into Solr queries. We are
storing a cross-reference between the "N" values from Endeca and key/value
pairs, to translate something like N=4000 into fq=Language:English. With Solr,
there is more you need to do in your app that the backend doesn't manage for
you. In the end, though, it lets you separate your concerns better.

James Dyer
Ingram Content Group
(615) 213-4311


-Original Message-
From: mrg81 [mailto:maya...@gmail.com] 
Sent: Saturday, June 28, 2014 1:11 PM
To: solr-user@lucene.apache.org
Subject: Endeca to Solr Migration

Hello --

I wanted to get some details on an Endeca to Solr migration. I am
interested in a few topics:

1. We would like to migrate the faceted navigation, boosting of individual
records, and a few other items.
2. But the biggest question is about the UI [Experience Manager] - I have
not found a tool that comes close to Experience Manager. I did read about
Hue [in response to Gareth's question on migration], but it seems that we
will have to do a lot of customization to use it.

Questions:

1. Is there a UI that we can use? Is it possible to un-hook the Experience
Manager UI and point it at Solr?
2. How long does a typical migration take, assuming that we have to migrate
the faceted navigation and boosted records?

Thanks



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Endeca-to-Solr-Migration-tp4144582.html
Sent from the Solr - User mailing list archive at Nabble.com.




Re: OCR - Saving multi-term position

2014-07-02 Thread Michael Della Bitta
I don't have first-hand knowledge of how you'd implement that, but I bet a
look at the WordDelimiterFilter would help you understand how to emit
multiple terms at the same position pretty easily.

I've heard of this "bag of word variants" approach to indexing poor-quality
OCR output before, for findability reasons, and I hear it works out OK.

Michael Della Bitta

Applications Developer

o: +1 646 532 3062

appinions inc.

“The Science of Influence Marketing”

18 East 41st Street

New York, NY 10017

t: @appinions https://twitter.com/Appinions | g+:
plus.google.com/appinions
https://plus.google.com/u/0/b/112002776285509593336/112002776285509593336/posts
w: appinions.com http://www.appinions.com/


On Wed, Jul 2, 2014 at 10:19 AM, Manuel Le Normand 
manuel.lenorm...@gmail.com wrote:

 Hello,
 Many of our indexed documents are scanned and OCR'ed documents.
 Unfortunately we were not able to improve much the OCR quality (less than
 80% word accuracy) for various reasons, a fact which badly hurts the
 retrieval quality.

 As we use an open-source OCR, we think of changing every scanned term
 output to it's main possible variations to get a higher level of
 confidence.

 Is there any analyser that supports this kind of need or should I make up a
 syntax and analyser of my own, i.e the payload syntax?

 The quick brown fox -- The|1 Tlne|1 quick|2 quiok|2 browm|3 brown|3 fox|4

 Thanks,
 Manuel



Customise score

2014-07-02 Thread rachun
Dear all,

Could anybody suggest how to customize the score?
So, I have data like this:

{ID : '0001', Title : 'MacBookPro', Price: 400, Base_score: '121.2'}
{ID : '0002', Title : 'MacBook', Price: 350, Base_score: '100.2'}
{ID : '0003', Title : 'Laptop', Price: 300, Base_score: '155.7'}

Notice that I have an ID field for uniqueKey.
When I query q=MacBook&sort=score+desc,
it returns something like this:

{ID : '0002', Title :'MacBook',Price: 350,Base_score:'100.2',score:1.45}
{ID : '0001', Title :'MacBookPro',Price: 400,Base_score:'121.2',score:1.11}

But I want Solr to produce the score by also adding my Base_score. The score
should be something like this:

- score = 100.2 + 1.45 = 101.65
- score = 121.2 + 1.11 = 122.31

Then the result should be something like this:

{ID : '0001', Title :'MacBookPro',Price:
400,Base_score:'121.2',score:122.31}
{ID : '0002', Title :'MacBook',Price: 350,Base_score:'100.2',score:101.65}

I'm not familiar with Java, so I can't write my own function as some people
do. What is the easiest way to do this using existing Solr functions?

Thank you very much,
Chun.




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Customise-score-tp4145214.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Customise score

2014-07-02 Thread Gora Mohanty
On 2 July 2014 20:32, rachun rachun.c...@gmail.com wrote:
 Dear all,

 Could anybody suggest me how to customize the score?
 So, I have data like this ..

 {ID : '0001', Title :'MacBookPro',Price: 400,Base_score:'121.2'}
 {ID : '0002', Title :'MacBook',Price: 350,Base_score:'100.2'}
 {ID : '0003', Title :'Laptop',Price: 300,Base_score:'155.7'}

 Notice that I have ID field for uniqueKey.
 When I query q=MacBook&sort=score+desc
 it will return result something like this

 {ID : '0002', Title :'MacBook',Price: 350,Base_score:'100.2',score:1.45}
 {ID : '0001', Title :'MacBookPro',Price: 400,Base_score:'121.2',score:1.11}

 But I want solr to produce score by also using my Base_score. The score
 should be something like this

 - score = 100.2 + 1.45 = 101.65
 - score = 121.2 + 1.11 = 122.31

You should use Solr's sum function query:
http://wiki.apache.org/solr/FunctionQuery#sum
q=MacBook&sort=sum(Base_score, score)+desc should do it.

Regards,
Gora


Re: Clubbing queries with different criterias together?

2014-07-02 Thread lalitjangra
Thanks Ahmet,

I tried multiple combinations and finally got it using the full query as a
nested query.

Is it fine to use a full query inside a nested query with filters via
_query_, as below?

http://localhost:8983/solr/collection1/select?q=text:sharepoint&wt=json&indent=true&AuthenticatedUserName=ljangra&_query_:select?q=text:sharepoint&wt=json&indent=true&fq:acls:(*)

Is it still more performant than using two separate queries?

Regards.





--
View this message in context: 
http://lucene.472066.n3.nabble.com/Clubbing-queries-with-different-criterias-together-tp4143829p4145217.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Customise score

2014-07-02 Thread rachun
Gora,
Firstly, I would like to thank you for your quick response.

.../select?q=MacBook&sort=SUM(base_score, score)+desc&wt=json&indent=true

I tried that, but it didn't work and I got this error message:

"error":{
    "msg":"Can't determine a Sort Order (asc or desc) in sort spec 'SUM(base_score, score) desc', pos=15",
    "code":400}}

Best Regards,
Chun




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Customise-score-tp4145214p4145216.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: OCR - Saving multi-term position

2014-07-02 Thread Erick Erickson
The problem here is that you wind up with a zillion unique terms in your
index, which may lead to performance issues - but you probably already
know that :).

I've seen situations where running it through a dictionary helps. That
is, does each term in the OCR output match some dictionary? The problem
there is that it then de-values terms that don't happen to be in the
dictionary - names, for instance.

But to answer your question: no, there really isn't a pre-built
analysis chain that I know of that does this. The root issue is how to
assign confidence, and I have no clue for your specific domain.

So payloads seem quite reasonable here. As it happens, there's a recent
end-to-end example; see:
http://searchhub.org/2014/06/13/end-to-end-payload-example-in-solr/
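
For instance, the delimited-payload filter that that example is built on
looks roughly like this in the schema (the float encoder is one choice;
integer and identity encoders also exist):

    <fieldType name="payloads" class="solr.TextField">
      <analyzer>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.DelimitedPayloadTokenFilterFactory"
                encoder="float" delimiter="|"/>
      </analyzer>
    </fieldType>

With that in place, input like quick|0.9 quiok|0.1 indexes each term with its
confidence value stored as a payload.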

Best,
Erick

On Wed, Jul 2, 2014 at 7:58 AM, Michael Della Bitta
michael.della.bi...@appinions.com wrote:
 I don't have first hand knowledge of how you implement that, but I bet a
 look at the WordDelimiterFilter would help you understand how to emit
 multiple terms with the same positions pretty easily.

 I've heard of this bag of word variants approach to indexing poor-quality
 OCR output before for findability reasons and I heard it works out OK.

 Michael Della Bitta

 Applications Developer

 o: +1 646 532 3062

 appinions inc.

 “The Science of Influence Marketing”

 18 East 41st Street

 New York, NY 10017

 t: @appinions https://twitter.com/Appinions | g+:
 plus.google.com/appinions
 https://plus.google.com/u/0/b/112002776285509593336/112002776285509593336/posts
 w: appinions.com http://www.appinions.com/


 On Wed, Jul 2, 2014 at 10:19 AM, Manuel Le Normand 
 manuel.lenorm...@gmail.com wrote:

 Hello,
 Many of our indexed documents are scanned and OCR'ed documents.
 Unfortunately we were not able to improve much the OCR quality (less than
 80% word accuracy) for various reasons, a fact which badly hurts the
 retrieval quality.

 As we use an open-source OCR, we think of changing every scanned term
 output to it's main possible variations to get a higher level of
 confidence.

 Is there any analyser that supports this kind of need or should I make up a
 syntax and analyser of my own, i.e the payload syntax?

 The quick brown fox -- The|1 Tlne|1 quick|2 quiok|2 browm|3 brown|3 fox|4

 Thanks,
 Manuel



Re: OCR - Saving multi-term position

2014-07-02 Thread Manuel Le Normand
Thanks for your answers Erick and Michael.

The term confidence level is an OCR output metric which tells, for every
word, the odds that it is the actual scanned term. I would like the OCR
program to output all the suspected words that together sum to above ~90%
confidence that one of them is the actual term, instead of outputting a
single word as it does by default.

I'm happy to hear this approach has been used before; I will implement an
analyser that indexes these terms at the same position, to enable positional
queries.
Hope it works out well. If it does, I will open a Jira ticket for it.

If anyone else has had experience with this use case, I'd love to hear about
it.

Manuel


On Wed, Jul 2, 2014 at 7:28 PM, Erick Erickson erickerick...@gmail.com
wrote:

 Problem here is that you wind up with a zillion unique terms in your
 index, which may lead to performance issues, but you probably already
 know that :).

 I've seen situations where running it through a dictionary helps. That
 is, does each term in the OCR match some dictionary? Problem here is
 that it then de-values terms that don't happen to be in the
 dictionary, names for instance.

 But to answer your question: No, there really isn't a pre-built
 analysis chain that i know of that does this. Root issue is how to
 assign confidence? No clue for your specific domain.

 So payloads seem quite reasonable here. Happens there's a recent
 end-to-end example, see:
 http://searchhub.org/2014/06/13/end-to-end-payload-example-in-solr/

 Best,
 Erick

 On Wed, Jul 2, 2014 at 7:58 AM, Michael Della Bitta
 michael.della.bi...@appinions.com wrote:
  I don't have first hand knowledge of how you implement that, but I bet a
  look at the WordDelimiterFilter would help you understand how to emit
  multiple terms with the same positions pretty easily.
 
  I've heard of this bag of word variants approach to indexing
 poor-quality
  OCR output before for findability reasons and I heard it works out OK.
 
  Michael Della Bitta
 
  Applications Developer
 
  o: +1 646 532 3062
 
  appinions inc.
 
  “The Science of Influence Marketing”
 
  18 East 41st Street
 
  New York, NY 10017
 
  t: @appinions https://twitter.com/Appinions | g+:
  plus.google.com/appinions
  
 https://plus.google.com/u/0/b/112002776285509593336/112002776285509593336/posts
 
  w: appinions.com http://www.appinions.com/
 
 
  On Wed, Jul 2, 2014 at 10:19 AM, Manuel Le Normand 
  manuel.lenorm...@gmail.com wrote:
 
  Hello,
  Many of our indexed documents are scanned and OCR'ed documents.
  Unfortunately we were not able to improve much the OCR quality (less
 than
  80% word accuracy) for various reasons, a fact which badly hurts the
  retrieval quality.
 
  As we use an open-source OCR, we think of changing every scanned term
  output to it's main possible variations to get a higher level of
  confidence.
 
  Is there any analyser that supports this kind of need or should I make
 up a
  syntax and analyser of my own, i.e the payload syntax?
 
  The quick brown fox -- The|1 Tlne|1 quick|2 quiok|2 browm|3 brown|3
 fox|4
 
  Thanks,
  Manuel
 



Re: Does Solr move documents between shards when the value of the shard key is updated ?

2014-07-02 Thread IJ
So - we do end up with two copies / versions of the same document (uniqueid),
one in each of the two shards. Is this a BUG or a FEATURE in Solr?

I have a follow-up question: in case one were to attempt to delete the
document - let's say using the CloudSolrServer deleteById() API - would that
attempt to delete the document in both (or all) shards? How would Solr
determine which shard / shards to run the delete against?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Does-Solr-move-documents-between-shards-when-the-value-of-the-shard-key-is-updated-tp4145043p4145237.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Migration from Autonomy IDOL to SOLR

2014-07-02 Thread wrdrvr
I know that this is an old thread, but I wanted to pass on some additional
information in blatant self-promotion.

We've just completed an IDOL to Solr migration for our e-commerce site, with
approximately 40 million items and anywhere between 200,000 and 300,000
searches per day. I am documenting some lessons learned and some product
discriminators here:
http://engineering2success.blogspot.com/




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Migration-from-Autonomy-IDOL-to-SOLR-tp3255377p4145247.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Slow QTimes - 5 seconds for Small sized Collections

2014-07-02 Thread IJ
This issue was finally resolved. Adding an explicit host-to-IP-address
mapping in the /etc/hosts file seemed to do the trick. The one strange thing
is that before the hosts file entry was made, we were unable to reproduce the
5-second delay from the Linux shell by performing a simple nslookup on the
host name. In any case, the issue now stands resolved - thanks to all.

On the other discussion item about the QTime in the SolrQueryResponse NOT
matching the QTime in the Solr.log, here is what I found:

1. If the query from CloudSolrServer hits the right node (i.e. one containing
the shard with the desired dataset), then the QTimes match.

2. If the query from CloudSolrServer hits a node (NodeX) that does NOT
contain our data, then Solr routes the request to the right node (NodeY) to
fetch the data. In such situations, QTime is logged on both nodes that the
query passes through - albeit with different values. The QTime logged on
NodeX matches what we see in the SolrQueryResponse, and this time includes
the time for inter-node communication between NodeX and NodeY.

In essence this means that the QTime in SolrQueryResponse is NOT always a
representation of the query time - but could include time spent for
inter-node communication.

P.S. All of the above statements were made in the context of a sharding
strategy that co-locates a single customer's documents in a single shard.

Here is a short wishlist based on the experience in debugging this issue:
1. Wish the SolrQueryResponse could contain a list of node names /
shard-replica names that a request passed through while processing the query
(when debug is turned ON)
turned ON)
2. Wish the SolrQueryResponse could provide a breakdown of QTime on each of
the individual nodes / shard-replicas, instead of returning a single QTime
value



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Slow-QTimes-5-seconds-for-Small-sized-Collections-tp4143681p4145251.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Customise score

2014-07-02 Thread Ahmet Arslan
Hi,


Why did you use upper case? What happens when you use : sort=sum(...



On Wednesday, July 2, 2014 6:23 PM, rachun rachun.c...@gmail.com wrote:



Gora,
firstly I would like thank you for your quick response.

.../select?q=MacBook&sort=SUM(base_score, score)+desc&wt=json&indent=true

I tried that but it didn't work and I got this error message 

"error":{
    "msg":"Can't determine a Sort Order (asc or desc) in sort spec 'SUM(base_score, score) desc', pos=15",
    "code":400}}

Best Regards,
Chun




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Customise-score-tp4145214p4145216.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Customise score

2014-07-02 Thread Jack Krupansky
I think the white space after the comma is the culprit. No white space is
allowed in function queries that are embedded, such as in the sort
parameter.


-- Jack Krupansky

-Original Message- 
From: Ahmet Arslan

Sent: Wednesday, July 2, 2014 2:19 PM
To: solr-user@lucene.apache.org
Subject: Re: Customise score

Hi,


Why did you use upper case? What happens when you use : sort=sum(...



On Wednesday, July 2, 2014 6:23 PM, rachun rachun.c...@gmail.com wrote:



Gora,
firstly I would like thank you for your quick response.

.../select?q=MacBook&sort=SUM(base_score, score)+desc&wt=json&indent=true

I tried that but it didn't work and I got this error message

"error":{
   "msg":"Can't determine a Sort Order (asc or desc) in sort spec 'SUM(base_score, score) desc', pos=15",
   "code":400}}

Best Regards,
Chun




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Customise-score-tp4145214p4145216.html
Sent from the Solr - User mailing list archive at Nabble.com. 



Re: Migration from Autonomy IDOL to SOLR

2014-07-02 Thread Jack Krupansky

Thanks for posting this.

-- Jack Krupansky

-Original Message- 
From: wrdrvr

Sent: Wednesday, July 2, 2014 1:47 PM
To: solr-user@lucene.apache.org
Subject: Re: Migration from Autonomy IDOL to SOLR

I know that this is an old thread, but I wanted to pass on some additional
information in blatant self promotion.

We've just completed an IDOL to Solr migration for our e commerce site with
approximately 40 Million items and anywhere between 200,000 to 300,000
searches per day. I am documenting some lessons learned some some product
discriminators here:
http://engineering2success.blogspot.com/




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Migration-from-Autonomy-IDOL-to-SOLR-tp3255377p4145247.html
Sent from the Solr - User mailing list archive at Nabble.com. 



Solr Map Reduce Indexer Tool GoLive to SolrCloud with index on local file system

2014-07-02 Thread Tom Chen
Hi,


When we run the Solr MapReduce Indexer Tool (
https://github.com/markrmiller/solr-map-reduce-example), it generates
indexes on HDFS.

The last stage is Go Live, which merges the generated index into the live
SolrCloud index.

If the live SolrCloud writes its index to the local file system (rather than
HDFS), Go Live gives an error like this:

2014-07-02 13:41:01,518 INFO org.apache.solr.hadoop.GoLive: Live merge
hdfs://
bdvs086.test.com:9000/tmp/088-140618120223665-oozie-oozi-W/results/part-0
into http://bdvs087.test.com:8983/solr
2014-07-02 13:41:01,796 ERROR org.apache.solr.hadoop.GoLive: Error sending
live merge command
java.util.concurrent.ExecutionException:
org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException:
directory '/opt/testdir/solr/node/hdfs:/
bdvs086.test.com:9000/tmp/088-140618120223665-oozie-oozi-W/results/part-1/data/index'
does not exist
at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:233)
at java.util.concurrent.FutureTask.get(FutureTask.java:94)
at org.apache.solr.hadoop.GoLive.goLive(GoLive.java:126)
at
org.apache.solr.hadoop.MapReduceIndexerTool.run(MapReduceIndexerTool.java:867)
at
org.apache.solr.hadoop.MapReduceIndexerTool.run(MapReduceIndexerTool.java:609)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at
org.apache.solr.hadoop.MapReduceIndexerTool.main(MapReduceIndexerTool.java:596)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:60)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:37)
at java.lang.reflect.Method.invoke(Method.java:611)
at
org.apache.oozie.action.hadoop.LauncherMapper.map(LauncherMapper.java:491)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:434)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(AccessController.java:310)
at javax.security.auth.Subject.doAs(Subject.java:573)
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1502)
at org.apache.hadoop.mapred.Child.main(Child.java:249)
Caused by:
org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException:
directory '/opt/testdir/solr/node/hdfs:/
bdvs086.test.com:9000/tmp/088-140618120223665-oozie-oozi-W/results/part-1/data/index'
does not exist
at
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:495)
at
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:199)
at
org.apache.solr.client.solrj.request.CoreAdminRequest.process(CoreAdminRequest.java:493)
at org.apache.solr.hadoop.GoLive$1.call(GoLive.java:100)
at org.apache.solr.hadoop.GoLive$1.call(GoLive.java:89)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:314)
at java.util.concurrent.FutureTask.run(FutureTask.java:149)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:452)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:314)
at java.util.concurrent.FutureTask.run(FutureTask.java:149)
at
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:897)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:919)
at java.lang.Thread.run(Thread.java:738)

Is there any way to set up SolrCloud to write its index to the local file
system, while still allowing the Solr MapReduceIndexerTool's Go Live to merge
an index generated on HDFS into the SolrCloud?

Thanks,
Tom


Re: OCR - Saving multi-term position

2014-07-02 Thread Jack Krupansky
Take a look at the synonym filter as well. I mean, basically that's exactly 
what you are doing - adding synonyms at each position.
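
A minimal sketch of that idea - a synonyms file mapping OCR variants back to
the real term (entries invented from the example in this thread), wired into
the index-time analyzer:

    # ocr-variants.txt
    quiok => quiok, quick
    tlne => tlne, the
    browm => browm, brown

    <filter class="solr.SynonymFilterFactory" synonyms="ocr-variants.txt"
            ignoreCase="true" expand="true"/>

Each mapped term is emitted at the same position as the original, which is
exactly the multi-term-per-position layout described above.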


-- Jack Krupansky

-Original Message- 
From: Manuel Le Normand

Sent: Wednesday, July 2, 2014 12:57 PM
To: solr-user@lucene.apache.org
Subject: Re: OCR - Saving multi-term position

Thanks for your answers Erick and Michael.

The term confidence level is an OCR output metric which tells for every
word what are the odds it's the actual scanned term. I wish the OCR prog to
output all the suspected words that sum up to above ~90% of confidence it
is the actual term instead of outputting a single word as default behaviour.

I'm happy to hear this approach was used before, I will implement an
analyser that indexes these terms in same position to enable positional
queries.
Hope it works on well. In case it does I will open up a Jira ticket for it.

If anyone else has had experience with this use case I'd love hearing,

Manuel


On Wed, Jul 2, 2014 at 7:28 PM, Erick Erickson erickerick...@gmail.com
wrote:


Problem here is that you wind up with a zillion unique terms in your
index, which may lead to performance issues, but you probably already
know that :).

I've seen situations where running it through a dictionary helps. That
is, does each term in the OCR match some dictionary? Problem here is
that it then de-values terms that don't happen to be in the
dictionary, names for instance.

But to answer your question: No, there really isn't a pre-built
analysis chain that i know of that does this. Root issue is how to
assign confidence? No clue for your specific domain.

So payloads seem quite reasonable here. Happens there's a recent
end-to-end example, see:
http://searchhub.org/2014/06/13/end-to-end-payload-example-in-solr/

Best,
Erick

On Wed, Jul 2, 2014 at 7:58 AM, Michael Della Bitta
michael.della.bi...@appinions.com wrote:
 I don't have first hand knowledge of how you implement that, but I bet a
 look at the WordDelimiterFilter would help you understand how to emit
 multiple terms with the same positions pretty easily.

 I've heard of this bag of word variants approach to indexing
poor-quality
 OCR output before for findability reasons and I heard it works out OK.

 Michael Della Bitta

 Applications Developer

 o: +1 646 532 3062

 appinions inc.

 “The Science of Influence Marketing”

 18 East 41st Street

 New York, NY 10017

 t: @appinions https://twitter.com/Appinions | g+:
 plus.google.com/appinions
 
https://plus.google.com/u/0/b/112002776285509593336/112002776285509593336/posts

 w: appinions.com http://www.appinions.com/


 On Wed, Jul 2, 2014 at 10:19 AM, Manuel Le Normand 
 manuel.lenorm...@gmail.com wrote:

 Hello,
 Many of our indexed documents are scanned and OCR'ed documents.
 Unfortunately we were not able to improve much the OCR quality (less
than
 80% word accuracy) for various reasons, a fact which badly hurts the
 retrieval quality.

 As we use an open-source OCR, we think of changing every scanned term
 output to it's main possible variations to get a higher level of
 confidence.

 Is there any analyser that supports this kind of need or should I make
up a
 syntax and analyser of my own, i.e the payload syntax?

 The quick brown fox -- The|1 Tlne|1 quick|2 quiok|2 browm|3 brown|3
fox|4

 Thanks,
 Manuel






Re: Customise score

2014-07-02 Thread rachun
Hi Ahmet,
I also tried this:
.../select?q=MacBook&sort=sum(base_score, score)+desc&wt=json&indent=true

I got the same error:

"error":{
    "msg":"Can't determine a Sort Order (asc or desc) in sort spec 'sum(base_score, score) desc', pos=15",
    "code":400}}

Best regards,
Chun



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Customise-score-tp4145214p4145320.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Customise score

2014-07-02 Thread rachun
Hi Jack,

I tried as you suggested:

.../select?q=MacBook&sort=sum(base_score,score)+desc&wt=json&indent=true

but it didn't work and I got this error message:

"error":{
    "msg":"sort param could not be parsed as a query, and is not a field that exists in the index: sum(base_score,score)",
    "code":400}}

so, when I try something like this:

.../select?q=MacBook&sort=sum(base_score,base_score)+desc&wt=json&indent=true

it works fine.
How can I achieve this? Any ideas?

Best Regards,
Chun





--
View this message in context: 
http://lucene.472066.n3.nabble.com/Customise-score-tp4145214p4145322.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Customise score

2014-07-02 Thread Jack Krupansky
You probably don't have a field named "score". That said, the Solr error
message is not very useful at all!

If you want to reference the document score, I don't think there is a direct
way to do it, but you can do it indirectly by using the query function:

.../select?q=MacBook&sort=sum(base_score,query($q,0))+desc&wt=json&indent=true
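
Here, query($q,0) evaluates the $q parameter (your main query) as a query and
yields its relevance score for each document, with 0 as the fallback for
non-matching documents. With the numbers from your first mail, doc 0002 would
then sort by 100.2 + 1.45 = 101.65 and doc 0001 by 121.2 + 1.11 = 122.31.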

-- Jack Krupansky

-Original Message- 
From: rachun

Sent: Wednesday, July 2, 2014 7:44 PM
To: solr-user@lucene.apache.org
Subject: Re: Customise score

Hi Jack,

I tried as you suggest

.../select?q=MacBook&sort=sum(base_score,score)+desc&wt=json&indent=true

but it didn't work and I got this error message

"error":{
   "msg":"sort param could not be parsed as a query, and is not a field that exists in the index: sum(base_score,score)",
   "code":400}}

so, when I try something like this

.../select?q=MacBook&sort=sum(base_score,base_score)+desc&wt=json&indent=true

it works fine.
How can I achieve this? Any ideas?

Best Regards,
Chun





--
View this message in context: 
http://lucene.472066.n3.nabble.com/Customise-score-tp4145214p4145322.html
Sent from the Solr - User mailing list archive at Nabble.com. 



Re: OCR - Saving multi-term position

2014-07-02 Thread Koji Sekiguchi

Hi Manuel,

I think OCR error correction is one of the well-known NLP tasks.
I've thought in the past that it could be implemented using Lucene.

This is a brief idea:

1. You have got a Lucene index. This existing index is made from correct
(i.e. error-free) documents from the same domain as the OCR documents.

2. Tokenize the OCR text with ShingleTokenizer. From it, you'll get:

the quiok
tlne quick
the quick
:

3. Search those phrases in the existing index. I think exact search
(PhraseQuery) or FuzzyQuery could work. You should get the highest hit
count when searching "the quick" among those phrases.
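
A rough sketch of step 2 as a Solr analyzer chain (the factory form of
shingling; sizes chosen to match the two-word shingles in the example):

    <analyzer>
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.ShingleFilterFactory" minShingleSize="2"
              maxShingleSize="2" outputUnigrams="false"/>
    </analyzer>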

Koji
--
http://soleami.com/blog/comparing-document-classification-functions-of-lucene-and-mahout.html

(2014/07/02 7:19), Manuel Le Normand wrote:

Hello,
Many of our indexed documents are scanned and OCR'ed documents.
Unfortunately we were not able to improve much the OCR quality (less than
80% word accuracy) for various reasons, a fact which badly hurts the
retrieval quality.

As we use an open-source OCR, we think of changing every scanned term
output to it's main possible variations to get a higher level of confidence.

Is there any analyser that supports this kind of need or should I make up a
syntax and analyser of my own, i.e the payload syntax?

The quick brown fox -- The|1 Tlne|1 quick|2 quiok|2 browm|3 brown|3 fox|4

Thanks,
Manuel







Re: Does Solr move documents between shards when the value of the shard key is updated ?

2014-07-02 Thread Erick Erickson
bq: Is this a BUG or a FEATURE in Solr

How about just the way it works?

You've changed the route key while keeping the same
unique key, taking control of the routing.

When you change that routing, how is Solr to
know where the _old_ document lived? It would
have to, say, query the entire cluster for any doc
that had the given uniqueKey and delete it,
something that'd be horribly slow.

As to your follow-up question, I'm not totally sure.
I believe the delete is sent to all shards, but why
don't you test to see?

Best,
Erick


On Wed, Jul 2, 2014 at 10:22 AM, IJ jay...@gmail.com wrote:
 So - we do end up with two copies / versions of the same document (uniqueid)
 - one in each of the two shards - Is this a BUG or a FEATURE in Solr ?

 Have a follow up question - In case one were to attempt to delete the
 document -lets say usng the CloudSolrServer - deleteById() API - would that
 attempt to delete the document in both (or all) shards ? How would Solr
 determine which shard / shards to run the delete against ?



 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Does-Solr-move-documents-between-shards-when-the-value-of-the-shard-key-is-updated-tp4145043p4145237.html
 Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr Map Reduce Indexer Tool GoLive to SolrCloud with index on local file system

2014-07-02 Thread Erick Erickson
How would the MapReduceIndexerTool (MRIT for short)
find the local disk to write to from HDFS for each shard?
All it has is the information in the Solr configs, which are
usually relative paths on the local Solr machines, relative
to SOLR_HOME - which could be different on each node
(that would be screwy, but possible).

Permissions would also be a royal pain to get right.

You _can_ forgo the --go-live option, copy from
the HDFS nodes to your local drive, and then execute
the mergeindexes command; see:
https://cwiki.apache.org/confluence/display/solr/Merging+Indexes
Note that there is the standalone IndexMergeTool, but there is also
the Core Admin command.
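
A Core Admin merge of one HDFS-generated part into a local core would look
roughly like this (host, core name and path are placeholders):

    http://localhost:8983/solr/admin/cores?action=mergeindexes&core=collection1&indexDir=/local/copy/part-00000/data/index

where /local/copy/... is wherever you copied the shard's index down from
HDFS.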

The sub-indexes are in a partition in HDFS and numbered
sequentially.

Best,
Erick

On Wed, Jul 2, 2014 at 3:23 PM, Tom Chen tomchen1...@gmail.com wrote:
 Hi,


 When we run Solr Map Reduce Indexer Tool (
 https://github.com/markrmiller/solr-map-reduce-example), it generates
 indexes on HDFS

 The last stage is Go Live to merge the generated index to live SolrCloud
 index.

 If the live SolrCloud write index to local file system (rather than HDFS),
 the Go Live gives such error like this:

 2014-07-02 13:41:01,518 INFO org.apache.solr.hadoop.GoLive: Live merge
 hdfs://
 bdvs086.test.com:9000/tmp/088-140618120223665-oozie-oozi-W/results/part-0
 into http://bdvs087.test.com:8983/solr
 2014-07-02 13:41:01,796 ERROR org.apache.solr.hadoop.GoLive: Error sending
 live merge command
 java.util.concurrent.ExecutionException:
 org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException:
 directory '/opt/testdir/solr/node/hdfs:/
 bdvs086.test.com:9000/tmp/088-140618120223665-oozie-oozi-W/results/part-1/data/index'
 does not exist
 at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:233)
 at java.util.concurrent.FutureTask.get(FutureTask.java:94)
 at org.apache.solr.hadoop.GoLive.goLive(GoLive.java:126)
 at
 org.apache.solr.hadoop.MapReduceIndexerTool.run(MapReduceIndexerTool.java:867)
 at
 org.apache.solr.hadoop.MapReduceIndexerTool.run(MapReduceIndexerTool.java:609)
 at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
 at
 org.apache.solr.hadoop.MapReduceIndexerTool.main(MapReduceIndexerTool.java:596)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:60)
 at
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:37)
 at java.lang.reflect.Method.invoke(Method.java:611)
 at
 org.apache.oozie.action.hadoop.LauncherMapper.map(LauncherMapper.java:491)
 at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
 at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:434)
 at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
 at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
 at java.security.AccessController.doPrivileged(AccessController.java:310)
 at javax.security.auth.Subject.doAs(Subject.java:573)
 at
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1502)
 at org.apache.hadoop.mapred.Child.main(Child.java:249)
 Caused by:
 org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException:
 directory '/opt/testdir/solr/node/hdfs:/
 bdvs086.test.com:9000/tmp/088-140618120223665-oozie-oozi-W/results/part-1/data/index'
 does not exist
 at
 org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:495)
 at
 org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:199)
 at
 org.apache.solr.client.solrj.request.CoreAdminRequest.process(CoreAdminRequest.java:493)
 at org.apache.solr.hadoop.GoLive$1.call(GoLive.java:100)
 at org.apache.solr.hadoop.GoLive$1.call(GoLive.java:89)
 at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:314)
 at java.util.concurrent.FutureTask.run(FutureTask.java:149)
 at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:452)
 at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:314)
 at java.util.concurrent.FutureTask.run(FutureTask.java:149)
 at
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:897)
 at
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:919)
 at java.lang.Thread.run(Thread.java:738)

 Any way to setup SolrCloud to write index to local file system, while
 allowing the Solr MapReduceIndexerTool's GoLive to merge index generated on
 HDFS to the SolrCloud?

 Thanks,
 Tom


Re: CollapsingQParserPlugin throws Exception when useFilterForSortedQuery=true

2014-07-02 Thread Umesh Prasad
Created the jira ..
https://issues.apache.org/jira/browse/SOLR-6222



On 30 June 2014 23:53, Joel Bernstein joels...@gmail.com wrote:

 Sure, go ahead and create the ticket. I think there is more we can do here
 as well. I suspect we can get the CollapsingQParserPlugin to work with
 useFilterForSortedQuery=true if scoring is not needed for the collapse.
 I'll take a closer look at this.

 Joel Bernstein
 Search Engineer at Heliosearch


 On Mon, Jun 30, 2014 at 1:43 AM, Umesh Prasad umesh.i...@gmail.com
 wrote:

  Hi Joel,
  Thanks a lot for the clarification. An error message would indeed be a
  good thing. Should I open a Jira item for the same?
 
 
 
  On 28 June 2014 19:08, Joel Bernstein joels...@gmail.com wrote:
 
    OK, I see the problem. When you use
    <useFilterForSortedQuery>true</useFilterForSortedQuery>, Solr builds a
    docSet in a way that seems to be incompatible with the
    CollapsingQParserPlugin. With that setting, Solr doesn't run the main
    query again when collecting the DocSet. The getDocSetScore() method is
    expecting the main query to be present, because the
    CollapsingQParserPlugin may need the scores generated from the main
    query to select the group head.

    I think trying to make
    <useFilterForSortedQuery>true</useFilterForSortedQuery> compatible with
    the CollapsingQParserPlugin is probably not possible. So a nice error
    message would be a good thing.
  
   Joel Bernstein
   Search Engineer at Heliosearch
  
  
   On Tue, Jun 24, 2014 at 3:31 AM, Umesh Prasad umesh.i...@gmail.com
   wrote:
  
 Hi,
 Found another bug with the CollapsingQParserPlugin. Not a critical one.

 It throws an exception when used with

 <useFilterForSortedQuery>true</useFilterForSortedQuery>

 Patch attached (against 4.8.1, but reproducible in other branches also).
   
 518 T11 C0 oasc.SolrCore.execute [collection1] webapp=null path=null params={q=*%3A*&fq=%7B%21collapse+field%3Dgroup_s%7D&defType=edismax&bf=field%28test_ti%29} hits=2 status=0 QTime=99
 4557 T11 C0 oasc.SolrCore.execute [collection1] webapp=null path=null params={q=*%3A*&fq=%7B%21collapse+field%3Dgroup_s+nullPolicy%3Dexpand+min%3Dtest_tf%7D&defType=edismax&bf=field%28test_ti%29&sort=} hits=4 status=0 QTime=15
 4587 T11 C0 oasc.SolrException.log ERROR
 java.lang.UnsupportedOperationException: Query  does not implement createWeight
   at org.apache.lucene.search.Query.createWeight(Query.java:80)
   at org.apache.lucene.search.IndexSearcher.createNormalizedWeight(IndexSearcher.java:684)
   at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:297)
   at org.apache.solr.search.SolrIndexSearcher.getDocSetScore(SolrIndexSearcher.java:879)
   at org.apache.solr.search.SolrIndexSearcher.getDocSet(SolrIndexSearcher.java:902)
   at org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:1381)
   at org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:478)
   at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:461)
   at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:218)
   at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
   at org.apache.solr.core.SolrCore.execute(SolrCore.java:1952)
   at org.apache.solr.util.TestHarness.query(TestHarness.java:295)
   at org.apache.solr.util.TestHarness.query(TestHarness.java:278)
   at org.apache.solr.SolrTestCaseJ4.assertQ(SolrTestCaseJ4.java:676)
   at org.apache.solr.SolrTestCaseJ4.assertQ(SolrTestCaseJ4.java:669)
   at org.apache.solr.search.TestCollapseQParserPlugin.testCollapseQueries(TestCollapseQParserPlugin.java:106)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   at java.lang.reflect.Method.invoke(Method.java:606)
   at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1618)
   at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:827)
   at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:863)
   at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:877)
   at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53)
   [trace truncated]

RE: Memory Leaks in solr 4.8.1

2014-07-02 Thread Aman Tandon
We reload at an interval of 6-7 days, and restart maybe every 15-18 days if
the response becomes too slow.
On Jul 2, 2014 7:09 PM, Markus Jelsma markus.jel...@openindex.io wrote:

 Hi, you can safely ignore this, it is shutting down anyway. Just don't
 reload the app a lot of times without actually restarting Tomcat.

 -Original message-
  From:Aman Tandon amantandon...@gmail.com
  Sent: Wednesday 2nd July 2014 7:22
  To: solr-user@lucene.apache.org
  Subject: Memory Leaks in solr 4.8.1
 
  Hi,
 
  When i am shutting down the solr i am gettng the Memory Leaks error in
 logs.
 
  Jul 02, 2014 10:49:10 AM org.apache.catalina.loader.WebappClassLoader
   checkThreadLocalMapForLeaks
   SEVERE: The web application [/solr] created a ThreadLocal with key of
 type
   [org.apache.solr.schema.DateField.ThreadLocalDateFormat] (value
   [org.apache.solr.schema.DateField$ThreadLocalDateFormat@1d987b2]) and
 a
   value of type
 [org.apache.solr.schema.DateField.ISO8601CanonicalDateFormat]
   (value
 [org.apache.solr.schema.DateField$ISO8601CanonicalDateFormat@6b2ed43a])
   but failed to remove it when the web application was stopped. Threads
 are
   going to be renewed over time to try and avoid a probable memory leak.
  
 
  Please check.
  With Regards
  Aman Tandon
 



Re: How to integrate nlp in solr

2014-07-02 Thread Aman Tandon
Thanks Parnab. I am unfamiliar with payloads - can you provide some info
about payloads and how they are helpful in NLP?
On Jul 2, 2014 7:41 PM, parnab kumar parnab.2...@gmail.com wrote:

 Aman,

   I feel focusing on  Question-Answering and Information Extraction
  components of NLP should help you achieve what  you are looking for. Go
 through this book *Taming Text * (http://www.manning.com/ingersoll/ ) .
 Most of your queries should be answered including details on implementation
 and sample source codes.



 To state naively :
   NLP tools gives you the power to extract or  interpret knowledge from
 text, which you basically store in the lucene index in form of fields or
 store along with the terms using payloads. During query processing time,
 you similarly gather additional knowledge from the query (using techniques
 like query expansion, relevance feedback, or ontologies ) and simply map
 those knowledge with the knowledge gained from the text. Its an effort to
 move to semantic retrieval rather than simple term matching.

 Thanks,
 Parnab


 On Wed, Jul 2, 2014 at 6:29 AM, Aman Tandon amantandon...@gmail.com
 wrote:

  Hi Alex,
 
  Thanks alex, one more thing i want to ask that so do we need to add the
  extra fields for those entities, e.g. Item (bags), color (blue), etc.
 
  If some how i managed to implement this nlp then i will definitely
 publish
  it on my blog :)
 
  With Regards
  Aman Tandon
 
 
  On Wed, Jul 2, 2014 at 10:34 AM, Alexandre Rafalovitch 
 arafa...@gmail.com
  
  wrote:
 
   Not from me, no. I don't have any real examples for this ready. I
   suspect the path beyond the basics is VERY dependent on your data and
   your business requirements.
  
   I would start from thinking how would YOU (as a human) do that match.
   Where does the 'blue' and 'color' and 'college' and 'bags' come from.
   Then, figuring out what is required for Solr to know to look there.
  
   NLP is not magic, just advanced technology. You need to know where you
   are going to get there.
  
   Regards,
  Alex.
   Personal website: http://www.outerthoughts.com/
   Current project: http://www.solr-start.com/ - Accelerating your Solr
   proficiency
  
  
   On Wed, Jul 2, 2014 at 11:35 AM, Aman Tandon amantandon...@gmail.com
   wrote:
Any help here
   
With Regards
Aman Tandon
   
   
On Mon, Jun 30, 2014 at 11:00 PM, Aman Tandon 
 amantandon...@gmail.com
  
wrote:
   
Hi Alex,
   
I was trying to learn from these tutorials:
http://www.slideshare.net/teofili/natural-language-search-in-solr
https://wiki.apache.org/solr/OpenNLP : the second one explains a bit,
but there is no real demo.
E.g., for the query "I want blue color college bags", how would NLP
work and how would it search? There is no clear explanation of this out
there; I would be thankful if you could help me with it.
   
With Regards
Aman Tandon
   
   
On Mon, Jun 30, 2014 at 6:38 AM, Alexandre Rafalovitch 
   arafa...@gmail.com
 wrote:
   
On Sun, Jun 29, 2014 at 10:19 PM, Aman Tandon 
  amantandon...@gmail.com
   
wrote:
 the appropriate results
What are those specifically? You need to be a bit more precise
 about
what you are trying to achieve. Otherwise, there are too many NLP
branches and too many approaches.
   
Regards,
   Alex.
Personal website: http://www.outerthoughts.com/
Current project: http://www.solr-start.com/ - Accelerating your
 Solr
proficiency
   
   
   
  
 



Re: Streaming large updates with SolrJ

2014-07-02 Thread Chris Hostetter

: Now that I think about it, though, is there a way to use the Update Xml
: messages with something akin to the cloud solr server?  I only see examples
: posting to actual Solr instances, but we really need to be able to take
: advantage of the zookeepers to send our updates to the appropriate servers.

Part of your confusion may be that there are 2 different ways of leveraging 
the SolrServer APIs (either CloudSolrServer, or any other SolrServer 
implementation)...

 * syntactic sugar apis like SolrServer.add(...) which require
SolrInputDocuments
 * the lower level methods like SolrRequest.process(solrServer)

...with the latter, you can subclass AbstractUpdateRequest and implement 
getContentStreams() to send whatever (lazily constructed) stream of bytes 
you want to Solr.
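
For example, a rough, untested sketch (the pre-built XML file here just 
stands in for whatever lazy source of bytes you actually have):

import java.io.File;
import java.io.IOException;
import java.util.Collection;
import java.util.Collections;
import org.apache.solr.client.solrj.request.AbstractUpdateRequest;
import org.apache.solr.common.util.ContentStream;
import org.apache.solr.common.util.ContentStreamBase;

public class StreamingUpdateXmlRequest extends AbstractUpdateRequest {

  private final File addXmlFile;  // holds <add><doc>...</doc></add> XML

  public StreamingUpdateXmlRequest(File addXmlFile) {
    super(METHOD.POST, "/update");
    this.addXmlFile = addXmlFile;
  }

  @Override
  public Collection<ContentStream> getContentStreams() throws IOException {
    // FileStream opens the file only when the HTTP client writes the
    // request body, so the payload never has to sit in RAM all at once.
    ContentStream stream = new ContentStreamBase.FileStream(addXmlFile);
    return Collections.<ContentStream>singletonList(stream);
  }
}

...and new StreamingUpdateXmlRequest(file).process(server) then goes 
through the same plumbing (including CloudSolrServer) as any other request.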

Alternatively: you could consider subclassing SolrInputField with something 
that knows how to lazily fetch the data you want to stream across the wire, 
and then (unless i'm missing something?) you can still use the sugar APIs 
with SolrInputDocuments, but only individual field values will need to 
exist in RAM at any one time (as the BinaryWriter or XmlWriter calls 
SolrInputField.getValues() on your custom class to stream over the wire)
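
Something along these lines (sketch only; the Fetcher interface is just a 
stand-in for however you load the value on demand):

import java.util.Collection;
import java.util.Collections;
import org.apache.solr.common.SolrInputField;

public class LazySolrInputField extends SolrInputField {

  /** Hypothetical callback that loads the big value on demand. */
  public interface Fetcher {
    Object fetch();
  }

  private final Fetcher fetcher;

  public LazySolrInputField(String name, Fetcher fetcher) {
    super(name);
    this.fetcher = fetcher;
  }

  @Override
  public Object getValue() {
    return fetcher.fetch();  // only materialized when the writer asks
  }

  @Override
  public Collection<Object> getValues() {
    return Collections.<Object>singletonList(fetcher.fetch());
  }
}

...installed into the document with doc.put(name, lazyField), since 
SolrInputDocument is a Map of field name to SolrInputField.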

However: if you are using SolrCloud, none of this will help you work 
around the previously mentioned SOLR-6199, which affects how much RAM 
Solr needs to use on the server side when forwarding docs around to 
replicas.



-Hoss
http://www.lucidworks.com/


schema / config file names

2014-07-02 Thread John Smodic
Is it required for the schema.xml and solrconfig.xml to have those exact 
filenames?

Can I alias schema.xml to foo.xml in some way, for example?

Thanks.

Re: schema / config file names

2014-07-02 Thread Chris Hostetter

: Is it required for the schema.xml and solrconfig.xml to have those exact 
: filenames?

It's an extremely good idea ... but strictly speaking no...

https://cwiki.apache.org/confluence/display/solr/CoreAdminHandler+Parameters+and+Usage#CoreAdminHandlerParametersandUsage-CREATE

This smells like an XY Problem though ... please explain *why* you care 
what these file names are, and why you want them to be different?

https://people.apache.org/~hossman/#xyproblem
XY Problem

Your question appears to be an XY Problem ... that is: you are dealing
with X, you are assuming Y will help you, and you are asking about Y
without giving more details about the X so that we can understand the
full issue.  Perhaps the best solution doesn't involve Y at all?
See Also: http://www.perlmonks.org/index.pl?node_id=542341




-Hoss
http://www.lucidworks.com/


RE: Memory Leaks in solr 4.8.1

2014-07-02 Thread Chris Hostetter

This is a long-standing issue in Solr that has some suggested fixes (see 
the jira comments), but no one has been seriously affected by it enough for 
anyone to invest time in trying to improve it...

https://issues.apache.org/jira/browse/SOLR-2357

In general, the fact that Solr is moving away from being a webapp, and 
towards being a stand-alone java application, makes it even less likely 
that this will ever really affect anyone.



: Date: Thu, 3 Jul 2014 07:37:03 +0530
: From: Aman Tandon amantandon...@gmail.com
: Reply-To: solr-user@lucene.apache.org
: To: solr-user@lucene.apache.org
: Subject: RE: Memory Leaks in solr 4.8.1
: 
: We reload at intervals of 6-7 days, and restart maybe every 15-18 days if
: the response becomes too slow
: On Jul 2, 2014 7:09 PM, Markus Jelsma markus.jel...@openindex.io wrote:
: 
:  Hi, you can safely ignore this, it is shutting down anyway. Just don't
:  reload the app a lot of times without actually restarting Tomcat.
: 
:  -Original message-
:   From:Aman Tandon amantandon...@gmail.com
:   Sent: Wednesday 2nd July 2014 7:22
:   To: solr-user@lucene.apache.org
:   Subject: Memory Leaks in solr 4.8.1
:  
:   Hi,
:  
:   When I am shutting down Solr, I am getting the memory leak error below
:  in logs.
:  
:   Jul 02, 2014 10:49:10 AM org.apache.catalina.loader.WebappClassLoader
:checkThreadLocalMapForLeaks
:SEVERE: The web application [/solr] created a ThreadLocal with key of
:  type
:[org.apache.solr.schema.DateField.ThreadLocalDateFormat] (value
:[org.apache.solr.schema.DateField$ThreadLocalDateFormat@1d987b2]) and
:  a
:value of type
:  [org.apache.solr.schema.DateField.ISO8601CanonicalDateFormat]
:(value
:  [org.apache.solr.schema.DateField$ISO8601CanonicalDateFormat@6b2ed43a])
:but failed to remove it when the web application was stopped. Threads
:  are
:going to be renewed over time to try and avoid a probable memory leak.
:   
:  
:   Please check.
:   With Regards
:   Aman Tandon
:  
: 
: 

-Hoss
http://www.lucidworks.com/


Re: schema / config file names

2014-07-02 Thread John Smodic
That's good to know.

I don't actually want to do it. I want to see just how much of Solr's 
schema and configuration can be reliably validated. The error messages 
I've been getting back for misconfigured setups are less than ideal at 
times. But it should be easy for me to validate certain things without 
talking to Solr at all, like the existence of the schema in ZK, that it's 
a valid XML file, etc.
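
For example, the well-formedness check needs nothing beyond the JDK (a 
quick sketch; pass it the path to a schema.xml pulled down from ZK):

import java.io.File;
import javax.xml.parsers.DocumentBuilderFactory;

public class SchemaXmlCheck {
  public static void main(String[] args) throws Exception {
    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
    factory.setNamespaceAware(true);
    // parse() throws a SAXParseException with line/column info if the
    // file is not well-formed XML.
    factory.newDocumentBuilder().parse(new File(args[0]));
    System.out.println(args[0] + " is well-formed XML");
  }
}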

Is there an XSD or any kind of validation for the schema / solrconfig? 
There's an unresolved Jira issue in SOLR-1758 that seems promising but 
never got merged.

Thanks.



From:   Chris Hostetter hossman_luc...@fucit.org
To: solr-user@lucene.apache.org, 
Date:   07/02/2014 10:22 PM
Subject:Re: schema / config file names




: Is it required for the schema.xml and solrconfig.xml to have those exact 

: filenames?

It's an extremely good idea ... but strictly speaking no...

https://cwiki.apache.org/confluence/display/solr/CoreAdminHandler+Parameters+and+Usage#CoreAdminHandlerParametersandUsage-CREATE


This smells like an XY Problem though ... please explain *why* you care 
what these file names are, and why you want them to be different?

https://people.apache.org/~hossman/#xyproblem
XY Problem

Your question appears to be an XY Problem ... that is: you are dealing
with X, you are assuming Y will help you, and you are asking about Y
without giving more details about the X so that we can understand the
full issue.  Perhaps the best solution doesn't involve Y at all?
See Also: http://www.perlmonks.org/index.pl?node_id=542341




-Hoss
http://www.lucidworks.com/




Re: Memory Leaks in solr 4.8.1

2014-07-02 Thread Aman Tandon
Thanks Chris, being independent of the servlet container is good.

Eagerly waiting for solr 5 :)

With Regards
Aman Tandon


On Thu, Jul 3, 2014 at 7:58 AM, Chris Hostetter hossman_luc...@fucit.org
wrote:


 This is a long-standing issue in Solr that has some suggested fixes (see
 the jira comments), but no one has been seriously affected by it enough for
 anyone to invest time in trying to improve it...

 https://issues.apache.org/jira/browse/SOLR-2357

 In general, the fact that Solr is moving away from being a webapp, and
 towards being a stand-alone java application, makes it even less likely
 that this will ever really affect anyone.



 : Date: Thu, 3 Jul 2014 07:37:03 +0530
 : From: Aman Tandon amantandon...@gmail.com
 : Reply-To: solr-user@lucene.apache.org
 : To: solr-user@lucene.apache.org
 : Subject: RE: Memory Leaks in solr 4.8.1
 :
 : We reload at intervals of 6-7 days, and restart maybe every 15-18 days
 : if the response becomes too slow
 : On Jul 2, 2014 7:09 PM, Markus Jelsma markus.jel...@openindex.io
 wrote:
 :
 :  Hi, you can safely ignore this, it is shutting down anyway. Just don't
 :  reload the app a lot of times without actually restarting Tomcat.
 : 
 :  -Original message-
 :   From:Aman Tandon amantandon...@gmail.com
 :   Sent: Wednesday 2nd July 2014 7:22
 :   To: solr-user@lucene.apache.org
 :   Subject: Memory Leaks in solr 4.8.1
 :  
 :   Hi,
 :  
 :   When I am shutting down Solr, I am getting the memory leak error
 :  below in logs.
 :  
 :   Jul 02, 2014 10:49:10 AM org.apache.catalina.loader.WebappClassLoader
 :checkThreadLocalMapForLeaks
 :SEVERE: The web application [/solr] created a ThreadLocal with key
 of
 :  type
 :[org.apache.solr.schema.DateField.ThreadLocalDateFormat] (value
 :[org.apache.solr.schema.DateField$ThreadLocalDateFormat@1d987b2])
 and
 :  a
 :value of type
 :  [org.apache.solr.schema.DateField.ISO8601CanonicalDateFormat]
 :(value
 :  [org.apache.solr.schema.DateField$ISO8601CanonicalDateFormat@6b2ed43a
 ])
 :but failed to remove it when the web application was stopped.
 Threads
 :  are
 :going to be renewed over time to try and avoid a probable memory
 leak.
 :   
 :  
 :   Please check.
 :   With Regards
 :   Aman Tandon
 :  
 : 
 :

 -Hoss
 http://www.lucidworks.com/



Re: Slow QTimes - 5 seconds for Small sized Collections

2014-07-02 Thread Shawn Heisey
On 7/2/2014 11:55 AM, IJ wrote:
 Here is a short wishlist based on the experience in debugging this issue:
 1. Wish SolrQueryResponse could contain a list of node names / shard-replica
 names  that a request passed through for processing the query (when debug is
 turned ON)
 2. Wish SolrQueryResponse could provide a breakup of QTime on each of the
 individual nodes / shard-replicas - instead of returning a single value of
 QTime

If you have a new enough Solr version, you can include a shards.info
parameter set to true, and you will get some information from the
communication with each shard.  I set this parameter to true in my
request handler defaults.
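
In SolrJ it is just another parameter. A quick sketch (hedged: I believe 
the per-shard details come back in a "shards.info" section of the 
response):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.util.NamedList;

public class ShardsInfoExample {
  public static NamedList<?> queryWithShardsInfo(SolrServer server)
      throws SolrServerException {
    SolrQuery query = new SolrQuery("*:*");
    query.set("shards.info", true);  // ask each shard to report on itself
    QueryResponse rsp = server.query(query);
    // Per-shard details such as QTime and numFound, keyed by shard URL.
    return (NamedList<?>) rsp.getResponse().get("shards.info");
  }
}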

I have seen some per-shard info in the debug as well, but I do not know
whether this is influenced by shards.info.

It looks like this parameter was added in version 4.0.  It probably has
been enhanced in later releases.  Naturally I would recommend that you
run the latest release.

https://issues.apache.org/jira/browse/SOLR-3134

Thanks,
Shawn



Re: schema / config file names

2014-07-02 Thread Tirthankar Chatterjee
Chris,
We have actually done that. Our requirement was basically to have a single 
installation of Solr assume different roles, with each role having its own 
optimisation changes to solrconfig.xml and schema.xml.

When we start a role, we basically point it at role_solrconfig.xml and 
role_schema.xml and then fire up cores for each of these roles (roughly as 
in the sketch below). Is there a better way to solve this?
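
A simplified sketch of how we create those cores today (the role names are 
ours; config/schema here are the CREATE parameters from the page you 
linked):

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.request.CoreAdminRequest;

public class RoleCoreCreator {
  public static void createRoleCore(String role) throws Exception {
    SolrServer admin = new HttpSolrServer("http://localhost:8983/solr");
    CoreAdminRequest.Create create = new CoreAdminRequest.Create();
    create.setCoreName(role);
    create.setInstanceDir(role);
    // Point the new core at role-specific config and schema files:
    create.setConfigName(role + "_solrconfig.xml");
    create.setSchemaName(role + "_schema.xml");
    create.process(admin);
  }
}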

Thanks
Tirthankar

 On 02-Jul-2014, at 10:22 pm, Chris Hostetter hossman_luc...@fucit.org 
 wrote:
 
 
 : Is it required for the schema.xml and solrconfig.xml to have those exact 
 : filenames?
 
 It's an extremely good idea ... but strictly speaking no...
 
 https://cwiki.apache.org/confluence/display/solr/CoreAdminHandler+Parameters+and+Usage#CoreAdminHandlerParametersandUsage-CREATE
 
 This smells like an XY Problem though ... please explain *why* you care 
 what these file names are, and why you want them to be different?
 
 https://people.apache.org/~hossman/#xyproblem
 XY Problem
 
 Your question appears to be an XY Problem ... that is: you are dealing
 with X, you are assuming Y will help you, and you are asking about Y
 without giving more details about the X so that we can understand the
 full issue.  Perhaps the best solution doesn't involve Y at all?
 See Also: http://www.perlmonks.org/index.pl?node_id=542341
 
 
 
 
 -Hoss
 http://www.lucidworks.com/





Re: Customise score

2014-07-02 Thread rachun
Hi Jack,

Thank you very much for your solution; it works!

I'm sorry I didn't make it clear at the beginning: by 'score' I meant the
document score (which Solr produces at query time).

Thank you very much, all of you,
Chun.

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Customise-score-tp4145214p4145359.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Out of Memory when I download 5 million records from sqlserver to solr

2014-07-02 Thread Shawn Heisey
On 7/1/2014 4:57 AM, mskeerthi wrote:
 I have to import 5 million records from sqlserver into one Solr index. I
 am getting the exception below after importing 1 million records. Is there
 any configuration or other way to import from sqlserver into Solr?
 
 Below is the exception i am getting in solr:
 org.apache.solr.common.SolrException; auto commit
 error...:java.lang.IllegalStateException: this writer hit an
 OutOfMemoryError; cannot commit

JDBC has a bad habit of defaulting to a mode where it will try to load
the entire SQL result set into RAM.  Different JDBC drivers have
different ways of dealing with this problem.  For Microsoft SQL Server,
here's a guide:

https://wiki.apache.org/solr/DataImportHandlerFaq#I.27m_using_DataImportHandler_with_MS_SQL_Server_database_with_sqljdbc_driver._DataImportHandler_is_going_out_of_memory._I_tried_adjustng_the_batchSize_values_but_they_don.27t_seem_to_make_any_difference._How_do_I_fix_this.3F

If you have trouble with that really long URL in your mail client, just
visit the main FAQ page and click on the link for SQL Server:

https://wiki.apache.org/solr/DataImportHandlerFaq
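
The underlying idea, as a bare JDBC sketch (the URL properties are the 
ones that FAQ entry names, specific to Microsoft's sqljdbc driver; the 
host, database, and query are made up):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class StreamingSqlServerRead {
  public static void main(String[] args) throws Exception {
    // responseBuffering=adaptive tells the driver not to read the whole
    // result set into memory before handing rows to the application.
    String url = "jdbc:sqlserver://dbhost;databaseName=mydb;"
        + "responseBuffering=adaptive;selectMethod=cursor";
    try (Connection conn = DriverManager.getConnection(url, "user", "pass");
         Statement stmt = conn.createStatement();
         ResultSet rs = stmt.executeQuery("SELECT id, title FROM docs")) {
      while (rs.next()) {
        // hand each row to Solr instead of accumulating them in RAM
      }
    }
  }
}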

Thanks,
Shawn



Re: External File Field eating memory

2014-07-02 Thread Kamal Kishore Aggarwal
Any replies?


On Sat, Jun 28, 2014 at 5:34 PM, Kamal Kishore Aggarwal 
kkroyal@gmail.com wrote:

 Hi Team,

 I have recently implemented EFF in Solr. There are about 1.5
 lacs (~150,000) unsorted values in the external file. After this
 implementation, the server has become slow. The Solr query time has also
 increased.

 Can anybody confirm whether these issues are because of this
 implementation? Is it memory that the EFF eats up?

 Regards
 Kamal Kishore



Re: External File Field eating memory

2014-07-02 Thread Alexandre Rafalovitch
How would we know where the problem is? It's your custom
implementation. And it's your own documents, so we don't know field
sizes/etc. And it's your own metric (ok, Indian metric, but lacs are
fairly unknown outside of India).

Seriously though, have you tried using any memory profilers and
running with/without your EFF implementation, or with just a dummy return
result? Java 8 has the new Flight Recorder and other tools built in.
That would tell you where the leak/usage might be. For this kind of
question, you really need to dig deep yourself first.

Have you tried an EFF that is primitive and does not load anything from
the file? Is there still a performance impact? If not, then the issue is
most likely in your code. Maybe it does not shut down properly when the
indexer is reloaded, or similar.

Regards,
   Alex.
Personal website: http://www.outerthoughts.com/
Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency


On Thu, Jul 3, 2014 at 12:23 PM, Kamal Kishore Aggarwal
kkroyal@gmail.com wrote:
 Any replies?


 On Sat, Jun 28, 2014 at 5:34 PM, Kamal Kishore Aggarwal 
 kkroyal@gmail.com wrote:

 Hi Team,

 I have recently implemented EFF in Solr. There are about 1.5
 lacs (~150,000) unsorted values in the external file. After this
 implementation, the server has become slow. The Solr query time has also
 increased.

 Can anybody confirm whether these issues are because of this
 implementation? Is it memory that the EFF eats up?

 Regards
 Kamal Kishore