Data Import Handler - resource not found - Jetty - Windows 7

2014-07-25 Thread Yavar Husain
I have most of my experience working on Solr with Tomcat; however, I recently
started with Jetty. I am using Solr 4.7.0 on Windows 7. I have configured
Solr properly and am able to see the admin UI as well as the Velocity /browse
page. The DataImportHandler screen is also displayed. However, when I run a
full import it fails with the following error:

INFO  - 2014-07-25 12:28:35.177; org.apache.solr.core.SolrCore;
[collection1] webapp=/solr path=/dataimport
params={indent=true&command=status&_=1406271515176&wt=json} status=0
QTime=0
ERROR - 2014-07-25 12:28:35.179; org.apache.solr.common.SolrException;
java.io.IOException: Can't find resource
'C:/solr-4.7.0/example/solr/collection1/conf' in classpath or
'C:\solr-4.7.0\example\solr\collection1\conf'
at
org.apache.solr.core.SolrResourceLoader.openResource(SolrResourceLoader.java:342)
at
org.apache.solr.handler.dataimport.DataImportHandler.handleRequestBody(DataImportHandler.java:134)

A few notes:
My solrconfig.xml has the dataimport handler configured, and I have used:

  <lib dir="C:/solr-4.7.0/example/solr/collection1/lib"
       regex="solr-dataimporthandler-4.7.0.jar" />
  <lib dir="C:/solr-4.7.0/example/solr/collection1/lib"
       regex="solr-dataimporthandler-extras-4.7.0.jar" />
  <lib dir="C:/solr-4.7.0/example/solr/collection1/lib"
       regex="mysql-connector-java-5.1.18-bin.jar" />

The jars are indeed present at those paths.

On the core admin UI I can see the correct dataDir, which is
C:\solr-4.7.0\example\solr\collection1\data\

Any help would be appreciated.

Thanks,
Yavar



Re: Shuffle results a little

2014-07-25 Thread babenis
From what I gather, a reranking query would further fine-pick the top
results rather than disperse similar ones, or am I looking at it the wrong
way?





Re: Any Solr consultants available??

2014-07-25 Thread Charlie Hull

On 24/07/2014 01:54, Alexandre Rafalovitch wrote:

On Thu, Jul 24, 2014 at 2:44 AM, Jack Krupansky j...@basetechnology.com wrote:

All the great Solr guys I know are quite busy.


Sounds like an opportunity for somebody to put together a training
hacker camp, similar to https://hackerbeach.org/ . Cross-train
consultants in Solr, immediately increase their value.  Do it
somewhere on the beach or in the mountains, etc. If somebody organizes
it, I would probably even be interested to teaching the first (newbie)
part.

And the graduation project would be a solr-consultants.com website to
make it easier to find those same consultants later. :-)

Regards,
Alex.
P.S. The last issue of my newsletter had Solr big ideas. The one above
was not in it, but it is, I believe, also viable. Contact me if it
catches your fancy, for more detailed brainstorming and notes sharing.


We're definitely interested in the idea of 'growing' more Solr 
consultants, and eventually committers. Beaches and mountains are good 
too :) I think the skill shortage is a huge problem for the open source 
search world.


Charlie


Personal: http://www.outerthoughts.com/ and @arafalov
Solr resources and newsletter: http://www.solr-start.com/ and @solrstart
Solr popularizers community: https://www.linkedin.com/groups?gid=6713853




--
Charlie Hull
Flax - Open Source Enterprise Search

tel/fax: +44 (0)8700 118334
mobile:  +44 (0)7767 825828
web: www.flax.co.uk


Re: integrating Accumulo with solr

2014-07-25 Thread Ali Nazemian
Dear Jack,
Actually I am going to do a benefit-cost analysis of in-house development
versus going for Sqrrl support.
Best regards.


On Thu, Jul 24, 2014 at 11:48 PM, Jack Krupansky j...@basetechnology.com
wrote:

 Like I said, you're going to have to be a real, hard-core gunslinger to do
 that well. Sqrrl uses Lucene directly, BTW:

 "Full-Text Search: Utilizing open-source Lucene and custom indexing
 methods, Sqrrl Enterprise users can conduct real-time, full-text search
 across data in Sqrrl Enterprise."

 See:
 http://sqrrl.com/product/search/

 Out of curiosity, why are you not using that integrated Lucene support of
 Sqrrl Enterprise?


 -- Jack Krupansky

 -Original Message- From: Ali Nazemian
 Sent: Thursday, July 24, 2014 3:07 PM

 To: solr-user@lucene.apache.org
 Subject: Re: integrating Accumulo with solr

 Dear Jack,
 Thank you. I am aware of DataStax, but I am looking at integrating Accumulo
 with Solr. This is something like what the Sqrrl guys offer.
 Regards.


 On Thu, Jul 24, 2014 at 7:27 PM, Jack Krupansky j...@basetechnology.com
 wrote:

 If you are not a true hard-core gunslinger who is willing to dive in and
 integrate the code yourself, you should instead give serious consideration
 to a product such as DataStax Enterprise, which fully integrates and
 packages a NoSQL database (Cassandra) and Solr for search. The security
 aspects are still a work in progress, but certainly headed in the right
 direction. And it has Hadoop and Spark integration as well.

 See:
 http://www.datastax.com/what-we-offer/products-services/
 datastax-enterprise

 -- Jack Krupansky

 -Original Message- From: Ali Nazemian
 Sent: Thursday, July 24, 2014 10:30 AM
 To: solr-user@lucene.apache.org
 Subject: Re: integrating Accumulo with solr


 Thank you very much. Nice idea, but how can Solr and Accumulo be
 synchronized in this way?
 I know that Solr can be integrated with HDFS, and Accumulo works on top of
 HDFS. So can I use HDFS as the integration point? I mean, set Solr to use
 HDFS as the source of documents as well as the destination of documents.
 Regards.


 On Thu, Jul 24, 2014 at 4:33 PM, Joe Gresock jgres...@gmail.com wrote:

  Ali,


 Sounds like a good choice.  It's pretty standard to store the primary
 storage id as a field in Solr so that you can search the full text in
 Solr
 and then retrieve the full document elsewhere.

 I would recommend creating a document structure in Solr with whatever
 fields you want indexed (most likely as text_en, etc.), and then store a
 string field named "content_id", which would be the Accumulo row id that
 you look up with a scan.

 One caveat -- Accumulo will be protected at the cell level, but if you
 need
 your Solr search results to be protected by complex authorization strings
 similar to Accumulo, you will need to write your own QParserPlugin and
 use
 post filtering:
 http://java.dzone.com/articles/custom-security-filtering-solr

 The code you see in that article is written for an earlier version of
 Solr,
 but it's not too difficult to adjust it for the latest (we've done so in
 our project).  Once you've implemented this, you would store an
 authorizations string field in each Solr document, and pass in the
 authorizations that the user has access to in the fq parameter of every
 query.  It's also not too bad to write something that parses the Accumulo
 authorizations string (like "A&B&(C|D|E|F)") and interprets it accordingly
 in the QParserPlugin.
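
 As a concrete sketch, a request might then carry the user's authorizations
 like this (the parser name "auth" is made up for illustration; registering
 a QParserPlugin under a name and invoking it via Solr's local-params syntax
 is standard):

   fq={!auth}A,B,C,D

 where A,B,C,D are the authorizations the current user holds; the plugin's
 post filter evaluates each candidate document's stored authorizations
 expression against that set.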

 This will give you true row level security in Solr and Accumulo, and it
 performs quite well in Solr.

 Let me know if you have any other questions.

 Joe


 On Thu, Jul 24, 2014 at 4:07 AM, Ali Nazemian alinazem...@gmail.com
 wrote:

  Dear Joe,
  Hi,
  I am going to store the crawl web pages in accumulo as the main storage
  part of my project and I need to give these data to solr for indexing 
 and
  user searches. I need to do some social and web analysis on my data as
 well
  as having some security features. Therefore accumulo is my choice for 
 the
  database part and for index and search I am going to use Solr. Would 
 you
  please guide me through that?
 
 
 
  On Thu, Jul 24, 2014 at 1:28 AM, Joe Gresock jgres...@gmail.com
 wrote:
 
   We store data in both Solr and Accumulo -- do you have more details
 about
   what kind of data and indexing you want?  Is there a reason you're
  thinking
   of using both databases in particular?
  
  
   On Wed, Jul 23, 2014 at 5:17 AM, Ali Nazemian alinazem...@gmail.com
 
   wrote:
  
Dear All,
Hi,
I was wondering is there anybody out there who has tried to integrate Solr
with Accumulo? I was thinking about using Accumulo on top of HDFS and using
Solr to index data inside Accumulo. Do you have any idea how I can do such
an integration?
   
Best regards.
   
--
A.Nazemian
   
  
  
  
   --
   I know what it is to be in need, and I know what it is to have  
 plenty.
  I
   have learned the secret of being 

Re: Any Solr consultants available??

2014-07-25 Thread Alexandre Rafalovitch
Well, if we do it in England, we could hire out a castle, I bet. :-) I
am flexible on my holiday locations. And probably easier to do the
first one in English.

We can continue this on direct email, on the LinkedIn group (perfect
place probably) and/or on the margins of the Solr Revolution. Target
next spring/summer for the week-long event, work backwards from there.
Talk to http://www.techstars.com/program/locations/london/ to
specifically target the startups, etc.

Regards,
   Alex.
Personal: http://www.outerthoughts.com/ and @arafalov
Solr resources and newsletter: http://www.solr-start.com/ and @solrstart
Solr popularizers community: https://www.linkedin.com/groups?gid=6713853


On Fri, Jul 25, 2014 at 3:17 PM, Charlie Hull char...@flax.co.uk wrote:
 On 24/07/2014 01:54, Alexandre Rafalovitch wrote:

 On Thu, Jul 24, 2014 at 2:44 AM, Jack Krupansky j...@basetechnology.com
 wrote:
 All the great Solr guys I know are quite busy.

 Sounds like an opportunity for somebody to put together a training
 hacker camp, similar to https://hackerbeach.org/ . Cross-train
 consultants in Solr, immediately increase their value.

 We're definitely interested in the idea of 'growing' more Solr consultants,
 and eventually committers. Beaches and mountains are good too :) I think the
 skill shortage is a huge problem for the open source search world.

 Charlie


Re: spatial search: find result in bbox OR first result outside bbox

2014-07-25 Thread elisabeth benoit
Thanks a lot for your answer David!

I'll check that out.

Elisabeth


2014-07-24 20:28 GMT+02:00 david.w.smi...@gmail.com <david.w.smi...@gmail.com>:

 Hi Elisabeth,

 Sorry for not responding sooner; I forgot.

 You’re in need of some spatial nearest-neighbor code I wrote, but it isn’t
 open-sourced yet.  It works on the RPT grid.

 Anyway, you should consider doing this in two searches: the first query
 tries the bbox provided, and if that returns nothing, then issue a second
 for the closest result within a 1000 km distance.  The first query is
 straightforward, as documented.  The second would be close to what you gave
 in your example, but sort by distance and return rows=1.  It will *not*
 compute the distance to every document, just those within the 1000 km
 radius plus some internal grid squares *if* you use spatial RPT
 (“location_rpt” in the example schema).  But use LatLonType for optimal
 sorting performance, not RPT.
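
 Concretely, the two requests might look something like this (the field
 name "store" and the point are borrowed from your example; the bbox
 corners are made up):

   1) q=*:*&fq=store:[44.9,-94.1 TO 45.4,-93.6]
      (a lat,lon range query on a LatLonType field acts as the bbox filter)
   2) q=*:*&fq={!geofilt sfield=store pt=45.15,-93.85 d=1000}
      &sfield=store&pt=45.15,-93.85&sort=geodist() asc&rows=1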

 With respect to doing this in one search vs. two, that would involve
 writing a custom request handler.  I have a patch to make this easier:
 https://issues.apache.org/jira/browse/SOLR-5005.  If in your case there are
 absolutely no other filters and it’s not a distributed search (no
 sharding), then you could approach this with a custom query parser that
 generates and executes one query to know whether it should return that
 query or return the fallback.

 Please let me know how this goes.

 ~ David Smiley
 Freelance Apache Lucene/Solr Search Consultant/Developer
 http://www.linkedin.com/in/davidwsmiley


 On Tue, Jul 22, 2014 at 3:12 AM, elisabeth benoit 
 elisaelisael...@gmail.com
  wrote:

  Hello,
 
  I am using solr 4.2.1. I have the following use case.
 
  I should find results inside the bbox OR, if there are none, the first
  result outside the bbox within a 1000 km distance. I was wondering what
  is the best way to proceed.
 
  I was considering doing a geofilt search from the center of my bounding
 box
  and post filtering results.
 
  fq={!geofilt sfield=store}&pt=45.15,-93.85&d=1000
 
  From a performance point of view I don't think it's a good solution
  though, since Solr will have to calculate every document's distance and
  then sort.
 
  I was wondering if there was another way to do this and avoid sending
 more
  than one request to solr.
 
  Thanks,
  Elisabeth
 



Facing issue while implementing connection pooling with solr

2014-07-25 Thread vicky desai


I have a requirement where I want to limit the number of concurrent calls
to Solr to, say, 50. So I am trying to implement connection pooling in the
HTTP client, which is then used by the HttpSolrServer object. Please find
the code below:

HttpClient httpclient = new DefaultHttpClient();

httpclient.getParams().setParameter(
        HttpClientUtil.PROP_MAX_CONNECTIONS_PER_HOST, 50);
httpclient.getParams().setParameter(
        HttpClientUtil.PROP_MAX_CONNECTIONS, 50);

HttpSolrServer httpSolrServer = new HttpSolrServer(
        "solr url", httpclient);

SolrQuery solrQuery = new SolrQuery("*:*");
for (int i = 0; i < 1; i++) {
    long numFound = httpSolrServer.query(solrQuery).getResults()
            .getNumFound();
    System.out.println(numFound);
}

I was expecting only 50 connections to be created from my application to
Solr, and then probably to experience some slowness until the older
connections are freed. However, at every regular interval a new connection
is created despite there being waiting connections at the Solr end, and
those connections are never used again.

Example Output

tcp 0 0 192.168.0.241:22      192.168.0.109:54120    ESTABLISHED
tcp 0 0 :::192.168.0.241:8190 :::192.168.0.109:47382 TIME_WAIT
tcp 0 0 :::192.168.0.241:8190 :::192.168.0.109:47383 ESTABLISHED
tcp 0 0 :::192.168.0.241:8190 :::192.168.0.109:47371 TIME_WAIT
tcp 0 0 :::192.168.0.241:8190 :::192.168.0.109:47381 TIME_WAIT

where .109 is the IP where I am running my application and .241 is the IP
where Solr runs. In this case 192.168.0.109:47382 will never be used again,
and it is finally terminated by Solr.

Am I going wrong somewhere? Any help will be highly appreciated.
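
For comparison, here is a minimal sketch that builds the pooled client
through SolrJ's own HttpClientUtil instead of setting the properties on an
already-constructed DefaultHttpClient (SolrJ 4.x; the class name and URL
are placeholders):

import org.apache.http.client.HttpClient;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpClientUtil;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.params.ModifiableSolrParams;

public class PooledClientTest {
    public static void main(String[] args) throws Exception {
        // Same limits as above, but applied when SolrJ builds its own
        // pooling connection manager, rather than set afterwards on a
        // DefaultHttpClient created with the default connection manager.
        ModifiableSolrParams params = new ModifiableSolrParams();
        params.set(HttpClientUtil.PROP_MAX_CONNECTIONS, 50);
        params.set(HttpClientUtil.PROP_MAX_CONNECTIONS_PER_HOST, 50);
        HttpClient pooledClient = HttpClientUtil.createClient(params);

        // URL is a placeholder
        HttpSolrServer server = new HttpSolrServer(
                "http://192.168.0.241:8190/solr", pooledClient);
        System.out.println(
                server.query(new SolrQuery("*:*")).getResults().getNumFound());
    }
}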






Re: Shuffle results a little

2014-07-25 Thread Joel Bernstein
Query ReRanking is built on the RankQuery API. With the RankQuery API you
can build and plug in your own ranking algorithms.

Here's a blog describing the RankQuery API:

http://heliosearch.org/solrs-new-rankquery-feature/
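
For example, the stock re-ranking that ships with Solr 4.9 is itself a
RankQuery, invoked through the rq parameter (the query values here are just
illustrative):

q=greetings&rq={!rerank reRankQuery=$rqq reRankDocs=200 reRankWeight=2}&rqq=(hi hello)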

Joel Bernstein
Search Engineer at Heliosearch


On Fri, Jul 25, 2014 at 4:11 AM, babenis babe...@gmail.com wrote:

 From what I gather, a reranking query would further fine-pick the top
 results rather than disperse similar ones, or am I looking at it the wrong
 way?






Re: Any Solr consultants available??

2014-07-25 Thread Jack Krupansky

Any or all of the above, and more.

OTOH, how many people are out there who want to become Solr consultants but
aren't already either doing it, or at least already in the process of coming
up to speed, or are maybe just not cut out for it?


But then there are the kids in school. Maybe we need to get more professors
interested in Solr (or do they prefer Elasticsearch?!) and assigning
projects? And maybe the problem is that a lot of the need is in departments
outside of CS (the people with actual data needs), but Solr is just too...
difficult... for a lot of non-CS students to casually pick up.


I sense the difficulty is that Solr is too much of a complex toolkit
rather than a packaged product. For example, the recent inquiry related to
queries for compound and split terms: it's not automatic and OOB for Solr,
and there is no obvious and simple solution. Lots of things are like that
in Solr.


-- Jack Krupansky

-Original Message- 
From: Alexandre Rafalovitch

Sent: Friday, July 25, 2014 4:52 AM
To: solr-user
Subject: Re: Any Solr consultants available??

Well, if we do it in England, we could hire out a castle, I bet. :-) I
am flexible on my holiday locations. And probably easier to do the
first one in English.

We can continue this on direct email, on the LinkedIn group (perfect
place probably) and/or on the margins of the Solr Revolution. Target
next spring/summer for the week-long event, work backwards from there.
Talk to http://www.techstars.com/program/locations/london/ to
specifically target the startups, etc

Regards,
  Alex.
Personal: http://www.outerthoughts.com/ and @arafalov
Solr resources and newsletter: http://www.solr-start.com/ and @solrstart
Solr popularizers community: https://www.linkedin.com/groups?gid=6713853


On Fri, Jul 25, 2014 at 3:17 PM, Charlie Hull char...@flax.co.uk wrote:

On 24/07/2014 01:54, Alexandre Rafalovitch wrote:


On Thu, Jul 24, 2014 at 2:44 AM, Jack Krupansky j...@basetechnology.com
wrote:

All the great Solr guys I know are quite busy.


Sounds like an opportunity for somebody to put together a training
hacker camp, similar to https://hackerbeach.org/ . Cross-train
consultants in Solr, immediately increase their value.


We're definitely interested in the idea of 'growing' more Solr consultants,
and eventually committers. Beaches and mountains are good too :) I think the
skill shortage is a huge problem for the open source search world.

Charlie




Re: Java heap space error

2014-07-25 Thread Shawn Heisey
On 7/24/2014 7:53 AM, Ameya Aware wrote:
 I did not make any other change than this.. rest of the settings are
 default.
 
 Do i need to set garbage collection strategy?

The collector chosen and its tuning params can have a massive impact
on performance, but it will make no difference at all if you are getting
OutOfMemoryError exceptions.  This means the program is trying to
allocate more memory than it has been told it can allocate.  Changing
the garbage collector will not change Java's response when the program
wants to allocate too much memory.

The odd location of the commas at the start of this thread makes it hard
to understand exactly what numbers you were trying to say, but I think
you were saying that you were trying to index 200,000 documents and it
died after indexing 15,000.

How big was the solr index before you started indexing, both in number
of documents and disk space consumed?  How are you doing the indexing?
Is it being done with requests to the /update handler, or are you using
the dataimport handler to import from somewhere, like a database?

Is it a single index, or distributed?  Are you running in normal mode
or SolrCloud?  Can you share your solrconfig.xml file so we can look for
possible problems?

I already gave you a wiki URL that gives possible reasons for needing a
very large heap, and some things you can do to reduce the requirements.

Thanks,
Shawn



Re: To warm the whole cache of Solr other than the only autowarmcount

2014-07-25 Thread Shawn Heisey
On 7/24/2014 8:45 PM, YouPeng Yang wrote:
 To Matt

   Thank you, your opinion is very valuable, so I have checked the source
 code for how the caches warm up. It seems to just put items from the
 old caches into the new caches.
   I will pull Mark Miller into this discussion. He is one of the Solr
 developers whom I had contacted.

  To Mark Miller

    Would you please check out what we are discussing in the last two
 posts. I need your help.

Matt is completely right.  Any commit can drastically change the Lucene
document id numbers.  It would be too expensive to determine which
numbers haven't changed.  That means Solr must throw away all cache
information on commit.

Two of Solr's caches support autowarming.  Those caches use queries as
keys and results as values.  Autowarming works by re-executing the top N
queries (keys) in the old cache to obtain fresh Lucene document id
numbers (values).  The cache code does take *keys* from the old cache
for the new cache, but not *values*.  I'm very sure about this, as I
wrote the current (and not terribly good) LFUCache.
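
For reference, it is the autowarmCount attribute on those caches in
solrconfig.xml that controls how many of the old keys get re-executed on
commit; a typical entry looks something like this (the sizes here are just
illustrative):

  <filterCache class="solr.FastLRUCache"
               size="512"
               initialSize="512"
               autowarmCount="128"/>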

Thanks,
Shawn



Re: Data Import Handler - resource not found - Jetty - Windows 7

2014-07-25 Thread Shawn Heisey
On 7/25/2014 1:06 AM, Yavar Husain wrote:
 I have most of my experience working on Solr with Tomcat; however, I
 recently started with Jetty. I am using Solr 4.7.0 on Windows 7. I have
 configured Solr properly and am able to see the admin UI as well as the
 Velocity /browse page. The DataImportHandler screen is also displayed.
 However, when I run a full import it fails with the following error:
 
 INFO  - 2014-07-25 12:28:35.177; org.apache.solr.core.SolrCore;
 [collection1] webapp=/solr path=/dataimport
 params={indent=true&command=status&_=1406271515176&wt=json} status=0
 QTime=0
 ERROR - 2014-07-25 12:28:35.179; org.apache.solr.common.SolrException;
 java.io.IOException: Can't find resource
 'C:/solr-4.7.0/example/solr/collection1/conf' in classpath or
 'C:\solr-4.7.0\example\solr\collection1\conf'
 at
 org.apache.solr.core.SolrResourceLoader.openResource(SolrResourceLoader.java:342)
 at
 org.apache.solr.handler.dataimport.DataImportHandler.handleRequestBody(DataImportHandler.java:134)

In 4.7.0, line 134 of DataImportHandler.java is concerned with locating
the config file for the dataimport handler.  In the following excerpt
from a solrconfig.xml file included with Solr, the config file is
db-data-config.xml.  What do you have for this in your solrconfig.xml?

  <requestHandler name="/dataimport"
      class="org.apache.solr.handler.dataimport.DataImportHandler">
    <lst name="defaults">
      <str name="config">db-data-config.xml</str>
    </lst>
  </requestHandler>

Thanks,
Shawn



Re: Understanding the Debug explanations for Query Result Scoring/Ranking

2014-07-25 Thread O. Olson
Thank you Uwe. Unfortunately, I could not get your explain solr website to
work. I always get an error saying Ops. We have internal server error. This
event was logged. We will try fix this soon. We are sorry for
inconvenience.

At this point, I know that I need to have some technical background to
understanding how these numbers are calculated. However even with that, I am
sure that the format of this output is not obvious. I am curious about the
documentation of this output format. It seems to be unintelligible. 

If this is not documented anywhere, can someone point me to which class is
doing this output.

Thank you,
O. O.


an6 wrote
 Hi,
 
 to get an idea of the meaning of all these numbers, have a look at
 http://explain.solr.pl. I like this tool, it's great.
 
 Uwe







Re: Understanding the Debug explanations for Query Result Scoring/Ranking

2014-07-25 Thread Erik Hatcher
The format of the XML explain output is not indented or very readable.  When
I really need to see the explain indented, I use wt=ruby&indent=true (I don’t
think the indent parameter is relevant for the explain output, but I use it
anyway).

Erik

On Jul 25, 2014, at 10:11 AM, O. Olson olson_...@yahoo.it wrote:

 Thank you Uwe. Unfortunately, I could not get your explain solr website to
 work. I always get an error saying "Ops. We have internal server error. This
 event was logged. We will try fix this soon. We are sorry for
 inconvenience."
 
 At this point, I know that I need to have some technical background to
 understanding how these numbers are calculated. However even with that, I am
 sure that the format of this output is not obvious. I am curious about the
 documentation of this output format. It seems to be unintelligible. 
 
 If this is not documented anywhere, can someone point me to which class is
 doing this output.
 
 Thank you,
 O. O.
 
 
 an6 wrote
 Hi,
 
 to get an idea of the meaning of all these numbers, have a look at
 http://explain.solr.pl. I like this tool, it's great.
 
 Uwe
 
 
 
 
 



Re: Java heap space error

2014-07-25 Thread Steve Rowe
On Jul 25, 2014, at 9:13 AM, Shawn Heisey s...@elyograg.org wrote:

 On 7/24/2014 7:53 AM, Ameya Aware wrote:
 The odd location of the commas in the start of this thread make it hard
 to understand exactly what numbers you were trying to say


On Jul 24, 2014, at 9:32 AM, Ameya Aware ameya.aw...@gmail.com wrote:

 I am in process of indexing around 2,00,000 documents.



1 Lakh (aka Lac) = 10^5 is written as 1,00,000 

It’s used in Bangladesh, India, Myanmar, Nepal, Pakistan, and Sri Lanka, 
roughly 1/4 of the world’s population.

http://en.wikipedia.org/wiki/Lakh



Re: Slow inserts when using Solr Cloud

2014-07-25 Thread ian
I've built and installed the latest snapshot of Solr 4.10 using the same
SolrCloud configuration, and that gave me a tenfold increase in throughput,
so it certainly looks like SOLR-6136 was the issue causing my slow insert
rate and high latency with shard routing and replicas.  Thanks for your
help.


Timothy Potter wrote
 Hi Ian,
 
 What's the CPU doing on the leader? Have you tried attaching a
 profiler to the leader while running and then seeing if there are any
 hotspots showing. Not sure if this is related but we recently fixed an
 issue in the area of leader forwarding to replica that used too many
 CPU cycles inefficiently - see SOLR-6136.
 
 Tim







Solr Full Import frozen after indexing a fixed number of records

2014-07-25 Thread Aniket Bhoi
I have Apache Solr hosted on my Apache Tomcat server, with a SQL Server
backend.


Details:

*Solr Version:*
Solr Specification Version: 3.4.0.2012.01.23.14.08.01
Solr Implementation Version: 3.4
Lucene Specification Version: 3.4
Lucene Implementation Version: 3.4

*Tomcat version:*
Apache Tomcat/6.0.18

*OS details:*
SUSE Linux Enterprise Server 11 (x86_64)

After I run a full import, indexing proceeds successfully but seems to
freeze every time after fetching a fixed number of records. What I mean is
that after it fetches 10730 records it just freezes and doesn't process any
more.

Excerpt from dataimport.xml:

<lst name="statusMessages">
  <str name="Time Elapsed">0:15:31.959</str>
  <str name="Total Requests made to DataSource">0</str>
  <str name="Total Rows Fetched">10730</str>
  <str name="Total Documents Processed">3579</str>
  <str name="Total Documents Skipped">0</str>
  <str name="Full Dump Started">2014-07-25 10:44:39</str>

This seems to happen everytime.

I checked the Tomcat log. Following is the excerpt from when Solr freezes:

INFO:  Generating record for Unique ID :null attachment Ref:null
parent ref :nullexecuted by thread:25
Jul 25, 2014 10:53:31 AM
org.apache.solr.update.processor.LogUpdateProcessor processAdd
FINE: add AH_12345
Jul 25, 2014 10:53:31 AM
org.apache.solr.handler.dataimport.DocBuilder$EntityRunner runAThread
INFO:  Generating record for Unique ID :null attachment Ref:null
parent ref :nullexecuted by thread:26
Jul 25, 2014 10:53:31 AM
org.apache.solr.update.processor.LogUpdateProcessor processAdd
FINE: add AH_23451
Jul 25, 2014 10:53:34 AM org.apache.solr.core.SolrCore execute
INFO: [calls] webapp=/solr path=/dataimport params={} status=0 QTime=0
Jul 25, 2014 10:53:36 AM org.apache.solr.core.SolrCore execute
INFO: [calls] webapp=/solr path=/dataimport params={} status=0 QTime=0
Jul 25, 2014 10:53:38 AM org.apache.solr.core.SolrCore execute
INFO: [calls] webapp=/solr path=/dataimport params={} status=0 QTime=0

Help appreciated.

Regards,

Aniket


Re: Understanding the Debug explanations for Query Result Scoring/Ranking

2014-07-25 Thread O. Olson
Thank you very much Erik. This is exactly what I was looking for. While at
the moment I have no clue about these numbers, the ruby formatting makes it
much easier to understand.

Thanks to you too, Koji. I'm sorry I did not acknowledge you before. I think
Erik's solution is what I was looking for.
O. O.



Erik Hatcher-4 wrote
 The format of the XML explain output is not indented or very readable.
 When I really need to see the explain indented, I use wt=ruby&indent=true
 (I don’t think the indent parameter is relevant for the explain output,
 but I use it anyway)
 
   Erik







java.lang.OutOfMemoryError: Requested array size exceeds VM limit

2014-07-25 Thread Ameya Aware
Hi,

I am in the process of indexing a lot of documents, but after around 9
documents I am getting the error below:

java.lang.OutOfMemoryError: Requested array size exceeds VM limit

I am passing the following parameters to Solr:

java -Xms6144m -Xmx6144m -XX:MaxPermSize=512m
-Dcom.sun.management.jmxremote -XX:+UseParNewGC -XX:+UseCompressedOops
-XX:+UseConcMarkSweepGC
-XX:+CMSIncrementalMode -XX:+CMSParallelRemarkEnabled
-XX:+UseCMSInitiatingOccupancyOnly
-XX:CMSInitiatingOccupancyFraction=70 -XX:ConcGCThreads=6
-XX:ParallelGCThreads=6 -jar start.jar


Also, I am auto-committing after every 20,000 documents.


I searched on Google for this but could not find any specific answer.


Can anybody help with this?


Thanks,
Ameya


Re: java.lang.OutOfMemoryError: Requested array size exceeds VM limit

2014-07-25 Thread Greg Walters
Would you include the entire stack trace for your OOM message? Are you seeing 
this on the client or server side?

Thanks,
Greg

On Jul 25, 2014, at 10:21 AM, Ameya Aware ameya.aw...@gmail.com wrote:

 Hi,
 
 I am in the process of indexing a lot of documents, but after around 9
 documents I am getting the error below:
 
 java.lang.OutOfMemoryError: Requested array size exceeds VM limit
 
 I am passing below parameters with Solr :
 
 java -Xms6144m -Xmx6144m -XX:MaxPermSize=512m
 -Dcom.sun.management.jmxremote -XX:+UseParNewGC -XX:+UseCompressedOops
 -XX:+UseConcMarkSweepGC
 -XX:+CMSIncrementalMode -XX:+CMSParallelRemarkEnabled
 -XX:+UseCMSInitiatingOccupancyOnly
 -XX:CMSInitiatingOccupancyFraction=70 -XX:ConcGCThreads=6
 -XX:ParallelGCThreads=6 -jar start.jar
 
 
 Also, I am auto-committing after every 20,000 documents.
 
 
 I searched on google for this but could not get any specific answer.
 
 
 Can anybody help with this?
 
 
 Thanks,
 Ameya



RE: java.lang.OutOfMemoryError: Requested array size exceeds VM limit

2014-07-25 Thread Matt Kuiper (Springblox)
You might consider looking at your internal Solr cache configuration
(solrconfig.xml).  These caches occupy heap space and, from my understanding,
do not overflow to disk.  So if there is not enough heap memory to support
the caches, an OOM error will be thrown.

I also believe these caches live in the old generation, so you might consider
decreasing your CMSInitiatingOccupancyFraction to trigger a GC sooner.

Based on your description below, every 20,000 documents your caches will be
invalidated and rebuilt as part of a commit.  So a GC that occurs sooner may
help free the memory of the old caches.

Matt

-Original Message-
From: Ameya Aware [mailto:ameya.aw...@gmail.com] 
Sent: Friday, July 25, 2014 9:22 AM
To: solr-user@lucene.apache.org
Subject: java.lang.OutOfMemoryError: Requested array size exceeds VM limit

Hi,

I am in the process of indexing a lot of documents, but after around 9
documents I am getting the error below:

java.lang.OutOfMemoryError: Requested array size exceeds VM limit

I am passing below parameters with Solr :

java -Xms6144m -Xmx6144m -XX:MaxPermSize=512m -Dcom.sun.management.jmxremote 
-XX:+UseParNewGC -XX:+UseCompressedOops -XX:+UseConcMarkSweepGC 
-XX:+CMSIncrementalMode -XX:+CMSParallelRemarkEnabled 
-XX:+UseCMSInitiatingOccupancyOnly
-XX:CMSInitiatingOccupancyFraction=70 -XX:ConcGCThreads=6
-XX:ParallelGCThreads=6 -jar start.jar


Also, I am auto-committing after every 20,000 documents.


I searched on google for this but could not get any specific answer.


Can anybody help with this?


Thanks,
Ameya


Re: java.lang.OutOfMemoryError: Requested array size exceeds VM limit

2014-07-25 Thread Ameya Aware
Please find the entire stack trace below:


ERROR - 2014-07-25 13:14:22.202; org.apache.solr.common.SolrException;
null:java.lang.RuntimeException: java.lang.OutOfMemoryError: Requested
array size exceeds VM limit
at
org.apache.solr.servlet.SolrDispatchFilter.sendError(SolrDispatchFilter.java:790)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:439)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:207)
at
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
at
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
at
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
at
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
at
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
at
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
at
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
at
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
at
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
at
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
at org.eclipse.jetty.server.Server.handle(Server.java:368)
at
org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
at
org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
at
org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:942)
at
org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1004)
at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:636)
at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
at
org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
at
org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
at
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
at
org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
at java.lang.Thread.run(Unknown Source)
Caused by: java.lang.OutOfMemoryError: Requested array size exceeds VM limit
at java.util.Arrays.copyOf(Unknown Source)
at java.lang.AbstractStringBuilder.expandCapacity(Unknown Source)
at java.lang.AbstractStringBuilder.ensureCapacityInternal(Unknown Source)
at java.lang.AbstractStringBuilder.append(Unknown Source)
at java.lang.StringBuilder.append(Unknown Source)
at
org.apache.solr.handler.extraction.SolrContentHandler.characters(SolrContentHandler.java:303)
at
org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
at
org.apache.tika.sax.SecureContentHandler.characters(SecureContentHandler.java:270)
at
org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
at
org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
at
org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
at
org.apache.tika.sax.SafeContentHandler.access$001(SafeContentHandler.java:46)
at
org.apache.tika.sax.SafeContentHandler$1.write(SafeContentHandler.java:82)
at
org.apache.tika.sax.SafeContentHandler.filter(SafeContentHandler.java:140)
at
org.apache.tika.sax.SafeContentHandler.characters(SafeContentHandler.java:287)
at
org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:278)
at org.apache.tika.parser.txt.TXTParser.parse(TXTParser.java:88)
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
at
org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:219)
at
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
at
org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:241)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1952)
at
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:774)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:418)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:207)
at
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
at

Solr MoreLikeThis returns no match while the source document is in Solr

2014-07-25 Thread Donglin Chen
Hi,

I issued a MoreLikeThis query using the uniqueKey of a source document, and
I got no match, as below (but I can select this document fine in Solr).

<?xml version="1.0" encoding="UTF-8"?>
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">0</int>
  </lst>
  <result name="match" numFound="0" start="0" maxScore="0.0"/>
  <null name="response"/>
  <lst name="interestingTerms"/>
</response>

The query is like this:
http://localhost:8080/solr/dbcollection_1/mlt?q=uniquekey:20320

However, using select instead of MLT, this document did return:
http://localhost:8080/solr/dbcollection_1/select?q=uniquekey:20320

When I tried another uniqueKey with almost the same document content, Solr
returned a match and similar jobs:
 http://localhost:8080/solr/dbcollection_1/mlt?q=uniquekey:20321

When I tried another MLT query where there is an empty value on the matching
field, no similar jobs were returned, as expected, but nonetheless the match
document was returned, also as expected.
What could cause an MLT query to return <result name="match" numFound="0" ...>
when we can select this document fine?
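
For reference, the requests above rely on the MLT defaults; a fuller request
with the knobs that commonly affect matching would look something like this
(the field name "content" is illustrative; the parameter names are standard
MoreLikeThis options):

http://localhost:8080/solr/dbcollection_1/mlt?q=uniquekey:20320&mlt.fl=content&mlt.mintf=1&mlt.mindf=1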

Thanks!
Daniel


Re: Solr MoreLikeThis returns no match while the source document is in Solr

2014-07-25 Thread Anshum Gupta
Hi,

These might help you:

https://issues.apache.org/jira/browse/SOLR-4414
https://issues.apache.org/jira/browse/SOLR-5480

and

https://issues.apache.org/jira/browse/SOLR-6248.


On Fri, Jul 25, 2014 at 11:58 AM, Donglin Chen
daniel.chen@gmail.com wrote:
 Hi,

 I issued MoreLikeThis query using a uniquekey of a source document, and I
 got no match as below (but I can select this document fine in Solr).

 <?xml version="1.0" encoding="UTF-8"?>
 <response>
   <lst name="responseHeader">
     <int name="status">0</int>
     <int name="QTime">0</int>
   </lst>
   <result name="match" numFound="0" start="0" maxScore="0.0"/>
   <null name="response"/>
   <lst name="interestingTerms"/>
 </response>

 The query is like this:
 http://localhost:8080/solr/dbcollection_1/mlt?q=uniquekey:20320

 However, using select in stead of MLT, this document did return
 http://localhost:8080/solr/dbcollection_1/select?q=uniquekey:20320

 when I tried another uniquekey with almost the same document content, Solr
 returned match and similar jobs.
  http://localhost:8080/solr/dbcollection_1/mlt?q=uniquekey:20321

 When I tried another MLT query where there is empty value on the matching
 field, no similar jobs returned as expected, but nonetheless the match
 document is returned as expected.
 What could cause MLT query return result name=match numFound=0..
 whereas we can select this document fine?

 Thanks!
 Daniel



-- 

Anshum Gupta
http://www.anshumgupta.net


Re: java.lang.OutOfMemoryError: Requested array size exceeds VM limit

2014-07-25 Thread Greg Walters
Using Tika to extract documents or content is something I don't have
experience with, but it looks like your issue is in that process. If you're
able to reproduce this issue near the same place every time, maybe you've got
a document that has a lot of nested fields in it, or that otherwise causes
the extractor/update processor to do something weird.

Thanks,
Greg

On Jul 25, 2014, at 12:32 PM, Ameya Aware ameya.aw...@gmail.com wrote:

 Please find below entire stack trace:
 
 
 ERROR - 2014-07-25 13:14:22.202; org.apache.solr.common.SolrException;
 null:java.lang.RuntimeException: java.lang.OutOfMemoryError: Requested
 array size exceeds VM limit
 at
 org.apache.solr.servlet.SolrDispatchFilter.sendError(SolrDispatchFilter.java:790)
 at
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:439)
 at
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:207)
 at
 org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
 at
 org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
 at
 org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
 at
 org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
 at
 org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
 at
 org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
 at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
 at
 org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
 at
 org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
 at
 org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
 at
 org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
 at
 org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
 at
 org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
 at org.eclipse.jetty.server.Server.handle(Server.java:368)
 at
 org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
 at
 org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
 at
 org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:942)
 at
 org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1004)
 at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:636)
 at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
 at
 org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
 at
 org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
 at
 org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
 at
 org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
 at java.lang.Thread.run(Unknown Source)
 Caused by: java.lang.OutOfMemoryError: Requested array size exceeds VM limit
 at java.util.Arrays.copyOf(Unknown Source)
 at java.lang.AbstractStringBuilder.expandCapacity(Unknown Source)
 at java.lang.AbstractStringBuilder.ensureCapacityInternal(Unknown Source)
 at java.lang.AbstractStringBuilder.append(Unknown Source)
 at java.lang.StringBuilder.append(Unknown Source)
 at
 org.apache.solr.handler.extraction.SolrContentHandler.characters(SolrContentHandler.java:303)
 at
 org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
 at
 org.apache.tika.sax.SecureContentHandler.characters(SecureContentHandler.java:270)
 at
 org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
 at
 org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
 at
 org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
 at
 org.apache.tika.sax.SafeContentHandler.access$001(SafeContentHandler.java:46)
 at
 org.apache.tika.sax.SafeContentHandler$1.write(SafeContentHandler.java:82)
 at
 org.apache.tika.sax.SafeContentHandler.filter(SafeContentHandler.java:140)
 at
 org.apache.tika.sax.SafeContentHandler.characters(SafeContentHandler.java:287)
 at
 org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:278)
 at org.apache.tika.parser.txt.TXTParser.parse(TXTParser.java:88)
 at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
 at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
 at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
 at
 org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:219)
 at
 org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
 at
 

Re: Solr MoreLikeThis returns no match while the source document is in Solr

2014-07-25 Thread Daniel Chen
Thank you Anshum!

The links help.

Daniel


On Fri, Jul 25, 2014 at 3:07 PM, Anshum Gupta ans...@anshumgupta.net
wrote:

 Hi,

 These might help you:

 https://issues.apache.org/jira/browse/SOLR-4414
 https://issues.apache.org/jira/browse/SOLR-5480

 and

 https://issues.apache.org/jira/browse/SOLR-6248.


 On Fri, Jul 25, 2014 at 11:58 AM, Donglin Chen
 daniel.chen@gmail.com wrote:
  Hi,
 
  I issued MoreLikeThis query using a uniquekey of a source document, and I
  got no match as below (but I can select this document fine in Solr).
 
  <?xml version="1.0" encoding="UTF-8"?>
  <response>
    <lst name="responseHeader">
      <int name="status">0</int>
      <int name="QTime">0</int>
    </lst>
    <result name="match" numFound="0" start="0" maxScore="0.0"/>
    <null name="response"/>
    <lst name="interestingTerms"/>
  </response>
 
  The query is like this:
  http://localhost:8080/solr/dbcollection_1/mlt?q=uniquekey:20320
 
  However, using select in stead of MLT, this document did return
  http://localhost:8080/solr/dbcollection_1/select?q=uniquekey:20320
 
  when I tried another uniquekey with almost the same document content,
 Solr
  returned match and similar jobs.
   http://localhost:8080/solr/dbcollection_1/mlt?q=uniquekey:20321
 
  When I tried another MLT query where there is empty value on the matching
  field, no similar jobs returned as expected, but nonetheless the match
  document is returned as expected.
  What could cause MLT query return result name=match numFound=0..
  whereas we can select this document fine?
 
  Thanks!
  Daniel



 --

 Anshum Gupta
 http://www.anshumgupta.net



Re: Understanding the Debug explanations for Query Result Scoring/Ranking

2014-07-25 Thread Jack Krupansky
The formatting is one thing, but ultimately it is just a giant expression, 
one for each document. The expression is computing the score, based on your 
chosen or default similarity algorithm. All the terms in the expressions 
are detailed here:


http://lucene.apache.org/core/4_9_0/core/org/apache/lucene/search/similarities/TFIDFSimilarity.html

Unless you dive into that math (not so bad, really, if you are motivated), 
the expressions are going to be rather opaque to you.


The long floating point numbers are mostly just the intermediate (and final) 
calculations of the math described above.


Try constructing a very simple collection of simple, contrived documents,
like a short sentence in each, with some common terms, and then try simple
queries to see how the expression term values change. Try computing TF, DF,
and IDF yourself (just count the terms by hand), and compare to what debug
gives you.
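
As a quick sanity check, if I decode the nesting in your example correctly,
the arithmetic for the "televis" term works out exactly (all numbers taken
from your own debug output):

  queryWeight = idf * queryNorm:            0.71447384 = 7.0424104 * 0.10145303
  fieldWeight = tf * idf * fieldNorm:       0.660226   = 1.0 * 7.0424104 * 0.09375
  term score  = queryWeight * fieldWeight:  0.4717142  = 0.71447384 * 0.660226

and the final score is the sum over the matching terms:
1.5797625 = 0.4717142 + 1.1080483.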


-- Jack Krupansky

-Original Message- 
From: O. Olson

Sent: Thursday, July 24, 2014 6:45 PM
To: solr-user@lucene.apache.org
Subject: Understanding the Debug explanations for Query Result 
Scoring/Ranking


Hi,

If you add debug=true to the Solr request (and wt=xml if your
current output is not XML), you get a node in the resulting XML that
is named "debug". There is a child node of this called "explain",
which has a list showing why the results are ranked in a particular order.
I'm curious whether there is some documentation on understanding these
numbers/results.

I am new to Solr, so I apologize that I may be using the wrong terms to
describe my problem. I also aware of
http://lucene.apache.org/core/4_9_0/core/org/apache/lucene/search/similarities/TFIDFSimilarity.html
though I have not completely understood it.

My problem is trying to understand something like this:

1.5797625 = (MATCH) sum of: 0.4717142 = (MATCH) weight(text:televis in
44109) [DefaultSimilarity], result of: 0.4717142 = score(doc=44109,freq=1.0
= termFreq=1.0 ), product of: 0.71447384 = queryWeight, product of:
7.0424104 = idf(docFreq=896, maxDocs=377553) 0.10145303 = queryNorm 0.660226
= fieldWeight in 44109, product of: 1.0 = tf(freq=1.0), with freq of: 1.0 =
termFreq=1.0 7.0424104 = idf(docFreq=896, maxDocs=377553) 0.09375 =
fieldNorm(doc=44109) 1.1080483 = (MATCH) weight(text:tv in 44109)
[DefaultSimilarity], result of: 1.1080483 = score(doc=44109,freq=6.0 =
termFreq=6.0 ), product of: 0.6996622 = queryWeight, product of: 6.896415 =
idf(docFreq=1037, maxDocs=377553) 0.10145303 = queryNorm 1.5836904 =
fieldWeight in 44109, product of: 2.4494898 = tf(freq=6.0), with freq of:
6.0 = termFreq=6.0 6.896415 = idf(docFreq=1037, maxDocs=377553) 0.09375 =
fieldNorm(doc=44109)

*Note:* I have searched for "televisions". My search field is a single
catch-all field. The Edismax parser seems to break up my search term into
"televis" and "tv".

Is there some documentation on how to understand these numbers? They do not
seem to be properly delimited. At a minimum, I can understand something
like:
1.5797625 = 0.4717142 + 1.1080483
and
0.71447384 = 7.0424104 * 0.10145303

But I cannot understand whether something like "0.10145303 = queryNorm
0.660226 = fieldWeight in 44109" is used in the calculation anywhere. Also,
since there were only two terms (televis and tv), I could use subtraction
to find out that 1.1080483 was the start of a new result.

I'd also appreciate it if someone could tell me which class dumps out the
above data. If I know it, I can edit that class to make the output a bit
more understandable for me.

Thank you,
O. O.









RE: Java heap space error

2014-07-25 Thread Toke Eskildsen
Steve Rowe [sar...@gmail.com] wrote:
 1 Lakh (aka Lac) = 10^5 is written as 1,00,000

 It’s used in Bangladesh, India, Myanmar, Nepal, Pakistan, and Sri Lanka, 
 roughly 1/4 of the world’s population.

Yet still it causes confusion and distracts from the issue. Let's just stick to 
metric, okay?

- Toke Eskildsen


SOLR cloud creating multiple copies of the same index

2014-07-25 Thread pras.venkatesh
Hi, we have a SolrCloud instance with 8 nodes and 4 shards. We are starting
to see that the index size is growing huge, and when we looked at the file
system, Solr had created several copies of the index.
However, in the Solr admin UI I can see it is using only one of them.

This is what I see in solr admin.

Index:
/opt/solr/collections/aq-collection/data/index.20140725024044234

Master (Searching)
  Version: 1406320016969
  Gen: 81553
  Size: 58.72 GB

But when I go in to the file system , This is how it looks.

16G   index.20140527220456134
  45G   index.20140630001131038
 4.6G   index.20140630090031282
  20G   index.20140703192128959
 1.3G   index.20140703200948410
  31G   index.20140708162308859
  52G   index.20140716165801658
  59G   index.20140725024044234
   4K   index.properties
   4K   replication.properties

It is actually pointing only to index.20140725024044234, and using that
for searching and indexing. The timestamps on the other indexes are old
(about a month or so).

Can someone explain why it created so many copies of the index (we did
not create them manually), and how this can be prevented?

Our solr instances are running on solaris VMs





Re: SolrCloud extended warmup support

2014-07-25 Thread Jeff Wartes
It's a command like this just prior to Jetty startup:

find -L <solrhome dir> -type f -exec cat {} > /dev/null \;


On 7/24/14, 2:11 PM, Toke Eskildsen t...@statsbiblioteket.dk wrote:

Jeff Wartes [jwar...@whitepages.com] wrote:
Well, I'm not sure what to say. I've been observing a noticeable latency
decrease over the first few thousand queries.

How exactly do you get the index files fully cached? The cp command will
(at least on some systems) happily skip copying if the destination is
/dev/null. One way to ensure caching is to cat all the files to
/dev/null.

- Toke Eskildsen



Re: SOLR cloud creating multiple copies of the same index

2014-07-25 Thread Jeff Wartes

Looks to me like you are, or were, hitting the replication handler's
backup function:
http://wiki.apache.org/solr/SolrReplication#HTTP_API

i.e., http://master_host:port/solr/replication?command=backup

You might not have been doing it explicitly; there's some support for a
backup being triggered when certain things happen:
http://wiki.apache.org/solr/SolrReplication#Master




On 7/25/14, 1:50 PM, pras.venkatesh prasann...@outlook.com wrote:

Hi, we have a SolrCloud instance with 8 nodes and 4 shards. We are
starting to see that the index size is growing huge, and when we looked
at the file system, Solr had created several copies of the index.
However, in the Solr admin UI I can see it is using only one of them.

This is what I see in solr admin.

Index:
/opt/solr/collections/aq-collection/data/index.20140725024044234

Master (Searching)
  Version: 1406320016969
  Gen: 81553
  Size: 58.72 GB

But when I go in to the file system , This is how it looks.

16G   index.20140527220456134
  45G   index.20140630001131038
 4.6G   index.20140630090031282
  20G   index.20140703192128959
 1.3G   index.20140703200948410
  31G   index.20140708162308859
  52G   index.20140716165801658
  59G   index.20140725024044234
   4K   index.properties
   4K   replication.properties

It is actually pointing only to index.20140725024044234, and using that
for searching and indexing. The timestamps on the other indexes are old
(about a month or so).

Can someone explain why it created so many copies of the index (we did
not create them manually), and how this can be prevented?

Our solr instances are running on solaris VMs






Re: Understanding the Debug explanations for Query Result Scoring/Ranking

2014-07-25 Thread Chris Hostetter

: Thank you very much Erik. This is exactly what I was looking for. While at
: the moment I have no clue about these numbers, they ruby formatting makes it
: much more easier to understand.

Just to be clear, regardless of *which* response writer you use (xml,
ruby, json, etc...), the default behavior is to include the score
explanation as a single string which uses tabs/newlines to deal with the
nesting (this nesting is visible if you view the raw response, no matter
what ResponseWriter).

You can, however, add a param indicating that you want the explanation
information to be returned as *structured data* instead of a simple
string...

https://wiki.apache.org/solr/CommonQueryParameters#debug.explain.structured

...if you want to programmatically process debug info, this is the
recommended way to do so.

-Hoss
http://www.lucidworks.com/


Re: Any Solr consultants available??

2014-07-25 Thread Alexandre Rafalovitch
On Fri, Jul 25, 2014 at 6:59 PM, Jack Krupansky j...@basetechnology.com wrote:
 OTOH, how many people are there out there who want to become Solr
 consultants, but aren't already either doing it or at least already in the
 process of coming up to speed or maybe just not cut out for it?

Well, I would target two groups:
*) Startups that just realized they need search
*) People who want to become consultants and want a fast track to that
(being "already in the process" can take quite a while).

For Startups, I would do a weeklong version of what I did with my
one-day Solr Masterclass.
*) Bring your own data, we teach you very specific process of
development-oriented setup (e.g. start from
https://github.com/arafalov/simplest-solr-config/blob/master/simplest-solr/collection1/conf/schema.xml
, teach rapid iterations, ways to affect data in Solr such as URP,
Custom Search Components, etc).
*) Then teach debugging.
*) Then SolrCloud.
*) Then maybe touch on BigData as many SAAS startups will hit that problem
*) Then going into production.
*) Then, send them out with a (paid-for and/or subscription) dedicated
discussion group where the mentor would continue answering questions
as they bubble up, etc.
*) And more

For consultants:
*) you teach them to understand which problems Solr is good for
*) you teach them how to explain Solr to others.
*) Teach them (or build for them) great Solr demos.
*) Give them unsolved-but-tractable projects and assist them in making
those happen (e.g. build a Solr-backed real solr-consultants website,
testing Solr clients with latest Solr, testing upstream integration,
creating Solr feature demos for 3rd party products that have Solr
inside, etc)
*) Build them environments to quickly test their ideas, skills, etc.
*) Give them tools and tricks to quickly build online identity around
Solr (blogging tips, link to their articles to build SEO, GitHub
repos, etc)
*) Build a network where consultants can pass work to each other based
on geography
*) Get preferential deals with commercial Solr components suppliers,
so the consultants get things like UI components at reduced price or
extended trials or whatever
*) Dedicated discussion group
*) If they are in the solr-consultants directory, charge them
subscription fees but give them a dedicated discussion group where
they can talk but also ask for particular features (e.g. better
examples, demo repos, language support, deals, commonly useful
components like the split/join filters, etc). Use those as projects to
drive next batch of developers.
*) Reach out to startup community and offer discounted/apprenticeship
model to access those newly graduated consultants.
*) Possibly provide things like a USA corporation umbrella to bring,
say, a Filipino consultant to the USA/UK for 3 months to train and then
let them go back home to establish the business.
*) And, again, a lot more

And, of course, gamify the whole lot wherever possible to drive the
speed of adoption :-)

Time is money.

Many of the things above exist for Solr, but they are all over the
web, often rotting after initial release due to lack of visibility,
etc. Other things are missing documentation. Many of the other
things exist (e.g. consultant directories) but they are not Solr
specific. Frankly, many of the things that do exist have terrible
search; fixing that alone would be competitive beyond Solr. There is
value in building a happy singing YCombinator-style path.

Regards,
   Alex.

Personal: http://www.outerthoughts.com/ and @arafalov
Solr resources and newsletter: http://www.solr-start.com/ and @solrstart
Solr popularizers community: https://www.linkedin.com/groups?gid=6713853