One of three cores is missing userData and lastModified fields from /admin/cores

2015-03-24 Thread Aaron Daubman
Hey All, On a Solr server running 4.10.2 with three cores, two return the expected info from /solr/admin/cores?wt=json but the third is missing userData and lastModified. The first (artists) and third (tracks) cores from the linked screenshot are the ones I care about. Unfortunately, the third

Re: Understanding fieldNorm differences between 3.6.1 and 4.9 solrs

2014-07-02 Thread Aaron Daubman
/%3CCALyTvnpwZMj4zxPbK0abVpnyRJny=qauijdqmj7e3zgnv7u...@mail.gmail.com%3E In the meantime, I'm still happy to hear any new thoughts or suggestions on keeping similarity consistent across upgrades. Thanks again, Aaron On Tue, Jul 1, 2014 at 11:14 PM, Aaron Daubman daub...@gmail.com wrote: In trying to determine some

Understanding fieldNorm differences between 3.6.1 and 4.9 solrs

2014-07-01 Thread Aaron Daubman
In trying to determine some subtle scoring differences (causing occasionally significant ordering differences) among search results, I wrote a parser to normalize debug.explain.structured JSON output. It appears that every score that is different comes down to a difference in fieldNorm, where the

Re: Range Queries performing differently on SortableIntField vs TrieField of type integer

2012-12-04 Thread Aaron Daubman
Hi Upayavira, One small question - did you re-index in between? The index structure will be different for each. Yes, the Solr 1.4.1 (working) instance was built using the original schema and that Solr version. The Solr 3.6.1 (not working) instance was rebuilt using the new schema and Solr

Re: Range Queries performing differently on SortableIntField vs TrieField of type integer

2012-12-04 Thread Aaron Daubman
I forgot a possibly important piece... Given the different Solr versions, the schema version (and its related defaults) has also changed: Solr 1.4.1 has: <schema name="ourSchema" version="1.1"> Solr 3.6.1 has: <schema name="ourSchema" version="1.5"> Solr 1.4.1 Relevant Schema Parts - Working

Re: Cannot run Solr4 from Intellij Idea

2012-12-04 Thread Aaron Daubman
Interestingly, I have run into this same (or a very similar) issue when attempting to run embedded Solr. None of the solr.* classes that were recently moved to Lucene would work with the solr.* shorthand - I had to replace them with their fully qualified class names. As you found, these shorthands in the same

Preventing accepting queries while custom QueryComponent starts up?

2012-11-08 Thread Aaron Daubman
Greetings, I have several custom QueryComponents that have high one-time startup costs (hashing things in the index, caching things from an RDBMS, etc.). Is there a way to prevent Solr from accepting connections before all QueryComponents are ready? Especially since many of our instances are

Re: Preventing accepting queries while custom QueryComponent starts up?

2012-11-08 Thread Aaron Daubman
, 2012 at 11:54 AM, Aaron Daubman daub...@gmail.com wrote: Greetings, I have several custom QueryComponents that have high one-time startup costs (hashing things in the index, caching things from an RDBMS, etc.). Is there a way to prevent Solr from accepting connections before all

Re: Preventing accepting queries while custom QueryComponent starts up?

2012-11-08 Thread Aaron Daubman
(plus when I deploy, my deploy script runs some actual simple test queries to ensure they return before enabling the ping handler to return 200s) to avoid this problem. What are you doing to programmatically disable/enable the ping handler? This sounds like exactly what I should be doing as
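The deploy-time approach quoted above (run a few warm-up queries, and only then let the health check return 200) can be sketched in plain Java. This is a minimal illustration under assumptions: `ReadinessGate`, `warmUp`, and `pingStatus` are hypothetical names, the query runner is passed in as a predicate, and none of this is Solr's PingRequestHandler API. Wiring it to a real ping endpoint is left out.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Predicate;

// Hypothetical readiness gate: healthy only after every warm-up query succeeds.
final class ReadinessGate {
    private final AtomicBoolean ready = new AtomicBoolean(false);

    /** Runs the warm-up queries in order; marks ready only if all of them succeed. */
    boolean warmUp(List<String> warmUpQueries, Predicate<String> runQuery) {
        for (String query : warmUpQueries) {
            if (!runQuery.test(query)) {
                return false; // stay out of rotation; the deploy script can retry
            }
        }
        ready.set(true);
        return true;
    }

    /** HTTP status a health-check endpoint would report to the load balancer. */
    int pingStatus() {
        return ready.get() ? 200 : 503;
    }

    /** Takes the node out of rotation again, e.g. before a re-index. */
    void disable() {
        ready.set(false);
    }
}
```

Keeping the flag in an `AtomicBoolean` lets request threads read it without locking while the deploy script flips it.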

Improving performance for use-case where large (200) number of phrase queries are used?

2012-10-24 Thread Aaron Daubman
Greetings, We have a Solr instance in use that gets some perhaps atypical queries and suffers from poor (2 second) QTimes. Documents (~2,350,000) in this instance are mainly composed of various descriptive fields, such as multi-word (phrase) tags - an average document contains 200-400 phrases

Re: Improving performance for use-case where large (200) number of phrase queries are used?

2012-10-24 Thread Aaron Daubman
Thanks for the ideas - some follow-up questions inline below: * use shingles e.g. to turn two-word phrases into single terms (how long is your average phrase?). Would this be different from what I was calling common grams (other than shingling every pair of words, rather than just common ones)?
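To make the shingle idea concrete: shingling turns each pair of adjacent words into a single indexed term, so a two-word phrase can be matched as one term lookup instead of a positional phrase query. As we understand it, common grams (CommonGramsFilter) build the same kind of pairs but only where one of the two words is in a configured common-words list. The sketch below is a hypothetical, dependency-free illustration of word bigrams, not Lucene's ShingleFilter. The lowercasing and the `_` separator are arbitrary choices.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative word-bigram shingling; not Lucene's ShingleFilter.
final class Shingles {
    /** Returns each pair of adjacent words, lowercased and joined by the separator. */
    static List<String> bigrams(String phrase, String separator) {
        String[] words = phrase.trim().toLowerCase().split("\\s+");
        List<String> out = new ArrayList<>();
        for (int i = 0; i + 1 < words.length; i++) {
            out.add(words[i] + separator + words[i + 1]);
        }
        return out;
    }
}
```

For example, `bigrams("Indie Rock Revival", "_")` yields `[indie_rock, rock_revival]`, and a single-word input yields no shingles.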

Re: Improving performance for use-case where large (200) number of phrase queries are used?

2012-10-24 Thread Aaron Daubman
Hi Peter, Thanks for the recommendation - I believe we are thinking along the same lines, but wanted to check to make sure. Are you suggesting something different from my #5 (below), or are we essentially suggesting the same thing? On Wed, Oct 24, 2012 at 1:20 PM, Peter Keegan

Why does SolrIndexSearcher.java enforce mutual exclusion of filter and filterList?

2012-10-21 Thread Aaron Daubman
Greetings, I'm wondering if somebody would please explain why SolrIndexSearcher.java enforces mutual exclusion of filter and filterList (e.g. see: https://github.com/apache/lucene-solr/blob/trunk/solr/core/src/java/org/apache/solr/search/SolrIndexSearcher.java#L2039 ) For a custom application

ScorerDocQueue.java's downHeap showing up as frequent hotspot in profiling - ideas why?

2012-10-16 Thread Aaron Daubman
Greetings, In a recent batch of slow-response-time queries on Solr 3.6.1, the profiler highlighted downHeap (line 212) in ScorerDocQueue.java as averaging more than 60ms across the 16 calls I was looking at, with spikes over 100ms - which, after looking at the code (two int

Re: PriorityQueue:initialize consistently showing up as hot spot while profiling

2012-10-10 Thread Aaron Daubman
Hi Mikhail, On Fri, Oct 5, 2012 at 7:15 AM, Mikhail Khludnev mkhlud...@griddynamics.com wrote: okay. huge rows value is no.1 way to kill Lucene. It's not possible, absolutely. You need to rethink logic of your component. Check Solr's FieldCollapsing code, IIRC it makes second search to achieve
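The point about huge `rows` values can be illustrated with bounded top-k selection, the general shape of a top-N hit collector: a min-heap holding at most k entries, whose root is the weakest hit kept so far. As far as we know, Lucene's own PriorityQueue allocates its backing array at the requested size when it is initialized (and the score collector pre-fills it with sentinel entries), so that cost scales with `rows` on every query regardless of how many documents match. This is a plain-Java sketch using `java.util.PriorityQueue`, not Lucene's class.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

// Bounded top-k selection, the general shape of a top-N hit collector.
final class TopK {
    /** Returns the k highest scores, best first. */
    static List<Float> topK(float[] scores, int k) {
        // The initial capacity allocates the backing array up front, so cost grows with k.
        PriorityQueue<Float> heap = new PriorityQueue<>(Math.max(1, k));
        for (float s : scores) {
            if (heap.size() < k) {
                heap.add(s);       // still filling the queue
            } else if (k > 0 && s > heap.peek()) {
                heap.poll();       // evict the weakest kept hit
                heap.add(s);
            }
        }
        List<Float> out = new ArrayList<>(heap);
        out.sort(Collections.reverseOrder());
        return out;
    }
}
```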

Re: PriorityQueue:initialize consistently showing up as hot spot while profiling

2012-10-05 Thread Aaron Daubman
On Fri, Oct 5, 2012 at 6:56 AM, Aaron Daubman daub...@gmail.com wrote: Greetings, I've been seeing this call chain come up fairly frequently when debugging longer-QTime queries under Solr 3.6.1 but have not been able to understand from the code what is really going on - the call graph

PriorityQueue:initialize consistently showing up as hot spot while profiling

2012-10-04 Thread Aaron Daubman
Greetings, I've been seeing this call chain come up fairly frequently when debugging longer-QTime queries under Solr 3.6.1 but have not been able to understand from the code what is really going on - the call graph and code follow below. Would somebody please explain to me: 1) Why this would

Re: Understanding fieldCache SUBREADER insanity

2012-10-02 Thread Aaron Daubman
Hi Yonik, I've been attempting to fix the SUBREADER insanity in our custom component, and have made perhaps some progress (or is this worse?) - I've gone from SUBREADER to VALUEMISMATCH insanity: ---snip--- entries_count : 12 entry#0 :

Solr Caching - how to tune, how much to increase, and any tips on using Solr with JDK7 and G1 GC?

2012-09-29 Thread Aaron Daubman
Greetings, I've recently moved to running some of our Solr (3.6.1) instances using JDK 7u7 with the G1 GC (playing with max pauses in the 20 to 100ms range). By and large, it has been working well (or, perhaps I should say that without requiring much tuning it works much better in general than my

How to more gracefully handle field format exceptions?

2012-09-24 Thread Aaron Daubman
Greetings, Is there a way to configure more graceful handling of field formatting exceptions when indexing documents? Currently, some of the documents I am indexing contain a field that is supposed to be a float but sometimes slips through as an empty string. (I know, fix the

Re: How to more gracefully handle field format exceptions?

2012-09-24 Thread Aaron Daubman
catch the error on the client, fix/clean/remove, and retry, no? Otis On Mon, Sep 24, 2012 at 9:21 PM, Aaron Daubman daub...@gmail.com wrote: Greetings
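Otis's suggestion (clean the value on the client before indexing) might look like the following hypothetical sketch. A document represented as a plain `Map` has its float field normalized, or dropped when the value is blank or unparseable (an empty string makes `Float.parseFloat` throw `NumberFormatException`). The field name `score_f` and the drop-rather-than-default policy are assumptions, and SolrJ is not used.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical client-side cleanup before a document is sent for indexing.
final class FloatFieldCleaner {
    /**
     * Returns a copy of the document with the named field normalized to a Float,
     * or removed when its value is blank, unparseable, NaN, or infinite.
     */
    static Map<String, Object> clean(Map<String, Object> doc, String field) {
        Map<String, Object> out = new HashMap<>(doc);
        Object raw = out.get(field);
        if (raw == null) {
            return out; // nothing to clean
        }
        try {
            float f = Float.parseFloat(raw.toString().trim());
            if (Float.isFinite(f)) {
                out.put(field, f);
            } else {
                out.remove(field);
            }
        } catch (NumberFormatException e) {
            out.remove(field); // e.g. "" or "n/a": drop the field rather than fail the batch
        }
        return out;
    }
}
```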

Re: Understanding fieldCache SUBREADER insanity

2012-09-21 Thread Aaron Daubman
Yonik, et al. I believe I found the section of code pushing me into 'insanity' status: ---snip--- int[] collapseIDs = null; float[] hotnessValues = null; String[] artistIDs = null; try { collapseIDs =

Understanding fieldCache SUBREADER insanity

2012-09-19 Thread Aaron Daubman
Hi all, In reviewing a solr instance with somewhat variable performance, I noticed that its fieldCache stats show an insanity_count of 1 with the insanity type SUBREADER: ---snip--- insanity_count : 1 insanity#0 : SUBREADER: Found caches for descendants of ReadOnlyDirectoryReader(segments_k

Re: Understanding fieldCache SUBREADER insanity

2012-09-19 Thread Aaron Daubman
Hi Tomás, This probably means that you are using the same field for faceting and for sorting (tf_normalizedTotalHotttnesss), sorting uses the segment level cache and faceting uses by default the global field cache. This can be a problem because the field is duplicated in cache, and then it

Solr request/response lifecycle and logging full response time

2012-09-06 Thread Aaron Daubman
Greetings, I'm looking to add some additional logging to a Solr 3.6.0 setup to allow us to determine the actual time Solr spends responding to a request. We have a custom QueryComponent that sometimes returns 1+ MB of data, and while QTime is always around 100ms, the response time at the

Re: Solr request/response lifecycle and logging full response time

2012-09-06 Thread Aaron Daubman
, Aaron Daubman daub...@gmail.com wrote: Greetings, I'm looking to add some additional logging to a Solr 3.6.0 setup to allow us to determine actual time spent by Solr responding to a request. We have a custom QueryComponent that sometimes returns 1+ MB of data and while QTime is always

Re: Frustrating differences in fieldNorm between two different versions of solr indexing the same document

2012-07-19 Thread Aaron Daubman
Robert, I have a Solr 1.4.1 instance and a Solr 3.6.0 instance, both configured as identically as possible (given deprecations) and indexing the same document. Why did you do this? If you want the exact same scoring, use the exact same analysis. This means specifying luceneMatchVersion =

Re: Frustrating differences in fieldNorm between two different versions of solr indexing the same document

2012-07-19 Thread Aaron Daubman
Robert, So this is lossy: basically you can think of there being only 256 possible values. So when you increased the number of terms only slightly by changing your analysis, this happened to bump you over the edge rounding you up to the next value. more information:
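The 256-value point can be checked numerically. The sketch below reproduces, from memory, the byte encoding Lucene 3.x/4.x used for norms (`SmallFloat.floatToByte315` and `byte315ToFloat`), and assumes the default length norm of `1/sqrt(numTerms)` with a boost of 1. Treat it as an illustration rather than a copy of Lucene's source. Because only the top 11 bits of the float survive (sign, exponent, and two leading mantissa bits), there are about four stored values per power of two.

```java
// Illustration of Lucene's norm byte encoding (SmallFloat "315"), reproduced from memory.
final class NormEncoding {
    private static final int FZERO = (63 - 15) << 3; // 384: offset to the encoding's zero exponent

    /** Lossy float -> byte: keeps only the top 11 bits of the IEEE float. */
    static byte floatToByte315(float f) {
        int bits = Float.floatToRawIntBits(f);
        int smallfloat = bits >> (24 - 3);
        if (smallfloat <= FZERO) {
            return (bits <= 0) ? (byte) 0 : (byte) 1; // zero/negative -> 0, underflow -> 1
        }
        if (smallfloat >= FZERO + 0x100) {
            return -1; // overflow maps to the largest value
        }
        return (byte) (smallfloat - FZERO);
    }

    /** Inverse mapping byte -> float (one of 256 possible values). */
    static float byte315ToFloat(byte b) {
        if (b == 0) {
            return 0.0f;
        }
        int bits = (b & 0xff) << (24 - 3);
        bits += FZERO << (24 - 3); // same as (63 - 15) << 24
        return Float.intBitsToFloat(bits);
    }

    /** Assumed length norm 1/sqrt(numTerms), boost 1, after the byte round trip. */
    static float storedLengthNorm(int numTerms) {
        float raw = (float) (1.0 / Math.sqrt(numTerms));
        return byte315ToFloat(floatToByte315(raw));
    }
}
```

Under these assumptions, fields with 8, 9, or 10 terms all store a norm of 0.3125, while 11 terms drops to 0.25, so a one-term change in analysis can move a field across a bucket boundary even though the raw norms differ by only about 5%.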

Frustrating differences in fieldNorm between two different versions of solr indexing the same document

2012-07-18 Thread Aaron Daubman
Greetings, I've been digging into this for two days now and have come up short - hopefully there is some simple answer I am just not seeing: I have a Solr 1.4.1 instance and a Solr 3.6.0 instance, both configured as identically as possible (given deprecations) and indexing the same document.

Debugging jetty IllegalStateException errors?

2012-07-04 Thread Aaron Daubman
Greetings, I'm wondering if anybody has experienced (and found the root cause of) errors like this. We're running Solr 3.6.0 with the latest stable Jetty 7 (7.6.4.v20120524). I know this is likely due to a client (or the server) terminating the connection unexpectedly, but we see these fairly frequently

Re: Correct way to deal with source data that may include a multivalued field that needs to be used for sorting?

2012-06-11 Thread Aaron Daubman
While I look into doing some refactoring, as well as creating some new UpdateRequestProcessors (and/or backporting), would you please point me to some reading material on why you say the following: In this day and age, a custom update handler is almost never the right answer to a problem -- nor

Re: What would cause: SEVERE: java.lang.ClassCastException: com.company.MyCustomTokenizerFactory cannot be cast to org.apache.solr.analysis.TokenizerFactory

2012-06-10 Thread Aaron Daubman
. -- Jack Krupansky -----Original Message----- From: Aaron Daubman Sent: Saturday, June 09, 2012 12:03 AM To: solr-user@lucene.apache.org Subject: What would cause: SEVERE: java.lang.ClassCastException: com.company.MyCustomTokenizerFactory cannot be cast to org.apache.solr.analysis

Re: Correct way to deal with source data that may include a multivalued field that needs to be used for sorting?

2012-06-10 Thread Aaron Daubman
Hoss, The new FieldValueSubsetUpdateProcessorFactory classes look phenomenal. I haven't looked yet, but what are the chances these will be backported to 3.6 (or how hard would it be to backport them?)... I'll have to check out the source in more detail. If stuck on 3.6, what would be the best
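If stuck on 3.6, one fallback is to reduce the field to a single value before the document is indexed, similar in spirit to the FieldValueSubsetUpdateProcessorFactory classes mentioned above. The following is a hypothetical client-side sketch: the `SortFieldReducer` name and its policies are illustrative, values are assumed to be non-null, and no Solr API is used.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Hypothetical pre-indexing step: collapse a possibly multivalued sort field to one value.
final class SortFieldReducer {
    enum Policy { FIRST, LAST, MIN, MAX }

    /** Picks a single value according to the policy; empty if there are no values. */
    static <T extends Comparable<? super T>> Optional<T> reduce(List<T> values, Policy policy) {
        if (values == null || values.isEmpty()) {
            return Optional.empty();
        }
        switch (policy) {
            case FIRST:
                return Optional.of(values.get(0));
            case LAST:
                return Optional.of(values.get(values.size() - 1));
            case MIN:
                return values.stream().min(Comparator.naturalOrder());
            case MAX:
                return values.stream().max(Comparator.naturalOrder());
            default:
                throw new IllegalArgumentException("unknown policy: " + policy);
        }
    }
}
```

Which policy is right depends on what the "dirty" extra values mean; MIN or MAX gives a deterministic sort key regardless of the order the values arrive in.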

What would cause: SEVERE: java.lang.ClassCastException: com.company.MyCustomTokenizerFactory cannot be cast to org.apache.solr.analysis.TokenizerFactory

2012-06-08 Thread Aaron Daubman
Greetings, I am in the process of updating custom code and schema from Solr 1.4 to 3.6.0 and have run into the following issue with our two custom Tokenizer and Token Filter components. I've been banging my head against this one for far too long, especially since it must be something obvious I'm

Re: What would cause: SEVERE: java.lang.ClassCastException: com.company.MyCustomTokenizerFactory cannot be cast to org.apache.solr.analysis.TokenizerFactory

2012-06-08 Thread Aaron Daubman
="English" protected="protwords.txt"/> </analyzer> </fieldtype> ---snip--- On Sat, Jun 9, 2012 at 12:03 AM, Aaron Daubman daub...@gmail.com wrote: Greetings, I am in the process of updating custom code and schema from Solr 1.4 to 3.6.0 and have run into the following issue with our two

Re: Correct way to deal with source data that may include a multivalued field that needs to be used for sorting?

2012-06-05 Thread Aaron Daubman
Thanks for the responses, By saying dirty data you imply that only one of the values is good or clean and that the others can be safely discarded/ignored, as opposed to true multi-valued data where each value is there for good reason and needs to be preserved. In any case, how do you

Correct way to deal with source data that may include a multivalued field that needs to be used for sorting?

2012-06-04 Thread Aaron Daubman
Greetings, I have dirty source data where some documents being indexed, although unlikely, may contain multivalued fields that are also required for sorting. In previous versions of Solr, sorting on this field worked fine (possibly because few or no multivalued fields were ever encountered?),

Re: Tips on creating a custom QueryCache?

2012-05-30 Thread Aaron Daubman
Hoss, : 1) Any recommendations on which best to sub-class? I'm guessing, for this : scenario with rare batch puts and no evictions, I'd be looking for get : performance. This will also be on a box with many CPUs - so I wonder if the : older LRUCache would be preferable? i suspect you are

Example setup of using Solr 3.6.0 with Jetty 7 (7.6.3)?

2012-05-29 Thread Aaron Daubman
Greetings, Has anybody gotten Solr 3.6.0 to work well with Jetty 7.6.3, and if so, would you mind sharing your config files / directory structure / other useful details? Thanks, Aaron

Generating maven artifacts for 3.6.0 build - correct -Dversion to use?

2012-05-25 Thread Aaron Daubman
Greetings, Following the directions here: http://svn.apache.org/repos/asf/lucene/dev/trunk/dev-tools/maven/README.maven for building Lucene/Solr with Maven, what is the correct -Dversion to pass to get-maven-poms? This seems set up for building -SNAPSHOT versions; however, I would like to use Maven

Re: Tips on creating a custom QueryCache?

2012-05-24 Thread Aaron Daubman
before the usual QueryComponent? This component would be responsible for loading queries, executing them, caching results, and for returning those results when these queries are encountered later on. Otis From: Aaron Daubman daub...@gmail.com Subject: Tips

Re: Tips on creating a custom QueryCache?

2012-05-24 Thread Aaron Daubman
Hoss, brilliant as always - many thanks! =) Subclassing the SolrCache class sounds like a good way to accomplish this. Some questions: 1) Any recommendations on which best to sub-class? I'm guessing, for this scenario with rare batch puts and no evictions, I'd be looking for get performance.
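For the scenario described (rare batch puts, no evictions, get-heavy on many CPUs), one hedged sketch of the underlying data structure is a map built once and then published as immutable, so reads need no locking. `ReadOnlyQueryCache` is a hypothetical name, this is not an implementation of Solr's `SolrCache`, and the key and value types stand in for whatever the query and its cached results would be.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical warm-once, read-only cache: built in one batch, never evicted.
final class ReadOnlyQueryCache<K, V> {
    private final Map<K, V> entries; // final field: safely published once constructed

    private ReadOnlyQueryCache(Map<K, V> entries) {
        this.entries = entries;
    }

    /** Computes every value up front; the resulting map is never mutated. */
    static <K, V> ReadOnlyQueryCache<K, V> build(Iterable<K> keys, Function<K, V> compute) {
        Map<K, V> m = new HashMap<>();
        for (K k : keys) {
            m.put(k, compute.apply(k));
        }
        return new ReadOnlyQueryCache<>(Collections.unmodifiableMap(m));
    }

    /** Lock-free read; returns null on a miss so the caller can fall back to a normal search. */
    V get(K key) {
        return entries.get(key);
    }

    int size() {
        return entries.size();
    }
}
```

When the index is re-optimized, a new instance could be built and swapped in through a single volatile reference rather than mutating the existing map.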

Tips on creating a custom QueryCache?

2012-05-23 Thread Aaron Daubman
Greetings, I'm looking for pointers on where to start when creating a custom QueryCache. Our usage patterns are possibly a bit unique, so let me explain the desired use case: Our Solr index is read-only except for dedicated periods where it is updated and re-optimized. On startup, I would like