RE: facet on field aliases of same field

2014-10-29 Thread Michael Ryan
It is indeed possible. Just need to use a different syntax. As far as I know, the facet parameters need to be local parameters, like this... facet.range={!key=date_decade facet.range.start=1600-01-01T00:00:00Z facet.range.end=2000-01-01T00:00:00Z

Changing/merging terms of existing documents without reindexing them

2014-10-22 Thread Michael Ryan
I have the following problem: I have many documents (let's say hundreds of millions) in an existing distributed index that have a field with a variety of values. Two of these values are "dog" and "puppy". I have decided that I want to reclassify these to just all be "dog". I do queries on this

RE: Exact match on string field with special characters

2014-10-06 Thread Michael Ryan
This should do what you want: String fq = Field1 + "\"" + org.apache.solr.client.solrj.util.ClientUtils.escapeQueryChars(value) + "\""; -Michael -Original Message- From: tedsolr [mailto:tsm...@sciquest.com] Sent: Monday, October 06, 2014 10:49 AM To: solr-user@lucene.apache.org Subject: Re:
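
The quoting-plus-escaping idea above can be sketched without SolrJ on the classpath. This is only an illustration: escapeSpecialChars is a hypothetical stand-in for ClientUtils.escapeQueryChars, the set of special characters is an assumption based on the usual Lucene query syntax, and the field name is assumed to be passed without a trailing colon.

```java
final class ExactMatchFilterSketch {
    // Characters assumed to be special in the query syntax (an assumption for this sketch).
    private static final String SPECIAL = "\\+-!():^[]\"{}~*?|&;/";

    // Hypothetical helper: prefixes each special or whitespace character with a backslash.
    static String escapeSpecialChars(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (SPECIAL.indexOf(c) >= 0 || Character.isWhitespace(c)) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    // Builds field:"escaped value", i.e. a quoted exact-match filter query.
    static String exactMatchFilter(String field, String value) {
        return field + ":\"" + escapeSpecialChars(value) + "\"";
    }

    public static void main(String[] args) {
        System.out.println(exactMatchFilter("Field1", "a+b (c)"));
    }
}
```

In real client code the library method named in the post would normally be used instead of a hand-written escaper.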

RE: Inconsistent response time

2014-10-03 Thread Michael Ryan
It could be due to the minimum timer resolution on Windows. Do a search for "windows 15ms" and you'll find a lot of information about it. Though, I'm not sure which versions of Windows and/or Java have that problem. You could test it out by timing things other than Solr and see if they also take

RE: Exact match on string field with special characters

2014-10-01 Thread Michael Ryan
When you call addFacetField, the parameter you pass it should just be the fieldName. The fieldValue shouldn't come into play at all (unless I'm misunderstanding what you're trying to do). If you ever do need to escape a value for a query, you can use

RE: Content-Charset header in HttpSolrServer

2014-08-10 Thread Michael Ryan
Done. https://issues.apache.org/jira/browse/SOLR-6360 -Michael -Original Message- From: Chris Hostetter [mailto:hossman_luc...@fucit.org] Sent: Wednesday, August 06, 2014 7:55 PM To: solr-user@lucene.apache.org Subject: Re: Content-Charset header in HttpSolrServer : I was reviewing

RE: solr update dynamic field generates multiValued error

2014-08-04 Thread Michael Ryan
Are the latLong_0_coordinate and latLong_1_coordinate fields populated using copyField? If so, this sounds like it could be https://issues.apache.org/jira/browse/SOLR-3502. -Michael -Original Message- From: Franco Giacosa [mailto:fgiac...@gmail.com] Sent: Monday, August 04, 2014 9:05

Content-Charset header in HttpSolrServer

2014-07-27 Thread Michael Ryan
I was reviewing the httpclient code in HttpSolrServer and noticed that it sets a Content-Charset header. As far as I know this is not a real header and is not necessary. Anyone know a reason for this to be there? I'm guessing this was just a mistake when converting from httpclient3 to

RE: DocValues without re-index?

2014-07-22 Thread Michael Ryan
On Tue, Jul 22, 2014 at 6:50 AM, Michael Ryan mr...@moreover.com wrote: Is it possible to use DocValues on an existing index without first re-indexing? -Michael -- Sincerely yours Mikhail Khludnev Principal Engineer, Grid Dynamics http://www.griddynamics.com mkhlud...@griddynamics.com

DocValues without re-index?

2014-07-21 Thread Michael Ryan
Is it possible to use DocValues on an existing index without first re-indexing? -Michael

RE: Group only top 50 results not All results.

2014-07-11 Thread Michael Ryan
I suggest doing this in two queries. In the first query, retrieve the unique ids of the top 50 documents. In the second query, just query for those ids (e.g., q=ids:(2 13 55 62 81)), and add the facet parameters on that query. -Michael -Original Message- From: Aaron Gibbons

RE: Multiterm analysis in complexphrase query

2014-07-01 Thread Michael Ryan
any questions. LUCENE-5470 and LUCENE-5504 would move multiterm analysis farther down and make it available to all parsers that use QueryParserBase, including the ComplexPhraseQueryParser. Best, Tim -Original Message- From: Michael Ryan [mailto:mr...@moreover.com] Sent: Sunday

Best way to fix Document contains at least one immense term?

2014-07-01 Thread Michael Ryan
In LUCENE-5472, Lucene was changed to throw an error if a term is too long, rather than just logging a message. I have fields with terms that are too long, but I don't care - I just want to ignore them and move on. The recommended solution in the docs is to use LengthFilterFactory, but this

RE: Best way to fix Document contains at least one immense term?

2014-07-01 Thread Michael Ryan
script update processor. Can you tell us more about the nature of your data? I mean, sometimes analyzer filters strip or fold accented characters anyway, so count of characters versus UTF-8 bytes may be a non-problem. -- Jack Krupansky -Original Message- From: Michael Ryan Sent

Multiterm analysis in complexphrase query

2014-06-29 Thread Michael Ryan
I've been using a modified version of the complex phrase query parser patch from https://issues.apache.org/jira/browse/SOLR-1604 in Solr 3.6, and I'm currently upgrading to 4.9, which has this built-in. I'm having trouble with using accents in wildcard queries, support for which was added in

RE: Date truncation and time zone when searching

2014-05-21 Thread Michael Ryan
Well for CEST, which is 2 hours ahead, I would think you could just do... datefield:[* TO NOW/MONTH-2HOURS] That would give you everything up to 2014-04-30 22:00:00 GMT, which is 2014-05-01 00:00:00 CEST. Always always always store the correct value. -Michael -Original Message- From:
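
The arithmetic behind NOW/MONTH-2HOURS can be checked with java.time. This is a sketch under assumptions: the fixed "now" instant is made up for illustration, and Europe/Berlin is used as a representative CEST zone.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.time.temporal.ChronoUnit;

final class MonthBoundarySketch {
    // Mirrors NOW/MONTH-2HOURS: round down to the start of the month in UTC, then subtract two hours.
    static Instant startOfMonthMinusTwoHours(Instant now) {
        ZonedDateTime utc = now.atZone(ZoneOffset.UTC);
        ZonedDateTime startOfMonth = utc.withDayOfMonth(1).truncatedTo(ChronoUnit.DAYS);
        return startOfMonth.minusHours(2).toInstant();
    }

    public static void main(String[] args) {
        Instant now = Instant.parse("2014-05-21T12:00:00Z"); // illustrative value only
        Instant upper = startOfMonthMinusTwoHours(now);
        System.out.println("UTC upper bound:  " + upper);
        System.out.println("Same instant CEST: " + upper.atZone(ZoneId.of("Europe/Berlin")));
    }
}
```

A fixed two-hour offset only holds while the zone is on summer time; the winter offset for the same zone is one hour.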

score retrieval performance

2014-05-19 Thread Michael Ryan
Is there any significant difference in query speed when retrieving the score pseudo-field? E.g., does... q=foo&sort=date+desc&fl=*,score ...take longer to run than... q=foo&sort=date+desc&fl=* I know there's different code paths in Solr depending on whether the score is needed or not, but not

RE: timeAllowed query parameter not working?

2014-03-27 Thread Michael Ryan
Unfortunately the timeAllowed parameter doesn't apply to the part of the processing that makes wildcard queries so slow. It only applies to a later part of the processing when the matching documents are being collected. There's some discussion in the original ticket that implemented this

RE: Solr 3.6.1 stalling with high CPU and blocking on field cache

2013-11-26 Thread Michael Ryan
My gut instinct is that your heap size is way too high. Try decreasing it to like 5-10G. I know you say it uses more than that, but that just seems bizarre unless you're doing something like faceting and/or sorting on every field. -Michael -Original Message- From: Patrick O'Lone

RE: Interesting edismax/qs bug in Solr 3.5

2013-09-22 Thread Michael Ryan
Sounds like https://issues.apache.org/jira/browse/LUCENE-3821 (issue seems to be fixed but still shows as open). -Michael -Original Message- From: Arcadius Ahouansou [mailto:arcad...@menelic.com] Sent: Sunday, September 22, 2013 11:15 PM To: solr-user Subject: Interesting edismax/qs

RE: JVM Crash using solr 4.4 on Centos

2013-09-19 Thread Michael Ryan
This is a known bug in that JDK version. Upgrade to a newer version of JDK 7 (any build within the last two years or so should be fine). If that's not possible for you, you can add -XX:-UseLoopPredicate as a command line option to java to work around this. -Michael -Original Message-

RE: Memory usage during aggregation - SolrCloud with very large numbers of facet terms.

2013-09-03 Thread Michael Ryan
However, the Solr instance we direct our client query to is consuming significantly more RAM (10GB) and is still failing after a few queries when it runs out of heap space. This is presumably due to the role it plays, aggregating the results from each shard. That seems quite odd... What

RE: swap and GC

2013-07-29 Thread Michael Ryan
This is interesting... How are you measuring the heap size? -Michael -Original Message- From: Bernd Fehling [mailto:bernd.fehl...@uni-bielefeld.de] Sent: Monday, July 29, 2013 5:34 AM To: solr-user@lucene.apache.org Subject: swap and GC Something interesting I have noticed today, after

RE: Solr 3.6 optimize and field cache question

2013-07-08 Thread Michael Ryan
I'm 99% sure that the deleted docs will indeed use up space in the field cache, at least until the segments that those documents are in are merged - that is what an optimize will do. Of course, these segments will automatically be merged eventually, but it might take days for this to happen,

Using per-segment FieldCache or DocValues in custom component?

2013-07-01 Thread Michael Ryan
I have some custom code that uses the top-level FieldCache (e.g., FieldCache.DEFAULT.getLongs(reader, "foobar", false)). I'd like to redesign this to use the per-segment FieldCaches so that re-opening a Searcher is fast(er). In most cases, I've got a docId and I want to get the value for a

RE: why does the uniqueKey has to be indexed.

2013-06-24 Thread Michael Ryan
To enforce uniqueness, Solr needs to be able to search on the id to see if it is currently in the index. -Michael -Original Message- From: Mysurf Mail [mailto:stammail...@gmail.com] Sent: Monday, June 24, 2013 11:52 AM To: solr-user@lucene.apache.org Subject: why does the uniqueKey has

RE: Restarting SOLR will remove all cache?

2013-06-21 Thread Michael Ryan
Restarting Solr won't clear the disk cache. When I'm doing perf testing, I'll sometimes run this on the server before each test to clear out the disk cache: echo 1 > /proc/sys/vm/drop_caches -Michael -Original Message- From: Learner [mailto:bbar...@gmail.com] Sent: Friday, June 21,

RE: Stats facet on int/tint fields

2013-04-22 Thread Michael Ryan
Sounds like this could be https://issues.apache.org/jira/browse/SOLR-2976. -Michael -Original Message- From: vinothkumar raman [mailto:vinothkr.k...@gmail.com] Sent: Monday, April 22, 2013 5:54 AM To: solr-user@lucene.apache.org; solr-...@lucene.apache.org Subject: Stats facet on

RE: Approximately needed RAM for 5000 query/second at a Solr machine?

2013-04-11 Thread Michael Ryan
I've investigated this in the past. The worst case is 2*indexSize additional disk space (3*indexSize total) during an optimize. In our system, we use LogByteSizeMergePolicy, and used to have a mergeFactor of 10. We would see the worst case happen when there were exactly 20 segments (or some

RE: NPE when faceting TEXTfield in a distributed search query

2013-04-10 Thread Michael Ryan
Large facet.limit values cause a very large amount of form data to be sent to the shards, though I'm not sure why this would cause a NullPointerException. Perhaps the web server you are using is truncating the data instead of returning a "form too large" error, which is somehow causing an NPE.

RE: NPE when faceting TEXTfield in a distributed search query

2013-04-10 Thread Michael Ryan
Yes, this is a distributed search thing. In a distributed search, it will first make a somewhat normal facet request to all of the shards, get back the facet values, then make a second request in order to get the full counts of the facet values - this second request contains a list of facet

RE: Slow performance on distributed search

2013-03-26 Thread Michael Ryan
What are the values of the start and rows parameters you are using? When you say the controller shard takes a long time, how long is it taking - 100ms, 1s, 10s...? -Michael -Original Message- From: qungg [mailto:qzheng1...@gmail.com] Sent: Tuesday, March 26, 2013 11:17 AM To:

RE: Slow performance on distributed search

2013-03-26 Thread Michael Ryan
Depending on your use case and the particulars of your system, a previous post I made about using a FieldCache in SolrIndexSearcher for id retrieval (see http://osdir.com/ml/solr-user.lucene.apache.org/2013-01/msg01574.html) may help you. In your case, it might not be the merging process on the

Nested queries with proximity/slop

2013-03-19 Thread Michael Ryan
I was wondering if anyone is aware of an existing Jira for this bug... _query_:\a b\~2 ...is parsed as... PhraseQuery(someField:a b) ...instead of the expected... PhraseQuery(someField:a b~2) _query_:\a b\~2 ...is parsed as... PhraseQuery(someField:a b~2) _query_:\a b\~2~3 ...is parsed as...

RE: Distributed Search and the Stale Check

2013-02-25 Thread Michael Ryan
I don't have anything to add besides saying this is awesome. Great analysis. -Michael

RE: Can't determine Sort Order: 'prijs ASC', pos=5

2013-02-13 Thread Michael Ryan
I think the order needs to be in lowercase. Try asc instead of ASC. -Michael -Original Message- From: PeterKerk [mailto:vettepa...@hotmail.com] Sent: Wednesday, February 13, 2013 7:30 PM To: solr-user@lucene.apache.org Subject: Can't determine Sort Order: 'prijs ASC', pos=5 On this

RE: solr j response

2013-02-10 Thread Michael Ryan
Assuming that createdDate is a DateField in your schema.xml, the object returned by SolrJ will be a Date object (though you will need to cast it to a Date). -Michael

RE: LocalParam tag does not work when is placed in brackets

2013-02-07 Thread Michael Ryan
I'm pretty sure the local params have to be at the very start of the query. But you should be able to do this with nested queries. Try this... fq=_query_:"{!tag=d0feea8}category:\"5\" OR otherField:\"otherValue\"" AND type:DOCUMENT -Michael -Original Message- From: Karol Sikora

RE: A question about attaching shards to load balancers

2013-01-30 Thread Michael Ryan
From a performance point of view, I can't imagine it mattering. In our setup, we have a dedicated Solr server that is not a shard that takes incoming requests (we call it the coordinator). This server is very lightweight and practically has no load at all. My gut feeling is that having a

Using FieldCache in SolrIndexSearcher for distributed id retrieval

2013-01-29 Thread Michael Ryan
Following up from a post I made back in 2011... I am a user of Solr 3.2 and I make use of the distributed search capabilities of Solr using a fairly simple architecture of a coordinator + some shards. Correct me if I am wrong: In a standard distributed search with QueryComponent, the

RE: Issues with docFreq/docCount on SolrCloud

2013-01-23 Thread Michael Ryan
Are you able to see any evidence that some of the 500k docs are being added twice? Check the maxDocs on the Solr admin page. I vaguely recall there being some issue with docs in SolrCloud being added multiple times (which under the covers is really add, delete, add). I think that could cause

RE: Solr 4.0 - timeAllowed in distributed search

2013-01-20 Thread Michael Ryan
(This is based on my knowledge of 3.6 - not sure if this has changed in 4.0) You are using rows=3, which requires retrieving 3 documents from disk. In a non-distributed search, the QTime will not include the time it takes to retrieve these documents, but in a distributed search, it

RE: SOlr 3.5 and sharding

2013-01-14 Thread Michael Ryan
If you have the same documents -- with the same uniqueKey -- across multiple shards, the count will not be what you expect. You'll need to ensure that each document exists only on a single shard. -Michael -Original Message- From: Jean-Sebastien Vachon

RE: wildcard faceting in solr cloud

2013-01-08 Thread Michael Ryan
I'd guess that the patch simply doesn't implement it for distributed searches. The code for distributed facets is quite a bit more complicated, and I don't see it touched in this patch. -Michael -Original Message- From: jmozah [mailto:jmo...@gmail.com] Sent: Tuesday, January 08, 2013

RE: Question about GC logging timestamps

2013-01-05 Thread Michael Ryan
From my own experience, the timestamp seems to be logged at the start of the garbage collection. -Michael

RE: Odd exceptions in both 3.5 and 4.1-SNAPSHOT

2013-01-03 Thread Michael Ryan
We see these EofExceptions in our system occasionally. I believe they occur when our SolrJ client times out and closes the connection, before Jetty returns the response. -Michael -Original Message- From: Shawn Heisey [mailto:s...@elyograg.org] Sent: Thursday, January 03, 2013 10:07 AM

RE: Sort speed asc vs desc - is desc slower?

2012-12-12 Thread Michael Ryan
Perhaps if there are a lot more ties on one end vs the other? Or if the values being sorted on aren't that random? Do they naturally increase like a timestamp? It's a unique id field. The id is a simple sequential id, so docs with a lower doc id will naturally also have a lower id. I think

Highlighting data stored outside of Solr

2012-12-11 Thread Michael Ryan
Has anyone ever attempted to highlight a field that is not stored in Solr? We have been considering not storing fields in Solr, but still would like to use Solr's built-in highlighting. On first glance, it looks like it would be fairly simple to modify DefaultSolrHighlighter to get the stored

Occasional failed to respond errors

2012-12-05 Thread Michael Ryan
We have a longstanding issue with "failed to respond" errors in Solr when our coordinator is querying our Solr shards. To elaborate further... we're using the built-in distributed capabilities of Solr 3.6, and using Jetty as our server. Occasionally, we will have a query fail due to an error

RE: SolrCloud - Query performance degrades with multiple servers

2012-12-05 Thread Michael Ryan
As you add nodes, the average response time of the slowest node will likely increase. For example, consider an extreme case where you have something like 1 million nodes - you're practically guaranteed that one of them is going to be doing something like a stop-the-world garbage collection. So
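
The intuition above can be made concrete with a small calculation. This is a sketch under stated assumptions: each node is assumed to be slow (for example, paused in garbage collection) independently with the same probability, and the 1% figure is made up for illustration, not taken from the post.

```java
import java.util.Locale;

final class SlowNodeOddsSketch {
    // Probability that at least one of `nodes` independent nodes is slow
    // when each is slow with probability pSlow: 1 - (1 - pSlow)^nodes.
    static double probabilityAnySlow(double pSlow, int nodes) {
        return 1.0 - Math.pow(1.0 - pSlow, nodes);
    }

    public static void main(String[] args) {
        double pSlow = 0.01; // hypothetical per-node chance of being slow at query time
        for (int nodes : new int[] {1, 10, 100, 1000}) {
            System.out.println(String.format(Locale.ROOT, "%d nodes: %.4f",
                    nodes, probabilityAnySlow(pSlow, nodes)));
        }
    }
}
```

Because a distributed query waits for its slowest shard, this probability is roughly the chance that any given query hits a slow response.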

RE: Solr 4 : Optimize very slow

2012-12-04 Thread Michael Ryan
When I upgraded from 3.2 to 3.6, I found that an optimize - all other variables being the same - took about twice as long. Eventually I was able to track this down to the new default of MMapDirectory. By changing back to NIOFSDirectory, I was able to get the optimize time back down to what it

RE: Is leading wildcard search turned on by default in Solr 3.6.1?

2012-11-12 Thread Michael Ryan
Yeah, the situation is kind of a pain right now. In https://issues.apache.org/jira/browse/SOLR-2438, it was enabled by default and there is no way to disable without patching SolrQueryParser. There's also the edismax parser which doesn't have a setting for this, which I've made a jira for at

RE: Solr - Disk writes and set up suggestions

2012-11-03 Thread Michael Ryan
I'd recommend not optimizing every hour. Are you seeing a significant performance increase from optimizing this frequently? -Michael

RE: facet by in the past and in the future

2012-10-18 Thread Michael Ryan
This should do it: facet=true&facet.query=yourDateField:([* TO NOW/DAY-1MILLI])&facet.query=yourDateField:([NOW/DAY TO *]) -Michael -Original Message- From: Paul [mailto:p...@nines.org] Sent: Thursday, October 18, 2012 5:28 PM To: solr-user@lucene.apache.org Subject: facet by in the past
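
When these parameters are sent over HTTP, the brackets, spaces, and slashes in the facet queries need URL encoding. A minimal sketch of building the parameter string follows; the class and method names are hypothetical, and yourDateField is the placeholder from the post.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

final class FacetQueryUrlSketch {
    // Builds the "past" and "future" facet.query parameters with URL-encoded values.
    static String buildParams(String dateField) {
        String past = dateField + ":([* TO NOW/DAY-1MILLI])";
        String future = dateField + ":([NOW/DAY TO *])";
        return "facet=true"
                + "&facet.query=" + URLEncoder.encode(past, StandardCharsets.UTF_8)
                + "&facet.query=" + URLEncoder.encode(future, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(buildParams("yourDateField"));
    }
}
```

Client libraries usually perform this encoding themselves, so the sketch mainly applies to hand-built request URLs.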

RE: How many documents in each Lucene segment?

2012-10-15 Thread Michael Ryan
Easiest way I know of without parsing any of the index files is to take the size of the fdx file in bytes and divide by 8. This will give you the exact number of documents before 4.0, and a close approximation in 4.0. Though, the fdx file might not be on disk if you haven't committed. -Michael
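
The rule of thumb above amounts to a single division. A minimal sketch, assuming the path to the segment's .fdx file is supplied by the caller (the class and method names are hypothetical):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

final class FdxDocCountSketch {
    // Per the post: document count is the .fdx size in bytes divided by 8.
    static long estimateDocCount(long fdxSizeBytes) {
        return fdxSizeBytes / 8;
    }

    // Convenience overload that reads the file size from disk.
    static long estimateDocCount(Path fdxFile) throws IOException {
        return estimateDocCount(Files.size(fdxFile));
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 1) {
            System.out.println(estimateDocCount(Paths.get(args[0])));
        } else {
            System.out.println("usage: FdxDocCountSketch <path to .fdx file>");
        }
    }
}
```

As the post notes, the result is only an approximation on 4.0, and the file may not be on disk before a commit.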

RE: Building solr with maven

2012-10-14 Thread Michael Ryan
We have a maven project to build a war containing everything from the Solr war, plus some of our own code. Here's the relevant stuff from our pom.xml: <packaging>war</packaging> <dependencies> <dependency> <groupId>org.apache.solr</groupId>

RE: Strange spikes in query response times...any ideas where else to look?

2012-06-28 Thread Michael Ryan
A few questions... 1) Do you only see these spikes when running JMeter? I.e., do you ever see a spike when you manually run a query? 2) How are you measuring the response time? In my experience there are three different ways to measure query speed. Usually all of them will be approximately

RE: KeywordTokenizerFactory with SynonymFilterFactory

2012-06-16 Thread Michael Ryan
Try changing the tokenizer2 SynonymFilterFactory filter to this: <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="false" expand="true" tokenizerFactory="solr.KeywordTokenizerFactory"/> By default, it seems that it uses WhitespaceTokenizer. -Michael

RE: Wildcard-Search Solr 3.5.0

2012-05-23 Thread Michael Ryan
I'd guess that this is because SnowballPorterFilterFactory does not implement MultiTermAwareComponent. Not sure, though. -Michael

RE: Changing precisionStep without a re-index

2012-04-18 Thread Michael Ryan
In case anyone tries to do this... If you facet on a TrieField and change the precisionStep to 0, you'll need to re-index. Changing precisionStep to 0 changes the prefix returned by TrieField.getMainValuePrefix(FieldType), which then causes facets with a value of 0 to be returned. -Michael

Changing precisionStep without a re-index

2012-04-16 Thread Michael Ryan
Is it safe to change the precisionStep for a TrieField without doing a re-index? Specifically, I want to change a field from this: <fieldType name="tlong" class="solr.TrieLongField" precisionStep="8" omitNorms="true" positionIncrementGap="0"/> to this: <fieldType name="long" class="solr.TrieLongField"

RE: Changing precisionStep without a re-index

2012-04-16 Thread Michael Ryan
Not really - it changes what tokens are indexed for the numbers, and range queries won't work correctly. Sorting (FieldCache), function queries, etc, would still work, and exact match queries would still work. Thanks. So it is just range queries that won't work correctly? That's okay for my

RE: mergePolicy element format change in 3.6 vs 3.5?

2012-04-13 Thread Michael Ryan
It looks like the first format was removed in 3.6 as part of https://issues.apache.org/jira/browse/SOLR-1052. The second format works in all 3.x versions. -Michael -Original Message- From: Peter Wolanin [mailto:peter.wola...@acquia.com] Sent: Friday, April 13, 2012 12:32 PM To:

RE: How to limit the number of open searchers?

2012-03-11 Thread Michael Ryan
I'm curious, why can't you do a master/slave setup? It's just not all that useful for this particular application. Indexing new docs and merging segments - which as I understand is the main strength of having a write-only master - is a relatively small part of our app. What really is

RE: How to limit the number of open searchers?

2012-03-07 Thread Michael Ryan
Unless you have warming happening, there should only be a single searcher open at any given time. So it seems to me that maxWarmingSearchers should give you what you need. What I'm seeing is that if a query takes a very long time to run, and runs across the duration of multiple commits (I

How to limit the number of open searchers?

2012-03-05 Thread Michael Ryan
Is there a way to limit the number of searchers that can be open at a given time? I know there is a maxWarmingSearchers configuration that limits the number of warming searchers, but that's not quite what I'm looking for... Ideally, when I commit, I want there to only be one searcher open

RE: How can Solr do parallel query warming with firstSearcher and newSearcher?

2012-03-05 Thread Michael Ryan
https://issues.apache.org/jira/browse/SOLR-2548 may be of interest to you. -Michael

RE: Update Solr Schema To Store Field

2012-02-01 Thread Michael Ryan
This should be fine. From my experience, changing a field from stored=false to stored=true and vice versa is generally safe to do and has no unexpected behavior. -Michael

RE: HTMLStripCharFilterFactory not working in Solr4?

2012-01-24 Thread Michael Ryan
Try putting the HTMLStripCharFilterFactory before the StandardTokenizerFactory instead of after it. I vaguely recall being burned by something like this before. -Michael

RE: Question on Reverse Indexing

2012-01-21 Thread Michael Ryan
Can this be the reason why it is working automatically although there are no reversed tokens being stored and even without the ReversedWildcardFilterFactory being set, solr automatically is allowing leading wild card search? Yes, that's correct. See

RE: Question about sorting by a field

2012-01-19 Thread Michael Ryan
How about having a single-valued field named firstDestination that has the first destination in the list, and then your query could be something like 'destination:"Buenos Aires" firstDestination:"Buenos Aires"'. Docs that match both should have a higher score and thus will be listed first.

TrieField precisionStep effect on non-range queries and sorting

2012-01-02 Thread Michael Ryan
I was wondering... how does the TrieField precisionStep value affect the speed of non-range queries and sorting? I'm assuming that int (precisionStep=0) is no slower than tint (precisionStep=8) for these - is that correct? tint is just faster for range queries? Is int any faster than tint

RE: Poor performance on distributed search

2011-12-19 Thread Michael Ryan
I had a similar requirement in my project, where a user might ask for up to 3000 results. What I did was change SolrIndexSearcher.doc(int, Set) to retrieve the unique key from the field cache instead of retrieving it as a stored field from disk. This resulted in a massive speed improvement for

RE: Replication Index Fetch error

2011-12-19 Thread Michael Ryan
According to http://lucene.apache.org/java/3_4_0/fileformats.html, the FNMVersion changed from -2 to -3 in Lucene 3.4. Is it possible that the new master is actually running 3.4, and the new slave is running 3.2? (This is just a wild guess.) -Michael

RE: multi value field search

2011-12-17 Thread Michael Ryan
The problem I have is that at search time, I have faceting turned on for this field and therefore, I get the four facets canadian, imperial, bank, and commerce, which all refer to the same record. How can I go about searching for any word contained in the company name but then return the

UnInvertedField vs FieldCache for facets for single-token text fields

2011-11-03 Thread Michael Ryan
I have some fields I facet on that are TextFields but have just a single token. The fieldType looks like this: <fieldType name="myStringFieldType" class="solr.TextField" indexed="true" stored="false" omitNorms="true" sortMissingLast="true" positionIncrementGap="100"> <analyzer> <tokenizer

RE: Query time help

2011-10-30 Thread Michael Ryan
Another thing to note is that QTime does not include the time it takes to retrieve the stored documents to include in the response. So if you're using a high rows value in your query, QTime may be much smaller than the actual time Solr spends generating the response. Try adding rows=1 to your

Applying hl.requireFieldMatch to groups of fields

2011-10-27 Thread Michael Ryan
I am trying to highlight FieldA when a user searches on either FieldA or FieldB, but I do not want to highlight FieldA when a user searches on FieldC. To explain further: I have a field named content and a field named contentCS. The content field is a stored text field that uses

How to make UnInvertedField faster?

2011-10-19 Thread Michael Ryan
I was wondering if anyone has any ideas for making UnInvertedField.uninvert() faster, or other alternatives for generating facets quickly. The vast majority of the CPU time for our Solr instances is spent generating UnInvertedFields after each commit. Here's an example of one of our slower

RE: Example setting TieredMergePolicy for Solr 3.3 or 3.4?

2011-09-21 Thread Michael Ryan
I think the problem is that the mergePolicy config needs to be inside of the indexDefaults config, rather than after it as you have. -Michael

Using the contrib flexible query parser in Solr

2011-09-13 Thread Michael Ryan
Has anyone used the Flexible Query Parser (https://issues.apache.org/jira/browse/LUCENE-1567) in Solr? I'm just starting to look at it for the first time and was wondering if it is something that can be dropped into Solr fairly easily, or if more extensive changes are needed. I thought

RE: High facet.limit (with only 2-3 actual facets) - Massive bandwidth consumption in DistributedSearch

2011-09-08 Thread Michael Ryan
Are you using facet.mincount in the query? -Michael

RE: High facet.limit (with only 2-3 actual facets) - Massive bandwidth consumption in DistributedSearch

2011-09-08 Thread Michael Ryan
yep - facet.mincount=1 Yeah, I've run into this same issue, though I never looked too closely into it. What is happening is that the facet.mincount parameter is removed when the query is made to the shards, so each shard is returning about 3 facet values, most of them with a count of 0. I

RE: Optimize concern in Solr 3.2

2011-09-02 Thread Michael Ryan
I have recently upgraded from Solr 1.4 to Solr 3.2. In Solr 1.4 only 3 files (one .cfs and two segments files) were made in the *index/* directory (after doing optimize). Now, in Solr 3.2, the optimize seems not to be working. My final number of files in the *index/* directory is 7-8. Can

RE: Query vs Filter Query Usage

2011-08-25 Thread Michael Ryan
10,000,000 document index Internal Document id is 32 bit unsigned int Max Memory Used by a single cache slot in the filter cache = 32 bits x 10,000,000 docs = 320,000,000 bits or 38 MB I think it depends on where exactly the result set was generated. I believe the result set will usually be
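
The quoted estimate can be reproduced with a few lines of arithmetic. This is a sketch only: the 32-bits-per-document figure comes from the quoted message, while the one-bit-per-document bitmap is shown purely as an assumed point of comparison and is not something the preview above states.

```java
import java.util.Locale;

final class FilterCacheMemorySketch {
    // Quoted estimate: one 32-bit document id per document in the index.
    static long intListBytes(long docs) {
        return docs * 32 / 8;
    }

    // Assumed alternative for comparison: one bit per document in the index.
    static long bitmapBytes(long docs) {
        return (docs + 7) / 8;
    }

    static double toMiB(long bytes) {
        return bytes / (1024.0 * 1024.0);
    }

    public static void main(String[] args) {
        long docs = 10_000_000L;
        System.out.println(String.format(Locale.ROOT, "32-bit ids: %d bytes (%.1f MiB)",
                intListBytes(docs), toMiB(intListBytes(docs))));
        System.out.println(String.format(Locale.ROOT, "bitmap:     %d bytes (%.2f MiB)",
                bitmapBytes(docs), toMiB(bitmapBytes(docs))));
    }
}
```

Which representation applies depends on how the result set was produced, which is the point the reply goes on to make.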

Optimize requires 50% more disk space when there are exactly 20 segments

2011-08-24 Thread Michael Ryan
I'm using Solr 3.2 with a mergeFactor of 10 and no merge policy configured, thus using the default LogByteSizeMergePolicy. Before I do an optimize, typically the largest segment will be about 90% of the total index size. When I do an optimize, the total disk space required is usually about 2x

RE: Requiring multiple matches of a term

2011-08-21 Thread Michael Ryan
One simple way of doing this is maybe to write a wrapper for TermQuery that only returns docs with a Term Frequency X as far as I understand the question those terms don't have to be within a certain window right? Correct. Terms can be anywhere in the document. I figured term frequencies

Requiring multiple matches of a term

2011-08-19 Thread Michael Ryan
Is there a way to specify in a query that a term must match at least X times in a document, where X is some value greater than 1? For example, I want to only get documents that contain the word dog three times. I've thought that using a proximity query with an arbitrary large distance value

RE: Solr Accent Insensitive and sensitive search

2011-08-17 Thread Michael Ryan
Are you using the same analyzer for both type=query and type=index? Can you show us the fieldType from your schema? -Michael

RE: copyfields in schema.xml

2011-08-11 Thread Michael Ryan
Nope. The 'text' field will just have the 'titulo' contents. To have both, you would have to do something like this: <copyField source="title" dest="titulo"/> <copyField source="title" dest="text"/> <copyField source="titulo" dest="text"/> -Michael

RE: How come this query string starts with wildcard?

2011-08-10 Thread Michael Ryan
I think this is because ) is treated as a token delimiter. So (foo)bar is treated the same as (foo) bar (that is, bar is treated as a separate word). So (foo)* is really parsed as (foo) * and thus the * is treated as the start of a new word. -Michael

RE: schema.xml changes, need re-indexing ?

2011-07-27 Thread Michael Ryan
You should be fine - no need to re-index your data. Adding and removing fields is generally safe to do without a re-index. Changing a field (its type, analyzers, etc) requires more caution and generally does require a re-index. -Michael

RE: Returning total matched document count with SolrJ

2011-06-30 Thread Michael Ryan
SolrDocumentList docs = queryResponse.getResults(); long totalMatches = docs.getNumFound(); -Michael

RE: Sorting by vale of field

2011-06-29 Thread Michael Ryan
You could try adding a new int field (like typeSort) that has the desired sort values. So when adding a document with type:car, also add typeSort:1; when adding type:van, also add typeSort:2; etc. Then you could do sort=typeSort asc to get them in your desired order. I think this is also
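
The index-time mapping described above could look roughly like this. It is a sketch, not the poster's code: the car and van values come from the post, while the class, the lookup table, and the in-memory sort that stands in for sort=typeSort asc are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

final class TypeSortSketch {
    // Desired ordering: each type is assigned the int that would be stored in typeSort.
    private static final Map<String, Integer> TYPE_SORT = Map.of("car", 1, "van", 2);

    // Value to index alongside the document's type field.
    static int typeSortFor(String type) {
        Integer value = TYPE_SORT.get(type);
        if (value == null) {
            throw new IllegalArgumentException("no typeSort value defined for type: " + type);
        }
        return value;
    }

    // Stand-in for sort=typeSort asc: orders types by their assigned int.
    static List<String> sortByTypeSort(List<String> types) {
        List<String> copy = new ArrayList<>(types);
        copy.sort(Comparator.comparingInt(TypeSortSketch::typeSortFor));
        return copy;
    }

    public static void main(String[] args) {
        System.out.println(sortByTypeSort(Arrays.asList("van", "car", "van")));
    }
}
```

Keeping the mapping in one place makes it easier to re-index consistently if the desired order changes.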

Using FieldCache in SolrIndexSearcher - crazy idea?

2011-06-28 Thread Michael Ryan
I am a user of Solr 3.2 and I make use of the distributed search capabilities of Solr using a fairly simple architecture of a coordinator + some shards. Correct me if I am wrong: In a standard distributed search with QueryComponent, the first query sent to the shards asks for fl=myUniqueKey or

omitTermFreqAndPositions in a TextField fieldType

2011-06-16 Thread Michael Ryan
Is it possible to use omitTermFreqAndPositions=true in a fieldType declaration that uses class=solr.TextField? I've tried doing this and it does not seem to work (i.e., the prx file size does not change). Using it in a field declaration does work, but I'd rather set it in the fieldType so I