Re: Solr performance issue

2011-03-14 Thread Jonathan Rochkind
I've definitely had cases in 1.4.1 where even though I didn't have an OOM error, Solr was being weirdly slow, and increasing the JVM heap size fixed it. I can't explain why it happened, or exactly how you'd know this was going on, I didn't see anything odd in the logs to indicate, I just

Re: Solr performance issue

2011-03-14 Thread Jonathan Rochkind
It's actually, as I understand it, expected JVM behavior to see the heap rise to close to its limit before it gets GC'd, that's how Java GC works. Whether that should happen every 20 seconds or what, I don't know. Another option is setting better JVM garbage collection arguments, so GC

Re: disquery - difference qf qs / pf ps

2011-03-10 Thread Jonathan Rochkind
On 3/10/2011 8:15 AM, Gastone Penzo wrote: Thank you very much. i understand the difference between qs and ps but not what pf is...is it necessary to use ps? It's not necessary to use anything, including Solr. pf: Will take the entire query the user entered, make it into a single phrase,

Re: True master-master fail-over without data gaps

2011-03-09 Thread Jonathan Rochkind
On 3/9/2011 12:05 PM, Otis Gospodnetic wrote: But check this! In some cases one is not allowed to save content to disk (think copyrights). I'm not making this up - we actually have a customer with this cannot save to disk (but can index) requirement. Do they realize that a Solr index is on

Re: Excluding results from more like this

2011-03-09 Thread Jonathan Rochkind
Yeah, that just restricts what items are in your main result set (and adding -4 has no real effect). The more like this set is constructed based on your main result set, for each document in it. As far as I can see from here: http://wiki.apache.org/solr/MoreLikeThis ..there seems to be no

Re: Same index is ranking differently on 2 machines

2011-03-09 Thread Jonathan Rochkind
Yes, but the identical index with the identical solrconfig.xml and the identical query and the identical version of Solr on two different machines should produce identical results. So it's a legitimate question why it's not. But perhaps queryNorm isn't enough to answer that. Sorry, it's out

Re: Same index is ranking differently on 2 machines

2011-03-09 Thread Jonathan Rochkind
you mention is identical, I am as certain as I can be. I too think there must be a difference I have missed but I have run out of ideas for what to check! Frustrating :) On Mar 9, 2011, at 4:38 PM, Jonathan Rochkind wrote: Yes, but the identical index with the identical solrconfig.xml

Re: NRT in Solr

2011-03-09 Thread Jonathan Rochkind
Interesting, does anyone have a summary of what techniques zoie uses to do this? I don't see any docs on the technical details. On 3/9/2011 5:29 PM, Smiley, David W. wrote: Zoie adds NRT to Solr: http://snaprojects.jira.com/wiki/display/ZOIE/Zoie+Solr+Plugin I haven't tried it yet but looks

Re: Solr Hanging all of sudden with update/csv

2011-03-08 Thread Jonathan Rochkind
My guess is that you're running out of RAM. Actual Java profiling is beyond me, but I have seen issues on updating that were solved by more RAM. If you are updating every few minutes, and your new index takes more than a few minutes to warm, you could be running into overlapping warming

RE: True master-master fail-over without data gaps

2011-03-08 Thread Jonathan Rochkind
I'd honestly think about buffering the incoming documents in some store that's actually made for fail-over persistence reliability, maybe CouchDB or something. And then that's taking care of not losing anything, and the problem becomes how we make sure that our solr master indexes are kept in sync

Re: dismax, and too much qf?

2011-03-07 Thread Jonathan Rochkind
I use about that many qf's in Solr 1.4.1. It works. I'm not entirely sure if it has performance implications -- I do have searching that is somewhat slower than I'd like, but I'm not sure if the lengthy qf is a contributing factor, or other things I'm doing (like a dozen different

RE: Full Text Search with multiple index and complex requirements

2011-03-06 Thread Jonathan Rochkind
While it might be possible to work things out, not just one but several of your requirements are things that are difficult for Solr to do or which solr isn't really optimized to do. Are you sure you need an inverted indexing tool like Solr at all, as opposed to some kind of store (rdbms or

RE: Model foreign key type of search?

2011-03-04 Thread Jonathan Rochkind
Yep, it's tricky to do this sort of thing in Solr. One way to do it would be to try and reindex the main item on some regular basis with the keywords/comments actually flattened into the main record. Maybe along with a field for number_of_comments, so you can boost on that or what have you.

RE: When Index is Updated Frequently

2011-03-04 Thread Jonathan Rochkind
If you can make that solution work for you, I think it is a wise one which will serve you well. In some cases that solution won't work, because you _need_ the frequently changing data in Solr to be searched against in Solr. But if you can get away without that, I think you will be well-served

Re: uniqueKey merge documents on commit

2011-03-03 Thread Jonathan Rochkind
Nope, there is not. On 3/3/2011 10:55 AM, Tim Gilbert wrote: Hi, I have a unique key within my index, but rather than the default behaviour of overwriting I am wondering if there is a method to merge the two different documents on commit of the second document. I have a testcase which

Re: FilterQuery OR statement

2011-03-03 Thread Jonathan Rochkind
You might also consider splitting your two separate AND clauses into two separate fq's: fq=field1:(1 OR 2 OR 3 OR 4) fq=field2:(4 OR 5 OR 6 OR 7) That will cache the two separate clauses separately in the filter cache, which is probably preferable in general, without knowing more about your
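
A minimal sketch of how a client might send the two clauses as separate fq parameters, so that Solr can cache each one on its own. The function name and the q=*:* main query are assumptions made for illustration; only the two fq values come from the message above.

```python
from urllib.parse import urlencode

def build_filter_params(filters):
    """Build a query string with one fq parameter per field clause."""
    # q=*:* is an assumed placeholder for whatever the real main query is.
    params = [("q", "*:*")]
    for field, values in filters:
        clause = " OR ".join(str(v) for v in values)
        # Each clause becomes its own fq rather than one big AND-ed filter.
        params.append(("fq", f"{field}:({clause})"))
    return urlencode(params)

query_string = build_filter_params([
    ("field1", [1, 2, 3, 4]),
    ("field2", [4, 5, 6, 7]),
])
print(query_string)
```

Passing a list of tuples to urlencode keeps the repeated fq key and also handles the URL encoding of spaces, colons and parentheses.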

Re: multiple localParams for each query clause

2011-03-02 Thread Jonathan Rochkind
Not per clause, no. But you can use the nested queries feature to set local params for each nested query instead. Which is in fact one of the most common use cases for local params. q=_query_:"{!type=x q.field=z}something" AND _query_:"{!type=database}something" URL encode that whole thing

Re: multi-core solr, specifying the data directory

2011-03-02 Thread Jonathan Rochkind
Meanwhile, I'm having trouble getting the expected behavior at all. I'll try to give the right details (without overwhelming with too many), if anyone can see what's going on. Solr 1.4.1. Multi-core. 'Main' solr home with solr.xml at /opt/solr/solr_indexer/solr.xml The solr.xml includes

Re: multi-core solr, specifying the data directory

2011-03-02 Thread Jonathan Rochkind
is) -- or maybe even on some other value, the tomcat base url or something? Is _that_ a bug? On 3/2/2011 3:38 PM, Jonathan Rochkind wrote: Meanwhile, I'm having trouble getting the expected behavior at all. I'll try to give the right details (without overwhelming with too many), if anyone can see what's

Re: multi-core solr, specifying the data directory

2011-03-01 Thread Jonathan Rochkind
AS - www.cominvent.com On 1. mars 2011, at 00.00, Jonathan Rochkind wrote: Unless I'm doing something wrong, in my experience in multi-core Solr in 1.4.1, you NEED to explicitly provide an absolute path to the 'data' dir. I set up multi-core like this: cores adminPath=/admin/cores core name=some_core

Re: multi-core solr, specifying the data directory

2011-03-01 Thread Jonathan Rochkind
Hmm, okay, have to try to find time to install the example/multicore and see. It's definitely never worked for me, weird. Thanks. On 3/1/2011 2:38 PM, Chris Hostetter wrote: : Unless I'm doing something wrong, in my experience in multi-core Solr in : 1.4.1, you NEED to explicitly provide an

Re: solr different sizes on master and slave

2011-03-01 Thread Jonathan Rochkind
The slave should not keep multiple copies _permanently_, but might temporarily after it's fetched the new files from master, but before it's committed them and fully warmed the new index searchers in the slave. Could that be what's going on, is your slave just still working on committing and

Re: Query on multivalue field

2011-03-01 Thread Jonathan Rochkind
Each token has a position set on it. So if you index the value alpha beta gamma, it winds up stored in Solr as (sort of, for the way we want to look at it) document1: alpha:position 1 beta:position 2 gamma:position 3 If you set the position increment gap large, then
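
A simplified model of token positions in a multi-valued field, offered as a sketch rather than a description of Lucene's exact arithmetic. The function name and the whitespace tokenization are assumptions; the point is only that a gap pushes the first token of the next value far away from the last token of the previous one, which is what keeps phrase queries from matching across values.

```python
def assign_positions(values, gap):
    """Assign a position to every token, leaving `gap` unused positions between values."""
    positions = []
    next_pos = 1
    for value in values:
        for token in value.split():  # assumed whitespace tokenizer
            positions.append((token, next_pos))
            next_pos += 1
        # Skip ahead before the next value of the multi-valued field starts.
        next_pos += gap
    return positions

no_gap = assign_positions(["alpha beta gamma", "delta"], gap=0)
big_gap = assign_positions(["alpha beta gamma", "delta"], gap=100)
print(no_gap)
print(big_gap)
```

With no gap, gamma and delta sit on adjacent positions even though they come from different values; with a gap of 100 they are more than a hundred positions apart.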

Re: multi-core solr, specifying the data directory

2011-03-01 Thread Jonathan Rochkind
/data}/dataDir -Mike On 3/1/2011 4:38 PM, Jonathan Rochkind wrote: Hmm, okay, have to try to find time to install the example/multicore and see. It's definitely never worked for me, weird. Thanks. On 3/1/2011 2:38 PM, Chris Hostetter wrote: : Unless I'm doing something wrong, in my experience

setting different solrconfig.xml for a core

2011-02-28 Thread Jonathan Rochkind
So I think I ought to be able to set up a particular solr core to use a different file for solrconfig.xml. (The reason I want to do this is so I can have master and slave in replication have the exact same repo checkout for their conf directory, but have the master using a different

Re: setting different solrconfig.xml for a core

2011-02-28 Thread Jonathan Rochkind
On 2/28/2011 1:09 PM, Ahmet Arslan wrote: (The reason I want to do this is so I can have master and slave in replication have the exact same repo checkout for their conf directory, but have the master using a different solrconfig.xml, one set up to be master.) How about using same

Re: setting different solrconfig.xml for a core

2011-02-28 Thread Jonathan Rochkind
to find a file that does actually exist. Unless I put the name solrconfig.xml in there, then it works fine, heh. On 2/28/2011 3:00 PM, Jonathan Rochkind wrote: On 2/28/2011 1:09 PM, Ahmet Arslan wrote: (The reason I want to do this is so I can have master and slave in replication have the exact

Re: setting different solrconfig.xml for a core

2011-02-28 Thread Jonathan Rochkind
Message- From: Jonathan Rochkind [mailto:rochk...@jhu.edu] Sent: Monday, February 28, 2011 2:03 PM To: solr-user@lucene.apache.org Subject: Re: setting different solrconfig.xml for a core Okay, I did manage to find a clue from the log that it's not working, when it's not working: INFO: Jk

Re: setting different solrconfig.xml for a core

2011-02-28 Thread Jonathan Rochkind
tried. May have had a syntax error in my master-solrconfig.xml file, even though the Solr log files didn't report any, maybe when there's a syntax error Solr just silently gives up on the config file and presents an empty index, I dunno. On 2/28/2011 3:46 PM, Jonathan Rochkind wrote: Yeah

Re: setting different solrconfig.xml for a core

2011-02-28 Thread Jonathan Rochkind
enable.searcher or something. I'm not entirely sure in what places the enable attribute is recognized and in what places it isn't, but it LOOKS like it's recognized on the listener tag. I think. On 2/28/2011 3:52 PM, Jonathan Rochkind wrote: Aha, wait, I think I've made it work, as simple

Re: setting different solrconfig.xml for a core

2011-02-28 Thread Jonathan Rochkind
that the StandardRequestHandler will honor it. James Dyer E-Commerce Systems Ingram Content Group (615) 213-4311 -Original Message- From: Jonathan Rochkind [mailto:rochk...@jhu.edu] Sent: Monday, February 28, 2011 3:09 PM To: solr-user@lucene.apache.org Subject: Re: setting different

suggestion: do not require masterUrl for slave config

2011-02-28 Thread Jonathan Rochkind
Suggestion, curious what other people think of it, if I should bother filing a JIRA and/or trying to come up with a patch. Currently, when you configure a replication <lst name="slave">, you HAVE to give it a masterUrl. SEVERE: org.apache.solr.common.SolrException: 'masterUrl' is required for a

multi-core solr, specifying the data directory

2011-02-28 Thread Jonathan Rochkind
Unless I'm doing something wrong, in my experience in multi-core Solr in 1.4.1, you NEED to explicitly provide an absolute path to the 'data' dir. I set up multi-core like this: <cores adminPath="/admin/cores"> <core name="some_core" instanceDir="some_core"> </core> </cores> Now, setting instanceDir like

RE: Disabling caching for fq param?

2011-02-28 Thread Jonathan Rochkind
As far as I know there is not, it might be beneficial, but also worth considering: thousands of users isn't _that_ many, and if that same clause is always the same per user, then if the same user does a query a second time, it wouldn't hurt to have their user-specific fq in the cache. A single

RE: query results filter

2011-02-24 Thread Jonathan Rochkind
Hmm, depending on what you are actually needing to do, can you do it with a simple fq param to filter out what you want filtered out, instead of needing to write custom Java as you are suggesting? It would be a lot easier to just use an fq. How would you describe the documents you want to

RE: Best way for a query-expander?

2011-02-19 Thread Jonathan Rochkind
I don't think there's any way to do this in Solr, although you could write your own query parser in Java if you wanted to. You can set defaults, invariants and appends values on your request handler, but I don't think that's flexible enough to do what you want.

Re: GET or POST for large queries?

2011-02-17 Thread Jonathan Rochkind
Yes, I think it's 1024 by default. I think you can raise it in your config. But your performance may suffer. Best would be to try and find a better way to do what you want without using thousands of clauses. This might require some custom Java plugins to Solr though. On 2/17/2011 3:52 PM,

optimize and mergeFactor

2011-02-16 Thread Jonathan Rochkind
In my own Solr 1.4, I am pretty sure that running an index optimize does give me significantly better performance. Perhaps because I use some largeish (not huge, maybe as large as 200k) stored fields. So I'm interested in always keeping my index optimized. Am I right that if I set mergeFactor

Re: optimize and mergeFactor

2011-02-16 Thread Jonathan Rochkind
Thanks for the answers, more questions below. On 2/16/2011 3:37 PM, Markus Jelsma wrote: 200.000 stored fields? I assume that number includes your number of documents? Sounds crazy =) Nope, I wasn't clear. I have fewer than a dozen stored fields, but the value of a stored field can sometimes

Re: Solr multi cores or not

2011-02-16 Thread Jonathan Rochkind
Solr multi-core essentially just lets you run multiple separate distinct Solr indexes in the same running Solr instance. It does NOT let you run queries across multiple cores at once. The cores are just like completely separate Solr indexes, they are just conveniently running in the same

minimum Solr slave replication config

2011-02-16 Thread Jonathan Rochkind
Solr 1.4.1. So, from the documentation at http://wiki.apache.org/solr/SolrReplication I was wondering if I could get away without having any actual configuration in my slave at all. The replication handler is turned on, but if I'm going to manually trigger replication pulls while supplying

Re: Solr multi cores or not

2011-02-16 Thread Jonathan Rochkind
www.sirsidynix.com -Original Message- From: Jonathan Rochkind [mailto:rochk...@jhu.edu] Sent: Wednesday, February 16, 2011 4:09 PM To: solr-user@lucene.apache.org Cc: Thumuluri, Sai Subject: Re: Solr multi cores or not Solr multi-core essentially just lets you run multiple separate distinct Solr

Re: Multicore boosting to only 1 core

2011-02-15 Thread Jonathan Rochkind
No. In fact, there's no way to search over multi-cores at once in Solr at all, even before you get to your boosting question. Your different cores are entirely different Solr indexes, Solr has no built-in way to combine searches across multiple Solr instances. [Well, sort of it can, with

Re: schema.xml configuration for file names?

2011-02-15 Thread Jonathan Rochkind
You can't just send arbitrary XML to Solr for update, no. You need to send a Solr Update Request in XML. You can write software that transforms that arbitrary XML to a Solr update request, for simple cases it could even just be XSLT. There are also a variety of other mediator pieces that

RE: Concurrent updates/commits

2011-02-09 Thread Jonathan Rochkind
Solr does handle concurrency fine. But there is NOT transaction isolation like you'll get from an rdbms. All 'pending' changes are (conceptually, anyway) held in a single queue, and any commit will commit ALL of them. There aren't going to be any data corruption issues or anything from

RE: relational db mapping for advanced search

2011-02-08 Thread Jonathan Rochkind
I have no great answer for you, this is to me a generally unanswered question, hard to do Solr with this sort of thing, I think you seem to understand it properly. There ARE some interesting new features in trunk (not 1.4) that may be relevant, although to my perspective none of them provide

RE: prices

2011-02-04 Thread Jonathan Rochkind
Your prices are just dollars and cents? For actual queries, you might consider an int type rather than a float type. Multiply by a hundred to put it in the index, then multiply your values in queries by a hundred before putting them in the query. Same for range faceting, just divide by 100
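
A small sketch of the integer-cents conversion described above. The helper names and the price_cents field name are hypothetical, and Decimal is used here only to sidestep float rounding when multiplying by a hundred.

```python
from decimal import Decimal, ROUND_HALF_UP

def dollars_to_cents(amount):
    """Convert a dollar amount (string, int or float) to integer cents for indexing."""
    # Going through str() avoids binary float artifacts such as 19.99 * 100 != 1999.
    cents = (Decimal(str(amount)) * 100).quantize(Decimal("1"), rounding=ROUND_HALF_UP)
    return int(cents)

def cents_to_dollars(cents):
    """Convert integer cents coming back from the index to a dollar amount for display."""
    return Decimal(cents) / 100

def price_range_query(field, low, high):
    """Build a range clause in cents from dollar bounds supplied by the application."""
    return f"{field}:[{dollars_to_cents(low)} TO {dollars_to_cents(high)}]"

print(price_range_query("price_cents", "5.00", "19.99"))
print(cents_to_dollars(1999))
```

The same two helpers would cover range facets: convert the bounds to cents on the way in, and divide the returned values by a hundred before showing them.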

Re: chaning schema

2011-02-03 Thread Jonathan Rochkind
It could be related to Tomcat. I've had inconsistent experiences there too, I _thought_ I could delete just the contents of the data/ directory, but at some point I realized that wasn't working, confusing me as to whether I was remembering correctly that deleting just the contents ever worked.

Re: OAI on SOLR already done?

2011-02-02 Thread Jonathan Rochkind
The trick is that you can't just have a generic black box OAI-PMH provider on top of any Solr index. How would it know where to get the metadata elements it needs, such as title, or last-updated date, etc. Any given solr index might not even have this in stored fields -- and a given app might

Re: OAI on SOLR already done?

2011-02-02 Thread Jonathan Rochkind
On 2/2/2011 5:19 PM, Dennis Gearon wrote: Does something like this work to extract dates, phone numbers, addresses across international formats and languages? Or, just in the plain ol' USA? What are you talking about? There is nothing discussed in this thread that does any 'extracting' of

RE: DismaxParser Query

2011-01-27 Thread Jonathan Rochkind
Yes, I think nested queries are the only way to do that, and yes, nested queries like Daniel's example work (I've done it myself). I haven't really tried to get into understanding/demonstrating _exactly_ how the relevance ends up working on the overall master query in such a situation, but it

Re: How to edit / compile the SOLR source code

2011-01-26 Thread Jonathan Rochkind
[Btw, this is great, thank you so much to Solr devs for providing simple ant-based compilation, and not making me install specific development tools and/or figure out how to use maven to compile, like certain other java projects. Just make sure ant is installed and 'ant dist', I can do that!

Re: in-index representaton of tokens

2011-01-25 Thread Jonathan Rochkind
Why does it matter? You can't really get at them unless you store them. I don't know what table per column means, there's nothing in Solr architecture called a table or a column. Although by column you probably mean more or less Solr field. There is nothing like a table in Solr. Solr is

Re: EdgeNgram Auto suggest - doubles ignore

2011-01-25 Thread Jonathan Rochkind
I haven't figured out any way to achieve that AT ALL without making a separate Solr index just to serve autosuggest queries. At least when you want to auto-suggest on a multi-value field. Someone posted a crazy tricky way to do it with a single-valued field a while ago. If you can/are willing

Re: EdgeNgram Auto suggest - doubles ignore

2011-01-25 Thread Jonathan Rochkind
Ah, sorry, I got confused about your requirements, if you just want to match at the beginning of the field, it may be more possible. Using edgegrams or wildcard. If you have a single-valued field. Do you have a single-valued or a multi-valued field? That is, does each document have just one

Re: Specifying optional terms with standard (lucene) request handler?

2011-01-25 Thread Jonathan Rochkind
With the 'lucene' query parser? include q.op=OR and then put a + (mandatory) in front of every term in the 'q' that is NOT optional, the rest will be optional. I think that will do what you want. Jonathan On 1/25/2011 5:07 PM, Daniel Pötzinger wrote: Hi I am searching for a way to specify
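
A minimal sketch of building such a request from a list of required terms and a list of optional ones. The function name and the sample terms are made up for illustration, and the terms are assumed to be plain words that need no query-syntax escaping.

```python
from urllib.parse import urlencode

def build_query(required, optional):
    """Prefix required terms with '+', leave optional terms bare, and set q.op=OR."""
    terms = [f"+{term}" for term in required] + list(optional)
    return urlencode({"q": " ".join(terms), "q.op": "OR"})

qs = build_query(["solr", "lucene"], ["faceting"])
print(qs)
```

The '+' characters must be percent-encoded in the URL (urlencode does this), since a literal '+' in a query string is decoded as a space.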

RE: in-index representaton of tokens

2011-01-25 Thread Jonathan Rochkind
die. - Original Message From: Jonathan Rochkind rochk...@jhu.edu To: solr-user@lucene.apache.org solr-user@lucene.apache.org Sent: Tue, January 25, 2011 9:29:36 AM Subject: Re: in-index representaton of tokens Why does it matter? You can't really get at them unless you store them. I

Re: Taxonomy in SOLR

2011-01-24 Thread Jonathan Rochkind
There aren't any great general purpose out of the box ways to handle hierarchical data in Solr. Solr isn't an rdbms. There may be some particular advice on how to set up a particular Solr index to answer particular questions with regard to hierarchical data. I saw a great point made

RE: filter update by IP

2011-01-23 Thread Jonathan Rochkind
My favorite other external firewall'ish technology is just an apache front-end reverse proxying to the Java servlet (such as Solr), with access controls in apache. I haven't actually done it with Solr myself though, my Solr is behind a firewall accessed by trusted apps only. Be careful

RE: api key filtering

2011-01-22 Thread Jonathan Rochkind
If you COULD solve your problem by indexing 'public', or other tokens from a limited vocabulary of document roles, in a field -- then I'd definitely suggest you look into doing that, rather than doing odd things with Solr instead. If the only barrier is not currently having sufficient logic at

Re: Which QueryParser to use

2011-01-20 Thread Jonathan Rochkind
On 1/20/2011 1:42 AM, kun xiong wrote: That example string means our query is BooleanQuery containing BooleanQuerys. I am wondering how to write a complicated BooleanQuery for dismax, like (A or B or C) and (D or E) Or I have to use Lucene query parser. You can't do it with dismax. You might

Re: Showing facet values in alphabetical order

2011-01-20 Thread Jonathan Rochkind
Are you showing the facets with facet parameters in your request? Then you can ask for the facets to be returned sorted by byte-order with facet.sort=index. Got nothing to do with your schema, let alone your DIH import configuration that you showed us. Just a matter of how you ask Solr for

Re: Adding weightage to the facets count

2011-01-20 Thread Jonathan Rochkind
Maybe?: Just keep the 'weightages' in an external store of some kind (rdbms, nosql like mongodb, just a straight text config file that your app loads into a hash internally, whatever), rather than Solr, and have your app look them up for each facet value to be displayed, after your app fetches

Re: Indexing all permutations of words from the input

2011-01-20 Thread Jonathan Rochkind
Why do you want to do this, what is it meant to accomplish? There might be a better way to accomplish what it is you are trying to do; I can't think of anything (which doesn't mean it doesn't exist) that what you're actually trying to do would be required in order to do. What sorts of

Re: Indexing all permutations of words from the input

2011-01-20 Thread Jonathan Rochkind
' type field that won't tokenize. On 1/20/2011 4:40 PM, Martin Jansen wrote: On 20.01.11 22:19, Jonathan Rochkind wrote: On 1/20/2011 4:03 PM, Martin Jansen wrote: I'm looking for an analyzer configuration for Solr 1.4 that accomplishes the following: Given the input abc xyz foo I would like

Re: Opensearch Format Support

2011-01-20 Thread Jonathan Rochkind
No, not exactly. In general, people don't expose their Solr API direct to the world -- they front Solr with some software that is exposed to the world. (If you do expose your Solr API directly to the world, you will need to think carefully about security, and make sure you aren't letting

Re: Return all contents from collection

2011-01-19 Thread Jonathan Rochkind
I know that this is often a performance problem -- but Erick, I am interested in the 'better solution' you hint at! There are a variety of cases where you want to 'dump' all documents from a collection. One example might be in order to build a Google SiteMap for your app that's fronting your

Re: Local param tag voodoo ?

2011-01-19 Thread Jonathan Rochkind
What query are you actually trying to do? There's probably a way to do it, possibly using nested queries -- but not using illegal syntax like some of your examples! If you explain what you want to do, someone may be able to tell you how. From the hints in your last message, I suspect nested

Re: unix permission styles for access control

2011-01-19 Thread Jonathan Rochkind
No. There is no built in way to address 'bits' in Solr that I am aware of. Instead you can think about how to transform your data at indexing into individual tokens (rather than bits) in one or more field, such that they are capable of answering your query. Solr works in tokens as the basic

Re: unix permission styles for access control

2011-01-19 Thread Jonathan Rochkind
Yep, that's what I'm suggesting as one possible approach to consider, whether it will work or not depends on your specifics. Character length in a token doesn't really matter for solr performance. It might be less confusing to actually put read update delete own (or whatever 'o' stands for)
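
One way to picture the token approach, as a rough sketch only: at indexing time, expand each document's permissions into whole-word tokens for a multi-valued field. The scope names, the underscore-joined token format and the function name are all assumptions, since the original permission scheme is not shown in this thread excerpt.

```python
def permission_tokens(perms):
    """Flatten {scope: [actions]} into tokens such as 'group_read' for a multi-valued field."""
    return sorted(
        f"{scope}_{action}"
        for scope, actions in perms.items()
        for action in actions
    )

tokens = permission_tokens({
    "owner": ["read", "update", "delete"],
    "group": ["read"],
})
print(tokens)
```

A query could then filter on a single token, for example an fq on the (hypothetical) permissions field for group_read, instead of trying to test individual bits.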

Re: facet or filter based on user's history

2011-01-19 Thread Jonathan Rochkind
The problem is going to be 'near real time' indexing issues. Solr 1.4 at least does not do a very good job of handling very frequent commits. If you want to add to the user's history in the Solr index every time they click the button, and they click the button a lot, and this naturally leads

Re: performance during index switch

2011-01-19 Thread Jonathan Rochkind
During commit? A commit (and especially an optimize) can be expensive in terms of both CPU and RAM as your index grows larger, leaving less CPU for querying, and possibly less RAM which can cause Java GC slowdowns in some cases. A common suggestion is to use Solr replication to separate out

Re: performance during index switch

2011-01-19 Thread Jonathan Rochkind
On 1/19/2011 2:56 PM, Tri Nguyen wrote: Yes, during a commit. I'm planning to do as you suggested, having a master do the indexing and replicating the index to a slave which leads to my next questions. During the slave replicates the index files from the master, how does it impact

Re: Search on two core and two schema

2011-01-18 Thread Jonathan Rochkind
Solr can't do that. Two cores are two separate cores, you have to do two separate queries, and get two separate result sets. Solr is not an rdbms. On 1/18/2011 12:24 PM, Damien Fontaine wrote: I want execute this query : Schema 1 : field name=id type=string indexed=true stored=true

Re: StopFilterFactory and qf containing some fields that use it and some that do not

2011-01-13 Thread Jonathan Rochkind
It's a known 'issue' in dismax (really an inherent part of dismax's design with no clear way to do anything about it), that qf over fields with different stop word definitions will produce odd results for a query with a stopword. Here's my understanding of what's going on:

Re: verifying that an index contains ONLY utf-8

2011-01-13 Thread Jonathan Rochkind
Scanning for only 'valid' utf-8 is definitely not simple. You can eliminate some obviously not valid utf-8 things by byte ranges, but you can't confirm valid utf-8 alone by byte ranges. There are some bytes that can only come after or before other certain bytes to be valid utf-8. There is no
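
Rather than scanning byte ranges by hand, one option is to let a strict decoder apply the sequence rules. This sketch uses Python's built-in UTF-8 codec and is not taken from the thread; the function name is an assumption. Note that passing this check only shows the bytes are well-formed UTF-8, not that they were originally meant as UTF-8.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8 under strict error handling."""
    try:
        data.decode("utf-8")  # strict mode rejects bad lead/continuation sequences
    except UnicodeDecodeError:
        return False
    return True

print(is_valid_utf8("héllo".encode("utf-8")))
print(is_valid_utf8(b"\xc3\x28"))  # lead byte followed by a non-continuation byte
print(is_valid_utf8(b"\xe9"))      # a lone Latin-1 'e acute' byte
```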

RE: verifying that an index contains ONLY utf-8

2011-01-13 Thread Jonathan Rochkind
So you're allowed to put the entire original document in a stored field in Solr, but you aren't allowed to stick it in, say, a redis or couchdb too? Ah, bureaucracy. But no reason what you are doing won't work, as you of course already know from doing it. If you actually know the charset of

RE: start value in queries zero or one based?

2011-01-13 Thread Jonathan Rochkind
You could have tried it and seen for yourself on any Solr server in your possession in less time than it took to have this thread. And if you don't have a Solr server, then why do you care? But the answer is 0. http://wiki.apache.org/solr/CommonQueryParameters#start The default value is 0

Re: pruning search result with search score gradient

2011-01-12 Thread Jonathan Rochkind
The times I've _considered_ trying to do this (but generally decided it wasn't worth it) were when I didn't want those documents below the threshold to show up in the facet values. In my application the facet counts are sometimes very pertinent information, that are sometimes not quite as

Re: Improving Solr performance

2011-01-10 Thread Jonathan Rochkind
I see a lot of people using shards to hold different types of documents, and it almost always seems to be a bad solution. Shards are intended for distributing a large index over multiple hosts -- that's it. Not for some kind of federated search over multiple schemas, not for access control.

Re: Tuning StatsComponent

2011-01-10 Thread Jonathan Rochkind
I found StatsComponent to be slow only when I didn't have enough RAM allocated to the JVM. I'm not sure exactly what was causing it, but it was pathologically slow -- and then adding more RAM to the JVM made it incredibly fast. On 1/10/2011 4:58 AM, Gora Mohanty wrote: On Mon, Jan 10, 2011

Re: Improving Solr performance

2011-01-10 Thread Jonathan Rochkind
On 1/10/2011 5:03 PM, Dennis Gearon wrote: What I seem to see suggested here is to use different cores for the things you suggested: different types of documents Access Control Lists I wonder how sharding would work in that scenario? Sharding has nothing to do with that scenario at all.

Re: Improving Solr performance

2011-01-10 Thread Jonathan Rochkind
And I don't think I've seen anyone suggest a separate core just for Access Control Lists. I'm not sure what that would get you. Perhaps a separate store that isn't Solr at all, in some cases. On 1/10/2011 5:36 PM, Jonathan Rochkind wrote: Access Control Lists

RE: (FQ) Filter Query Caching Differences with OR and AND?

2011-01-06 Thread Jonathan Rochkind
Jonathan Rochkind wrote: Each 'fq' clause is its own cache key. 1. fq=foo:bar OR foo:baz = one entry in filter cache 2. fq=foo:bar&fq=foo:baz = two entries in filter cache, will not use cached entry from #1 3. fq=foo:bar = One entry, will use cached entry from #2 4. fq=foo:bar
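
The first three numbered cases can be mimicked with a toy cache, sketched below. This is a deliberate simplification: it keys on the raw fq string, whereas Solr keys its filter cache on the parsed query, and the function name is made up for illustration.

```python
def run_requests(requests):
    """Replay requests (each a list of fq strings) against a toy per-fq cache."""
    cache = set()
    report = []
    for fqs in requests:
        hits = [fq for fq in fqs if fq in cache]
        misses = [fq for fq in fqs if fq not in cache]
        cache.update(misses)  # every fq that missed becomes its own cache entry
        report.append((hits, misses))
    return report

report = run_requests([
    ["foo:bar OR foo:baz"],    # case 1: one new entry
    ["foo:bar", "foo:baz"],    # case 2: two new entries, nothing reused from case 1
    ["foo:bar"],               # case 3: reuses the entry created in case 2
])
for hits, misses in report:
    print("hits:", hits, "misses:", misses)
```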

Re: searching against unstemmed text

2011-01-04 Thread Jonathan Rochkind
Do you have to do anything special to search against a field in Solr? No, that's what Solr does. Please be more specific about what you are trying to do, what you expect to happen, and what happens instead. If your Solr field is analyzed to stem, then indeed you can only match stemmed

Re: Sub query using SOLR?

2011-01-04 Thread Jonathan Rochkind
Yeah, I don't believe there's any good way to do it in Solr 1.4. You can make two queries, first make your 'sub' query, get back the list of values, then construct the second query where you do {!field v=field_name} val1 OR val2 OR val3 OR valN Kind of a pain, and there is a maximum
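
A rough sketch of the second step of that two-query approach: turning the values returned by the 'sub' query into OR clauses, split into batches so that no single query exceeds a clause limit. The function name, the plain field:(...) syntax, and the default limit of 1024 are assumptions for illustration, and the values are assumed to need no escaping.

```python
def build_or_queries(field, values, max_clauses=1024):
    """Split values into batches and build one OR query per batch."""
    queries = []
    for start in range(0, len(values), max_clauses):
        chunk = values[start:start + max_clauses]
        queries.append(f"{field}:(" + " OR ".join(chunk) + ")")
    return queries

queries = build_or_queries("field_name", ["val1", "val2", "val3"], max_clauses=2)
print(queries)
```

If the values do get split across several queries, the application would have to merge the result sets itself, which is part of why this approach is a pain.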

Re: Advice on Exact Matching?

2011-01-04 Thread Jonathan Rochkind
There is a hacky kind of thing that Bill Dueber figured out for using multiple fields and dismax to BOOST exact matches, but include all matches in the result set. You have to duplicate your data in a second non-tokenized field. Then you use dismax pf to super boost matches on the

Re: DIH and UTF-8

2010-12-29 Thread Jonathan Rochkind
I haven't tried it yet, but I _think_ in Rails if you are using the 'mysql2' adapter (now standard with Rails3) instead of 'mysql', it might handle utf-8 better with less areas for gotchas. I think if the underlying mysql database is set to use utf-8, then, at least with mysql2 adapter, you

RE: Solr 1.4.1 stats component count not matching facet count for multi valued field

2010-12-23 Thread Jonathan Rochkind
Interesting, the wiki page on StatsComponent says multi-valued fields may be slow, and may use lots of memory. http://wiki.apache.org/solr/StatsComponent Apparently it should also warn that multi-valued fields may not work at all? I'm going to add that with a link to the JIRA ticket.

RE: Solr 1.4.1 stats component count not matching facet count for multi valued field

2010-12-23 Thread Jonathan Rochkind
Aha! Thanks, sorry, I'll clarify on my wiki edit. From: Chris Hostetter [hossman_luc...@fucit.org] Sent: Friday, December 24, 2010 12:11 AM To: solr-user@lucene.apache.org Subject: RE: Solr 1.4.1 stats component count not matching facet count for multi

RE: solr equiv of : SELECT count(distinct(field)) FROM index WHERE length(field) > 0 AND other_criteria

2010-12-22 Thread Jonathan Rochkind
This won't actually give you the number of distinct facet values, but will give you the number of documents matching your conditions. It's more equivalent to SQL without the distinct. There is no way in Solr 1.4 to get the number of distinct facet values. I am not sure about the new features

Re: Duplicate values in multiValued field

2010-12-22 Thread Jonathan Rochkind
In my experience, that should work fine. Faceting in 1.4 works fine on multi-valued fields, and a duplicate value in the multi-valued field shouldn't be a problem. On 12/22/2010 2:31 AM, Andy wrote: If I put duplicate values into a multiValued field, would that cause any issues? For example

Re: White space in facet values

2010-12-22 Thread Jonathan Rochkind
Another technique, which works great for facet fq's and avoids the need to worry about escaping, is using the field query parser instead: fq={!field f=Product}Electric Guitar Using the field query parser avoids the need for ANY escaping of your value at all, which is convenient in the
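A small sketch of sending that fq from client code. Only URL encoding is applied; no Solr query-syntax escaping is done on the value. The helper name `facet_fq` and the `q=*:*` main query are assumptions for the example.

```python
from urllib.parse import urlencode, parse_qs

def facet_fq(field, value):
    """Build an fq using the field query parser; the value is passed through as-is."""
    return f"{{!field f={field}}}{value}"

params = [("q", "*:*"), ("fq", facet_fq("Product", "Electric Guitar"))]

# URL-encode for the HTTP request; the space and braces are handled here.
query_string = urlencode(params)

# Round trip to confirm the server side would see the original fq string.
decoded = parse_qs(query_string)
```

The point of the design is that the client never needs to know which characters are special to the query parser, only how to URL-encode a parameter.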

Re: White space in facet values

2010-12-22 Thread Jonathan Rochkind
Huh, does !term in 4.0 mean the same thing as !field in 1.4? What you describe as !term in 4.0 dev is what I understand as !field in 1.4 doing. On 12/22/2010 10:01 AM, Yonik Seeley wrote: On Wed, Dec 22, 2010 at 9:53 AM, Dyer, James <james.d...@ingrambook.com> wrote: The phrase solution works

Re: Solr query to get results based on the word length (letter count)

2010-12-22 Thread Jonathan Rochkind
No good way. At indexing time, I'd just store the number of chars in the title in a field of its own. You can possibly do that solely in schema.xml with clever use of analyzers and copyField. Solr isn't an RDBMS. Best to de-normalize at index time so what you're going to want to query is
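A minimal sketch of the client-side variant of this, assuming documents are built as dicts before being posted to Solr. The `title_length` field name is hypothetical and would need a matching integer field in schema.xml.

```python
def with_title_length(doc):
    """Return a copy of doc with a hypothetical title_length field added."""
    enriched = dict(doc)
    # len() counts Unicode code points, not bytes.
    enriched["title_length"] = len(doc["title"])
    return enriched

doc = with_title_length({"id": "1", "title": "Solr in Action"})
```

With the length stored at index time, a range query such as `title_length:[10 TO 20]` could then select titles by letter count.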

Re: solr equiv of : SELECT count(distinct(field)) FROM index WHERE length(field) > 0 AND other_criteria

2010-12-22 Thread Jonathan Rochkind
(and waiting for Solr to construct the very large response), then you're out of luck. But if you're willing to get back all the values in the response too, that'll work, true. On 12/22/2010 11:23 AM, Erik Hatcher wrote: On Dec 22, 2010, at 09:21 , Jonathan Rochkind wrote: This won't actually

Re: full text search in multiple fields

2010-12-22 Thread Jonathan Rochkind
Did you reindex after you changed your analyzers? On 12/22/2010 12:57 PM, PeterKerk wrote: Hi guys, There's one more thing to get this code to work as I need I just found out... I'm now using: q=title_search:hort*&defType=lucene as iorixxx suggested. it works good BUT, this query doesn't find

Re: Case Insensitive sorting while preserving case during faceted search

2010-12-21 Thread Jonathan Rochkind
Hoss, I think the use case being asked about is specifically doing a facet.sort though, for cases where you actually do want to sort facet values with facet.sort, not sort records -- while still presenting the facet values with original case, but sorting them case insensitively. The solutions

RE: OutOfMemory GC: GC overhead limit exceeded - Why isn't WeakHashMap getting collected?

2010-12-14 Thread Jonathan Rochkind
noticed that those files have changed. The commit is intended to remedy that - it causes a new index reader to be created, based upon the new on-disk files, which will include updates from both syncs. Upayavira On Mon, 13 Dec 2010 23:11 -0500, Jonathan Rochkind <rochk...@jhu.edu> wrote: Sorry, I

Re: OutOfMemory GC: GC overhead limit exceeded - Why isn't WeakHashMap getting collected?

2010-12-14 Thread Jonathan Rochkind
Yeah, I understand basically how caches work. What I don't understand is what happens in replication if the new segment files are successfully copied, but the actual commit fails due to maxAutoWarmingSearches. The new files are on disk... but the commit could not succeed and there is NOT a
