Re: WordDelimiter filter, expanding to multiple words, unexpected results

2014-12-30 Thread Jonathan Rochkind
to handle? Case changes? Letter/non-letter transitions? All of the above? Best, Erick On Mon, Dec 29, 2014 at 3:07 PM, Jonathan Rochkind rochk...@jhu.edu wrote: On 12/29/14 5:24 PM, Jack Krupansky wrote: WDF is powerful, but it is not magic. In general, the indexed data is expected to be clean

Re: WordDelimiter filter, expanding to multiple words, unexpected results

2014-12-30 Thread Jonathan Rochkind
a limitation. -- Jack Krupansky On Tue, Dec 30, 2014 at 11:12 AM, Jonathan Rochkind rochk...@jhu.edu wrote: Thanks Erick! Yes, if I set splitOnCaseChange=0, then of course it'll work -- but then query for mixedCase will no longer also match mixed Case. I think I want WDF to... kind of do all

Re: WordDelimiter filter, expanding to multiple words, unexpected results

2014-12-30 Thread Jonathan Rochkind
On 12/30/14 11:45 AM, Alexandre Rafalovitch wrote: On 30 December 2014 at 11:12, Jonathan Rochkind rochk...@jhu.edu wrote: I'm a bit confused about what splitOnCaseChange combined with catenateWords is meant to do at all. It _is_ generating both the split and single-word tokens at query time

Re: WordDelimiter filter, expanding to multiple words, unexpected results

2014-12-30 Thread Jonathan Rochkind
of the features that Solr is missing is support for the Google-like feature of splitting concatenated words (regardless of case.) That's worthy of a Jira. -- Jack Krupansky On Tue, Dec 30, 2014 at 11:44 AM, Jonathan Rochkind rochk...@jhu.edu wrote: I guess I don't understand what the four use

Re: WordDelimiter filter, expanding to multiple words, unexpected results

2014-12-30 Thread Jonathan Rochkind
On 12/30/14 12:35 PM, Walter Underwood wrote: You want preserveOriginal="1". You should only do this processing at index time. If I only do this processing at index time, then mixedCase at query time will no longer match mixed Case in the index/source material. I think I'm having trouble

Re: WordDelimiter filter, expanding to multiple words, unexpected results

2014-12-29 Thread Jonathan Rochkind
(the whole directory, including data) between runs after you've changed your schema (at least any of your analysis that pertains to indexing). Mixing old and new schema definitions can add to the confusion! Good luck! Erick On Wed, Sep 3, 2014 at 8:48 AM, Jonathan Rochkind rochk...@jhu.edu wrote

Re: WordDelimiter filter, expanding to multiple words, unexpected results

2014-12-29 Thread Jonathan Rochkind
On 12/29/14 5:24 PM, Jack Krupansky wrote: WDF is powerful, but it is not magic. In general, the indexed data is expected to be clean while the query might be sloppy. You need to separate the index and query analyzers and they need to respect that distinction. I do not understand what separate

Re: WordDelimiter filter, expanding to multiple words, unexpected results

2014-09-03 Thread Jonathan Rochkind
that the defaults for WDFF are _not_ identical. catenateWords and catenateNumbers are 1 in the index portion and 0 in the query section. Still, this shouldn't be a problem all other things being equal. Best, Erick On Tue, Sep 2, 2014 at 12:43 PM, Jonathan Rochkind rochk...@jhu.edu wrote: On 9/2/14 1:51

WordDelimiter filter, expanding to multiple words, unexpected results

2014-09-02 Thread Jonathan Rochkind
Hello, I'm running into a case where a query is not returning the results I expect, and I'm hoping someone can offer some explanation that might help me fine tune things or understand what's up. I am running Solr 4.3. My filter chain includes a WordDelimiterFilter and, later a filter that

Re: WordDelimiter filter, expanding to multiple words, unexpected results

2014-09-02 Thread Jonathan Rochkind
On Tue, Sep 2, 2014 at 12:41 PM, Jonathan Rochkind rochk...@jhu.edu wrote: Hello, I'm running into a case where a query is not returning

Re: WordDelimiter filter, expanding to multiple words, unexpected results

2014-09-02 Thread Jonathan Rochkind
On Tue, Sep 2, 2014 at 1:07 PM, Jonathan Rochkind rochk...@jhu.edu wrote: Thanks for the response. I understand the problem a little bit better after

Re: WordDelimiter filter, expanding to multiple words, unexpected results

2014-09-02 Thread Jonathan Rochkind
On 9/2/14 1:51 PM, Erick Erickson wrote: bq: In my actual index, query MacBook is matching ONLY mac book, and not macbook I suspect your query parameters for WordDelimiterFilterFactory doesn't have catenate words set. What do you see when you enter these in both the index and query portions of
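This thread turns on how splitOnCaseChange, catenateWords and preserveOriginal interact. Below is a deliberately simplified Python model of that behavior for a single alphabetic token, assuming splitOnCaseChange=1 and a LowerCaseFilter later in the chain. It is not Lucene's implementation (the real WordDelimiterFilter also handles digits, punctuation and token positions), and it treats "shares a token" as a stand-in for matching, which ignores how the query parser combines the split tokens.

```python
import re


def wdf_tokens(word: str, catenate_words: bool, preserve_original: bool) -> list[str]:
    """Simplified model of WordDelimiterFilter on one alphabetic token.

    Assumes splitOnCaseChange=1 and a LowerCaseFilter later in the chain.
    Digits, punctuation and token positions are ignored.
    """
    # Split wherever a lowercase letter is followed by an uppercase one: MacBook -> Mac, Book
    parts = re.split(r"(?<=[a-z])(?=[A-Z])", word)
    tokens = list(parts)
    # catenateWords=1 also emits the joined subwords as one extra token
    if catenate_words and len(parts) > 1:
        tokens.append("".join(parts))
    # preserveOriginal=1 keeps the untouched input token as well
    if preserve_original and word not in tokens:
        tokens.append(word)
    return [t.lower() for t in tokens]


indexed = set(wdf_tokens("macbook", catenate_words=True, preserve_original=False))
query_no_cat = wdf_tokens("MacBook", catenate_words=False, preserve_original=False)
query_cat = wdf_tokens("MacBook", catenate_words=True, preserve_original=False)
print(query_no_cat, bool(indexed & set(query_no_cat)))  # ['mac', 'book'] False
print(query_cat, bool(indexed & set(query_cat)))        # ['mac', 'book', 'macbook'] True
```

In this model, with catenateWords off in the query analyzer, MacBook produces only mac and book, so it cannot reach a document indexed as the single token macbook, which is consistent with Erick's suggestion to check the query-side catenate settings.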

Re: solr as nosql - pulling all docs vs deep paging limitations

2013-12-18 Thread Jonathan Rochkind
On 12/17/13 1:16 PM, Chris Hostetter wrote: As i mentioned in the blog above, as long as you have a uniqueKey field that supports range queries, bulk exporting of all documents is fairly trivial by sorting on your uniqueKey field and using an fq that also filters on your uniqueKey field modify
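The uniqueKey-sort approach described above can be sketched in Python. This sketch assumes a string uniqueKey named id (hypothetical) and a Solr version whose query parser accepts mixed exclusive/inclusive range brackets. To keep it self-contained, the HTTP call to /select is replaced by an in-memory stand-in (make_fake_search); in practice search() would send the same params to Solr. Quoting the key inside the range assumes keys contain no double quotes.

```python
import re
from typing import Callable, Iterator

Search = Callable[[dict], list[dict]]


def export_all(search: Search, key: str = "id", batch: int = 100) -> Iterator[dict]:
    """Walk every document by sorting on the uniqueKey and filtering past the last key seen."""
    last = None
    while True:
        params = {"q": "*:*", "sort": f"{key} asc", "rows": batch}
        if last is not None:
            # Exclusive lower bound: only keys strictly greater than the last one returned
            params["fq"] = f'{key}:{{"{last}" TO *]'
        docs = search(params)
        if not docs:
            return
        yield from docs
        last = docs[-1][key]


def make_fake_search(docs: list[dict], key: str = "id") -> Search:
    """In-memory stand-in for /select that understands only the params built above."""
    ordered = sorted(docs, key=lambda d: d[key])
    pattern = re.compile(rf'{re.escape(key)}:\{{"(.*)" TO \*\]')

    def search(params: dict) -> list[dict]:
        hits = ordered
        if "fq" in params:
            lower = pattern.fullmatch(params["fq"]).group(1)
            hits = [d for d in hits if d[key] > lower]
        return hits[: params["rows"]]

    return search


docs = [{"id": f"doc{i:03d}"} for i in range(250)]
exported = list(export_all(make_fake_search(docs), batch=100))
print(len(exported))  # 250
```

Each request starts from the top of a filtered result set rather than from a large start offset, which is the point of the technique.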

Re: json update moves doc to end

2013-12-03 Thread Jonathan Rochkind
What order, the order if you supply no explicit sort at all? Solr does not make any guarantees about what order documents will come back in if you do not ask for a sort. In general in Solr/lucene, the only way to update a document is to re-add it as a new document, so that's probably what's

Re: Need idea to standardize keywords - ring tone vs ringtone

2013-10-28 Thread Jonathan Rochkind
Do you know about the Solr synonym feature? That seems more applicable to what you're describing than stopwords. I'd stay away from stopwords entirely here, and try to do what you want with synonyms. Multi-word synonyms can be tricky, I'm not entirely sure the right way to do it for this use

Re: difference between apache tomcat vs Jetty

2013-10-24 Thread Jonathan Rochkind
This is good to know, and I find it welcome advice; I would recommend making sure this advice is clearly highlighted in the relevant Solr docs, such as any getting started docs. I'm not sure everyone realizes this, and some go down tomcat route without realizing the Solr committers recommend

solr 4.3, autocommit, maxdocs

2013-07-15 Thread Jonathan Rochkind
I have a solr 4.3 instance I am in the process of standing up. It started out with an empty index. I have in its solrconfig.xml: <updateHandler class="solr.DirectUpdateHandler2"> <autoCommit> <maxDocs>10</maxDocs> <openSearcher>false</openSearcher> </autoCommit> </updateHandler> I

Re: solr 4.3, autocommit, maxdocs

2013-07-15 Thread Jonathan Rochkind
for visibility. You can either change the value to true, or alternatively call a deterministic commit call at the end of your load (a solr/update?commit=true will default to openSearcher=true). Hope that's of use! Jason On Jul 15, 2013, at 9:52 AM, Jonathan Rochkind rochk...@jhu.edu wrote

SolrJ and initializing logger in solr 4.3?

2013-07-11 Thread Jonathan Rochkind
I am using SolrJ in a Java (actually jruby) project, with Solr 4.3. When I instantiate an HttpSolrServer, I get the dreaded: log4j:WARN No appenders could be found for logger (org.apache.solr.client.solrj.impl.HttpClientUtil). log4j:WARN Please initialize the log4j system properly. log4j:WARN

SolrJ 4.3 to Solr 1.4

2013-07-11 Thread Jonathan Rochkind
So, trying to use a SolrJ 4.3 to talk to an old Solr 1.4. Specifically to add documents. The wiki at http://wiki.apache.org/solr/Solrj suggests, I think, that this should work, so long as you: server.setParser(new XMLResponseParser()); However, when I do this, I still get a

Re: SolrJ 4.3 to Solr 1.4

2013-07-11 Thread Jonathan Rochkind
reading this wants to share any other potential gotchas on solrj 4.3 talking to solr 1.4, feel free! On 7/11/13 4:24 PM, Jonathan Rochkind wrote: So, trying to use a SolrJ 4.3 to talk to an old Solr 1.4. Specifically to add documents. The wiki at http://wiki.apache.org/solr/Solrj suggests, I

Solr, ICUTokenizer with Latin-break-only-on-whitespace

2013-06-20 Thread Jonathan Rochkind
(to solr-user, CC'ing author I'm responding to) I found the solr-user listserv contribution at: https://mail-archives.apache.org/mod_mbox/lucene-solr-user/201305.mbox/%3c51965e70.6070...@elyograg.org%3E which explains a way you can supply custom rulefiles to ICUTokenizer, in this case to tell

Re: Solr, ICUTokenizer with Latin-break-only-on-whitespace

2013-06-20 Thread Jonathan Rochkind
appear to be working now. Thanks! And thanks for this feature. On 6/20/2013 3:40 PM, Shawn Heisey wrote: On 6/20/2013 1:26 PM, Jonathan Rochkind wrote: I want, for instance, C++ Language to be tokenized into C++, Language. But the ICUTokenizer, even with the rulefiles=Latn:Latin-break-only

Solr 4.3, Tomcat, Error filterStart

2013-05-30 Thread Jonathan Rochkind
I am trying to get Solr installed in Tomcat, and having trouble. I am trying to use the instructions at http://wiki.apache.org/solr/SolrTomcat as a guide. Trying to start with the example Solr from the Solr distro. Tried with both a binary distro with existing solr.war, and

Re: Solr 4.3, Tomcat, Error filterStart

2013-05-30 Thread Jonathan Rochkind
Thanks! I guess I should have asked on-list BEFORE wasting 4 hours fighting with it myself, but I was trying to be a good user and do my homework! Oh well. Off to the logging instructions, hope I can figure them out -- if you could update the tomcat instructions with the simplest possible

Re: Solr 4.3, Tomcat, Error filterStart

2013-05-30 Thread Jonathan Rochkind
I'm going to add a note to http://wiki.apache.org/solr/SolrLogging , with the Tomcat sample Error filterStart error, as an example of something you might see if you have not set up logging. Then at least in the future, googling solr tomcat error filterStart might lead someone to the clue that

Re: Solr 4.3, Tomcat, Error filterStart

2013-05-30 Thread Jonathan Rochkind
Okay, sadly, I still can't get this to work. Following the instructions at: https://wiki.apache.org/solr/SolrLogging#Using_the_example_logging_setup_in_containers_other_than_Jetty I copied solr/example/lib/ext/*.jar into my tomcat's ./lib, and copied solr/example/resources/log4j.properties

Re: Solr 4.3, Tomcat, Error filterStart

2013-05-30 Thread Jonathan Rochkind
logging setup, which ended up confirmed. Jonathan On 5/30/2013 3:19 PM, Jonathan Rochkind wrote: Okay, sadly, I still can't get this to work. Following the instructions at: https://wiki.apache.org/solr/SolrLogging#Using_the_example_logging_setup_in_containers_other_than_Jetty I copied solr

replication without automated polling, just manual trigger?

2013-05-15 Thread Jonathan Rochkind
I want to set up Solr replication between a master and slave, where no automatic polling every X minutes happens, instead the slave only replicates on command. [1] So the basic question is: What's the best way to do that? But I'll provide what I've been doing etc., for anyone interested.

writing a custom Filter plugin?

2013-05-13 Thread Jonathan Rochkind
Does anyone know of any tutorials, basic examples, and/or documentation on writing your own Filter plugin for Solr? For Solr 4.x/4.3? I would like a Solr 4.3 version of the normalization filters found here for Solr 1.4: https://github.com/billdueber/lib.umich.edu-solr-stuff But those are

Re: Solr - Remove specific punctuation marks

2012-09-24 Thread Jonathan Rochkind
When I do things like this and want to avoid empty tokens even though previous analysis might result in some--I just throw one of these at the end of my analysis chain: <!-- get rid of empty string tokens. max is required, although we don't really care. --> <filter

Re: How to exactly match fields which are multi-valued?

2012-03-08 Thread Jonathan Rochkind
Well, if you really want EXACT exact, just use a KeywordTokenizer (ie, not tokenize at all). But then matches will really have to be EXACT, including punctuation, whitespace, diacritics, etc. But a query will only match if it 'exactly' matches one value in your multi-valued field. You could

Re: need to support bi-directional synonyms

2012-02-23 Thread Jonathan Rochkind
Honestly, I'd just map 'em both to the same thing in the index: sprayer, washer => sprayer or sprayer, washer => sprayer_washer, at both index and query time. Now if the source document includes either 'sprayer' or 'washer', it'll get indexed as 'sprayer_washer'. And if the user enters either
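Because the same mapping runs in both the index and query analyzers, either word on either side collapses to one token. A toy Python model of that, assuming a whitespace tokenizer, lowercasing and AND-style matching; none of this is Solr's actual analysis code, and the SYNONYMS table is a hypothetical mirror of the synonyms.txt line `sprayer, washer => sprayer_washer`:

```python
# Hypothetical synonym table mirroring the synonyms.txt line:
#   sprayer, washer => sprayer_washer
SYNONYMS = {"sprayer": "sprayer_washer", "washer": "sprayer_washer"}


def analyze(text: str) -> list[str]:
    """Toy analyzer: whitespace split, lowercase, then explicit synonym mapping."""
    return [SYNONYMS.get(tok, tok) for tok in text.lower().split()]


def matches(query: str, document: str) -> bool:
    """Every query token must appear among the document's tokens (AND semantics)."""
    return set(analyze(query)) <= set(analyze(document))


print(analyze("Windshield Washer"))                          # ['windshield', 'sprayer_washer']
print(matches("washer", "replacement windshield sprayer"))   # True
```

Applying the mapping only at query time would not be enough here: a document containing "washer" would then be indexed as washer, while every query would have been rewritten to sprayer_washer.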

Re: result present in Solr 1.4, but missing in Solr 3.5, dismax only

2012-02-22 Thread Jonathan Rochkind
So I don't really know what I'm talking about, and I'm not really sure if it's related or not, but your particular query: The Beatles as musicians : Revolver through the Anthology With the lone word that's a ':', reminds me of a dismax stopwords-type problem I ran into. Now, I ran into it on

Re: replication, disk space

2012-01-19 Thread Jonathan Rochkind
Thanks for the response. I am using Linux (RedHat). It sounds like it may possibly be related to that bug. But the thing is, the timestamped index directory is looking to me like it's the _current_ one, with the non-timestamped one being an old out of date one. So that does not seem to be

Re: replication, disk space

2012-01-19 Thread Jonathan Rochkind
Hmm, I don't have a replication.properties file, I don't think. Oh wait, yes I do there it is! I guess the replication process makes this file? Okay I don't see an index directory in the replication.properties file at all though. Below is my complete replication.properties. So I'm

Re: replication, disk space

2012-01-19 Thread Jonathan Rochkind
On 1/18/2012 1:53 PM, Tomás Fernández Löbbe wrote: As far as I know, the replication is supposed to delete the old directory index. However, the initial question is why is this new index directory being created. Are you adding/updating documents in the slave? what about optimizing it? Are you

Re: replication, disk space

2012-01-19 Thread Jonathan Rochkind
be lack of CPU or RAM on the server to do what's being asked of it. But if that's the best I can do, 20 minutes of unavailability, I'll take it). On 1/19/2012 12:37 PM, Jonathan Rochkind wrote: Hmm, I don't have a replication.properties file, I don't think. Oh wait, yes I do there it is! I

replication, disk space

2012-01-18 Thread Jonathan Rochkind
So Solr 1.4. I have a solr master/slave, where it actually doesn't poll for replication, it only replicates irregularly when I issue a replicate command to it. After the last replication, the slave, in solr_home, has a data/index directory as well as a data/index.20120113121302 directory.

replication failure, logs or notice?

2012-01-12 Thread Jonathan Rochkind
I think maybe my Solr 1.4 replications have been failing for quite some time, without me realizing it, possibly due to lack of disk space to replicate some large segments. Where would I look to see if a replication failed? Just the standard solr log? What would I look for? There's no

Re: changing omitNorms on an already built index

2011-11-07 Thread Jonathan Rochkind
On 10/27/2011 9:14 PM, Erick Erickson wrote: Well, this could be explained if your fields are very short. Norms are encoded into (part of?) a byte, so your ranking may be unaffected. Try adding debugQuery=on and looking at the explanation. If you've really omitted norms, I think you should see

changing omitNorms on an already built index

2011-10-27 Thread Jonathan Rochkind
So Solr 1.4. I decided I wanted to change a field to have omitNorms=true that didn't previously. So I changed the schema to have omitNorms=true. And I reindexed all documents. But it seems to have had absolutely no effect. All relevancy rankings seem to be the same. Now, I could have a

Re: Questions about LocalParams syntax

2011-09-20 Thread Jonathan Rochkind
I don't have the complete answer. But I _think_ if you do one 'bq' param with multiple space-separated directives, it will work. And escaping is a pain. But can be made somewhat less of a pain if you realize that single quotes can sometimes be used instead of double-quotes. What I do:

Re: XML injection interface in select servlet?

2011-09-20 Thread Jonathan Rochkind
On Sep 20, 2011, at 04:33 , Jan Peter Stotz wrote: I am now asking myself why would someone implement such a bloodcurdling vulnerability into a web service? Until now I haven't found an exploit using the parameters in a way an attacker would get an advantage. But the way those parameters are

Re: JSON indexing failing...

2011-09-19 Thread Jonathan Rochkind
So I'm not an expert in the Solr JSON update message, never used it before myself. It's documented here: http://wiki.apache.org/solr/UpdateJSON But Solr is not a structured data store like mongodb or something; you can send it an update command in JSON as a convenience, but don't let that

Re: query for point in time

2011-09-15 Thread Jonathan Rochkind
You didn't tell us what your schema looks like, what fields with what types are involved. But similar to how you'd do it in your database, you need to find 'documents' that have a start date before your date in question, and an end date after your date in question, to find the ones whose
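The start-before/end-after logic described above, as a small Python helper that builds the filter query string. The field names start_date and end_date are hypothetical, and both are assumed to be Solr date fields:

```python
from datetime import datetime, timezone


def point_in_time_fq(when: datetime, start_field: str = "start_date",
                     end_field: str = "end_date") -> str:
    """Filter for records whose [start, end] interval contains `when`.

    Field names are hypothetical; both are assumed to be Solr date fields.
    """
    # Solr date syntax is ISO 8601 in UTC with a trailing Z
    ts = when.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    # started on or before `when` AND ended on or after `when`
    return f"{start_field}:[* TO {ts}] AND {end_field}:[{ts} TO *]"


print(point_in_time_fq(datetime(2011, 9, 15, 12, 0, tzinfo=timezone.utc)))
# start_date:[* TO 2011-09-15T12:00:00Z] AND end_date:[2011-09-15T12:00:00Z TO *]
```

A record that is still in effect and has no end date stored would fail the second clause, so that case needs separate handling.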

Re: query for point in time

2011-09-15 Thread Jonathan Rochkind
I think there's something wrong with your database then, but okay. You still haven't said what your Solr schema looks like -- that list of values doesn't say what the solr field names or types are. I think this is maybe because you don't actually have a Solr database and have no idea how Solr

RE: need some guidance about how to configure a specific solr solution.

2011-08-12 Thread Jonathan Rochkind
I don't know anything about LifeRay (never heard of it), but it sounds like you've actually figured out what you need to know about LifeRay, all you've got left is: how to replicate the writer solr server content into the readers. This should tell you how:

RE: paging size in SOLR

2011-08-10 Thread Jonathan Rochkind
I would imagine the performance penalties with deep paging will ALSO be there if you just ask for 1 rows all at once though, instead of in, say, 100 row paged batches. Yes? No? -Original Message- From: simon [mailto:mtnes...@gmail.com] Sent: Wednesday, August 10, 2011 10:44 AM To:

Re: Remote backup of Solr index over low-bandwith connection

2011-08-09 Thread Jonathan Rochkind
You can use rsync to automatically only transfer the files that have changed. I don't think you'll have to home grow your own 'only transfer the diffs' solution, I think rsync will do that for you. But yes, running an optimization, after many updates/deletes, will generally mean nearly

RE: Multiple Cores on different machines?

2011-08-09 Thread Jonathan Rochkind
tables. Others are suggesting 2 separate indexes on 2 different machines and using SOLRs capacity to combine cores and generate a third index that denormalizes the tables for us. What capability is that, exactly? I think you may be imagining it. Solr does have some capability to distribute

Re: Weighted facet strings

2011-08-08 Thread Jonathan Rochkind
One kind of hacky way to accomplish some of those tasks involves creating a lot more Solr fields. (This kind of 'de-normalization' is often the answer to how to make Solr do something). So facet fields are ordinarily not tokenized or normalized at all. But that doesn't work very well for

Re: Dispatching a query to multiple different cores

2011-08-08 Thread Jonathan Rochkind
However, if you unify your schemas to do this, I'd consider whether you really want separate cores/shards in the first place. If you want to search over all of them together, what are your reasons to put them in separate solr indexes in the first place? Ordinarily, if you want to search over

Re: bug in termfreq? was Re: is it possible to do a sort without query?

2011-08-08 Thread Jonathan Rochkind
Dismax queries can. But sort=termfreq(all_lists_text,'indie+music') is not using dismax. Apparently termfreq function can not? I am not familiar with the termfreq function. To understand why you'd need to reindex, you might want to read up on how lucene actually works, to get a basic

Re: Can Solr with the StatsComponent analyze 20+ million files?

2011-08-08 Thread Jonathan Rochkind
On 8/8/2011 5:10 PM, Markus Jelsma wrote: Will the StatsComponent in Solr do what we need with minimal configuration? Can the StatsComponent only be used on a subset of the data? For example, only look at data from certain months? If i remember correctly, it cannot. Well, if you index things

Re: Indexing tweet and searching @keyword OR #keyword

2011-08-04 Thread Jonathan Rochkind
It's the WordDelimiterFilterFactory in your filter chain that's removing the punctuation entirely from your index, I think. Read up on what the WordDelimiter filter does, and what its settings are; decide how you want things to be tokenized in your index to get the behavior you want; either get

Re: Is there anyway to sort differently for facet values?

2011-08-04 Thread Jonathan Rochkind
No, it can not. It just sorts alphabetically, actually by raw byte-order. No other facet sorting functionality is available, and it would be tricky to implement in a performant way because of the way lucene works. But it would certainly be useful to me too if someone could figure out a way

Re: What's the best way (practice) to do index distribution at this moment? Hadoop? rsyncd?

2011-08-04 Thread Jonathan Rochkind
I'm not sure what you mean by index distribution, that could possibly mean several things. But Solr has had a replication feature built into it from 1.4, that can probably handle the same use cases as rsync, but better. So that may be what you want. There are certainly other experiments

Re: lucene/solr, raw indexing/searching

2011-08-04 Thread Jonathan Rochkind
It depends. Okay, the source contains 4 harv. l. rev. 45 . Do you want a user entered harv. to ALSO match harv (without the period) in source, and vice versa? Or do you require it NOT match? Or do you not care? The default filter analysis chain will index 4 harv. l. rev. 45 essentially as

Re: Dismax mm per field

2011-08-03 Thread Jonathan Rochkind
There is not, and the way dismax works makes it not really that feasible in theory, sadly. One thing you could do instead is combine multiple separate dismax queries using the nested query syntax. This will affect your relevancy ranking possibly in odd ways, but anything that accomplishes 'mm
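A rough Python sketch of the nested-query workaround: it builds request params so each field gets its own dismax clause with its own mm, OR'd together by the outer lucene parser. It assumes `_query_` nested queries and `$param` dereferencing inside LocalParams; the param names (qf1, mm1, uq, ...) are made up for illustration.

```python
def nested_dismax(user_query: str, mm_by_field: dict[str, str]) -> dict[str, str]:
    """Build request params for one dismax clause per field, each with its own mm.

    The clauses are OR'd by the outer lucene parser. Param names qf1, mm1, uq...
    are arbitrary choices for illustration.
    """
    params = {"defType": "lucene", "uq": user_query}
    clauses = []
    for i, (field, mm) in enumerate(mm_by_field.items(), start=1):
        params[f"qf{i}"] = field
        params[f"mm{i}"] = mm
        # $name dereferences another request param inside LocalParams
        clauses.append(f'_query_:"{{!dismax qf=$qf{i} mm=$mm{i} v=$uq}}"')
    params["q"] = " OR ".join(clauses)
    return params


p = nested_dismax("sail boat", {"title": "100%", "body": "50%"})
print(p["q"])
```

Joining the clauses with OR lets a document qualify by meeting the mm for any one field; AND would require every field's mm to be met. Passing the user's text once through v=$uq avoids escaping it inside each quoted clause.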

Re: Strategies for sorting by array, when you can't sort by array?

2011-08-03 Thread Jonathan Rochkind
There's no great way to do this. I understand your problem as: It's a multi-valued field, but you want to sort on whichever of those values matched the query, not on the values that didn't. (Not entirely clear what to do if the documents are in the result set because of a match in an entirely

Re: Strategies for sorting by array, when you can't sort by array?

2011-08-03 Thread Jonathan Rochkind
Not so much that it's a corner case in the sense of being unusual necessarily (I'm not sure), it's just something that fundamentally doesn't fit well into lucene's architecture. I'm not sure that filing a JIRA will be much use, it's really unclear how one would get lucene to do this, it would

Re: Setting up Namespaces to Avoid Running Multiple Solr Instances

2011-08-03 Thread Jonathan Rochkind
I think that Solr multi-core (nothing to do with CPU cores, just what it's called in Solr) is what you're looking for. http://wiki.apache.org/solr/CoreAdmin On 8/3/2011 2:25 PM, Mike Papper wrote: Hi, we run several independent websites on the same machines. Each site uses a similar codebase

Re: lucene/solr, raw indexing/searching

2011-08-02 Thread Jonathan Rochkind
In your solr schema.xml, are the fields you are using defined as text fields with analyzers? It sounds like you want no analysis at all, which probably means you don't want text fields either, you just want string fields. That will make it impossible to search for individual tokens though,

Re: Jetty error message regarding EnvEntry in WebAppContext

2011-08-02 Thread Jonathan Rochkind
On 8/2/2011 11:42 AM, Marian Steinbach wrote: Can anyone tell me how a working configuration for Jetty 6.1.22 would have to look like? You know that Solr distro comes with a jetty with a Solr in it, right, as an example application? Even if you don't want to use it for some reason, that

Re: performance crossover between single index and sharding

2011-08-02 Thread Jonathan Rochkind
What's the reasoning behind having three shards on one machine, instead of just combining those into one shard? Just curious. I had been thinking the point of shards was to get them on different machines, and there'd be no reason to have multiple shards on one machine. On 8/2/2011 1:59 PM,

Re: German language specific problem (automatic Spelling correction, automatic Synonyms ?)

2011-08-01 Thread Jonathan Rochkind
Any changes you make related to stemming or normalization are likely going to require a re-index, just how it goes, just how solr/lucene works. What you can do just by normalizing at query time is limited, almost any good solution to this type of problem is going to require normalization at

Re: German language specific problem (automatic Spelling correction, automatic Synonyms ?)

2011-08-01 Thread Jonathan Rochkind
On 8/1/2011 12:42 PM, Paul Libbrecht wrote: Otherwise i need to backup the whole index and try to reindex overnight when cms users are sleeping. With some work you can do this using an extra solr that just pulls everything, then swaps the indexes (that needs a bit of downtime), then

Re: German language specific problem (automatic Spelling correction, automatic Synonyms ?)

2011-08-01 Thread Jonathan Rochkind
On 8/1/2011 1:40 PM, Mike Sokolov wrote: If you want to avoid re-indexing, you could consider building a synonym file that is generated using your rule set, and then using that to expand your queries. You'd need to get a list of all terms in your index and then process them to generate

Re: colocated term stats

2011-07-28 Thread Jonathan Rochkind
Not sure if this will do what you want, but one way might be using facets. Take the term you are interested in, and apply it as an fq. Now the result set will include only documents that include that term. So also request facets for that result set, the top 10 facets are the top 10 terms
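The facet trick above, sketched in Python: one function builds the request params (the field name text is hypothetical), and another models locally what the facet counts would contain. Facet counts are per document, not per occurrence, and since every matching document contains the filter term, that term would itself top the facet list, so the sketch asks for one extra value and drops it.

```python
from collections import Counter


def cooccurrence_params(term: str, field: str = "text", top: int = 10) -> dict:
    """Request params: restrict to docs containing `term`, facet on the same field."""
    return {"q": "*:*", "fq": f"{field}:{term}", "rows": 0,
            "facet": "true", "facet.field": field,
            # one extra slot, since `term` itself will be the top facet value
            "facet.limit": top + 1, "facet.mincount": 1}


def top_cooccurring(docs: list[list[str]], term: str, top: int = 10) -> list[tuple[str, int]]:
    """Local model of what that facet response contains, minus `term` itself."""
    counts = Counter()
    for tokens in docs:
        if term in tokens:
            # facet counts are documents per term, not total occurrences
            counts.update(set(tokens))
    # every matching doc contains `term`, so drop it from the ranking
    counts.pop(term, None)
    return counts.most_common(top)


docs = [["solr", "lucene", "index"], ["solr", "index"], ["lucene", "java"]]
print(top_cooccurring(docs, "solr"))  # [('index', 2), ('lucene', 1)]
```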

Re: Exact match not the first result returned

2011-07-28 Thread Jonathan Rochkind
Keep in mind that if you use a field type that includes spaces (eg StrField, or KeywordTokenizer), then if you're using dismax or lucene query parsers, the only way to find matches in this field on queries that include spaces will be to do explicit phrase searches with double quotes. These

Re: Possible to use quotes in dismax qf?

2011-07-28 Thread Jonathan Rochkind
It's not clear to me why you would try to do that, I'm not sure it makes a lot of sense. You want to find all documents that have sail boat as a phrase AND have sail somewhere in them AND have boat somewhere in them? That's exactly the same as just all documents that have sail boat as a

Re: Index

2011-07-28 Thread Jonathan Rochkind
I have no idea what you mean. A file on your disk? What does INDEX in solr mean? Be more specific and clear, perhaps provide an example, and maybe someone can help you. On 7/28/2011 5:45 PM, GAURAV PAREEK wrote: Hi All, How we can check the particular;ar file is not INDEX in solr ?

Re: An idea for an intersection type of filter query

2011-07-27 Thread Jonathan Rochkind
I don't know the answer to feasibility either, but I'll just point out that boolean OR corresponds to set union, not set intersection. So I think you probably mean a 'union' type of filter query; 'intersection' does not seem to describe what you are describing; ordinary 'fq' values are
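The terminology point in set terms, as a tiny Python illustration with made-up document ids: separate fq parameters are combined by intersection (a document must pass every one), while OR inside a single fq is a union.

```python
# Hypothetical doc-id sets matched by two filter queries
matches_a = {1, 2, 3}
matches_b = {2, 3, 4}

# Two separate fq params: a doc must pass both, i.e. set intersection (AND)
separate_fqs = matches_a & matches_b
# One fq containing "A OR B": a doc may pass either, i.e. set union
single_or_fq = matches_a | matches_b

print(sorted(separate_fqs))  # [2, 3]
print(sorted(single_or_fq))  # [1, 2, 3, 4]
```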

Re: Speeding up search by combining common sub-filters

2011-07-27 Thread Jonathan Rochkind
I'm pretty sure Solr/lucene have no such optimization already, but it's not clear to me that it would result in much of a performance benefit, just because of the way lucene works, it's not obvious to me that the second version of your query will be noticeably faster than the first version.

slave data files way bigger than master

2011-07-26 Thread Jonathan Rochkind
So I've got Solr 1.4. I've got replication going on. Once a day, before replication, I optimize on master. Then I replicate. I'd expect optimization before replicate would basically replace all files on slave, this is expected. But that means I'd also expect that the index files on slave

Re: commit time and lock

2011-07-25 Thread Jonathan Rochkind
Thanks, this is helpful. I do indeed periodically update or delete just about every doc in the index, so it makes sense that optimization might be necessary even in post 1.4, but I'm still on 1.4 -- add this to another thing to look into rather than assume after I upgrade. Indeed I was

RE: Re: previous and next rows of current record

2011-07-22 Thread Jonathan Rochkind
: Jonathan Rochkind To : solr-user@lucene.apache.org; Subject : Re: previous and next rows of current record I think maybe I know what you mean. You have a result set generated by a query. You have an item detail page in your web app -- on that item detail page, you want to give next/previous buttons

RE: commit time and lock

2011-07-22 Thread Jonathan Rochkind
How old is 'older'? I'm pretty sure I'm still getting much faster performance on an optimized index in Solr 1.4. This could be due to the nature of my index and queries (which include some medium sized stored fields, and extensive faceting -- faceting on up to a dozen fields in every

Re: Java replication takes slaves down

2011-07-21 Thread Jonathan Rochkind
How often do you replicate? Could it be a too-frequent-commit problem? (a replication is a commit to the slave). On 7/21/2011 4:39 AM, Alexander Valet | edelight wrote: Hi everybody, we are using Solr 1.4.1 as our search backend and are replicating (Java based) from one master to four

Re: previous and next rows of current record

2011-07-21 Thread Jonathan Rochkind
I think maybe I know what you mean. You have a result set generated by a query. You have an item detail page in your web app -- on that item detail page, you want to give next/previous buttons for current search results. If that's it, read on (although news isn't good), if that's not it,
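One common way to get next/previous links is to re-run the same search with start/rows chosen around the current hit. A Python sketch, assuming the app carries the hit's 0-based position from the results page (for example in the detail link, a hypothetical design) and re-sends the same q, fq and sort; if the index changes between requests, the neighbors can shift.

```python
def neighbor_request(position: int) -> dict:
    """start/rows to fetch the hit at `position` (0-based) plus its neighbors.

    Assumes the app re-sends the same q, fq and sort as the original search.
    """
    start = max(position - 1, 0)
    return {"start": start, "rows": position - start + 2}


def prev_next(window: list, position: int) -> tuple:
    """Pick the previous and next hits out of the window fetched above."""
    i = position - max(position - 1, 0)  # where the current hit sits in the window
    prev_hit = window[i - 1] if i > 0 else None
    next_hit = window[i + 1] if i + 1 < len(window) else None
    return prev_hit, next_hit


results = list("abcdefg")  # stand-in for the full ranked result list
req = neighbor_request(5)
window = results[req["start"]: req["start"] + req["rows"]]
print(prev_next(window, 5))  # ('e', 'g')
```

At most three rows are fetched per detail page, though a deep position still pays Solr's usual cost for a large start offset.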

Re: Determine which field term was found?

2011-07-21 Thread Jonathan Rochkind
I've had this problem too, although never come up with a good solution. I've wondered, is there any clever way to use the highlighter to accomplish tasks like this, or is that more trouble than any help it'll get you? Jonathan On 7/21/2011 5:27 PM, Yonik Seeley wrote: On Thu, Jul 21, 2011

Re: defType argument weirdness

2011-07-20 Thread Jonathan Rochkind
Huh, I'm still not completely following. I'm sure it makes sense if you understand the underlying implementation, but I don't understand how 'type' and 'defType' don't mean exactly the same thing, just need to be expressed differently in different locations. Sorry for beating a dead horse, but

RE: Updating fields in an existing document

2011-07-20 Thread Jonathan Rochkind
Nope, you're not missing anything, there's no way to alter a document in an index but reindexing the whole document. Solr's architecture would make it difficult (although never say impossible) to do otherwise. But you're right it would be convenient for people other than you. Reindexing a

RE: defType argument weirdness

2011-07-19 Thread Jonathan Rochkind
Is it generally recognized that this terminology is confusing, or is it just me? I do understand what they do (at least well enough to use them), but I find it confusing that it's called defType as a main param, but type in a LocalParam, when to me they both seem to do the same thing --

Re: NRT and commit behavior

2011-07-18 Thread Jonathan Rochkind
In practice, in my experience at least, a very 'expensive' commit can still slow down searches significantly, I think just due to CPU (or I/O?) starvation. Not sure anything can be done about that. That's my experience in Solr 1.4.1, but since searches have always been async with commits, it

RE: Uninstall Solr

2011-07-01 Thread Jonathan Rochkind
There's no general documentation on that, because it depends on exactly what container you are using (Tomcat? Jetty? Something else?) and how you are using it. It is confusing, but blame Java for that; it's nothing unique to Solr. So since there's really nothing unique to Solr here, you could try

Re: Index Version and Epoch Time?

2011-06-28 Thread Jonathan Rochkind
On 6/28/2011 1:38 PM, Pranav Prakash wrote: - Will the commit by incremental indexer script also commit the previously uncommitted changes made by full indexer script before it broke? Yes, as long as the Solr instance hasn't crashed. Anything added but not yet committed sticks around

Re: moving to multicore without changing existing index

2011-06-28 Thread Jonathan Rochkind
Nope. But you can move your existing index into a core in a multi-core setup. But a multi-core setup is a multi-core setup, there's no way to have an index accessible at a non-core URL in a multi-core setup. On 6/28/2011 2:53 PM, lee carroll wrote: hi I'm looking at setting up multi core

Re: ampersand, dismax, combining two fields, one of which is keywordTokenizer

2011-06-22 Thread Jonathan Rochkind
Yeah, I see your points. It's complicated. I'm not sure either. But the thing is: in order to use a feature like that, you'd have to really think hard about the query analysis of your fields, and which ones will produce which tokens in which situations. You need to think really hard about

Re: MultiValued facet behavior question

2011-06-22 Thread Jonathan Rochkind
Okay, so since you put cardiologist in the 'q', you only want facet values that contain 'cardiologist' (or 'Cardiologist') to show up in the facet list. In general, there's no good way to do that. But if you want to do some client-side processing before you submit the query to Solr, and on
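The technique below is different from, and simpler than, the one the truncated message is heading toward (which mentions processing before the query is sent). It filters the facet list after the response arrives. In Solr's default JSON output, each entry under `facet_fields` is a flat list that alternates value and count. `facet_values_matching` is a hypothetical helper, and the counts below are made up.

```python
def facet_values_matching(flat_counts, term):
    """Keep (value, count) pairs whose value contains `term`, ignoring case.
    `flat_counts` is Solr's flat [value, count, value, count, ...] facet list."""
    pairs = zip(flat_counts[0::2], flat_counts[1::2])
    needle = term.lower()
    return [(value, count) for value, count in pairs if needle in value.lower()]


specialties = ["Cardiologist", 3, "Pediatrician", 2, "Interventional cardiologist", 1]
print(facet_values_matching(specialties, "cardiologist"))
# [('Cardiologist', 3), ('Interventional cardiologist', 1)]
```

This only hides values. The remaining counts are still computed over the whole result set.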

RE: ampersand, dismax, combining two fields, one of which is keywordTokenizer

2011-06-21 Thread Jonathan Rochkind
Thanks, that's helpful. It still seems like current behavior does the wrong thing in _many_ cases (I know a lot of people get tripped up by it, sometimes on this list) -- but I understand your cases where it does the right thing, and where what I'm suggesting would be the wrong thing.

Re: getting started

2011-06-16 Thread Jonathan Rochkind
On 6/16/2011 4:41 PM, Mari Masuda wrote: One reservation I have is that eventually we would like to be able to type in Iraq and find records across all of the collections at once instead of having to search each collection separately. Although I don't know anything about it at this stage, I

Re: ampersand, dismax, combining two fields, one of which is keywordTokenizer

2011-06-15 Thread Jonathan Rochkind
mixing fields with different analysis in a 'qf'. On 6/14/2011 5:25 PM, Jonathan Rochkind wrote: Okay, let's try the debug trace again without a pf to be less confusing. One field in qf, that's ordinary text tokenized, and does get hits: q=churchill%20%3A%20roosevelt&qt=search&qf=title1_t&mm=100

Re: Multiple indexes

2011-06-15 Thread Jonathan Rochkind
Next, however, I predict you're going to ask how you do a 'join' or otherwise query across both these cores at once, though. You can't do that in Solr. On 6/15/2011 1:00 PM, Frank Wesemann wrote: You'll configure multiple cores: http://wiki.apache.org/solr/CoreAdmin Hi. How to have multiple

Re: ampersand, dismax, combining two fields, one of which is keywordTokenizer

2011-06-15 Thread Jonathan Rochkind
/2011 5:25 PM, Jonathan Rochkind wrote: Okay, let's try the debug trace again without a pf to be less confusing. One field in qf, that's ordinary text tokenized, and does get hits: q=churchill%20%3A%20roosevelt&qt=search&qf=title1_t&mm=100%&debugQuery=true&pf= <str name="rawquerystring">churchill
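For reference, the debug request quoted in these messages can be assembled with Python's standard library. Doing so also shows that `%3A` is simply the URL-encoded colon, so the raw user query is `churchill : roosevelt`. The parameter values are taken from the message. Note that a literal `%` in `mm=100%` has to be sent as `%25`.

```python
from urllib.parse import quote, unquote, urlencode

params = {
    "q": "churchill : roosevelt",
    "qt": "search",
    "qf": "title1_t",
    "mm": "100%",        # a literal percent sign, encoded as %25 below
    "debugQuery": "true",
    "pf": "",            # empty pf, as in the message: no phrase-field boost
}
query_string = urlencode(params, quote_via=quote)
print(query_string)
# q=churchill%20%3A%20roosevelt&qt=search&qf=title1_t&mm=100%25&debugQuery=true&pf=
print(unquote("churchill%20%3A%20roosevelt"))  # churchill : roosevelt
```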

ampersand, dismax, combining two fields, one of which is keywordTokenizer

2011-06-14 Thread Jonathan Rochkind
I'm aware that using a field tokenized with KeywordTokenizerFactory in a dismax 'qf' is often going to result in 0 hits on that field (when a whitespace-containing query is entered). But I do it anyway, for cases where a non-whitespace-containing query is entered; then it hits. And in
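Below is a toy model of the behavior described above, written in plain Python rather than Solr's actual analysis chain. Both analyzers are simplified stand-ins, and lowercasing the keyword field is an assumption. The model shows dismax splitting the user's input on whitespace first and then analyzing each chunk against each `qf` field. Under that split, a field that indexed its whole value as a single token can only match a query containing no whitespace.

```python
import re


def keyword_analyze(text):
    # Stand-in for KeywordTokenizerFactory (+ an assumed lowercase filter):
    # the entire input becomes one token.
    return [text.lower()]


def text_analyze(text):
    # Stand-in for an ordinary tokenized text field: lowercase word tokens;
    # punctuation such as ":" produces no token at all.
    return re.findall(r"\w+", text.lower())


def matching_clauses(indexed_value, user_query, analyze):
    """Return the whitespace-separated query chunks that match the field."""
    indexed_tokens = set(analyze(indexed_value))
    return [chunk for chunk in user_query.split()
            if any(tok in indexed_tokens for tok in analyze(chunk))]


title = "Churchill : Roosevelt"
print(matching_clauses(title, "churchill : roosevelt", keyword_analyze))  # []
print(matching_clauses(title, "churchill : roosevelt", text_analyze))     # ['churchill', 'roosevelt']
print(matching_clauses("ISBN0123", "isbn0123", keyword_analyze))          # ['isbn0123']
```

Note the `:` chunk. The tokenized field produces no token for it at all, while the keyword analyzer still produces one.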

Re: ampersand, dismax, combining two fields, one of which is keywordTokenizer

2011-06-14 Thread Jonathan Rochkind
| title1_t:roosevelt)~0.01)~3) ()</str> On 6/14/2011 5:19 PM, Jonathan Rochkind wrote: I'm aware that using a field tokenized with KeywordTokenizerFactory in a dismax 'qf' is often going to result in 0 hits on that field (when a whitespace-containing query is entered). But I do it anyway

Re: How do I make sure the resulting documents contain the query terms?

2011-06-07 Thread Jonathan Rochkind
Um, normally that would never happen, because, well, like you say, the inverted index doesn't have docC for term K1, because doc C didn't include term K1. If you search on q=K1, then how/why would docC ever be in your result set? Are you seeing it in your result set? The question then would
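The point can be seen in a minimal inverted index. Each term maps only to the documents that contain it, so a single-term query can't return a document that lacks that term. This is a toy sketch: `docA`/`docB`/`docC` and the terms `K1`/`K2` echo the question, and the helper names are hypothetical.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """docs: {doc_id: list_of_terms}. Returns term -> set of doc ids containing it."""
    index = defaultdict(set)
    for doc_id, terms in docs.items():
        for term in terms:
            index[term].add(doc_id)
    return index


def search(index, term):
    # A single-term query returns exactly the postings for that term.
    return sorted(index.get(term, set()))


docs = {"docA": ["K1", "K2"], "docB": ["K1"], "docC": ["K2"]}
index = build_inverted_index(docs)
print(search(index, "K1"))  # ['docA', 'docB']  (docC never appears)
```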

Re: Default query parser operator

2011-06-07 Thread Jonathan Rochkind
Nope, not possible. I'm not even sure what it would mean semantically. If you had default operator OR ordinarily, but default operator AND just for field2, then what would happen if you entered: field1:foo field2:bar field1:baz field2:bom Where the heck would the ANDs and ORs go? The
