Re: How do I best store and retrieve ISO country codes?

2007-08-24 Thread Yonik Seeley
On 8/24/07, Simon Peter Nicholls [EMAIL PROTECTED] wrote: I've just noticed that for ISO 2 character country codes such as BE and IT, my queries are not working as expected. The field is being stored as country_t, dynamically from acts_as_solr v0.9, as follows (from schema.xml):

Re: sort problem

2007-09-02 Thread Yonik Seeley
On 9/2/07, michael ravits [EMAIL PROTECTED] wrote: this is the field definition: <field name="msgid" type="slong" indexed="true" stored="true" required="true" /> holds message id's, values range from 0 to 127132531 can I disable this cache? No, sorting wouldn't work without it. The cache structure

Re: sort problem

2007-09-03 Thread Yonik Seeley
On 9/3/07, Marcus Stratmann [EMAIL PROTECTED] wrote: If you could live with a cap of 2B on message id, switching to type int would decrease the memory usage to 4 bytes per doc (presumably you don't need range queries?) I haven't found exact definitions of the fieldTypes anywhere. Does

Re: -field:[* TO *] doesn't seem to work

2007-09-03 Thread Yonik Seeley
Can you provide the full query response (with debugging output)? -Yonik On 9/3/07, Jérôme Etévé [EMAIL PROTECTED] wrote: Hi all I've got a problem here with the '-field:[* TO *]' syntax. It doesn't seem to work as expected

Re: Multiple Values -Structured?

2007-09-04 Thread Yonik Seeley
You could index both a compound field and the components separately. This could be simplified by sending the value in once as the compound format: review,1 Jan 2007 revision, 2 Jan 200 And then use a copyField with a regex tokenizer to extract and index the date into a separate field. You

Re: solr.py problems with german Umlaute

2007-09-06 Thread Yonik Seeley
On 9/6/07, Brian Carmalt [EMAIL PROTECTED] wrote: Try it with title.encode('utf-8'). As in: kw = {'id':'12','title':title.encode('utf-8'),'system':'plone','url':'http://www.google.de'} It seems like the client library should be responsible for encoding, not the user. So try changing
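
A minimal sketch of the suggestion above, written for Python 3 (the 2007 thread would have used Python 2 unicode objects). The helper name `encode_fields` and the sample title are assumptions for illustration; they are not part of solr.py.

```python
def encode_fields(fields, encoding="utf-8"):
    """Return a copy of `fields` with every text value encoded to bytes."""
    return {
        key: value.encode(encoding) if isinstance(value, str) else value
        for key, value in fields.items()
    }


# Hypothetical document with German umlauts in the title.
title = "Größe und Übung"
kw = encode_fields(
    {"id": "12", "title": title, "system": "plone", "url": "http://www.google.de"}
)
```

Doing the encoding in one helper keeps it out of the calling code, which is closer to the point made in the reply that the client library, not the user, should be responsible for encoding.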

Re: Replication broken.. no helpful errors?

2007-09-06 Thread Yonik Seeley
On 9/6/07, Matthew Runo [EMAIL PROTECTED] wrote: The thing is that a new searcher is not opened if I look in the stats.jsp page. The index version never changes. The index version is read from the index... hence if the lucene index doesn't change (even if a new snapshot was taken), the version

Re: searching where a value is not null?

2007-09-06 Thread Yonik Seeley
On 9/6/07, David Whalen [EMAIL PROTECTED] wrote: Hi all. I'm trying to construct a query that in pseudo-code would read like this: field != '' I'm finding it difficult to write this as a solr query, though. Stuff like: NOT field:() doesn't seem to do the trick. any ideas? perhaps

Re: caching query result

2007-09-06 Thread Yonik Seeley
On 9/6/07, Jae Joo [EMAIL PROTECTED] wrote: I have 13 millions and have facets by states (50). If there is a mechanism to cache, I may get faster result back. How fast are you getting results back with standard field faceting (facet.field=state)?

Re: FW: Minor mistake on the Wiki

2007-09-07 Thread Yonik Seeley
On 9/7/07, Lance Norskog [EMAIL PROTECTED] wrote: In the page http://wiki.apache.org/solr/UpdateXmlMessages We find: Optional attributes on doc * boost = float - default is 1.0 (See Lucene docs for definition of boost.) * NOTE: make sure norms

Re: adding without overriding dups - DirectUpdateHandler2.java does not implement?

2007-09-07 Thread Yonik Seeley
On 9/7/07, Lance Norskog [EMAIL PROTECTED] wrote: It appears that DirectUpdateHandler2.java does not actually implement the parameters that control whether to override existing documents. It's been proposed that most of these be deprecated anyway and replaced with a simple overwrite=true/false.

Re: adding without overriding dups - DirectUpdateHandler2.java does not implement?

2007-09-07 Thread Yonik Seeley
On 9/7/07, Lance Norskog [EMAIL PROTECTED] wrote: No, I'm just doing standard overwriting. It just took a little digging to be able to do it :) Overwriting is the default... you shouldn't have to specify anything extra when indexing the document. -Yonik

Re: quirks with sorting

2007-09-10 Thread Yonik Seeley
On 9/10/07, David Whalen [EMAIL PROTECTED] wrote: I'm seeing a weird problem with sorting that I can't figure out. I have a query that uses two fields -- a source column and a date column. I search on the source and I sort by the date descending. What I'm seeing is that depending on the

Re: My Solr index keeps growing

2007-09-10 Thread Yonik Seeley
On 9/10/07, Robin Bonin [EMAIL PROTECTED] wrote: I had created a new index over the weekend, and the final size was a few hundred megs. I just checked and now the index folder is up to 1.7 Gig. Is this due to results being cached? can I set a limit to how large the index will grow? is there

Re: Solr and KStem

2007-09-10 Thread Yonik Seeley
if you credit contributions, but if so please include OCLC. Seems only fair since I did this on their dime :) Cheers! harry -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Yonik Seeley Sent: Friday, September 07, 2007 3:59 PM To: solr

Re: Removing lengthNorm from the calculation

2007-09-10 Thread Yonik Seeley
If you aren't using index-time document boosting, or field boosting for that field specifically, then set omitNorms=true for that field in the schema, shut down solr, completely remove the index, and then re-index. The norms for each field consist of the index-time boost multiplied by the length
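
As a rough illustration of the sentence above: the sketch below assumes Lucene's classic default length normalization of 1/sqrt(number of terms in the field). The function name `field_norm` is hypothetical, and the sketch ignores the lossy single-byte encoding Lucene applies when it stores norms.

```python
import math


def field_norm(index_time_boost, num_terms):
    """Approximate norm: index-time boost multiplied by the length normalization.

    Assumes length normalization is 1/sqrt(num_terms), which is an assumption
    about the default similarity, not something stated in the thread.
    """
    return index_time_boost * (1.0 / math.sqrt(num_terms))


short_field = field_norm(1.0, 4)    # 4 terms, no boost
long_field = field_norm(1.0, 100)   # 100 terms, no boost
```

Under that assumption a shorter field gets a larger norm than a longer one, which is the effect that omitting norms removes.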

Re: largish test data set?

2007-09-17 Thread Yonik Seeley
If you want to see what performance will be like on the next release, you could try upgrading Solr's internal version of lucene to trunk (current dev version)... there have been some fantastic improvements in indexing speed. For query speed/throughput, Solr 1.2 or trunk should do fine. -Yonik

Re: EdgeNGramTokenFilter, term position?

2007-09-17 Thread Yonik Seeley
On 9/16/07, Ryan McKinley [EMAIL PROTECTED] wrote: Should the EdgeNGramFilter use the same term position for the ngrams within a single token? It feels like that is the right approach. I don't see value in having them sequential, and I can think of uses for having them overlap. -Yonik

Re: Customize the way relevancy is calculated

2007-09-18 Thread Yonik Seeley
On 9/18/07, Amitha Talasila [EMAIL PROTECTED] wrote: The 65% of the relevance can be computed while indexing the document and posted as a field. But the keyword match is a run time score. Is there any way of getting the relevance score as a combination of this 65% and 35%? A FunctionQuery

Re: pluggable functions

2007-09-18 Thread Yonik Seeley
On 9/18/07, Jon Pierce [EMAIL PROTECTED] wrote: Reflection could be used to look up and invoke the constructor with appropriately-typed arguments. If we assume only primitive types and ValueSources are used, I don't think it would be too hard to craft a drop-in replacement that works with

Re: How can i make a distribute search on Solr?

2007-09-19 Thread Yonik Seeley
On 9/19/07, Norberto Meijome [EMAIL PROTECTED] wrote: On Wed, 19 Sep 2007 01:46:53 -0400 Ryan McKinley [EMAIL PROTECTED] wrote: Stu is referring to Federated Search - where each index has some of the It really should be Distributed Search I think (my mistake... I started out calling it

Re: useColdSearcher = false... not working in 1.2?

2007-09-19 Thread Yonik Seeley
On 9/19/07, Adam Goldband [EMAIL PROTECTED] wrote: Anyone else using this, and finding it not working in Solr 1.2? Since we've got an automated release process, I really need to be able to have the appserver not see itself as done warming up until the firstSearcher is ready to go... but with

Re: Getting only size of getFacetCounts , to simulate count(group by( a field) ) using facets

2007-09-19 Thread Yonik Seeley
On 9/19/07, Laurent Hoss [EMAIL PROTECTED] wrote: We want to (mis)use facet search to get the number of (unique) field values appearing in a document resultset. We have paging of facets, so just like normal search results, it does make sense to list the total number of facets matching. The

Re: How can i make a distribute search on Solr?

2007-09-20 Thread Yonik Seeley
On 9/19/07, Norberto Meijome [EMAIL PROTECTED] wrote: Maybe I got this wrong...but isn't this what mapreduce is meant to deal with? Not really... you could force a *lot* of different problems into map-reduce (that's sort of the point... being able to automatically parallelize a lot of different

Re: Term extraction

2007-09-20 Thread Yonik Seeley
On 9/19/07, Pieter Berkel [EMAIL PROTECTED] wrote: However, I'd like to be able to analyze documents more intelligently to recognize phrase keywords such as open source, Microsoft Office, Bill Gates rather than splitting each word into separate tokens (the field is never used in search queries

Re: Solr and FieldCache

2007-09-20 Thread Yonik Seeley
On 9/20/07, Walter Ferrara [EMAIL PROTECTED] wrote: I'm just wondering, as this cached object could be (theoretically) pretty big, do I need to be aware of some OOM? I know that FieldCache use weakmaps, so I presume the cached array for the older reader(s) will be gc-ed when the reader is no

Re: Solr and FieldCache

2007-09-20 Thread Yonik Seeley
On 9/20/07, Walter Ferrara [EMAIL PROTECTED] wrote: I have an index with several fields, but just one stored: ID (string, unique). I need to access that ID field for each of the tops nodes docs in my results (this is done inside a handler I wrote), code looks like: Hits hits =

Re: Problem getting the FacetCount

2007-09-21 Thread Yonik Seeley
On 9/21/07, Amitha Talasila [EMAIL PROTECTED] wrote: But when we make a facet query like, http://localhost:8983/solr/select?q=ipod&rows=0&facet=true&facet.limit=-1&facet.query=weight:{0m TO 100m}, the facet count is coming as 0. We are indexing it as a string field because if the user searches for

Re: Term extraction

2007-09-21 Thread Yonik Seeley
On 9/21/07, Pieter Berkel [EMAIL PROTECTED] wrote: Yonik: This is the approach I had in mind, will it still work if I put the SynonymFilter after the word-delimiter filter in the schema config? SynonymFilter doesn't currently have the capability to handle multiple tokens at the same position in

Re: I can't delete, why?

2007-09-25 Thread Yonik Seeley
On 9/25/07, Ben Shlomo, Yatir [EMAIL PROTECTED] wrote: I know I can delete multiple docs with the following: <delete><query>mediaId:(6720 OR 6721 OR )</query></delete> My question is can I do something like this? <delete><query>languageId:123 AND manufacturer:456 </query></delete> (It does not work for
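
A minimal sketch of building the delete-by-query message from the question above. The helper name `delete_by_query_xml` is an assumption for illustration; the sketch only constructs the XML string and says nothing about whether a given query matches documents, since the reply is cut off here.

```python
from xml.sax.saxutils import escape


def delete_by_query_xml(query):
    """Wrap a query string in a Solr XML delete message, escaping &, < and >."""
    return "<delete><query>{}</query></delete>".format(escape(query))


message = delete_by_query_xml("languageId:123 AND manufacturer:456")
```

Escaping the query matters once it contains characters such as `&` or `<`, which would otherwise make the update message malformed XML.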

Re: How to get debug information while indexing?

2007-09-26 Thread Yonik Seeley
On 9/26/07, Urvashi Gadi [EMAIL PROTECTED] wrote: Hi, I am trying to create my own application using SOLR and while trying to index my data i get Server returned HTTP response code: 400 for URL: http://localhost:8983/solr/update or Server returned HTTP response code: 500 for URL:

Re: searching for non-empty fields

2007-09-27 Thread Yonik Seeley
On 9/27/07, Pieter Berkel [EMAIL PROTECTED] wrote: While in theory -URL: should be valid syntax, the Lucene query parser doesn't accept it and throws a ParseException. I don't have time to work on that now, but I did just open a bug: https://issues.apache.org/jira/browse/LUCENE-1006 -Yonik

Re: moving index

2007-09-27 Thread Yonik Seeley
On 9/27/07, Jae Joo [EMAIL PROTECTED] wrote: I do need to move the index files, but have a concerns any potential problem including performance? Do I have to keep the original document for querying? I assume you posted XML documents in Solr XML format (like <add><doc>...)? If so, that is just an

Re: searching for non-empty fields

2007-09-27 Thread Yonik Seeley
On 9/27/07, Yonik Seeley [EMAIL PROTECTED] wrote: On 9/27/07, Pieter Berkel [EMAIL PROTECTED] wrote: While in theory -URL: should be valid syntax, the Lucene query parser doesn't accept it and throws a ParseException. I don't have time to work on that now, OK, I lied :-) It was simple

Re: custom sorting

2007-09-27 Thread Yonik Seeley
On 9/27/07, Erik Hatcher [EMAIL PROTECTED] wrote: Using something like this, how would the custom SortComparatorSource get a parameter from the request to use in sorting calculations? perhaps hook in via function query: dist(10.4,20.2,geoloc) And either manipulate the score with that and

Re: Color search

2007-09-28 Thread Yonik Seeley
If it were just a couple of colors, you could have a separate field for each color and then index the percent in that field. black:70 grey:20 and then you could use a function query to influence the score (or you could sort by the color percent). However, this doesn't scale well to a large

Re: small rsync index question

2007-09-28 Thread Yonik Seeley
On 9/28/07, Brian Whitman [EMAIL PROTECTED] wrote: For some reason sending a <commit/> is not refreshing the index It should... are there any errors in the logs? do you see the commit in the logs? Check the stats page to see info about when the current searcher was last opened too. -Yonik

Re: Schema version question

2007-09-28 Thread Yonik Seeley
On 9/28/07, Robert Purdy [EMAIL PROTECTED] wrote: I was wondering if anyone could help me, I just completed a full index of my data (about 4 million documents) and noticed that when I was first setting up the schema I set the version number to 1.2 thinking that solr 1.2 uses schema version

Re: Request for graphics

2007-09-28 Thread Yonik Seeley
On 9/28/07, Clay Webster [EMAIL PROTECTED] wrote: i'm late for dinner out, so i'm just attaching it here. Most attachments are stripped :-) -Yonik

Re: Searching combined English-Japanese index

2007-10-01 Thread Yonik Seeley
On 10/1/07, Maximilian Hütter [EMAIL PROTECTED] wrote: When I search using an English term, I get results but the Japanese is not encoded correctly in the response. (although it is UTF-8 encoded) One quick thing to try is the python writer (wt=python) to see the actual unicode values of what

Re: Major CPU performance problems under heavy user load with solr 1.2

2007-10-01 Thread Yonik Seeley
On 10/1/07, Robert Purdy [EMAIL PROTECTED] wrote: Hi there, I am having some major CPU performance problems with heavy user load with solr 1.2. I currently have approximately 4 million documents in the index and I am doing some pretty heavy faceting on multi-valued columns. I know that doing

Re: Searching combined English-Japanese index

2007-10-01 Thread Yonik Seeley
On 10/1/07, Maximilian Hütter [EMAIL PROTECTED] wrote: Yonik Seeley schrieb: On 10/1/07, Maximilian Hütter [EMAIL PROTECTED] wrote: When I search using an English term, I get results but the Japanese is not encoded correctly in the response. (although it is UTF-8 encoded) One quick

Re: Searching combined English-Japanese index

2007-10-02 Thread Yonik Seeley
On 10/2/07, Maximilian Hütter [EMAIL PROTECTED] wrote: Are you sure, they are wrong in the index? It's not an issue with Jetty output encoding since the python writer takes the string and converts it to ascii before that. Since Solr does no charset encoding itself on output, that must mean that

Re: Seeing if an entry exists in an index for a set of terms

2007-10-03 Thread Yonik Seeley
On 10/3/07, Ian Holsman [EMAIL PROTECTED] wrote: Hi. I was wondering if there was a easy way to give solr a list of things and finding out which have entries. ie I pass it a list Bill Clinton George Bush Mary Papas (and possibly 20 others) to a solr index which contains news articles

Re: Best way to change weighting based on the presence of a field

2007-10-05 Thread Yonik Seeley
On 10/5/07, Mike Klaas [EMAIL PROTECTED] wrote: The other option is to use a function query on the value stored in a field (which could represent a range of 'badness'). This can be used directly in the dismax handler using the bf (boost function) query parameter. In the near future, you can

Re: Urldecode Problem

2007-10-07 Thread Yonik Seeley
On 10/6/07, Frederik M. Kraus [EMAIL PROTECTED] wrote: Looks like we ran into a urldecode problem when having certain query strings. This is what happens: Client: Jeffrey's Bay - Jeffrey%26%2339%3Bs+Bay (php 5.2 urlencode/rawurlencode) It looks like the client is doing XML escaping as

Re: High-Availability deployment

2007-10-08 Thread Yonik Seeley
On 10/8/07, Daniel Alheiros [EMAIL PROTECTED] wrote: I'm about to deploy SOLR in a production environment Cool, can you share exactly what it will be used for? and so far I'm a bit concerned about availability. I have a system that is responsible for fetching data from a database and then

Re: High-Availability deployment

2007-10-08 Thread Yonik Seeley
On 10/8/07, Daniel Alheiros [EMAIL PROTECTED] wrote: Well I believe I can live with some staleness at certain moments, but it's not good as users are supposed to need it 24x7. So the common practice is to make one of the slaves as the new master and switch things over to it and after the

Re: High-Availability deployment

2007-10-08 Thread Yonik Seeley
On 10/8/07, Daniel Alheiros [EMAIL PROTECTED] wrote: Hmm, is there any exception thrown in case the index get corrupted (if it's not caused by OOM and the JVM crashes)? The document uniqueness SOLR offers is one of the many reasons I'm using it and should be excellent to know when it's gone.

Re: Availability Issues

2007-10-08 Thread Yonik Seeley
On 10/8/07, David Whalen [EMAIL PROTECTED] wrote: Have you taken a thread dump to see what is going on? We can't do it b/c during the unresponsive time we can't access the admin site (/solr/admin) at all. I don't know how to do a thread dump via the command line kill -3

Re: Availability Issues

2007-10-08 Thread Yonik Seeley
On 10/8/07, David Whalen [EMAIL PROTECTED] wrote: The logs show nothing but regular activity. We do a tail -f on the logfile and we can read it during the unresponsive period and we don't see any errors. You don't see log entries for requests until after they complete. When a server becomes

Re: Availability Issues

2007-10-08 Thread Yonik Seeley
On 10/8/07, David Whalen [EMAIL PROTECTED] wrote: Do you see any requests that took a really long time to finish? The requests that take a long time to finish are just simple queries. And the same queries run at a later time come back much faster. Our logs contain 99% inserts and 1%

Re: Facets and running out of Heap Space

2007-10-09 Thread Yonik Seeley
On 10/9/07, David Whalen [EMAIL PROTECTED] wrote: I run a faceted query against a very large index on a regular schedule. Every now and then the query throws an out of heap space error, and we're sunk. So, naturally we increased the heap size and things worked well for a while and then the

Re: Facets and running out of Heap Space

2007-10-09 Thread Yonik Seeley
On 10/9/07, David Whalen [EMAIL PROTECTED] wrote: This is only used during the term enumeration method of faceting (facet.field type faceting on multi-valued or full-text fields). What if I'm faceting on just a plain String field? It's not full-text, and I don't have multiValued set for

Re: Facets and running out of Heap Space

2007-10-10 Thread Yonik Seeley
On 10/10/07, Mike Klaas [EMAIL PROTECTED] wrote: Have you tried setting multivalued=true without reindexing? I'm not sure, but I think it will work. Yes, that will work fine. One thing that will change is the response format for stored fields <arr name="foo"><str>val1</str></arr> instead of <str

Re: Internal Server Error and waitSearcher=false for commit/optimize

2007-10-11 Thread Yonik Seeley
On 10/10/07, Jason Rennie [EMAIL PROTECTED] wrote: We're using solr 1.2 and a nightly build of the solrj client code. We very occasionally see things like this: org.apache.solr.client.solrj.SolrServerException: Error executing query at

Re: doubled/halved performance?

2007-10-11 Thread Yonik Seeley
On 10/11/07, Mike Klaas [EMAIL PROTECTED] wrote: I'm seeing some interesting behaviour when doing benchmarks of query and facet performance. Note that the query cache is disabled, and the index is entirely in the OS disk cache. filterCache is fully primed. Often when repeatedly measuring

Re: Instant deletes without committing

2007-10-13 Thread Yonik Seeley
On 10/11/07, BrendanD [EMAIL PROTECTED] wrote: Yes, we have some huge performance issues with non-cached queries. So doing a commit is very expensive for us. We have our autowarm count for our filterCache and queryResultCache both set to 4096. But I don't think that's near high enough. We did

Re: query syntax performance difference?

2007-10-13 Thread Yonik Seeley
On 10/11/07, BrendanD [EMAIL PROTECTED] wrote: Is there a difference in the performance for the following 2 variations on query syntax? The first query was a response from Solr by using a single fq parameter in the URL. The second query was a response from Solr by using separate fq parameter

Re: Non-sortable types in sample schema

2007-10-13 Thread Yonik Seeley
On 10/13/07, Lance Norskog [EMAIL PROTECTED] wrote: The sample schema in Solr 1.2 supplies two variants of integers, longs, floats, doubles. One variant is sortable and one is not. What is the point of having both? Why would I choose the non-sorting variants? Do they store fewer bytes per

Re: comment-out a filter?

2007-10-15 Thread Yonik Seeley
On 10/15/07, David Whalen [EMAIL PROTECTED] wrote: I want to comment-out a filter in my schema.xml, specifically the solr.EnglishPorterFilterFactory filter. I want to know -- will this cause me to have to re-build my index? Or will a restart of SOLR get the job done? Yes, you will need to

Re: Search results problem

2007-10-17 Thread Yonik Seeley
On 10/17/07, Maximilian Hütter [EMAIL PROTECTED] wrote: I also found this: Controls the maximum number of terms that can be added to a Field for a given Document, thereby truncating the document. Increase this number if large documents are expected. However, setting this value too high may

Re: GET_SCORES flag in SolrIndexSearcher

2007-10-20 Thread Yonik Seeley
On 10/19/07, Chris Hostetter [EMAIL PROTECTED] wrote: (it doesn't matter that parseSort returns null when the sort string is just score ... SolrIndexSearcher recognizes a null Sort as being the default sort by score) Yep... FYI, I did this early on specifically because no sort and score desc

Re: Performance when indexing or cold cache

2007-10-22 Thread Yonik Seeley
On 10/22/07, Walter Underwood [EMAIL PROTECTED] wrote: <lst name="appends"> <str name="fq">(pushstatus:A AND (type:movie OR type:person))</str> </lst> </requestHandler> Perhaps try setting up a static warming query for this filter and any other common filters? Also look for correlations

Re: Using wildcard with accented words

2007-10-22 Thread Yonik Seeley
On 10/22/07, Erik Hatcher [EMAIL PROTECTED] wrote: Perhaps this is a case that Solr could address with a third analyzer configuration (it already has query, and index differentiation) that could be incorporated for wildcard queries. Thoughts on that? I've actually thought about it

Re: Search results problem

2007-10-22 Thread Yonik Seeley
On 10/19/07, Maximilian Hütter [EMAIL PROTECTED] wrote: Yonik Seeley schrieb: On 10/17/07, Maximilian Hütter [EMAIL PROTECTED] wrote: I also found this: Controls the maximum number of terms that can be added to a Field for a given Document, thereby truncating the document. Increase

Re: Search results problem

2007-10-23 Thread Yonik Seeley
On 10/23/07, Maximilian Hütter [EMAIL PROTECTED] wrote: ??? maxFieldLength only applies to the number of tokens indexed. You will always get the complete field back if it's stored, regardless of what maxFieldLength is. What I meant was, that it is different from just having a field with

Re: Payloads for multiValued fields?

2007-10-24 Thread Yonik Seeley
On 10/24/07, Alf Eaton [EMAIL PROTECTED] wrote: Yonik Seeley wrote: Could you perhaps index the captions as #1 this is the first caption #2 this is the second caption And then when just look for #n in the highlighted results? For display, you could also strip out the #n

Re: Empty field error when boosting a dismax query using bf

2007-10-24 Thread Yonik Seeley
On 10/24/07, Alf Eaton [EMAIL PROTECTED] wrote: I'm trying to use the bf parameter to boost a dismax query based on the value of a certain (integer) field. The trouble is that for some of the documents this field is empty (rather than zero), which means that there's an error when using the

Re: where did my foreign language go?

2007-10-24 Thread Yonik Seeley
On 10/24/07, Ian Holsman [EMAIL PROTECTED] wrote: Hi. I'm in the middle of bringing up a new solr server and am using the trunk. (where I was using an earlier nightly release of about 2-3 weeks ago on my old server) now, when I do a search for 日本 (japan) it used to show the kanji in the q

Re: My filters are not used

2007-10-24 Thread Yonik Seeley
On 10/24/07, Norskog, Lance [EMAIL PROTECTED] wrote: I am creating a filter that is never used. Here is the query sequence: q=*:*&fq=contentid:00*&start=0&rows=200 q=*:*&fq=contentid:00*&start=200&rows=200 q=*:*&fq=contentid:00*&start=400&rows=200 q=*:*&fq=contentid:00*&start=600&rows=200
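
The paging sequence in the question can be generated programmatically. This is a minimal sketch using only the Python standard library; the function name `paged_queries` is an assumption, and `*` and `:` are deliberately left unescaped so the output reads like the queries quoted above.

```python
from urllib.parse import urlencode


def paged_queries(q, fq, rows, pages):
    """Build one query string per page, keeping q and fq fixed."""
    return [
        urlencode({"q": q, "fq": fq, "start": page * rows, "rows": rows}, safe="*:")
        for page in range(pages)
    ]


queries = paged_queries("*:*", "contentid:00*", rows=200, pages=4)
```

Every page sends the same `fq` value; only `start` changes between requests.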

Re: Forced Top Document

2007-10-25 Thread Yonik Seeley
On 10/25/07, Chris Hostetter [EMAIL PROTECTED] wrote: : The typical use case, though, is for the featured document to be on top only : for certain queries. Like in an intranet where someone queries 401K or : retirement or similar, you want to feature a document about benefits that : would

Re: prefix-search ingnores the lowerCaseFilter

2007-10-25 Thread Yonik Seeley
On 10/25/07, Max Scheffler [EMAIL PROTECTED] wrote: Is it possible that the prefix-processing ignores the filters? Yes, It's a known limitation that we haven't worked out a fix for yet. The issue is that you can't just run the prefix through the filters because of things like stop words,
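
The limitation described above can be pictured with a toy model. This is not Solr code: `analyze` stands in for an index-time chain that ends in a lowercase filter, and `prefix_matches` stands in for a prefix query that compares the raw prefix against indexed terms with no analysis. Both names are assumptions for illustration.

```python
def analyze(text):
    """Toy index-time analysis: whitespace tokenization plus lowercasing."""
    return [token.lower() for token in text.split()]


def prefix_matches(indexed_terms, prefix):
    """Toy prefix query: the prefix is compared as typed, with no analysis."""
    return sorted(term for term in indexed_terms if term.startswith(prefix))


indexed_terms = set(analyze("Solr Solaris Lucene"))

no_hits = prefix_matches(indexed_terms, "Sol")  # mixed-case prefix, as typed
hits = prefix_matches(indexed_terms, "sol")     # prefix lowercased by the caller
```

In this model the mixed-case prefix finds nothing because the index holds only lowercased terms, while the lowercased prefix matches.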

Re: indexing one documents with different populated fields causes a deletion of documents in with other populated fileds

2007-10-25 Thread Yonik Seeley
On 10/25/07, Anton Valdstein [EMAIL PROTECTED] wrote: Does solr check automatically for duplicate texts in other fields and delete documents that have the same text stored in other fields? Solr automatically overwrites (deletes old versions of) documents with the same uniqueKey field

Re: SOLR 1.3 Release?

2007-10-25 Thread Yonik Seeley
On 10/25/07, Matthew Runo [EMAIL PROTECTED] wrote: Any ideas on when 1.3 might be released? We're starting a new project and I'd love to use 1.3 for it - is SVN head stable enough for use? I think it's stable in the sense of does the right thing and doesn't crash, but IMO isn't stable in the

Re: indexing one documents with different populated fields causes a deletion of documents in with other populated fileds

2007-10-25 Thread Yonik Seeley
On 10/25/07, Anton Valdstein [EMAIL PROTECTED] wrote: thanks, that explains a lot (:, I have another question: about how the idf is calculated: is the document frequency the sum of all documents containing the term in one of their fields or just in the field the query contained? idfs are

Re: CollectionDistribution - Changes reflected immediately on master, but only after tomcat restart on slave

2007-10-26 Thread Yonik Seeley
On 10/26/07, Karen Loughran [EMAIL PROTECTED] wrote: But after distribution of this latest snapshot to the slave the collection does not show the update (with solr admin query url or via java query client) UNLESS I restart tomcat ? Sounds like a config issue with the scripts... pulling the

Re: prefix-search ingnores the lowerCaseFilter

2007-10-29 Thread Yonik Seeley
On 10/29/07, Martin Grotzke [EMAIL PROTECTED] wrote: On Thu, 2007-10-25 at 10:48 -0400, Yonik Seeley wrote: On 10/25/07, Max Scheffler [EMAIL PROTECTED] wrote: Is it possible that the prefix-processing ignores the filters? Yes, It's a known limitation that we haven't worked out a fix

Re: Phrase Query Performance Question

2007-10-30 Thread Yonik Seeley
On 10/30/07, Haishan Chen [EMAIL PROTECTED] wrote: Thanks a lot for replying Yonik! I am running solr on a windows 2003 server (standard version). intel Xeon CPU 3.00GHz, with 4.00 GB RAM. The index is located on Raid5 with 2 million documents. Is there any way to improve query performance

Re: FW: Score customization

2007-10-31 Thread Yonik Seeley
On 10/31/07, Victoria Kaganski [EMAIL PROTECTED] wrote: Does FunctionQuery actually override the default similarity function? If it does, how can I still access the similarity value? FunctionQuery returns the *value* of a field (or a function of it) as the value for a query - it does not use

Re: fieldNorm seems to be killing my score

2007-11-01 Thread Yonik Seeley
Hmmm, a norm of 0.0??? That implies that the boost for that field (text) was set to zero when it was indexed. How did you index the data (straight HTTP, SolrJ, etc)? What does your schema for this field (and copyFields) look like? -Yonik On 11/1/07, Robert Young [EMAIL PROTECTED] wrote: Hi,

Re: SOLR 1.3: defaultOperator always defaults to OR although AND is specifed.

2007-11-01 Thread Yonik Seeley
Try the latest... I just fixed this. -Yonik On 11/1/07, Britske [EMAIL PROTECTED] wrote: experimenting with SOLR 1.3 and discovered that although I specified <solrQueryParser defaultOperator="AND"/> in schema.xml q=a+b behaves as q=a OR B instead of q=a AND b Obviously this is not correct. I

Re: overlapping onDeckSearchers message

2007-11-03 Thread Yonik Seeley
On 11/3/07, Brian Whitman [EMAIL PROTECTED] wrote: I have a solr index that hasn't had many problems recently but I had the logs open and noticed this a lot during indexing: [16:23:34.086] PERFORMANCE WARNING: Overlapping onDeckSearchers=2 That means that one searcher hasn't yet finished

Re: FW: Score customization

2007-11-03 Thread Yonik Seeley
. -Yonik From: [EMAIL PROTECTED] on behalf of Yonik Seeley Sent: Wed 10/31/2007 7:21 PM To: solr-user@lucene.apache.org Subject: Re: FW: Score customization On 10/31/07, Victoria Kaganski [EMAIL PROTECTED] wrote: Does FunctionQuery actually override

Re: customer request handler doesn't envok the query tokenization chain

2007-11-04 Thread Yonik Seeley
On 11/4/07, Yu-Hui Jin [EMAIL PROTECTED] wrote: Let's say we defined a custom field type that when querying and indexing, the solr.LowerCaseFilterFactory is used as the last filter to lowercase all letters. In the Analysis UI, we found tokenization is working correctly. We also defined a

Re: customer request handler doesn't envok the query tokenization chain

2007-11-05 Thread Yonik Seeley
On 11/5/07, Yu-Hui Jin [EMAIL PROTECTED] wrote: Just curious, does the default operator ( AND or OR) specify the relationship between a field/value component or between the tokens of the same field/value component? between any clauses in a boolean query. e.g. for a query like this:

Re: Phrase Query Performance Question and score threshold

2007-11-05 Thread Yonik Seeley
On 11/5/07, Haishan Chen [EMAIL PROTECTED] wrote: As for the first issues. The number of different phrase queries have performance issues I found so far are about 10. If these are normal phrase queries (no slop), a good solution might be to simply index and query these phrases as a single

Re: specify index location

2007-11-05 Thread Yonik Seeley
On 11/5/07, evol__ [EMAIL PROTECTED] wrote: Just a remark: <!-- Used to specify an alternate directory to hold all index data other than the default ./data under the Solr home. If replication is in use, this should match the replication configuration. --> Might be a good idea

Re: value boosts? (boosting a multiValued field's data)

2007-11-05 Thread Yonik Seeley
On 11/6/07, evol__ [EMAIL PROTECTED] wrote: Hi. Is the expansion method described in the following year old post still the best available way to do this? http://www.nabble.com/newbie-Q-regarding-schema-configuration-tf1814271.html#a4956602 The way I understand it, indexing these field

Re: query syntax

2007-11-06 Thread Yonik Seeley
On 11/6/07, Traut [EMAIL PROTECTED] wrote: I have in index document with field name and its value is somename123 Why I can't find anything with query name:somename123* This is a prefix query. No analysis is done on the prefix, so it may not match analysis that

Re: Can you parse the contents of a field to populate other fields?

2007-11-07 Thread Yonik Seeley
On 11/6/07, Kristen Roth [EMAIL PROTECTED] wrote: Yonik - thanks so much for your help! Just to clarify; where should the regex go for each field? Each field should have a different FieldType (referenced by the type XML attribute). Each fieldType can have it's own analyzer. You can use a

Re: SOLR 1.2 - Duplicate Documents??

2007-11-08 Thread Yonik Seeley
On Nov 7, 2007 12:30 PM, realw5 [EMAIL PROTECTED] wrote: We did have Tomcat crash once (JVM OutOfMem) during an indexing process, could that be a possible source of the issue? Yes. Deletes are buffered and carried out in a different phase. -Yonik

Re: Redundant indexing * 4 only solution (for par/sen and case sensitivity)

2007-11-10 Thread Yonik Seeley
On Nov 10, 2007 4:24 PM, David Neubert [EMAIL PROTECTED] wrote: So if I am hitting multiple fields (in the same search request) that invoke different Analyzers -- am I at a dead end, and have to resort to consecutive multiple queries instead Solr handles that for you automatically. The

Re: solr range query

2007-11-12 Thread Yonik Seeley
On Nov 12, 2007 8:02 AM, Heba Farouk [EMAIL PROTECTED] wrote: I would like to use solr to return ranges of searches on an integer field, if I wrote in the url offset:[0 TO 10], it returns documents with offset values 0, 1, 10 only but I want to return the range 0,1,2, 3, 4 ,10. How can

Re: no segments* file found

2007-11-12 Thread Yonik Seeley
On Nov 12, 2007 3:46 AM, SDIS M. Beauchamp [EMAIL PROTECTED] wrote: If I don't optimize, I 've got a too many files open at about 450K files and 3 Gb index You may need to increase the number of filedescriptors in your system. If you're using Linux, see this:

Re: Redundant indexing * 4 only solution (for par/sen and case sensitivity)

2007-11-12 Thread Yonik Seeley
On Nov 12, 2007 2:20 PM, David Neubert [EMAIL PROTECTED] wrote: Erik - thanks, I am considering this approach, versus explicit redundant indexing -- and am also considering Lucene - There's not a well defined solution in either IMO. - problem is, I am one week into both technologies (though

Re: Exception in SOLR when querying for fields of type string

2007-11-13 Thread Yonik Seeley
On Nov 13, 2007 6:23 PM, Kasi Sankaralingam [EMAIL PROTECTED] wrote: It is not tokenized, it is a string field, so will it still match photo for field 'title_s' and book for the default field? Yes, because the query parser splits up things by whitespace before analyzers are even applied. Do you

Re: how to load custom valuesource as plugin

2007-11-14 Thread Yonik Seeley
Unfortunately, the function query parser isn't currently pluggable. -Yonik On Nov 14, 2007 2:02 PM, Britske [EMAIL PROTECTED] wrote: I've created a simple valueSource which is supposed to calculate a weighted sum over a list of supplied valuesources. How can I let Solr recognise this

Re: score customization

2007-11-17 Thread Yonik Seeley
On Nov 15, 2007 11:06 AM, Jae Joo [EMAIL PROTECTED] wrote: I am looking for the way to get the score - only hundredth - ex. 4.09something like that. Currently, it has 7 decimal digits. <float name="score">1.8032384</float> If you want to display scores only to the hundredths place, simply do that in
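
Formatting a score to the hundredths place at display time is a one-liner in most languages. A minimal Python sketch, assuming the score has already been read from the response as a float; the name `display_score` is hypothetical.

```python
def display_score(score):
    """Format a relevance score with exactly two decimal places for display."""
    return format(score, ".2f")


shown = display_score(1.8032384)
```

The full-precision score is left untouched, so ordering by score is unaffected; only the displayed string is shortened.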

Re: Payloads in Solr

2007-11-17 Thread Yonik Seeley
On Nov 17, 2007 2:18 PM, Tricia Williams [EMAIL PROTECTED] wrote: I was wondering how Solr people feel about the inclusion of Payload functionality in the Solr codebase? All for it... depending on what one means by payload functionality of course. We should probably hold off on adding a new
