RE: Updates from Multiple JVM

2007-08-07 Thread Lance Norskog
Is the question whether or not you can run solr in two different servlet containers and index into the same data set? While we're on the topic, can you search with one and index with the other? Or do snapshots have to be in the middle? Lance -Original Message- From: LP [mailto:[EMAIL

Delete of non-existent record succeeds

2007-08-08 Thread Lance Norskog
When I delete a record that does not exist, via delete-by-id, the status returned is 0. Shouldn't the operation fail? In fact, does any operation actually fail? Cheers, Lance

Multivalued fields and the 'copyField' operator

2007-08-09 Thread Lance Norskog
I'm adding a field to be the source of the spellcheck database. Since that is its only job, it has raw text lower-cased, de-Latin1'd, and de-duplicated. Since it is only for the spellcheck DB, it does not need to keep duplicates. I specified it as 'multiValued=false' and used copyField from a

RE: Any clever ideas to inject into solr? Without http?

2007-08-09 Thread Lance Norskog
Jython is a Python interpreter implemented in Java. (I have a lot of Python code.) Total throughput in the servlet is very sensitive to the total number of servlet sockets available v.s. the number of CPUs. The different analysers have very different performance. You might leave some data in

RE: Multivalued fields and the 'copyField' operator

2007-08-09 Thread Lance Norskog
, Lance -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Yonik Seeley Sent: Thursday, August 09, 2007 5:28 PM To: solr-user@lucene.apache.org Subject: Re: Multivalued fields and the 'copyField' operator On 8/9/07, Lance Norskog [EMAIL PROTECTED] wrote: I'm

RE: Best use of wildcard searches

2007-08-10 Thread Lance Norskog
The Protégé project at Stanford has nice tools for editing knowledge bases, taxonomies, etc. http://protege.stanford.edu/ -Original Message- From: Jonathan Woods [mailto:[EMAIL PROTECTED] Sent: Thursday, August 09, 2007 10:45 PM To: solr-user@lucene.apache.org Subject: RE: Best use of

RE: Problem with stemming

2007-08-13 Thread Lance Norskog
You need this book: -Original Message- From: David Whalen [mailto:[EMAIL PROTECTED] Sent: Monday, August 13, 2007 1:00 PM To: solr-user@lucene.apache.org Subject: RE: Problem with stemming Yonik: I only raised the question to the group after I had looked in the schema.xml. There are

RE: Problem with stemming

2007-08-13 Thread Lance Norskog
(Oops, try again.) You need this book: http://www.amazon.com/Lucene-Action-Erik-Hatcher/dp/1932394281/ref=pd_bbs_sr_1/103-4871137-7111056?ie=UTF8&s=books&qid=1187037246&sr=8-1 Lucene in Action by Erik Hatcher and Otis Gospodnetic. It does not really cover Solr, but you will understand what

Indexing speed: web v.s. solrj app

2007-08-15 Thread Lance Norskog
Is indexing via solrj faster than going through the web service? There are three cases: Read a file from a local file system and indexing it directly, Read a file on one machine and indexing it on another, and Run solrj and read a file, then directly update the index. I'm talking

Overall performance: network v.s. SAN file system

2007-08-15 Thread Lance Norskog
Is anyone doing Solr installations with a SAN file system? Like IBM Storage Tank or Apple XSAN or Red Hat GFS? What are your experiences? Thanks, Lance

Solr, Lucene and patents

2007-08-15 Thread Lance Norskog
Does anyone know what the patent situation is with Lucene and Solr? What patents affect it, what you can and cannot do with it? Thanks, Lance

Replacing existing documents in the index

2007-08-16 Thread Lance Norskog
Hi- We recrawl the same places and update blindly without checking if a document is already in the index. We have a use case where we would like to delete documents (porn) and have them stay deleted. To implement this use case now, we would need to check the existence of the document and check

RE: solr + carrot2

2007-08-17 Thread Lance Norskog
Hello- The Lucene interface is cool, but not many people put their indexes on machines with Swing access. I just did a Solr integration by copying the eTools.ch implementation. This took several edits. As long as we're making requests, please do a general-purpose implementation by cloning the

RE: Solr 1.1. vs. 1.2.

2007-08-20 Thread Lance Norskog
While we're on the topic, there appear to be a ton of new features in 1.3, and they are getting debugged. When do you plan to do an official 1.3 release? -Original Message- From: Yu-Hui Jin [mailto:[EMAIL PROTECTED] Sent: Friday, August 17, 2007 11:53 PM To: solr-user@lucene.apache.org

RE: solr + carrot2

2007-08-20 Thread Lance Norskog
PROTECTED] Sent: Monday, August 20, 2007 12:03 PM To: solr-user@lucene.apache.org Subject: Re: solr + carrot2 On 20-Aug-07, at 11:24 AM, Lance Norskog wrote: Exactly! The Lucene version requires direct access to the file. Our indexes are on servers which do not have graphics (VNC) configured

Commit performance

2007-08-20 Thread Lance Norskog
How long should a commit take? I've got about 9.8G of data for 9M of records. (Yes, I'm indexing too much data.) My commits are taking 20-30 seconds. Since other people set the autocommit to 1 second, I'm guessing we have a major mistake somewhere in our configurations. We have a lot of

RE: clear index

2007-08-21 Thread Lance Norskog
It might be worthwhile to have a hibernate mode for solr, where suspend waits until all requests are finished, then closes all files and rejects all new requests. Later a wakeup command would bring it back online. During this time, a remotely controlled job could remove the data directory. This

Replacing existing documents

2007-08-21 Thread Lance Norskog
Recently someone mentioned that it would be possible to have a 'replace existing document' feature rather than just dropping and adding documents with the same unique id. We have a few use cases in this area and I'm researching whether it is effective to check for a document via Solr queries, or

Solr scoring: relative or absolute?

2007-08-22 Thread Lance Norskog
Are the score values generated in Solr relative to the index or are they against an absolute standard? Is it possible to create a scoring algorithm with this property? Are there parts of the score inputs that are absolute? My use case is this: I would like to do a parallel search against two

RE: How to realize index spaces

2007-08-23 Thread Lance Norskog
Are these separate Lucene index files which can be updated and optimized individually? -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Tom Hill Sent: Thursday, August 23, 2007 2:23 PM To: solr-user@lucene.apache.org Subject: Re: How to realize index

RE: solr + carrot2

2007-08-27 Thread Lance Norskog
: http://demo.carrot2.org/head/webstart/ Please let us know if the new UI works for you. Thanks, Staszek On 20/08/07, Lance Norskog [EMAIL PROTECTED] wrote: Exactly! The Lucene version requires direct access to the file. Our indexes are on servers which do not have graphics (VNC) configured

Index corruption checker?

2007-08-30 Thread Lance Norskog
Is there an app that walks a Lucene index and checks for corruption? How would we know if our index had become corrupted? Thanks, Lance

RE: Processing solr response....

2007-09-04 Thread Lance Norskog
This goes through the Solr http response, right? The Solr XSL processor feature will do this for you. You write an XSL script and add it to $SOLR/conf/xslt. You then use extra parameters in a query. The output XML will be transformed by the XSL. The XSL can create anything you want: lists of

Tomcat logging

2007-09-05 Thread Lance Norskog
Hi- Here are the lines to add to the end of Tomcat's conf/logging.properties file to get rid of query/update logging noise: org.apache.solr.core.SolrCore.level = WARNING org.apache.solr.handler.XmlUpdateRequestHandler.level = WARNING org.apache.solr.search.SolrIndexSearcher.level = WARNING I

RE: Indexing very large files.

2007-09-06 Thread Lance Norskog
Now I'm curious: what is the use case for documents this large? Thanks, Lance Norskog

RE: solr.py problems with german Umlaute

2007-09-06 Thread Lance Norskog
I researched this problem before. The problem I found is that Python strings are not Unicode by default. You have to do something to make them Unicode. Here are the links I found: http://www.reportlab.com/i18n/python_unicode_tutorial.html http://evanjones.ca/python-utf8.html

FW: Space costs of dynamic fields?

2007-09-07 Thread Lance Norskog
is that you can't say, give me fields a*_t but not b*_t in a query. I haven't found others in the mail archives or the wiki. Thanks, Lance Norskog

org.apache.lucene.util.English missing

2007-09-07 Thread Lance Norskog
Hi folks- The Lucene Spellchecker unit test expects a Java class org.apache.lucene.util.English. I can't find it in the source trees on svn.apache.org. Can someone please mail it to me? Thanks, Lance Norskog

FW: Minor mistake on the Wiki

2007-09-07 Thread Lance Norskog
fields where the index-time boost should be stored. This NOTE appears to be block-copied from the following entry about field-level boosts, and makes no sense here. Lance Norskog

adding without overriding dups - DirectUpdateHandler2.java does not implement?

2007-09-07 Thread Lance Norskog
Hi- It appears that DirectUpdateHandler2.java does not actually implement the parameters that control whether to override existing documents. Should I use DirectUpdateHandler instead? Apparently DUH is slower than DUH2, but DUH implements these parameters. (We do so many overwrites that

RE: adding without overriding dups - DirectUpdateHandler2.java does not implement?

2007-09-07 Thread Lance Norskog
dups - DirectUpdateHandler2.java does not implement? On 9/7/07, Lance Norskog [EMAIL PROTECTED] wrote: It appears that DirectUpdateHandler2.java does not actually implement the parameters that control whether to override existing documents. It's been proposed that most of these be deprecated

RE: adding without overriding dups - DirectUpdateHandler2.java does not implement?

2007-09-10 Thread Lance Norskog
Norskog -Original Message- From: Mike Klaas [mailto:[EMAIL PROTECTED] Sent: Friday, September 07, 2007 2:47 PM To: solr-user@lucene.apache.org Subject: Re: adding without overriding dups - DirectUpdateHandler2.java does not implement? On 7-Sep-07, at 1:35 PM, Lance Norskog wrote: Hi

RE: Authentication

2007-09-14 Thread Lance Norskog
You can try the public/private key certificate system. You deploy it to jetty/tomcat somehow, and curl has options to send it. We haven't tried this. The authentication happens at the http container level, not in the solr config. -Original Message- From: Bill Au [mailto:[EMAIL PROTECTED]

RE: Searching items with in the search results with SOLR

2007-09-18 Thread Lance Norskog
Question: if it is a filter query, it will be cached in the filter query cache? Follow-on questions if this is true: Is this the full results of the filter query? What exactly is cached? Thanks, Lance Norskog -Original Message- From: Erik Hatcher [mailto:[EMAIL PROTECTED] Sent

Formula for open file descriptors

2007-09-18 Thread Lance Norskog
Hi- In early June Mike Klaas posted a formula for the number of file descriptors needed by Solr: For each segment, 7 + num indexed fields per segment. There should be log_{base mergefactor}(numDocs) * mergeFactor segments, approximately. Is this still true? Thanks, Lance
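
As a rough worked example of the formula quoted above, here is a minimal Python sketch. The function name and the sample values (one million documents, a mergeFactor of 10, 20 indexed fields) are made up for illustration, and the result is only as approximate as the formula itself.

```python
import math

def estimate_file_descriptors(num_docs, merge_factor, indexed_fields):
    """Approximate open file descriptors using the formula quoted above."""
    # Approximate segment count: log_{mergeFactor}(numDocs) * mergeFactor
    segments = math.log10(num_docs) / math.log10(merge_factor) * merge_factor
    # Each segment needs roughly 7 + (number of indexed fields) descriptors
    return segments * (7 + indexed_fields)

# Hypothetical sample values, not taken from any real installation
estimate = estimate_file_descriptors(1_000_000, 10, 20)
print(round(estimate))
```

The estimate grows linearly with mergeFactor and with the number of indexed fields, but only logarithmically with the number of documents.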

RE: Strange behavior when searching with accents

2007-09-21 Thread Lance Norskog
searching with accents On Thu, 2007-09-20 at 11:13 -0700, Lance Norskog wrote: English and French are messy, so heuristic methods are the only possible. Spanish is rigorously clean, and stemming should be done from the declension rules and irregular conjugation tables. This involves large (fast

Geographical distance searching

2007-09-26 Thread Lance Norskog
It is a best practice to store the master copy of this data in a relational database and use Solr/Lucene as a high-speed cache. MySQL has a geographical database option, so maybe that is a better option than Lucene indexing. Lance (P.s. please start new threads for new topics.) -Original

RE: dataset parameters suitable for lucene application

2007-09-26 Thread Lance Norskog
My limited experience with larger indexes is: 1) the logistics of copying around and backing up this much data, and 2) indexing is disk-bound. We're on SAS disks and it makes no difference between one indexing thread and a dozen (we have small records). Smaller returns are faster. You need to

RE: Index multiple languages with multiple analyzers with the same field

2007-09-28 Thread Lance Norskog
Other people custom-create a separate dynamic field for each language they want to support. The spellchecker in Solr 1.2 wants just one field to use as its word source, so this fits. We have a more complex version of this problem: we have content with both English and other languages. Searching

RE: Searching combined English-Japanese index

2007-10-01 Thread Lance Norskog
Some servlet containers don't do UTF-8 out of the box. There is information about this on the wiki. -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Yonik Seeley Sent: Monday, October 01, 2007 9:45 AM To: solr-user@lucene.apache.org Subject: Re: Searching

Questions about unit test assistant TestHarness

2007-10-01 Thread Lance Norskog
of these problems fixed in the Solr 1.3 trunk? Should I just grab whatever's there and use them with 1.2? Thanks, Lance Norskog

RE: Searching combined English-Japanese index

2007-10-02 Thread Lance Norskog
Python does not do Unicode strings natively, you have to do them explicitly. It is possible that your python receiver is not doing the right thing with the incoming strings. Also, Jetty has problems with UTF-8; the Wiki has more on this. Lance -Original Message- From: Maximilian Hütter

RE: how to make sure a particular query is ALWAYS cached

2007-10-04 Thread Lance Norskog
nicely. Cheers, Lance Norskog -Original Message- From: Britske [mailto:[EMAIL PROTECTED] Sent: Thursday, October 04, 2007 1:38 PM To: solr-user@lucene.apache.org Subject: Re: how to make sure a particular query is ALWAYS cached hossman wrote: : I want a couple of costly queries

RE: Handling empty query

2007-10-04 Thread Lance Norskog
If a field is required, and always has data, this query will enumerate all documents: field:[* TO *] -Original Message- From: Guangwei Yuan [mailto:[EMAIL PROTECTED] Sent: Thursday, October 04, 2007 3:26 PM To: solr-user@lucene.apache.org Subject: Handling empty query Hi, Does Solr

RE: Merging Fields

2007-10-05 Thread Lance Norskog
A gotcha here is that copyField creates multiple values. Each field copied in becomes a separate field. If you wanted a single-valued field this will not work. Lance Norskog -Original Message- From: Keene, David [mailto:[EMAIL PROTECTED] Sent: Friday, October 05, 2007 10:50 AM To: solr

RE: Spell Check Handler

2007-10-08 Thread Lance Norskog
Great! One comment: if I type a word that happens to be real, it may not be what I actually want. A spell checker should still recommend similar words. Computer programmers are all perfect spellers, and this can blind us to what matters to ordinary people :) Lance Norskog -Original

Re: solr tuple/tag store

2007-10-09 Thread Lance Norskog
seeing the actual queries that are slow, it's difficult to determine what the problem is. Have you tried using EXPLAIN ( http://dev.mysql.com/doc/refman/5.0/en/explain.html) to check if your query is using the table indexes effectively? Pieter On 10/10/2007, Lance Norskog [EMAIL PROTECTED

Non-sortable types in sample schema

2007-10-13 Thread Lance Norskog
The sample schema in Solr 1.2 supplies two variants of integers, longs, floats, doubles. One variant is sortable and one is not. What is the point of having both? Why would I choose the non-sorting variants? Do they store fewer bytes per record? Thanks, Lance Norskog

copyField limitation

2008-01-17 Thread Lance Norskog
types is that with defaulting, you exactly duplicate the information without relying on your feeding software. With 'date' field formula syntax, this is the only way to have duplicate fields for different purposes. Thanks for your time, Lance Norskog

RE: solr 1.3

2008-01-21 Thread Lance Norskog
Would someone please consider marking a label on the Subversion repository that says, This is a clean version? I only do HTTP requests and have no custom software, so I don't care about internal interfaces changing. Thanks, Lance Norskog -Original Message- From: Mike Klaas [mailto

RE: copyField limitation

2008-01-21 Thread Lance Norskog
://issues.apache.org/jira/browse/SOLR-464 Thanks for your time, Lance Norskog -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Yonik Seeley Sent: Thursday, January 17, 2008 2:53 PM To: solr-user@lucene.apache.org Subject: Re: copyField limitation On Jan 17, 2008 4:53 PM

RE: copyField limitation

2008-01-22 Thread Lance Norskog
A more interesting use case: Analyzing text and finding a number, like the mean word length or the mean number of repeated words. These are standard tools for spam detection. To create these, we would want to shovel text into a text processing chain that creates an integer. We then want to both

RE: Solr feasibility with terabyte-scale data

2008-01-23 Thread Lance Norskog
We use two indexed copies of the same text, one with stemming and stopwords and the other with neither. We do phrase search on the second. You might use two different OCR implementations and cross-correlate the output. Lance -Original Message- From: Phillip Farber [mailto:[EMAIL

Log4j cookbook for request logging

2008-01-25 Thread Lance Norskog
Is it possible to log incoming requests? I'd love to have the incoming IP and request string. What is the exact set of class names for this? Thanks, Lance Norskog

RE: spellcheckhandler

2008-01-30 Thread Lance Norskog
We use Solr 1.2. I copied the 1.2 spellchecker and made an equivalent phrase pair index generator. Using this we can take an example spelling and find example words pairs for each suggestion. We have not deployed this. Lance Norskog -Original Message- From: Mike Klaas [mailto:[EMAIL

RE: Querying multiple dynamicField

2008-02-04 Thread Lance Norskog
You can use the copyField directive to copy all 'sentence_*' fields into one indexed field. You then have a named field that you can search against. Lance Norskog -Original Message- From: Renaud Delbru [mailto:[EMAIL PROTECTED] Sent: Friday, February 01, 2008 6:48 PM To: solr-user

RE: Indexing Japanese English

2008-02-07 Thread Lance Norskog
Here are the comments for CJKTokenizer. First, is this what you want? Remember, there are three Japanese writing systems. /** * CJKTokenizer was modified from StopTokenizer which does a decent job for * most European languages. It performs other token methods for double-byte * Characters: the

RE: Query with literal quote character: 6'2

2008-02-07 Thread Lance Norskog
Some people loathe UTF-8 and do all of their text in XML entities. This might work better for your punctuation needs. But it still won't help you with Prince :) -Original Message- From: Walter Underwood [mailto:[EMAIL PROTECTED] Sent: Thursday, February 07, 2008 9:25 AM To:

Lucene index verifier

2008-02-07 Thread Lance Norskog
amount of time? Thanks, Lance Norskog

RE: Memory improvements

2008-02-07 Thread Lance Norskog
Solr 1.2 has a bug where if you say commit after N documents it does not. But it does honor the commit after N milliseconds directive. This is fixed in Solr 1.3. -Original Message- From: Sundar Sankaranarayanan [mailto:[EMAIL PROTECTED] Sent: Thursday, February 07, 2008 3:30 PM To:

RE: Lucene index verifier

2008-02-08 Thread Lance Norskog
on performance/search/indexing. -Grant On Feb 7, 2008, at 11:15 PM, Lance Norskog wrote: (Sorry, my Lucene java-user access is wonky.) I would like to verify that my snapshots are not corrupt before I enable them. What is the simplest program to verify that a Lucene index

RE: range vs. filter queries

2008-02-11 Thread Lance Norskog
Is it not possible to make a grid of your boxes? It seems like this would be a more efficient query: grid:N100_S50_E250_W412 This is how GIS systems work, right? Lance -Original Message- From: Ryan McKinley [mailto:[EMAIL PROTECTED] Sent: Monday, February 11, 2008 6:13 PM To:

RE: Performance help for heavy indexing workload

2008-02-12 Thread Lance Norskog
1) autowarming: it means that if you have a cached query or similar, and do a commit, it then reloads each cached query. This is in solrconfig.xml 2) sorting is a pig. A sort creates an array of N integers where N is the size of the index, not the query. If the sorted field is anything but an
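
To put a number on point 2, here is a small Python sketch of the arithmetic. It assumes 4 bytes per integer entry, which is an assumption rather than something stated in the message, and the index size is a hypothetical figure.

```python
def sort_array_bytes(index_size, bytes_per_entry=4):
    """Memory for one sort array: one entry per document in the index."""
    # The array is sized by the whole index, not by the number of query hits
    return index_size * bytes_per_entry

# Hypothetical index of 9 million documents, sorted on one integer field
needed = sort_array_bytes(9_000_000)
print(f"{needed / 1_000_000:.0f} MB per sorted field")
```

Each additional sorted field needs its own array, so the cost scales with the number of distinct sort fields.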

RE: upgrading to lucene 2.3

2008-02-12 Thread Lance Norskog
What will this improve? -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Yonik Seeley Sent: Tuesday, February 12, 2008 6:48 AM To: solr-user@lucene.apache.org Subject: Re: upgrading to lucene 2.3 On Feb 12, 2008 9:25 AM, Robert Young [EMAIL PROTECTED]

RE: solr to work for my web application

2008-02-13 Thread Lance Norskog
I strongly recommend that you switch from the latest nightly build to the Solr 1.2 release. Lance -Original Message- From: Thorsten Scherler [mailto:[EMAIL PROTECTED] Sent: Wednesday, February 13, 2008 4:03 AM To: solr-user@lucene.apache.org Subject: Re: solr to work for my web

RE: Questions about filters and scoring

2008-02-18 Thread Lance Norskog
3) But then would not 'certificate anystopword found' match your phrase? I wound up making a separate index without stopwords just so that my phrase lookups would work. (I do not have the luxury of re-indexing, so now I'm stuck with this design even if there is a better one.) I also made one

RE: escaping special chars in query

2008-02-19 Thread Lance Norskog
You may also use Unicode escapes: \u for example. -Original Message- From: Reece [mailto:[EMAIL PROTECTED] Sent: Tuesday, February 19, 2008 10:04 AM To: solr-user@lucene.apache.org Subject: Re: escaping special chars in query The bottom of the Lucene query syntax page:

RE: what's the schedule of the release of solr 1.3?

2008-03-01 Thread Lance Norskog
An alternative would be for someone to give a subversion checkout number against 1.3-dev which represents a solid working checkout. There are a lot of people using 1.3-dev in production, could you all please tell us what checkout number you are using? Cheers, Lance -Original Message-

Fastest Solr query

2008-03-01 Thread Lance Norskog
The fastest solr query I can find is any query on an unused dynamic field name: unused_dynamic_field_s:3 Is there another query style that should be faster? See this line in http://wiki.apache.org/solr/SolrConfigXml <pingQuery>q=solr&amp;version=2.0&amp;start=0&amp;rows=0</pingQuery> A better ping

RE: Use of get instead of post may be causing some problems

2008-03-06 Thread Lance Norskog
I just switched to doing posts for queries. We have a bunch of filters etc. and Solr stopped working on tomcat. -Original Message- From: Benson Margulies [mailto:[EMAIL PROTECTED] Sent: Thursday, March 06, 2008 12:43 PM To: solr-user Subject: Use of get instead of post may be causing

Finding an empty field

2008-03-13 Thread Lance Norskog
) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java: 159) Cheers, Lance Norskog

RE: sort by index id descending?

2008-03-19 Thread Lance Norskog
... another magic field name like score ... This could be done with a separate magic punctuation like $score, $mean (the mean score), etc., so $docid would work. Cheers, Lance -Original Message- From: Chris Hostetter [mailto:[EMAIL PROTECTED] Sent: Tuesday, March 18, 2008 9:01 PM To:

Preferential boosting

2008-03-20 Thread Lance Norskog
with duration 3 above the others? These do not work (at least for me): *:* OR duration:3^2.0 duration:[* TO *] duration:3^2.0 duration:3^2.0 OR -duration:3 Thanks, Lance Norskog

RE: Preferential boosting

2008-03-20 Thread Lance Norskog
at 3:13 PM, Lance Norskog [EMAIL PROTECTED] wrote: Suppose I have a schema with an integer field called 'duration'. I want to find all records, but if the duration is 3 I want those records to be boosted. The index has 10 records, with duration between 2 and 4. What is the query

RE: stopwords and phrase queries

2008-03-21 Thread Lance Norskog
. We solved this problem by making a separate indexed field with a simplified text type: no stopwords. Phrase searches go against the 'rawfield' and word searches go against it first. You may want to also filter out punctuation or Sound Of Music will not bring up Sound Of Music! Cheers, Lance

RE: How to index multiple sites with option of combining results in search

2008-03-26 Thread Lance Norskog
In fact, 55m records work fine in Solr, assuming they are small records. The problem is that the index files wind up in the tens of gigabytes. The logistics of doing backups, snapping to query servers, etc. is what makes this index unwieldy, and why multiple shards are useful. Lance

RE: synonyms

2008-03-28 Thread Lance Norskog
Lucas- Your examples are Portuguese and Spanish. You might find a Spanish-language stemmer that follows the very rigid conjugation in Spanish (and I'm assuming in Portuguese as well). Spanish follows conjugation rules that embed much more semantics than English, so a huge number of synonyms can

Facet Query

2008-04-11 Thread Lance Norskog
What do facet queries do that is different from the regular query? What is a use case where I would use a facet.query in addition to the regular query? Thanks, Lance Norskog From the wiki: http://wiki.apache.org/solr/SimpleFacetParameters#head-529bb9b985632b36cbd46a37bde9753772e47cdd

Meta: Mail quirk of solr-user

2008-04-11 Thread Lance Norskog
Hi- When I reply to a solr-user mail, the To: address is the sender instead of solr-user. Didn't it used to be solr-user? Lance

Lucene Modules - LucQE [lucky] Lucene Query Expansion Module

2008-04-24 Thread Lance Norskog
http://lucene-qe.sourceforge.net/ This is a much smarter technique for doing query expansion with synonyms, using Rocchio's Algorithm. Has anyone tried to shoehorn this into Solr? It's a little weird: it needs an analyser, a searcher, and a similarity function. It should be possible to refactor

RE: Solr with Auto-suggest

2008-04-25 Thread Lance Norskog
This is what the spellchecker does. It makes a separate Lucene index of n-gram letters and searches those. Works pretty well and it is outside the main index. I did an experimental variation indexing word pairs as phrases, and it worked well too. Lance Norskog -Original Message- From: Ryan

MultiCore on Wiki

2008-04-30 Thread Lance Norskog
The MultiCore writeup on the Wiki (http://wiki.apache.org/solr/MultiCore) says: ... Configuration-core-dataDir The data directory for a given core. (optional) How can a core not have its own dataDir? What happens if this is not set? Cheers, Lance Norskog

MultiCore and Distributed Search

2008-05-01 Thread Lance Norskog
Is Distributed Search in the main line yet? Is it considered useable? And, how closely does it match the Wiki entry? https://issues.apache.org/jira/browse/SOLR-303 http://wiki.apache.org/solr/DistributedSearch

RE: Help optimizing

2008-05-06 Thread Lance Norskog
One cause of out-of-memory is multiple simultaneous requests. If you limit the query stream to one or two simultaneous requests, you might fix this. No, Solr does not have an option for this. The servlet containers have controls for this that you have to dig very deep to find. Lance Norskog

RE: Help optimizing

2008-05-06 Thread Lance Norskog
There are two integer types, 'sint' and 'integer'. On an integer, you cannot do a range check (that makes sense). But! Lucene sort makes an array of integers for every record. On an integer field, it creates an integer array. On any other kind of field, each array item has a lot more. So, if you

RE: Multiple Index creation

2008-05-07 Thread Lance Norskog
To search against multiple Solrs, you can use http://wiki.apache.org/solr/DistributedSearch in Solr 1.3. This is not tied to the MultiCore feature. -Original Message- From: Shalin Shekhar Mangar [mailto:[EMAIL PROTECTED] Sent: Tuesday, May 06, 2008 9:28 PM To:

RE: Solr feasibility with terabyte-scale data

2008-05-09 Thread Lance Norskog
page 'SchemaDesignTips'. Cheers, Lance Norskog

RE: Indexing HTML Content

2008-05-22 Thread Lance Norskog
The HTMLStripReader tool worked very well for us. It handles garbled HTML well. The only hole we found was that it does not find alt-text attributes for images. Also, note that this code is written as a Java Reader class rather than a Solr class. This makes it useful for other projects. Given the

RE: Announcement of Solr Javascript Client

2008-05-27 Thread Lance Norskog
Nice! Another technique for the denial-of-service problem: you can regulate the number of simultaneous active servlets. Most servlet containers have a configuration for this somewhere. This will slow down legit users but will still avoid killing the server machine. -Original Message-

RE: How to describe 2 entities in dataConfig for the DataImporter?

2008-05-30 Thread Lance Norskog
You might try creating your whole transform as an SQL database view rather than with the Solr transformer toolkit. This would also make it easier to directly examine the data to be indexed. Lance -Original Message- From: Julio Castillo [mailto:[EMAIL PROTECTED] Sent: Thursday, May 29,

RE: Num docs

2008-06-07 Thread Lance Norskog
This appears in the stats.jsp page. Both the total of document 'slots' and the number of live documents. -Original Message- From: Marcus Herou [mailto:[EMAIL PROTECTED] Sent: Saturday, June 07, 2008 2:09 AM To: solr-user@lucene.apache.org Subject: Num docs Hi. Is there a way of

XSL scripting

2008-06-09 Thread Lance Norskog
This started out in the num-docs thread, but deserves its own. And a wiki page. There is a more complex and general way to get the number of documents in the index. I run a query against solr and postprocess the output with an XSL script. Install this xsl script as home/conf/xslt/numfound.xsl.

RE: UnicodeNormalizationFilterFactory

2008-06-24 Thread Lance Norskog
ISOLatin1AccentFilterFactory works quite well for us. It solves our basic euro-text keyboard searching problem, where protege should find protégé. (protege with two accents.) -Original Message- From: Chris Hostetter [mailto:[EMAIL PROTECTED] Sent: Tuesday, June 24, 2008 4:05 PM To:

Simple mistake in Wiki

2008-07-24 Thread Lance Norskog
Should this refer to facet.mincount instead of facet.limit? The default is true if facet.limit is greater than 0, false otherwise. http://wiki.apache.org/solr/SimpleFacetParameters facet.sort Set to true, this parameter indicates that constraints should be sorted by their count. If false,

RE: Out of memory on Solr sorting

2008-07-29 Thread Lance Norskog
A sneaky source of OutOfMemory errors is the permanent generation. If you add this: -XX:PermSize=64m -XX:MaxPermSize=96m You will increase the size of the permanent generation. We found this helped. Also note that when you undeploy a war file, the old deployment has permanent storage

RE: Administrative questions

2008-08-13 Thread Lance Norskog
I wrote shell tasks that start, stop, and heartbeat the server and run them from cron (unix). Heartbeat means: 1) is the tomcat even running, 2) does tomcat return the Solr admin page, 3) does Solr return a search. For an indexer, 4) does solr return from a commit. Stopping the server via the

RE: .wsdl for example....

2008-08-18 Thread Lance Norskog
Various Java web service libraries come with 'wsdl2java' and 'java2wsdl' programs. You just run 'java2wsdl' on the Java soap description. -Original Message- From: Ryan McKinley [mailto:[EMAIL PROTECTED] Sent: Monday, August 18, 2008 6:53 PM To: solr-user@lucene.apache.org Subject: Re:

RE: shards and performance

2008-08-21 Thread Lance Norskog
We found that searching by itself was faster with the Distributed multicore search over three cores in the same servlet engine than with just one core. Faceting and sorting use more memory than simple searches, and we could not do faceting on our one simple index. We needed this for data

RE: How to know if a field is null?

2008-08-23 Thread Lance Norskog
And, a negative query does not work, so if this is the only clause, you have to say: *:* AND -field:[* TO *] Where *:* is a special code for all documents. It's like learning a language: there is the normal grammar, there are the unusual cases, and then there are the bizarre slang expressions.
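
A minimal Python sketch of building that query string for an HTTP request follows. The field name 'duration' and the 'rows' value are placeholders chosen for illustration, and no request is actually sent; the round trip through parse_qs only shows that the encoding preserves the query.

```python
from urllib.parse import parse_qs, urlencode

def missing_field_query(field):
    """Build a query matching documents that have no value in `field`."""
    # A purely negative clause needs *:* (all documents) in front of it
    return f"*:* AND -{field}:[* TO *]"

# 'duration' is a placeholder field name; rows=0 asks for the count only
query = missing_field_query("duration")
params = urlencode({"q": query, "rows": 0})

# Decode again to confirm the special characters survive URL encoding
decoded = parse_qs(params)["q"][0]
print(params)
print(decoded)
```

The encoded string can be appended to whatever select URL the Solr installation exposes.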

RE: How to know if a field is null?

2008-08-25 Thread Lance Norskog
Has this been fixed in solr 1.3? -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Yonik Seeley Sent: Monday, August 25, 2008 5:44 AM To: solr-user@lucene.apache.org Subject: Re: How to know if a field is null? On Mon, Aug 25, 2008 at 5:33 AM, Erik Hatcher
