Re: Best use of wildcard searches

2007-08-09 Thread Erick Erickson
I just saw an e-mail from Yonik suggesting escaping the space. I know so little about Solr that all I can do is parrot Yonik... Erick On 8/8/07, Matthew Runo [EMAIL PROTECTED] wrote: OK. So a followup question.. ?q=department_exact:Apparel%3EMen's%
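Escaping the space keeps the query parser from splitting a value like this into two clauses. Below is a minimal Java sketch of that idea. The set of special characters is an assumption, modeled on the behaviour of SolrJ's ClientUtils.escapeQueryChars rather than copied from it. The field name department_exact comes from the thread, but the example value is hypothetical because the original query is cut off. Escape first, then URL-encode.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class QueryEscaper {
    // Characters treated as query syntax here (assumed set, modeled on SolrJ's escaping).
    private static final String SPECIAL = "\\+-!():^[]\"{}~*?|&;/";

    /** Backslash-escapes query-syntax characters and whitespace in a single value. */
    public static String escape(String value) {
        StringBuilder sb = new StringBuilder();
        for (char c : value.toCharArray()) {
            if (SPECIAL.indexOf(c) >= 0 || Character.isWhitespace(c)) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    /** Builds a URL-encoded q parameter that searches the whole value in one field. */
    public static String qParam(String field, String value) {
        String q = field + ":" + escape(value);
        return "q=" + URLEncoder.encode(q, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Hypothetical value; the query in the original thread is truncated.
        String value = "Apparel>Men's Shoes";
        System.out.println(escape(value));                      // Apparel>Men's\ Shoes
        System.out.println(qParam("department_exact", value));
    }
}
```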

Re: Solr commit takes too long

2007-09-11 Thread Erick Erickson
Is there any chance you're optimizing each time you commit? Erick On 9/10/07, Marius Hanganu [EMAIL PROTECTED] wrote: Hi, We're having a problem when committing to SOLR. Our application commits right after each update - we need the data to be available instantaneously. The index's size is

Re: Redundant indexing * 4 only solution (for par/sen and case sensitivity)

2007-11-12 Thread Erick Erickson
DISCLAIMER: This is from a Lucene-centric viewpoint. That said, this may be useful. From your line-number, page-number etc. perspective, it is possible to index special guaranteed-to-not-match tokens, then use the TermDocs/TermEnum data, along with SpanQueries, to figure this out at search time.
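Here is a minimal Java sketch of the position arithmetic behind the marker-token idea. It assumes a hypothetical marker token "$$pagebreak$$" and plain whitespace tokenization, and it does not use Lucene. In a real index the positions would come from the span/term-position data mentioned above, and the marker would also have to survive your analyzer. Only the counting step is shown.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class PageMarkers {
    // Hypothetical marker token; it must be something no real text can produce.
    static final String PAGE_BREAK = "$$pagebreak$$";

    /** Flattens pages into one token stream, inserting the marker between pages. */
    static List<String> tokenize(List<String> pages) {
        List<String> tokens = new ArrayList<>();
        for (int p = 0; p < pages.size(); p++) {
            if (p > 0) tokens.add(PAGE_BREAK);
            for (String t : pages.get(p).toLowerCase(Locale.ROOT).split("\\s+")) {
                if (!t.isEmpty()) tokens.add(t);
            }
        }
        return tokens;
    }

    /** 1-based page of the token at a position: one more than the markers seen before it. */
    static int pageOf(List<String> tokens, int position) {
        int page = 1;
        for (int i = 0; i < position; i++) {
            if (tokens.get(i).equals(PAGE_BREAK)) page++;
        }
        return page;
    }

    public static void main(String[] args) {
        List<String> tokens = tokenize(List.of("The quick fox", "jumped over", "the lazy dog"));
        System.out.println(pageOf(tokens, tokens.indexOf("lazy")));   // 3
    }
}
```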

Re: Solr Highlighting, word index

2007-11-30 Thread Erick Erickson
It's good you already have the data, because if you somehow got it from some sort of calculation I'd have to tell my product manager that the feature he wanted, the one I told him couldn't be done with our data, was possible after all <G>... About page breaks: Another approach to paging is to index a

Re: Issues using keyword searching and facet search together in a search operation

2007-12-04 Thread Erick Erickson
I can't answer the question, but I *can* guarantee that the people who can will give you *much* better responses if you include some details. Like which analyzers you use, how you submit the query, samples of the two queries that work and the one that doesn't. Imagine you're on the receiving end

Re: Tomcat6 env-entry

2007-12-05 Thread Erick Erickson
The beautiful thing about a wiki is that *anybody* can update it. It's especially useful if someone who's just struggled through the issues can write something up, since the pain is still fresh <G>. Especially if you're better than I am about writing things down. All of which leads me to ask if

Re: Searching for two terms together in a multiValued TextField

2007-12-06 Thread Erick Erickson
Scoring isn't that simple, but don't ask me details <G>. This link might be useful: http://lucene.apache.org/java/docs/scoring.html Erick On Dec 6, 2007 2:15 PM, Phillip Farber [EMAIL PROTECTED] wrote: Hello Hoss, I appreciate your detailed response. I think I like your second alternative

Re: Solr, Multiple processes running

2007-12-11 Thread Erick Erickson
member's index (or indices - some users have multiple indices) separate. I can't give out the total number of Simpy users, but I can tell you it is weeell beyond 1000 :) Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: Erick Erickson

Re: Solr, Multiple processes running

2007-12-11 Thread Erick Erickson
How much data are we talking about here? Because it seems *much* simpler to just index a field with each document indicating the user and then just AND that user's ID in with your query. Or think about facets (although I admit I don't know enough about facets to weigh in on its merits, it's just
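Here is a minimal Java sketch of ANDing the user's ID into whatever query the user typed. The field name user_id is hypothetical, and IDs are assumed to be simple tokens that need no escaping. The parentheses matter: without them the AND would bind only to the last clause of the user's query.

```java
public class UserScopedQuery {
    /**
     * ANDs a user restriction onto an arbitrary query. The field name user_id is
     * hypothetical; IDs are assumed to be simple tokens needing no escaping.
     */
    static String scope(String userQuery, String userId) {
        if (!userId.matches("[A-Za-z0-9_]+")) {
            throw new IllegalArgumentException("unexpected user id: " + userId);
        }
        // Parentheses keep the AND from binding only to the last clause of userQuery.
        return "(" + userQuery + ") AND user_id:" + userId;
    }

    public static void main(String[] args) {
        System.out.println(scope("title:solr OR body:lucene", "42"));
        // (title:solr OR body:lucene) AND user_id:42
    }
}
```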

Re: Retrieving Tokens

2007-12-20 Thread Erick Erickson
I think that what Yonik wants is a higher-level response. *Why* do you want to process the tokens later? What is the use case you're trying to satisfy? Best Erick On Dec 20, 2007 1:37 AM, Rishabh Joshi [EMAIL PROTECTED] wrote: What are you trying to do with the tokens? Yonik, we wanted a

Re: Making stemming dynamic at query time

2007-12-20 Thread Erick Erickson
Well, you *still* have to store the stemmed and unstemmed versions in your index, otherwise you can't distinguish between, say, "run" and "running" because you'd have indexed "run" both times. But you could think about using special tokenizing. That is, for a word that's stemmed, index a stem form.

Re: solr and NFS in distributed deployment, real time indexing and real time searching

2007-12-20 Thread Erick Erickson
You might try searching the Lucene users list for NFS. I know there has been frequent discussion of locking issues etc. But since I'm not using an NFS mount, I just glossed over them. Also, my recollection is that many (most? all?) of the underlying issues have been dealt with in new versions

Re: Sorting within groups of equal-scored result documents

2008-01-11 Thread Erick Erickson
I did something like this in low-level Lucene using FieldSortedHitQueue. The searchable Lucene users list should have more details. Don't know how to do it in SOLR though... Erick On Jan 11, 2008 1:04 PM, Jörg Kiegeland [EMAIL PROTECTED] wrote: Hello, I have a query of the form (a or b).

Re: field:(-null) returns records where field was not specified

2008-01-14 Thread Erick Erickson
Have you seen this page? http://lucene.apache.org/java/docs/queryparsersyntax.html From that page: Note: The NOT operator cannot be used with just one term. For example, the following search will return no results: NOT "jakarta apache" Erick On Jan 14, 2008 9:30 AM, Karen Loughran [EMAIL
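A common workaround for a purely negative query is to give the negation something to subtract from by adding the match-all query *:*. Here is a naive Java sketch that does this only when the whole query is a single negated clause. The check is a hand-rolled assumption: it tracks quotes and parentheses but is not a real query parser. A mixed query like "-a b" is left alone, because adding *:* there would change its meaning.

```java
public class PureNegativeQuery {
    /** True if the query is one top-level clause starting with "-" or "NOT " (naive check). */
    static boolean isSingleNegatedClause(String q) {
        String rest;
        if (q.startsWith("-")) rest = q.substring(1);
        else if (q.startsWith("NOT ")) rest = q.substring(4).trim();
        else return false;
        int depth = 0;
        boolean inQuotes = false;
        for (char c : rest.toCharArray()) {
            if (c == '"') inQuotes = !inQuotes;
            else if (!inQuotes && c == '(') depth++;
            else if (!inQuotes && c == ')') depth--;
            // Whitespace outside quotes and parentheses means a second clause follows.
            else if (!inQuotes && depth == 0 && Character.isWhitespace(c)) return false;
        }
        return !rest.isEmpty();
    }

    /** Prepends the match-all query so the negation has documents to exclude from. */
    static String fix(String query) {
        String q = query.trim();
        return isSingleNegatedClause(q) ? "*:* " + q : q;
    }

    public static void main(String[] args) {
        System.out.println(fix("-color:red"));              // *:* -color:red
        System.out.println(fix("NOT \"jakarta apache\""));  // *:* NOT "jakarta apache"
        System.out.println(fix("title:solr"));              // title:solr
    }
}
```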

Re: Indexing very large files.

2008-01-16 Thread Erick Erickson
I don't think this is a StringBuilder limitation, but rather that your JVM doesn't start with enough memory (i.e. -Xmx). In raw Lucene, I've indexed 240M files. Best Erick On Jan 16, 2008 10:12 AM, David Thibault [EMAIL PROTECTED] wrote: All, I just found a thread about this on the

Re: Indexing very large files.

2008-01-16 Thread Erick Erickson
P.S. Lucene by default limits the maximum field length to 10K tokens, so you have to bump that for large files. Erick On Jan 16, 2008 11:04 AM, Erick Erickson [EMAIL PROTECTED] wrote: I don't think this is a StringBuilder limitation, but rather your Java JVM doesn't start with enough memory
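To see why a large file can look only partly indexed, here is a plain-Java simulation of the cut-off. It is not Lucene code. It assumes whitespace tokenization and uses the 10,000-token default mentioned above. As far as I know, in Solr 1.x the corresponding knob is maxFieldLength in solrconfig.xml.

```java
import java.util.Arrays;

public class FieldLengthLimit {
    // The default limit discussed in the thread: only the first 10,000 tokens are indexed.
    static final int DEFAULT_MAX_FIELD_LENGTH = 10_000;

    /** Simulates which whitespace-separated tokens survive a maxFieldLength cut-off. */
    static String[] indexedTokens(String text, int maxFieldLength) {
        String trimmed = text.trim();
        if (trimmed.isEmpty()) return new String[0];
        String[] tokens = trimmed.split("\\s+");
        return Arrays.copyOf(tokens, Math.min(tokens.length, maxFieldLength));
    }

    public static void main(String[] args) {
        // A hypothetical 25,000-word "large file": w0 w1 ... w24999
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 25_000; i++) sb.append('w').append(i).append(' ');
        String[] kept = indexedTokens(sb.toString(), DEFAULT_MAX_FIELD_LENGTH);
        System.out.println(kept.length);               // 10000
        System.out.println(kept[kept.length - 1]);     // w9999; later words are unsearchable
    }
}
```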

Re: Indexing very large files.

2008-01-16 Thread Erick Erickson
On 1/16/08, Erick Erickson [EMAIL PROTECTED] wrote: P.S. Lucene by default limits the maximum field length to 10K tokens, so you have to bump that for large files. Erick On Jan 16, 2008 11:04 AM, Erick Erickson [EMAIL PROTECTED] wrote: I don't think

Re: Some sort of join in SOLR?

2008-01-17 Thread Erick Erickson
I would *strongly* encourage you to store them together as one document. There's no real method of doing DB like joins in the underlying Lucene search engine. But that's generic advice. The question I have for you is What's the big deal about coordinating the sources? That is, you have to have

Re: Multisearching with Solr

2008-01-21 Thread Erick Erickson
You can always use the trunk build, but you'll have to check the status of SOLR-303 to be sure it's in the trunk... Here's a thread that discusses this... http://mail.google.com/mail/?zx=wmtcqx3ngeupshva=1#label/Solr/11799e3704804489 Best Erick On Jan 21, 2008 10:55 AM, David Pratt [EMAIL

Re: Multisearching with Solr

2008-01-21 Thread Erick Erickson
On Jan 21, 2008 11:34 AM, David Pratt [EMAIL PROTECTED] wrote: Hi Erick. Thank you for your reply. Unfortunately, I cannot access the link you provided. Is this message from the solr-user list? Many thanks. Regards, David Erick Erickson wrote: You can always use the trunk build

Re: Solr feasibility with terabyte-scale data

2008-01-22 Thread Erick Erickson
Just to add another wrinkle, how clean is your OCR? I've seen it range from very nice (i.e. 99.9% of the words are actually words) to horrible (60%+ of the words are nonsense). I saw one attempt to OCR a family tree. As in a stylized tree with the data hand-written along the various branches in

Re: Inverted Search Engine

2008-01-23 Thread Erick Erickson
As chance would have it, this was just discussed over on the Lucene user's list. See the thread: Inverted search / Search on profilenet. Best Erick On Jan 23, 2008 1:38 PM, George Everitt [EMAIL PROTECTED] wrote: Verity had a function called profiler which was essentially an inverted search

Re: Is it possible to add synonyms run time?

2008-01-25 Thread Erick Erickson
? Thanks for all the help! Ravish On Jan 25, 2008 3:59 PM, Erick Erickson [EMAIL PROTECTED] wrote: To me, it's really a question of where the work should be done given your problem space. Injecting synonyms at index time allows the queries to be simpler/faster. Injecting the synonyms

Re: Is it possible to add synonyms run time?

2008-01-25 Thread Erick Erickson
To me, it's really a question of where the work should be done given your problem space. Injecting synonyms at index time allows the queries to be simpler/faster. Injecting the synonyms at query time gets complex but is more flexible. As always, it's a time/space tradeoff. If you're willing to
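The tradeoff can be made concrete with a small Java sketch. The synonym group is hypothetical. With index-time expansion every synonym is stored, which costs index space and needs a reindex to change, but queries stay simple. With query-time expansion each term becomes an OR group, so queries get more complex, but the synonym list can change without reindexing.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class SynonymPlacement {
    // Hypothetical synonym group; every member maps to the full group.
    static final Map<String, List<String>> SYNONYMS = Map.of(
            "tv", List.of("tv", "television"),
            "television", List.of("tv", "television"));

    /** Index time: store every synonym (bigger index, simple queries, reindex to change). */
    static List<String> expandAtIndex(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (String t : tokens) {
            out.addAll(SYNONYMS.getOrDefault(t, List.of(t)));
        }
        return out;
    }

    /** Query time: turn a term into an OR group (more complex queries, editable without reindexing). */
    static String expandAtQuery(String term) {
        List<String> group = SYNONYMS.getOrDefault(term, List.of(term));
        return group.size() == 1 ? term : "(" + String.join(" OR ", group) + ")";
    }

    public static void main(String[] args) {
        System.out.println(expandAtIndex(List.of("cheap", "tv")));   // [cheap, tv, television]
        System.out.println(expandAtQuery("television"));             // (tv OR television)
    }
}
```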

Re: Search result not coming for normal special characters...

2008-02-08 Thread Erick Erickson
What analyzers are you using? Many analyzers (both index and query time) will remove non-alpha characters. Best Erick On Feb 7, 2008 1:14 PM, nithyavembu [EMAIL PROTECTED] wrote: Hi All, Now i am facing problem in special character search. I tried with the following special characters

Re: Search result not coming for normal special characters...

2008-02-09 Thread Erick Erickson
When in doubt, use WhitespaceAnalyzer and build up from there. It's the simplest. Look at the Lucene docs for what the various analyzers do under the covers. Note: WhitespaceAnalyzer does NOT transform to lowercase, you have to do that yourself or compose your own analyzer. Erick On Feb 9,
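Here is a standard-library model of the token output, showing what "build up from there" looks like. In Lucene you would compose a WhitespaceTokenizer with a LowerCaseFilter. The two helper methods below only imitate the tokens those produce and are not the Lucene classes. Note that special characters such as "+" survive whitespace splitting.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class WhitespacePlusLowercase {
    /** Splits on whitespace only, like WhitespaceAnalyzer: case and punctuation are kept. */
    static List<String> whitespaceTokens(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.split("\\s+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        return tokens;
    }

    /** The same tokens with a lowercasing step chained after, i.e. a composed analyzer. */
    static List<String> whitespaceLowercaseTokens(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : whitespaceTokens(text)) {
            tokens.add(t.toLowerCase(Locale.ROOT));
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(whitespaceTokens("Special C++ Chars"));           // [Special, C++, Chars]
        System.out.println(whitespaceLowercaseTokens("Special C++ Chars"));  // [special, c++, chars]
    }
}
```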

Re: Performance help for heavy indexing workload

2008-02-12 Thread Erick Erickson
Well, the *first* sort to the underlying Lucene engine is expensive since it builds up the terms to sort. I wonder if you're closing and opening the underlying searcher for every request? This is a definite limiter. Disclaimer: I mostly do Lucene, not SOLR (yet), so don't *even* ask me how to

Re: Alpha numeric sort problem

2008-02-14 Thread Erick Erickson
I admit I know little about SOLR, but wouldn't an AlphaOnlySorter ignore the digits? Erick On Thu, Feb 14, 2008 at 3:51 AM, Mahesh Udupa [EMAIL PROTECTED] wrote: Hello, I have following entry in my title list: Content1 Content2 Content3 Content4 Content5 If I try to Sort it in
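Suppose the title uses something like the alphaOnlySort type from Solr's example schema, which, as I remember it, lowercases, trims and strips every character outside a-z. Then the digits never reach the sort key. Here is a plain-Java approximation of that transformation (not Solr's actual filter chain):

```java
import java.util.List;
import java.util.Locale;

public class AlphaOnlySortKey {
    /** Approximates an alpha-only sort key: lowercase, trim, then drop every character outside a-z. */
    static String key(String title) {
        return title.toLowerCase(Locale.ROOT).trim().replaceAll("[^a-z]", "");
    }

    public static void main(String[] args) {
        for (String title : List.of("Content1", "Content2", "Content3", "Content4", "Content5")) {
            System.out.println(title + " -> " + key(title));   // every key is "content"
        }
    }
}
```

Because all five titles collapse to the same key, the digits play no part in their relative order.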

Re: quick question

2008-02-18 Thread Erick Erickson
Beating Hossman to the punch: http://people.apache.org/~hossman/#threadhijack Thread Hijacking on Mailing Lists When starting a new discussion on a mailing list, please do not reply to an existing message, instead start a fresh email. Even

Re: Survey: How do you store your fields?

2008-03-21 Thread Erick Erickson
As always, it depends. Just from a complexity perspective, my first choice is to store everything in one repository. If I can store everything in Lucene, I'm a happy camper. If I *must* use a database, I'd prefer to store everything there if possible. I only use both if I can't avoid it because

Re: synonyms

2008-03-28 Thread Erick Erickson
Your problem might be solved by (from memory, so check it) using a filter for indexing that collapses flexed (accented etc.?) characters. See ISOLatin1AccentFilter. Best Erick On Tue, Mar 25, 2008 at 1:56 PM, Lucas F. A. Teixeira [EMAIL PROTECTED] wrote: Hello all, We r having some
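To see what "collapsing" accented characters means, here is a standard-library approximation using Unicode decomposition. It is not the Lucene filter itself and may differ from it on characters such as ligatures. Whatever folding you choose has to be applied at both index and query time.

```java
import java.text.Normalizer;

public class AccentFolding {
    /** Decomposes characters (e.g. é -> e + combining accent) and drops the combining marks. */
    static String fold(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFD).replaceAll("\\p{M}+", "");
    }

    public static void main(String[] args) {
        System.out.println(fold("a\u00e7\u00e3o"));   // acao   (from "ação")
        System.out.println(fold("S\u00e3o Paulo"));   // Sao Paulo
    }
}
```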

Re: synonyms

2008-03-28 Thread Erick Erickson
. A. Teixeira [EMAIL PROTECTED] wrote: Thanks Erick, But it's already being used :-( still looking for something :-) Thank you! []s, Lucas Erick Erickson wrote: Your problem might be solved by (from memory, so check it), using a filter for indexing that collapses flexed (accented

Re: multi-language searching with Solr

2008-05-05 Thread Erick Erickson
You might want to bounce over to the Lucene user's list and search for "language". This topic has arisen many times and there's some good discussion. And have you searched the solr users list for "language"? I know it's turned up here as well. Best Erick On Mon, May 5, 2008 at 4:28 PM, Eli K [EMAIL

Re: Searching for empty fields

2008-05-07 Thread Erick Erickson
The really simple way is to index "none" for fields that are empty, then just search on color:none. On Tue, May 6, 2008 at 9:06 PM, Brendan Grainger [EMAIL PROTECTED] wrote: Hi, Not sure if this is what you want, but to search for 'empty' fields we use something like this: (*:* AND
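Here is a minimal Java sketch of filling in the placeholder before documents are sent for indexing. Documents are modeled as plain maps, which is an assumption. The placeholder "none" comes from the post and must be a value that can never occur for real.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class EmptyFieldPlaceholder {
    // Placeholder token from the thread; it must never occur as a real value.
    static final String NONE = "none";

    /** Copies the document, putting the placeholder in each listed field that is missing or blank. */
    static Map<String, String> fillEmpty(Map<String, String> doc, List<String> fields) {
        Map<String, String> out = new LinkedHashMap<>(doc);
        for (String field : fields) {
            String value = out.get(field);
            if (value == null || value.isBlank()) {
                out.put(field, NONE);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> doc = Map.of("id", "1", "color", "");
        System.out.println(fillEmpty(doc, List.of("color")).get("color"));   // none, so color:none finds it
    }
}
```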

Re: Solr hardware specs

2008-05-09 Thread Erick Erickson
This still isn't very helpful. How big are the docs? How many fields do you expect to index? What is your expected query rate? You can get away with an old laptop if your docs are, say, 5K each and you only expect to query it once a day and have one text field. If each doc is 10M, you're

Re: How Special Character '&' used in indexing

2008-05-09 Thread Erick Erickson
I don't see a semicolon at the end of your entity reference, is that a typo? i.e. &amp; On Fri, May 9, 2008 at 9:26 AM, Ricky [EMAIL PROTECTED] wrote: I have tried sending the '&amp' instead of '&' like the following, <field name="company">A &amp K Inc</field>. But i still get the same error entity
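The value needs a complete entity, &amp; including the semicolon. Rather than hand-editing values, the update XML can be generated with an escaping helper. Here is a minimal Java sketch: the helper names are hypothetical, and the five entities are the standard predefined XML ones.

```java
public class XmlEscape {
    /** Replaces the five XML-special characters with their predefined entities. */
    static String escape(String text) {
        StringBuilder sb = new StringBuilder();
        for (char c : text.toCharArray()) {
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '"': sb.append("&quot;"); break;
                case '\'': sb.append("&apos;"); break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }

    /** Builds one <field> element for an update message (hypothetical helper). */
    static String field(String name, String value) {
        return "<field name=\"" + escape(name) + "\">" + escape(value) + "</field>";
    }

    public static void main(String[] args) {
        System.out.println(field("company", "A & K Inc"));
        // <field name="company">A &amp; K Inc</field>
    }
}
```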

Re: slowdown after 15K queries

2008-06-02 Thread Erick Erickson
But are you sure you're not just masking the problem? That is, your limit may now be 90,000 queries... I always assume this kind of thing is a memory leak somewhere, have you any tools to monitor your memory consumption and see if that's ever-rising? Best Erick On Mon, Jun 2, 2008 at 10:38 AM,
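Tools such as jconsole watch memory from outside the process. A crude in-process check is also easy: sample the used heap periodically, and if it keeps climbing across many queries (even after garbage collection), that points to a leak. Here is a minimal Java sketch; the sample count and interval are arbitrary assumptions.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;

public class HeapWatch {
    /** Bytes of heap currently in use, as reported by the JVM's memory MXBean. */
    static long usedHeapBytes() {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        return memory.getHeapMemoryUsage().getUsed();
    }

    public static void main(String[] args) throws InterruptedException {
        long max = Runtime.getRuntime().maxMemory();   // roughly the -Xmx ceiling
        for (int i = 0; i < 5; i++) {
            System.out.printf("used %d MB of max %d MB%n",
                    usedHeapBytes() / (1024 * 1024), max / (1024 * 1024));
            Thread.sleep(1000);   // in practice, sample between batches of queries
        }
    }
}
```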

Re: which type of fields are to be compressed

2008-07-15 Thread Erick Erickson
Compression is only relevant for the original text, not the indexed part. So in terms of searching, it's irrelevant. Where it is relevant is when you *fetch* the document (e.g. doc = hits.doc(32)); that's when the decompression work is done (for stored documents). Depending upon your app, this may or may
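Here is a small Java illustration of where the cost lands. In the sketch, java.util.zip stands in for whatever compression the stored field actually uses. Compression happens once when the value is stored and decompression happens on every fetch. Matching never touches these bytes, because it runs against the indexed terms.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

public class StoredFieldCompression {
    /** Compresses a stored value: the work done once, at index time. */
    static byte[] compress(String text) {
        Deflater deflater = new Deflater();
        deflater.setInput(text.getBytes(StandardCharsets.UTF_8));
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[1024];
        while (!deflater.finished()) {
            int n = deflater.deflate(buf);
            out.write(buf, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    /** Decompresses a stored value: the cost paid each time the document is fetched. */
    static String decompress(byte[] data) {
        Inflater inflater = new Inflater();
        inflater.setInput(data);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[1024];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                if (n == 0 && inflater.needsInput()) {
                    throw new IllegalStateException("truncated compressed data");
                }
                out.write(buf, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalStateException(e);
        } finally {
            inflater.end();
        }
        return out.toString(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String body = "a long stored body of text ".repeat(200);
        byte[] stored = compress(body);
        System.out.println(body.length() + " chars stored as " + stored.length + " bytes");
        System.out.println(decompress(stored).equals(body));   // true
    }
}
```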

Re: Indexing time boosts on particular field

2008-08-05 Thread Erick Erickson
I think you want to boost specific clauses at *search* time, not index time. Something like adding a clause +CourseType:MATHMATICS^10 Best Erick On Tue, Aug 5, 2008 at 4:35 PM, Vicky_Dev [EMAIL PROTECTED] wrote: Hi Requirement: For given document , if course type = MATHMATICS then search

Re: Index size vs. number of documents

2008-08-13 Thread Erick Erickson
I'm surprised, as you are, by the non-linearity. Out of curiosity, what is your MaxFieldLength? By default only the first 10,000 tokens are added to a field per document. If you haven't set this higher, that could account for it. As far as I know, optimization shouldn't really affect the index

Re: Can I copy an index built on a Windows system to a Unix/Linux system?

2008-08-15 Thread Erick Erickson
I've done exactly this many times in straight Lucene. Since Solr is built on Lucene, I wouldn't anticipate any problems. Make sure your transfer is binary mode... Best Erick On Fri, Aug 15, 2008 at 8:02 AM, johnwarde [EMAIL PROTECTED] wrote: Hi, Can I copy an index built on a Windows

Re: Date field mystery

2008-09-15 Thread Erick Erickson
The guys who really know will be able to provide you much better feedback if you include your field definitions and probably your locale settings. And have you looked with Luke at your index to see what the data actually looks like for that field in that record? Is it possible that the date is

Re: How to copy a solr index to another index with a different schema collapsing stored data?

2008-09-17 Thread Erick Erickson
You *might* be able to reconstruct enough of the original documents from your indexes to create another without recrawling. I know Luke can reconstruct documents from an index, but for unstored data it's slow and may be lossy. But it may suit your needs given how long it takes to make your index

Re: Best practice advice needed!

2008-09-25 Thread Erick Erickson
How long does it take to build the entire index? Can you just rebuild it from scratch every night? That would be the simplest. Best Erick On Thu, Sep 25, 2008 at 12:48 PM, sundar shankar [EMAIL PROTECTED] wrote: Hi, We have an index of courses (about 4 million docs in prod) and we have a

Re: Does Solr Indexing Websites possible?

2008-10-01 Thread Erick Erickson
Have you looked at Nutch? It's built on top of Lucene and might be a better fit. But you simply must give more details about what your requirements are to get a meaningful answer. Imagine *you* were reading your e-mail without knowing anything except the information contained in the message. How

Re: one particular doc in results should always come first for a particular query

2010-04-05 Thread Erick Erickson
Hmmm, how do you know which particular record corresponds to which keyword? Is this a list known at index time, as in this record should come up first whenever bonkers is the keyword? If that's the case, you could copy the magic keyword to a different field (say magic_keyword) and boost it right

Re: exact match coming as second record

2010-04-05 Thread Erick Erickson
What do you get back when you specify debugQuery=on? Best Erick On Mon, Apr 5, 2010 at 7:31 PM, Mark Fletcher mark.fletcher2...@gmail.com wrote: Hi, I am using the dismax handler. I have a field named *myfield* which has a value say XXX.YYY.ZZZ. I have boosted myfield^20.0. Even with such

Re: Searching Lucene Indexes with Solr

2010-04-07 Thread Erick Erickson
Copying from another answer to this question on the list (See how to deploy index on SOLR)... It is possible but you have to take care to match Solr's schema with the structure of documents in the Lucene index. The correct field names and query-analyzers should be configured in schema.xml HTH

Re: solr.WordDelimiterFilterFactory problem with hyphenated terms?

2010-04-07 Thread Erick Erickson
Well, for a quick trial using trunk, I had to remove the UnicodeNormalizationFactory, is that yours? But with that removed, I get the results you do, ASSUMING that you've set your default operator to AND in schema.xml... Believe it or not, it all changes and all your queries return a hit if you

Re: solr.WordDelimiterFilterFactory problem with hyphenated terms?

2010-04-08 Thread Erick Erickson
illustrating the behavior and maybe poke around to see if it's an easy fix. Thanks Erick On Thu, Apr 8, 2010 at 8:16 AM, Robert Muir rcm...@gmail.com wrote: Erick, this sounds like https://issues.apache.org/jira/browse/SOLR-1852 On Wed, Apr 7, 2010 at 10:04 PM, Erick Erickson erickerick

Re: numFound:0 when documents exists

2010-04-08 Thread Erick Erickson
We can't help with the information you've provided. Please review: http://wiki.apache.org/solr/UsingMailingLists Best Erick On Thu, Apr 8, 2010 at 7:23 AM, Pooja Verlani pooja.verl...@gmail.com wrote: Hi, In our search engine, we are getting numFound to be 0 for some queries where documents

Re: solr.WordDelimiterFilterFactory problem with hyphenated terms?

2010-04-08 Thread Erick Erickson
effects am I forgetting about? thanks, Demian -----Original Message----- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: Wednesday, April 07, 2010 10:04 PM To: solr-user@lucene.apache.org Subject: Re: solr.WordDelimiterFilterFactory problem with hyphenated terms? Well

Re: solr.WordDelimiterFilterFactory problem with hyphenated terms?

2010-04-08 Thread Erick Erickson
. On Thu, Apr 8, 2010 at 10:01 AM, Erick Erickson erickerick...@gmail.com wrote: You're right, it sure looks related. But according to that JIRA, it's fixed in trunk and I'm pretty sure I have a very recent version that I built from code I updated within the last few days. I'll

Re: SOLR Exact match problem - Punctuations, double quotes etc.

2010-04-15 Thread Erick Erickson
What analyzer is your field using at index and query time? See: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters Some analyzers strip punctuation, some don't. Some lowercase, some don't. You can chain filters together to do

Re: SOLR Exact match problem - Punctuations, double quotes etc.

2010-04-16 Thread Erick Erickson
Well, I think that's part of your problem. WhitespaceAnalyzer does exactly what it says, splits on whitespace. So indexing carbon and searching carbon. won't generate a hit. If KeywordAnalyzer doesn't work for you, you could consider either using one of the Pattern* guys or write your own.
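Here is a short Java illustration of the mismatch and one pattern-based alternative. The regex (runs of letters or digits) is an assumption. Pick whatever matches what "exact" should mean for your data, for example whether hyphens or apostrophes should split words. The sketch imitates the token output of a pattern tokenizer and is not the Solr factory.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PunctuationTokens {
    // Runs of letters or digits; everything else (spaces, punctuation) separates tokens.
    private static final Pattern WORD = Pattern.compile("[\\p{L}\\p{N}]+");

    /** Whitespace-only splitting: "carbon." keeps its period and is a different term from "carbon". */
    static List<String> whitespaceTokens(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        return tokens;
    }

    /** Pattern-based splitting in the spirit of a pattern tokenizer: punctuation is dropped. */
    static List<String> patternTokens(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher m = WORD.matcher(text);
        while (m.find()) tokens.add(m.group());
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(whitespaceTokens("made of carbon."));   // [made, of, carbon.]
        System.out.println(patternTokens("made of carbon."));      // [made, of, carbon]
    }
}
```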

Re: Facet count problem

2010-04-18 Thread Erick Erickson
Can we see the actual field definitions from your schema file? Ahmet's question is vital and is best answered if you'll copy/paste the relevant configuration entries. But based on what you *have* posted, I'd guess you're trying to facet on tokenized fields, which is not recommended. You might
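Facet counts come from the indexed terms, so faceting on a tokenized field counts individual words rather than whole values. Here is a small Java simulation with hypothetical values. The tokenized branch assumes lowercasing plus whitespace splitting as the analysis.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

public class FacetOnTokens {
    /** Counts facet values; tokenize=true mimics faceting on an analyzed (tokenized, lowercased) field. */
    static Map<String, Integer> facetCounts(List<String> values, boolean tokenize) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String value : values) {
            List<String> terms = tokenize
                    ? Arrays.asList(value.toLowerCase(Locale.ROOT).split("\\s+"))
                    : List.of(value);
            for (String term : terms) {
                counts.merge(term, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> cities = List.of("New York", "New Delhi", "York");
        System.out.println(facetCounts(cities, false)); // {New Delhi=1, New York=1, York=1}
        System.out.println(facetCounts(cities, true));  // {delhi=1, new=2, york=2}
    }
}
```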

Re: Help using boolean operators

2010-04-19 Thread Erick Erickson
If you're submitting this: field1 : This is a good string then you're searching in field1 ONLY for "This". The tokens "is", "a", "good" and "string" are being searched against your default search field as defined in your schema. Have you tried parenthesizing? Try the SOLR admin page for looking at

Re: LucidWorks Solr

2010-04-19 Thread Erick Erickson
This is a little bit of hijacking going on here, but It's algorithmic. That is, there isn't a list of variants that stem to the same infinitive, and your statement "always the same infinitive for any derivate of the word" isn't quite what happens. Stemmers will always produce the same

Re: LucidWorks Solr

2010-04-19 Thread Erick Erickson
no big deal, just wanted to mention. On Mon, Apr 19, 2010 at 1:24 PM, dar...@ontrenet.com wrote: This is a little bit of hijacking going on here, but You are right. Accept my regrets. It's algorithmic. That is, there isn't a list of variants that stem to the same infinitive, and

Re: Help using boolean operators

2010-04-19 Thread Erick Erickson
Did you try parenthesizing: field1:(This is a good string) You can try lots of things easily by going to http://localhost:8983/solr/admin/form.jsp and clicking the debug enable checkbox... HTH Erick On Mon, Apr 19, 2010 at 12:23 PM, MitchK mitc...@web.de wrote: Erick, I am a little bit
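Here is a deliberately simplified Java model of which field each word ends up searched in, with and without the parentheses. The default field name "text" is hypothetical. The model ignores analysis (lowercasing, stopword removal) and the default operator. The debug output on the admin page remains the authoritative view of the parsed query.

```java
import java.util.ArrayList;
import java.util.List;

public class FieldScope {
    /** Models q=field:w1 w2 w3: only w1 is bound to field, the rest go to the default field. */
    static List<String> unparenthesized(String field, String defaultField, String text) {
        String[] words = text.trim().split("\\s+");
        List<String> clauses = new ArrayList<>();
        for (int i = 0; i < words.length; i++) {
            clauses.add((i == 0 ? field : defaultField) + ":" + words[i]);
        }
        return clauses;
    }

    /** Models q=field:(w1 w2 w3): every word is bound to field. */
    static List<String> parenthesized(String field, String text) {
        List<String> clauses = new ArrayList<>();
        for (String w : text.trim().split("\\s+")) {
            clauses.add(field + ":" + w);
        }
        return clauses;
    }

    public static void main(String[] args) {
        String words = "This is a good string";
        System.out.println(unparenthesized("field1", "text", words));
        // [field1:This, text:is, text:a, text:good, text:string]
        System.out.println(parenthesized("field1", words));
        // [field1:This, field1:is, field1:a, field1:good, field1:string]
    }
}
```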

Re: Help using boolean operators

2010-04-19 Thread Erick Erickson
earlier too. To test query parsing, submit your query to http://localhost:8983/solr/select?q=your_query&debugQuery=true and look at the parsed query output. Erik On Apr 19, 2010, at 6:45 PM, Erick Erickson wrote: Did you try parenthesizing: field1:(This is a good string) You can try

Re: Help using boolean operators

2010-04-20 Thread Erick Erickson
:this +field1:good +field1:string Is that ok to do? Thanks, Sandhya -----Original Message----- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: Tuesday, April 20, 2010 4:16 AM To: solr-user@lucene.apache.org Subject: Re: Help using boolean operators Did you try parenthesizing: field1

Re: performance of million documents search

2010-04-25 Thread Erick Erickson
NGrams might help here, search the SOLR list for NGram and I think you'll find that this subject has been discussed several times... HTH Erick On Sat, Apr 24, 2010 at 9:26 PM, weiqi wang weiqi...@gmail.com wrote: Hi, I have about 2 million documents in my index. I want to search them by a

Re: Slow Date-Range Queries

2010-04-29 Thread Erick Erickson
Hmmm, what does the rest of your query look like? And does adding debugQuery=on show anything interesting? Best Erick On Thu, Apr 29, 2010 at 6:54 AM, Jan Simon Winkelmann winkelm...@newsfactory.de wrote: ((valid_from:[* TO 2010-04-29T10:34:12Z]) AND (valid_till:[2010-04-29T10:34:12Z TO

Re: Evangelism

2010-04-29 Thread Erick Erickson
This is a Lucene story, but may well apply... By the time I'd sent a request for assistance to the vendor of one of our search tools and received the reply you didn't give us the right license number, I'd found Lucene, indexed part of my corpus and run successful searches against it. And had

Re: Solr commit issue

2010-05-01 Thread Erick Erickson
The underlying IndexReader must be reopened. If you're searching for a document with a searcher that was opened before the document was indexed, it won't show up on the search results. I'm guessing that your statement that when you search for it with some test is coincidence, but that's just a

Re: Embedded Server and Webapp using same Index???

2010-05-01 Thread Erick Erickson
The problem here, I think, is that you're updating the index in a manner that the regular SOLR webapp doesn't know about. So the index changes without SOLR knowing it has to reopen the index to see the modifications. Something to try: curl http://localhost:8983/solr/update -F stream.body=' commit

Re: OutOfMemoryError when using query with sort

2010-05-03 Thread Erick Erickson
How many unique terms are in your sort field? On Sun, May 2, 2010 at 11:48 PM, Hamid Vahedi hvb...@yahoo.com wrote: I install 64 bit windows and my problem solved. also i using shard mode (100 M doc per machine with one solr instance) is there better solution? because i insert at least 5M doc

Re: SpellChecking

2010-05-03 Thread Erick Erickson
It would help a lot to see your actual config file, and if you provided a bit more detail about what failure looks like Best Erick On Mon, May 3, 2010 at 9:43 AM, Jan Kammer jan.kam...@mni.fh-giessen.dewrote: Hi there, I want to enable spellchecking, but i got many fields. I tried

Re: Commit takes 1 to 2 minutes, CPU usage affects other apps

2010-05-05 Thread Erick Erickson
The mail servers are often not too friendly with attachments, so people either inline configs or put them on a server and post the URL. HTH Erick On Wed, May 5, 2010 at 12:06 PM, Markus Fischer mar...@fischer.name wrote: Hi, On 05.05.2010 03:49, Chris Hostetter wrote: : Are you

Re: How to do partial beginning matches

2010-05-06 Thread Erick Erickson
There's really no connection between NGrams and *. NGrams can be used to handle hairy wildcard expressions, in particular searching for things like *blah* is one potential use of NGrams. But your problem is simple to solve without bothering with NGrams, just use the begin* syntax, no special
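For the *blah* case, here is a minimal Java sketch of what an n-gram approach indexes. Every substring of a chosen length range becomes its own term, so an infix search turns into an ordinary term lookup. The gram sizes are assumptions. For a plain begin* prefix search none of this is needed, as noted above.

```java
import java.util.LinkedHashSet;
import java.util.Set;

public class NGramSketch {
    /** All character n-grams of term with lengths minN..maxN, i.e. the extra terms an n-gram filter indexes. */
    static Set<String> ngrams(String term, int minN, int maxN) {
        Set<String> grams = new LinkedHashSet<>();
        for (int n = minN; n <= maxN; n++) {
            for (int i = 0; i + n <= term.length(); i++) {
                grams.add(term.substring(i, i + n));
            }
        }
        return grams;
    }

    public static void main(String[] args) {
        // With "xxblahyy" indexed as grams, the infix search *blah* becomes a lookup of the term "blah".
        Set<String> indexed = ngrams("xxblahyy", 3, 4);
        System.out.println(indexed.contains("blah"));   // true
    }
}
```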

Re: How to do partial beginning matches

2010-05-06 Thread Erick Erickson
, or something? Thanks, Felix 2010/5/6 Erick Erickson erickerick...@gmail.com There's really no connection between NGrams and *. NGrams can be used to handle hairy wildcard expressions, in particular searching for things like *blah* is one potential use of NGrams. But your problem

Re: solr Query taking a huge time

2010-05-11 Thread Erick Erickson
You really have to give some more details about *why* you issue such a query and what you are measuring, search time? total response time (which would include network transmission)?. *Of course* matching 1.8M records will take some time.. especially if you're trying to return the entire set of

Re: negative numbers in range

2010-05-11 Thread Erick Erickson
We really need to see your schema definitions for the relevant field. For instance, if you're storing these as text you may just be losing the negative sign which would lead to all sorts of interesting failures.. Best Erick On Tue, May 11, 2010 at 9:53 AM, Christopher Gross

Re: how to patch solr-236 in mac os

2010-05-11 Thread Erick Erickson
In Eclipse (you *may* need to have the Subclipse plugin installed), just right-click on the project > Team > Apply Patch and follow the wizard. HTH Erick On Tue, May 11, 2010 at 12:50 PM, Jonty Rhods jonty.rh...@gmail.com wrote: hi David, thanks for quick reply.. please give me full command. so

Re: Weird Behavior When Querying Field of Type String

2010-05-11 Thread Erick Erickson
What is the debug output of the query? That would shed some light on the issue... Best Erick On Tue, May 11, 2010 at 5:48 PM, Alex Wang aw...@crossview.com wrote: Hi, I am getting a weird behavior in my Solr (1.4) index: I have a field defined as follows: field name=productType

Re: Weird Behavior When Querying Field of Type String

2010-05-12 Thread Erick Erickson
On May 11, 2010, at 7:13 PM, Erick Erickson wrote: What is the debug output of the query? That would shed some light on the issue... Best Erick On Tue, May 11, 2010 at 5:48 PM, Alex Wang aw...@crossview.com

Re: Weird Behavior When Querying Field of Type String

2010-05-12 Thread Erick Erickson
the raw *indexed* terms from the admin console? I am not familiar with the admin console. Thanks, On May 12, 2010, at 10:18 AM, Erick Erickson wrote: Hmmm, nothing looks odd about that, except perhaps the casing. If you use the admin console to look at the raw terms, is productbean mixed case

Re: multi-valued associated fields

2010-05-12 Thread Erick Erickson
queries on the data you intend to. Regards Eric On Wed, May 12, 2010 at 3:12 PM, Erick Erickson erickerick...@gmail.com wrote: I'm not entirely sure this is germane, but there's absolutely no requirement that all documents in SOLR have the same fields. So it's possible for you to index

Re: Weird Behavior When Querying Field of Type String

2010-05-12 Thread Erick Erickson
tell me how to find the raw *indexed* terms from the admin console? I am not familiar with the admin console. Thanks, On May 12, 2010, at 10:18 AM, Erick Erickson wrote: Hmmm, nothing looks odd about that, except perhaps the casing. If you use the admin console to look at the raw terms

Re: Strange behavior for certain words

2010-05-12 Thread Erick Erickson
Hmmm, there's not much information to go on here. You might review this page: http://wiki.apache.org/solr/UsingMailingLists and post with more information. At minimum, the field definitions, the query output (include debugQuery=on), perhaps what comes out of the analysis admin page for both

Re: SolrUser - ERROR:SCHEMA-INDEX-MISMATCH

2010-05-13 Thread Erick Erickson
Not at present, you must re-index your documents when you redefine your schema to change existing documents. Field updating of documents already indexed is being worked on, but it's not available yet. Best Erick On Thu, May 13, 2010 at 3:58 PM, Anderson vasconcelos anderson.v...@gmail.com

Re: SolrUser - Reindex

2010-05-13 Thread Erick Erickson
Probably your analyzer is removing the @ symbol, it's hard to say if you don't include the relevant parts of your schema. This page might help: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters Best Erick On Thu, May 13, 2010

Re: SolrUser - Reindex

2010-05-13 Thread Erick Erickson
HTMLStripStandardTokenizerFactory ? Thanks 2010/5/13 Erick Erickson erickerick...@gmail.com Probably your analyzer is removing the @ symbol, it's hard to say if you don't include the relevant parts of your schema. This page might help: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters http

Re: SolrUser - ERROR:SCHEMA-INDEX-MISMATCH

2010-05-13 Thread Erick Erickson
, and this works. This is the way that i must go on? (This could generate a trouble in the future?) What's the advantages to set the field type to long? I must maintain this field in string type? Thanks 2010/5/13 Erick Erickson erickerick...@gmail.com Not at present, you must re-index your

Re: match to non tokenizable word (helloworld)

2010-05-16 Thread Erick Erickson
You might want to look at ngrams and/or shingles. In this case I suspect that ngrams are better suited; I don't think shingles apply in the direction you stated, but your problem description is so short I thought I'd mention it. Although your collection of words can work (think synonyms) if

Re: Solr Search problem; cannot search the existing word in the index content

2010-05-17 Thread Erick Erickson
A couple of things: 1) try searching with debugQuery=on attached to your URL, that'll give you some clues. 2) It's really worthwhile exploring the admin pages for a while, it'll also give you a world of information. It takes a while to understand what the various pages are telling you, but you'll

Re: sort by field length

2010-05-24 Thread Erick Erickson
Are you sure you want to recompute the length when sorting? It's the classic time/space tradeoff, but I'd suggest that when your index is big enough to make taking up some more space a problem, it's far too big to spend the cycles calculating each term length for sorting purposes considering you
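Here is a minimal Java sketch of paying for the length once, at index time, so that sorting only compares stored integers. The document shape and the titleLength field are hypothetical, and counting whitespace-separated tokens as the "length" is an assumption.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class LengthAtIndexTime {
    // Hypothetical document shape: titleLength is an extra field filled in at index time.
    record Doc(String id, String title, int titleLength) {}

    /** Pays the length computation once, when the document is indexed. */
    static Doc index(String id, String title) {
        String trimmed = title.trim();
        int length = trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
        return new Doc(id, title, length);
    }

    /** Sorting then only compares stored integers; nothing is recomputed per query. */
    static List<Doc> sortByLength(List<Doc> docs) {
        List<Doc> sorted = new ArrayList<>(docs);
        sorted.sort(Comparator.comparingInt(Doc::titleLength));
        return sorted;
    }

    public static void main(String[] args) {
        List<Doc> docs = List.of(index("a", "one two three"), index("b", "one"), index("c", "one two"));
        for (Doc d : sortByLength(docs)) {
            System.out.println(d.id() + " " + d.titleLength());   // b 1, c 2, a 3
        }
    }
}
```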

Re: sort by field length

2010-05-25 Thread Erick Erickson
this way? The relevance calculations already factor in both term frequency and field length. What's the use-case for sorting by field length given the above? Best Erick On Tue, May 25, 2010 at 3:40 AM, Sascha Szott sz...@zib.de wrote: Hi Erick, Erick Erickson wrote: Are you sure you want

Re: question about indexing...

2010-05-25 Thread Erick Erickson
Don't forget to re-index after you make the change Lance suggested... Erick On Tue, May 25, 2010 at 4:51 PM, Lance Norskog goks...@gmail.com wrote: Change type=string to type=text. This causes the field to be analyzed and then searching on words finds the document. On Tue, May 25, 2010 at
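A simplified model of the difference Lance describes: a string field is indexed as a single untokenized term, so only the exact whole value matches, while a text field is analyzed into words. The text-field function here only lowercases and splits on whitespace; a real text type typically does more (stopwords, stemming, and so on).

```python
def matches_string_field(stored_value, query):
    # String field: the whole value is one term, so only an exact match hits
    return stored_value == query

def matches_text_field(stored_value, query):
    # Simplified text field: lowercased and split into words
    return query.lower() in stored_value.lower().split()

value = "Red Leather Jacket"
print(matches_string_field(value, "Jacket"))  # False
print(matches_text_field(value, "Jacket"))    # True
```

Because the change affects how values are stored in the index, documents indexed under the old type keep their old terms until they are re-indexed.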

Re: sort by field length

2010-05-26 Thread Erick Erickson
leave as an exercise for the reader. I really think you're reinventing the wheel here and looking at the default scoring mechanism would be a good use of your time. Best Erick On Wed, May 26, 2010 at 4:04 AM, Sascha Szott sz...@zib.de wrote: Hi Erick, Erick Erickson wrote: Ah, I may have

Re: Does SOLR Allow q= (A or B) AND (C or D)?

2010-05-27 Thread Erick Erickson
You can get a lot of mileage out of the admin analysis page and the full interface page, especially by turning on the debug option on the admin full interface page. It takes a bit of practice to read the debug output, but it's really, really, really worth it. Best Erick On Thu, May 27, 2010

Re: Prefix-Search with Stopwords - no results?

2010-05-28 Thread Erick Erickson
Hmmm, I don't really see the problem here. I'll have to use English examples... Searching on "the*" (assuming "the" is a stopword) will search on (them OR theory OR thespian), assuming those three words are in your index. It will NOT search on "the". So I think you're OK, or are you seeing anomalous
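A toy model of the behavior Erick describes: stopwords are dropped before terms reach the index, and a prefix query is expanded into the indexed terms that start with the prefix. The stopword list and sample texts are made up for this sketch.

```python
STOPWORDS = {"the", "a", "of"}

def index_terms(texts):
    # Stopwords are removed at analysis time, so "the" never becomes a term
    return sorted({w for t in texts for w in t.lower().split() if w not in STOPWORDS})

def expand_prefix(prefix, terms):
    # A prefix query such as the* is rewritten into matching indexed terms
    return [t for t in terms if t.startswith(prefix)]

terms = index_terms(["The theory of them", "A thespian"])
print(expand_prefix("the", terms))  # ['them', 'theory', 'thespian']
```

The stopword itself is absent from the expansion simply because it was never indexed.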

Re: Changing schema without having to reindex

2010-05-28 Thread Erick Erickson
No. You can add new documents which will reflect the new schema, but you can't retroactively update your index. In your specific example, it's not possible to losslessly recreate the data to store from the indexed fields. Consider stopword removal, or lowercasing. HTH Erick On Fri, May 28, 2010
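Why the indexed form can't be turned back into the original text: analysis is lossy. In this sketch, which assumes an analyzer that only lowercases and removes a few stopwords, two different inputs produce identical tokens, so there is no way to tell which original they came from.

```python
STOPWORDS = {"the", "a", "an"}

def analyze(text):
    # Lowercase and drop stopwords, as many text analyzers do
    return [w for w in text.lower().split() if w not in STOPWORDS]

print(analyze("The Quick Fox"))  # ['quick', 'fox']
print(analyze("A quick fox"))    # ['quick', 'fox']
# Same tokens from different originals: the stored text can't be rebuilt from them
```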

Re: Storing different entities in Solr

2010-05-28 Thread Erick Erickson
You most certainly *can* store the many-many relationship, you are just denormalizing your data. I know it goes against the grain of any good database admin, but it's very often a good solution for a search application. You've gotta forget almost everything you learned about how data *should* be
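A minimal sketch of denormalizing a many-to-many relationship into flat search documents: each book becomes one document and its authors' names are copied into a multi-valued field. The tables, field names and data are all hypothetical.

```python
# Relational-style data: authors, books, and a link table between them
authors = {1: "Ann", 2: "Bob"}
books = {10: "Search Basics", 11: "Scaling Search"}
book_authors = [(10, 1), (10, 2), (11, 2)]

# Denormalize: one flat document per book, author names copied in
docs = []
for book_id, title in books.items():
    names = [authors[a] for b, a in book_authors if b == book_id]
    docs.append({"id": book_id, "title": title, "author": names})

print(docs)
```

The cost is duplication ("Bob" is stored with every book he wrote, and renaming him means re-indexing those books), but a search on author:Bob needs no join.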

Re: Prefix-Search with Stopwords - no results?

2010-05-29 Thread Erick Erickson
Well, the index does, indeed, get bigger. But the searches get much faster because there's no term expansion going on. It's another time/space tradeoff. I'm afraid you'll have to just experiment a bit to see if this is an acceptable tradeoff in your particular situation. The real memory hit
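A sketch of the tradeoff, assuming the approach under discussion is indexing every leading prefix of each word (edge n-grams) so that a prefix search becomes a single term lookup. The vocabulary and the max_len limit are made up for illustration.

```python
def edge_ngrams(word, min_len=1, max_len=10):
    # Every leading prefix of the word, from min_len to max_len characters
    return [word[:n] for n in range(min_len, min(len(word), max_len) + 1)]

vocabulary = ["them", "theory", "thespian"]

prefix_index = {}
for word in vocabulary:
    for p in edge_ngrams(word):
        prefix_index.setdefault(p, set()).add(word)

# Space: many more terms are stored than the three original words
print(len(prefix_index))
# Time: "the*" is now one dictionary lookup instead of a scan over all terms
print(sorted(prefix_index["the"]))  # ['them', 'theory', 'thespian']
```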

Re: Luke browser does not show non-String Solr fields?

2010-05-30 Thread Erick Erickson
The Solr admin page has access to (and uses) the field definitions you've put in the config file. Luke has no knowledge of this configuration; you have to choose your analyzer from the drop-down and select the one closest to what's in your config file for Solr. Are you perhaps using an analyzer in

Re: Luke browser does not show non-String Solr fields?

2010-05-30 Thread Erick Erickson
that the default is PersianAnalyzer. I switched to StandardAnalyzer and tried a few different Lucene Compatibility values but it didn't help :-( On Sun, May 30, 2010 at 4:40 AM, Erick Erickson erickerick...@gmail.com wrote: The Solr admin page has access to (and uses) the field definitions

Re: Luke browser does not show non-String Solr fields?

2010-05-30 Thread Erick Erickson
to show non-string values, too. On Sun, May 30, 2010 at 10:57 AM, Erick Erickson erickerick...@gmail.com wrote: Then you have to provide a lot more detail about what you did and what you're seeing and what you think you should see. You might review this page: http://wiki.apache.org/solr

Re: index growing with updates

2010-06-03 Thread Erick Erickson
Assuming your config is set up to replace unique keys, you're really doing a delete and an add (under the covers). It could very well be that the deleted version of the document is still in your index taking up space and will be until it is purged. HTH Erick On Thu, Jun 3, 2010 at 10:22 AM,
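A toy model of "update = delete + add": re-adding a document with an existing unique key marks the old copy deleted and appends a new one, and the deleted copy keeps taking space until it is purged. The class and method names are invented for this sketch; max_doc and num_docs loosely mirror the maxDoc/numDocs counts Lucene reports, and purge stands in for segment merging or optimizing.

```python
class ToyIndex:
    """Toy model: updating by unique key marks the old copy deleted, then appends."""

    def __init__(self):
        self.docs = []  # list of (unique_key, fields, deleted_flag)

    def add(self, key, fields):
        # Mark any existing copy with this key as deleted, then add the new one
        self.docs = [(k, f, d or k == key) for k, f, d in self.docs]
        self.docs.append((key, fields, False))

    def max_doc(self):
        return len(self.docs)  # counts deleted copies still occupying space

    def num_docs(self):
        return sum(1 for _, _, d in self.docs if not d)

    def purge(self):
        # Stand-in for a merge/optimize that physically drops deleted documents
        self.docs = [t for t in self.docs if not t[2]]

idx = ToyIndex()
idx.add("1", {"title": "v1"})
idx.add("1", {"title": "v2"})
print(idx.num_docs(), idx.max_doc())  # 1 2
idx.purge()
print(idx.max_doc())                  # 1
```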
