SolrJ/Solr version mismatch error

2012-12-11 Thread Sean Timm
I ran into this today, and it took me longer than it should have to figure out the problem, so I wanted to write and share my experience to save someone else some time. A web search and a search through the mail archives didn't provide any elucidation. If you run SolrJ 4.0.0 BETA connecting to

Re: Does SOLR provide a java class to perform url-encoding

2010-05-25 Thread Sean Timm
Java provides one. You probably want to use UTF-8 as the encoding scheme. http://java.sun.com/javase/6/docs/api/java/net/URLEncoder.html Note that you will also want to strip or escape characters that are meaningful in the Solr/Lucene query syntax.
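A minimal sketch of the two steps described above: escape the query-syntax characters first, then URL-encode the result as UTF-8. The class name, method names, and the exact list of special characters are assumptions made for illustration, not something taken from Solr itself.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

final class SolrQueryEncoding {
    // Assumed list of characters that carry meaning in the Lucene query syntax.
    private static final String SPECIAL = "\\+-!(){}[]^\"~*?:&|";

    // Prefix every special character with a backslash so it is treated literally.
    static String escapeQueryChars(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (SPECIAL.indexOf(c) >= 0) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    // Escape for the query parser, then percent-encode as UTF-8 for the URL.
    static String toQueryParam(String userInput) {
        try {
            return URLEncoder.encode(escapeQueryChars(userInput), "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required to be available on every Java platform.
            throw new IllegalStateException(e);
        }
    }
}
```

The order matters: the backslashes added for the query parser must themselves be percent-encoded, so escaping comes before URL-encoding.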

Re: AutoSuggest with custom sorting

2010-05-04 Thread Sean Timm
Chris Hostetter wrote: this can be accomplished by indexing a numeric field containing the length of the field as a number, and then doing a secondary sort on it. the fieldNorm typically takes care of this sort of thing for you, but is more of a generalized concept, and doesn't give you exact

DataImportHandler

2010-02-08 Thread Sean Timm
It looks like the dataimporter.functions.escapeSql(String) function escapes quotes, but fails to escape '\' characters which are problematic especially when the field value ends in a \. Also, on failure, I get an alarming notice of a possible resource leak. I couldn't find Jira issues for
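To illustrate the gap described above, here is a hypothetical escaping helper that doubles backslashes in addition to quotes. It is a sketch only: the class and method are made up for this example, it is not the DataImportHandler implementation, and whether a backslash needs escaping at all depends on the database (the sketch assumes one that treats `\` as an escape character inside string literals).

```java
final class SqlEscapeSketch {
    // Hypothetical helper: doubles backslashes, single quotes and double quotes.
    static String escapeSql(String value) {
        return value
                .replace("\\", "\\\\")   // \  ->  \\
                .replace("'", "''")      // '  ->  ''
                .replace("\"", "\"\"");  // "  ->  ""
    }
}
```

With this, a value ending in a backslash no longer swallows the closing quote of the SQL literal it is substituted into.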

[Fwd: [ANN] Solr 1.4.0 Released]

2009-11-10 Thread Sean Timm
---BeginMessage--- Apache Solr 1.4 has been released and is now available for public download! http://www.apache.org/dyn/closer.cgi/lucene/solr/ Solr is the popular, blazing fast open source enterprise search platform from the Apache Lucene project. Its major features include powerful

Re: [Fwd: [ANN] Solr 1.4.0 Released]

2009-11-10 Thread Sean Timm
Apologies. Meant to forward the message to a corporate internal list. I blame my e-mail address auto-complete. ;-) Sean Timm wrote: Subject: [ANN] Solr 1.4.0 Released From: Grant Ingersoll gsing...@apache.org Date

Re: how to pronounce solr

2009-05-08 Thread Sean Timm
This is the funniest e-mail I've had all day. SOLer is the typical pronunciation, but I've heard solAR as well. It's the description of pirate-like that made me chuckle. -Sean Charles Federspiel wrote: Hi, My company is evaluating different open-source indexing and search software and we

Re: what crawler do you use for Solr indexing?

2009-03-06 Thread Sean Timm
We too use Heritrix. We tried Nutch first but Nutch was not finding all of the documents that it was supposed to. When Nutch and Heritrix were both set to crawl our own site to a depth of three, Nutch missed some pages that were linked directly from the seed. We ended up with 10%-20% fewer pages

Re: what crawler do you use for Solr indexing?

2009-03-06 Thread Sean Timm
jobs. As of 'Heritrix writer', could you write the crawling results to XML or do you think inserting into MySQL would be better? And where can I find documentation for creating Heritrix writer? I really want to make it work for Solr. Thanks! Tony On Fri, Mar 6, 2009 at 8:08 AM, Sean Timm

Re: Query regarding setTimeAllowed(Integer) and setRows(Integer)

2009-02-18 Thread Sean Timm
- From: Sean Timm [mailto:tim...@aol.com] Sent: Wednesday, February 18, 2009 1:00 AM To: solr-user@lucene.apache.org Subject: Re: Query regarding setTimeAllowed(Integer) and setRows(Integer) Jana, Kumar Raja wrote: 2. If I set SolrQuery.setTimeAllowed(2000) Will this kill query processing

Re: Query regarding setTimeAllowed(Integer) and setRows(Integer)

2009-02-17 Thread Sean Timm
Jana, Kumar Raja wrote: 2. If I set SolrQuery.setTimeAllowed(2000) Will this kill query processing after 2 secs? (I know this question sounds silly but I just want a confirmation from the experts :) That is the idea, but only some of the code is within the timer. So, there are cases

Re: [VOTE] Community Logo Preferences

2008-11-26 Thread Sean Timm
https://issues.apache.org/jira/secure/attachment/12394165/solr-logo.png https://issues.apache.org/jira/secure/attachment/12394475/solr2_maho-vote.png https://issues.apache.org/jira/secure/attachment/12394350/solr.s4.jpg

Re: Solr security

2008-11-17 Thread Sean Timm
http://issues.apache.org/jira/browse/SOLR-527 (An XML commit only request handler) is pertinent to this discussion as well. -Sean Ian Holsman wrote: There was a patch by Sean Timm you should investigate as well. It limited a query so it would take a maximum of X seconds to execute

Re: Solr security

2008-11-17 Thread Sean Timm
I believe the Solr replication scripts require POSTing a commit to read in the new index--so at least limited POST capability is required in most scenarios. -Sean Lance Norskog wrote: About that read-only switch for Solr: one of the basic HTTP design guidelines is that GET should only return

date math in bf?

2008-10-15 Thread Sean Timm
Is it possible to do date math in a FunctionQuery? This doesn't work, but I'm looking for something like: bf=recip((NOW-updated),1,200,10) when using DisMax to get the elapsed time between NOW and when the document was updated (where updated is a Date field). I know one can do

Re: Solr vs. SOLR

2008-10-03 Thread Sean Timm
I heard a story that the 'r' in Solr back in the CNet days stood for Resin (the servlet container). True? Clearly the w/ replication makes more sense now as probably both Tomcat and Jetty deployments are more common now. Just curious, Sean Chris Hostetter wrote: : Can we spell out the

Re: dismax - undefined field exception

2008-09-22 Thread Sean Timm
Add echoParams=all to your URL and look for the cat field in one of the passed parameters. Specifically, in pf and qf. These can be defaulted in the solrconfig.xml file. -Sean Jon Drukman wrote: whenever i try to use qt=dismax i get the following error: Sep 22, 2008 11:50:48 AM

Re: problem index accented character with release version of solr 1.3

2008-09-18 Thread Sean Timm
From the XML 1.0 spec.: Legal characters are tab, carriage return, line feed, and the legal graphic characters of Unicode and ISO/IEC 10646. So, \005 is not a legal XML character. It appears the old StAX implementation was more lenient than it should have been and Woodstox is doing the

Re: admin/logging page and Effective level

2008-09-17 Thread Sean Timm
Chris-- Sorry, your e-mail got lost in the noise. You're right, there does appear to be a problem. I can reproduce this by setting the root level to OFF and then setting it back to INFO. I'll take a look into it. Have you opened a JIRA issue for this? -Sean Chris Hostetter wrote: I'm

Re: admin/logging page and Effective level

2008-09-17 Thread Sean Timm
I didn't see a bug on this issue, so I opened SOLR-774 with a patch to fix this. -Sean Sean Timm wrote: Chris-- Sorry, your e-mail got lost in the noise. You're right, there does appear to be a problem. I can reproduce this by setting the root level to OFF and then setting it back

Re: What's the bottleneck?

2008-09-17 Thread Sean Timm
://issues.apache.org/jira/browse/SOLR-502 (timeout searches) and https://issues.apache.org/jira/browse/LUCENE-997 This is committed on trunk and will be in 1.3. Don't ask me how it works, b/c I haven't tried it yet, but maybe Sean Timm or someone can help out. I'm not sure if returns partial

Re: How to boost the score higher in case user query matches entire field value than just some words within a field

2008-08-21 Thread Sean Timm
Length normalization in the Similarity class will generally favor shorter fields. For example, with the DefaultSimilarity, the length norm for a 2 term field is 0.625. For a three term field it is 0.5. The norm is multiplied by the score. I say generally will favor because the length norm
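The figures quoted above (0.625 for two terms, 0.5 for three) can be reproduced with a short sketch. It assumes the raw norm is 1/sqrt(numTerms) and that the norm is stored in a single byte that keeps the exponent and only the top two mantissa bits; the encoding below is an illustrative reimplementation written for this example, and the class and method names are not Lucene's.

```java
final class LengthNormSketch {
    // Assumed raw length norm: 1 / sqrt(number of terms in the field).
    static float rawNorm(int numTerms) {
        return (float) (1.0 / Math.sqrt(numTerms));
    }

    // Lossy one-byte encoding: drop the low 21 bits of the float,
    // then rebase the exponent so common norm values fit in 8 bits.
    static byte encode(float f) {
        int bits = Float.floatToRawIntBits(f);
        int small = bits >> 21;
        int zeroPoint = (63 - 15) << 3;
        if (small <= zeroPoint) {
            return (bits <= 0) ? (byte) 0 : (byte) 1; // zero/negative, or underflow
        }
        if (small >= zeroPoint + 0x100) {
            return (byte) -1; // overflow: largest representable value
        }
        return (byte) (small - zeroPoint);
    }

    // Inverse of encode(); the discarded mantissa bits come back as zeros.
    static float decode(byte b) {
        if (b == 0) {
            return 0.0f;
        }
        int bits = (b & 0xff) << 21;
        bits += (63 - 15) << 24;
        return Float.intBitsToFloat(bits);
    }

    // The norm as it would be read back after the lossy round trip.
    static float storedNorm(int numTerms) {
        return decode(encode(rawNorm(numTerms)));
    }
}
```

Under these assumptions the raw values 0.707 (two terms) and 0.577 (three terms) are rounded down to 0.625 and 0.5 by the encoding, which is why neighbouring field lengths can end up with identical norms.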

Re: How to boost the score higher in case user query matches entire field value than just some words within a field

2008-08-21 Thread Sean Timm
score length norm function, Doc2's score will be multiplied by 1.0f and Doc1 by 0.875f resulting in the desired behavior. Doc1: Chevrolet Tahoe Hybrid 2008 Doc2: Chevrolet Tahoe 2008 -Sean Mark Miller wrote: Sean Timm wrote: To solve this, we wrote our own Similarity class which extends

Re: How to boost the score higher in case user query matches entire field value than just some words within a field

2008-08-21 Thread Sean Timm
https://issues.apache.org/jira/browse/LUCENE-1360 Simon Hu wrote: I am definitely interested in trying your Similarity class. Can you please post the patch in jira? thanks -Simon Sean Timm wrote: In the example below, Doc1, and Doc2 will all have the same score for the query

Re: TimeExceededException

2008-07-31 Thread Sean Timm
This should be part of the lucene-core-2.4-dev.jar which is in lucene/solr/trunk/lib % unzip -l lucene-core-2.4-dev.jar | grep TimeLimitedCollector 251 06-19-08 08:57 org/apache/lucene/search/TimeLimitedCollector$1.class 1564 06-19-08 08:57

Re: Vote on a new solr logo

2008-07-31 Thread Sean Timm
So how about a run off between #2 (straight line family member with most votes) and #3 (normal font)? -Sean Yonik Seeley wrote: OK, so looking at family totals: 33 - the curvy family (9,10,11) 36 - #3 (normal font) 64 - straight line family Again 36 and 64 aren't directly comparable since

Re: SOLR Timeout

2008-07-10 Thread Sean Timm
If you have a number of long queries running, your system can become CPU bound resulting in low throughput and high response times. There are many ways you can construct a query that will cause it to take a long time to process, but the SOLR-502 patch can only address the ones where the work

Re: dismax query parser crash on double dash

2008-06-03 Thread Sean Timm
I can take a stab at this. I need to see why SOLR-502 isn't working for Otis first though. -Sean Bram de Jong wrote: On Tue, Jun 3, 2008 at 1:26 PM, Grant Ingersoll [EMAIL PROTECTED] wrote: +1. Fault tolerance good. ParseExceptions bad. Can you open a JIRA issue for it? If you feel

Re: dismax query parser crash on double dash

2008-06-02 Thread Sean Timm
It seems that the DisMaxRequestHandler tries hard to handle any query that the user can throw at it. From http://wiki.apache.org/solr/DisMaxRequestHandler: Quotes can be used to group phrases, and +/- can be used to denote mandatory and optional clauses ... but all other Lucene query parser

Re: Caching of DataImportHandler's Status Page

2008-04-25 Thread Sean Timm
Noble-- You should probably include SOLR-505 in your DataImportHandler patch. -Sean Noble Paul നോബിള്‍ नोब्ळ् wrote: It is caused by the new caching feature in Solr. The caching is done at the browser level. Solr just sends appropriate headers. We had raised an issue to disable that. BTW

Re: too many queries?

2008-04-16 Thread Sean Timm
Jonathan Ariel wrote: How do you to partition the data to a static set and a dynamic set, and then combining them at query time? Do you have a link to read about that? One way would be distributed search (SOLR-303), but distributed idf is not part of the current patch anymore, so you may

Re: Solr interprets UTF-8 as ISO-8859-1

2008-03-31 Thread Sean Timm
Send the URL with the å character URL encoded as %C3%A5. That is the UTF-8 URL encoding. http://myserver:8080/solrproducts/select/?q=all_SV:ljusbl%C3%A5+status:online&fl=id%2Cartno%2Ctitle_SV%2CtitleSort_SV%2Cdescription_SV%2C&sort=titleSort_SV+asc,id+asc&start=0&q.op=AND&rows=25 -Sean Daniel
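A small sketch showing why the charset matters: the same å character percent-encodes to two bytes under UTF-8 and to one byte under ISO-8859-1. The class and method names are made up for this example; `URLEncoder.encode(String, String)` is the standard JDK call.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

final class CharsetUrlEncoding {
    // Percent-encode a string using the named charset.
    static String encode(String s, String charset) {
        try {
            return URLEncoder.encode(s, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charset, e);
        }
    }
}
```

Here `encode("\u00e5", "UTF-8")` yields `%C3%A5`, while `encode("\u00e5", "ISO-8859-1")` yields `%E5`. If client and server disagree on the charset, the bytes are decoded as the wrong characters.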

Re: stopwords and phrase queries

2008-03-25 Thread Sean Timm
Music is another domain where this is a real problem. E.g., The The, The Who, not to mention the song and album names. -Sean Walter Underwood wrote: We do a similar thing with a no stopword, no stemming field. There are a surprising number of movie titles that are entirely stopwords. Being

Re: Dedup results on the fly?

2008-02-27 Thread Sean Timm
Take a look at https://issues.apache.org/jira/browse/SOLR-236 Field Collapsing. -Sean Head wrote: I would like to be able to tell SOLR to dedup the results based on a certain set of fields. For example, I like to return only one instance of the set of documents that have the same 'name' and

Re: DisMax deprecated?

2008-02-19 Thread Sean Timm
That is one of my peeves with the Solr Javadocs. Few of the @deprecated tags (if any) tell what you should be using instead. In this particular case, the answer is very simple. The class merely moved to a new package: from

Re: LowerCaseFilterFactory and spellchecker

2007-11-29 Thread Sean Timm
It seems the best thing to do would be to do a case-insensitive spellcheck, but provide the suggestion preserving the original case that the user provided--or at least make this an option. Users are often lazy about capitalization, especially with search where they've learned from web search

Re: leading wildcards

2007-11-15 Thread Sean Timm
Similarly, if you know that you are dealing with domain names or IP addresses (or other text with discrete parts), you can reverse the order of the parts rather than reversing at the character level, making it more human readable: com.example.www Your query would then be sent as com.example.* -Sean
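The part-reversal idea can be sketched as below. The class and method names are hypothetical; the same transformation would have to be applied both at index time and to the non-wildcard part of the query.

```java
final class ReversedHostKey {
    // Reverse the dot-separated parts: "www.example.com" -> "com.example.www".
    static String reverseParts(String host) {
        String[] parts = host.split("\\.");
        StringBuilder sb = new StringBuilder();
        for (int i = parts.length - 1; i >= 0; i--) {
            sb.append(parts[i]);
            if (i > 0) {
                sb.append('.');
            }
        }
        return sb.toString();
    }
}
```

A leading-wildcard search such as `*.example.com` then becomes a trailing-wildcard (prefix) search: `reverseParts("example.com") + ".*"` gives `com.example.*`.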

Re: Solr scoring: relative or absolute?

2007-08-22 Thread Sean Timm
Indexes cannot be directly compared unless they have similar collection statistics. That is, the same terms occur with the same frequency across all indexes and the average document lengths are about the same (though the default similarity in Lucene may not care about average document

Re: UTF-8 encoding problem on one of two Solr setups

2007-08-17 Thread Sean Timm
This may be your problem. The docs below are for the HTTP connector; similar configuration can be made for the AJP and other connectors. See http://tomcat.apache.org/tomcat-6.0-doc/config/http.html URIEncoding This specifies the character encoding used to decode the URI bytes, after %xx

Re: Creating a document blurb when nothing is returned from highlight feature

2007-08-09 Thread Sean Timm
It should probably be configurable: (1) return nothing if no match, (2) substitute with an alternate field, (3) return first sentence or N number of tokens. -Sean Yonik Seeley wrote on 8/9/2007, 5:50 PM: On 8/9/07, Benjamin Higgins [EMAIL PROTECTED] wrote: Thanks Mike. I didn't think of

Re: Ideas for a relevance score that could be considered stable across multiple searches with the same query structure?

2007-05-09 Thread Sean Timm
Yes, for good (hopefully) or bad. -Sean Shridhar Venkatraman wrote on 5/7/2007, 12:37 AM: Interesting.. Surrogates can also bring the searcher's subjectivity (opinion and context) into it by the learning process ? shridhar Sean Timm wrote: It may not be easy or even possible

Re: Ideas for a relevance score that could be considered stable across multiple searches with the same query structure?

2007-05-05 Thread Sean Timm
It may not be easy or even possible without major changes, but having global collection statistics would allow scores to be compared across searchers. To do this, the master indexes would need to be able to communicate with each other. Another approach to merging across searchers is

Re: Solr logo poll

2007-04-07 Thread Sean Timm
+1 Shridhar Venkatraman wrote on 4/7/2007, 12:13 AM: B is a bit cartoony (someone said that earlier)..mainly because of the letters, yet fresh. A appears dated (an 80's look). An alternate (C?) that retains the sunflare from B but changes the letters to be more staid may add the