Re: How fast is Solr insert or am i doing something wrong?

2007-01-29 Thread Erik Hatcher
On Jan 29, 2007, at 7:08 PM, Yonik Seeley wrote: On 1/29/07, Antonio Eggberg <[EMAIL PROTECTED]> wrote: Is it good practice to do a commit after every insert? Is this what is taking the time? Is there any general rule of thumb? Definitely don't do a commit after every insert. Do a single one

Re: SV: Re: How fast is Solr insert or am i doing something wrong?

2007-01-29 Thread Erik Hatcher
Wow, I'm in awe of the uptake of solrb already! Answers now being provided before I even get a chance to chime in. And we haven't even published a gem yet (though I did get it building successfully on a nightly build server, and will get the gems published sometime soon). I've indexed 50k

Re: OR filtering...

2007-01-29 Thread Erik Hatcher
On Jan 29, 2007, at 7:26 PM, Yonik Seeley wrote: On 1/29/07, escher2k <[EMAIL PROTECTED]> wrote: I have a question about the syntax for doing an OR filter in my URL. How do I specify where ((fq=colA[10 TO 20]) AND (fq=state:USA OR fq=country:USA) ? Basically, I am doing a search for a k

Re: SOLR-116

2007-01-29 Thread Erik Hatcher
On Jan 29, 2007, at 8:49 PM, Antonio Eggberg wrote: After doing quite a bit of searching what I understand is that the medicine to my problem of word count is in docTermFreq and TermEnum ... as Chris Hostetter points out clearly for statistical purpose in the post below. (Please note I am n

Re: INTERNET ARCHIVE goes SOLR!

2007-01-29 Thread Erik Hatcher
On Jan 29, 2007, at 11:01 PM, Chris Hostetter wrote: if there are cases where DisMax isn't the right choice for raw user input ... i'm not aware of them, but i'd love to hear about them :) Ok, ok, ok... I'm a self-admitted dismax avoider thus far. I'll remedy that by building in dismax ca

Re: INTERNET ARCHIVE goes SOLR!

2007-01-29 Thread Erik Hatcher
On Jan 29, 2007, at 10:46 PM, Ryan McKinley wrote: Your argument is a good one, and I buy it. However, I've never had a case where a user typing "multiple words" where the expectation was for OR, it is always AND. But there are many cases where the expectation is to get the best results

Re: SOLR-116

2007-01-29 Thread Chris Hostetter
I'm a little backlogged on mail or i would have replied to your word count email earlier... one thing to keep in mind is that the index doesn't deal in "words", it deals in "terms" -- the difference being a term has a "field" and a "token" -- what was discussed in the mail archives leading up to th
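To make the word-versus-term distinction concrete, here is a toy Ruby sketch (not Lucene's actual analysis chain) that counts terms as (field, token) pairs. The lowercase-and-split tokenizer and the `term_counts` helper are hypothetical stand-ins for whatever analyzer the schema configures; the point is only that "solr" in `title` and "solr" in `desc` are two different terms.

```ruby
# Toy model: a term is a [field, token] pair, not a bare word.
# The tokenizer (lowercase + split on word characters) is a hypothetical
# stand-in for a real Lucene analyzer.
def term_counts(doc)
  counts = Hash.new(0)
  doc.each do |field, text|
    text.downcase.scan(/\w+/).each { |token| counts[[field, token]] += 1 }
  end
  counts
end

doc = { "title" => "Solr rocks", "desc" => "Solr is fast, solr is fun" }
counts = term_counts(doc)
# The same word in two fields yields two distinct terms.
puts counts.inspect
```

Lucene keeps this kind of per-field bookkeeping inside the index itself; the sketch only mimics it in memory.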

Re: INTERNET ARCHIVE goes SOLR!

2007-01-29 Thread Yonik Seeley
On 1/29/07, Chris Hostetter <[EMAIL PROTECTED]> wrote: there is no prefix operator for "OR" so if the default is "AND" there is no way at request time to indicate that some clauses should be optional without reverting to the ugly and misleading binary operator syntax ... Perhaps that's somethi

Re: Querying international characters

2007-01-29 Thread Chris Hostetter
: I have a mirror of the entire dmoz content in a solr index. International : characters seem to be loaded and returned in queries just fine but queries : that _contain_ international character queries return no results for known : matching patterns. : : Is there a filter class I need to be using

Re: INTERNET ARCHIVE goes SOLR!

2007-01-29 Thread Chris Hostetter
: > case where a user typing "multiple words" where the expectation was : > for OR, it is always AND. if the input you are passing in comes straight from a user -- and that user doesn't understand the Lucene query syntax -- i'd argue that StandardRequestHandler is the wrong choice, and you should

Re: INTERNET ARCHIVE goes SOLR!

2007-01-29 Thread Ryan McKinley
Your argument is a good one, and I buy it. However, I've never had a case where a user typing "multiple words" where the expectation was for OR, it is always AND. But there are many cases where the expectation is to get the best results possible. With AND you get zero results even when the

Re: INTERNET ARCHIVE goes SOLR!

2007-01-29 Thread Erik Hatcher
On Jan 29, 2007, at 6:15 PM, Chris Hostetter wrote: : > We override defaultOperator of "OR" to "AND". : : We really ought to make AND the default anyway. No, no, no, no, No.. :) Your argument is a good one, and I buy it. However, I've never had a case where a user typing

Re: add CJKTokenizer to solr

2007-01-29 Thread James liu
It is OK now. -- regards jl

Re: add CJKTokenizer to solr

2007-01-29 Thread Erik Hatcher
hoss++ On Jan 29, 2007, at 3:43 PM, Chris Hostetter wrote: : >I realized that solr does not have the CJK package, but how can I : > add it : > in? : : You need to add the analyzers JAR from Lucene's contrib area to your : Solr application, under WEB-INF/lib. You can get that JAR from the :

SOLR-116

2007-01-29 Thread Antonio Eggberg
Hi: After doing quite a bit of searching what I understand is that the medicine to my problem of word count is in docTermFreq and TermEnum ... as Chris Hostetter points out clearly for statistical purpose in the post below. (Please note I am not so familiar with java) http://www.mail-archive.c

Re: example .java to start jetty with solr

2007-01-29 Thread Chris Hostetter
: program. I can't use java -jar start.jar because it spawns a new : process, I need to find the actual java code to set it up. I've tried : setting up the Jetty Server() and doing the addWebApplication() thing : but while Jetty starts, it does not seem to find all the support : files for Solr. t

SV: Re: How fast is Solr insert or am i doing something wrong?

2007-01-29 Thread Antonio Eggberg
Thanks Coda and Yonik for the prompt answers! I will give SOLR-121 a try. Cool. Cheers. Coda Hale <[EMAIL PROTECTED]> wrote: SOLR-121 just got applied to the Solrb library, which allows Solr::Connection#add to accept arrays of documents: connection.add([doc1, doc2, doc3]) Which means you

Re: How fast is Solr insert or am i doing something wrong?

2007-01-29 Thread Coda Hale
SOLR-121 just got applied to the Solrb library, which allows Solr::Connection#add to accept arrays of documents: connection.add([doc1, doc2, doc3]) Which means you can do something like this: connection.add(records.map { |r| make_solr_doc(r) }) Posting more than a single document in a reques
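A minimal sketch, in plain Ruby rather than through solrb, of what batching amounts to on the wire: the Solr XML update format lets one `<add>` element carry several `<doc>` elements, so one POST can replace many. The helper names `add_message` and `batched_messages` and the batch size are hypothetical choices for illustration.

```ruby
require "cgi"

# Build one Solr XML <add> message holding many documents, so a single
# HTTP POST carries a whole batch instead of one request per record.
def add_message(docs)
  body = docs.map do |doc|
    fields = doc.map do |name, value|
      # Escape &, <, >, quotes so field values cannot break the XML.
      "<field name=\"#{CGI.escapeHTML(name.to_s)}\">#{CGI.escapeHTML(value.to_s)}</field>"
    end
    "<doc>#{fields.join}</doc>"
  end
  "<add>#{body.join}</add>"
end

# Split a large record set into fixed-size batches, one <add> per batch.
def batched_messages(docs, batch_size)
  docs.each_slice(batch_size).map { |batch| add_message(batch) }
end

records = [{ "id" => 1, "title" => "Solr" }, { "id" => 2, "title" => "Ruby" }]
puts add_message(records)
```

Very large batches make for large requests, so splitting 10,000 records into a handful of messages is a middle ground between one giant POST and one POST per document.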

Re: SV: Querying international characters

2007-01-29 Thread Scott Leonard
i'm actually using resin here. On 1/29/07 3:33 PM, "Antonio Eggberg" <[EMAIL PROTECTED]> wrote: > Hi : > > If you haven't done so.. I think you need to enable UTF-8 support in your > tomcat/jetty etc.. for queries from web browsers.. have a look > > http://wiki.apache.org/tomcat/Tomcat/UTF-8 >

Re: OR filtering...

2007-01-29 Thread Mike Klaas
On 1/29/07, escher2k <[EMAIL PROTECTED]> wrote: Hi, I have a question about the syntax for doing an OR filter in my URL. How do I specify where ((fq=colA[10 TO 20]) AND (fq=state:USA OR fq=country:USA) ? Basically, I am doing a search for a keyword across certain fields and I want to filter th

Re: OR filtering...

2007-01-29 Thread Yonik Seeley
On 1/29/07, escher2k <[EMAIL PROTECTED]> wrote: I have a question about the syntax for doing an OR filter in my URL. How do I specify where ((fq=colA[10 TO 20]) AND (fq=state:USA OR fq=country:USA) ? Basically, I am doing a search for a keyword across certain fields and I want to filter the res

OR filtering...

2007-01-29 Thread escher2k
Hi, I have a question about the syntax for doing an OR filter in my URL. How do I specify where ((fq=colA[10 TO 20]) AND (fq=state:USA OR fq=country:USA) ? Basically, I am doing a search for a keyword across certain fields and I want to filter the result set. The user can input city/state/count
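A minimal sketch of building that request in Ruby, assuming the usual Solr semantics that multiple `fq` parameters are intersected (ANDed), so an OR across two fields has to live inside a single `fq` value. Note the range also needs a colon (`colA:[10 TO 20]`) to be valid Lucene syntax. The `q` value is a hypothetical keyword.

```ruby
require "uri"

# Each fq is an independent filter; Solr intersects all of them.
# The OR between state and country therefore goes inside ONE fq value.
params = [
  ["q", "keyword"],                    # hypothetical user keyword
  ["fq", "colA:[10 TO 20]"],           # range filter (note the colon)
  ["fq", "state:USA OR country:USA"],  # OR across two fields, one filter
]
# Percent-encode everything; spaces become "+", ":" "[" "]" become %XX.
query_string = URI.encode_www_form(params)
puts query_string
```

Appended to the select URL, this asks for colA in [10, 20] AND (state:USA OR country:USA).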

Re: How fast is Solr insert or am i doing something wrong?

2007-01-29 Thread Yonik Seeley
On 1/29/07, Antonio Eggberg <[EMAIL PROTECTED]> wrote: Is it good practice to do a commit after every insert? Is this what is taking the time? Is there any general rule of thumb? Definitely don't do a commit after every insert. Do a single one at the end. -Yonik
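Following Yonik's advice, a minimal Ruby sketch of the update sequence: send all the document batches, then a single `<commit/>`. It talks to Solr through plain Net::HTTP rather than solrb; `index_all` and `http_poster` are hypothetical helpers, and the `http://localhost:8983/solr/update` URL is an assumption based on the example Jetty setup.

```ruby
require "net/http"
require "uri"

# Post every batch first, then issue exactly one <commit/> at the very end.
# `post` is any callable that takes an XML string and sends it to Solr.
def index_all(messages, post)
  messages.each { |xml| post.call(xml) }
  post.call("<commit/>")
end

# One possible poster built on Net::HTTP. The default URL is an assumption;
# adjust it for your deployment.
def http_poster(url = "http://localhost:8983/solr/update")
  uri = URI(url)
  lambda do |xml|
    Net::HTTP.start(uri.host, uri.port) do |http|
      req = Net::HTTP::Post.new(uri.request_uri, "Content-Type" => "text/xml; charset=utf-8")
      req.body = xml
      http.request(req)
    end
  end
end

# Offline usage: record what would be sent instead of hitting a server.
sent = []
index_all(["<add><doc/></add>", "<add><doc/></add>"], ->(xml) { sent << xml })
```

Passing the poster in as a callable keeps the "one commit at the end" rule testable without a running Solr; in real use, `index_all(messages, http_poster)` would do the network calls.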

SV: Querying international characters

2007-01-29 Thread Antonio Eggberg
Hi: If you haven't done so, I think you need to enable UTF-8 support in your tomcat/jetty etc. for queries from web browsers. Have a look: http://wiki.apache.org/tomcat/Tomcat/UTF-8 Regards Scott Leonard <[EMAIL PROTECTED]> wrote: I have a mirror of the entire dmoz content in a solr index.
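If the container decodes the query string with a different charset, the UTF-8 bytes of an accented term no longer match what was indexed. A small Ruby sketch of the client side, assuming the client percent-encodes queries as UTF-8; the `solr_query_param` helper is hypothetical, and the container still has to be configured to decode URIs as UTF-8 (Tomcat's `URIEncoding="UTF-8"` connector attribute, per the wiki page above; Resin has its own equivalent setting).

```ruby
require "uri"

# Percent-encode the query text as UTF-8 bytes. The servlet container must
# decode the URI as UTF-8 as well, or "é" arrives as two garbage characters.
def solr_query_param(text)
  "q=" + URI.encode_www_form_component(text.encode("UTF-8"))
end

param = solr_query_param("caf\u00e9")
puts param
```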

How fast is Solr insert or am i doing something wrong?

2007-01-29 Thread Antonio Eggberg
Hi: Just want to know if this is the norm or if it is my configuration. I created a simple file with 10 000 records, 4 fields per record: id, title, desc, link. First I used the Solrb (i.e. Ruby gem) library to perform the insert according to the instructions and it took me about an hour and still counti

Re: INTERNET ARCHIVE goes SOLR!

2007-01-29 Thread Chris Hostetter
congrats on the successful roll-out Tracey, : We don't use DisMax and (as of now) do not use faceting. : And finally, the hardest part to convert to Solr. : I had to write a PHP front-end custom converter to take our query strings, : parse the clauses and lucene syntax into pieces, and

Re: INTERNET ARCHIVE goes SOLR!

2007-01-29 Thread Chris Hostetter
: > We override defaultOperator of "OR" to "AND". : : We really ought to make AND the default anyway. No, no, no, no, No.. there is no prefix operator for "OR" so if the default is "AND" there is no way at request time to indicate that some clauses should be optional without rev
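Hoss's asymmetry can be seen in a short Ruby sketch: with the default left at OR, prefixing every term with `+` requires them all, giving AND behaviour per request, while with a default of AND there is no prefix that makes a clause optional. `require_all` and `escape_term` are hypothetical helpers; the escaped character set follows the Lucene query parser's special characters.

```ruby
# Lucene query-parser special characters, escaped with a backslash so raw
# user input cannot be misread as syntax.
LUCENE_SPECIAL = /([+\-&|!(){}\[\]^"~*?:\\])/

def escape_term(term)
  term.gsub(LUCENE_SPECIAL) { "\\#{$1}" }
end

# With defaultOperator OR, "+" marks each clause as required, so AND-like
# behaviour is still available at request time. The reverse (an "optional"
# prefix under a default of AND) does not exist.
def require_all(user_input)
  user_input.split.map { |t| "+" + escape_term(t) }.join(" ")
end

puts require_all("solr ruby")
```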

Querying international characters

2007-01-29 Thread Scott Leonard
I have a mirror of the entire dmoz content in a solr index. International characters seem to be loaded and returned in queries just fine but queries that _contain_ international character queries return no results for known matching patterns. Is there a filter class I need to be using for internat

Re: add CJKTokenizer to solr

2007-01-29 Thread Chris Hostetter
: >I realized that solr does not have the CJK package, but how can I : > add it : > in? : : You need to add the analyzers JAR from Lucene's contrib area to your : Solr application, under WEB-INF/lib. You can get that JAR from the : latest Lucene release distribution. it's actually easier than

Re: How to Index Word, Excel, PDF files?

2007-01-29 Thread Bertrand Delacretaz
On 1/29/07, Leandro Saad <[EMAIL PROTECTED]> wrote: ...I'd like to know if solr can index Word, Excel and PDF files or I must create a xml representation of those files matching my schema?... Currently you must create the XML yourself outside of Solr. This might change, see https://issues.apac

How to Index Word, Excel, PDF files?

2007-01-29 Thread Leandro Saad
Hi all. I'm new to solr and I regret I didn't use this tool before. I'd like to know if solr can index Word, Excel and PDF files, or must I create an XML representation of those files matching my schema? Cheers. -- Leandro Rodrigo Saad Cruz software developer - certified scrum master :: scrum.com.

Re: add CJKTokenizer to solr

2007-01-29 Thread Erik Hatcher
On Jan 29, 2007, at 1:08 AM, zha jimmy wrote: hi, all I am trying to configure solr to support Chinese tokenization. I saw the tips in schema.xml: Then I modified schema.xml positionIncrementGap="100"> class="org.apache.lucene.analysis.cjk.CJKTokenizer "/>