Re: correct escapes in csv-Update files

2008-01-04 Thread Michael Lackhoff
On 03.01.2008 17:16 Yonik Seeley wrote: CSV doesn't use backslash escaping. http://www.creativyst.com/Doc/Articles/CSV/CSV01.htm This is text with a quoted string Thanks for the hint but the result is the same, that is, quoted behaves exactly like \quoted\: - both leave the single unescaped

Query Syntax (Standard handler) Question

2008-01-04 Thread s d
Is there a simpler way to write this query (I'm using the standard handler) ? field1:t1 field1:t2 field1:t1 t2 field2:t1 field2:t2 field2:t1 t2 Thanks,

Re: Query Syntax (Standard handler) Question

2008-01-04 Thread Erik Hatcher
On Jan 4, 2008, at 4:40 AM, s d wrote: Is there a simpler way to write this query (I'm using the standard handler) ? field1:t1 field1:t2 field1:t1 t2 field2:t1 field2:t2 field2:t1 t2 Looks like you'd be better off using the DisMax handler for t1 t2 (without the brackets). Erik

Best practice for storing relational data in Solr

2008-01-04 Thread steve.lillywhite
Hi all, This is a (possibly very naive) newbie question regarding Solr best practice... I run a website that displays/stores data on job applicants, together with information on where they came from (e.g. which recruiter), which office they are applying to, etc. This data is stored in a

Re: Another text I cannot get into SOLR with csv

2008-01-04 Thread Yonik Seeley
On Jan 4, 2008 10:25 AM, Michael Lackhoff [EMAIL PROTECTED] wrote: If the fields value is: 's-Gravenhage I cannot get it into SOLR with CSV. This one works for me fine. $ cat t2.csv id,name 12345,'s-Gravenhage 12345,'s-Gravenhage 12345,s-Gravenhage $ curl

Re: Duplicated Keyword

2008-01-04 Thread Robert Young
I don't quite understand what you're getting at. What is the problem you're encountering or what are you trying to achieve? Cheers Rob On Jan 4, 2008 3:26 PM, Jae Joo [EMAIL PROTECTED] wrote: Hi, Is there any way to dedup the keyword across documents? Ex. china keyword is in doc1 and

How the star operator works

2008-01-04 Thread Leonardo Santagada
From both lucene and solr docs the star * operator used after a word should find the word plus 0 or more characters after word. I have some documents on a solr index (both in type text and string) and both don't work like that. For example I have a document called Test Document, if I

Re: Another text I cannot get into SOLR with csv

2008-01-04 Thread Ryan McKinley
Michael Lackhoff wrote: If the fields value is: 's-Gravenhage I cannot get it into SOLR with CSV. I tried to double the single quote/apostrophe or escape it in several ways but I either get an error or another character (the escape) in front of the single quote. Is it not possible to have a

Re: correct escapes in csv-Update files

2008-01-04 Thread Walter Underwood
I recommend the opencsv library for Java or the csv package for Python. Either one can write legal CSV files. There are lots of corner cases in CSV and some differences between applications, like whether newlines are allowed inside a quoted field. It is best to use a library for this instead of
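As a minimal sketch of the library approach recommended above, the following uses Python's standard csv module to write the values discussed in these threads. The helper name write_rows and the sample rows are assumptions made for illustration; the point is only that the writer doubles embedded double quotes (rather than backslash-escaping them) and quotes fields that contain newlines.

```python
import csv
import io


def write_rows(rows):
    """Serialize rows to CSV text, letting the csv module handle quoting."""
    buf = io.StringIO()
    # QUOTE_MINIMAL quotes only fields that need it (quotes, commas, newlines).
    writer = csv.writer(buf, quoting=csv.QUOTE_MINIMAL, lineterminator="\n")
    writer.writerows(rows)
    return buf.getvalue()


# Illustrative sample data (hypothetical ids, values taken from the threads).
rows = [
    ["id", "name"],
    ["1", 'This is text with a "quoted" string'],
    ["2", "'s-Gravenhage"],
    ["3", "line one\nline two"],
]

csv_text = write_rows(rows)
print(csv_text)

# Reading the text back with the same module recovers the original values.
round_trip = list(csv.reader(io.StringIO(csv_text)))
```

Whether a given consumer accepts every such file (for example, newlines inside quoted fields) still depends on that consumer, as noted above.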

Re: correct escapes in csv-Update files

2008-01-04 Thread Yonik Seeley
On Jan 4, 2008 4:08 AM, Michael Lackhoff [EMAIL PROTECTED] wrote: Thanks for the hint but the result is the same, that is, quoted behaves exactly like \quoted\: - both leave the single unescaped quote in the record: quoted - both have the problem with a backslash before the escaped quote:

Re: Another text I cannot get into SOLR with csv

2008-01-04 Thread Michael Lackhoff
On 04.01.2008 16:55 Yonik Seeley wrote: On Jan 4, 2008 10:25 AM, Michael Lackhoff [EMAIL PROTECTED] wrote: If the fields value is: 's-Gravenhage I cannot get it into SOLR with CSV. This one works for me fine. $ cat t2.csv id,name 12345,'s-Gravenhage 12345,'s-Gravenhage

Re: Duplicated Keyword

2008-01-04 Thread Jae Joo
title of Document 1 - This is document 1 regarding china - fieldtype = text title of Document 2 - This is document 2 regarding china - fieldtype = text Once it is indexed, will the index hold 2 china text fields or just 1 china word which points to document1 and document2? Jae On Jan 4, 2008

Re: Duplicated Keyword

2008-01-04 Thread Robert Young
You can think of it as the latter but it's quite a bit more complicated than that. For details on how Lucene stores its index check out the file formats page on Lucene. http://lucene.apache.org/java/docs/fileformats.html Cheers Rob On Jan 4, 2008 4:59 PM, Jae Joo [EMAIL PROTECTED] wrote:
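The "one term pointing to both documents" picture can be sketched as a toy inverted index. This is only a conceptual illustration under simplifying assumptions (whitespace tokenizing, lowercasing, hypothetical doc ids); Lucene's actual on-disk format, described on the file formats page, is considerably more involved.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        # Naive analysis: lowercase and split on whitespace.
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


docs = {
    "doc1": "This is document 1 regarding china",
    "doc2": "This is document 2 regarding china",
}

index = build_inverted_index(docs)
# The term "china" is stored once and points at both documents.
print(sorted(index["china"]))
```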

Re: Backup of a Solr index

2008-01-04 Thread Jörg Kiegeland
A postCommit hook (configured in solrconfig.xml) is called in a safe place for every commit. You could have a program as a hook that normally did nothing unless you had previously signaled to make a copy of the index. Then I will give the postCommit trigger a try and hope that while the

Re: Another text I cannot get into SOLR with csv

2008-01-04 Thread Yonik Seeley
On Jan 4, 2008 11:18 AM, Michael Lackhoff [EMAIL PROTECTED] wrote: On 04.01.2008 16:55 Yonik Seeley wrote: On Jan 4, 2008 10:25 AM, Michael Lackhoff [EMAIL PROTECTED] wrote: If the fields value is: 's-Gravenhage I cannot get it into SOLR with CSV. This one works for me fine. $

Re: SolrJ Javadoc?

2008-01-04 Thread Ryan McKinley
run: ant javadoc-solrj and that will build them... Yes, they should be built into the nightly distribution... Matthew Runo wrote: Hello! I've seen some SVN commits and heard some rumblings of SolrJ javadoc - but can't seem to find any. Is there any yet? I know that SolrJ is still pretty

Re: solr with hadoop

2008-01-04 Thread Mike Klaas
On 4-Jan-08, at 11:37 AM, Evgeniy Strokin wrote: I have a huge index (about 110 million documents, 100 fields each). But the size of the index is reasonable, it's about 70 GB. All I need is to increase performance, since some queries, which match a big number of documents, are running

Re: solr with hadoop

2008-01-04 Thread Ryan McKinley
Mike Klaas wrote: On 4-Jan-08, at 11:37 AM, Evgeniy Strokin wrote: I have a huge index (about 110 million documents, 100 fields each). But the size of the index is reasonable, it's about 70 GB. All I need is to increase performance, since some queries, which match a big number of documents,

Re: Query Syntax (Standard handler) Question

2008-01-04 Thread Mike Klaas
On 4-Jan-08, at 1:12 PM, s d wrote: but i want to sum the scores and not use max, can i still do it with the DisMax? am i missing anything ? If you set tie=1.0, dismax functions like dissum. -Mike

parsedquery_ToString

2008-01-04 Thread anuvenk
Is the parsedquery_ToString, the one passed to solr after all the tokenizing and analyzing of the query? For the search term 'chapter 7' i have this parsedquery_ToString str name=parsedquery_toString +(text:(bankruptci chap 7) (7 chapter chap) 7 bankruptci^0.8 | ((name:bankruptci

Re: Query Syntax (Standard handler) Question

2008-01-04 Thread Mike Klaas
It is the fraction of the score of the non-max terms that gets added to the score. Hence, 1.0 = sum everything. -Mike On 4-Jan-08, at 3:28 PM, anuvenk wrote: Could you elaborate on what the tie param does? I did read the definition in the solr wiki but still not crystal clear. Mike Klaas wrote:
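The arithmetic described here (the best score plus tie times the other scores) can be sketched as below. The function name dismax_score and the sample per-field scores are assumptions for illustration only; this is not Solr or Lucene code.

```python
def dismax_score(field_scores, tie=0.0):
    """Combine per-field scores: the max, plus `tie` times the remaining scores."""
    if not field_scores:
        return 0.0
    best = max(field_scores)
    others = sum(field_scores) - best
    return best + tie * others


scores = [3.0, 1.0, 0.5]  # hypothetical scores of one term in three fields

pure_max = dismax_score(scores, tie=0.0)   # only the best field counts
pure_sum = dismax_score(scores, tie=1.0)   # behaves like a plain sum
blended = dismax_score(scores, tie=0.01)   # "max plus 0.01 times others"
print(pure_max, pure_sum, blended)
```

With tie=0.0 only the best-matching field contributes, and with tie=1.0 every field contributes fully, which is the "dissum" behaviour mentioned earlier in the thread.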

spellcheckhandler

2008-01-04 Thread anuvenk
Is it possible to implement something like this with the spellcheckhandler? Like Google does: say I search for 'chater 13 bakrupcy', it should be able to display: did you search for 'chapter 13 bankruptcy'? Has someone been able to do this?

solr results debugging

2008-01-04 Thread anuvenk
I've been using the solr admin form with debug=true to do some in-depth analysis on some results. Could someone explain how to make sense of this? This is the debugging info for the first result I got. 10.201284 = (MATCH) sum of: 6.2467875 = (MATCH) max plus 0.01 times others of: 6.236769

solr word delimiter

2008-01-04 Thread anuvenk
I have the word delimiter filter factory in the text field definition both at index and query time. But it does have some negative effects on some search terms like h1-b visa. It splits this into three tokens h,1,b. Now if I understand right, does solr look for matches for 'h' separately, '1'
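The splitting described above can be imitated with a toy function that breaks a token at non-alphanumeric characters and at letter/digit boundaries. This is a rough sketch under stated assumptions, not the actual word delimiter filter: the name split_word_parts is hypothetical, and the real filter has configuration options that this does not model.

```python
import re


def split_word_parts(token):
    """Split on non-alphanumerics and on letter/digit transitions (ASCII only)."""
    return re.findall(r"[A-Za-z]+|[0-9]+", token)


parts = [split_word_parts(t) for t in "h1-b visa".split()]
print(parts)
```

Under these assumptions, "h1-b" yields the three parts h, 1 and b, while "visa" stays whole.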