Querying unique documents

2008-03-01 Thread Vijay Khurana
Hi I have a field named threadid in my index and more than one document can have same value for this field. The documents are considered duplicate if they have same value for this field. In some scenarios I am required to get only unique documents from my index. Can I achieve this through one solr

Re: Proposition of a new feature: Dynamic Field Types

2008-03-01 Thread Grant Ingersoll
How many languages are you dealing with? How are you generating your queries? Are you taking your source language and translating it to each of the languages? Or are you just targeting one source to destination language? Back when I was doing CLIR, I would create separate indexes, but

invalid XML character

2008-03-01 Thread Brian Whitman
Once in a while we get this javax.xml.stream.XMLStreamException: ParseError at [row,col]:[4,790470] [14:32:21.877] Message: An invalid XML character (Unicode: 0x6) was found in the element content of the document. [14:32:21.877] at com.sun.org.apache.xerces

Re: invalid XML character

2008-03-01 Thread Yonik Seeley
On Sat, Mar 1, 2008 at 4:22 PM, Brian Whitman [EMAIL PROTECTED] wrote: Once in a while we get this javax.xml.stream.XMLStreamException: ParseError at [row,col]:[4,790470] [14:32:21.877] Message: An invalid XML character (Unicode: 0x6) was [...] Our data comes from all sorts of places and

Re: Facet numFound for facet values?

2008-03-01 Thread Matt M.
Hi, So if my facet fields (and values) are changing depending on the query/filters I have set, it sounds like it is not currently possible to paginate through a single, facet field's values using a total pages value? Thanks, Matt On Fri, Feb 29, 2008 at 6:09 PM, Yonik Seeley [EMAIL PROTECTED]

RE: what's the schedule of the release of solr 1.3?

2008-03-01 Thread Lance Norskog
An alternative would be for someone to give a subversion checkout number against 1.3-dev which represents a solid working checkout. There are a lot of people using 1.3-dev in production, could you all please tell us what checkout number you are using? Cheers, Lance -Original Message-

Re: invalid XML character

2008-03-01 Thread Leonardo Santagada
On 01/03/2008, at 18:35, Yonik Seeley wrote: You could also scan for such chars on the client side before the XML is produced. Can't he put this code on the server before the xml parsing somehow? I would do like you said and do it on the client, but just out of curiosity is this really
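
The client-side scan suggested above could look roughly like the following. This is only a sketch, assuming the documents are assembled in Python before being posted; the function name strip_invalid_xml_chars is hypothetical and not part of Solr or any client library. It removes every character outside the XML 1.0 Char production, which includes the 0x6 control character from the reported error.

```python
import re

# Matches any character NOT permitted by the XML 1.0 "Char" production:
# #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
_INVALID_XML_CHARS = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def strip_invalid_xml_chars(text, replacement=""):
    """Return text with characters that are illegal in XML 1.0 replaced.

    Hypothetical helper: call it on each field value before the update
    XML is built, so the server-side parser never sees the bad characters.
    """
    return _INVALID_XML_CHARS.sub(replacement, text)


if __name__ == "__main__":
    dirty = "title with a stray\x06control character"
    print(repr(strip_invalid_xml_chars(dirty)))
```

Passing a visible replacement such as "?" instead of the empty string makes it easier to spot which source documents contained bad characters.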

Re: about the , operation in solr

2008-03-01 Thread Otis Gospodnetic
Hi Feng, Neither Lucene nor Solr support those operators. I believe there is or used to be a way to specify an open begin/end for the range query, but I don't recall the exact details at the moment. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message
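
For reference, the open-ended range mentioned above is usually written with * as the missing bound, for example price:[* TO 100]. The sketch below only builds such query strings; the helper name range_query and the field name price are made up for illustration, and whether * is accepted as a bound should be checked against the Solr version in use.

```python
def range_query(field, lower=None, upper=None, inclusive=True):
    """Build a range query string, using * for a missing bound.

    Hypothetical helper for illustration only. Square brackets include
    the endpoints, curly braces exclude them; both ends use the same
    bracket type here.
    """
    lo = "*" if lower is None else str(lower)
    hi = "*" if upper is None else str(upper)
    open_b, close_b = ("[", "]") if inclusive else ("{", "}")
    return f"{field}:{open_b}{lo} TO {hi}{close_b}"


if __name__ == "__main__":
    print(range_query("price", upper=100))                   # at most 100
    print(range_query("price", lower=10))                    # at least 10
    print(range_query("price", upper=100, inclusive=False))  # below 100
```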

Re: Optimization taking days/weeks

2008-03-01 Thread Otis Gospodnetic
Note that moving to Java 6 alone will not save you. You do need to give this poor JVM more memory to play with when working with such a big index. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: F Knudson [EMAIL PROTECTED] To:

Re: Optimization taking days/weeks

2008-03-01 Thread Otis Gospodnetic
You changed the positionIncrementGap to reduce index size? Hm, maybe I'm forgetting something at the moment, but I don't think this will make any difference with index size, just with token offsets noted at index time. Are you referring to term index interval (normally 128 by default, I

Re: Optimization taking days/weeks

2008-03-01 Thread Otis Gospodnetic
It's really about the combination of index size, hardware, required response rate, query rate and complexity. You typically try to benchmark this stuff to see where the limit or where the sweet spot is for your hardware. Unfortunately, I don't have an explanation for the sudden jump in

Re: Master/Slave setup

2008-03-01 Thread Otis Gospodnetic
But note one thing here. Pulling a merely modified index (its snapshot) and not the fully optimized index means you'll only pull the delta, while if you fully optimize the index and then the snapshooter runs and then snappuller runs, the *whole* index will be pulled over the network from

Re: Querying unique documents

2008-03-01 Thread Otis Gospodnetic
Vijay, I have a field named threadid in my index and more than one document can have same value for this field. The documents are considered duplicate if they have same value for this field. -- that doesn't quite go together. Solr will not allow duplicates (it will turn them into updates)
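
If threadid is an ordinary field rather than the unique key, one workaround is to drop repeats on the client after the results come back. The sketch below assumes each result has already been parsed into a dict; the function name unique_by_field is hypothetical, and this is a client-side filter, not a Solr feature. It also only de-duplicates the rows actually fetched, so paging and numFound will not reflect the reduced set.

```python
def unique_by_field(docs, field="threadid"):
    """Keep the first document seen for each value of the given field.

    Hypothetical client-side helper. Documents that lack the field are
    kept as-is, since they cannot be compared with the others.
    """
    seen = set()
    unique = []
    for doc in docs:
        if field not in doc:
            unique.append(doc)
            continue
        key = doc[field]
        if key in seen:
            continue  # a document with this value was already kept
        seen.add(key)
        unique.append(doc)
    return unique


if __name__ == "__main__":
    results = [
        {"id": "1", "threadid": "t1"},
        {"id": "2", "threadid": "t1"},
        {"id": "3", "threadid": "t2"},
    ]
    print(unique_by_field(results))
```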

Re: invalid XML character

2008-03-01 Thread Yonik Seeley
On Sat, Mar 1, 2008 at 6:47 PM, Leonardo Santagada [EMAIL PROTECTED] wrote: On 01/03/2008, at 18:35, Yonik Seeley wrote: You could also scan for such chars on the client side before the XML is produced. Can't he put this code on the server before the xml parsing somehow? I would do

Fastest Solr query

2008-03-01 Thread Lance Norskog
The fastest solr query I can find is any query on an unused dynamic field name: unused_dynamic_field_s:3 Is there another query style that should be faster? See this line in http://wiki.apache.org/solr/SolrConfigXml <pingQuery>q=solr&amp;version=2.0&amp;start=0&amp;rows=0</pingQuery> A better ping

Re: Proposition of a new feature: Dynamic Field Types

2008-03-01 Thread Otis Gospodnetic
I don't quite follow everything here (examples?), but I believe IDF of a term is not a per-field value, but index-wide. Does that change the arguments for this proposal then? Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: [EMAIL PROTECTED]

Re: Proposition of a new feature: Dynamic Field Types

2008-03-01 Thread Yonik Seeley
On Sat, Mar 1, 2008 at 9:38 PM, Otis Gospodnetic [EMAIL PROTECTED] wrote: I don't quite follow everything here (examples?), but I believe IDF of a term is not a per-field value, but index-wide. I think Nicolas meant that idfs are field specific, and that is the case (index-wide, per field).

Re: Seeing strange highlighting in multi-valued field (was: Why does highlight use the index analyzer)

2008-03-01 Thread Chris Hostetter
: which is the behavior that I expected, irrespective of whether the field has : one or more values. : : Any idea what could be going on here? not really ... but like i said, i'm not really a highlighter guy. I can't think of any reason why having multiple values would cause this behavior

Re: invalid XML character

2008-03-01 Thread Christian Wittern
Yonik Seeley wrote: On Sat, Mar 1, 2008 at 6:47 PM, Leonardo Santagada [EMAIL PROTECTED] wrote: Can't he put this code on the server before the xml parsing somehow? I would do like you said and do it on the client, but just out of curiosity is this really impossible? We'd have to

Re: Threads in Solr

2008-03-01 Thread Chris Hostetter
: I'm running my tests on server with 4 double-kernel CPU. I was expecting : good improvements from multithreaded solution but I have speed 10th : times worse. Here is how I run those threads, I think I'm doing : something wrong, please advise: As i said, i'm not much of a threads expert, but

Re: invalid XML character

2008-03-01 Thread Yonik Seeley
On Sat, Mar 1, 2008 at 11:26 PM, Christian Wittern [EMAIL PROTECTED] wrote: Yonik Seeley wrote: On Sat, Mar 1, 2008 at 6:47 PM, Leonardo Santagada [EMAIL PROTECTED] wrote: Can't he put this code on the server before the xml parsing somehow? I would do like you said and do it on the