Hi
I have a field named threadid in my index and more than one document can
have the same value for this field. The documents are considered duplicates if
they have the same value for this field. In some scenarios I need to get
only unique documents from my index. Can I achieve this through one Solr
How many languages are you dealing with? How are you generating your
queries? Are you taking your source language and translating it to
each of the languages? Or are you just targeting one source to
destination language? Back when I was doing CLIR, I would create
separate indexes, but
Once in a while we get this
javax.xml.stream.XMLStreamException: ParseError at [row,col]:[4,790470]
[14:32:21.877] Message: An invalid XML character (Unicode: 0x6) was
found in the element content of the document.
[14:32:21.877] at com.sun.org.apache.xerces
On Sat, Mar 1, 2008 at 4:22 PM, Brian Whitman [EMAIL PROTECTED] wrote:
Once in a while we get this
javax.xml.stream.XMLStreamException: ParseError at [row,col]:[4,790470]
[14:32:21.877] Message: An invalid XML character (Unicode: 0x6) was
[...]
Our data comes from all sorts of places and
Hi,
So if my facet fields (and values) are changing depending on the
query/filters I have set, it sounds like it is not currently possible to
paginate through a single facet field's values using a total pages value?
Thanks,
Matt
On Fri, Feb 29, 2008 at 6:09 PM, Yonik Seeley [EMAIL PROTECTED]
An alternative would be for someone to give a Subversion revision number
for 1.3-dev that represents a solid working checkout.
Many people are using 1.3-dev in production; could you all please
tell us which revision you are using?
Cheers,
Lance
-----Original Message-----
On 01/03/2008, at 18:35, Yonik Seeley wrote:
You could also scan for such chars on the client side before the XML
is produced.
Can't he put this code on the server before the xml parsing somehow? I
would do like you said and do it on the client, but just out of
curiosity is this really
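Following Yonik's suggestion to clean the data on the client side, here is a minimal sketch in Python (the client language is an assumption) that drops characters the XML 1.0 spec does not allow, such as the 0x6 from the parse error, before the update XML is built. The helper name strip_invalid_xml_chars is hypothetical.

```python
import re

# Characters permitted by the XML 1.0 "Char" production:
# #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
# Anything outside those ranges (control chars like 0x6, lone surrogates,
# U+FFFE/U+FFFF) is matched by this negated class.
_INVALID_XML_CHARS = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def strip_invalid_xml_chars(text: str, replacement: str = "") -> str:
    """Remove (or replace) every character that XML 1.0 does not allow."""
    return _INVALID_XML_CHARS.sub(replacement, text)


# Hypothetical field value containing the 0x6 control character.
print(repr(strip_invalid_xml_chars("thread\x06title")))
print(repr(strip_invalid_xml_chars("thread\x06title", " ")))
```

Running every field value through something like this before serializing the add document means the server never sees the bad character; passing a space as the replacement keeps token boundaries intact if the control character happened to separate two words.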
Hi Feng,
Neither Lucene nor Solr supports those operators. I believe there is (or used to
be) a way to specify an open begin/end for a range query, but I don't recall
the exact details at the moment.
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
----- Original Message ----
Note that moving to Java 6 alone will not save you. You do need to give this
poor JVM more memory to play with when working with such a big index.
Otis
----- Original Message ----
From: F Knudson [EMAIL PROTECTED]
You changed the positionIncrementGap to reduce index size? Hm, maybe I'm
forgetting something at the moment, but I don't think this will make any
difference to index size, only to the token positions recorded at index time.
Are you referring to the term index interval (128 by default, I
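To put numbers on the term index interval: Lucene keeps only every Nth term (N being the interval) in the in-memory term index, so the setting mainly trades RAM against term-lookup speed. A rough sketch of that arithmetic; the 10-million-term index is a made-up figure for illustration, and term_index_entries is a hypothetical helper.

```python
def term_index_entries(total_terms: int, term_index_interval: int = 128) -> int:
    """Roughly how many terms land in Lucene's in-memory term index (.tii).

    Every term_index_interval-th term is indexed, so the count is about
    total_terms / term_index_interval, rounded up.
    """
    if term_index_interval <= 0:
        raise ValueError("term_index_interval must be positive")
    # Integer ceiling division avoids float rounding on very large term counts.
    return -(-total_terms // term_index_interval)


# Hypothetical index with 10 million unique terms.
print(term_index_entries(10_000_000))       # default interval of 128
print(term_index_entries(10_000_000, 256))  # doubling the interval roughly halves the entries
```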
It's really about the combination of index size, hardware, required response
rate, query rate and complexity. You typically benchmark this stuff to find
the limit, or the sweet spot, for your hardware.
Unfortunately, I don't have an explanation for the sudden jump in
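A minimal sketch of the kind of benchmark meant here, assuming run_query is a hypothetical zero-argument callable that sends one representative query to your Solr instance. It records the wall-clock latency of each call and reports nearest-rank percentiles, which expose slow outliers that an average would hide.

```python
import math
import time
from typing import Callable, List, Sequence


def measure_latencies(run_query: Callable[[], object], repetitions: int = 100) -> List[float]:
    """Call run_query repeatedly and return each call's wall-clock latency in seconds."""
    latencies = []
    for _ in range(repetitions):
        start = time.perf_counter()
        run_query()
        latencies.append(time.perf_counter() - start)
    return latencies


def percentile(values: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile, with pct in (0, 100]."""
    if not values or not 0 < pct <= 100:
        raise ValueError("need at least one value and 0 < pct <= 100")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


if __name__ == "__main__":
    # Stand-in for a real Solr request (hypothetical): sleep about 1 ms.
    latencies = measure_latencies(lambda: time.sleep(0.001), repetitions=50)
    print(f"median={percentile(latencies, 50):.4f}s p95={percentile(latencies, 95):.4f}s")
```

Rerunning this while stepping up index size, query complexity, or the number of concurrent clients is one way to find the point where latency starts to climb.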
But note one thing here. Pulling a merely modified index (its snapshot) and
not the fully optimized index means you'll only pull the delta, while if you
fully optimize the index and then the snapshooter runs and then snappuller
runs, the *whole* index will be pulled over the network from
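The point about deltas follows from Lucene writing segment files once and never modifying them in place: a snapshot taken after ordinary adds shares most files with the previous one, while an optimize merges everything into new segment files. A simplified model of what snappuller has to transfer; the file names are hypothetical, and the real transfer is done by rsync, which does its own file comparison.

```python
from typing import Set


def files_to_pull(slave_files: Set[str], snapshot_files: Set[str]) -> Set[str]:
    """Files in the master's snapshot that the slave does not already hold.

    Simplified model: because Lucene segment files are write-once, a file
    name the slave already has is assumed to have identical content.
    """
    return snapshot_files - slave_files


# Hypothetical file listings.
slave = {"_a.cfs", "_b.cfs", "segments_5"}
after_adds = {"_a.cfs", "_b.cfs", "_c.cfs", "segments_6"}  # one new segment added
after_optimize = {"_d.cfs", "segments_7"}                  # everything merged into one new segment

print(sorted(files_to_pull(slave, after_adds)))      # only the new segment and segments file
print(sorted(files_to_pull(slave, after_optimize)))  # the whole index
```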
Vijay,
I have a field named threadid in my index and more than one document can
have same value for this field. The documents are considered duplicate if
they have same value for this field. -- that doesn't quite go together. Solr
will not allow duplicates (it will turn them into updates)
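If threadid is not the uniqueKey, so several documents per thread really do exist, one workaround is to collapse the results on the client. A minimal sketch, assuming the response documents have already been parsed into Python dicts and are sorted so that the document you want to keep for each thread comes first; first_per_thread is a hypothetical helper, not a Solr feature.

```python
from typing import Dict, Iterable, List


def first_per_thread(docs: Iterable[Dict], key: str = "threadid") -> List[Dict]:
    """Keep only the first document seen for each value of `key`, preserving order.

    Documents that lack the key are kept as-is rather than collapsed together.
    """
    seen = set()
    unique = []
    for doc in docs:
        value = doc.get(key)
        if value is None:
            unique.append(doc)
            continue
        if value in seen:
            continue
        seen.add(value)
        unique.append(doc)
    return unique


# Hypothetical parsed result list.
results = [{"id": "1", "threadid": "t1"}, {"id": "2", "threadid": "t1"}, {"id": "3", "threadid": "t2"}]
print([d["id"] for d in first_per_thread(results)])
```

Note that numFound and paging still count every document, so a page can come back with fewer unique threads than the rows requested.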
On Sat, Mar 1, 2008 at 6:47 PM, Leonardo Santagada [EMAIL PROTECTED] wrote:
On 01/03/2008, at 18:35, Yonik Seeley wrote:
You could also scan for such chars on the client side before the XML
is produced.
Can't he put this code on the server before the xml parsing somehow? I
would do
The fastest Solr query I can find is any query on an unused dynamic field name:
unused_dynamic_field_s:3
Is there another query style that should be faster?
See this line in http://wiki.apache.org/solr/SolrConfigXml
<pingQuery>q=solr&amp;version=2.0&amp;start=0&amp;rows=0</pingQuery>
A better ping
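For reference, the parameters in that pingQuery line amount to an ordinary select request with rows=0, so no stored documents come back. A small sketch that builds the equivalent request URL; the base URL is a hypothetical local install, ping_url is a hypothetical helper, and only the parameter names and values come from the config line.

```python
from urllib.parse import urlencode


def ping_url(base_url: str, query: str = "solr") -> str:
    """Build a /select URL carrying the same parameters as the example pingQuery.

    rows=0 keeps the response down to the header and the hit count.
    """
    params = {"q": query, "version": "2.0", "start": "0", "rows": "0"}
    return f"{base_url.rstrip('/')}/select?{urlencode(params)}"


# Hypothetical local Solr base URL.
print(ping_url("http://localhost:8983/solr/"))
```

Inside solrconfig.xml the & separators have to be written as &amp;, because the value sits inside an XML element.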
I don't quite follow everything here (examples?), but I believe IDF of a term
is not a per-field value, but index-wide. Does that change the arguments for
this proposal then?
Otis
----- Original Message ----
From: [EMAIL PROTECTED]
On Sat, Mar 1, 2008 at 9:38 PM, Otis Gospodnetic
[EMAIL PROTECTED] wrote:
I don't quite follow everything here (examples?), but I believe IDF of a term
is not a per-field value, but index-wide.
I think Nicolas meant that idfs are field-specific, and that is the
case (index-wide, per field).
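To make "index-wide, per field" concrete: in Lucene a term is a (field, text) pair, so docFreq, and with it idf, is counted over the whole index but separately for each field. A toy sketch using the classic DefaultSimilarity formula idf = 1 + ln(numDocs / (docFreq + 1)); the whitespace tokenization and the three sample documents are simplifications for illustration.

```python
import math
from collections import Counter
from typing import Dict, List


def classic_idf(doc_freq: int, num_docs: int) -> float:
    """Lucene's classic DefaultSimilarity idf: 1 + ln(numDocs / (docFreq + 1))."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))


def per_field_doc_freqs(docs: List[Dict[str, str]]) -> Counter:
    """Count, across the whole index, how many documents contain each (field, term)."""
    df = Counter()
    for doc in docs:
        terms = set()
        for field, text in doc.items():
            for token in text.lower().split():  # naive whitespace tokenizer
                terms.add((field, token))
        df.update(terms)  # each (field, term) counts at most once per document
    return df


# Hypothetical three-document index.
docs = [
    {"title": "solr", "body": "lucene solr"},
    {"title": "lucene", "body": "solr"},
    {"title": "nutch", "body": "solr rocks"},
]
df = per_field_doc_freqs(docs)
n = len(docs)
print("title:solr idf =", round(classic_idf(df[("title", "solr")], n), 3))
print("body:solr  idf =", round(classic_idf(df[("body", "solr")], n), 3))
```

The same word "solr" gets a higher idf in title, where it is rare, than in body, where every document contains it.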
: which is the behavior that I expected, irrespective of whether the field has
: one or more values.
:
: Any idea what could be going on here?
not really ... but like i said, i'm not really a highlighter guy. I
can't think of any reason why having multiple values would cause this
behavior
Yonik Seeley wrote:
On Sat, Mar 1, 2008 at 6:47 PM, Leonardo Santagada [EMAIL PROTECTED] wrote:
Can't he put this code on the server before the xml parsing somehow? I
would do like you said and do it on the client, but just out of
curiosity is this really impossible?
We'd have to
: I'm running my tests on a server with 4 dual-core CPUs. I was expecting
: good improvements from a multithreaded solution, but it runs 10 times
: slower. Here is how I run those threads; I think I'm doing
: something wrong, please advise:
As i said, i'm not much of a threads expert, but
On Sat, Mar 1, 2008 at 11:26 PM, Christian Wittern [EMAIL PROTECTED] wrote:
Yonik Seeley wrote:
On Sat, Mar 1, 2008 at 6:47 PM, Leonardo Santagada [EMAIL PROTECTED]
wrote:
Can't he put this code on the server before the xml parsing somehow? I
would do like you said and do it on the