Hello.
I have ~37 million docs that I want to index.
When I start a full-import I import only 2 million docs at a time, because
that gives better control over Solr and disk space/heap.
But when I import 2 million docs and Solr starts the commit and the optimize,
my used disk space jumps into the sky.
Hi,
after an upgrade from solr-1.3 to 1.4.1 we're getting an
ArrayIndexOutOfBoundsException for a query with rows=0 and a sort
param specified:
java.lang.ArrayIndexOutOfBoundsException: 0
at
org.apache.lucene.search.FieldComparator$StringOrdValComparator.copy(FieldComparator.java:660)
On Mon, 29 Nov 2010 03:07 -0800, stockii st...@shopgate.com wrote:
Hello.
I have ~37 million docs that I want to index.
When I start a full-import I import only 2 million docs at a time, because
that gives better control over Solr and disk space/heap.
But when I import 2 million docs and
First, don't optimize after every chunk; it just makes extra work for
your system.
If you're using a 3.x or trunk build, optimizing doesn't do much for you
anyway, but
if you must, just optimize after your entire import is done.
Optimizing will pretty much copy the old index into a new set of
Dear list,
another suggestion about SignatureUpdateProcessorFactory.
Why can I make signatures of several fields and place the
result in one field, but _not_ make a signature of one field
and place the result in several fields?
Could this be realized without huge programming?
Best regards,
Bernd
Am
Why do you want to do this? It'd be the same value, just stored in
multiple fields in the document, which seems a waste. What's
the use-case you're addressing?
Best
Erick
On Mon, Nov 29, 2010 at 8:51 AM, Bernd Fehling
bernd.fehl...@uni-bielefeld.de wrote:
Dear list,
another suggestion about
On Monday 29 November 2010 14:51:33 Bernd Fehling wrote:
Dear list,
another suggestion about SignatureUpdateProcessorFactory.
Why can I make signatures of several fields and place the
result in one field, but _not_ make a signature of one field
and place the result in several fields?
Use
On 29.11.2010 14:55, Markus Jelsma wrote:
On Monday 29 November 2010 14:51:33 Bernd Fehling wrote:
Dear list,
another suggestion about SignatureUpdateProcessorFactory.
Why can I make signatures of several fields and place the
result in one field but _not_ make a signature of one field
Hi all,
I want to use both EdgeNGram analysis and phrase search,
but there is a problem.
On a field that does not use EdgeNGram analysis, phrase search works fine.
But if the field uses EdgeNGram, phrase search gives incorrect results.
I'm currently using Solr 1.4.0.
Result of EdgeNGram analysis for 'pci express':
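A rough sketch (not Solr's actual code; min/max gram sizes are assumed) of what an EdgeNGram filter emits for "pci express" — each token expands into several grams, so the token positions a phrase query relies on no longer line up the way they do for a plain-tokenized field:

```python
def edge_ngrams(token, min_gram=1, max_gram=10):
    """Return the leading-edge n-grams of a single token."""
    return [token[:n] for n in range(min_gram, min(max_gram, len(token)) + 1)]

# WhitespaceTokenizer stand-in, then edge n-grams per token:
for tok in "pci express".split():
    print(tok, "->", edge_ngrams(tok))
# pci     -> ['p', 'pc', 'pci']
# express -> ['e', 'ex', 'exp', 'expr', 'expre', 'expres', 'express']
```

With many grams indexed per source token, "pci express" as an exact phrase may not match, because "express" (or its grams) is no longer at the position directly following "pci".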
Hi all,
How can I back up Solr indexes without stopping the server?
I saw the following links:
http://wiki.apache.org/solr/SolrOperationsTools
http://wiki.apache.org/solr/CollectionDistribution
but I'm afraid that running these scripts 'on
Hi all. I have a little question. Can anyone explain why this Solr search
works so strangely? :)
For example, in my schema.xml
I add some fields with fieldType 'text'. Here are the 'text' properties:
<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
As I understand it, those tools are more Solr 1.3 related, but I don't
see why they shouldn't work on 1.4.
I would say it is very unlikely that you will corrupt an index with
them.
Lucene indexes are write once, that is, any one index file will never
be updated, only replaced. This means that
Hi,
Thanks for helping us.
I'm creating a 'helloworld' plugin in Solr 1.4 in BasicHelloRequestHandler.java.
In solrconfig.xml, I added:
<requestHandler name="hello"
    class="com.polyspot.mercury.handler.BasicHelloRequestHandler">
  <!-- default values for query parameters -->
  <lst
Hi,
With the advent of new Windows versions, there are increasing
instances of system blue-screens, crashes, freezes and ad-hoc
failures.
If a Solr index is running at the time of a system halt, this can
often corrupt a segments file, requiring the index to be -fix'ed by
rewriting the offending
On a quick look with Solr 3.1, these results are puzzling. Are you
sure that you are searching the field you think you are? I take it you're
searching the text field, but that's controlled by your
defaultSearchField
entry in schema.xml.
Try using the admin page, particularly the full interface
Is there any way to use the DIH to import from Cassandra? Thanks
Hello,
I've got a problem that I'm unable to solve: as mentioned in the docs, I put
a recip(ms(NOW,INDAT),3.16e-11,1,1) in the boost-function field bf.
It is completely ignored by the dismax search handler.
The dismax SearchHandler is set to be the default SearchHandler.
If I post a
Hi,
I use the dismax query to search across several fields.
I find I have a lot of documents with the same document name (one of the fields
that the dismax queries) so I wanted to adjust the relevance so that titles
with a newer published date have a higher relevance than documents with the
Hi Jason,
maybe just use another field with a creation-/modification-date and boost on
that field?
Regards
Stefan
On Mon, Nov 29, 2010 at 5:28 PM, Jason Brown jason.br...@sjp.co.uk wrote:
Hi,
I use the dismax query to search across several fields.
I find I have a lot of documents with the
Hi Jason,
You can use boost functions in the dismax handler to do this:
http://wiki.apache.org/solr/DisMaxQParserPlugin#bf_.28Boost_Functions.29
Mat
On Mon, Nov 29, 2010 at 11:28, Jason Brown jason.br...@sjp.co.uk wrote:
Hi,
I use the dismax query to search across several fields.
I find
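As a sketch of what the bf recommended on that wiki page computes: Solr's `recip(x,m,a,b)` is `a / (m*x + b)`, and with `m = 3.16e-11` (roughly one over the milliseconds in a year) a document's boost decays smoothly with its age:

```python
def recip(x, m, a, b):
    # Solr's recip(x,m,a,b) = a / (m*x + b); x is ms(NOW,date), i.e. doc age in ms
    return a / (m * x + b)

MS_PER_YEAR = 365.25 * 24 * 3600 * 1000  # ~3.156e10, so m=3.16e-11 is ~1/year

for years in (0, 1, 5):
    age_ms = years * MS_PER_YEAR
    print(years, "year(s) old ->", round(recip(age_ms, 3.16e-11, 1, 1), 2))
# brand-new docs get ~1.0, one-year-old docs ~0.5, five-year-old docs ~0.17
```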
Great - Thank You.
-Original Message-
From: Mat Brown [mailto:m...@patch.com]
Sent: Mon 29/11/2010 16:33
To: solr-user@lucene.apache.org
Subject: Re: Boost on newer documents
Hi Jason,
You can use boost functions in the dismax handler to do this:
aha okay. thx
I didn't know that Solr copies the complete index for an optimize. Can I tell
Solr to start an optimize, but without the copy?
--
View this message in context:
http://lucene.472066.n3.nabble.com/Large-Hdd-Space-using-during-commit-optimize-tp1985807p1987477.html
Sent from the Solr -
On Mon, 29 Nov 2010 08:43 -0800, stockii st...@shopgate.com wrote:
aha okay. thx
I didn't know that Solr copies the complete index for an optimize. Can I tell
Solr to start an optimize, but without the copy?
No.
The copy is to keep an index available for searches while the optimise
is
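Back-of-the-envelope arithmetic for the disk spike described above (figures invented for illustration): the old segments must stay readable until the merged index is fully written, so the two copies briefly coexist:

```python
# Peak disk during optimize ~= old segments + fully merged copy.
index_gb = 40            # hypothetical index size
peak_gb = index_gb * 2   # old copy + new copy coexist until the swap
print(peak_gb)           # 80
```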
On Mon, Nov 29, 2010 at 10:46 AM, Peter Sturge peter.stu...@gmail.com wrote:
If a Solr index is running at the time of a system halt, this can
often corrupt a segments file, requiring the index to be -fix'ed by
rewriting the offending file.
Really? That shouldn't be possible (if you mean the
Hi,
I am in the process of trying to index about 50 million documents using the
data import handler.
For some reason, about 2 days into the import, I see the message 'shutdown
hook executing' in the log and the Solr web server instance exits
gracefully.
I do not see any errors in the entire log. This
In Solr 1.4, I think the replication features should be able to
accomplish your goal, and will be easier to use and more robust.
On 11/29/2010 10:22 AM, Upayavira wrote:
As I understand it, those tools are more Solr 1.3 related, but I don't
see why they shouldn't work on 1.4.
I would say it
Yes, I use replication only for backup, with this call:
http://host:8080/solr/replication?command=backup&location=/home/jboss/backup
It works fine, but the server must always be up... it's an HTTP call...
I also tried the 'backup' script, but it creates hard links and is not
recommended!
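A minimal sketch of scripting that same replication-handler backup call (host and path taken from the mail above; this only builds the URL, which you would then fire with urllib.request.urlopen):

```python
from urllib.parse import urlencode

# Replication-handler backup request, as an HTTP GET:
base = "http://host:8080/solr/replication"
params = {"command": "backup", "location": "/home/jboss/backup"}
url = base + "?" + urlencode(params)
print(url)
```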
I solved the problem. All we need to do is modify the schema file.
Also, the spellcheck index is first created when spellcheck.build=true is set.
-
Kumar Anurag
--
View this message in context:
http://lucene.472066.n3.nabble.com/Spellcheck-in-solr-nutch-integration-tp1953232p1988252.html
Sent from
You're right, the OS is asking the server to shut down. In the default
example under Jetty, this is the result of issuing a Ctrl-C. Is it possible
that something is asking your server to quit? What servlet container
are you running under? Does the Solr server run for more than this
period if you're
__ Information from ESET NOD32 Antivirus, version of virus signature
database 5659 (20101129) __
The message was checked by ESET NOD32 Antivirus.
http://www.eset.com
parameter as boost or score? I tried but
couldn't realise too much.
thanks,
Rich
It is entirely possible that the server is asking solr to shutdown. I'll
have to ask the admin.
I'm running Solr-1.4 inside of Jetty. I definitely have enough disk space.
I think I did notice solr shutting down while it was idle. I just
disregarded it as a fluke... Perhaps there's something
,
Rich
Recently, we have started to get 'Bad file descriptor' errors in one of our
Solr instances. This instance is a searcher and its index is stored on a local
SSD. The master, however, has its index stored on NFS, which seems to be
working fine currently.
I have tried restarting tomcat and bringing
Try without autocommit or bump the limit up considerably to see
if it changes the behavior. You should not be getting
this kind of performance hit after the first million docs, so, it's
probably worth exploring.
See if you can find anything in your logs that indicates what's
hogging the critical
I am looking for a clear example of using more than one tokenizer for a
single source field. My application has a single body field which until
recently was all Latin characters, but we're now encountering both English
and Japanese words in a single message. Obviously, we need to be using CJK
in
I was just in a meeting where we discussed customer feedback on our
website. One thing that the users would like to see is galleries
where photos that are part of a set are grouped together under a single
result. This is basically field collapsing.
The problem I've got is that for most of
You can use only one tokenizer per analyzer. You'd better use separate fields +
fieldTypes for different languages.
I am looking for a clear example of using more than one tokenizer for a
source single field. My application has a single body field which until
recently was all latin
The problem is that the field is not guaranteed to contain just a single
language. I'm looking for some way to pass it first through CJK, then
Whitespace.
If I'm totally off-target here, is there a recommended way of dealing with
mixed-language fields?
On Mon, Nov 29, 2010 at 5:22 PM, Markus
On Mon, Nov 29, 2010 at 5:30 PM, Jacob Elder jel...@locamoda.com wrote:
The problem is that the field is not guaranteed to contain just a single
language. I'm looking for some way to pass it first through CJK, then
Whitespace.
If I'm totally off-target here, is there a recommended way of
StandardTokenizer doesn't handle some of the tokens we need, like
@twitteruser, and as far as I can tell, doesn't handle Chinese, Japanese or
Korean. Am I wrong about that?
On Mon, Nov 29, 2010 at 5:31 PM, Robert Muir rcm...@gmail.com wrote:
On Mon, Nov 29, 2010 at 5:30 PM, Jacob Elder
On Mon, Nov 29, 2010 at 5:35 PM, Jacob Elder jel...@locamoda.com wrote:
StandardTokenizer doesn't handle some of the tokens we need, like
@twitteruser, and as far as I can tell, doesn't handle Chinese, Japanese or
Korean. Am I wrong about that?
it uses the unigram method for CJK ideographs...
You can only use one tokenizer on a given field, I think. But a tokenizer
isn't in fact the only thing that can tokenize; an ordinary filter can
change tokenization too, so you could use two filters in a row.
You could also write your own custom tokenizer that does what you want,
although I'm
On Mon, Nov 29, 2010 at 5:41 PM, Jonathan Rochkind rochk...@jhu.edu wrote:
* As a tokenizer, I use the WhitespaceTokenizer.
* Then I apply a custom filter that looks for CJK chars, and re-tokenizes
any CJK chars into one-token-per-char. This custom filter was written by
someone other than
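The approach described above can be sketched in a few lines (a simplification, not the actual custom filter; the CJK ranges here are reduced to the Unified Ideographs plus Hiragana/Katakana):

```python
def is_cjk(ch):
    # Simplified CJK check: Unified Ideographs, Hiragana, Katakana only.
    return '\u4e00' <= ch <= '\u9fff' or '\u3040' <= ch <= '\u30ff'

def tokenize(text):
    out = []
    for tok in text.split():                 # WhitespaceTokenizer stand-in
        if any(is_cjk(c) for c in tok):
            # Re-tokenize CJK runs into one token per character;
            # non-CJK characters inside a mixed token are dropped here.
            out.extend(c for c in tok if is_cjk(c))
        else:
            out.append(tok)
    return out

print(tokenize("solr 検索 rocks"))  # ['solr', '検', '索', 'rocks']
```

Unigramming the CJK characters at index and query time is what makes phrase-style matching of ideograph sequences possible without a language-specific segmenter.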
The DataSource subclass route is probably what I will be interested in.
Are there any working examples of this already out there?
On 11/29/10 12:32 PM, Aaron Morton wrote:
AFAIK there is nothing pre-written to pull the data out for you.
You should be able to create your DataSource sub class
In the Solr admin (http://localhost:8180/services/admin/)
I can specify something like:
+category_id:200 +xxx:300
but how can I specify a sort option?
sort:category_id+asc
There is a [FULL INTERFACE] /admin/form.jsp link but it does not have a sort
option. It seems that you need to append
On Mon, Nov 29, 2010 at 8:02 PM, Ahmet Arslan iori...@yahoo.com wrote:
in Solr admin (http://localhost:8180/services/admin/)
I can specify something like:
+category_id:200 +xxx:300
but how can I specify a sort option?
sort:category_id+asc
There is an [FULL INTERFACE] /admin/form.jsp link
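For what it's worth, sort is a separate request parameter rather than part of the q syntax, so it gets appended to the select URL alongside q (the host and path here just follow the admin URL from the question):

```python
from urllib.parse import urlencode

# sort is its own parameter, appended next to q on the select URL:
params = {"q": "+category_id:200 +xxx:300", "sort": "category_id asc"}
url = "http://localhost:8180/services/select?" + urlencode(params)
print(url)
```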
On Mon, Oct 18, 2010 at 5:24 PM, Jason Blackerby jblacke...@gmail.comwrote:
If you know the misspellings you could prevent them from being added to the
dictionary with a StopFilterFactory like so:
Or, you know, correct the data :-)
--
Bill Dueber
Library Systems Programmer
University of
: Why is also the field name (* above) added to the signature
: and not only the content of the field?
:
: By purpose or by accident?
It was definitely deliberate. This way if your signature fields are
fieldA,fieldB,fieldC then these two documents...
Doc1:fielda:XXX
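A minimal sketch (not Solr's actual implementation) of the behaviour described above: hashing the field name along with its value means the same value in two different signature fields yields two different signatures, so such documents are not collapsed as duplicates:

```python
import hashlib

def signature(doc, sig_fields):
    # Include the field name in the hash, so fieldA:XXX != fieldB:XXX.
    h = hashlib.md5()
    for f in sig_fields:
        if f in doc:
            h.update(f.encode())
            h.update(doc[f].encode())
    return h.hexdigest()

a = signature({"fieldA": "XXX"}, ["fieldA", "fieldB", "fieldC"])
b = signature({"fieldB": "XXX"}, ["fieldA", "fieldB", "fieldC"])
print(a != b)  # True: same value, different field, different signature
```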
Hi, Erick. There is a defaultSearchField in my schema.xml. Can you give me your
example configuration for the text field? (What filters do you use for index and
for query?)
--
View this message in context:
http://lucene.472066.n3.nabble.com/search-strangeness-tp1986895p1989466.html
Sent from the Solr -
On 11/29/2010 3:15 PM, Jacob Elder wrote:
I am looking for a clear example of using more than one tokenizer for a
source single field. My application has a single body field which until
recently was all latin characters, but we're now encountering both English
and Japanese words in a single