Large Hdd-Space using during commit/optimize

2010-11-29 Thread stockii
Hello. I have ~37 million docs that I want to index. When I start a full-import, I import only 2 million docs at a time, for better control over Solr and disk space/heap. When I import 2 million docs and Solr starts the commit and the optimize, the used disk space jumps sky-high.

ArrayIndexOutOfBoundsException for query with rows=0 and sort param

2010-11-29 Thread Martin Grotzke
Hi, after an upgrade from solr-1.3 to 1.4.1 we're getting an ArrayIndexOutOfBoundsException for a query with rows=0 and a sort param specified: java.lang.ArrayIndexOutOfBoundsException: 0 at org.apache.lucene.search.FieldComparator$StringOrdValComparator.copy(FieldComparator.java:660)

Re: Large Hdd-Space using during commit/optimize

2010-11-29 Thread Upayavira
On Mon, 29 Nov 2010 03:07 -0800, stockii st...@shopgate.com wrote: Hello. I have ~37 million docs that I want to index. When I start a full-import, I import only 2 million docs at a time, for better control over Solr and disk space/heap, so when I import 2 million docs and

Re: Large Hdd-Space using during commit/optimize

2010-11-29 Thread Erick Erickson
First, don't optimize after every chunk, it's just making extra work for your system. If you're using a 3.x or trunk build, optimizing doesn't do much for you anyway, but if you must, just optimize after your entire import is done. Optimizing will pretty much copy the old index into a new set of
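Erick's point that optimize copies the old index into a new set of segments can be sketched with back-of-the-envelope arithmetic (a rough model, not an exact formula; the 2x/3x factors are the commonly cited worst cases, not guarantees):

```python
def peak_disk_during_optimize(index_gb, worst_case=False):
    """Rough estimate of peak disk usage while optimizing.

    Optimize merges every segment into a new copy, so the old and new
    index coexist on disk until the merge finishes (~2x the index size).
    If an open reader still pins the previous generation, ~3x is possible.
    """
    factor = 3 if worst_case else 2
    return index_gb * factor

# A 50 GB index may transiently need ~100 GB (or ~150 GB worst case).
print(peak_disk_during_optimize(50))        # 100
print(peak_disk_during_optimize(50, True))  # 150
```

This is why the disk usage "jumps into the sky" during optimize and then drops back once the old segments are deleted.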

Re: question about Solr SignatureUpdateProcessorFactory

2010-11-29 Thread Bernd Fehling
Dear list, another suggestion about SignatureUpdateProcessorFactory. Why can I make signatures of several fields and place the result in one field, but _not_ make a signature of one field and place the result in several fields? Could this be realized without huge programming? Best regards, Bernd Am

Re: question about Solr SignatureUpdateProcessorFactory

2010-11-29 Thread Erick Erickson
Why do you want to do this? It'd be the same value, just stored in multiple fields in the document, which seems a waste. What's the use-case you're addressing? Best Erick On Mon, Nov 29, 2010 at 8:51 AM, Bernd Fehling bernd.fehl...@uni-bielefeld.de wrote: Dear list, another suggestion about

Re: question about Solr SignatureUpdateProcessorFactory

2010-11-29 Thread Markus Jelsma
On Monday 29 November 2010 14:51:33 Bernd Fehling wrote: Dear list, another suggestion about SignatureUpdateProcessorFactory. Why can I make signatures of several fields and place the result in one field but _not_ make a signature of one field and place the result in several fields. Use

Re: question about Solr SignatureUpdateProcessorFactory

2010-11-29 Thread Bernd Fehling
Am 29.11.2010 14:55, schrieb Markus Jelsma: On Monday 29 November 2010 14:51:33 Bernd Fehling wrote: Dear list, another suggestion about SignatureUpdateProcessorFactory. Why can I make signatures of several fields and place the result in one field but _not_ make a signature of one field

Using Ngram and Phrase search

2010-11-29 Thread Jason, Kim
Hi all, I want to use both EdgeNGram analysis and phrase search, but there is a problem. On a field which does not use EdgeNGram analysis, phrase search works fine, but with EdgeNGram, phrase search is incorrect. I'm using Solr 1.4.0. Result of EdgeNGram analysis for pci express
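The mismatch can be illustrated with a small Python simulation (a sketch of EdgeNGramFilterFactory's front-anchored grams; min/max gram sizes here are illustrative assumptions, and real Solr analysis also tracks token positions, which this ignores):

```python
def edge_ngrams(token, min_gram=1, max_gram=10):
    """Emit front-anchored n-grams, roughly as EdgeNGramFilter (side=front) does."""
    top = min(len(token), max_gram)
    return [token[:n] for n in range(min_gram, top + 1)]

def analyze(text, use_edge_ngrams):
    tokens = []
    for word in text.lower().split():
        tokens.extend(edge_ngrams(word) if use_edge_ngrams else [word])
    return tokens

# Index-side analysis explodes "pci express" into many grams...
indexed = analyze("pci express", use_edge_ngrams=True)
# ...so a phrase query analyzed without the same n-gramming no longer
# lines up with the indexed tokens and their positions.
queried = analyze("pci express", use_edge_ngrams=False)
print(indexed)   # ['p', 'pc', 'pci', 'e', 'ex', ...]
print(queried)   # ['pci', 'express']
```

The usual fix is to apply EdgeNGram only on the index-side analyzer (or to dedicate a separate n-grammed field to prefix matching and keep a plain field for phrase queries).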

Solr Hot Backup

2010-11-29 Thread Rodolico Piero
Hi all, how can I back up Solr indexes without stopping the server? I saw the following links: http://wiki.apache.org/solr/SolrOperationsTools http://wiki.apache.org/solr/CollectionDistribution but I'm afraid that running these scripts 'on

search strangeness

2010-11-29 Thread ramzesua
Hi all. I have a little question. Can anyone explain why this Solr search works so strangely? :) For example, in my schema.xml I add some fields with fieldType = text. Here are the 'text' fieldType properties: <fieldType name="text" class="solr.TextField" positionIncrementGap="100"> <analyzer type="index">

Re: Solr Hot Backup

2010-11-29 Thread Upayavira
As I understand it, those tools are more Solr 1.3 related, but I don't see why they shouldn't work on 1.4. I would say it is very unlikely that you will corrupt an index with them. Lucene indexes are write once, that is, any one index file will never be updated, only replaced. This means that

BasicHelloRequestHandler plugin

2010-11-29 Thread Hong-Thai Nguyen
Hi, thanks for helping us. I'm creating a 'hello world' plugin in Solr 1.4 in BasicHelloRequestHandler.java. In solrconfig.xml, I added: <requestHandler name="hello" class="com.polyspot.mercury.handler.BasicHelloRequestHandler"> <!-- default values for query parameters --> <lst

Preventing index segment corruption when windows crashes

2010-11-29 Thread Peter Sturge
Hi, With the advent of new windows versions, there are increasing instances of system blue-screens, crashes, freezes and ad-hoc failures. If a Solr index is running at the time of a system halt, this can often corrupt a segments file, requiring the index to be -fix'ed by rewriting the offending

Re: search strangeness

2010-11-29 Thread Erick Erickson
On a quick look with Solr 3.1, these results are puzzling. Are you sure that you are searching the field you think you are? I take it you're searching the text field, but that's controlled by your defaultSearchField entry in schema.xml. Try using the admin page, particularly the full interface

Solr DataImportHandler (DIH) and Cassandra

2010-11-29 Thread Mark
Is there any way to use DIH to import from Cassandra? Thanks

bf for Dismax completly ignored by 'recip(ms(NOW,INDAT),3.16e-11,1,1)'

2010-11-29 Thread rall0r
Hello, I have a problem that I'm unable to solve: as mentioned in the docs, I put recip(ms(NOW,INDAT),3.16e-11,1,1) in the boost-function field bf. It is completely ignored by the dismax search handler. The dismax SearchHandler is set to be the default SearchHandler. If I post a

Boost on newer documents

2010-11-29 Thread Jason Brown
Hi, I use the dismax query to search across several fields. I find I have a lot of documents with the same document name (one of the fields that the dismax queries) so I wanted to adjust the relevance so that titles with a newer published date have a higher relevance than documents with the

Re: Boost on newer documents

2010-11-29 Thread Stefan Matheis
Hi Jason, maybe, just use another field w/ creation-/modification-date and boost on this field? Regards Stefan On Mon, Nov 29, 2010 at 5:28 PM, Jason Brown jason.br...@sjp.co.uk wrote: Hi, I use the dismax query to search across several fields. I find I have a lot of documents with the

Re: Boost on newer documents

2010-11-29 Thread Mat Brown
Hi Jason, You can use boost functions in the dismax handler to do this: http://wiki.apache.org/solr/DisMaxQParserPlugin#bf_.28Boost_Functions.29 Mat On Mon, Nov 29, 2010 at 11:28, Jason Brown jason.br...@sjp.co.uk wrote: Hi, I use the dismax query to search across several fields. I find
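The bf function from the wiki page Mat links, recip(ms(NOW,date),3.16e-11,1,1), computes a/(m*x + b) over the document's age in milliseconds. A quick Python check of the constants (the field name and ages are illustrative):

```python
def recip(x, m, a, b):
    """Solr's recip() function source: a / (m*x + b)."""
    return a / (m * x + b)

MS_PER_YEAR = 365.25 * 24 * 3600 * 1000  # ~3.156e10

# m = 3.16e-11 is chosen so m * (one year in ms) is roughly 1:
# a brand-new document scores ~1.0, a one-year-old one ~0.5,
# and the boost decays smoothly for older documents.
print(recip(0, 3.16e-11, 1, 1))              # 1.0
print(recip(MS_PER_YEAR, 3.16e-11, 1, 1))    # ~0.5
print(recip(10 * MS_PER_YEAR, 3.16e-11, 1, 1))  # ~0.09
```

So newer documents get a boost near 1 and it falls off hyperbolically with age.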

RE: Boost on newer documents

2010-11-29 Thread Jason Brown
Great - Thank You. -Original Message- From: Mat Brown [mailto:m...@patch.com] Sent: Mon 29/11/2010 16:33 To: solr-user@lucene.apache.org Subject: Re: Boost on newer documents Hi Jason, You can use boost functions in the dismax handler to do this:

Re: Large Hdd-Space using during commit/optimize

2010-11-29 Thread stockii
Aha, okay, thanks. I didn't know that Solr copies the complete index for an optimize. Can I tell Solr to start an optimize, but without the copy? -- View this message in context: http://lucene.472066.n3.nabble.com/Large-Hdd-Space-using-during-commit-optimize-tp1985807p1987477.html Sent from the Solr -

Re: Large Hdd-Space using during commit/optimize

2010-11-29 Thread Upayavira
On Mon, 29 Nov 2010 08:43 -0800, stockii st...@shopgate.com wrote: Aha, okay, thanks. I didn't know that Solr copies the complete index for an optimize. Can I tell Solr to start an optimize, but without the copy? No. The copy is to keep an index available for searches while the optimise is

Re: Preventing index segment corruption when windows crashes

2010-11-29 Thread Yonik Seeley
On Mon, Nov 29, 2010 at 10:46 AM, Peter Sturge peter.stu...@gmail.com wrote: If a Solr index is running at the time of a system halt, this can often corrupt a segments file, requiring the index to be -fix'ed by rewriting the offending file. Really? That shouldn't be possible (if you mean the

DIH causing shutdown hook executing?

2010-11-29 Thread Phong Dais
Hi, I am in the process of trying to index about 50 mil documents using the data import handler. For some reason, about 2 days into the import, I see this message shutdown hook executing in the log and the solr web server instance exits gracefully. I do not see any errors in the entire log. This

Re: Solr Hot Backup

2010-11-29 Thread Jonathan Rochkind
In Solr 1.4, I think the replication features should be able to accomplish your goal, and will be easier to use and more robust. On 11/29/2010 10:22 AM, Upayavira wrote: As I understand it, those tools are more Solr 1.3 related, but I don't see why they shouldn't work on 1.4. I would say it

R: Solr Hot Backup

2010-11-29 Thread Rodolico Piero
Yes, I use replication only for backup, with this call: http://host:8080/solr/replication?command=backup&location=/home/jboss/backup It works fine, but the server must always be up... it's an HTTP call... I also tried the 'backup' script, but it creates hard links, which are not recommended!

Re: Spellcheck in solr-nutch integration

2010-11-29 Thread Anurag
I solved the problem. All we need to do is modify the schema file. Also, the spellcheck index is first created when spellcheck.build=true - Kumar Anurag -- View this message in context: http://lucene.472066.n3.nabble.com/Spellcheck-in-solr-nutch-integration-tp1953232p1988252.html Sent from

Re: DIH causing shutdown hook executing?

2010-11-29 Thread Erick Erickson
You're right, the OS is asking the server to shut down. In the default example under Jetty, this is a result of issuing a ctrl-c. Is it possible that something is asking your server to quit? What servlet container are you running under? Does the Solr server run for more than this period if you're

solr admin

2010-11-29 Thread Papp Richard

Re: solr admin

2010-11-29 Thread Erick Erickson

special sorting

2010-11-29 Thread Papp Richard
parameter as boost or score? I tried but couldn't achieve much. Thanks, Rich

Re: DIH causing shutdown hook executing?

2010-11-29 Thread Phong Dais
It is entirely possible that the server is asking solr to shutdown. I'll have to ask the admin. I'm running Solr-1.4 inside of Jetty. I definitely have enough disk space. I think I did notice solr shutting down while it was idle. I just disregarded it as a fluke... Perhaps there's something

Re: special sorting

2010-11-29 Thread Tommaso Teofili

Bad file descriptor Errors

2010-11-29 Thread John Williams
Recently, we have started to get Bad file descriptor errors in one of our Solr instances. This instance is a searcher and its index is stored on a local SSD. The master, however, has its index stored on NFS, which seems to be working fine, currently. I have tried restarting tomcat and bringing

Re: DIH causing shutdown hook executing?

2010-11-29 Thread Erick Erickson
Try without autocommit or bump the limit up considerably to see if it changes the behavior. You should not be getting this kind of performance hit after the first million docs, so, it's probably worth exploring. See if you can find anything in your logs that indicates what's hogging the critical
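Erick's suggestion about disabling or relaxing autocommit maps to the updateHandler section of solrconfig.xml. A sketch (the values are illustrative assumptions, not recommendations; commenting the element out disables autocommit entirely):

```xml
<updateHandler class="solr.DirectUpdateHandler2">
  <!-- a sketch: commit at most every 100k docs or every 10 minutes -->
  <autoCommit>
    <maxDocs>100000</maxDocs>
    <maxTime>600000</maxTime> <!-- milliseconds -->
  </autoCommit>
</updateHandler>
```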

Good example of multiple tokenizers for a single field

2010-11-29 Thread Jacob Elder
I am looking for a clear example of using more than one tokenizer for a single source field. My application has a single body field which until recently was all latin characters, but we're now encountering both English and Japanese words in a single message. Obviously, we need to be using CJK in

Termvector based result grouping / field collapsing?

2010-11-29 Thread Shawn Heisey
I was just in a meeting where we discussed customer feedback on our website. One thing that the users would like to see is galleries where photos that are part of a set are grouped together under a single result. This is basically field collapsing. The problem I've got is that for most of

Re: Good example of multiple tokenizers for a single field

2010-11-29 Thread Markus Jelsma
You can use only one tokenizer per analyzer. You'd better use separate fields + fieldTypes for different languages. I am looking for a clear example of using more than one tokenizer for a source single field. My application has a single body field which until recently was all latin

Re: Good example of multiple tokenizers for a single field

2010-11-29 Thread Jacob Elder
The problem is that the field is not guaranteed to contain just a single language. I'm looking for some way to pass it first through CJK, then Whitespace. If I'm totally off-target here, is there a recommended way of dealing with mixed-language fields? On Mon, Nov 29, 2010 at 5:22 PM, Markus

Re: Good example of multiple tokenizers for a single field

2010-11-29 Thread Robert Muir
On Mon, Nov 29, 2010 at 5:30 PM, Jacob Elder jel...@locamoda.com wrote: The problem is that the field is not guaranteed to contain just a single language. I'm looking for some way to pass it first through CJK, then Whitespace. If I'm totally off-target here, is there a recommended way of

Re: Good example of multiple tokenizers for a single field

2010-11-29 Thread Jacob Elder
StandardTokenizer doesn't handle some of the tokens we need, like @twitteruser, and as far as I can tell, doesn't handle Chinese, Japanese or Korean. Am I wrong about that? On Mon, Nov 29, 2010 at 5:31 PM, Robert Muir rcm...@gmail.com wrote: On Mon, Nov 29, 2010 at 5:30 PM, Jacob Elder

Re: Good example of multiple tokenizers for a single field

2010-11-29 Thread Robert Muir
On Mon, Nov 29, 2010 at 5:35 PM, Jacob Elder jel...@locamoda.com wrote: StandardTokenizer doesn't handle some of the tokens we need, like @twitteruser, and as far as I can tell, doesn't handle Chinese, Japanese or Korean. Am I wrong about that? it uses the unigram method for CJK ideographs...

Re: Good example of multiple tokenizers for a single field

2010-11-29 Thread Jonathan Rochkind
You can only use one tokenizer on a given field, I think. But a tokenizer isn't in fact the only thing that can tokenize; an ordinary filter can change tokenization too, so you could use two filters in a row. You could also write your own custom tokenizer that does what you want, although I'm

Re: Good example of multiple tokenizers for a single field

2010-11-29 Thread Robert Muir
On Mon, Nov 29, 2010 at 5:41 PM, Jonathan Rochkind rochk...@jhu.edu wrote: * As a tokenizer, I use the WhitespaceTokenizer. * Then I apply a custom filter that looks for CJK chars, and re-tokenizes any CJK chars into one-token-per-char. This custom filter was written by someone other than
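The approach quoted above, whitespace tokenizing and then re-tokenizing CJK characters one-per-token, can be sketched in Python (a rough simulation only; the real thing would be a Lucene TokenFilter in Java, and this check covers just the CJK Unified Ideographs block, not Hiragana, Katakana, Hangul, or the extension blocks):

```python
import re

# Simplified: CJK Unified Ideographs only; a production filter would
# cover more Unicode blocks.
CJK = re.compile(r'[\u4e00-\u9fff]')

def cjk_aware_tokenize(text):
    """Whitespace-tokenize, then emit CJK chars one token per char (unigrams)."""
    out = []
    for word in text.split():
        buf = ''
        for ch in word:
            if CJK.match(ch):
                if buf:          # flush any pending latin run
                    out.append(buf)
                    buf = ''
                out.append(ch)   # one token per ideograph
            else:
                buf += ch
        if buf:
            out.append(buf)
    return out

print(cjk_aware_tokenize('solr 日本語 search'))
# ['solr', '日', '本', '語', 'search']
```

This keeps latin tokens (including things like @twitteruser) intact while still producing the unigram tokens that make CJK text searchable.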

RE: solr admin

2010-11-29 Thread Papp Richard

RE: special sorting

2010-11-29 Thread Papp Richard

Re: Solr DataImportHandler (DIH) and Cassandra

2010-11-29 Thread Mark
The DataSource subclass route is what I will probably be interested in. Are there are working examples of this already out there? On 11/29/10 12:32 PM, Aaron Morton wrote: AFAIK there is nothing pre-written to pull the data out for you. You should be able to create your DataSource sub class

RE: solr admin

2010-11-29 Thread Ahmet Arslan
in Solr admin (http://localhost:8180/services/admin/) I can specify something like: +category_id:200 +xxx:300, but how can I specify a sort option? sort:category_id+asc There is a [FULL INTERFACE] /admin/form.jsp link, but it does not have a sort option. It seems that you need to append
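Ahmet's truncated answer presumably means appending a sort parameter to the request URL rather than embedding it in the query string. A sketch with Python's stdlib (the host, handler path, and field names are the poster's examples, not verified endpoints):

```python
from urllib.parse import urlencode

# sort is a separate request parameter, not part of q
params = {'q': '+category_id:200 +xxx:300', 'sort': 'category_id asc'}
url = 'http://localhost:8180/services/admin/select?' + urlencode(params)
print(url)
# the space in "category_id asc" is URL-encoded as "+"
```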

Re: solr admin

2010-11-29 Thread Yonik Seeley
On Mon, Nov 29, 2010 at 8:02 PM, Ahmet Arslan iori...@yahoo.com wrote: in Solr admin (http://localhost:8180/services/admin/) I can specify something like: +category_id:200 +xxx:300 but how can I specify a sort option? sort:category_id+asc There is an [FULL INTERFACE] /admin/form.jsp link

Re: Spell checking question from a Solr novice

2010-11-29 Thread Bill Dueber
On Mon, Oct 18, 2010 at 5:24 PM, Jason Blackerby jblacke...@gmail.com wrote: If you know the misspellings you could prevent them from being added to the dictionary with a StopFilterFactory like so: Or, you know, correct the data :-) -- Bill Dueber Library Systems Programmer University of

Re: question about Solr SignatureUpdateProcessorFactory

2010-11-29 Thread Chris Hostetter
: Why is also the field name (* above) added to the signature : and not only the content of the field? : : By purpose or by accident? It was definitely deliberate. This way if your signature fields are fieldA,fieldB,fieldC then these two documents... Doc1:fielda:XXX
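Chris's point, that mixing the field name into the signature keeps same-value-different-field documents from colliding, can be sketched in Python (a rough illustration only; Solr's actual SignatureUpdateProcessorFactory uses its own Signature implementations, and the exact byte layout here is an assumption):

```python
import hashlib

def signature(doc, sig_fields):
    """Hash field *names* along with values, as the Solr factory
    deliberately does, so the signature is field-sensitive."""
    h = hashlib.md5()
    for name in sig_fields:
        if name in doc:
            h.update(name.encode('utf-8'))
            h.update(str(doc[name]).encode('utf-8'))
    return h.hexdigest()

fields = ['fieldA', 'fieldB', 'fieldC']
doc1 = {'fieldA': 'XXX'}
doc2 = {'fieldB': 'XXX'}
# Same value in a different field -> different signature, so the two
# documents are not treated as duplicates.
print(signature(doc1, fields) != signature(doc2, fields))  # True
```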

Re: search strangeness

2010-11-29 Thread ramzesua
Hi, Erick. There is a defaultSearchField in my schema.xml. Can you give me your example configuration for the text field? (What filters do you use for index and for query?) -- View this message in context: http://lucene.472066.n3.nabble.com/search-strangeness-tp1986895p1989466.html Sent from the Solr -

Re: Good example of multiple tokenizers for a single field

2010-11-29 Thread Shawn Heisey
On 11/29/2010 3:15 PM, Jacob Elder wrote: I am looking for a clear example of using more than one tokenizer for a source single field. My application has a single body field which until recently was all latin characters, but we're now encountering both English and Japanese words in a single