Re: How much free disk space will I need to optimize my index

2014-06-26 Thread Thomas Egense
That is correct, but twice the disk space is theoretically not enough. The worst case is actually three times the storage; I guess this worst case can happen if you also submit new documents to the index while optimizing. I have experienced 2.5 times the disk space during an optimize for a large

Two solr instances access common index

2014-06-26 Thread Prasi S
Hi, is it possible to point two Solr instances to a common index directory? Will this work with changing the lock type? Thanks, Prasi

Stopwords

2014-06-26 Thread Geert Van Huychem
Hello, we have the default Dutch stopwords implemented in our Solr instance, so words like 'de', 'het', 'ben' are filtered at index time. Is there a way to trick Solr into ignoring those stopwords at query time when users put the search terms between quotes? Best, Geert

Re: Stopwords

2014-06-26 Thread Shuai Zhang
Hi, in fact, you can use the analysis page to check the result of the query or index process! -- Gabriel Zhang On Thursday, June 26, 2014 5:33 PM, Geert Van Huychem ge...@iframeworx.be wrote: Hello We have the default dutch stopwords implemented in our Solr instance, so words like ‘de’,

Re: Stopwords

2014-06-26 Thread David Stuart
Hi, not really, as the words don’t exist in the corpus field. The way we have got around it in the past is to have another non-stopped field that is also searched on (in addition to the stopped field) with a boost to the score for matches. As a slight alternative you could do the above
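
The two-field approach described in this reply could be sketched as query parameters for the edismax parser. This is only an illustration: the field names `text_stopped` and `text_unstopped` and the boost value are hypothetical and not taken from the thread, and it assumes both fields exist in the schema with the unstopped one indexed without a stop filter.

```python
from urllib.parse import urlencode


def build_stopword_query(user_query,
                         stopped_field="text_stopped",      # hypothetical field name
                         unstopped_field="text_unstopped",  # hypothetical field name
                         boost=2.0):
    """Return a URL query string that searches both fields via edismax.

    Matches in the unstopped field get a higher weight, so a phrase that
    contains stopwords can still score well when it matches exactly.
    """
    params = {
        "q": user_query,
        "defType": "edismax",
        # qf lists the fields to search; ^boost raises the unstopped field's weight.
        "qf": f"{stopped_field} {unstopped_field}^{boost}",
    }
    return urlencode(params)


if __name__ == "__main__":
    print(build_stopword_query('"de het"'))
```

The resulting string would be appended to a select URL; whether a boost of 2.0 is appropriate depends on the data and would need tuning.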

Re: Two solr instances access common index

2014-06-26 Thread Uwe Reh
Hi, with the lock type 'simple' I have three instances (different JREs, GC problem) running on the same files. You should use this option only for a read-only system. Otherwise it's easy to corrupt the index. Maybe you should have a look at replication or SolrCloud. Uwe On 26.06.2014 11:25,

Solr custom Tokenizer Factory works randomly

2014-06-26 Thread Gotz SE
I am new to Solr and I have to write a filter to lemmatize text when indexing documents and also to lemmatize queries. I created a custom Tokenizer Factory to lemmatize the text before passing it to the Standard Tokenizer. Tests in the Solr analysis section work fairly well (on index ok, but on

Re: Two solr instances access common index

2014-06-26 Thread Prasi S
Can you please tell me which Solr version you have tried with? I tried giving <lockType>${solr.lock.type:none}</lockType> in 2 Solr instances and now it is working. I am not getting the write lock exception when starting the second instance. But my scenario is that both solr instances would write to

Re: TokenFilter not working at index time

2014-06-26 Thread Erlend Garåsen
I found the root of the problem. This is very strange, but I guess someone can explain to me why this happens. Take a look at the static block in my factory: http://folk.uio.no/erlendfg/solr/NorwegianLemmatizerFilterFactory.java static { ... } If I remove this block and return a stemmed

Solr Fields Multilingue

2014-06-26 Thread benjelloun
Hello, I have 5000 French, Arabic, and English documents. My schema.xml contains 300 fields for French documents. Example: <field name="ContenuDocument" type="text_fr" multiValued="false" indexed="true" required="false" stored="true"/> So what I need to do is detect the language of the document before indexing, then i

RE: ICUTokenizer or StandardTokenizer or ??? for text_all type field that might include non-whitespace langs

2014-06-26 Thread Allison, Timothy B.
Thank you, Alex, Kuro and Simon. I've had a chance to look into this a bit more. I was under the (wrong) belief that the ICUTokenizer splits on individual Chinese characters like the StandardAnalyzer after (mis)reading these two sources

Re: Spellchecker causing 500 (ISE)

2014-06-26 Thread Meraj A. Khan
Can you share your configuration with us? Have you modified the Solr source code in any way? On Thu, Jun 26, 2014 at 1:06 AM, Aman Tandon amantandon...@gmail.com wrote: Hi, We are getting the results for the query but the spellchecker component is returning 500. Please help us out.

Re: Solr Fields Multilingue

2014-06-26 Thread Aman Tandon
Hi, I guess this link https://wiki.apache.org/solr/LanguageDetection may help you. On Jun 26, 2014 6:12 PM, benjelloun anass@gmail.com wrote: Hello, I have 5000 French, Arabic, English documents. my schema.xml contain 300 fields for French Documents. exemple: field name=ContenuDocument

how to log ngroups

2014-06-26 Thread Aman Tandon
Hi, I am grouping my results and also applying the group limit. Is there any way to log ngroups along with hits?

Re: How much free disk space will I need to optimize my index

2014-06-26 Thread Walter Underwood
The 3x worst case is: 1. All documents are in one segment. 2. Without merging, all documents are deleted, then re-added and committed. 3. A merge is done. At the end of step 2, there are two equal-sized segments, 2X the space needed. During step 3, a third segment of that size is created. This
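
The arithmetic behind the three steps listed in this message can be worked through in a few lines. This is a simplified sketch that treats every segment as exactly the size of the live documents it was written from; real segment sizes vary, and the function name is made up for illustration.

```python
def peak_disk_usage(index_size):
    """Walk through the 3x worst case and return (peak usage, final usage)."""
    # Step 1: all documents live in a single segment.
    segments = [index_size]
    peak = sum(segments)

    # Step 2: every document is deleted and re-added without a merge.
    # The old segment still occupies disk (deletes are only marked),
    # and a new segment of equal size holds the re-added documents.
    segments.append(index_size)
    peak = max(peak, sum(segments))

    # Step 3: the merge writes a third segment containing the live documents
    # while both existing segments are still on disk.
    merged = index_size
    peak = max(peak, sum(segments) + merged)

    # After the merge completes, only the merged segment remains.
    segments = [merged]
    return peak, sum(segments)


if __name__ == "__main__":
    peak, final = peak_disk_usage(100)
    print(f"peak={peak} GB, final={final} GB")
```

For a 100 GB index this gives a peak of 300 GB and a final size of 100 GB, which matches the 2X figure at the end of step 2 plus the third segment created during step 3.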

Re: How much free disk space will I need to optimize my index

2014-06-26 Thread johnmunir
Thank you all for the replies and for shedding more light on this topic. A follow-up question: during optimization, if I run out of disk space, what happens other than the optimizer failing? Am I now left with an even larger index than I started with, or am I back to the original non-optimized index

Re: ICUTokenizer or StandardTokenizer or ??? for text_all type field that might include non-whitespace langs

2014-06-26 Thread Shawn Heisey
On 6/26/2014 7:27 AM, Allison, Timothy B. wrote: So, I'm left with this as a candidate for the text_all field (I'll probably add a stop filter, too): <fieldType name="text_all" class="solr.TextField" positionIncrementGap="100"> <analyzer> <tokenizer class="solr.ICUTokenizerFactory"/>

Re: DIH on Solr

2014-06-26 Thread Wolfgang Hoschek
Try this: http://www.cloudera.com/content/cloudera-content/cloudera-docs/Search/latest/Cloudera-Search-User-Guide/Cloudera-Search-User-Guide.html Wolfgang. On Jun 24, 2014, at 11:14 PM, atp annamalai...@hcl.com wrote: Thanks Ahmet, Wolfgang, I have installed hbase-indexer on one the server

Re: Two solr instances access common index

2014-06-26 Thread Erick Erickson
bq: But my scenario is that both solr instances would write to the common directory Do NOT do this. Don't even try. I guarantee Bad Things Will Happen. Why do you want to do this? To save disk space? Accomplish NRT searching on multiple machines? Please define the problem you're trying to solve

Nested doc / Block Join Incorrect Responses

2014-06-26 Thread Elliot Ickovic
Using Solr 4.8.1. I am creating an index containing Solr documents both with and without nested documents. When indexing documents from a single SolrJ client on a single thread, if I do not call commit() after each document add(), I see some erroneous documents returned from my child of or parent

Re: Nested doc / Block Join Incorrect Responses

2014-06-26 Thread Mikhail Khludnev
Hello Elliot, the parent doc is mandatory; you can't omit it. Thus instead of: add() - single001 you have to add() - fakeparent000 : [single001]. There were no plans to support any sort of flexibility there... On Thu, Jun 26, 2014 at 9:52 PM, Elliot Ickovic elliot.icko...@gmail.com wrote: Using

Re: Spellchecker causing 500 (ISE)

2014-06-26 Thread Chris Hostetter
: We are getting the results for the query but the spellchecker component is : returning 500. Please help us out. : : *query*: http://localhostt:8111/solr/srch/select?q=malerkotla&qt=search what version of solr? what does your solrconfig.xml show for /select the spellcheck config? what does

Re: Nested doc / Block Join Incorrect Responses

2014-06-26 Thread Elliot Ickovic
Hi Mikhail, Thank you for the quick response! If instead of: add() - fakeparent000 : [single001] I do: add() - single000 : [fakeChild001], will this prevent the index from appearing corrupted? This way I can retain my logical top level docs. What is the reason I need to add a fake doc? If

Re: Two solr instances access common index

2014-06-26 Thread Jack Krupansky
Erick, I agree, but... wouldn't it be SO COOL if it did work! Avoid all the ridiculous complexity of cloud. Have a temporary lock to permit and exclude updates. -- Jack Krupansky -Original Message- From: Erick Erickson Sent: Thursday, June 26, 2014 12:37 PM To:

Re: Paging while indexes

2014-06-26 Thread Shalin Shekhar Mangar
You can use the Cursor based paging API added in 4.7 which is much more resilient to index updates. See the section titled How cursors are Affected by Index Updates at https://cwiki.apache.org/confluence/display/solr/Pagination+of+Results On Mon, Jun 23, 2014 at 2:08 PM, Bram Van Dam
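
A cursor-based paging loop along the lines of this reply might look like the sketch below. The `cursorMark` request parameter and the `nextCursorMark` response key are the ones Solr uses for cursors, and the sort is assumed to end on the uniqueKey field (here `id`). Everything else is an assumption made so the example runs without a server: `make_fake_solr` is a hypothetical in-memory stand-in for an HTTP call to a select handler.

```python
def fetch_all(fetch_page, rows=10):
    """Collect every document by following cursor marks until they stop changing."""
    cursor = "*"  # "*" asks for the first page
    docs = []
    while True:
        response = fetch_page({
            "q": "*:*",
            "sort": "id asc",  # cursors require a sort that includes the uniqueKey
            "rows": rows,
            "cursorMark": cursor,
        })
        docs.extend(response["response"]["docs"])
        next_cursor = response["nextCursorMark"]
        if next_cursor == cursor:
            # An unchanged cursor mark means there are no more results.
            break
        cursor = next_cursor
    return docs


def make_fake_solr(ids):
    """Hypothetical in-memory stand-in for Solr, used only to exercise the loop."""
    ordered = sorted(ids)

    def fetch_page(params):
        mark = params["cursorMark"]
        remaining = ordered if mark == "*" else [i for i in ordered if i > mark]
        page = remaining[: params["rows"]]
        next_mark = page[-1] if page else mark
        return {
            "response": {"docs": [{"id": i} for i in page]},
            "nextCursorMark": next_mark,
        }

    return fetch_page


if __name__ == "__main__":
    fake = make_fake_solr([f"doc{n:03d}" for n in range(25)])
    print(len(fetch_all(fake, rows=10)))
```

Because each page is defined by the position after the last document seen rather than by an offset, documents added or removed between requests do not shift later pages the way they do with `start`.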

Re: Two solr instances access common index

2014-06-26 Thread Walter Underwood
Cool? More like generally useless. --wunder On Jun 26, 2014, at 12:44 PM, Jack Krupansky j...@basetechnology.com wrote: Erick, I agree, but... wouldn't it be SO COOL if it did work! Avoid all the ridiculous complexity of cloud. Have a temporary lock to permit and exclude updates. --

Re: Paging while indexes

2014-06-26 Thread Shalin Shekhar Mangar
There's also a new searcher lease feature which might land in Solr in the future. https://issues.apache.org/jira/browse/SOLR-2809 On Fri, Jun 27, 2014 at 1:18 AM, Shalin Shekhar Mangar shalinman...@gmail.com wrote: You can use the Cursor based paging API added in 4.7 which is much more

Getting Solr 4 to index the simple names of files

2014-06-26 Thread jrusnak
I am building an enterprise search engine with Solr 4.8.1 (and the AJAX Solr interface, not relevant to this question though). In doing so, I am attempting to display the file names of each indexed document in my GUI search results. In my GUI, I can successfully display any field that is in

Re: Nested doc / Block Join Incorrect Responses

2014-06-26 Thread Elliot Ickovic
Tried the following: add() - fakeparent000 : [single001] //with new 'doc-type:fakeparent' add() - parent001 : [child001_1, child001_2] commit() Then query: {!child of='doc-type:parent'}doc-type:parent response now contains *fakeparent000*, *single001*, child001_1, child001_2 should only

Re: Search results not as expected.

2014-06-26 Thread Chris Hostetter
: *ab:(system entity) OR ab:authorization* : Number of results returned 2 : which is not expected. : It seems this query makes the previous terms as OR if the next term is : introduced by an OR. in general, that's the way the boolean operators like AND/OR work in all of the various parser

Re: group.ngroups is set to an incorrect value - specific field types

2014-06-26 Thread Chris Hostetter
I think you are correct -- definitely looks like a bug to me... https://issues.apache.org/jira/browse/LUCENE-5790 : Date: Fri, 13 Jun 2014 10:45:12 + : From: 海老澤 志信 shinobu_ebis...@waku-2.com : Reply-To: solr-user@lucene.apache.org : To: solr-user@lucene.apache.org

Re: Two solr instances access common index

2014-06-26 Thread Erick Erickson
bq: Avoid all the ridiculous complexity of cloud And then re-introduce a single point of failure. Bad disk == unfortunate consequences But frankly I don't see why you would ever _need_ to write from two Solr instances. Wouldn't simply having one writer (which you could change when you

Re: Getting Solr 4 to index the simple names of files

2014-06-26 Thread Erick Erickson
Let's back up here. I'm guessing (since you didn't say) that you're using ExtractingRequestHandler here? How are you sending docs to Solr? You can always use literal.filename=whatever. Best, Erick On Thu, Jun 26, 2014 at 2:02 PM, jrusnak jrus...@live.unc.edu wrote: I am building an enterprise
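
The `literal.filename=whatever` suggestion above can be illustrated by building the request URL for the extracting handler. This is a sketch under assumptions: the base URL and core name are placeholders, the handler is assumed to be mapped at `/update/extract`, and the schema is assumed to define a stored field called `filename`. A real request would also need the file body and, typically, a `literal.id` value.

```python
import os
from urllib.parse import urlencode


def extract_url(base_url, path, field="filename"):
    """Build an extract URL that stores the file's simple name in `field`."""
    params = {
        # literal.<fieldname> sets a field value directly on the indexed document.
        f"literal.{field}": os.path.basename(path),
        "commit": "true",
    }
    return f"{base_url}/update/extract?{urlencode(params)}"


if __name__ == "__main__":
    # The base URL below is a placeholder, not taken from the thread.
    print(extract_url("http://localhost:8983/solr/collection1",
                      "/data/docs/report 2014.pdf"))
```

Only the last path component is sent, so the GUI can display the simple name without any post-processing of a full path.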

numFound is changing when start parameter changed

2014-06-26 Thread CONAN
Hi, I use Solr 4.4 with 2 shards and 2 replicas, and I found a problem with SolrCloud search. If I perform a query with start=0 and rows=10 and say fq=ownerId:123, I get numFound=225. If I simply change the start param to start=6, I get numFound=223. And if I change the start param to start=10, I get

Re: Adding router.field property to an existing collection.

2014-06-26 Thread Modassar Ather
Thanks Damien for your response. We have modified our Solr schema a little bit to add router.field. Regards, Modassar On Thu, Jun 26, 2014 at 1:23 AM, Damien Dykman damien.dyk...@gmail.com wrote: -BEGIN PGP SIGNED MESSAGE- Hash: SHA1 Hi Modassar, I ran into the same issue (Solr