Re: What's wrong

2015-06-25 Thread Test Test
Hi, you're right. I've put my field in the text_general type and I found the problem. For example, I have this as an index: london, , ,capital, , ,populous,city, ,united,kingdom. When I mock it like this: clauses[0] = new SpanTermQuery(new Term("details", "populous")); clauses[1] = new SpanTermQuery(new
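The empty slots in that index appear to mark positions that hold no term (for example, stopwords removed by the analyzer), and those positions still count when span queries measure distance. The sketch below is a minimal plain-Java model of an ordered proximity check. It is not Lucene's SpanNearQuery code, and the class and method names are hypothetical. It shows the effect: "populous" and "city" are adjacent, but "capital ... city" skips three positions, so in this model an in-order span query over those two terms would need a slop of at least 3.

```java
import java.util.Arrays;
import java.util.List;

/**
 * Simplified model of an ordered proximity ("span near") match, not Lucene's code.
 * Each list index is a token position; "" marks a position that holds no
 * indexed term (for example a stopword removed by the analyzer).
 */
public class SpanGapSketch {

    // The indexed value from the message, with the empty slots kept as "".
    static final List<String> INDEXED = Arrays.asList(
            "london", "", "", "capital", "", "", "populous", "city", "", "united", "kingdom");

    /**
     * Smallest number of positions skipped between consecutive terms when they
     * occur in the given order, or -1 if there is no such occurrence.
     * In this model an ordered near-query with slop s matches when the result is <= s.
     */
    static int minOrderedGap(List<String> tokens, String... terms) {
        int best = -1;
        for (int start = 0; start < tokens.size(); start++) {
            if (!tokens.get(start).equals(terms[0])) continue;
            int prev = start;
            int gap = 0;
            boolean matched = true;
            for (int t = 1; t < terms.length; t++) {
                // Earliest occurrence of the next term after the previous one.
                int offset = tokens.subList(prev + 1, tokens.size()).indexOf(terms[t]);
                if (offset < 0) { matched = false; break; }
                int next = prev + 1 + offset;
                gap += next - prev - 1; // empty or unrelated positions in between
                prev = next;
            }
            if (matched && (best < 0 || gap < best)) best = gap;
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println("populous city -> " + minOrderedGap(INDEXED, "populous", "city"));
        System.out.println("capital city  -> " + minOrderedGap(INDEXED, "capital", "city"));
        System.out.println("city populous -> " + minOrderedGap(INDEXED, "city", "populous"));
    }
}
```

Picking the earliest next occurrence is enough here. Once the first term is fixed, the total gap depends only on where the last term lands.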

Re: How to do a Data sharding for data in a database table

2015-06-25 Thread Erick Erickson
bq: Try not to store fields as much as possible. Why? Storing fields certainly adds lots of size to the _disk_ files, but it has much less effect on memory requirements than one might think. The *.fdt and *.fdx files in your index are used for the stored data, and they're only read for the top N
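Erick's point is that stored data is read only for the documents actually returned, not for every match. The sketch below is a toy plain-Java model of that flow, not Solr's code, and its class and method names are hypothetical. It scores every document and keeps the best N in a min-heap. Only after that does it "load" stored data, and it counts each load.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/** Toy model: score everything, but read stored data only for the top N hits. */
public class TopNFetchSketch {

    static int storedReads = 0;

    /** Stand-in for reading a document's stored fields from disk. */
    static String loadStored(int docId) {
        storedReads++;
        return "stored-doc-" + docId;
    }

    static List<String> search(double[] scores, int n) {
        // Min-heap on score: the weakest of the current top N sits at the head.
        PriorityQueue<Integer> heap =
                new PriorityQueue<>(Comparator.comparingDouble((Integer d) -> scores[d]));
        for (int doc = 0; doc < scores.length; doc++) {
            heap.offer(doc);
            if (heap.size() > n) heap.poll(); // drop the lowest-scoring candidate
        }
        List<Integer> top = new ArrayList<>(heap);
        top.sort(Comparator.comparingDouble((Integer d) -> scores[d]).reversed());
        List<String> results = new ArrayList<>();
        for (int doc : top) results.add(loadStored(doc)); // stored reads happen only here
        return results;
    }

    public static void main(String[] args) {
        double[] scores = new double[100_000];
        for (int i = 0; i < scores.length; i++) scores[i] = (i * 7919) % 1000;
        List<String> hits = search(scores, 10);
        System.out.println(hits.size() + " hits, " + storedReads + " stored reads");
    }
}
```

In this model, 100,000 scored documents with N = 10 produce only 10 stored-field loads. That fits Erick's point that stored fields cost disk space far more than per-query memory.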

Re: fq versus q

2015-06-25 Thread Esther Goldbraich
Thank you all for the collaborative thinking! I ran additional benchmarks as proposed. Some results (times in ms):
All Solr caches enabled (queryResultCache hit ratio = 0.02):
                    q    fq {!cache=false}    delta
original query     28                  295      267
w/o grouping       58                  325      267
w/o sort on date   28                  293      265
All Solr caches disabled
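The delta column is the fq time minus the q time. Recomputing it in a few lines of Java (the class name is hypothetical) confirms the caches-enabled figures. It also shows that the gap stays between 265 and 267 ms whether grouping or the date sort is switched off.

```java
/** Recomputes the "delta" column: fq time minus q time, in ms, caches enabled. */
public class FqDeltaCheck {

    static int delta(int qMs, int fqMs) {
        return fqMs - qMs;
    }

    public static void main(String[] args) {
        String[] rows = {"original query", "w/o grouping", "w/o sort on date"};
        int[][] times = {{28, 295}, {58, 325}, {28, 293}}; // {q, fq} per row
        for (int i = 0; i < rows.length; i++) {
            System.out.printf("%-17s delta = %d ms%n", rows[i], delta(times[i][0], times[i][1]));
        }
    }
}
```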

Re: How to do a Data sharding for data in a database table

2015-06-25 Thread wwang525
schema.xml http://lucene.472066.n3.nabble.com/file/n4213864/schema.xml solrconfig.xml http://lucene.472066.n3.nabble.com/file/n4213864/solrconfig.xml

Re: fq versus q

2015-06-25 Thread Erick Erickson
Side note on dates and fqs. If you're using NOW in your date expressions you may be able to re-use fqs by using date math, see: https://lucidworks.com/blog/date-math-now-and-filter-queries/ Of course this may not be applicable in your situation... FWIW, Erick On Thu, Jun 25, 2015 at 8:03 AM,
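The idea behind that post is that a bare NOW resolves to the current millisecond, so each request produces a different filter. A rounded expression such as NOW/DAY stays identical for a whole day and can be reused from the filterCache, for example fq=timestamp:[NOW/DAY-7DAYS TO NOW/DAY+1DAY], where the field name is hypothetical. The sketch below is a toy java.time model, not Solr's date-math parser, and its class name is hypothetical. It compares the lower bounds of NOW-7DAYS and NOW/DAY-7DAYS for two requests one hour apart, rounding in UTC as Solr does by default.

```java
import java.time.Duration;
import java.time.Instant;
import java.time.temporal.ChronoUnit;

/** Toy model of why rounded date math lets identical filters repeat across requests. */
public class DateMathSketch {

    /** Lower bound of "NOW-7DAYS": changes with every millisecond of NOW. */
    static Instant unrounded(Instant now) {
        return now.minus(Duration.ofDays(7));
    }

    /** Lower bound of "NOW/DAY-7DAYS": NOW is truncated to the start of the UTC day first. */
    static Instant roundedToDay(Instant now) {
        return now.truncatedTo(ChronoUnit.DAYS).minus(Duration.ofDays(7));
    }

    public static void main(String[] args) {
        Instant t1 = Instant.parse("2015-06-25T08:03:00Z");
        Instant t2 = t1.plusSeconds(3600); // the same query issued an hour later
        System.out.println("NOW-7DAYS     same bound: " + unrounded(t1).equals(unrounded(t2)));
        System.out.println("NOW/DAY-7DAYS same bound: " + roundedToDay(t1).equals(roundedToDay(t2)));
    }
}
```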

Re: How to do a Data sharding for data in a database table

2015-06-25 Thread Wenbin Wang
Hi Erick, The configuration is largely the default one, and I have not made many changes. I am also quite new to Solr, although I have a lot of experience with other search products. The whole list of fields needs to be retrieved, so I do not have much of a choice. The total size of the index files

Re: fq versus q

2015-06-25 Thread Shai Erera
The tables came across corrupt; here they are (times in ms):
Caches enabled:
                    q     fq   delta
original query     28    295     267
w/o grouping       58    325     267
w/o sort on date   28    293     265
Caches disabled:
                    q     fq   delta
original query

RE: Tokenizer and Filter Factory to index Chinese characters

2015-06-25 Thread Markus Jelsma
You may also want to try Paoding if you have enough time to spend: https://github.com/cslinmiso/paoding-analysis -Original message- From:Zheng Lin Edwin Yeo edwinye...@gmail.com Sent: Thursday 25th June 2015 11:38 To: solr-user@lucene.apache.org Subject: Re: Tokenizer and Filter

Re: /suggest through SolrJ?

2015-06-25 Thread Alessandro Benedetti
I have provided a patch. Actually, I am not sure about the contribution process; I will read the documentation. Can anybody give me feedback? https://issues.apache.org/jira/browse/SOLR-7719 Cheers 2015-06-24 14:31 GMT+01:00 Alessandro Benedetti benedetti.ale...@gmail.com :
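Until SolrJ has first-class support (the SOLR-7719 patch above), the suggester can still be called over plain HTTP. The sketch below only builds the request URL. It assumes a SuggestComponent registered behind a /suggest handler, a dictionary named mySuggester, and a core named collection1. All three names are hypothetical and must match your solrconfig.xml. The parameters suggest, suggest.dictionary and suggest.q are the SuggestComponent's request parameters.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

/** Builds a /suggest request URL by hand (handler, core and dictionary names are assumptions). */
public class SuggestUrlSketch {

    static String suggestUrl(String solrBase, String core, String dictionary, String prefix) {
        Map<String, String> params = new LinkedHashMap<>(); // keeps parameter order stable
        params.put("suggest", "true");
        params.put("suggest.dictionary", dictionary);
        params.put("suggest.q", prefix);
        params.put("wt", "json");
        StringJoiner query = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            query.add(encode(e.getKey()) + "=" + encode(e.getValue()));
        }
        return solrBase + "/" + core + "/suggest?" + query;
    }

    private static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(suggestUrl("http://localhost:8983/solr", "collection1", "mySuggester", "lond"));
    }
}
```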

Re: How to do a Data sharding for data in a database table

2015-06-25 Thread Erick Erickson
You're missing the point. One of the things that can really affect response time is too-frequent commits. The fact that the commit configurations have been commented out indicates that the commits are happening either manually (curl, HTTP request or the like) _or_ you have, say, a SolrJ client that

solr 4.8.1 to 4.10.4 upgrade / luceneMatchVersion

2015-06-25 Thread Christine Poerschke (BLOOMBERG/ LONDON)
Hello. We would like to upgrade from solr 4.8.1 to 4.10.4 and for existing collections (at least initially) continue to use the 4.8 lucene format rather than the latest 4.10 format. Two main reasons for the preference: (a) no need to worry about a not-yet-upgraded 4.8 replica recovering from

Re: solr 4.8.1 to 4.10.4 upgrade / luceneMatchVersion

2015-06-25 Thread Shawn Heisey
On 6/25/2015 9:08 AM, Christine Poerschke (BLOOMBERG/ LONDON) wrote: We would like to upgrade from solr 4.8.1 to 4.10.4 and for existing collections (at least initially) continue to use the 4.8 lucene format rather than the latest 4.10 format. Two main reasons for the preference: (a) no

Re: How to do a Data sharding for data in a database table

2015-06-25 Thread Shawn Heisey
On 6/25/2015 10:27 AM, Wenbin Wang wrote: To clarify the work: We are very early in the investigative phase, and the indexing is NOT done continuously. I indexed the data once through the Admin UI and tested the query. If I need to index again, I can use curl or the Admin UI. Solr

Re: How to do a Data sharding for data in a database table

2015-06-25 Thread Wenbin Wang
To clarify the work: We are very early in the investigative phase, and the indexing is NOT done continuously. I indexed the data once through the Admin UI and tested the query. If I need to index again, I can use curl or the Admin UI. Solr 4.7 seems to have a default setting of

Re: How to do a Data sharding for data in a database table

2015-06-25 Thread Wenbin Wang
Hi Guys, I have no problem changing it to 2. However, we are talking about two different applications. Solr 4.7 ships with two applications: example and example-DIH. The example-DIH application is the one I started with, since it works with a database. The example-DIH application has a default setting of 4.

Exact phrase search on very large text

2015-06-25 Thread Mike Thomsen
I need to be able to do exact phrase searching on some documents that are a few hundred KB when treated as a single block of text. I'm on 4.10.4, and it complains when I try to index something larger than 32KB in a text field that uses the keyword tokenizer. Is there any way I can
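With the keyword tokenizer, the whole field value becomes one term. Lucene rejects any term longer than 32766 bytes of UTF-8 (IndexWriter.MAX_TERM_LENGTH), and that is the limit behind the roughly 32KB error. A quick plain-Java check (the class name is hypothetical) shows that the limit counts bytes, not characters. Text in CJK scripts therefore hits it at about a third of the character count.

```java
import java.nio.charset.StandardCharsets;

/** Checks whether a whole text would fit in a single Lucene term. */
public class TermLengthCheck {

    /** Lucene's maximum indexed term length, in UTF-8 bytes (IndexWriter.MAX_TERM_LENGTH). */
    static final int MAX_TERM_BYTES = 32766;

    static boolean fitsInOneTerm(String text) {
        return text.getBytes(StandardCharsets.UTF_8).length <= MAX_TERM_BYTES;
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder();
        while (sb.length() < 200_000) sb.append("the quick brown fox "); // a few hundred KB of text
        String doc = sb.toString();
        System.out.println(doc.getBytes(StandardCharsets.UTF_8).length
                + " bytes, fits in one term: " + fitsInOneTerm(doc));
    }
}
```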

Re: [ANNOUNCE] Book: Apache Solr Enterprise Search Server, Third Edition

2015-06-25 Thread david.w.smi...@gmail.com
If you order on Packt’s site then you can save a lot with these discount codes — good thru July 30th: 30% on Print books: ASESP30 20% on Ebooks: ASESE20 https://www.packtpub.com/big-data-and-business-intelligence/apache-solr-enterprise-search-server-third-edition ~ David On Mon, Jun 22, 2015

Re: fq versus q

2015-06-25 Thread Esther Goldbraich
cache=false because the use case requires distinct time ranges, so there is no reuse. When using fq, q is set to *:*. Are there any alternatives for the grouping algorithm? If not, is there a way to reuse filter results between the 2 passes? Thank you, Esther From: Yonik Seeley ysee...@gmail.com To:
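The two request shapes being compared can be written out as parameter maps. In the first, the time range is the main query. In the second, q=*:* and the range moves into a filter marked {!cache=false}, which keeps it out of the filterCache. The thread does not show the real fields, so "timestamp" and "groupField" below are placeholders, and the class name is hypothetical. group and group.field are Solr's grouping parameters.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Builds the two request shapes compared in this thread (field names are placeholders). */
public class FqVersusQSketch {

    // Grouping parameters shared by both variants.
    static Map<String, String> baseParams() {
        Map<String, String> p = new LinkedHashMap<>();
        p.put("group", "true");
        p.put("group.field", "groupField");
        return p;
    }

    /** Variant 1: the time range is the main query. */
    static Map<String, String> rangeInQ(String range) {
        Map<String, String> p = baseParams();
        p.put("q", range);
        return p;
    }

    /** Variant 2: match-all query plus the range as an uncached filter. */
    static Map<String, String> rangeInFq(String range) {
        Map<String, String> p = baseParams();
        p.put("q", "*:*");
        p.put("fq", "{!cache=false}" + range);
        return p;
    }

    public static void main(String[] args) {
        String range = "timestamp:[2015-06-01T00:00:00Z TO 2015-06-25T00:00:00Z]";
        System.out.println(rangeInQ(range));
        System.out.println(rangeInFq(range));
    }
}
```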

Re: How to do a Data sharding for data in a database table

2015-06-25 Thread William Bell
1GB is too small to start. Try setting both to the same value: -Xms8196m -Xmx8196m We use 12GB for these on a similarly sized index and it works well. Send schema.xml and solrconfig.xml. Try not to store fields as much as possible. On Wed, Jun 24, 2015 at 8:08 AM, wwang525 wwang...@gmail.com wrote:
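To confirm that a heap setting like this actually took effect, the JVM reports its ceiling through Runtime.maxMemory(). The minimal sketch below (the class name is hypothetical) checks only the JVM it runs in. For Solr's own heap, the JVM memory figures on the admin UI dashboard serve the same purpose. Depending on the garbage collector, the reported value can come out slightly below the -Xmx value.

```java
/** Prints the maximum heap the running JVM will use (roughly the -Xmx value). */
public class HeapCheck {

    static long maxHeapMb() {
        // Runtime.maxMemory() returns Long.MAX_VALUE if the JVM has no limit.
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        System.out.println("Max heap: " + maxHeapMb() + " MB");
    }
}
```

Running it as `java -Xms8196m -Xmx8196m HeapCheck` should print a value close to 8196 MB.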

Re: DIH deletes cause opening of searchers

2015-06-25 Thread Mikhail Khludnev
On Tue, Jun 23, 2015 at 9:23 AM, Rudolf Grigeľ grige...@gmail.com wrote: How can I prevent opening a new searcher after every delete statement? Comment out the updateLog tag in solrconfig.xml (it always helps). -- Sincerely yours Mikhail Khludnev Principal Engineer, Grid Dynamics

Re: Sorting documents by nested / child docs with FunctionQueries

2015-06-25 Thread Mikhail Khludnev
No way, it's SOLR-6096, aka SOLR-6700. On Thu, Jun 25, 2015 at 9:16 AM, מאיה גלעד maiki...@gmail.com wrote: Hey, your example works on my cloud, but my problem isn't resolved. I've checked and found the following: 1. When a child is created with multiple values it can be queried correctly with

Tokenizer and Filter Factory to index Chinese characters

2015-06-25 Thread Zheng Lin Edwin Yeo
Hi, Does anyone know the correct replacement for these 2 tokenizer and filter factories to index Chinese into Solr?
- SmartChineseSentenceTokenizerFactory
- SmartChineseWordTokenFilterFactory
I understand that these 2 factories are already deprecated in Solr 5.1, but I

Re: Tokenizer and Filter Factory to index Chinese characters

2015-06-25 Thread Zheng Lin Edwin Yeo
Thank you. I've tried that, but when I do a search, it returns many more highlighted results than it should. For example, if I enter the following query (我国 means "our country"): http://localhost:8983/solr/chinese1/highlight?q=我国 I get the following results: highlighting:{ chinese1:{

Re: MappingCharFilterFactory and start and end offsets

2015-06-25 Thread Dmitry Kan
Hi Steve, Sorry for the late reply; I've been quite busy. I had afterthoughts immediately after sending the question, in line with what you said: I meant the source token start and end offset positions. When MCFF is removed, the $ disappears after ST and start and end offsets of all the terms are
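A char filter changes the text before the tokenizer sees it, so token offsets computed on the filtered text have to be mapped back to the source text. The sketch below is a simplified plain-Java model of that bookkeeping, not Lucene's MappingCharFilter or its correctOffset logic. It deletes one character and records, for each surviving character, where it came from. The class name and the sample input "ST$ 42" are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Simplified model of a char filter that deletes one character and can map
 * offsets in the filtered text back to offsets in the original text.
 */
public class OffsetCorrectionSketch {

    private final String filtered;
    private final List<Integer> sourceIndex = new ArrayList<>(); // source position of each kept char
    private final int sourceLength;

    OffsetCorrectionSketch(String source, char toDelete) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < source.length(); i++) {
            if (source.charAt(i) != toDelete) {
                sb.append(source.charAt(i));
                sourceIndex.add(i);
            }
        }
        filtered = sb.toString();
        sourceLength = source.length();
    }

    String filtered() { return filtered; }

    /** Maps an offset in the filtered text back to the original text. */
    int correctOffset(int filteredOffset) {
        // An offset at the very end of the filtered text maps to the end of the source.
        return filteredOffset < sourceIndex.size() ? sourceIndex.get(filteredOffset) : sourceLength;
    }

    public static void main(String[] args) {
        OffsetCorrectionSketch cf = new OffsetCorrectionSketch("ST$ 42", '$');
        // The tokenizer sees "ST 42": "ST" is [0,2) and "42" is [3,5) in the filtered text.
        System.out.println("filtered: " + cf.filtered());
        System.out.println("ST -> [" + cf.correctOffset(0) + "," + cf.correctOffset(2) + ")");
        System.out.println("42 -> [" + cf.correctOffset(3) + "," + cf.correctOffset(5) + ")");
    }
}
```

In this model, an end offset maps to the source position of the next surviving character. The corrected span of "ST" is therefore [0,3), which covers the deleted $. Other bookkeeping choices would give [0,2) instead.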

RE: Tokenizer and Filter Factory to index Chinese characters

2015-06-25 Thread Markus Jelsma
Hi - we are actually using some other filters for Chinese, although they are not specialized for Chinese:
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.CJKWidthFilterFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter
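Before any later filter in that chain runs, StandardTokenizerFactory applies the Unicode word-break rules (UAX #29). Under those rules, each Han ideograph becomes a token of its own, so 我国 is analyzed as 我 and 国. The sketch below is a simplified plain-Java imitation of just that behavior, not Lucene's tokenizer. Non-Han text is grouped crudely into letter/digit runs, and the class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Crude imitation of per-ideograph tokenization for Han text. */
public class HanUnigramSketch {

    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder run = new StringBuilder(); // current run of non-Han letters/digits
        int i = 0;
        while (i < text.length()) {
            int cp = text.codePointAt(i);
            if (Character.UnicodeScript.of(cp) == Character.UnicodeScript.HAN) {
                flush(run, tokens);
                tokens.add(new String(Character.toChars(cp))); // one token per ideograph
            } else if (Character.isLetterOrDigit(cp)) {
                run.appendCodePoint(cp);
            } else {
                flush(run, tokens); // punctuation and spaces end a run
            }
            i += Character.charCount(cp);
        }
        flush(run, tokens);
        return tokens;
    }

    private static void flush(StringBuilder run, List<String> tokens) {
        if (run.length() > 0) {
            tokens.add(run.toString());
            run.setLength(0);
        }
    }

    public static void main(String[] args) {
        System.out.println(tokenize("我国1月份"));
    }
}
```

Whether these single-character tokens are later recombined depends on the filters that follow, and the message above is cut off before those.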

RE: Tokenizer and Filter Factory to index Chinese characters

2015-06-25 Thread Markus Jelsma
Hello - you can use HMMChineseTokenizerFactory instead. http://lucene.apache.org/core/5_2_0/analyzers-smartcn/org/apache/lucene/analysis/cn/smart/HMMChineseTokenizerFactory.html -Original message- From:Zheng Lin Edwin Yeo edwinye...@gmail.com Sent: Thursday 25th June 2015 11:02 To:

Re: Tokenizer and Filter Factory to index Chinese characters

2015-06-25 Thread Zheng Lin Edwin Yeo
Hi, The results don't seem that good either. But you're not using the HMMChineseTokenizerFactory? The output below is from the filters you've shown me. highlighting:{ chinese1:{ id:[chinese1], title:[<em>我国</em>1<em>月份的制造业产值同比仅增长</em>0],