Text search NGram

2016-03-07 Thread G, Rajesh
Hi Team, We have the below type and we have indexed the values "title": "Microsoft Visual Studio 2006" and "title": "Microsoft Visual Studio 8.0.61205.56 (2005)". When I search for title:(Microsoft Visual AND Studio AND 2005) I get Microsoft Visual Studio 8.0.61205.56 (2005) as the second

Re: Spatial Search on Postal Code

2016-03-07 Thread Manohar Sripada
Thanks again Emir! I will try this way. Thanks David! It looks like building the polygons at index time is a better option than at query time. Thanks, Manohar On Sat, Mar 5, 2016 at 7:54 PM, david.w.smi...@gmail.com < david.w.smi...@gmail.com> wrote: > Another path to consider is doing this

Re: Text search NGram

2016-03-07 Thread Binoy Dalal
What query parser are you using? Additionally, run the same query with =true and see how your results are being scored to find out why the ms vs 2006 shows up before 2005. On Mon, 7 Mar 2016, 16:14 G, Rajesh, wrote: > Hi Team, > > We have the below type and we have indexed

Re: [Migration Solr4 to Solr5] Collection reload error

2016-03-07 Thread Gerald Reinhart
Hi, To give you some context, we are migrating from Solr4 to Solr5. The client code and the configuration haven't changed, but now we are facing this problem. We have already checked the commit behaviour configuration and it seems good. Here it is: Server side, we have 2 collections

Solr crash

2016-03-07 Thread Mugeesh Husain
Hello everyone, Could you suggest what to do in case of a Solr crash? I am writing a script to handle a Solr crash; what logic should I implement? -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-crash-tp4262090.html Sent from the Solr - User mailing list archive at Nabble.com.

Re: Solr Cloud sharding strategy

2016-03-07 Thread shamik
Thanks Eric and Walter, this is extremely insightful. One last followup question on composite routing. I'm trying to have a better understanding of index distribution. If I use language as a prefix, SolrCloud guarantees that same language content will be routed to the same shard. What I'm curious

Multiple custom Similarity implementations

2016-03-07 Thread Parvesh Garg
Hi, We have a requirement where we want to run an A/B test over multiple Similarity implementations. Is it possible to define multiple similarity tags in the schema.xml file and choose one using a URL parameter? We are using Solr 4.7. Currently, we are planning to have different cores with different

Re: Solr Cloud sharding strategy

2016-03-07 Thread Erick Erickson
What do you mean "the rest of the cluster"? The routing is based on the key provided. All of the "enu" prefixes will go to one of your shards. All the "deu" docs will appear on one shard. All the "esp" will be on one shard. All the "chs" docs will be on one shard. Which shard will each go to?
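
The point that every document sharing a prefix ends up on one shard can be illustrated with a toy model. This is only a sketch: the `toy_shard_for` helper, the CRC32 hash and the modulo step are assumptions made for illustration and are not Solr's actual compositeId hashing. It shows only that the prefix, not the rest of the id, decides the placement.

```python
import zlib


def toy_shard_for(doc_id, num_shards):
    """Pick a shard from the part of the id before '!'.

    Toy hash for illustration only; this is NOT how Solr computes it.
    """
    prefix = doc_id.split("!", 1)[0]
    return zlib.crc32(prefix.encode("utf-8")) % num_shards


# Hypothetical document ids using the language prefixes from the thread.
ids = ["enu!doc1", "enu!doc2", "deu!doc3", "deu!doc4", "esp!doc5", "chs!doc6"]
placement = {doc_id: toy_shard_for(doc_id, num_shards=4) for doc_id in ids}

for doc_id, shard in placement.items():
    print(doc_id, "-> shard", shard)
```

In this toy model, as with any hash-based routing, two different prefixes may land on the same shard; the hash decides the placement, not the person choosing the prefix.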

Re: High Cpu sys usage

2016-03-07 Thread Shawn Heisey
On 3/7/2016 2:23 AM, Toke Eskildsen wrote: > How does this relate to YouPeng reporting that the CPU usage increases? > > This is not a snark. YouPeng mentions kernel issues. It might very well > be that IO is the real problem, but that it manifests in a non-intuitive > way. Before memory-mapping

Currency range Filter taking long time

2016-03-07 Thread stephanustedy
Hi. I have issues with the currency filter. 1. Currency: my field is price_c, which can be USD or local currency. I'm filtering with price_c:[0 TO 500], but sometimes my query takes more than 2 seconds. 2. Warm up: I have master-slave replication with 1 master and 2 slaves with the same spec. Every time

ngrams with position

2016-03-07 Thread elisabeth benoit
Hello, I'm using Solr 4.10.1. I'd like to index words with ngrams of fixed length with a position at the end. For instance, with fixed length 3, Amsterdam would be something like: a0 (two spaces added at the beginning) am1 ams2 mst3 ste4 ter5 erd6 rda7 dam8 am9 (one more space at the end) The number
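
The tokens described in this message can be reproduced outside Solr to check the intended output. The following is a minimal sketch, not a Solr filter: the function name `positional_ngrams` and the lowercasing step are assumptions, while the padding (two spaces in front, one at the end) and the length of 3 come from the example in the message.

```python
def positional_ngrams(word, n=3, pad_left=2, pad_right=1):
    """Return (ngram, position) pairs over the space-padded, lowercased word."""
    padded = " " * pad_left + word.lower() + " " * pad_right
    return [(padded[i:i + n], i) for i in range(len(padded) - n + 1)]


# Render each pair the way the message writes it: padding stripped, position appended.
tokens = [gram.strip() + str(pos) for gram, pos in positional_ngrams("Amsterdam")]
print(" ".join(tokens))
```

For "Amsterdam" this yields the ten tokens listed in the message, from a0 through am9.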

UIMA processing issues with atomic updates

2016-03-07 Thread srinivasarao vundavalli
Hi, I have Solr 5.5.0 configured with UIMA and Tika. I am facing issues when I am doing atomic updates for the documents already indexed. 3 true false

RE: Text search NGram

2016-03-07 Thread G, Rajesh
Hi Binoy, It is the Standard Query Parser. Thanks, Rajesh Corporate Executive Board India Private Limited. Registration No: U741040HR2004PTC035324. Registered office: 6th Floor, Tower B, DLF Building No.10 DLF Cyber City, Gurgaon, Haryana-122002, India. This e-mail and/or its attachments are

Re: Text search NGram

2016-03-07 Thread Emir Arnautovic
Hi Rajesh, It is most likely related to norms - you can try setting omitNorms="true" and reindexing content. Anyway, it is not common to use just ngrams for matching content - in such cases you can expect more surprising ordering/results. You should combine ngram fields with normally

timeAllowed behavior

2016-03-07 Thread Anatoli Matuskova
Hey there, I'm a bit lost with timeAllowed lately. I'm not using SolrCloud and have a monolithic index. I have Solr version 4.5.1 in production. Now I'm testing Solr 5 and timeAllowed seems to behave differently. In 4.5, when it was hit, it used to return the partial results it could collect.

Re: Text search NGram

2016-03-07 Thread Jack Krupansky
Absolutely, but so what? Nothing in any Solr query is going to be based on character position. Also, adding and removing characters in a char filter is a really bad idea if you might want to do highlighting since the token character position would not line up with the original source text. --

RE: Text search NGram

2016-03-07 Thread G, Rajesh
Hi Emir, I got it. Thanks Emir, it was helpful. Thanks, Rajesh

Re: Text search NGram

2016-03-07 Thread Jack Krupansky
The charFilter isn't doing anything useful - the whitespace tokenizer will ignore extra white space anyway. -- Jack Krupansky On Mon, Mar 7, 2016 at 5:44 AM, G, Rajesh wrote: > Hi Team, > > We have the below type and we have indexed the value "title": "Microsoft > Visual

RE: Text search NGram

2016-03-07 Thread G, Rajesh
Hi Emir, Thanks for your email. Can you please help me to understand what you mean by "e.g. boost if matching tokenized fields to make sure exact matches are ordered first"

Re: Text search NGram

2016-03-07 Thread Emir Arnautovic
Hi Rajesh, The solution includes 2 fields - one "ngram" field (like your txt_token) and another "nonngram" field - just tokenized (like your txt_token without the ngram token filter). If you have two documents: 1. ABCDEF 2. ABCD And you are searching for ABCD, if you use only the ngram field, both are
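
The two-field idea can be sketched with a toy scorer. Everything here is an assumption for illustration: the gram sizes (2 to 4), the `exact_boost` value and the overlap-count scoring are not Solr's ranking formula. The sketch only shows why an ngram-only match cannot tell the two documents apart, and how an extra match on a plain tokenized field can put the exact match first.

```python
def ngrams(text, min_n=2, max_n=4):
    """All character ngrams of the lowercased text (gram sizes are assumed)."""
    text = text.lower()
    return {
        text[i:i + n]
        for n in range(min_n, max_n + 1)
        for i in range(len(text) - n + 1)
    }


def toy_score(query, doc, exact_boost=10.0):
    """Count shared ngrams, then add a boost when a whole token equals the query."""
    overlap = len(ngrams(query) & ngrams(doc))            # "ngram" field
    exact = query.lower() in doc.lower().split()          # "nonngram" field
    return overlap + (exact_boost if exact else 0.0)


docs = ["ABCDEF", "ABCD"]
ngram_only = {d: toy_score("ABCD", d, exact_boost=0.0) for d in docs}
combined = sorted(docs, key=lambda d: toy_score("ABCD", d), reverse=True)
print(ngram_only)
print(combined)
```

With the boost set to zero both documents get the same score; with the boost, the document that is exactly ABCD sorts first.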

Re: Text search NGram

2016-03-07 Thread Emir Arnautovic
Not sure I understood the question. What I meant is for you to try setting omitNorms="false" on your txt_token field type if you want to stick with an ngram-only solution:

RE: Text search NGram

2016-03-07 Thread G, Rajesh
Hi Jack, Please correct me if I am wrong. I added the char filter because in the Analyzer [Solr UI] I provided "Microsoft office" in Field Value (Index), and WhitespaceTokenizerFactory produces the below result: Office starts at 10. If I leave additional space, say 2 more spaces, Office starts at
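
Both observations in this exchange, that extra spaces do not change the tokens and that they do shift the start offsets, can be checked with a small stand-in for a whitespace tokenizer. This is a sketch using a Python regular expression, not Solr's WhitespaceTokenizerFactory; the helper name `whitespace_tokens` is hypothetical.

```python
import re


def whitespace_tokens(text):
    """Split on runs of whitespace, keeping each token's start offset."""
    return [(m.group(), m.start()) for m in re.finditer(r"\S+", text)]


single = whitespace_tokens("Microsoft office")      # one space
padded = whitespace_tokens("Microsoft   office")    # two more spaces

print(single)
print(padded)
# Same tokens either way; only the offsets differ.
print([tok for tok, _ in single] == [tok for tok, _ in padded])
```

The token text is identical in both cases, so matching is unaffected; the offsets matter only to features that map tokens back to the source text, such as highlighting.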

RE: Text search NGram

2016-03-07 Thread G, Rajesh
Hi Emir, I have already applied and then I have applied . Is this what you wanted me to have in my config? Thanks Rajesh

Re: Stopping Solr JVM on OOM

2016-03-07 Thread Muhammad Zahid Iqbal
You can use the ping functionality with a time-out that suits your container/web-apps. If it's not working then you can restart your container. Cheers! If there is any other solution, I am interested too. On Fri, Feb 26, 2016 at 2:19 AM, CP Mishra wrote: > Solr & Lucene dev folks

Re: Custom field using PatternCaptureGroupFilterFactory

2016-03-07 Thread Jay Potharaju
Thanks Jack, the problem was my regex. Following regex worked. Jay On Sun, Mar 6, 2016 at 7:43 PM, Jack Krupansky wrote: > The filter name, "Capture Group", says it all - only pattern groups are > captured and you have not specified even a single group. See the

Re: Stopping Solr JVM on OOM

2016-03-07 Thread Shawn Heisey
On 2/25/2016 2:06 PM, Fuad Efendi wrote: > The best practice: do not ever try to catch Throwable or its descendants > Error, VirtualMachineError, OutOfMemoryError, and etc. > > Never ever. > > Also, do not swallow InterruptedException in a loop. > > Few simple rules to avoid hanging application.

Re: Solr Deserialize/Read .fdt file

2016-03-07 Thread Bin Wang
Hi Jack, Thanks a lot for your response. I agree the question is too much into Lucene, which is outside the scope of Solr. However, for those of you who are interested in understanding the Solr index more, here are a few resources to help: (1) Luke : a local

Re: Solrcloud Batch Indexing

2016-03-07 Thread Bin Wang
Hi Eric, Thanks for your quick response. From the data's perspective, we have 300+ million rows and believe it or not, the source data is from a relational database (Hive) and the database is rebuilt every day (I am as frustrated as most of you who read this but it is what it is) and potentially

Re: Solrcloud Batch Indexing

2016-03-07 Thread Erick Erickson
I'm wondering if you need map reduce at all ;)... The Achilles heel with M/R viz: Solr is all the copying around that's done at the end of the cycle. For really large bulk indexing jobs, that's a reasonable price to pay. How many docs and how would you characterize them as far as size, fields,

Re: Custom field using PatternCaptureGroupFilterFactory

2016-03-07 Thread Jack Krupansky
Great. And you shouldn't need the "{1}" - the square brackets match a single character by definition. -- Jack Krupansky On Mon, Mar 7, 2016 at 12:20 PM, Jay Potharaju wrote: > Thanks Jack, the problem was my regex. Following regex worked. > "([a-zA-Z0-9]{1})"
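
The remark about "{1}" is easy to verify. The sketch below uses Python's `re` module rather than the Java regex engine that the Solr filter uses, which is an assumption worth noting, although the `{1}` quantifier means "exactly once" in both. The sample string is made up.

```python
import re

with_quantifier = re.compile(r"([a-zA-Z0-9]{1})")
without_quantifier = re.compile(r"([a-zA-Z0-9])")

sample = "Solr 5.5!"  # hypothetical input
captured_with = with_quantifier.findall(sample)
captured_without = without_quantifier.findall(sample)

print(captured_with)
print(captured_with == captured_without)
```

Both patterns capture the same single characters, so dropping "{1}" changes nothing except readability.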

Solrcloud Batch Indexing

2016-03-07 Thread Bin Wang
Hi there, I have a fairly big data set that I need to quickly index into SolrCloud. I have done some research and none of the options looked really good to me. (1) Kite Morphline: I managed to get it working, the mapreduce finished in a few minutes which is good, however, it took a really long time,

Re: Solr Cloud sharding strategy

2016-03-07 Thread Erick Erickson
20M docs is actually a very small collection by the "usual" Solr standards unless they're _really_ large documents, i.e. large books. Actually, I wouldn't even shard to begin with, it's unlikely that it's necessary and it adds inevitable overhead. If you _must_ shard, just go with <1>, but again

Solr Json API How to escape space in search string

2016-03-07 Thread Iana Bondarska
Hi All, could you please tell me if escaping special characters in search keywords works in json api. e.g. I have document { "string_s":"new value" } And I want to query "string_s" field with keyword "new value". In path params api I can escape spaces in keyword as well as other special

Warning and Error messages in Solr's log

2016-03-07 Thread Steven White
Hi folks, In Solr's solr-8983-console.log I see the following (about 50 in a span of 24 hours when indexing is ongoing): WARNING: Couldn't flush user prefs: java.util.prefs.BackingStoreException: Couldn't get file lock. What does it mean? Should I worry about it? What about this one:

Solr Cloud sharding strategy

2016-03-07 Thread Shamik Bandopadhyay
Hi, I'm trying to figure out the best way to design/allocate shards for our Solr Cloud environment. Our current index has around 20 million documents, in 10 languages. Around 25-30% of the content is in English. The rest are almost equally distributed among the remaining 13 languages. Till now, we had

Re: Solrcloud Batch Indexing

2016-03-07 Thread Erick Erickson
Bin: The MRIT/Morphlines only makes sense if you have lots more nodes devoted to the M/R jobs than you do Solr shards since the actual work done to index a given doc is exactly the same either with MRIT/Morphlines or just sending straight to Solr. A bit of background here. I mentioned that

Re: High Cpu sys usage

2016-03-07 Thread Toke Eskildsen
On Sun, 2016-03-06 at 08:26 -0700, Shawn Heisey wrote: > On 3/5/2016 11:44 PM, YouPeng Yang wrote: > > We are using Solr Cloud 4.6 in our production for searching service > > since 2 years ago. And now it has 700GB in one cluster which is comprised > > of 3 machines with ssd. At beginning

Re: Solr Cloud sharding strategy

2016-03-07 Thread Walter Underwood
Excellent advice, and I’d like to reinforce a few things. * Solr indexing is CPU intensive and generates lots of disk IO. Faster CPUs and faster disks matter a lot. * Realistic user query logs are super important. We measure 95th percentile latency and that is dominated by rare and malformed

Re: Solr Json API How to escape space in search string

2016-03-07 Thread Yonik Seeley
On Mon, Mar 7, 2016 at 5:49 PM, Iana Bondarska wrote: > Hi All, > could you please tell me if escaping special characters in search keywords > works in json api. > e.g. I have document > { > "string_s":"new value" > } > And I want to query "string_s" field with keyword "new

Re: Solr Json API How to escape space in search string

2016-03-07 Thread Jack Krupansky
Backslash in JSON just tells JSON to escape the next character, while what you really want is to pass a backslash through to the Solr query parser, which you can do with a double backslash. Alternatively, you could use quotes around the string in Solr, which would require you to escape the quotes
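
The double-backslash point can be seen by letting a JSON library do the encoding. This is a sketch under assumptions: it only builds and re-parses the request body, it does not contact Solr, and the body uses a "query" key, which should be checked against the JSON API of the Solr version in use.

```python
import json

# What the Solr query parser should receive: a backslash-escaped space.
escaped_space = r"string_s:new\ value"
# Alternative from the reply: quote the phrase instead of escaping the space.
quoted_phrase = 'string_s:"new value"'

body_escaped = json.dumps({"query": escaped_space})
body_quoted = json.dumps({"query": quoted_phrase})

print(body_escaped)  # the JSON text carries a doubled backslash
print(body_quoted)   # the inner quotes are escaped as \" in the JSON text

# Parsing the JSON again gives back the single backslash meant for the query parser.
print(json.loads(body_escaped)["query"] == escaped_space)
```

Building the body with a JSON encoder instead of by hand avoids having to count backslashes at all.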

How can I monitor the jetty thread pool

2016-03-07 Thread Yago Riveiro
Hi, How can I monitor the jetty thread pool? I want to do a zabbix graph with this info but the JMX doesn't show any entry for this. - Best regards -- View this message in context: http://lucene.472066.n3.nabble.com/How-can-I-monitor-the-jetty-thread-pool-tp4262298.html Sent from the

Re: Solr Cloud sharding strategy

2016-03-07 Thread shamik
Thanks a lot, Erick. You are right, it's a tad small with around 20 million documents, but the growth projection is around 50 million in the next 6-8 months. It'll continue to grow, but maybe not at the same rate. From the index size point of view, the size can grow up to half a TB from its current

Re: Solr Cloud sharding strategy

2016-03-07 Thread Erick Erickson
Still, 50M is not excessive for a single shard although it's getting into the range that I'd like proof that my hardware etc. is adequate before committing to it. I've seen up to 300M docs on a single machine, admittedly they were tweets. YMMV based on hardware and index complexity of course.