Is term~ effect available as an eDisMax param or a TokenFilter?

2014-07-02 Thread Alexandre Rafalovitch
Hello, I am trying to match names. In the UI, I can do it with name~ or name~2, but I can't expect users to type that, and I don't want to pre-tokenize in the middleware to inject it. Also, only specific fields are names; people can also enter phone numbers, which I don't want to fuzz
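For readers unfamiliar with the syntax: as far as we know, in Lucene 4.x name~2 matches indexed terms within two edits of name, and a bare name~ uses the default maximum of two edits. The Python sketch below only illustrates that matching criterion. It is not Lucene's implementation (Lucene compiles a Levenshtein automaton rather than comparing strings pairwise). It uses the "optimal string alignment" distance, where swapping two adjacent letters counts as one edit, which we believe approximates Lucene's default of treating transpositions as a single edit. The sample names are made up.

```python
from typing import List


def osa_distance(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions, substitutions and
    adjacent transpositions (optimal string alignment variant)."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all of a[:i]
    for j in range(cols):
        d[0][j] = j  # insert all of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution (or match)
            )
            # Swapping two adjacent characters counts as a single edit.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


def fuzzy_matches(query: str, terms: List[str], max_edits: int = 2) -> List[str]:
    """Terms that a query like 'query~max_edits' would accept under this criterion."""
    return [t for t in terms if osa_distance(query, t) <= max_edits]


if __name__ == "__main__":
    names = ["smith", "smyth", "smythe", "smiht", "schmidt", "jones"]
    print(fuzzy_matches("smith", names))     # → ['smith', 'smyth', 'smythe', 'smiht']
    print(fuzzy_matches("smith", names, 1))  # → ['smith', 'smyth', 'smiht']
```

The phone-number concern is exactly why per-field control matters: an edit distance of 2 on a short number accepts many unrelated values.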

Re: Integrating solr with Hadoop

2014-07-02 Thread gurunath
Thanks Eric, I will look into the MapReduce option. It would be helpful if I could get any links on setting up Hadoop with Solr. -- View this message in context: http://lucene.472066.n3.nabble.com/Integrating-solr-with-Hadoop-tp4144715p4145157.html Sent from the Solr - User mailing list archive at

RE: NPE when using facets with the MLT handler.

2014-07-02 Thread Markus Jelsma
Hi, I don't think this is ever going to work with the MLT handler; you should use the regular SearchHandler instead. -Original message- From:SafeJava T t...@safejava.com Sent: Monday 30th June 2014 17:52 To: solr-user@lucene.apache.org Subject: NPE when using facets with the MLT

RE: Memory Leaks in solr 4.8.1

2014-07-02 Thread Markus Jelsma
Hi, you can safely ignore this; it is shutting down anyway. Just don't reload the app many times without actually restarting Tomcat. -Original message- From:Aman Tandon amantandon...@gmail.com Sent: Wednesday 2nd July 2014 7:22 To: solr-user@lucene.apache.org Subject: Memory

Re: Understanding fieldNorm differences between 3.6.1 and 4.9 solrs

2014-07-02 Thread Aaron Daubman
Wow - so apparently I have terrible recall and should re-read this thread I started on the same topic when upgrading from 1.4 to 3.6 and hit a very similar fieldNorm issue almost two years ago! =)
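For context on the fieldNorm value in explain output: with Lucene's default similarity in the 3.x and 4.x lines, as we understand it, the norm is the field boost times 1/sqrt(number of terms in the field), and it is stored in a single byte via SmallFloat.floatToByte315, so explain shows a lossy, decoded approximation. The Python sketch below mirrors that encoding to show why, for example, a two-term field reports 0.625 rather than 0.707. It is an illustration only: it ignores index-time boosts, does not model discountOverlaps, and does not by itself explain any specific 3.6.1 vs 4.9 difference.

```python
import math
import struct

# Lucene SmallFloat "315" layout: fzero = (63 - zeroExp) << mantissaBits, zeroExp=15, mantissaBits=3
ZERO_POINT = (63 - 15) << 3


def _float_bits(f: float) -> int:
    # Raw IEEE-754 single-precision bits, like Java's Float.floatToRawIntBits.
    return struct.unpack(">i", struct.pack(">f", f))[0]


def float_to_byte315(f: float) -> int:
    """Encode a float into one unsigned byte (0..255), truncating precision."""
    bits = _float_bits(f)
    small = bits >> 21  # keep sign, exponent and the top 2 explicit mantissa bits
    if small <= ZERO_POINT:
        return 0 if bits <= 0 else 1  # zero/negative -> 0, underflow -> smallest value
    if small >= ZERO_POINT + 0x100:
        return 0xFF  # overflow: clamp to the largest value
    return small - ZERO_POINT


def byte315_to_float(b: int) -> float:
    """Decode a byte produced by float_to_byte315."""
    if b == 0:
        return 0.0
    bits = ((b & 0xFF) << 21) + ((63 - 15) << 24)
    return struct.unpack(">f", struct.pack(">i", bits))[0]


def field_norm(num_terms: int, boost: float = 1.0) -> float:
    """Decoded norm as it would appear in explain output (length norm only)."""
    raw = boost / math.sqrt(num_terms)
    return byte315_to_float(float_to_byte315(raw))


if __name__ == "__main__":
    for n in (1, 2, 3, 4, 10):
        print(n, round(1 / math.sqrt(n), 4), field_norm(n))
```

Because only a few bits of precision survive, fields of different lengths often share the same fieldNorm, and any change in how terms are counted can move a document to a neighbouring bucket.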

Re: How to integrate nlp in solr

2014-07-02 Thread parnab kumar
Aman, I feel that focusing on the Question-Answering and Information Extraction components of NLP should help you achieve what you are looking for. Go through the book *Taming Text* (http://www.manning.com/ingersoll/). Most of your queries should be answered, including details on

OCR - Saving multi-term position

2014-07-02 Thread Manuel Le Normand
Hello, many of our indexed documents are scanned and OCR'ed documents. Unfortunately, for various reasons we were not able to improve the OCR quality much (less than 80% word accuracy), which badly hurts retrieval quality. As we use an open-source OCR, we are thinking of changing every scanned

RE: Endeca to Solr Migration

2014-07-02 Thread Dyer, James
We migrated a big application from Endeca (6.0, I think) several years ago. We were not using any of the business UI tools, but we found that Solr is a lot more flexible and performant than Endeca. But with more flexibility comes more that you need to know. The hardest thing was to migrate the

Re: OCR - Saving multi-term position

2014-07-02 Thread Michael Della Bitta
I don't have first-hand knowledge of how you'd implement that, but I bet a look at the WordDelimiterFilter would help you understand how to emit multiple terms at the same position pretty easily. I've heard of this "bag of word variants" approach to indexing poor-quality OCR output before for

Customise score

2014-07-02 Thread rachun
Dear all, could anybody suggest how to customize the score? I have data like this: {ID : '0001', Title :'MacBookPro',Price: 400,Base_score:'121.2'} {ID : '0002', Title :'MacBook',Price: 350,Base_score:'100.2'} {ID : '0003', Title :'Laptop',Price: 300,Base_score:'155.7'} Notice that I

Re: Customise score

2014-07-02 Thread Gora Mohanty
On 2 July 2014 20:32, rachun rachun.c...@gmail.com wrote: Dear all, Could anybody suggest me how to customize the score? So, I have data like this .. {ID : '0001', Title :'MacBookPro',Price: 400,Base_score:'121.2'} {ID : '0002', Title :'MacBook',Price: 350,Base_score:'100.2'} {ID : '0003',

Re: Clubbing queries with different criterias together?

2014-07-02 Thread lalitjangra
Thanks Ahmet, I tried multiple combinations and finally got it working using the full query as a nested query. Is it fine to use a full query inside a nested query with filters, using _query_ as below?

Re: Customise score

2014-07-02 Thread rachun
Gora, first I would like to thank you for your quick response. .../select?q=MacBook&sort=SUM(base_score, score)+desc&wt=json&indent=true I tried that, but it didn't work, and I got this error message: error:{ msg:Can't determine a Sort Order (asc or desc) in sort spec 'SUM(base_score, score)

Re: OCR - Saving multi-term position

2014-07-02 Thread Erick Erickson
The problem here is that you wind up with a zillion unique terms in your index, which may lead to performance issues, but you probably already know that :). I've seen situations where running it through a dictionary helps. That is, does each term in the OCR match some dictionary? The problem here is that

Re: OCR - Saving multi-term position

2014-07-02 Thread Manuel Le Normand
Thanks for your answers, Erick and Michael. The term confidence level is an OCR output metric that gives, for every word, the odds that it is the actual scanned term. I'd like the OCR program to output all the suspected words whose confidences sum to above ~90% that one of them is the actual term, instead of
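The approach discussed in this thread (keep OCR alternatives, most confident first, until their confidences add up to roughly 90%, and index them at the same position, much as the synonym and WordDelimiter filters stack tokens) can be sketched in Python. The candidate format, a list of (word, confidence) pairs per scanned word, is a hypothetical stand-in for whatever the OCR engine actually emits. In Lucene the stacking itself would be done by a TokenFilter giving the alternatives a position increment of 0; this sketch only computes which tokens and increments such a filter would produce.

```python
from typing import List, Tuple

Candidate = Tuple[str, float]  # hypothetical OCR output: (word, confidence)


def select_alternatives(candidates: List[Candidate], threshold: float = 0.9) -> List[str]:
    """Keep the most confident words until their confidences sum to >= threshold."""
    kept: List[str] = []
    total = 0.0
    for word, conf in sorted(candidates, key=lambda c: c[1], reverse=True):
        kept.append(word)
        total += conf
        if total >= threshold:
            break
    return kept


def stacked_tokens(page: List[List[Candidate]], threshold: float = 0.9) -> List[Tuple[str, int, int]]:
    """Emit (term, position, position_increment) triples.

    The first alternative at each position gets increment 1; the others get 0,
    which is how stacked tokens (e.g. synonyms) share a position in Lucene.
    """
    tokens = []
    for position, candidates in enumerate(page):
        for i, word in enumerate(select_alternatives(candidates, threshold)):
            tokens.append((word, position, 1 if i == 0 else 0))
    return tokens


if __name__ == "__main__":
    page = [
        [("invoice", 0.95), ("lnvoice", 0.04)],
        [("total", 0.55), ("tota1", 0.30), ("iotal", 0.10), ("t0tal", 0.05)],
    ]
    for token in stacked_tokens(page):
        print(token)
```

Clean words contribute a single token, so the extra unique terms Erick warns about come only from the low-confidence positions.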

Re: Does Solr move documents between shards when the value of the shard key is updated ?

2014-07-02 Thread IJ
So we do end up with two copies/versions of the same document (uniqueid), one in each of the two shards. Is this a BUG or a FEATURE in Solr? I have a follow-up question: in case one were to attempt to delete the document, let's say using the CloudSolrServer deleteById() API, would that

Re: Migration from Autonomy IDOL to SOLR

2014-07-02 Thread wrdrvr
I know that this is an old thread, but I wanted to pass on some additional information, in a bit of blatant self-promotion. We've just completed an IDOL to Solr migration for our e-commerce site, with approximately 40 million items and anywhere between 200,000 and 300,000 searches per day. I am documenting

Re: Slow QTimes - 5 seconds for Small sized Collections

2014-07-02 Thread IJ
This issue was finally resolved. Adding an explicit host-to-IP-address mapping in the /etc/hosts file seemed to do the trick. The one strange thing is that before the hosts file entry was made, we were unable to reproduce the 5-second delay from the Linux shell by performing a simple nslookup of the host name. In

Re: Customise score

2014-07-02 Thread Ahmet Arslan
Hi, Why did you use upper case? What happens when you use: sort=sum(... On Wednesday, July 2, 2014 6:23 PM, rachun rachun.c...@gmail.com wrote: Gora, firstly I would like thank you for your quick response. .../select?q=MacBook&sort=SUM(base_score, score)+desc&wt=json&indent=true I tried

Re: Customise score

2014-07-02 Thread Jack Krupansky
I think the white space after the comma is the culprit. No white space is allowed in function queries that are embedded, such as in the sort parameter. -- Jack Krupansky -Original Message- From: Ahmet Arslan Sent: Wednesday, July 2, 2014 2:19 PM To: solr-user@lucene.apache.org

Re: Migration from Autonomy IDOL to SOLR

2014-07-02 Thread Jack Krupansky
Thanks for posting this. -- Jack Krupansky -Original Message- From: wrdrvr Sent: Wednesday, July 2, 2014 1:47 PM To: solr-user@lucene.apache.org Subject: Re: Migration from Autonomy IDOL to SOLR I know that this is an old thread, but I wanted to pass on some additional information in

Solr Map Reduce Indexer Tool GoLive to SolrCloud with index on local file system

2014-07-02 Thread Tom Chen
Hi, when we run the Solr Map Reduce Indexer Tool (https://github.com/markrmiller/solr-map-reduce-example), it generates indexes on HDFS. The last stage is Go Live, which merges the generated index into the live SolrCloud index. If the live SolrCloud writes its index to the local file system (rather than HDFS), the Go

Re: OCR - Saving multi-term position

2014-07-02 Thread Jack Krupansky
Take a look at the synonym filter as well. I mean, basically that's exactly what you are doing - adding synonyms at each position. -- Jack Krupansky -Original Message- From: Manuel Le Normand Sent: Wednesday, July 2, 2014 12:57 PM To: solr-user@lucene.apache.org Subject: Re: OCR -

Re: Customise score

2014-07-02 Thread rachun
Hi Ahmet, I also tried this .../select?q=MacBook&sort=sum(base_score, score)+desc&wt=json&indent=true I got the same error error:{ msg:Can't determine a Sort Order (asc or desc) in sort spec 'sum(base_score, score) desc', pos=15, code:400}} Best regards, Chun

Re: Customise score

2014-07-02 Thread rachun
Hi Jack, I tried it as you suggested: .../select?q=MacBook&sort=sum(base_score,score)+desc&wt=json&indent=true but it didn't work, and I got this error message: error:{ msg:sort param could not be parsed as a query, and is not a field that exists in the index: sum(base_score,score), code:400}}

Re: Customise score

2014-07-02 Thread Jack Krupansky
You probably don't have a field named score. That said, the Solr error message is not very useful at all! If you want to reference the document score, I don't think there is a direct way to do it, but you can indirectly by using the query function:
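The rest of Jack's message is not included here, so the exact expression he suggested is not shown. One common way to fold the relevance score into a function sort in Solr is the query() function with parameter dereferencing, as in sum(base_score,query($q)), where query($q) evaluates to each document's score for the main query. The Python sketch below only builds such a request with the standard library, URL-encoding the parameters and keeping whitespace out of the embedded function, per Jack's earlier note. The host, the core path, and the assumption that base_score is a single-valued numeric field are hypothetical.

```python
from urllib.parse import urlencode

# Hypothetical endpoint; replace with your own Solr core or collection.
SOLR_SELECT = "http://localhost:8983/solr/collection1/select"


def score_plus_field_sort_url(user_query: str, field: str = "base_score") -> str:
    """Build a select URL that sorts by (field value + relevance score).

    query($q) re-evaluates the main query q and returns each document's score.
    No whitespace is placed inside the function, since embedded function
    queries in the sort parameter do not tolerate it.
    """
    params = {
        "q": user_query,
        "sort": f"sum({field},query($q)) desc",
        "fl": "*,score",
        "wt": "json",
        "indent": "true",
    }
    return SOLR_SELECT + "?" + urlencode(params)


if __name__ == "__main__":
    print(score_plus_field_sort_url("MacBook"))
```

Note that Solr field names are case-sensitive, so the sort must use the field name exactly as it is defined in the schema.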

Re: OCR - Saving multi-term position

2014-07-02 Thread Koji Sekiguchi
Hi Manuel, I think OCR error correction is one of the well-known NLP tasks. I have thought in the past that it could be implemented using Lucene. This is a brief idea: 1. You have a Lucene index. This existing index is made from correct (i.e. error-free) documents in the same domain as the OCR

Re: Does Solr move documents between shards when the value of the shard key is updated ?

2014-07-02 Thread Erick Erickson
bq: Is this a BUG or a FEATURE in Solr How about just the way it works? You've changed the route key with the same unique key, taking control of the routing. When you change that routing, how is Solr to know where the _old_ document lived? It would have to, say, query the entire cluster for any
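Erick's point can be illustrated with a toy model of compositeId-style routing. As we understand SolrCloud's compositeId router, the upper 16 bits of a document's hash come from the route key and the lower 16 bits from the document id (both via MurmurHash3), and each shard owns a contiguous range of the 32-bit hash space. The Python sketch below substitutes zlib.crc32 for MurmurHash3 and splits the range evenly across a made-up number of shards, so the shard numbers it produces are not the ones Solr would choose. It only shows why re-indexing a document under a different route key leaves the old copy where it was: the update goes to one shard, and no other shard is consulted.

```python
import zlib
from typing import Dict, List

NUM_SHARDS = 2  # hypothetical cluster size


def _hash32(s: str) -> int:
    # Stand-in for Solr's MurmurHash3; any 32-bit hash shows the same effect.
    return zlib.crc32(s.encode("utf-8")) & 0xFFFFFFFF


def composite_hash(route_key: str, doc_id: str) -> int:
    # Upper 16 bits from the route key, lower 16 bits from the document id.
    return (_hash32(route_key) & 0xFFFF0000) | (_hash32(doc_id) & 0x0000FFFF)


def shard_for(route_key: str, doc_id: str) -> int:
    # Each shard owns an equal, contiguous slice of the 32-bit hash range.
    return (composite_hash(route_key, doc_id) * NUM_SHARDS) >> 32


class ToyCluster:
    def __init__(self) -> None:
        self.shards: List[Dict[str, dict]] = [dict() for _ in range(NUM_SHARDS)]

    def add(self, route_key: str, doc: dict) -> int:
        # Only the target shard is touched; no other shard is asked whether
        # it already holds a document with this id.
        shard = shard_for(route_key, doc["id"])
        self.shards[shard][doc["id"]] = doc
        return shard

    def copies(self, doc_id: str) -> int:
        return sum(doc_id in shard for shard in self.shards)


if __name__ == "__main__":
    cluster = ToyCluster()
    first = cluster.add("tenantA", {"id": "doc1", "v": 1})
    second = cluster.add("tenantB", {"id": "doc1", "v": 2})
    print(first, second, cluster.copies("doc1"))
```

If the two route keys land on different shards, copies("doc1") is 2; finding the stale copy would require asking every shard, which is the cluster-wide lookup Erick describes.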

Re: Solr Map Reduce Indexer Tool GoLive to SolrCloud with index on local file system

2014-07-02 Thread Erick Erickson
How would the MapReduceIndexerTool (MRIT for short) find the local disk to write from HDFS to for each shard? All it has is the information in the Solr configs, which are usually relative paths on the local Solr machines, relative to SOLR_HOME. Which could be different on each node (that would be

Re: CollapsingQParserPlugin throws Exception when useFilterForSortedQuery=true

2014-07-02 Thread Umesh Prasad
Created the jira .. https://issues.apache.org/jira/browse/SOLR-6222 On 30 June 2014 23:53, Joel Bernstein joels...@gmail.com wrote: Sure, go ahead create the ticket. I think there is more we can here as well. I suspect we can get the CollapsingQParserPlugin to work with

RE: Memory Leaks in solr 4.8.1

2014-07-02 Thread Aman Tandon
We reload at intervals of 6-7 days and restart maybe every 15-18 days, if the response becomes too slow. On Jul 2, 2014 7:09 PM, Markus Jelsma markus.jel...@openindex.io wrote: Hi, you can safely ignore this, it is shutting down anyway. Just don't reload the app a lot of times without actually

Re: How to integrate nlp in solr

2014-07-02 Thread Aman Tandon
Thanks Parnab, I am unfamiliar with payloads; can you provide some info about payloads and how they are helpful in NLP? On Jul 2, 2014 7:41 PM, parnab kumar parnab.2...@gmail.com wrote: Aman, I feel focusing on Question-Answering and Information Extraction components of NLP should help

Re: Streaming large updates with SolrJ

2014-07-02 Thread Chris Hostetter
: Now that I think about it, though, is there a way to use the Update Xml : messages with something akin to the cloud solr server? I only see examples : posting to actual Solr instances, but we really need to be able to take : advantage of the zookeepers to send our updates to the appropriate

schema / config file names

2014-07-02 Thread John Smodic
Is it required for the schema.xml and solrconfig.xml to have those exact filenames? Can I alias schema.xml to foo.xml in some way, for example? Thanks.

Re: schema / config file names

2014-07-02 Thread Chris Hostetter
: Is it required for the schema.xml and solrconfig.xml to have those exact : filenames? It's an extremely good idea ... but strictly speaking no... https://cwiki.apache.org/confluence/display/solr/CoreAdminHandler+Parameters+and+Usage#CoreAdminHandlerParametersandUsage-CREATE This smells
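As a concrete illustration of the CREATE link above: the CoreAdmin CREATE action accepts config and schema parameters naming the configuration and schema files a core should load, so a core can use, say, indexer_solrconfig.xml instead of solrconfig.xml. The Python sketch only builds the request URL; the host, core name, instanceDir, and file names are hypothetical.

```python
from urllib.parse import urlencode

# Hypothetical Solr base URL.
CORE_ADMIN = "http://localhost:8983/solr/admin/cores"


def create_core_url(name: str, instance_dir: str, config: str, schema: str) -> str:
    """CoreAdmin CREATE request that names non-default config/schema files."""
    params = {
        "action": "CREATE",
        "name": name,
        "instanceDir": instance_dir,
        "config": config,  # used instead of the default solrconfig.xml
        "schema": schema,  # used instead of the default schema.xml
    }
    return CORE_ADMIN + "?" + urlencode(params)


if __name__ == "__main__":
    print(create_core_url("indexer", "indexer", "indexer_solrconfig.xml", "indexer_schema.xml"))
    # → http://localhost:8983/solr/admin/cores?action=CREATE&name=indexer&instanceDir=indexer&config=indexer_solrconfig.xml&schema=indexer_schema.xml
```

Hoss's caveat still stands: keeping the standard file names is strongly preferable unless there is a concrete need, such as one installation serving several roles.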

RE: Memory Leaks in solr 4.8.1

2014-07-02 Thread Chris Hostetter
This is a long-standing issue in Solr that has some suggested fixes (see the Jira comments), but no one has been seriously affected by it enough to invest time in trying to improve it... https://issues.apache.org/jira/browse/SOLR-2357 In general, the fact that Solr is moving away from

Re: schema / config file names

2014-07-02 Thread John Smodic
That's good to know. I don't actually want to do it. I want to see just how much of Solr's schema and configuration can be reliably validated. The error messages I've been getting back for misconfigured setups are less than ideal at times. But it should be easy for me to validate certain

Re: Memory Leaks in solr 4.8.1

2014-07-02 Thread Aman Tandon
Thanks Chris, being independent of the servlet container is good. Eagerly waiting for Solr 5 :) With Regards Aman Tandon On Thu, Jul 3, 2014 at 7:58 AM, Chris Hostetter hossman_luc...@fucit.org wrote: This is a long standing issue in solr, that has some suggested fixes (see jira comments), but no

Re: Slow QTimes - 5 seconds for Small sized Collections

2014-07-02 Thread Shawn Heisey
On 7/2/2014 11:55 AM, IJ wrote: Here is a short wishlist based on the experience in debugging this issue: 1. Wish SolrQueryResponse could contain a list of node names / shard-replica names that a request passed through for processing the query (when debug is turned ON) 2. Wish

Re: schema / config file names

2014-07-02 Thread Tirthankar Chatterjee
Chris, we have actually done that. Our requirement was basically to have a single installation of Solr assume different roles, and each role had its own optimisation changes in solrconfig.xml and schema.xml. When we start a role, we basically adapt to the file role_solrconfig.xml and

Re: Customise score

2014-07-02 Thread rachun
Hi Jack, thank you very much for your solution; it works! I'm sorry that I didn't make it clear at the beginning that by 'score' I meant the document score (which Solr produces at query time). Thank you very much to all of you, Chun.

Re: Out of Memory when i downdload 5 Million records from sqlserver to solr

2014-07-02 Thread Shawn Heisey
On 7/1/2014 4:57 AM, mskeerthi wrote: I have to download my 5 million records from sqlserver to solr into one index. I am getting below exception after downloading 1 Million records. Is there any configuration or another to download from sqlserver to solr. Below is the exception i am getting

Re: External File Field eating memory

2014-07-02 Thread Kamal Kishore Aggarwal
Any replies ?? On Sat, Jun 28, 2014 at 5:34 PM, Kamal Kishore Aggarwal kkroyal@gmail.com wrote: Hi Team, I have recently implemented EFF in solr. There are about 1.5 lacs(unsorted) values in the external file. After this implementation, the server has become slow. The solr query time

Re: External File Field eating memory

2014-07-02 Thread Alexandre Rafalovitch
How would we know where the problem is? It's your custom implementation. And it's your own documents, so we don't know field sizes/etc. And it's your own metric (ok, Indian metric, but lacs are fairly unknown outside of India). Seriously though, have you tried using any memory profilers and