Indexing strategies?

2014-02-12 Thread manju16832003
Hi, I'm facing a dilemma in choosing an indexing strategy. My application architecture is: I have a listing table in my DB. For each listing, I make 3 calls to a URL datasource of a different system. I have 200k records. The time taken to index 25 docs is 1 minute, so for 200k it might take

Phonetic search on multiple fields

2014-02-12 Thread Navaa
Hi, I am a beginner with Solr. I am trying to implement phonetic search in my application. My code in schema.xml for the fieldType is: <fieldType name="text_general" class="solr.TextField" positionIncrementGap="100"> <analyzer type="index"> <tokenizer class="solr.StandardTokenizerFactory"/> <filter

Indexing in the case of entire shard failure

2014-02-12 Thread elmerfudd
We have a system which consists of 2 shards, where every shard has a leader and one replica. During indexing, one of the shards (both leader and replica) was shut down. We got two types of HTTP responses: rm=Service Unavailable and rm=OK. From this we came to the conclusion that the shard which

Re: Indexing in the case of entire shard failure

2014-02-12 Thread Jack Krupansky
Note: In SolrCloud terminology, a leader is also a replica. IOW, you have two replicas, one of which (and it can vary over time) is elected as leader for that shard. The other shards remain capable of indexing even if one shard becomes unavailable. That is expected - and desired - behavior in

Searching phonetic by DoubleMetaphone soundex encoder

2014-02-12 Thread Navaa
Hi, I am using Solr for searching phonetically equivalent strings. My schema contains... <fieldType name="text_general_doubleMetaphone" class="solr.TextField" positionIncrementGap="100"> <analyzer type="index"> <tokenizer class="solr.StandardTokenizerFactory"/>

Solr performance with commitWithin seems too good to be true. I am afraid I am missing something

2014-02-12 Thread Pisarev, Vitaliy
I am running a very simple performance experiment where I post 2000 documents to my application, which in turn persists them to a relational DB and sends them to Solr for indexing (synchronously, in the same request). I am testing 3 use cases: 1. No indexing at all - ~45 sec to post 2000

Re: Solr performance with commitWithin seems too good to be true. I am afraid I am missing something

2014-02-12 Thread Mark Miller
Doing a standard commit after every document is a Solr anti-pattern. commitWithin is a “near-realtime” commit in recent versions of Solr and not a standard commit. https://cwiki.apache.org/confluence/display/solr/Near+Real+Time+Searching - Mark http://about.me/markrmiller On Feb 12, 2014, at

Re: Solr performance with commitWithin seems too good to be true. I am afraid I am missing something

2014-02-12 Thread Joel Bernstein
Yes, committing after each document will greatly degrade performance. I typically use autoCommit and autoSoftCommit to set the time interval between commits, but commitWithin should have a similar effect. I often see performance of 2000+ docs per second on the load using auto commits. When
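The autoCommit and autoSoftCommit settings Joel mentions live in the updateHandler section of solrconfig.xml; a minimal sketch (the intervals shown are illustrative, not recommendations):

```xml
<updateHandler class="solr.DirectUpdateHandler2">
  <!-- hard commit: flush to disk periodically, without opening a new searcher -->
  <autoCommit>
    <maxTime>60000</maxTime>          <!-- at most 60 s between hard commits -->
    <openSearcher>false</openSearcher>
  </autoCommit>
  <!-- soft commit: make new documents searchable without a full disk flush -->
  <autoSoftCommit>
    <maxTime>1000</maxTime>           <!-- new docs visible within ~1 s -->
  </autoSoftCommit>
</updateHandler>
```

With this in place the client never needs to send explicit commits; it just keeps adding documents.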

Newb - Search not returning any results

2014-02-12 Thread leevduhl
I set up a Solr core and populated it with documents, but I am not able to get any results when attempting to search the documents. A generic search (q=*.*) returns all documents (and fields/values within those documents); however, when I try to search using specific criteria I get no results back.

RE: Solr performance with commitWithin seems too good to be true. I am afraid I am missing something

2014-02-12 Thread Pisarev, Vitaliy
I absolutely agree, and I even read the NRT page before posting this question. The thing that baffles me is this: doing a commit after each add kills the performance. On the other hand, when I use commitWithin and specify an (absurd) 1 ms delay, I expect that this behavior will be equivalent to

Re: Importing database DIH

2014-02-12 Thread Gora Mohanty
On 12 February 2014 20:53, Maheedhar Kolla maheedhar.ko...@gmail.com wrote: Hi , I need help with importing data, through DIH. ( using solr-3.6.1, tomcat6 ) I see the following error when I try to do a full-import from my local MySQL table (

Question about how to upload XML by using SolrJ Client Java Code

2014-02-12 Thread Eric_Peng
I was just trying to use the SolrJ client to import XML data to the Solr server. I read the SolrJ wiki, which says SolrJ lets you upload content in XML and binary format. I realized there is an XML parser in Solr (we can use a dataUpdateHandler in the Solr default UI, Solr Core Dataimport), so I was wondering

Importing database DIH

2014-02-12 Thread Maheedhar Kolla
Hi , I need help with importing data, through DIH. ( using solr-3.6.1, tomcat6 ) I see the following error when I try to do a full-import from my local MySQL table ( http:/s/solr//dataimport?command=full-import ). snip .. str name=Total Requests made to DataSource0/str str name=Total

RE: Importing database DIH

2014-02-12 Thread Pisarev, Vitaliy
It can be anything from wrong credentials, to a missing driver in the classpath, to a malformed connection string, etc. What does the Solr log say? -Original Message- From: Maheedhar Kolla [mailto:maheedhar.ko...@gmail.com] Sent: Wednesday, 12 February 2014 17:23 To:

Re: Newb - Search not returning any results

2014-02-12 Thread Gora Mohanty
On 12 February 2014 20:57, leevduhl ld...@corp.realcomp.com wrote: [...] However, when I try to search specifically where mailingcity=redford I don't get any results back. See the following query/results. Query:
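The syntax issue here is Lucene query syntax: a fielded clause is written field:value (colon), not field=value. A minimal sketch of building such a request URL (host, core, and field names are illustrative):

```python
from urllib.parse import urlencode

def solr_select_url(base, field, value):
    """Build a Solr /select URL with a fielded query (field:value, not field=value)."""
    params = {"q": "{}:{}".format(field, value), "wt": "json"}
    return base + "/select?" + urlencode(params)

# The colon is URL-encoded as %3A in the query string.
url = solr_select_url("http://localhost:8983/solr/listings", "mailingcity", "redford")
print(url)
```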

Re: Solr performance with commitWithin seems too good to be true. I am afraid I am missing something

2014-02-12 Thread Dmitry Kan
Cross-posting my answer from SO: According to this wiki: https://wiki.apache.org/solr/NearRealtimeSearch the commitWithin is a soft-commit by default. Soft-commits are very efficient in terms of making the added documents immediately searchable. But! They are not on the disk yet. That means the

Re: Indexing strategies?

2014-02-12 Thread Erick Erickson
I'd seriously consider a SolrJ program that pulled the necessary data from two of your systems, held it in cache and then pulled the data from your main system and enriched it with the cached data. Or export your information from your remote systems and import them into a single system where you

Re: Phonetic search on multiple fields

2014-02-12 Thread Erick Erickson
First, why are you talking about DoubleMetaphone when your fieldType uses BeiderMorseFilterFactory? Which points up a basic issue you need to wrap your head around or you'll be endlessly confused. At least I was... Your analysis chains _must_ do compatible things at index and query time. The
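Erick's point about compatible index- and query-time chains can be sketched in schema.xml like this (field and filter choices are illustrative; the key is that both analyzers apply the same phonetic transformation, so indexed codes and query codes actually match):

```xml
<fieldType name="text_phonetic" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- inject="false": index only the phonetic codes, not the original tokens -->
    <filter class="solr.DoubleMetaphoneFilterFactory" inject="false"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.DoubleMetaphoneFilterFactory" inject="false"/>
  </analyzer>
</fieldType>
```

If the query chain used a different phonetic filter (say BeiderMorse) than the index chain, the encoded terms would never line up and searches would silently return nothing.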

Re: Importing database DIH

2014-02-12 Thread Maheedhar Kolla
Thanks for the comments/advice. I did mess with the drivers ( by deliberately moving the libs) and it did fail as it is supposed to. When I looked into catalina.out, I realized that the problem lies with data directory being owned by root instead of tomcat6. I changed it so that tomcat6 can

Re: Question about how to upload XML by using SolrJ Client Java Code

2014-02-12 Thread Erick Erickson
Hmmm, before going there let's be sure you're trying to do what you think you are. Solr does _not_ index arbitrary XML. There is a very specific format of XML that describes solr documents that _can_ be indexed. But random XML is not supported. See the documents in example/exampledocs for the XML
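The Solr-specific XML format Erick refers to (see example/exampledocs) wraps documents in add/doc/field elements; a minimal sketch with made-up field names:

```xml
<add>
  <doc>
    <field name="id">doc-1</field>
    <field name="title">An example document</field>
    <field name="category">example</field>
  </doc>
</add>
```

Arbitrary XML in any other shape has to be transformed into this structure (or into SolrInputDocuments via SolrJ) before Solr can index it.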

Re: Question about how to upload XML by using SolrJ Client Java Code

2014-02-12 Thread Shawn Heisey
On 2/12/2014 8:21 AM, Eric_Peng wrote: I was just trying to use SolrJ Client to import XML data to Solr server. And I read SolrJ wiki that says SolrJ lets you upload content in XML and Binary format I realized there is a XML parser in Solr (We can use a dataUpadateHandler in Solr default

Re: Newb - Search not returning any results

2014-02-12 Thread leevduhl
Thanks, the syntax correction solved the problem. I actually thought I tried that before I posted. Thanks Lee

Re: Solr performance with commitWithin seems too good to be true. I am afraid I am missing something

2014-02-12 Thread Erick Erickson
Here's some additional background that may shed light on the performance: http://searchhub.org/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/ Best, Erick On Wed, Feb 12, 2014 at 7:40 AM, Dmitry Kan solrexp...@gmail.com wrote: Cross-posting my answer from SO:

Re: Solr performance with commitWithin seems too good to be true. I am afraid I am missing something

2014-02-12 Thread Jack Krupansky
The explicit commit will cause your app to be delayed until that commit completes, and then Solr would be idle until that request completion makes its way back to your app and you submit another request which finds its way to Solr, maybe a few ms. That includes network latency. That interval of

Re: Question about how to upload XML by using SolrJ Client Java Code

2014-02-12 Thread Eric_Peng
Thanks a lot, learnt a lot from it.

Re: Question about how to upload XML by using SolrJ Client Java Code

2014-02-12 Thread Eric_Peng
Thank you so much Erick, I will try to write my own XML parser.

Re: Using numeric ranges in Solr query

2014-02-12 Thread Rafał Kuć
Hello! Just specify the left boundary, like: price:[900 TO 1000] -- Regards, Rafał Kuć Performance Monitoring * Log Analytics * Search Analytics Solr Elasticsearch Support * http://sematext.com/ When user enter a price in price field, for Ex: 1000 USD, i want to fetch all items with price

Re: Question about how to upload XML by using SolrJ Client Java Code

2014-02-12 Thread Jack Krupansky
There is also an XSLT update handler option to transform raw XML to Solr XML on the fly. If anybody here has used it, feel free to chime in. See: http://wiki.apache.org/solr/XsltUpdateRequestHandler and

Using numeric ranges in Solr query

2014-02-12 Thread jay67
When user enter a price in price field, for Ex: 1000 USD, i want to fetch all items with price around 1000 USD. I found in documentation that i can use price:[* to 1000] like that. It will get all items with from 1 to 1000 USD. But i want to get results where price is between 900 to 1000 USD.

Re: Using numeric ranges in Solr query

2014-02-12 Thread Jack Krupansky
Is price a float/double field? price:[99.5 TO 100.5] -- price near 100 price:[900 TO 1000] or price:[899.5 TO 1000.5] -- Jack Krupansky -Original Message- From: jay67 Sent: Wednesday, February 12, 2014 12:03 PM To: solr-user@lucene.apache.org Subject: Using numeric ranges in Solr
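Range clauses with [ ] are inclusive on both ends; a tiny helper for building one (a sketch; the field name and bounds are whatever your schema and use case call for):

```python
def price_range(field, lo, hi):
    """Build an inclusive Lucene/Solr range clause, e.g. price:[900 TO 1000]."""
    return "{}:[{} TO {}]".format(field, lo, hi)

print(price_range("price", 900, 1000))  # price:[900 TO 1000]
```

Note that TO must be uppercase; a lowercase "to" is parsed as an ordinary term and the query fails.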

Re: Set up embedded Solr container and cores programmatically to read their configs from the classpath

2014-02-12 Thread Alan Woodward
Hi Robert, I don't think this is possible at the moment, but I hope to get https://issues.apache.org/jira/browse/SOLR-4478 in for Lucene/Solr 4.7, which should allow you to inject your own SolrResourceLoader implementation for core creation (it sounds as though you want to wrap the core's

RE: Solr4 performance

2014-02-12 Thread Joshi, Shital
Does Solr4 load entire index in Memory mapped file? What is the eviction policy of this memory mapped file? Can we control it? _ From: Joshi, Shital [Tech] Sent: Wednesday, February 05, 2014 12:00 PM To: 'solr-user@lucene.apache.org' Subject: Solr4

Re: Solr4 performance

2014-02-12 Thread Shalin Shekhar Mangar
No, Solr doesn't load the entire index in memory. I think you'll find Uwe's blog most helpful on this matter: http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html On Thu, Feb 13, 2014 at 12:27 AM, Joshi, Shital shital.jo...@gs.com wrote: Does Solr4 load entire index in Memory

Re: Solr4 performance

2014-02-12 Thread Greg Walters
Shital, Take a look at http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html as it's a pretty decent explanation of memory mapped files. I don't believe that the default configuration for solr is to use MMapDirectory but even if it does my understanding is that the entire

RE: Indexing spatial fields into SolrCloud (HTTP)

2014-02-12 Thread Beale, Jim (US-KOP)
Hi David, I finally got back to this again, after getting sidetracked for a couple of weeks. I implemented things in accordance with my understanding of what you wrote below. Using SolrJ, the code to index the spatial field is as follows, private void addSpatialField(double lat, double lon,

filtering/faceting by a big list IDs

2014-02-12 Thread Tri Cao
Hi all, I am running a Solr application and I need to implement a feature that requires faceting and filtering on a large list of IDs. The IDs are stored outside of Solr and are specific to the currently logged-on user. An example of this is the articles/tweets the user has read in the last few

Re: Solr4 performance

2014-02-12 Thread Shawn Heisey
On 2/12/2014 12:07 PM, Greg Walters wrote: Take a look at http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html as it's a pretty decent explanation of memory mapped files. I don't believe that the default configuration for solr is to use MMapDirectory but even if it does my

Re: filtering/faceting by a big list IDs

2014-02-12 Thread Joel Bernstein
Tri, You will most likely need to implement a custom QParserPlugin to efficiently handle what you described. Inside of this QParserPlugin you could create the logic that would bring in your outside list of ID's and build a DocSet that could be applied to the fq and the facet.query. I haven't

Re: Indexing spatial fields into SolrCloud (HTTP)

2014-02-12 Thread Smiley, David W.
That's pretty weird. It appears that somehow a Spatial4j Point class is having its toString() called on it (which looks like Pt(x=-72.544123,y=41.85)), and then Spatial4j is trying to parse this, which isn't a valid format; the toString is more for debug-ability. Your SolrJ code looks

Re: Solr4 performance

2014-02-12 Thread Roman Chyla
And perhaps one other, but very pertinent, recommendation: allocate only as little heap as is necessary. By allocating more, you are working against the OS caching. Knowing how much is enough is a bit tricky, though. Best, Roman On Wed, Feb 12, 2014 at 2:56 PM, Shawn Heisey

Re: Searching phonetic by DoubleMetaphone soundex encoder

2014-02-12 Thread Paul Libbrecht
Navaa, you need query expansion for that. E.g. if your query goes through dismax, you need to add the two field names to the qf parameter. The nice thing is that qf can be: text^3.0 text.stemmed^2 text.phonetic^1 And thus exact matches are preferred to stemmed or phonetic matches. This is
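Paul's suggestion, written out as dismax request parameters (field names, boosts, and the query term are illustrative):

```
defType=dismax
q=smith
qf=text^3.0 text.stemmed^2.0 text.phonetic^1.0
```

A document matching the exact term in text scores highest, a stemmed match scores less, and a phonetic-only match scores least, which is usually the ordering you want.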

Re: Optimize Index in solr 4.6

2014-02-12 Thread Shawn Heisey
On 2/6/2014 4:00 AM, Shawn Heisey wrote: I would not recommend it, but if you know for sure that your infrastructure can handle it, then you should be able to optimize them all at once by sending parallel optimize requests with distrib=false directly to the Solr cores that hold the shard

RE: Indexing spatial fields into SolrCloud (HTTP)

2014-02-12 Thread Beale, Jim (US-KOP)
Hi David, You wrote: Perhaps you’ve got some funky UpdateRequestProcessor from experimentation you’ve done that’s parsing then toString’ing it? No, nothing at all. The update processing is straight out-of-the-box Solr. And also, your stack trace should have more to it than what you

Re: Indexing spatial fields into SolrCloud (HTTP)

2014-02-12 Thread Smiley, David W.
Your new code should also work, and should be equivalent. The longer stack trace you have is of the wrapping SolrException which wraps another exception — InvalidShapeException. You should also see the stack trace of InvalidShapeException which should originate out of Spatial4j. ~ David

Weird issue with q.op=AND

2014-02-12 Thread Shamik Bandopadhyay
Hi, I'm facing a weird problem while using q.op=AND. It looks like it gets into some conflict if I use multiple appends conditions in conjunction. It works as long as I've got one filtering condition in appends: <lst name="appends"> <str name="fq">Source:TestHelp</str> </lst> Now, the moment I

Re: Weird issue with q.op=AND

2014-02-12 Thread Shawn Heisey
On 2/12/2014 3:32 PM, Shamik Bandopadhyay wrote: Hi, I'm facing a weird problem while using q.op=AND condition. Looks like it gets into some conflict if I use multiple appends condition in conjunction. It works as long as I've one filtering condition in appends. lst name=appends str

Re: Weird issue with q.op=AND

2014-02-12 Thread shamik
Thanks a lot Shawn. Changing the appends filtering based on your suggestion worked. The part which confused me big time is the syntax I've been using so far without an issue (barring the q.op part): <lst name="appends"> <str name="fq">Source:TestHelp | Source:downloads | -AccessMode:internal |

Re: Weird issue with q.op=AND

2014-02-12 Thread Shawn Heisey
On 2/12/2014 4:58 PM, shamik wrote: Thanks a lot Shawn. Changing the appends filtering based on your suggestion worked. The part which confused me bigtime is the syntax I've been using so far without an issue (barring the q.op part). lst name=appends str name=fqSource:TestHelp |

change character correspondence in icu lib

2014-02-12 Thread alxsss
Hello, I use icu4j-49.1.jar, lucene-analyzers-icu-4.6-SNAPSHOT.jar for one of the fields in the form filter class=solr.ICUFoldingFilterFactory / I need to change one of the accent char's corresponding letter. I made changes to this file

Re: change character correspondence in icu lib

2014-02-12 Thread Alexandre Rafalovitch
Not a direct answer, but the usual next question is: are you absolutely sure you are using the right jars? Try renaming them and restarting Solr. If it complains, you got the right ones. If not... Also, unzip those jars and see if your file made it all the way through the build pipeline.

Re: Weird issue with q.op=AND

2014-02-12 Thread shamik
Thanks, I'll take a look at the debug data.

Re: Indexing strategies?

2014-02-12 Thread manju16832003
Hi Erick, Thank you very much, those are valuable suggestions :-). I will give it a try. Appreciate your time.

Re: Weird issue with q.op=AND

2014-02-12 Thread Jack Krupansky
Did you mean to use || for the OR operator? A single | is not treated as an operator - it will be treated as a term and sent through normal term analysis. -- Jack Krupansky -Original Message- From: Shamik Bandopadhyay Sent: Wednesday, February 12, 2014 5:32 PM To:
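For reference, the same appends filter written with an explicit OR operator, which the query parser always treats as an operator (field names taken from the thread; a sketch, not a drop-in config):

```xml
<lst name="appends">
  <str name="fq">Source:TestHelp OR Source:downloads</str>
</lst>
```

Spelling out OR avoids depending on how a bare | is tokenized, which is exactly the ambiguity being discussed here.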

Limit amount of search result

2014-02-12 Thread rachun
Dear all gurus, I would like to limit the number of search results per group. Let's say I have many shops selling shirts. So when I search for "white shirt" I want a maximum number of results per shop (e.g. 5). The result should be like this... - Shop A - Shop A - Shop B - Shop B - Shop B - Shop B - Shop B -

Re: Limit amount of search result

2014-02-12 Thread Sameer Maggon
Chun, Have you looked at the Grouping / Field Collapsing feature in Solr? https://wiki.apache.org/solr/FieldCollapsing If shop is one of your fields, you can use field collapsing on that field with a maximum of 'n' results to return per field value (or group). Sameer. -- www.measuredsearch.com tw:
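The grouping parameters Sameer describes can be assembled like this (a sketch; the field name and limit are illustrative):

```python
from urllib.parse import urlencode

def grouped_query(q, group_field, per_group):
    """Field-collapsing request params: at most `per_group` results per group value."""
    return urlencode({
        "q": q,
        "group": "true",
        "group.field": group_field,
        "group.limit": per_group,   # max docs returned within each group
    })

print(grouped_query("white shirt", "shop", 5))
```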

Re: Join Scoring

2014-02-12 Thread anand chandak
Re-posting... Thanks, Anand On 2/12/2014 10:55 AM, anand chandak wrote: Thanks David, really helpful response. You mentioned that if we have to add scoring support in solr then a possible approach would be to add a custom QueryParser, which might be taking Lucene's JOIN module. I have

Re: Searching phonetic by DoubleMetaphone soundex encoder

2014-02-12 Thread Paul Libbrecht
Navaa, You need the query to be sent to the two fields. In dismax, this is easy. Paul On 12 February 2014 at 14:22:33, Navaa navnath.thomb...@xtremumsolutions.com wrote: Hi, I am using solr for searching phoneticly equivalent string my schema contains... fieldType

APACHE SOLR: Pass a file as query parameter and then parse each line to form a criteria

2014-02-12 Thread rajeev.nadgauda
Hi, I am new to Solr; I need help with the following. PROBLEM: I have a huge file of 1 lines. I want this to be an inclusion or exclusion in the query, i.e. each line like ( line1 OR line2 OR .. ). How can this be achieved in Solr? Is there a custom implementation that I would need to

Solr delta indexing approach

2014-02-12 Thread lalitjangra
Hi, I am working on a prototype where I have a content source; I am indexing all its documents and storing the index in Solr. Now I have a pre-condition that my content source is ever-changing, meaning there is always new content added to it. I have read that Solr indexes the full source only

Re: Solr delta indexing approach

2014-02-12 Thread Alexandre Rafalovitch
You have read that Solr needs to reindex a full source. That's correct (unless you use atomic updates). But - the important point is - this is per document. So, once you indexed your 1 documents, you don't need to worry about them until they change. Just go ahead and index your additional
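The atomic updates Alexandre mentions (Solr 4.x; they require the fields to be stored, since Solr rebuilds the document internally) let you change one field without resending the whole document. A minimal sketch of building the JSON payload (doc id and field names are made up):

```python
import json

def atomic_update(doc_id, field, value):
    """Solr 4.x atomic-update payload: {"set": value} replaces one field's value."""
    return json.dumps([{"id": doc_id, field: {"set": value}}])

# POST this body to /update with Content-Type: application/json
print(atomic_update("doc-1", "title", "Updated title"))
```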

Re: Searching phonetic by DoubleMetaphone soundex encoder

2014-02-12 Thread Navaa
Hi, thanks for your reply. I'm a beginner with Solr; kindly elaborate in more detail, because in my solrconfig.xml: <requestHandler name="/select" class="solr.SearchHandler"> <lst name="defaults"> <str name="echoParams">explicit</str> <int name="rows">5</int> <str name="df">name</str>

Re: Solr delta indexing approach

2014-02-12 Thread lalitjangra
Thanks Alex. Yes, my source system maintains the creation/last-modification timestamps of each document. As per your inputs, can I assume that the next time Solr starts indexing, it scans all the documents present in the source but only picks those for indexing which are either new or have been updated since

RE: Solr delta indexing approach

2014-02-12 Thread Sadler, Anthony
I had this problem when I started to look at Solr as an index for a file server. What I ended up doing was writing a perl script that did this: - Scan the whole filesystem and create an XML that is submitted into Solr for indexing. As this might be some 600,000 files, I break it down into

Re: Solr delta indexing approach

2014-02-12 Thread Alexandre Rafalovitch
I'd start from doing Solr tutorial. It will explain a lot of things. But in summary, you can send data to Solr (best option) or you can pull it using DataImportHandler. Take your pick, do the tutorial, maybe read some books. Then come back with specific questions of where you started. Regards,

Re: Solr delta indexing approach

2014-02-12 Thread Walter Underwood
Why write a Perl script for that? touch new_timestamp; find . -newer timestamp | script-to-submit; mv new_timestamp timestamp. Neither approach deals with deleted files. To do this correctly, you need lists of all the files in the index with their timestamps, and of all the files in the
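Walter's three commands, expanded into a self-contained sketch that runs in a scratch directory (file names are illustrative; like the original, it does not handle deletions):

```shell
#!/bin/sh
# Incremental pickup via a timestamp marker file, demonstrated in a temp dir.
dir=$(mktemp -d)
cd "$dir" || exit 1
touch timestamp                 # marker left by the "previous" run
sleep 1
echo hi > changed.txt           # a file modified after the marker
touch new_timestamp             # record the start of this run *before* scanning
find . -newer timestamp -type f \
    ! -name '*timestamp' ! -name to_submit.txt > to_submit.txt
mv new_timestamp timestamp      # next run only sees changes made after this scan
cat to_submit.txt               # lists ./changed.txt
```

Recording new_timestamp before the scan matters: files modified during the scan are picked up on the next run instead of being missed.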

Re: Solr delta indexing approach

2014-02-12 Thread lalitjangra
Thanks all. I am following a couple of articles on the same. I am sending data to Solr instead of using DIH and am able to successfully index data in Solr. My concern here is to minimize Solr indexing so that only updated data is indexed each time, out of all data items. Is this something

Re: Integrating Oauth2 with Solr MailEntityProcessor

2014-02-12 Thread Dileepa Jayakody
Hi again, Anybody interested in this feature for Solr MailEntityProcessor? WDYT? Thanks, Dileepa On Thu, Jan 30, 2014 at 11:00 AM, Dileepa Jayakody dileepajayak...@gmail.com wrote: Hi All, I think Oauth2 integration is a valid usecase for Solr when it comes to importing data from

RE: Solr delta indexing approach

2014-02-12 Thread Sadler, Anthony
At the risk of derailing the thread: We do a lot more in the script than is mentioned here: We pull out parts of the path and mangle them (for example turn them into a UNC path for users to use, or pull out a client name or job number using a known folder structure). As for deleted files,