Multi-threaded document atomic/in-place updates
I have the following scenario: two separate threads each try to update the same document in a single index, but for two different fields, using atomic/in-place updates. For example, id is the unique field in the index. Thread-1 updates: id:1001, field-1:abc1001. Thread-2 updates: id:1001, field-2:xyz1002. The updates are done on the same core asynchronously. What I need to know is whether there can be any inconsistency in the index at any time, given that both threads update different fields for the same id. -- Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
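For reference, an atomic "set" update touches only the named field and leaves the rest of the document's stored fields to be carried over by Solr. A minimal sketch of the two update payloads (ids and field names taken from the post; JSON shape per Solr's atomic-update syntax — each payload would be POSTed to the core's /update endpoint):

```python
import json

def atomic_update(doc_id, field, value):
    """Build a Solr atomic-update payload that sets a single field,
    leaving the document's other stored fields untouched."""
    return [{"id": doc_id, field: {"set": value}}]

# Thread-1 and thread-2 would each send their own single-field update:
payload_1 = atomic_update("1001", "field-1", "abc1001")
payload_2 = atomic_update("1001", "field-2", "xyz1002")

print(json.dumps(payload_1))
```

Because each payload names only its own field, the two updates do not overwrite each other's values; ordering between them is still decided by the update log.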
Re: Problem with Synonyms
SOLR has a nice analysis page. You can use it to get insight into what happens after each filter is applied at index/search time. Regards, Pravesh
Re: AW: Surprising score?
> Is there a way to omitNorms and still be able to use {!boost b=boost}?
Or you could leave omitNorms=false as usual and use a custom Similarity implementation with the length-normalization method overridden to return a constant value of 1. Regards, Pravesh
Re: SOLR guidance required
Aditya, as suggested by others, you should definitely query SOLR directly using filter queries. Just keep your indexes updated, and keep all your fields indexed/stored as per your requirements. See the wikis: http://wiki.apache.org/solr/CommonQueryParameters and http://wiki.apache.org/solr/SimpleFacetParameters BTW, almost all the job sites out there (whether small, medium, or big) use SOLR/Lucene to power their searches :) Best, Pravesh
Re: TooManyClauses: maxClauseCount is set to 1024
Just increase the value of maxClauseCount in your solrconfig.xml; keep it large enough for your queries. Best, Pravesh
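As a sketch, the setting lives in solrconfig.xml under the name maxBooleanClauses (which in turn sets Lucene's maxClauseCount); the value 4096 here is just an illustrative choice:

```xml
<!-- solrconfig.xml: raises the global BooleanQuery clause limit -->
<maxBooleanClauses>4096</maxBooleanClauses>
```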
Re: TooManyClauses: maxClauseCount is set to 1024
Update: also remove your range queries from the main query and specify them as filter queries instead. Best, Pravesh
Re: Response time in client was much longer than QTime in tomcat
SOLR's QTime represents only the actual time spent searching, whereas your C# client's response time is the total time spent sending the HTTP request and getting back the response (which may also include parsing the results). Regards, Pravesh
Re: need basic information
Do logstash/graylog2 do log processing/searching in real time, or can they scale for real-time needs? I guess harshadmehta is looking for real-time indexing/search. Regards, Pravesh
Re: need basic information
One basic and trivial solution could be a schema like:
Date (of type date/string) -- stores the 'yyyy-mm-dd' date
Tag (of type string) -- the text/tag 'Account' goes in here
account-id (of type sint/int) -- an account id like '123' goes in here
action (of type string) -- values like 'created'/'updated' go in here
Then just push your logs into Solr: http://wiki.apache.org/solr/UpdateCSV
To get the log activity for account id '123', you could query like:
http://localhost:port/solr/select/?q=id:123&fq=Tag:Account&fq=Date:[d1 TO d2]
and then process the results for plotting/reporting. Or you could ask for faceting on the 'action' field:
http://localhost:port/solr/select/?q=id:123&fq=Tag:Account&fq=Date:[d1 TO d2]&facet=true&facet.field=action
This way you get facet counts for created/updated/deleted etc. Hope this is what you are looking for. Thanks, Pravesh
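A small sketch of building such a facet query programmatically (host, port, and concrete date values are made up; field and parameter names follow the example above):

```python
from urllib.parse import urlencode

# Parameters mirroring the example query in the post; repeated "fq"
# entries are passed as a list of pairs so both filters survive encoding.
params = [
    ("q", "id:123"),
    ("fq", "Tag:Account"),
    ("fq", "Date:[2012-01-01T00:00:00Z TO 2012-12-31T00:00:00Z]"),
    ("facet", "true"),
    ("facet.field", "action"),
]
url = "http://localhost:8983/solr/select/?" + urlencode(params)
print(url)
```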
Re: Maximum index size on single instance of Solr
We have a 48GB index on a single shard, 20+ million documents, recently migrated to SOLR 3.5. We do have a cluster of SOLR servers for hosting searches, but I do see us migrating to SOLR sharding going forward. Thanks, Pravesh
Re: solr indexing slows down after few minutes
Did you check the wiki: http://wiki.apache.org/lucene-java/ImproveIndexingSpeed Do you commit often? Do you index with multiple threads? Also try experimenting with the various MergePolicies available from SOLR 3.4 onwards. Thanks, Pravesh
Re: optimum solr core size
How many documents are in the index? How many stored/indexed fields? There is no magic number yet for the size of a single core (whether number of docs or size of index), but 123GB seems to be on the high side, so you could definitely go for sharding the index. BTW, how have your searches/indexing been performing over time? Is there any impact? Regards, Pravesh
Re: Load Testing in Solr
Hi Dhaivat, JMeter is a nice tool. But it all depends on what sort of load you are expecting and how complex the queries are (sorting/filtering/textual searches). You need to consider all of these to benchmark. Thanks, Pravesh
Re: Query Time problem on Big Index Solr 3.5
How often are documents added to the index? Try lowering the optimize frequency. BTW, do you optimize only on the master (which is the desired way)? Also, specifically for date ranges, try using filter queries; that way they get cached and are thus faster. The same applies to other fields that require very little analysis or have a limited number of unique values. Thanks, Pravesh
Re: Query Time problem on Big Index Solr 3.5
A 13 GB index isn't too big, but going forward, index sharding is the solution for large single-core indexes. This way you can scale out. This link will give you some info: http://lucidworks.lucidimagination.com/display/solr/Distributed+Search+with+Index+Sharding Regards, Pravesh
Re: Query during a query
Did you check SOLR field collapsing/grouping? http://wiki.apache.org/solr/FieldCollapsing Maybe this is what you are looking for. Thanks, Pravesh
Re: IndexWrite in Lucene/Solr 3.5 is slower?
BTW, have you changed the MergePolicy and MergeScheduler settings too? From Lucene 3.x/3.5 onwards there are new MergePolicy and MergeScheduler implementations available, like TieredMergePolicy and ConcurrentMergeScheduler. Regards, Pravesh
Re: Dynamically pick dataDirs
> With n being a higher value, firing 100 cores wouldn't be a viable solution. How do I achieve this in solr? In short, I would like to have a single core and get results out of multiple index searchers, which implies multiple index readers.
If you want a single core with multiple index directories (which is currently not supported by SOLR), then why not have a single merged index within the core? Lucene supports searching across multiple indexes, but this hasn't been inherited by SOLR by design (I mean using the MultiSearcher APIs for a single core with multiple index directories in it). BTW, how big are your index(es)? Total documents? Total size? If each core is small (MBs / a few GBs) then you could merge a few of them together. Regards, Pravesh
Re: SOLR 3.5 Index Optimization not producing single .cfs file
Thanks Mike,
> If you really must have a CFS (how come?) then you can call TieredMergePolicy.setNoCFSRatio(1.0) -- not sure how/where this is exposed in Solr though.
BTW, would this impact search performance? I mean, I was just trying a few random keyword searches (without sorts and filters) on both systems (1.4.1 vs 3.5) and found that 3.5 searches take longer than 1.4.1 (around 10-20% slower). Haven't done any load test till now. Regards, Pravesh
SOLR 3.5 Index Optimization not producing single .cfs file
Hi, I've migrated the search servers to the latest stable release (SOLR 3.5) from SOLR 1.4.1. We fully recreated the index for this. After indexing completes, when I optimize the index, it is not merged into a single .cfs file the way it was with the 1.4.1 version. We have set <useCompoundFile>true</useCompoundFile>. Is this related to the new MergePolicy used from SOLR 3.x onwards (I suppose it is TieredMergePolicy in 3.x)? If yes, should I change it to LogByteSizeMergePolicy? Does this change require a complete rebuild, or will it apply incrementally? Regards, Pravesh
Re: Any way to get reference to original request object from within Solr component?
Hi Sujit, the HTTP parameter ordering is handled above the SOLR level; I don't think it can be controlled at the SOLR level. You could append all required values into a single HTTP param and then split it apart at your component level. Regards, Pravesh
Re: Basic SOLR help needed
> When I do a query using the Admin tool: INST_NAME:KENTUCKY TECH PADUCAH (there is a document in the db that matches this INST_NAME exactly)
Try it this way: INST_NAME:(KENTUCKY TECH PADUCAH) This way all 3 terms are searched in the field INST_NAME; otherwise only the first term, KENTUCKY, is searched in INST_NAME, and the remaining terms, TECH and PADUCAH, are searched in your default search field. Regards, Pravesh
Re: Sorting and searching on a field
> I have read about the option of copying this to a different field, using one for searching by tokenizing, and one for sorting.
That would be the optimal way of doing it. Sorting requires the field not to be analyzed/tokenized, while searching requires it, so a copy field is the optimal solution. Regards, Pravesh
Re: Solr Search Across Multiple Cores not working when quering on specific field
> but when I search on a specific field it is not working: http://localhost:8983/solr/core0/select?shards=localhost:8983/solr/core0,localhost:8983/solr/core1&q=mnemonic_value:United Why is distributed search not working when I search on a particular field?
Since you have a multi-shard setup, do the cores share the same configuration (schema.xml/solrconfig.xml etc.)? What error/output do you get for the sharded query? Regards, Pravesh
Generic RemoveDuplicatesTokenFilter
Hi All, currently SOLR's existing RemoveDuplicatesTokenFilter (http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.RemoveDuplicatesTokenFilterFactory) filters duplicate tokens that have the same text and sit at the same logical position. In my case, if the same term appears several times in a row, I need to remove the duplicates and consume only a single occurrence of the term (even if the position increment gap == 1). For example, for the input stream: quick brown brown brown fox jumps jumps over the little little lazy brown dog, the output should be: quick brown fox jumps over the little lazy brown dog. To achieve this, I implemented my own version of RemoveDuplicatesTokenFilter with an overridden process() method:

protected Token process(Token t) throws IOException {
  Token nextTok = peek(1);
  if (t != null && nextTok != null) {
    if (t.termText().equalsIgnoreCase(nextTok.termText())) {
      return null;
    }
  }
  return t;
}

The above implementation works as desired, and the consecutive duplicates are removed :) Any advice/feedback on the implementation? Regards, Pravesh
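Outside of Lucene, the filter's behavior boils down to collapsing case-insensitive runs of the same term into one occurrence; a minimal sketch of that logic:

```python
def dedupe_consecutive(tokens):
    """Drop a token when the next token is the same term (case-insensitive),
    mirroring the peek(1)-and-return-null logic of the custom filter."""
    out = []
    for tok, nxt in zip(tokens, tokens[1:] + [None]):
        if nxt is not None and tok.lower() == nxt.lower():
            continue  # same as process() returning null
        out.append(tok)
    return out

stream = ("quick brown brown brown fox jumps jumps over "
          "the little little lazy brown dog").split()
print(" ".join(dedupe_consecutive(stream)))
```

Note that, like the filter, this only removes adjacent duplicates: the final "brown" survives because it is not next to another "brown".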
Re: Solr or SQL fultext search
Go ahead with SOLR-based text search. That's what it is meant for, and it does it well. Regards, Pravesh
Re: Solr using very high I/O
Can you share more info, like what your H/W infra is: CPU, RAM, HDD? And from where do you pick the records/documents to index: RDBMS, files, network? Regards, Pravesh
Re: How to improve facet search?
What is the type of the field you are faceting on (string, text, int, date etc.)? Is it multivalued? How many unique values does the field have? What is your filterCache setting in solrconfig.xml? Regards, Pravesh
Re: How to improve facet search?
How many unique terms do you have in the faceting field? Since there are a lot of evictions, consider increasing the size of the filterCache; try to keep evictions to a minimum. BTW, how big is your index (GB/MB)? How much RAM is allocated? Above all: have you benchmarked your search? Does searching take millis/secs/mins? I am trying to understand whether your search might already be performing quite well. Regards, Pravesh
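For reference, the filterCache is sized in solrconfig.xml; the numbers below are purely illustrative and should be tuned against your own hit-ratio and eviction stats:

```xml
<!-- solrconfig.xml: illustrative sizes only -->
<filterCache class="solr.FastLRUCache"
             size="16384"
             initialSize="4096"
             autowarmCount="1024"/>
```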
Re: cache monitoring tools?
> facet.limit=50
Your facet.limit seems too high -- do you actually need that many? Since there are a lot of evictions from the filterCache, increase its maxsize to an acceptable limit. Regards, Pravesh
Re: Solr sorting issue : can not sort on multivalued field
Was that field multivalued=true earlier, by any chance? Did you rebuild the index from scratch after changing it to multivalued=false? Regards, Pravesh
Re: how to make effective search with fq and q params
Usually: use the 'q' parameter for the free-text values entered by users (where you might want to parse the query and/or apply boosting, phrase slop, minimum match, tie etc.). Use 'fq' to limit searches by criteria like location, date ranges etc. Also, avoid q=*:* where possible, as it implicitly translates to a MatchAllDocsQuery. Regards, Pravesh
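The reason fq helps is that each filter's matching doc-id set is cached and reused across queries, so only the free-text part is re-evaluated. A toy model of that mechanism (the "index" and doc ids here are invented for illustration):

```python
# Toy model of Solr's filterCache: each fq string maps to a cached
# set of matching doc ids, which is intersected with the q results.
filter_cache = {}

def run_filter(fq, evaluate):
    if fq not in filter_cache:
        filter_cache[fq] = evaluate(fq)  # computed once, then reused
    return filter_cache[fq]

def search(q_matches, fqs, evaluate):
    result = set(q_matches)
    for fq in fqs:
        result &= run_filter(fq, evaluate)
    return result

# Hypothetical doc ids matching each filter clause:
index = {"location:Delhi": {1, 2, 3, 5}, "date:[d1 TO d2]": {2, 3, 4}}
hits = search({1, 2, 3}, ["location:Delhi", "date:[d1 TO d2]"], index.get)
print(sorted(hits))
```

A second query with the same filters would hit filter_cache directly and skip the evaluate step entirely.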
Re: SolrJ - threading, http clients, connection managers
> 1) Is it safe to reuse a single _mgr and _client across all 28 cores?
Both are thread-safe APIs as per the HttpClient specs, so you should go ahead with this. Regards, Pravesh
Re: to prevent number-of-matching-terms in contributing score
Did you rebuild the index from scratch? Since this is an index-time factor, you need to rebuild the complete index from scratch. Regards, Pravesh
Re: to prevent number-of-matching-terms in contributing score
Hi Samar, you can write a custom Similarity implementation and override the lengthNorm() method to return a constant value. Then, in your schema.xml, specify your custom implementation as the default similarity class. But you need to rebuild your index from scratch for this to take effect (also set omitNorms=true for the fields where you need this behavior). Regards, Pravesh
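To see why this flattens length bias: classic Lucene's default length norm is roughly 1/sqrt(number of terms), so longer fields score lower for the same match, and returning a constant removes that effect. A small numeric sketch:

```python
import math

def default_length_norm(num_terms):
    # Classic Lucene DefaultSimilarity: 1 / sqrt(number of terms)
    return 1.0 / math.sqrt(num_terms)

def constant_length_norm(num_terms):
    # The overridden version suggested above: length no longer matters
    return 1.0

short_field, long_field = 4, 100
print(default_length_norm(short_field))   # short field is favored
print(default_length_norm(long_field))    # long field is penalized
print(constant_length_norm(long_field))   # both score the same
```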
Re: best way for sum of fields
I guess this has nothing to do with the search part. You can post-process the search results (I mean, iterate through your results and sum the field). Regards, Pravesh
Re: hierarchical synonym
If I understood correctly, it sounds like you want facets/hierarchical facets. Regards, Pravesh
Re: inconsistent results when faceting on multivalued field
Could you clarify the below?
> When I make a search on facet.qua_code=1234567
Are you trying to say that when you fire a fresh search for a facet item, like q=qua_code:1234567, it fetches documents where the qua_code field contains either the term 1234567 alone or both terms (1234567, 9384738, and other terms)? That would be because it is a multivalued field, and hence the facet is shown for both terms.
> If I reword the query as 'facet.query=qua_code:[1234567 TO 1234567]', I only get the expected counts
You will get facets only for documents that have the term 1234567 (facet.query applies to the facets, deciding which facet is picked/shown). Regards, Pravesh
Re: Painfully slow indexing
Are you posting through HTTP/SolrJ? Does your script time 'T' include the time between sending the POST request and fetching back the successful response? Try sending in small batches, like 10-20 documents. BTW, how many documents are you indexing? Regards, Pravesh
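The batching suggestion can be sketched as a simple chunking loop; each chunk would go out in its own update request rather than one giant POST (document shape here is invented for illustration):

```python
def batches(docs, batch_size=20):
    """Yield documents in small batches of batch_size; each batch
    would be sent to Solr in its own update request."""
    for i in range(0, len(docs), batch_size):
        yield docs[i:i + batch_size]

docs = [{"id": str(n)} for n in range(55)]
sizes = [len(b) for b in batches(docs)]
print(sizes)
```

Small batches keep memory on both sides bounded and give you a natural point to retry a failed request without re-sending everything.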
Re: Getting single documents by fq on unique field, performance
This approach seems fine. You might benchmark it through a load test etc. Regards, Pravesh
Re: upgrading 1.4 to 3.x
Just look into your Tomcat logs in more detail, specifically the logs from when Tomcat loads the Solr application's web context. There you might find some clues; or just post a snapshot of the logs here. Regards, Pravesh
Re: text search and data aggregation, thoughts?
Hi Esteban, a lot depends on a lot of things: 1) total volume (number of documents), 2) size of the index, 3) how you present the aggregated data in your UI. Your option 2 seems a suitable way to go; this way you can tune each core separately. The use-cases for updating each document/product also differ between the two indexes: one is updated when a product is added/updated, the other when a product is viewed/sold from the search results. Option 1 can be used if you show the aggregation stats only on the search results page along with each item; if they are shown on the item-detail page then option 1 seems better. Regards, Pravesh
Re: boosting and relevancy options from solr extensibility points -java-
> In a certain time period (say Christmas) I will promote a doc for the christmas keyword
You might check the QueryElevation component in SOLR.
> Or based on users' interest I will boost a specific category of products. Or (I am not sure how I can do this one) I will boost docs that the current user's friends (source: facebook) purchased/used/...
You can check Apache Mahout (https://cwiki.apache.org/confluence/display/MAHOUT/Recommender+Documentation) for this purpose. It has a recommendation engine that works pretty well. Thanks, Pravesh
Re: is there a way to know which mm value was used?
You can explicitly pass mm with every search and get it back in your response; otherwise use debugQuery=true, which will show you all the implicitly used defaults (but you wouldn't want to use that in production). Thanks, Pravesh
Re: Hierarchical faceting with Date
You could index the date as a text field (or use a new text field to store the date as text) and then try it on this new field. Thanks, Pravesh
Re: what is scheduling ? why should we do this?how to achieve this ?
SCHEDULING, in OS terminology, is when you specify cron jobs on Linux/Unix machines (or Scheduled Tasks on Windows machines). Whatever task you schedule, along with a time/date or interval, is invoked automatically, so you don't have to log into the machine and call the script/batch manually. SOLR scheduling is the same idea, but with an internal mechanism provided by SOLR to automatically invoke delta-import, full-import, commit, etc. This helps because you are not dependent on the OS level, where each OS has to be scheduled differently (cron/Scheduled Tasks). Thanks, Pravesh
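For the OS-level route, a cron entry hitting the DataImportHandler endpoint is a common sketch; the host, port, interval, and handler path below are illustrative and depend on your solrconfig.xml:

```shell
# crontab entry: run a delta-import every 15 minutes
*/15 * * * *  curl -s "http://localhost:8983/solr/dataimport?command=delta-import"
```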
Re: how to update solr cache when i delete records from remote database?
You would have to delete them from SOLR as well, and then commit (the commit will automatically refresh your caches). Thanks, Pravesh
Re: how i am getting data in my search field eventhough i removed data in my remote database?
See this thread: http://lucene.472066.n3.nabble.com/how-to-update-solr-cache-when-i-delete-records-from-remote-database-td3291879.html
Re: Viewing the complete document from within the index
Reconstructing the document might not be possible, since only the stored fields are actually stored document-wise (un-inverted), whereas indexed-only fields are stored in inverted form. I don't think SOLR/Lucene currently provides any way to reconstruct a document the way you want (it's a sort of reverse engineering that isn't supported). Thanks, Pravesh
Re: what is scheduling ? why should we do this?how to achieve this ?
The wiki link you referred to is quite old and not under active development. I would prefer OS-based scheduling using cron jobs. You can check this link: http://wiki.apache.org/solr/CollectionDistribution Thanks, Pravesh
Re: difference between shard and core in solr
> a single core is an index with same schema, is this what a core really is?
YES. A single core is an independent index with its own unique schema. You go with a new core when your schema/analysis/search requirements are completely different from your existing core(s).
> can a single core contain two separate indexes with different schemas in it?
NO (for the same reason as explained above).
> Does a shard refer to a collection of indexes on a single physical machine? Can a single core be present in different shards?
You can think of a shard as one part of a big index distributed across a cluster of machines, so all shards belonging to a single core share the same schema/analysis/search requirements. You go with sharding when the index is not scalable on a single machine, or when the index grows really big in size. Thanks, Pravesh
Re: Start parameter messes with rows
> I just want to be clear on the concepts of core and shard: is a single core an index with the same schema? Can a single core contain two separate indexes with different schemas? Does a shard refer to a collection of indexes on a single physical machine? Can a single core be present in different shards?
You might look at the following thread: http://lucene.472066.n3.nabble.com/difference-between-shard-and-core-in-solr-td3178214.html Thanks, Pravesh
Re: SOLR Shard failover Query
Thanks Shawn,
> When I first set things up, I was using SOLR-1537 on Solr 1.5-dev. By the time I went into production, I had abandoned that idea and rolled out a stock 1.4.1 index with two complete server chains, each with 7 shards.
So, both chains were configured under the cluster in a load-balanced manner? Thanks, Pravesh
Re: How could I monitor solr cache
This might be of some help: http://wiki.apache.org/solr/SolrJmx Thanks, Pravesh
SOLR Shard failover Query
Hi, SOLR has a sharding feature where a single search request is distributed across shards; the results are collected and scored, and then the response is generated. I wanted to know what happens in case of failure of specific shard(s), say one particular shard machine is down. Does the request fail, or is this handled gracefully by SOLR? Thanks, Pravesh
Re: Deleted docs in IndexWriter Cache (NRT related)
A commit would be the safest way to make sure the deleted content doesn't show up. Thanks, Pravesh
Re: Is it possible to extract all the tokens from solr?
You can use Lucene for this. It provides the TermEnum API to enumerate all terms of a field (or fields). SOLR 1.4+ also provides a special request handler, the TermsComponent, for this purpose. Check if that helps. Thanx Pravesh -- View this message in context: http://lucene.472066.n3.nabble.com/Is-it-possible-to-extract-all-the-tokens-from-solr-tp3168362p3168589.html Sent from the Solr - User mailing list archive at Nabble.com.
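For reference, a sketch of a TermsComponent request against the /terms handler shipped with the 1.4 example config (host, port, and field name are assumptions):

```
http://localhost:8983/solr/terms?terms=true&terms.fl=myField&terms.limit=100
```

This returns the first 100 indexed terms of myField along with their document frequencies; terms.limit=-1 would return them all.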
Re: how to build lucene-solr (espeically if behind a firewall)?
If behind a proxy, then build with Ant's autoproxy support: ant -autoproxy dist Thanx Pravesh -- View this message in context: http://lucene.472066.n3.nabble.com/how-to-build-lucene-solr-espeically-if-behind-a-firewall-tp3163038p3165568.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: POST for queries, length/complexity limit of fq?
1. I assume that it's worthwhile to rely on POST method instead of GET when issuing a search. Right? As far as I can see, this should work. We restrict user searches by passing unique ids (sometimes in the thousands) in 'fq' and use the POST method. Thanx Pravesh -- View this message in context: http://lucene.472066.n3.nabble.com/POST-for-queries-length-complexity-limit-of-fq-tp3162405p3165586.html Sent from the Solr - User mailing list archive at Nabble.com.
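A sketch of such a POSTed search with curl (host, field names, and id values are made up); POST keeps the long fq out of the URL, so container URL-length limits don't apply:

```
curl http://localhost:8983/solr/select \
     --data-urlencode 'q=*:*' \
     --data-urlencode 'fq=id:(1001 1002 1003)' \
     --data-urlencode 'rows=10'
```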
Re: How do I specify a different analyzer at search-time?
You can configure a separate analyzer for index time and for search time for each of your field types in schema.xml. Thanx Pravesh -- View this message in context: http://lucene.472066.n3.nabble.com/How-do-I-specify-a-different-analyzer-at-search-time-tp3159463p3165593.html Sent from the Solr - User mailing list archive at Nabble.com.
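A minimal sketch of what that looks like in schema.xml; the field type name and the particular filters are illustrative, the key point is the type="index" / type="query" attributes:

```xml
<fieldType name="text_example" class="solr.TextField">
  <!-- analysis applied when documents are indexed -->
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <!-- different analysis applied to the query string at search time -->
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
  </analyzer>
</fieldType>
```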
Re: Re:OOM at solr master node while updating document
You just need to allocate more heap to your JVM. BTW, are you running any complex searches while indexing is in progress, like fetching a large set of documents? Thanx Pravesh -- View this message in context: http://lucene.472066.n3.nabble.com/OOM-at-solr-master-node-while-updating-document-tp3140018p3147475.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Read past EOF error due to broken connection
Did you do a manual copy of the index from the Master to the Slave server? If so, it may not have been copied properly; you can check the size of the indexes on both servers. Otherwise, you would have to recreate the indexes. Thanx Pravesh -- View this message in context: http://lucene.472066.n3.nabble.com/Read-past-EOF-error-due-to-broken-connection-tp3091247p3098737.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Removing duplicate documents from search results
Would you even care to index the duplicate documents? Finding duplicates in content fields is not as easy as in an untokenized/keyword field. Maybe you could do this filtering at indexing time, before sending the document to SOLR. Then the question becomes: which document should go (from a group of duplicates)? The latest one? -- View this message in context: http://lucene.472066.n3.nabble.com/Removing-duplicate-documents-from-search-results-tp3099214p3099432.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Read past EOF error due to broken connection
First commit and then try the search again. You can also use Lucene's CheckIndex tool to check and fix your index (note that it may remove some corrupt segments from your index). Thanx Pravesh -- View this message in context: http://lucene.472066.n3.nabble.com/Read-past-EOF-error-due-to-broken-connection-tp3091247p3094334.html Sent from the Solr - User mailing list archive at Nabble.com.
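A sketch of running CheckIndex from the command line (the jar name and index path are placeholders; run -fix only against a backup, since it drops unreadable segments and the documents in them):

```
java -cp lucene-core.jar org.apache.lucene.index.CheckIndex /path/to/index
java -cp lucene-core.jar org.apache.lucene.index.CheckIndex /path/to/index -fix
```

The first invocation only reports problems; the second rewrites the segments file to exclude the corrupt segments.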
Re: Search is taking long-long time.
Were your searches always slow, or only since you made some changes at the index/config/schema level? Is it due to the 5-minute index updates? Are you warming your searchers? Thanx Pravesh -- View this message in context: http://lucene.472066.n3.nabble.com/Search-is-taking-long-long-time-tp3095306p3098552.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: relevant result for query with boost factor on parameters
You can try the following: 1. Increase the boost for the fields (say, field-1^100, field-2^20), and pass field-3 as a filter query (using the fq parameter). This way field-3 won't affect the scoring. 2. Some implicit factors like length normalization can also skew the results, so you can switch that off as well. Thanx Pravesh -- View this message in context: http://lucene.472066.n3.nabble.com/relevant-result-for-query-with-boost-factor-on-parameters-tp3079337p3085406.html Sent from the Solr - User mailing list archive at Nabble.com.
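Point 1 can be sketched as request parameters using the dismax query parser (field names and values here are placeholders, not from the original thread):

```
q=rock roll&defType=dismax&qf=field1^100 field2^20&fq=field3:someValue
```

qf applies the per-field boosts to the user query, while the fq clause filters on field3 without contributing to the score.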
Re: relevant result for query with boost factor on parameters
but if suppose field1 does not contain both the terms *rock and roll*, *special attention*, then field2 results should take priority (show the results which have both the terms first, and then show the results with respect to boost factor or relevance); if both fields do not contain these terms together (show as normal, with field1 having more relevance than field2) You would have to experiment with different boost values to arrive at a benchmark. Start with the same boost for field-1 and field-2, then increase field-1 a little bit... :) Thanx Pravesh -- View this message in context: http://lucene.472066.n3.nabble.com/relevant-result-for-query-with-boost-factor-on-parameters-tp3079337p3085424.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Search failed even if it has the keyword .
First check, in your schema.xml, which is your default search field. Also check whether you are using WordDelimiterFilterFactory in your schema.xml for the specific field. This tokenizes words on every capital letter, so a word like DescribeYourImageWithAMovieTitle will be broken into multiple tokens, each of which will be searchable. -- View this message in context: http://lucene.472066.n3.nabble.com/Search-failed-even-if-it-has-the-keyword-tp3075626p3075644.html Sent from the Solr - User mailing list archive at Nabble.com.
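For reference, the split-on-capital-letter behaviour comes from the filter's splitOnCaseChange attribute; a minimal sketch of such an analyzer (the field type name is made up):

```xml
<fieldType name="text_wdf" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- splitOnCaseChange=1 breaks DescribeYourImage into Describe, Your, Image -->
    <filter class="solr.WordDelimiterFilterFactory" splitOnCaseChange="1"
            generateWordParts="1" catenateWords="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```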
Re: difficult sort
I'm not sure, but have you looked at the field collapsing feature in SOLR yet? You may have to apply a patch for the 1.4.1 version, if this is what you want. -- View this message in context: http://lucene.472066.n3.nabble.com/difficult-sort-tp3075563p3075661.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Search failed even if it has the keyword .
What is the type of your default query field, title, in your schema.xml? -- View this message in context: http://lucene.472066.n3.nabble.com/Search-failed-even-if-it-has-the-keyword-tp3075626p3075797.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: difficult sort
Yes. Then I believe you would need multiple queries. -- View this message in context: http://lucene.472066.n3.nabble.com/difficult-sort-tp3075563p3075802.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: SOlR -- Out of Memory exception
If you are sending the whole CSV in a single HTTP request using curl, why not consider sending it in smaller chunks? -- View this message in context: http://lucene.472066.n3.nabble.com/SOlR-Out-of-Memory-exception-tp3074636p3075091.html Sent from the Solr - User mailing list archive at Nabble.com.
High 100% CPU usage with SOLR 1.4.1
Hi, I'm planning to upgrade my system from SOLR 1.2.1 to SOLR 1.4.1. We had done some Lucene-level optimizations on the SOLR slaves in the earlier system (1.2.1), like: 1. removed the synchronized block from the SegmentReader class's isDeleted() method 2. removed the synchronized block from the FSDirectory.FSIndexInput class's readInternal() method Since 1.4.1 has better alternatives, such as NIOFSDirectory and the read-only index reader (used by default by SOLR 1.4.1), we did not apply the earlier changes to the 1.4.1 version. Now, when load testing with 1.4.1, my CPU usage goes as high as 100%. When I repeat the load test with my earlier setup (1.2.1), the CPU usage stays below 50-55%, but the total throughput of the new (1.4.1) version is much higher than the older (1.2.1). I would need some help in minimizing the CPU load on the new system. Could NIOFSDirectory possibly contribute to the high CPU? Is there a mechanism in 1.4.1 to use the SimpleFSDirectory implementation for searching (and would this require a full re-index)? Help will be appreciated :) Thanx Pravesh -- View this message in context: http://lucene.472066.n3.nabble.com/High-100-CPU-usage-with-SOLR-1-4-1-tp3068667p3068667.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: High 100% CPU usage with SOLR 1.4.1
Hi Yonik, Thanx for the prompt reply. This is a relief :) Just one more question: wouldn't the 100% CPU load affect the system, as system processes would starve for CPU? I tried the load test first with 4 cores and then with 8 cores; the CPU usage still reached 100%. We have an index of about 32GB with 100+ fields indexed and 18 fields stored, using an optimized index for search. Thanx Pravesh -- View this message in context: http://lucene.472066.n3.nabble.com/High-100-CPU-usage-with-SOLR-1-4-1-tp3068667p3068778.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: problem with the new IndexSearcher when snpainstaller (and commit script) happen
Look for the snapshot.current file in the logs folder under your SOLR home on your slave server, and see if it points to the older snapshot. I also faced a similar issue (but with SOLR 1.2.1) using the collection-distribution scripts. The way I resolved it was: 1. Stopped the index replication and index updation scripts. 2. Cleaned the slave(s) status directory on the master (keep the status directory, only delete its contents). 3. Removed the snapshot.current file from the slave's [SOLR-Home]/logs folder. 4. Restarted the snapshooter on the master and the snappuller on the slave(s). Hope this helps. Thanx Pravesh -- View this message in context: http://lucene.472066.n3.nabble.com/problem-with-the-new-IndexSearcher-when-snpainstaller-and-commit-script-happen-tp3066902p3068903.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: High 100% CPU usage with SOLR 1.4.1
Yes Erick, I did create an artificial load test with 30 users searching concurrently (around 28000 samples of actual queries). With 1.4.1, the test completes within 3 hrs without any failures (SOLR 1.2.1 couldn't match this performance; in 3 hrs it could only do 9700 samples). My actual production load is much less than that (the 3-hr cycle actually spans 24 hrs in production). I will repeat this with the actual load now. Thanx all for your time :) Pravesh -- View this message in context: http://lucene.472066.n3.nabble.com/High-100-CPU-usage-with-SOLR-1-4-1-tp3068667p3070663.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: How do I make sure the resulting documents contain the query terms?
k0 -- A | C k1 -- A | B k2 -- A | B | C k3 -- B | C Now let q=k1, how do I make sure C doesn't appear as a result since it doesn't contain any occurrence of k1? Do we need to bother doing that at all? That's exactly what Lucene already does :) -- View this message in context: http://lucene.472066.n3.nabble.com/How-do-I-make-sure-the-resulting-documents-contain-the-query-terms-tp3031637p3033451.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Applying synonyms increase the data size from MB to GBs
Since you are using expand=true, every time a matching synonym entry is found the analyzer expands the term with the full synonym set in the index. This can cause the index to grow in size. -- View this message in context: http://lucene.472066.n3.nabble.com/Applying-synonyms-increase-the-data-size-from-MB-to-GBs-tp3028700p3028877.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Feature: skipping caches and info about cache use
SOLR 1.3+ logs only fresh queries. If you re-run the same query, it is served from the cache and not printed in the logs (unless the cache(s) are not warmed or the searcher is reopened). So Otis's proposal would definitely help in benchmarking and baselining searches :) -- View this message in context: http://lucene.472066.n3.nabble.com/Feature-skipping-caches-and-info-about-cache-use-tp3020325p3028894.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Strategy -- Frequent updates in our application
You can use the DataImportHandler for your full/incremental indexing. How close to real time the indexing needs to be could vary as per business requirements (I mean, the delay could be 5, 10, 15, or 30 mins). It also depends on how much volume will be indexed incrementally. BTW, are you running a Master+Slave SOLR setup? -- View this message in context: http://lucene.472066.n3.nabble.com/Strategy-Frequent-updates-in-our-application-tp3018386p3019040.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Sorting
BTW, why are you sorting on this field? You could also index and store this field twice: first with its original value, and second encoded to some unique code/hash; index that and sort on it. -- View this message in context: http://lucene.472066.n3.nabble.com/Sorting-tp3017285p3019055.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Strategy -- Frequent updates in our application
You can go ahead with the Master/Slave setup provided by SOLR. It's trivial to set up, and you also get SOLR's operational scripts for index synching between the Master and Slave(s), or the Java-based replication feature. There is no need to re-invent another architecture :) -- View this message in context: http://lucene.472066.n3.nabble.com/Strategy-Frequent-updates-in-our-application-tp3018386p3019475.html Sent from the Solr - User mailing list archive at Nabble.com.
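A minimal sketch of the Java-based replication setup in solrconfig.xml (the master host and poll interval are illustrative values):

```xml
<!-- master solrconfig.xml -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <!-- take a replicable snapshot after every commit -->
    <str name="replicateAfter">commit</str>
    <str name="confFiles">schema.xml,stopwords.txt</str>
  </lst>
</requestHandler>

<!-- slave solrconfig.xml -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <str name="masterUrl">http://master-host:8983/solr/replication</str>
    <!-- poll the master every 5 minutes -->
    <str name="pollInterval">00:05:00</str>
  </lst>
</requestHandler>
```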
Re: Anyway to know changed documents?
If your index size is smaller (a few 100 MBs), you can consider the operational script tools provided with the SOLR distribution to sync indexes from the Master to the Slave servers. They copy the latest index snapshot from the Master to the Slave(s). The SOLR wiki provides good info on how to set them up as cron jobs, so no manual intervention is required. BTW, SOLR 1.4+ also has a feature where only the changed segments get synched (provided the index is not optimized, since optimizing rewrites all segments). -- View this message in context: http://lucene.472066.n3.nabble.com/Anyway-to-know-changed-documents-tp3009527p3010015.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Query problem in Solr
We're using Solr to search on a Shop index and a Product index Do you have 2 separate indexes (using distributed shard search)? I suspect you actually have only a single index. Currently a Shop has a field `shop_keyword` which also contains the keywords of the products assigned to it. You mean, for a shop, you first concatenate all keywords of all its products and then save them in the shop_keyword field for that shop? In that case there is no way you can identify which keyword occurs in which product in your index. You might need to change the index structure: when you post documents, post a single document per product (with fields like title, price, shop-id, etc.) instead of a single document per shop. Hope I make myself clear -- View this message in context: http://lucene.472066.n3.nabble.com/Query-problem-in-Solr-tp3009812p3010072.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Re: Anyway to know changed documents?
The SOLR wiki will provide help on this. You might be interested in the pure Java-based replication too. I'm not sure whether the SOLR operational scripts have this feature (synching only changed segments). You might need to change the configuration in solrconfig.xml. -- View this message in context: http://lucene.472066.n3.nabble.com/Anyway-to-know-changed-documents-tp3009527p3010085.html Sent from the Solr - User mailing list archive at Nabble.com.
Can we stream binary data with StreamingUpdateSolrServer ?
Hi, I'm using StreamingUpdateSolrServer to post a batch of content to SOLR 1.4.1. Looking at the StreamingUpdateSolrServer code, it appears it only streams content in XML format. Can we use it to stream data in binary format? -- View this message in context: http://lucene.472066.n3.nabble.com/Can-we-stream-binary-data-with-StreamingUpdateSolrServer-tp3001813p3001813.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: UniqueKey field in schema.xml
Create a new unique field for this purpose, say myUniqueField; then just combine (product-id + cust-id) and post the combined value to this new field. -- View this message in context: http://lucene.472066.n3.nabble.com/UniqueKey-field-in-schema-xml-tp2987807p2988098.html Sent from the Solr - User mailing list archive at Nabble.com.
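A sketch of that composite key in schema.xml plus a posted document; the field names and values are made up, and the client is assumed to do the concatenation before posting:

```xml
<!-- schema.xml -->
<field name="myUniqueField" type="string" indexed="true" stored="true"/>
<uniqueKey>myUniqueField</uniqueKey>

<!-- posted document: client concatenates product-id and cust-id -->
<add>
  <doc>
    <field name="myUniqueField">1001_42</field>
    <field name="productId">1001</field>
    <field name="custId">42</field>
  </doc>
</add>
```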
Re: What is omitNorms
Setting omitNorms=true on a field has the following effects: 1. Length normalization will not work on that field, which means matching documents with shorter field values will not be preferred/boosted over matching documents with longer values at search time. 2. Index-time boosting will not be available on the field. If neither of the above is required by you, then you can set omitNorms=true for the specific fields. This has an added advantage: it will save you some (or a lot of) RAM, since with omitNorms=false on a total of N fields in the index, norms will require RAM of size: total docs in index * 1 byte * N -- View this message in context: http://lucene.472066.n3.nabble.com/What-is-omitNorms-tp2987547p2988124.html Sent from the Solr - User mailing list archive at Nabble.com.
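The RAM estimate above can be sketched as a quick calculation (the document and field counts below are made-up figures, just to show the arithmetic):

```python
def norms_ram_bytes(num_docs, num_fields_with_norms):
    """Norms cost roughly 1 byte per document per field that keeps norms."""
    return num_docs * 1 * num_fields_with_norms

# e.g. 10 million docs with norms enabled on 20 fields:
print(norms_ram_bytes(10_000_000, 20))  # 200000000 bytes, i.e. roughly 190 MB
```

So omitting norms on fields that don't need length normalization or index-time boosts directly shrinks this figure.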
Re: FieldCache
This is probably because you have only 10 unique terms in your indexed field. BTW, what do you mean by controlling the FieldCache? -- View this message in context: http://lucene.472066.n3.nabble.com/FieldCache-tp2987541p2988142.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: What is document tag in data-config.xml of Solr
The document tag corresponds to the actual SOLR document that will be posted by the DIH; this mapping is what the DIH uses to map DB rows to index documents. You can have multiple entity tags, as you might be pulling data from more than one table, but you can have only one document tag in your db-data-config.xml (remember, the purpose of db-data-config.xml is to map DB-structure to index-structure semantics). -- View this message in context: http://lucene.472066.n3.nabble.com/What-is-document-tag-in-data-config-xml-of-Solr-tp2978668p2988176.html Sent from the Solr - User mailing list archive at Nabble.com.
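A minimal sketch of that mapping; the driver, table, and column names below are made up for illustration:

```xml
<dataConfig>
  <dataSource driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost/shop" user="db_user" password="***"/>
  <!-- exactly one <document>; one <entity> per source table/query -->
  <document>
    <entity name="product" query="SELECT id, title FROM product">
      <field column="id"    name="id"/>
      <field column="title" name="title"/>
    </entity>
  </document>
</dataConfig>
```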
Re: Too many Boolean Clause and Filter Query
I'm sure you can fix this by increasing the maxBooleanClauses value. This should apply to filter queries as well. -- View this message in context: http://lucene.472066.n3.nabble.com/Too-many-Boolean-Clause-and-Filter-Query-tp2974848p2988190.html Sent from the Solr - User mailing list archive at Nabble.com.
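For reference, this is a single setting in solrconfig.xml; the value below is illustrative:

```xml
<!-- solrconfig.xml: default is 1024; raise it if long id lists in q/fq
     trigger a TooManyClauses exception -->
<maxBooleanClauses>10240</maxBooleanClauses>
```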
RE: Out of memory on sorting
For saving memory: 1. Allocate as much memory as possible to the JVM (especially if you are using a 64-bit OS). 2. Set omitNorms=true for your date and id fields (actually for all fields where index-time boosting and length normalization aren't required; this will require a full reindex). 3. Are you sorting over all documents in the index? Try to limit them using filter queries. 4. Avoid a match-all-docs query like q=*:* (if you are using one). 5. If you can do away with sorting on the ID field, sort on a field with fewer unique terms. Hope this helps -- View this message in context: http://lucene.472066.n3.nabble.com/Out-of-memory-on-sorting-tp2960578p2988336.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: how to integrate solr with spring framework
Just read through: http://www.springbyexample.org/examples/solr-client.html http://static.springsource.org/spring-roo/reference/html/base-solr.html -- View this message in context: http://lucene.472066.n3.nabble.com/how-to-integrate-solr-with-spring-framework-tp2955540p2988363.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Huge performance drop in distributed search w/ shards on the same server/container
Do you really require multiple shards? A single core/shard will do even for millions of documents, and the search will be faster than searching across multiple shards. Consider multi-shard only when you cannot scale up on a single shard/machine (e.g. CPU, RAM, etc. become a major bottleneck). Also read through the SOLR distributed search wiki to check on the tuning required at the application server (Tomcat) end, like the max HTTP request settings: for a single request in a multi-shard setup, internal HTTP requests are made to all queried shards, so make sure you set this parameter high enough. -- View this message in context: http://lucene.472066.n3.nabble.com/Huge-performance-drop-in-distributed-search-w-shards-on-the-same-server-container-tp2938421p2988464.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: How does Solr's MoreLikeThis component internally work to get results?
This will help: http://cephas.net/blog/2008/03/30/how-morelikethis-works-in-lucene/ -- View this message in context: http://lucene.472066.n3.nabble.com/How-does-Solr-s-MoreLikeThis-component-internally-work-to-get-results-tp2938407p2988487.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: What is omitNorms
What would be the default value for omitNorms? --- The default value is false. Is the general advice to ignore this and set the value explicitly? --- It depends on your requirements. Do this on a field-per-field basis: set it to false on fields where you want the norms, or set it to true on fields where you want to omit the norms. -- View this message in context: http://lucene.472066.n3.nabble.com/What-is-omitNorms-tp2987547p2988714.html Sent from the Solr - User mailing list archive at Nabble.com.
RE: FieldCache
Since FieldCache is an expert-level API in Lucene, there is no direct control provided by SOLR/Lucene over its size. -- View this message in context: http://lucene.472066.n3.nabble.com/FieldCache-tp2987541p2989443.html Sent from the Solr - User mailing list archive at Nabble.com.