Re: Multiple facet - fq
Thnx guys.
--
Yavuz Selim YILMAZ

2010/10/20 Tim Gilbert tim.gilb...@morningstar.com:

Sorry, what Pradeep said, not Prasad. My apologies Pradeep.

-Original Message-
From: Tim Gilbert
Sent: Wednesday, October 20, 2010 12:18 PM
To: 'solr-user@lucene.apache.org'
Subject: RE: Multiple facet - fq

As Prasad said:

fq=(category:corporate category:personal)

But you might want to check your schema.xml to see what you have here:

<!-- SolrQueryParser configuration: defaultOperator="AND|OR" -->
<solrQueryParser defaultOperator="AND"/>

You can always specify your operator in your search between your facets:

fq=(category:corporate AND category:personal)

or

fq=(category:corporate OR category:personal)

I have an application where I am using searches on 10 or more facets with AND, OR, + and - options and it works flawlessly.

fq=(+category:corporate AND -category:personal)

meaning category is corporate and not personal.

Tim

-Original Message-
From: Pradeep Singh [mailto:pksing...@gmail.com]
Sent: Wednesday, October 20, 2010 11:56 AM
To: solr-user@lucene.apache.org
Subject: Re: Multiple facet - fq

fq=(category:corporate category:personal)

On Wed, Oct 20, 2010 at 7:39 AM, Yavuz Selim YILMAZ yvzslmyilm...@gmail.com wrote:

Under category facet, there are multiple selections, which can be personal, corporate or other. How can I get both personal and corporate ones? I tried

fq=category:corporate&fq=category:personal

It looks easy, but I can't find the solution.
--
Yavuz Selim YILMAZ
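To make the fq syntax above concrete, here is a small self-contained sketch (not from the thread; the host, handler path, and q value are assumptions) that builds the two variants as URL-encoded request parameters:

```java
import java.net.URLEncoder;

// Sketch only: shows how the fq expressions discussed above look once
// URL-encoded into a request. Host and handler path are hypothetical.
public class FqExample {
    static String selectUrl(String fq) throws Exception {
        return "http://localhost:8983/solr/select?q=*:*&fq="
                + URLEncoder.encode(fq, "UTF-8");
    }

    public static void main(String[] args) throws Exception {
        // OR: documents whose category is corporate or personal (or both).
        System.out.println(selectUrl("category:corporate OR category:personal"));
        // +/-: corporate documents that are not also personal.
        System.out.println(selectUrl("+category:corporate AND -category:personal"));
    }
}
```

With defaultOperator=AND in schema.xml, the bare form fq=(category:corporate category:personal) behaves like the AND variant, which is why spelling out OR matters here.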
Re: RAM increase
On Thu, Oct 21, 2010 at 10:46 AM, satya swaroop satya.yada...@gmail.com wrote:

Hi all, I increased my RAM size to 8GB and I want 4GB of it to be used for Solr itself. Can anyone tell me the way to allocate the RAM for Solr?
[...]

You will need to set up the allocation of RAM for Java, via the -Xmx and -Xms options. If you are using something like Tomcat, that would be done in the Tomcat configuration file. E.g., this option can be added inside /etc/init.d/tomcat6 on new Debian/Ubuntu systems.

Regards, Gora
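A sketch of what that might look like; the exact file varies by packaging (some Debian/Ubuntu setups read /etc/default/tomcat6 instead), and the 4GB values simply mirror the question, not a recommendation:

```shell
# Hypothetical line for /etc/init.d/tomcat6 or /etc/default/tomcat6:
# start the JVM with a fixed 4 GB heap for Solr.
JAVA_OPTS="$JAVA_OPTS -Xms4g -Xmx4g"
```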
Re: why solr is slower than lucene so much?
I found the problem's cause: it's the DocSetCollector. My filter query result's size is about 300, so DocSetCollector.getDocSet() is an OpenBitSet, and 300 OpenBitSet.fastSet(doc) ops are too slow. So I used SolrIndexSearcher's TopFieldDocs search(Query query, Filter filter, int n, Sort sort), and it's normal.

At 2010-10-20 19:21:27, kafka0102 kafka0...@163.com wrote:

For solr's SolrIndexSearcher.search(QueryResult qr, QueryCommand cmd), I find it's too slow. My index's size is about 500M, and the record count is 3984274. My query is like q=xx&fq=fid:1&fq=atm:[int_time1 TO int_time2]. fid's type is

<fieldType name="int" class="solr.TrieIntField" precisionStep="0" omitNorms="true" positionIncrementGap="0"/>

and atm's type is

<fieldType name="sint" class="solr.TrieIntField" precisionStep="8" omitNorms="true" positionIncrementGap="0"/>

For the test, I disabled solr's cache config and used other Lucene code like the following:

private void test2(final ResponseBuilder rb) {
  try {
    final SolrQueryRequest req = rb.req;
    final SolrIndexSearcher searcher = req.getSearcher();
    final SolrIndexSearcher.QueryCommand cmd = rb.getQueryCommand();
    final ExecuteTimeStatics timeStatics = ExecuteTimeStatics.getExecuteTimeStatics();
    final ExecuteTimeUnit staticUnit = timeStatics.addExecuteTimeUnit("test2");
    staticUnit.start();
    // combine the main query and all filter queries into one BooleanQuery
    final List<Query> query = cmd.getFilterList();
    final BooleanQuery booleanFilter = new BooleanQuery();
    for (final Query q : query) {
      booleanFilter.add(new BooleanClause(q, Occur.MUST));
    }
    booleanFilter.add(new BooleanClause(cmd.getQuery(), Occur.MUST));
    logger.info("q:" + query);
    final Sort sort = cmd.getSort();
    final TopFieldDocs docs = searcher.search(booleanFilter, null, 20, sort);
    final StringBuilder sbBuilder = new StringBuilder();
    for (final ScoreDoc doc : docs.scoreDocs) {
      sbBuilder.append(doc.doc + ",");
    }
    logger.info("hits:" + docs.totalHits + ",result:" + sbBuilder.toString());
    staticUnit.end();
  } catch (final Exception e) {
    throw new RuntimeException(e);
  }
}

For the test, I first called the code above and then solr's search(...). The result is: Lucene's takes about 20ms and solr's about 70ms. I'm so confused.

And I wrote other code using a filter like the following, but the range query's result count is not correct. Does anybody know the reason?

private void test1(final ResponseBuilder rb) {
  try {
    final SolrQueryRequest req = rb.req;
    final SolrIndexSearcher searcher = req.getSearcher();
    final SolrIndexSearcher.QueryCommand cmd = rb.getQueryCommand();
    final ExecuteTimeStatics timeStatics = ExecuteTimeStatics.getExecuteTimeStatics();
    final ExecuteTimeUnit staticUnit = timeStatics.addExecuteTimeUnit("test1");
    staticUnit.start();
    // wrap each filter query in a BooleanFilter instead of a BooleanQuery
    final List<Query> query = cmd.getFilterList();
    final BooleanFilter booleanFilter = new BooleanFilter();
    for (final Query q : query) {
      setFilter(booleanFilter, q);
    }
    final Sort sort = cmd.getSort();
    final TopFieldDocs docs = searcher.search(cmd.getQuery(), booleanFilter, 20, sort);
    logger.info("hits:" + docs.totalHits);
    staticUnit.end();
  } catch (final Exception e) {
    throw new RuntimeException(e);
  }
}
Using a custom repository to store solr index files
Hi everyone, I was looking at using the Embedded Solr server through SolrJ and I have a couple of concerns. I'd like to use a custom repository to store my index. Is there a way I can define this? Is there a data output interface I can implement for this purpose? Or can this be done in some way? Any feedback is appreciated. Thanks in advance.
--
Regards, Tharindu
A bug in ComplexPhraseQuery ?
Hi, We have installed ComplexPhraseQuery and since then we can see strange behaviour in proximity search. We have the 2 following queries:

(text:("protein digest"~50))
(text:("digest protein"~50))

Without ComplexPhraseQuery, both queries return 6 matching documents. With ComplexPhraseQuery, query 1 returns 4 documents and query 2 returns 5 documents! It seems that proximity search is broken. Is this a known problem? Thanks for your help.

Regards, J-Michel
--
View this message in context: http://lucene.472066.n3.nabble.com/A-bug-in-ComplexPhraseQuery-tp1744659p1744659.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: RAM increase
You will also need to switch to a 64-bit JVM. You might have to add the `-d64` flag as well as the `-Xms` and `-Xmx`.

----- Original Message -----
From: Gora Mohanty g...@mimirtech.com
To: solr-user@lucene.apache.org
Sent: Thursday, October 21, 2010 2:34 AM
Subject: Re: RAM increase

On Thu, Oct 21, 2010 at 10:46 AM, satya swaroop satya.yada...@gmail.com wrote:

Hi all, I increased my RAM size to 8GB and I want 4GB of it to be used for Solr itself. Can anyone tell me the way to allocate the RAM for Solr?
[...]

You will need to set up the allocation of RAM for Java, via the -Xmx and -Xms options. If you are using something like Tomcat, that would be done in the Tomcat configuration file. E.g., this option can be added inside /etc/init.d/tomcat6 on new Debian/Ubuntu systems.

Regards, Gora
Re: Using a custom repository to store solr index files
On Thu, 21 Oct 2010 14:42 +0530, Tharindu Mathew mcclou...@gmail.com wrote:

Hi everyone, I was looking at using the Embedded Solr server through SolrJ and I have a couple of concerns. I'd like to use a custom repository to store my index. Is there a way I can define this? Is there a data output interface I can implement for this purpose? Or can this be done in some way?

Why do you want to do this? Solr embeds a Lucene index, and Lucene has a Directory interface that can be implemented differently (something other than the default FSDirectory implementation).

Upayavira
MoreLikeThis explanation?
Hi, Does the latest Solr provide an explanation for results returned by MLT? I want to get the interesting terms for each result that overlap with the source document. This set of terms will vary from result to result possibly. Thanks! Darren
Re: Import From MySQL database
You need to look into the actual logs of the system. There you will see more details about why the import failed. Check the Tomcat or Jetty logs.
Re: MoreLikeThis explanation?
(10/10/21 20:33), dar...@ontrenet.com wrote: Hi, Does the latest Solr provide an explanation for results returned by MLT? No, but there is an open issue: https://issues.apache.org/jira/browse/SOLR-860 Koji -- http://www.rondhuit.com/en/
FieldCache
Hi, does a field which should be cached need to be indexed? I have a binary field which is just stored. Retrieving it via FieldCache.DEFAULT.getTerms returns empty BytesRefs. Then I found the following post: http://www.mail-archive.com/d...@lucene.apache.org/msg05403.html How can I use the FieldCache with a binary field?
--
Kind regards, Mathias
Re: why solr is slower than lucene so much?
2010/10/21 kafka0102 kafka0...@163.com:

I found the problem's cause: it's the DocSetCollector. My filter query result's size is about 300, so DocSetCollector.getDocSet() is an OpenBitSet, and 300 OpenBitSet.fastSet(doc) ops are too slow.

As I said in my other response to you, that's a perfect reason why you want Solr to cache that for you (unless the filter will be different each time).

-Yonik
http://www.lucidimagination.com
Re: MoreLikeThis explanation?
Thank you! On Thu, 2010-10-21 at 23:03 +0900, Koji Sekiguchi wrote: (10/10/21 20:33), dar...@ontrenet.com wrote: Hi, Does the latest Solr provide an explanation for results returned by MLT? No, but there is an open issue: https://issues.apache.org/jira/browse/SOLR-860 Koji
Re: RAM increase
Jean-Sebastien Vachon wrote:

You will also need to switch to a 64-bit JVM. You might have to add the `-d64` flag as well as the `-Xms` and `-Xmx`.

I've actually had no luck googling what's up with -d64. Can you point me to any documentation on what effect it has, and in particular on what the boundary -Xmx size is that requires -d64?

Jonathan
Re: RAM increase
Everything over ~3.7GB RAM (2^32 bytes, use your calculator) needs 64-bit addressing.

Dennis Gearon

Signature Warning
It is always a good idea to learn from your own mistakes. It is usually a better idea to learn from others' mistakes, so you do not have to make them yourself. from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036'
EARTH has a Right To Life, otherwise we all die.

--- On Thu, 10/21/10, Jonathan Rochkind rochk...@jhu.edu wrote:

From: Jonathan Rochkind rochk...@jhu.edu
Subject: Re: RAM increase
To: solr-user@lucene.apache.org solr-user@lucene.apache.org
Date: Thursday, October 21, 2010, 9:56 AM

Jean-Sebastien Vachon wrote:
You will also need to switch to a 64-bit JVM. You might have to add the `-d64` flag as well as the `-Xms` and `-Xmx`.

I've actually had no luck googling what's up with -d64. Can you point me to any documentation on what effect it has, and in particular on what the boundary -Xmx size is that requires -d64?

Jonathan
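As a quick sanity check of that arithmetic (a sketch, not from the thread): 2^32 bytes is exactly 4 GiB; the ~3.7GB figure is lower because the OS and JVM reserve part of a 32-bit process's address space.

```java
// A 32-bit pointer can address 2^32 bytes; convert that to GiB.
public class HeapLimit {
    static double maxAddressableGiB() {
        return Math.pow(2, 32) / (1024.0 * 1024.0 * 1024.0);
    }

    public static void main(String[] args) {
        // Prints 4.0 -- the theoretical ceiling; the usable -Xmx on a
        // 32-bit JVM is lower and varies by operating system.
        System.out.println(maxAddressableGiB());
    }
}
```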
DistributedSearchDesign and multiple requests
I'm using Solr 1.4. My observations and this page http://wiki.apache.org/solr/DistributedSearchDesign#line-254 indicate that the general strategy for Distributed Search is something like:

1. Query the shards with the user's query and fl=unique_field,score
2. Re-query (maybe a subset of) the shards for certain documents by unique_field, with the field list the user requested.
3. Maybe re-query the shards again to flesh out faceting info.

I'm encountering a significant performance penalty using Distributed Search due to these additional queries, and it seems like there are some obvious optimizations that could avoid them in certain cases. For example, a way to say "I claim the fields I'm requesting are small enough that querying again for stored fields is worse than just getting the stored fields in the first request" (assert_tiny_data=true&fl=tiny_stored_field,unique_field). Or: if the field list of the original query is contained in the first round of shard requests, don't bother querying again for more fields (fl=unique_field,score).

Has anyone else looked into this? I'd be interested to learn if there are issues that make these kinds of shortcuts difficult before I dig in.

Thanks,
-Jeff Wartes
RE: RAM increase
Memory limits info:
http://www.oracle.com/technetwork/java/hotspotfaq-138619.html#gc_heap_32bit

-d64 usage info:
http://stackoverflow.com/questions/1443677/what-impact-if-any-does-the-d64-swtich-have-on-sun-jvm-resident-memory-usage

Steve

-Original Message-
From: Dennis Gearon [mailto:gear...@sbcglobal.net]
Sent: Thursday, October 21, 2010 1:08 PM
To: solr-user@lucene.apache.org
Subject: Re: RAM increase

Everything over ~3.7GB RAM (2^32 bytes, use your calculator) needs 64-bit addressing.

Dennis Gearon
how well does multicore scale?
I'm exploring the possibility of using cores as a solution to bookmark folders in my solr application. This would mean I'll need tens of thousands of cores... does this seem reasonable? I have plenty of CPUs available for scaling, but I wonder about the memory overhead of adding cores (aside from needing to fit the new index in memory). Thoughts? -mike
[solrmarc-tech] JVM -XX:+UseCompressedOops
Is anyone using the newish JVM flag -XX:+UseCompressedOops with Solr? Do you have reason to believe it's helpful? Is there any way it can be harmful? I am hoping it reduces my memory consumption somewhat. An old thread with someone asking the same question, but with no answers: http://osdir.com/ml/solr-user.lucene.apache.org/2009-07/msg00663.html
multiple cores, solr.xml and replication
Hi there, I noticed that the java-based replication does not make replication of multiple cores automatic. For example, if I have a master with 7 cores, any slave I set up has to explicitly know about each of the 7 cores to be able to replicate them. This information is stored in solr.xml, and since this file is out of the conf/ directory, it's impossible to make the java-based replication copy this file over to each slave. Is this by design? For those of you doing multicore replication, how do you handle it?

Is overwriting solr.xml when persist=true is used thread-safe? What happens if I create 2 different cores at the same time? I ask because I have 7 cores total and I always end up with only 2 or 3 cores in my solr.xml after doing a bulk delta-import across cores.

didier
Re: how well does multicore scale?
No, it does not seem reasonable. Why do you think you need a separate core for every user?

mike anderson wrote:

I'm exploring the possibility of using cores as a solution to bookmark folders in my solr application. This would mean I'll need tens of thousands of cores... does this seem reasonable? I have plenty of CPUs available for scaling, but I wonder about the memory overhead of adding cores (aside from needing to fit the new index in memory). Thoughts?

-mike
Re: multiple cores, solr.xml and replication
On 10/21/2010 1:42 PM, didier deshommes wrote:

I noticed that the java-based replication does not make replication of multiple cores automatic. For example, if I have a master with 7 cores, any slave I set up has to explicitly know about each of the 7 cores to be able to replicate them. This information is stored in solr.xml, and since this file is out of the conf/ directory, it's impossible to make the java-based replication copy this file over to each slave. Is this by design? For those of you doing multicore replication, how do you handle it?

My slave replication handler looks like this, used for all cores. The solr.core.name parameter is dynamically replaced with the name of the current core:

<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <str name="masterUrl">http://HOST:8983/solr/${solr.core.name}/replication</str>
    <str name="pollInterval">00:00:15</str>
  </lst>
</requestHandler>

Shawn
Re: multiple cores, solr.xml and replication
On Thu, Oct 21, 2010 at 3:00 PM, Shawn Heisey s...@elyograg.org wrote:

On 10/21/2010 1:42 PM, didier deshommes wrote:
I noticed that the java-based replication does not make replication of multiple cores automatic. For example, if I have a master with 7 cores, any slave I set up has to explicitly know about each of the 7 cores to be able to replicate them. This information is stored in solr.xml, and since this file is out of the conf/ directory, it's impossible to make the java-based replication copy this file over to each slave. Is this by design? For those of you doing multicore replication, how do you handle it?

My slave replication handler looks like this, used for all cores. The solr.core.name parameter is dynamically replaced with the name of the current core:

I use this configuration too, but doesn't this assume that solr.xml is the same on master and slave? What happens when the master creates a new core?

didier

<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <str name="masterUrl">http://HOST:8983/solr/${solr.core.name}/replication</str>
    <str name="pollInterval">00:00:15</str>
  </lst>
</requestHandler>

Shawn
OutOfMemory and auto-commit
If I do _not_ have any auto-commit enabled, and add 500k documents and commit at the end: no problem. If I instead set auto-commit maxDocs to 100,000 (a pretty large number) and try to add 500k docs, with autocommits theoretically happening every 100k... I run into an OutOfMemory error. Can anyone think of any reasons that would cause this, and how to resolve it?

All I can think of is that in the first case, my newSearcher and firstSearcher warming queries don't run until the 'document add' is completely done. In the second case, there are newSearcher and firstSearcher warming queries happening at the same time another process is continuing to stream 'add's to Solr. Although at a maxDocs of 100,000 I shouldn't (I think) get _overlapping_ warming queries; the warming queries should be done before the next commit. I think. But nonetheless, just the fact that warming queries are happening at the same time 'add's are continuing to stream, could that be enough to somehow increase memory usage enough to run into OOM?
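For reference, the auto-commit setting being described lives in solrconfig.xml; a sketch of the relevant fragment (the value mirrors the "every 100k" figure in the message above):

```xml
<!-- Commit automatically after every 100,000 added documents. -->
<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <maxDocs>100000</maxDocs>
  </autoCommit>
</updateHandler>
```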
Re: A bug in ComplexPhraseQuery ?
--- On Thu, 10/21/10, jmr jmpala...@free.fr wrote:

From: jmr jmpala...@free.fr
Subject: A bug in ComplexPhraseQuery ?
To: solr-user@lucene.apache.org
Date: Thursday, October 21, 2010, 12:53 PM

Hi, We have installed ComplexPhraseQuery and since then we can see strange behaviour in proximity search. We have the 2 following queries:

(text:("protein digest"~50))
(text:("digest protein"~50))

Without ComplexPhraseQuery, both queries return 6 matching documents. With ComplexPhraseQuery, query 1 returns 4 documents and query 2 returns 5 documents! It seems that proximity search is broken. Is this a known problem?

ComplexPhraseQuery is an ordered phrase query, whereas Lucene's default PhraseQuery is unordered. With ComplexPhrase, the order of terms is important.
Re: Multiple Similarity
Is it possible to define different Similarity classes for different fields?

No. See http://search-lucene.com/m/g9cVf23EQO11/

We have a use case where we are interested in avoiding term frequency (tf) when our fields are multiValued.

Maybe omitTermFreqAndPositions="true"?
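In Solr 1.4 that switch is a per-field (or per-fieldType) attribute in schema.xml; a sketch with a hypothetical field name:

```xml
<!-- Hypothetical multiValued field that ignores term frequency and
     positions, so repeated values stop inflating the score. -->
<field name="tags" type="text" indexed="true" stored="true"
       multiValued="true" omitTermFreqAndPositions="true"/>
```

Note that omitting positions also disables phrase queries against the field.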
Re: multiple cores, solr.xml and replication
On 10/21/2010 2:14 PM, didier deshommes wrote: I use this configuration too but doesn't this assume that solr.xml is the same in master and slave? what happens when master creates a new core? That's a very good question, one that I can't answer. I don't dynamically create new cores. If you create the same core on the slave and its configuration includes that replication config, my expectation (until proven otherwise) would be that it should work.
Re: different results depending on result format
quick follow-up: I also notice that the query from solrj gets version=1, whereas the admin webapp puts version=2.2 on the query string, although this param doesn't seem to change the xml results at all. Does this indicate an older version of solrj perhaps?

-Mike

On 10/21/2010 04:47 PM, Mike Sokolov wrote:

I'm experiencing something really weird: I get different results depending on whether I specify wt=javabin, and retrieve using SolrJ, or wt=xml. I spent quite a while staring at query params to make sure everything else is the same, and they do seem to be. At first I thought the problem related to the javabin format change that has been talked about recently, but I am using solr 1.4.0 and solrj 1.4.0. Notice in the two entries that the wt param is different and the hits result count is different.

Oct 21, 2010 4:22:19 PM org.apache.solr.core.SolrCore execute
INFO: [bopp.ba] webapp=/solr path=/select/ params={wt=xml&rows=20&start=0&facet=true&facet.field=ref_taxid_ms&q=*:*&fl=uri,meta_ss&version=1} hits=261 status=0 QTime=1

Oct 21, 2010 4:22:28 PM org.apache.solr.core.SolrCore execute
INFO: [bopp.ba] webapp=/solr path=/select params={wt=javabin&rows=20&start=0&facet=true&facet.field=ref_taxid_ms&q=*:*&fl=uri,meta_ss&version=1} hits=57 status=0 QTime=0

The xml format results seem to be the correct ones. So one thought I had is that I could somehow fall back to using xml format in solrj, but I tried SolrQuery.set("wt","xml") and that didn't have the desired effect (I get wt=javabin&wt=javabin in the log, i.e. the param is repeated, but still javabin). Am I crazy? Is this a known issue? Thanks for any suggestions.
Solr sorting problem
Hey guys, I have a list of people indexed in Solr. I am trying to sort by their first names but I keep getting results that are not alphabetically sorted (I see the names starting with W before the names starting with A). I have a feeling that the results are first being sorted by relevancy then sorted by first name. Is there a way I can get the results to be sorted alphabetically? Thanks, Moazzam
Re: Solr sorting problem
Need additional information. Sorting is easy in Solr: just pass the sort parameter. However, when it comes to text sorting, it depends on how you analyse and tokenize your fields: sorting does not work on fields with multiple tokens. http://wiki.apache.org/solr/FAQ#Why_Isn.27t_Sorting_Working_on_my_Text_Fields.3F

On Thu, Oct 21, 2010 at 7:24 PM, Moazzam Khan moazz...@gmail.com wrote:

Hey guys, I have a list of people indexed in Solr. I am trying to sort by their first names but I keep getting results that are not alphabetically sorted (I see the names starting with W before the names starting with A). I have a feeling that the results are first being sorted by relevancy then sorted by first name. Is there a way I can get the results to be sorted alphabetically?

Thanks, Moazzam
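The usual pattern is to keep the tokenized field for searching and sort on an untokenized copy; a schema.xml sketch with hypothetical field names:

```xml
<!-- Tokenized field for searching. -->
<field name="first_name" type="text" indexed="true" stored="true"/>
<!-- Single-token copy used only for sorting. -->
<field name="first_name_sort" type="string" indexed="true" stored="false"/>
<copyField source="first_name" dest="first_name_sort"/>
```

Queries would then pass sort=first_name_sort asc while still matching against first_name.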
Strange file name after installing solr
Hello folks, I'm a very new user to Solr. Please help.

What I have in hand: 1) apache-solr-1.4.1; 2) Geronimo.

After installing solr.war using the Geronimo administration GUI, I got a strange file, under opt/dev/ofwi-geronimo2.1.6/repository/default/solr/1287558884961/solr-1287558884961.war. Is this alright or is anything abnormal?

My Geronimo says that Solr is in running status, but on start I got an error: java.lang.RuntimeException: Can't find resource 'solrconfig.xml' in classpath or 'solr/conf/', cwd=/opt/dev...

Thanks indeed for your time.
With regards, Bac Hoang
Re: OutOfMemory and auto-commit
Yes. Indexing activity suspends until the commit finishes, then restarts. Having both queries and indexing on the same Solr instance will have this memory problem.

Lance

On Thu, Oct 21, 2010 at 1:16 PM, Jonathan Rochkind rochk...@jhu.edu wrote:

If I do _not_ have any auto-commit enabled, and add 500k documents and commit at the end: no problem. If I instead set auto-commit maxDocs to 100,000 (a pretty large number) and try to add 500k docs, with autocommits theoretically happening every 100k... I run into an OutOfMemory error. Can anyone think of any reasons that would cause this, and how to resolve it? All I can think of is that in the first case, my newSearcher and firstSearcher warming queries don't run until the 'document add' is completely done. In the second case, there are newSearcher and firstSearcher warming queries happening at the same time another process is continuing to stream 'add's to Solr. Although at a maxDocs of 100,000 I shouldn't (I think) get _overlapping_ warming queries; the warming queries should be done before the next commit. I think. But nonetheless, just the fact that warming queries are happening at the same time 'add's are continuing to stream, could that be enough to somehow increase memory usage enough to run into OOM?

--
Lance Norskog
goks...@gmail.com
Re: how can i use solrj binary format for indexing?
Hi Gora, I really appreciate it. Your reply was a great help to me. :) I hope everything is fine with you.

Regards, Jason

Gora Mohanty-3 wrote:

On Mon, Oct 18, 2010 at 8:22 PM, Jason, Kim hialo...@gmail.com wrote:

Sorry for the delay in replying. Was caught up in various things this week.

Thank you for the reply, Gora. But I still have several questions. Did you use separate indexes? If so, you indexed 0.7 million XML files per instance and merged them. Is that right?

Yes, that is correct. We sharded the data by user ID, so that each of the 25 cores held approximately 0.7 million out of the 3.5 million records. We could have used the sharded indices directly for search, but at least for now have decided to go with a single, merged index.

Please let me know how multiple instances and cores work in your case. [...]

* Multi-core Solr setup is quite easy, via configuration in solr.xml: http://wiki.apache.org/solr/CoreAdmin . The configuration, i.e., schema, solrconfig.xml, etc., needs to be replicated across the cores.
* Decide which XML files you will post to which core, and do the POST with curl, as usual. You might need to write a little script to do this.
* After indexing on the cores is done, make sure to do a commit on each.
* Merge the sharded indexes (if desired) as described here: http://wiki.apache.org/solr/MergingSolrIndexes . One thing to watch out for here is disk space. When merging with Lucene's IndexMergeTool, we found that a rough rule of thumb was that intermediate steps in the merge would require about twice as much space as the total size of the indexes to be merged. I.e., if one is merging 40GB of data in sharded indexes, one should have at least 120GB free.

Regards, Gora
Re: Using a custom repository to store solr index files
Thanks for your answer Upayavira. Appreciate it.

I want to do this because of a clustering requirement. When clustering takes place in the product I'm working on, the custom repository we use replicates accordingly and makes data available to all nodes. But if the data is on the file system, this does not happen. So, according to your answer, I'll get the source and take a look at the Directory interface.

On Thu, Oct 21, 2010 at 4:53 PM, Upayavira u...@odoko.co.uk wrote:

On Thu, 21 Oct 2010 14:42 +0530, Tharindu Mathew mcclou...@gmail.com wrote:
Hi everyone, I was looking at using the Embedded Solr server through SolrJ and I have a couple of concerns. I'd like to use a custom repository to store my index. Is there a way I can define this? Is there a data output interface I can implement for this purpose? Or can this be done in some way?

Why do you want to do this? Solr embeds a Lucene index, and Lucene has a Directory interface that can be implemented differently (something other than the default FSDirectory implementation).

Upayavira

--
Regards, Tharindu
Re: how well does multicore scale?
Hi Mike, I've also considered using separate cores in a multi-tenant application, i.e. a separate core for each tenant/domain. But the cores do not suit that purpose. If you check out the documentation, no real API support exists for this, so it can't be done dynamically through SolrJ. And all the use cases I found only had users configuring it statically and then using it. That was maybe 2 or 3 cores. Please correct me if I'm wrong, Solr folks.

So you're better off using a single index with a user id, and using a query filter with the user id when fetching data.

On Fri, Oct 22, 2010 at 1:12 AM, Jonathan Rochkind rochk...@jhu.edu wrote:

No, it does not seem reasonable. Why do you think you need a separate core for every user?

mike anderson wrote:
I'm exploring the possibility of using cores as a solution to bookmark folders in my solr application. This would mean I'll need tens of thousands of cores... does this seem reasonable? I have plenty of CPUs available for scaling, but I wonder about the memory overhead of adding cores (aside from needing to fit the new index in memory). Thoughts?
-mike

--
Regards, Tharindu
Re: A bug in ComplexPhraseQuery ?
iorixxx wrote:

ComplexPhraseQuery is an ordered phrase query, whereas Lucene's default PhraseQuery is unordered. With ComplexPhrase, the order of terms is important.

Thanks for your answer. With this request:

(text:("protein digest"~50)) || (text:("digest protein"~50))

I get my 6 documents. In my opinion, ordering terms in a proximity search does not make sense! So the workaround for us is to generate the opposite search every time a proximity operator is used. Not very elegant!

Anyway, thanks again for the answer,
J-Michel