Re: How can I index MS-Outlook files?
The Aperture component worked for me: http://www.aduna-software.com/technologies/aperture/overview.view

Jeryl Cook /^\ Pharaoh /^\ http://pharaohofkush.blogspot.com/ "Whether we bring our enemies to justice, or bring justice to our enemies, justice will be done." --George W. Bush, Address to a Joint Session of Congress and the American People, September 20, 2001

On Tue, Dec 23, 2008 at 7:42 PM, Norberto Meijome numard...@gmail.com wrote:

On Sun, 14 Dec 2008 19:22:00 -0800 (PST) Otis Gospodnetic otis_gospodne...@yahoo.com wrote: Perhaps an easier alternative is to index not the MS-Outlook files themselves, but email messages pulled from the IMAP or POP servers, if that's where the original emails live.

PST files ('Outlook files') are local to the end user, and quite possibly their contents are no longer available on the server. Another alternative could be to access, from Exchange's file system itself, the files that represent each object... I don't know whether this is still possible in Exchange 2007, or whether it is 'sanctioned' by MS... Possibly some kind of object interface with Exchange itself would be most desirable.

_ {Beto|Norberto|Numard} Meijome "FAST, CHEAP, SECURE: Pick Any TWO" I speak for myself, not my employer. Contents may be hot. Slippery when wet. Reading disclaimers makes you go blind. Writing them is worse. You have been Warned.
Re: [ANNOUNCE] Solr Logo Contest Results
Looks cool :) -- how about a talking mascot?

Jeryl Cook twoenc...@gmail.com

On Thu, Dec 18, 2008 at 1:38 PM, Mathijs Homminga mathijs.hommi...@knowlogy.nl wrote: Good choice! Mathijs Homminga

Chris Hostetter wrote: (replies to solr-user please) On behalf of the Solr Committers, I'm happy to announce that the Solr Logo Contest is officially concluded. (Woot!) And the Winner Is... https://issues.apache.org/jira/secure/attachment/12394264/apache_solr_a_red.jpg ...by Michiel. We ran into a few hiccups during the contest, making it take longer than intended, but the result was a thorough process in which everyone went above and beyond to ensure that the final choice best reflected the wishes of the community. You can expect to see the new logo appear on the site (and in the Solr app) in the next few weeks. Congrats Michiel! -Hoss

-- Knowlogy Helperpark 290 C 9723 ZA Groningen +31 (0)50 2103567 http://www.knowlogy.nl mathijs.hommi...@knowlogy.nl +31 (0)6 15312977
Re: Solr on Solaris
You're out of memory :). On a 32-bit JVM, each instance of an application server can typically only be allocated around 1-2 GB of heap; to take advantage of more of the machine's memory, you need to run multiple instances of the application server. Are you using RAMDirectory with Solr?

On Thu, Dec 4, 2008 at 10:40 PM, Kashyap, Raghu [EMAIL PROTECTED] wrote: We are running Solr on a Solaris box with 4 CPUs (8 cores) and 3GB RAM. When we try to index, sometimes the HTTP connection just hangs and the client which is posting documents to Solr doesn't get any response back. We have since added timeouts to our HTTP requests from the clients. I then get this error:

java.lang.OutOfMemoryError: requested 239848 bytes for Chunk::new. Out of swap space?
java.lang.OutOfMemoryError: unable to create new native thread
Exception in thread JmxRmiRegistryConnectionPoller java.lang.OutOfMemoryError: unable to create new native thread

We are running JDK 1.6_10 on the Solaris box. The weird thing is we are running the same application on a Linux box with JDK 1.6 and we haven't seen any problem like this. Any suggestions? -Raghu
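Worth noting: "unable to create new native thread" usually means the JVM ran out of native (non-heap) memory for thread stacks, not Java heap. A hedged sketch of startup flags to experiment with -- the values here are illustrative only, not a recommendation:

```shell
# Illustrative flags only -- tune for your own box and measure.
# A smaller max heap leaves more native memory free for thread stacks,
# and -Xss lowers the stack size each new thread reserves.
java -Xmx1024m -Xss256k -jar start.jar
```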
Re: Mock solr server
Are you trying to unit test something? I would simply make use of the embedded Solr component (EmbeddedSolrServer) in your unit tests.

On 11/27/08, Robert Young [EMAIL PROTECTED] wrote: Hi, Does anyone know of an easy-to-use mock Solr server? Thanks, Rob
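For reference, wiring up the embedded server in a test looks roughly like this. This is a sketch against the Solr 1.3-era SolrJ API; the solr.home path and core name are placeholders, not values from this thread:

```java
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.embedded.EmbeddedSolrServer;
import org.apache.solr.core.CoreContainer;

// Sketch: spin up an in-process Solr for unit tests, no HTTP involved.
// solr.solr.home must point at a directory containing solr.xml and conf/.
class EmbeddedSolrTestSupport {
    static SolrServer newEmbeddedServer() throws Exception {
        System.setProperty("solr.solr.home", "src/test/resources/solr"); // placeholder path
        CoreContainer.Initializer initializer = new CoreContainer.Initializer();
        CoreContainer container = initializer.initialize();
        return new EmbeddedSolrServer(container, "core1"); // placeholder core name
    }
}
```

Because it runs in the same JVM, the test can index and query without any network setup or mock HTTP layer.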
Re: EmbeddedSolrServer questions
I am using EmbeddedSolrServer and simply have a queue that documents are sent to, and a listener on that queue that writes them to the index. Or just keep it simple and put a synchronization block around the method in the write server that writes the document to the index.

Jeryl Cook

On Tue, Nov 18, 2008 at 9:36 AM, Thierry Templier [EMAIL PROTECTED] wrote: Hello, I have some questions regarding the use of the EmbeddedSolrServer in order to embed a Solr instance into a Java application. 1) Is an instance of the EmbeddedSolrServer class thread-safe when used by several concurrent threads? 2) Regarding transactions, can an instance of the EmbeddedSolrServer class be used to make two transactions at the same time by two different threads? Thanks for your help, Thierry
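The queue-plus-single-listener idea above can be sketched with plain java.util.concurrent, no Solr involved -- here a list stands in for the index, and the class and marker names are made up for illustration:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Sketch of the single-writer pattern: callers enqueue documents, and one
// listener thread drains the queue and writes to the index. The list below
// stands in for the EmbeddedSolrServer add/commit calls.
class SingleWriterQueue {
    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();
    private final List<String> index = new ArrayList<>(); // stand-in for the Solr index
    private final Thread listener;

    SingleWriterQueue() {
        listener = new Thread(() -> {
            try {
                while (true) {
                    String doc = queue.take();            // blocks until a document arrives
                    if (doc.equals("__POISON__")) break;  // shutdown marker (made-up convention)
                    synchronized (index) {                // only this thread ever writes
                        index.add(doc);                   // server.add(doc) in the real code
                    }
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        listener.start();
    }

    void submit(String doc) throws InterruptedException {
        queue.put(doc); // safe to call from many threads
    }

    List<String> shutdownAndGetIndex() throws InterruptedException {
        queue.put("__POISON__");
        listener.join();
        synchronized (index) { return new ArrayList<>(index); }
    }
}
```

Since only the listener thread touches the index, callers never contend on the writer itself, which sidesteps the thread-safety question entirely.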
Max Number of Facets
Is there a limit on the number of facets that I can create in Solr? (These would be dynamically generated facets.)

-- Jeryl Cook
Re: Max Number of Facets
I understand what you mean. I am building a system that will dynamically generate facets, which could possibly number in the thousands, but at most about 6 or 7 facets will be returned, using a facet ranking algorithm. So I get what you mean: if I request 1000 facets back in my query compared to just 6 or 7, I could take a performance hit.

On 10/30/08, Ryan McKinley [EMAIL PROTECTED] wrote: the only 'limit' is the effect on your query times... you could have 1000+ facets if you are ok with the response time. Sorry to give the "it depends" answer, but it totally depends on your data and your needs.

On Oct 30, 2008, at 7:28 AM, Jeryl Cook wrote: is there a limit on the number of facets that i can create in Solr? (dynamically generated facets.)
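The "return only the top 6 or 7" idea can be sketched without Solr at all: given per-facet counts, keep the N highest. This is a toy illustration with made-up names -- the real ranking algorithm is whatever the application defines, and in practice the counts would come from Solr's facet response:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Toy sketch of "return only the top-N facets": sort facet names by count,
// highest first, and keep the first N. Ties are broken by name so the
// result is stable.
class TopFacets {
    static List<String> topN(Map<String, Integer> counts, int n) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((a, b) -> {
            int c = Integer.compare(b.getValue(), a.getValue()); // highest count first
            return c != 0 ? c : a.getKey().compareTo(b.getKey()); // ties by name
        });
        List<String> top = new ArrayList<>();
        for (int i = 0; i < Math.min(n, entries.size()); i++) {
            top.add(entries.get(i).getKey());
        }
        return top;
    }
}
```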
Re: Max Number of Facets
Wow, 30k in under 3 seconds.

On 10/30/08, Stephen Weiss [EMAIL PROTECTED] wrote: I've actually seen cases on our site where it's possible to bring up over 30,000 facets for one query. And they actually come up quickly -- like, 3 seconds. It takes longer for the browser to render them. -- Steve

On Oct 30, 2008, at 6:04 PM, Ryan McKinley wrote: the only 'limit' is the effect on your query times... you could have 1000+ facets if you are ok with the response time. Sorry to give the "it depends" answer, but it totally depends on your data and your needs.

On Oct 30, 2008, at 7:28 AM, Jeryl Cook wrote: is there a limit on the number of facets that i can create in Solr? (dynamically generated facets.)
Using filter to search in SOLR 1.3 with solrj
I can execute what I want simply by using Lucene directly:

Hits hits = searcher.search(customScoreQuery, myQuery.getFilter());

However, I can't find the right class or method in the API to do this through Solr's searcher. I am using the SolrServer (embedded version) to execute the query:

QueryResponse queryResponse = solrServer.query(customScoreQuery); // will work, BUT I NEED to use the filter as well...

Thanks

-- Jeryl Cook
Re: Using filter to search in SOLR 1.3 with solrj
I don't have issues adding a filter query to a SolrQuery... I guess I'll look at the source code. I just need to pass a custom Filter object at runtime before I execute a search using the SolrServer. Currently, all I can do with Solr is this:

solrServer.query(customScoreQuery);

I need a method that would accept this:

searcher.search(customScoreQuery, myFilter);

like I am able to do using the Lucene searcher.

On Thu, Oct 2, 2008 at 1:43 PM, Ryan McKinley [EMAIL PROTECTED] wrote:

what about: SolrQuery query = ...; query.addFilterQuery( "type:xxx" );

On Oct 2, 2008, at 1:23 PM, Jeryl Cook wrote: i can execute what i want simply by using lucene directly: Hits hits = searcher.search(customScoreQuery, myQuery.getFilter()); however, i can't find the right class or method in the API to do this through Solr's searcher. I am using the SolrServer (embedded version) to execute the query: QueryResponse queryResponse = solrServer.query(customScoreQuery); // will work, BUT I NEED to use the filter as well... Thanks
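For the common case where the filter can be expressed as a query string rather than a Lucene Filter object, the fq support Ryan mentions is enough. A sketch against the 1.3-era SolrJ API -- the query text and field names are placeholders:

```java
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

// Sketch: express the filter as filter queries (fq) instead of a Lucene
// Filter object. Filter queries restrict the result set without affecting
// scores, and Solr caches them independently of the main query.
class FilterQueryExample {
    static QueryResponse searchWithFilters(SolrServer server) throws Exception {
        SolrQuery query = new SolrQuery("ipod");      // main query (placeholder)
        query.addFilterQuery("type:electronics");     // placeholder filter field
        query.addFilterQuery("inStock:true");         // multiple fq's AND together
        return server.query(query);
    }
}
```

An arbitrary Filter object genuinely can't be passed this way; that is the custom plugin territory discussed in the follow-up.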
Re: Using filter to search in SOLR 1.3 with solrj
I see... it would be nice to build the component within the code, programmatically, rather than as a component added to the configuration file. But I will read the docs on how to do this. Thanks.

On Thu, Oct 2, 2008 at 2:37 PM, Ryan McKinley [EMAIL PROTECTED] wrote:

On Oct 2, 2008, at 2:24 PM, Jeryl Cook wrote: I don't have issues adding a filter query to a SolrQuery... I just need to pass a custom Filter object at runtime before I execute a search using the SolrServer. Currently all I can do with Solr is: solrServer.query(customScoreQuery); I need a method that would accept: searcher.search(customScoreQuery, myFilter); like I am able to do using the Lucene searcher.

aaah -- that lands you in custom plugin territory... perhaps look at building a QueryComponent. ryan

-- Jeryl Cook
Re: What's the bottleneck?
I think you should just break up your index across boxes and do a federated search across them... since you mentioned you have a single machine.

Jeryl Cook

On Thu, Sep 11, 2008 at 3:58 PM, Jason Rennie [EMAIL PROTECTED] wrote:

On Thu, Sep 11, 2008 at 1:29 PM, [EMAIL PROTECTED] wrote:

"what is your index configuration?" Not sure what you mean. We're using 1.2, though we've tested with a recent nightly and didn't see a significant change in performance.

"What is your average size for the returned fields?" Returned fields are relatively small, ~200 characters total per document. We're requesting the top 10 or so docs.

"How much memory does your system have?" 8g. We give the JVM a 2g (max) heap. We have another Solr running on the same box, also with a 2g heap. The Linux kernel caches ~2.5g of disk.

"Do you have long fields that are returned in the queries?" No. The searched and returned fields are relatively short. One searched-over (but not returned) field can get up to a few hundred characters, but it's safe to assume they're all under 1k.

"Do you have highlighting activated in the request?" Nope.

"Are you using a multi-valued field for filtering?" No, it does not have the multiValued attribute turned on. The qf field is just an integer.

Any thoughts/comments are appreciated. Thanks, Jason
Re: Update schema.xml without restarting Solr?
Top often-requested features:

1. Make it an option to use RAMDirectory and hook in Terracotta (billions of items in an index, anyone? It would be possible using this.)
2. Make the schema.xml configurable at runtime. I'm not really sure of the best way to address this, because changing the schema would require re-indexing the documents.

Terracotta: http://www.terracotta.org/

On Tue, Mar 25, 2008 at 11:27 AM, [EMAIL PROTECTED] wrote: Hi, The wiki for Solr talks about the schema.xml, and it seems that changes in this file require a restart of Solr before they take effect. In the wiki it says:

How can I rebuild my index from scratch if I change my schema? The most efficient/complete way is to:
1. Stop your application server
2. Change your schema.xml file
3. Delete the index directory in your data directory
4. Start your application server (Solr will detect that there is no existing index and make a new one)
5. Re-index your data

If the permission scheme of your server does not allow you to manually delete the index directory, an alternate technique is:
1. Stop your application server
2. Change your schema.xml file
3. Start your application server
4. Use the match-all-docs query in a delete-by-query command: <delete><query>*:*</query></delete>
5. Send an <optimize/> command.
6. Re-index your data

Is this really the case? I find it quite strange that you need to restart Solr for a change in the schema.xml. The way we plan to use Solr together with a Content Management System is that the authors/editors can create new article/document types when needed, without any need to restart anything. The CMS itself has full support for this. But we need Solr to also support this. Is that possible? Like a simple <reloadSchemaXml/> command, maybe, that would trigger Solr to re-read its schema.xml file. If this is not possible to do, is it really necessary to restart the entire application server for a change in schema.xml to take effect? Or only the Solr webapp?

Regards /Jimi

-- Jeryl Cook /^\ Pharaoh /^\ http://pharaohofkush.blogspot.com/ ..Act your age, and not your shoe size.. -Prince(1986)
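The delete-and-reindex steps in the quoted wiki text map to a couple of update requests. A sketch using curl against a local Solr -- the URL and port are the defaults from the example distribution, so adjust for your setup:

```shell
# Delete everything with the match-all query, then commit so the
# deletes become visible, then optimize away the old segments.
curl http://localhost:8983/solr/update -H "Content-Type: text/xml" \
     --data-binary '<delete><query>*:*</query></delete>'
curl http://localhost:8983/solr/update -H "Content-Type: text/xml" \
     --data-binary '<commit/>'
curl http://localhost:8983/solr/update -H "Content-Type: text/xml" \
     --data-binary '<optimize/>'
```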
Re: Update schema.xml without restarting Solr?
I wouldn't call the Terracotta approach magic (smile)... it's being used quite a bit in many scalable, high-performing projects. I personally used Terracotta with Lucene, and it worked, but I did not try to cluster it with multiple Terracotta workers across nodes and a Terracotta master -- just a single box with two Tomcat instances. However, talk is cheap: if I have the time over the next few weeks I'll make a benchmark based on Terracotta and Lucene, with maybe 3 nodes and 1 million documents. Maybe some others can do the same :).

FYI: http://www.terracotta.org/confluence/display/tcforge/Proposal+-+Terracotta+for+Lucene

Jeryl Cook

On Wed, Mar 26, 2008 at 5:16 PM, Yonik Seeley [EMAIL PROTECTED] wrote:

On Wed, Mar 26, 2008 at 4:41 PM, Ryan McKinley [EMAIL PROTECTED] wrote: just intuition - haven't tried it, so i'd love to be proved wrong. Instrumenting objects and magically passing them around seems like it would be slower than the tuned approach used in SOLR-303.

Yep, that's my sense too. No magic solutions when it comes to scalability. -Yonik
Re: RAM Based Index for Solr
There is currently no way to use RAMDirectory instead of FSDirectory in Solr, but there is a feature request to implement this. I personally think this would be great, because we could then use Terracotta to handle the clustering.

Jeryl Cook

On Thu, Mar 20, 2008 at 1:07 AM, Norberto Meijome [EMAIL PROTECTED] wrote:

On Wed, 19 Mar 2008 17:04:34 -0700 (PDT), swarag [EMAIL PROTECTED] wrote: In Lucene there is a RAM-based index, org.apache.lucene.store.RAMDirectory. Is there a way to set up my index in Solr to use a RAMDirectory?

Create a mountpoint on a ramdrive (tmpfs in Linux, I think) and put your index in there...? Or does Lucene do anything other than that? B

_ {Beto|Norberto|Numard} Meijome "Unix is very simple, but it takes a genius to understand the simplicity." Dennis Ritchie
DynamicField and FacetFields..
Question: I need to add data to Solr dynamically, so I do not have a predefined list of field names. So I use the dynamicField option in the schema and match the appropriate datatype. In my schema.xml:

<field name="id" type="string" indexed="true" stored="true" required="true" />
<dynamicField name="*_s" type="string" indexed="true" stored="true"/>

Then, programmatically, my code:

document.addField( dynamicFieldName + "_s", dynamicFieldValue, 10 );
facetFieldNames.put( dynamicFieldName + "_s", null ); // TODO: use copyField
server.add( document, true );
server.commit();

When I attempt to graph results, I want to display:

SolrQuery query = new SolrQuery();
query.setQuery( "*:*" );
query.setFacetLimit(10); // TODO
Iterator facetsIt = facetFieldNames.entrySet().iterator();
while (facetsIt.hasNext()) {
    Entry<String,String> entry = (Entry) facetsIt.next();
    String facetName = (String) entry.getKey();
    query.addFacetField(facetName);
}
QueryResponse rsp;
rsp = server.query( query );
List<FacetField> facetFieldList = rsp.getFacetFields();
assertNotNull(facetFieldList);

My facetFieldList is null. Of course, if I addFacetField with "id" it works, because I define it in the schema.xml. Is this just something that is not implemented, or am I missing something? Thanks.

Jeryl Cook

Date: Fri, 30 Nov 2007 21:23:59 -0500 From: [EMAIL PROTECTED] To: solr-user@lucene.apache.org Subject: Re: Solr Highlighting, word index

It's good you already have the data, because if you somehow got it from some sort of calculations I'd have to tell my product manager that the feature he wanted -- that I told him couldn't be done with our data -- was possible after all <G>...

About page breaks: another approach to paging is to index a special page token with an increment of 0 from the last word of the page. Say you have the following: "last ctrl-l first". Then index "last", then "$$$" at an increment of 0, then "first". You can then quite quickly calculate the pages by using TermDocs/TermEnum on your special token and counting. Which approach you use depends upon whether you want span and/or phrase queries to match across page boundaries. If you use an increment as Mike suggests, matching "last first"~3 won't work. It just depends upon how you want to match across the page break.

Best, Erick

On Nov 30, 2007 4:37 PM, Mike Klaas [EMAIL PROTECTED] wrote:

On 30-Nov-07, at 1:02 PM, Owens, Martin wrote: Hello everyone, We're working to replace the old Linux version of dtSearch with Lucene/Solr, using HTTP requests for our Perl side and Java for the indexing. The functionality that is causing the most problems is the highlighting, since we're not storing the text in Solr (only indexing) and we need to highlight an image file (OCR). So what we really need is to request from Solr the word indexes of the matches; we then tie this up to the OCR image and create HTML boxes to do the highlighting.

This isn't possible with Solr out-of-the-box. Also, the usual methods for highlighting won't work because Solr typically re-analyzes the raw text to find the appropriate highlighting points. However, it shouldn't be too hard to come up with a custom solution. You can tell Lucene to store token offsets using TermVectors (configurable via schema.xml). Then you can customize the request handler to return the token offsets (and/or positions) by retrieving the TVs.

The text is also multi-page; each page is separated by Ctrl-L page breaks. Should we handle the paging ourselves, or can Solr tell us which page the match happened on too?

Again, not automatically. However, if you wrote an analyzer that bumped up the position increment of tokens every time a new page was found (to, say, the next multiple of 1000), then you could infer the matching page from the token position.

cheers, -Mike
RE: DynamicField and FacetFields..
Fixed; I had a typo... you may want to delete my post (I want to :P).

Jeryl Cook

From: [EMAIL PROTECTED] To: solr-user@lucene.apache.org Subject: DynamicField and FacetFields.. Date: Sat, 1 Dec 2007 14:21:12 -0500
unsubscribe
Jeryl Cook

From: [EMAIL PROTECTED] Subject: Re: start.jar -Djetty.port= not working Date: Wed, 7 Nov 2007 10:13:22 -0500 To: solr-user@lucene.apache.org

On Nov 7, 2007, at 10:07 AM, Mike Davies wrote: I'm using 1.2, downloaded from http://apache.rediris.es/lucene/solr/ Where can I get the trunk version?

svn, or http://people.apache.org/builds/lucene/solr/nightly/
RE: Any tips for indexing large amounts of data?
Usability consideration: not really answering your question, but I must comment. Faceted navigation is very effective when searching over up to about 100k items, but becomes much less effective past 100k. You may want to consider breaking up the 500k documents into categories (a typical breadcrumb) of about 100k each for faceted browsing.

Jeryl Cook

To: solr-user@lucene.apache.org From: [EMAIL PROTECTED] Subject: Any tips for indexing large amounts of data? Date: Wed, 31 Oct 2007 10:30:50 -0400

Hi, I am creating an index of approx 500K documents. I wrote an indexing program using embedded Solr (http://wiki.apache.org/solr/EmbeddedSolr) and am seeing probably a 10-fold increase in indexing speed. My problem, though, is that if I try to reindex, say, 20K docs at a time, it slows down considerably. I currently batch my updates in lots of 100, and between batches I close and reopen the connection to Solr like so:

private void openConnection(String environment) throws ParserConfigurationException, IOException, SAXException {
    System.setProperty("solr.solr.home", SOLR_HOME);
    solrConfig = new SolrConfig("solrconfig.xml");
    solrCore = new SolrCore(SOLR_HOME + "data/" + environment, solrConfig, new IndexSchema(solrConfig, "schema.xml"));
    logger.debug("Opened solr connection");
}

private void closeConnection() {
    solrCore.close();
    solrCore = null;
    logger.debug("Closed solr connection");
}

Does anyone have any pointers or see anything obvious I'm doing wrong? Thanks. PS Sorry if this is posted twice.
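As an aside, the batch bookkeeping itself is easy to factor out. A toy sketch of splitting a document list into fixed-size batches -- plain Java, no Solr involved, and the class name is made up; in real code each batch would be followed by the add/commit (or connection reopen) described above:

```java
import java.util.ArrayList;
import java.util.List;

// Toy helper: split a list of documents into fixed-size batches so each
// batch can be sent to Solr before the next one starts.
class Batcher {
    static <T> List<List<T>> batches(List<T> docs, int batchSize) {
        if (batchSize <= 0) throw new IllegalArgumentException("batchSize must be > 0");
        List<List<T>> out = new ArrayList<>();
        for (int i = 0; i < docs.size(); i += batchSize) {
            // copy the subList view so each batch is independent of the source list
            out.add(new ArrayList<>(docs.subList(i, Math.min(i + batchSize, docs.size()))));
        }
        return out;
    }
}
```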
RE: RAMDirectory
Not yet implemented; hopefully soon: http://jira.terracotta.org/jira/browse/CDV-399

Jeryl Cook

Date: Sat, 22 Sep 2007 15:33:58 -0400 From: [EMAIL PROTECTED] To: solr-user@lucene.apache.org Subject: RAMDirectory

Hi, does anyone know how to use a RAM disk for the index? Thanks, Jae Jo
RE: Solr and terracotta
I had no problems with Terracotta; I have a good handle on the product. Maybe you all at Terracotta could lead the implementation of a patch to Solr that lets it use the RAMDirectory (a setter) so Terracotta can hook into the RAMDirectory. For those of you who are not familiar with Terracotta: it clusters the JVM, and uses a master server to keep all the child servers in sync. This approach would allow Solr to be clustered very easily (indexing on 1 node would index all nodes), not to mention the performance boost for indexing, and perhaps searching. Also, it uses virtual memory, so the number of documents stored in the RAMDirectory is limited only by available space.

Jeryl Cook /^\ Pharaoh /^\ http://pharaohofkush.blogspot.com/

Date: Wed, 22 Aug 2007 14:46:19 -0700 From: [EMAIL PROTECTED] To: solr-user@lucene.apache.org Subject: RE: Solr and terracotta

Jeryl, I remember you asking about how to hook in the RAMDirectory a while back. It seemed like there was maybe some support within Solr that you needed. I assume you're suggesting adding an issue in the Solr JIRA, right? Is there something that the Terracotta team can do to help? Cheers, Orion

Jeryl Cook wrote: tried it, didn't work that well... so I ended up making my own little faceted search engine directly using RAMDirectory and clustering it via Terracotta... not as good as Solr (smile), but it worked. I actually posted some questions a while back in trying to get it to work. So Terracotta can hook the RAMDirectory; maybe it would be good to submit this in JIRA for Terracotta support!

Date: Wed, 22 Aug 2007 16:18:24 -0300 From: [EMAIL PROTECTED] To: solr-user@lucene.apache.org Subject: Solr and terracotta

Recently I ran into this topic. I googled it a little and didn't find much information. It would be great to have Solr working with RAMDirectory and Terracotta. We could stop using crons for rsync, right? Has anyone tried that out?

-- View this message in context: http://www.nabble.com/Solr-and-terracotta-tf4313531.html#a12283537 Sent from the Solr - User mailing list archive at Nabble.com.
RE: Solr and terracotta
Tried it; it didn't work that well... so I ended up making my own little faceted search engine directly using RAMDirectory and clustering it via Terracotta. Not as good as Solr (smile), but it worked. I actually posted some questions a while back in trying to get it to work. So Terracotta can hook the RAMDirectory; maybe it would be good to submit this in JIRA for Terracotta support!

Jeryl Cook /^\ Pharaoh /^\ http://pharaohofkush.blogspot.com/

Date: Wed, 22 Aug 2007 16:18:24 -0300 From: [EMAIL PROTECTED] To: solr-user@lucene.apache.org Subject: Solr and terracotta

Recently I ran into this topic. I googled it a little and didn't find much information. It would be great to have Solr working with RAMDirectory and Terracotta. We could stop using crons for rsync, right? Has anyone tried that out?
RE: RAMDirectory instead of FSDirectory for SOLR
That's the thing: Terracotta persists everything it holds in memory to disk when it overflows (you can set how much you want to keep in memory), or when the server goes offline. When the server comes back, the master Terracotta simply loads the data back into the memory of the once-offline worker. This is identical to the approach Solr already uses to handle scalability, and it allows unlimited storage of items in memory; you just need to cluster the RAMDirectory according to the sample given by Terracotta. However, I read some of the posts here, and some say: I wonder how the performance will be, etc. I was trying to get it working so I could load test the hell out of it and see how it acts with large amounts of data, and how it compares with Solr using the typical FSDirectory approach. I plan to post my findings. Jeryl Cook /^\ Pharaoh /^\ http://pharaohofkush.blogspot.com/ ..Act your age, and not your shoe size.. -Prince(1986) Date: Thu, 31 May 2007 13:51:53 -0700 From: [EMAIL PROTECTED] To: solr-user@lucene.apache.org Subject: RE: RAMDirectory instead of FSDirectory for SOLR : board, looks like i can achieve this with the embedded version of SOLR : uses the lucene RAMDirectory to store the index..Jeryl Cook yeah ... adding a solrconfig.xml option for using a RAMDirectory would be possible ... but almost meaningless for most people (the directory would go away when the server shuts down) ... even for use cases like what you describe (hooking in Terracotta) it wouldn't be enough in itself, because there would be no hook to give Terracotta access to it. -Hoss
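The overflow behavior described above, a bounded in-memory working set that spills the least-recently-used entries to disk and faults them back in on read, can be illustrated with a small self-contained sketch. This is a toy model of the idea, not Terracotta's actual implementation; all class names are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model of "persist to disk on overflow": at most maxInMemory entries
// stay hot; evicted entries are written to spillDir and read back on demand.
class OverflowStore {
    private final int maxInMemory;
    private final Path spillDir;
    private final LinkedHashMap<String, byte[]> hot;

    OverflowStore(int maxInMemory, Path spillDir) throws IOException {
        this.maxInMemory = maxInMemory;
        this.spillDir = Files.createDirectories(spillDir);
        // Access-ordered map: the least-recently-used entry is spilled first.
        this.hot = new LinkedHashMap<String, byte[]>(16, 0.75f, true) {
            protected boolean removeEldestEntry(Map.Entry<String, byte[]> eldest) {
                if (size() > OverflowStore.this.maxInMemory) {
                    spill(eldest.getKey(), eldest.getValue());
                    return true; // drop it from memory; the disk copy survives
                }
                return false;
            }
        };
    }

    private void spill(String key, byte[] value) {
        try {
            Files.write(spillDir.resolve(key), value);
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    void put(String key, byte[] value) { hot.put(key, value); }

    byte[] get(String key) throws IOException {
        byte[] v = hot.get(key);
        if (v != null) return v;
        Path p = spillDir.resolve(key);
        // Fault the entry back in from disk, mirroring a restarted worker
        // being repopulated from persisted state.
        return Files.exists(p) ? Files.readAllBytes(p) : null;
    }
}

public class OverflowSketch {
    public static void main(String[] args) throws IOException {
        OverflowStore store = new OverflowStore(2, Files.createTempDirectory("spill"));
        store.put("a", new byte[] {1});
        store.put("b", new byte[] {2});
        store.put("c", new byte[] {3}); // evicts "a" to disk
        System.out.println(store.get("a")[0]); // prints 1: faulted back from disk
    }
}
```

This is why "the number of documents is limited only by disk space" in the earlier message: memory holds a configurable working set, while the full data set lives on disk.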
RE: RAMDirectory instead of FSDirectory for SOLR
I have Terracotta working with Lucene, and it works fine with the RAMDirectory. I am trying to get it to work with Solr (hooking the RAMDirectory); when I do, I'll post the findings, problems, etc. Thanks for the feedback, everyone. Jeryl Cook /^\ Pharaoh /^\ http://pharaohofkush.blogspot.com/ ..Act your age, and not your shoe size.. -Prince(1986) Date: Thu, 31 May 2007 18:24:26 -0700 From: [EMAIL PROTECTED] To: solr-user@lucene.apache.org Subject: RE: RAMDirectory instead of FSDirectory for SOLR Jeryl, If you need any help getting Terracotta to work under Lucene or if you have any questions about performance tuning and/or load testing, you can also use the Terracotta community resources (mailing lists, forums, IRC, whatnot): http://www.terracotta.org/confluence/display/orgsite/Community. We'd be more than happy to help you get this stuff working. Cheers, Orion Jeryl Cook wrote: That's the thing: Terracotta persists everything it holds in memory to disk when it overflows (you can set how much you want to keep in memory), or when the server goes offline. When the server comes back, the master Terracotta simply loads the data back into the memory of the once-offline worker. This is identical to the approach Solr already uses to handle scalability, and it allows unlimited storage of items in memory; you just need to cluster the RAMDirectory according to the sample given by Terracotta. However, I read some of the posts here, and some say: I wonder how the performance will be, etc. I was trying to get it working so I could load test the hell out of it and see how it acts with large amounts of data, and how it compares with Solr using the typical FSDirectory approach. I plan to post my findings. Jeryl Cook /^\ Pharaoh /^\ http://pharaohofkush.blogspot.com/ ..Act your age, and not your shoe size.. -Prince(1986) Date: Thu, 31 May 2007 13:51:53 -0700 From: [EMAIL PROTECTED] To: solr-user@lucene.apache.org Subject: RE: RAMDirectory instead of FSDirectory for SOLR : board, looks like i can achieve this with the embedded version of SOLR : uses the lucene RAMDirectory to store the index..Jeryl Cook yeah ... adding a solrconfig.xml option for using a RAMDirectory would be possible ... but almost meaningless for most people (the directory would go away when the server shuts down) ... even for use cases like what you describe (hooking in Terracotta) it wouldn't be enough in itself, because there would be no hook to give Terracotta access to it. -Hoss -- View this message in context: http://www.nabble.com/RAMDirecotory-instead-of-FSDirectory-for-SOLR-tf3843377.html#a10905062 Sent from the Solr - User mailing list archive at Nabble.com.
RAMDirectory instead of FSDirectory for SOLR
Is it possible to simply change configuration to use RAMDirectory instead of FSDirectory? If not, it would be great to have this as a possible option in the configuration file. The Master/Worker pattern used for handling scalability works (outlined in the Solr manual/wiki); it's a proven pattern. However, Terracotta (http://terracottatech.com/) is able to cluster the RAMDirectory (items that cannot fit in memory are written to disk). I would love to take advantage of this approach. Can you tell me if it is possible to switch it out? Thanks. Jeryl Cook /^\ Pharaoh /^\ http://pharaohofkush.blogspot.com/ ..Act your age, and not your shoe size.. -Prince(1986)
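For what it's worth, Solr releases later than this thread did add a configuration switch of this kind: the directoryFactory element in solrconfig.xml selects the Directory implementation, and to the best of my knowledge an in-memory factory is among the shipped options. A minimal sketch, assuming a Solr version that supports it (not available when this was asked):

```xml
<!-- solrconfig.xml: select an in-memory index; as Hoss notes above,
     the index is lost when the server shuts down -->
<directoryFactory name="DirectoryFactory" class="solr.RAMDirectoryFactory"/>
```

Even with such an option, the Terracotta use case would still need a separate hook to reach the Directory instance itself, which is the gap Hoss describes.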