Re: Getting page number of result with tika
You can't assume that Fix Version/s 4.3 means anybody is actively working on it, and the age of the patches suggests nobody is. The Fix Version/s field gets updated when releases are made; otherwise you'd have open JIRAs for, say, Solr 1.4.1. As near as I can tell, that JIRA is dead; don't count on it unless someone picks it up again.

Best,
Erick

On Thu, Apr 11, 2013 at 11:55 AM, Gian Maria Ricci alkamp...@nablasoft.com wrote:

As far as I know, SOLR-380 https://issues.apache.org/jira/browse/SOLR-380 deals with the problem of knowing the page number when indexing with Tika. The issue contains a patch, but it is really old, and I'm curious what the status of this issue is (I see Fix Version/s 4.3, so it seems it will be implemented in the next version). Does anyone have a good workaround/patch/solution for searching Tika-indexed documents and getting the list of pages where the match was found? Thanks in advance.

Gian Maria.
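Until something like SOLR-380 lands, one common workaround is to do the page splitting outside Solr: extract the text per page and index one Solr document per page, carrying a page-number field, then group results back by file. A minimal plain-Java sketch of the document-building step (the field names and the per-page extraction are my assumptions, not anything from the JIRA; a real indexer would feed these maps to SolrJ):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// One Solr document per page: a search hit then tells you the page directly.
public class PerPageDocs {
    public static List<Map<String, Object>> toPageDocs(String fileId, List<String> pages) {
        List<Map<String, Object>> docs = new ArrayList<>();
        for (int i = 0; i < pages.size(); i++) {
            Map<String, Object> doc = new HashMap<>();
            doc.put("id", fileId + "_p" + (i + 1)); // unique key per page
            doc.put("file_id", fileId);             // lets you group pages back to the file
            doc.put("page", i + 1);
            doc.put("text", pages.get(i));
            docs.add(doc);
        }
        return docs;
    }
}
```

At query time, grouping on the hypothetical file_id field collapses the per-page hits back into one result per original document.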
Re: Approximately needed RAM for 5000 query/second at a Solr machine?
bq: disk space is three times

True, I keep forgetting about compound since I never use it...

On Wed, Apr 10, 2013 at 11:05 AM, Walter Underwood wun...@wunderwood.org wrote:

Correct, except the worst-case maximum for disk space is three times.
--wunder

On Apr 10, 2013, at 6:04 AM, Erick Erickson wrote:

You're mixing up disk and RAM requirements when you talk about having twice the disk size. Solr does _NOT_ require twice the index size in RAM to optimize; it requires twice the size on _DISK_. In terms of RAM requirements, you need to create an index, run realistic queries at the installation, and measure.

Best,
Erick

On Tue, Apr 9, 2013 at 10:32 PM, bigjust bigj...@lambdaphil.es wrote:

On 4/9/2013 7:03 PM, Furkan KAMACI wrote:

These are really good metrics for me. You say that RAM size should be at least the index size, and that it is better to have RAM twice the index size (because of the worst-case scenario). On the other hand, let's assume I have more RAM than twice the index size on the machine. Can Solr use that extra RAM, or is twice the index size an approximate maximum limit?

What we have been discussing is the OS cache, which is memory that is not used by programs. The OS uses that memory to make everything run faster. The OS will instantly give that memory up if a program requests it. Solr is a Java program, and Java uses memory a little differently, so Solr most likely will NOT use more memory when it is available. In a normal directly-executable program, memory can be allocated at any time and given back to the system at any time. With Java, you tell it the maximum amount of memory the program is ever allowed to use. Because of how memory is used inside Java, most long-running Java programs (like Solr) will allocate up to the configured maximum even if they don't really need that much memory. Most Java virtual machines will never give the memory back to the system even if it is not required.
Thanks,
Shawn

Furkan KAMACI furkankam...@gmail.com writes:

I am sorry, but you said: *you need enough free RAM for the OS to cache the maximum amount of disk space all your indexes will ever use*. Let me make an assumption about the indexes on my machine: say they total 5 GB. So it is better to have at least 5 GB of RAM? OK, Solr will use RAM up to whatever I define for the Java process. When you talk about the indexes on storage being cached in RAM by the OS, do you mean having more than 5 GB - or 10 GB - of RAM on my machine?

2013/4/10 Shawn Heisey s...@elyograg.org

10 GB. Because when Solr shuffles the data around, it could use up to twice the size of the index in order to optimize the index on disk.

--
Justin

--
Walter Underwood
wun...@wunderwood.org
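Shawn's point about the JVM holding onto its configured maximum can be seen directly from the Runtime API: the -Xmx ceiling is fixed, and the committed heap grows toward it but never past it, independently of the OS page cache that holds the index files. A small illustration (the printed numbers will of course vary per machine):

```java
// The JVM heap is bounded by -Xmx; committed memory grows toward that
// bound and is generally not returned to the OS, which is why heap sizing
// must be planned separately from the free RAM left for the OS disk cache.
public class HeapInfo {
    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        long max = rt.maxMemory();     // the -Xmx ceiling
        long total = rt.totalMemory(); // currently committed heap
        long free = rt.freeMemory();   // unused part of the committed heap
        System.out.println("max=" + max + " committed=" + total + " free=" + free);
    }
}
```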
Re: Use of SolrJettyTestBase
I don't see anything obvious. Can you set a breakpoint in any other test and hit it? It has always worked for me if I set a breakpoint and execute in debug mode...

Not much help,
Erick

On Thu, Apr 11, 2013 at 5:01 PM, Upayavira u...@odoko.co.uk wrote:

On Tue, Apr 2, 2013, at 12:21 AM, Chris Hostetter wrote:

: I've subclassed SolrJettyTestBase, and added a test method (annotated
: with @Test). However, my test method is never called. I see the

You got an immediate failure from the test setup, because you don't have assertions enabled in your JVM (the Lucene and Solr test frameworks both require assertions to be enabled to run tests, because so many important things can't be sanity-checked without them)...

: Test class requires enabled assertions, enable globally (-ea) or for
: Solr/Lucene subpackages only: com.odoko.ArgPostActionTest

FYI: in addition to that text being written to System.err, it would have immediately been thrown as an Exception as well. (See TestRuleAssertionsRequired.java.)

So, I've finally found time to get past the enable-assertions thing; I've got that sorted. But my test still doesn't stop at breakpoints. I've got this:

public class ArgPostActionTest extends SolrJettyTestBase {
  @BeforeClass
  public static void beforeTest() throws Exception {
    createJetty(ExternalPaths.EXAMPLE_HOME, null, null);
  }

  @Test
  public void testArgPostAction() throws SolrServerException {
    // blah.blah.blah
    assertEquals(response.getResults().getNumFound(), 1);
  }
}

Neither of these methods gets called when I execute the test. Any ideas what's up?

Upayavira
Re: Not able to replicate the solr 3.5 indexes to solr 4.2 indexes
Please make a JIRA and attach this as a patch if there aren't any JIRAs for it yet.

Best,
Erick

On Fri, Apr 12, 2013 at 1:58 AM, Montu v Boda montu.b...@highqsolutions.com wrote:

hi

thanks for your reply. is anyone going to fix this issue in a new Solr version? because there are so many people facing the same problem while upgrading the Solr index from 3.5.0 to 4.2

Thanks & Regards
Montu v Boda

--
View this message in context: http://lucene.472066.n3.nabble.com/Not-able-to-replicate-the-solr-3-5-indexes-to-solr-4-2-indexes-tp4055313p4055477.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: solr 3.4: memory leak?
Hi André,

Thanks a lot for your response and the relevant information. Indeed, we have noticed similar behavior when hot-reloading a web app containing Solr after changing some of its classes. The only bad consequence of this, which luckily does not happen too often, is that the web app becomes stale. So we actually prefer (re)deploying via a tomcat restart.

Thanks,
Dmitry

On Thu, Apr 11, 2013 at 6:01 PM, Andre Bois-Crettez andre.b...@kelkoo.com wrote:

On 04/11/2013 08:49 AM, Dmitry Kan wrote:

SEVERE: The web application [/solr] appears to have started a thread named [MultiThreadedHttpConnectionManager cleanup] but has failed to stop it. This is very likely to create a memory leak.
Apr 11, 2013 6:38:14 AM org.apache.catalina.loader.WebappClassLoader clearThreadLocalMap

To my understanding, this kind of leak is only a problem if the Java code is *reloaded* while the tomcat JVM is not stopped - for example, when reloadable=true in the Context of the web application and you change files in WEB-INF or the .war. What would happen is that each existing ThreadLocal would continue to live (potentially holding references to other stuff and preventing GC) while new ThreadLocals are created.
http://wiki.apache.org/tomcat/MemoryLeakProtection

If you stop tomcat entirely each time, you should be safe.

--
André Bois-Crettez
Search technology, Kelkoo
http://www.kelkoo.com/

Kelkoo SAS, Société par Actions Simplifiée, au capital de € 4.168.964,30. Siège social : 8, rue du Sentier, 75002 Paris. 425 093 069 RCS Paris. This message and its attachments are confidential and intended exclusively for their addressees. If you are not the intended recipient, please delete it and notify the sender.
Re: Not able to replicate the solr 3.5 indexes to solr 4.2 indexes
Hi Erick,

I have already created a JIRA and also attached a patch, but no unit tests - my local build is failing (building from the Solr 4.2.1 source jar). Please see https://issues.apache.org/jira/browse/SOLR-4703 .

--
Umesh

On Sat, Apr 13, 2013 at 7:24 PM, Erick Erickson erickerick...@gmail.com wrote:

Please make a JIRA and attach this as a patch if there aren't any JIRAs for it yet.

Best,
Erick

On Fri, Apr 12, 2013 at 1:58 AM, Montu v Boda montu.b...@highqsolutions.com wrote:

hi

thanks for your reply. is anyone going to fix this issue in a new Solr version? because there are so many people facing the same problem while upgrading the Solr index from 3.5.0 to 4.2

Thanks & Regards
Montu v Boda

--
View this message in context: http://lucene.472066.n3.nabble.com/Not-able-to-replicate-the-solr-3-5-indexes-to-solr-4-2-indexes-tp4055313p4055477.html
Sent from the Solr - User mailing list archive at Nabble.com.

--
---
Thanks & Regards
Umesh Prasad
CloudSolrServer vs ConcurrentUpdateSolrServer for indexing
Hi

This question has come up many times on the list with lots of variations (which confuses me a lot). I am using Solr 4.1: one collection, 6 shards, 6 machines. I am using CloudSolrServer inside each mapper to index my documents. While it is working fine, I am trying to improve the indexing performance. My questions are:

1) Is CloudSolrServer multi-threaded?
2) Will using ConcurrentUpdateSolrServer increase indexing performance?

./Zahoor
Re: CloudSolrServer vs ConcurrentUpdateSolrServer for indexing
On Apr 13, 2013, at 11:07 AM, J Mohamed Zahoor zah...@indix.com wrote:

Hi

This question has come up many times on the list with lots of variations (which confuses me a lot). I am using Solr 4.1: one collection, 6 shards, 6 machines. I am using CloudSolrServer inside each mapper to index my documents. While it is working fine, I am trying to improve the indexing performance. My questions are:

1) Is CloudSolrServer multi-threaded?

No. The proper fast way to use it is to start many threads that all add docs to the same CloudSolrServer instance. In other words, currently, you must do the multi-threading yourself. CloudSolrServer is thread-safe.

2) Will using ConcurrentUpdateSolrServer increase indexing performance?

Yes, but at the cost of having to specify a single server to talk to - if it goes down, so does your indexing. It's also not very good at reporting errors. Finally, using multiple threads and CloudSolrServer, you can approach the performance of ConcurrentUpdateSolrServer.

- Mark

./Zahoor
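The many-threads-one-client pattern Mark describes can be sketched with a plain executor. Here FakeClient is a stand-in for the real SolrJ client (its add() would be a call like cloudServer.add(doc) in real code); the threading shape around it is the actual pattern:

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Many worker threads sharing ONE thread-safe client instance, as opposed
// to one client per thread or a single indexing thread.
public class ParallelIndexing {
    static class FakeClient {                 // stand-in for CloudSolrServer
        final AtomicInteger added = new AtomicInteger();
        void add(String doc) { added.incrementAndGet(); }
    }

    public static int index(List<String> docs, int threads) throws InterruptedException {
        FakeClient client = new FakeClient(); // shared by all workers
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (String doc : docs) {
            pool.submit(() -> client.add(doc));
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        return client.added.get();
    }
}
```

Batching several documents per add() call (SolrJ also accepts a collection of documents) typically helps more than raw thread count, but the thread pool above is the core of the approach.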
Re: Easier way to do this?
OK, is d in degrees or miles?

On Fri, Apr 12, 2013 at 10:20 PM, David Smiley (@MITRE.org) dsmi...@mitre.org wrote:

Bill, I responded to the issue you created about this: https://issues.apache.org/jira/browse/SOLR-4704 In summary, use {!geofilt}.
~ David

Billnbell wrote:

I would love for Solr 4 spatial to support pt so that I can easily get the number of results around a central point, like in 3.6. How can I pass parameters to a Circle()? I would love to send pt to this query, since the pt is the same across multiple areas. For example:

http://localhost:8983/solr/core/select?rows=0&q=*:*&facet=true
&facet.query={!key=.5}store_geohash:%22Intersects(Circle(26.012156,-80.311943%20d=.0072369))%22
&facet.query={!key=1}store_geohash:%22Intersects(Circle(26.012156,-80.311943%20d=.01447))%22
&facet.query={!key=5}store_geohash:%22Intersects(Circle(26.012156,-80.311943%20d=.0723))%22
&facet.query={!key=10}store_geohash:%22Intersects(Circle(26.012156,-80.311943%20d=.1447))%22
&facet.query={!key=25}store_geohash:%22Intersects(Circle(26.012156,-80.311943%20d=.361846))%22
&facet.query={!key=50}store_geohash:%22Intersects(Circle(26.012156,-80.311943%20d=.72369))%22
&facet.query={!key=100}store_geohash:%22Intersects(Circle(26.012156,-80.311943%20d=1.447))%22

-
Author: http://www.packtpub.com/apache-solr-3-enterprise-search-server/book
--
View this message in context: http://lucene.472066.n3.nabble.com/Easier-way-to-do-this-tp4055474p4055732.html
Sent from the Solr - User mailing list archive at Nabble.com.

--
Bill Bell
billnb...@gmail.com
cell 720-256-8076
Re: Easier way to do this?
Good question. With geofilt, it's kilometers.

-
Author: http://www.packtpub.com/apache-solr-3-enterprise-search-server/book
--
View this message in context: http://lucene.472066.n3.nabble.com/Easier-way-to-do-this-tp4055474p4055784.html
Sent from the Solr - User mailing list archive at Nabble.com.
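With {!geofilt}, the shared point can indeed be factored out of the per-radius facet queries, which was Bill's original wish. A hedged sketch of the request (field and point taken from the example above; the d values are my approximate mile-to-kilometer conversions of the original radii, and the key local param is assumed to label each facet as in the original):

```
q=*:*&rows=0&facet=true
&sfield=store_geohash&pt=26.012156,-80.311943
&facet.query={!geofilt key=0.5 d=0.8}
&facet.query={!geofilt key=1 d=1.6}
&facet.query={!geofilt key=5 d=8}
&facet.query={!geofilt key=10 d=16}
&facet.query={!geofilt key=25 d=40}
&facet.query={!geofilt key=50 d=80}
&facet.query={!geofilt key=100 d=161}
```

Because sfield and pt are ordinary request parameters, they are written once and picked up by every {!geofilt} instance.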
Re: Basic auth on SolrCloud /admin/* calls
This JIRA covers a lot of what you're asking: https://issues.apache.org/jira/browse/SOLR-4470

I am also trying to get this sort of solution in place, but the effort seems to be dying off a bit. Hopefully we can get some interest in it again; this question comes up every few weeks, it seems. I can confirm the latest patch on that JIRA works as expected, although my primary concern is that the credentials appear in the JVM command line, and I'd like to move them to a file.

Cheers,
Tim

On 11/04/13 10:41 AM, Michael Della Bitta wrote:

It's fairly easy to lock down Solr behind basic auth using just the servlet container it's running in, but the problem becomes letting services that *should* be able to access Solr in. I've rolled with basic auth in some setups, but certain deployments such as SolrCloud or sharded setups don't play well with auth because there's no good way to configure them to use it.

Michael Della Bitta
Appinions
18 East 41st Street, 2nd Floor
New York, NY 10017-6271
www.appinions.com
Where Influence Isn't a Game

On Thu, Apr 11, 2013 at 1:19 PM, Raymond Wiker rwi...@gmail.com wrote:

On Apr 11, 2013, at 17:12, adfel70 adfe...@gmail.com wrote:

Hi

I need to implement security in Solr as follows:
1. Prevent unauthorized users from accessing the Solr admin pages.
2. Prevent unauthorized users from performing Solr operations - both /admin and /update.

Is the conclusion of this thread that this is not possible at the moment?

The obvious solution (to me, at least) would be to (1) restrict access to Solr to localhost, and (2) use a reverse proxy (e.g., apache) on the same node to provide authenticated, restricted access to Solr. I think I've seen recipes for (1) somewhere, and I've used (2) fairly extensively for similar purposes.
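Raymond's option (2) can be sketched as an httpd configuration fragment. This is a hedged example, not a tested recipe: the paths, realm name, and htpasswd location are assumptions, and it presumes Solr's connector is already bound to localhost only:

```
# Apache httpd fronting Solr with basic auth; requires mod_proxy,
# mod_proxy_http, and an htpasswd file created with the htpasswd tool.
<Location "/solr">
    AuthType Basic
    AuthName "Solr"
    AuthUserFile /etc/apache2/solr.htpasswd
    Require valid-user
    ProxyPass "http://localhost:8983/solr"
    ProxyPassReverse "http://localhost:8983/solr"
</Location>
```

As Michael notes, this protects human and external access but does not by itself solve inter-node authentication in SolrCloud or sharded setups.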
Re: Approximately needed RAM for 5000 query/second at a Solr machine?
Hi Jack;

Since I am new to Solr, can you explain two things that you said:

1) when most people say index size they are referring to all fields, collectively, not individual fields (what do you mean by segments being on a per-field basis, and by all fields vs. individual fields?)
2) more cores might make the worst-case scenario worse, since it will maximize the amount of data processed at a given moment

2013/4/13 Erick Erickson erickerick...@gmail.com wrote:

bq: disk space is three times

True, I keep forgetting about compound since I never use it...
Re: Which tokenizer or analizer should use and field type
I tried both ways:

(project AND assistant) OR manager
"project assistant"~5 OR manager

They work properly, but I have a problem: if I query for projec assistant, then nothing is found. And what is the meaning of ~5? If I write *projec assistant* then it does match, but it returns project OR assistant. My objective is to search like the MySQL LIKE operator, %search word%. How do I write a query that behaves exactly like the MySQL LIKE operator?

Thanks. Need help as soon as possible.

--
View this message in context: http://lucene.472066.n3.nabble.com/Which-tokenizer-or-analizer-should-use-and-field-type-tp4055591p4055833.html
Sent from the Solr - User mailing list archive at Nabble.com.
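One common way to get LIKE-style substring matching without slow leading wildcards is to generate n-grams at index time. A hedged schema sketch (the fieldType name and gram sizes are illustrative assumptions to tune, not a recommendation from this thread):

```xml
<!-- Index-time n-grams: "project" is indexed as "pro", "roj", "proj", ...,
     so a plain query for "projec" matches without any wildcard. -->
<fieldType name="text_substring" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.NGramFilterFactory" minGramSize="3" maxGramSize="15"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

The trade-off is a larger index; the ~5 in the earlier query is phrase-proximity syntax ("project assistant"~5 matches the two words within five positions of each other), which is unrelated to substring matching.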
Re: Which tokenizer or analizer should use and field type
Hi,

If you can help me, it will solve my problem.

keyword:(*assistant AND coach*) gives me 1 result.
keyword:(*iit AND kanpur*) gives me 2 results.

But the query keyword:(*assistant AND coach* OR (*iit AND kanpur*)) gives me only 1 result. I also tried keyword:(*assistant AND coach* OR (*:* *iit AND kanpur*)), which also gives me only 1 result. I don't know why. How should the query look? Please help me find a solution.

Thanks in advance.

--
View this message in context: http://lucene.472066.n3.nabble.com/Which-tokenizer-or-analizer-should-use-and-field-type-tp4055591p4055837.html
Sent from the Solr - User mailing list archive at Nabble.com.
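A likely cause (my reading, not confirmed in the thread) is that the Lucene query parser does not apply the precedence you expect to a mixed a AND b OR c expression, so the OR clause gets absorbed into the AND. Grouping each AND pair explicitly usually returns the union of both result sets (field name taken from the queries above):

```
keyword:((assistant AND coach) OR (iit AND kanpur))
```

With both sides parenthesized, each conjunction is evaluated on its own before the OR combines them.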
Is any way to return the number of indexed tokens in a field?
Hello,

We seem to have all sorts of functions around tokenized field content, but I am looking for a simple count/length that can be returned as a pseudo-field. Does anyone know of one out of the box?

The specific situation is that I am indexing a field for specific regular expressions that become tokens (in a copyField). Not every document has the same number of those. I now want to find the documents that have the maximum number of tokens in that field (for testing and review), but I can't figure out how. Any help would be appreciated.

Regards,
Alex.

Personal blog: http://blog.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book)
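One workaround, if nothing exists out of the box, is to compute the token count yourself at index time and store it in a separate int field, then sort on that field descending to surface the documents with the most matches. A plain-Java sketch of the counting step (the regex, and the idea of a stored count field such as a hypothetical regex_token_count, are my assumptions, not an existing Solr feature):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Count how many times the token-producing regex fires in the raw field
// value; store the result in an int field alongside the copyField target.
public class TokenCount {
    public static int countMatches(String text, String regex) {
        Matcher m = Pattern.compile(regex).matcher(text);
        int count = 0;
        while (m.find()) count++;
        return count;
    }
}
```

A query like sort=regex_token_count desc (with that hypothetical field populated at index time) would then return the maximum-token documents first for review.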