What request handlers to use for query strings in Chinese or Japanese?
Hi, For my Solr server, some of the query strings will be in Asian languages such as Chinese or Japanese. For such query strings, would the Standard or Dismax request handler work? My understanding is that both the Standard and the Dismax handler tokenize the query string by whitespace. And that wouldn't work for Chinese or Japanese, right? In that case, what request handler should I use? And if I need to set up custom request handlers for those languages, how do I do it? Thanks. Andy
Re: Solrj performance bottleneck
thanks for all your info. I will try increasing the RAM and check it. thanks,
Re: What request handlers to use for query strings in Chinese or Japanese?
That's a job for your analyzer, not the request handler.

2011/3/17 Andy angelf...@yahoo.com:
> Hi, For my Solr server, some of the query strings will be in Asian languages such as Chinese or Japanese. For such query strings, would the Standard or Dismax request handler work? My understanding is that both the Standard and the Dismax handler tokenize the query string by whitespace. And that wouldn't work for Chinese or Japanese, right? In that case, what request handler should I use? And if I need to set up custom request handlers for those languages, how do I do it? Thanks. Andy
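For reference, a minimal sketch of a schema.xml field type for CJK text using the CJKTokenizerFactory that ships with Solr; the type name text_cjk is illustrative, and you would apply it to whichever fields you query:

<fieldType name="text_cjk" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- emits overlapping character bigrams for CJK text instead of splitting on whitespace -->
    <tokenizer class="solr.CJKTokenizerFactory"/>
  </analyzer>
</fieldType>

Both the standard and dismax handlers delegate per-field analysis to the schema, so no custom request handler is needed once the field type handles CJK.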
Re: Solr Autosuggest help
Hi, One more query. Currently in the autosuggestion Solr returns words like below:

googl
googl _
googl search
googl chrome
googl map

The last letter seems to be missing in the autosuggestion. I have sent the query as ?qt=/terms&terms=true&terms.fl=mydata&terms.lower=goog&terms.prefix=goog. The following is my schema.xml for the text field:

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="0" generateNumberParts="1" catenateWords="0" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    <filter class="solr.ShingleFilterFactory" maxShingleSize="2" outputUnigrams="true" outputUnigramIfNoNgram="true"/>
  </analyzer>
</fieldType>

Could anyone explain what could be wrong, and why the last letter goes missing? It occurs for a few words only; suggestions for other words are fine. One more query: how will the word 'sci/tech' be indexed in solr? If I search on sci/tech it won't return any results. Thanks in advance.
Re: Sorting 0 values last
Okay. When I use the map function with ...&sort=map(price, 0, 0, 0, 1) desc then solr outputs an error:

17.03.2011 09:42:58 org.apache.solr.common.SolrException log
SCHWERWIEGEND: org.apache.solr.common.SolrException: Missing sort order.
    at org.apache.solr.search.QueryParsing.parseSort(QueryParsing.java:254)
    at org.apache.solr.search.QParser.getSort(QParser.java:211)
    at org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:90)
    at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:174)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298)
    at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:857)
    at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:588)
    at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)
    at java.lang.Thread.run(Thread.java:662)
17.03.2011 09:42:58 org.apache.solr.core.SolrCore execute
INFO: [de] webapp=/solr path=/select params={sort=map(calc_curr,0,0,1)+desc&qt=nonequery} status=400 QTime=1

fyi: I use solr 1.4.1
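For what it's worth, sorting by a function query such as map() is not supported in Solr 1.4.1; it arrived in a later release (3.1 onwards, if memory serves). The 1.4.1 sort parser splits the sort spec on commas and whitespace, so "map(price, 0, 0, 0, 1) desc" comes apart into fragments that carry no sort order, which matches the error logged above. On a version that does support sort-by-function, the clause would be written without internal spaces, e.g.:

sort=map(price,0,0,0,1)+desc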
Re: Replication slows down massively during high load
Hello Shawn,

> Primary assumption: You have a 64-bit OS and a 64-bit JVM.

Jepp, it's running 64-bit Linux with a 64-bit JVM.

> It sounds to me like you're I/O bound, because your machine cannot keep enough of your index in RAM. Relative to your 100GB index, you only have a maximum of 14GB of RAM available to the OS disk cache, since Java's heap size is 10GB.

The load test seems to be more CPU bound than I/O bound. All cores are fully busy, and iostat says that there isn't much more disk I/O going on than without the load test. The index is on a RAID10 array with four disks.

> How much disk space do all of the index files that end in x take up? I would venture a guess that it's significantly more than 14GB. On Linux, you could do this command to tally it quickly:

# du -hc *x
27G total
# du -hc `ls | egrep -v "tvf|fdt"`
51G total

> If you installed enough RAM so the disk cache can be much larger than the total size of those files ending in x, you'd probably stop having these performance issues. Realizing that this is a [...] Alternatively, you could take steps to reduce the size of your index, or perhaps add more machines to go distributed.

Unfortunately, this doesn't seem to be the problem. The queries themselves are running fine. The problem is that the replication is crawling when there are many queries going on, and that the replication speed stays low even after the load is gone.

Cheers
Vadim
Re: hierarchical faceting, SOLR-792 - confused on config
On Wed, 2011-03-16 at 18:36 +0100, Erik Hatcher wrote:
> Sorry, I missed the original mail on this thread. I put together that hierarchical faceting wiki page a couple of years ago when helping a customer evaluate SOLR-64 vs. SOLR-792 vs. other approaches. Since then, SOLR-792 morphed and is committed as pivot faceting. SOLR-64 spawned a PathTokenizer which is part of Solr now too. Recently Toke updated that page with some additional info. It's definitely not a how-to page, and perhaps should get renamed/moved/revamped? Toke?

Unfortunately or luckily, depending on one's point of view, I am hit by a child #3 and buying-a-house combo. A lot of intentions, but no promises for the next month or two.

I think we need both an overview and a detailed how-to of the different angles on extended faceting in Solr, seen from a user perspective. I am not sure I fully understand the different methods myself, so maybe we could start by discussing them here? Below is a quick outline of how I see them; please expand/correct. I plan to back up the claims about scale later with a wiki page with performance tests.

http://www.lucidimagination.com/solutions/webcasts/faceting @27-33 min:
- Requires the user to transform the paths to multiple special terms
- Step-by-step drill down: If a visual tree is needed, it requires one call for each branch.
- Supports multiple paths/document
- Constraints on output work just as standard faceting
- Scales very well when a single branch is requested
Example use case: Click-to-expand tree structure of categories for books.

PathHierarchyTokenizer (trunk): Changes /A/B/C to /A, /A/B and /A/B/C. I don't know how this can be used directly for hierarchical faceting. The Lucid Imagination webcast uses the tokenization 0/A, 1/A/B and 2/A/B/C, so they seem incompatible to me. The discussion on SOLR-1057 indicates that it can be used with SOLR-64, but SOLR-64 does its own tokenization!? Little help here?

SOLR-64 (not up to date with trunk?):
- Uses a custom tokenizer to handle delimited paths (A/B/C)
- Single-path hierarchical faceting
- Constraints can be given on the depth of the hierarchy but not on the number of entries at a given level (huge result set when a wide hierarchy is analyzed)
- Fine (speed + memory) for small taxonomies
- Does not scale well (speed) to large taxonomies
Example use case: Tree structure of addresses for stores.

SOLR-792 aka pivot faceting (Solr 4.0):
- Uses multiple independent fields as input: not suitable for taxonomies
- Multi-value but not multi-path
- Supports taxonomies by restraining to single-path/document(?)
- Constraints can be given on entry count, but sorting cannot be done on recursive counting of entries (and it would be very CPU expensive to do so(?))
- Fine (speed + memory) for small taxonomies
- Scales well (speed + memory) to large taxonomies
- Scales poorly (speed) to large taxonomies and large result sizes
Example use case: Tree structure with price, rating and stock.

SOLR-2412 (trunk, highly experimental):
- Multi-path hierarchical faceting
- Uses a field with delimited paths as input (A/B/C)
- Constraints can be given on depth as well as entry count, but sorting cannot be done on recursive counting of entries (the number is there though, so it would be fairly easy to add such a sorter)
- Fine (speed + memory) for small taxonomies
- Scales well (speed + memory) to large taxonomies and result sizes
Example use case: Tree structure of categories for books.
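For readers wondering what the PathHierarchyTokenizer mentioned above looks like in a schema, here is a minimal sketch; the type name is illustrative:

<fieldType name="text_path" class="solr.TextField">
  <analyzer>
    <!-- /A/B/C is emitted as the tokens /A, /A/B and /A/B/C -->
    <tokenizer class="solr.PathHierarchyTokenizerFactory" delimiter="/"/>
  </analyzer>
</fieldType>

Faceting on such a field then yields a count for every ancestor path of each document.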
SOLR building problems
Hello, The apache wiki gives me this information:

"Skip this section if you have a binary distribution of Solr. These instructions will build Solr from source, if you have a nightly tarball or have checked out the trunk from subversion at http://svn.apache.org/repos/asf/lucene/dev/trunk. Assumes that you have JDK 1.6 already installed. In the source directory, run ant dist to build the .war file under dist. Build the example for the Solr tutorial by running ant example. Change to the 'example' directory, run java -jar start.jar and visit localhost:8983/solr/admin to test that the example works with the Jetty container."

I have run this command:

svn checkout http://svn.apache.org/repos/asf/lucene/dev/trunk

After that I try the ant example command. This doesn't work; I got the following error message:

common.compile-core:
    [javac] Compiling 508 source files to somedir/trunk/lucene/build/classes/java
    [javac] --
    [javac] 1. ERROR in dir/trunk/lucene/src/java/org/apache/lucene/document/DateTools.java (at line 1)
    [javac] package org.apache.lucene.document;
    [javac] ^^
    [javac] The type Enum is not generic; it cannot be parameterized with arguments
    [javac] --
    [javac] 1 problem (1 error)

BUILD FAILED
somedir/trunk/solr/common-build.xml:249: The following error occurred while executing this line:
somedir/trunk/lucene/contrib/contrib-build.xml:58: The following error occurred while executing this line:
somedir/trunk/lucene/common-build.xml:296: The following error occurred while executing this line:
somedir/trunk/lucene/common-build.xml:717: Compile failed; see the compiler error output for details.

Ant is installed correctly, I think: ant -version = Apache Ant(TM) version 1.8.2 compiled on December 20 2010

What goes wrong?
Re: Version Incompatibility(Invalid version (expected 2, but 1) or the data in not in 'javabin' format)
On Thursday 17 March 2011 03:18 AM, Ahmet Arslan wrote:
>> I am using the Solr 4.0 api to search an index (made using the solr 1.4 version). I am getting the error "Invalid version (expected 2, but 1) or the data in not in 'javabin' format". Can anyone help me fix the problem?
>
> You need to use solrj version 1.4, which is compatible with your index format/version.

Actually there exists another solution: using XMLResponseParser instead of BinaryResponseParser, which is the default.

new CommonsHttpSolrServer(new URL("http://solr1.4.0Instance:8080/solr"), null, new XMLResponseParser(), false);

Hi, Thanks !!!
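For completeness, a self-contained sketch of the XMLResponseParser workaround; the base URL is the placeholder from the thread, and the helper class name is hypothetical:

import java.net.MalformedURLException;
import java.net.URL;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.impl.XMLResponseParser;

public class XmlClientFactory {
    // a null HttpClient lets the server create its own; `false` disables multipart POST
    public static CommonsHttpSolrServer create(String baseUrl) throws MalformedURLException {
        return new CommonsHttpSolrServer(new URL(baseUrl), null, new XMLResponseParser(), false);
    }
}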
Re: Error: Unbuffered entity enclosing request can not be repeated.
What happens if you submit the 9th batch first? I'm wondering if the 9th batch is just mal-formed and has nothing to do with the previous batches.

As to the time, what merge factor are you using? And how are you committing? Via autocommit parameters, explicitly, or not at all?

Best
Erick

On Wed, Mar 16, 2011 at 1:13 PM, André Santos manofi...@gmail.com wrote:
> Hi all! I created a SolrJ project to test Solr. I am inserting batches of 7000 records, each with 200 attributes, which adds up to approximately 13.77 MB per batch. I am measuring the time it takes to add and commit each set of 7000 records to an instantiation of CommonsHttpSolrServer. Each of the first 6 batches takes approximately 17 to 21 seconds. The 7th batch takes 42 sec and the 8th takes 1 min. And when it adds the 9th batch to the server it generates this error:
>
> Mar 16, 2011 4:56:20 PM org.apache.commons.httpclient.HttpMethodDirector executeWithRetry
> INFO: I/O exception (java.net.SocketException) caught when processing request: Connection reset
> Mar 16, 2011 4:56:21 PM org.apache.commons.httpclient.HttpMethodDirector executeWithRetry
> INFO: Retrying request
> Exception in thread "main" org.apache.solr.client.solrj.SolrServerException: org.apache.commons.httpclient.ProtocolException: Unbuffered entity enclosing request can not be repeated.
>     at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:480)
>     at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:244)
>     at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105)
>     at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:49)
>
> I googled this error and one of the suggestions consists of reducing the number of records per batch. But I want to achieve a solution with at least 7000 records per batch. Any help would be appreciated. André
Re: i don't get why my index didn't grow more...
This page: http://lucene.apache.org/java/3_0_2/fileformats.html#file-names, when combined with what Yonik said, may help you figure it out... And if you're still stumped, please post the fieldType and field definitions you used.

Best
Erick

On Wed, Mar 16, 2011 at 5:10 PM, Robert Petersen rober...@buy.com wrote:
> OK, I have a 30 GB index where there are lots of sparsely populated int fields and then one title field and one catchall field with the title and everything else we want as keywords. I figure the catchall field is the biggest field in our documents, which as I mentioned are otherwise composed of a variety of int fields and a title. So my puzzlement is that my biggest field is copied into a double metaphone field, and now I added another copyField to also copy the catchall field into a newly created soundex field, for an experiment to compare the effectiveness of the two. I expected the index to grow by at least 25% to 30%, but it barely grew at all. Can someone explain this to me? Thanks!
Re: Error: Unbuffered entity enclosing request can not be repeated.
Hi, Erick! I suspect that the problem resides in Tomcat. I think that the server-client connection times out.

> What happens if you submit the 9th batch first? I'm wondering if the 9th batch is just mal-formed and has nothing to do with the previous batches.

The 9th batch is ok, like the other batches. It is filled up with random data. I received that error in many executions (normally in the 7th, 8th or 9th batch) when batches have more than approximately 10 MB.

> As to the time, what merge factor are you using? And how are you committing? Via autocommit parameters or explicitly or not at all?

The merge factor is 25. I do the commit explicitly:

for (int k = 0; k < nregisters; k++) {
    ...
    docs.add(doc);
}
server.add(docs);
server.commit();

André
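A minimal sketch of flushing the adds in smaller chunks, as suggested earlier in the thread; batchSize and buildDoc() are hypothetical placeholders for the original record-to-document mapping:

import java.util.ArrayList;
import java.util.List;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.common.SolrInputDocument;

public class BatchIndexer {
    // sends many small requests instead of one ~13 MB request, keeping the single commit at the end
    static void indexInBatches(SolrServer server, int nregisters, int batchSize) throws Exception {
        List<SolrInputDocument> docs = new ArrayList<SolrInputDocument>();
        for (int k = 0; k < nregisters; k++) {
            docs.add(buildDoc(k)); // hypothetical stand-in for the record mapping
            if (docs.size() >= batchSize) {
                server.add(docs);
                docs.clear();
            }
        }
        if (!docs.isEmpty()) {
            server.add(docs);
        }
        server.commit();
    }

    static SolrInputDocument buildDoc(int k) {
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", k); // illustrative field
        return doc;
    }
}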
Re: SOLR building problems
What Java version do you have installed? (java -version)

Best
Erick

On Thu, Mar 17, 2011 at 6:30 AM, royr r...@blixem.nl wrote:
> Hello, The apache wiki gives me this information: [...]
SOLR-2242-distinctFacet.patch
Hi, I want to enquire about the namedistinct patch (SOLR-2242-distinctFacet.patch) available with the solr4.0 trunk.

On Monday 14 March 2011 08:05 PM, Jonathan Rochkind wrote:
> It's not easy if you have lots of facet values (in my case, it can even be up to a million), but there is no way built in to Solr to get this. I have been told that some of the faceting strategies (there are actually several in use in Solr, based on your parameters and the nature of your data) return the page of facet values without actually counting all possible facet values, which is what would make this difficult. But I have not looked at the code myself.
>
> Jonathan

On 3/11/2011 7:33 AM, Erick Erickson wrote:
> There's nothing that I know of that gives you this, but it's simple to count the members of the list yourself...
>
> Best
> Erick

On Fri, Mar 11, 2011 at 3:34 AM, rajini maski rajinima...@gmail.com wrote:
> Query on facet field results... When I run a facet query on some field, say facet=on&facet.field=StudyID, I get a list of distinct StudyID values with a count telling how many times each study occurred in the search results. But I also need the count of this distinct StudyID list. Is there any solr query to get that count? Example:
>
> <lst name="facet_fields">
>   <lst name="StudyID">
>     <int name="105">135164</int>
>     <int name="179">79820</int>
>     <int name="107">70815</int>
>     <int name="120">37076</int>
>     <int name="134">35276</int>
>   </lst>
> </lst>
>
> I want a count attribute that returns the number of different StudyID values that occurred. In the above example it would be Count = 5 (105, 179, 107, 120, 134):
>
> <lst name="facet_fields">
>   <lst name="StudyID" COUNT="5">
>     <int name="105">135164</int>
>     ...
>   </lst>
> </lst>
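Following Erick's suggestion to count the members of the list client-side, a hedged SolrJ sketch; setting facet.limit to -1 asks for all values, so the size of the returned list is the distinct count:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.response.QueryResponse;

public class DistinctFacetCount {
    // returns the number of distinct StudyID values in the result set of `query`
    static int distinctStudyIds(SolrServer server, String query) throws SolrServerException {
        SolrQuery q = new SolrQuery(query);
        q.setFacet(true);
        q.addFacetField("StudyID");
        q.setFacetLimit(-1);   // all facet values, not just the first page
        q.setFacetMinCount(1); // skip zero-count values
        QueryResponse rsp = server.query(q);
        return rsp.getFacetField("StudyID").getValueCount();
    }
}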
Re: Multiple Blocked threads on UnInvertedField.getUnInvertedField() SegmentReader$CoreReaders.getTermsReader
Hi Yonik,

I have another question related to fieldValueCache. When we uninvert a facet field, and the termInstances = 0 for a particular field, it still gets added to the fieldValueCache. What is the reason for caching facet fields with termInstances=0?

In our case, a lot of time is being spent in the 'uninvert' process. From the 'time' values, I checked that it goes up to 20 secs for certain facet fields. E.g.:

UnInverted multi-valued field {field=product_brands_61936,memSize=4224,tindexSize=32,time=20202,phase1=20202,nTerms=0,bigTerms=0,termInstances=0,uses=0}

Also, for the same facet field, the time and phase1 time vary from 3 msec to 20 secs. What is the reason for this variation? Also, what does nTerms represent?

Thanks,
Rachita

On Mon, Mar 7, 2011 at 8:22 PM, Yonik Seeley yo...@lucidimagination.com wrote:
> On Mon, Mar 7, 2011 at 9:44 AM, Rachita Choudhary rachita.choudh...@burrp.com wrote:
>> As the enum method will create a bitset for all the unique values
>
> It's more complex than that.
> - small sets will use a sorted int set... not a bitset
> - you can control what gets cached via the facet.enum.cache.minDf parameter
>
> -Yonik
> http://lucidimagination.com
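As an aside, the facet.enum.cache.minDf knob quoted above is a plain request parameter; a hypothetical request against the field in question could look like this (host and values are illustrative):

http://localhost:8983/solr/select?q=*:*&facet=true&facet.field=product_brands_61936&facet.method=enum&facet.enum.cache.minDf=100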
Re: Using Solr 1.4.1 on most recent Tomcat 7.0.11
Lewis

My update from tomcat 7.0.8 to 7.0.11 went with no hitches. I checked my context file and it does not have the xml preamble yours has, specifically '<?xml version="1.0" encoding="utf-8"?>'. Here is my context file:

<Context docBase="/home/omim/lib/java/apache-solr-4.0-2011-02-09_08-06-20.war" debug="0" crossContext="true">
  <Environment name="solr/home" type="java.lang.String" value="/home/omim/index/" override="true" />
</Context>

Hope this helps.

Cheers

François

On Mar 16, 2011, at 2:38 PM, McGibbney, Lewis John wrote:
> Hello list, Is anyone running Solr (in my case 1.4.1) on the above Tomcat dist? In the past I have been using guidance in accordance with http://wiki.apache.org/solr/SolrTomcat#Installing_Solr_instances_under_Tomcat but having upgraded from Tomcat 7.0.8 to 7.0.11 I am having problems. E.g.:
>
> INFO: Deploying configuration descriptor wombra.xml
>
> This is my context fragment from /home/lewis/Downloads/apache-tomcat-7.0.11/conf/Catalina/localhost:
>
> 16-Mar-2011 16:57:36 org.apache.tomcat.util.digester.Digester fatalError
> SEVERE: Parse Fatal Error at line 4 column 6: The processing instruction target matching "[xX][mM][lL]" is not allowed.
> org.xml.sax.SAXParseException: The processing instruction target matching "[xX][mM][lL]" is not allowed.
> ...
> 16-Mar-2011 16:57:36 org.apache.catalina.startup.HostConfig deployDescriptor
> SEVERE: Error deploying configuration descriptor wombra.xml
> org.xml.sax.SAXParseException: The processing instruction target matching "[xX][mM][lL]" is not allowed.
> ... some more ...
>
> My configuration descriptor is as follows:
>
> <?xml version="1.0" encoding="utf-8"?>
> <Context docBase="/home/lewis/Downloads/wombra/wombra.war" crossContext="true">
>   <Environment name="solr/home" type="java.lang.String" value="/home/lewis/Downloads/wombra" override="true"/>
> </Context>
>
> Preferably I would upload a WAR file, but I have been working well with the configuration I have been using up until now, therefore I didn't question change. I am unfamiliar with the above errors. Can anyone please point me in the right direction? Thank you
>
> Lewis
Re: i don't get why my index didn't grow more...
Without even looking at the different segment files, things look odd: you say that you optimize every day, yet I see segments up to 4 days old.

Also look at all the segments_??? files... each represents a commit point of the index. So it looks like you have 16 snapshots (or commit points) of the index. Do you have a deletion policy configured to do this for some reason?

Anyway, this is why when you changed how you index, you didn't see much of a size increase (comparatively).

-Yonik
http://lucidimagination.com

On Wed, Mar 16, 2011 at 7:46 PM, Robert Petersen rober...@buy.com wrote:
> Thanks for the reply Yonik. Here are the results of ls -l on the master server index folder. Also please note we have hundreds of those small sparsely populated fields, and I run optimize once a day at midnight. We index 24/7 off a queue at a clip of about 200K docs per hour, so the index has had hundreds of commits since last night at midnight. [...]
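For context, the deletion policy Yonik refers to lives in solrconfig.xml; the stock example config keeps a single commit point, so retaining 16 would require something like the (illustrative) values below:

<deletionPolicy class="solr.SolrDeletionPolicy">
  <!-- number of commit points to retain; the shipped example keeps 1 -->
  <str name="maxCommitsToKeep">16</str>
  <str name="maxOptimizedCommitsToKeep">0</str>
</deletionPolicy>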
RE: Using Solr 1.4.1 on most recent Tomcat 7.0.11
I do have the xml preamble <?xml version="1.0" encoding="UTF-8"?> in my config file in conf/Catalina/localhost/ and solr starts ok with Tomcat 7.0.8. Haven't tried with 7.0.11 yet.

I wonder why your exception points to line 4 column 6, however. Shouldn't it point to line 1 column 1? Do you have some blank lines at the start of your XML file, or some non-blank lines?

Pierre

-----Original Message-----
From: François Schiettecatte [mailto:fschietteca...@gmail.com]
Sent: Thursday, 17 March 2011 14:48
To: solr-user@lucene.apache.org
Subject: Re: Using Solr 1.4.1 on most recent Tomcat 7.0.11

> [...]
Re: SOLR building problems
java version "1.6.0_21"
Java(TM) SE Runtime Environment (build 1.6.0_21-b06)
Java HotSpot(TM) Server VM (build 17.0-b16, mixed mode)
RE: Using Solr 1.4.1 on most recent Tomcat 7.0.11
Hi François,

Thank you for your reply. I had made a simple mistake of including comments before '<?xml version="1.0" encoding="utf-8"?>', therefore I was getting a SAX error. As you have correctly pointed out, it is not essential to include the preamble in the context file (if using one); however, it might be useful to know that Tomcat 7 now validates XML files by default. In time I will get round to editing the wiki accordingly to mitigate against this in the future. Thanks for looking into this.

Lewis

___
From: François Schiettecatte [fschietteca...@gmail.com]
Sent: 17 March 2011 13:47
To: solr-user@lucene.apache.org
Subject: Re: Using Solr 1.4.1 on most recent Tomcat 7.0.11

> [...]
Re: hierarchical faceting, SOLR-792 - confused on config
Yes, pivot faceting is committed to trunk, but it is not part of the upcoming 3.1 release.

Erik

On Mar 16, 2011, at 15:00, McGibbney, Lewis John wrote:
> Hi Erik, I have been reading about the progression of SOLR-792 into pivot faceting; however, can you expand on where it is committed? Are you referring to trunk? The reason I am asking is that I have been using 1.4.1 for some time now and have been thinking of upgrading to trunk... or branch. Thank you
>
> Lewis
>
> From: Erik Hatcher [erik.hatc...@gmail.com]
> Sent: 16 March 2011 17:36
> To: solr-user@lucene.apache.org
> Subject: Re: hierarchical faceting, SOLR-792 - confused on config
>
> [...]
Re: Solr Autosuggest help
hi, We have found that 'EnglishPorterFilterFactory' causes that issue. I believe it is used for stemming words. Once we commented that factory out, it works fine.

And another thing: currently I am checking how the word 'sci/tech' will be indexed in solr. As mentioned in my previous email, if I search on sci/tech it won't return any results, but solr does have the term sci/tech. When I search on other terms which also contain sci/tech, it returns both the words. Please let me know if you have any idea about that. If I find out, I will update this thread. thanks.
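If dropping the stemmer altogether is too drastic, a gentler option (assuming only a handful of terms are affected) is to keep EnglishPorterFilterFactory and list those terms in the protwords.txt file the field type already references; protected terms bypass the stemmer:

# protwords.txt: one term per line; listed terms are never stemmed
google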
Re: hierarchical faceting, SOLR-792 - confused on config
On Mar 16, 2011, at 14:53, Jonathan Rochkind wrote:
> Interesting, any documentation on the PathTokenizer anywhere? Or just have to find and look at the source? That's something I hadn't known about, which may be useful to some stuff I've been working on depending on how it works.

http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.PathHierarchyTokenizerFactory

Sorry, I said PathTokenizer, which is what SOLR-1057 called it for a bit before it got renamed.

Erik
Re: Using Solr 1.4.1 on most recent Tomcat 7.0.11
Pierre

That is a very good point. I have been caught in the past by poor xml (RSS feeds) that included control characters before the '<?xml...?>'. And I have added the preamble to my solr.xml files for good form :)

François

On Mar 17, 2011, at 10:02 AM, Pierre GOSSE wrote:
> I do have the xml preamble <?xml version="1.0" encoding="UTF-8"?> in my config file in conf/Catalina/localhost/ and solr starts ok with Tomcat 7.0.8. Haven't tried with 7.0.11 yet. I wonder why your exception points to line 4 column 6, however. Shouldn't it point to line 1 column 1? Do you have some blank lines at the start of your XML file, or some non-blank lines?
>
> Pierre
>
> [...]
Re: Parent-child options
On Thu, Mar 17, 2011 at 1:49 AM, Otis Gospodnetic otis_gospodne...@yahoo.com wrote:
> The dreaded parent-child without denormalization question. What are one's options for the following example:
>
> parent: shoes
> 3 children, each with 2 attributes/fields: color and size
> * color: red black orange
> * size: 10 11 12
>
> The goal is to be able to search for:
> 1) color:red AND size:10 and get 1 hit for the above
> 2) color:red AND size:12 and get *no* matches because there are no red shoes of size 12, only size 10.

What if you had this instead:

color: red red orange
size: 10 11 12

Do you need for color:red to return 1 or 2 (i.e. is the final answer in units of child hits or parent hits)?

-Yonik
http://lucidimagination.com
Re: Replication slows down massively during high load
On 3/17/2011 3:43 AM, Vadim Kisselmann wrote:
> Unfortunately, this doesn't seem to be the problem. The queries themselves are running fine. The problem is that the replication is crawling when there are many queries going on and that the replication speed stays low even after the load is gone.

If you run "iostat 5", what are typical values on each iteration for the various CPU states while you're doing load testing and replication at the same time? In particular, %iowait is important.
from multiValued field to non-multiValued field with copyField?
Is there a way to have a kind of casting for copyField? I have author names in multiValued string field and need a sorting on it, but sort on field is only for multiValued=false. I'm trying to get multiValued content from one field to a non-multiValued text or string field for sorting. And this, if possible, during loading with copyField. Or any other solution? I need this solution due to patch SOLR-2339, which is now more strict. May be anyone else also. Regards, Bernd
Re: from multiValued field to non-multiValued field with copyField?
On Thu, Mar 17, 2011 at 8:04 PM, Bernd Fehling bernd.fehl...@uni-bielefeld.de wrote: Is there a way to have a kind of casting for copyField? I have author names in multiValued string field and need a sorting on it, but sort on field is only for multiValued=false. I'm trying to get multiValued content from one field to a non-multiValued text or string field for sorting. And this, if possible, during loading with copyField. Or any other solution? [...] Not sure about CopyField, but you could use a transformer to extract values from a multiValued field, and stick them into a single-valued field. Regards, Gora
Rename fields in a query
Given a Query object (name:firefox name:opera), is it possible to 'rename' the field names to, for example, (content:firefox content:opera)?
Re: Sorting on multiValued fields via function query
On Wed, Mar 16, 2011 at 6:08 PM, Jonathan Rochkind rochk...@jhu.edu wrote:
> Also... if lucene is already capable of sorting on a multi-valued field by choosing the largest value, largest vs. smallest is presumably just arbitrary there; there is presumably no performance implication to choosing the smallest instead of the largest. It just chooses the largest, according to Yonik.

It's a little more complicated than that. It's not so much an explicit feature in lucene, but just what naturally happens when building the field cache via uninverting an indexed field. It's pretty much this:

for every term in the field:
    for every document that matches that term:
        value[document] = term

And since terms are iterated from smallest to largest (and no, you can't reverse this), larger values end up overwriting smaller values. There's no simple patch to pick the smallest rather than the largest.

In the past, lucene used to try and detect this multi-valued case by checking the number of values set in the whole array. This was unreliable though, and the check was discarded.

-Yonik
http://lucidimagination.com
Re: from multiValued field to non-multiValued field with copyField?
On Thu, Mar 17, 2011 at 10:34 AM, Bernd Fehling bernd.fehl...@uni-bielefeld.de wrote: Is there a way to have a kind of casting for copyField? I have author names in multiValued string field and need a sorting on it, but sort on field is only for multiValued=false. I'm trying to get multiValued content from one field to a non-multiValued text or string field for sorting. And this, if possible, during loading with copyField. Or any other solution? I need this solution due to patch SOLR-2339, which is now more strict. May be anyone else also. Hmmm, you're the second person that's relied on that (sorting on a multiValued field working). Was SOLR-2339 a mistake? -Yonik http://lucidimagination.com
Re: from multiValued field to non-multiValued field with copyField?
Good idea. I was also just looking into this area. Assuming my input record looks like this:

<documents>
  <document id="foobar">
    <element name="author"><value>author_1 ; author_2 ; author_3</value></element>
  </document>
</documents>

do you know if I can use something like this:

...
<entity name="records" processor="XPathEntityProcessor" transformer="RegexTransformer" ...>
  <field column="author" xpath="/documents/document/element[@name='author']/value" />
  <field column="author_sort" xpath="/documents/document/element[@name='author']/value" />
  <field column="author" splitBy=" ; " />
...

to just duplicate the input and make author multiValued and author_sort a string field?

Regards
Bernd

On 17.03.2011 15:39, Gora Mohanty wrote:
> Not sure about CopyField, but you could use a transformer to extract values from a multiValued field, and stick them into a single-valued field.
>
> Regards, Gora
Re: Sorting on multiValued fields via function query
Here is a workaround: stick the high value and the low value into other fields, and use those fields for sorting.

Bill Bell
Sent from mobile

On Mar 17, 2011, at 8:49 AM, Yonik Seeley yo...@lucidimagination.com wrote:
> [...]
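A hedged SolrJ-flavoured sketch of that workaround; the field names author, author_min and author_max are illustrative:

import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import org.apache.solr.common.SolrInputDocument;

public class SortFieldExample {
    // derives single-valued companion fields for sorting at index time
    static SolrInputDocument buildDoc(List<String> authors) {
        SolrInputDocument doc = new SolrInputDocument();
        for (String a : authors) {
            doc.addField("author", a); // the multiValued search field
        }
        doc.addField("author_min", Collections.min(authors)); // sort ascending on this
        doc.addField("author_max", Collections.max(authors)); // sort descending on this
        return doc;
    }

    public static void main(String[] args) {
        System.out.println(buildDoc(Arrays.asList("Zimmer", "Adams", "Miller")));
    }
}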
Re: Sorting on multiValued fields via function query
By the way, this could be done automatically by Solr or Lucene behind the scenes.

Bill Bell
Sent from mobile

On Mar 17, 2011, at 9:02 AM, Bill Bell billnb...@gmail.com wrote:
> Here is a workaround: stick the high value and the low value into other fields, and use those fields for sorting.
>
> [...]
Re: Sorting on multiValued fields via function query
Aha, oh well, not quite as good/flexible as I hoped.

Still, if lucene is now behaving somewhat more predictably/rationally when sorting on multi-valued fields, then I think, in response to your other email on a similar thread, perhaps SOLR-2339 is now a mistake. When lucene was returning completely unpredictable results -- and even sometimes crashing entirely -- when sorting on a multi-valued field, then I think in that situation it made a lot of sense for Solr to prevent you from doing that, which is, I think, what SOLR-2339 does? So I don't think it was necessarily a mistake in that context.

But if lucene can now sort a multi-valued field without crashing when there are 'too many' unique values, and with easily described and predictable semantics (use the maximal value in the multi-valued field as the sort key), then it probably makes more sense for Solr to let you do that if you really want to, and give you enough rope to hang yourself.

Jonathan

On 3/17/2011 10:49 AM, Yonik Seeley wrote:
> [...]
Re: from multiValued field to non-multiValued field with copyField?
Do you use the DIH handler? A script can do this easily.

Bill Bell
Sent from mobile

On Mar 17, 2011, at 9:02 AM, Bernd Fehling bernd.fehl...@uni-bielefeld.de wrote:
> [...]
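For the record, a minimal sketch of the script approach in the DIH data-config.xml; the function name is hypothetical and the " ; " separator follows Bernd's example:

<dataConfig>
  <script><![CDATA[
    // copies the first author into a single-valued sort field;
    // row.get('author') is the raw field value here -- adjust if it is already a list
    function firstAuthor(row) {
      var a = row.get('author');
      if (a != null) {
        row.put('author_sort', a.toString().split(' ; ')[0]);
      }
      return row;
    }
  ]]></script>
  <document>
    <!-- add script:firstAuthor to the entity's existing transformer list -->
    <entity name="records" transformer="script:firstAuthor" ...>
      ...
    </entity>
  </document>
</dataConfig>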
Re: Replication slows down massively during high load
On Mar 17, 2011, at 3:19 PM, Shawn Heisey wrote:
> On 3/17/2011 3:43 AM, Vadim Kisselmann wrote:
>> Unfortunately, this doesn't seem to be the problem. The queries themselves are running fine. The problem is that the replication is crawling when there are many queries going on and that the replication speed stays low even after the load is gone.
>
> If you run "iostat 5" what are typical values on each iteration for the various CPU states while you're doing load testing and replication at the same time? In particular, %iowait is important.

CPU stats from top (iostat doesn't seem to show CPU load correctly):

90.1%us, 4.5%sy, 0.0%ni, 5.5%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st

Seems like I/O is not the bottleneck here. Other interesting thing: when Solr starts its replication under heavy load, it tries to download the whole index from the master. From /solr/admin/replication/index.jsp:

Current Replication Status
Start Time: Thu Mar 17 15:57:20 CET 2011
Files Downloaded: 9 / 163
Downloaded: 83,04 MB / 97,75 GB [0.0%]
Downloading File: _d5x.nrm, Downloaded: 86,82 KB / 86,82 KB [100.0%]
Time Elapsed: 419s, Estimated Time Remaining: 504635s, Speed: 202,94 KB/s
Re: from multiValued field to non-multiValued field with copyField?
Hi Yonik,

Actually, some applications, like VuFind, misused sorting on a multiValued field. And as a matter of fact, FAST doesn't support this either, because it doesn't make sense. FAST distinguishes between multiValue and singleValue by just adding the separator FieldAttribute to the field. So I moved this from the FAST index profile to Solr DIH and placed the separator there. But now I'm looking for a solution for VuFind. The easiest thing would be to have a kind of casting, maybe for copyField.

Regards,
Bernd

On 17.03.2011 15:58, Yonik Seeley wrote:
> Hmmm, you're the second person that's relied on that (sorting on a multiValued field working). Was SOLR-2339 a mistake?
>
> -Yonik
> http://lucidimagination.com
Re: from multiValued field to non-multiValued field with copyField?
Hi Bill,
yes, DIH is in use.

Thanks,
Bernd

On 17.03.2011 16:09, Bill Bell wrote:
> Do you use the DIH handler? A script can do this easily.
>
> [...]
Re: Replication slows down massively during high load
You could always rsync the index dir and reload (the old replication scripts). But this is still something we should investigate. I had this same issue under high load and never really found a solution.

Did you try another NIC? See if the NIC is configured right? Routing? Speed of transfer?

Bill Bell
Sent from mobile

On Mar 17, 2011, at 9:11 AM, Vadim Kisselmann v.kisselm...@googlemail.com wrote:
> [...]
Re: Parent-child options
Hi,

- Original Message
> From: Yonik Seeley yo...@lucidimagination.com
> Subject: Re: Parent-child options
>
> What if you had this instead:
>
> color: red red orange
> size: 10 11 12
>
> Do you need for color:red to return 1 or 2 (i.e. is the final answer in units of child hits or parent hits)?

The final answer is the parent, which is shoes in this example. So:

if the query is color:red AND size:10, the answer is: Yes, we got red shoes size 10
if the query is color:red AND size:11, the answer is: Yes, we got red shoes size 11
if the query is color:red AND size:12, the answer is: No, we don't have red shoes size 12

Thanks,
Otis
Re: Solr Autosuggest help
Rahul,

Go to your Solr Admin Analysis page, enter sci/tech, check the appropriate check boxes, and see how sci/tech gets analyzed. This will lead you in the right direction.

Otis
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/

- Original Message
> From: rahul asharud...@gmail.com
> To: solr-user@lucene.apache.org
> Sent: Thu, March 17, 2011 10:12:27 AM
> Subject: Re: Solr Autosuggest help
>
> [...]
Re: from multiValued field to non-multiValued field with copyField?
Perhaps the easiest thing for you right now, which you can do in any version of Solr, is to translate your data at indexing time so you don't have to sort on a multi-valued field. Put the values in an additional field for sorting, where at index time you only put in the greatest or least value (your choice) from the multi-valued set, to have a single-valued field. Sorting on a multi-valued field before, while Solr let you, was almost certainly producing unpredictable results in some cases that you just hadn't noticed. Better to fix it up so it's predictable and reliable instead, no? On 3/17/2011 11:14 AM, Bernd Fehling wrote: Hi Yonik, actually some applications misused sorting on a multiValued field, like VuFind. And as a matter of fact FAST doesn't support this either, because it doesn't make sense. FAST distinguishes between multiValue and singleValue by just adding the separator-FieldAttribute to the field. So I moved this from the FAST index-profile to Solr DIH and placed the separator there. But now I'm looking for a solution for VuFind. Easiest thing would be to have a kind of casting, maybe for copyField. Regards, Bernd Am 17.03.2011 15:58, schrieb Yonik Seeley: On Thu, Mar 17, 2011 at 10:34 AM, Bernd Fehling bernd.fehl...@uni-bielefeld.de wrote: Is there a way to have a kind of casting for copyField? I have author names in a multiValued string field and need sorting on it, but sort on field only works for multiValued=false. I'm trying to get multiValued content from one field into a non-multiValued text or string field for sorting. And this, if possible, during loading with copyField. Or any other solution? I need this solution due to patch SOLR-2339, which is now more strict. Maybe others do too. Hmmm, you're the second person that's relied on that (sorting on a multiValued field working). Was SOLR-2339 a mistake? -Yonik http://lucidimagination.com
Re: Parent-child options
On Thu, Mar 17, 2011 at 11:21 AM, Otis Gospodnetic otis_gospodne...@yahoo.com wrote: [...] The final answer is the parent, which is shoes in this example. So: if the query is color:red AND size:10 the answer is: Yes, we got red shoes size 10 if the query is color:red AND size:11 the answer is: Yes, we got red shoes size 11 if the query is color:red AND size:12 the answer is: No, we don't have red shoes size 12 Then yes, the join patch would work (as long as it's just filtering and you don't need relevancy of child hits to propagate to the parent). parent {category:shoes} child {parent:shoes, color:red, size:10} q={!join from=parent to=category}color:red AND size:10 If you had a query on the parent type docs, the join could also be used as an fq. -Yonik http://lucidimagination.com
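To spell out that last point, a sketch of the join used as a filter query, using the field names from the example above (this assumes a build with the join patch, SOLR-2272, applied):
q=category:shoes&fq={!join from=parent to=category}color:red AND size:10
Here the main query matches parent documents, and the fq keeps only those parents that have a child matching both color:red and size:10.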
FuzzyQuery rewrite
Is rewriting fuzzy queries against the spellchecker index a good practice? When I rewrite these queries against the main index, the rewrite takes about 3.5-4 seconds. Against the spellchecker index, the rewrite takes only a few milliseconds.
Re: from multiValued field to non-multiValued field with copyField?
Better to fix it up so it's predictable and reliable instead, no? Yes, you are absolutely right. That's why I'm looking into this. But how would I stuff, say, always author_1 from a multi-valued field into a single-valued (string or text) field? OK, another solution comes to mind: writing a processor for the updateRequestProcessorChain. That might work. Regards, Bernd Am 17.03.2011 16:27, schrieb Jonathan Rochkind: Perhaps the easiest thing for you right now, which you can do in any version of Solr, is to translate your data at indexing time so you don't have to sort on a multi-valued field. [...] -- Bernd Fehling, Dipl.-Inform. (FH), Universitätsbibliothek Bielefeld, Universitätsstr. 25, 33615 Bielefeld, Tel. +49 521 106-4060, Fax +49 521 106-4052, bernd.fehl...@uni-bielefeld.de BASE - Bielefeld Academic Search Engine - www.base-search.net
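For the record, a minimal sketch of such an update processor: it copies the first value of a multi-valued author field into a single-valued author_sort field before the document is indexed. The class and field names are made up for illustration, and the import paths follow Solr 1.4 (SolrQueryResponse moved to org.apache.solr.response in later releases):

import java.io.IOException;
import java.util.Collection;
import org.apache.solr.common.SolrInputDocument;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.request.SolrQueryResponse;
import org.apache.solr.update.AddUpdateCommand;
import org.apache.solr.update.processor.UpdateRequestProcessor;
import org.apache.solr.update.processor.UpdateRequestProcessorFactory;

public class FirstAuthorCopyProcessorFactory extends UpdateRequestProcessorFactory {

  @Override
  public UpdateRequestProcessor getInstance(SolrQueryRequest req,
      SolrQueryResponse rsp, UpdateRequestProcessor next) {
    return new UpdateRequestProcessor(next) {
      @Override
      public void processAdd(AddUpdateCommand cmd) throws IOException {
        // solrDoc is a public field in Solr 1.4; later versions offer getSolrInputDocument()
        SolrInputDocument doc = cmd.solrDoc;
        Collection<Object> authors = doc.getFieldValues("author");
        if (authors != null && !authors.isEmpty()) {
          // copy only the first value into the single-valued sort field
          doc.setField("author_sort", authors.iterator().next());
        }
        super.processAdd(cmd); // hand the document on down the chain
      }
    };
  }
}

The factory would be registered in an updateRequestProcessorChain in solrconfig.xml, followed by solr.RunUpdateProcessorFactory so the add still reaches the index, and selected with the update.processor request parameter (renamed update.chain in later releases).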
Re: Replication slows down massively during high load
Hi Bill, You could always rsync the index dir and reload (old scripts). I used them previously but ran into problems with them. The application querying Solr doesn't cause enough load to trigger the issue. Yet. But this is still something we should investigate. Indeed :-) See if the NIC is configured right? Routing? Speed of transfer? Network doesn't seem to be the problem. Testing with iperf from slave to master yields a full gigabit, even while Solrmeter is hammering the server. Bill Bell Vadim
Re: Parent-child options
The standard answer, which is a kind of de-normalizing, is to index tokens like this: red_10 red_11 orange_12 In another field, you could do the same thing with size first: 10_red 11_red 12_orange Now if you want to see what sizes of red you have, you can do a facet query with facet.prefix=red_ . You'll need to do a bit of parsing/interpreting client side to translate from the results you get (red_10, red_11) to telling the users sizes 10 and 11 are available. The second field with size first lets you do the same thing to answer what colors do we have in size X? That gets unmanageable with more than 2-3 facet combinations, but with just 2 (or, pushing it, 3), it can work out okay. You'd probably ALSO want to keep the facets you have with plain values (red red orange etc.) to support that first level of user-facing faceting. There is a bit more work to do on the client side with this approach; Solr isn't just giving you exactly what you want in its response, you've got to have logic for when to use the top-level facets and when to go to that second-level combo facet (red_12), but it's do-able. On 3/17/2011 11:21 AM, Otis Gospodnetic wrote: [...] The goal is to be able to search for: 1) color:red AND size:10 and get 1 hit for the above 2) color:red AND size:12 and get *no* matches because there are no red shoes of size 12, only size 10. [...] Thanks, Otis
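A sketch of the second-level facet query this implies, assuming the combined tokens above are indexed in a field named color_size (the field name is illustrative):
?q=*:*&facet=true&facet.field=color_size&facet.prefix=red_
This would return counts only for tokens starting with red_ (red_10, red_11, ...), i.e. exactly the sizes available in red.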
memory not getting released in tomcat after pushing large documents
Hi, I am very new to Solr and facing a lot of issues when using Solr to push large documents. I have Solr running in Tomcat. I have allocated about 4 GB of memory (-Xmx), but when I push about twenty-five 100 MB documents it runs out of heap space and fails. I also tried pushing just 1 document. It went through successfully, but the Tomcat memory does not come down. It consumes about a gig of memory for just one 100 MB document and does not release it. Please let me know if I am making any mistake in configuration or set up. Here is the stack trace: SEVERE: java.lang.OutOfMemoryError: Java heap space at java.util.Arrays.copyOf(Arrays.java:2882) at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:100) at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:515) at java.lang.StringBuffer.append(StringBuffer.java:306) at java.io.StringWriter.write(StringWriter.java:77) at com.sun.org.apache.xml.internal.serializer.ToStream.processDirty(ToStream.java:1570) at com.sun.org.apache.xml.internal.serializer.ToStream.characters(ToStream.java:1488) at com.sun.org.apache.xml.internal.serializer.ToHTMLStream.characters(ToHTMLStream.java:1529) at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerHandlerImpl.characters(TransformerHandlerImpl.java:168) at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:124) at org.apache.tika.sax.SecureContentHandler.characters(SecureContentHandler.java:153) at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:124) at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:124) at org.apache.tika.sax.SafeContentHandler.access$001(SafeContentHandler.java:39) at org.apache.tika.sax.SafeContentHandler$1.write(SafeContentHandler.java:61) at org.apache.tika.sax.SafeContentHandler.filter(SafeContentHandler.java:113) at org.apache.tika.sax.SafeContentHandler.characters(SafeContentHandler.java:151) at org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:175) at org.apache.tika.parser.txt.TXTParser.parse(TXTParser.java:144) at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:142) at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:99) at com.commvault.solr.handler.extraction.CVExtractingDocumentLoader.load(CVExtractingDocumentLoader.java:349) at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131) at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:237) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1323) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:337) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:240) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206) at filters.SetCharacterEncodingFilter.doFilter(SetCharacterEncodingFilter.java:122) Thanks for help, Geeta
Info about Debugging SOLR in Eclipse
Hi, Can someone please let me know how I can debug the Solr code in Eclipse? I tried compiling the source, using the jars, and placing them in the Tomcat where I am running Solr, then doing remote debugging, but it did not stop at any breakpoint. I also tried writing a sample standalone Java class to push a document, but I could only step into the SolrJ classes, not the Solr server classes. Please let me know if I am making any mistake. Regards, Geeta
Re: memory not getting released in tomcat after pushing large documents
Hi, 25*100MB=2.5GB will most likely fail with just 4GB of heap space. But consecutive single `pushes`, as you call them, of 100 MB documents should work fine. Heap memory will only drop after the garbage collector comes along. Cheers, On Thursday 17 March 2011 17:12:46 Geeta Subramanian wrote: Hi, I am very new to Solr and facing a lot of issues when using Solr to push large documents. [...] -- Markus Jelsma - CTO - Openindex http://www.linkedin.com/in/markus17 050-8536620 / 06-50258350
Re: Info about Debugging SOLR in Eclipse
http://www.lucidimagination.com/developers/articles/setting-up-apache-solr-in-eclipse On Thursday 17 March 2011 17:17:30 Geeta Subramanian wrote: Hi, Can someone please let me know how I can debug the Solr code in Eclipse? [...] -- Markus Jelsma - CTO - Openindex http://www.linkedin.com/in/markus17 050-8536620 / 06-50258350
Re: memory not getting released in tomcat after pushing large documents
On Thu, Mar 17, 2011 at 12:12 PM, Geeta Subramanian gsubraman...@commvault.com wrote: at com.commvault.solr.handler.extraction.CVExtractingDocumentLoader.load(CVExtractingDocumentLoader.java:349) Looks like you're using a custom update handler. Perhaps that's accidentally hanging onto memory? -Yonik http://lucidimagination.com
RE: memory not getting released in tomcat after pushing large documents
Hi, Thanks for the reply. I am sorry, the logs I posted did come from a setup with a custom update handler. But I have a local setup which does not have a custom update handler (it is just as downloaded from the Solr site), and even that runs out of heap space. at java.util.Arrays.copyOf(Unknown Source) at java.lang.AbstractStringBuilder.expandCapacity(Unknown Source) at java.lang.AbstractStringBuilder.append(Unknown Source) at java.lang.StringBuilder.append(Unknown Source) at org.apache.solr.handler.extraction.SolrContentHandler.characters(SolrContentHandler.java:257) at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:124) at org.apache.tika.sax.SecureContentHandler.characters(SecureContentHandler.java:153) at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:124) at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:124) at org.apache.tika.sax.SafeContentHandler.access$001(SafeContentHandler.java:39) at org.apache.tika.sax.SafeContentHandler$1.write(SafeContentHandler.java:61) at org.apache.tika.sax.SafeContentHandler.filter(SafeContentHandler.java:113) at org.apache.tika.sax.SafeContentHandler.characters(SafeContentHandler.java:151) at org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:175) at org.apache.tika.parser.txt.TXTParser.parse(TXTParser.java:144) at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:142) at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:99) at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:112) at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:193) at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131) at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:237) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1323) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:337) Also, in general, if I post 25 * 100 MB docs to Solr, what would be the ideal heap size to set? Also, I see that when I push a single document of 100 MB, Task Manager shows about 900 MB of memory used up, and subsequent pushes keep the memory around 900 MB, so at what point can an OOM crash happen? When I ran the YourKit profiler, I saw that around 1 gig of memory was consumed just by char[] and String[]. How can I find out who is creating these (is it Solr or Tika) and free up these objects? Thank you so much for your time and help, Regards, Geeta -Original Message- From: ysee...@gmail.com [mailto:ysee...@gmail.com] On Behalf Of Yonik Seeley Sent: 17 March, 2011 12:21 PM To: solr-user@lucene.apache.org Cc: Geeta Subramanian Subject: Re: memory not getting released in tomcat after pushing large documents Looks like you're using a custom update handler. Perhaps that's accidentally hanging onto memory? -Yonik http://lucidimagination.com
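On the char[] question, some rough arithmetic (an estimate, not a measurement) suggests the numbers above are simply what buffering the extracted text costs. Java chars are 2 bytes each, and the stack trace shows the extracted text accumulating in a StringBuilder whose backing char[] roughly doubles on each expandCapacity, briefly holding both the old and the new array during the copy. So the text of one 100 MB document costs on the order of 100M chars * 2 bytes = ~200 MB for the final buffer, transiently more during the doubling copies, plus further copies when the buffer is turned into field Strings, which is broadly consistent with the ~900 MB spike seen for a single document. Those arrays become collectable garbage once the request finishes, but the JVM does not usually give heap pages back to the OS, so Task Manager continuing to show ~900 MB is expected.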
RE: Info about Debugging SOLR in Eclipse
Hi Markus, Thanks, I had already followed the steps on that site. But I am not able to debug the Solr classes, though I am able to run Solr. I want to see the code flow from the server side, especially the point where Solr calls Tika and gets the content back from Tika. Thanks for the time and help, Regards, Geeta -Original Message- From: Markus Jelsma [mailto:markus.jel...@openindex.io] Sent: 17 March, 2011 12:22 PM To: solr-user@lucene.apache.org Cc: Geeta Subramanian Subject: Re: Info about Debugging SOLR in Eclipse http://www.lucidimagination.com/developers/articles/setting-up-apache-solr-in-eclipse [...]
Re: Sorting on multiValued fields via function query
: But if lucene now can sort a multi-valued field without crashing when there : are 'too many' unique values, and with easily described and predictable : semantics (use the minimal value in the multi-valued field as sort key) -- : then it probably makes more sense for Solr to let you do that if you really : want to, give you enough rope to hang yourself. (Clarification: it's the *maximal* value that gets used by lucene in that situation) I disagree. If we do what you describe, we'd be relying on users to recognize when the sort logic is silently doing something tricky under the covers and make a conscious decision as to whether that is what they want, and if not, then change their indexing to account for it. That seems like a recipe for confusion and unexpected behavior. With SOLR-2339 in place, we tell users explicitly and up front that what you are attempting to do can not work as specified, and we force them to decide in advance how they want to deal with it -- by either indexing the lowest value or the highest value (or both in distinct fields). As the code stands now: we fail fast and let the person building the index make a decision. If we silently sort on the maximal value, we leave a nasty headache for people who don't realize they are misusing a multiValued field and then wonder why some sorts don't do what they expect in some situations. Bottom line: from day 1, we have always documented that sorting on multiValued fields (or fields that produce more than one term per document) didn't work. If people didn't notice that documentation, they aren't likely to notice any documentation that says it will sort on the maximal value either -- SOLR-2339 may introduce a pain point for people upgrading, but it introduces it early and loudly, not quietly at some arbitrary moment in the future when they're beating their heads against a desk wondering why some sort isn't working the way they expect it to because they added some more values to a few documents. -Hoss
Segments and Memory Correlate?
Hi folks, I ran into a problem today where I am no longer able to execute any queries :( due to Out of Memory issues. I am in the process of investigating the use of different mergeFactors, or even different merge policies altogether. My question is: if I have many segments (i.e. smaller-sized segments), will that also reduce the total memory in RAM required for searching? (My system is currently allocated 8 GB of RAM and has a ~255 GB index.) (I'm not fully up on the 'default merge policy', but I believe with a mergeFactor of 10 that would mean each segment should be approaching about 25 GB?) This is with ~543 million documents. Of note: this is all running on 1 server. As seen below. SEVERE: java.lang.OutOfMemoryError: Java heap space at org.apache.lucene.search.cache.LongValuesCreator.fillLongValues(LongValuesCreator.java:141) at org.apache.lucene.search.cache.LongValuesCreator.validate(LongValuesCreator.java:84) at org.apache.lucene.search.cache.LongValuesCreator.create(LongValuesCreator.java:74) at org.apache.lucene.search.cache.LongValuesCreator.create(LongValuesCreator.java:37) at org.apache.lucene.search.FieldCacheImpl$Cache.createValue(FieldCacheImpl.java:155) at org.apache.lucene.search.FieldCacheImpl$Cache.get(FieldCacheImpl.java:188) at org.apache.lucene.search.FieldCacheImpl.getLongs(FieldCacheImpl.java:337) at org.apache.lucene.search.FieldComparator$LongComparator.setNextReader(FieldComparator.java:504) at org.apache.lucene.search.TopFieldCollector$OneComparatorNonScoringCollector.setNextReader(TopFieldCollector.java:97) at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:207) at org.apache.lucene.search.Searcher.search(Searcher.java:101) at org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:1389) at org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:1285) at org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:344) at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:273) at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:210) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1324) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:337) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:240) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1157) at com.openmarket.servletfilters.LogToCSVFilter.doFilter(LogToCSVFilter.java:89) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1157) at com.openmarket.servletfilters.GZipAutoDeflateFilter.doFilter(GZipAutoDeflateFilter.java:66) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1157) ...etc -- View this message in context: http://lucene.472066.n3.nabble.com/Segments-and-Memory-Correlate-tp2694747p2694747.html Sent from the Solr - User mailing list archive at Nabble.com.
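Some rough arithmetic on the trace above, which shows the FieldCache being filled for a sort on a long field: the FieldCache allocates one long (8 bytes) per document for each field sorted on, so ~543 million documents * 8 bytes is roughly 4.3 GB for a single long sort field, before counting the index structures or any other caches; a second such sort field would already push past the 8 GB allotted. The arrays are built per segment but together still cover every document, so to a first approximation more, smaller segments change when the memory gets allocated, not how much is needed in total.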
OOM for large files
Hi, I am getting OOM after posting a 100 MB document to Solr, with this trace: Exception in thread main org.apache.solr.common.SolrException: Java heap space java.lang.OutOfMemoryError: Java heap space at java.util.Arrays.copyOf(Unknown Source) at java.lang.AbstractStringBuilder.expandCapacity(Unknown Source) at java.lang.AbstractStringBuilder.append(Unknown Source) at java.lang.StringBuilder.append(Unknown Source) at org.apache.solr.handler.extraction.SolrContentHandler.characters(SolrContentHandler.java:257) at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:124) at org.apache.tika.sax.SecureContentHandler.characters(SecureContentHandler.java:153) at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:124) at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:124) at org.apache.tika.sax.SafeContentHandler.access$001(SafeContentHandler.java:39) at org.apache.tika.sax.SafeContentHandler$1.write(SafeContentHandler.java:61) at org.apache.tika.sax.SafeContentHandler.filter(SafeContentHandler.java:113) at org.apache.tika.sax.SafeContentHandler.characters(SafeContentHandler.java:151) at org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:175) at org.apache.tika.parser.txt.TXTParser.parse(TXTParser.java:144) at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:142) at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:99) at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:112) at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:193) at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131) at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:237) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1323) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:337) at org.apache.solr.se I have given 1024M of memory, but this still fails. Can somebody tell me the minimum heap size required, relative to file size, so that the document gets indexed successfully? Also just a weird question: in Tika's code there is a place where a char[] is initialized to 4096. When this is used in a StringWriter and the array is full, it does an expandCapacity (as highlighted in the logs), which is an array copy operation. So with just 4 KB to start from, processing a 100 MB document generates a lot of char arrays, and we depend on GC to clean them up. Any idea whether initializing the char array in Tika to more than ~4 KB would bring any performance improvement? Thanks for your time, Regards, Geeta
Helpful new JVM parameters
We're on the final stretch in getting our product database into Production with Solr. We have 13m wide-ish records with quite a few stored fields in a single index (no shards). We sort on at least a dozen fields and facet on 20-30. One thing that came up in QA testing is that we were getting full GCs due to promotion failed conditions. This led us to believe we were dealing with large objects being created and a fragmented old generation. After improving, but not solving, the problem by tweaking conventional JVM parameters, our JVM expert learned about some newer tuning params included in Sun/Oracle's JDK 1.6.0_24 (we're running RHEL x64, but I think these are available on other platforms too): These 3 options dramatically reduced the number of objects getting promoted into the old gen, reducing fragmentation and the frequency and duration of CMS runs: -XX:+UseStringCache -XX:+OptimizeStringConcat -XX:+UseCompressedStrings This one uses compressed pointers on a 64-bit JVM, significantly reducing the memory performance penalty of using a 64-bit JVM over 32-bit. It reduced our new-generation GC (ParNew) time significantly: -XX:+UseCompressedOops The default for the next one was causing CMS to begin too late sometimes (the documented 68% proved false in our case; we figured it was defaulting close to 90%). Much lower than 75%, though, and CMS ran far too often: -XX:CMSInitiatingOccupancyFraction=75 This made the stop-the-world pauses during CMS much shorter: -XX:+CMSParallelRemarkEnabled We use these in conjunction with CMS/ParNew and a 22gb heap (64gb total on the box), with a 1.2G newSize/maxNewSize. In case anyone else is having similar issues, we thought we would share our experience with these newer options. James Dyer E-Commerce Systems Ingram Content Group (615) 213-4311
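For reference, the flags above assembled into a single command line. The heap and new-generation sizes follow the message (22 GB heap, 1.2 GB newSize/maxNewSize); the explicit -XX:+UseConcMarkSweepGC and -XX:+UseParNewGC pair is an assumption filled in from the 'CMS/ParNew' remark, since the exact collector flags weren't spelled out:
java -Xms22g -Xmx22g -XX:NewSize=1200m -XX:MaxNewSize=1200m -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:+UseStringCache -XX:+OptimizeStringConcat -XX:+UseCompressedStrings -XX:+UseCompressedOops -XX:CMSInitiatingOccupancyFraction=75 -XX:+CMSParallelRemarkEnabled -jar start.jar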
Re: Helpful new JVM parameters
Awesome, very helpful. Do you maybe want to add this to the Solr wiki somewhere? Finding advice on JVM tuning for Solr can be challenging, and you've explained what you did and why very well. On 3/17/2011 2:59 PM, Dyer, James wrote: We're on the final stretch in getting our product database into Production with Solr. [...]
Re: Sorting on multiValued fields via function query
On Thu, Mar 17, 2011 at 2:12 PM, Chris Hostetter hossman_luc...@fucit.org wrote: As the code stands now: we fail fast and let the person building the index make a decision. Indexing two fields when one could work is unfortunate though. I think what we should support (eventually) is a max() function that will also work on a multi-valued field and select the maximum value (i.e. it will simply bypass the check for multi-valued fields). Then one can utilize sort-by-function to do sort=max(author) asc -Yonik http://lucidimagination.com
dismax 1.4.1 and pure negative queries
Should the 1.4.1 dismax query parser be able to handle pure negative queries like: q=-foo q=-foo -bar It kind of seems to me, trying it out, that it can NOT. Can anyone else verify? The documentation I can find doesn't say one way or another. Which is odd, because the documentation for the straight solr-lucene query parser at http://wiki.apache.org/solr/SolrQuerySyntax suggests that the straight solr-lucene query parser _can_ handle pure negative. That seems odd that the solr-lucene Q.P. can, but dismax can't? Maybe I'm misinterpreting or misunderstanding my experimental results.
Re: Info about Debugging SOLR in Eclipse
Can you use jetty? http://www.lucidimagination.com/developers/articles/setting-up-apache-solr-in-eclipse On Thu, Mar 17, 2011 at 12:17 PM, Geeta Subramanian gsubraman...@commvault.com wrote: Hi, Can someone please let me know how I can debug the Solr code in Eclipse? [...]
Re: Info about Debugging SOLR in Eclipse
The instructions refer to the 'Run configuration' menu. Did you try 'Debug configurations'? On Thu, Mar 17, 2011 at 3:27 PM, Peter Keegan peterlkee...@gmail.com wrote: Can you use jetty? http://www.lucidimagination.com/developers/articles/setting-up-apache-solr-in-eclipse [...]
Adding the suggest component
Hi all, When I installed Solr, I downloaded the most recent version (1.4.1), I believe. I wanted to implement the Suggester ( http://wiki.apache.org/solr/Suggester ). I copied and pasted the information there into my solrconfig.xml file, but I'm getting the following error: Error loading class 'org.apache.solr.spelling.suggest.Suggester' I read up on this error and found that I needed to check out a newer version from SVN. I checked out a full version and copied the contents of src/java/org/apache/solr/spelling/suggest to the same location in my set up. However, I am still receiving this error. Did I not put the files in the right place? What am I doing incorrectly? Thanks, Brian Lamb
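For anyone hitting the same error: the Suggester classes first shipped with Solr 3.1 (SOLR-1316), so copying a few source files into a 1.4.1 tree is generally not enough; the lookup implementations and build wiring live elsewhere, and you need a build that actually contains them. For reference, the configuration on the wiki page cited above looks roughly like this (the field name is illustrative):
<searchComponent class="solr.SpellCheckComponent" name="suggest">
  <lst name="spellchecker">
    <str name="name">suggest</str>
    <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
    <str name="lookupImpl">org.apache.solr.spelling.suggest.tst.TSTLookup</str>
    <str name="field">name</str>
    <str name="buildOnCommit">true</str>
  </lst>
</searchComponent>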
Re: memory not getting released in tomcat after pushing large documents
In your solrconfig.xml, are you specifying ramBufferSizeMB or maxBufferedDocs? -Yonik http://lucidimagination.com On Thu, Mar 17, 2011 at 12:27 PM, Geeta Subramanian gsubraman...@commvault.com wrote: Hi, Thanks for the reply. I am sorry, the logs I posted did come from a setup with a custom update handler. But I have a local setup which does not have a custom update handler, and even that runs out of heap space. [...]
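For context, these settings live in the indexDefaults (or mainIndex) section of solrconfig.xml and bound how much Solr buffers in RAM before flushing a segment; 32 is the value shipped in the 1.4 example config, shown here for illustration:
<indexDefaults>
  <ramBufferSizeMB>32</ramBufferSizeMB>
  <!-- or, alternatively: <maxBufferedDocs>1000</maxBufferedDocs> -->
</indexDefaults>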
RE: memory not getting released in tomcat after pushing large documents
Hi Yonik, I am not setting the ramBufferSizeMB or maxBufferedDocs params... Do I need to for indexing? Regards, Geeta -Original Message- From: ysee...@gmail.com [mailto:ysee...@gmail.com] On Behalf Of Yonik Seeley Sent: 17 March, 2011 3:45 PM To: Geeta Subramanian Cc: solr-user@lucene.apache.org Subject: Re: memory not getting released in tomcat after pushing large documents In your solrconfig.xml, are you specifying ramBufferSizeMB or maxBufferedDocs? -Yonik http://lucidimagination.com [...]
Re: dismax 1.4.1 and pure negative queries
Hi, It works just as expected, but not in a phrase query. Get rid of your quotes and you'll be fine. Cheers, Should the 1.4.1 dismax query parser be able to handle pure negative queries like: q=-foo q=-foo -bar [...]
Re: dismax 1.4.1 and pure negative queries
My fault for putting the quotes in the email; I actually don't have quotes in my queries, and I just tried again to make sure. And I always get 0 results on a pure negative Solr 1.4.1 dismax query. I think it does not actually work? On 3/17/2011 3:52 PM, Markus Jelsma wrote: Hi, It works just as expected, but not in a phrase query. Get rid of your quotes and you'll be fine. [...]
Re: dismax 1.4.1 and pure negative queries
Oh I see, I overlooked your first query. A query with one term that is negated will yield zero results; it doesn't return all documents because nothing matches. If I remember correctly, it's the same as when you're looking for a field that doesn't have a value: q=-field:[* TO *]. My fault for putting the quotes in the email; I actually don't have quotes in my queries, and I just tried again to make sure. And I always get 0 results on a pure negative Solr 1.4.1 dismax query. I think it does not actually work? [...]
Re: dismax 1.4.1 and pure negative queries
Purely negative queries work with Solr's default (lucene) query parser, but don't with dismax. Or so it seems from my experience testing this out just now, on trunk. In chatting with Jonathan further off-list, we discussed having the best of both worlds: q={!lucene}*:* AND NOT _query_:"{!dismax ...}inverse of original negative query" But this of course requires detecting that a query is all negative. edismax can handle purely negative, FWIW: -ipod = +(-DisjunctionMaxQuery((text:ipod)) +MatchAllDocsQuery(*:*)) Erik On Mar 17, 2011, at 16:45 , Markus Jelsma wrote: Oh I see, I overlooked your first query. A query with one term that is negated will yield zero results; it doesn't return all documents because nothing matches. [...]
Re: dismax 1.4.1 and pure negative queries
Yeah, looks to me like two or more negated terms do the same thing, not just one. q=-foo -bar -baz also always returns zero hits, for the same reason. I understand why (sort of), although at the same time there is a logical answer to the question -foo -bar -baz, and oddly, the 1.4.1 _lucene_ query parser _can_ handle it. Erik Hatcher in IRC gave me one transformation of this query that still uses dismax as a unit, but can get you a solution. (I want to use dismax in this case for its convenient aggregation of multiple fields in qf, not so much for actual disjunction-maximum behavior.) defType=lucene q=*:* AND NOT _query_:"{!dismax}foo bar baz" I might be able to work with that in my situation. But it also seems like something that dismax could take care of for you in such a situation. It looks from the documentation like the newer (not in 1.4.1) edismax does in at least some cases, where the pure negative query is inside grouping/subquery parens; it's not clear to me if it does it in general or not. On 3/17/2011 4:45 PM, Markus Jelsma wrote: Oh I see, I overlooked your first query. [...]
RE: Info about Debugging SOLR in Eclipse
Hi All, Thanks for the help... I am now able to debug my Solr. :-) -Original Message- From: pkeegan01...@gmail.com [mailto:pkeegan01...@gmail.com] On Behalf Of Peter Keegan Sent: 17 March, 2011 3:33 PM To: solr-user@lucene.apache.org Subject: Re: Info about Debugging SOLR in Eclipse The instructions refer to the 'Run configuration' menu. Did you try 'Debug configurations'? [...]
Re: dismax 1.4.1 and pure negative queries
On 3/17/2011 5:02 PM, Jonathan Rochkind wrote:
defType=lucene
q=*:* AND NOT _query_:"{!dismax} foo bar baz"

Oops, forgot a part, for anyone reading this and wanting to use it as a solution. You can transform:

defType=dismax
q=-foo -bar -baz

to:

defType=lucene
q=*:* AND NOT _query_:"{!dismax mm=1}foo bar baz"

and have basically equivalent semantics to what you meant, but which dismax won't do. The mm=1 is important; I left that out before.

Jonathan
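The same transformed query can be built through SolrJ; a minimal sketch, assuming hypothetical qf fields 'title' and 'text' (escaping the inner quotes is the only tricky part):

    import org.apache.solr.client.solrj.SolrQuery;

    public class PureNegativeQuery {
        public static SolrQuery build() {
            SolrQuery query = new SolrQuery();
            // top-level parser is the lucene parser; dismax runs as a nested query
            query.set("defType", "lucene");
            query.setQuery("*:* AND NOT _query_:\"{!dismax qf='title text' mm=1}foo bar baz\"");
            return query;
        }
    }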
DIH Issue (newbie to Solr)
I am a newbie to Solr. I have an issue with DIH but am unable to pinpoint what is causing it. I am using the demo Jetty installation of Solr and tried to create a project with new schema.xml, solrconfig.xml and data-config.xml files. When I run http://131.187.88.221:8983/solr/dataimport?command=full-import, I am unable to index documents (it doesn't throw any error though). The status response shows:

    test-data-config.xml
    full-import
    idle
    2011-03-17 17:07:18
    Indexing completed. Added/Updated: 0 documents. Deleted 0 documents.
    2011-03-17 17:07:18
    2011-03-17 17:07:18
    Time taken: 0:0:0.119
    This response format is experimental. It is likely to change in the future.

I do not find any log files (except on the console). And here are the messages from the console:

INFO: Starting Full Import
Mar 17, 2011 5:08:20 PM org.apache.solr.handler.dataimport.SolrWriter readIndexerProperties
INFO: Read dataimport.properties
Mar 17, 2011 5:08:20 PM org.apache.solr.update.DirectUpdateHandler2 deleteAll
INFO: [] REMOVING ALL DOCUMENTS FROM INDEX
Mar 17, 2011 5:08:20 PM org.apache.solr.core.SolrDeletionPolicy onInit
INFO: SolrDeletionPolicy.onInit: commits:num=1
commit{dir=/local/home/neha/ruby/Solr/apache-solr-1.4.1/example/solr/data/index,segFN=segments_k,version=1300286691490,generation=20,filenames=[segments_k]
Mar 17, 2011 5:08:20 PM org.apache.solr.core.SolrDeletionPolicy updateCommits
INFO: newest commit = 1300286691490
Mar 17, 2011 5:08:20 PM org.apache.solr.handler.dataimport.DocBuilder finish
INFO: Import completed successfully
Mar 17, 2011 5:08:20 PM org.apache.solr.update.DirectUpdateHandler2 commit
INFO: start commit(optimize=true,waitFlush=false,waitSearcher=true,expungeDeletes=false)
Mar 17, 2011 5:08:20 PM org.apache.solr.core.SolrDeletionPolicy onCommit
INFO: SolrDeletionPolicy.onCommit: commits:num=2
commit{dir=/local/home/neha/ruby/Solr/apache-solr-1.4.1/example/solr/data/index,segFN=segments_k,version=1300286691490,generation=20,filenames=[segments_k]
commit{dir=/local/home/neha/ruby/Solr/apache-solr-1.4.1/example/solr/data/index,segFN=segments_l,version=1300286691491,generation=21,filenames=[segments_l]
Mar 17, 2011 5:08:20 PM org.apache.solr.core.SolrDeletionPolicy updateCommits
INFO: newest commit = 1300286691491
Mar 17, 2011 5:08:20 PM org.apache.solr.search.SolrIndexSearcher
INFO: Opening Searcher@d1329 main
Mar 17, 2011 5:08:20 PM org.apache.solr.search.SolrIndexSearcher warm
INFO: autowarming Searcher@d1329 main from Searcher@1dcc2a3 main
fieldValueCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=8,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0,item_subject_topic_facet={field=subject_topic_facet,memSize=4224,tindexSize=32,time=0,phase1=0,nTerms=0,bigTerms=0,termInstances=0,uses=2},item_subject_geo_facet={field=subject_geo_facet,memSize=4224,tindexSize=32,time=0,phase1=0,nTerms=0,bigTerms=0,termInstances=0,uses=2},item_subject_era_facet={field=subject_era_facet,memSize=4224,tindexSize=32,time=0,phase1=0,nTerms=0,bigTerms=0,termInstances=0,uses=2},item_pub_date={field=pub_date,memSize=4224,tindexSize=32,time=0,phase1=0,nTerms=0,bigTerms=0,termInstances=0,uses=2},item_language_facet={field=language_facet,memSize=4224,tindexSize=32,time=0,phase1=0,nTerms=0,bigTerms=0,termInstances=0,uses=2},item_lc_b4cutter_facet={field=lc_b4cutter_facet,memSize=4224,tindexSize=32,time=0,phase1=0,nTerms=0,bigTerms=0,termInstances=0,uses=2},item_lc_alpha_facet={field=lc_alpha_facet,memSize=4224,tindexSize=32,time=0,phase1=0,nTerms=0,bigTerms=0,termInstances=0,uses=2},item_lc_1letter_facet={field=lc_1letter_facet,memSize=4224,tindexSize=32,time=0,phase1=0,nTerms=0,bigTerms=0,termInstances=0,uses=2}}
Mar 17, 2011 5:08:20 PM org.apache.solr.search.SolrIndexSearcher warm
INFO: autowarming result for Searcher@d1329 main
fieldValueCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
Mar 17, 2011 5:08:20 PM org.apache.solr.search.SolrIndexSearcher warm
INFO: autowarming Searcher@d1329 main from Searcher@1dcc2a3 main
filterCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=2,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
Mar 17, 2011 5:08:20 PM org.apache.solr.search.SolrIndexSearcher warm
INFO: autowarming result for Searcher@d1329 main
filterCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=2,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
Mar 17, 2011 5:08:20 PM org.apache.solr.search.SolrIndexSearcher warm
INFO: autowarming Searcher@d1329 main from Searcher@1dcc2a3 main
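For comparison, a minimal data-config.xml that DIH can run against a JDBC source looks roughly like the sketch below; the driver, URL, table, and column names are all hypothetical. When an import reports "Added/Updated: 0 documents" without errors, a common culprit is that the columns returned by the query don't map onto fields defined in schema.xml, or the schema's uniqueKey field is never populated, so the column-to-field mapping is worth double-checking.

    <dataConfig>
      <dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver"
                  url="jdbc:mysql://localhost/mydb" user="dbuser" password="dbpass"/>
      <document>
        <entity name="item" query="select id, title from item">
          <!-- column = what the SQL query returns, name = the schema.xml field -->
          <field column="id" name="id"/>
          <field column="title" name="title"/>
        </entity>
      </document>
    </dataConfig>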
Retrieving Ranking (Position)
Hi, I am looking for a way to retrieve the ranking (or position) of a matched document in the result set. I can get the data and then parse it to find the position of the matched document, but I'm wondering if there is a built-in feature for this. Thanks, Jae
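As far as I know there is no built-in rank field in the response; assuming the default relevancy ordering, the position is just the page offset plus the document's index within the returned page, which is the parsing approach described above. A client-side sketch in SolrJ (the URL, query, and the "doc42" id value are hypothetical):

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
    import org.apache.solr.client.solrj.response.QueryResponse;
    import org.apache.solr.common.SolrDocumentList;

    public class RankFinder {
        public static void main(String[] args) throws Exception {
            SolrServer solr = new CommonsHttpSolrServer("http://localhost:8983/solr");
            QueryResponse rsp = solr.query(new SolrQuery("title:solr").setStart(0).setRows(100));
            SolrDocumentList docs = rsp.getResults();
            for (int i = 0; i < docs.size(); i++) {
                if ("doc42".equals(docs.get(i).getFieldValue("id"))) {
                    // 1-based position within the full result set
                    System.out.println("rank = " + (docs.getStart() + i + 1));
                }
            }
        }
    }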
Re: memory not getting released in tomcat after pushing large documents
On Thu, Mar 17, 2011 at 3:55 PM, Geeta Subramanian gsubraman...@commvault.com wrote:
Hi Yonik, I am not setting the ramBufferSizeMB or maxBufferedDocs params... Do I need to for indexing?

No, the default settings that come with Solr should be fine. You should verify that they have not been changed, however. An older solrconfig that used maxBufferedDocs could cause an OOM with large documents, since it buffered a certain number of documents instead of a certain amount of RAM. Perhaps post your solrconfig (or at least the sections related to index configuration).

-Yonik
http://lucidimagination.com

Regards, Geeta

-Original Message-
From: ysee...@gmail.com [mailto:ysee...@gmail.com] On Behalf Of Yonik Seeley
Sent: 17 March, 2011 3:45 PM
To: Geeta Subramanian
Cc: solr-user@lucene.apache.org
Subject: Re: memory not getting released in tomcat after pushing large documents

In your solrconfig.xml, are you specifying ramBufferSizeMB or maxBufferedDocs?

-Yonik
http://lucidimagination.com

On Thu, Mar 17, 2011 at 12:27 PM, Geeta Subramanian gsubraman...@commvault.com wrote:
Hi, Thanks for the reply. I am sorry, the logs I posted from do have a custom update handler. But I have a local setup which does not have a custom update handler; it is as downloaded from the Solr site, and even that gives me heap-space errors:

at java.util.Arrays.copyOf(Unknown Source)
at java.lang.AbstractStringBuilder.expandCapacity(Unknown Source)
at java.lang.AbstractStringBuilder.append(Unknown Source)
at java.lang.StringBuilder.append(Unknown Source)
at org.apache.solr.handler.extraction.SolrContentHandler.characters(SolrContentHandler.java:257)
at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:124)
at org.apache.tika.sax.SecureContentHandler.characters(SecureContentHandler.java:153)
at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:124)
at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:124)
at org.apache.tika.sax.SafeContentHandler.access$001(SafeContentHandler.java:39)
at org.apache.tika.sax.SafeContentHandler$1.write(SafeContentHandler.java:61)
at org.apache.tika.sax.SafeContentHandler.filter(SafeContentHandler.java:113)
at org.apache.tika.sax.SafeContentHandler.characters(SafeContentHandler.java:151)
at org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:175)
at org.apache.tika.parser.txt.TXTParser.parse(TXTParser.java:144)
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:142)
at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:99)
at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:112)
at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:193)
at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:237)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1323)
at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:337)

Also, in general, if I post 25 * 100 MB docs to Solr, how much heap space should ideally be set?
Also, I see that when I push a single document of 100 MB, Task Manager shows about 900 MB of memory used up, and subsequent pushes keep the memory around 900 MB, so at what point can there be an OOM crash? When I ran the YourKit profiler, I saw that around 1 GB of memory was consumed just by char[] and String[]. How can I find out who is creating these (is it Solr or Tika) and free up these objects?

Thank you so much for your time and help,
Regards, Geeta

-Original Message-
From: ysee...@gmail.com [mailto:ysee...@gmail.com] On Behalf Of Yonik Seeley
Sent: 17 March, 2011 12:21 PM
To: solr-user@lucene.apache.org
Cc: Geeta Subramanian
Subject: Re: memory not getting released in tomcat after pushing large documents

On Thu, Mar 17, 2011 at 12:12 PM, Geeta Subramanian gsubraman...@commvault.com wrote:
at com.commvault.solr.handler.extraction.CVExtractingDocumentLoader.load(CVExtractingDocumentLoader.java:349)

Looks like you're using a custom update handler. Perhaps that's accidentally hanging onto memory?

-Yonik
http://lucidimagination.com
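For reference, the index-buffering settings Yonik is asking about live in the <indexDefaults> section of a 1.4-era solrconfig.xml. A minimal sketch (32 is the stock default discussed in this thread, not a recommendation):

    <indexDefaults>
      <ramBufferSizeMB>32</ramBufferSizeMB>
      <!-- buffering by document count instead of by RAM can cause OOMs with large documents: -->
      <!-- <maxBufferedDocs>1000</maxBufferedDocs> -->
    </indexDefaults>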
Re: Rename fields in a query
Given a Query object (name:firefox name:opera), is it possible to 'rename' the field names to, for example, (content:firefox content:opera)?

By saying object, do you mean SolrJ? Anyway, if that helps, with the df parameter you can change fields:

q=firefox opera&df=name will be parsed into name:firefox name:opera
q=firefox opera&df=content will be parsed into content:firefox content:opera
Re: memory not getting released in tomcat after pushing large documents
On Thu, Mar 17, 2011 at 5:50 PM, Geeta Subramanian gsubraman...@commvault.com wrote:
Here is the attached xml. In our xml, maxBufferedDocs is commented out. I hope that's not causing any issue. The ramBufferSizeMB is 32MB; will changing this be of any use to me?

Nope... your index settings are fine. Perhaps something in the extracting request handler or Tika is holding onto memory. Has anyone else experienced/reproduced this? Geeta, can you open a JIRA issue? If you're actually giving the JVM 4G of heap (is this a 64-bit JVM?), this looks like a bug somewhere.

-Yonik
http://lucidimagination.com
Spatial Search Field Type
I am using Solr 1.4.1 (Solr Implementation Version: 1.4.1 955763M - mark - 2010-06-17 18:06:42) to be exact. I'm trying to implement the geospatial field type by adding this to the schema:

<fieldType name="location" class="solr.LatLonType" subFieldSuffix="_latLon"/>
<dynamicField name="*_latlon" type="location" index="true" stored="true"/>
<field name="geo" type="location" index="true" stored="true" multiValued="false"/>

but I get the following errors:

org.apache.solr.common.SolrException: Unknown fieldtype 'location' specified on field geo

and

org.apache.solr.common.SolrException: Error loading class 'solr.LatLonType'

I thought I read that you had to have Solr 4.0 for the LatLon field type, but isn't 1.4 = 4.0? Do I need some type of patch or different version of Solr to use that field type?
Re: Spatial Search Field Type
I thought I read that you had to have Solr 4.0 for the LatLon field type, but isn't 1.4 = 4.0? Do I need some type of patch or different version of Solr to use that field type?

No, 1.4 and 4.0 are different. You can check out trunk: http://wiki.apache.org/solr/HowToContribute#Getting_the_source_code
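At the time of writing, getting trunk was a Subversion checkout along these lines (the repository path is the one the wiki page above documents; the ant target is the one used for building the example server):

    svn checkout http://svn.apache.org/repos/asf/lucene/dev/trunk lucene-trunk
    cd lucene-trunk/solr
    ant example

Note that trunk is unreleased development code, so expect the index format and APIs to change between checkouts.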
Re: Rename fields in a query
hi, Arslan! By object, I meant an instance of org.apache.lucene.search.Query. For performance purposes, I want to rewrite a fuzzy query in one field and then query in another. Thank you!

On Thu, Mar 17, 2011 at 18:43, Ahmet Arslan iori...@yahoo.com wrote:
By saying object, do you mean SolrJ? Anyway, if that helps, with the df parameter you can change fields.
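For what it's worth, a sketch of that kind of rewrite against the Lucene 2.9-era API that Solr 1.4.1 ships, handling only TermQuery and BooleanQuery; other Query types (including the fuzzy clauses mentioned above) would need their own cases:

    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.BooleanClause;
    import org.apache.lucene.search.BooleanQuery;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.TermQuery;

    public class QueryFieldRenamer {
        // Rebuilds the query tree with every term pointed at newField.
        public static Query renameField(Query q, String newField) {
            if (q instanceof TermQuery) {
                Term t = ((TermQuery) q).getTerm();
                Query renamed = new TermQuery(new Term(newField, t.text()));
                renamed.setBoost(q.getBoost());
                return renamed;
            }
            if (q instanceof BooleanQuery) {
                BooleanQuery out = new BooleanQuery();
                for (BooleanClause clause : ((BooleanQuery) q).getClauses()) {
                    out.add(renameField(clause.getQuery(), newField), clause.getOccur());
                }
                out.setBoost(q.getBoost());
                return out;
            }
            return q; // other query types are left untouched in this sketch
        }
    }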
Re: Smart Pagination queries
: In order to paint Next links the app would have to know the total number of
: records that the user is eligible to read. getNumFound() will tell me that
: there are 4K records total that Solr returned. If there weren't any
: entitlement rules then it would be easy to determine how many
: Next links to paint, and when the user clicks on Next, pass in the start
: position appropriately in the Solr query. Since I have to apply a post filter
: as results are fetched from Solr, is there a better way to achieve this?

In an ideal world, you would do this using a custom plugin -- either a SearchComponent or a QParser used in a filter query.

If you really have to do this client side, then a few basic rules come to mind (see the sketch after this list):

1) Always over-request. If you estimate that your user can only view 1/X docs in your total collection, and you want to show Y results per page, then your rows param should be at least 2*X*Y (I picked 2 just for good measure; just because you know the average doesn't mean you know the real distribution).

2) However many rows you get back, you need to keep track of the real start param you used, and at what point in the current page you had enough docs to show the user -- that will determine your next start param.

3) Whether you have a next link or not depends on:
3a) whether you had any left over the first time you over-requested (see #2 above)
3b) whether numFound was greater than the index of the last item you got.

...if 3a and 3b are both false, you definitely don't need a next link. If either of them is true, then you probably *should* give them a next link, but you still need to be prepared for the possibility that you won't have any more docs (they might only be half way through the result set, but every remaining doc might be something they aren't allowed to see). There's really no clean way to avoid that possibility completely, unless you really crank up how aggressively you over-request -- ultimately, if you over-request *all* matches, then you can know definitively whether to give them a next link at any point.

-Hoss
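A client-side sketch of that over-request-and-post-filter loop in SolrJ terms; canUserSee() stands in for whatever application-specific entitlement check applies, PAGE_SIZE is Y and VISIBILITY is X from rule #1:

    import java.util.ArrayList;
    import java.util.List;
    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.response.QueryResponse;
    import org.apache.solr.common.SolrDocument;
    import org.apache.solr.common.SolrDocumentList;

    public class EntitledPager {
        static final int PAGE_SIZE = 10;   // Y: results shown per page
        static final int VISIBILITY = 4;   // X: user can see roughly 1 in X docs

        long nextStart;     // rule #2: where the next page should begin
        boolean maybeMore;  // rule #3: whether a Next link is warranted

        List<SolrDocument> fetchPage(SolrServer solr, String q, int start) throws Exception {
            int rows = 2 * VISIBILITY * PAGE_SIZE;   // rule #1: over-request
            QueryResponse rsp = solr.query(new SolrQuery(q).setStart(start).setRows(rows));
            SolrDocumentList results = rsp.getResults();

            List<SolrDocument> page = new ArrayList<SolrDocument>();
            int consumed = 0;
            for (SolrDocument doc : results) {
                consumed++;
                if (canUserSee(doc)) {               // entitlement post-filter
                    page.add(doc);
                    if (page.size() == PAGE_SIZE) break;
                }
            }
            nextStart = start + consumed;
            boolean leftOver = consumed < results.size();              // rule 3a
            boolean deeperMatches = results.getNumFound() > nextStart; // rule 3b
            maybeMore = leftOver || deeperMatches;   // may still be a false positive, per Hoss
            return page;
        }

        boolean canUserSee(SolrDocument doc) { return true; }  // hypothetical entitlement check
    }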
Re: Custom search filters
: Hi all, I am trying to use a custom search filter
: (org.apache.lucene.search.Filter) but I am unsure of where I should configure
: this.
:
: Would I have to create my own SearchHandler that would wrap this logic in? Any
: examples/suggestions out there?

The easiest way to plug in a custom Filter is to wrap it in a ConstantScoreQuery and use it as part of the filters that SolrIndexSearcher applies (that way it will be cached independently and can be reused).

You could do this in a SearchComponent, where you decide when to generate the Filter based on query params and then add it explicitly (see ResponseBuilder.getFilters()). Or you could do it in a QParserPlugin, in which case clients could optionally enable it by referring to your QParser by name in the local params of an fq param.

-Hoss
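A sketch of the SearchComponent route against a 1.4-era API; buildCustomFilter() and the "myfilter" param are hypothetical, and the QParserPlugin route would look different:

    import java.io.IOException;
    import java.util.ArrayList;
    import org.apache.lucene.search.ConstantScoreQuery;
    import org.apache.lucene.search.Filter;
    import org.apache.lucene.search.Query;
    import org.apache.solr.handler.component.ResponseBuilder;
    import org.apache.solr.handler.component.SearchComponent;
    import org.apache.solr.request.SolrQueryRequest;

    public class CustomFilterComponent extends SearchComponent {
        @Override
        public void prepare(ResponseBuilder rb) throws IOException {
            // only apply the filter when the client asks for it
            if (rb.req.getParams().getBool("myfilter", false)) {
                Filter filter = buildCustomFilter(rb.req);  // hypothetical: build your Filter here
                if (rb.getFilters() == null) {
                    rb.setFilters(new ArrayList<Query>());
                }
                // wrapping in ConstantScoreQuery lets SolrIndexSearcher cache it like any fq
                rb.getFilters().add(new ConstantScoreQuery(filter));
            }
        }

        @Override
        public void process(ResponseBuilder rb) throws IOException {
            // nothing to do at process time; the filter was registered in prepare()
        }

        private Filter buildCustomFilter(SolrQueryRequest req) {
            throw new UnsupportedOperationException("application-specific");
        }

        @Override public String getDescription() { return "adds a custom Lucene Filter as an fq"; }
        @Override public String getSourceId() { return ""; }
        @Override public String getSource() { return ""; }
        @Override public String getVersion() { return "1.0"; }
    }

The component would then be registered in solrconfig.xml and added to the handler's component list, the same as any other SearchComponent.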
Re: Helpful new JVM parameters
Will UseCompressedOops be useful? For an application using less than 4GB of memory, it will be better than 64-bit references. But for an application using more memory than that, it will not be cache friendly. The JRockit: The Definitive Guide book says: "Naturally, 64 GB isn't a theoretical limit but just an example. It was mentioned because compressed references on 64-GB heaps have proven beneficial compared to full 64-bit pointers in some benchmarks and applications. What really matters is how many bits can be spared and the performance benefit of this approach. In some cases, it might just be easier to use full length 64-bit pointers."

2011/3/18 Dyer, James james.d...@ingrambook.com:
We're on the final stretch in getting our product database into production with Solr. We have 13m wide-ish records with quite a few stored fields in a single index (no shards). We sort on at least a dozen fields and facet on 20-30. One thing that came up in QA testing is we were getting full GCs due to "promotion failed" conditions. This led us to believe we were dealing with large objects being created and a fragmented old generation. After improving, but not solving, the problem by tweaking conventional JVM parameters, our JVM expert learned about some newer tuning params included in Sun/Oracle's JDK 1.6.0_24 (we're running RHEL x64, but I think these are available on other platforms too):

These 3 options dramatically reduced the number of objects getting promoted into the old gen, reducing fragmentation and CMS frequency/time:
-XX:+UseStringCache
-XX:+OptimizeStringConcat
-XX:+UseCompressedStrings

This uses compressed pointers on a 64-bit JVM, significantly reducing the memory performance penalty of using a 64-bit JVM over 32-bit. This reduced our new GC (ParNew) time significantly:
-XX:+UseCompressedOops

The default for this was causing CMS to begin too late sometimes (the documented 68% proved false in our case; we figured it was defaulting close to 90%). Much lower than 75%, though, and CMS ran far too often:
-XX:CMSInitiatingOccupancyFraction=75

This made the stop-the-world pauses during CMS much shorter:
-XX:+CMSParallelRemarkEnabled

We use these in conjunction with CMS/ParNew and a 22GB heap (64GB total on the box), with a 1.2G newSize/maxNewSize. In case anyone else is having similar issues, we thought we would share our experience with these newer options.

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311
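Pulling James's flags together, an illustrative (not prescriptive) command line for a setup like the one he describes might look as follows; the heap and new-generation sizes are the ones from his message and obviously need tuning per machine:

    java -server -Xms22g -Xmx22g \
         -XX:NewSize=1200m -XX:MaxNewSize=1200m \
         -XX:+UseConcMarkSweepGC -XX:+UseParNewGC \
         -XX:+UseStringCache -XX:+OptimizeStringConcat -XX:+UseCompressedStrings \
         -XX:+UseCompressedOops \
         -XX:CMSInitiatingOccupancyFraction=75 \
         -XX:+CMSParallelRemarkEnabled \
         -jar start.jar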
RE: Helpful new JVM parameters
Our tests showed, in our situation, the compressed oops flag caused our minor (ParNew) generation time to decrease significantly. We're using a larger heap (22GB) and our index size is somewhere in the 40s of GB total. I guess with any of these JVM parameters, it all depends on your situation and you need to test. In our case, this flag solved a real problem we were having. Whoever wrote the JRockit book you refer to no doubt had other scenarios in mind...

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311