Re: How to make UnInvertedField faster?
On Fri, Oct 21, 2011 at 4:37 PM, Michael McCandless <luc...@mikemccandless.com> wrote:
> Well... the limitation of DocValues is that it cannot handle more than
> one value per document (which UnInvertedField can).

you can pack this into one byte[] or use more than one field? I don't
see a real limitation here.

simon

> Hopefully we can fix that at some point :)
>
> Mike McCandless
> http://blog.mikemccandless.com
>
> On Fri, Oct 21, 2011 at 7:50 AM, Simon Willnauer <simon.willna...@googlemail.com> wrote:
>> In trunk we have a feature called IndexDocValues which basically creates
>> the uninverted structure at index time. You can then simply suck that
>> into memory or even access it on disk directly (RandomAccess). Even if I
>> can't help you right now, this is certainly going to help you here.
>> There is no need to uninvert at all anymore in Lucene 4.0.
>>
>> simon
>>
>> On Wed, Oct 19, 2011 at 8:05 PM, Michael Ryan <mr...@moreover.com> wrote:
>>> I was wondering if anyone has any ideas for making
>>> UnInvertedField.uninvert() faster, or other alternatives for generating
>>> facets quickly. The vast majority of the CPU time for our Solr
>>> instances is spent generating UnInvertedFields after each commit.
>>> Here's an example of one of our slower fields:
>>>
>>>   [2011-10-19 17:46:01,055] INFO 125974 [pool-1-thread-1] - (SolrCore:440) -
>>>   UnInverted multi-valued field {field=authorCS, memSize=38063628,
>>>   tindexSize=422652, time=15610, phase1=15584, nTerms=1558514,
>>>   bigTerms=0, termInstances=4510674, uses=0}
>>>
>>> That is from an index with approximately 8 million documents. After
>>> each commit, it takes on average about 90 seconds to uninvert all the
>>> fields that we facet on. Any ideas at all would be greatly appreciated.
>>>
>>> -Michael
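A note for readers landing on this thread later: the IndexDocValues work
Simon describes did ship, and the multi-valued case Mike calls out was later
covered by SortedSetDocValuesField, added in Lucene 4.2, after this exchange.
A minimal indexing sketch under that later API, reusing the authorCS field
from Michael Ryan's log line above; the author values are made up, and this
is an illustration of the eventual API, not the trunk code being discussed:

    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.SortedSetDocValuesField;
    import org.apache.lucene.util.BytesRef;

    public class DocValuesFacetSketch {
        static Document makeDoc() {
            Document doc = new Document();
            // One field instance per value; the column-stride structure is
            // written at index time, so nothing is rebuilt after a commit.
            doc.add(new SortedSetDocValuesField("authorCS", new BytesRef("Smith, J.")));
            doc.add(new SortedSetDocValuesField("authorCS", new BytesRef("Doe, A.")));
            return doc;
        }
    }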
Re: Solr Open File Descriptors
Thanks for sharing your insights, Shawn.

On Mon, Oct 17, 2011 at 1:27 AM, Shawn Heisey <s...@elyograg.org> wrote:
> On 10/16/2011 12:01 PM, samarth s wrote:
>> Hi,
>>
>> Is it safe to assume that with a mergeFactor of 10, the open file
>> descriptors required by Solr would be around (1 + 10) * 10 = 110?
>> ref: http://onjava.com/pub/a/onjava/2003/03/05/lucene.html#indexing_speed
>>
>> The Solr wiki
>> (http://wiki.apache.org/solr/SolrPerformanceFactors#Optimization_Considerations)
>> states that the FDs required per segment are around 7. Are these
>> estimates appropriate? Does it in any way depend on the size of the
>> index or the number of docs (assuming the same number of segments in
>> any case) as well?
>
> My index has 10 files per normal segment (the usual 7 plus three more for
> term vectors). Some of the segments also have a .del file, and there is a
> segments_* file and a segments.gen file. Your servlet container and other
> parts of the OS will also have to open files.
>
> I have personally seen three levels of segment merging taking place at
> the same time on a slow filesystem during a full-import, along with new
> content coming in at the same time. With a mergeFactor of 10, each merge
> involves 11 segments - the ten that are being merged and the merged
> segment. If you have three going on at the same time, that's 33 segments,
> and you can have up to 10 more that are actively being built by ongoing
> index activity, so that's 43 potential segments. If your filesystem is
> REALLY slow, you might end up with even more segments as existing merges
> are paused for new ones to start, but if you run into that, you'll want
> to update your hardware, so I won't consider it.
>
> Multiplying 43 segments by 11 files per segment yields a working
> theoretical maximum of 473 files. Add in the segments files and you're up
> to 475. Most operating systems have a default FD limit that's at least
> 1024. If you only have one index (core) on your Solr server, Solr is the
> only thing running on that server, and it's using the default mergeFactor
> of 10, you should be fine with the default. If you are going to have more
> than one index on your Solr server (such as a build core and a live
> core), you plan to run other things on the server, or you want to
> increase your mergeFactor significantly, you might need to adjust the OS
> configuration to allow more file descriptors.
>
> Thanks,
> Shawn

--
Regards,
Samarth
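For reference, mergeFactor is set in solrconfig.xml. A minimal sketch with
Shawn's arithmetic restated in the comment; the value shown is just the
default he assumes above:

    <!-- 3 concurrent merges * 11 segments + 10 being built = 43 segments;
         43 segments * 11 files each + 2 segments files = ~475 descriptors,
         comfortably under a typical default OS limit of 1024. -->
    <indexDefaults>
      <mergeFactor>10</mergeFactor>
    </indexDefaults>

A lower mergeFactor means fewer, larger segments (fewer open files, slower
indexing); raising it shifts that trade-off the other way.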
Bet you didn't know Lucene can...
Hi All,

I'm giving a talk at ApacheCon titled "Bet you didn't know Lucene can..."
(http://na11.apachecon.com/talks/18396). It's based on my observation that,
over the years, a number of us in the community have done some pretty cool
things using Lucene/Solr that don't fit under the core premise of full text
search.

I've got a fair number of ideas for the talk (easily enough for 1 hour), but
I wanted to reach out to hear your stories of ways you've (ab)used Lucene
and Solr, to see if we couldn't extend the conversation to a bit more than
the conference, and also to see if I can't inject more ideas beyond the ones
I have. I don't need deep technical details - just the high-level use case
and the basic insight that led you to believe Lucene/Solr could solve the
problem.

Thanks in advance,
Grant

Grant Ingersoll
http://www.lucidimagination.com
Re: How to make UnInvertedField faster?
On Sat, Oct 22, 2011 at 4:10 AM, Simon Willnauer <simon.willna...@googlemail.com> wrote:
> On Fri, Oct 21, 2011 at 4:37 PM, Michael McCandless <luc...@mikemccandless.com> wrote:
>> Well... the limitation of DocValues is that it cannot handle more than
>> one value per document (which UnInvertedField can).
>
> you can pack this into one byte[] or use more than one field? I don't
> see a real limitation here.

Well... not very easily? UnInvertedField (DocTermOrds in Lucene) is the same
as DocValues' BYTES_VAR_SORTED. So for an app to do this on top, it'd have
to handle the term -> ord resolving itself, save that somewhere, then encode
the multiple ords into a byte[].

I agree that for other simple types (no deref/sorting involved) an app could
pack them into its own byte[] that's otherwise opaque to Lucene.

Mike McCandless
http://blog.mikemccandless.com
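To make the byte[] idea concrete, a rough app-side sketch: the term -> ord
resolution Mike describes is assumed to have happened elsewhere, and only
the per-document ord packing is shown. ByteArrayDataOutput is a real Lucene
utility class; the surrounding code is hypothetical:

    import java.io.IOException;
    import org.apache.lucene.store.ByteArrayDataOutput;

    public class OrdPackingSketch {
        // Encode one document's ords (already resolved from terms by the
        // app) as a count followed by vInts, suitable for storage in an
        // opaque byte[] field.
        static byte[] pack(int[] ords) throws IOException {
            byte[] buf = new byte[(ords.length + 1) * 5]; // vInt worst case: 5 bytes each
            ByteArrayDataOutput out = new ByteArrayDataOutput(buf);
            out.writeVInt(ords.length);
            for (int ord : ords) {
                out.writeVInt(ord);
            }
            byte[] packed = new byte[out.getPosition()];
            System.arraycopy(buf, 0, packed, 0, packed.length);
            return packed;
        }
    }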
Re: data import in 4.0
yup that was it .. my data import files version was not the same as the solr
war .. now I am having another problem though

I tried doing a simple data import:

    <document>
      <entity name="p" query="SELECT ID, Status, Title FROM project">
        <field column="ID" name="id" />
        <field column="Status" name="status_s" />
        <field column="Title" name="title_t" />
      </entity>
    </document>

simple in terms of just pulling up three fields from a table and adding them
to the index, and this worked fine .. but when I add a nested or joined
table:

    <document>
      <entity name="project" query="SELECT ID, Status, Title FROM project">
        <field column="ID" name="id" />
        <field column="Status" name="status_s" />
        <field column="Title" name="title_t" />
        <entity name="related"
                query="select last_name FROM person per
                       inner join project proj on proj.pi_pid = per.pid
                       where proj.ID = ${project.ID}">
          <field column="last_name" name="pi_s" />
        </entity>
      </entity>
    </document>

this data import doesn't seem to end .. it just keeps going .. I only have
about 15000 records in the main table and about 22000 in the joined table ..
but the Fetch count in the dataimport handler status indicator shows that it
has fetched close to half a million records or something .. I'm not sure
what those records are .. is there a way to see exactly what queries are
being run by the dataimport handler (see the LogTransformer sketch after
this thread) .. is there something wrong with my nested query ..

Thanks
Adeel

On Fri, Oct 21, 2011 at 3:05 PM, Alireza Salimi <alireza.sal...@gmail.com> wrote:
> So to me it heightens the probability of classloader conflicts. I haven't
> worked with Solr 4.0, so I don't know if the set of JAR files is the same
> as in Solr 3.4. Anyway, make sure that there is only ONE instance of
> apache-solr-dataimporthandler-*.jar in your whole tomcat+webapp. Maybe you
> have this jar file in the CATALINA_HOME\lib folder.
>
> On Fri, Oct 21, 2011 at 3:06 PM, Adeel Qureshi <adeelmahm...@gmail.com> wrote:
>> its deployed on a tomcat server ..
>>
>> On Fri, Oct 21, 2011 at 12:49 PM, Alireza Salimi <alireza.sal...@gmail.com> wrote:
>>> Hi,
>>>
>>> How do you start Solr, through start.jar or do you deploy it to a web
>>> container? Sometimes problems like this are because of different class
>>> loaders. I hope my answer would help you.
>>>
>>> Regards
>>>
>>> On Fri, Oct 21, 2011 at 12:47 PM, Adeel Qureshi <adeelmahm...@gmail.com> wrote:
>>>> Hi
>>>>
>>>> I am trying to set up the data import handler with Solr 4.0 and am
>>>> having some unexpected problems. I have a multi-core setup and only one
>>>> core needed the dataimport handler, so I added the request handler to
>>>> it and added the lib imports in the config file:
>>>>
>>>>     <lib dir="../../dist/" regex="apache-solr-dataimporthandler-\d.*\.jar" />
>>>>     <lib dir="../../dist/" regex="apache-solr-dataimporthandler-extras-\d.*\.jar" />
>>>>
>>>> for some reason this doesn't work .. it still keeps giving me a
>>>> ClassNotFound error message, so I moved the jar files to the shared lib
>>>> folder and then at least I was able to see the admin screen with the
>>>> dataimport plugin loaded.
>>>> But when I try to do the import, it throws this error message:
>>>>
>>>>     INFO: Starting Full Import
>>>>     Oct 21, 2011 11:35:41 AM org.apache.solr.core.SolrCore execute
>>>>     INFO: [DW] webapp=/solr path=/select params={command=status&qt=/dataimport} status=0 QTime=0
>>>>     Oct 21, 2011 11:35:41 AM org.apache.solr.handler.dataimport.SolrWriter readIndexerProperties
>>>>     WARNING: Unable to read: dataimport.properties
>>>>     Oct 21, 2011 11:35:41 AM org.apache.solr.handler.dataimport.DataImporter doFullImport
>>>>     SEVERE: Full Import failed
>>>>     java.lang.NoSuchMethodError: org.apache.solr.update.DeleteUpdateCommand: method <init>()V not found
>>>>         at org.apache.solr.handler.dataimport.SolrWriter.doDeleteAll(SolrWriter.java:193)
>>>>         at org.apache.solr.handler.dataimport.DocBuilder.cleanByQuery(DocBuilder.java:1012)
>>>>         at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:183)
>>>>         at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:335)
>>>>         at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:393)
>>>>         at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:374)
>>>>     Oct 21, 2011 11:35:41 AM org.apache.solr.handler.dataimport.SolrWriter rollback
>>>>     SEVERE: Exception while solr rollback.
>>>>     java.lang.NoSuchMethodError: org.apache.solr.update.RollbackUpdateCommand: method <init>()V not found
>>>>         at org.apache.solr.handler.dataimport.SolrWriter.rollback(SolrWriter.java:184)
>>>>         at org.apache.solr.handler.dataimport.DocBuilder.rollback(DocBuilder.java:249)
>>>>         at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:340)
>>>>         at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:393)
>>>>         at
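On the question of seeing what queries DIH runs: the LogTransformer that
ships with the DataImportHandler logs a templated line per row, which makes
the per-parent-row child queries visible. Note also that DIH runs the child
entity query once per parent row, and its Fetch count totals rows fetched
across all entities, not just indexed documents. A sketch against the
entities above; the logTemplate text and logLevel are illustrative, and the
query is the one from the message:

    <entity name="related" transformer="LogTransformer"
            logTemplate="ran child query for project ${project.ID}"
            logLevel="info"
            query="select last_name FROM person per
                   inner join project proj on proj.pi_pid = per.pid
                   where proj.ID = ${project.ID}">
      <field column="last_name" name="pi_s" />
    </entity>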
Re: questions about autocommit committing documents
Old entry, but I'm trying to configure autocommit and I'm still not sure I
understand how Solr handles the commit process. Does Solr really wait for
10000 documents before sending a commit? I was thinking it would use maxTime
and then commit some number of documents less than 10000. Could you please
correct the following scenario:

- 20 documents are added.
- After the value of maxTime is reached, the 20 documents are committed,
  because 20 is less than 10000?

- 20000 documents are added.
- After the value of maxTime is reached, only the first 10000 documents are
  committed. The next 10000 will go on the next iteration of the commit
  phase.

Is this the right way to understand both the maxTime and maxDocs parameters?

Thanks,

> > If I enable autoCommit and set maxDocs at 10000, does it mean that my
> > new documents won't be available for searching until 10,000 new
> > documents have been added?
>
> Yes, that's correct. However, you can do a commit explicitly, if you want
> to do so.

--
View this message in context: http://lucene.472066.n3.nabble.com/questions-about-autocommit-committing-documents-tp1582487p3443838.html
Sent from the Solr - User mailing list archive at Nabble.com.
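For reference, both thresholds live under the update handler in
solrconfig.xml, and whichever one is crossed first triggers the commit. A
minimal sketch matching the numbers above; the 60-second maxTime is just an
illustrative value, specified in milliseconds:

    <updateHandler class="solr.DirectUpdateHandler2">
      <autoCommit>
        <maxDocs>10000</maxDocs> <!-- commit once this many docs are pending -->
        <maxTime>60000</maxTime> <!-- ...or after 60s, whichever comes first -->
      </autoCommit>
    </updateHandler>

So in the first scenario above, 20 pending documents never reach maxDocs and
maxTime fires, committing all 20. With 20000 documents added quickly, maxDocs
fires as soon as 10000 are pending, without waiting for maxTime.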