no result when searching sentences in solr
I index some PDF and DOCX files with SolrJ, and when I try to query some sentences like "We'd be glad to have you accompany" or anything else, the result is empty. Is there any configuration needed? I mention that I create the query in /solr/browse. -- View this message in context: http://lucene.472066.n3.nabble.com/no-result-when-searching-sentences-in-solr-tp3354659p3354659.html Sent from the Solr - User mailing list archive at Nabble.com.
boost a document which has a field not empty
Hi, I have one entity called organisation. I index the organisation names so they can be searched on afterwards. I also store the organisation's website. Some organisations have a website, some don't. Can I achieve that, when searching for organisations by name, those which have a website are shown first even among name matches? Thank you. Regards, Zoltan
Solr Indexing - Null Values in date field
Hi, I have a field in my source with data type string, and that field has NULL values. I am trying to index this field in Solr as a date type with multiValued=true. Following is the entry for that field in my schema.xml:

<field name="startdate" type="date" indexed="true" stored="true" multiValued="true" required="false" />

When I try to index, I get the following exception:

org.apache.solr.common.SolrException: Invalid Date String:''
    at org.apache.solr.schema.DateField.parseMath(DateField.java:163)
    at org.apache.solr.schema.TrieDateField.createField(TrieDateField.java:171)
    at org.apache.solr.schema.SchemaField.createField(SchemaField.java:95)
    at org.apache.solr.update.DocumentBuilder.addField(DocumentBuilder.java:204)
    at org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:277)
    at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:60)
    at org.apache.solr.handler.dataimport.SolrWriter.upload(SolrWriter.java:75)
    at org.apache.solr.handler.dataimport.DataImportHandler$1.upload(DataImportHandler.java:292)
    at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:618)
    at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:261)
    at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:185)
    at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:333)
    at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:391)
    at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:372)

I even tried adding an IFNULL condition to my query for that field (e.g. IFNULL(startdate,'') and also IFNULL(startdate,NULL)), but I still get the same exception. Is there any way to index the null values as such in a date field? Please help.
Re: Solr Indexing - Null Values in date field
On Wed, Sep 21, 2011 at 4:08 PM, mechravi25 mechrav...@yahoo.co.in wrote: Hi, I have a field in my source with data type as string and that field has NULL values. I am trying to index this field in solr as a date data type with multivalued = true. Following is the entry for that field in my schema.xml [...] One cannot have NULL values as input for Solr date fields. The multivalued part is irrelevant here. As it seems like you are getting the input data from a database, you will need to supply some invalid date for NULL date values. E.g., with mysql, we have: COALESCE( CreationDate, STR_TO_DATE( '1970,1,1', '%Y,%m,%d' ) ) The required syntax will be different for other databases. Regards, Gora
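To make Gora's suggestion concrete, here is a sketch of how the COALESCE call might sit in the DIH entity query. The entity, table, and column names are invented for illustration; only the COALESCE expression itself is taken from the reply above.

```xml
<!-- Hypothetical DIH entity: NULL startdate values are replaced by a
     sentinel date (1970-01-01) so Solr's date parsing never sees ''. -->
<entity name="event"
        query="SELECT id,
                      COALESCE(startdate, STR_TO_DATE('1970,1,1', '%Y,%m,%d')) AS startdate
                 FROM events">
  <field column="startdate" name="startdate"/>
</entity>
```

Queries that should exclude documents with no real start date would then filter out the sentinel value.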
Fuzzy Suggester
From http://wiki.apache.org/solr/Suggester: JaspellLookup can provide fuzzy suggestions, though this functionality is not currently exposed (it's a one line change in JaspellLookup). Anybody know what change this would have to be?
Problem using EdgeNGram
Hi, I am using Solr 3.3 with SolrJ. I am trying to use EdgeNGram to power the auto-suggest feature in my application. My understanding is that using EdgeNGram would mean that results are only returned for records starting with the search criteria, but this is not happening for me. For example, if I search for "tr", I get results like the following:

Greenham Trading
6 IT Training Publications
AA Training

Below are the details of my configuration:

<fieldType name="edgytext" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="15"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<field name="businessName" type="edgytext" indexed="true" stored="true" required="true" omitNorms="true" omitTermFreqAndPositions="true" />

Any ideas why this is happening will be much appreciated. Thanks.
JSON response with SolrJ
Hi, I am using Solr 3.3 with SolrJ. Does anybody have any idea how I can retrieve a JSON response with SolrJ? Is it possible? It seems to be more focused on XML and beans. Thanks.
Re: JSON response with SolrJ
Hi, A similar question was asked before. Maybe it can help: http://lucene.472066.n3.nabble.com/SolrJ-Response-JSON-td1002024.html On Wed, Sep 21, 2011 at 3:01 PM, Kissue Kissue kissue...@gmail.com wrote: Hi, I am using solr 3.3 with SolrJ. Does anybody have any idea how i can retrieve JSON response with SolrJ? Is it possible? It seems to be more focused on XML and Beans. Thanks.
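As the linked thread suggests, SolrJ parses responses into Java objects rather than handing back a raw JSON body; if raw JSON is what is actually needed, one workaround is to skip SolrJ for that call and hit the HTTP API directly with wt=json. A sketch of the request construction (the host, port, and core path are assumptions, not taken from the thread):

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def solr_json_url(query, base_url="http://localhost:8983/solr/select"):
    """Build a Solr search URL that asks for a JSON-formatted response."""
    return base_url + "?" + urlencode({"q": query, "wt": "json"})


url = solr_json_url("title:solr")
# resp = urlopen(url).read()  # raw JSON bytes, when a Solr server is running
```

The urlopen call is commented out since it requires a live server; the point is only that wt=json switches the response writer.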
Re: Problem using EdgeNGram
Try using KeywordTokenizerFactory instead of StandardTokenizerFactory to get the results you want.
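For reference, the index analyzer from the original post could be rewritten along these lines (an untested sketch; minGramSize/maxGramSize kept from the original config):

```xml
<fieldType name="edgytext" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="15"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

The difference: StandardTokenizer splits "Greenham Trading" into two tokens, so edge grams are generated for "Trading" as well and "tr" matches mid-name. KeywordTokenizer keeps the whole field value as a single token, so grams only ever start at the beginning of the full name.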
Re: boost a document which has a field not empty
Can you assign a doc boost at index time? 2011/9/21 Zoltan Altfatter altfatt...@gmail.com Hi, I have one entity called organisation. I am indexing their name to be able to search afterwards on their name. I store also the website of the organisation. Some organisations have a website some don't. Can I achieve that when searching for organisations even if I have a match on their name I will show first those which have a website. Thank you. Regards, Zoltan -- *Alexei Martchenko* | *CEO* | Superdownloads ale...@superdownloads.com.br | ale...@martchenko.com.br | (11) 5083.1018/5080.3535/5080.3533
Sort five random Top Offers to the top
Hey Community. I have a Lucene/Solr index with many offers. Some of them are marked as top offers by a flag field, topoffer. Now I want to randomly sort 5 of these offers to the top. For example:

HTC Sensation - topoffer = true
HTC Desire - topoffer = false
Samsung Galaxy S2 - topoffer = true
iPhone 4 - topoffer = true
...

When I search for a phone, I want the first 3 offers to be the HTC Sensation, Samsung Galaxy S2 and iPhone 4. Does anyone have an idea? PS: I hope my English is not too bad.
Re: boost a document which has a field not empty
I have one entity called organisation. I am indexing their name to be able to search afterwards on their name. I store also the website of the organisation. Some organisations have a website some don't. Can I achieve that when searching for organisations even if I have a match on their name I will show first those which have a website. Which query parser are you using? lucene? (e)dismax? If lucene (default one), you can add an optional clause to your query: q=+(some query) website:[* TO *]^10 (assuming you have OR as default operator) If dismax, there is a bq parameter which accepts lucene query syntax bq=website:[* TO *]^10 http://wiki.apache.org/solr/DisMaxQParserPlugin#bq_.28Boost_Query.29
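To make the dismax variant above concrete, a sketch of how the full request could be assembled (shown in Python purely to illustrate the HTTP parameters; the endpoint and the qf field name are assumptions, while the bq value is taken from the reply):

```python
from urllib.parse import urlencode


def build_boosted_query(user_query, base_url="http://localhost:8983/solr/select"):
    """Build a dismax request that ranks documents with a non-empty website field first."""
    params = {
        "q": user_query,
        "defType": "dismax",
        "qf": "name",                 # assumed field holding the organisation name
        "bq": "website:[* TO *]^10",  # boost query: matches any doc with a website value
    }
    return base_url + "?" + urlencode(params)


url = build_boosted_query("acme")
```

The open-ended range query website:[* TO *] matches every document that has any value in the field, so the ^10 boost lifts those documents without excluding the rest.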
Re: MMapDirectory failed to map a 23G compound index segment
On Tue, Sep 20, 2011 at 12:32 PM, Michael McCandless luc...@mikemccandless.com wrote: Or: is it possible you reopened the reader several times against the index (ie, after committing from Solr)? If so, I think 2.9.x never unmaps the mapped areas, and so this would accumulate against the system limit. In order to unmap in Lucene 2.9.x you must specifically turn this unmapping on with setUseUnmapHack(true) -- lucidimagination.com
Re: boost a document which has a field not empty
Yes, I am using edismax and the bq parameter did the trick. Thanks a lot. On Wed, Sep 21, 2011 at 3:59 PM, Ahmet Arslan iori...@yahoo.com wrote: I have one entity called organisation. I am indexing their name to be able to search afterwards on their name. I store also the website of the organisation. Some organisations have a website some don't. Can I achieve that when searching for organisations even if I have a match on their name I will show first those which have a website. Which query parser are you using? lucene? (e)dismax? If lucene (default one), you can add an optional clause to your query: q=+(some query) website:[* TO *]^10 (assuming you have OR as default operator) If dismax, there is a bq parameter which accepts lucene query syntax bq=website:[* TO *]^10 http://wiki.apache.org/solr/DisMaxQParserPlugin#bq_.28Boost_Query.29
LocalParams, bq, and highlighting
I've run into another strange behavior related to LocalParams syntax in Solr 1.4.1. If I apply Dismax boosts using bq in LocalParams syntax, the contents of the boost queries get used by the highlighter. Obviously, when I use bq as a separate parameter, this is not an issue. To clarify, here are two searches that yield identical results but different highlighting behaviors: http://localhost:8080/solr/biblio/select/?q=johnrows=20start=0indent=yesqf=author^100qt=dismaxbq=author%3Asmith^1000fl=scorehl=truehl.fl=* http://localhost:8080/solr/biblio/select/?q=%28%28_query_%3A%22{!dismax+qf%3D\%22author^100\%22+bq%3D\%27author%3Asmith^1000\%27}john%22%29%29rows=20start=0indent=yesfl=scorehl=truehl.fl=* Query #1 highlights only john (the desired behavior), but query #2 highlights both john and smith. Is this a known limitation of the highlighter, or is it a bug? Is this issue resolved in newer versions of Solr? thanks, Demian
Selective values for facets
Hi, The dataset I have is for special offers. We have a lot of offer codes, but I need to create facets for specific conditions only. For example, I have the following codes: ABCD, AGTR, KUYH, NEWY, NEWA, NEWB, EAS1, EAS2. I need to create a facet like 'New Year Offers' mapped to NEWA, NEWB, NEWY, and 'Easter Offers' mapped to EAS1, EAS2. I don't want the other codes returned in the facet when I query it. How can I make the other values be ignored while creating the facet at indexing time? Thanks, Srikanth NT
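One common way to handle this is to resolve the raw codes into a separate display-facet field at index time, so the facet field only ever contains the labels you want and unmapped codes contribute nothing. A sketch of the mapping step (the mapping table and function are made up for illustration; the result would be written into a dedicated multi-valued facet field before posting the document to Solr):

```python
# Hypothetical mapping from raw offer codes to facet labels.
# Codes with no entry (ABCD, AGTR, KUYH, ...) are simply skipped.
OFFER_FACETS = {
    "NEWY": "New Year Offers",
    "NEWA": "New Year Offers",
    "NEWB": "New Year Offers",
    "EAS1": "Easter Offers",
    "EAS2": "Easter Offers",
}


def facet_values(codes):
    """Return the de-duplicated facet labels for a document's offer codes."""
    labels = []
    for code in codes:
        label = OFFER_FACETS.get(code)
        if label is not None and label not in labels:
            labels.append(label)
    return labels
```

Faceting on that derived field then returns only 'New Year Offers' and 'Easter Offers', never the raw codes.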
Best Practices for indexing nested XML in Solr via DIH
Hello Everyone, I was wondering what are the various best practices that everyone follows for indexing nested XML into Solr. Please don't feel limited by examples, feel free to share your own experiences. Given an XML structure such as the following:

<categoryPath>
  <category>
    <id>cat001</id>
    <name>Everything</name>
  </category>
  <category>
    <id>cat002</id>
    <name>Music</name>
  </category>
  <category>
    <id>cat003</id>
    <name>Pop</name>
  </category>
</categoryPath>

How do you make the best use of the data when indexing?

1) Do you use Scenario A?
categoryPath_category_id = cat001 cat002 cat003 (flattened)
categoryPath_category_name = Everything Music Pop (flattened)
If so then how do you manage to find the corresponding categoryPath_category_id if someone's search matches a value in the categoryPath_category_name field? I understand that Solr is not about lookups but this may be important information for you to display right away as part of the search results page rendering.

2) Do you use Scenario B?
categoryPath_category_id = [cat001 cat002 cat003] (the [] signifies a multi-value field)
categoryPath_category_name = [Everything Music Pop] (the [] signifies a multi-value field)
And once again how do you find associated data sets once something matches.

Side Question: How can one configure DIH to store the data this way for Scenario B? Thanks! - Pulkit
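For the side question, one way to load the parallel multi-valued fields of Scenario B with DIH is the XPathEntityProcessor; a sketch, where the entity name, file URL, and column names are assumptions:

```xml
<!-- Hypothetical DIH entity: each repeated <category> contributes one value
     to each multi-valued column, in document order. -->
<entity name="categories"
        processor="XPathEntityProcessor"
        url="categories.xml"
        forEach="/categoryPath">
  <field column="categoryPath_category_id"   xpath="/categoryPath/category/id"/>
  <field column="categoryPath_category_name" xpath="/categoryPath/category/name"/>
</entity>
```

Because the values are kept in document order, the Nth id lines up with the Nth name, which is also the usual trick for re-associating a matched name with its id on the client side.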
Re: How to write core's name in log
Not sure if this is a good lead for you, but when I run the out-of-the-box multi-core example-DIH instance of Solr, I often see the core name thrown about in the logs. Perhaps you can look there? On Thu, Sep 15, 2011 at 6:50 AM, Joan joan.monp...@gmail.com wrote: Hi, I have multiple cores in Solr and I want to write the core name in the log through log4j. I've found in SolrException a method called log(Logger log, Throwable e), but when I try to build an Exception it doesn't have the core's name. The Exception is built in the toStr() method of the SolrException class, so I want to write the core's name in the Exception message. I'm thinking of adding an MDC variable that will hold the name of the core, and then using it in the log4j configuration, like this in the ConversionPattern: %X{core}. The idea is that when Solr receives a request I'll set this core-name variable. But I don't know if it's a good idea or not. Or does a solution already exist for adding the name of the core to the log? Thanks Joan
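The MDC idea described in the quoted post would pair with a pattern layout along these lines; the appender name and file name here are illustrative only, and something like MDC.put("core", coreName) would still need to be called early in request handling, which is the part the poster is asking about:

```properties
# Hypothetical log4j.properties fragment: %X{core} prints the MDC value
# set via MDC.put("core", coreName) before the log statement runs.
log4j.appender.solrlog=org.apache.log4j.RollingFileAppender
log4j.appender.solrlog.File=solr.log
log4j.appender.solrlog.layout=org.apache.log4j.PatternLayout
log4j.appender.solrlog.layout.ConversionPattern=%d %-5p [%X{core}] %c - %m%n
```

If the MDC key is unset for a given thread, %X{core} simply renders as empty, so non-request threads are unaffected.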
Re: Example setting TieredMergePolicy for Solr 3.3 or 3.4?
On 9/20/2011 4:09 PM, Robert Muir wrote: yes, mergeFactor=10 is interpreted as both segmentsPerTier and maxMergeAtOnce. yes, specifying explicit TieredMP parameters will override whatever you set in mergeFactor (which is basically only interpreted to be backwards compatible) this is why i created this confusing test configuration: to test this exact case. I've got a checked out lucene_solr_3_4 and this isn't what I'm seeing. Solr Implementation Version: 3.4-SNAPSHOT 1173320M - root - 2011-09-21 09:58:58 With no mergeFactor defined, maxMergeAtOnce and segmentsPerTier seem to be ignored. I've got both set to 35, but Solr is merging every 10 segments. I haven't tried explicitly setting mergeFactor yet to see if that will make the other settings override it; I'm letting the current import finish first. Here are the relevant config pieces. These two sections are in separate files incorporated into solrconfig.xml using xinclude:

<indexDefaults>
  <useCompoundFile>false</useCompoundFile>
  <mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler">
    <int name="maxMergeCount">4</int>
    <int name="maxThreadCount">4</int>
  </mergeScheduler>
  <ramBufferSizeMB>96</ramBufferSizeMB>
  <maxFieldLength>32768</maxFieldLength>
  <writeLockTimeout>1000</writeLockTimeout>
  <commitLockTimeout>1</commitLockTimeout>
  <lockType>native</lockType>
</indexDefaults>

<mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
  <int name="maxMergeAtOnce">35</int>
  <int name="segmentsPerTier">35</int>
  <int name="maxMergeAtOnceExplicit">105</int>
</mergePolicy>

Thanks, Shawn
strange copied field problem
I have 3 fields that I am working with: genre, genre_search and text. genre is a string field which comes from the data source. genre_search is a text field that is copied from genre, and text is a text field that is copied from genre_search and a few other fields. The text field is the default search field for queries. When I search for q=genre_search:indie+rock, Solr returns several records that have both Indie and Rock as genres, which is great, but when I search for q=indie+rock or q=text:indie+rock, I get no results. Why would the source field return results while the destination field doesn't? Both genre_search and text are the same data type, so there shouldn't be any strange translations happening.
Re: FW: MMapDirectory failed to map a 23G compound index segment
I hit a similar issue recently. I'm not sure MMapDirectory is the right way to go. When an index file is mapped into RAM, the JVM calls the OS file-mapping function. The memory usage shows up as shared memory, and may not be counted against the JVM process space. One problem I saw: if the index file is bigger than physical RAM, and there are a lot of queries causing wide index file access, the machine ends up with no available memory and the system becomes very slow. What I did was change the Lucene code to disable MMapDirectory. On Wed, Sep 21, 2011 at 1:26 PM, Yongtao Liu y...@commvault.com wrote: -Original Message- From: Michael McCandless [mailto:luc...@mikemccandless.com] Sent: Tuesday, September 20, 2011 3:33 PM To: solr-user@lucene.apache.org Subject: Re: MMapDirectory failed to map a 23G compound index segment Since you hit OOME during mmap, I think this is an OS issue not a JVM issue. Ie, the JVM isn't running out of memory. How many segments were in the unoptimized index? It's possible the OS rejected the mmap because of process limits. Run cat /proc/sys/vm/max_map_count to see how many mmaps are allowed. Or: is it possible you reopened the reader several times against the index (ie, after committing from Solr)? If so, I think 2.9.x never unmaps the mapped areas, and so this would accumulate against the system limit. My memory of this is a little rusty but isn't mmap also limited by mem + swap on the box? What does 'free -g' report? I don't think this should be the case; you are using a 64 bit OS/JVM so in theory (except for OS system wide / per-process limits imposed) you should be able to mmap up to the full 64 bit address space. Your virtual memory is unlimited (from ulimit output), so that's good. Mike McCandless http://blog.mikemccandless.com On Wed, Sep 7, 2011 at 12:25 PM, Rich Cariens richcari...@gmail.com wrote: Ahoy ahoy! I've run into the dreaded OOM error with MMapDirectory on a 23G cfs compound index segment file. 
The stack trace looks pretty much like every other trace I've found when searching for "OOM map failed" [1]. My configuration follows:

Solr 1.4.1/Lucene 2.9.3 (plus SOLR-1969 https://issues.apache.org/jira/browse/SOLR-1969)
CentOS 4.9 (Final)
Linux 2.6.9-100.ELsmp x86_64 yada yada yada
Java SE (build 1.6.0_21-b06) Hotspot 64-bit Server VM (build 17.0-b16, mixed mode)

ulimits:
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
file size (blocks, -f) unlimited
pending signals (-i) 1024
max locked memory (kbytes, -l) 32
max memory size (kbytes, -m) unlimited
open files (-n) 256000
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
stack size (kbytes, -s) 10240
cpu time (seconds, -t) unlimited
max user processes (-u) 1064959
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited

Any suggestions? Thanks in advance, Rich

[1] ...
java.io.IOException: Map failed
    at sun.nio.ch.FileChannelImpl.map(Unknown Source)
    at org.apache.lucene.store.MMapDirectory$MMapIndexInput.<init>(Unknown Source)
    at org.apache.lucene.store.MMapDirectory$MMapIndexInput.<init>(Unknown Source)
    at org.apache.lucene.store.MMapDirectory.openInput(Unknown Source)
    at org.apache.lucene.index.SegmentReader$CoreReaders.<init>(Unknown Source)
    at org.apache.lucene.index.SegmentReader.get(Unknown Source)
    at org.apache.lucene.index.SegmentReader.get(Unknown Source)
    at org.apache.lucene.index.DirectoryReader.<init>(Unknown Source)
    at org.apache.lucene.index.ReadOnlyDirectoryReader.<init>(Unknown Source)
    at org.apache.lucene.index.DirectoryReader$1.doBody(Unknown Source)
    at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(Unknown Source)
    at org.apache.lucene.index.DirectoryReader.open(Unknown Source)
    at org.apache.lucene.index.IndexReader.open(Unknown Source)
    ...
Caused by: java.lang.OutOfMemoryError: Map failed
    at sun.nio.ch.FileChannelImpl.map0(Native Method)
    ... 
SolrCloud state
Hi there. I'm starting a new project using Solr and I would like to know if Solr is able to set up a cluster with fault tolerance. I'm setting up an environment with two shards. Each shard should have a replica. What I would like to know is: if a shard master fails, will the replica be promoted to master? Or will it remain search-only and only recover when a new master is set up? Also, how is the document indexing distributed across the shards? Can I add a new shard dynamically? All the best, Miguel Coxo.
Re: strange copied field problem
I am NOT claiming that making a copy of a copy field is wrong or leads to a race condition. I don't know that. BUT did you try to copy into the text field directly from the genre field? Instead of the genre_search field? Did that yield working queries? On Wed, Sep 21, 2011 at 12:16 PM, Tanner Postert tanner.post...@gmail.com wrote: i have 3 fields that I am working with: genre, genre_search and text. genre is a string field which comes from the data source. genre_search is a text field that is copied from genre, and text is a text field that is copied from genre_search and a few other fields. Text field is the default search field for queries. When I search for q=genre_search:indie+rock, solr returns several records that have both Indie as a genre and Rock as a genre, which is great, but when I search for q=indie+rock or q=text:indie+rock, i get no results. Why would the source field return the value and the destination wouldn't. Both genre_search and text are the same data type, so there shouldn't be any strange translations happening.
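In schema.xml terms, the suggestion amounts to copying straight from the source field; a sketch using the field names from the thread:

```xml
<!-- Copy the raw genre value into both derived fields directly -->
<copyField source="genre" dest="genre_search"/>
<copyField source="genre" dest="text"/>
```

As far as I know, copyField reads from the values originally posted to the source field rather than from the output of another copyField, so a genre -> genre_search -> text chain would leave text without the genre data, which is consistent with the behavior described in the thread.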
Re: strange copied field problem
i believe that was the original configuration, but I can switch it back and see if that yields any results. On Wed, Sep 21, 2011 at 10:54 AM, Pulkit Singhal pulkitsing...@gmail.comwrote: I am NOT claiming that making a copy of a copy field is wrong or leads to a race condition. I don't know that. BUT did you try to copy into the text field directly from the genre field? Instead of the genre_search field? Did that yield working queries? On Wed, Sep 21, 2011 at 12:16 PM, Tanner Postert tanner.post...@gmail.com wrote: i have 3 fields that I am working with: genre, genre_search and text. genre is a string field which comes from the data source. genre_search is a text field that is copied from genre, and text is a text field that is copied from genre_search and a few other fields. Text field is the default search field for queries. When I search for q=genre_search:indie+rock, solr returns several records that have both Indie as a genre and Rock as a genre, which is great, but when I search for q=indie+rock or q=text:indie+rock, i get no results. Why would the source field return the value and the destination wouldn't. Both genre_search and text are the same data type, so there shouldn't be any strange translations happening.
Re: OOM errors and -XX:OnOutOfMemoryError flag not working on solr?
Usually any good piece of java code refrains from capturing Throwable so that Errors will bubble up unlike exceptions. Having said that, perhaps someone in the list can help, if you share which particular Solr version you are using where you suspect that the Error is being eaten up. On Fri, Sep 16, 2011 at 2:47 PM, Jason Toy jason...@gmail.com wrote: I have solr issues where I keep running out of memory. I am working on solving the memory issues (this will take a long time), but in the meantime, I'm trying to be notified when the error occurs. I saw with the jvm I can pass the -XX:OnOutOfMemoryError= flag and pass a script to run. Every time the out of memory issue occurs though my script never runs. Does solr let the error bubble up so that the jvm can call this script? If not how can I have a script run when solr gets an out of memory issue?
Re: Solr Indexing - Null Values in date field
Also you may use the script transformer to explicitly remove the field from the document if the field is null. I do this for all my sdouble and sdate fields ... it's a bit manual and I would like to see Solr enhanced to simply skip stuff like this by having a flag for its DIH code, but until then it suffices:

... transformer="DateFormatTransformer,script:skipEmptyFields"

<script><![CDATA[
function skipEmptyFields(row) {
    var regularPrice = row.get('regularPrice');
    if (regularPrice == null || regularPrice == '') {
        row.remove('regularPrice');
    }
    var salePrice = row.get('salePrice');
    if (salePrice == null || salePrice == '') {
        row.remove('salePrice');
    }
    return row;
}
]]></script>

On Wed, Sep 21, 2011 at 6:06 AM, Gora Mohanty g...@mimirtech.com wrote: On Wed, Sep 21, 2011 at 4:08 PM, mechravi25 mechrav...@yahoo.co.in wrote: Hi, I have a field in my source with data type as string and that field has NULL values. I am trying to index this field in solr as a date data type with multivalued = true. Following is the entry for that field in my schema.xml [...] One cannot have NULL values as input for Solr date fields. The multivalued part is irrelevant here. As it seems like you are getting the input data from a database, you will need to supply some invalid date for NULL date values. E.g., with mysql, we have: COALESCE( CreationDate, STR_TO_DATE( '1970,1,1', '%Y,%m,%d' ) ) The required syntax will be different for other databases. Regards, Gora
Debugging DIH by placing breakpoints
Hello, I was wondering where I can find the source code for DIH? I want to check out the source and step through it breakpoint by breakpoint to understand it better :) Thanks! - Pulkit
Re: strange copied field problem
sure enough that worked. could have sworn we had it this way before, but either way, that fixed it. Thanks. On Wed, Sep 21, 2011 at 11:01 AM, Tanner Postert tanner.post...@gmail.com wrote: i believe that was the original configuration, but I can switch it back and see if that yields any results.
Re: Debugging DIH by placing breakpoints
On Thu, Sep 22, 2011 at 12:08 AM, Pulkit Singhal pulkitsing...@gmail.com wrote: Hello, I was wondering where can I find the source code for DIH? I want to checkout the source and step-trhought it breakpoint by breakpoint to understand it better :) Should be under contrib/dataimporthandler in your Solr source tree. Regards, Gora
Re: Debugging DIH by placing breakpoints
Correct! With that additional info, plus http://wiki.apache.org/solr/HowToContribute (ant eclipse), plus a refreshed (close/open) eclipse project ... I'm all set. Thanks Again. On Wed, Sep 21, 2011 at 1:43 PM, Gora Mohanty g...@mimirtech.com wrote: On Thu, Sep 22, 2011 at 12:08 AM, Pulkit Singhal pulkitsing...@gmail.com wrote: Hello, I was wondering where can I find the source code for DIH? I want to checkout the source and step-trhought it breakpoint by breakpoint to understand it better :) Should be under contrib/dataimporthandler in your Solr source tree. Regards, Gora
Production Issue: SolrJ client throwing this error even though field type is not defined in schema
Hi All, We are getting this error in our production Solr setup. Message: Element type "t_sort" must be followed by either attribute specifications, ">" or "/>". Solr version is 1.4.1. The stack trace indicates that Solr is returning a malformed document.

Caused by: org.apache.solr.client.solrj.SolrServerException: Error executing query
    at org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:95)
    at org.apache.solr.client.solrj.SolrServer.query(SolrServer.java:118)
    at com.gap.gid.search.impl.SearchServiceImpl.executeQuery(SearchServiceImpl.java:232)
    ... 15 more
Caused by: org.apache.solr.common.SolrException: parsing error
    at org.apache.solr.client.solrj.impl.XMLResponseParser.processResponse(XMLResponseParser.java:140)
    at org.apache.solr.client.solrj.impl.XMLResponseParser.processResponse(XMLResponseParser.java:101)
    at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:481)
    at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:244)
    at org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:89)
    ... 17 more
Caused by: javax.xml.stream.XMLStreamException: ParseError at [row,col]:[3,136974] Message: Element type "t_sort" must be followed by either attribute specifications, ">" or "/>".
    at com.sun.org.apache.xerces.internal.impl.XMLStreamReaderImpl.next(XMLStreamReaderImpl.java:594)
    at org.apache.solr.client.solrj.impl.XMLResponseParser.readArray(XMLResponseParser.java:282)
    at org.apache.solr.client.solrj.impl.XMLResponseParser.readDocument(XMLResponseParser.java:410)
    at org.apache.solr.client.solrj.impl.XMLResponseParser.readDocuments(XMLResponseParser.java:360)
    at org.apache.solr.client.solrj.impl.XMLResponseParser.readNamedList(XMLResponseParser.java:241)
    at org.apache.solr.client.solrj.impl.XMLResponseParser.processResponse(XMLResponseParser.java:125)
    ... 21 more
Re: strange copied field problem
No probs. I would still hope someone would comment on your thread with some expert opinions about making a copy of a copy :) On Wed, Sep 21, 2011 at 1:38 PM, Tanner Postert tanner.post...@gmail.com wrote: sure enough that worked. could have sworn we had it this way before, but either way, that fixed it. Thanks.
Re: Sort five random Top Offers to the top
Hi MOuli, AFAIK (and I don't know that much about Solr), this feature does not exist out of the box in Solr. One way to achieve this could be to construct a DocSet with topoffer:true and intersect it with your result DocSet, then select the first 5 off the intersection, randomly shuffle them, sublist [0:5], and move the sublist to the top of the results like QueryElevationComponent does. Actually you may want to take a look at QueryElevationComponent code for inspiration (this is where I would have looked if I had to implement something similar). -sujit On Wed, 2011-09-21 at 06:54 -0700, MOuli wrote: Hey Community. I got a Lucene/Solr Index with many offers. Some of them are marked by a flag field topoffer that they are top offers. Now I want so sort randomly 5 of this offers on the top. For Example HTC Sensation - topoffer = true HTC Desire - topoffer = false Samsung Galaxy S2 - topoffer = ture IPhone 4 - topoffer = true ... When i search for a Handy then i want that first 3 offers are HTC Sensation, Samsung Galaxy S2 and the iPhone 4. Does anyone have an idea? PS.: I hope my english is not to bad
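If merging on the client side is acceptable, Solr's RandomSortField offers a lighter-weight route than a custom component: pull five randomly ordered top offers in one request and prepend them to the normal result list. A sketch, where the dynamic-field name and seed suffix are arbitrary choices:

```xml
<!-- schema.xml: a random-sort field type plus a dynamic field for it -->
<fieldType name="random" class="solr.RandomSortField" indexed="true"/>
<dynamicField name="random_*" type="random"/>

<!-- request: five randomly ordered top offers for the same keyword;
     changing the seed suffix (random_1234 -> random_5678) changes the order:
     q=handy&fq=topoffer:true&sort=random_1234 asc&rows=5 -->
```

A second, normal query (optionally excluding the five returned IDs) then supplies the rest of the results below them.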
Implementing a custom ResourceLoader
Hi, As part of writing a Solr plugin I need to override the ResourceLoader. My plugin is intended to be a stop word analyzer filter factory, and I need to change the way stop words are fetched. My assumption is that overriding ResourceLoader.getLines() will help me meet my target of fetching stop word data from an external webservice. Is this feasible? Or should I go about overriding the Factory.inform(ResourceLoader) method? Kindly let me know how to achieve this. -- Thanks Jithin
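Overriding inform(ResourceLoader) in the factory and fetching the word list yourself is usually simpler than swapping out the ResourceLoader. A self-contained sketch of just the parsing half (class and method names are hypothetical; wiring this into a StopFilterFactory subclass and the HTTP fetch are left out):

```java
import java.util.ArrayList;
import java.util.List;

public class StopwordFetcher {
    // Parse a newline-delimited stopword list (e.g. the body returned by the
    // webservice), skipping blank lines and "#" comments -- the same shape of
    // data ResourceLoader.getLines() is expected to produce.
    public static List<String> parseLines(String body) {
        List<String> words = new ArrayList<String>();
        for (String line : body.split("\n")) {
            line = line.trim();
            if (!line.isEmpty() && !line.startsWith("#")) {
                words.add(line);
            }
        }
        return words;
    }

    public static void main(String[] args) {
        // prints [a, the, of]
        System.out.println(parseLines("a\nthe\n# comment\n\nof\n"));
    }
}
```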
Re: Two unrelated questions
For 1, I don't quite get what you're driving at. Your DIH query assigns the uniqueKey, it's not like it's something auto-generated. Perhaps a concrete example would help. For 2, there's a limit you can adjust that defaults to 1024 (maxBooleanClauses in solrconfig.xml). You can bump this very high, but you're right, if anyone actually does something absurd it'll slow *that* query down. But just bumping this query higher won't change performance absent someone actually putting a ton of items in it... Best Erick On Mon, Sep 19, 2011 at 9:12 AM, Olson, Ron rol...@lbpc.com wrote: Hi all- I'm not sure if I should break this out into two separate questions to the list for searching purposes, or if one is more acceptable (don't want to flood). I have two (hopefully) straightforward questions: 1. Is it possible to expose the unique ID of a document to a DIH query? The reason I want to do this is because I use the unique ID of the row in the table as the unique ID of the Lucene document, but I've noticed that the counts of documents doesn't match the count in the table; I'd like to add these rows and was hoping to avoid writing a custom SolrJ app to do it. 2. Is there any limit to the number of conditions in a Boolean search? We're working on a new project where the user can choose either, for example, Ford Vehicles, in which case I can simply search for Ford, but if the user chooses specific makes and models, then I have to say something like Crown Vic OR Focus OR Taurus OR F-150, etc., where they could theoretically choose every model of Ford ever made except one. This could lead to a *very* large query, and was worried both that it was even possible, but also the impact on performance. Thanks, and I apologize if this really should be two separate messages. Ron DISCLAIMER: This electronic message, including any attachments, files or documents, is intended only for the addressee and may contain CONFIDENTIAL, PROPRIETARY or LEGALLY PRIVILEGED information. 
If you are not the intended recipient, you are hereby notified that any use, disclosure, copying or distribution of this message or any of the information included in or with it is unauthorized and strictly prohibited. If you have received this message in error, please notify the sender immediately by reply e-mail and permanently delete and destroy this message and its attachments, along with any copies thereof. This message does not create any contractual obligation on behalf of the sender or Law Bulletin Publishing Company. Thank you.
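The limit Erick mentions is raised in solrconfig.xml; a sketch (the value here is illustrative -- raise it only as far as your worst-case query actually needs):

```xml
<!-- default is 1024 -->
<maxBooleanClauses>4096</maxBooleanClauses>
```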
Re: Slow autocomplete(terms)
Think about ngrams if you really need infix searches, you're right that the regex is very probably the root of your problem. The index has to examine *every* term in the field to determine if the regex will match. Best Erick On Tue, Sep 20, 2011 at 12:57 AM, roySolr royrutten1...@gmail.com wrote: Hello, I used the terms request for autocomplete. It works fine with 200.000 records but with 2 million docs it's very slow. I use some regex to fix autocomplete in the middle of words, example: chest -> manchester. My call (pecl PHP solr): $query = new SolrQuery(); $query->setTermsLimit(10); $query->setTerms(true); $query->setTermsField($field); $term = SolrUtils::escapeQueryChars($term); $query->set("terms.regex", "(.*)$term(.*)"); $query->set("terms.regex.flag", "case_insensitive"); URL: /solr/terms?terms.fl=autocompletewhat&terms.regex=(.*)chest(.*)&terms.regex.flag=case_insensitive&terms=true I think the regex is the reason for the very high query time: Solr searches between 2 million docs with a regex. The query takes 2 seconds, which is too much for autocomplete. A user typed manchester united and solr needs to do 16 queries of 2 seconds each. Are there some other options? Faster solutions? I use solr 3.1 -- View this message in context: http://lucene.472066.n3.nabble.com/Slow-autocomplete-terms-tp3351352p3351352.html Sent from the Solr - User mailing list archive at Nabble.com.
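Erick's ngram suggestion looks roughly like this in schema.xml (field type name and gram sizes are illustrative). Index-time ngrams turn the infix match into a cheap term lookup instead of a scan over every term:

```xml
<fieldType name="text_ngram" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- "manchester" indexes grams like "chest", so a plain term query
         on this field matches mid-word without a regex -->
    <filter class="solr.NGramFilterFactory" minGramSize="3" maxGramSize="15"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```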
Re: Example setting TieredMergePolicy for Solr 3.3 or 3.4?
: With no mergeFactor defined, maxMergeAtOnce and segmentsPerTier seem to be : ignored. I've got both set to 35, but Solr is merging every 10 segments. I ... : Here's the relevant config pieces. These two sections are in separate files : incorporated into solrconfig.xml using xinclude: : : <indexDefaults> ... do you have a <mainIndex> section with <mergeFactor> defined there? -Hoss
RE: Two unrelated questions
Thanks for the reply. As far as #1, my table that I'm indexing via DIH has a PK field, generated by a sequence, so there are records with ID of 1, 2, 3, etc. That same id is the one I use in my unique id field in the document (<uniqueKey>ID</uniqueKey>). I've noticed that the table has, say, 10 rows. My index only has 8. I don't know why that is, but I'd like to figure out which records are missing and add them (and hopefully understand why they weren't added in the first place). I was just wondering if there was some way to compare the two as part of a sql query, but on reflection, it does seem like an absurd request, so I apologize; I think what I'll have to do is write a solrj program that gets every ID in the table, then does a search on that ID in the index, and add the ones that are missing. Regarding the second item, yes, it's crazy but I'm not sure what to do; there really are that many options and some searches will be extremely specific, yet broad enough in terms for this to be a problem. -Original Message- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: Wednesday, September 21, 2011 3:55 PM To: solr-user@lucene.apache.org Subject: Re: Two unrelated questions for 1 I don't quite get what you're driving at. Your DIH query assigns the uniqueKey, it's not like it's something auto-generated. Perhaps a concrete example would help. 2 There's a limit you can adjust that defaults to 1024 (maxBooleanClauses in solrconfig.xml). You can bump this very high, but you're right, if anyone actually does something absurd it'll slow *that* query down. But just bumping this query higher won't change performance absent someone actually putting a ton of items in it... Best Erick On Mon, Sep 19, 2011 at 9:12 AM, Olson, Ron rol...@lbpc.com wrote: Hi all- I'm not sure if I should break this out into two separate questions to the list for searching purposes, or if one is more acceptable (don't want to flood). I have two (hopefully) straightforward questions: 1. 
Is it possible to expose the unique ID of a document to a DIH query? The reason I want to do this is because I use the unique ID of the row in the table as the unique ID of the Lucene document, but I've noticed that the counts of documents doesn't match the count in the table; I'd like to add these rows and was hoping to avoid writing a custom SolrJ app to do it. 2. Is there any limit to the number of conditions in a Boolean search? We're working on a new project where the user can choose either, for example, Ford Vehicles, in which case I can simply search for Ford, but if the user chooses specific makes and models, then I have to say something like Crown Vic OR Focus OR Taurus OR F-150, etc., where they could theoretically choose every model of Ford ever made except one. This could lead to a *very* large query, and was worried both that it was even possible, but also the impact on performance. Thanks, and I apologize if this really should be two separate messages. Ron
Re: Two unrelated questions
for #1, i don't use DIH, but is there any possibility of that column having duplicate keys, with subsequent docs replacing existing ones? and for #2, for some cases you could use a negative filterquery: http://wiki.apache.org/solr/SimpleFacetParameters#Retrieve_docs_with_facets_missing so instead of that fq=-facetField:[* TO *], something like fq=-car_make:Taurus. picking negatives might even make the UI a bit easier. anyway, just some thoughts. cheers, rob On Wed, Sep 21, 2011 at 5:17 PM, Olson, Ron rol...@lbpc.com wrote: Thanks for the reply. As far as #1, my table that I'm indexing via DIH has a PK field, generated by a sequence, so there are records with ID of 1, 2, 3, etc. That same id is the one I use in my unique id field in the document (uniqueKeyID/uniqueID). I've noticed that the table has, say, 10 rows. My index only has 8. I don't know why that is, but I'd like to figure out which records are missing and add them (and hopefully understand why they weren't added in the first place). I was just wondering if there was some way to compare the two as part of a sql query, but on reflection, it does seem like an absurd request, so I apologize; I think what I'll have to do is write a solrj program that gets every ID in the table, then does a search on that ID in the index, and add the ones that are missing. Regarding the second item, yes, it's crazy but I'm not sure what to do; there really are that many options and some searches will be extremely specific, yet broad enough in terms for this to be a problem. -Original Message- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: Wednesday, September 21, 2011 3:55 PM To: solr-user@lucene.apache.org Subject: Re: Two unrelated questions for 1 I don't quite get what you're driving at. Your DIH query assigns the uniqueKey, it's not like it's something auto-generated. Perhaps a concrete example would help. 2 There's a limit you can adjust that defaults to 1024 (maxBooleanClauses in solrconfig.xml). 
You can bump this very high, but you're right, if anyone actually does something absurd it'll slow *that* query down. But just bumping this query higher won't change performance absent someone actually putting a ton of items in it... Best Erick On Mon, Sep 19, 2011 at 9:12 AM, Olson, Ron rol...@lbpc.com wrote: Hi all- I'm not sure if I should break this out into two separate questions to the list for searching purposes, or if one is more acceptable (don't want to flood). I have two (hopefully) straightforward questions: 1. Is it possible to expose the unique ID of a document to a DIH query? The reason I want to do this is because I use the unique ID of the row in the table as the unique ID of the Lucene document, but I've noticed that the counts of documents doesn't match the count in the table; I'd like to add these rows and was hoping to avoid writing a custom SolrJ app to do it. 2. Is there any limit to the number of conditions in a Boolean search? We're working on a new project where the user can choose either, for example, Ford Vehicles, in which case I can simply search for Ford, but if the user chooses specific makes and models, then I have to say something like Crown Vic OR Focus OR Taurus OR F-150, etc., where they could theoretically choose every model of Ford ever made except one. This could lead to a *very* large query, and was worried both that it was even possible, but also the impact on performance. Thanks, and I apologize if this really should be two separate messages. Ron
Re: OOM errors and -XX:OnOutOfMemoryError flag not working on solr?
: Usually any good piece of java code refrains from capturing Throwable : so that Errors will bubble up unlike exceptions. Having said that, even if some piece of code catches an OutOfMemoryError, the JVM should have already called the -XX:OnOutOfMemoryError hook - although from what i can tell, the JVM will only call the hook on the *first* OOM thrown (you can try the code below to test this behavior in your own JVM) : I'm trying to be notified when the error occurs. I saw with the jvm I can : pass the -XX:OnOutOfMemoryError= flag and pass a script to run. Every time : the out of memory issue occurs though my script never runs. Does solr let ...exactly what JVM are you running? this option is specific to the Sun/Oracle JVM. For example, in the IBM JVM, there is a completely different mechanism... http://stackoverflow.com/questions/3467219/is-there-something-like-xxonerror-or-xxonoutofmemoryerror-in-ibm-jvm

-- Simple OnOutOfMemoryError hook test --

import static java.lang.System.out;
import java.util.ArrayList;

public final class Test {
  public static void main(String... args) throws Exception {
    ArrayList<Object> data = new ArrayList<Object>(1000);
    for (int i = 0; i < 5; i++) {
      try {
        while (i < 5) {
          data.add(new ArrayList<Integer>(10));
        }
      } catch (OutOfMemoryError oom) {
        data.clear();
        out.println("caught");
      }
    }
  }
}

-- example of running it --

hossman@bester:~/tmp$ java -version
java version "1.6.0_24"
Java(TM) SE Runtime Environment (build 1.6.0_24-b07)
Java HotSpot(TM) 64-Bit Server VM (build 19.1-b02, mixed mode)
hossman@bester:~/tmp$ java -XX:OnOutOfMemoryError="echo HOOK" -Xmx64M Test
#
# java.lang.OutOfMemoryError: Java heap space
# -XX:OnOutOfMemoryError="echo HOOK"
#   Executing /bin/sh -c "echo HOOK"...
HOOK
caught
caught
caught
caught
caught
hossman@bester:~/tmp$

-Hoss
Re: Example setting TieredMergePolicy for Solr 3.3 or 3.4?
On 9/21/2011 3:10 PM, Chris Hostetter wrote: : With no mergeFactor defined, maxMergeAtOnce and segmentsPerTier seem to be : ignored. I've got both set to 35, but Solr is merging every 10 segments. I ... : Here's the relevant config pieces. These two sections are in separate files : incorporated into solrconfig.xml using xinclude: : : <indexDefaults> ... do you have a <mainIndex> section with <mergeFactor> defined there? The mergeFactor section is in my config, but it's commented out. I left out the commented sections when I included it before. It doesn't appear anywhere else. Here's the full config snippet with comments:

<indexDefaults>
  <useCompoundFile>false</useCompoundFile>
  <!-- <mergeFactor>35</mergeFactor> -->
  <mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler">
    <int name="maxMergeCount">4</int>
    <int name="maxThreadCount">4</int>
  </mergeScheduler>
  <!-- <termIndexInterval>64</termIndexInterval> -->
  <ramBufferSizeMB>96</ramBufferSizeMB>
  <maxFieldLength>32768</maxFieldLength>
  <writeLockTimeout>1000</writeLockTimeout>
  <commitLockTimeout>1</commitLockTimeout>
  <lockType>native</lockType>
</indexDefaults>

Here's the mainIndex section:

<mainIndex>
  <unlockOnStartup>true</unlockOnStartup>
  <reopenReaders>true</reopenReaders>
  <deletionPolicy class="solr.SolrDeletionPolicy">
    <str name="maxCommitsToKeep">1</str>
    <str name="maxOptimizedCommitsToKeep">0</str>
  </deletionPolicy>
  <infoStream file="INFOSTREAM.txt">false</infoStream>
</mainIndex>

Thanks, Shawn
SOLR error with custom FacetComponent
Hi All, I'm trying to write a custom SOLR facet component and I'm getting some errors when I deploy my code into the SOLR server. Can you please let me know what I'm doing wrong? I appreciate your help on this issue. Thanks. *Issue* I'm getting an error saying Error instantiating SearchComponent: My Custom Class is not a org.apache.solr.handler.component.SearchComponent. My custom class inherits from *FacetComponent*, which extends from *SearchComponent*. My custom class is defined as follows… I implemented the process method to meet our functionality. We have some default facets that have to be sent every time, irrespective of the Query request. /** * * @author ravibulusu */ public class MyFacetComponent extends FacetComponent { …. }
Re: Example setting TieredMergePolicy for Solr 3.3 or 3.4?
On 9/21/2011 11:18 AM, Shawn Heisey wrote: With no mergeFactor defined, maxMergeAtOnce and segmentsPerTier seem to be ignored. I've got both set to 35, but Solr is merging every 10 segments. I haven't tried explicitly setting mergeFactor yet to see if that will make the other settings override it, I'm letting the current import finish first. I have tried again with mergeFactor set to 8 and the other settings in mergePolicy remaining at 35. It merged after every 8th segment. This is on lucene_solr_3_4 checked out from SVN, with SOLR-1972 manually applied. Settings used this time:

<indexDefaults>
  <useCompoundFile>false</useCompoundFile>
  <mergeFactor>8</mergeFactor>
  <mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler">
    <int name="maxMergeCount">4</int>
    <int name="maxThreadCount">4</int>
  </mergeScheduler>
  <!-- <termIndexInterval>64</termIndexInterval> -->
  <ramBufferSizeMB>96</ramBufferSizeMB>
  <maxFieldLength>32768</maxFieldLength>
  <writeLockTimeout>1000</writeLockTimeout>
  <commitLockTimeout>1</commitLockTimeout>
  <lockType>native</lockType>
</indexDefaults>

<mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
  <int name="maxMergeAtOnce">35</int>
  <int name="segmentsPerTier">35</int>
  <int name="maxMergeAtOnceExplicit">105</int>
</mergePolicy>

If there's anything else you'd like me to do, please let me know and I'll get to it as soon as I can. Thanks, Shawn
Re: SOLR error with custom FacetComponent
Why create a custom facet component for this? Simply add lines like this to your request handler(s): <str name="facet.field">manu_exact</str> either in defaults or appends sections. Erik On Sep 21, 2011, at 14:00, Ravi Bulusu wrote: Hi All, I'm trying to write a custom SOLR facet component and I'm getting some errors when I deploy my code into the SOLR server. Can you please let me know what I'm doing wrong? I appreciate your help on this issue. Thanks. *Issue* I'm getting an error saying Error instantiating SearchComponent: My Custom Class is not a org.apache.solr.handler.component.SearchComponent. My custom class inherits from *FacetComponent*, which extends from *SearchComponent*. My custom class is defined as follows… I implemented the process method to meet our functionality. We have some default facets that have to be sent every time, irrespective of the Query request. /** * * @author ravibulusu */ public class MyFacetComponent extends FacetComponent { …. }
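Erik's suggestion in context -- a solrconfig.xml sketch (handler name and field names are illustrative):

```xml
<requestHandler name="/browse" class="solr.SearchHandler">
  <lst name="appends">
    <!-- these facets ride along on every request to this handler -->
    <str name="facet">true</str>
    <str name="facet.field">manu_exact</str>
    <str name="facet.field">cat</str>
  </lst>
</requestHandler>
```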
RE: Example setting TieredMergePolicy for Solr 3.3 or 3.4?
I think the problem is that the <mergePolicy> config needs to be inside of the <indexDefaults> config, rather than after it as you have. -Michael
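In other words, a sketch of the corrected placement (values taken from Shawn's mail):

```xml
<indexDefaults>
  <!-- ...other indexDefaults settings... -->
  <mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
    <int name="maxMergeAtOnce">35</int>
    <int name="segmentsPerTier">35</int>
    <int name="maxMergeAtOnceExplicit">105</int>
  </mergePolicy>
</indexDefaults>
```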
Re: OOM errors and -XX:OnOutOfMemoryError flag not working on solr?
I am running the sun version: java version "1.6.0_26" Java(TM) SE Runtime Environment (build 1.6.0_26-b03) Java HotSpot(TM) 64-Bit Server VM (build 20.1-b02, mixed mode) I get multiple Out of memory exceptions looking at my application and the solr logs, but my script doesn't get called the first time or other times, hence why I was thinking that maybe solr is doing something different. My script notifies me of the memory exception and then restarts the jvm. Running the script manually works fine. I'll try to do some more testing to see what exactly is going on. Jason On Wed, Sep 21, 2011 at 2:31 PM, Chris Hostetter hossman_luc...@fucit.org wrote: : Usually any good piece of java code refrains from capturing Throwable : so that Errors will bubble up unlike exceptions. Having said that, even if some piece of code catches an OutOfMemoryError, the JVM should have already called the -XX:OnOutOfMemoryError hook - although from what i can tell, the JVM will only call the hook on the *first* OOM thrown (you can try the code below to test this behavior in your own JVM) : I'm trying to be notified when the error occurs. I saw with the jvm I can : pass the -XX:OnOutOfMemoryError= flag and pass a script to run. Every time : the out of memory issue occurs though my script never runs. Does solr let ...exactly what JVM are you running? this option is specific to the Sun/Oracle JVM. For example, in the IBM JVM, there is a completely different mechanism... http://stackoverflow.com/questions/3467219/is-there-something-like-xxonerror-or-xxonoutofmemoryerror-in-ibm-jvm

-- Simple OnOutOfMemoryError hook test --

import static java.lang.System.out;
import java.util.ArrayList;

public final class Test {
  public static void main(String... args) throws Exception {
    ArrayList<Object> data = new ArrayList<Object>(1000);
    for (int i = 0; i < 5; i++) {
      try {
        while (i < 5) {
          data.add(new ArrayList<Integer>(10));
        }
      } catch (OutOfMemoryError oom) {
        data.clear();
        out.println("caught");
      }
    }
  }
}

-- example of running it --

hossman@bester:~/tmp$ java -version
java version "1.6.0_24"
Java(TM) SE Runtime Environment (build 1.6.0_24-b07)
Java HotSpot(TM) 64-Bit Server VM (build 19.1-b02, mixed mode)
hossman@bester:~/tmp$ java -XX:OnOutOfMemoryError="echo HOOK" -Xmx64M Test
#
# java.lang.OutOfMemoryError: Java heap space
# -XX:OnOutOfMemoryError="echo HOOK"
#   Executing /bin/sh -c "echo HOOK"...
HOOK
caught
caught
caught
caught
caught
hossman@bester:~/tmp$

-Hoss -- - sent from my mobile 6176064373
Re: Example setting TieredMergePolicy for Solr 3.3 or 3.4?
I wonder if config-file validation would be helpful here :) I posted a patch in SOLR-1758 once. -Mike On 9/21/2011 6:22 PM, Michael Ryan wrote: I think the problem is that the mergePolicy config needs to be inside of the indexDefaults config, rather than after it as you have. -Michael
RE: NRT and commit behavior
Okay, but is there any threshold -- index size, total docs in the index, or size of physical memory -- at which sharding should be considered? I am trying to find the winning combination. Tirthankar -Original Message- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: Friday, September 16, 2011 7:46 AM To: solr-user@lucene.apache.org Subject: Re: NRT and commit behavior Uhm, you're putting a lot of index into not very much memory. I really think you're going to have to shard your index across several machines to get past this problem. Simply increasing the size of your caches is still limited by the physical memory you're working with. You really have to put a profiler on the system to see what's going on. At that size there are too many things that it *could* be to definitively answer it with e-mails Best Erick On Wed, Sep 14, 2011 at 7:35 AM, Tirthankar Chatterjee tchatter...@commvault.com wrote: Erick, Also, in our solrconfig we have tried increasing the caches; setting the autowarmCount below to 0 helps the commit call return within the second, but that will slow us down on searches:

<filterCache class="solr.FastLRUCache" size="16384" initialSize="4096" autowarmCount="4096"/>

<!-- Cache used to hold field values that are quickly accessible by document id. The fieldValueCache is created by default even if not configured here.
<fieldValueCache class="solr.FastLRUCache" size="512" autowarmCount="128" showItems="32"/>
-->

<!-- queryResultCache caches results of searches - ordered lists of document ids (DocList) based on a query, a sort, and the range of documents requested. -->
<queryResultCache class="solr.LRUCache" size="16384" initialSize="4096" autowarmCount="4096"/>

<!-- documentCache caches Lucene Document objects (the stored fields for each document). Since Lucene internal document ids are transient, this cache will not be autowarmed. -->
<documentCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="512"/>

-Original Message- From: Tirthankar Chatterjee [mailto:tchatter...@commvault.com] Sent: Wednesday, September 14, 2011 7:31 AM To: solr-user@lucene.apache.org Subject: RE: NRT and commit behavior Erick, Here are the answers to your questions: Our index is 267 GB. We are not optimizing... No, we have not profiled yet to check the bottleneck, but logs indicate opening the searchers is taking time... Nothing except SOLR. Total memory is 16GB; tomcat has 8GB allocated. Everything is 64 bit: OS, JVM and Tomcat. -Original Message- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: Sunday, September 11, 2011 11:37 AM To: solr-user@lucene.apache.org Subject: Re: NRT and commit behavior Hmm, OK. You might want to look at the non-cached filter query stuff, it's quite recent. The point here is that it is a filter that is applied only after all of the less expensive filter queries are run. One of its uses is exactly ACL calculations. Rather than calculate the ACL for the entire doc set, it only calculates access for docs that have made it past all the other elements of the query See SOLR-2429 and note that it is a 3.4 (currently being released) only. As to why your commits are taking so long, I have no idea given that you really haven't given us much to work with. How big is your index? Are you optimizing? Have you profiled the application to see what the bottleneck is (I/O, CPU, etc.)? What else is running on your machine? It's quite surprising that it takes that long. How much memory are you giving the JVM? etc... You might want to review: http://wiki.apache.org/solr/UsingMailingLists Best Erick On Fri, Sep 9, 2011 at 9:41 AM, Tirthankar Chatterjee tchatter...@commvault.com wrote: Erick, What you said is correct for us: the searches are based on some Active Directory permissions which are populated in the filter query parameter. 
So we don't have any warming query concept, as we cannot fire one for every user ahead of time. What we do here is that when a user logs in we do an invalid query (which returns no results instead of '*') with the correct filter query (which is his permissions based on the login). This way the cache gets warmed up with valid docs. It works then. Also, can you please let me know why commit is taking 45 minutes to 1 hour on well-resourced hardware with multiple processors, 16GB RAM, 64-bit VM, etc.? We tried passing waitSearcher as false and found that inside the code it is hard-coded to true. Is there any specific reason? Can we change that value to honor what is being passed? Thanks, Tirthankar -Original Message- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: Thursday, September 01, 2011 8:38 AM To:
Re: Two unrelated questions
For *1* I have faced similar issues, and have realized that it has got more to do with the data I am trying to index. In some cases when I run even a full-import with DIH, unless it's a flat table that I am trying to index, there are often issues at the data end when I try to do joins and then index the data. I am not too sure if you are joining two tables. If not, I would suggest that you re-check your data and then re-index using full-import. -- View this message in context: http://lucene.472066.n3.nabble.com/Two-unrelated-questions-tp3348991p3357720.html Sent from the Solr - User mailing list archive at Nabble.com.