how to save a snapshot of an index?
When I add some docs with post.jar (org.apache.solr.util.SimplePostTool), it commits after all docs are added. That calls IndexWriter.commit(), a new segment is added, and sometimes this triggers a segment merge. New index files are generated (.fnm, .tii, .tis, ...), and the old segment files are deleted once all references to them are closed (i.e. all readers that had them open). That's fine. But I want to back up a version of my index so that when something goes wrong I can restore it. I could write a script to back up all the files in the index directory every day, but if it runs while indexing is in progress it may copy an inconsistent set of files, so it would have to honor the *.lock file to do this safely. Is there any built-in tool in Solr for this? I just want to back up the index periodically (say, at midnight every day).
RE: Database connections during data import
Hi Gora, I am also indexing 4M+ records from an MS-SQL database - the index size is about 25GB. I managed to solve both the performance and the recovery issue by segmenting the indexing process and using the CachedSqlEntityProcessor. Basically, I populate a temp table with a subset of primary keys (I use a modulus of the productId to achieve this) and inner join against that table in both the primary query and all the child queries. As a result, when a segment fails (usually also due to connectivity being interrupted), only that one segment has to be re-done. Imports are managed by a custom-built service running on the Solr box. It's smart enough to pick up stalled imports when polling /dataimport and restart that segment. With the indexing segmented, data sets become small enough for CachedSqlEntityProcessor to load everything into RAM (the box has 8GB). Doing this reduced indexing time from 27 hours to 2.5 hours! (Due to currency changes we need a full re-index every day.) I suspect that latency kills import speed whenever there are child queries involved. Databases are also generally much better at one query returning 300,000 rows than at 100,000 queries returning 2-4 rows each. The 4GB (actually 3.2GB) limit only applies to the 32-bit version of Windows/SQL Server. That being said, SQL Server is not much of a RAM hog: beyond its basic querying needs, memory is only used to cache indexes and query plans. SQL Server is pretty happy with 4GB, but if you can upgrade the OS, another 2GB for the disk cache will help a lot. Regards, Willem PS: Are you using the jTDS driver? (http://jtds.sourceforge.net/) I find it faster and more stable than the MS one.

-Original Message- From: Gora Mohanty [mailto:g...@srijan.in] Sent: 10 July 2010 03:31 PM To: solr-user@lucene.apache.org Subject: Database connections during data import

Hi, We are indexing a large amount of data into Solr from an MS-SQL database (don't ask!). There are approximately 4 million records, and a total database size on the order of 20GB. There is also a need for incremental updates, but these are only a few percent of the total. After some trial and error, things are working great. Indexing is a little slower than our original expectations, but this is probably to be expected, given that: * There are a fair number of queries per record indexed into Solr. * Only one database server is in use at the moment, and this could well be a bottleneck (please see below). * The index has many fields, and we are also storing everything in this phase, so that we can recover data directly from the Solr index. * Transformers are used pretty liberally. * Finally, we are no longer so concerned about the indexing speed of a single Solr instance, as thanks to the possibility of merging indexes we can simply throw more hardware at the problem. (Incidentally, a big thank-you to everyone who has contributed to Solr. The above work was way easier than we had feared.) As a complete indexing run takes about 20h, the process sometimes gets interrupted by a loss of the database connection. I can tell that a loss of connection is the problem from the Solr Tomcat logs, but it is difficult to tell whether it is the database dropping connections (the database server is at 60-70% CPU utilisation, but close to being maxed out at 4GB, and I am told that MS-SQL/the OS cannot handle more RAM) or a network glitch. What happens is that the logs report a reconnection, but the number of processed records reported by the DataImportHandler at /solr/dataimport?command=full-import stops incrementing, even several hours after the reconnection.
Is there any way to recover from a reconnection, and continue DataImportHandler indexing at the point where the process left off? Regards, Gora P.S. Incidentally, would there be any interest in a GDataRequestHandler for Solr queries, and a GDataResponseWriter? We wrote one in the interests of trying to adhere to a de-facto standard, and can consider contributing these, after further testing, and cleanup.
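A minimal data-config.xml sketch of the segmented import Willem describes - all table, column, and connection names here are made up for illustration:

<dataConfig>
  <dataSource driver="net.sourceforge.jtds.jdbc.Driver"
              url="jdbc:jtds:sqlserver://dbhost/shop" user="solr" password="..."/>
  <document>
    <!-- segment_keys is populated beforehand with one slice of primary keys,
         e.g. all ids where productId modulo 16 equals the segment number -->
    <entity name="product"
            query="SELECT p.id, p.name FROM products p
                   INNER JOIN segment_keys k ON k.id = p.id">
      <!-- CachedSqlEntityProcessor runs the child query once, caches the whole
           result set in RAM, and joins in memory instead of querying per parent row -->
      <entity name="price" processor="CachedSqlEntityProcessor"
              query="SELECT c.productId, c.price FROM prices c
                     INNER JOIN segment_keys k ON k.id = c.productId"
              where="productId=product.id"/>
    </entity>
  </document>
</dataConfig>

Because each segment joins against its own key slice, both the parent and child result sets stay small enough to cache, and a failed segment can be re-run without touching the others.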
Re: Modifications to AbstractSubTypeFieldType
On 7 Jul 2010, at 6:24 pm, Yonik Seeley wrote: On Wed, Jul 7, 2010 at 8:15 AM, Grant Ingersoll gsing...@apache.org wrote: 'Originally, I had intended that it was just for one field sub-type, thinking that if we ever wanted multiple sub-types, a new, separate class would be needed.' Right - this was my original thinking too. AbstractSubTypeFieldType is only a convenience class to create compound types... people can do it other ways. Just for clarification, does that mean my modifications won't be included? If so, can you let me know so that I can extract the changes and maintain them in a different package structure from the main Solr code, please? Cheers, Mark -- The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336.
Re: how to save a snapshot of an index?
Under the bin directory there are some scripts for exactly that purpose: http://wiki.apache.org/solr/CollectionDistribution
Re: how to save a snapshot of an index?
Hi Li Li, If the changes are not that frequent, just copy the data folder: http://wiki.apache.org/solr/SolrOperationsTools Or see this question and answer: http://stackoverflow.com/questions/3083314/solr-incremental-backup-on-real-time-system-with-heavy-index where these direct links could help: http://wiki.apache.org/solr/CollectionDistribution (Solr < 1.4) http://wiki.apache.org/solr/SolrReplication (Solr >= 1.4) Regards, Peter.
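With Solr 1.4's built-in ReplicationHandler, a backup can also be triggered over HTTP once the handler is enabled; a minimal sketch (host, port, and core layout are assumptions):

<!-- solrconfig.xml on the master -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="replicateAfter">commit</str>
  </lst>
</requestHandler>

A nightly cron job can then call http://localhost:8983/solr/replication?command=backup which snapshots the current index commit point, so there is no need to worry about the lock file or in-flight indexing.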
Re: Field Collapsing SOLR-236
Hi Moazzam, I finally got it working. Thanks a ton, guys :) Regards, Raakhi

On Sat, Jul 10, 2010 at 10:45 AM, Moazzam Khan moazz...@gmail.com wrote: Hi Rakhi, Sorry, I didn't see this email until just now. Did you get it working? If not, here are some things that might help:
- Download the patch first.
- Check the date on which the patch was released.
- Download the version of the trunk that existed at that date.
- Apply the patch using the patch program in Linux. (There is a Windows program for patching but I can't remember it right now.)
- After applying the patch, just compile the whole thing.
It might be better if you used the example folder first and modified the config to work for multicore (at least that's what I did). You can compile the example by running 'ant example' (if I remember correctly). For config stuff refer to this link: http://wiki.apache.org/solr/FieldCollapsing HTH :) - Moazzam

On Wed, Jun 23, 2010 at 7:23 AM, Rakhi Khatwani rkhatw...@gmail.com wrote: Hi, But there are almost no settings in my config. Here's a snapshot of what I have in my solrconfig.xml:

<config>
  <updateHandler class="solr.DirectUpdateHandler2" />
  <requestDispatcher handleSelect="true">
    <requestParsers enableRemoteStreaming="false" multipartUploadLimitInKB="2048" />
  </requestDispatcher>
  <requestHandler name="standard" class="solr.StandardRequestHandler" default="true" />
  <requestHandler name="/update" class="solr.XmlUpdateRequestHandler" />
  <requestHandler name="/admin/" class="org.apache.solr.handler.admin.AdminHandlers" />
  <!-- config for the admin interface -->
  <admin>
    <defaultQuery>*:*</defaultQuery>
  </admin>
  <!-- config for field collapsing -->
  <searchComponent name="query" class="org.apache.solr.handler.component.CollapseComponent" />
</config>

Am I going wrong anywhere? Regards, Raakhi

On Wed, Jun 23, 2010 at 3:28 PM, Govind Kanshi govind.kan...@gmail.com wrote: 'fieldType: analyzer without class or tokenizer & filter list' seems to point to the config - you may want to correct it.

On Wed, Jun 23, 2010 at 3:09 PM, Rakhi Khatwani rkhatw...@gmail.com wrote: Hi, I checked out modules and lucene from the trunk and performed a build using the following commands: ant clean, ant compile, ant example - which compiled successfully.
I then put my existing index (using schema.xml from solr1.4.0/conf/solr/) in the multicore folder, configured solr.xml, and started the server. When I type in http://localhost:8983/solr I get the following error:

org.apache.solr.common.SolrException: Plugin init failure for [schema.xml] fieldType: analyzer without class or tokenizer & filter list
    at org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:168)
    at org.apache.solr.schema.IndexSchema.readSchema(IndexSchema.java:480)
    at org.apache.solr.schema.IndexSchema.<init>(IndexSchema.java:122)
    at org.apache.solr.core.CoreContainer.create(CoreContainer.java:429)
    at org.apache.solr.core.CoreContainer.load(CoreContainer.java:286)
    at org.apache.solr.core.CoreContainer.load(CoreContainer.java:198)
    at org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:123)
    at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:86)
    at org.mortbay.jetty.servlet.FilterHolder.doStart(FilterHolder.java:97)
    at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
    at org.mortbay.jetty.servlet.ServletHandler.initialize(ServletHandler.java:662)
    at org.mortbay.jetty.servlet.Context.startContext(Context.java:140)
    at org.mortbay.jetty.webapp.WebAppContext.startContext(WebAppContext.java:1250)
    at org.mortbay.jetty.handler.ContextHandler.doStart(ContextHandler.java:517)
    at org.mortbay.jetty.webapp.WebAppContext.doStart(WebAppContext.java:467)
    at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
    at org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:152)
    at org.mortbay.jetty.handler.ContextHandlerCollection.doStart(ContextHandlerCollection.java:156)
    at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
    at org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:152)
    at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
    at org.mortbay.jetty.handler.HandlerWrapper.doStart(HandlerWrapper.java:130)
    at org.mortbay.jetty.Server.doStart(Server.java:224)
    at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
    at org.mortbay.xml.XmlConfiguration.main(XmlConfiguration.java:985)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at ...
Re: Sort by Day - Use of DateMathParser in Function Query?
Hi Hoss, '...somewhere you got confused, or misunderstood something. There is no default date field in Solr, there are only recommendations and examples provided in the example schema.xml -- in Solr 1.4.1 *and* in Solr 1.4 the recommended field for dealing with dates is solr.TrieDateField.' The idea of a default date type came while reading this on http://wiki.apache.org/solr/FunctionQuery: 'Arguments may be numerically indexed date fields such as TrieDate (the default in 1.4), or date math (examples in SolrQuerySyntax) based on a constant date or NOW.' And now that I have revisited that sentence, I see that it answers my question on whether I can use date math in those queries. Sorry for not reading more thoroughly... 'As noted in the FunctionQuery wiki page you mentioned, the ms() function does not work with solr.DateField. (Most likely your schema.xml originally started from the example in Solr 1.3 or earlier... *OR*... you needed the sortMissingLast/sortMissingFirst functionality that DateField supports but TrieDateField does not. The 1.4 example schema.xml explains the differences.)' Actually, right now I don't need sortMissingLast because the date is required for all documents. It is good that you mention it, though; I will keep it in mind when considering changing a field to TrieDate. Thanks! Chantal
Re: Filter multivalue fields from search result
Hi, So if those are separate documents, how should I handle paging? Two separate queries - first to return all matching course-event pairs, and a second one to get the courses for a given page? Is this common design described in detail somewhere? Thanks, Alex

On 2010-07-09 01:50, Lance Norskog wrote: Yes, denormalizing the index into separate (name, town) pairs is the common design for this problem.

2010/7/8 Alex J. G. Burzyński mailing-s...@ajgb.net: Hi, Is it possible to remove from the search results the multivalued fields that don't pass the search criteria? My schema is defined as:

<!-- course_id -->
<field name="id" type="string" indexed="true" stored="true" required="true" />
<!-- course_name -->
<field name="name" type="string" indexed="true" stored="true"/>
<!-- events.event_town -->
<field name="town" type="string" indexed="true" stored="true" multiValued="true"/>
<!-- events.event_date -->
<field name="date" type="tdate" indexed="true" stored="true" multiValued="true"/>

And example docs are:

+----+----------------------+------------+------------+
| id | name                 | town       | date       |
+----+----------------------+------------+------------+
| 1  | Microsoft Excel      | London     | 2010-08-20 |
|    |                      | Glasgow    | 2010-08-24 |
|    |                      | Leeds      | 2010-08-28 |
| 2  | Microsoft Word       | Aberdeen   | 2010-08-21 |
|    |                      | Reading    | 2010-08-25 |
|    |                      | London     | 2010-08-29 |
| 3  | Microsoft Powerpoint | Birmingham | 2010-08-22 |
|    |                      | Leeds      | 2010-08-26 |
+----+----------------------+------------+------------+

so the query q=name:Microsoft town:Leeds returns docs 1 and 3. How would I remove London/Glasgow from doc 1 and Birmingham from doc 3? Or should I create a separate doc for each name-event pair? Thanks, Alex
Two analyzers per field
Is it possible to specify two analyzers per field? For example, consider a field F1 (keyword analyzer) = 'cheers mate' and a field F2 (keyword analyzer) = 'hello world'. There is also a copy field TEXT (standard analyzer) which will store the terms {cheers, mate, hello, world}. Now, when a user performs any search we look only at the copy field TEXT, which uses the standard analyzer. Suppose the user searches for the phrase 'hello world': it will not return the result I want, as the 'hello' and 'world' terms are tokenized separately. Is it possible to also index 'hello world' as-is into the TEXT field? I.e., can I use both the keyword analyzer and the standard analyzer for the field TEXT? What would be the better approach to handle this situation? -- Nipen Mark
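A common way to get both behaviours is to copy the same source fields into two differently analyzed targets and query both; a sketch in schema.xml terms (field and type names here are made up):

<field name="text" type="text" indexed="true" stored="false" multiValued="true"/>
<field name="text_exact" type="text_keyword" indexed="true" stored="false" multiValued="true"/>
<copyField source="F1" dest="text"/>
<copyField source="F2" dest="text"/>
<copyField source="F1" dest="text_exact"/>
<copyField source="F2" dest="text_exact"/>

<fieldType name="text_keyword" class="solr.TextField">
  <analyzer>
    <!-- keep each value as a single token so "hello world" is indexed verbatim -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

With dismax, a qf of 'text text_exact^2' would then boost documents where the whole value matches verbatim while still allowing tokenized matches.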
fq= more then one ?
Hello, I am trying to create a new search for mails, and I have a problem. If I search: http://172.20.1.33:8983/solr/select/?q=*:*&start=0&fq=EMAIL_HEADER_FROM:t...@mail.de it works - I only get e-mails from t...@mail.de. But I need something like this: http://172.20.1.33:8983/solr/select/?q=*:*&start=0&fq=EMAIL_HEADER_FROM:t...@mail.de&fq=EMAIL_HEADER_TO:t...@mail.de But that doesn't work; it looks like I can only use one fq parameter. Maybe you can help me. King
Re: fq= more then one ?
Hi Jörg, the filter queries are intersective: you can specify as many as you want, but every document that does not match all of them will be excluded from your result. You can specify an OR clause in a single filter query to achieve what you want: fq=(EMAIL_HEADER_FROM:t...@mail.de OR EMAIL_HEADER_TO:t...@mail.de) Cheers, Chantal
Re: fq= more then one ?
hi, you shouldn't have two fq parameters -- some Solr params work like that, but fq doesn't: http://172.20.1.33:8983/solr/select/?q=*:*&start=0&fq=EMAIL_HEADER_FROM:t...@mail.de&fq=EMAIL_HEADER_TO:t...@mail.de You need to combine them into a single param, i.e. try putting it as an OR or AND if you're using the standard request handler: fq=EMAIL_HEADER_FROM:t...@mail.de%20OR%20EMAIL_HEADER_TO:t...@mail.de or something like '+' if you're using dismax (i think, but i don't use it :) ). hope that helps, bec :)
Re: fq= more then one ?
oops - i thought you couldn't put more than one fq - ignore my answer then :)
Re: Filter multivalue fields from search result
Hi Alex, I think you have to explain the complete use case. Paging is done by specifying the parameter start (and rows, if you want more or fewer than 10 hits per page). Each page of course needs a new query, but the queries differ only in the value of start (first page start=0, second page start=10, etc., if rows=10); the other parameters remain the same. You should also have a look at facets. They might help you get a list of the values of your multi-valued fields that you can display in the UI, allowing the user to drill down into the results further. Chantal
Re: fq= more then one ?
OK... thanks... It works if I try it directly, but in PHP it doesn't:

Warning: file_get_contents(http://... OR EMAIL_HEADER_TO:t...@mail.de ...): failed to open stream: HTTP request failed! HTTP/1.1 400 Bad Request in /var/www/new/msearchres.php on line 32

Code:

$url = 'http://172.20.1.33:8983/solr/select?wt=phps'
     . '&q=' . urlencode($q)
     . '&sort=' . urlencode($sort) . '%20' . $direction
     // note: the fq value below (spaces, parentheses, '@') is not urlencode()d,
     // which is a likely cause of the 400 Bad Request
     . '&fq=(EMAIL_HEADER_FROM:' . $wo . ' OR EMAIL_HEADER_TO:' . $wo . ')';
if (isset($_GET['s'])) $url .= '&start=' . $_GET['s'];
$serializedResult = file_get_contents($url);
$results = unserialize($serializedResult);
Ranking position in solr
I wonder whether there is a proper way to fulfill this requirement. A book has several keyphrases, each consisting of one to three words. An author can either buy a keyphrase position or not buy one. Note: each author could buy more than one keyphrase. The keyphrase search must be exact and case-sensitive. For example: Book A, keyphrases: agile, web, development. Book B, keyphrases: css, html, web. Say the author of Book A buys search result position 1 for the keyphrase 'web'; then his book should be in the first position, listed before Book B. Does anyone have suggestions on how to implement this in Solr? -- Chhorn Chamnap http://chamnapchhorn.blogspot.com/
Re: Filter multivalue fields from search result
Hi Chantal, The paging problem I asked about is that, with course-event pairs, specifying rows limits the number of pairs returned, not the number of courses:

+-------+----------------------+------------+------------+
| id-id | name                 | town       | date       |
+-------+----------------------+------------+------------+
| 1-1   | Microsoft Excel      | London     | 2010-08-20 |
| 1-2   | Microsoft Excel      | Glasgow    | 2010-08-24 |
| 1-3   | Microsoft Excel      | Leeds      | 2010-08-28 |
| 2-1   | Microsoft Word       | Aberdeen   | 2010-08-21 |
| 2-2   | Microsoft Word       | Reading    | 2010-08-25 |
| 2-3   | Microsoft Word       | London     | 2010-08-29 |
| 3-1   | Microsoft Powerpoint | Birmingham | 2010-08-22 |
| 3-2   | Microsoft Powerpoint | Leeds      | 2010-08-26 |
| 3-3   | Microsoft Powerpoint | Leeds      | 2010-08-30 |
+-------+----------------------+------------+------------+

From the UI point of view I'm returning fewer courses than events - that's why I asked about paging. The search for q=name:Microsoft town:Leeds with rows=2 should return 1-3, 3-2, and 3-3, but 3-3 will obviously be on page 2. I hope that makes my question clearer. Thanks, Alex
Query: URl too long
Hi, I need to perform a search using a list of values (about 2000). I'm using SolrNet's QueryInList function, which creates a search string like: fieldName:value1 OR fieldName:value2 OR fieldName:value3... (2000 values). This method created a string of about 100,000 chars, and the web request fails with 'URI too long' (C#). I'm trying to update an old Lucene app that performs this kind of search. How can I achieve this with Solr? What are my options here? Thank you, Frederico
Re: Query: URl too long
Hi Frederico, not sure about SolrNet, but changing the HTTP method from GET to POST worked for me (using SolrJ). Chantal
Re: Query: URl too long
Not sure about SolrNet, but you can use the POST method instead of GET, or configure the maxHttpHeaderSize setting of your servlet container. For Tomcat, see: http://wiki.apache.org/solr/SolrTomcat#Enabling_Longer_Query_Requests
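For reference, with SolrJ the switch to POST is a one-argument change; a minimal sketch (the URL, field name, and values are made up):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrRequest;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class PostQueryExample {
    public static void main(String[] args) throws Exception {
        CommonsHttpSolrServer server =
            new CommonsHttpSolrServer("http://localhost:8983/solr");
        // build a very long OR query, far beyond typical URI limits
        StringBuilder q = new StringBuilder();
        for (int i = 0; i < 2000; i++) {
            if (i > 0) q.append(" OR ");
            q.append("fieldName:value").append(i);
        }
        SolrQuery query = new SolrQuery(q.toString());
        // sending the parameters in the request body avoids URI length limits
        QueryResponse rsp = server.query(query, SolrRequest.METHOD.POST);
        System.out.println("hits: " + rsp.getResults().getNumFound());
    }
}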
Re: Ranking position in solr
http://wiki.apache.org/solr/QueryElevationComponent - which is used to elevate results based on editorial decisions - may help.
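A sketch of what the elevation config could look like for the 'web' example (the document id is made up; whether matching is exact and case-sensitive depends on the queryFieldType configured for the component in solrconfig.xml):

<!-- elevate.xml -->
<elevate>
  <query text="web">
    <!-- the author of Book A bought position 1 for "web" -->
    <doc id="bookA" />
  </query>
</elevate>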
Re: Query: URl too long
Hi there, We had a similar issue. It's an easy fix: simply change the request type from GET to POST. Jon
Re: Filter multivalue fields from search result
Hi Alex, If you want to list all available courses in one query and also display how often and where they take place, then query by name and facet on town per name. This might require the use of the facet.query parameter. Otherwise, use your query from above and group afterwards in the client or your server backend; of course, you should then increase the rows value. But I see your point with paging, so facetting might be the better option. Or maybe field collapsing is what you need (there is a patch - search for 'solr field collapsing' and you should find a lot about it). (I haven't tried that, however, and it's just a guess.) Chantal
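A concrete form of the facet suggestion, using the fields from this thread's schema (host and port are assumptions):

http://localhost:8983/solr/select?q=name:Microsoft+AND+town:Leeds&rows=0&facet=true&facet.field=name&facet.mincount=1

Here rows=0 suppresses the event-level hits; the facet counts give one entry per matching course name, which can drive course-level paging while a second query fetches the events for the courses on the current page.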
indexing with pdf files problem
hi all, I am working with Solr on Tomcat. Indexing works fine for XML files, but when I send doc, html, or pdf files through curl I get a 'lazy loading error'. Can you tell me what is wrong? I am working in Ubuntu; the Solr home is /opt/example and Tomcat is /opt/tomcat6. The output when I send a PDF file is as follows:

HTTP Status 500 - lazy loading error
org.apache.solr.common.SolrException: lazy loading error
    at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.getWrappedHandler(RequestHandlers.java:249)
    at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:231)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298)
    at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:852)
    at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:588)
    at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)
    at java.lang.Thread.run(Thread.java:619)
Caused by: org.apache.solr.common.SolrException: java.lang.NullPointerException
    at org.apache.solr.handler.extraction.ExtractingRequestHandler.inform(ExtractingRequestHandler.java:76)
    at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.getWrappedHandler(RequestHandlers.java:244)
    ... 16 more
Caused by: java.lang.NullPointerException
    at org.apache.tika.mime.MimeTypesFactory.create(MimeTypesFactory.java:73)
    at org.apache.tika.mime.MimeTypesFactory.create(MimeTypesFactory.java:90)
    at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:99)
    at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:84)
    at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:61)
    at org.apache.solr.handler.extraction.ExtractingRequestHandler.inform(ExtractingRequestHandler.java:74)
    ... 17 more
RE: Query: URl too long
Hi, A closer look shows that the problem is not in the request but in the creation of the URI object - the exception is thrown when accessing the URI object inside the URI builder. I'm trying to google it, but without luck so far...
RE: Query: URl too long
Yes, I guess I can't create a URI object that long. Does anyone remember other options? I'm thinking about options that avoid the HTTP request... My best bet is using Lucene again for querying but keeping Solr for indexing. Do you think this is a good approach?
Using stored terms for faceting
Hi, is it possible to use the stored terms of a field for a faceted search? I mean, I don't want the term frequency per document as shown here: http://wiki.apache.org/solr/TermVectorComponentExampleOptions I want the frequency of each term across my particular search result, to show only the 10 most frequent terms, plus all the nice things I can do with faceting. At the moment I calculate the terms for every document and index them into a separate multivalued field, where I can then easily apply faceting. But is there a better way? Regards, Peter.
RE: Query: URl too long
Yes, i guess i can't create an URI object that long. Can someone remember other options? You can shorten your string by not repeating OR and the field name. E.g. fieldName:value1 OR fieldName:value2 OR fieldName:value3... becomes q=value1 value2 value3&q.op=OR&df=fieldName By the way, how are you generating these value1, value2, etc.? If the above does not solve your problem, you can embed this logic into a custom SearchHandler.
Re: Query: URl too long
Frederico, You should also pose your question on the SolrNet forum: http://groups.google.com/group/solrnet?hl=en Switching from GET to POST isn't a Solr issue, but a SolrNet issue.
Problem during indexing
I am trying to add 20 million documents to my index from another index that contains these documents (can't help this architecture... it's something that I have to follow). Now the problems I am facing are the following: 1) A 'Too many open files' error... it occurs at the code which is adding documents to my index:

IndexWriter w = new IndexWriter(index, analyzer, IndexWriter.MaxFieldLength.UNLIMITED);
w.setMergeFactor(1000);
w.setMaxBufferedDocs(1000);
w.setMaxMergeDocs(60);
for (int i = 0; i < reader1.numDocs(); i++) {
    System.out.println(i);
    addDoc(w,
           reader1.document(i).getField("url").stringValue(),
           reader1.document(i).getField("content").stringValue().replace('.', ' ').replace('-', ' '));
}
w.optimize();
w.close();
reader1.close();

Due to the merge parameters I have set, around 1300 .cfs files are now open in the index, but there is only one .fdt file. These files seem to be the reason for this error. Are these files not closed? Do I have to call IndexWriter.commit() in the loop to close these open files?
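For what it's worth, a sketch of the mitigations usually suggested for this (untested here; the Lucene 2.9-era API is assumed, matching the snippet above): keep the merge factor modest, keep the compound file format on, and commit periodically so file handles on merged-away segments can be released.

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.Directory;

public class CopyIndex {
    // copies docs from src into dest, committing periodically so references
    // to segments that have been merged away can be dropped
    static void copy(Directory dest, Analyzer analyzer, IndexReader src) throws Exception {
        IndexWriter w = new IndexWriter(dest, analyzer, IndexWriter.MaxFieldLength.UNLIMITED);
        w.setMergeFactor(10);        // high merge factors multiply open segment files
        w.setUseCompoundFile(true);  // one .cfs per segment instead of many per-segment files
        w.setRAMBufferSizeMB(64);    // flush by RAM usage rather than by document count
        for (int i = 0; i < src.maxDoc(); i++) {
            if (src.isDeleted(i)) continue;
            w.addDocument(src.document(i));
            if (i > 0 && i % 100000 == 0) w.commit();
        }
        w.optimize();
        w.close();
    }
}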
RE: Query: URl too long
Not an option, because the query has other fields to query as well. The values are generated from a list of choices (which could grow to 5000 strings of 7 chars each...). I don't know if this could be considered off-topic (please advise...), but: I'm doing some tests with Lucene (Lucene.Net 2.9.2) and the results with date range queries are not similar (0 hits with Lucene, 900 with Solr). Does Lucene support date range queries? Thank you for your help.
ShingleFilter failing with more words than indexed phrase
I am using Solr 1.4.1 (Lucene 2.9.3) on Windows and am trying to understand ShingleFilter. I find that if I provide more words than the actual phrase indexed in the field, then the search on that field fails (no score found with debugQuery=true). Here is an example to reproduce, with field names: Id: 1, title_1: Nina Simone, title_2: I put a spell on you. Query (dismax) with: "Nina Simone I put" - fails, i.e. no score shown from the title_1 search (using debugQuery). "Nina Simone" - succeeds. I checked the index with Luke and it showed the correct terms. I used Solr's Field Analysis with the 'shingle' field and tried "Nina Simone I put", and it succeeds, as I would expect as correct behavior. It's only during the query that no score is produced. I also checked 'parsedquery' and it shows the DisjunctionMaxQuery issuing the string "Nina_Simone Simone_I I_put" to the title_1 field. The title_1 and title_2 fields are of type 'shingle', defined as:

<fieldType name="shingle" class="solr.TextField" positionIncrementGap="100" indexed="true" stored="true">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ShingleFilterFactory" maxShingleSize="2" outputUnigrams="false"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ShingleFilterFactory" maxShingleSize="2" outputUnigrams="false"/>
  </analyzer>
</fieldType>

Note that I also have a catchall field which is text. I have qf set to 'id catchall' and pf set to 'title_1 title_2'. Am I missing something in my expectations, or is there a bug somewhere? -Ethan
Re: Query: URl too long
Frederico, This is indeed a SolrNet issue. You can switch to POST in queries by implementing an ISolrConnection decorator: in the Get() method you'd build a POST request instead of the standard GET. Please use the SolrNet forum for further questions about SolrNet. Cheers, Mauricio
Copy Date Field and/or Using DateMathParser in DataImportHandler
Hi and back again, I want to create a copy of my date field that holds only the date with no time (i.e. time = 0:00h). The question is: do I have to create the new date (without time) in my own transformer (using a Calendar object), or is there some convenient way to use the DateMathParser at indexing time when using DataImportHandler? I checked out https://issues.apache.org/jira/browse/SOLR-469 which looks like the original Jira issue tracking the DataImportHandler development, and http://wiki.apache.org/solr/DataImportHandler#DateFormatTransformer and the thread '[solr-user] Sorting dates with reduced precision': http://search.lucidimagination.com/search/document/f2313ffae081bf79/sorting_dates_with_reduced_precision#46566037750d7b5 In the latter, it's said: 'Append /DAY to the date value you index, for example 1995-12-31T23:59:59Z/DAY will yield 1995-12-31 [...]' - 'Thanks, this happens at indexing time?' - 'Yes.' Well, I tried the most simple idea that came to my mind: <copyField source="start_date/DAY" dest="start_day" /> but this does not work. Certainly - the slash is not a reserved character, and Solr expects a field called start_date/DAY in this case. Is it possible to use the DateMathParser syntax to create that new field from the existing date field or the sourcing date string? In the Jira issue listed above I found this: 'A new interface called Evaluator has been added which makes it possible to plug in new expression evaluators (for resolving variable names). Using the same Evaluator interface, a few new evaluators have been added: formatDate - use as ${dataimporter.functions.formatDate('NOW', 'yyyy-MM-dd HH:mm')}; this will format NOW as per the given format and return a string which can be used in queries or URLs. It supports the full DateMathParser syntax. You can also format fields, e.g. ${dataimporter.functions.formatDate(A.purchase_date, 'dd-MM-yyyy')}.' This is from 2008 - is this still true for the current DataImportHandler? Just looking for the best method to solve this. Any insights very much appreciated! Thanks, Chantal
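If no built-in evaluator covers this, a custom DIH transformer is only a few lines. A sketch (the field names start_date/start_day are from the question; it assumes the source column arrives as a java.util.Date):

import java.util.Calendar;
import java.util.Date;
import java.util.Map;
import org.apache.solr.handler.dataimport.Context;
import org.apache.solr.handler.dataimport.Transformer;

public class DayTransformer extends Transformer {
    @Override
    public Object transformRow(Map<String, Object> row, Context context) {
        Object v = row.get("start_date");
        if (v instanceof Date) {
            Calendar cal = Calendar.getInstance();
            cal.setTime((Date) v);
            // zero out the time-of-day portion, keeping only the date
            cal.set(Calendar.HOUR_OF_DAY, 0);
            cal.set(Calendar.MINUTE, 0);
            cal.set(Calendar.SECOND, 0);
            cal.set(Calendar.MILLISECOND, 0);
            row.put("start_day", cal.getTime());
        }
        return row;
    }
}

It would be referenced from the entity via its fully qualified class name, e.g. transformer="com.example.DayTransformer" (a hypothetical package).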
RE: Query: URl too long
OK, I posted on the SolrNet forum asking how to reduce the URL length using the POST method. But I'm also giving SolrJ a try - it may be the right way to do this.
AW: Copy Date Field and/or Using DateMathParser in DataImportHandler
Hi Chantal, where is your Solr integrated? Where is this date coming from? I personally wouldn't use Solr for such conversions; it's nice that there are such built-in features, but sticking to Java/.NET or whatever generates your documents seems much more comfortable. In Java, for example, this is just two lines of code and you are done. cheers.
AW: Copy Date Field and/or Using DateMathParser in DataImportHandler
Hm, seems I didn't read the first part of your question :/ Forget what I just wrote. :)
CommonsHttpSolrServer add document hangs
Hey guys, I'm using Solr 1.4.1 and I've been having some problems lately with code that adds documents through a CommonsHttpSolrServer. It seems that the call to server.add() will randomly hang. I am currently running my code in a single thread, but I noticed this would happen in multi-threaded code as well. The commons-httpclient jar version is 3.1. I got a thread dump of the process, and one thread seems to be waiting on the org.apache.commons.httpclient.MultiThreadedHttpConnectionManager as shown below. All other threads are in a RUNNABLE state (besides the Finalizer daemon).

Full thread dump Java HotSpot(TM) 64-Bit Server VM (16.3-b01 mixed mode):

"MultiThreadedHttpConnectionManager cleanup" daemon prio=10 tid=0x7f441051c800 nid=0x527c in Object.wait() [0x7f4417e2f000]
   java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    - waiting on <0x7f443ae5b290> (a java.lang.ref.ReferenceQueue$Lock)
    at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:118)
    - locked <0x7f443ae5b290> (a java.lang.ref.ReferenceQueue$Lock)
    at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:134)
    at org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$ReferenceQueueThread.run(MultiThreadedHttpConnectionManager.java:1122)

Any ideas? Thanks.
RE: CommonsHttpSolrServer add document hangs
Maybe Solr is busy doing a commit or optimize?
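Not a diagnosis, but one defensive measure worth trying: CommonsHttpSolrServer exposes the underlying HttpClient timeouts, so a wedged connection fails with an exception instead of blocking add() indefinitely. A sketch (the URL and values are arbitrary):

import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

public class TimeoutExample {
    public static void main(String[] args) throws Exception {
        CommonsHttpSolrServer server =
            new CommonsHttpSolrServer("http://localhost:8983/solr");
        server.setConnectionTimeout(5000);  // ms to wait for a TCP connection
        server.setSoTimeout(60000);         // ms to wait for data on the socket
        server.setMaxTotalConnections(16);  // pool limits for multi-threaded use
        server.setDefaultMaxConnectionsPerHost(16);
        // add()/commit() calls will now throw instead of hanging forever
    }
}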
Re: Database connections during data import
On Sun, 11 Jul 2010 07:22:51 -0700 (PDT) osocurious2 ken.fos...@realestate.com wrote: Gora, Our environment, currently under development, is very nearly the exact same thing as yours. My DB is currently only about 10GB, but likely to grow. [...] Thanks for your response. It is good to hear from people dealing with similar issues. I'm still trying out different architectures to deal with this. I've tried doing a bulk copy from the DB to some flat files and importing from there. File handles seem to be more stable than database connections, but it brings its own issues to the party. Yes, we tried that too, but creating the XMLs turned out to be just as time-consuming. We ended up using multiple cores on several Solr instances; please see some further details in a separate response to Willem. I'm also currently looking at using queuing (either MSMQ or Amazon's Simple Queue Service) so the database piece isn't used for 20 hours, but gets its part over fairly quickly. I haven't done this using DataImportHandler however, not sure yet how, so I'm writing my own import manager. [...] We are considering using Amazon, but at this point I believe that we will have the indexing time down to our requirements through multiple cores on multiple Solr instances. The DataImportHandler docs are pretty good, but I will try to find the time to write up an example on using transformers, etc., which turned out to be a little tricky - or, at least, it took me some trial-and-error beyond the available documentation. As to the GData handler and response writer: I would be very interested in OData versions, which wouldn't be too much of a stretch from GData. Would you be moving in that direction later? Or if you put your contrib out there, could someone else (maybe me, if time allows) take it there? That would be a great addition to our work in a few months. Yes, we would be happy to do that, though I do need to look at how closely our solution meets the GData specifications. Also, at the moment we have only implemented the GET part, i.e., search results can only be retrieved through the GData interface. Good luck, and I'd love to keep in touch about your solutions; I'm sure I could get some great ideas from them for our own work. [...] Likewise, I am sure that we can learn much from you. Willem and you have already given me some ideas. We should maybe start getting use cases up on the Solr wiki, or at least on a blog somewhere. Regards, Gora
Re: Database connections during data import
On Mon, 12 Jul 2010 09:20:05 +0200 Willem Van Riet willem.vanr...@sa.24.com wrote: Hi Gora Also indexing 4mil+ records from a MS-SQL database - index size is about 25Gb. Thanks for some great pointers. More detailed responses below. I managed to solve both the performance and recovery issue by segmenting the indexing process along with the CachedSqlEntityProcessor. Basically I populate a temp table with a subset of primary keys (I use a modulus of the productId to achieve this) and inner join from that table on both the primary query and all the child queries. [...] Thanks for that pointer. I had read about the CachedSqlEntityProcessor, but my eyes must have been glazing over at that point. That sounds like a great possibility, especially your point on breaking up the data into chunks small enough to fit into physical RAM. We came up with something of a brute-force solution. We discovered that indexing on each of several cores on a single multi-core Solr instance was comparably fast to indexing on separate Solr instances. So, we have broken up our hardware into 15 cores on five Solr instances (three per instance seems to peg the CPU on each Solr server at ~80%), and two MS-SQL database servers, and seem to be down to about 6 hours for indexing (scaling almost exactly with the number of cores). Tomorrow, we plan to bring online another five Solr instances, and a third database server, in order to halve that time. Beyond that, we will probably go to something like Amazon. The 4GB (actually 3.2GB) limit only applies to the 32bit version of Windows/SQL Server. That being said SQL server is not much of a RAM hog. After its basic querying needs memory is only used to cache indexes and query plans. SQL is pretty happy with 4GB but if you can upgrade the OS another 2GB for the disk cache will help a lot. [...] Yes, it turns out that I was (somewhat) unwarrantedly bad-mouthing Microsoft. The database server stands up quite well in terms of CPU usage, though 3-4 Solr DIH instances hitting the DB seem to get up to the RAM limit almost at once. Unfortunately, upgrading the OS is not an option at the moment, but the database server is hardly the bottleneck now. PS: You are using the JTDS driver? (http://jtds.sourceforge.net/) I find it faster and more stable than the MS one. Oh, I saw that driver, but did not know that it was better than the MS one. Thanks for the tip. Regards, Gora
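Willem's custom import service isn't shown anywhere in the thread, but the trigger-and-poll loop he describes can be sketched with SolrJ against the DataImportHandler's HTTP interface. A minimal sketch only, assuming each segment lives on its own core with DIH mounted at /dataimport; the core URLs and the sleep interval are made up:

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.request.QueryRequest;
import org.apache.solr.common.params.ModifiableSolrParams;
import org.apache.solr.common.util.NamedList;

public class DihSegmentRunner {
    public static void main(String[] args) throws Exception {
        // Hypothetical core URLs -- one DIH import per segment/core.
        String[] coreUrls = { "http://localhost:8983/solr/core0",
                              "http://localhost:8983/solr/core1" };
        for (String url : coreUrls) {
            SolrServer server = new CommonsHttpSolrServer(url);

            // Kick off the import for this segment.
            ModifiableSolrParams start = new ModifiableSolrParams();
            start.set("command", "full-import");
            QueryRequest startReq = new QueryRequest(start);
            startReq.setPath("/dataimport");
            server.request(startReq);

            // Poll until DIH reports it is idle again; a stalled segment
            // could be detected here and restarted.
            ModifiableSolrParams poll = new ModifiableSolrParams();
            poll.set("command", "status");
            QueryRequest statusReq = new QueryRequest(poll);
            statusReq.setPath("/dataimport");
            String status;
            do {
                Thread.sleep(10000);
                NamedList<Object> rsp = server.request(statusReq);
                status = (String) rsp.get("status"); // "busy" while importing
            } while ("busy".equals(status));
        }
    }
}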
/select handler statistics
Hi All, I am looking at the stats.jsp page in the SOLR admin panel. I do not see statistics for the /select request handler. I want to know total # of search requests + avg time of request ... etc Am I overlooking something? Kind regards, Vladimir Sutskever Investment Bank - Technology JPMorgan Chase, Inc.
Indexing large amount of data
I have a large amount of data (120 GB) to be indexed, so I want to improve the performance of indexing it. I went through the documentation on the Lucene website, which mentions various ways performance can be improved. I am working on Debian Linux with amd64, so very large file sizes are supported; the Java version is 1.6. I tried many of the points mentioned in that documentation but got unusual results. 1) Reuse Field and Document objects to reduce GC overhead, using the Field.setValue() method. By doing this, instead of speeding up, the indexing speed reduced drastically. I know this is unusual, but that's what happened. 2) Tuning parameters via setMergeFactor() and setMaxBufferedDocs(). The default value for both is 10. I increased the value to 1000; by doing so the number of .cfs (compound) files in the index folder increased many fold, and I got java.io.IOException: Too many open files. If I choose the default value of 10 for both parameters then this error is avoided, but then the size of the .fdt file in the index becomes really high. So where am I going wrong? How can I overcome these problems and speed up my indexing process?
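For the "Too many open files" part specifically: a mergeFactor of 1000 keeps on the order of a thousand segments live, and each non-compound segment is several files, so either keep the compound file format on, or leave the mergeFactor low and raise the RAM buffer instead of maxBufferedDocs. A minimal sketch against the Lucene 2.9-era API; the index path and buffer size are made up:

import java.io.File;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

public class TunedIndexer {
    public static void main(String[] args) throws Exception {
        FSDirectory dir = FSDirectory.open(new File("/data/index")); // hypothetical path
        IndexWriter writer = new IndexWriter(dir,
                new StandardAnalyzer(Version.LUCENE_29), true,
                IndexWriter.MaxFieldLength.UNLIMITED);

        writer.setUseCompoundFile(true);  // one .cfs per segment: far fewer file handles
        writer.setRAMBufferSizeMB(64);    // flush by RAM used, not by document count
        writer.setMergeFactor(10);        // 1000 keeps thousands of files open at once

        // ... addDocument() loop goes here ...

        writer.close();
    }
}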
RE: /select handler statistics
Hi, I think you're looking for the statistics for the standard request handler. Cheers, -Original message- From: Vladimir Sutskever vladimir.sutske...@jpmorgan.com Sent: Mon 12-07-2010 19:44 To: solr-user@lucene.apache.org Subject: /select handler statistics [...]
RE: /select handler statistics
Yup that did it. Thank you Markus -Original Message- From: Markus Jelsma [mailto:markus.jel...@buyways.nl] Sent: Monday, July 12, 2010 2:30 PM To: solr-user@lucene.apache.org Subject: RE: /select handler statistics [...]
Re: CommonsHttpSolrServer add document hangs
Thanks Robert, My script did start going again, but it was waiting for about half an hour, which seems a bit excessive to me. Is there some tuning I can do on the Solr end to optimize for my use case, which is very heavy on commits and very light on searches (I do most of my searches on the raw Lucene index in the background)? Thanks. On Mon, Jul 12, 2010 at 12:06 PM, Robert Petersen rober...@buy.com wrote: Maybe solr is busy doing a commit or optimize? [...]
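One client-side mitigation, if the hang really is the HTTP request waiting on a busy server: put timeouts on CommonsHttpSolrServer so add() fails fast instead of blocking indefinitely, and batch documents so each round trip carries more work. A hedged sketch; the URL, batch size, and timeout values are arbitrary:

import java.util.ArrayList;
import java.util.List;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class BatchingClient {
    public static void main(String[] args) throws Exception {
        CommonsHttpSolrServer server =
                new CommonsHttpSolrServer("http://localhost:8983/solr");
        server.setConnectionTimeout(5000); // ms to establish the connection
        server.setSoTimeout(60000);        // ms to wait on a response before failing

        // Batch adds instead of one HTTP request per document.
        List<SolrInputDocument> batch = new ArrayList<SolrInputDocument>();
        for (int i = 0; i < 1000; i++) {
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", "doc" + i);
            batch.add(doc);
        }
        server.add(batch);
        // Let autoCommit in solrconfig.xml do the committing rather than
        // calling commit() after every add.
    }
}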
RE: CommonsHttpSolrServer add document hangs
You could try a master/slave setup using replication, perhaps: the slave serves searches, and indexing commits on the master won't hang up searches, at least. Here is the description: http://wiki.apache.org/solr/SolrReplication -Original Message- From: Max Lynch [mailto:ihas...@gmail.com] Sent: Monday, July 12, 2010 11:57 AM To: solr-user@lucene.apache.org Subject: Re: CommonsHttpSolrServer add document hangs [...]
Problem with Wildcard searches in Solr
Hi, I am having a problem doing wildcard searches in Lucene syntax using the edismax handler. I have a Solr 4.0 nightly build from the trunk. A general search like 'computer' returns results but 'com*er' doesn't return any results. Similarly, a search like 'co?mput?r' returns no results. The only type of wildcard search currently working is one with a trailing wildcard (like compute? or comput*). I want to be able to do searches with wildcards at the beginning (*puter) and in the middle (com*er). Could someone please tell me what I am doing wrong and how to fix it? Thanks. Regards, Imran.
How to find first document for the ALL search
I have found that this search crashes: /solr/select?q=*%3A*&fq=&start=0&rows=1&fl=id
SEVERE: java.lang.IndexOutOfBoundsException: Index: 114, Size: 90
at java.util.ArrayList.RangeCheck(ArrayList.java:547)
at java.util.ArrayList.get(ArrayList.java:322)
at org.apache.lucene.index.FieldInfos.fieldInfo(FieldInfos.java:288)
at org.apache.lucene.index.FieldsReader.doc(FieldsReader.java:217)
at org.apache.lucene.index.SegmentReader.document(SegmentReader.java:948)
at org.apache.lucene.index.DirectoryReader.document(DirectoryReader.java:506)
at org.apache.solr.search.SolrIndexReader.document(SolrIndexReader.java:259)
but this one works: /solr/select?q=*%3A*&fq=&start=1&rows=1&fl=id It looks like just that first document is bad. I am happy to delete it - but not sure how to get to it. Does anyone know how to find it? - Ian
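If the document is genuinely unreadable (the FieldsReader trace suggests index corruption rather than bad data), Lucene's CheckIndex tool can identify the broken segment and optionally drop it. A sketch against the Lucene 2.9-era API, run with Solr shut down; note fixIndex() deletes every document in the damaged segment, so back up the index first:

import java.io.File;
import org.apache.lucene.index.CheckIndex;
import org.apache.lucene.store.FSDirectory;

public class IndexDoctor {
    public static void main(String[] args) throws Exception {
        FSDirectory dir = FSDirectory.open(new File("/path/to/index"));
        CheckIndex checker = new CheckIndex(dir);
        checker.setInfoStream(System.out);        // print per-segment diagnostics
        CheckIndex.Status status = checker.checkIndex();
        if (!status.clean) {
            // Drops the unreadable segment(s) -- their documents are lost.
            checker.fixIndex(status);
        }
    }
}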
range faceting with integers
So I want to provide some range facets with an integer Solr field (probably tint, that is, a trie field with a non-zero precisionStep). It's clear enough how to do this, along the lines of facet.query=[1 TO 100]&facet.query=[101 TO 200]&facet.query=[201 TO 300] etc. The issue is that I'd like to calculate N equal ranges based on the min and max values found in the field. I can't think of any way to do this that doesn't require two queries -- one to get the min and max (within the current search set), then calculate the ranges client-side (possibly making the boundaries 'nice' numbers instead of strictly equal ranges), then do another query with the calculated facet.queries set. Is there any other trick I'm missing here? If these were date values, you could possibly use facet.date.gap, although I'm not even sure that works without explicitly setting facet.date.start - I don't know whether leaving facet.date.start unset means the minimum value in the field or not. But I'm not dealing with dates here anyway, but with integers. So is there anything I'm missing, or should the client just do two queries? For that matter, is there an easy way to ask for the minimum and maximum values in a field, within a result set? Thanks for any advice, Jonathan
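On the last question: if you're on Solr 1.4+, the StatsComponent can return the min and max of a numeric field within the current result set, so the first of the two queries can at least be a cheap rows=0 request. A hedged SolrJ sketch; the field name price and N=4 are made up:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.FieldStatsInfo;
import org.apache.solr.client.solrj.response.QueryResponse;

public class RangeFacets {
    public static void main(String[] args) throws Exception {
        CommonsHttpSolrServer server =
                new CommonsHttpSolrServer("http://localhost:8983/solr");

        // Query 1: just the min/max via StatsComponent, no documents returned.
        SolrQuery statsQ = new SolrQuery("*:*");
        statsQ.setRows(0);
        statsQ.set("stats", "true");
        statsQ.set("stats.field", "price");
        QueryResponse rsp = server.query(statsQ);
        FieldStatsInfo info = rsp.getFieldStatsInfo().get("price");
        long min = ((Number) info.getMin()).longValue();
        long max = ((Number) info.getMax()).longValue();

        // Query 2: N equal ranges computed client-side.
        int n = 4;
        long step = Math.max(1, (max - min + 1) / n);
        SolrQuery facetQ = new SolrQuery("*:*");
        facetQ.setFacet(true);
        for (long lo = min; lo <= max; lo += step) {
            long hi = Math.min(lo + step - 1, max);
            facetQ.addFacetQuery("price:[" + lo + " TO " + hi + "]");
        }
        QueryResponse facets = server.query(facetQ);
        System.out.println(facets.getFacetQuery()); // range query -> count
    }
}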
RE: Problem with Wildcard searches in Solr
Hi, The DisMaxQParser does not support wildcards in its q parameter [1]. You must use the LuceneQParser instead. AFAIK, in DisMax, wildcards are part of the search query and may get filtered out in your query analyzer. [1]: http://wiki.apache.org/solr/DisMaxRequestHandler#q Cheers, -Original message- From: imranak imranak...@gmail.com Sent: Mon 12-07-2010 22:40 To: solr-user@lucene.apache.org Subject: Problem with Wildcard searches in Solr [...]
RE: Problem with Wildcard searches in Solr
Hi, Thanks for your response. The dismax query parser doesn't support it, but I heard the edismax parser supports all kinds of wildcards. I have been trying it out but without any luck. Could someone please help me with that? I'm unable to make leading and in-the-middle wildcard searches work. Thanks. Imran.
RE: Problem with Wildcard searches in Solr
Hi, Check edismax's JIRA page and its unresolved related issues [1]. AFAIK, it hasn't been committed yet. [1]: https://issues.apache.org/jira/browse/SOLR-1553 Cheers, -Original message- From: imranak imranak...@gmail.com Sent: Mon 12-07-2010 23:55 To: solr-user@lucene.apache.org Subject: RE: Problem with Wildcard searches in Solr [...]
Re: Problem with Wildcard searches in Solr
On Mon, Jul 12, 2010 at 4:39 PM, imranak imranak...@gmail.com wrote: A general search like 'computer' returns results but 'com*er' doesn't return any results. This is due to issues with wildcards and stemming. 'computer' is indexed and searched as 'comput', but it's not generally possible to stem wildcarded terms. So comp*er won't match (the term in the index is 'comput') but comp*r should. If wildcarding is important, use a field type without a stemmer. -Yonik http://www.lucidimagination.com
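To see the stemming concretely, you can run the Porter stemmer over a few terms by hand; a tiny sketch with the Lucene 2.9-era analysis API:

import java.io.StringReader;
import org.apache.lucene.analysis.PorterStemFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.WhitespaceTokenizer;
import org.apache.lucene.analysis.tokenattributes.TermAttribute;

public class StemDemo {
    public static void main(String[] args) throws Exception {
        TokenStream ts = new PorterStemFilter(new WhitespaceTokenizer(
                new StringReader("computer computers computing")));
        TermAttribute term = ts.addAttribute(TermAttribute.class);
        while (ts.incrementToken()) {
            // All three inputs collapse to the same indexed term: "comput"
            System.out.println(term.term());
        }
    }
}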
Re: Two analyzer per field
Could you handle this with the Dismax query handler? You could specify that the search boosts the keyword-analyzed field quite high if you want those documents to come up at the top of the list. If this doesn't help, could you elaborate on the use-case? In particular I'm wondering why you want to use the keyword analyzer in the first place. Best Erick On Mon, Jul 12, 2010 at 4:36 AM, Mark N nipen.m...@gmail.com wrote: Is it possible to specify two analyzers per field? For example, consider a field *F1* (keyword analyzer) = cheers mate and *F2* (keyword analyzer) = hello world. There is also a copy field *TEXT* (standard analyzer) which will store the terms { cheers mate hello world }. Now when a user performs any search we will be looking at the copy field TEXT only, which uses the standard analyzer. Suppose a user searches for the phrase hello world; it will not return any result as the hello and world terms are tokenized. Is it possible that I index hello world as-is into the *TEXT* field as well? I.e., can I use the keyword analyzer as well as the standard analyzer for the field TEXT? What would be the better approach to handle this situation? -- Nipen Mark
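Erick's suggestion might look like the following in SolrJ. The field names text_keyword (a keyword-analyzed copy) and text (standard analysis) are hypothetical stand-ins for the schema described above:

import org.apache.solr.client.solrj.SolrQuery;

public class DismaxBoost {
    public static SolrQuery build(String userInput) {
        SolrQuery q = new SolrQuery(userInput);  // e.g. "hello world"
        q.set("defType", "dismax");
        // Exact (keyword-analyzed) matches dominate; the tokenized field
        // still recalls partial matches.
        q.set("qf", "text_keyword^10 text");
        q.set("pf", "text^2");                   // phrase boost on the tokenized field
        return q;
    }
}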
Re: indexing with pdf files problem
You need to use the ExtractingRequestHandler to parse these kinds of files. solr/update only takes a fixed XML format and a custom binary format. http://wiki.apache.org/solr/ExtractingRequestHandler On Mon, Jul 12, 2010 at 3:57 AM, satya swaroop sswaro...@gmail.com wrote: hi all, I am working with Solr on Tomcat. The indexing is good for XML files, but when I send docs or HTML files or PDFs through curl I get a "lazy loading error". Can you tell me the way to fix it? I am working in Ubuntu; the Solr home is /opt/example and Tomcat is /opt/tomcat6. When I send a PDF file, Tomcat returns an "HTTP Status 500 - lazy loading error" page with this trace:
org.apache.solr.common.SolrException: lazy loading error
at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.getWrappedHandler(RequestHandlers.java:249)
at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:231)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298)
at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:852)
at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:588)
at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)
at java.lang.Thread.run(Thread.java:619)
Caused by: org.apache.solr.common.SolrException: java.lang.NullPointerException
at org.apache.solr.handler.extraction.ExtractingRequestHandler.inform(ExtractingRequestHandler.java:76)
at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.getWrappedHandler(RequestHandlers.java:244)
... 16 more
Caused by: java.lang.NullPointerException
at org.apache.tika.mime.MimeTypesFactory.create(MimeTypesFactory.java:73)
at org.apache.tika.mime.MimeTypesFactory.create(MimeTypesFactory.java:90)
at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:99)
at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:84)
at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:61)
at org.apache.solr.handler.extraction.ExtractingRequestHandler.inform(ExtractingRequestHandler.java:74)
... 17 more
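The NullPointerException in TikaConfig above usually means the extraction contrib and Tika jars aren't on Solr's classpath; once they are, a PDF can be posted from SolrJ roughly like this (a sketch against the SolrJ 1.4-era API; the file path and literal.id value are illustrative):

import java.io.File;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.request.ContentStreamUpdateRequest;

public class PdfIndexer {
    public static void main(String[] args) throws Exception {
        CommonsHttpSolrServer server =
                new CommonsHttpSolrServer("http://localhost:8983/solr");

        // Stream the file to the ExtractingRequestHandler endpoint.
        ContentStreamUpdateRequest req =
                new ContentStreamUpdateRequest("/update/extract");
        req.addFile(new File("/docs/manual.pdf"));  // hypothetical file
        req.setParam("literal.id", "manual.pdf");   // supply the unique key
        req.setParam("commit", "true");
        server.request(req);
    }
}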
Re: Supplementing already indexed data
There are two ways to interpret your mail: 1) You want to add the content to documents already in the index. It is not possible to update only some fields of a document; you have to delete and re-index the entire document. 2) You want to read a database record, use a file name from it, fetch that file, and index both the database fields and the file content in one document. The DataImportHandler would let you read fields from the database, use fields as file names, and load those files into other fields. This is an advanced use, but it might be covered on the DIH page: http://wiki.apache.org/solr/DataImportHandler Look for FileDataSource and FieldReaderDataSource. On Sun, Jul 11, 2010 at 6:37 PM, Tod listac...@gmail.com wrote: I'm getting metadata from an RDB but the actual content is stored somewhere else. I'd like to index the content too, but I don't want to overlay the already indexed metadata. I know this can be done but I just can't seem to dig up the correct docs; can anyone point me in the right direction? Thanks. -- Lance Norskog goks...@gmail.com