Re: Dynamically loading xml files from webapplication to index
You need to write a script using SolrJ or some other connector to parse your data file and post it to Solr for indexing. - Thanx: Grijesh www.gettinhahead.co.in -- View this message in context: http://lucene.472066.n3.nabble.com/Dynamically-loading-xml-files-from-webapplication-to-index-tp2865890p2873608.html Sent from the Solr - User mailing list archive at Nabble.com.
XS DateTime format
Hi, I just have a small question regarding the output format of fields of type TrieDateField. If a document containing the date 0001-01-01T01:01:01Z is passed to Solr and I then try to search for that document, the output of the date field is of the format Y-MM-DDThh:mm:ssZ; the first three zeros are missing. According to the XML Schema specification found on w3.org, the year in xs:dateTime is a four-or-more-digit, optionally negative-signed numeral. Is it intentional that Solr strips leading zeros from the first four digits? Thanks Jens Jørgen Flaaris
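The padding rule itself is easy to check outside Solr. A minimal sketch in plain Java (this is not Solr's own formatter, just an illustration of the four-or-more-digit-year rule from the spec): a "yyyy" pattern pads year 1 to "0001", which is what xs:dateTime requires.

```java
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.Date;
import java.util.GregorianCalendar;
import java.util.Locale;
import java.util.TimeZone;

public class YearPadding {
    // Format a Date as an XS dateTime string. In SimpleDateFormat the
    // letter count is the minimum digit count, so "yyyy" zero-pads the
    // year to at least four digits, as the XML Schema spec requires.
    public static String xsDateTime(Date d) {
        SimpleDateFormat f = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss'Z'", Locale.ROOT);
        f.setTimeZone(TimeZone.getTimeZone("UTC"));
        return f.format(d);
    }

    public static void main(String[] args) {
        GregorianCalendar cal = new GregorianCalendar(TimeZone.getTimeZone("UTC"));
        cal.clear();
        cal.set(1, Calendar.JANUARY, 1, 1, 1, 1); // year 1 AD
        System.out.println(xsDateTime(cal.getTime())); // 0001-01-01T01:01:01Z
    }
}
```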
fq parameter with partial value
Hello, I would like to know if there is a way to use the fq parameter with a partial value. For instance, if I have a request with fq=NAME:Joe, and I would like to retrieve all answers where NAME contains Joe, including those with NAME = Joe Smith. Thanks, Elisabeth
Re: fq parameter with partial value
Hi Elisabeth, that's not what FilterQueries are made for :) What speaks against using that criteria in the query? Perhaps you want to describe your use case and we'll see if there's another way to solve it? Regards Stefan
Spatial Search
Dear list :) I am new to Solr and am trying to use the spatial search feature which was added in 3.1. In my schema.xml I have two double fields for latitude and longitude. How can I get them into the location field type? I use SolrJ to fill the index with data. If I used a location field instead of two double fields, how could I fill it with SolrJ? I use annotations to link the data from my DTOs to the index fields... Hope you got my problem... best regards, Jonas
Re: fq parameter with partial value
Hi Stefan, Thanks for answering. In more detail, my problem is the following. I'm working on searching points of interest (POIs), which can be hotels, restaurants, plumbers, psychologists, etc. Those POIs can be identified, among other things, by categories or by brand, and a single POI might have several categories (no maximum number). A user might enter a query like McDonald's Paris or Restaurant Paris or many other possible queries. First I want to do a facet search on brand and categories, to find out which case is the current case:

http://localhost:8080/solr/select?q=restaurant paris&facet=true&facet.field=BRAND&facet.field=CATEGORY

and get an answer like

<lst name="facet_fields">
  <lst name="CATEGORY">
    <int name="Restaurant">598</int>
    <int name="Restaurant Hotel">451</int>
    ...

Then I want to send a request with fq=CATEGORY:Restaurant and still get answers with CATEGORY = Restaurant Hotel. One solution would be to modify the data to add a new document every time we have a new category, so a POI with three different categories would be indexed three times, each time with a different category. But I was wondering if there was another way around it. Thanks again, Elisabeth
how to update database record after indexing
Hello, I am using the DataImportHandler to import data from a SQL Server database. My requirement is: when Solr has finished indexing a particular database record, I want to update that record in the database; or, after indexing all records, if I can get all the ids I can update all the records. How can I achieve this? Thanks Vishal Parekh
manual background re-indexing
Hello list, I am planning to implement a setup, to be run with unix scripts, that performs a full pull-and-reindex on a background server and then deploys that index. All should happen on the same machine. I thought the replication methods would help me, but they seem to solve the issues of distribution, while what I need is only the ability to: - suspend the queries - swap the directories with the new index - close all searchers - reload and warm up the searcher on the new index. Is there a part of the replication utilities (http or unix) that I could use to perform the above tasks? I intend to do this only on occasion... maybe once a month or even less. Is reload the right term to use? paul
Re: Formatted date/time in long field and javabinRW exception
Any thoughts on this one? Why does Solr output a string in a long field with XMLResponseWriter but fail doing so (as it should) with the javabin format? On Tuesday 19 April 2011 10:52:33 Markus Jelsma wrote: Hi, Nutch 1.3-dev seems to have changed its tstamp field from a long to a properly formatted Solr-readable date/time, but the example Solr schema for Nutch still configures the tstamp field as a long. This results in a formatted date/time in a long field, which I think should not be allowed in the first place by Solr:

<long name="tstamp">2011-04-19T08:16:31.675Z</long>

While the above is strange enough, I only found out it's all wrong when using the javabin format. The following query will throw an exception, while using the XML response writer works fine and returns the tstamp as a long but formatted as a proper date/time.

javabin:

curl "http://localhost:8983/solr/select?fl=id,boost,tstamp,digest&start=0&q=id:\[*+TO+*\]&wt=javabin&rows=2&version=1"

Apr 19, 2011 10:34:50 AM org.apache.solr.request.BinaryResponseWriter$Resolver getDoc
WARNING: Error reading a field from document : SolrDocument[{digest=7ff92a31c58e43a34fd45bc6d87cda03}]
java.lang.NumberFormatException: For input string: 2011-04-19T08:16:31.675Z
at java.lang.NumberFormatException.forInputString(NumberFormatException.java:48)
at java.lang.Long.parseLong(Long.java:419)
at java.lang.Long.valueOf(Long.java:525)
at org.apache.solr.schema.LongField.toObject(LongField.java:82)
at org.apache.solr.schema.LongField.toObject(LongField.java:33)
at org.apache.solr.request.BinaryResponseWriter$Resolver.getDoc(BinaryResponseWriter.java:148)
at org.apache.solr.request.BinaryResponseWriter$Resolver.writeDocList(BinaryResponseWriter.java:124)
at org.apache.solr.request.BinaryResponseWriter$Resolver.resolve(BinaryResponseWriter.java:88)
at org.apache.solr.common.util.JavaBinCodec.writeVal(JavaBinCodec.java:143)
at org.apache.solr.common.util.JavaBinCodec.writeNamedList(JavaBinCodec.java:133)
at org.apache.solr.common.util.JavaBinCodec.writeKnownType(JavaBinCodec.java:221)
at org.apache.solr.common.util.JavaBinCodec.writeVal(JavaBinCodec.java:138)
at org.apache.solr.common.util.JavaBinCodec.marshal(JavaBinCodec.java:87)
at org.apache.solr.request.BinaryResponseWriter.write(BinaryResponseWriter.java:48)
at org.apache.solr.servlet.SolrDispatchFilter.writeResponse(SolrDispatchFilter.java:322)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:254)
[more trace from Jetty]

Here's wt=xml working fine and showing output for the tstamp field:

markus@midas:~$ curl "http://localhost:8983/solr/select?fl=id,boost,tstamp,digest&start=0&q=id:\[*+TO+*\]&wt=xml&rows=2&version=1"
<?xml version="1.0" encoding="UTF-8"?>
<response>
<responseHeader><status>0</status><QTime>17</QTime>
<lst name="params">
  <str name="fl">id,boost,tstamp,digest</str>
  <str name="start">0</str>
  <str name="q">id:[* TO *]</str>
  <str name="wt">xml</str>
  <str name="rows">2</str>
  <str name="version">1</str>
</lst></responseHeader>
<result name="response" numFound="2" start="0">
  <doc>
    <str name="digest">478e77f99f7005ae71aa92a879be2fd4</str>
    <str name="id">idfield</str>
    <long name="tstamp">2011-04-19T08:16:31.689Z</long>
  </doc>
  <doc>
    <str name="digest">7ff92a31c58e43a34fd45bc6d87cda03</str>
    <str name="id">idfield</str>
    <long name="tstamp">2011-04-19T08:16:31.675Z</long>
  </doc>
</result>
</response>

Cheers, -- Markus Jelsma - CTO - Openindex http://www.linkedin.com/in/markus17 050-8536620 / 06-50258350
Re: manual background re-indexing
Hi Paul, would a multi-core setup and the SWAP command do what you want it to do? http://wiki.apache.org/solr/CoreAdmin Shaun
Re: fq parameter with partial value
So, I assume your CATEGORY field is multiValued but each value is not broken up into tokens, right? If that's the case, would it work to have a second field CATEGORY_TOKENIZED and run your fq against that field instead? You could have this be a multiValued field with an increment gap if you wanted to prevent matches across separate entries, and have your fq do a proximity search where the proximity is less than the increment gap. Best Erick
Re: how to update database record after indexing
I don't think you can do this through DIH; you'll probably have to write a separate process that queries the Solr index and updates your table. You'll have to be a bit cautious that you coordinate the commits, that is, wait for the DIH to complete and commit before running your separate db update process. Best Erick
Re: Spatial Search
I've not used the annotation stuff in SolrJ, but since the value sent in must be of the form 10.3,20.4, I guess one would have to have a String field with this value on your object. -Yonik http://www.lucenerevolution.org -- Lucene/Solr User Conference, May 25-26, San Francisco
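A sketch of that suggestion: a SolrJ bean could expose a derived String property (the location field name on the schema side is up to you), and the helper below only shows building the comma-separated value that a location field expects.

```java
public class LatLon {
    // Build the "lat,lon" string Solr's location field type expects.
    // Double.toString is locale-independent, so the decimal separator
    // is always '.', unlike String.format with the JVM default locale.
    public static String toLocation(double lat, double lon) {
        return Double.toString(lat) + "," + Double.toString(lon);
    }

    public static void main(String[] args) {
        System.out.println(toLocation(10.3, 20.4)); // prints 10.3,20.4
    }
}
```

In a DTO, a getter returning `toLocation(latitude, longitude)` could carry the SolrJ @Field annotation instead of the two double fields.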
Re: manual background re-indexing
Just where do I put the new index data with such a command? Simply replacing the segment files appears dangerous to me. Also, what is the best practice to move from single-core to multi-core? My current set-up is single-core; do I simply need to add a solr.xml in my solr-home and one core1 directory with the data that was there previously? paul
Re: manual background re-indexing
It would probably be safest just to set up a separate system as multi-core from the start, get the process working, and then either use the new machine or copy the whole setup to the production machine. Best Erick
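For reference, a minimal sketch of what the multi-core layout could look like; the core names `live` and `rebuild` are made up here, and the CoreAdmin wiki page linked earlier in the thread is the authoritative syntax:

```xml
<!-- solr.xml in the solr home; each core gets its own instanceDir -->
<solr persistent="true">
  <cores adminPath="/admin/cores">
    <core name="live" instanceDir="live"/>
    <core name="rebuild" instanceDir="rebuild"/>
  </cores>
</solr>
```

After a full reindex into `rebuild`, a request like `http://localhost:8983/solr/admin/cores?action=SWAP&core=live&other=rebuild` exchanges the two cores, so queries move to the fresh index without replacing segment files in place.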
Re: fq parameter with partial value
yes, the multivalued field is not broken up into tokens. so, if I understand well what you mean, I could have a field CATEGORY with multiValued="true", a field CATEGORY_TOKENIZED with multiValued="true", and then some POI:

<field name="NAME">POI_Name</field>
...
<field name="CATEGORY">Restaurant Hotel</field>
<field name="CATEGORY_TOKENIZED">Restaurant</field>
<field name="CATEGORY_TOKENIZED">Hotel</field>

and do faceting on CATEGORY and fq on CATEGORY_TOKENIZED. But then, wouldn't it be possible to do faceting on CATEGORY_TOKENIZED? Best regards Elisabeth
RE: fq parameter with partial value
Yep, what you describe is what I do in similar situations; it works fine. It is certainly possible to facet on a tokenized field... but your individual facet values will be the _tokens_, not the complete values. And they'll be the post-analyzed tokens at that, which is rarely what you want. Thus the use of two fields: one tokenized and analyzed, one not tokenized and minimally analyzed (for instance, not stemmed).
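The two-field setup discussed in this thread could look roughly like this in schema.xml; the type names `string` and `text` are the usual example-schema types, assumed here:

```xml
<!-- untokenized: used for faceting, values stay whole -->
<field name="CATEGORY" type="string" indexed="true" stored="true" multiValued="true"/>
<!-- tokenized: used for fq matching on individual words -->
<field name="CATEGORY_TOKENIZED" type="text" indexed="true" stored="false" multiValued="true"/>
<!-- keep both fields fed from a single input value -->
<copyField source="CATEGORY" dest="CATEGORY_TOKENIZED"/>
```

With this in place, faceting runs on CATEGORY (whole values like "Restaurant Hotel") while fq=CATEGORY_TOKENIZED:Restaurant still matches it.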
boost fields which have value
Hi, How can I achieve that documents which don't have field1 and field2 filled in are returned at the end of the search result? I have tried with the *bf* parameter, which seems to work but only with one field. Is there any function query which I can use in the bf value to boost two fields? Thank you. Regards, Zoltan
Boost newer documents only if date is different from timestamp
I am trying to boost newer documents in Solr queries. The ms function (http://wiki.apache.org/solr/SolrRelevancyFAQ#How_can_I_boost_the_score_of_newer_documents) seems to be the right way to go, but I need to add an additional condition: I am using the Last-Modified date from crawled web pages as the date to consider, and that does not always provide a meaningful date. Therefore I would like the function to only boost documents where the date (not time) found in the Last-Modified header is different from the timestamp, eliminating results that just return the current date as the Last-Modified date. Suggestions are appreciated!
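For the plain recency part, the linked FAQ's standard approach is a reciprocal of document age; a sketch (the field name tstamp is an assumption):

```
bf=recip(ms(NOW,tstamp),3.16e-11,1,1)
```

Here 3.16e-11 is roughly 1/(milliseconds per year), so a year-old document gets about half the boost of a brand-new one. The extra only-when-date-differs condition probably isn't expressible with the stock 3.x functions; one option is to compute a boolean flag at index time (date differs from timestamp or not) and restrict the boost to documents where that flag is set.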
Searching for escaped characters
I'm trying to create a test to make sure that character sequences like &egrave; are successfully converted to their equivalent UTF character (that is, in this case, è). So I'd like to search my Solr index using the equivalent of the following regular expression, to find any escaped sequences that might have slipped through:

&\w{1,6};

Is this possible? I have indexed these fields with text_lu, which looks like this:

<fieldtype name="text_lu" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldtype>

Thanks, Paul
Re: Concatenate multivalued DIH fields
I solved this problem using the flatten=true attribute. Given this input document:

<people>
  <person>
    <names>
      <name>
        <firstName>Joe</firstName>
        <lastName>Smith</lastName>
      </name>
    </names>
  </person>
</people>

<field column="attr_names" xpath="/people/person/names/name" flatten="true"/>

attr_names is a multiValued field in my schema.xml. The flatten attribute tells Solr to take all the text from the specified node and below.
RE: boost fields which have value
I believe the sortMissingLast fieldType attribute is what you want: <fieldType ... sortMissingLast="true" ... /> http://wiki.apache.org/solr/SchemaXml
Re: Searching for escaped characters
StandardTokenizer will have stripped punctuation, I think. You might try searching for all the entity names though: (agrave | egrave | omacron | etc...). The names are pretty distinctive, although you might have problems with Greek letters. -Mike
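Since the tokenizer drops the '&' and ';' at index time, one option is to run the regex over the raw source text before or instead of searching the index. A small self-contained sketch in plain Java using the same &\w{1,6}; pattern from the question:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EntityScan {
    // Find HTML entity references such as "&egrave;" that survived
    // conversion. \w{1,6} matches the entity name, as in the regex
    // from the original question.
    private static final Pattern ENTITY = Pattern.compile("&(\\w{1,6});");

    public static List<String> findEntities(String text) {
        List<String> hits = new ArrayList<String>();
        Matcher m = ENTITY.matcher(text);
        while (m.find()) {
            hits.add(m.group(1)); // entity name without '&' and ';'
        }
        return hits;
    }

    public static void main(String[] args) {
        System.out.println(findEntities("caf&eacute; and d&egrave;j&agrave; vu"));
        // prints [eacute, egrave, agrave]
    }
}
```

This checks the stored source documents directly; inside a tokenized Solr index the delimiters are gone, which is why searching on the bare entity names was suggested above.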
Re: SolrQuery#setStart(Integer) ???
Hi Erick, Correct, I cut some zeros while reading the javadocs, thanks for the heads up! [ ]'s Leonardo da S. Souza °v° Linux user #375225 /(_)\ http://counter.li.org/ ^ ^ On Wed, Apr 27, 2011 at 8:13 PM, Erick Erickson erickerick...@gmail.com wrote: Well, the Java native int format is 32 bits, so unless you're returning over 2 billion documents you should be OK. But you'll run into other issues long before you get to that range. Best Erick On Wed, Apr 27, 2011 at 5:25 PM, Leonardo Souza leonardo...@gmail.com wrote: Hi Guys, We have an index with more than 3 million documents, and we use the pagination feature through the SolrQuery#setStart and SolrQuery#setRows methods. Some queries can return a huge amount of documents, and I'm worried about the integer parameter of the setStart method; shouldn't this parameter be a long? For now I'm considering using the ModifiableSolrParams class. Any suggestion is welcome! thanks! [ ]'s Leonardo Souza °v° Linux user #375225 /(_)\ http://counter.li.org/ ^ ^
Re: Replication Fails with Unreachable error when master host is responding.
Anybody? On 04/27/2011 01:51 PM, Jed Glazner wrote: Hello All, I'm having a very strange problem that I just can't figure out. The slave is not able to replicate from the master, even though the master is reachable from the slave machine. I can telnet to the port it's running on, and I can use text-based browsers to navigate the master from the slave. I just don't understand why it won't replicate. The admin screen gives me an Unreachable in the status, and in the log there is an exception thrown. Details below:

BACKGROUND:
OS: Arch Linux
Solr Version: svn revision 1096983 from https://svn.apache.org/repos/asf/lucene/dev/branches/branch_3x/
No custom plugins, just whatever came with the version above.
Java Setup: java version "1.6.0_22" OpenJDK Runtime Environment (IcedTea6 1.10) (ArchLinux-6.b22_1.10-1-x86_64) OpenJDK 64-Bit Server VM (build 19.0-b09, mixed mode)

We have 3 cores running, and none of the 3 cores is able to replicate. The admin on the slave shows the master as http://solr-master-01_dev.la.bo:8983/solr/music/replication - *Unreachable*

Replication def on the slave:

<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="${slave:slave}">
    <str name="masterUrl">http://solr-master-01_dev.la.bo:8983/solr/music/replication</str>
    <str name="pollInterval">00:15:00</str>
  </lst>
</requestHandler>

Replication def on the master:

<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="${master:master}">
    <str name="replicateAfter">commit</str>
    <str name="replicateAfter">startup</str>
    <str name="confFiles">schema.xml,stopwords.txt</str>
  </lst>
</requestHandler>

Below is the log, start to finish, for replication attempts. Note that it says connection refused; however, I can telnet to 8983 from the slave to the master, so I know it's up and reachable from the slave:

telnet solr-master-01_dev.la.bo 8983
Trying 172.12.65.58...
Connected to solr-master-01_dev.la.bo.
Escape character is '^]'.

I double-checked the master to make sure that it didn't have replication turned off, and it doesn't. So I should be able to replicate, but it can't. I just don't know what else to check. The log from the slave is below.

Apr 27, 2011 7:39:45 PM org.apache.solr.request.SolrQueryResponse init
WARNING: org.apache.solr.request.SolrQueryResponse is deprecated. Please use the corresponding class in org.apache.solr.response
Apr 27, 2011 7:39:45 PM org.apache.commons.httpclient.HttpMethodDirector executeWithRetry
INFO: I/O exception (java.net.ConnectException) caught when processing request: Connection refused
Apr 27, 2011 7:39:45 PM org.apache.commons.httpclient.HttpMethodDirector executeWithRetry
INFO: Retrying request
Apr 27, 2011 7:39:45 PM org.apache.commons.httpclient.HttpMethodDirector executeWithRetry
INFO: I/O exception (java.net.ConnectException) caught when processing request: Connection refused
Apr 27, 2011 7:39:45 PM org.apache.commons.httpclient.HttpMethodDirector executeWithRetry
INFO: Retrying request
Apr 27, 2011 7:39:45 PM org.apache.commons.httpclient.HttpMethodDirector executeWithRetry
INFO: I/O exception (java.net.ConnectException) caught when processing request: Connection refused
Apr 27, 2011 7:39:45 PM org.apache.commons.httpclient.HttpMethodDirector executeWithRetry
INFO: Retrying request
Apr 27, 2011 7:39:45 PM org.apache.solr.handler.ReplicationHandler getReplicationDetails
WARNING: Exception while invoking 'details' method for replication on master
java.net.ConnectException: Connection refused
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:327)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:193)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:180)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:384)
at java.net.Socket.connect(Socket.java:546)
at sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:616)
at org.apache.commons.httpclient.protocol.ReflectionSocketFactory.createSocket(ReflectionSocketFactory.java:140)
at org.apache.commons.httpclient.protocol.DefaultProtocolSocketFactory.createSocket(DefaultProtocolSocketFactory.java:125)
at org.apache.commons.httpclient.HttpConnection.open(HttpConnection.java:707)
at org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$HttpConnectionAdapter.open(MultiThreadedHttpConnectionManager.java:1361)
at org.apache.commons.httpclient.HttpMethodDirector.executeWithRetry(HttpMethodDirector.java:387)
at org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMethodDirector.java:171)
at
Re: Replication Fails with Unreachable error when master host is responding.
No clue. Try wireshark to gather more data?

On 04/28/2011 02:53 PM, Jed Glazner wrote:

Anybody?
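For anyone debugging a similar "Unreachable" status, the TCP-level check Jed did with telnet can be scripted in plain JDK code. This is only a sketch (host and port are placeholders): if the probe succeeds where replication still reports "Connection refused", the problem lies above TCP, e.g. proxy-related system properties (`http.proxyHost`, `socksProxyHost`) set on the slave JVM, or a masterUrl that resolves differently inside the servlet container.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class ReachProbe {
    // Returns true if a TCP connection to host:port succeeds within timeoutMs.
    // An unresolved hostname or a refused/timed-out connection yields false.
    static boolean canConnect(String host, int port, int timeoutMs) {
        try (Socket s = new Socket()) {
            s.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Placeholder host/port from the thread above; adjust for your setup.
        System.out.println(canConnect("solr-master-01_dev.la.bo", 8983, 2000));
    }
}
```

Running this inside the same JVM/user as the slave (rather than from a shell) matters, because JVM-level proxy settings would affect it the same way they affect the replication handler's HttpClient.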
Re: fq parameter with partial value
See below:

On Thu, Apr 28, 2011 at 9:03 AM, elisabeth benoit elisaelisael...@gmail.com wrote:

yes, the multivalued field is not broken up into tokens. so, if I understand well what you mean, I could have a field CATEGORY with multiValued=true, a field CATEGORY_TOKENIZED with multiValued=true, and then some POI:

<field name="NAME">POI_Name</field>
...
<field name="CATEGORY">Restaurant Hotel</field>
<field name="CATEGORY_TOKENIZED">Restaurant</field>
<field name="CATEGORY_TOKENIZED">Hotel</field>

[EOE] If the above is the document you're sending, then no. The document would be indexed with

<field name="CATEGORY">Restaurant Hotel</field>
<field name="CATEGORY_TOKENIZED">Restaurant Hotel</field>

Or even just:

<field name="CATEGORY">Restaurant Hotel</field>

and set up a copyField to copy the value from CATEGORY to CATEGORY_TOKENIZED.

The multiValued part comes from "And a single POI might have different categories", so your document could look like:

<field name="CATEGORY">Restaurant Hotel</field>
<field name="CATEGORY">Health Spa</field>
<field name="CATEGORY">Dance Hall</field>

and your document would be counted for each of those entries, while searches against CATEGORY_TOKENIZED would match things like dance, spa, etc. But do notice that if you did NOT want searching for restaurant hall (no quotes) to match, then you could do proximity searches for less than your increment gap, e.g. (this time with the quotes) "restaurant hall"~50, which would then NOT match if your increment gap were 100.

Best
Erick

do faceting on CATEGORY and fq on CATEGORY_TOKENIZED. But then, wouldn't it be possible to do faceting on CATEGORY_TOKENIZED?

Best regards,
Elisabeth

2011/4/28 Erick Erickson erickerick...@gmail.com:

So, I assume your CATEGORY field is multiValued but each value is not broken up into tokens, right? If that's the case, would it work to have a second field CATEGORY_TOKENIZED and run your fq against that field instead?
You could have this be a multiValued field with an increment gap if you wanted to prevent matches across separate entries, and have your fq do a proximity search where the proximity is less than the increment gap.

Best
Erick

On Thu, Apr 28, 2011 at 6:03 AM, elisabeth benoit elisaelisael...@gmail.com wrote:

Hi Stefan,

Thanks for answering. In more detail, my problem is the following. I'm working on searching points of interest (POIs), which can be hotels, restaurants, plumbers, psychologists, etc. Those POIs can be identified, among other things, by categories or by brand. And a single POI might have different categories (no maximum number). A user might enter a query like "McDonald's Paris" or "Restaurant Paris" or many other possible queries.

First I want to do a facet search on brand and categories, to find out which case is the current case:

http://localhost:8080/solr/select?q=restaurant paris&facet=true&facet.field=BRAND&facet.field=CATEGORY

and get an answer like

<lst name="facet_fields">
  <lst name="CATEGORY">
    <int name="Restaurant">598</int>
    <int name="Restaurant Hotel">451</int>

Then I want to send a request with fq=CATEGORY:Restaurant and still get answers with CATEGORY = Restaurant Hotel.

One solution would be to modify the data to add a new document every time we have a new category, so a POI with three different categories would be indexed three times, each time with a different category. But I was wondering if there was another way around it.

Thanks again,
Elisabeth

2011/4/28 Stefan Matheis matheis.ste...@googlemail.com:

Hi Elisabeth, that's not what FilterQueries are made for :) What against using that Criteria in the Query? Perhaps you want to describe your UseCase and we'll see if there's another way to solve it?

Regards
Stefan

On Thu, Apr 28, 2011 at 9:09 AM, elisabeth benoit elisaelisael...@gmail.com wrote:

Hello, I would like to know if there is a way to use the fq parameter with a partial value.
For instance, if I have a request with fq=NAME:Joe, and I would like to retrieve all answers where NAME contains Joe, including those with NAME = Joe Smith. Thanks, Elisabeth
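Erick's distinction between the untokenized CATEGORY field and a tokenized CATEGORY_TOKENIZED field can be illustrated outside Solr. The sketch below is plain Java with made-up helper names, not Solr's actual matching code; it only shows why fq=CATEGORY:Restaurant misses a stored value of "Restaurant Hotel" while a whitespace-tokenized field matches it:

```java
import java.util.Arrays;
import java.util.List;

public class CategoryMatch {
    // Like a filter query on an untokenized string field:
    // the whole stored value must equal the query term.
    static boolean exactMatch(List<String> values, String query) {
        return values.contains(query);
    }

    // Like a filter query on a whitespace-tokenized field:
    // the query must appear as a token inside any stored value.
    static boolean tokenMatch(List<String> values, String query) {
        for (String v : values) {
            if (Arrays.asList(v.split("\\s+")).contains(query)) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> categories = Arrays.asList("Restaurant Hotel", "Health Spa");
        System.out.println(exactMatch(categories, "Restaurant")); // false
        System.out.println(tokenMatch(categories, "Restaurant")); // true
    }
}
```

Faceting on the untokenized field then keeps whole values ("Restaurant Hotel": 451) while filtering on the tokenized copy matches individual words, which is exactly the two-field setup Erick describes.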
Re: Extra facet query from within a custom search component
Have you looked at: http://wiki.apache.org/solr/TermsComponent? Best Erick On Thu, Apr 28, 2011 at 2:44 PM, Frederik Kraus frederik.kr...@gmail.com wrote: Hi Guys, I'm currently working on a custom search component and need to fetch a list of all possible values within a certain field. An internal facet (wildcard) query first came to mind, but I'm not quite sure how to best create and then execute such a query ... What would be the best way to do this? Can anyone please point me in the right direction? Thanks, Fred.
Problem with autogeneratePhraseQueries=false
Hi, I'm new to Solr. My Solr instance version is:

Solr Specification Version: 3.1.0
Solr Implementation Version: 3.1.0 1085815 - grantingersoll - 2011-03-26 18:00:07
Lucene Specification Version: 3.1.0
Lucene Implementation Version: 3.1.0 1085809 - 2011-03-26 18:06:58
Current Time: Tue Apr 26 08:01:09 CEST 2011
Server Start Time: Tue Apr 26 07:59:05 CEST 2011

I have the following definition for the textgen type:

<fieldType name="textgen" class="solr.TextField" positionIncrementGap="100" autoGeneratePhraseQueries="false">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" preserveOriginal="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="15" side="front" preserveOriginal="1"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" preserveOriginal="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

I'm using this type for the name field in my index. As you can see I'm using autoGeneratePhraseQueries="false", but for the query sony vaio 4gb I'm getting the following query in debug:

<lst name="debug">
  <str name="rawquerystring">sony vaio 4gb</str>
  <str name="querystring">sony vaio 4gb</str>
  <str name="parsedquery">+name:sony +name:vaio +MultiPhraseQuery(name:"(4gb 4) gb")</str>
  <str name="parsedquery_toString">+name:sony +name:vaio +name:"(4gb 4) gb"</str>
</lst>

Do you have any idea how I can avoid this MultiPhraseQuery?

Best Regards,
solr_beginner
Re: Problem with autogeneratePhraseQueries
Thank you very much for the answer. You were right. There was no luceneMatchVersion in the solrconfig.xml of our dev core. We thought that values not present in the core configuration are copied from the main solrconfig.xml. I will investigate whether our administrators did something wrong during the upgrade to 3.1.

On Tue, Apr 26, 2011 at 1:35 PM, Robert Muir rcm...@gmail.com wrote:

What do you have in solrconfig.xml for luceneMatchVersion? If you don't set this, then it's going to default to Lucene 2.9 emulation so that old Solr 1.4 configs work the same way. I tried your example and it worked fine here, and I'm guessing this is probably what's happening. The default in the example/solrconfig.xml looks like this:

<!-- Controls what version of Lucene various components of Solr adhere to. Generally, you want to use the latest version to get all bug fixes and improvements. It is highly recommended that you fully re-index after changing this setting as it can affect both how text is indexed and queried. -->
<luceneMatchVersion>LUCENE_31</luceneMatchVersion>

On Tue, Apr 26, 2011 at 6:51 AM, Solr Beginner solr_begin...@onet.pl wrote:

Hi, I'm new to solr.
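A side note on where the MultiPhraseQuery's tokens come from: WordDelimiterFilterFactory splits "4gb" at the digit/letter boundary while preserveOriginal=1 keeps the original token, so several tokens occupy overlapping positions, which is what name:"(4gb 4) gb" reflects. A rough plain-Java illustration of that splitting (this is not the real filter's code, just a sketch of its token output for a simple mixed token):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DelimiterSketch {
    // Emits the original token (preserveOriginal=1) plus its digit/letter
    // sub-parts (generateWordParts=1, generateNumberParts=1), roughly what
    // WordDelimiterFilterFactory produces for a mixed token like "4gb".
    static List<String> split(String token) {
        List<String> out = new ArrayList<>();
        out.add(token); // keep the original
        List<String> parts = new ArrayList<>();
        Matcher m = Pattern.compile("[0-9]+|[a-zA-Z]+").matcher(token);
        while (m.find()) parts.add(m.group());
        if (parts.size() > 1) out.addAll(parts); // only split mixed tokens
        return out;
    }

    public static void main(String[] args) {
        System.out.println(split("4gb"));  // [4gb, 4, gb]
        System.out.println(split("sony")); // [sony]
    }
}
```

With the correct luceneMatchVersion the stacked tokens no longer get wrapped into a phrase query, but the splitting itself still happens at index and query time.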
Dynamically loading xml files from webapplication to index
In our webapp, we need to upload an XML data file from the UI (dialogue box) for indexing. We are not able to find the solution in the documentation. Please suggest a way to implement it. -- View this message in context: http://lucene.472066.n3.nabble.com/Dynamically-loading-xml-files-from-webapplication-to-index-tp2865890p2865890.html Sent from the Solr - User mailing list archive at Nabble.com.
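As Grijesh's reply in this thread suggests, one minimal option is for the webapp to POST the uploaded file's contents to Solr's /update handler itself (which is essentially what post.jar does). A hedged sketch using java.net.http (Java 11+); the Solr URL and file path are placeholders for your setup:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Files;
import java.nio.file.Path;

public class SolrXmlPoster {
    // Builds a POST of the uploaded XML (in Solr's <add><doc>... format) to the
    // /update handler, committing in the same request.
    // solrCoreUrl is e.g. "http://localhost:8983/solr" (assumption; adjust per core).
    static HttpRequest buildUpdateRequest(String solrCoreUrl, Path xmlFile) throws Exception {
        return HttpRequest.newBuilder(URI.create(solrCoreUrl + "/update?commit=true"))
                .header("Content-Type", "text/xml; charset=utf-8")
                .POST(HttpRequest.BodyPublishers.ofString(Files.readString(xmlFile)))
                .build();
    }

    public static void main(String[] args) throws Exception {
        HttpRequest req = buildUpdateRequest("http://localhost:8983/solr", Path.of(args[0]));
        HttpResponse<String> resp = HttpClient.newHttpClient()
                .send(req, HttpResponse.BodyHandlers.ofString());
        System.out.println(resp.statusCode()); // 200 on success
    }
}
```

Note this only works if the uploaded file is already in Solr's XML update format; arbitrary XML would first need transforming, or a SolrJ-based parser as suggested in the reply.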
Re: fieldCache only on stats page
Solr version:

Solr Specification Version: 3.1.0
Solr Implementation Version: 3.1.0 1085815 - grantingersoll - 2011-03-26 18:00:07
Lucene Specification Version: 3.1.0
Lucene Implementation Version: 3.1.0 1085809 - 2011-03-26 18:06:58
Current Time: Wed Apr 27 14:28:34 CEST 2011
Server Start Time: Wed Apr 27 11:07:00 CEST 2011

On the stats page I can see only the following cache information:

CACHE
name: fieldCache
class: org.apache.solr.search.SolrFieldCacheMBean
version: 1.0
description: Provides introspection of the Lucene FieldCache, this is **NOT** a cache that is managed by Solr.
sourceid: $Id: SolrFieldCacheMBean.java 984594 2010-08-11 21:42:04Z yonik $
source: $URL: https://svn.apache.org/repos/asf/lucene/dev/branches/lucene_solr_3_1/solr/src/java/org/apache/solr/search/SolrFieldCacheMBean.java $

name: fieldValueCache
class: org.apache.solr.search.FastLRUCache
version: 1.0
description: Concurrent LRU Cache(maxSize=1, initialSize=10, minSize=9000, acceptableSize=9500, cleanupThread=false)
sourceid: $Id: FastLRUCache.java 1065312 2011-01-30 16:08:25Z rmuir $
source: $URL: https://svn.apache.org/repos/asf/lucene/dev/branches/lucene_solr_3_1/solr/src/java/org/apache/solr/search/FastLRUCache.java $

Nothing about filterCache or documentCache ;/

Best Regards,
Solr Beginner

On Wed, Apr 27, 2011 at 2:00 PM, Erick Erickson erickerick...@gmail.com wrote:

There's nothing special you need to do to be able to view the various stats from admin/stats.jsp. If another look doesn't show them, could you post a screenshot? And please include the version of Solr you're using; I checked with 1.4.1.

Best
Erick

On Wed, Apr 27, 2011 at 1:44 AM, Solr Beginner solr_begin...@onet.pl wrote:

Hi, I can see only fieldCache (nothing about filter, query or document cache) on the stats page. What am I doing wrong? We have two servers with replication. There are two cores (prod, dev) on each server. Maybe I have to add something to the solrconfig.xml of the cores?

Best Regards,
Solr Beginner
Re: Extra facet query from within a custom search component
Haaa fantastic! Thanks a lot!

Fred.

On Thursday, 28 April 2011 at 22:21, Erick Erickson wrote:

Have you looked at: http://wiki.apache.org/solr/TermsComponent?

Best
Erick
Re: AlternateDistributedMLT.patch not working (SOLR-788)
On 2/23/2011 11:53 AM, Otis Gospodnetic wrote: Hi Isha, The patch is out of date. You need to look at the patch and rejection and update your local copy of the code to match the logic from the patch, if it's still applicable to the version of Solr source code you have. We have a need for distributed More Like This. We're gearing up for a deployment of 3.1, so a patch against 1.4.1 is not very useful for us. I've spent the last couple of days trying to rework both the original and the alternate patches on SOLR-788 to work against 3.1. I don't understand enough about the code to know how to fix it. I knew I had to change the value of PURPOSE_GET_MLT_RESULTS to 0x800 because of the conflict with PURPOSE_GET_TERMS, but the changes in MoreLikeThisComponent.java are beyond me. Thanks, Shawn
Re: Spatial Search
1) Create an extra String field on your bean as Yonik suggests, or
2) Write an UpdateRequestHandler which reads the doubles and creates the LatLon from that

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

On 28. apr. 2011, at 14.44, Yonik Seeley wrote:

On Thu, Apr 28, 2011 at 5:15 AM, Jonas Lanzendörfer jonas.lanzendoer...@affinitas.de wrote:

I am new to solr and try to use the spatial search feature which was added in 3.1. In my schema.xml I have 2 double fields for latitude and longitude. How can I get them into the location field type? I use solrj to fill the index with data. If I would use a location field instead of two double fields, how could I fill this with solrj? I use annotations to link the data from my dto's to the index fields...

I've not used the annotation stuff in SolrJ, but since the value sent in must be of the form "10.3,20.4", I guess one would have to have a String field with this value on your object.

-Yonik
http://www.lucenerevolution.org -- Lucene/Solr User Conference, May 25-26, San Francisco
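Following Yonik's option 1, the bean can carry an extra plain String member holding the combined "lat,lon" value, filled before the bean is sent. A sketch of just the formatting part (class and method names are made up for illustration):

```java
public class LatLonBean {
    // A Solr LatLonType field expects a single "lat,lon" string such as
    // "10.3,20.4". Double.toString always uses '.', so string concatenation
    // is locale-safe here.
    static String toLatLon(double lat, double lon) {
        return lat + "," + lon;
    }

    public static void main(String[] args) {
        System.out.println(toLatLon(10.3, 20.4)); // 10.3,20.4
    }
}
```

On the actual SolrJ bean this would be an extra annotated String member, e.g. @Field("location") String location = toLatLon(latitude, longitude), set before calling addBean() (assuming the schema names the combined field "location").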
Re: manual background re-indexing
It would probably be safest just to set up a separate system as multi-core from the start, get the process working, and then either use the new machine or copy the whole setup to the production machine.

On Thu, Apr 28, 2011 at 8:49 AM, Paul Libbrecht p...@hoplahup.net wrote:

Just where do I put the new index data with such a command? Simply replacing the segment files appears dangerous to me. Any idea where I should put the data directory before calling the reload command?

paul
Re: manual background re-indexing
You simply create two cores, one in solr/cores/core1 and another in solr/cores/core2. They each have a separate conf and data directory, and the index is in core#/data/index. Really, it's just introducing one more level. You can experiment just by configuring a core and copying your index to solr/cores/yourcore/data/index. After, of course, configuring solr.xml to understand cores.

Best
Erick

On Thu, Apr 28, 2011 at 7:27 PM, Paul Libbrecht p...@hoplahup.net wrote:

It would probably be safest just to set up a separate system as multi-core from the start, get the process working and then either use the new machine or copy the whole setup to the production machine.
Location of Solr Logs
Hi, I am a newbie to Solr. Can you please help me find where I can see the logs written by Solr? Is there any configuration required to see Solr's logs?

Thanks for your time and help,
Geeta
Can the Suggester be updated incrementally?
I'm interested in using Suggester (http://wiki.apache.org/solr/Suggester) for auto-complete on the field Document Title. Does Suggester (either FST, TST or Jaspell) support incremental updates? Say I want to add a new document title to the Suggester, or to change the weight of an existing document title, would I need to rebuild the entire tree for every update? Also, can the Suggester be sharded? If the size of the tree gets bigger than the RAM size, is it possible to shard the Suggester across multiple machines? Thanks Andy
Re: Can the Suggester be updated incrementally?
It's answered on the wiki site: "TSTLookup - ternary tree based representation, capable of immediate data structure updates."

The EdgeNGram technique is probably more widely adopted, though; e.g., it's closer to what Google has implemented: http://www.lucidimagination.com/blog/2009/09/08/auto-suggest-from-popular-queries-using-edgengrams/

On Thu, Apr 28, 2011 at 9:37 PM, Andy angelf...@yahoo.com wrote:

I'm interested in using Suggester (http://wiki.apache.org/solr/Suggester) for auto-complete on the field Document Title.
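The EdgeNGram approach in the linked article boils down to indexing every front prefix of a term between minGramSize and maxGramSize, so auto-complete becomes an ordinary term match against the prefix field. A small plain-Java illustration (not the Lucene filter itself):

```java
import java.util.ArrayList;
import java.util.List;

public class EdgeNGramSketch {
    // Front-edge n-grams of a term, from min to max characters long,
    // mirroring EdgeNGramFilterFactory with side="front".
    static List<String> edgeNGrams(String term, int min, int max) {
        List<String> grams = new ArrayList<>();
        for (int n = min; n <= Math.min(max, term.length()); n++) {
            grams.add(term.substring(0, n));
        }
        return grams;
    }

    public static void main(String[] args) {
        System.out.println(edgeNGrams("hotel", 1, 15)); // [h, ho, hot, hote, hotel]
    }
}
```

Because the grams live in the ordinary index, updates arrive with normal document adds/commits and the index shards like any other field, which addresses the incremental-update and sharding questions at the cost of a larger index than the in-memory Suggester trees.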
Re: Question on Batch process
Charles,

Maybe the question to ask is why you are committing at all? Do you need somebody to see index changes while you are indexing? If not, commit just at the end. And optimize if you won't touch the index for a while.

Otis
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/

- Original Message From: Charles Wardell charles.ward...@bcsolution.com
To: solr-user@lucene.apache.org
Sent: Wed, April 27, 2011 7:51:20 PM
Subject: Re: Question on Batch process

Thank you for your response. I did not make the StreamingUpdate application yet, but I did change the other settings that you mentioned. It gave me a huge boost in indexing speed. (I am still using post.sh but hope to change that soon.) One thing I noticed is that the indexing speed was incredibly fast last night, but today the commits are taking so long. Is this to be expected?

--
Best Regards,
Charles Wardell
Blue Chips Technology, Inc.
www.bcsolution.com

On Wednesday, April 27, 2011 at 6:15 PM, Otis Gospodnetic wrote:

Hi Charles,

Yes, the threads I was referring to are in the context of the client/indexer, so one of the params for StreamingUpdateSolrServer. post.sh/post.jar are just there because they are handy. Don't use them for production.

It's impossible to tell how long indexing of 100M documents may take. They could be very big or very small. You could perform very light or no analysis or heavy analysis. They could contain 1 or 100 fields. :)

Otis
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/

- Original Message From: Charles Wardell charles.ward...@bcsolution.com
To: solr-user@lucene.apache.org
Sent: Tue, April 26, 2011 8:01:28 PM
Subject: Re: Question on Batch process

Thank you Otis. Without trying to appear too stupid: when you refer to having the params match my # of CPU cores, are you talking about the # of threads I can spawn with the StreamingUpdateSolrServer object?
Up until now, I have been just using post.sh or post.jar. Are these capable of that, or do I need to write some code to collect a bunch of files into the buffer and send it off? Also, do you have a sense for how long it should take to index 100,000 files, or in my case 100,000,000 documents?

StreamingUpdateSolrServer:

public StreamingUpdateSolrServer(String solrServerUrl, int queueSize, int threadCount) throws MalformedURLException

Thanks again,
Charlie

--
Best Regards,
Charles Wardell
Blue Chips Technology, Inc.
www.bcsolution.com

On Tuesday, April 26, 2011 at 5:12 PM, Otis Gospodnetic wrote:

Charlie,

How's this:
* -Xmx2g
* ramBufferSizeMB 512
* mergeFactor 10 (default, but you could up it to 20 or 30 if ulimit -n allows)
* ignore/delete maxBufferedDocs - not used if you set ramBufferSizeMB
* use StreamingUpdateSolrServer (with params matching your number of CPU cores) or send batches of say 1000 docs with the other SolrServer impl using N threads (N = # of your CPU cores)

Otis
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/

- Original Message From: Charles Wardell charles.ward...@bcsolution.com
To: solr-user@lucene.apache.org
Sent: Tue, April 26, 2011 2:32:29 PM
Subject: Question on Batch process

I am sure that this question has been asked a few times, but I can't seem to find the sweet spot for indexing. I have about 100,000 files, each containing 1,000 XML documents, ready to be posted to Solr. My desire is to have it index as quickly as possible; once completed, the daily stream of ADDs will be small in comparison. The individual documents are small, essentially web postings from the net: title, postPostContent, date. What would be the ideal configuration for ramBufferSizeMB, mergeFactor, maxBufferedDocs, etc.? My machine is a quad-core hyper-threaded machine, so it shows up as 8 CPUs in top. I have 16GB of available RAM.

Thanks in advance,
Charlie
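Otis's advice above (batches of roughly 1000 docs sent from N threads, N = number of CPU cores) can be sketched in plain Java. Here sendBatch() is a placeholder standing in for a SolrServer.add(batch) call; everything else is just the batching and threading structure:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

public class BatchSender {
    // Placeholder for a real SolrServer.add(batch) call; here we just
    // pretend every document in the batch was accepted.
    static int sendBatch(List<String> batch) {
        return batch.size();
    }

    // Slice docs into batches of batchSize and submit them to a fixed pool
    // of worker threads; returns the total number of documents sent.
    static int indexAll(List<String> docs, int batchSize, int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        AtomicInteger sent = new AtomicInteger();
        List<Future<?>> pending = new ArrayList<>();
        for (int i = 0; i < docs.size(); i += batchSize) {
            List<String> batch = docs.subList(i, Math.min(i + batchSize, docs.size()));
            pending.add(pool.submit(() -> sent.addAndGet(sendBatch(batch))));
        }
        for (Future<?> f : pending) f.get(); // wait for all batches to finish
        pool.shutdown();
        return sent.get();
    }

    public static void main(String[] args) throws Exception {
        List<String> docs = new ArrayList<>();
        for (int i = 0; i < 2500; i++) docs.add("doc" + i);
        System.out.println(indexAll(docs, 1000, 4)); // 2500
    }
}
```

StreamingUpdateSolrServer does this queueing and threading internally (the queueSize and threadCount constructor params shown above), so in practice you would hand it documents one by one and let it batch; the sketch just makes the concurrency model visible.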