Re: defaultOperator=AND and queries with (
On Thu, Aug 13, 2009 at 5:31 AM, Subbacharya, Madhu madhu.subbacha...@corp.aol.com wrote:
> Hello, We have Solr running with the defaultOperator set to AND. I am not able to get any results for queries like q=( Ferrari AND ( 599 GTB Fiorano OR 612 Scaglietti OR F430 )), which contain parentheses for grouping. Does anyone have any ideas for a workaround?

Can you try adding debugQuery=on to the request and posting the details here?

-- Regards, Shalin Shekhar Mangar.
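[A concrete variant worth trying, ahead of the debug output: with defaultOperator=AND, multi-word model names inside the parentheses are split into separate AND'd terms unless quoted as phrases, which alone can make such a query match nothing. Host and core are hypothetical; the phrase quotes and debugQuery=on are the point:]

```
q=(Ferrari AND ("599 GTB Fiorano" OR "612 Scaglietti" OR F430))&debugQuery=on
```

The parsedquery section of the debug output will show exactly how the parentheses and the default operator were interpreted.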
Re: Questions about XPath in data import handler
yes. Look at the 'flatten' attribute on the field. It should give you all the text (not attributes) under a given node.

On Thu, Aug 13, 2009 at 8:02 PM, Andrew Clegg andrew.cl...@gmail.com wrote:
> Noble Paul നോബിള് नोब्ळ्-2 wrote:
>> On Thu, Aug 13, 2009 at 6:35 PM, Andrew Clegg andrew.cl...@gmail.com wrote:
>>> Does the second one mean select the value of the attribute called qualifier in the /a/b/subject element?
>> Yes, you are right. Isn't that the semantics of standard XPath syntax?
> Yes, just checking, since the DIH XPath engine is a little different. Do you know what I would get in this case? Also... can I select a non-leaf node and get *ALL* the text underneath it? e.g. /a/b in this example? Cheers, Andrew.
>
> -- View this message in context: http://www.nabble.com/Questions-about-XPath-in-data-import-handler-tp24954223p24954869.html Sent from the Solr - User mailing list archive at Nabble.com.

-- Noble Paul | Principal Engineer | AOL | http://aol.com
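[A minimal sketch of what this looks like in data-config.xml. The entity name, url, and column names are made up for illustration; the flatten attribute and the @attribute XPath syntax are the parts under discussion:]

```xml
<entity name="doc" processor="XPathEntityProcessor"
        url="example.xml" forEach="/a">
  <!-- flatten="true": concatenates all text nodes under /a/b into one value -->
  <field column="b_text" xpath="/a/b" flatten="true"/>
  <!-- @qualifier: selects the attribute value on the subject element -->
  <field column="subject_qualifier" xpath="/a/b/subject/@qualifier"/>
</entity>
```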
Boolean logic in distributed searches
Hello, Firstly, my apologies for what I suspect is a very straightforward question. I have two Solr 1.3 indexes and am searching both through Solr's distributed searching. Searching works correctly; however, boolean searches are interpreted differently depending on whether or not I search both indexes. For the example search criteria 'fish OR game', the following query returns 9300 results:

http://testapp1.test.archives.govt.nz:8080/solr-archway/select?qt=standard&q=fish+OR+game

Whereas the following query returns only 170 results:

http://testapp1.test.archives.govt.nz:8080/solr-archway/select?qt=standard&q=fish+OR+game&shards=testapp1.test.archives.govt.nz:8080/solr-archway,testapp1.test.archives.govt.nz:8080/solr-portal

This is the same even if a single shard is present. Are boolean searches not supported across multiple shards? Or do I need to tweak something? Thanks in anticipation, Matt
Re: Solr 1.4 Replication scheme
On Fri, Aug 14, 2009 at 8:39 AM, KaktuChakarabati jimmoe...@gmail.com wrote:
> In the old replication, I could snappull with multiple slaves asynchronously but perform the snapinstall on each at the same time (+- epsilon seconds), so that way production load-balanced query serving will always be consistent. With the new system it seems that I have no control over syncing them; rather, it polls every few minutes and then decides the next cycle based on the last time it *finished* updating, so in any case I lose control over the synchronization of snap installation across multiple slaves.

That is true. How did you synchronize them with the script-based solution? Assuming network bandwidth is equally distributed and all slaves are equal in hardware/configuration, the time difference between new searcher registration on any slave should not be more than pollInterval, no?

> Also, I noticed the default poll interval is 60 seconds. It would seem that for such a rapid interval, what I mentioned above is a non-issue; however, I am not clear how this works vis-a-vis the new searcher warmup. For a considerable index size (20 million docs+), the warmup itself is an expensive and somewhat lengthy process, and if a new searcher opens and warms up every minute, I am not at all sure I'll be able to serve queries with reasonable QTimes.

If the pollInterval is 60 seconds, it does not mean that a new index is fetched every 60 seconds. A new index is downloaded and installed on the slave only if a commit happened on the master (i.e. the index actually changed on the master).

-- Regards, Shalin Shekhar Mangar.
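[For readers following along: the polling behaviour being discussed is configured on the slave side of the Solr 1.4 ReplicationHandler, roughly like this (masterUrl is illustrative):]

```xml
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <str name="masterUrl">http://master.example.com:8983/solr/replication</str>
    <!-- hh:mm:ss between polls; a poll only triggers a download when the
         master's index version has changed, i.e. after a commit there -->
    <str name="pollInterval">00:01:00</str>
  </lst>
</requestHandler>
```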
Re: Writing own request handler tutorial
On Thu, Aug 13, 2009 at 6:17 AM, darniz rnizamud...@edmunds.com wrote:
> Could anybody provide me with a complete data import handler example with Oracle, if there is any?

Only the dataSource section in the configuration and the JDBC driver will be specific to Oracle. What is the problem you're facing?

-- Regards, Shalin Shekhar Mangar.
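[As a sketch of the Oracle-specific part Shalin mentions. Connection details, credentials, and the query are placeholders; the Oracle JDBC jar must be on Solr's classpath:]

```xml
<dataConfig>
  <dataSource driver="oracle.jdbc.driver.OracleDriver"
              url="jdbc:oracle:thin:@//dbhost:1521/ORCL"
              user="scott" password="tiger"/>
  <document>
    <!-- everything from here on is plain DIH, nothing Oracle-specific;
         note Oracle tends to return column labels in uppercase, so the
         column attributes here are uppercased to match -->
    <entity name="item" query="SELECT id, name FROM item">
      <field column="ID" name="id"/>
      <field column="NAME" name="name"/>
    </entity>
  </document>
</dataConfig>
```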
Re: Solr 1.4 Replication scheme
Hey Shalin, thanks for your prompt reply. To clarify: with the old script-based replication, I would snappull every x minutes (say, on the order of 5 minutes). Assuming no index optimize occurred (I optimize 1-2 times a day, so we can disregard it for the sake of argument), the snappull would take a few seconds to run on each iteration. I then have a crontab on all slaves that runs snapinstall at a fixed time, let's say every 15 minutes from the start of a round hour, inclusive. (Slave machine times are synced, e.g. via ntp.) So essentially all slaves will begin a snapinstall at exactly the same time - and assuming uniform load, and the fact that they all have at this point in time the same snapshot since I snappull frequently, this leads to fairly synchronized replication across the board.

With the new replication, however, it seems that by binding the pulling and installing together, and by specifying the timing in deltas only (as opposed to absolute-time based, like in crontab), we've essentially made it impossible to effectively keep multiple slaves up to date and synchronized. E.g. if we set the poll interval to 15 minutes, a slight offset in the startup times of the slaves (which can very much be the case after arbitrary resets/maintenance operations) can lead to deviations in snappull(+install) times. This in turn is made worse by the fact that the pollInterval is computed from when the last commit *finished* - and this number seems to have a higher variance, e.g. due to warmup, which might differ across machines based on the queries they've handled previously.

To summarize, it seems to me like it might be beneficial to introduce a second parameter that acts more like a crontab-style, time-based schedule, insofar as it would let a user specify when an actual commit should occur. Then we could have the pollInterval set to a low value (e.g. 60 seconds) but specify that a commit is only performed at 0, 15, 30 and 45 minutes past every hour. This makes the commit times on the slaves fairly deterministic. Does this make sense, or am I missing something about the current in-process replication?

Thanks, -Chak

Shalin Shekhar Mangar wrote:
> That is true. How did you synchronize them with the script based solution? [...] A new index is downloaded and installed on the slave only if a commit happened on the master (i.e. the index was actually changed on the master). [...]

-- View this message in context: http://www.nabble.com/Solr-1.4-Replication-scheme-tp24965590p24968105.html
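[The absolute-time scheme described above could be sketched as slave-side crontab entries. Paths are illustrative, and the 1.3-era replication scripts read their defaults, such as the master host, from conf/scripts.conf:]

```
# pull the latest snapshot often, so every slave already holds it...
*/5 * * * * /opt/solr/bin/snappuller
# ...and install it at the same wall-clock times on every slave
0,15,30,45 * * * * /opt/solr/bin/snapinstaller
```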
Re: Solr 1.4 Replication scheme
usually the pollInterval is kept to a small value like 10secs. There is no harm in polling more frequently. This can ensure that the replication happens at almost the same time.

On Fri, Aug 14, 2009 at 1:58 PM, KaktuChakarabati jimmoe...@gmail.com wrote:
> Hey Shalin, thanks for your prompt reply. To clarify: with the old script-based replication, I would snappull every x minutes (say, on the order of 5 minutes) [...] so essentially all slaves will begin a snapinstall at exactly the same time [...] this leads to fairly synchronized replication across the board. With the new replication, however, binding the pulling and installing together, and specifying the timing in deltas only, has essentially made it impossible to effectively keep multiple slaves up to date and synchronized. [...]

-- Noble Paul | Principal Engineer | AOL | http://aol.com
Re: Questions about XPath in data import handler
Noble Paul നോബിള് नोब्ळ्-2 wrote:
> yes. Look at the 'flatten' attribute on the field. It should give you all the text (not attributes) under a given node.

I missed that one -- many thanks. Andrew.

-- View this message in context: http://www.nabble.com/Questions-about-XPath-in-data-import-handler-tp24954223p24968349.html
Re: Solr 1.4 Replication scheme
Hey Noble, you are right in that this will solve the problem; however, it implicitly assumes that commits to the master are infrequent enough that most polling operations yield no update, and only every few polls lead to an actual commit. This is a relatively safe assumption in most cases, but one that couples the master update policy to the performance of the slaves - if the master gets updated (and committed to) frequently, slaves might face a commit on every 1-2 polls, much more often than is feasible given new searcher warmup times. In effect, what this comes down to is that I must make the master commit frequency the same as I'd want the slaves to use - and this is markedly different from the previous behaviour, with which I could have the master updated (+committed to) at one rate and the slaves committing those updates at a different rate.

Noble Paul നോബിള് नोब्ळ्-2 wrote:
> usually the pollInterval is kept to a small value like 10secs. There is no harm in polling more frequently. This can ensure that the replication happens at almost the same time. [...]
Re: Solr 1.4 Replication scheme
On Fri, Aug 14, 2009 at 2:28 PM, KaktuChakarabati jimmoe...@gmail.com wrote:
> Hey Noble, you are right in that this will solve the problem; however, it implicitly assumes that commits to the master are infrequent enough that most polling operations yield no update, and only every few polls lead to an actual commit. [...] In effect, what this comes down to is that I must make the master commit frequency the same as I'd want the slaves to use. [...]

I see the argument. But isn't it better to keep both the master and slave as consistent as possible? There is no use in committing on the master if you do not plan to search on those docs. So the best thing to do is to commit only as frequently as you wish to commit on a slave. On a different track: if we had an option of disabling the commit after replication, would it be worth it? The user could then trigger a commit explicitly.
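[Noble's suggestion - commit on the master only as often as the slaves should see changes - can be expressed with autoCommit in the master's solrconfig.xml. The 15-minute figure is just the example used in this thread:]

```xml
<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <!-- commit at most every 15 minutes (value is in milliseconds) -->
    <maxTime>900000</maxTime>
  </autoCommit>
</updateHandler>
```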
Choosing between t and s field types
Hi, I want certain fields of type int, float and date to be sortable, and I should also be able to run range queries and facet queries on those fields. As far as I know, the sint and sfloat field types make fields sortable, while tint, tfloat and tdate allow range queries on the fields. I want both of these features on my fields. How can I make this happen?
A Buzzword Problem!!!
Hi, I have a scenario in which I need to store buzzwords and their frequency in a particular document. Along with the buzzwords, I have associated basewords and Porter-stemmed words. Buzzword, baseword and Porter-stemmed word all need to be searchable. How can I model this with dynamic fields in my Solr schema? Regards, Ninad
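[One possible sketch, assuming the stock "text" field type from the example schema. The suffix naming convention is made up; per-document term frequency is already tracked by the index itself, so it does not need its own field:]

```xml
<!-- one indexed, stored field per word role, keyed by buzzword suffix -->
<dynamicField name="*_buzzword" type="text" indexed="true" stored="true"/>
<dynamicField name="*_baseword" type="text" indexed="true" stored="true"/>
<dynamicField name="*_porter"   type="text" indexed="true" stored="true"/>
```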
Re: Choosing between t and s field types
According to the documentation in schema.xml.original, sint etc. can be used for both sorting and range queries:

<!-- Numeric field types that manipulate the value into a string value that isn't human-readable in its internal form, but with a lexicographic ordering the same as the numeric ordering, so that range queries work correctly. -->
<fieldType name="sint" class="solr.SortableIntField" sortMissingLast="true" omitNorms="true"/>

On Fri, Aug 14, 2009 at 11:08 AM, Ninad Raut hbase.user.ni...@gmail.com wrote:
> Hi, I want certain fields of type int, float and date to be sortable, and I should also be able to run range queries and facet queries on those fields. [...] How can I make this happen?
Re: Choosing between t and s field types
Hi Constantijn, what are the t types, viz. tint, tfloat etc., for? Is there a special use for these?

On Fri, Aug 14, 2009 at 4:37 PM, Constantijn Visinescu baeli...@gmail.com wrote:
> According to the documentation in schema.xml.original, sint etc. can be used for both sorting and range queries. [...]
Re: [OT] Solr Webinar
webinar: https://admin.na4.acrobat.com/_a837485961/p23226399/
slides: http://www.lucidimagination.com/files/file/improving_findability.pdf

[]s, Lucas Frare Teixeira .·. - lucas...@gmail.com - blog.lucastex.com - twitter.com/lucastex

On Fri, Aug 14, 2009 at 2:55 AM, Lukáš Vlček lukas.vl...@gmail.com wrote:
> Hello, they [Lucid Imagination guys] said it should be published on their blog. I hope I understood it correctly. Regards, Lukas http://blog.lukas-vlcek.com/
>
> On Fri, Aug 14, 2009 at 7:52 AM, Mani Kumar manikumarchau...@gmail.com wrote:
>> If anyone has any pointers to this webinar, please share them. Thanks! mani
>>
>> On Thu, Aug 13, 2009 at 9:26 PM, Chenini, Mohamed mchen...@geico.com wrote:
>>> I also registered to attend, but I am not going to, because a last-minute meeting has been scheduled here at work at the same time. Would it be possible in the future to schedule such webinars starting 5-6 PM ET? Thanks, Mohamed
>>>
>>> -----Original Message-----
>>> From: Grant Ingersoll [mailto:gsing...@apache.org]
>>> Sent: Wednesday, August 12, 2009 6:22 PM
>>> To: solr-user@lucene.apache.org
>>> Subject: Re: [OT] Solr Webinar
>>>
>>> I believe it will be, but I am not sure of the procedure for distributing. I think if you register but don't show, you will get a notification. -Grant
>>>
>>> On Aug 10, 2009, at 12:26 PM, Lucas F. A. Teixeira wrote:
>>>> Hello Grant, will the webinar be recorded and available to download later someplace? Unfortunately, I can't watch this time. Thanks, []s, Lucas Frare Teixeira .*. - lucas...@gmail.com - blog.lucastex.com - twitter.com/lucastex
>>>>
>>>> On Mon, Aug 10, 2009 at 12:33 PM, Grant Ingersoll gsing...@apache.org wrote:
>>>>> I will be giving a free one-hour webinar on getting started with Apache Solr on August 13th, 2009, ~ 11:00 AM PDT / 2:00 PM EDT. You can sign up @ http://www2.eventsvc.com/lucidimagination/081309?trk=WR-AUG2009-AP I will present and demo: * Getting started with LucidWorks for Solr * Getting better, faster results using Solr's findability and relevance improvement tools * Deploying Solr in production, including monitoring performance and trends with the LucidGaze for Solr performance profiler. -Grant
Re: Choosing between t and s field types
I just checked the default schema.xml for Solr 1.3 (solr/conf/schema.xml.original) and I don't see tint etc. listed. So either they're new in 1.4 and I don't know about them, or they were manually defined. Can you post your schema.xml entries for tint (along with any comments they might have)? Constantijn

On Fri, Aug 14, 2009 at 1:39 PM, Ninad Raut hbase.user.ni...@gmail.com wrote:
> Hi Constantijn, what are the t types, viz. tint, tfloat etc., for? Is there a special use for these? [...]
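[For reference, the Solr 1.4 example schema defines trie-based numeric types roughly along these lines (copy the exact definitions from your own schema.xml rather than from here):]

```xml
<!-- trie-encoded numerics: fast range queries via extra precision terms,
     and also sortable, so one type covers both requirements -->
<fieldType name="tint"   class="solr.TrieIntField"   precisionStep="8" omitNorms="true" positionIncrementGap="0"/>
<fieldType name="tfloat" class="solr.TrieFloatField" precisionStep="8" omitNorms="true" positionIncrementGap="0"/>
<fieldType name="tdate"  class="solr.TrieDateField"  precisionStep="6" omitNorms="true" positionIncrementGap="0"/>
```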
Re: Solr 1.4 Replication scheme
Longer term, it might be nice to enable clients to specify what version of the index they were searching against. This could be used to prevent consistency issues across different slaves, even if they commit at different times. It could also be used in distributed search to make sure the index didn't change between phases. -Yonik http://www.lucidimagination.com 2009/8/14 Noble Paul നോബിള് नोब्ळ् noble.p...@corp.aol.com: On Fri, Aug 14, 2009 at 2:28 PM, KaktuChakarabatijimmoe...@gmail.com wrote: Hey Noble, you are right in that this will solve the problem, however it implicitly assumes that commits to the master are infrequent enough ( so that most polling operations yield no update and only every few polls lead to an actual commit. ) This is a relatively safe assumption in most cases, but one that couples the master update policy with the performance of the slaves - if the master gets updated (and committed to) frequently, slaves might face a commit on every 1-2 poll's, much more than is feasible given new searcher warmup times.. In effect what this comes down to it seems is that i must make the master commit frequency the same as i'd want the slaves to use - and this is markedly different than previous behaviour with which i could have the master get updated(+committed to) at one rate and slaves committing those updates at a different rate. I see , the argument. But , isn't it better to keep both the mster and slave as consistent as possible? There is no use in committing in master, if you do not plan to search on those docs. So the best thing to do is do a commit only as frequently as you wish to commit in a slave. On a different track, if we can have an option of disabling commit after replication, is it worth it? So the user can trigger a commit explicitly Noble Paul നോബിള് नोब्ळ्-2 wrote: usually the pollInterval is kept to a small value like 10secs. there is no harm in polling more frequently. 
This can ensure that the replication happens at almost the same time. On Fri, Aug 14, 2009 at 1:58 PM, KaktuChakarabati jimmoe...@gmail.com wrote: Hey Shalin, thanks for your prompt reply. To clarify: with the old script-based replication, I would snappull every x minutes (say, on the order of 5 minutes). Assuming no index optimize occurred (I optimize 1-2 times a day, so we can disregard it for the sake of argument), the snappull would take a few seconds to run on each iteration. I then have a crontab on all slaves that runs snapinstall at a fixed time, let's say every 15 minutes from the start of a round hour, inclusive (slave machine times are synced, e.g. via ntp), so that essentially all slaves begin a snapinstall at exactly the same time. Assuming uniform load, and given that at that point in time they all have the same snapshot since I snappull frequently, this leads to fairly synchronized replication across the board. With the new replication, however, it seems that by binding the pulling and installing together, and by specifying the timing only as deltas (as opposed to absolute-time based like in crontab), we've essentially made it impossible to keep multiple slaves up to date and synchronized; e.g. if we set the poll interval to 15 minutes, a slight offset in the startup times of the slaves (which can very much be the case after arbitrary resets/maintenance operations) can lead to deviations in snappull(+install) times. This in turn is made worse by the fact that the pollInterval is computed from the time the last commit *finished* - and this number seems to have a higher variance, e.g. due to warmup, which might differ across machines based on the queries they've handled previously.
To summarize, it seems to me like it might be beneficial to introduce a second parameter that acts more like a crontab-style time-based schedule, in so far as it would let a user specify when an actual commit should occur - so we could have the pollInterval set to a low value (e.g. 60 seconds) but specify that a commit should only be performed at the 0, 15, 30 and 45-minute marks of every hour. This makes the commit times on the slaves fairly deterministic. Does this make sense, or am I missing something about the current in-process replication? Thanks, -Chak Shalin Shekhar Mangar wrote: On Fri, Aug 14, 2009 at 8:39 AM, KaktuChakarabati jimmoe...@gmail.com wrote: In the old replication, I could snappull with multiple slaves asynchronously but perform the snapinstall on each at the same time (+- epsilon seconds), so that production load-balanced query serving will always be consistent. With the new system it seems that I have no control over syncing them; rather, it polls every few minutes and then decides the next cycle based on the last time it *finished* updating, so in any case I lose control over the synchronization of snap installation across multiple slaves. That is true. How did you synchronize
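For readers following the thread, the pollInterval under discussion is configured on the slave side in solrconfig.xml. A minimal sketch (the master hostname is a placeholder; pollInterval uses an HH:mm:ss format), reflecting that in 1.4 there is only this fixed polling delta, not a cron-style absolute schedule:

```xml
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <!-- placeholder master URL -->
    <str name="masterUrl">http://master-host:8983/solr/replication</str>
    <!-- poll the master every 60 seconds (HH:mm:ss) -->
    <str name="pollInterval">00:00:60</str>
  </lst>
</requestHandler>
```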
Re: Choosing between t and s field types
I just checked the default schema.xml for Solr 1.3 (solr/conf/schema.xml.original) and I don't see tint, etc. listed. So either they're new in 1.4 and I don't know about them or they were manually defined. TrieRange fields are new and will make an appearance in Solr 1.4. With 1.3 you can use sint and sfloat for your use cases. @Ninad - http://www.lucidimagination.com/blog/2009/05/13/exploring-lucene-and-solrs-trierange-capabilities/ Cheers Avlesh On Fri, Aug 14, 2009 at 6:03 PM, Constantijn Visinescu baeli...@gmail.com wrote: I just checked the default schema.xml for Solr 1.3 (solr/conf/schema.xml.original) and I don't see tint, etc. listed. So either they're new in 1.4 and I don't know about them or they were manually defined. Can you post your schema.xml entries for tint (along with any comments it might have)? Constantijn On Fri, Aug 14, 2009 at 1:39 PM, Ninad Raut hbase.user.ni...@gmail.com wrote: Hi Constantijn, What are the t types, viz. tint, tfloat etc., for? Is there a special use for these? On Fri, Aug 14, 2009 at 4:37 PM, Constantijn Visinescu baeli...@gmail.com wrote: According to the documentation in schema.xml.original, sint etc. can be used for both sorting and range queries:

  <!-- Numeric field types that manipulate the value into a string value that isn't human-readable in its internal form, but with a lexicographic ordering the same as the numeric ordering, so that range queries work correctly. -->
  <fieldType name="sint" class="solr.SortableIntField" sortMissingLast="true" omitNorms="true"/>

On Fri, Aug 14, 2009 at 11:08 AM, Ninad Raut hbase.user.ni...@gmail.com wrote: Hi, I want certain fields of type int, float and date to be sortable, and I should be able to run my range queries as well as facet queries on those fields. As far as I know, the sint and sfloat field types make fields sortable, and tint, tfloat, tdate allow range queries on the fields. I want both these features in my fields. How can I make this happen?
Re: Solr 1.4 Replication scheme
Slightly off topic one question on the index file transfer mechanism used in the new 1.4 Replication scheme. Is my understanding correct that the transfer is over http? (vs. rsync in the script-based snappuller) Thanks, -Jibo On Aug 14, 2009, at 6:34 AM, Yonik Seeley wrote: Longer term, it might be nice to enable clients to specify what version of the index they were searching against. This could be used to prevent consistency issues across different slaves, even if they commit at different times. It could also be used in distributed search to make sure the index didn't change between phases. -Yonik http://www.lucidimagination.com 2009/8/14 Noble Paul നോബിള് नोब्ळ् noble.p...@corp.aol.com: On Fri, Aug 14, 2009 at 2:28 PM, KaktuChakarabatijimmoe...@gmail.com wrote: Hey Noble, you are right in that this will solve the problem, however it implicitly assumes that commits to the master are infrequent enough ( so that most polling operations yield no update and only every few polls lead to an actual commit. ) This is a relatively safe assumption in most cases, but one that couples the master update policy with the performance of the slaves - if the master gets updated (and committed to) frequently, slaves might face a commit on every 1-2 poll's, much more than is feasible given new searcher warmup times.. In effect what this comes down to it seems is that i must make the master commit frequency the same as i'd want the slaves to use - and this is markedly different than previous behaviour with which i could have the master get updated(+committed to) at one rate and slaves committing those updates at a different rate. I see , the argument. But , isn't it better to keep both the mster and slave as consistent as possible? There is no use in committing in master, if you do not plan to search on those docs. So the best thing to do is do a commit only as frequently as you wish to commit in a slave. 
On a different track, if we can have an option of disabling commit after replication, is it worth it? So the user can trigger a commit explicitly Noble Paul നോബിള് नोब्ळ्-2 wrote: usually the pollInterval is kept to a small value like 10secs. there is no harm in polling more frequently. This can ensure that the replication happens at almost same time On Fri, Aug 14, 2009 at 1:58 PM, KaktuChakarabatijimmoe...@gmail.com wrote: Hey Shalin, thanks for your prompt reply. To clarity: With the old script-based replication, I would snappull every x minutes (say, on the order of 5 minutes). Assuming no index optimize occured ( I optimize 1-2 times a day so we can disregard it for the sake of argument), the snappull would take a few seconds to run on each iteration. I then have a crontab on all slaves that runs snapinstall on a fixed time, lets say every 15 minutes from start of a round hour, inclusive. (slave machine times are synced e.g via ntp) so that essentially all slaves will begin a snapinstall exactly at the same time - assuming uniform load and the fact they all have at this point in time the same snapshot since I snappull frequently - this leads to a fairly synchronized replication across the board. With the new replication however, it seems that by binding the pulling and installing as well specifying the timing in delta's only (as opposed to absolute-time based like in crontab) we've essentially made it impossible to effectively keep multiple slaves up to date and synchronized; e.g if we set poll interval to 15 minutes, a slight offset in the startup times of the slaves (that can very much be the case for arbitrary resets/ maintenance operations) can lead to deviations in snappull(+install) times. 
this in turn is further made worse by the fact that the pollInterval is then computed based on the offset of when the last commit *finished* - and this number seems to have a higher variance, e.g due to warmup which might be different across machines based on the queries they've handled previously. To summarize, It seems to me like it might be beneficial to introduce a second parameter that acts more like a crontab time-based tableau, in so far that it can enable a user to specify when an actual commit should occur - so then we can have the pollInterval set to a low value (e.g 60 seconds) but then specify to only perform a commit on the 0,15,30,45-minutes of every hour. this makes the commit times on the slaves fairly deterministic. Does this make sense or am i missing something with current in- process replication? Thanks, -Chak Shalin Shekhar Mangar wrote: On Fri, Aug 14, 2009 at 8:39 AM, KaktuChakarabati jimmoe...@gmail.comwrote: In the old replication, I could snappull with multiple slaves asynchronously but perform the snapinstall on each at the same time (+- epsilon seconds), so that way production load balanced query serving will always be consistent. With the new system
Re: Solr 1.4 Replication scheme
On Fri, Aug 14, 2009 at 11:53 AM, Jibo Johnjiboj...@mac.com wrote: Slightly off topic one question on the index file transfer mechanism used in the new 1.4 Replication scheme. Is my understanding correct that the transfer is over http? (vs. rsync in the script-based snappuller) Yes, that's correct. -Yonik http://www.lucidimagination.com
Re: Choosing between t and s field types
TrieRange ... what are its features? What additional functionality do they provide? On Fri, Aug 14, 2009 at 8:35 PM, Avlesh Singh avl...@gmail.com wrote: I just checked the default schema.xml for Solr 1.3 (solr/conf/schema.xml.original) and I don't see tint, etc. listed. So either they're new in 1.4 and I don't know about them or they were manually defined. TrieRange fields are new and will make an appearance in Solr 1.4. With 1.3 you can use sint and sfloat for your use cases. @Ninad - http://www.lucidimagination.com/blog/2009/05/13/exploring-lucene-and-solrs-trierange-capabilities/ Cheers Avlesh On Fri, Aug 14, 2009 at 6:03 PM, Constantijn Visinescu baeli...@gmail.com wrote: I just checked the default schema.xml for Solr 1.3 (solr/conf/schema.xml.original) and I don't see tint, etc. listed. So either they're new in 1.4 and I don't know about them or they were manually defined. Can you post your schema.xml entries for tint (along with any comments it might have)? Constantijn On Fri, Aug 14, 2009 at 1:39 PM, Ninad Raut hbase.user.ni...@gmail.com wrote: Hi Constantijn, What are the t types, viz. tint, tfloat etc., for? Is there a special use for these? On Fri, Aug 14, 2009 at 4:37 PM, Constantijn Visinescu baeli...@gmail.com wrote: According to the documentation in schema.xml.original, sint etc. can be used for both sorting and range queries:

  <!-- Numeric field types that manipulate the value into a string value that isn't human-readable in its internal form, but with a lexicographic ordering the same as the numeric ordering, so that range queries work correctly. -->
  <fieldType name="sint" class="solr.SortableIntField" sortMissingLast="true" omitNorms="true"/>

On Fri, Aug 14, 2009 at 11:08 AM, Ninad Raut hbase.user.ni...@gmail.com wrote: Hi, I want certain fields of type int, float and date to be sortable, and I should be able to run my range queries as well as facet queries on those fields. As far as I know, the sint and sfloat field types make fields sortable, and tint, tfloat, tdate allow range queries on the fields. I want both these features in my fields. How can I make this happen?
Re: Choosing between t and s field types
On Fri, Aug 14, 2009 at 1:15 PM, Ninad Raut hbase.user.ni...@gmail.com wrote: TrieRange ... what are its features? What additional functionality do they provide?

- a generally more efficient FieldCache structure (less memory)
- faster range queries when precisionStep is utilized to index multiple tokens per value

-Yonik http://www.lucidimagination.com On Fri, Aug 14, 2009 at 8:35 PM, Avlesh Singh avl...@gmail.com wrote: I just checked the default schema.xml for Solr 1.3 (solr/conf/schema.xml.original) and I don't see tint, etc. listed. So either they're new in 1.4 and I don't know about them or they were manually defined. TrieRange fields are new and will make an appearance in Solr 1.4. With 1.3 you can use sint and sfloat for your use cases. @Ninad - http://www.lucidimagination.com/blog/2009/05/13/exploring-lucene-and-solrs-trierange-capabilities/ Cheers Avlesh On Fri, Aug 14, 2009 at 6:03 PM, Constantijn Visinescu baeli...@gmail.com wrote: I just checked the default schema.xml for Solr 1.3 (solr/conf/schema.xml.original) and I don't see tint, etc. listed. So either they're new in 1.4 and I don't know about them or they were manually defined. Can you post your schema.xml entries for tint (along with any comments it might have)? Constantijn On Fri, Aug 14, 2009 at 1:39 PM, Ninad Raut hbase.user.ni...@gmail.com wrote: Hi Constantijn, What are the t types, viz. tint, tfloat etc., for? Is there a special use for these? On Fri, Aug 14, 2009 at 4:37 PM, Constantijn Visinescu baeli...@gmail.com wrote: According to the documentation in schema.xml.original, sint etc. can be used for both sorting and range queries:

  <!-- Numeric field types that manipulate the value into a string value that isn't human-readable in its internal form, but with a lexicographic ordering the same as the numeric ordering, so that range queries work correctly. -->
  <fieldType name="sint" class="solr.SortableIntField" sortMissingLast="true" omitNorms="true"/>

On Fri, Aug 14, 2009 at 11:08 AM, Ninad Raut hbase.user.ni...@gmail.com wrote: Hi, I want certain fields of type int, float and date to be sortable, and I should be able to run my range queries as well as facet queries on those fields. As far as I know, the sint and sfloat field types make fields sortable, and tint, tfloat, tdate allow range queries on the fields. I want both these features in my fields. How can I make this happen?
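For readers on 1.4, the trie-based types discussed above are declared in schema.xml roughly like this (a sketch following the style of the 1.4 example schema; the precisionStep values are the common defaults and can be tuned):

```xml
<!-- trie-coded int: fast range queries via multiple precisions per indexed value -->
<fieldType name="tint" class="solr.TrieIntField" precisionStep="8"
           omitNorms="true" positionIncrementGap="0"/>
<!-- precisionStep="0" indexes a single token per value: smaller index, still sortable -->
<fieldType name="int" class="solr.TrieIntField" precisionStep="0"
           omitNorms="true" positionIncrementGap="0"/>
```

Trie fields are also sortable, so a single tint/tfloat/tdate field can serve sorting, range queries and faceting at once, which answers Ninad's original question for 1.4.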
Re: Solr 1.4 Replication scheme
This would be good! Especially for NRT where this problem is somewhat harder. I think we may need to look at caching readers per corresponding http session. The pitfall is expiring them before running out of RAM. On Fri, Aug 14, 2009 at 6:34 AM, Yonik Seeleyyo...@lucidimagination.com wrote: Longer term, it might be nice to enable clients to specify what version of the index they were searching against. This could be used to prevent consistency issues across different slaves, even if they commit at different times. It could also be used in distributed search to make sure the index didn't change between phases. -Yonik http://www.lucidimagination.com 2009/8/14 Noble Paul നോബിള് नोब्ळ् noble.p...@corp.aol.com: On Fri, Aug 14, 2009 at 2:28 PM, KaktuChakarabatijimmoe...@gmail.com wrote: Hey Noble, you are right in that this will solve the problem, however it implicitly assumes that commits to the master are infrequent enough ( so that most polling operations yield no update and only every few polls lead to an actual commit. ) This is a relatively safe assumption in most cases, but one that couples the master update policy with the performance of the slaves - if the master gets updated (and committed to) frequently, slaves might face a commit on every 1-2 poll's, much more than is feasible given new searcher warmup times.. In effect what this comes down to it seems is that i must make the master commit frequency the same as i'd want the slaves to use - and this is markedly different than previous behaviour with which i could have the master get updated(+committed to) at one rate and slaves committing those updates at a different rate. I see , the argument. But , isn't it better to keep both the mster and slave as consistent as possible? There is no use in committing in master, if you do not plan to search on those docs. So the best thing to do is do a commit only as frequently as you wish to commit in a slave. 
On a different track, if we can have an option of disabling commit after replication, is it worth it? So the user can trigger a commit explicitly Noble Paul നോബിള് नोब्ळ्-2 wrote: usually the pollInterval is kept to a small value like 10secs. there is no harm in polling more frequently. This can ensure that the replication happens at almost same time On Fri, Aug 14, 2009 at 1:58 PM, KaktuChakarabatijimmoe...@gmail.com wrote: Hey Shalin, thanks for your prompt reply. To clarity: With the old script-based replication, I would snappull every x minutes (say, on the order of 5 minutes). Assuming no index optimize occured ( I optimize 1-2 times a day so we can disregard it for the sake of argument), the snappull would take a few seconds to run on each iteration. I then have a crontab on all slaves that runs snapinstall on a fixed time, lets say every 15 minutes from start of a round hour, inclusive. (slave machine times are synced e.g via ntp) so that essentially all slaves will begin a snapinstall exactly at the same time - assuming uniform load and the fact they all have at this point in time the same snapshot since I snappull frequently - this leads to a fairly synchronized replication across the board. With the new replication however, it seems that by binding the pulling and installing as well specifying the timing in delta's only (as opposed to absolute-time based like in crontab) we've essentially made it impossible to effectively keep multiple slaves up to date and synchronized; e.g if we set poll interval to 15 minutes, a slight offset in the startup times of the slaves (that can very much be the case for arbitrary resets/maintenance operations) can lead to deviations in snappull(+install) times. 
this in turn is further made worse by the fact that the pollInterval is then computed based on the offset of when the last commit *finished* - and this number seems to have a higher variance, e.g due to warmup which might be different across machines based on the queries they've handled previously. To summarize, It seems to me like it might be beneficial to introduce a second parameter that acts more like a crontab time-based tableau, in so far that it can enable a user to specify when an actual commit should occur - so then we can have the pollInterval set to a low value (e.g 60 seconds) but then specify to only perform a commit on the 0,15,30,45-minutes of every hour. this makes the commit times on the slaves fairly deterministic. Does this make sense or am i missing something with current in-process replication? Thanks, -Chak Shalin Shekhar Mangar wrote: On Fri, Aug 14, 2009 at 8:39 AM, KaktuChakarabati jimmoe...@gmail.comwrote: In the old replication, I could snappull with multiple slaves asynchronously but perform the snapinstall on each at the same time (+- epsilon seconds), so that way production load balanced query serving will always be consistent. With the new system it seems
Re: Solr 1.4 Replication scheme
On Fri, Aug 14, 2009 at 1:48 PM, Jason Rutherglenjason.rutherg...@gmail.com wrote: This would be good! Especially for NRT where this problem is somewhat harder. I think we may need to look at caching readers per corresponding http session. For something like distributed search I was thinking of a simple reservation mechanism... let the client specify how long to hold open that version of the index (perhaps still have a max number of open versions to prevent an errant client from blowing things up). -Yonik http://www.lucidimagination.com The pitfall is expiring them before running out of RAM. On Fri, Aug 14, 2009 at 6:34 AM, Yonik Seeleyyo...@lucidimagination.com wrote: Longer term, it might be nice to enable clients to specify what version of the index they were searching against. This could be used to prevent consistency issues across different slaves, even if they commit at different times. It could also be used in distributed search to make sure the index didn't change between phases. -Yonik http://www.lucidimagination.com 2009/8/14 Noble Paul നോബിള് नोब्ळ् noble.p...@corp.aol.com: On Fri, Aug 14, 2009 at 2:28 PM, KaktuChakarabatijimmoe...@gmail.com wrote: Hey Noble, you are right in that this will solve the problem, however it implicitly assumes that commits to the master are infrequent enough ( so that most polling operations yield no update and only every few polls lead to an actual commit. ) This is a relatively safe assumption in most cases, but one that couples the master update policy with the performance of the slaves - if the master gets updated (and committed to) frequently, slaves might face a commit on every 1-2 poll's, much more than is feasible given new searcher warmup times.. 
In effect what this comes down to it seems is that i must make the master commit frequency the same as i'd want the slaves to use - and this is markedly different than previous behaviour with which i could have the master get updated(+committed to) at one rate and slaves committing those updates at a different rate. I see , the argument. But , isn't it better to keep both the mster and slave as consistent as possible? There is no use in committing in master, if you do not plan to search on those docs. So the best thing to do is do a commit only as frequently as you wish to commit in a slave. On a different track, if we can have an option of disabling commit after replication, is it worth it? So the user can trigger a commit explicitly Noble Paul നോബിള് नोब्ळ्-2 wrote: usually the pollInterval is kept to a small value like 10secs. there is no harm in polling more frequently. This can ensure that the replication happens at almost same time On Fri, Aug 14, 2009 at 1:58 PM, KaktuChakarabatijimmoe...@gmail.com wrote: Hey Shalin, thanks for your prompt reply. To clarity: With the old script-based replication, I would snappull every x minutes (say, on the order of 5 minutes). Assuming no index optimize occured ( I optimize 1-2 times a day so we can disregard it for the sake of argument), the snappull would take a few seconds to run on each iteration. I then have a crontab on all slaves that runs snapinstall on a fixed time, lets say every 15 minutes from start of a round hour, inclusive. (slave machine times are synced e.g via ntp) so that essentially all slaves will begin a snapinstall exactly at the same time - assuming uniform load and the fact they all have at this point in time the same snapshot since I snappull frequently - this leads to a fairly synchronized replication across the board. 
With the new replication however, it seems that by binding the pulling and installing as well specifying the timing in delta's only (as opposed to absolute-time based like in crontab) we've essentially made it impossible to effectively keep multiple slaves up to date and synchronized; e.g if we set poll interval to 15 minutes, a slight offset in the startup times of the slaves (that can very much be the case for arbitrary resets/maintenance operations) can lead to deviations in snappull(+install) times. this in turn is further made worse by the fact that the pollInterval is then computed based on the offset of when the last commit *finished* - and this number seems to have a higher variance, e.g due to warmup which might be different across machines based on the queries they've handled previously. To summarize, It seems to me like it might be beneficial to introduce a second parameter that acts more like a crontab time-based tableau, in so far that it can enable a user to specify when an actual commit should occur - so then we can have the pollInterval set to a low value (e.g 60 seconds) but then specify to only perform a commit on the 0,15,30,45-minutes of every hour. this makes the commit times on the slaves fairly deterministic. Does this make sense or am i missing something with current in-process
Re: A Buzzword Problem!!!
Do you need to know, when you match, which type of word it was, or do you just need to know that there was a match? On Aug 14, 2009, at 5:17 AM, Ninad Raut wrote: Hi, I have a scenario in which I need to store buzzwords and their frequency in a particular document. Along with the buzzwords I also have possible base words and Porter-stemmed words associated with them. Buzzword, base word and Porter word all need to be searchable. How can I use dynamic fields in my Solr schema for this? Regards, Ninad -- Grant Ingersoll http://www.lucidimagination.com/ Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using Solr/Lucene: http://www.lucidimagination.com/search
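One possible shape for Ninad's schema, sketched with a made-up naming convention (the *_buzz / *_base / *_porter suffixes and the text type are illustrative, not from the thread):

```xml
<!-- any field whose name ends in one of these suffixes is accepted
     without further schema changes, so each document can carry its
     own buzzword / base word / Porter word fields -->
<dynamicField name="*_buzz"   type="text" indexed="true" stored="true" multiValued="true"/>
<dynamicField name="*_base"   type="text" indexed="true" stored="true" multiValued="true"/>
<dynamicField name="*_porter" type="text" indexed="true" stored="true" multiValued="true"/>
```

Keeping the three kinds of words in separate (dynamic) fields also answers Grant's question: a match then tells you which type of word it was.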
ruby client and building spell check dictionary
I set up the spell check component with this code in the config file:

  <searchComponent name="spellcheck" class="solr.SpellCheckComponent">
    <str name="queryAnalyzerFieldType">textSpell</str>
    <lst name="spellchecker">
      <str name="name">titleCheck</str>
      <str name="classname">solr.IndexBasedSpellChecker</str>
      <str name="field">dictionary</str>
      <str name="accuracy">0.7</str>
    </lst>
  </searchComponent>

which works great. I can build the dictionary from my browser with q=foo&spellcheck.build=true&spellcheck.name=titleCheck, and I can also receive the spellcheck response when I make a query via the ruby client. What I'm trying to do now, though, is build the dictionary via the ruby client. I added this code to class Solr::Request::Standard < Solr::Request::Select:

  if @params[:spellcheck]
    hash[:spellcheck] = true
    hash["spellcheck.q"] = @params[:spellcheck][:query]
    hash["spellcheck.build"] = @params[:spellcheck][:build]
  end

and attempt to make a query with spellcheck.build=true (the spellcheck.name is set in the defaults of select). Unfortunately I am receiving this exception: Net::HTTPFatalError: 500
java.io.FileNotFoundException: _cfdx
java.lang.RuntimeException: java.io.FileNotFoundException: _cfdx
    at org.apache.solr.spelling.IndexBasedSpellChecker.build(IndexBasedSpellChecker.java:92)
    at org.apache.solr.handler.component.SpellCheckComponent.prepare(SpellCheckComponent.java:107)
    at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:150)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1333)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:303)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:232)
    at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089)
    at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365)
    at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
    at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
    at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712)
    at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
    at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211)
    at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
    at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139)
    at org.mortbay.jetty.Server.handle(Server.java:285)
    at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502)
    at org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:835)
    at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:641)
    at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:208)
    at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378)
    at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:226)
    at org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442)
Caused by: java.io.FileNo

from /usr/lib/ruby/1.8/net/http.rb:2097:in `error!'
from /var/lib/gems/1.8/gems/solr-ruby-0.0.7/lib/solr/connection.rb:165:in `post'
from /var/lib/gems/1.8/gems/solr-ruby-0.0.7/lib/solr/connection.rb:151:in `send'
from /var/lib/gems/1.8/gems/solr-ruby-0.0.7/lib/solr/connection.rb:174:in `create_and_send_query'
from /var/lib/gems/1.8/gems/solr-ruby-0.0.7/lib/solr/connection.rb:92:in `query'
from /home/mike/code/pubget.rails/app/models/article.rb:695:in `solr_search'
from /home/mike/code/pubget.rails/app/models/article.rb:635:in `solr_build_dictionary'

Any help in understanding the exception would be greatly appreciated. -Mike
Re: Using Lucene's payload in Solr
Thanks for sharing your code, Ken. It is pretty much the same code that I have written, except that my custom QueryParser extends Solr's SolrQueryParser instead of Lucene's QueryParser. I am also using BFTQ instead of BTQ. I have tested it and do see the payload being used in the explain output. Functionally I have got everything working now. I still have to decide how I want to index the payload (using DelimitedPayloadTokenFilter or my own custom format/code). Bill On Thu, Aug 13, 2009 at 11:31 AM, Ensdorf Ken ensd...@zoominfo.com wrote: It looks like things have changed a bit since this subject was last brought up here. I see that there is support in Solr/Lucene for indexing payload data (DelimitedPayloadTokenFilterFactory and DelimitedPayloadTokenFilter). Overriding the Similarity class is straightforward. So the last piece of the puzzle is to use a BoostingTermQuery when searching. Solr's LuceneQParserPlugin uses SolrQueryParser under the cover, so I think all I need to do is to write my own query parser plugin that uses a custom query parser, with the only difference being in the getFieldQuery() method, where a BoostingTermQuery is used instead of a TermQuery. The BTQ is now deprecated in favor of the BoostingFunctionTermQuery, which gives some more flexibility in terms of how the spans in a single document are scored. Am I on the right track? Yes. Has anyone done something like this already? I wrote a QParserPlugin that seems to do the trick. This is minimally tested - we're not actually using it at the moment, but it should get you going.
Also, as Grant suggested, you may want to substitute BFTQ for BTQ below:

package com.zoominfo.solr.analysis;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.index.Term;
import org.apache.lucene.queryParser.*;
import org.apache.lucene.search.*;
import org.apache.lucene.search.payloads.BoostingTermQuery;
import org.apache.solr.common.params.*;
import org.apache.solr.common.util.NamedList;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.search.*;

public class BoostingTermQParserPlugin extends QParserPlugin {
  public static String NAME = "zoom";

  public void init(NamedList args) {
  }

  public QParser createParser(String qstr, SolrParams localParams, SolrParams params, SolrQueryRequest req) {
    System.out.print("BoostingTermQParserPlugin::createParser\n");
    return new BoostingTermQParser(qstr, localParams, params, req);
  }
}

class BoostingTermQueryParser extends QueryParser {
  public BoostingTermQueryParser(String f, Analyzer a) {
    super(f, a);
    System.out.print("BoostingTermQueryParser::BoostingTermQueryParser\n");
  }

  @Override
  protected Query newTermQuery(Term term) {
    System.out.print("BoostingTermQueryParser::newTermQuery\n");
    return new BoostingTermQuery(term);
  }
}

class BoostingTermQParser extends QParser {
  String sortStr;
  QueryParser lparser;

  public BoostingTermQParser(String qstr, SolrParams localParams, SolrParams params, SolrQueryRequest req) {
    super(qstr, localParams, params, req);
    System.out.print("BoostingTermQParser::BoostingTermQParser\n");
  }

  public Query parse() throws ParseException {
    System.out.print("BoostingTermQParser::parse\n");
    String qstr = getString();

    String defaultField = getParam(CommonParams.DF);
    if (defaultField == null) {
      defaultField = getReq().getSchema().getSolrQueryParser(null).getField();
    }
    lparser = new BoostingTermQueryParser(defaultField, getReq().getSchema().getQueryAnalyzer());

    // these could either be checked & set here, or in the SolrQueryParser constructor
    String opParam = getParam(QueryParsing.OP);
    if (opParam != null) {
      lparser.setDefaultOperator("AND".equals(opParam) ? QueryParser.Operator.AND : QueryParser.Operator.OR);
    } else {
      // try to get default operator from schema
      lparser.setDefaultOperator(getReq().getSchema().getSolrQueryParser(null).getDefaultOperator());
    }

    return lparser.parse(qstr);
  }

  public String[] getDefaultHighlightFields() {
    return new String[]{lparser.getField()};
  }
}
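If it helps anyone following along: a QParserPlugin like this is typically registered in solrconfig.xml and selected per request with local params. The name and class below simply mirror the code above; treat this as an illustrative sketch, not a tested configuration.

```xml
<!-- solrconfig.xml: register the custom query parser under the name "zoom" -->
<queryParser name="zoom" class="com.zoominfo.solr.analysis.BoostingTermQParserPlugin"/>
```

A request can then pick it up with q={!zoom}ipod, or by setting defType=zoom.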
UTF8 Problem with http request?
Hi, First of all I know that there is a UTF-8 problem with Tomcat, so I updated Tomcat's server.xml with:

<Connector port="8080" protocol="HTTP/1.1" connectionTimeout="2" redirectPort="8443" URIEncoding="UTF-8"/>

So now the Solr admin console returns a successful result, for example for q=für:

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">0</int>
    <lst name="params">
      <str name="indent">on</str>
      <str name="start">0</str>
      <str name="q">für</str>
      <str name="rows">10</str>
      <str name="version">2.2</str>
    </lst>
  </lst>
  <result name="response" numFound="2" start="0">

However, if I use an HTTP request through PHP5 I'll get this result:

{"responseHeader":{"status":0,"QTime":0,"params":{"fl":"db_id,name,def,deadline","start":"0","q":"text:f�r text_two:f�r* ","wt":"json","fq":"","rows":"10"}},"response":{"numFound":0,"start":0,"docs":[]}}

If I look into the Tomcat console I see this:

14.08.2009 21:21:42 org.apache.solr.core.SolrCore execute
INFO: [] webapp=/solr path=/select params={fl=db_id,name,def,deadline&start=0&q=text:f?r+text_two:f?r*+&wt=json&fq=&rows=10} hits=0 status=0 QTime=0

I am quite sure it has something to do with the HTTP request. Is it possible to set the character set for an HTTP request? I can't find anything regarding the subject.

kind regards, Sebastian

-- View this message in context: http://www.nabble.com/UTF8-Problem-with-http-request--tp24977306p24977306.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: I think this is a bug
On Thu, Aug 13, 2009 at 8:21 PM, Chris Male gento...@gmail.com wrote: Hi Paul, Yes the comment does look very wrong. I'll open a JIRA issue and include a fix. On Thu, Aug 13, 2009 at 4:43 PM, Paul Tomblin ptomb...@xcski.com wrote: I don't want to join yet another mailing list or register for JIRA, but I just noticed that the Javadocs for SolrInputDocument.addField(String name, Object value, float boost) is incredibly wrong - it looks like it was copied from a deleteAll method. Thanks Paul and Chris. This is fixed in trunk. -- Regards, Shalin Shekhar Mangar.
RE: UTF8 Problem with http request?
Hey Sebastian,

Did you try:

1. URLEncoder.encode(url, "UTF-8");

2. If your application is Spring based, try this filter:

<filter>
  <filter-name>CharacterEncoding</filter-name>
  <filter-class>org.springframework.web.filter.CharacterEncodingFilter</filter-class>
  <init-param>
    <param-name>encoding</param-name>
    <param-value>UTF-8</param-value>
  </init-param>
  <init-param>
    <param-name>forceEncoding</param-name>
    <param-value>true</param-value>
  </init-param>
</filter>
<filter-mapping>
  <filter-name>CharacterEncoding</filter-name>
  <url-pattern>/*</url-pattern>
</filter-mapping>

Ankit
RE: UTF8 Problem with http request?
Hi,

1. I use PHP5; what you suggested is a Java function, I would guess. In PHP there is something like this:

urlencode(utf8_encode($url));

But sadly that doesn't help.

2. I don't use Spring.

Strange thing.

kind regards, Sebastian

-- View this message in context: http://www.nabble.com/UTF8-Problem-with-http-request--tp24977306p24977744.html Sent from the Solr - User mailing list archive at Nabble.com.
RE: UTF8 Problem with http request?
I guess in the header you could try setting the charset, something like this:

header('Content-Type: text/plain; charset=ISO-8859-1');

Ankit
RE: UTF8 Problem with http request?
Or this:

// Setting the Content-Type header with charset
header('Content-Type: text/html; charset=utf-8');

Ankit
RE: UTF8 Problem with http request?
My final two cents, although I'm not sure: use encodeURIComponent(data) while creating the query. Refer to http://www.w3schools.com/jsref/jsref_encodeURIComponent.asp

Ankit
RE: UTF8 Problem with http request?
Hi, thank you for your suggestions. I solved the problem now. It was the PHP function strtolower(); as it turns out, it can't handle UTF-8 strings. The solution is doing something like this in PHP:

$VALUE = "für";
$text = utf8_encode(strtolower(utf8_decode($VALUE)));

Thank you again ANKITBHATNAGAR.

kind regards, Sebastian

-- View this message in context: http://www.nabble.com/UTF8-Problem-with-http-request--tp24977306p24978362.html Sent from the Solr - User mailing list archive at Nabble.com.
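For readers hitting the same symptom: PHP's strtolower() works byte by byte, so under a Latin-1 locale it can rewrite the lead byte of a multibyte UTF-8 character and corrupt it. A rough standalone Java illustration of that failure mode follows; byteWiseLower() is a hypothetical stand-in for a byte-wise, locale-style lowercase, not any real library function.

```java
import java.nio.charset.StandardCharsets;
import java.util.Locale;

public class ByteWiseLowercaseDemo {
    // Hypothetical byte-wise lowercase, mimicking a Latin-1 locale strtolower():
    // it shifts bytes in A-Z and 0xC0-0xDE down by 0x20, which mangles UTF-8
    // lead bytes such as 0xC3 (the first byte of a two-byte ü).
    public static String byteWiseLower(String s) {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        for (int i = 0; i < bytes.length; i++) {
            int b = bytes[i] & 0xFF;
            if ((b >= 'A' && b <= 'Z') || (b >= 0xC0 && b <= 0xDE)) {
                bytes[i] = (byte) (b + 0x20);
            }
        }
        // Invalid byte sequences are replaced with U+FFFD on decode
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String broken = byteWiseLower("F\u00DCR");            // corrupts the ü
        String correct = "F\u00DCR".toLowerCase(Locale.ROOT); // Unicode-aware
        System.out.println(broken);   // garbled, much like the f�r in the logs
        System.out.println(correct);
    }
}
```

The Unicode-aware toLowerCase() path corresponds to what the utf8_decode/utf8_encode round trip (or mb_strtolower) achieves in PHP.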
RE: UTF8 Problem with http request?
Cool :)
Solr conditional adds/updates?
I have a fairly simple need to do a conditional update in Solr, which is easily accomplished in MySQL. For example:

* I have 100 documents with a unique field called id
* I am POSTing 10 documents, some of which may have duplicate ids, in which case Solr would update the existing records with the same ids
* I have a field called dateCreated, and I would like to only update a doc if the new dateCreated is greater than the old dateCreated (this applies to duplicate ids only, of course)

How would I be able to accomplish such a thing? The context is trying to combat race conditions resulting in multiple adds for the same ID but executing in the wrong order. Thanks.

-- View this message in context: http://www.nabble.com/Solr-conditional-adds-updates--tp24979499p24979499.html Sent from the Solr - User mailing list archive at Nabble.com.
Which server parameters to tweak in Solr if I expect heavy writes and light reads?
I am facing scalability issues designing a new Solr cluster, and I need the master to be able to handle a relatively high rate of updates with almost no reads; reads can be done via the slaves. My existing Solr instance occupies a huge amount of RAM; in fact, it started swapping at only 4.5 million docs. I am interested in making the RAM footprint as small as possible, even if it affects search performance. So, which Solr config values can I tweak in order to accomplish this? Thank you.

P.S. Cross-posted to http://stackoverflow.com/questions/1280447/which-server-parameters-to-tweak-in-solr-if-i-expect-heavy-writes-and-light-reads for additional help.

-- View this message in context: http://www.nabble.com/Which-server-parameters-to-tweak-in-Solr-if-I-expect-heavy-writes-and-light-reads--tp24979526p24979526.html Sent from the Solr - User mailing list archive at Nabble.com.
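Not a complete answer, but the knobs usually reached for first live in solrconfig.xml: the indexing RAM buffer and merge settings, and the search caches (which a write-mostly master barely needs). Treat the values below as illustrative starting points under those assumptions, not recommendations:

```xml
<!-- solrconfig.xml (illustrative values only) -->
<indexDefaults>
  <!-- flush the in-memory indexing buffer at a fixed size -->
  <ramBufferSizeMB>32</ramBufferSizeMB>
  <mergeFactor>10</mergeFactor>
</indexDefaults>
<query>
  <!-- on a write-heavy, read-light master, shrink caches and skip autowarming -->
  <filterCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="0"/>
  <queryResultCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="0"/>
  <documentCache class="solr.LRUCache" size="512" initialSize="512"/>
</query>
```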
Re: Solr conditional adds/updates?
You could implement optimistic concurrency, where a version is stored in the document, or use the date scheme you described. Override DirectUpdateHandler2.addDoc with the custom logic.

It seems like we should have a way of performing this without custom code, and/or an easier way to plug logic into UpdateHandler. Basic SQL-like functions should be possible, or simply a Lucene query (which, with QP2.0, can support SQL-like syntax).
Re: Solr conditional adds/updates?
It seems like this would require some sort of Java implementation? Unfortunately I'm not a Java developer, but I am in charge of implementing Solr. Any more detailed/straightforward instructions are very much appreciated. Thank you.

-- View this message in context: http://www.nabble.com/Solr-conditional-adds-updates--tp24979499p24979676.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr conditional adds/updates?
It seems like this would require some sort of Java implementation?

Yes, custom Java code would need to be written and tested, etc.
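For illustration only, the compare-before-replace rule described in this thread (accept an add only if its dateCreated is newer than what is already stored) can be sketched in plain Java. The class and method names here are hypothetical, and this is a standalone model of the guard an overridden update handler would apply, not Solr's actual API:

```java
import java.util.HashMap;
import java.util.Map;

public class ConditionalUpsert {
    // id -> dateCreated of the latest accepted version (stand-in for the index)
    private final Map<String, Long> latest = new HashMap<String, Long>();

    // Accept the add only if it is strictly newer than what we already have.
    public boolean addIfNewer(String id, long dateCreated) {
        Long existing = latest.get(id);
        if (existing != null && existing >= dateCreated) {
            return false; // stale or duplicate add: drop it to beat the race
        }
        latest.put(id, dateCreated);
        return true;
    }
}
```

With this rule, out-of-order POSTs for the same id resolve to the newest dateCreated regardless of arrival order.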
Re: Solr 1.4 Replication scheme
: This is a relatively safe assumption in most cases, but one that couples the
: master update policy with the performance of the slaves - if the master gets
: updated (and committed to) frequently, slaves might face a commit on every
: 1-2 polls, much more than is feasible given new-searcher warmup times.
: In effect, what this comes down to, it seems, is that I must make the master
: commit frequency the same as I'd want the slaves to use - and this is
: markedly different from previous behaviour, with which I could have the
: master get updated (+committed to) at one rate and slaves committing those
: updates at a different rate.

: I see the argument. But isn't it better to keep both the master and
: slave as consistent as possible? There is no use in committing on the
: master if you do not plan to search on those docs. So the best thing
: to do is commit only as frequently as you wish to commit on a slave.

I would advise against thinking that way when designing anything related to replication -- people should call commit based on when they want the documents they've added to be available for consumers. For a single-box setup, your consumers are people executing searches, but for a multi-tier setup your consumers are the slaves replicating from you (the master) -- and your consumers may not all have equal concerns about freshness. Some of the slaves may want to poll for new updates from you as fast as possible and have the freshest data, at the expense of lower cache hit rates and increased network IO; others may be happier with stale data in return for better cache hit rates or lower network IO. (Even in a realtime search situation, you may also be replicating to a slave in a remote data center with a small network pipe that only wants one snappull a day, for backend analytics and an extremely consistent view of the index over a long duration of analysis.)

The point being: we shouldn't assume/expect that slaves will always want updates as fast as possible, or that all slaves of a single master will want all updates with equal urgency ... individual slaves need to be able to choose.

-Hoss
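Concretely, with the Solr 1.4 ReplicationHandler each slave chooses its own freshness through its pollInterval; a slave that wants fresher data simply polls more often. A sketch of the slave-side configuration (the master host name is a placeholder):

```xml
<!-- slave's solrconfig.xml -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <str name="masterUrl">http://master-host:8983/solr/replication</str>
    <!-- hh:mm:ss; each slave decides how stale it is willing to be -->
    <str name="pollInterval">00:05:00</str>
  </lst>
</requestHandler>
```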
Cannot get solr 1.3.0 to run properly with plesk 9.2.1 on CentOS
My client is using a dedicated server with Plesk for the control panel. The hosting provider says that anything done using the control panel is supported by their tech support, so if I try anything over SSH, it voids that warranty. It's easy to install a servlet through Plesk anyway: I upload the war file using the Java servlet installer. A sample servlet has been installed, so I know this part works. However, when I install Solr, I get what looks like a warning icon, and if I hover over it the tooltip text says this:

Actual status of the application does not correspond to the status retrieved from the database.

The hosting provider's support team says that there is something wrong with the war file (big help). Since I'm kind of stuck using Tomcat 5.5, is there an older version of Solr that I should be using? How can I fix this so that I can use Solr? The only thing that I can find regarding this issue is this link:

http://mail-archives.apache.org/mod_mbox/lucene-solr-user/200904.mbox/69de18140904150913n66a8c78cgf064b53cd2440...@mail.gmail.com

And the replier to that message mentioned removing solr.xml. I thought that if I tried that and re-packaged the war it might fix it, but no such file exists in the war file I have. Does anyone have any ideas?

--Aaron