Re: defaultOperator=AND and queries with (
On Thu, Aug 13, 2009 at 5:31 AM, Subbacharya, Madhu madhu.subbacha...@corp.aol.com wrote:
> Hello, We have Solr running with the defaultOperator set to AND. I am not able to get any results for queries like q=( Ferrari AND ( 599 GTB Fiorano OR 612 Scaglietti OR F430 )), which contain parentheses for grouping. Does anyone have any ideas for a workaround?

Can you try adding debugQuery=on to the request and posting the details here?

-- Regards, Shalin Shekhar Mangar.
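[A concrete variant worth trying, ahead of the debug output: with defaultOperator=AND, multi-word model names inside the parentheses are split into separate AND'd terms unless quoted as phrases, which alone can make such a query match nothing. Host and core are hypothetical; the phrase quotes and debugQuery=on are the point:]

```
q=(Ferrari AND ("599 GTB Fiorano" OR "612 Scaglietti" OR F430))&debugQuery=on
```

The parsedquery section of the debug output will show exactly how the parentheses and the default operator were interpreted.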
Re: Questions about XPath in data import handler
yes. Look at the 'flatten' attribute on the field. It should give you all the text (not attributes) under a given node.

On Thu, Aug 13, 2009 at 8:02 PM, Andrew Clegg andrew.cl...@gmail.com wrote:
> Noble Paul നോബിള് नोब्ळ्-2 wrote:
>> On Thu, Aug 13, 2009 at 6:35 PM, Andrew Clegg andrew.cl...@gmail.com wrote:
>>> Does the second one mean select the value of the attribute called qualifier in the /a/b/subject element?
>> Yes, you are right. Isn't that the semantics of standard XPath syntax?
> Yes, just checking, since the DIH XPath engine is a little different. Do you know what I would get in this case? Also... can I select a non-leaf node and get *ALL* the text underneath it? e.g. /a/b in this example? Cheers, Andrew.
>
> -- View this message in context: http://www.nabble.com/Questions-about-XPath-in-data-import-handler-tp24954223p24954869.html Sent from the Solr - User mailing list archive at Nabble.com.

-- Noble Paul | Principal Engineer | AOL | http://aol.com
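[A minimal sketch of what this looks like in data-config.xml. The entity name, url, and column names are made up for illustration; the flatten attribute and the @attribute XPath syntax are the parts under discussion:]

```xml
<entity name="doc" processor="XPathEntityProcessor"
        url="example.xml" forEach="/a">
  <!-- flatten="true": concatenates all text nodes under /a/b into one value -->
  <field column="b_text" xpath="/a/b" flatten="true"/>
  <!-- @qualifier: selects the attribute value on the subject element -->
  <field column="subject_qualifier" xpath="/a/b/subject/@qualifier"/>
</entity>
```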
Boolean logic in distributed searches
Hello, Firstly, my apologies for what I suspect is a very straightforward question. I have two Solr 1.3 indexes and am searching both through Solr's distributed searching. Searching works correctly; however, boolean searches are interpreted differently depending on whether or not I search both indexes. For the example search criteria 'fish OR game', the following query returns 9300 results:

http://testapp1.test.archives.govt.nz:8080/solr-archway/select?qt=standard&q=fish+OR+game

Whereas the following query returns only 170 results:

http://testapp1.test.archives.govt.nz:8080/solr-archway/select?qt=standard&q=fish+OR+game&shards=testapp1.test.archives.govt.nz:8080/solr-archway,testapp1.test.archives.govt.nz:8080/solr-portal

This is the same even if a single shard is present. Are boolean searches not supported across multiple shards? Or do I need to tweak something? Thanks in anticipation, Matt
Re: Solr 1.4 Replication scheme
On Fri, Aug 14, 2009 at 8:39 AM, KaktuChakarabati jimmoe...@gmail.com wrote:
> In the old replication, I could snappull with multiple slaves asynchronously but perform the snapinstall on each at the same time (+- epsilon seconds), so that way production load-balanced query serving will always be consistent. With the new system it seems that I have no control over syncing them; rather, it polls every few minutes and then decides the next cycle based on the last time it *finished* updating, so in any case I lose control over the synchronization of snap installation across multiple slaves.

That is true. How did you synchronize them with the script-based solution? Assuming network bandwidth is equally distributed and all slaves are equal in hardware/configuration, the time difference between new searcher registration on any slave should not be more than pollInterval, no?

> Also, I noticed the default poll interval is 60 seconds. It would seem that for such a rapid interval, what I mentioned above is a non-issue; however, I am not clear how this works vis-a-vis the new searcher warmup. For a considerable index size (20 million docs+), the warmup itself is an expensive and somewhat lengthy process, and if a new searcher opens and warms up every minute, I am not at all sure I'll be able to serve queries with reasonable QTimes.

If the pollInterval is 60 seconds, it does not mean that a new index is fetched every 60 seconds. A new index is downloaded and installed on the slave only if a commit happened on the master (i.e. the index actually changed on the master).

-- Regards, Shalin Shekhar Mangar.
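[For readers following along: the polling behaviour being discussed is configured on the slave side of the Solr 1.4 ReplicationHandler, roughly like this (masterUrl is illustrative):]

```xml
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <str name="masterUrl">http://master.example.com:8983/solr/replication</str>
    <!-- hh:mm:ss between polls; a poll only triggers a download when the
         master's index version has changed, i.e. after a commit there -->
    <str name="pollInterval">00:01:00</str>
  </lst>
</requestHandler>
```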
Re: Writing own request handler tutorial
On Thu, Aug 13, 2009 at 6:17 AM, darniz rnizamud...@edmunds.com wrote:
> Could anybody provide me with a complete data import handler example with Oracle, if there is any?

Only the dataSource section in the configuration and the JDBC driver will be specific to Oracle. What is the problem you're facing?

-- Regards, Shalin Shekhar Mangar.
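[As a sketch of the Oracle-specific part Shalin mentions. Connection details, credentials, and the query are placeholders; the Oracle JDBC jar must be on Solr's classpath:]

```xml
<dataConfig>
  <dataSource driver="oracle.jdbc.driver.OracleDriver"
              url="jdbc:oracle:thin:@//dbhost:1521/ORCL"
              user="scott" password="tiger"/>
  <document>
    <!-- everything from here on is plain DIH, nothing Oracle-specific;
         note Oracle tends to return column labels in uppercase, so the
         column attributes here are uppercased to match -->
    <entity name="item" query="SELECT id, name FROM item">
      <field column="ID" name="id"/>
      <field column="NAME" name="name"/>
    </entity>
  </document>
</dataConfig>
```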
Re: Solr 1.4 Replication scheme
Hey Shalin, thanks for your prompt reply. To clarify: with the old script-based replication, I would snappull every x minutes (say, on the order of 5 minutes). Assuming no index optimize occurred (I optimize 1-2 times a day, so we can disregard it for the sake of argument), the snappull would take a few seconds to run on each iteration. I then have a crontab on all slaves that runs snapinstall at a fixed time, let's say every 15 minutes from the start of a round hour, inclusive. (Slave machine times are synced, e.g. via ntp.) So essentially all slaves will begin a snapinstall at exactly the same time - and assuming uniform load, and the fact that they all have at this point in time the same snapshot since I snappull frequently, this leads to fairly synchronized replication across the board.

With the new replication, however, it seems that by binding the pulling and installing together, and by specifying the timing in deltas only (as opposed to absolute-time based, like in crontab), we've essentially made it impossible to effectively keep multiple slaves up to date and synchronized. E.g. if we set the poll interval to 15 minutes, a slight offset in the startup times of the slaves (which can very much be the case after arbitrary resets/maintenance operations) can lead to deviations in snappull(+install) times. This in turn is made worse by the fact that the pollInterval is computed from when the last commit *finished* - and this number seems to have a higher variance, e.g. due to warmup, which might differ across machines based on the queries they've handled previously.

To summarize, it seems to me like it might be beneficial to introduce a second parameter that acts more like a crontab-style, time-based schedule, insofar as it would let a user specify when an actual commit should occur. Then we could have the pollInterval set to a low value (e.g. 60 seconds) but specify that a commit is only performed at 0, 15, 30 and 45 minutes past every hour. This makes the commit times on the slaves fairly deterministic. Does this make sense, or am I missing something about the current in-process replication?

Thanks, -Chak

Shalin Shekhar Mangar wrote:
> That is true. How did you synchronize them with the script based solution? [...] A new index is downloaded and installed on the slave only if a commit happened on the master (i.e. the index was actually changed on the master). [...]

-- View this message in context: http://www.nabble.com/Solr-1.4-Replication-scheme-tp24965590p24968105.html
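[The absolute-time scheme described above could be sketched as slave-side crontab entries. Paths are illustrative, and the 1.3-era replication scripts read their defaults, such as the master host, from conf/scripts.conf:]

```
# pull the latest snapshot often, so every slave already holds it...
*/5 * * * * /opt/solr/bin/snappuller
# ...and install it at the same wall-clock times on every slave
0,15,30,45 * * * * /opt/solr/bin/snapinstaller
```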
Re: Solr 1.4 Replication scheme
usually the pollInterval is kept to a small value like 10secs. There is no harm in polling more frequently. This can ensure that the replication happens at almost the same time.

On Fri, Aug 14, 2009 at 1:58 PM, KaktuChakarabati jimmoe...@gmail.com wrote:
> Hey Shalin, thanks for your prompt reply. To clarify: with the old script-based replication, I would snappull every x minutes (say, on the order of 5 minutes) [...] so essentially all slaves will begin a snapinstall at exactly the same time [...] this leads to fairly synchronized replication across the board. With the new replication, however, binding the pulling and installing together, and specifying the timing in deltas only, has essentially made it impossible to effectively keep multiple slaves up to date and synchronized. [...]

-- Noble Paul | Principal Engineer | AOL | http://aol.com
Re: Questions about XPath in data import handler
Noble Paul നോബിള് नोब्ळ्-2 wrote:
> yes. Look at the 'flatten' attribute on the field. It should give you all the text (not attributes) under a given node.

I missed that one -- many thanks. Andrew.

-- View this message in context: http://www.nabble.com/Questions-about-XPath-in-data-import-handler-tp24954223p24968349.html
Re: Solr 1.4 Replication scheme
Hey Noble, you are right in that this will solve the problem; however, it implicitly assumes that commits to the master are infrequent enough that most polling operations yield no update, and only every few polls lead to an actual commit. This is a relatively safe assumption in most cases, but one that couples the master update policy to the performance of the slaves - if the master gets updated (and committed to) frequently, slaves might face a commit on every 1-2 polls, much more often than is feasible given new searcher warmup times. In effect, what this comes down to is that I must make the master commit frequency the same as I'd want the slaves to use - and this is markedly different from the previous behaviour, with which I could have the master updated (+committed to) at one rate and the slaves committing those updates at a different rate.

Noble Paul നോബിള് नोब्ळ्-2 wrote:
> usually the pollInterval is kept to a small value like 10secs. There is no harm in polling more frequently. This can ensure that the replication happens at almost the same time. [...]
Re: Solr 1.4 Replication scheme
On Fri, Aug 14, 2009 at 2:28 PM, KaktuChakarabati jimmoe...@gmail.com wrote:
> Hey Noble, you are right in that this will solve the problem; however, it implicitly assumes that commits to the master are infrequent enough that most polling operations yield no update, and only every few polls lead to an actual commit. [...] In effect, what this comes down to is that I must make the master commit frequency the same as I'd want the slaves to use. [...]

I see the argument. But isn't it better to keep both the master and slave as consistent as possible? There is no use in committing on the master if you do not plan to search on those docs. So the best thing to do is to commit only as frequently as you wish to commit on a slave. On a different track: if we had an option of disabling the commit after replication, would it be worth it? The user could then trigger a commit explicitly.
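[Noble's suggestion - commit on the master only as often as the slaves should see changes - can be expressed with autoCommit in the master's solrconfig.xml. The 15-minute figure is just the example used in this thread:]

```xml
<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <!-- commit at most every 15 minutes (value is in milliseconds) -->
    <maxTime>900000</maxTime>
  </autoCommit>
</updateHandler>
```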
Choosing between t and s field types
Hi, I want certain fields of type int, float and date to be sortable, and I should also be able to run range queries and facet queries on those fields. As far as I know, the sint and sfloat field types make fields sortable, while tint, tfloat and tdate allow range queries on the fields. I want both of these features on my fields. How can I make this happen?
A Buzzword Problem!!!
Hi, I have a scenario in which I need to store buzzwords and their frequency in a particular document. Along with the buzzwords, I have associated basewords and Porter-stemmed words. Buzzword, baseword and Porter-stemmed word all need to be searchable. How can I model this with dynamic fields in my Solr schema? Regards, Ninad
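[One possible sketch, assuming the stock "text" field type from the example schema. The suffix naming convention is made up; per-document term frequency is already tracked by the index itself, so it does not need its own field:]

```xml
<!-- one indexed, stored field per word role, keyed by buzzword suffix -->
<dynamicField name="*_buzzword" type="text" indexed="true" stored="true"/>
<dynamicField name="*_baseword" type="text" indexed="true" stored="true"/>
<dynamicField name="*_porter"   type="text" indexed="true" stored="true"/>
```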
Re: Choosing between t and s field types
According to the documentation in schema.xml.original, sint etc. can be used for both sorting and range queries:

<!-- Numeric field types that manipulate the value into a string value that isn't human-readable in its internal form, but with a lexicographic ordering the same as the numeric ordering, so that range queries work correctly. -->
<fieldType name="sint" class="solr.SortableIntField" sortMissingLast="true" omitNorms="true"/>

On Fri, Aug 14, 2009 at 11:08 AM, Ninad Raut hbase.user.ni...@gmail.com wrote:
> Hi, I want certain fields of type int, float and date to be sortable, and I should also be able to run range queries and facet queries on those fields. [...] How can I make this happen?
Re: Choosing between t and s field types
Hi Constantijn, what are the t types, viz. tint, tfloat etc., for? Is there a special use for these?

On Fri, Aug 14, 2009 at 4:37 PM, Constantijn Visinescu baeli...@gmail.com wrote:
> According to the documentation in schema.xml.original, sint etc. can be used for both sorting and range queries. [...]
Re: [OT] Solr Webinar
webinar: https://admin.na4.acrobat.com/_a837485961/p23226399/
slides: http://www.lucidimagination.com/files/file/improving_findability.pdf

[]s, Lucas Frare Teixeira .·. - lucas...@gmail.com - blog.lucastex.com - twitter.com/lucastex

On Fri, Aug 14, 2009 at 2:55 AM, Lukáš Vlček lukas.vl...@gmail.com wrote:
> Hello, they [Lucid Imagination guys] said it should be published on their blog. I hope I understood it correctly. Regards, Lukas http://blog.lukas-vlcek.com/
>
> On Fri, Aug 14, 2009 at 7:52 AM, Mani Kumar manikumarchau...@gmail.com wrote:
>> If anyone has any pointers to this webinar, please share them. Thanks! mani
>>
>> On Thu, Aug 13, 2009 at 9:26 PM, Chenini, Mohamed mchen...@geico.com wrote:
>>> I also registered to attend, but I am not going to, because a last-minute meeting has been scheduled here at work at the same time. Would it be possible in the future to schedule such webinars starting 5-6 PM ET? Thanks, Mohamed
>>>
>>> -----Original Message-----
>>> From: Grant Ingersoll [mailto:gsing...@apache.org]
>>> Sent: Wednesday, August 12, 2009 6:22 PM
>>> To: solr-user@lucene.apache.org
>>> Subject: Re: [OT] Solr Webinar
>>>
>>> I believe it will be, but I am not sure of the procedure for distributing. I think if you register but don't show, you will get a notification. -Grant
>>>
>>> On Aug 10, 2009, at 12:26 PM, Lucas F. A. Teixeira wrote:
>>>> Hello Grant, will the webinar be recorded and available to download later someplace? Unfortunately, I can't watch this time. Thanks, []s, Lucas Frare Teixeira .*. - lucas...@gmail.com - blog.lucastex.com - twitter.com/lucastex
>>>>
>>>> On Mon, Aug 10, 2009 at 12:33 PM, Grant Ingersoll gsing...@apache.org wrote:
>>>>> I will be giving a free one-hour webinar on getting started with Apache Solr on August 13th, 2009, ~ 11:00 AM PDT / 2:00 PM EDT. You can sign up @ http://www2.eventsvc.com/lucidimagination/081309?trk=WR-AUG2009-AP I will present and demo: * Getting started with LucidWorks for Solr * Getting better, faster results using Solr's findability and relevance improvement tools * Deploying Solr in production, including monitoring performance and trends with the LucidGaze for Solr performance profiler. -Grant
Re: Choosing between t and s field types
I just checked the default schema.xml for Solr 1.3 (solr/conf/schema.xml.original) and I don't see tint etc. listed. So either they're new in 1.4 and I don't know about them, or they were manually defined. Can you post your schema.xml entries for tint (along with any comments they might have)? Constantijn

On Fri, Aug 14, 2009 at 1:39 PM, Ninad Raut hbase.user.ni...@gmail.com wrote:
> Hi Constantijn, what are the t types, viz. tint, tfloat etc., for? Is there a special use for these? [...]
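[For reference, the Solr 1.4 example schema defines trie-based numeric types roughly along these lines (copy the exact definitions from your own schema.xml rather than from here):]

```xml
<!-- trie-encoded numerics: fast range queries via extra precision terms,
     and also sortable, so one type covers both requirements -->
<fieldType name="tint"   class="solr.TrieIntField"   precisionStep="8" omitNorms="true" positionIncrementGap="0"/>
<fieldType name="tfloat" class="solr.TrieFloatField" precisionStep="8" omitNorms="true" positionIncrementGap="0"/>
<fieldType name="tdate"  class="solr.TrieDateField"  precisionStep="6" omitNorms="true" positionIncrementGap="0"/>
```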
Re: Solr 1.4 Replication scheme
Longer term, it might be nice to enable clients to specify what version of the index they were searching against. This could be used to prevent consistency issues across different slaves, even if they commit at different times. It could also be used in distributed search to make sure the index didn't change between phases. -Yonik http://www.lucidimagination.com 2009/8/14 Noble Paul നോബിള് नोब्ळ् noble.p...@corp.aol.com: On Fri, Aug 14, 2009 at 2:28 PM, KaktuChakarabatijimmoe...@gmail.com wrote: Hey Noble, you are right in that this will solve the problem, however it implicitly assumes that commits to the master are infrequent enough ( so that most polling operations yield no update and only every few polls lead to an actual commit. ) This is a relatively safe assumption in most cases, but one that couples the master update policy with the performance of the slaves - if the master gets updated (and committed to) frequently, slaves might face a commit on every 1-2 poll's, much more than is feasible given new searcher warmup times.. In effect what this comes down to it seems is that i must make the master commit frequency the same as i'd want the slaves to use - and this is markedly different than previous behaviour with which i could have the master get updated(+committed to) at one rate and slaves committing those updates at a different rate. I see , the argument. But , isn't it better to keep both the mster and slave as consistent as possible? There is no use in committing in master, if you do not plan to search on those docs. So the best thing to do is do a commit only as frequently as you wish to commit in a slave. On a different track, if we can have an option of disabling commit after replication, is it worth it? So the user can trigger a commit explicitly Noble Paul നോബിള് नोब्ळ्-2 wrote: usually the pollInterval is kept to a small value like 10secs. there is no harm in polling more frequently. 
This can ensure that the replication happens at almost the same time. On Fri, Aug 14, 2009 at 1:58 PM, KaktuChakarabati jimmoe...@gmail.com wrote: Hey Shalin, thanks for your prompt reply. To clarify: with the old script-based replication, I would snappull every x minutes (say, on the order of 5 minutes). Assuming no index optimize occurred (I optimize 1-2 times a day, so we can disregard it for the sake of argument), the snappull would take a few seconds to run on each iteration. I then have a crontab on all slaves that runs snapinstall at a fixed time, let's say every 15 minutes from the start of a round hour, inclusive (slave machine times are synced, e.g. via ntp), so that essentially all slaves begin a snapinstall at exactly the same time. Assuming uniform load, and given that at that point in time they all have the same snapshot since I snappull frequently, this leads to fairly synchronized replication across the board. With the new replication, however, it seems that by binding the pulling and installing together, and by specifying the timing only as deltas (as opposed to absolute-time based like in crontab), we've essentially made it impossible to keep multiple slaves up to date and synchronized; e.g. if we set the poll interval to 15 minutes, a slight offset in the startup times of the slaves (which can very much be the case after arbitrary resets/maintenance operations) can lead to deviations in snappull(+install) times. This in turn is made worse by the fact that the pollInterval is computed from the time the last commit *finished* - and this number seems to have a higher variance, e.g. due to warmup, which might differ across machines based on the queries they've handled previously.
To summarize, it seems to me like it might be beneficial to introduce a second parameter that acts more like a crontab-style time-based schedule, in so far as it would let a user specify when an actual commit should occur - so we could have the pollInterval set to a low value (e.g. 60 seconds) but specify that a commit should only be performed at the 0, 15, 30 and 45-minute marks of every hour. This makes the commit times on the slaves fairly deterministic. Does this make sense, or am I missing something about the current in-process replication? Thanks, -Chak Shalin Shekhar Mangar wrote: On Fri, Aug 14, 2009 at 8:39 AM, KaktuChakarabati jimmoe...@gmail.com wrote: In the old replication, I could snappull with multiple slaves asynchronously but perform the snapinstall on each at the same time (+- epsilon seconds), so that production load-balanced query serving will always be consistent. With the new system it seems that I have no control over syncing them; rather, it polls every few minutes and then decides the next cycle based on the last time it *finished* updating, so in any case I lose control over the synchronization of snap installation across multiple slaves. That is true. How did you synchronize
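For readers following the thread, the pollInterval under discussion is configured on the slave side in solrconfig.xml. A minimal sketch (the master hostname is a placeholder; pollInterval uses an HH:mm:ss format), reflecting that in 1.4 there is only this fixed polling delta, not a cron-style absolute schedule:

```xml
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <!-- placeholder master URL -->
    <str name="masterUrl">http://master-host:8983/solr/replication</str>
    <!-- poll the master every 60 seconds (HH:mm:ss) -->
    <str name="pollInterval">00:00:60</str>
  </lst>
</requestHandler>
```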
Re: Choosing between t and s field types
I just checked the default schema.xml for Solr 1.3 (solr/conf/schema.xml.original) and I don't see tint, etc. listed. So either they're new in 1.4 and I don't know about them or they were manually defined. TrieRange fields are new and will make an appearance in Solr 1.4. With 1.3 you can use sint and sfloat for your use cases. @Ninad - http://www.lucidimagination.com/blog/2009/05/13/exploring-lucene-and-solrs-trierange-capabilities/ Cheers Avlesh On Fri, Aug 14, 2009 at 6:03 PM, Constantijn Visinescu baeli...@gmail.com wrote: I just checked the default schema.xml for Solr 1.3 (solr/conf/schema.xml.original) and I don't see tint, etc. listed. So either they're new in 1.4 and I don't know about them or they were manually defined. Can you post your schema.xml entries for tint (along with any comments it might have)? Constantijn On Fri, Aug 14, 2009 at 1:39 PM, Ninad Raut hbase.user.ni...@gmail.com wrote: Hi Constantijn, What are the t types, viz. tint, tfloat etc., for? Is there a special use for these? On Fri, Aug 14, 2009 at 4:37 PM, Constantijn Visinescu baeli...@gmail.com wrote: According to the documentation in schema.xml.original, sint etc. can be used for both sorting and range queries:

  <!-- Numeric field types that manipulate the value into a string value that isn't human-readable in its internal form, but with a lexicographic ordering the same as the numeric ordering, so that range queries work correctly. -->
  <fieldType name="sint" class="solr.SortableIntField" sortMissingLast="true" omitNorms="true"/>

On Fri, Aug 14, 2009 at 11:08 AM, Ninad Raut hbase.user.ni...@gmail.com wrote: Hi, I want certain fields of type int, float and date to be sortable, and I should be able to run my range queries as well as facet queries on those fields. As far as I know, the sint and sfloat field types make fields sortable, and tint, tfloat, tdate allow range queries on the fields. I want both these features in my fields. How can I make this happen?
Re: Solr 1.4 Replication scheme
Slightly off topic one question on the index file transfer mechanism used in the new 1.4 Replication scheme. Is my understanding correct that the transfer is over http? (vs. rsync in the script-based snappuller) Thanks, -Jibo On Aug 14, 2009, at 6:34 AM, Yonik Seeley wrote: Longer term, it might be nice to enable clients to specify what version of the index they were searching against. This could be used to prevent consistency issues across different slaves, even if they commit at different times. It could also be used in distributed search to make sure the index didn't change between phases. -Yonik http://www.lucidimagination.com 2009/8/14 Noble Paul നോബിള് नोब्ळ् noble.p...@corp.aol.com: On Fri, Aug 14, 2009 at 2:28 PM, KaktuChakarabatijimmoe...@gmail.com wrote: Hey Noble, you are right in that this will solve the problem, however it implicitly assumes that commits to the master are infrequent enough ( so that most polling operations yield no update and only every few polls lead to an actual commit. ) This is a relatively safe assumption in most cases, but one that couples the master update policy with the performance of the slaves - if the master gets updated (and committed to) frequently, slaves might face a commit on every 1-2 poll's, much more than is feasible given new searcher warmup times.. In effect what this comes down to it seems is that i must make the master commit frequency the same as i'd want the slaves to use - and this is markedly different than previous behaviour with which i could have the master get updated(+committed to) at one rate and slaves committing those updates at a different rate. I see , the argument. But , isn't it better to keep both the mster and slave as consistent as possible? There is no use in committing in master, if you do not plan to search on those docs. So the best thing to do is do a commit only as frequently as you wish to commit in a slave. 
On a different track, if we can have an option of disabling commit after replication, is it worth it? So the user can trigger a commit explicitly Noble Paul നോബിള് नोब्ळ्-2 wrote: usually the pollInterval is kept to a small value like 10secs. there is no harm in polling more frequently. This can ensure that the replication happens at almost same time On Fri, Aug 14, 2009 at 1:58 PM, KaktuChakarabatijimmoe...@gmail.com wrote: Hey Shalin, thanks for your prompt reply. To clarity: With the old script-based replication, I would snappull every x minutes (say, on the order of 5 minutes). Assuming no index optimize occured ( I optimize 1-2 times a day so we can disregard it for the sake of argument), the snappull would take a few seconds to run on each iteration. I then have a crontab on all slaves that runs snapinstall on a fixed time, lets say every 15 minutes from start of a round hour, inclusive. (slave machine times are synced e.g via ntp) so that essentially all slaves will begin a snapinstall exactly at the same time - assuming uniform load and the fact they all have at this point in time the same snapshot since I snappull frequently - this leads to a fairly synchronized replication across the board. With the new replication however, it seems that by binding the pulling and installing as well specifying the timing in delta's only (as opposed to absolute-time based like in crontab) we've essentially made it impossible to effectively keep multiple slaves up to date and synchronized; e.g if we set poll interval to 15 minutes, a slight offset in the startup times of the slaves (that can very much be the case for arbitrary resets/ maintenance operations) can lead to deviations in snappull(+install) times. 
this in turn is further made worse by the fact that the pollInterval is then computed based on the offset of when the last commit *finished* - and this number seems to have a higher variance, e.g due to warmup which might be different across machines based on the queries they've handled previously. To summarize, It seems to me like it might be beneficial to introduce a second parameter that acts more like a crontab time-based tableau, in so far that it can enable a user to specify when an actual commit should occur - so then we can have the pollInterval set to a low value (e.g 60 seconds) but then specify to only perform a commit on the 0,15,30,45-minutes of every hour. this makes the commit times on the slaves fairly deterministic. Does this make sense or am i missing something with current in- process replication? Thanks, -Chak Shalin Shekhar Mangar wrote: On Fri, Aug 14, 2009 at 8:39 AM, KaktuChakarabati jimmoe...@gmail.comwrote: In the old replication, I could snappull with multiple slaves asynchronously but perform the snapinstall on each at the same time (+- epsilon seconds), so that way production load balanced query serving will always be consistent. With the new system
Re: Solr 1.4 Replication scheme
On Fri, Aug 14, 2009 at 11:53 AM, Jibo Johnjiboj...@mac.com wrote: Slightly off topic one question on the index file transfer mechanism used in the new 1.4 Replication scheme. Is my understanding correct that the transfer is over http? (vs. rsync in the script-based snappuller) Yes, that's correct. -Yonik http://www.lucidimagination.com
Re: Choosing between t and s field types
TrieRange ... what are its features? What additional functionality do they provide? On Fri, Aug 14, 2009 at 8:35 PM, Avlesh Singh avl...@gmail.com wrote: I just checked the default schema.xml for Solr 1.3 (solr/conf/schema.xml.original) and I don't see tint, etc. listed. So either they're new in 1.4 and I don't know about them or they were manually defined. TrieRange fields are new and will make an appearance in Solr 1.4. With 1.3 you can use sint and sfloat for your use cases. @Ninad - http://www.lucidimagination.com/blog/2009/05/13/exploring-lucene-and-solrs-trierange-capabilities/ Cheers Avlesh On Fri, Aug 14, 2009 at 6:03 PM, Constantijn Visinescu baeli...@gmail.com wrote: I just checked the default schema.xml for Solr 1.3 (solr/conf/schema.xml.original) and I don't see tint, etc. listed. So either they're new in 1.4 and I don't know about them or they were manually defined. Can you post your schema.xml entries for tint (along with any comments it might have)? Constantijn On Fri, Aug 14, 2009 at 1:39 PM, Ninad Raut hbase.user.ni...@gmail.com wrote: Hi Constantijn, What are the t types, viz. tint, tfloat etc., for? Is there a special use for these? On Fri, Aug 14, 2009 at 4:37 PM, Constantijn Visinescu baeli...@gmail.com wrote: According to the documentation in schema.xml.original, sint etc. can be used for both sorting and range queries:

  <!-- Numeric field types that manipulate the value into a string value that isn't human-readable in its internal form, but with a lexicographic ordering the same as the numeric ordering, so that range queries work correctly. -->
  <fieldType name="sint" class="solr.SortableIntField" sortMissingLast="true" omitNorms="true"/>

On Fri, Aug 14, 2009 at 11:08 AM, Ninad Raut hbase.user.ni...@gmail.com wrote: Hi, I want certain fields of type int, float and date to be sortable, and I should be able to run my range queries as well as facet queries on those fields. As far as I know, the sint and sfloat field types make fields sortable, and tint, tfloat, tdate allow range queries on the fields. I want both these features in my fields. How can I make this happen?
Re: Choosing between t and s field types
On Fri, Aug 14, 2009 at 1:15 PM, Ninad Raut hbase.user.ni...@gmail.com wrote: TrieRange ... what are its features? What additional functionality do they provide?

- a generally more efficient FieldCache structure (less memory)
- faster range queries when precisionStep is utilized to index multiple tokens per value

-Yonik http://www.lucidimagination.com On Fri, Aug 14, 2009 at 8:35 PM, Avlesh Singh avl...@gmail.com wrote: I just checked the default schema.xml for Solr 1.3 (solr/conf/schema.xml.original) and I don't see tint, etc. listed. So either they're new in 1.4 and I don't know about them or they were manually defined. TrieRange fields are new and will make an appearance in Solr 1.4. With 1.3 you can use sint and sfloat for your use cases. @Ninad - http://www.lucidimagination.com/blog/2009/05/13/exploring-lucene-and-solrs-trierange-capabilities/ Cheers Avlesh On Fri, Aug 14, 2009 at 6:03 PM, Constantijn Visinescu baeli...@gmail.com wrote: I just checked the default schema.xml for Solr 1.3 (solr/conf/schema.xml.original) and I don't see tint, etc. listed. So either they're new in 1.4 and I don't know about them or they were manually defined. Can you post your schema.xml entries for tint (along with any comments it might have)? Constantijn On Fri, Aug 14, 2009 at 1:39 PM, Ninad Raut hbase.user.ni...@gmail.com wrote: Hi Constantijn, What are the t types, viz. tint, tfloat etc., for? Is there a special use for these? On Fri, Aug 14, 2009 at 4:37 PM, Constantijn Visinescu baeli...@gmail.com wrote: According to the documentation in schema.xml.original, sint etc. can be used for both sorting and range queries:

  <!-- Numeric field types that manipulate the value into a string value that isn't human-readable in its internal form, but with a lexicographic ordering the same as the numeric ordering, so that range queries work correctly. -->
  <fieldType name="sint" class="solr.SortableIntField" sortMissingLast="true" omitNorms="true"/>

On Fri, Aug 14, 2009 at 11:08 AM, Ninad Raut hbase.user.ni...@gmail.com wrote: Hi, I want certain fields of type int, float and date to be sortable, and I should be able to run my range queries as well as facet queries on those fields. As far as I know, the sint and sfloat field types make fields sortable, and tint, tfloat, tdate allow range queries on the fields. I want both these features in my fields. How can I make this happen?
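For readers on 1.4, the trie-based types discussed above are declared in schema.xml roughly like this (a sketch following the style of the 1.4 example schema; the precisionStep values are the common defaults and can be tuned):

```xml
<!-- trie-coded int: fast range queries via multiple precisions per indexed value -->
<fieldType name="tint" class="solr.TrieIntField" precisionStep="8"
           omitNorms="true" positionIncrementGap="0"/>
<!-- precisionStep="0" indexes a single token per value: smaller index, still sortable -->
<fieldType name="int" class="solr.TrieIntField" precisionStep="0"
           omitNorms="true" positionIncrementGap="0"/>
```

Trie fields are also sortable, so a single tint/tfloat/tdate field can serve sorting, range queries and faceting at once, which answers Ninad's original question for 1.4.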
Re: Solr 1.4 Replication scheme
This would be good! Especially for NRT where this problem is somewhat harder. I think we may need to look at caching readers per corresponding http session. The pitfall is expiring them before running out of RAM. On Fri, Aug 14, 2009 at 6:34 AM, Yonik Seeleyyo...@lucidimagination.com wrote: Longer term, it might be nice to enable clients to specify what version of the index they were searching against. This could be used to prevent consistency issues across different slaves, even if they commit at different times. It could also be used in distributed search to make sure the index didn't change between phases. -Yonik http://www.lucidimagination.com 2009/8/14 Noble Paul നോബിള് नोब्ळ् noble.p...@corp.aol.com: On Fri, Aug 14, 2009 at 2:28 PM, KaktuChakarabatijimmoe...@gmail.com wrote: Hey Noble, you are right in that this will solve the problem, however it implicitly assumes that commits to the master are infrequent enough ( so that most polling operations yield no update and only every few polls lead to an actual commit. ) This is a relatively safe assumption in most cases, but one that couples the master update policy with the performance of the slaves - if the master gets updated (and committed to) frequently, slaves might face a commit on every 1-2 poll's, much more than is feasible given new searcher warmup times.. In effect what this comes down to it seems is that i must make the master commit frequency the same as i'd want the slaves to use - and this is markedly different than previous behaviour with which i could have the master get updated(+committed to) at one rate and slaves committing those updates at a different rate. I see , the argument. But , isn't it better to keep both the mster and slave as consistent as possible? There is no use in committing in master, if you do not plan to search on those docs. So the best thing to do is do a commit only as frequently as you wish to commit in a slave. 
On a different track, if we can have an option of disabling commit after replication, is it worth it? So the user can trigger a commit explicitly Noble Paul നോബിള് नोब्ळ्-2 wrote: usually the pollInterval is kept to a small value like 10secs. there is no harm in polling more frequently. This can ensure that the replication happens at almost same time On Fri, Aug 14, 2009 at 1:58 PM, KaktuChakarabatijimmoe...@gmail.com wrote: Hey Shalin, thanks for your prompt reply. To clarity: With the old script-based replication, I would snappull every x minutes (say, on the order of 5 minutes). Assuming no index optimize occured ( I optimize 1-2 times a day so we can disregard it for the sake of argument), the snappull would take a few seconds to run on each iteration. I then have a crontab on all slaves that runs snapinstall on a fixed time, lets say every 15 minutes from start of a round hour, inclusive. (slave machine times are synced e.g via ntp) so that essentially all slaves will begin a snapinstall exactly at the same time - assuming uniform load and the fact they all have at this point in time the same snapshot since I snappull frequently - this leads to a fairly synchronized replication across the board. With the new replication however, it seems that by binding the pulling and installing as well specifying the timing in delta's only (as opposed to absolute-time based like in crontab) we've essentially made it impossible to effectively keep multiple slaves up to date and synchronized; e.g if we set poll interval to 15 minutes, a slight offset in the startup times of the slaves (that can very much be the case for arbitrary resets/maintenance operations) can lead to deviations in snappull(+install) times. 
this in turn is further made worse by the fact that the pollInterval is then computed based on the offset of when the last commit *finished* - and this number seems to have a higher variance, e.g due to warmup which might be different across machines based on the queries they've handled previously. To summarize, It seems to me like it might be beneficial to introduce a second parameter that acts more like a crontab time-based tableau, in so far that it can enable a user to specify when an actual commit should occur - so then we can have the pollInterval set to a low value (e.g 60 seconds) but then specify to only perform a commit on the 0,15,30,45-minutes of every hour. this makes the commit times on the slaves fairly deterministic. Does this make sense or am i missing something with current in-process replication? Thanks, -Chak Shalin Shekhar Mangar wrote: On Fri, Aug 14, 2009 at 8:39 AM, KaktuChakarabati jimmoe...@gmail.comwrote: In the old replication, I could snappull with multiple slaves asynchronously but perform the snapinstall on each at the same time (+- epsilon seconds), so that way production load balanced query serving will always be consistent. With the new system it seems
Re: Solr 1.4 Replication scheme
On Fri, Aug 14, 2009 at 1:48 PM, Jason Rutherglenjason.rutherg...@gmail.com wrote: This would be good! Especially for NRT where this problem is somewhat harder. I think we may need to look at caching readers per corresponding http session. For something like distributed search I was thinking of a simple reservation mechanism... let the client specify how long to hold open that version of the index (perhaps still have a max number of open versions to prevent an errant client from blowing things up). -Yonik http://www.lucidimagination.com The pitfall is expiring them before running out of RAM. On Fri, Aug 14, 2009 at 6:34 AM, Yonik Seeleyyo...@lucidimagination.com wrote: Longer term, it might be nice to enable clients to specify what version of the index they were searching against. This could be used to prevent consistency issues across different slaves, even if they commit at different times. It could also be used in distributed search to make sure the index didn't change between phases. -Yonik http://www.lucidimagination.com 2009/8/14 Noble Paul നോബിള് नोब्ळ् noble.p...@corp.aol.com: On Fri, Aug 14, 2009 at 2:28 PM, KaktuChakarabatijimmoe...@gmail.com wrote: Hey Noble, you are right in that this will solve the problem, however it implicitly assumes that commits to the master are infrequent enough ( so that most polling operations yield no update and only every few polls lead to an actual commit. ) This is a relatively safe assumption in most cases, but one that couples the master update policy with the performance of the slaves - if the master gets updated (and committed to) frequently, slaves might face a commit on every 1-2 poll's, much more than is feasible given new searcher warmup times.. 
In effect what this comes down to it seems is that i must make the master commit frequency the same as i'd want the slaves to use - and this is markedly different than previous behaviour with which i could have the master get updated(+committed to) at one rate and slaves committing those updates at a different rate. I see , the argument. But , isn't it better to keep both the mster and slave as consistent as possible? There is no use in committing in master, if you do not plan to search on those docs. So the best thing to do is do a commit only as frequently as you wish to commit in a slave. On a different track, if we can have an option of disabling commit after replication, is it worth it? So the user can trigger a commit explicitly Noble Paul നോബിള് नोब्ळ्-2 wrote: usually the pollInterval is kept to a small value like 10secs. there is no harm in polling more frequently. This can ensure that the replication happens at almost same time On Fri, Aug 14, 2009 at 1:58 PM, KaktuChakarabatijimmoe...@gmail.com wrote: Hey Shalin, thanks for your prompt reply. To clarity: With the old script-based replication, I would snappull every x minutes (say, on the order of 5 minutes). Assuming no index optimize occured ( I optimize 1-2 times a day so we can disregard it for the sake of argument), the snappull would take a few seconds to run on each iteration. I then have a crontab on all slaves that runs snapinstall on a fixed time, lets say every 15 minutes from start of a round hour, inclusive. (slave machine times are synced e.g via ntp) so that essentially all slaves will begin a snapinstall exactly at the same time - assuming uniform load and the fact they all have at this point in time the same snapshot since I snappull frequently - this leads to a fairly synchronized replication across the board. 
With the new replication however, it seems that by binding the pulling and installing as well specifying the timing in delta's only (as opposed to absolute-time based like in crontab) we've essentially made it impossible to effectively keep multiple slaves up to date and synchronized; e.g if we set poll interval to 15 minutes, a slight offset in the startup times of the slaves (that can very much be the case for arbitrary resets/maintenance operations) can lead to deviations in snappull(+install) times. this in turn is further made worse by the fact that the pollInterval is then computed based on the offset of when the last commit *finished* - and this number seems to have a higher variance, e.g due to warmup which might be different across machines based on the queries they've handled previously. To summarize, It seems to me like it might be beneficial to introduce a second parameter that acts more like a crontab time-based tableau, in so far that it can enable a user to specify when an actual commit should occur - so then we can have the pollInterval set to a low value (e.g 60 seconds) but then specify to only perform a commit on the 0,15,30,45-minutes of every hour. this makes the commit times on the slaves fairly deterministic. Does this make sense or am i missing something with current in-process
Re: A Buzzword Problem!!!
Do you need to know, when you match, which type of word it was, or do you just need to know that there was a match? On Aug 14, 2009, at 5:17 AM, Ninad Raut wrote: Hi, I have a scenario in which I need to store buzzwords and their frequency in a particular document. Along with the buzzwords I also have possible base words and Porter-stemmed words associated with them. Buzzword, base word and Porter word all need to be searchable. How can I use dynamic fields in my Solr schema for this? Regards, Ninad -- Grant Ingersoll http://www.lucidimagination.com/ Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using Solr/Lucene: http://www.lucidimagination.com/search
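One possible shape for Ninad's schema, sketched with a made-up naming convention (the *_buzz / *_base / *_porter suffixes and the text type are illustrative, not from the thread):

```xml
<!-- any field whose name ends in one of these suffixes is accepted
     without further schema changes, so each document can carry its
     own buzzword / base word / Porter word fields -->
<dynamicField name="*_buzz"   type="text" indexed="true" stored="true" multiValued="true"/>
<dynamicField name="*_base"   type="text" indexed="true" stored="true" multiValued="true"/>
<dynamicField name="*_porter" type="text" indexed="true" stored="true" multiValued="true"/>
```

Keeping the three kinds of words in separate (dynamic) fields also answers Grant's question: a match then tells you which type of word it was.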
ruby client and building spell check dictionary
I set up the spell check component with this code in the config file:

  <searchComponent name="spellcheck" class="solr.SpellCheckComponent">
    <str name="queryAnalyzerFieldType">textSpell</str>
    <lst name="spellchecker">
      <str name="name">titleCheck</str>
      <str name="classname">solr.IndexBasedSpellChecker</str>
      <str name="field">dictionary</str>
      <str name="accuracy">0.7</str>
    </lst>
  </searchComponent>

which works great. I can build the dictionary from my browser with q=foo&spellcheck.build=true&spellcheck.name=titleCheck, and I can also receive the spellcheck response when I make a query via the ruby client. What I'm trying to do now, though, is build the dictionary via the ruby client. I added this code to class Solr::Request::Standard < Solr::Request::Select:

  if @params[:spellcheck]
    hash[:spellcheck] = true
    hash["spellcheck.q"] = @params[:spellcheck][:query]
    hash["spellcheck.build"] = @params[:spellcheck][:build]
  end

and attempt to make a query with spellcheck.build=true (the spellcheck.name is set in the defaults of select). Unfortunately I am receiving this exception: Net::HTTPFatalError: 500
java.io.FileNotFoundException: _cfdx
java.lang.RuntimeException: java.io.FileNotFoundException: _cfdx
    at org.apache.solr.spelling.IndexBasedSpellChecker.build(IndexBasedSpellChecker.java:92)
    at org.apache.solr.handler.component.SpellCheckComponent.prepare(SpellCheckComponent.java:107)
    at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:150)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1333)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:303)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:232)
    at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089)
    at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365)
    at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
    at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
    at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712)
    at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
    at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211)
    at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
    at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139)
    at org.mortbay.jetty.Server.handle(Server.java:285)
    at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502)
    at org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:835)
    at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:641)
    at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:208)
    at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378)
    at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:226)
    at org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442)
Caused by: java.io.FileNo

from /usr/lib/ruby/1.8/net/http.rb:2097:in `error!'
from /var/lib/gems/1.8/gems/solr-ruby-0.0.7/lib/solr/connection.rb:165:in `post'
from /var/lib/gems/1.8/gems/solr-ruby-0.0.7/lib/solr/connection.rb:151:in `send'
from /var/lib/gems/1.8/gems/solr-ruby-0.0.7/lib/solr/connection.rb:174:in `create_and_send_query'
from /var/lib/gems/1.8/gems/solr-ruby-0.0.7/lib/solr/connection.rb:92:in `query'
from /home/mike/code/pubget.rails/app/models/article.rb:695:in `solr_search'
from /home/mike/code/pubget.rails/app/models/article.rb:635:in `solr_build_dictionary'

Any help in understanding the exception would be greatly appreciated. -Mike
Re: Using Lucene's payload in Solr
Thanks for sharing your code, Ken. It is pretty much the same code that I have written, except that my custom QueryParser extends Solr's SolrQueryParser instead of Lucene's QueryParser. I am also using BFTQ instead of BTQ. I have tested it and do see the payload being used in the explain output. Functionally I have got everything working now. I still have to decide how I want to index the payload (using DelimitedPayloadTokenFilter or my own custom format/code). Bill On Thu, Aug 13, 2009 at 11:31 AM, Ensdorf Ken ensd...@zoominfo.com wrote: It looks like things have changed a bit since this subject was last brought up here. I see that there is support in Solr/Lucene for indexing payload data (DelimitedPayloadTokenFilterFactory and DelimitedPayloadTokenFilter). Overriding the Similarity class is straightforward. So the last piece of the puzzle is to use a BoostingTermQuery when searching. Solr's LuceneQParserPlugin uses SolrQueryParser under the cover, so I think all I need to do is to write my own query parser plugin that uses a custom query parser, with the only difference being in the getFieldQuery() method, where a BoostingTermQuery is used instead of a TermQuery. The BTQ is now deprecated in favor of the BoostingFunctionTermQuery, which gives some more flexibility in terms of how the spans in a single document are scored. Am I on the right track? Yes. Has anyone done something like this already? I wrote a QParserPlugin that seems to do the trick. This is minimally tested - we're not actually using it at the moment, but it should get you going.
Also, as Grant suggested, you may want to substitute BFTQ for BTQ below:

package com.zoominfo.solr.analysis;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.index.Term;
import org.apache.lucene.queryParser.*;
import org.apache.lucene.search.*;
import org.apache.lucene.search.payloads.BoostingTermQuery;
import org.apache.solr.common.params.*;
import org.apache.solr.common.util.NamedList;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.search.*;

public class BoostingTermQParserPlugin extends QParserPlugin {
  public static String NAME = "zoom";

  public void init(NamedList args) {
  }

  public QParser createParser(String qstr, SolrParams localParams, SolrParams params, SolrQueryRequest req) {
    System.out.print("BoostingTermQParserPlugin::createParser\n");
    return new BoostingTermQParser(qstr, localParams, params, req);
  }
}

class BoostingTermQueryParser extends QueryParser {
  public BoostingTermQueryParser(String f, Analyzer a) {
    super(f, a);
    System.out.print("BoostingTermQueryParser::BoostingTermQueryParser\n");
  }

  @Override
  protected Query newTermQuery(Term term) {
    System.out.print("BoostingTermQueryParser::newTermQuery\n");
    return new BoostingTermQuery(term);
  }
}

class BoostingTermQParser extends QParser {
  String sortStr;
  QueryParser lparser;

  public BoostingTermQParser(String qstr, SolrParams localParams, SolrParams params, SolrQueryRequest req) {
    super(qstr, localParams, params, req);
    System.out.print("BoostingTermQParser::BoostingTermQParser\n");
  }

  public Query parse() throws ParseException {
    System.out.print("BoostingTermQParser::parse\n");
    String qstr = getString();

    String defaultField = getParam(CommonParams.DF);
    if (defaultField == null) {
      defaultField = getReq().getSchema().getSolrQueryParser(null).getField();
    }
    lparser = new BoostingTermQueryParser(defaultField, getReq().getSchema().getQueryAnalyzer());

    // these could either be checked & set here, or in the SolrQueryParser constructor
    String opParam = getParam(QueryParsing.OP);
    if (opParam != null) {
      lparser.setDefaultOperator("AND".equals(opParam) ? QueryParser.Operator.AND : QueryParser.Operator.OR);
    } else {
      // try to get default operator from schema
      lparser.setDefaultOperator(getReq().getSchema().getSolrQueryParser(null).getDefaultOperator());
    }

    return lparser.parse(qstr);
  }

  public String[] getDefaultHighlightFields() {
    return new String[]{lparser.getField()};
  }
}
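If it helps anyone following along: a QParserPlugin like this is typically registered in solrconfig.xml and selected per request with local params. The name and class below simply mirror the code above; treat this as an illustrative sketch, not a tested configuration.

```xml
<!-- solrconfig.xml: register the custom query parser under the name "zoom" -->
<queryParser name="zoom" class="com.zoominfo.solr.analysis.BoostingTermQParserPlugin"/>
```

A request can then pick it up with q={!zoom}ipod, or by setting defType=zoom.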
UTF8 Problem with http request?
Hi, First of all I know that there is a UTF-8 problem with Tomcat, so I updated Tomcat's server.xml with:

<Connector port="8080" protocol="HTTP/1.1" connectionTimeout="2" redirectPort="8443" URIEncoding="UTF-8"/>

So now the Solr admin console returns a successful result, for example for q=für:

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">0</int>
    <lst name="params">
      <str name="indent">on</str>
      <str name="start">0</str>
      <str name="q">für</str>
      <str name="rows">10</str>
      <str name="version">2.2</str>
    </lst>
  </lst>
  <result name="response" numFound="2" start="0">

However, if I use an HTTP request through PHP5 I'll get this result:

{"responseHeader":{"status":0,"QTime":0,"params":{"fl":"db_id,name,def,deadline","start":"0","q":"text:f�r text_two:f�r* ","wt":"json","fq":"","rows":"10"}},"response":{"numFound":0,"start":0,"docs":[]}}

If I look into the Tomcat console I see this:

14.08.2009 21:21:42 org.apache.solr.core.SolrCore execute
INFO: [] webapp=/solr path=/select params={fl=db_id,name,def,deadline&start=0&q=text:f?r+text_two:f?r*+&wt=json&fq=&rows=10} hits=0 status=0 QTime=0

I am quite sure it has something to do with the HTTP request. Is it possible to set the character set for an HTTP request? I can't find anything regarding the subject.

kind regards, Sebastian

-- View this message in context: http://www.nabble.com/UTF8-Problem-with-http-request--tp24977306p24977306.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: I think this is a bug
On Thu, Aug 13, 2009 at 8:21 PM, Chris Male gento...@gmail.com wrote: Hi Paul, Yes the comment does look very wrong. I'll open a JIRA issue and include a fix. On Thu, Aug 13, 2009 at 4:43 PM, Paul Tomblin ptomb...@xcski.com wrote: I don't want to join yet another mailing list or register for JIRA, but I just noticed that the Javadocs for SolrInputDocument.addField(String name, Object value, float boost) is incredibly wrong - it looks like it was copied from a deleteAll method. Thanks Paul and Chris. This is fixed in trunk. -- Regards, Shalin Shekhar Mangar.
RE: UTF8 Problem with http request?
Hey Sebastian,

Did you try:

1. URLEncoder.encode(url, "UTF-8");

2. If your application is Spring based, try this filter:

<filter>
  <filter-name>CharacterEncoding</filter-name>
  <filter-class>org.springframework.web.filter.CharacterEncodingFilter</filter-class>
  <init-param>
    <param-name>encoding</param-name>
    <param-value>UTF-8</param-value>
  </init-param>
  <init-param>
    <param-name>forceEncoding</param-name>
    <param-value>true</param-value>
  </init-param>
</filter>
<filter-mapping>
  <filter-name>CharacterEncoding</filter-name>
  <url-pattern>/*</url-pattern>
</filter-mapping>

Ankit
RE: UTF8 Problem with http request?
Hi,

1. I use PHP5; what you suggested is a Java function, I would guess. In PHP there is something like this:

urlencode(utf8_encode($url));

But sadly that doesn't help.

2. I don't use Spring.

Strange thing.

kind regards, Sebastian

-- View this message in context: http://www.nabble.com/UTF8-Problem-with-http-request--tp24977306p24977744.html Sent from the Solr - User mailing list archive at Nabble.com.
RE: UTF8 Problem with http request?
I guess in the header you could try setting the charset, something like this:

header('Content-Type: text/plain; charset=ISO-8859-1');

Ankit
RE: UTF8 Problem with http request?
Or this:

// Setting the Content-Type header with charset
header('Content-Type: text/html; charset=utf-8');

Ankit
RE: UTF8 Problem with http request?
My final two cents, although I'm not sure: use encodeURIComponent(data) while creating the query. Refer to http://www.w3schools.com/jsref/jsref_encodeURIComponent.asp

Ankit
RE: UTF8 Problem with http request?
Hi, thank you for your suggestions. I solved the problem now. It was the PHP function strtolower(); as it turns out, it can't handle UTF-8 strings. The solution is doing something like this in PHP:

$VALUE = "für";
$text = utf8_encode(strtolower(utf8_decode($VALUE)));

Thank you again ANKITBHATNAGAR.

kind regards, Sebastian

-- View this message in context: http://www.nabble.com/UTF8-Problem-with-http-request--tp24977306p24978362.html Sent from the Solr - User mailing list archive at Nabble.com.
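For readers hitting the same symptom: PHP's strtolower() works byte by byte, so under a Latin-1 locale it can rewrite the lead byte of a multibyte UTF-8 character and corrupt it. A rough standalone Java illustration of that failure mode follows; byteWiseLower() is a hypothetical stand-in for a byte-wise, locale-style lowercase, not any real library function.

```java
import java.nio.charset.StandardCharsets;
import java.util.Locale;

public class ByteWiseLowercaseDemo {
    // Hypothetical byte-wise lowercase, mimicking a Latin-1 locale strtolower():
    // it shifts bytes in A-Z and 0xC0-0xDE down by 0x20, which mangles UTF-8
    // lead bytes such as 0xC3 (the first byte of a two-byte ü).
    public static String byteWiseLower(String s) {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        for (int i = 0; i < bytes.length; i++) {
            int b = bytes[i] & 0xFF;
            if ((b >= 'A' && b <= 'Z') || (b >= 0xC0 && b <= 0xDE)) {
                bytes[i] = (byte) (b + 0x20);
            }
        }
        // Invalid byte sequences are replaced with U+FFFD on decode
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String broken = byteWiseLower("F\u00DCR");            // corrupts the ü
        String correct = "F\u00DCR".toLowerCase(Locale.ROOT); // Unicode-aware
        System.out.println(broken);   // garbled, much like the f�r in the logs
        System.out.println(correct);
    }
}
```

The Unicode-aware toLowerCase() path corresponds to what the utf8_decode/utf8_encode round trip (or mb_strtolower) achieves in PHP.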
RE: UTF8 Problem with http request?
Cool :)
Solr conditional adds/updates?
I have a fairly simple need to do a conditional update in Solr, which is easily accomplished in MySQL. For example:

* I have 100 documents with a unique field called id
* I am POSTing 10 documents, some of which may have duplicate ids, in which case Solr would update the existing records with the same ids
* I have a field called dateCreated, and I would like to only update a doc if the new dateCreated is greater than the old dateCreated (this applies to duplicate ids only, of course)

How would I be able to accomplish such a thing? The context is trying to combat race conditions resulting in multiple adds for the same ID but executing in the wrong order. Thanks.

-- View this message in context: http://www.nabble.com/Solr-conditional-adds-updates--tp24979499p24979499.html Sent from the Solr - User mailing list archive at Nabble.com.
Which server parameters to tweak in Solr if I expect heavy writes and light reads?
I am facing scalability issues designing a new Solr cluster, and I need the master to be able to handle a relatively high rate of updates with almost no reads; reads can be done via the slaves. My existing Solr instance occupies a huge amount of RAM; in fact, it started swapping at only 4.5 million docs. I am interested in making the RAM footprint as small as possible, even if it affects search performance. So, which Solr config values can I tweak in order to accomplish this? Thank you.

P.S. Cross-posted to http://stackoverflow.com/questions/1280447/which-server-parameters-to-tweak-in-solr-if-i-expect-heavy-writes-and-light-reads for additional help.

-- View this message in context: http://www.nabble.com/Which-server-parameters-to-tweak-in-Solr-if-I-expect-heavy-writes-and-light-reads--tp24979526p24979526.html Sent from the Solr - User mailing list archive at Nabble.com.
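Not a complete answer, but the knobs usually reached for first live in solrconfig.xml: the indexing RAM buffer and merge settings, and the search caches (which a write-mostly master barely needs). Treat the values below as illustrative starting points under those assumptions, not recommendations:

```xml
<!-- solrconfig.xml (illustrative values only) -->
<indexDefaults>
  <!-- flush the in-memory indexing buffer at a fixed size -->
  <ramBufferSizeMB>32</ramBufferSizeMB>
  <mergeFactor>10</mergeFactor>
</indexDefaults>
<query>
  <!-- on a write-heavy, read-light master, shrink caches and skip autowarming -->
  <filterCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="0"/>
  <queryResultCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="0"/>
  <documentCache class="solr.LRUCache" size="512" initialSize="512"/>
</query>
```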
Re: Solr conditional adds/updates?
You could implement optimistic concurrency, where a version is stored in the document, or use the date scheme you described. Override DirectUpdateHandler2.addDoc with the custom logic.

It seems like we should have a way of performing this without custom code, and/or an easier way to plug logic into UpdateHandler. Basic SQL-like functions should be possible, or simply a Lucene query (which, with QP2.0, can support SQL-like syntax).
Re: Solr conditional adds/updates?
It seems like this would require some sort of Java implementation? Unfortunately I'm not a Java developer, but I am in charge of implementing Solr. Any more detailed/straightforward instructions are very much appreciated. Thank you.

-- View this message in context: http://www.nabble.com/Solr-conditional-adds-updates--tp24979499p24979676.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr conditional adds/updates?
It seems like this would require some sort of Java implementation?

Yes, custom Java code would need to be written and tested, etc.
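For illustration only, the compare-before-replace rule described in this thread (accept an add only if its dateCreated is newer than what is already stored) can be sketched in plain Java. The class and method names here are hypothetical, and this is a standalone model of the guard an overridden update handler would apply, not Solr's actual API:

```java
import java.util.HashMap;
import java.util.Map;

public class ConditionalUpsert {
    // id -> dateCreated of the latest accepted version (stand-in for the index)
    private final Map<String, Long> latest = new HashMap<String, Long>();

    // Accept the add only if it is strictly newer than what we already have.
    public boolean addIfNewer(String id, long dateCreated) {
        Long existing = latest.get(id);
        if (existing != null && existing >= dateCreated) {
            return false; // stale or duplicate add: drop it to beat the race
        }
        latest.put(id, dateCreated);
        return true;
    }
}
```

With this rule, out-of-order POSTs for the same id resolve to the newest dateCreated regardless of arrival order.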
Re: Solr 1.4 Replication scheme
: This is a relatively safe assumption in most cases, but one that couples the
: master update policy with the performance of the slaves - if the master gets
: updated (and committed to) frequently, slaves might face a commit on every
: 1-2 polls, much more than is feasible given new-searcher warmup times.
: In effect, what this comes down to, it seems, is that I must make the master
: commit frequency the same as I'd want the slaves to use - and this is
: markedly different from previous behaviour, with which I could have the
: master get updated (+committed to) at one rate and slaves committing those
: updates at a different rate.

: I see the argument. But isn't it better to keep both the master and
: slave as consistent as possible? There is no use in committing on the
: master if you do not plan to search on those docs. So the best thing
: to do is commit only as frequently as you wish to commit on a slave.

I would advise against thinking that way when designing anything related to replication -- people should call commit based on when they want the documents they've added to be available for consumers. For a single-box setup, your consumers are people executing searches, but for a multi-tier setup your consumers are the slaves replicating from you (the master) -- and your consumers may not all have equal concerns about freshness. Some of the slaves may want to poll for new updates from you as fast as possible and have the freshest data, at the expense of lower cache hit rates and increased network IO; others may be happier with stale data in return for better cache hit rates or lower network IO. (Even in a realtime search situation, you may also be replicating to a slave in a remote data center with a small network pipe that only wants one snappull a day, for backend analytics and an extremely consistent view of the index over a long duration of analysis.)

The point being: we shouldn't assume/expect that slaves will always want updates as fast as possible, or that all slaves of a single master will want all updates with equal urgency ... individual slaves need to be able to choose.

-Hoss
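Concretely, with the Solr 1.4 ReplicationHandler each slave chooses its own freshness through its pollInterval; a slave that wants fresher data simply polls more often. A sketch of the slave-side configuration (the master host name is a placeholder):

```xml
<!-- slave's solrconfig.xml -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <str name="masterUrl">http://master-host:8983/solr/replication</str>
    <!-- hh:mm:ss; each slave decides how stale it is willing to be -->
    <str name="pollInterval">00:05:00</str>
  </lst>
</requestHandler>
```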
Cannot get solr 1.3.0 to run properly with plesk 9.2.1 on CentOS
My client is using a dedicated server with Plesk for the control panel. The hosting provider says that anything done using the control panel is supported by their tech support, so if I try anything over SSH, it voids that warranty. It's easy to install a servlet through Plesk anyway: I upload the war file using the Java servlet installer. A sample servlet has been installed, so I know this part works. However, when I install Solr, I get what looks like a warning icon, and if I hover over it the tooltip text says this:

Actual status of the application does not correspond to the status retrieved from the database.

The hosting provider's support team says that there is something wrong with the war file (big help). Since I'm kind of stuck using Tomcat 5.5, is there an older version of Solr that I should be using? How can I fix this so that I can use Solr? The only thing that I can find regarding this issue is this link:

http://mail-archives.apache.org/mod_mbox/lucene-solr-user/200904.mbox/69de18140904150913n66a8c78cgf064b53cd2440...@mail.gmail.com

And the replier to that message mentioned removing solr.xml. I thought that if I tried that and re-packaged the war it might fix it, but no such file exists in the war file I have. Does anyone have any ideas?

--Aaron