Re: Restarting tomcat deletes all Solr indexes

2009-05-12 Thread Andrey Klochkov
Hi,

I know that when starting, Solr checks whether the index directory exists, and
creates a fresh index if it doesn't. Does that help? If not, the next step I'd
take in your case is patching the SolrCore.initIndex method - insert some logging,
or run EmbeddedSolrServer under a debugger, etc.

On Mon, May 11, 2009 at 1:25 PM, KK dioxide.softw...@gmail.com wrote:

 Hi,
 I'm facing a silly problem. Every time I restart tomcat all the indexes are
 lost. I used all the default configurations. I'm pretty sure there must be
 some basic changes to fix this. I'd highly appreciate if someone could
 direct me fixing this.

 Thanks,
 KK.




-- 
Andrew Klochkov


How to deal with Mark invalid?

2009-05-12 Thread Nikolai Derzhak
Good day, people.

We use solr to search in mailboxes (dovecot).
But with some bad messages solr 1.4-dev generates an error:

SEVERE: java.io.IOException: Mark invalid
at java.io.BufferedReader.reset(BufferedReader.java:485)
at org.apache.solr.analysis.HTMLStripReader.restoreState(HTMLStripReader.java:171)
...

This is the issue known as SOLR-42.

How can I log a stored field from the index (I need the message uid)?

How can I ignore such an error and/or message?

Thanks


Custom Servlet Filter, Where to put filter-mappings

2009-05-12 Thread Jacob Singh
Hi folks,

I just wrote a Servlet Filter to handle authentication for our
service.  Here's what I did:

1. Created a dir in contrib
2. Put my project in there, I took the dataimporthandler build.xml as
an example and modified it to suit my needs.  Worked great!
3. ant dist now builds my jar and includes it

I now need to modify web.xml to add my filter-mapping, init params,
etc.  How can I do this cleanly?  Or do I need to manually open up the
archive and edit it and then re-war it?

In common-build I don't see a target for dist-war, so don't see how it
is possible...
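
For reference, the kind of entries I need to end up with look roughly like
this (filter class and init params are placeholders for my actual ones):

<filter>
  <filter-name>authFilter</filter-name>
  <filter-class>com.example.solr.AuthFilter</filter-class>
  <init-param>
    <param-name>sharedSecret</param-name>
    <param-value>changeme</param-value>
  </init-param>
</filter>
<filter-mapping>
  <filter-name>authFilter</filter-name>
  <url-pattern>/*</url-pattern>
</filter-mapping>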

Thanks!
Jacob

-- 

+1 510 277-0891 (o)
+91  33 7458 (m)

web: http://pajamadesign.com

Skype: pajamadesign
Yahoo: jacobsingh
AIM: jacobsingh
gTalk: jacobsi...@gmail.com


Re: QueryElevationComponent : hot update of elevate.xml

2009-05-12 Thread Nicolas Pastorino

Hi,

On May 7, 2009, at 6:03 , Noble Paul നോബിള്‍  
नोब्ळ् wrote:



Going forward, the Java-based replication is going to be the preferred
means of replicating the index. It does not support replicating files in the
dataDir; it only supports replicating index files and conf files
(files in the conf dir). I was unaware of the fact that it was possible to
put the elevate.xml in the dataDir.

Reloading on commit is trivial for a search component: it can register
itself as an event listener for commit and reload elevate.xml. This could
be a configuration parameter:

<str name="refreshOnCommit">true</str>
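
A rough sketch of that idea (assuming the commit-callback hook that the
replication handler uses; untested, so treat it as a sketch only):

import org.apache.solr.common.util.NamedList;
import org.apache.solr.core.SolrCore;
import org.apache.solr.core.SolrEventListener;
import org.apache.solr.handler.component.QueryElevationComponent;
import org.apache.solr.search.SolrIndexSearcher;

// Reload elevate.xml after each commit by registering a commit callback.
public class ReloadingElevationComponent extends QueryElevationComponent {
  @Override
  public void inform(final SolrCore core) {
    super.inform(core); // normal elevate.xml loading
    core.getUpdateHandler().registerCommitCallback(new SolrEventListener() {
      public void init(NamedList args) {}
      public void postCommit() {
        // re-run the base-class loading so the new elevate.xml is picked up
        ReloadingElevationComponent.super.inform(core);
      }
      public void newSearcher(SolrIndexSearcher newSearcher,
                              SolrIndexSearcher currentSearcher) {}
    });
  }
}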


Thanks for these nice tips and recommendations.
I attached a new version of this requestHandler here:
https://issues.apache.org/jira/browse/SOLR-1147.


Would this requestHandler be of any general use, and could it become part of
Solr's trunk ?


Thanks in advance,
--
Nicolas Pastorino - eZ Labs






On Wed, May 6, 2009 at 7:08 PM, Nicolas Pastorino n...@ez.no wrote:


On May 6, 2009, at 15:17 , Noble Paul നോബിള്‍  
नोब्ळ् wrote:


Why would you want to write it to the data dir? Why can't it be in the
same place (conf) ?


Well, fact is that the QueryElevationComponent loads the configuration file
( elevate.xml ) either from the data dir or from the conf dir, which means
that existing setups using this component may be using either location.
That is the only reason why I judged it necessary to keep supporting this
flexibility.

But this could be simplified by forcing the elevate.xml file to be in the conf
dir, and having a system ( the one you proposed, or the request handler
attached to the issue ) to reload the configuration from the conf dir
( which is currently not possible, while when elevate.xml is stored in the
dataDir, triggering a commit reloads it ).
I was just unsure about all the ins and outs of the Elevation system, and so
did not remove this flexibility.

Thanks for your expert eye on this !


On Wed, May 6, 2009 at 6:43 PM, Nicolas Pastorino n...@ez.no  
wrote:


Hello,

On May 6, 2009, at 15:02 , Noble Paul നോബിള്‍  
नोब्ळ् wrote:


The elevate.xml is loaded from the conf dir when the core is reloaded. If
you post the new xml you will have to reload the core.

A simple solution would be to write a RequestHandler which extends
QueryElevationComponent, which can be a listener for commit and call
super.inform() on that event.


You may want to have a look at this issue :
https://issues.apache.org/jira/browse/SOLR-1147
The proposed solution ( new request handler, attached to the ticket )
solves the issue in both cases :
* when elevate.xml is in the dataDir.
* when elevate.xml is in the conf dir.

Basically this new request handler receives, as XML, the new configuration,
writes it to the right place ( some logic was copied from the
QueryElevationComponent.inform() code ), and then calls the inform() method
on the QueryElevationComponent for the current core, as you suggested above,
to reload the Elevate configuration.
--
Nicolas


On Fri, Apr 10, 2009 at 5:18 PM, Nicolas Pastorino n...@ez.no  
wrote:


Hello !


Browsing the mailing-list's archives did not help me find the answer,
hence the question asked directly here.

Some context first :
Integrating Solr with a CMS ( eZ Publish ), we chose to support Elevation.
The idea is to be able to 'elevate' any object from the CMS. This can be
achieved through eZ Publish's back office, with a dedicated Elevate
administration GUI; the configuration is stored in the CMS temporarily, and
then synchronized frequently and/or on demand onto Solr. This
synchronisation is currently done as follows :
1. Generate the elevate.xml based on the stored configuration
2. Replace elevate.xml in Solr's dataDir
3. Commit. It appears that when elevate.xml is in Solr's dataDir, and
solely in this case, committing triggers a reload of elevate.xml. This
does not happen when elevate.xml is stored in Solr's conf dir.


This method has one main issue though : eZ Publish needs to have access to
the same filesystem as the one on which Solr's dataDir is stored. This is
not always the case when the CMS is clustered for instance -- show stopper
:(

Hence the following idea / RFC :
How about extending the Query Elevation system with the possibility to push
an updated elevate.xml file/XML through HTTP ?
This would update the file where it is actually located, and trigger a
reload of the configuration.
Not being very knowledgeable about Solr's API ( yet ! ), I cannot figure
out whether this would be possible, how this would be achievable ( which
type of plugin for instance ) or even be valid ?

Thanks a lot in advance for your thoughts,
--
Nicolas








--
-
Noble Paul | Principal Engineer| AOL | http://aol.com


--
Nicolas Pastorino
Consultant - Trainer - System Developer
Phone :  +33 (0)4.78.37.01.34
eZ Systems ( Western Europe )  |  http://ez.no









--

Solr Loggin issue

2009-05-12 Thread Sagar Khetkade

Hi,
I have Solr implemented in a multi-core scenario and have also applied
solr-560-slf4j.patch to set up logging. But the problem I am facing is that
the logs are going to the stdout.log file, not the log file that I have
mentioned in the log4j.properties file. Can anybody give me a workaround to
make the logs go to the logger mentioned in the log4j.properties file?
Thanks in advance.
Thanks in advance.
 
Regards,
Sagar Khetkade
_
Live Search extreme As India feels the heat of poll season, get all the info 
you need on the MSN News Aggregator
http://news.in.msn.com/National/indiaelections2009/aggregator/default.aspx

Re: Restarting tomcat deletes all Solr indexes

2009-05-12 Thread KK
Thanks for your response @aklochkov.
 But I again noticed that something is wrong in my solr/tomcat config [I
spent a lot of time making solr run], because in the solr admin page [
http://localhost:8080/solr/admin/] what I see is that the $CWD is the
location from which I restarted tomcat, and it seems this $CWD gets picked
up and used for the index data [is this the default behavior, or something
wrong on my side? or maybe I'm asking a stupid question].
 Once I was in /etc and restarted tomcat from there, and when I tried to
open the solr admin page I found an error saying that it can not create the
index directory, some permission issue I think [it gave a directory string
like /etc/solr/index ... ]. I'm pretty sure something is wrong in the
configuration. One more thing that assures me of this is the fact that I
found many solr index directories here and there [these are, I think, the
locations I was in when I restarted tomcat]. Earlier I was using JAVA_OPTS
to set the solr home like this

 export JAVA_OPTS="$JAVA_OPTS -D/usr/local/solr" #in .bashrc

but I commented that out and instead added the JNDI entry in
/usr/local/tomcat/webapps/solr/WEB-INF/web.xml like this

<env-entry>
   <env-entry-name>solr/home</env-entry-name>
   <env-entry-value>/usr/local/solr</env-entry-value>
   <env-entry-type>java.lang.String</env-entry-type>
</env-entry>

Even the SolrHome entry on the solr admin page says that SolrHome is
/usr/local/solr, but the index gets created in $CWD. Is it the case that I
created entries for SolrHome in multiple places, which is obviously wrong?
Can someone point me to what the issue is? Thank you very much.

--KK


On Tue, May 12, 2009 at 2:39 PM, Andrey Klochkov akloch...@griddynamics.com
 wrote:

 Hi,

 I know that when starting Solr checks index directory existence, and
 creates
 new fresh index if it doesn't exist. Does it help? If no, the next step I'd
 do in your case is patching SolrCore.initIndex method - insert some
 logging,
 or run EmbeddedSolrServer with debugger etc.

 On Mon, May 11, 2009 at 1:25 PM, KK dioxide.softw...@gmail.com wrote:

  Hi,
  I'm facing a silly problem. Every time I restart tomcat all the indexes
 are
  lost. I used all the default configurations. I'm pretty sure there must
 be
  some basic changes to fix this. I'd highly appreciate if someone could
  direct me fixing this.
 
  Thanks,
  KK.
 


 --
 Andrew Klochkov



Geographical search based on latitude and longitude

2009-05-12 Thread Norman Leutner
Hi together,

I'm new to Solr and want to port a geographical range search from MySQL to Solr.

Currently I'm using some mathematical functions (based on the GRS80 model)
directly within MySQL to calculate the actual distance from the locations
within the database to a current location (lat and long are known):

$query="SELECT street, zip, city, state, country, ".$radius."*ACOS(cos(RADIANS(latitude))*cos(".$theta.")*(sin(RADIANS(longitude))*sin(".$phi.")+cos(RADIANS(longitude))*cos(".$phi."))+sin(RADIANS(latitude))*sin(".$theta.")) AS Distance FROM ezgis_position WHERE ".$radius."*ACOS(cos(RADIANS(latitude))*cos(".$theta.")*(sin(RADIANS(longitude))*sin(".$phi.")+cos(RADIANS(longitude))*cos(".$phi."))+sin(RADIANS(latitude))*sin(".$theta.")) <= ".$range." ORDER BY Distance";
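
For reference, the distance this kind of query computes is the spherical law
of cosines (with lat/long in radians and R the sphere radius; my reading of
the query above, so treat it as a sketch):

  d = R * acos( sin(lat1)*sin(lat2) + cos(lat1)*cos(lat2)*cos(lon2 - lon1) )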

This works pretty fine and fast. Since we want to include this within our Solr
search result I would like to have an attribute like "actual_distance" within
the result. Is there a way to use functions like radians, sin, acos, ...
directly within Solr?

Thanks in advance for any feedback
Norman Leutner


Re: Geographical search based on latitude and longitude

2009-05-12 Thread Grant Ingersoll
See https://issues.apache.org/jira/browse/SOLR-773.  In other words,  
we're working on it and would love some help!


-Grant

On May 12, 2009, at 7:12 AM, Norman Leutner wrote:


Hi together,

I'm new to Solr and want to port a geographical range search from  
MySQL to Solr.


Currently I'm using some mathematical functions (based on GRS80  
modell) directly within MySQL to calculate
the actual distance from the locations within the database to a  
current location (lat and long are known):


$query="SELECT street, zip, city, state, country, ".$radius."*ACOS(cos(RADIANS(latitude))*cos(".$theta.")*(sin(RADIANS(longitude))*sin(".$phi.")+cos(RADIANS(longitude))*cos(".$phi."))+sin(RADIANS(latitude))*sin(".$theta.")) AS Distance FROM ezgis_position WHERE ".$radius."*ACOS(cos(RADIANS(latitude))*cos(".$theta.")*(sin(RADIANS(longitude))*sin(".$phi.")+cos(RADIANS(longitude))*cos(".$phi."))+sin(RADIANS(latitude))*sin(".$theta.")) <= ".$range." ORDER BY Distance";


This works pretty fine and fast. Due to we want to include this  
within our Solr search result I would like to have a attribute like  
actual_distance within the result. Is there a way to use those  
functions like (radians, sin, acos,...) directly within Solr?


Thanks in advance for any feedback
Norman Leutner


--
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)  
using Solr/Lucene:

http://www.lucidimagination.com/search



Re: Restarting tomcat deletes all Solr indexes

2009-05-12 Thread KK
One more piece of information I would like to add.
 The entry on the solr stats page says this:

readerDir : org.apache.lucene.store.FSDirectory@/home/kk/solr/data/index

when I ran from /home/kk, and this:

readerDir : org.apache.lucene.store.FSDirectory@/home/kk/junk/solr/data/index

after running from /home/kk/junk.

That confirms the problem for me, but what is the solution?

Thanks,
KK.

On Tue, May 12, 2009 at 4:41 PM, KK dioxide.softw...@gmail.com wrote:

 Thanks for your response @aklochkov.
  But I again noticed that something is wrong in my solr/tomcat config[I
 spent a lot of time making solr run], b'coz in the solr admin page [
 http://localhost:8080/solr/admin/] what I see is that the $CWD is the
 location where from I restarted tomcat and seems this $cwd gets picked and
 used for index data[Is it the default behavior? or something wrong from my
 side?, or may be I'm asking some stupid question ].
  Once I was in /etc and from there I restarted the tomcat and when I tried
 to open the solr admin page I found an error saying that can not create
 index directory some permission issue I think [it gave a directory str like
 /etc/solr/index ... ]. I'm pretty sure something is wrong in configuration.
 One more thing assures me about this is the fact that I found many solr
 index directories here and there[ these are I think the locations where I
 was when I restarted tomcat at that time ]. Earlier I was using the
 java_opts to set the solr home like this

  export JAVA_OPTS="$JAVA_OPTS -D/usr/local/solr" #in .bashrc

 but I commented that and instead added the jndi entry in
 /usr/local/tomcat/webapps/solr/WEB-INF/web.xml as this

  <env-entry>
     <env-entry-name>solr/home</env-entry-name>
     <env-entry-value>/usr/local/solr</env-entry-value>
     <env-entry-type>java.lang.String</env-entry-type>
  </env-entry>

 Even the entry SolrHome in solr admin page say that SolrHome is
 /usr/loca/solr but the index gets created in $CWD. Is it the case that I
 created entries for SolrHome in multiple places? which is obviously wrong.
 Can someone point me what is the issue. Thank you very much.

 --KK



 On Tue, May 12, 2009 at 2:39 PM, Andrey Klochkov 
 akloch...@griddynamics.com wrote:

 Hi,

 I know that when starting Solr checks index directory existence, and
 creates
 new fresh index if it doesn't exist. Does it help? If no, the next step
 I'd
 do in your case is patching SolrCore.initIndex method - insert some
 logging,
 or run EmbeddedSolrServer with debugger etc.

 On Mon, May 11, 2009 at 1:25 PM, KK dioxide.softw...@gmail.com wrote:

  Hi,
  I'm facing a silly problem. Every time I restart tomcat all the indexes
 are
  lost. I used all the default configurations. I'm pretty sure there must
 be
  some basic changes to fix this. I'd highly appreciate if someone could
  direct me fixing this.
 
  Thanks,
  KK.
 


 --
 Andrew Klochkov





AW: Geographical search based on latitude and longitude

2009-05-12 Thread Norman Leutner
So are you using a bounding box to find results within a given range (km),
like mentioned here:
http://www.nsshutdown.com/projects/lucene/whitepaper/locallucene_v2.html ?


Best regards

Norman Leutner
all2e GmbH

-Ursprüngliche Nachricht-
Von: Grant Ingersoll [mailto:gsing...@apache.org] 
Gesendet: Dienstag, 12. Mai 2009 13:18
An: solr-user@lucene.apache.org
Betreff: Re: Geographical search based on latitude and longitude

See https://issues.apache.org/jira/browse/SOLR-773.  In other words,  
we're working on it and would love some help!

-Grant

On May 12, 2009, at 7:12 AM, Norman Leutner wrote:

 Hi together,

 I'm new to Solr and want to port a geographical range search from  
 MySQL to Solr.

 Currently I'm using some mathematical functions (based on GRS80  
 modell) directly within MySQL to calculate
 the actual distance from the locations within the database to a  
 current location (lat and long are known):

 $query="SELECT street, zip, city, state, country, ".$radius."*ACOS(cos(RADIANS(latitude))*cos(".$theta.")*(sin(RADIANS(longitude))*sin(".$phi.")+cos(RADIANS(longitude))*cos(".$phi."))+sin(RADIANS(latitude))*sin(".$theta.")) AS Distance FROM ezgis_position WHERE ".$radius."*ACOS(cos(RADIANS(latitude))*cos(".$theta.")*(sin(RADIANS(longitude))*sin(".$phi.")+cos(RADIANS(longitude))*cos(".$phi."))+sin(RADIANS(latitude))*sin(".$theta.")) <= ".$range." ORDER BY Distance";

 This works pretty fine and fast. Due to we want to include this  
 within our Solr search result I would like to have a attribute like  
 actual_distance within the result. Is there a way to use those  
 functions like (radians, sin, acos,...) directly within Solr?

 Thanks in advance for any feedback
 Norman Leutner

--
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)  
using Solr/Lucene:
http://www.lucidimagination.com/search



Re: fieldType without tokenizer

2009-05-12 Thread sunnyfr

Hi,

I tried but I've an error:
May 12 15:48:51 solr-test jsvc.exec[2583]: May 12, 2009 3:48:51 PM org.apache.solr.common.SolrException log
SEVERE: org.apache.solr.common.SolrException: Error loading class 'solr.KeywordTokenizer'
        at org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:310)
        at org.apache.solr.core.SolrResourceLoader.newInstance(SolrResourceLoader.java:325)
        at org.apache.solr.util.plugin.AbstractPluginLoader.create(AbstractPluginLoader.java:84)
        at org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:141)
        at org.apache.solr.schema.IndexSchema.readAnalyzer(IndexSchema.java:804)
        at org.apache.solr.schema.IndexSchema.access$100(IndexSchema.java:58)
        at org.apache.solr.schema.IndexSchema$1.create(IndexSchema.java:425)
        at org.apache.solr.schema.IndexSchema$1.create(IndexSchema.java:443)
        at org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:141)
        at org.apache.solr.schema.IndexSchema.readSchema(IndexSchema.java:452)
        at org.apache.solr.schema.In...

with:

<fieldType name="text_simple" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizer"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>




Shalin Shekhar Mangar wrote:
 
 On Mon, May 4, 2009 at 9:28 PM, sunnyfr johanna...@gmail.com wrote:
 

 Hi,

 I would like to create a field without tokenizer but I've an error,

 
 You can use KeywordTokenizer which does not do any tokenization.
 
 -- 
 Regards,
 Shalin Shekhar Mangar.
 
 

-- 
View this message in context: 
http://www.nabble.com/fieldType-without-tokenizer-tp23371300p23502994.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: fieldType without tokenizer

2009-05-12 Thread Erik Hatcher

Use KeywordTokenizerFactory.  Pasted from Solr's example schema.xml:

  <tokenizer class="solr.KeywordTokenizerFactory"/>

   Erik


On May 12, 2009, at 9:49 AM, sunnyfr wrote:



Hi,

I tried but I've an error:
May 12 15:48:51 solr-test jsvc.exec[2583]: May 12, 2009 3:48:51 PM org.apache.solr.common.SolrException log
SEVERE: org.apache.solr.common.SolrException: Error loading class 'solr.KeywordTokenizer'
        at org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:310)
        at org.apache.solr.core.SolrResourceLoader.newInstance(SolrResourceLoader.java:325)
        at org.apache.solr.util.plugin.AbstractPluginLoader.create(AbstractPluginLoader.java:84)
        at org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:141)
        at org.apache.solr.schema.IndexSchema.readAnalyzer(IndexSchema.java:804)
        at org.apache.solr.schema.IndexSchema.access$100(IndexSchema.java:58)
        at org.apache.solr.schema.IndexSchema$1.create(IndexSchema.java:425)
        at org.apache.solr.schema.IndexSchema$1.create(IndexSchema.java:443)
        at org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:141)
        at org.apache.solr.schema.IndexSchema.readSchema(IndexSchema.java:452)
        at org.apache.solr.schema.In...

with:

<fieldType name="text_simple" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizer"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>




Shalin Shekhar Mangar wrote:


On Mon, May 4, 2009 at 9:28 PM, sunnyfr johanna...@gmail.com wrote:



Hi,

I would like to create a field without tokenizer but I've an error,



You can use KeywordTokenizer which does not do any tokenization.

--
Regards,
Shalin Shekhar Mangar.




--
View this message in context: 
http://www.nabble.com/fieldType-without-tokenizer-tp23371300p23502994.html
Sent from the Solr - User mailing list archive at Nabble.com.




Re: fieldType without tokenizer

2009-05-12 Thread Koji Sekiguchi

It must be KeywordTokenizer*Factory* :)

Koji

sunnyfr wrote:

Hi,

I tried but I've an error:
May 12 15:48:51 solr-test jsvc.exec[2583]: May 12, 2009 3:48:51 PM org.apache.solr.common.SolrException log
SEVERE: org.apache.solr.common.SolrException: Error loading class 'solr.KeywordTokenizer'
        at org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:310)
        at org.apache.solr.core.SolrResourceLoader.newInstance(SolrResourceLoader.java:325)
        at org.apache.solr.util.plugin.AbstractPluginLoader.create(AbstractPluginLoader.java:84)
        at org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:141)
        at org.apache.solr.schema.IndexSchema.readAnalyzer(IndexSchema.java:804)
        at org.apache.solr.schema.IndexSchema.access$100(IndexSchema.java:58)
        at org.apache.solr.schema.IndexSchema$1.create(IndexSchema.java:425)
        at org.apache.solr.schema.IndexSchema$1.create(IndexSchema.java:443)
        at org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:141)
        at org.apache.solr.schema.IndexSchema.readSchema(IndexSchema.java:452)
        at org.apache.solr.schema.In...

with:

<fieldType name="text_simple" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizer"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>




Shalin Shekhar Mangar wrote:
  

On Mon, May 4, 2009 at 9:28 PM, sunnyfr johanna...@gmail.com wrote:



Hi,

I would like to create a field without tokenizer but I've an error,

  

You can use KeywordTokenizer which does not do any tokenization.

--
Regards,
Shalin Shekhar Mangar.





  




Re: Facet counts for common terms of the searched field

2009-05-12 Thread sachin78

Does anybody have an answer to this post? I have a similar requirement.

Suppose I have a free text field, say textfield, and I index the field. If I
search for textfield:copper, I want to get facet counts for the most common
words found in textfield, i.e.:

example: a search for textfield:glass
should return facet counts for common words found in textfield:
semiconductor(10), iron(20), silicon(25), material(8), thin(25) and so on.
Can this be done using tagging or MLT?

Thanks,
Sachin


Raju444us wrote:
 
 I have a requirement. If I search for text field let's say metal:glass
 what i want is to get the facet counts for all the terms related to
 glass in my search results.
 
 window(100)  since a window can be glass.
 plastic(10)  plastic is a material just like glass
 Iron(10)
 Paper(15)
 
 Can I use MLT to get this functionality.Please let me know how can I
 achieve this.If possible an example query.
 
 Thanks,
 Raju
 

-- 
View this message in context: 
http://www.nabble.com/Facet-counts-for-common-terms-of-the-searched-field-tp23302410p23503794.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Facet counts for common terms of the searched field

2009-05-12 Thread Matt Weber
You may have to take care of this at index time. You can create a new
multivalued field that has minimal processing. Then at index time, index the
full contents of textfield as normal, but also split it on whitespace and
index each word in the new field you just created. Now you will be able to
facet on this new field and sort the facets by frequency (the default) to
get the most popular words.
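
A minimal SolrJ-style sketch of that indexing step (field names here are
just examples; adapt them to your schema):

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.common.SolrInputDocument;

// Index the full text as usual, plus one value per word in a
// multivalued facet field.
void indexWithFacetWords(SolrServer server, String id, String text) throws Exception {
  SolrInputDocument doc = new SolrInputDocument();
  doc.addField("id", id);
  doc.addField("textfield", text);          // normal searchable copy
  for (String word : text.toLowerCase().split("\\s+")) {
    doc.addField("textfieldfacet", word);   // one value per word
  }
  server.add(doc);
}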


Thanks,

Matt Weber
eSr Technologies
http://www.esr-technologies.com




On May 12, 2009, at 7:33 AM, sachin78 wrote:



Does anybody have answer to this post.I have a similar requirement.

Suppose I have free text field say
I index the field.If I search for textfield:copper.I have to get facet
counts for the most common words found in a textfield.
ie.

example:search for textfield:glass
should return facet counts for common words found textfield.
semiconductor(10),iron(20), silicon (25) material (8) thin(25) and  
so on.

Can this be done using tagging or MLT.

Thanks,
Sachin


Raju444us wrote:


I have a requirement. If I search for text field let's say  
metal:glass

what i want is to get the facet counts for all the terms related to
glass in my search results.

window(100)  since a window can be glass.
plastic(10)  plastic is a material just like glass
Iron(10)
Paper(15)

Can I use MLT to get this functionality.Please let me know how can I
achieve this.If possible an example query.

Thanks,
Raju



--
View this message in context: 
http://www.nabble.com/Facet-counts-for-common-terms-of-the-searched-field-tp23302410p23503794.html
Sent from the Solr - User mailing list archive at Nabble.com.





Re: Facet counts for common terms of the searched field

2009-05-12 Thread sachin78

Thanks Matt for your reply.

What do you mean by "frequency (the default)"?

Can you please show what an example schema and query would look like?

--Sachin


Matt Weber-2 wrote:
 
 You may have to take care of this at index time.  You can create a new  
 multivalued field that has minimal processing.  Then at index time,  
 index the full contents of textfield as normal, but then also split it  
 on whitespace and index each word in the new field you just created.   
 Now you will be able to facet on this new field and sort the facet by  
 frequency (the default) to get the most popular words.
 
 Thanks,
 
 Matt Weber
 eSr Technologies
 http://www.esr-technologies.com
 
 
 
 
 On May 12, 2009, at 7:33 AM, sachin78 wrote:
 

 Does anybody have answer to this post.I have a similar requirement.

 Suppose I have free text field say
 I index the field.If I search for textfield:copper.I have to get facet
 counts for the most common words found in a textfield.
 ie.

 example:search for textfield:glass
 should return facet counts for common words found textfield.
 semiconductor(10),iron(20), silicon (25) material (8) thin(25) and  
 so on.
 Can this be done using tagging or MLT.

 Thanks,
 Sachin


 Raju444us wrote:

 I have a requirement. If I search for text field let's say  
 metal:glass
 what i want is to get the facet counts for all the terms related to
 glass in my search results.

 window(100)  since a window can be glass.
 plastic(10)  plastic is a material just like glass
 Iron(10)
 Paper(15)

 Can I use MLT to get this functionality.Please let me know how can I
 achieve this.If possible an example query.

 Thanks,
 Raju


 -- 
 View this message in context:
 http://www.nabble.com/Facet-counts-for-common-terms-of-the-searched-field-tp23302410p23503794.html
 Sent from the Solr - User mailing list archive at Nabble.com.

 
 
 

-- 
View this message in context: 
http://www.nabble.com/Facet-counts-for-common-terms-of-the-searched-field-tp23302410p23504241.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: How to deal with Mark invalid?

2009-05-12 Thread Nikolai Derzhak
OK. I've applied a dirty hack as a temporary solution:

in src/java/org/apache/solr/analysis/HTMLStripReader.java of 1.4-dev, I
enclosed in.reset() in a try block.

( * @version $Id: HTMLStripReader.java 646799 2008-04-10 13:36:23Z yonik $)

  private void restoreState() throws IOException {
    try {
      in.reset();
    } catch (Exception e) {
      // ignore "Mark invalid" and carry on with whatever was buffered
    }
    pushed.setLength(0);
  }



But how can this problem be resolved in a more civilized way?

On Tue, May 12, 2009 at 12:20 PM, Nikolai Derzhak niko...@zapatec.net wrote:

 Good day, people.

 We use solr to search in mailboxes (dovecot).
 But with some bad messages solr 1.4-dev generate error:
 
 SEVERE: java.io.IOException: Mark invalid
 at java.io.BufferedReader.reset(BufferedReader.java:485)
 at
 org.apache.solr.analysis.HTMLStripReader.restoreState(HTMLStripReader.java:171

 .
 
 It's issue known as SOLR-42.

 How i can log field stored in index (i need message uid) ?

 How to ignore such error and/or message ?

 Thanks


Re: How to deal with Mark invalid?

2009-05-12 Thread Yonik Seeley
I just committed a minor patch suggested by Jim Murphy in SOLR-42 to
slightly lower the safe read-ahead limit, to avoid reading beyond a
mark. Could you try out trunk (or wait until the next nightly build)?

-Yonik
http://www.lucidimagination.com

On Tue, May 12, 2009 at 10:57 AM, Nikolai Derzhak niko...@zapatec.net wrote:
 OK. I've applied dirty hack as temporary solution:

 in src/java/org/apache/solr/analysis/HTMLStripReader.java of 1.4-dev   -
 enclosed io.reset in try structure.

 ( * @version $Id: HTMLStripReader.java 646799 2008-04-10 13:36:23Z yonik $)
 
  private void restoreState() throws IOException {
    try {
      in.reset();
    } catch (Exception e) {
    }
    pushed.setLength(0);
  }

 

 But how to resolve this problem more civilized ?

 On Tue, May 12, 2009 at 12:20 PM, Nikolai Derzhak niko...@zapatec.net wrote:

 Good day, people.

 We use solr to search in mailboxes (dovecot).
 But with some bad messages solr 1.4-dev generate error:
 
 SEVERE: java.io.IOException: Mark invalid
 at java.io.BufferedReader.reset(BufferedReader.java:485)
 at
 org.apache.solr.analysis.HTMLStripReader.restoreState(HTMLStripReader.java:171

 .
 
 It's issue known as SOLR-42.

 How i can log field stored in index (i need message uid) ?

 How to ignore such error and/or message ?

 Thanks



Re: Facet counts for common terms of the searched field

2009-05-12 Thread Matt Weber
I mean you can sort the facet results by frequency, which happens to  
be the default behavior.


Here is an example field for your schema:

<field name="textfieldfacet" type="string" indexed="true" stored="true" multiValued="true"/>


Here is an example query:

http://localhost:8983/solr/select?q=textfield:copper&facet=true&facet.field=textfieldfacet&facet.limit=5

This will give you the top 5 words in the textfieldfacet.

Thanks,

Matt Weber
eSr Technologies
http://www.esr-technologies.com




On May 12, 2009, at 7:57 AM, sachin78 wrote:



Thanks Matt for your reply.

What do you mean by frequency(the default)?

Can you please provide an example schema and query will look like.

--Sachin


Matt Weber-2 wrote:


You may have to take care of this at index time.  You can create a  
new

multivalued field that has minimal processing.  Then at index time,
index the full contents of textfield as normal, but then also split  
it

on whitespace and index each word in the new field you just created.
Now you will be able to facet on this new field and sort the facet by
frequency (the default) to get the most popular words.

Thanks,

Matt Weber
eSr Technologies
http://www.esr-technologies.com




On May 12, 2009, at 7:33 AM, sachin78 wrote:



Does anybody have answer to this post.I have a similar requirement.

Suppose I have free text field say
I index the field.If I search for textfield:copper.I have to get  
facet

counts for the most common words found in a textfield.
ie.

example:search for textfield:glass
should return facet counts for common words found textfield.
semiconductor(10),iron(20), silicon (25) material (8) thin(25) and
so on.
Can this be done using tagging or MLT.

Thanks,
Sachin


Raju444us wrote:


I have a requirement. If I search for text field let's say
metal:glass
what i want is to get the facet counts for all the terms related to
glass in my search results.

window(100)  since a window can be glass.
plastic(10)  plastic is a material just like glass
Iron(10)
Paper(15)

Can I use MLT to get this functionality.Please let me know how  
can I

achieve this.If possible an example query.

Thanks,
Raju



--
View this message in context:
http://www.nabble.com/Facet-counts-for-common-terms-of-the-searched-field-tp23302410p23503794.html
Sent from the Solr - User mailing list archive at Nabble.com.







--
View this message in context: 
http://www.nabble.com/Facet-counts-for-common-terms-of-the-searched-field-tp23302410p23504241.html
Sent from the Solr - User mailing list archive at Nabble.com.





Newbie question

2009-05-12 Thread Wayne Pope

Hi,

We've implemented search in our product here at our very small company,
and the developer who integrated Solr has left. I'm picking up the code base
and have run into a problem, which I imagine is simple to solve.

I have this request:

http://localhost:8983/solr/select?start=0&rows=20&qt=dismax&q=copy&hl=true&hl.snippets=4&hl.fragsize=50&facet=true&facet.mincount=1&facet.limit=8&facet.field=type&fq=company-id%3A1&wt=javabin&version=2.2

(I've been using this to see it rendered in the browser:
http://localhost:8983/solr/select?indent=on&version=2.2&q=copy&start=0&rows=10&fl=*%2Cscore&qt=standard&wt=standard&explainOther=&hl=on&hl.fl=features&hl=true&hl.fragsize=50
)


that I've been trying out. I get a good response - however the hl.fragsize
parameter is ignored, and the hl.fragsize in solrconfig.xml is ignored too.
Instead I get back the whole document (10,000 chars!) in the doc txt field.
And bizarrely the response header is this:

<response>
<lst name="responseHeader">
  <int name="status">0</int>
  <int name="QTime">0</int>
  <lst name="params">
    <str name="explainOther"/>
    <str name="hl.fragsize">50</str>
    <str name="indent">on</str>
    <str name="hl.fl">features</str>
    <str name="wt">standard</str>
    <arr name="hl">
      <str>on</str>
      <str>true</str>
    </arr>
    <str name="version">2.2</str>
    <str name="rows">10</str>
    <str name="fl">*,score</str>
    <str name="start">0</str>
    <str name="q">copy</str>
    <str name="qt">standard</str>
  </lst>
</lst>
...
So it seems that the hl.fragsize was taken into account.

I'm sure I'm being dumb but I don't know how to solve this. Any ideas?
many thanks
-- 
View this message in context: 
http://www.nabble.com/Newbie-question-tp23505802p23505802.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Solr Loggin issue

2009-05-12 Thread Jay Hill
Usually that means there is another log4j.properties or log4j.xml file in
your classpath that is being found before the one you are intending to use.
Check your classpath for other versions of these files.
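
For reference, a minimal log4j.properties along these lines (file path is
just an example):

log4j.rootLogger=INFO, file
log4j.appender.file=org.apache.log4j.RollingFileAppender
log4j.appender.file.File=/var/log/solr/solr.log
log4j.appender.file.MaxFileSize=10MB
log4j.appender.file.MaxBackupIndex=9
log4j.appender.file.layout=org.apache.log4j.PatternLayout
log4j.appender.file.layout.ConversionPattern=%d %p [%c] %m%n

If the logs still end up in stdout.log, it is almost always another
log4j.properties earlier on the classpath winning.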

-Jay


On Tue, May 12, 2009 at 3:38 AM, Sagar Khetkade
sagar.khetk...@hotmail.com wrote:


 Hi,
 I have solr implemented in multi-core scenario and also  implemented
 solr-560-slf4j.patch for implementing the logging. But the problem I am
 facing is that the logs are going to the stdout.log file not the log file
 that I have mentioned in the log4j.properties file. Can anybody give me work
 round  to make logs go into the logger mentioned in log4j.properties file.
 Thanks in advance.

 Regards,
 Sagar Khetkade
 _
 Live Search extreme As India feels the heat of poll season, get all the
 info you need on the MSN News Aggregator
 http://news.in.msn.com/National/indiaelections2009/aggregator/default.aspx



Replication master+slave

2009-05-12 Thread Bryan Talbot
For replication in 1.4, the wiki at http://wiki.apache.org/solr/SolrReplication 
 says that a node can be both the master and a slave:


A node can act as both master and slave. In that case both the master  
and slave configuration lists need to be present inside the  
ReplicationHandler requestHandler in the solrconfig.xml.


What does this mean?  Does the core then poll itself for updates?

I'd like to have a single set of configuration files that are shared  
by masters and slaves and avoid duplicating configuration details in  
multiple files (one for master and one for slave) to ease management  
and failover.  Is this possible?


When I attempt to set up a multi-server master-slave configuration and
include both master and slave replication configuration options, I run
into some problems. I'm running a nightly build from May 7.



  <requestHandler name="/replication" class="solr.ReplicationHandler">
    <lst name="master">
      <str name="replicateAfter">commit</str>
    </lst>
    <lst name="slave">
      <str name="masterUrl">http://master_core01:8983/solr/core01/replication</str>
      <str name="pollInterval">00:00:60</str>
    </lst>
  </requestHandler>


When the replication admin page
(http://master_core01:8983/solr/core01/admin/replication/index.jsp) is
visited, the severe error shown below appears in the solr log. The server
is otherwise idle, so there is no reason all threads should be busy unless
the replication code is getting itself into a loop.


What's the right way to do this?



May 11, 2009 8:01:22 PM org.apache.tomcat.util.threads.ThreadPool logFull
SEVERE: All threads (150) are currently busy, waiting. Increase maxThreads (150) or check the servlet status
May 11, 2009 8:01:41 PM org.apache.solr.handler.ReplicationHandler getReplicationDetails
WARNING: Exception while invoking a 'details' method on master
java.net.SocketTimeoutException: Read timed out
        at java.net.SocketInputStream.socketRead0(Native Method)
        at java.net.SocketInputStream.read(SocketInputStream.java:129)
        at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
        at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
        at org.apache.commons.httpclient.HttpParser.readRawLine(HttpParser.java:78)
        at org.apache.commons.httpclient.HttpParser.readLine(HttpParser.java:106)
        at org.apache.commons.httpclient.HttpConnection.readLine(HttpConnection.java:1116)
        at org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$HttpConnectionAdapter.readLine(MultiThreadedHttpConnectionManager.java:1413)
        at org.apache.commons.httpclient.HttpMethodBase.readStatusLine(HttpMethodBase.java:1973)
        at org.apache.commons.httpclient.HttpMethodBase.readResponse(HttpMethodBase.java:1735)
        at org.apache.commons.httpclient.HttpMethodBase.execute(HttpMethodBase.java:1098)
        at org.apache.commons.httpclient.HttpMethodDirector.executeWithRetry(HttpMethodDirector.java:398)
        at org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMethodDirector.java:171)
        at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:397)
        at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:323)
        at org.apache.solr.handler.SnapPuller.getNamedListResponse(SnapPuller.java:183)
        at org.apache.solr.handler.SnapPuller.getCommandResponse(SnapPuller.java:178)
        at org.apache.solr.handler.ReplicationHandler.getReplicationDetails(ReplicationHandler.java:555)
        at org.apache.solr.handler.ReplicationHandler.handleRequestBody(ReplicationHandler.java:147)
        at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
        at org.apache.solr.core.SolrCore.execute(SolrCore.java:1330)
        at org.apache.jsp.admin.replication.index_jsp.executeCommand(index_jsp.java:34)
        at org.apache.jsp.admin.replication.index_jsp._jspService(index_jsp.java:208)
        at org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:98)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:729)
        at org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:331)
        at org.apache.jasper.servlet.JspServlet.serviceJspFile(JspServlet.java:329)
        at org.apache.jasper.servlet.JspServlet.service(JspServlet.java:265)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:729)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:269)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:188)
        at org.apache.catalina.core.ApplicationDispatcher.invoke(ApplicationDispatcher.java:679)
        at org.apache.catalina ...

Re: AW: Geographical search based on latitude and longitude

2009-05-12 Thread Grant Ingersoll
Yes, that is part of it, but there is more to it. See Yonik's comment
about what is needed further down in the issue.



On May 12, 2009, at 7:36 AM, Norman Leutner wrote:


So are you using a bounding box to find results within a given range (km),
like mentioned here:
http://www.nsshutdown.com/projects/lucene/whitepaper/locallucene_v2.html ?



Best regards

Norman Leutner
all2e GmbH

-Ursprüngliche Nachricht-
Von: Grant Ingersoll [mailto:gsing...@apache.org]
Gesendet: Dienstag, 12. Mai 2009 13:18
An: solr-user@lucene.apache.org
Betreff: Re: Geographical search based on latitude and longitude

See https://issues.apache.org/jira/browse/SOLR-773.  In other words,
we're working on it and would love some help!

-Grant

On May 12, 2009, at 7:12 AM, Norman Leutner wrote:


Hi together,

I'm new to Solr and want to port a geographical range search from
MySQL to Solr.

Currently I'm using some mathematical functions (based on GRS80
modell) directly within MySQL to calculate
the actual distance from the locations within the database to a
current location (lat and long are known):

$query="SELECT street, zip, city, state, country, ".$radius."*ACOS(cos(RADIANS(latitude))*cos(".$theta.")*(sin(RADIANS(longitude))*sin(".$phi.")+cos(RADIANS(longitude))*cos(".$phi."))+sin(RADIANS(latitude))*sin(".$theta.")) AS Distance FROM ezgis_position WHERE ".$radius."*ACOS(cos(RADIANS(latitude))*cos(".$theta.")*(sin(RADIANS(longitude))*sin(".$phi.")+cos(RADIANS(longitude))*cos(".$phi."))+sin(RADIANS(latitude))*sin(".$theta.")) <= ".$range." ORDER BY Distance";

This works pretty fine and fast. Due to we want to include this
within our Solr search result I would like to have a attribute like
actual_distance within the result. Is there a way to use those
functions like (radians, sin, acos,...) directly within Solr?

Thanks in advance for any feedback
Norman Leutner


--
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)
using Solr/Lucene:
http://www.lucidimagination.com/search



--
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)  
using Solr/Lucene:

http://www.lucidimagination.com/search



Re: Restarting tomcat deletes all Solr indexes

2009-05-12 Thread Shalin Shekhar Mangar
You can fix the path of the index in your solrconfig.xml
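
For example, a fixed absolute path instead of the default relative one (path
taken from earlier in this thread, adjust to taste):

<dataDir>/usr/local/solr/data</dataDir>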

On Tue, May 12, 2009 at 4:48 PM, KK dioxide.softw...@gmail.com wrote:

 One more information I would like to add.
  The entry in solr stats page says this:

 readerDir : org.apache.lucene.store.FSDirectory@/home/kk/solr/data/index

 when I ran from /home/kk
 and this:

 readerDir : org.apache.lucene.store.FSDirectory@
 /home/kk/junk/solr/data/index

 after running from /home/kk/junk

 That assures the me the problem, but what is the solution?

 Thanks,
 KK.

 On Tue, May 12, 2009 at 4:41 PM, KK dioxide.softw...@gmail.com wrote:

  Thanks for your response @aklochkov.
   But I again noticed that something is wrong in my solr/tomcat config[I
  spent a lot of time making solr run], b'coz in the solr admin page [
  http://localhost:8080/solr/admin/] what I see is that the $CWD is the
  location where from I restarted tomcat and seems this $cwd gets picked
 and
  used for index data[Is it the default behavior? or something wrong from
 my
  side?, or may be I'm asking some stupid question ].
   Once I was in /etc and from there I restarted the tomcat and when I
 tried
  to open the solr admin page I found an error saying that can not create
  index directory some permission issue I think [it gave a directory str
 like
  /etc/solr/index ... ]. I'm pretty sure something is wrong in
 configuration.
  One more thing assures me about this is the fact that I found many solr
  index directories here and there[ these are I think the locations where I
  was when I restarted tomcat at that time ]. Earlier I was using the
  java_opts to set the solr home like this
 
   export JAVA_OPTS="$JAVA_OPTS -D/usr/local/solr" #in .bashrc
 
  but I commented that and instead added the jndi entry in
  /usr/local/tomcat/webapps/solr/WEB-INF/web.xml as this
 
  <env-entry>
     <env-entry-name>solr/home</env-entry-name>
     <env-entry-value>/usr/local/solr</env-entry-value>
     <env-entry-type>java.lang.String</env-entry-type>
  </env-entry>
 
  Even the entry SolrHome in solr admin page say that SolrHome is
  /usr/loca/solr but the index gets created in $CWD. Is it the case that
 I
  created entries for SolrHome in multiple places? which is obviously
 wrong.
  Can someone point me what is the issue. Thank you very much.
 
  --KK
 
 
 
  On Tue, May 12, 2009 at 2:39 PM, Andrey Klochkov 
  akloch...@griddynamics.com wrote:
 
  Hi,
 
  I know that when starting Solr checks index directory existence, and
  creates
  new fresh index if it doesn't exist. Does it help? If no, the next step
  I'd
  do in your case is patching SolrCore.initIndex method - insert some
  logging,
  or run EmbeddedSolrServer with debugger etc.
 
  On Mon, May 11, 2009 at 1:25 PM, KK dioxide.softw...@gmail.com wrote:
 
   Hi,
   I'm facing a silly problem. Every time I restart tomcat all the
 indexes
  are
   lost. I used all the default configurations. I'm pretty sure there
 must
  be
   some basic changes to fix this. I'd highly appreciate if someone could
   direct me fixing this.
  
   Thanks,
   KK.
  
 
 
  --
  Andrew Klochkov
 
 
 




-- 
Regards,
Shalin Shekhar Mangar.


Re: Newbie question

2009-05-12 Thread Shalin Shekhar Mangar
On Tue, May 12, 2009 at 9:48 PM, Wayne Pope waynemailingli...@gmail.com wrote:


 I have this request:


  http://localhost:8983/solr/select?start=0&rows=20&qt=dismax&q=copy&hl=true&hl.snippets=4&hl.fragsize=50&facet=true&facet.mincount=1&facet.limit=8&facet.field=type&fq=company-id%3A1&wt=javabin&version=2.2

 (I've been using this to see it rendered in the browser:

  http://localhost:8983/solr/select?indent=on&version=2.2&q=copy&start=0&rows=10&fl=*%2Cscore&qt=standard&wt=standard&explainOther=&hl=on&hl.fl=features&hl=true&hl.fragsize=50
 )


 that I've been trying out. I get a good responce - however the hl.fragsize
 is ignored and the hl.fragsize in the solrconfig.xml is ignored. Instead I
 get back the whole document (10,000 chars!) in the doc txt field. And
 bizarely the response header is this:


hl.fragsize is relevant only for the snippets created by the highlighter.
The returned fields will always have the complete data for a document. Does
that answer your question?
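
(One way to avoid pulling the full stored field over the wire, assuming you
don't need it, is to restrict the field list, e.g. fl=id,score, and read the
snippets from the highlighting section of the response instead.)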

-- 
Regards,
Shalin Shekhar Mangar.


Re: Replication master+slave

2009-05-12 Thread Shalin Shekhar Mangar
On Tue, May 12, 2009 at 10:42 PM, Bryan Talbot btal...@aeriagames.com wrote:

 For replication in 1.4, the wiki at
 http://wiki.apache.org/solr/SolrReplication says that a node can be both
 the master and a slave:

 A node can act as both master and slave. In that case both the master and
 slave configuration lists need to be present inside the ReplicationHandler
 requestHandler in the solrconfig.xml.

 What does this mean?  Does the core then poll itself for updates?


No. This type of configuration is meant for repeaters. Suppose there are
slaves in multiple data centers (say data centers A and B). There is always
a single master (say in A). One of the slaves in B is used as a master for
the other slaves in B. Therefore, this one slave in B is both a master as
well as a slave.



 I'd like to have a single set of configuration files that are shared by
 masters and slaves and avoid duplicating configuration details in multiple
 files (one for master and one for slave) to ease management and failover.
  Is this possible?


You wouldn't want the master to be a slave. So I guess you'd need to have a
separate file. Also, it needs to be a separate file so that the slave does
not become a master when the solrconfig.xml is replicated.



 When I attempt to setup a multi server master-slave configuration and
 include both master and slave replication configuration options, I into some
 problems.  I'm  running a nightly build from May 7.


Not sure what happened. Is that the url for this solr (meaning same solr url
is master and slave of itself)? If yes, that is not a valid configuration.

-- 
Regards,
Shalin Shekhar Mangar.


error when seting queryResultWindowSize to zero

2009-05-12 Thread Marc Sturlese

I have seen that if I set the value of queryResultWindowSize to 0 in
solrconfig.xml, solr will return a divide-by-zero error.
Checking the source I have seen it can be fixed in SolrIndexSearcher. At
the end of the function getDocListC it's coded:

if (maxDocRequested < queryResultWindowSize) {
  supersetMaxDoc=queryResultWindowSize;
} else {
  supersetMaxDoc = ((maxDocRequested -1)/queryResultWindowSize +
1)*queryResultWindowSize;
  if (supersetMaxDoc < 0) supersetMaxDoc=maxDocRequested;
}

I have sorted it out by doing this (just adding parentheses):

if (maxDocRequested < queryResultWindowSize) {
  supersetMaxDoc=queryResultWindowSize;
} else {
  supersetMaxDoc = ((maxDocRequested -1)/(queryResultWindowSize +
1))*queryResultWindowSize;
  if (supersetMaxDoc < 0) supersetMaxDoc=maxDocRequested;
}

I have seen this happening in a recent trunk. Is my fix correct?

-- 
View this message in context: 
http://www.nabble.com/error-when-seting-queryResultWindowSize-to-zero-tp23508478p23508478.html
Sent from the Solr - User mailing list archive at Nabble.com.



RE: Selective Searches Based on User Identity

2009-05-12 Thread Terence Gannon
Paul -- thanks for the reply, I appreciate it.  That's a very practical
approach, and is worth taking a closer look at.  Actually, taking your idea
one step further, perhaps three fields: 1) ownerUid (uid of the document's
owner), 2) grantedUid (uids of users who have been granted access), and 3)
deniedUid (uids of users specifically denied access to the document).  These
fields, coupled with some business rules around how they are populated,
should cover all the possibilities, I think.
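
For example, the per-user restriction at query time could then be a single
filter query along these lines (uid value hypothetical):

fq=(ownerUid:u42 OR grantedUid:u42) AND NOT deniedUid:u42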

Access to the Solr instance would have to be tightly controlled, but that's
something that should be done anyway.  You sure wouldn't want end users
preparing their own XML and throwing it at Solr -- it would be pretty easy
to figure out how to get around the access/denied fields and get at stuff
the owner didn't intend.

This approach mimics to some degree what is being done in the operating
system, but it's still elegant and provides the level of control required.
 Anybody else have any thoughts in this regard?  Has anybody implemented
anything similar, and if so, how did it work?  Thanks, and best regards...

Terence


Re: error when seting queryResultWindowSize to zero

2009-05-12 Thread Yonik Seeley
On Tue, May 12, 2009 at 3:03 PM, Marc Sturlese marc.sturl...@gmail.com wrote:
 I have seen that if I set the value of queryResultWindowSize  to 0 in
 solrconfig.xml solr will return an error of divided by zero.

Seems like a configuration error since requesting that results be
retrieved in 0 size chunks doesn't make a lot of sense.

 Checking the source I have seen it can be fixed in SolrIndexSearcher. At the
 end of the function getDocListC it's coded:

        if (maxDocRequested < queryResultWindowSize) {
          supersetMaxDoc=queryResultWindowSize;
        } else {
          supersetMaxDoc = ((maxDocRequested -1)/queryResultWindowSize +
 1)*queryResultWindowSize;
          if (supersetMaxDoc < 0) supersetMaxDoc=maxDocRequested;
        }

 I have sorted it out by doing this (just adding parentheses):

        if (maxDocRequested < queryResultWindowSize) {
          supersetMaxDoc=queryResultWindowSize;
        } else {
          supersetMaxDoc = ((maxDocRequested -1)/(queryResultWindowSize +
 1))*queryResultWindowSize;
          if (supersetMaxDoc < 0) supersetMaxDoc=maxDocRequested;
        }

 I have seen this is happening in a recent trunk. Is my fix correct?

The +1 really needs to be after the divide (we're rounding up).

If a fix is needed, I imagine it would be at the time that config
parameter is read... if it's less than or equal to 0, then set it to
1.
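
Something like this where the config is parsed would do (element path
assumed):

queryResultWindowSize = Math.max(1, getInt("query/queryResultWindowSize", 1));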

-Yonik
http://www.lucidimagination.com


Re: Selective Searches Based on User Identity

2009-05-12 Thread Matt Weber
I also work with the FAST Enterprise Search engine, and this is exactly
how their Security Access Module works.  They actually use a modified
base-32 encoded value for indexing, but that is because they don't
have the luxury of untokenized/unprocessed String fields like Solr does.


Thanks,

Matt Weber
eSr Technologies
http://www.esr-technologies.com




On May 12, 2009, at 12:26 PM, Terence Gannon wrote:

Paul -- thanks for the reply, I appreciate it.  That's a very  
practical
approach, and is worth taking a closer look at.  Actually, taking  
your idea
one step further, perhaps three fields; 1) ownerUid (uid of the  
document's
owner) 2) grantedUid (uid of users who have been granted access),  
and 3)
deniedUid (uid of users specifically denied access to the  
document).  These
fields, coupled with some business rules around how they were  
populated

should cover off all possibilities I think.

Access to the Solr instance would have to be tightly controlled, but  
that's
something that should be done anyway.  You sure wouldn't want end  
users
preparing their own XML and throwing it at Solr -- it would be  
pretty easy
to figure out how to get around the access/denied fields and get at  
stuff

the owner didn't intend.

This approach mimics to some degree what is being done in the  
operating
system, but it's still elegant and provides the level of control  
required.
Anybody else have any thoughts in this regard?  Has anybody  
implemented
anything similar, and if so, how did it work?  Thanks, and best  
regards...


Terence




Re: Selective Searches Based on User Identity

2009-05-12 Thread Jay Hill
The only downside would be that you would have to update a document anytime
a user was granted or denied access. You would have to query before the
update to get the current values for grantedUID and deniedUID, remove/add
values, and update the index. If you don't have a lot of changes in the
system that wouldn't be a big deal, but if a lot of changes are happening
throughout the day you might have to queue requests and batch them.

-Jay

On Tue, May 12, 2009 at 1:05 PM, Matt Weber m...@mattweber.org wrote:

 I also work with the FAST Enterprise Search engine and this is exactly how
 their Security Access Module works.  They actually use a modified base-32
 encoded value for indexing, but that is because they don't have the luxury
 of untokenized/un-processed String fields like Solr.

 Thanks,

 Matt Weber
 eSr Technologies
 http://www.esr-technologies.com





 On May 12, 2009, at 12:26 PM, Terence Gannon wrote:

  Paul -- thanks for the reply, I appreciate it.  That's a very practical
 approach, and is worth taking a closer look at.  Actually, taking your
 idea
 one step further, perhaps three fields; 1) ownerUid (uid of the document's
 owner) 2) grantedUid (uid of users who have been granted access), and 3)
 deniedUid (uid of users specifically denied access to the document).
  These
 fields, coupled with some business rules around how they were populated
 should cover off all possibilities I think.

 Access to the Solr instance would have to be tightly controlled, but
 that's
 something that should be done anyway.  You sure wouldn't want end users
 preparing their own XML and throwing it at Solr -- it would be pretty easy
 to figure out how to get around the access/denied fields and get at stuff
 the owner didn't intend.

 This approach mimics to some degree what is being done in the operating
 system, but it's still elegant and provides the level of control required.
 Anybody else have any thoughts in this regard?  Has anybody implemented
 anything similar, and if so, how did it work?  Thanks, and best regards...

 Terence





RE: Selective Searches Based on User Identity

2009-05-12 Thread Terence Gannon
Thanks for the tip.  I went to their website (www.fastsearch.com), and got
as far as the second line, top left 'A Microsoft Subsidiary'...at which
point, hopes of it being another open source solution quickly faded. ;-)
Seriously, though, it looks like an interesting product, but open source is
a mandatory requirement for my particular application.  But the fact that they
implemented this functionality suggests it's a valid
requirement, and I'll keep plugging away at it.  Thank you very much for
bringing FAST to my attention...I appreciate it!  Best regards...

Terence



-Original Message-
From: Matt Weber [mailto:m...@mattweber.org]
Sent: May 12, 2009 14:06
To: solr-user@lucene.apache.org
Subject: Re: Selective Searches Based on User Identity



I also work with the FAST Enterprise Search engine and this is exactly

how their Security Access Module works.  They actually use a modified

base-32 encoded value for indexing, but that is because they don't

have the luxury of untokenized/un-processed String fields like Solr.



Thanks,



Matt Weber

eSr Technologies

http://www.esr-technologies.com


Re: Selective Searches Based on User Identity

2009-05-12 Thread Matt Weber
Here is a good presentation on search security from the Infonortics  
Search Conference that was held a few weeks ago.


http://www.infonortics.com/searchengines/sh09/slides/kehoe.pdf

The approach you are using is called early-binding.  As Jay mentioned,  
one of the downsides is updating the documents each time you have an  
ACL change.  You could use the late-binding approach that checks each  
result after the query but before you display to the user.  I don't  
recommend this approach because it will strain your security
infrastructure: you will need to check whether the user can access
each result.


Good luck.

Thanks,

Matt Weber
eSr Technologies
http://www.esr-technologies.com




On May 12, 2009, at 1:21 PM, Jay Hill wrote:

The only downside would be that you would have to update a document  
anytime
a user was granted or denied access. You would have to query before  
the
update to get the current values for grantedUID and deniedUID,  
remove/add
values, and update the index. If you don't have a lot of changes in  
the
system that wouldn't be a big deal, but if a lot of changes are  
happening

throughout the day you might have to queue requests and batch them.

-Jay

On Tue, May 12, 2009 at 1:05 PM, Matt Weber m...@mattweber.org  
wrote:


I also work with the FAST Enterprise Search engine and this is  
exactly how
their Security Access Module works.  They actually use a modified  
base-32
encoded value for indexing, but that is because they don't have the  
luxury

of untokenized/un-processed String fields like Solr.

Thanks,

Matt Weber
eSr Technologies
http://www.esr-technologies.com





On May 12, 2009, at 12:26 PM, Terence Gannon wrote:

Paul -- thanks for the reply, I appreciate it.  That's a very  
practical
approach, and is worth taking a closer look at.  Actually, taking  
your

idea
one step further, perhaps three fields; 1) ownerUid (uid of the  
document's
owner) 2) grantedUid (uid of users who have been granted access),  
and 3)

deniedUid (uid of users specifically denied access to the document).
These
fields, coupled with some business rules around how they were  
populated

should cover off all possibilities I think.

Access to the Solr instance would have to be tightly controlled, but
that's
something that should be done anyway.  You sure wouldn't want end  
users
preparing their own XML and throwing it at Solr -- it would be  
pretty easy
to figure out how to get around the access/denied fields and get  
at stuff

the owner didn't intend.

This approach mimics to some degree what is being done in the  
operating
system, but it's still elegant and provides the level of control  
required.
Anybody else have any thoughts in this regard?  Has anybody  
implemented
anything similar, and if so, how did it work?  Thanks, and best  
regards...


Terence








Who is running 1.4 nightly in production?

2009-05-12 Thread Walter Underwood
We're planning our move to 1.4, and want to run one of our production
servers with the new code. Just to feel better about it, is anyone else
running 1.4 in production?

I'm building 2009-05-11 right now.

wuner



Re: Who is running 1.4 nightly in production?

2009-05-12 Thread Matthew Runo
We're using 1.4-dev 749558:749756M that we built on 2009-03-03  
13:10:05 for our master/slave production environment using the Java  
Replication code.


Thanks for your time!

Matthew Runo
Software Engineer, Zappos.com
mr...@zappos.com - 702-943-7833

On May 12, 2009, at 2:02 PM, Walter Underwood wrote:


We're planning our move to 1.4, and want to run one of our production
servers with the new code. Just to feel better about it, is anyone  
else

running 1.4 in production?

I'm building 2009-05-11 right now.

wuner





camel-casing and dismax troubles

2009-05-12 Thread Geoffrey Young
hi all :)

I'm having trouble with camel-cased query strings and the dismax handler.

a user query

 LeAnn Rimes

isn't matching the indexed term

 Leann Rimes

even though both are lower-cased in the end.  furthermore, the
analysis tool shows a match.

the debug query looks like

 parsedquery:+((DisjunctionMaxQuery((search-en:\"(leann le) ann\")) DisjunctionMaxQuery((search-en:rimes)))~2) (),

I have a feeling it's due to how the broken up tokens are added back
into the token stream with PreserveOriginal, and some strange
interaction between that order and dismax, but I'm not entirely sure.

configs follow.  thoughts appreciated.

--Geoff

  <fieldType name="search-en" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.ISOLatin1AccentFilterFactory"/>
      <filter class="solr.WordDelimiterFilterFactory" preserveOriginal="1"
              generateWordParts="1"
              generateNumberParts="1"
              catenateWords="1"
              catenateNumbers="1"
              catenateAll="1"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
      <filter class="solr.StopFilterFactory" ignoreCase="false" words="stopwords-en.txt"/>
    </analyzer>

    <analyzer type="query">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.ISOLatin1AccentFilterFactory"/>
      <filter class="solr.WordDelimiterFilterFactory" preserveOriginal="1"
              generateWordParts="1"
              generateNumberParts="1"
              catenateWords="0"
              catenateNumbers="0"
              catenateAll="0"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.StopFilterFactory" ignoreCase="false" words="stopwords-en.txt"/>
    </analyzer>
  </fieldType>
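
One experiment worth trying, as a sketch rather than a confirmed fix: make the
query-side WordDelimiterFilter emit only the catenated token, so LeAnn
analyzes to the single term leann, exactly like the indexed Leann, instead of
the multi-token stream that dismax turns into the phrase query shown in the
debug output. The assumed tradeoff is that split-word matches (le ann) are
lost:

  <!-- hypothetical query-side analyzer: collapse camel-case to one token -->
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.ISOLatin1AccentFilterFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" preserveOriginal="0"
            generateWordParts="0"
            generateNumberParts="1"
            catenateWords="1"
            catenateNumbers="0"
            catenateAll="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="false" words="stopwords-en.txt"/>
  </analyzer>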


Re: how to manually add data to indexes generated by nutch-1.0 using solr

2009-05-12 Thread alxsss

 Tried to add a new record using



 curl http://localhost:8983/solr/update -H 'Content-Type: text/xml' --data-binary '<add>
<doc boost="2.5">
<field name="segment">20090512170318</field>
<field name="digest">86937aaee8e748ac3007ed8b66477624</field>
<field name="boost">0.21189615</field>
<field name="url">test.com</field>
<field name="title">test test</field>
<field name="tstamp">20090513003210909</field>
</doc></add>'

I get

<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader"><int name="status">0</int><int name="QTime">71</int></lst>
</response>


and added records are not found in the search.

Any ideas what went wrong?


Thanks.
Alex.


 

-Original Message-
From: alx...@aim.com
To: solr-user@lucene.apache.org
Sent: Mon, 11 May 2009 12:14 pm
Subject: how to manually add data to indexes generated by nutch-1.0 using solr










Hello,

I had Nutch 1.0 crawl, fetch, and index a lot of files. Then I needed to
index a few additional files. But I know the keywords for those files and their
locations, so I need to add them manually. I took a look at two tutorials on the
wiki, but did not find any info about this issue.
Is there a tutorial on the step-by-step procedure for adding data to a Nutch index
using Solr manually?

Thanks in advance.
Alex.



 



Re: Who is running 1.4 nightly in production?

2009-05-12 Thread Erik Hatcher
We run a not too distant trunk (1.4, probably a month or so ago)  
version of Solr on LucidFind at http://www.lucidimagination.com/search


Erik

On May 12, 2009, at 5:02 PM, Walter Underwood wrote:


We're planning our move to 1.4, and want to run one of our production
servers with the new code. Just to feel better about it, is anyone  
else

running 1.4 in production?

I'm building 2009-05-11 right now.

wuner




Re: how to manually add data to indexes generated by nutch-1.0 using solr

2009-05-12 Thread Erik Hatcher
send a <commit/> request afterwards, or you can add ?commit=true to
the /update request with the adds.


Erik
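
For instance, either form should work (a sketch against the stock example port):

  # explicit commit sent after the adds
  curl http://localhost:8983/solr/update -H 'Content-Type: text/xml' \
    --data-binary '<commit/>'

  # or commit in the same request as the add
  curl 'http://localhost:8983/solr/update?commit=true' -H 'Content-Type: text/xml' \
    --data-binary '<add>...</add>'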

On May 12, 2009, at 8:57 PM, alx...@aim.com wrote:



Tried to add a new record using



curl http://localhost:8983/solr/update -H 'Content-Type: text/xml' --data-binary '<add>
<doc boost="2.5">
<field name="segment">20090512170318</field>
<field name="digest">86937aaee8e748ac3007ed8b66477624</field>
<field name="boost">0.21189615</field>
<field name="url">test.com</field>
<field name="title">test test</field>
<field name="tstamp">20090513003210909</field>
</doc></add>'

I get

<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader"><int name="status">0</int><int name="QTime">71</int></lst>
</response>


and added records are not found in the search.

Any ideas what went wrong?


Thanks.
Alex.




-Original Message-
From: alx...@aim.com
To: solr-user@lucene.apache.org
Sent: Mon, 11 May 2009 12:14 pm
Subject: how to manually add data to indexes generated by nutch-1.0  
using solr











Hello,

I had Nutch 1.0 crawl, fetch, and index a lot of files. Then I needed to
index a few additional files. But I know the keywords for those files and their
locations, so I need to add them manually. I took a look at two tutorials on the
wiki, but did not find any info about this issue.
Is there a tutorial on the step-by-step procedure for adding data to a Nutch index
using Solr manually?

Thanks in advance.
Alex.









RE: Selective Searches Based on User Identity

2009-05-12 Thread Terence Gannon
In reply to both Matt and Jay's comments, the particular situation I'm
dealing with is one where rights will change relatively little once
they are established.  Typically a document will be loaded and
indexed, and a decision will be made on sharing that more-or-less
immediately.  It might change a couple of times after that, but that
will be it.  So early-binding seems like the better option.  Thanks to
both of you for your suggestions and help.

Terence

PS. I wish I had known about that conference...looks like it would
have been very helpful to me right now!

-Original Message-
From: Matt Weber [mailto:m...@mattweber.org]
Sent: May 12, 2009 14:41
To: solr-user@lucene.apache.org
Subject: Re: Selective Searches Based on User Identity



Here is a good presentation on search security from the Infonortics

Search Conference that was held a few weeks ago.



http://www.infonortics.com/searchengines/sh09/slides/kehoe.pdf



The approach you are using is called early-binding.  As Jay mentioned,

one of the downsides is updating the documents each time you have an

ACL change.  You could use the late-binding approach that checks each

result after the query but before you display to the user.  I don't

recommend this approach because it will strain your security

infrastructure: you will need to check whether the user can access

each result.



Good luck.



Thanks,



Matt Weber

eSr Technologies

http://www.esr-technologies.com


Re: Replication master+slave

2009-05-12 Thread Jian Han Guo
I was looking at the same problem, and had a discussion with Noble. You can
use a hack to achieve what you want, see

https://issues.apache.org/jira/browse/SOLR-1154

Thanks,

Jianhan


On Tue, May 12, 2009 at 5:13 PM, Bryan Talbot btal...@aeriagames.com wrote:

 So how are people managing solrconfig.xml files which are largely the same
 other than differences for replication?

 I don't think it's a good thing to maintain two copies of the same file
 and I'd like to avoid that.  Maybe enabling the XInclude feature in
 DocumentBuilders would make it possible to modularize the configuration files?


 http://java.sun.com/j2se/1.5.0/docs/api/javax/xml/parsers/DocumentBuilderFactory.html#setXIncludeAware(boolean)


 -Bryan





 On May 12, 2009, at May 12, 11:43 AM, Shalin Shekhar Mangar wrote:

  On Tue, May 12, 2009 at 10:42 PM, Bryan Talbot btal...@aeriagames.com
 wrote:

  For replication in 1.4, the wiki at
 http://wiki.apache.org/solr/SolrReplication says that a node can be both
 the master and a slave:

 A node can act as both master and slave. In that case both the master and
 slave configuration lists need to be present inside the
 ReplicationHandler
 requestHandler in the solrconfig.xml.

 What does this mean?  Does the core then poll itself for updates?



 No. This type of configuration is meant for repeaters. Suppose there are
 slaves in multiple data centers (say data centers A and B). There is always a
 single master (say in A). One of the slaves in B is used as a master for the
 other slaves in B. Therefore, this one slave in B is both a master and a slave.



 I'd like to have a single set of configuration files that are shared by
 masters and slaves and avoid duplicating configuration details in
 multiple
 files (one for master and one for slave) to ease management and failover.
 Is this possible?


 You wouldn't want the master to be a slave. So I guess you'd need to have a
 separate file. Also, it needs to be a separate file so that the slave does
 not become a master when the solrconfig.xml is replicated.



 When I attempt to set up a multi-server master-slave configuration and
 include both master and slave replication configuration options, I run into
 some problems.  I'm running a nightly build from May 7.


 Not sure what happened. Is that the url for this solr (meaning the same solr
 url is master and slave of itself)? If yes, that is not a valid configuration.

 --
 Regards,
 Shalin Shekhar Mangar.
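
To make the repeater idea concrete, a minimal sketch of the ReplicationHandler
section on the data-center-B slave that also serves as B's master (host name,
poll interval, and conf file list are placeholders):

  <requestHandler name="/replication" class="solr.ReplicationHandler">
    <lst name="master">
      <str name="replicateAfter">commit</str>
      <str name="confFiles">schema.xml,stopwords.txt</str>
    </lst>
    <lst name="slave">
      <str name="masterUrl">http://master-in-A:8983/solr/replication</str>
      <str name="pollInterval">00:00:60</str>
    </lst>
  </requestHandler>

And if an XInclude-aware parser were enabled, as Bryan suggests above (it is
not in stock builds), a shared solrconfig.xml could pull in a per-role
fragment so master and slave differ only in one small file:

  <config xmlns:xi="http://www.w3.org/2001/XInclude">
    <!-- shared configuration elided -->
    <xi:include href="replication-role.xml"/>
  </config>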





RE: Solr Loggin issue

2009-05-12 Thread Sagar Khetkade

 

I have only one log4j.properties file in the classpath, and even if I configure
logging for the particular package where the Solr exception originates, I still
see the same issue. I removed the logger for my application and am using log4j
only for Solr logging.

 

~Sagar

 


 
 Date: Tue, 12 May 2009 09:59:01 -0700
 Subject: Re: Solr Loggin issue
 From: jayallenh...@gmail.com
 To: solr-user@lucene.apache.org
 
 Usually that means there is another log4j.properties or log4j.xml file in
 your classpath that is being found before the one you are intending to use.
 Check your classpath for other versions of these files.
 
 -Jay
 
 
 On Tue, May 12, 2009 at 3:38 AM, Sagar Khetkade
 sagar.khetk...@hotmail.com wrote:
 
 
  Hi,
  I have Solr implemented in a multi-core scenario and have also applied
  solr-560-slf4j.patch for logging. But the problem I am
  facing is that the logs are going to the stdout.log file, not the log file
  that I have specified in the log4j.properties file. Can anybody give me a
  workaround to make the logs go to the appender configured in log4j.properties?
  Thanks in advance.
 
  Regards,
  Sagar Khetkade
 

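
In case it helps isolate the problem, a minimal log4j.properties sketch (file
path and levels are illustrative) that routes Solr logs to a rolling file
instead of stdout; per Jay's note, it only takes effect if it is the first
log4j.properties found on the classpath:

  log4j.rootLogger=WARN, file
  log4j.logger.org.apache.solr=INFO

  log4j.appender.file=org.apache.log4j.RollingFileAppender
  log4j.appender.file.File=/var/log/solr/solr.log
  log4j.appender.file.MaxFileSize=10MB
  log4j.appender.file.MaxBackupIndex=5
  log4j.appender.file.layout=org.apache.log4j.PatternLayout
  log4j.appender.file.layout.ConversionPattern=%d %p [%c] - %m%n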