Re: includes in solrconfig.xml

2008-08-11 Thread Jacob Singh
Thanks Erik,

I didn't know about that.  I'll give it a shot!

-Jacob
Erik Hatcher wrote:
 Well, let's not forget about XML's entity reference includes.  It's not
 the prettiest thing, but you can do the sort of thing mentioned here:
 
 http://www.xml.com/pub/a/2001/03/14/trxml10.html
 
 Erik
 
 
 On Aug 9, 2008, at 11:16 PM, Otis Gospodnetic wrote:
 No, not possible.

 Otis
 -- 
 Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



 ----- Original Message -----
 From: Jacob Singh [EMAIL PROTECTED]
 To: solr-user@lucene.apache.org
 Sent: Friday, August 8, 2008 11:04:02 PM
 Subject: includes in solrconfig.xml

 Hello,

 Is it possible to include an external xml file from within
 solrconfig.xml?

 Or even better, to scan a directory ala conf.d in apache?

 Thanks,
 jacob
 



Re: Still no results after removing from stopwords

2008-08-11 Thread Norberto Meijome
On Sun, 10 Aug 2008 19:58:24 -0700 (PDT)
SoupErman [EMAIL PROTECTED] wrote:

 I needed to run a search with a query containing the word "not", so I removed
 "not" from the stopwords.txt file. That seemed to work, at least as far as
 parsing the query: it was now successfully searching for that keyword, as
 noted in the query debugger. However, it isn't returning any results where
 "not" is in the query, which suggests "not" hasn't been indexed. Yet
 looking at the listing for a particular item, "not" is listed as one of the
 keywords, so shouldn't it be finding it?

Hi Michael,
Did you reindex your documents after 1) changing your settings and 2)
restarting SOLR (so that your settings take effect)?

B
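
To illustrate why a reindex matters here, the following is a toy sketch in Python, not Solr code. It assumes a simplified analyzer (lowercase, whitespace split, stopword removal), and the names analyze, build_index and search are made up for the example. An index built while "not" was still a stopword never stored that term, so changing stopwords.txt afterwards only affects query parsing until the documents are indexed again.

```python
def analyze(text, stopwords):
    """Lowercase, split on whitespace, drop stopwords (stand-in for an analyzer chain)."""
    return [token for token in text.lower().split() if token not in stopwords]


def build_index(docs, stopwords):
    """Map each surviving term to the set of document ids that contain it."""
    index = {}
    for doc_id, text in docs.items():
        for term in analyze(text, stopwords):
            index.setdefault(term, set()).add(doc_id)
    return index


def search(index, query, stopwords):
    """Return ids of documents containing every analyzed query term."""
    terms = analyze(query, stopwords)
    if not terms:
        return set()
    return set.intersection(*[index.get(term, set()) for term in terms])


docs = {1: "to be or not to be", 2: "do not disturb"}
old_stopwords = {"to", "be", "or", "not"}
new_stopwords = {"to", "be", "or"}

# Index written before the stopword change: "not" was dropped at index time.
stale_index = build_index(docs, old_stopwords)
# Index rebuilt after the change: "not" is now stored.
fresh_index = build_index(docs, new_stopwords)

stale_hits = search(stale_index, "not", new_stopwords)
fresh_hits = search(fresh_index, "not", new_stopwords)
```

The query side already keeps "not" in both searches; only the rebuilt index can match it.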

_
{Beto|Norberto|Numard} Meijome

Real Programmers don't comment their code. If it was hard to write, it should 
be hard to understand and even harder to modify.

I speak for myself, not my employer. Contents may be hot. Slippery when wet. 
Reading disclaimers makes you go blind. Writing them is worse. You have been 
Warned.


Can't Delete Record

2008-08-11 Thread Vj Ali

Hi:
I am trying to delete from the index both by <delete><id> and by using a query. But
when I search for the record again, it still shows me the XML of the deleted
record. I also send the coomit tag as well. Why is the record not being deleted?
Please help me, it is urgent.

Regards,
Ali Vajahat
Lahore Pakistan
-- 
View this message in context: 
http://www.nabble.com/Can%27t-Delete-Record-tp18926195p18926195.html
Sent from the Solr - User mailing list archive at Nabble.com.



External Application (JIVE) : integration

2008-08-11 Thread Vicky_Dev

Hi,

I am a beginner in Solr.

Question: We have two types of search on our site.
a) The first is product search. Since all the data is within our database
(an environment we control), this type of search can be implemented using
Solr (based on a Lucene index).
b) The second is an external application / third-party search: Jive search
(community search).

For the second type of search, web services are exposed. We can pass the
search query to the external application and retrieve results from the
third party.

Now the results from both searches need to be combined and shown to the user.

For example, if the search query contains "Canon", then site search will give
site results (suppose 90 search results) and Jive search gives 30 items.

On screen we have to show the combined results, sorted by relevance. The
results shown can be interleaved: the first result record from site search,
the second and third from Jive, and the next one from site search. It depends
entirely on relevance to the search query string.

Is it possible to dynamically add third-party results to the search results
and call a function to rearrange them?

~Vikrant






Newbie question about memory allocation between solr and OS

2008-08-11 Thread Dallan Quass
Sorry for the newbie question.  When running solr under tomcat I notice that
the amount of memory tomcat uses increases over time until it reaches the
maximum limit set (with the Xms and Xmx switches) for the jvm.

Is it better to give all available physical memory to the jvm, or
to allocate enough so that solr doesn't run out of memory and let the OS use
the rest for disk buffers?  That is, will lucene take good advantage if
given extra memory, or does the extra memory end up being used for data
structures that are no longer in use but haven't been garbage-collected by
the jvm yet?

Thank you,

--dallan



Re: Newbie question about memory allocation between solr and OS

2008-08-11 Thread Yonik Seeley
On Mon, Aug 11, 2008 at 10:52 AM, Dallan Quass [EMAIL PROTECTED] wrote:
 Sorry for the newbie question.  When running solr under tomcat I notice that
 the amount of memory tomcat uses increases over time until it reaches the
 maximum limit set (with the Xms and Xmx switches) for the jvm.

 Is it better to allocate give all available physical memory to the jvm, or
 to allocate enough so that solr doesn't run out of memory and let the OS use
 the rest for disk buffers?

The latter... let the OS have as much as you can for disk buffers.

-Yonik


Re: Can't Delete Record

2008-08-11 Thread Shalin Shekhar Mangar
Hi Ali,

We can help you more if you can give us the following details.

What is the type of the id as defined in schema.xml?
What is the query you are using to delete? Does that same query show results
if you search through the admin?
Are there any exceptions in the logs?

-- 
Regards,
Shalin Shekhar Mangar.


RE: Newbie question about memory allocation between solr and OS

2008-08-11 Thread Dallan Quass
Thanks Yonik!

In case anyone monitoring this list isn't sold already on solr, my use of
solr is pretty non-standard -- I've written nearly a dozen plugins to
customize it for my particular needs.  Yet I've been able to do everything I
need using plugins and without modifying the core code.  It works like a
charm.

--dallan



Re: Newbie question about memory allocation between solr and OS

2008-08-11 Thread Shalin Shekhar Mangar
Dallan, perhaps you can share some of your experiences on this thread:

http://markmail.org/message/ksdnbkdt72ayomv3

Thanks!

-- 
Regards,
Shalin Shekhar Mangar.


Lower Case Filter Factory

2008-08-11 Thread swarag

Hi,
I am using the basic text field in schema.xml. Here is an excerpt. 

<field name="name" type="text" index="true" stored="true"
       multiValued="false" omitNorms="true"/>

and the fieldType text is as follows:

<fieldtype name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1" catenateWords="1"
            catenateNumbers="1" catenateAll="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory"
            protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1" catenateWords="0"
            catenateNumbers="0" catenateAll="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory"
            protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldtype>
 
When I query:
http://localhost:8983/solr/select?q=p*
I get results back, but when I query as 
http://localhost:8983/solr/select?q=P*

I get no results. Am I doing anything wrong?
Thanks,
Swarag
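
One likely explanation, offered as an assumption and not as a confirmed diagnosis: in Solr releases of this period, wildcard and prefix queries are, as far as I know, not passed through the field's analyzer. The LowerCaseFilterFactory therefore lowercases the indexed terms but never sees the P in P*. The toy Python sketch below mimics that mismatch; index_terms and prefix_search are invented names, and the usual workaround is to lowercase the prefix on the client before sending the query.

```python
def index_terms(values):
    """Index-time analysis stand-in: whitespace tokenize, then lowercase."""
    terms = set()
    for value in values:
        terms.update(token.lower() for token in value.split())
    return terms


def prefix_search(terms, prefix, lowercase_query=False):
    """Match indexed terms by prefix; the prefix is used exactly as typed
    unless the caller asks for it to be lowercased first."""
    if lowercase_query:
        prefix = prefix.lower()
    return sorted(term for term in terms if term.startswith(prefix))


terms = index_terms(["Paris Hotel", "Pizza Palace", "Quiet Park"])

lower_hits = prefix_search(terms, "p")                        # like q=p*
upper_hits = prefix_search(terms, "P")                        # like q=P*
fixed_hits = prefix_search(terms, "P", lowercase_query=True)  # client-side fix
```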








RE: Newbie question about memory allocation between solr and OS

2008-08-11 Thread Chris Hostetter

: solr is pretty non-standard -- I've written nearly a dozen plugins to
: customize it for my particular needs.  Yet I've been able to do everything I
: need using plugins and without modifying the core code.  It works like a
: charm.

I would *love* to hear more about your use cases for writing plugins...

http://www.nabble.com/Seeking-Anecdotes%3A-Solr-Plugins-to18601039.html#a18601039


-Hoss



Re: Snappuller ssh opening and closing multiple times

2008-08-11 Thread Chris Hostetter

: After checking message logs (/var/log/messages) we see that the
: snappuller script is opening and closing ssh sessions multiple times
: within a very short timeframe, in one case an ssh session is being
: opened and closed 21 times within .3 seconds. This means that an ssh
: login is being executed more than 3000 times daily. Without having
: looked at the scripts in-depth can somebody confirm whether this
: behavior is normal or not. Is this the cause of the rsync process? 

You don't need much depth to see that snappuller only explicitly execs ssh 
3 times per invocation.  The rest (by process of elimination) must be the 
normal behavior of the (single) call to rsync.

-Hoss



Best strategy for dates in solr-ruby

2008-08-11 Thread Ian Connor
Hi,

I originally used a Ruby Date class for my dates, but found that when I set
the type to solr.DateField in the schema.xml, it returned a parse
error. After that, I switched to Time and it worked fine.

However, I now have some dates that are out of the Time range (e.g.
1865), so Date would work better here than Time.

What is the best strategy here:
1. Use Dates and treat it as a solr.StrField;
2. Customize the Date class to output a valid solr.DateField string; or
3. Treat it as a string in ruby and handle to/from Date in my model?

-- 
Regards,

Ian Connor
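
For option 2, the string that solr.DateField accepts is the UTC form 1995-12-31T23:59:59Z. Below is a minimal sketch of the conversion, written in Python and not Ruby, so it only shows the format and is not a patch for solr-ruby. The function names are invented, and treating a bare date as midnight UTC is an assumption.

```python
from datetime import date, datetime, timezone


def to_solr_date(value):
    """Format a date or datetime as a string like 1865-04-15T00:00:00Z."""
    # datetime is a subclass of date, so it has to be checked first.
    if isinstance(value, datetime):
        if value.tzinfo is not None:
            value = value.astimezone(timezone.utc).replace(tzinfo=None)
        return value.replace(microsecond=0).isoformat() + "Z"
    # A plain date has no time part; midnight is assumed here.
    return datetime(value.year, value.month, value.day).isoformat() + "Z"


def from_solr_date(text):
    """Parse the string form back into a naive datetime (understood as UTC)."""
    return datetime.strptime(text, "%Y-%m-%dT%H:%M:%SZ")


old_date = to_solr_date(date(1865, 4, 15))
```

The same round trip in the model layer would cover option 3 as well, since the stored value is then just a string in a fixed format.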


Re: External Application (JIVE) : integration

2008-08-11 Thread Grant Ingersoll


On Aug 11, 2008, at 10:11 AM, Vicky_Dev wrote:



 Is it possible to dynamically add third party results to search results and
 call function to rearrange search results ?



Is it possible? Yes, you can just add in a SearchComponent that adds/sorts
the results from Jive into the Solr results.  Is it meaningful?
Doubtful.  What does a relevance score in Jive mean in relation to a
relevance score for site search?  I would guess nothing.


-Grant
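
A small Python sketch of the point about scores, using made-up hits and made-up score scales for both engines. Sorting on raw scores simply favors whichever engine happens to produce larger numbers. Interleaving by rank is shown only as one alternative that avoids comparing scores across engines; it is not something proposed in this thread.

```python
from itertools import zip_longest


def merge_by_raw_score(site_hits, jive_hits):
    """Naive merge: sort all hits by raw score, ignoring which engine scored them."""
    return sorted(site_hits + jive_hits, key=lambda hit: hit[1], reverse=True)


def interleave_by_rank(site_hits, jive_hits):
    """Alternate hits by their rank within each list; scores are never compared."""
    merged = []
    for site_hit, jive_hit in zip_longest(site_hits, jive_hits):
        if site_hit is not None:
            merged.append(site_hit)
        if jive_hit is not None:
            merged.append(jive_hit)
    return merged


# Hypothetical (id, score) pairs; the two scales are unrelated on purpose.
site_hits = [("site-1", 7.2), ("site-2", 5.1), ("site-3", 3.4)]
jive_hits = [("jive-1", 0.93), ("jive-2", 0.88)]

raw_order = [hit[0] for hit in merge_by_raw_score(site_hits, jive_hits)]
rank_order = [hit[0] for hit in interleave_by_rank(site_hits, jive_hits)]
```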


concurrent optimize and update

2008-08-11 Thread Jeremy Hinegardner
Hi all,

What happens internally in solr when an optimize/commit request is submitted by
one process, and some other process starts submitting Xml documents to add?  Is
this generally a safe thing to do?   

Basically I'm continually adding documents to solr, and decided that
<autocommit/> would be a good thing for me to use, so I'm using that every
25000 docs or every 15 minutes.  Now I want to do an optimize every 24 hours or
so, so I was going to cron that up, but do I also need to stop the indexing
processes from submitting XML docs to the update handler while the optimize is
taking place?

enjoy,

-jeremy

-- 

 Jeremy Hinegardner  [EMAIL PROTECTED] 



Re: concurrent optimize and update

2008-08-11 Thread Yonik Seeley
On Mon, Aug 11, 2008 at 6:16 PM, Jeremy Hinegardner
[EMAIL PROTECTED] wrote:
 What happens internally in solr when an optimize/commit request is submitted 
 by
 one process, and some other process starts submitting Xml documents to add?  
 Is
 this generally a safe thing to do?

It's safe... the adds will block until the commit or optimize has finished.

-Yonik
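
Since the adds simply block, the daily cron job only has to post an optimize message. Here is a minimal Python sketch of that request. The URL is an assumption based on the default example port and path (localhost:8983/solr), and build_update_request and send are invented helper names. Building the request is kept separate from sending it so the message can be inspected without a running server.

```python
import urllib.request

# Assumed location of the update handler in the default example setup.
SOLR_UPDATE_URL = "http://localhost:8983/solr/update"


def build_update_request(xml_message, url=SOLR_UPDATE_URL):
    """Build (but do not send) an HTTP POST carrying an XML update message."""
    return urllib.request.Request(
        url,
        data=xml_message.encode("utf-8"),
        headers={"Content-Type": "text/xml; charset=utf-8"},
    )


def send(request, timeout=3600):
    """Send the request and return the raw response body.
    The generous timeout is a guess; an optimize can run for a long time."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()


optimize_request = build_update_request("<optimize/>")
```

A script run from cron would then call send(optimize_request) once a day, while the indexing processes keep posting as usual.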


Best way to index without diacritics

2008-08-11 Thread Alejandro Garza Gonzalez
I have UTF-8 content that I want to index, but I want searches
without diacritics to return results.


For example, a search for the words "nino en mexico" should return
results such as a document with the phrase "Niño en México".


Ideally, exact diacritic matches should score higher (searching for
"niño" exactly should make a document with "niño" score higher than a
document with "nino").


Any pointers on how to do this? I found out about the
solr.ISOLatin1AccentFilterFactory, but it seems to only strip
diacritics from ISO Latin characters. How about UTF diacritics?
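
Independent of which Solr filter ends up doing the work, the folding itself can be sketched with Unicode normalization: decompose each character (NFD) and drop the combining marks. The Python below is such a sketch, done on the client side. The field names text_exact and text_folded are hypothetical; they stand for keeping an unfolded copy next to the folded one so that exact matches can be weighted higher at query time.

```python
import unicodedata


def strip_diacritics(text):
    """Decompose characters (NFD) and drop combining marks, e.g. 'ñ' becomes 'n'."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def index_fields(text):
    """Return an exact field and a folded field for the same text.
    Both field names are hypothetical."""
    return {
        "text_exact": text.lower(),
        "text_folded": strip_diacritics(text).lower(),
    }


folded = strip_diacritics("Niño en México")
fields = index_fields("Niño en México")
```

Applying the same folding to the query string makes "nino en mexico" match the folded field, while "niño" can additionally match the exact field.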

--
*Ing. Alejandro Garza González*
Director, Tecnología e Innovación, Biblioteca
Tecnológico de Monterrey, Campus Monterrey

Tel.: 52(81) 8358-1400 ext. 4037 Fax: 52(81) 8328-4067
Enlace Intercampus: 80 689 4037
http://biblioteca.mty.itesm.mx

The content of this data transmission must not be considered an offer, 
proposal, understanding or agreement unless it is confirmed in a 
document signed by a legal representative of ITESM. The content of this 
data transmission is confidential and is intended to be delivered only 
to the addressees. Therefore, it shall not be distributed and/or 
disclosed through any means without the authorization of the original 
sender. If you are not the addressee, you are forbidden from using it, 
either totally or partially, for any purpose.




Highlighting Output

2008-08-11 Thread Tricia Williams

Martin,

I've been over some of the same thoughts you present here in the last 
few years.  The path of least resistance ended up being to deal with the 
highlighting portion of OCRed images outside of Solr.  That's not to say 
it couldn't or shouldn't be done differently.  I briefly even pursued a 
similar course of action evident in 
https://issues.apache.org/jira/browse/SOLR-386.  This would make it 
easier if you wanted to write your own highlighter.


I'm interested to see what others think of your suggestions.  I've 
forwarded this to the solr-user list.


Tricia

-------- Original Message --------
Subject: Highlighting Output
Date:    Mon, 11 Aug 2008 17:21:55 -0400
From:    Martin Owens [EMAIL PROTECTED]
To:      Tricia Williams [EMAIL PROTECTED], [EMAIL PROTECTED]




Hello Solr Users,

I've been thinking about the highlighting functionality in Solr. I
recently had the good fortune to be helped by Tricia Williams with
payload issues relating to highlighting.

What I see though is that the highlighting functionality is heavily tied
to the fragment (highlight context) functionality. This actually makes
it interesting to write a plain highlight method that just returns meta
data (so some other process can do the actual highlighting in some
custom fashion).

So is it worthwhile to make sure that Solr is able to do multiple
different kinds of highlighting, even if it means passing meta data back
in the request? Should we have standard ways to index and read back
payload information if we're dealing with pages, books, co-ordinates
(for highlighting images) and other meta data which is used for
highlights (char offset, term offset, et cetera)? I also noticed much of
the highlighting code to do with fragments being duplicated in custom
code.

Other thoughts? Does this make things more complex for normal
highlighting?

Best Regards, Martin Owens




number of matching documents incorrect during postOptimize

2008-08-11 Thread Tom Morton
Hi all,
   I'm trying to check that an import using the dataImportHandler was clean
before I take a snapshot of the index to be pulled via snappuller to query
nodes.  One of the checks I do is verify that a certain minimum number of
documents are returned for a query.  I do this in a script that I'm calling
via the postOptimize hook.  However, after a full import the numFound
results from the query are not accurate until after the postOptimize code
completes and so my checks are failing.

Glancing at the code this looks non-trivial to fix as the hook call is
pretty deep in the call stack.
org.apache.solr.handler.dataimport.DataImporter.doFullImport execute
eventually calls
org.apache.solr.update.UpdateHandler.callPostOptimizeCallbacks

One option would be to spawn and background a new job to check the status
with an initial sleep to wait for the postOptimize that spawned it to
finish.  This is pretty ugly and could lead to some race conditions but will
probably work.

Any better recommendations on how to achieve this functionality?

Thanks...Tom
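
A variation on the background-job idea, sketched in Python: poll a few times with a delay in place of one fixed sleep, which narrows the race window without removing it. Everything here is an assumption made for illustration. In particular, fetch_num_found is a caller-supplied function that would run the check query and return numFound, and the attempt count and delay are arbitrary.

```python
import time


def wait_for_min_docs(fetch_num_found, minimum, attempts=10,
                      delay_seconds=30.0, sleep=time.sleep):
    """Poll until fetch_num_found() reports at least `minimum` documents.

    Returns True as soon as the threshold is met, or False once the given
    number of attempts has been used up. The sleep function is injectable
    so the loop can be exercised without real waiting.
    """
    for attempt in range(attempts):
        if fetch_num_found() >= minimum:
            return True
        if attempt < attempts - 1:
            sleep(delay_seconds)
    return False
```

The postOptimize script would start this in the background and take the snapshot only when it returns True.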


Re: unique key

2008-08-11 Thread Norberto Meijome
On Wed, 6 Aug 2008 12:25:34 +1000
Norberto Meijome [EMAIL PROTECTED] wrote:

 On Tue, 5 Aug 2008 14:41:08 -0300
 Scott Swan [EMAIL PROTECTED] wrote:
 
  I currently have multiple documents that i would like to index but i would 
  like to combine two fields to produce the unique key.
  
  the documents either have 1 or the other fields so by combining the two 
  fields i will get a unique result.
  
  is this possible in the solr schema? 

 
 Hi Scott,
 you can't do that by the schema - you need to do it when you generate your 
 document, before posting it to SOLR.

Hi again,
after reading the DataImportHandler documentation, you could do this too with 
specific configuration in DIH itself. Of course, you have to be using DIH to 
load data into your SOLR ;)

B
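
For the first suggestion (building the key while generating the document, before posting it), here is a minimal Python sketch. The field names isbn and sku are hypothetical stand-ins for the two fields that each document has one or the other of, and prefixing the value with its field name is just one way to keep keys from the two fields from colliding.

```python
import xml.etree.ElementTree as ET


def composite_key(record, key_fields=("isbn", "sku")):
    """Join whichever of the key fields are present into one unique key."""
    parts = [f"{name}:{record[name]}" for name in key_fields if record.get(name)]
    if not parts:
        raise ValueError("record has none of the key fields")
    return "|".join(parts)


def to_add_message(records, key_fields=("isbn", "sku")):
    """Build an <add> update message, adding the computed key as the id field."""
    add = ET.Element("add")
    for record in records:
        doc = ET.SubElement(add, "doc")
        fields = dict(record, id=composite_key(record, key_fields))
        for name, value in fields.items():
            ET.SubElement(doc, "field", name=name).text = str(value)
    return ET.tostring(add, encoding="unicode")


message = to_add_message([{"isbn": "0131103628", "title": "C"}])
```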

_
{Beto|Norberto|Numard} Meijome

Intellectual: 'Someone who has been educated beyond his/her intelligence'
   Arthur C. Clarke, from 3001, The Final Odyssey, Sources.



Re: Can't Delete Record

2008-08-11 Thread Norberto Meijome
On Mon, 11 Aug 2008 06:48:05 -0700 (PDT)
Vj Ali [EMAIL PROTECTED] wrote:

  i also sends coomit tag as well.

Maybe you need

<commit/>

instead of coomit?


_
{Beto|Norberto|Numard} Meijome

With sufficient thrust, pigs fly just fine. However, this is not necessarily a 
good idea. It is hard to be sure where they are going to land, and it could be 
dangerous sitting under them as they fly overhead.
   [RFC1925 - section 2, subsection 3]



adds / delete within same 'transaction'..

2008-08-11 Thread Norberto Meijome
Hello :)

I *think* i know the answer, but i'd like to confirm :

Say I have 
<doc><id>1</id><name>old</name></doc>

already indexed and committed (i.e., 'live').

What happens if I issue:

<delete><id>1</id></delete>
<add><doc><id>1</id><name>new</name></doc></add>
<commit/>

Will the delete happen first, and then the add, or could it be that the add
happens before the delete, in which case I end up with no more doc id=1?

thanks!!
B
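
For reference, the shorthand above can be written out as full update messages. The Python sketch below builds them in the order they would be sent, using the field-element form of <add>. It only constructs the strings; it does not post them and does not by itself answer how Solr orders the operations internally. The helper names are invented.

```python
import xml.etree.ElementTree as ET


def delete_by_id(doc_id):
    """Build a delete-by-id message."""
    delete = ET.Element("delete")
    ET.SubElement(delete, "id").text = str(doc_id)
    return ET.tostring(delete, encoding="unicode")


def delete_by_query(query):
    """Build a delete-by-query message."""
    delete = ET.Element("delete")
    ET.SubElement(delete, "query").text = query
    return ET.tostring(delete, encoding="unicode")


def add_doc(fields):
    """Build an add message for a single document."""
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    for name, value in fields.items():
        ET.SubElement(doc, "field", name=name).text = str(value)
    return ET.tostring(add, encoding="unicode")


def commit():
    """Build a commit message."""
    return ET.tostring(ET.Element("commit"), encoding="unicode")


# The sequence from the question: delete id 1, add the new version, commit.
messages = [delete_by_id(1), add_doc({"id": 1, "name": "new"}), commit()]
```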
_
{Beto|Norberto|Numard} Meijome

Anyone who isn't confused here doesn't really understand what's going on.
