Re: How to make a field mandatory in the schema?

2008-08-05 Thread Koji Sekiguchi

Use required="true":

  <field name="id" type="string" indexed="true" stored="true"
         required="true" />


Koji

Gudata wrote:

Is it possible to make a field for a document mandatory in the Solr schema, or
must I validate my XML against my document's XML schema before I post it to
Solr for an update?




How to make a field mandatory in the schema?

2008-08-05 Thread Gudata

Is it possible to make a field for a document mandatory in the Solr schema, or
must I validate my XML against my document's XML schema before I post it to
Solr for an update?
-- 
View this message in context: 
http://www.nabble.com/How-to-make-a-field-mandatory-in-the-schema--tp18828259p18828259.html
Sent from the Solr - User mailing list archive at Nabble.com.



dismax bq

2008-08-05 Thread Jason Rennie
I'd like to be able to specify query term weights/boosts, which it sounds
like bq was created for.  I think my understanding from the wiki is a bit
rough, so I'm hoping I might be able to get some questions answered here.
Any thoughts/comments are much appreciated.

I initially tried simply passing a dismax-style query in bq with an empty q
param and got no results.  Is this because bq terms must specify fields?
I.e. bq=shoes won't work, but bq=title:shoes will boost docs that match
shoes in the title field?

Does bq simply add boosts to query terms?  Say my only qf is title and
q=boots&bq=shoes^0.5.  Does this translate to a lucene query of
q=title:boots^1.0+title:shoes^0.5?  If, instead, q=shoes+boots, would the
lucene query be q=title:boots^1.0+title:shoes^1.5 ?

Is it possible to negatively boost a term without completely negating it?
I.e. is it possible to do something like q=shoes&bq=bags^-1.0 ?
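To make the questions concrete, here is how I'm building the requests (the host and handler path are just placeholders; only the parameter encoding matters):

```python
from urllib.parse import urlencode

# Placeholder host/handler -- only the parameter encoding matters here.
base = "http://localhost:8983/solr/select"

# Query for "boots", with a boost query nudging up docs that also
# match "shoes" in the title field.
params = {
    "qt": "dismax",
    "qf": "title",
    "q": "boots",
    "bq": "title:shoes^0.5",  # bq with an explicit field
}
url = base + "?" + urlencode(params)
print(url)
```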

Thanks,

Jason


Re: solr 1.3 ??

2008-08-05 Thread Vicky_Dev


Thanks for the response, Norberto.

The problem is that we cannot use a non-release version when starting a new
project.

For example: if we use a method that is introduced in DataImportHandler and
that method is later removed in Solr 1.3, then we have to revise all our
code.

Can we get the Solr 1.3 release as soon as possible? Otherwise some interim
release (1.2.x) containing DataImportHandler would also be a good option.

Are we expecting the Solr 1.3 release soon?

~Vikrant 



Norberto Meijome-2 wrote:
 
 On Mon, 4 Aug 2008 21:13:09 -0700 (PDT)
 Vicky_Dev [EMAIL PROTECTED] wrote:
 
 Can we get solr 1.3 release as soon as possible? Otherwise some interim
 release (1.2.x) containing DataImportHandler will also a good option. 
 
 Any Thoughts?
 
 
 have you tried one of the nightly builds? I've been following them every so
 often... sometimes there is a problem, but hardly ever. You can find a build
 you are comfortable with, and it'll be far closer to the actual 1.3 when
 released than 1.2 is.
 
 B
 
 _
 {Beto|Norberto|Numard} Meijome
 
 Quantum Logic Chicken:
   The chicken is distributed probabilistically on all sides of the
   road until you observe it on the side of your course.
 
 I speak for myself, not my employer. Contents may be hot. Slippery when
 wet.
 Reading disclaimers makes you go blind. Writing them is worse. You have
 been
 Warned.
 
 

-- 
View this message in context: 
http://www.nabble.com/solr-1.3tp18824290p18833196.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: solr 1.3 ??

2008-08-05 Thread Shalin Shekhar Mangar
Hi Vikrant,

I would second Norberto's suggestion to use a nightly for now. Release
planning for 1.3 is underway and we are actively working towards it.
Hopefully, we should be able to get it out in a month assuming there are no
show stoppers; however, there is no hard date yet.

There have been a lot of changes in Solr since the 1.2 release, which makes
it very difficult for DataImportHandler to work with Solr 1.2. Since we are
close to a new release, I don't foresee any activity towards making
DataImportHandler compatible with 1.2. Backward-incompatible changes in the
new features introduced with 1.3 are rare even in nightly builds but yes,
they are possible. However, I don't foresee any major overhaul in
DataImportHandler right now.

On Tue, Aug 5, 2008 at 8:40 PM, Vicky_Dev [EMAIL PROTECTED]wrote:



 Thanks for the response, Norberto.

 The problem is that we cannot use a non-release version when starting a new
 project.

 For example: if we use a method that is introduced in DataImportHandler and
 that method is later removed in Solr 1.3, then we have to revise all our
 code.

 Can we get the Solr 1.3 release as soon as possible? Otherwise some interim
 release (1.2.x) containing DataImportHandler would also be a good option.

 Are we expecting the Solr 1.3 release soon?

 ~Vikrant

 --
 View this message in context:
 http://www.nabble.com/solr-1.3tp18824290p18833196.html
 Sent from the Solr - User mailing list archive at Nabble.com.




-- 
Regards,
Shalin Shekhar Mangar.


Re: Diagnostic tools

2008-08-05 Thread Yonik Seeley
On Tue, Aug 5, 2008 at 12:43 PM, Kashyap, Raghu
[EMAIL PROTECTED] wrote:
 Are there tools available to view the indexing process? We
 have a cron process which posts XML files to the solr index server.
 However, we are NOT seeing the documents posted correctly and we are
 also NOT getting any errors from the client.

You need to send a commit before index changes become visible.
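A commit is just another update message posted to the same URL the documents go to; for example (assuming the stock example setup, port 8983):

```xml
<commit/>
```

posted to http://localhost:8983/solr/update, e.g. with the post tools that ship with the example.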

-Yonik


Unlock on startup

2008-08-05 Thread sundar shankar
Hi All,
I am having to test Solr indexing quite a bit on my local and dev
environments. I had

<unlockOnStartup>true</unlockOnStartup>

But restarting my server still doesn't seem to remove the write lock file. Is
there some other configuration that I might have to do to get this fixed?
 
 
My Configurations :
 
Solr 1.3 on Windows xp(local) and RHL on dev box.
Jboss 4.05
 
Regards
Sundar
_
Searching for the best deals on travel? Visit MSN Travel.
http://msn.coxandkings.co.in/cnk/cnk.do

unique key

2008-08-05 Thread Scott Swan
I currently have multiple documents that I would like to index, and I would
like to combine two fields to produce the unique key.

Each document has one field or the other, so by combining the two fields I
will get a unique result.

Is this possible in the Solr schema?



Re: solr 1.3 ??

2008-08-05 Thread Grant Ingersoll


On Aug 5, 2008, at 11:10 AM, Vicky_Dev wrote:




Thanks for the response, Norberto.

The problem is that we cannot use a non-release version when starting a new
project.


I would think that is when you can most live with it, since you aren't  
close to production yet, but that's your call, not mine.





For example: if we use a method that is introduced in DataImportHandler and
that method is later removed in Solr 1.3, then we have to revise all our
code.

Can we get the Solr 1.3 release as soon as possible? Otherwise some interim
release (1.2.x) containing DataImportHandler would also be a good option.


The DIH is marked as experimental anyway, so just because you incorporate
it in 1.3 does not mean it isn't going to change in 1.4.


That being said, we do strive to maintain back-compatibility.

-Grant


RE: Out of memory on Solr sorting

2008-08-05 Thread sundar shankar
Hi all,
I seem to have found the solution to this problem. Apparently, allocating
enough virtual memory on the server solves only half of the problem. Even
after allocating 4 gigs of virtual memory on the JBoss server, I still got
the out of memory on sorting.

I didn't, however, notice that the LRU cache in my config was set to the
default, which was still 512 megs of max memory. I had to increase that to
around 2 gigs, and the sorting did work perfectly OK.

Even though I am satisfied that I have found the solution to the problem, I
am still not satisfied knowing that sort consumes so much memory. In no
product have I seen sorting 10 fields take up a gig and a half of virtual
memory. I am not sure if there could be a better implementation of this, but
something doesn't seem right to me.
 
Thanks for all your support. It has truly been overwhelming.
 
Sundar



 From: [EMAIL PROTECTED]
 To: solr-user@lucene.apache.org
 Subject: RE: Out of memory on Solr sorting
 Date: Tue, 29 Jul 2008 10:43:05 -0700

 A sneaky source of OutOfMemory errors is the permanent generation. If you
 add this:

 -XX:PermSize=64m -XX:MaxPermSize=96m

 you will increase the size of the permanent generation. We found this
 helped. Also note that when you undeploy a war file, the old deployment has
 permanent storage that is not reclaimed, and so each undeploy/redeploy
 cycle eats up the permanent generation pool.

 -----Original Message-----
 From: david w
 Sent: Tuesday, July 29, 2008 7:20 AM
 To: solr-user@lucene.apache.org
 Subject: Re: Out of memory on Solr sorting

 Hi, Daniel

 I got the same problem as Sundar. Is it possible to tell me what profiling
 tool you are using? Thanks a lot.

 /David

 On Tue, Jul 29, 2008 at 8:19 PM, Daniel Alheiros [EMAIL PROTECTED] wrote:

  Hi Sundar.

  Well, it would be good if you could do some profiling on your Solr app.
  I've done it during the indexing process so I could figure out what was
  going on in the OutOfMemoryErrors I was getting.

  But you won't necessarily need as much memory as your whole index size. I
  have 3.5 million documents (approx. 10Gb) running on this 2Gb heap VM.

  Cheers,
  Daniel

  -----Original Message-----
  From: sundar shankar [mailto:[EMAIL PROTECTED]
  Sent: 23 July 2008 23:45
  To: solr-user@lucene.apache.org
  Subject: RE: Out of memory on Solr sorting

  Hi Daniel,
  I am afraid that didn't solve my problem. I was guessing my problem was
  that I have too much data and too little memory allocated for it. I
  happened to read a couple of posts which mentioned that I need a VM that
  is close to the size of my data (folder). I have about 540 megs now and a
  little more than a million and a half docs. Ideally, in that case, 512
  megs should be enough for me. In fact, I am able to perform all other
  operations now -- commit, optimize, select, update, nightly cron jobs to
  index data again, etc. -- with no hassles. Even my load tests perform
  very well. Just the sort doesn't seem to work. I allocated 2 gigs of
  memory now. Still the same results. Used the GC params you gave me too.
  No change whatsoever. I am not sure what's going on. Is there something I
  can do to find out how much is needed in actuality, as my production
  server might need to be configured accordingly?

  I don't store any documents. We basically fetch standard column data from
  an Oracle database and store it in Solr fields. Before I had EdgeNGram
  configured and had Solr 1.2, my data size was less than half of what it
  is right now -- I guess, if I remember right, on the order of 100 megs.
  The max size of a field right now might not cross 100 chars either.

  Puzzled even more now.

  -Sundar

  P.S: My configurations:
  Solr 1.3
  Red Hat
  540 megs of data (1855013 docs)
  2 gigs of memory installed and allocated like this:
  JAVA_OPTS="$JAVA_OPTS -Xms2048m -Xmx2048m -XX:MinHeapFreeRatio=50
  -XX:NewSize=1024m -XX:NewRatio=2 -Dsun.rmi.dgc.client.gcInterval=360
  -Dsun.rmi.dgc.server.gcInterval=360"
  Jboss 4.05

  Subject: RE: Out of memory on Solr sorting
  Date: Wed, 23 Jul 2008 10:49:06 +0100
  From: [EMAIL PROTECTED]
  To: solr-user@lucene.apache.org

  Hi. I haven't read the whole thread so I will take my chances here.
  I've been fighting recently to keep my Solr instances stable because
  they were frequently crashing with OutOfMemoryErrors. I'm using Solr
  1.2, and when it happens there is a bug that leaves the index locked
  unless you restart Solr... So in my scenario it was extremely damaging.
  After some profiling I realized that my major problem was caused by
  the way the JVM heap was being used, as I hadn't configured it to
  run using any advanced configuration (I had just made it bigger, -Xmx
  and -Xms 1.5 Gb); it's running on Sun JVM 1.5 (the most recent 1.5
  available) and it's deployed

RE: Out of memory on Solr sorting

2008-08-05 Thread Fuad Efendi

Hi Sundar,


If increasing the LRU cache helps you:
- you are probably using a 'tokenized' field for sorting (could you
confirm please?)...

...you should use a 'non-tokenized single-valued non-boolean' field for
better sorting performance.


Fuad Efendi
==
http://www.tokenizer.org










RE: Out of memory on Solr sorting

2008-08-05 Thread sundar shankar



The field is of type text_ws. Is this not recommended? Should I use text
instead?


RE: Out of memory on Solr sorting

2008-08-05 Thread Fuad Efendi
My understanding of Lucene sorting is that it sorts by 'tokens' and not by
'full fields'... so for sorting you need a 'full-string' (non-tokenized)
field, and for searching you need another, tokenized one.

For instance, use 'string' for sorting and 'text_ws' for searching, and use
'copyField'... (copyField costs some memory)

Sorting using a tokenized field: 100,000 documents, each 'Book Title'
consisting of 10 tokens on average, gives a total of 1,000,000 (probably
unique) tokens in a hashtable; with a non-tokenized field there are 100,000
entries, and Lucene's internal FieldCache is used instead of a SOLR LRU
cache.



Also, with tokenized fields, sorting is not in natural (alphabetical) order...
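A sketch of that arrangement in schema.xml (the field names here are illustrative):

```xml
<!-- tokenized field for searching, untokenized copy for sorting -->
<field name="title"      type="text_ws" indexed="true" stored="true"/>
<field name="title_sort" type="string"  indexed="true" stored="false"/>

<copyField source="title" dest="title_sort"/>
```

Then search against title and sort with sort=title_sort+asc in the request.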


Fuad Efendi
==
http://www.linkedin.com/in/liferay



RE: Out of memory on Solr sorting

2008-08-05 Thread Fuad Efendi

Best choice for a sorting field:

<!-- This is an example of using the KeywordTokenizer along
     with various TokenFilterFactories to produce a sortable field
     that does not include some properties of the source text
  -->
<fieldType name="alphaOnlySort" class="solr.TextField"
           sortMissingLast="true" omitNorms="true">

- case-insensitive, etc...


I might be partially wrong about the SOLR LRU cache, but it is used somehow
in your specific case... 'filterCache' is probably used for 'tokenized'
sorting: it stores (token, DocList)...



Fuad Efendi
==
http://www.tokenizer.org








Re: Solr Logo thought

2008-08-05 Thread Stephen Weiss
My issue with the logos presented was they made solr look like a  
school project instead of the powerful tool that it is.  The tricked  
out font or whatever just usually doesn't play well with the business  
types... they want serious-looking software.  First impressions are  
everything.  While the fiery colors are appropriate for something  
named Solr, you can play with that without getting silly - take a look  
at:


http://www.ascsolar.com/images/asc_solar_splash_logo.gif
http://www.logostick.com/images/EOS_InvestmentingLogo_lg.gif

(Luckily there are many businesses that do solar energy!)

They have the same elements but with a certain simplicity and elegance.

I know probably some people don't care if it makes the boss or client  
happy, but, these are the kinds of seemingly insignificant things that  
make people choose a bad, proprietary piece of junk over something  
solid and open-source... it's all about appearances!  The people  
making the decision often have little else to go on, unfortunately.


--
Steve

On Aug 5, 2008, at 3:45 PM, Lukáš Vlček wrote:


Hi,

I would like to give it a shot. Are there any Solr logo success
criteria/requirements? Any hints or suggestions from the community are
welcome. Just close your eyes, start dreaming, and send me a couple of words
about what you see... I am all ears.

Also, I found that the wiki mentions some genesis
(http://wiki.apache.org/solr/FAQ#head-6d74c2bb4171b0908a4695cbb24acd368a29dc06)
of the Solar/Solr technology, but I still don't understand if the relation to
sun is intentional or a coincidence.

Regards,
Lukas

On Tue, Aug 5, 2008 at 4:08 AM, Norberto Meijome  
[EMAIL PROTECTED]wrote:



On Mon, 4 Aug 2008 09:29:30 -0700
Ryan McKinley [EMAIL PROTECTED] wrote:



If there is still room for a new logo design for Solr and the community is
open to it, then I can try to come up with some proposal. Doing the logo for
Mahout was a really interesting experience.



In my opinion, yes  I'd love to see more effort put towards  the
logo.  I have stayed out of this discussion since I don't really  
think

any of the logos under consideration are complete.  (I begged some
friends to do two of the three logos under consideration)  I would
love to refine them, but time... oooh time.


+1

If we are going to change what we have, i'd love to see some more  
options ,

or
better quality - no offence meant , but those logos aren't really  
a huge

improvement or departure from the current one.

I think whatever we change to we'll be wanting to use it for a long  
time.


B
_
{Beto|Norberto|Numard} Meijome

If you find a solution and become attached to it, the solution may  
become

your
next problem.

I speak for myself, not my employer. Contents may be hot. Slippery  
when

wet.
Reading disclaimers makes you go blind. Writing them is worse. You  
have

been
Warned.





--
http://blog.lukas-vlcek.com/




Re: Out of memory on Solr sorting

2008-08-05 Thread Yonik Seeley
On Tue, Aug 5, 2008 at 1:59 PM, Fuad Efendi [EMAIL PROTECTED] wrote:
 If increasing LRU cache helps you:
 - you are probably using 'tokenized' field for sorting (could you confirm
 please?)...

Sorting does not utilize any Solr caches.

-Yonik


Re: multivaluefield and order

2008-08-05 Thread Smiley, David W. (DSMILEY)
Yes.


On 8/5/08 4:58 PM, Ian Connor [EMAIL PROTECTED] wrote:

 Hi,
 
 When you store a multivaluefield in a given order
 ['one','two','three','four'], will it always return the values in that
 order?



Re: Out of memory on Solr sorting

2008-08-05 Thread Fuad Efendi
I know, and this is strange... I was guessing filterCache is used
implicitly to get the DocSet for a token; as Sundar wrote, increasing the
LRUCache helped him (he is sorting on a 'text_ws' field).

-Fuad

If increasing LRU cache helps you:
- you are probably using 'tokenized' field for sorting (could you confirm
please?)...


Sorting does not utilize any Solr caches.

-Yonik







RE: Out of memory on Solr sorting

2008-08-05 Thread sundar shankar
Yes, this is what I did. I got an out of memory while executing a query with
a sort param.

1. Stopped the JBoss server.

2. Changed the cache settings in solrconfig.xml:

<filterCache
  class="solr.LRUCache"
  size="2048"
  initialSize="512"
  autowarmCount="256"/>

<!-- queryResultCache caches results of searches - ordered lists of
     document ids (DocList) based on a query, a sort, and the range of
     documents requested. -->
<queryResultCache
  class="solr.LRUCache"
  size="2048"
  initialSize="512"
  autowarmCount="256"/>

<!-- documentCache caches Lucene Document objects (the stored fields for each
     document). Since Lucene internal document ids are transient, this cache
     will not be autowarmed. -->
<documentCache
  class="solr.LRUCache"
  size="2048"
  initialSize="512"
  autowarmCount="0"/>

In these 3 caches, I changed size from 512 to 2048.

3. Restarted the server.
4. Ran the query again.

It worked just fine after that. I am currently reindexing, replacing text_ws
with string and keeping the default size of all 3 caches at 512, to see if
the problem goes away.

-Sundar




Re: multivaluefield and order

2008-08-05 Thread Ian Connor
Thanks for the quick reply.

I was searching for multivalued field, multi value, order, and position
and didn't find the answer.

However, with this little bit of keyword loading, the next person to
search will be all good.

Order IS preserved when storing multivalued fields in Solr and Lucene.

On Tue, Aug 5, 2008 at 4:59 PM, Smiley, David W. (DSMILEY)
[EMAIL PROTECTED] wrote:
 Yes.


 On 8/5/08 4:58 PM, Ian Connor [EMAIL PROTECTED] wrote:

 Hi,

 When you store a multivaluefield in a given order
 ['one','two','three','four'], will it always return the values in that
 order?





-- 
Regards,

Ian Connor


Re: Indexing time boosts on particular field

2008-08-05 Thread Erick Erickson
I think you want to boost specific clauses at *search* time, not
index time. Something like adding a clause
+CourseType:MATHMATICS^10

Best
Erick

On Tue, Aug 5, 2008 at 4:35 PM, Vicky_Dev [EMAIL PROTECTED]wrote:


 Hi

 Requirement: for a given search, if the results contain documents with
 course type MATHMATICS, then the search results should show the MATHMATICS
 documents before any other documents.

 Course Type will be one of the fields when creating the Solr index.

 I have gone through the index-time boost documentation. It says:

 An index-time boost on a value of a multiValued field applies to all values
 for that field and not to individual values.

 Is there any way to boost course type MATHMATICS documents at index
 time?

 Thanks in advance
 ~Vikrant



 --
 View this message in context:
 http://www.nabble.com/Indexing-time-boosts-on-particular-field-tp18839400p18839400.html
 Sent from the Solr - User mailing list archive at Nabble.com.



RE: Out of memory on Solr sorting

2008-08-05 Thread Fuad Efendi
Sundar, it is very strange that increasing the size/initialSize of the
LRUCache helps with the OutOfMemoryError...

2048 is the number of entries in the cache, _not_ 2GB of memory...

Making size == initialSize of the HashMap-based LRUCache would help with
performance anyway; maybe with OOMs too (probably no need to resize the
HashMap...)





In these 3 params, I changed size from 512 to 2048. 3. Restarted the server





_
Searching for the best deals on travel? Visit MSN Travel.
http://msn.coxandkings.co.in/cnk/cnk.do






config reload JMX capabilities

2008-08-05 Thread Kashyap, Raghu
One of the requirements we have is that when we deploy new data for solr
config (synonyms, dictionary etc) we should NOT be restarting the solr
instances for the changes to take effect. 

Are there ConfigReload capabilities through JMX that can help us do
this?

Thanks in Advance

 

-Raghu



RE: Diagnostic tools

2008-08-05 Thread Kashyap, Raghu
Yes we are sending the commits.

-Raghu

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Yonik
Seeley
Sent: Tuesday, August 05, 2008 12:01 PM
To: solr-user@lucene.apache.org
Subject: Re: Diagnostic tools

On Tue, Aug 5, 2008 at 12:43 PM, Kashyap, Raghu
[EMAIL PROTECTED] wrote:
 Are there tools available to view the indexing process? We
 have a cron process which posts XML files to the solr index server.
 However, we are NOT seeing the documents posted correctly and we are
 also NOT getting any errors from the client.

You need to send a commit before index changes become visible.

-Yonik


RE: Out of memory on Solr sorting

2008-08-05 Thread sundar shankar
Oh wow, I didn't know that was the case. I am completely baffled now. Back
to square one, I guess. :)


Re: Sum of one field

2008-08-05 Thread Leonardo Dias

Hello, Otis!

I believe the best approach would be hacking the SolrIndexSearcher in 
our case. Let me explain further what we want to know with a Car ad 
website example.


Imagine that you have a website called CarStores and that you let people 
search by brand, sorting by price etc.


So I'm looking for a Ferrari. CarStore says that there are 5 ads for 
Ferrari, but one ad has 2 Ferraris being sold, the other ad has 3 
Ferraris and all the others have 1 Ferrari each, meaning that there are 
5 ads and 8 Ferraris. And yes, I'm doing an example with Fibonacci 
numbers. ;)


Since I believe this could be a solution not only for us, maybe it's a
simple feature SOLR could have embedded in its code base.

If you guys think this is a good idea, please let me know. I believe it
would be very useful to let people understand what they are finding when
they search.


Best,

Leonardo.
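For what it's worth, the docID-to-quantity map Otis suggested can be sketched like this (the numbers reuse the Ferrari example; the doc IDs and the way the map is filled are illustrative assumptions):

```python
# Built once per index version: quantity value for every Lucene doc ID.
# (In practice this would be filled by reading the stored "quantity"
# field for all documents in the index.)
quantity_by_docid = {0: 2, 1: 3, 2: 1, 3: 1, 4: 1}

def total_quantity(matching_docids):
    """Sum the quantity field over the doc IDs a query matched."""
    return sum(quantity_by_docid[d] for d in matching_docids)

# 5 matching ads -> 8 Ferraris, as in the example above.
print(total_quantity([0, 1, 2, 3, 4]))  # 8
```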

Otis Gospodnetic escreveu:

Leonardo,
You'd have to read the quantity field for all matching documents one way or
the other.
One way is to get all results and pull that field out, so you can compute
the sum.
Another way is to hack SolrIndexSearcher and accumulate this value in one of
the HitCollector collect method calls.
Another possibility, if your index is fairly static, might be to read all
documents' (not just matches') quantity field and store it in a
docID->quantity map structure that lets you look up the quantity for any
docID you want.


There may be other/better ways of doing this, but this is what comes to (my) 
mind first.

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
  

From: Leonardo Dias [EMAIL PROTECTED]
To: solr-user@lucene.apache.org
Sent: Monday, August 4, 2008 1:19:45 PM
Subject: Sum of one field

Everyone displays "your search for x has returned y results" at the top
of the results page, but we need something else, which would be
something like "your search for x returned y results in z records",
z being the numDocs of the SOLR response and y a SUM(quantity) of all
returned records.


In SQL you can do something like:

SELECT count(1), sum(quantity) FROM table

But with SOLR we don't know how we can do the same without having to
return the entire XML result for the field quantity and then sum it to
show the total. Any hints on how to do it in a better way?




cheers,

Leonardo




  




Re: Unlock on startup

2008-08-05 Thread Koji Sekiguchi

Try:

<lockType>single</lockType>

Koji

sundar shankar wrote:

Hi All,
I am having to test Solr indexing quite a bit on my local and dev
environments. I had

<unlockOnStartup>true</unlockOnStartup>

But restarting my server still doesn't seem to remove the write lock file.
Is there some other configuration that I might have to do to get this fixed?

My Configurations:

Solr 1.3 on Windows XP (local) and RHL on the dev box.
Jboss 4.05

Regards,
Sundar




Re: Diagnostic tools

2008-08-05 Thread Norberto Meijome
On Tue, 5 Aug 2008 11:43:44 -0500
Kashyap, Raghu [EMAIL PROTECTED] wrote:

 Hi,

Hi Kashyap,
please don't hijack topic threads.

http://en.wikipedia.org/wiki/Thread_hijacking

thanks!!
B
_
{Beto|Norberto|Numard} Meijome

Software QA is like cleaning my cat's litter box: Sift out the big chunks. Stir 
in the rest. Hope it doesn't stink.

I speak for myself, not my employer. Contents may be hot. Slippery when wet. 
Reading disclaimers makes you go blind. Writing them is worse. You have been 
Warned.


Re: unique key

2008-08-05 Thread Norberto Meijome
On Tue, 5 Aug 2008 14:41:08 -0300
Scott Swan [EMAIL PROTECTED] wrote:

 I currently have multiple documents that I would like to index, and I would
 like to combine two fields to produce the unique key.

 Each document has one field or the other, so by combining the two fields I
 will get a unique result.

 Is this possible in the Solr schema?

Hi Scott,
you can't do that in the schema - you need to do it when you generate your
document, before posting it to SOLR.
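Concretely, that can be as simple as deriving the key while building each document; a sketch (the field names are invented for illustration):

```python
def make_unique_key(doc):
    """Combine two alternative ID fields into one uniqueKey value."""
    # Each document carries field_a or field_b; prefixing with the field
    # name keeps the two ID namespaces from colliding.
    if "field_a" in doc:
        return "a:%s" % doc["field_a"]
    if "field_b" in doc:
        return "b:%s" % doc["field_b"]
    raise ValueError("document has neither field_a nor field_b")

doc = {"field_b": 42, "title": "example"}
doc["id"] = make_unique_key(doc)  # posted to SOLR as the uniqueKey
print(doc["id"])
```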

btw, please don't hijack topic threads.

http://en.wikipedia.org/wiki/Thread_hijacking

thanks!!
B
_
{Beto|Norberto|Numard} Meijome

Law of Conservation of Perversity: 
  we can't make something simpler without making something else more complex

I speak for myself, not my employer. Contents may be hot. Slippery when wet. 
Reading disclaimers makes you go blind. Writing them is worse. You have been 
Warned.


Re: Sum of one field

2008-08-05 Thread Norberto Meijome
On Tue, 05 Aug 2008 18:58:42 -0300
Leonardo Dias [EMAIL PROTECTED] wrote:

 So I'm looking for a Ferrari. CarStore says that there are 5 ads for 
 Ferrari, but one ad has 2 Ferraris being sold, the other ad has 3 
 Ferraris and all the others have 1 Ferrari each, meaning that there are 
 5 ads and 8 Ferraris. And yes, I'm doing an example with Fibonacci 
 numbers. ;)

why not create one separate document per car? It'll make it easier (for the
client) to manage, too, when one of the cars is sold but not the other 4.

B
_
{Beto|Norberto|Numard} Meijome

With sufficient thrust, pigs fly just fine. However, this is not necessarily a 
good idea. It is hard to be sure where they are going to land, and it could be 
dangerous sitting under them as they fly overhead.
   [RFC1925 - section 2, subsection 3]

I speak for myself, not my employer. Contents may be hot. Slippery when wet. 
Reading disclaimers makes you go blind. Writing them is worse. You have been 
Warned.


Re: config reload JMX capabilities

2008-08-05 Thread Shalin Shekhar Mangar
On Wed, Aug 6, 2008 at 3:09 AM, Kashyap, Raghu [EMAIL PROTECTED]wrote:


 Are there ConfigReload capabilities through JMX that can help us do
 this?


No, only statistics are exposed through JMX at present.

SOLR-561 enables support for automatic config file replication to slaves
without downtime. However, a lot of work is left in that.

https://issues.apache.org/jira/browse/SOLR-561

-- 
Regards,
Shalin Shekhar Mangar.