Re: DIH ConcurrentModificationException

2009-05-05 Thread Shalin Shekhar Mangar
This is fixed in trunk.

2009/5/5 Noble Paul നോബിള്‍ नोब्ळ् noble.p...@corp.aol.com

 hi Walter,
 it needs synchronization. I shall open a bug.



 On Mon, May 4, 2009 at 7:31 PM, Walter Ferrara walters...@gmail.com
 wrote:
  I've got a ConcurrentModificationException during a cron-ed delta import
 of
  DIH, I'm using multicore solr nightly from hudson 2009-04-02_08-06-47.
  I don't know if this stacktrace maybe useful to you, but here it is:
 
  java.util.ConcurrentModificationException
         at java.util.LinkedHashMap$LinkedHashIterator.nextEntry(Unknown Source)
         at java.util.LinkedHashMap$EntryIterator.next(Unknown Source)
         at java.util.LinkedHashMap$EntryIterator.next(Unknown Source)
         at org.apache.solr.handler.dataimport.DataImporter.getStatusMessages(DataImporter.java:384)
         at org.apache.solr.handler.dataimport.DataImportHandler.handleRequestBody(DataImportHandler.java:210)
         at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
         at org.apache.solr.core.SolrCore.execute(SolrCore.java:1333)
         at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:303)
         at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:232)
         at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089)
         at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365)
         at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
         at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
         at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712)
         at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
         at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211)
         at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
         at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139)
         at org.mortbay.jetty.Server.handle(Server.java:285)
         at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502)
         at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:821)
         at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:513)
         at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:208)
         at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378)
         at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:226)
         at org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442)
 
  Of course, due to the nature of this exception, I doubt it can be reproduced
  easily (this is the only one I've got, and the cron job has run a lot of
  times), but maybe a synchronized block should be put somewhere?
  ciao,
  Walter
 



 --
 -
 Noble Paul | Principal Engineer| AOL | http://aol.com




-- 
Regards,
Shalin Shekhar Mangar.


Re: Getting access to current core's conf dir

2009-05-05 Thread Shalin Shekhar Mangar
On Tue, May 5, 2009 at 11:11 AM, Amit Nithian anith...@gmail.com wrote:

 I am trying to get at the configuration directory in an implementation of
 the SolrEventListener.


Implement SolrCoreAware and use solrCore.getResourceLoader().getConfigDir()
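
A minimal sketch of what that can look like for a SolrEventListener (class name
and method bodies are illustrative; assumes the Solr 1.3-era plugin APIs):

  import org.apache.solr.common.util.NamedList;
  import org.apache.solr.core.SolrCore;
  import org.apache.solr.core.SolrEventListener;
  import org.apache.solr.search.SolrIndexSearcher;
  import org.apache.solr.util.plugin.SolrCoreAware;

  public class ConfDirAwareListener implements SolrEventListener, SolrCoreAware {
    private String configDir;

    // Called by Solr once the core is initialized; a safe place to read the conf dir.
    public void inform(SolrCore core) {
      configDir = core.getResourceLoader().getConfigDir();
    }

    public void init(NamedList args) {}

    public void postCommit() {
      // configDir is available here, e.g. for loading auxiliary files from conf/
    }

    public void newSearcher(SolrIndexSearcher newSearcher, SolrIndexSearcher currentSearcher) {}
  }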

-- 
Regards,
Shalin Shekhar Mangar.


Externalize database parameters from data-config.xml

2009-05-05 Thread con

I have a Spring-iBATIS project running in my development environment.
Now I am setting up Solr search as part of the application. Everything works
fine as expected and Solr is providing good results.

The only problem I am having is that I have to set the database parameters,
including the username and password, in the data-config.xml.
Two drawbacks of this approach are:
1) the db parameters are exposed to the outside
2) I cannot load the database params dynamically on demand (as I have multiple
databases)
I think this can be solved by using the db configuration that is defined in
the Spring configs.

Is there any way to achieve this? Or at least, can I externalise these
parameters from the data-config.xml, so that I can encrypt the password?

Thanks
con
-- 
View this message in context: 
http://www.nabble.com/Externalize-database-parameters-from-data-config.xml-tp23384483p23384483.html
Sent from the Solr - User mailing list archive at Nabble.com.



Spellcheck.build

2009-05-05 Thread Andrew McCombe
Hi

I have imported/indexed around half a million rows from my database
into Solr and then rebuilt the spellchecker.  I've also set up the
delta-import to handle any new or changed rows from the database.  Do
I need to rebuild the spellchecker each time I run the delta-import?

Regards
Andrew


Re: Spellcheck.build

2009-05-05 Thread Markus Jelsma - Buyways B.V.
Hi,


I suppose if the new records contain terms which are not yet found in
the spellcheck index/dictionary, it should be rebuilt.


Cheers,


On Tue, 2009-05-05 at 11:49 +0100, Andrew McCombe wrote:

 Hi
 
 I have imported/indexed around half a million rows from my database
 into solr and then rebuilt the spellchecker.  I've also setup the
 delta-import to handle and  new or changed rows from the database.  Do
 I need to rebuild the spellchecker each time I run the delta-import?
 
 Regards
 Andrew


Re: Externalize database parameters from data-config.xml

2009-05-05 Thread Avlesh Singh
If Solr is a part of your application, then why not have tokens in your
data-config.xml as placeholders for the db username, password, etc., which can be
replaced with the actual values as part of your project's build/deploy task?
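
A minimal sketch of that approach (the token names and the build-time filtering
step are illustrative; only the dataSource element is shown):

  <dataConfig>
    <!-- @db.url@, @db.user@ and @db.password@ are build-time tokens that the
         build/deploy task replaces with the real values -->
    <dataSource driver="oracle.jdbc.driver.OracleDriver"
                url="@db.url@"
                user="@db.user@"
                password="@db.password@"/>
  </dataConfig>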

Cheers
Avlesh

On Tue, May 5, 2009 at 3:32 PM, con convo...@gmail.com wrote:


 I have a spring-ibatis project running in my development environment.
 Now i am setting solr search as part of the application. Everything works
 fine as expected and solr is providing good results.

 The only problem i am having is that i have to set the database parameters
 including the username and password in the data-config.xml.
 Two drawbacks in this aproach is that
1) the db parameters are been exposed to outside
2) dynamically loading the database params on demand (as i have
 multiple
 databases)
 I think this can be solved by using the db configurations that is defined
 in
 the spring configs.

 Is there anyway to achieve this. Or atleast can i externalise these
 parameters from the data-config.xml, so that i can encrypt the password.

 Thanks
 con
 --
 View this message in context:
 http://www.nabble.com/Externalize-database-parameters-from-data-config.xml-tp23384483p23384483.html
 Sent from the Solr - User mailing list archive at Nabble.com.




Re: Spellcheck.build

2009-05-05 Thread Shalin Shekhar Mangar
On Tue, May 5, 2009 at 4:19 PM, Andrew McCombe eupe...@gmail.com wrote:


 I have imported/indexed around half a million rows from my database
 into solr and then rebuilt the spellchecker.  I've also setup the
 delta-import to handle and  new or changed rows from the database.  Do
 I need to rebuild the spellchecker each time I run the delta-import?


Yes, you'd need to rebuild the spellcheck index. Also look at the buildOnCommit or
buildOnOptimize configuration parameters in the spellcheck configuration.

http://wiki.apache.org/solr/SpellCheckComponent#head-4375b11a78463f5f8b70967074d0787ea3778592
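
For reference, a minimal sketch of the relevant solrconfig.xml fragment
(spellchecker name, source field and index dir are illustrative):

  <searchComponent name="spellcheck" class="solr.SpellCheckComponent">
    <lst name="spellchecker">
      <str name="name">default</str>
      <str name="field">spell</str>
      <str name="spellcheckIndexDir">./spellchecker</str>
      <!-- rebuild the dictionary automatically on every commit -->
      <str name="buildOnCommit">true</str>
      <!-- alternatively, rebuild only when the index is optimized:
           <str name="buildOnOptimize">true</str> -->
    </lst>
  </searchComponent>
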
-- 
Regards,
Shalin Shekhar Mangar.


Re: Externalize database parameters from data-config.xml

2009-05-05 Thread Noble Paul നോബിള്‍ नोब्ळ्
There are two options.

1) pass on the user name and password as request parameters and use
the request parameters in the datasource

<dataSource user="x" password="${dataimporter.request.pwd}" />
where pwd is a request parameter passed

2) if you can create JNDI datasources in the app server, use the
jndiName attribute on dataSource
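
A minimal sketch of the two data-config.xml variants (driver, URL and the JNDI
name are illustrative):

  <!-- option 1: password supplied per request,
       e.g. /dataimport?command=full-import&pwd=secret -->
  <dataSource driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost/mydb"
              user="x" password="${dataimporter.request.pwd}"/>

  <!-- option 2: credentials live in the app server's JNDI datasource -->
  <dataSource type="JdbcDataSource" jndiName="java:comp/env/jdbc/mydb"/>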



On Tue, May 5, 2009 at 3:32 PM, con convo...@gmail.com wrote:

 I have a spring-ibatis project running in my development environment.
 Now i am setting solr search as part of the application. Everything works
 fine as expected and solr is providing good results.

 The only problem i am having is that i have to set the database parameters
 including the username and password in the data-config.xml.
 Two drawbacks in this aproach is that
        1) the db parameters are been exposed to outside
        2) dynamically loading the database params on demand (as i have 
 multiple
 databases)
 I think this can be solved by using the db configurations that is defined in
 the spring configs.

 Is there anyway to achieve this. Or atleast can i externalise these
 parameters from the data-config.xml, so that i can encrypt the password.

 Thanks
 con
 --
 View this message in context: 
 http://www.nabble.com/Externalize-database-parameters-from-data-config.xml-tp23384483p23384483.html
 Sent from the Solr - User mailing list archive at Nabble.com.





-- 
-
Noble Paul | Principal Engineer| AOL | http://aol.com


Wildcard with Double Quotes Query

2009-05-05 Thread dabboo

Hi,

I am searching for "English Portal" using double quotes and I am getting all
the records which contain "English Portal" together anywhere in any
field.

For example, records are appearing which have "English Portal", "English Portal
Sacromanto", "Core English Portal", etc.

The problem is, if I am passing only "nglish Portal" then it is not returning
any of these results. Is there any way I can pass wildcards as prefix
and suffix with this search string and get the desired results?

Please suggest.

Thanks,
Amit Garg
-- 
View this message in context: 
http://www.nabble.com/Wildcard-with-Double-Quotes-Query-tp23387746p23387746.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Wildcard with Double Quotes Query

2009-05-05 Thread Erick Erickson
I don't remember the answer, but I'm sure this has been discussed
many times on the mailing list. Have you tried searching that? You're
essentially asking about wildcarded phrase queries

Best
Erick

On Tue, May 5, 2009 at 9:52 AM, dabboo ag...@sapient.com wrote:


 Hi,

 I am searching for English Portal using double quotes and I am getting
 all
 the records which contains English Portal as together anywhere in any
 field.

 for e.g. records are appearing which have, English Portal, English Portal
 Sacromanto, Core English Portal etc.

 Problem is, if I am passing only nglish Portal then it is not returning
 any of these results. Is there any way I can pass the wildcards as prefix
 and suffix with this search string and get the desired results.

 Please suggest.

 Thanks,
 Amit Garg
 --
 View this message in context:
 http://www.nabble.com/Wildcard-with-Double-Quotes-Query-tp23387746p23387746.html
 Sent from the Solr - User mailing list archive at Nabble.com.




Re: Wildcard with Double Quotes Query

2009-05-05 Thread dabboo

Hi Erick,

I searched but couldn't find anything related. I am still looking in some
threads to find out if I can get something related. I would appreciate it if you
could please provide me some pointers.

Thanks,
Amit Garg

Erick Erickson wrote:
 
 I don't remember the answer, but I'm sure this has been discussed
 many times on the mailing list. Have you tried searching that? You're
 essentially asking about wildcarded phrase queries
 
 Best
 Erick
 
 On Tue, May 5, 2009 at 9:52 AM, dabboo ag...@sapient.com wrote:
 

 Hi,

 I am searching for English Portal using double quotes and I am getting
 all
 the records which contains English Portal as together anywhere in any
 field.

 for e.g. records are appearing which have, English Portal, English Portal
 Sacromanto, Core English Portal etc.

 Problem is, if I am passing only nglish Portal then it is not returning
 any of these results. Is there any way I can pass the wildcards as prefix
 and suffix with this search string and get the desired results.

 Please suggest.

 Thanks,
 Amit Garg
 --
 View this message in context:
 http://www.nabble.com/Wildcard-with-Double-Quotes-Query-tp23387746p23387746.html
 Sent from the Solr - User mailing list archive at Nabble.com.


 
 

-- 
View this message in context: 
http://www.nabble.com/Wildcard-with-Double-Quotes-Query-tp23387746p23388979.html
Sent from the Solr - User mailing list archive at Nabble.com.



Multi-index Design

2009-05-05 Thread Chris Masters

Hi All,

I'm [still!] evaluating Solr and setting up a PoC. The requirements are to 
index the following objects:

 - people - name, status, date added, address, profile, other people specific 
fields like group...
 - organisations - name, status, date added, address, profile, other 
organisational specific fields like size...
 - products - name, status, date added, profile, other product specific fields 
like product groups..

AND...I need to isolate indexes to a number of dynamic domains (customerA, 
customerB...) that will grow over time.

So, my initial thoughts are to do the following:

 - flatten the searchable objects as much as I can - use a type field to 
distinguish - into a single index
 - use multi-core approach to segregate domains of data

So, a couple of questions on this:

 1) Is this approach/design sensible and do others use it?

 2) By flattening the data we will only index common fields; is it unreasonable 
to do a second database search and union the results when doing advanced 
searches on non-indexed fields? Do others do this?

 3) I've read that I can dynamically add a new core - this fits well with the 
ability to dynamically add new domains; how scalable is this approach? Would 
it be unreasonable to have 20-30 dynamically created cores? I guess, 
redundancy aside and given our one core per domain approach, we could easily 
spill onto other physical servers without the need for replication? 

Thanks again for your help!
rotis





Re: Wildcard with Double Quotes Query

2009-05-05 Thread dabboo

I am using a dismax request to achieve this. Though I am able to do a wildcard
search with dismax, I am not sure if I can do a wildcard within a phrase.

Please suggest.

Amit

Erick Erickson wrote:
 
 I don't remember the answer, but I'm sure this has been discussed
 many times on the mailing list. Have you tried searching that? You're
 essentially asking about wildcarded phrase queries
 
 Best
 Erick
 
 On Tue, May 5, 2009 at 9:52 AM, dabboo ag...@sapient.com wrote:
 

 Hi,

 I am searching for English Portal using double quotes and I am getting
 all
 the records which contains English Portal as together anywhere in any
 field.

 for e.g. records are appearing which have, English Portal, English Portal
 Sacromanto, Core English Portal etc.

 Problem is, if I am passing only nglish Portal then it is not returning
 any of these results. Is there any way I can pass the wildcards as prefix
 and suffix with this search string and get the desired results.

 Please suggest.

 Thanks,
 Amit Garg
 --
 View this message in context:
 http://www.nabble.com/Wildcard-with-Double-Quotes-Query-tp23387746p23387746.html
 Sent from the Solr - User mailing list archive at Nabble.com.


 
 

-- 
View this message in context: 
http://www.nabble.com/Wildcard-with-Double-Quotes-Query-tp23387746p23389978.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Multi-index Design

2009-05-05 Thread Walter Underwood
That is how we do it at Netflix. --wunder

On 5/5/09 7:59 AM, Chris Masters roti...@yahoo.com wrote:

  1) Is this approach/design sensible and do others use it?



Re: Multi-index Design

2009-05-05 Thread Walter Underwood
More precisely, we use a single core, flat schema, with a type field.

wunder

On 5/5/09 8:48 AM, Walter Underwood wunderw...@netflix.com wrote:

 That is how we do it at Netflix. --wunder
 
 On 5/5/09 7:59 AM, Chris Masters roti...@yahoo.com wrote:
 
  1) Is this approach/design sensible and do others use it?
 



Lucene/Solr Meetup / May 20th, Reston VA, 6-8:30 pm

2009-05-05 Thread Erik Hatcher

Lucene/Solr Meetup / May 20th, Reston VA, 6-8:30 pm
http://www.meetup.com/NOVA-Lucene-Solr-Meetup/

Join us for an evening of presentations and discussion on
Lucene/Solr, the Apache Open Source Search Engine/Platform, featuring:

Erik Hatcher, Lucid Imagination, Apache Lucene/Solr PMC: Solr power
your data: How to get up and running in 20 minutes or less

Ryan McKinley, Apache Lucene/Solr PMC: Geo Search with Solr and Voyager
Dan Chudnov, Library of Congress: The World Digital Library -- Solr
searches across time and space
Aaron McCurry, Near Infinity: Using Lucene as a primary store for a
structured data store that horizontally scales to billions of records


4 presentations, followed by Q&A / panel discussion.
We'll have some food and beverages.

RSVP -- seats are limited -- at http://www.meetup.com/NOVA-Lucene-Solr-Meetup/

Hosted by:  Near Infinity
Sponsored by: Lucid Imagination

Questions: ta...@lucidimagination.co


Master Slave data distribution | rsync fail issue

2009-05-05 Thread tushar kapoor

Hi,

I am facing an issue while performing snapshot pulling through the snappuller
script from the slave server.
We have a multicore setup on the master Solr and slave Solr servers.
Scenario: 2 cores are set up:
i)  CORE_WWW.ABCD.COM
ii) CORE_WWW.XYZ.COM

The rsync-enable and rsync-start scripts are run from CORE_WWW.ABCD.COM on the master
server. Thus the rsyncd.conf file got generated only on CORE_WWW.ABCD.COM,
but not on CORE_WWW.XYZ.COM.
Rsyncd.conf of CORE_WWW.ABCD.COM :
 rsyncd.conf file  
uid = webuser
gid = webuser
use chroot = no
list = no
pid file =
/opt/apache-tomcat-6.0.18/apache-solr-1.3.0/example/solr/multicore/CORE_WWW.ABCD.COM/logs/rsyncd.pid
log file =
/opt/apache-tomcat-6.0.18/apache-solr-1.3.0/example/solr/multicore/CORE_WWW.ABCD.COM/logs/rsyncd.log
[solr]
path =
/opt/apache-tomcat-6.0.18/apache-solr-1.3.0/example/solr/multicore/CORE_WWW.ABCD.COM/data
comment = Solr

An rsync error is generated while pulling the master server's
snapshot of a particular core, CORE_WWW.XYZ.COM, from the slave end; for core
CORE_WWW.ABCD.COM the snappuller completes without any error.

Also, this issue only occurs when snapshots are generated at the master end
in the way given below:
A)  Snapshots are generated automatically by
editing “${SOLR_HOME}/solr/conf/solrconfig.xml” to let either a commit
or an optimize trigger the snapshooter (search “postCommit” and
“postOptimize” to find the configuration section). 

Sample of solrconfig.xml entry on Master server End:
I)
<listener event="postCommit" class="solr.RunExecutableListener">
  <str name="exe">/opt/apache-tomcat-6.0.18/apache-solr-1.3.0/example/solr/multicore/CORE_WWW.ABCD.COM/bin/snapshooter</str>
  <str name="dir">/opt/apache-tomcat-6.0.18/apache-solr-1.3.0/example/solr/multicore/CORE_WWW.ABCD.COM/bin</str>
  <bool name="wait">true</bool>
  <arr name="args"> <str>arg1</str> <str>arg2</str> </arr>
  <arr name="env"> <str>MYVAR=val1</str> </arr>
</listener>

The same is done in the solrconfig.xml of core CORE_WWW.XYZ.COM.
II) The dataDir tag remains commented out in both cores' solrconfig.xml on the master
server.

Log sample for more clarity:
rsyncd.log of the core CORE_WWW.XYZ.COM:
2009/05/01 15:48:40 command: ./rsyncd-start
2009/05/01 15:48:40 [15064] rsyncd version 2.6.3 starting, listening on port
18983
2009/05/01 15:48:40 rsyncd started with
data_dir=/opt/apache-tomcat-6.0.18/apache-solr-1.3.0/example/solr/multicore/CORE_WWW.XYZ.COm/data
and accepting requests
2009/05/01 15:50:36 [15195] rsync on solr/snapshot.20090501153311/ from
deltrialmac.mac1.com (10.210.7.191)
2009/05/01 15:50:36 [15195] rsync: link_stat snapshot.20090501153311/. (in
solr) failed: No such file or directory (2)
2009/05/01 15:50:36 [15195] rsync error: some files could not be transferred
(code 23) at main.c(442)
2009/05/01 15:52:23 [15301] rsync on solr/snapshot.20090501155030/ from
delpearsondm.sapient.com (10.210.7.191)
2009/05/01 15:52:23 [15301] wrote 3438 bytes  read 290 bytes  total size
2779
2009/05/01 16:03:31 [15553] rsync on solr/snapshot.20090501160112/ from
deltrialmac.mac1.com (10.210.7.191)
2009/05/01 16:03:31 [15553] rsync: link_stat snapshot.20090501160112/. (in
solr) failed: No such file or directory (2)
2009/05/01 16:03:31 [15553] rsync error: some files could not be transferred
(code 23) at main.c(442)
2009/05/01 16:04:27 [15674] rsync on solr/snapshot.20090501160054/ from
deltrialmac.mac1.com (10.210.7.191)
2009/05/01 16:04:27 [15674] wrote 4173214 bytes  read 290 bytes  total size
4174633

I am unable to figure out where the /. gets appended at the end:
snapshot.20090501153311/.
Snappuller.log
2009/05/04 16:55:43 started by solrUser
2009/05/04 16:55:43 command:
/opt/apache-solr-1.3.0/example/solr/multicore/CORE_WWW.PUFFINBOOKS.CA/bin/snappuller
-u webuser
2009/05/04 16:55:52 pulling snapshot snapshot.20090504164935
2009/05/04 16:56:09 rsync failed
2009/05/04 16:56:24 failed (elapsed time: 41 sec)

Error shown on console : 
rsync: link_stat snapshot.20090504164935/. (in solr) failed: No such file
or directory (2)
client: nothing to do: perhaps you need to specify some filenames or the
--recursive option?
rsync error: some files could not be transferred (code 23) at main.c(723)

B) The same issue does not occur when manually running the snapshot script
at regular intervals on the master server and then running the snappuller
script at the slave end for multiple cores. The postCommit/postOptimize part of
solrconfig.xml has been commented out.
Here too the rsync script runs through the core CORE_WWW.ABCD.COM. Snappuller and
snapinstaller complete successfully.

Thanks in advance.

-- 
View this message in context: 
http://www.nabble.com/Master-Slave-data-distribution-%7C-rsync-fail-issue-tp23391580p23391580.html
Sent from the Solr - User mailing list archive at Nabble.com.



OutOfMemory error

2009-05-05 Thread Francis Yakin

I am having frequent OutOfMemory error on our slaves server.

SEVERE: Error during auto-warming of 
key:org.apache.solr.search.queryresult...@aca6b9cb:java.lang.OutOfMemoryError: 
allocLargeObjectOrArray - Object size: 34279632, Num elements: 8569904
SEVERE: Error during auto-warming of 
key:org.apache.solr.search.queryresult...@f9947c35:java.lang.OutOfMemoryError: 
allocLargeObjectOrArray - Object size: 34431488, Num elements: 8607868
SEVERE: Error during auto-warming of 
key:org.apache.solr.search.queryresult...@d938cfa3:java.lang.OutOfMemoryError: 
allocLargeObjectOrArray - Object size: 34431488, Num elements: 8607868
Exception in thread [ACTIVE] ExecuteThread: '2' for queue: 
'weblogic.kernel.Default (self-tuning)' java.lang.OutOfMemoryError: 
allocLargeObjectOrArray - Object size: 8208, Num elements: 8192
Exception in thread [ACTIVE] ExecuteThread: '5' for queue: 
'weblogic.kernel.Default (self-tuning)' java.lang.OutOfMemoryError: 
allocLargeObjectOrArray - Object size: 8208, Num elements: 8192
Exception in thread [ACTIVE] ExecuteThread: '8' for queue: 
'weblogic.kernel.Default (self-tuning)' java.lang.OutOfMemoryError: 
allocLargeObjectOrArray - Object size: 8208, Num elements: 8192
Exception in thread [STANDBY] ExecuteThread: '3' for queue: 
'weblogic.kernel.Default (self-tuning)' java.lang.OutOfMemoryError: 
allocLargeObjectOrArray - Object size: 8208, Num elements: 8192
Exception in thread [ACTIVE] ExecuteThread: '13' for queue: 
'weblogic.kernel.Default (self-tuning)' java.lang.OutOfMemoryError: 
allocLargeObjectOrArray - Object size: 8208, Num elements: 8192


We are running WebLogic and the Java version is 1.5.

We set the heap size to 1.5GB.

What's the recommendation for this issue?

Thanks

Francis



Upgrading from 1.2.0 to 1.3.0

2009-05-05 Thread Francis Yakin

What's the best way to upgrade Solr from 1.2.0 to 1.3.0?

We have the current index that our users search running on Solr version 1.2.0.

We would like to upgrade it to 1.3.0.

We have a master/slaves environment.

What's the best way to upgrade it without affecting search? Do we need to 
do it on the master or the slaves first?



Thanks

Francis




MoreLikeThis sort

2009-05-05 Thread Yogy Rudenko
Hello,

I am trying to sort MoreLikeThis results by a date field instead of
relevance. Regular sort parameters don't seem to have any effect on the
results and I can't find any mlt.sort or similar parameter in the MoreLikeThis
handler. My conclusion is that MoreLikeThis does not offer a sort alternative
to relevance; is that the correct conclusion?

Thanks,
Yogy


Re: OutOfMemory error

2009-05-05 Thread Erick Erickson
I'm guessing (and it's only a guess) that you have some field
that's a datestamp and that you're sorting on it in your warmup
queries??? If so, there are possibilities.

It would help a lot if you'd tell us more about the structure of
your index and what your autowarm queries look like, otherwise
there's not much information here to go on

Best
Erick


On Tue, May 5, 2009 at 1:00 PM, Francis Yakin fya...@liquid.com wrote:


 I am having frequent OutOfMemory error on our slaves server.

 SEVERE: Error during auto-warming of
 key:org.apache.solr.search.queryresult...@aca6b9cb:java.lang.OutOfMemoryError:
 allocLargeObjectOrArray - Object size: 34279632, Num elements: 8569904
 SEVERE: Error during auto-warming of
 key:org.apache.solr.search.queryresult...@f9947c35:java.lang.OutOfMemoryError:
 allocLargeObjectOrArray - Object size: 34431488, Num elements: 8607868
 SEVERE: Error during auto-warming of
 key:org.apache.solr.search.queryresult...@d938cfa3:java.lang.OutOfMemoryError:
 allocLargeObjectOrArray - Object size: 34431488, Num elements: 8607868
 Exception in thread [ACTIVE] ExecuteThread: '2' for queue:
 'weblogic.kernel.Default (self-tuning)' java.lang.OutOfMemoryError:
 allocLargeObjectOrArray - Object size: 8208, Num elements: 8192
 Exception in thread [ACTIVE] ExecuteThread: '5' for queue:
 'weblogic.kernel.Default (self-tuning)' java.lang.OutOfMemoryError:
 allocLargeObjectOrArray - Object size: 8208, Num elements: 8192
 Exception in thread [ACTIVE] ExecuteThread: '8' for queue:
 'weblogic.kernel.Default (self-tuning)' java.lang.OutOfMemoryError:
 allocLargeObjectOrArray - Object size: 8208, Num elements: 8192
 Exception in thread [STANDBY] ExecuteThread: '3' for queue:
 'weblogic.kernel.Default (self-tuning)' java.lang.OutOfMemoryError:
 allocLargeObjectOrArray - Object size: 8208, Num elements: 8192
 Exception in thread [ACTIVE] ExecuteThread: '13' for queue:
 'weblogic.kernel.Default (self-tuning)' java.lang.OutOfMemoryError:
 allocLargeObjectOrArray - Object size: 8208, Num elements: 8192


 We are running weblogic and java version is 1.5.

 We set the heap size to 1.5GB?

 What's the recommendation for this issue?

 Thanks

 Francis




Re: OutOfMemory error

2009-05-05 Thread Otis Gospodnetic

Hi Francis,

How big are your caches?  Please paste the relevant part of the config.
Which of your fields do you sort by?  Paste definitions of those fields from 
schema.xml, too.

 Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
 From: Francis Yakin fya...@liquid.com
 To: solr-user@lucene.apache.org solr-user@lucene.apache.org
 Sent: Tuesday, May 5, 2009 1:00:07 PM
 Subject: OutOfMemory error
 
 
 I am having frequent OutOfMemory error on our slaves server.
 
 SEVERE: Error during auto-warming of 
 key:org.apache.solr.search.queryresult...@aca6b9cb:java.lang.OutOfMemoryError:
  
 allocLargeObjectOrArray - Object size: 34279632, Num elements: 8569904
 SEVERE: Error during auto-warming of 
 key:org.apache.solr.search.queryresult...@f9947c35:java.lang.OutOfMemoryError:
  
 allocLargeObjectOrArray - Object size: 34431488, Num elements: 8607868
 SEVERE: Error during auto-warming of 
 key:org.apache.solr.search.queryresult...@d938cfa3:java.lang.OutOfMemoryError:
  
 allocLargeObjectOrArray - Object size: 34431488, Num elements: 8607868
 Exception in thread [ACTIVE] ExecuteThread: '2' for queue: 
 'weblogic.kernel.Default (self-tuning)' java.lang.OutOfMemoryError: 
 allocLargeObjectOrArray - Object size: 8208, Num elements: 8192
 Exception in thread [ACTIVE] ExecuteThread: '5' for queue: 
 'weblogic.kernel.Default (self-tuning)' java.lang.OutOfMemoryError: 
 allocLargeObjectOrArray - Object size: 8208, Num elements: 8192
 Exception in thread [ACTIVE] ExecuteThread: '8' for queue: 
 'weblogic.kernel.Default (self-tuning)' java.lang.OutOfMemoryError: 
 allocLargeObjectOrArray - Object size: 8208, Num elements: 8192
 Exception in thread [STANDBY] ExecuteThread: '3' for queue: 
 'weblogic.kernel.Default (self-tuning)' java.lang.OutOfMemoryError: 
 allocLargeObjectOrArray - Object size: 8208, Num elements: 8192
 Exception in thread [ACTIVE] ExecuteThread: '13' for queue: 
 'weblogic.kernel.Default (self-tuning)' java.lang.OutOfMemoryError: 
 allocLargeObjectOrArray - Object size: 8208, Num elements: 8192
 
 
 We are running weblogic and java version is 1.5.
 
 We set the heap size to 1.5GB?
 
 What's the recommendation for this issue?
 
 Thanks
 
 Francis



Re: Wildcard with Double Quotes Query

2009-05-05 Thread Otis Gospodnetic

I don't think you can do a wildcard within a phrase.  A patch for that is sitting in 
Lucene's JIRA.

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
 From: dabboo ag...@sapient.com
 To: solr-user@lucene.apache.org
 Sent: Tuesday, May 5, 2009 11:35:40 AM
 Subject: Re: Wildcard with Double Quotes Query
 
 
 I am using dismax request to achieve this. Though I am able to do wildcard
 search with dismax but I am not sure if I can do the wildcard with phrase. 
 
 Please suggest.
 
 Amit
 
 Erick Erickson wrote:
  
  I don't remember the answer, but I'm sure this has been discussed
  many times on the mailing list. Have you tried searching that? You're
  essentially asking about wildcarded phrase queries
  
  Best
  Erick
  
  On Tue, May 5, 2009 at 9:52 AM, dabboo wrote:
  
 
  Hi,
 
  I am searching for English Portal using double quotes and I am getting
  all
  the records which contains English Portal as together anywhere in any
  field.
 
  for e.g. records are appearing which have, English Portal, English Portal
  Sacromanto, Core English Portal etc.
 
  Problem is, if I am passing only nglish Portal then it is not returning
  any of these results. Is there any way I can pass the wildcards as prefix
  and suffix with this search string and get the desired results.
 
  Please suggest.
 
  Thanks,
  Amit Garg
  --
  View this message in context:
  
 http://www.nabble.com/Wildcard-with-Double-Quotes-Query-tp23387746p23387746.html
  Sent from the Solr - User mailing list archive at Nabble.com.
 
 
  
  
 
 -- 
 View this message in context: 
 http://www.nabble.com/Wildcard-with-Double-Quotes-Query-tp23387746p23389978.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: Multi-index Design

2009-05-05 Thread Otis Gospodnetic

Chris,

1) I'd put different types of data in different cores/instances, unless you 
really need to search them all together.  By using only common attributes 
you are kind of killing the richness of the data and your ability to do something 
useful with it.

2) I'd triple-check the "do a second database search and union the results when 
doing advanced searches on non-indexed fields" part if you are dealing with a 
non-trivial query rate.

3) Some people have thousands of Solr cores.  Not sure on how many machines, 
but it's all a function of data size, hardware specs, query complexity and rate.


Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
 From: Chris Masters roti...@yahoo.com
 To: solr-user@lucene.apache.org
 Sent: Tuesday, May 5, 2009 10:59:40 AM
 Subject: Multi-index Design
 
 
 Hi All,
 
 I'm [still!] evaluating Solr and setting up a PoC. The requirements are to 
 index 
 the following objects:
 
  - people - name, status, date added, address, profile, other people specific 
 fields like group...
  - organisations - name, status, date added, address, profile, other 
 organisational specific fields like size...
  - products - name, status, date added, profile, other product specific 
 fields 
 like product groups..
 
 AND...I need to isolate indexes to a number of dynamic domains (customerA, 
 customerB...) that will grow over time.
 
 So, my initial thoughts are to do the following:
 
  - flatten the searchable objects as much as I can - use a type field to 
 distinguish - into a single index
  - use multi-core approach to segregate domains of data
 
 So, a couple questions on this:
 
  1) Is this approach/design sensible and do others use it?
 
  2) By flattening the data we will only index common fields; is it 
 unreasonable 
 to do a second database search and union the results when doing advanced 
 searches on non indexed fields? Do others do this?
 
  3) I've read that I can dynamically add a new core - this fits well with the 
 ability to dynamically add new domains; how scaliable is this approach? Would 
 it 
 be unreasonable to have 20-30 dynaimically created cores? I guess, redundancy 
 aside and given our one core per domain approach, we could easily spill onto 
 other physical servers without the need for replication? 
 
 Thanks again for your help!
 rotis



Using UUID for unique key

2009-05-05 Thread vivek sar
Hi,

 I have distributed Solr instances. I'm using Java's UUID
(UUID.randomUUID()) to generate the unique id for my documents. Before
adding the unique key I was able to commit 50K records in 15 sec (pretty
constant over the growing index); after adding the unique key it's taking
over 35 sec for 50K and the time is increasing as the index size
grows. Here is my schema setting for the unique key:

<field name="id" type="string" indexed="true" stored="true"
required="true" omitNorms="true" compressed="false"/>

Why is commit taking so long? Should I not be using UUID key for
unique keys? What are other options - timestamp etc.?

Thanks,
-vivek


RE: OutOfMemory error

2009-05-05 Thread Francis Yakin

Here is the cache configuration in solrconfig.xml:


    <!-- Cache used by SolrIndexSearcher for filters (DocSets),
         unordered sets of *all* documents that match a query.
         When a new searcher is opened, its caches may be prepopulated
         or autowarmed using data from caches in the old searcher.
         autowarmCount is the number of items to prepopulate.  For LRUCache,
         the autowarmed items will be the most recently accessed items.
       Parameters:
         class - the SolrCache implementation (currently only LRUCache)
         size - the maximum number of entries in the cache
         initialSize - the initial capacity (number of entries) of
           the cache.  (seel java.util.HashMap)
         autowarmCount - the number of entries to prepopulate from
           and old cache.
         -->
    <filterCache
      class="solr.LRUCache"
      size="512"
      initialSize="512"
      autowarmCount="256"/>

    <!-- queryResultCache caches results of searches - ordered lists of
         document ids (DocList) based on a query, a sort, and the range
         of documents requested.  -->
    <queryResultCache
      class="solr.LRUCache"
      size="512"
      initialSize="512"
      autowarmCount="256"/>

    <!-- documentCache caches Lucene Document objects (the stored fields for each document).
         Since Lucene internal document ids are transient, this cache will not be autowarmed.  -->
    <documentCache
      class="solr.LRUCache"
      size="512"
      initialSize="512"
      autowarmCount="0"/>

    <!-- If true, stored fields that are not requested will be loaded lazily.

         This can result in a significant speed improvement if the usual case is to
         not load all stored fields, especially if the skipped fields are large
         compressed text fields.
    -->
    <enableLazyFieldLoading>true</enableLazyFieldLoading>

    <!-- Example of a generic cache.  These caches may be accessed by name
         through SolrIndexSearcher.getCache(),cacheLookup(), and cacheInsert().
         The purpose is to enable easy caching of user/application level data.
         The regenerator argument should be specified as an implementation
         of solr.search.CacheRegenerator if autowarming is desired.  -->
    <!--
    <cache name="myUserCache"
      class="solr.LRUCache"
      size="4096"
      initialSize="1024"
      autowarmCount="1024"
      regenerator="org.mycompany.mypackage.MyRegenerator"
      />
    -->

    <!-- An optimization that attempts to use a filter to satisfy a search.
         If the requested sort does not include score, then the filterCache
         will be checked for a filter matching the query. If found, the filter
         will be used as the source of document ids, and then the sort will be
         applied to that.
    <useFilterForSortedQuery>true</useFilterForSortedQuery>
    -->

    <!-- An optimization for use with the queryResultCache.  When a search
         is requested, a superset of the requested number of document ids
         are collected.  For example, if a search for a particular query
         requests matching documents 10 through 19, and queryWindowSize is 50,
         then documents 0 through 50 will be collected and cached.  Any further
         requests in that range can be satisfied via the cache.  -->
    <queryResultWindowSize>10</queryResultWindowSize>

    <!-- This entry enables an int hash representation for filters (DocSets)
         when the number of items in the set is less than maxSize.  For smaller
         sets, this representation is more memory efficient, more efficient to
         iterate over, and faster to take intersections.  -->
    <HashDocSet maxSize="3000" loadFactor="0.75"/>

    <!-- boolToFilterOptimizer converts boolean clauses with zero boost
         into cached filters if the number of docs selected by the clause exceeds
         the threshold (represented as a fraction of the total index) -->
    <boolTofilterOptimizer enabled="true" cacheSize="32" threshold=".05"/>

    <!-- a newSearcher event is fired whenever a new searcher is being prepared
         and there is a current searcher handling requests (aka registered). -->
    <!-- QuerySenderListener takes an array of NamedList and executes a
         local query request for each NamedList in sequence. -->
    <!--
    <listener event="newSearcher" class="solr.QuerySenderListener">


r...@solrslave06 conf]# cat solrconfig.xml  | grep -i cache
!-- Cache used by SolrIndexSearcher for filters (DocSets),
 When a new searcher is opened, its caches may be prepopulated
 or autowarmed using data from caches in the old searcher.
 autowarmCount is the number of items to prepopulate.  For LRUCache,
 class - the SolrCache implementation (currently only LRUCache)
 size - the maximum number of entries in the cache
   the cache.  (seel java.util.HashMap)
   and old cache.
filterCache
  class=solr.LRUCache
   !-- queryResultCache caches results of searches - ordered lists of
queryResultCache
  

Re: OutOfMemory error

2009-05-05 Thread Otis Gospodnetic

Hi,

Timestamp is your most likely source of the problem.  Round that as much as you 
can or use tdate field type (you'll need to grab the nightly build).  How many 
documents are in this index - 1.5GB is a relatively large heap.
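
For illustration, a minimal sketch of the rounding idea (the field name and type are
illustrative, not taken from the actual schema):

  <!-- coarser timestamps mean far fewer unique terms for the FieldCache
       that sorting and cache warming build -->
  <field name="timestamp" type="date" indexed="true" stored="true"
         default="NOW/DAY" multiValued="false"/>

Query-side date math can be rounded the same way, e.g.
timestamp:[NOW/DAY-7DAYS TO NOW/DAY].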

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
 From: Francis Yakin fya...@liquid.com
 To: solr-user@lucene.apache.org solr-user@lucene.apache.org
 Sent: Tuesday, May 5, 2009 1:50:07 PM
 Subject: RE: OutOfMemory error
 
 
 Here is cache in solrconfig.xml
 
 
 
 
   class=solr.LRUCache
   size=512
   initialSize=512
   autowarmCount=256/
 
   
 
   class=solr.LRUCache
   size=512
   initialSize=512
   autowarmCount=256/
 
   
 
   class=solr.LRUCache
   size=512
   initialSize=512
   autowarmCount=0/
 
 
 true
 
 
 
 
   
 
   
 10
 
 
 
 
 
 
 
 
 
 
 
 
 
   class=solr.LRUCache
 
 
   class=solr.LRUCache
  If the requested sort does not include score, then the filterCache
   
  into cached filters if the number of docs selected by the clause 
 exceeds
 
   1-2 for read-only slaves, higher for masters w/o cache warming. --
every xsltCacheLifetimeSeconds.
 5
 
 
 And here is in schema.xml
 
  Sort artist name used by mp3 store to sort artist title for search
 --
 
 
 
   
 omitNorms=true/
   
 multiValued=true/
 
   
   
 stored=true/
 
   
   
 stored=false/
 
   
   
 stored=false/
 
   
   
 stored=false/
 
 
   
   
 default=NOW multiValued=false/
 
 
 omitNorms=true/
 
 
 
 omitNorms=true/
 
 
 
 
 
 
 
 
 
 
 !-- Numeric field types that manipulate the value into
  a string value that isn't human-readable in its internal form,
  but with a lexicographic ordering the same as the numeric ordering,
  so that range queries work correctly. --
 
 omitNorms=true/
 
 sortMissingLast=true omitNorms=true/
 
 sortMissingLast=true omitNorms=true/
 
 sortMissingLast=true omitNorms=true/
 
 
 
 
   
 
 
 ignoreCase=true expand=true/
 
 words=stopwords.txt/
 
 generateNumberParts=1 catenateWords=1 catenateNumbers=1 
 catenateAll=0/
 
 
 protected=protwords.txt/
 
   
   
 
 
 ignoreCase=true expand=true/
 
 words=stopwords.txt/
 
 generateNumberParts=1 catenateWords=0 catenateNumbers=0 
 catenateAll=0/
 
 
 protected=protwords.txt/
 
   
 
 
 
 -Original Message-
 From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com]
 Sent: Tuesday, May 05, 2009 10:32 AM
 To: solr-user@lucene.apache.org
 Subject: Re: OutOfMemory error
 
 
 Hi Francis,
 
 How big are your caches?  Please paste the relevant part of the config.
 Which of your fields do you sort by?  Paste definitions of those fields from 
 schema.xml, too.
 
 Otis
 --
 Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
 
 
 
 - Original Message 
  From: Francis Yakin 
  To: solr-user@lucene.apache.org 
  Sent: Tuesday, May 5, 2009 1:00:07 PM
  Subject: OutOfMemory error
 
 
  I am having frequent OutOfMemory error on our slaves server.
 
  SEVERE: Error during auto-warming of
  key:org.apache.solr.search.queryresult...@aca6b9cb:java.lang.OutOfMemoryError:
  allocLargeObjectOrArray - Object size: 34279632, Num elements: 8569904
  SEVERE: Error during auto-warming of
  key:org.apache.solr.search.queryresult...@f9947c35:java.lang.OutOfMemoryError:
  allocLargeObjectOrArray - Object size: 34431488, Num elements: 8607868
  SEVERE: Error during auto-warming of
  key:org.apache.solr.search.queryresult...@d938cfa3:java.lang.OutOfMemoryError:
  allocLargeObjectOrArray - Object size: 34431488, Num elements: 8607868
  Exception in thread [ACTIVE] ExecuteThread: '2' for queue:
  'weblogic.kernel.Default (self-tuning)' java.lang.OutOfMemoryError:
  allocLargeObjectOrArray - Object size: 8208, Num elements: 8192
  Exception in thread [ACTIVE] ExecuteThread: '5' for queue:
  'weblogic.kernel.Default (self-tuning)' java.lang.OutOfMemoryError:
  allocLargeObjectOrArray - Object size: 8208, Num elements: 8192
  Exception in thread [ACTIVE] ExecuteThread: '8' for queue:
  'weblogic.kernel.Default (self-tuning)' java.lang.OutOfMemoryError:
  allocLargeObjectOrArray - Object size: 8208, Num elements: 8192
  Exception in thread [STANDBY] ExecuteThread: '3' for queue:
  'weblogic.kernel.Default (self-tuning)' java.lang.OutOfMemoryError:
  allocLargeObjectOrArray - Object size: 8208, Num elements: 8192
  Exception in thread [ACTIVE] ExecuteThread: '13' for queue:
  'weblogic.kernel.Default (self-tuning)' java.lang.OutOfMemoryError:
  allocLargeObjectOrArray - Object size: 8208, Num elements: 8192
 
 
  We are running weblogic and java version is 1.5.
 
  We set the heap size to 1.5GB?
 
  What's the 

Re: Using UUID for unique key

2009-05-05 Thread Otis Gospodnetic

You really had nothing in the uniqueKey element in schema.xml at first?  I'm not 
looking at Solr code right now, but it could be the absence of the cost of that 
lookup that made things faster.  Now you have a lookup + generation + more data 
to pass through the analyzer + write out, though I can't imagine how that would 
make things 2x slower.  You didn't say whether you cleared the old index after 
adding the UUID key; did you do that?

 Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
 From: vivek sar vivex...@gmail.com
 To: solr-user@lucene.apache.org
 Sent: Tuesday, May 5, 2009 1:49:21 PM
 Subject: Using UUID for unique key
 
 Hi,
 
 I've a distributed Solr instances. I'm using Java's UUID
 (UUID.randomUUID()) to generate the unique id for my documents. Before
 adding unique key I was able to commit 50K records in 15sec (pretty
 constant over the growing index), after adding unique key it's taking
 over 35 sec for 50k and the time is increasing as the index size
 grows. Here is my schema setting for unique key,
 
 
 required=true omitNorms=true compressed=false/
 
 Why is commit taking so long? Should I not be using UUID key for
 unique keys? What are other options - timestamp etc.?
 
 Thanks,
 -vivek



Re: Multi-index Design

2009-05-05 Thread Michael Ludwig

Chris Masters schrieb:


 - flatten the searchable objects as much as I can - use a type field
   to distinguish - into a single index
 - use multi-core approach to segregate domains of data


Some newbie questions:

(1) What is a type field? Is it to designate different types of
documents, e.g. product descriptions and forum postings?

(2) Would I include such a type field in the data I send to the update
facility and maybe configure Solr to take special action depending on
the value of the update field?

(3) Like, write the processing results to a domain dedicated to that
type of data that I could limit my search to, as per Otis' post?

(4) And is that what's called a core here?

(5) Or, failing (3), and lumping everything together in one search
domain (core?), would I use that type field to limit my search to
a particular type of data?

Michael Ludwig


Re: Using UUID for unique key

2009-05-05 Thread Yonik Seeley
On Tue, May 5, 2009 at 1:49 PM, vivek sar vivex...@gmail.com wrote:
  I've a distributed Solr instances. I'm using Java's UUID
 (UUID.randomUUID()) to generate the unique id for my documents. Before
 adding unique key I was able to commit 50K records in 15sec (pretty
 constant over the growing index), after adding unique key it's taking
 over 35 sec for 50k and the time is increasing as the index size
 grows.

Using unique keys will be slower than not using them... it's extra
work that Lucene needs to do - internally it needs to do searches on
the ids to delete any previous versions.

-Yonik
http://www.lucidimagination.com


Re: Multi-index Design

2009-05-05 Thread Matt Weber
1 - A field called "type", probably a string field, in which you index
values such as people, organization, product.


2 - Yes, for each document you are indexing, you will include its
type, i.e. person.


3, 4, 5 - You would have a core for each domain.  Each domain will
then have its own index that contains documents of all types.  See
http://wiki.apache.org/solr/MultipleIndexes.
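
A minimal sketch of the type-field part (field name and values are illustrative):

  <!-- schema.xml: one flat schema shared by every document type -->
  <field name="type" type="string" indexed="true" stored="true"/>

Each document then carries type=person, type=organisation or type=product, and a
search can be restricted with a filter query such as fq=type:person, while the
per-customer isolation comes from indexing into that customer's core, e.g.
/solr/customerA/select?q=widget&fq=type:product.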


Thanks,

Matt Weber




On May 5, 2009, at 11:14 AM, Michael Ludwig wrote:


Chris Masters schrieb:


- flatten the searchable objects as much as I can - use a type field
  to distinguish - into a single index
- use multi-core approach to segregate domains of data


Some newbie questions:

(1) What is a type field? Is it to designate different types of
documents, e.g. product descriptions and forum postings?

(2) Would I include such a type field in the data I send to the  
update

facility and maybe configure Solr to take special action depending on
the value of the update field?

(3) Like, write the processing results to a domain dedicated to that
type of data that I could limit my search to, as per Otis' post?

(4) And is that what's called a core here?

(5) Or, failing (3), and lumping everything together in one search
domain (core?), would I use that type field to limit my search to
a particular type of data?

Michael Ludwig




Re: Using UUID for unique key

2009-05-05 Thread vivek sar
I did clean up the indexes and restarted the index process from
scratch (new index file). As another test, if I use a simple numeric
counter for the unique id the index speed is fast (within 20 sec to
commit 50k records). I'm thinking UUID might not be the way to go for
the unique id - I'll look into using a sequence# instead.
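
A minimal sketch of that kind of sequence id for a distributed setup (class name
and id format are illustrative; this is plain Java, not a Solr API):

  import java.util.concurrent.atomic.AtomicLong;

  // Per-node counter prefixed with a node id so ids stay unique across instances.
  public class NodeSequenceId {
      private final String nodeId;                  // e.g. "node01", one per instance
      private final AtomicLong counter = new AtomicLong();

      public NodeSequenceId(String nodeId) {
          this.nodeId = nodeId;
      }

      public String next() {
          return nodeId + "-" + counter.incrementAndGet();
      }
  }

Ids generated this way are roughly ordered, which tends to be friendlier to the
uniqueKey lookups Yonik mentions than random UUIDs are.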

Thanks,
-vivek

On Tue, May 5, 2009 at 11:03 AM, Otis Gospodnetic
otis_gospodne...@yahoo.com wrote:

 You really had nothing in uniqueKey element in schema.xml at first?  I'm not 
 looking at Solr code right now, but it could be the lack of the cost of that 
 lookup that made things faster.  Now you have a lookup + generation + more 
 data to pass through analyzer + write out, though I can't imagine how that 
 would make things 2x slower.  You didn't say whether you cleared the old 
 index after adding UUID key did you do that?

  Otis
 --
 Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



 - Original Message 
 From: vivek sar vivex...@gmail.com
 To: solr-user@lucene.apache.org
 Sent: Tuesday, May 5, 2009 1:49:21 PM
 Subject: Using UUID for unique key

 Hi,

 I've a distributed Solr instances. I'm using Java's UUID
 (UUID.randomUUID()) to generate the unique id for my documents. Before
 adding unique key I was able to commit 50K records in 15sec (pretty
 constant over the growing index), after adding unique key it's taking
 over 35 sec for 50k and the time is increasing as the index size
 grows. Here is my schema setting for unique key,


 required=true omitNorms=true compressed=false/

 Why is commit taking so long? Should I not be using UUID key for
 unique keys? What are other options - timestamp etc.?

 Thanks,
 -vivek




Jira issue Solr-948

2009-05-05 Thread Manepalli, Kalyan
Hi all,
I was wondering if anyone had used the new helper methods in 
SolrPluginUtils added as part of 
SOLR-948 (https://issues.apache.org/jira/browse/SOLR-948).
I tried the same implementation with Solr 1.3 and everything works correctly, 
except for one issue.
In the response XML, the numFound is always 0 even though the result shows 
the docs.

I fixed this by setting the numFound on the solrDocumentList.

Let me know if anyone has come across this issue.

Thanks,
Kalyan Manepalli



RE: Multi-index Design

2009-05-05 Thread Manepalli, Kalyan
That's how we do it at Orbitz. We use a type field to separate content, review 
and promotional information in one single index. And then we use the 
last-components to plug this data together.

Only thing that we haven't yet tested is the scalability of this model, since 
our data is small.

Thanks,
Kalyan Manepalli

-Original Message-
From: Chris Masters [mailto:roti...@yahoo.com]
Sent: Tuesday, May 05, 2009 10:00 AM
To: solr-user@lucene.apache.org
Subject: Multi-index Design


Hi All,

I'm [still!] evaluating Solr and setting up a PoC. The requirements are to 
index the following objects:

 - people - name, status, date added, address, profile, other people specific 
fields like group...
 - organisations - name, status, date added, address, profile, other 
organisational specific fields like size...
 - products - name, status, date added, profile, other product specific fields 
like product groups..

AND...I need to isolate indexes to a number of dynamic domains (customerA, 
customerB...) that will grow over time.

So, my initial thoughts are to do the following:

 - flatten the searchable objects as much as I can - use a type field to 
distinguish - into a single index
 - use multi-core approach to segregate domains of data

So, a couple questions on this:

 1) Is this approach/design sensible and do others use it?

 2) By flattening the data we will only index common fields; is it unreasonable 
to do a second database search and union the results when doing advanced 
searches on non indexed fields? Do others do this?

 3) I've read that I can dynamically add a new core - this fits well with the 
ability to dynamically add new domains; how scaliable is this approach? Would 
it be unreasonable to have 20-30 dynaimically created cores? I guess, 
redundancy aside and given our one core per domain approach, we could easily 
spill onto other physical servers without the need for replication?

Thanks again for your help!
rotis





Re: Lucene/Solr Meetup / May 20th, Reston VA, 6-8:30 pm

2009-05-05 Thread Allahbaksh Asadullah
Dear Erik,
It would be great if you could upload the presentations online. It would help
all of us. And if possible, the video too.
Warm Regards,
Allahbaksh

On Tue, May 5, 2009 at 11:40 PM, Lukáš Vlček lukas.vl...@gmail.com wrote:

 Hello, any plans to upload these presentations on the web (or even better,
 release video recordings)?
 Lukas

 On Tue, May 5, 2009 at 6:49 PM, Erik Hatcher e...@ehatchersolutions.com
 wrote:

  Lucene/Solr Meetup / May 20th, Reston VA, 6-8:30 pm
  http://www.meetup.com/NOVA-Lucene-Solr-Meetup/
 
  Join us for an evening of presentations and discussion on
  Lucene/Solr, the Apache Open Source Search Engine/Platform, featuring:
 
  Erik Hatcher, Lucid Imagination, Apache Lucene/Solr PMC: Solr power your
  data: How to get up an running in 20 minutes or less
  Ryan McKinley: Apache Lucene/Solr PMC: Geo Search with Solr and Voyager
  Dan Chudnov, Library of Congress: The World Digital Library -- Solr
  searches across time and space
  Aaron McCurry, Near Infinity: Using Lucene as primary store for
 structured
  data store that horizontally scales to billions of records
 
  4 presentations, followed by QA / panel discussion.
  We'll have some food and beverages.
 
  RSVP -- seats are limited -- at
  http://www.meetup.com/NOVA-Lucene-Solr-Meetup/
 
  Hosted by:  Near Infinity
  Sponsored by: Lucid Imagination
 
  Questions: ta...@lucidimagination.co
 



 --
 http://blog.lukas-vlcek.com/




-- 
Allahbaksh Mohammedali Asadullah,
Software Engineering  Technology Labs,
Infosys Technolgies Limited, Electronic City,
Hosur Road, Bangalore 560 100, India.
(Board: 91-80-28520261 | Extn: 73927 | Direct: 41173927.
Fax: 91-80-28520362 | Mobile: 91-9845505322.