Re: How does ReplicationHandler backup work?

2009-08-29 Thread Noble Paul നോബിള്‍ नोब्ळ्
The backup command can be used on any instance (master or slave). The
backup has nothing to do with replication; replication works without
backup at all.

Either fire a fetchindex command explicitly, or use the pollInterval in
the slave to automatically fetch the index if and when it is updated.
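For reference, a sketch of the two ReplicationHandler commands being discussed (host, port, and core name are illustrative):

```
# take a snapshot (backup) of the index on any instance
http://localhost:8983/solr/core0/replication?command=backup

# explicitly pull the index from the master onto a slave
http://localhost:8983/solr/core0/replication?command=fetchindex&masterUrl=http://master:8983/solr/core0/replication
```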

On Sat, Aug 29, 2009 at 12:33 AM, vivek sar <vivex...@gmail.com> wrote:
 Hi,

  As one of our requirements, we need to back up Master indexes to Slave
 periodically. I've been able to successfully sync the index using the
 fetchindex command,

   
 http://localhost:9006/solr/audit_20090828_1/replication?command=fetchindex&masterUrl=http://localhost:8080/solr/audit_20090828_1/replication

 Now, I'm wondering how I do the backup. Looking at the wiki,
 http://wiki.apache.org/solr/SolrReplication, it seems there is a
 backup command, but that says backup happens on the Master. I tried replacing
 the command fetchindex with backup, but that didn't work. How do I do a
 complete index backup (for a particular core) from Master to Slave?

 Thanks,
 -vivek




-- 
-
Noble Paul | Principal Engineer | AOL | http://aol.com


Re: Multiple cores

2009-08-29 Thread Noble Paul നോബിള്‍ नोब्ळ्
Use the dataDir attribute in the core tag to specify the data
directory. The property element is not required.
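A minimal solr.xml sketch of that suggestion (core names and paths are illustrative):

```xml
<solr persistent="false">
  <cores adminPath="/admin/cores">
    <!-- dataDir as an attribute on the core tag, no <property> elements needed -->
    <core name="chunks" instanceDir="." dataDir="./chunks-data"/>
    <core name="meta"   instanceDir="." dataDir="./meta-data"/>
  </cores>
</solr>
```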

On Fri, Aug 28, 2009 at 11:56 PM, Paul Tomblin <ptomb...@xcski.com> wrote:
 I'm trying to instantiate multiple cores.  Since nothing is different
 between the two cores except the schema and the data dir, I was hoping
 to share the same instanceDir.  Solr seems to recognize that there are
 two cores, and gives me two different admin pages.  But unfortunately
 both the admin pages are pointing to the same data dir and same
 schema.

 My solr.xml file looks like:

 <solr persistent="false">
  <cores adminPath="/admin/cores">
    <core name="chunks" instanceDir=".">
        <property name="dataDir" value="./data"/>
        <property name="schemaName" value="schema.xml"/>
    </core>
    <core name="meta" instanceDir=".">
        <property name="dataDir" value="./meta.data/"/>
        <property name="schemaName" value="metaschema.xml"/>
    </core>
  </cores>
 </solr>

 As well as the property dataDir, I've also tried solr.data.dataDir,
 and I've also tried putting it as an attribute in the core tag, like
    <core name="meta" instanceDir="." dataDir="./meta.data"/>

 Any help?
 --
 http://www.linkedin.com/in/paultomblin




-- 
-
Noble Paul | Principal Engineer | AOL | http://aol.com


RE: Solr index - Size and indexing speed

2009-08-29 Thread engy.ali

Hi, 

Thanks for your reply.

I will work on your suggestion for using only one solr instance.

I tried to merge the 15 indexes again, and I found that the new merged
index (without optimization) was about 351 GB, but when I optimized it
the size went back up to 411 GB. Why?

I thought that optimization would decrease the index size, or at least
leave it equal to the size before optimization.



Funtick wrote:
 
 Hi,
 
 Can you try to use a single SOLR instance with heavy RAM (for instance,
 ramBufferSizeMB=8192) and mergeFactor=10? A single SOLR instance
 is fast enough (>100 client threads of Tomcat; configurable) - I usually
 prefer a single instance on a single writable box with heavy RAM allocation
 and good I/O.
 
 Merging 15 indexes into a 4-times-larger index could happen, for instance,
 because of differences between the SOLR schema and Lucene; ensure that
 the schema is the same (using Luke, for instance). SOLR 1.4 has some new
 powerful features such as a document-term cache stored somewhere
 (uninverted index) (Yonik), term vectors, stored=true, copyField, etc.
 
 Do not commit per 100 documents; commit once at the end...
 
 
 
 -Original Message-
 From: engy.ali [mailto:omeshm...@hotmail.com] 
 Sent: August-25-09 3:31 PM
 To: solr-user@lucene.apache.org
 Subject: Solr index - Size and indexing speed
 
 
  Summary
 ===
 
 I had about 120,000 objects of total size 71.2 GB; those objects are
 already indexed using Lucene. The index size is about 111 GB.
 
 I tried to use a Solr 1.4 nightly build to index the same collection. I
 divided the collection across three servers; each server had 5 Solr
 instances (not Solr cores) up and running.
 
 After the collection had been indexed, I merged the 15 indexes.
 
 Problems
 ==
 
 1. The new merged index size is about 411 GB (i.e., 4 times larger than the
 old index using Lucene).
 
 I tried to index only one object using Lucene and the same object using Solr
 to verify the size, and the result was that the new index is about twice the
 size of the old index.
 
 Do you have any idea what might be the reason?
 
 
 2. The indexing speed is slow: 100 objects on a single Solr instance were
 indexed in 1 hour, so I estimated that 1000 on a single instance could be
 done in 10 hours, but that was not the case; the indexing time exceeded the
 estimated time by about 12 hours.
 
 Might that be related to the growth of the index? If not, what might be the
 reason?
 
 Note: I do a commit per 100 objects and an optimize at the end of the whole
 operation. I also changed the mergeFactor from 10 to 15.
 
 
 3. I googled and found out that Solr uses an inverted index, but I want
 to know what the internal structure of the Solr index is; for example, if I
 have a word and its stems, how will they be stored in the index?
 
 Thanks, 
 Engy
 -- 
 View this message in context:
 http://www.nabble.com/Solr-index---Size-and-indexing-speed-tp25140702p251407
 02.html
 Sent from the Solr - User mailing list archive at Nabble.com.
 
 
 
 
 

-- 
View this message in context: 
http://www.nabble.com/Solr-index---Size-and-indexing-speed-tp25140702p25201981.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Solr index - Size and indexing speed

2009-08-29 Thread Yonik Seeley
On Sat, Aug 29, 2009 at 7:09 AM, engy.ali <omeshm...@hotmail.com> wrote:
 I thought that optimization would decrease the index size, or at least leave
 it equal to the size before optimization

Some index structures, like norms, are non-sparse.  Index one unique
field with norms and there is a byte allocated for every document in
the index.  Merge that with another index, and the size of the norms
goes to byte[maxDoc()].
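When norms aren't needed on a field (no length normalization or index-time boosts), that per-document byte can be avoided in schema.xml; a sketch (field name is illustrative):

```xml
<!-- omitNorms="true" skips the per-document norm byte for this field -->
<field name="id" type="string" indexed="true" stored="true" omitNorms="true"/>
```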

-Yonik
http://www.lucidimagination.com


Re: Solr index - Size and indexing speed

2009-08-29 Thread Yonik Seeley
On Tue, Aug 25, 2009 at 3:30 PM, engy.ali <omeshm...@hotmail.com> wrote:

  Summary
 ===

 I had about 120,000 objects of total size 71.2 GB; those objects are already
 indexed using Lucene. The index size is about 111 GB.

 I tried to use a Solr 1.4 nightly build to index the same collection. I
 divided the collection across three servers; each server had 5 Solr
 instances (not Solr cores) up and running.

 After the collection had been indexed, I merged the 15 indexes.

 Problems
 ==

 1. The new merged index size is about 411 GB (i.e., 4 times larger than the
 old index using Lucene).

 I tried to index only one object using Lucene and the same object using Solr
 to verify the size, and the result was that the new index is about twice the
 size of the old index.

 Do you have any idea what might be the reason?

Check out the schema you are using - it may contain copyFields, etc.
You should be able to get exactly the same index size as you had
with Lucene (Solr just uses Lucene for indexing, after all).
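For example, a schema.xml rule like this silently duplicates content into a second field, inflating the index (field names are illustrative):

```xml
<!-- every value added to "title" is also indexed under "text" -->
<copyField source="title" dest="text"/>
```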

-Yonik
http://www.lucidimagination.com


indexing of documents

2009-08-29 Thread manishkbawne

I am trying to index PDF and other documents, but I got this error:

 java.lang.ClassCastException:
 org.apache.solr.handler.extraction.ExtractingRequestHandler cannot be
 cast to org.apache.solr.request.SolrRequestHandler
   at org.apache.solr.core.RequestHandlers$1.create(RequestHandlers.java:154)
   at org.apache.solr.core.RequestHandlers$1.create(RequestHandlers.java:163)
   at org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:141)
   at org.apache.solr.core.RequestHandlers.initHandlersFromConfig(RequestHandlers.java:171)
   at org.apache.solr.core.SolrCore.init(SolrCore.java:535)
   at org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:122)
   at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:69)
   at org.mortbay.jetty.servlet.FilterHolder.doStart(FilterHolder.java:99)
   at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
   at org.mortbay.jetty.servlet.ServletHandler.initialize(ServletHandler.java:594)
   at org.mortbay.jetty.servlet.Context.startContext(Context.java:139)
   at org.mortbay.jetty.webapp.WebAppContext.startContext(WebAppContext.java:1218)
   at org.mortbay.jetty.handler.ContextHandler.doStart(ContextHandler.java:500)
   at org.mortbay.jetty.webapp.WebAppContext.doStart(WebAppContext.java:448)
   at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
   at org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:147)
   at org.mortbay.jetty.handler.ContextHandlerCollection.doStart(ContextHandlerCollection.java:161)
   at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
   at org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:147)
   at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
   at org.mortbay.jetty.handler.HandlerWrapper.doStart(HandlerWrapper.java:117)
   at org.mortbay.jetty.Server.doStart(Server.java:210)
   at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
   at org.mortbay.xml.XmlConfiguration.main(XmlConfiguration.java:929)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   at java.lang.reflect.Method.invoke(Method.java:616)
   at org.mortbay.start.Main.invokeMain(Main.java:183)
   at org.mortbay.start.Main.start(Main.java:497)
   at org.mortbay.start.Main.main(Main.java:115)

Can somebody please help me resolve this error?
-- 
View this message in context: 
http://www.nabble.com/indexing-of-documents-tp25203490p25203490.html
Sent from the Solr - User mailing list archive at Nabble.com.



Impact of compressed=true attribute (in schema.xml) on Indexing/Query

2009-08-29 Thread Silent Surfer
Hi,

We observed that when we use the setting compressed="true", the index size is 
around 0.66 times the size of the actual log file, whereas if we do not use the 
compressed="true" setting, the index size is almost 2.6 times as large.

Our sample Solr document size is approximately 1000 bytes. In addition to the 
text data, we have around 9 metadata tags associated with it. 

We need to display all of the metadata values in the GUI, and hence we are 
setting stored="true" in our schema.xml.

Now the question is: how does the compressed="true" flag impact the indexing and 
querying operations? I am sure that there will be CPU utilization spikes, as 
there will be compression (during indexing) and 
decompression (during querying) of the indexed data. I am mainly looking for any 
benchmarks for the above scenario.
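For reference, in Solr 1.x compression is enabled per stored field in schema.xml; a sketch (field name is illustrative):

```xml
<!-- compress the stored value; only affects fields with stored="true" -->
<field name="logline" type="text" indexed="true" stored="true" compressed="true"/>
```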

The expected volume of incoming data would be approximately 400 GB 
per day, so it is very important for us to evaluate compressed="true", 
due to file-system utilization and index-sizing issues.

Any help would be greatly appreciated..

Thanks,
sS


  



RE: Solr index - Size and indexing speed

2009-08-29 Thread Fuad Efendi
I tried to merge the 15 indexes again, and I found that the new merged
index (without optimization) was about 351 GB, but when I optimized it
the size went back up to 411 GB. Why?


Just as a sample: an IOT (index-organized table) in Oracle... 


Ok, in kids' language, what does 'optimization' mean? It means that the Map is
physically sorted by Key... For Lucene, the 'map' is 'term -> documentIDs'.

Ok, still no problem... but what if the KEY is compressed? (Or, for
instance, 'normalized', if you are still with an RDBMS.) And we need to
decompress it to unite 15 maps?

-Fuad