How to index large set data

2009-05-21 Thread Jianbin Dai

Hi,

I have about 45GB of XML files to be indexed. I am using DataImportHandler. I 
started the full import 4 hours ago, and it's still running.
My computer has 4GB of memory. Any suggestions?
Thanks!

JB


  



Phrase Search Issue

2009-05-21 Thread dabboo

Hi,

I am facing an issue with phrase queries. I am entering 'Top of the world' as
my search criteria. I am expecting it to return all the records in which one
field contains all of these words, in any order.

But it is treating them as OR'ed terms and returning all the records that
contain any of these words. I am doing this using a dismax request.

I would appreciate it if somebody could provide me with some pointers.

Thanks,
Amit Garg
-- 
View this message in context: 
http://www.nabble.com/Phrase-Search-Issue-tp23648813p23648813.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Phrase Search Issue

2009-05-21 Thread dabboo

This problem is related to the default operator in dismax. Currently OR is
the default operator and it is behaving perfectly fine. I have changed the
default operator in schema.xml to AND, and I have also changed the minimum
match to 100%.

But it seems like AND as the default operator doesn't work with dismax.
Please suggest.

Thanks,
Amit Garg



dabboo wrote:
 
 Hi,
 
 I am facing an issue with phrase queries. I am entering 'Top of the world' as
 my search criteria. I am expecting it to return all the records in which one
 field contains all of these words, in any order.
 
 But it is treating them as OR'ed terms and returning all the records that
 contain any of these words. I am doing this using a dismax request.
 
 I would appreciate it if somebody could provide me with some pointers.
 
 Thanks,
 Amit Garg
 

-- 
View this message in context: 
http://www.nabble.com/Phrase-Search-Issue-tp23648813p23649189.html
Sent from the Solr - User mailing list archive at Nabble.com.



what does the version parameter in the query mean?

2009-05-21 Thread Anshuman Manur
Hello all,

I'm using Solr 1.3.0, and when I query my index for "solr" using the admin
page, the query string in the address bar of my browser reads like this:

http://localhost:8080/solr/select/?q=solr&version=2.2&start=0&rows=10&indent=on

Now, I don't know what version=2.2 means, and neither the wiki nor the docs
tell me. Could someone enlighten me?

Thank You
Anshuman Manur


Re: How to change the weight of the fields ?

2009-05-21 Thread Vincent Pérès

It seems I can only search on the field 'text'. With the following URL:
http://localhost:8983/solr/select/?q=novel&qt=dismax&fl=title_s,id&version=2.2&start=0&rows=10&indent=on&debugQuery=on

I get answers, but in the debug area it seems it's only searching on the
'text' field (with or without 'qt', the results are returned in the same
order):

<lst name="debug">
<str name="rawquerystring">novel</str>
<str name="querystring">novel</str>
<str name="parsedquery">
+DisjunctionMaxQuery((text:novel^0.5 | title_s:novel^5.0 | id:novel^10.0)~0.01) ()
</str>
<str name="parsedquery_toString">
+(text:novel^0.5 | title_s:novel^5.0 | id:novel^10.0)~0.01 ()
</str>
<lst name="explain">
<str name="33395">

0.014641666 = (MATCH) sum of:
  0.014641666 = (MATCH) max plus 0.01 times others of:
    0.014641666 = (MATCH) weight(text:novel^0.5 in 114927), product of:
      0.01362607 = queryWeight(text:novel^0.5), product of:
        0.5 = boost
        3.4734163 = idf(docFreq=10634, numDocs=43213)
        0.007845918 = queryNorm
      1.0745333 = (MATCH) fieldWeight(text:novel in 114927), product of:
        1.4142135 = tf(termFreq(text:novel)=2)
        3.4734163 = idf(docFreq=10634, numDocs=43213)
        0.21875 = fieldNorm(field=text, doc=114927)
</str>
etc.

Shouldn't I also see explain entries below for the term matched against
'title_s' and 'id'?

Thanks for your answers !
Vincent
-- 
View this message in context: 
http://www.nabble.com/How-to-change-the-weight-of-the-fields---tp23619971p23649624.html
Sent from the Solr - User mailing list archive at Nabble.com.



Strange Phrase Query Issue with Dismax

2009-05-21 Thread dabboo

Hi,

I am facing a very strange issue in Solr; I'm not sure if it is a known bug.

If I search for 'Top 500', it returns all the records that contain either of
these words anywhere, which is fine.

But if I search for 'Top 500 Companies', it gives me only those records that
contain all 3 words in any one of the fields, irrespective of sequence. In
this case, it is not returning the records that contain only some of these
words (which actually is my requirement).

Please suggest.

Thanks,
Amit Garg
-- 
View this message in context: 
http://www.nabble.com/Strange-Phrase-Query-Issue-with-Dismax-tp23650114p23650114.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: java.lang.RuntimeException: after flush: fdx size mismatch

2009-05-21 Thread Michael McCandless
On Wed, May 20, 2009 at 11:18 AM, James X
hello.nigerian.spamm...@gmail.com wrote:
 Hi Mike, thanks for the quick response:

 $ java -version
 java version 1.6.0_11
 Java(TM) SE Runtime Environment (build 1.6.0_11-b03)
 Java HotSpot(TM) 64-Bit Server VM (build 11.0-b16, mixed mode)

 I hadn't noticed the 268m trigger for LUCENE-1521 - I'm definitely not
 hitting that yet!

The issue didn't spell this out very well -- I've added a comment.

 The exception always reports 0 length, but the number of docs varies,
 heavily weighted towards 1 or 2 docs. Of the last 130 or so exceptions:
     89 1 docs vs 0 length
     20 2 docs vs 0 length
      9 3 docs vs 0 length
      1 4 docs vs 0 length
      3 5 docs vs 0 length
      2 6 docs vs 0 length
      1 7 docs vs 0 length
      1 9 docs vs 0 length
      1 10 docs vs 0 length

Hmm... odd that it's always 0 file length.  What filesystem & IO
devices is the index being written to?

 The only unusual thing I can think of that we're doing with Solr is
 aggressively CREATE-ing and UNLOAD-ing cores. I've not been able to spot a
 pattern between core admin operations and these exceptions, however...

I think from Lucene's standpoint this just means creating & closing
lots of IndexWriters?  (Which should be just fine.)

What are your documents like?  Ie, how many and what type of fields?
Are you adding docs from multiple threads?  (Solr would do so, I
believe, so I guess: is your client that's submitting docs to a given
core, doing so with multiple threads?).

Mike


Re: java.lang.RuntimeException: after flush: fdx size mismatch

2009-05-21 Thread Michael McCandless
Another question: are there any other exceptions in your logs?  Eg
problems adding certain documents, or anything?

Mike

On Wed, May 20, 2009 at 11:18 AM, James X
hello.nigerian.spamm...@gmail.com wrote:
 Hi Mike, thanks for the quick response:

 $ java -version
 java version 1.6.0_11
 Java(TM) SE Runtime Environment (build 1.6.0_11-b03)
 Java HotSpot(TM) 64-Bit Server VM (build 11.0-b16, mixed mode)

 I hadn't noticed the 268m trigger for LUCENE-1521 - I'm definitely not
 hitting that yet!

 The exception always reports 0 length, but the number of docs varies,
 heavily weighted towards 1 or 2 docs. Of the last 130 or so exceptions:
     89 1 docs vs 0 length
     20 2 docs vs 0 length
      9 3 docs vs 0 length
      1 4 docs vs 0 length
      3 5 docs vs 0 length
      2 6 docs vs 0 length
      1 7 docs vs 0 length
      1 9 docs vs 0 length
      1 10 docs vs 0 length

 The only unusual thing I can think of that we're doing with Solr is
 aggressively CREATE-ing and UNLOAD-ing cores. I've not been able to spot a
 pattern between core admin operations and these exceptions, however...

 James

 On Wed, May 20, 2009 at 2:37 AM, Michael McCandless 
 luc...@mikemccandless.com wrote:

 Hmm... somehow Lucene is flushing a new segment on closing the
 IndexWriter, and thinks 1 doc had been added to the stored fields
  file, yet the fdx file is the wrong size (0 bytes).  This check (&
  exception) are designed to prevent corruption from entering the index,
  so it's at least good to see CheckIndex passes after this.

 I don't think you're hitting LUCENE-1521: that issue only happens if a
 single segment has more than ~268 million docs.

 Which exact JRE version are you using?

 When you hit this exception, is it always 1 docs vs 0 length in bytes?

 Mike

 On Wed, May 20, 2009 at 3:19 AM, James X
 hello.nigerian.spamm...@gmail.com wrote:
  Hello all,I'm running Solr 1.3 in a multi-core environment. There are up
 to
  2000 active cores in each Solr webapp instance at any given time.
 
  I've noticed occasional errors such as:
  SEVERE: java.lang.RuntimeException: after flush: fdx size mismatch: 1
 docs
  vs 0 length in bytes of _h.fdx
         at
 
 org.apache.lucene.index.StoredFieldsWriter.closeDocStore(StoredFieldsWriter.java:94)
         at
 
 org.apache.lucene.index.DocFieldConsumers.closeDocStore(DocFieldConsumers.java:83)
         at
 
 org.apache.lucene.index.DocFieldProcessor.closeDocStore(DocFieldProcessor.java:47)
         at
 
 org.apache.lucene.index.DocumentsWriter.closeDocStore(DocumentsWriter.java:367)
         at
  org.apache.lucene.index.DocumentsWriter.flush(DocumentsWriter.java:567)
         at
  org.apache.lucene.index.IndexWriter.doFlush(IndexWriter.java:3540)
         at
 org.apache.lucene.index.IndexWriter.flush(IndexWriter.java:3450)
         at
  org.apache.lucene.index.IndexWriter.closeInternal(IndexWriter.java:1638)
         at
 org.apache.lucene.index.IndexWriter.close(IndexWriter.java:1602)
         at
 org.apache.lucene.index.IndexWriter.close(IndexWriter.java:1578)
         at
  org.apache.solr.update.SolrIndexWriter.close(SolrIndexWriter.java:153)
 
  during commit / optimise operations.
 
  These errors then cause cascading errors during updates on the offending
  cores:
  SEVERE: org.apache.lucene.store.LockObtainFailedException: Lock obtain
 timed
  out: SingleInstanceLock: write.lock
         at org.apache.lucene.store.Lock.obtain(Lock.java:85)
         at org.apache.lucene.index.IndexWriter.init(IndexWriter.java:1070)
         at
 org.apache.lucene.index.IndexWriter.init(IndexWriter.java:924)
         at
  org.apache.solr.update.SolrIndexWriter.init(SolrIndexWriter.java:116)
         at
 
 org.apache.solr.update.UpdateHandler.createMainIndexWriter(UpdateHandler.java:122)
 
  This looks like http://issues.apache.org/jira/browse/LUCENE-1521, but
 when I
  upgraded Lucene to 2.4.1 under Solr 1.3, the issue still remains.
 
  CheckIndex doesn't find any problems with the index, and problems
 disappear
  after an (inconvenient, for me) restart of Solr.
 
  Firstly, as the symptoms are so close to those in 1521, can I check that my
  Lucene upgrade method should work:
  - unzip the Solr 1.3 war
  - remove the Lucene 2.4dev jars
  (lucene-core, lucene-spellchecker, lucene-snowball, lucene-queries,
  lucene-memory,lucene-highlighter, lucene-analyzers)
  - move in the Lucene 2.4.1 jars
  - rezip the directory structures as solr.war.
 
  I think this has worked, as solr/default/admin/registry.jsp shows:
   <lucene-spec-version>2.4.1</lucene-spec-version>
   <lucene-impl-version>2.4.1 750176 - 2009-03-04 21:56:52</lucene-impl-version>
 
  Secondly, if this Lucene fix isn't the right solution to this problem,
 can
  anyone suggest an alternative approach? The only problems I've had up to
 now
  is to do with the number of allowed file handles, which was fixed by
  changing limits.conf (RHEL machine).
 
  Many thanks!
  James
 




Re: java.lang.RuntimeException: after flush: fdx size mismatch

2009-05-21 Thread Michael McCandless
If you're able to run a patched version of Lucene, can you apply the
attached patch, run it, get the issue to happen again, and post back
the resulting exception?

It only adds further diagnostics to that RuntimeException you're hitting.

Another thing to try is turning on assertions, which may very well
catch the issue sooner.

Mike

On Wed, May 20, 2009 at 11:18 AM, James X
hello.nigerian.spamm...@gmail.com wrote:
 Hi Mike, thanks for the quick response:

 $ java -version
 java version 1.6.0_11
 Java(TM) SE Runtime Environment (build 1.6.0_11-b03)
 Java HotSpot(TM) 64-Bit Server VM (build 11.0-b16, mixed mode)

 I hadn't noticed the 268m trigger for LUCENE-1521 - I'm definitely not
 hitting that yet!

 The exception always reports 0 length, but the number of docs varies,
 heavily weighted towards 1 or 2 docs. Of the last 130 or so exceptions:
     89 1 docs vs 0 length
     20 2 docs vs 0 length
      9 3 docs vs 0 length
      1 4 docs vs 0 length
      3 5 docs vs 0 length
      2 6 docs vs 0 length
      1 7 docs vs 0 length
      1 9 docs vs 0 length
      1 10 docs vs 0 length

 The only unusual thing I can think of that we're doing with Solr is
 aggressively CREATE-ing and UNLOAD-ing cores. I've not been able to spot a
 pattern between core admin operations and these exceptions, however...

 James

 On Wed, May 20, 2009 at 2:37 AM, Michael McCandless 
 luc...@mikemccandless.com wrote:

 Hmm... somehow Lucene is flushing a new segment on closing the
 IndexWriter, and thinks 1 doc had been added to the stored fields
 file, yet the fdx file is the wrong size (0 bytes).  This check (
 exception) are designed to prevent corruption from entering the index,
 so it's at least good to see CheckIndex passes after this.

 I don't think you're hitting LUCENE-1521: that issue only happens if a
 single segment has more than ~268 million docs.

 Which exact JRE version are you using?

 When you hit this exception, is it always 1 docs vs 0 length in bytes?

 Mike

 On Wed, May 20, 2009 at 3:19 AM, James X
 hello.nigerian.spamm...@gmail.com wrote:
  Hello all,I'm running Solr 1.3 in a multi-core environment. There are up
 to
  2000 active cores in each Solr webapp instance at any given time.
 
  I've noticed occasional errors such as:
  SEVERE: java.lang.RuntimeException: after flush: fdx size mismatch: 1
 docs
  vs 0 length in bytes of _h.fdx
         at
 
 org.apache.lucene.index.StoredFieldsWriter.closeDocStore(StoredFieldsWriter.java:94)
         at
 
 org.apache.lucene.index.DocFieldConsumers.closeDocStore(DocFieldConsumers.java:83)
         at
 
 org.apache.lucene.index.DocFieldProcessor.closeDocStore(DocFieldProcessor.java:47)
         at
 
 org.apache.lucene.index.DocumentsWriter.closeDocStore(DocumentsWriter.java:367)
         at
  org.apache.lucene.index.DocumentsWriter.flush(DocumentsWriter.java:567)
         at
  org.apache.lucene.index.IndexWriter.doFlush(IndexWriter.java:3540)
         at
 org.apache.lucene.index.IndexWriter.flush(IndexWriter.java:3450)
         at
  org.apache.lucene.index.IndexWriter.closeInternal(IndexWriter.java:1638)
         at
 org.apache.lucene.index.IndexWriter.close(IndexWriter.java:1602)
         at
 org.apache.lucene.index.IndexWriter.close(IndexWriter.java:1578)
         at
  org.apache.solr.update.SolrIndexWriter.close(SolrIndexWriter.java:153)
 
  during commit / optimise operations.
 
  These errors then cause cascading errors during updates on the offending
  cores:
  SEVERE: org.apache.lucene.store.LockObtainFailedException: Lock obtain
 timed
  out: SingleInstanceLock: write.lock
         at org.apache.lucene.store.Lock.obtain(Lock.java:85)
         at org.apache.lucene.index.IndexWriter.init(IndexWriter.java:1070)
         at
 org.apache.lucene.index.IndexWriter.init(IndexWriter.java:924)
         at
  org.apache.solr.update.SolrIndexWriter.init(SolrIndexWriter.java:116)
         at
 
 org.apache.solr.update.UpdateHandler.createMainIndexWriter(UpdateHandler.java:122)
 
  This looks like http://issues.apache.org/jira/browse/LUCENE-1521, but
 when I
  upgraded Lucene to 2.4.1 under Solr 1.3, the issue still remains.
 
  CheckIndex doesn't find any problems with the index, and problems
 disappear
  after an (inconvenient, for me) restart of Solr.
 
  Firstly, can I as the symptoms are so close to those in 1521, can I check
 my
  Lucene upgrade method should work:
  - unzip the Solr 1.3 war
  - remove the Lucene 2.4dev jars
  (lucene-core, lucene-spellchecker, lucene-snowball, lucene-queries,
  lucene-memory,lucene-highlighter, lucene-analyzers)
  - move in the Lucene 2.4.1 jars
  - rezip the directory structures as solr.war.
 
  I think this has worked, as solr/default/admin/registry.jsp shows:
   <lucene-spec-version>2.4.1</lucene-spec-version>
   <lucene-impl-version>2.4.1 750176 - 2009-03-04 21:56:52</lucene-impl-version>
 
  Secondly, if this Lucene fix isn't the right solution to this problem,
 can
  anyone suggest an alternative approach? The only problems I've 

Re: best way to cache base queries (before application of filters)

2009-05-21 Thread Yonik Seeley
On Thu, May 21, 2009 at 3:30 AM, Kent Fitch kent.fi...@gmail.com wrote:
  #2) Your problem might be able to be solved with field collapsing on
  the category field in the future (but it's not in Solr yet).
 Sorry - I didn't understand this

A single relevancy search, but group or collapse results based on the
value of the category field such that you don't get more than 10
results for each value of category.

but it's not in Solr yet...
http://issues.apache.org/jira/browse/SOLR-236

 - we've got one query we want filtered 5 ways to find the top scoring
 results matching the query and each filter

The problem is that caching the base query involves caching not only
all of the matching documents, but the score for each document.
That's expensive.

You could also write your own HitCollector that filtered the results
of the base query 5 different ways simultaneously.

-Yonik
http://www.lucidimagination.com
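
A minimal sketch of the HitCollector idea Yonik mentions, written against the
Lucene 2.4 API bundled with Solr 1.3. The five per-category BitSets and how
you obtain them are assumptions for illustration, not part of his answer:

import java.util.BitSet;
import org.apache.lucene.search.HitCollector;

// Walks the base query's hits once and tracks, per filter, the
// highest-scoring document that falls inside that filter.
public class MultiFilterCollector extends HitCollector {
  private final BitSet[] filters;   // one precomputed doc-id set per category
  private final int[] bestDoc;
  private final float[] bestScore;

  public MultiFilterCollector(BitSet[] filters) {
    this.filters = filters;
    this.bestDoc = new int[filters.length];
    this.bestScore = new float[filters.length];
  }

  public void collect(int doc, float score) {
    for (int i = 0; i < filters.length; i++) {
      if (filters[i].get(doc) && score > bestScore[i]) {
        bestScore[i] = score;
        bestDoc[i] = doc;
      }
    }
  }

  public int bestDocFor(int filter) { return bestDoc[filter]; }
}

// usage: searcher.search(baseQuery, new MultiFilterCollector(fiveFilters));
// Keeping a small bounded priority queue per filter, instead of a single
// best doc, would give the top-N results for each of the five filters.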


Re: master/slave failure scenario

2009-05-21 Thread nk 11
Just curious. What would be the disadvantages of a no-replication / multi-
master (no slave) setup?
The client code would have to send the updates to every master of course, but
if one machine failed I could immediately continue the indexing process, and
I could also query the index on any machine for a valid result.
I might be missing something...
On Thu, May 14, 2009 at 4:19 PM, nk 11 nick.cass...@gmail.com wrote:

 wow! that was just a couple of days old!
 thanks as lot!
   2009/5/14 Noble Paul നോബിള്‍ नोब्ळ् noble.p...@corp.aol.com

 yeah there is a hack

 https://issues.apache.org/jira/browse/SOLR-1154?focusedCommentId=12708316&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12708316

 On Thu, May 14, 2009 at 6:07 PM, nk 11 nick.cass...@gmail.com wrote:
  sorry for the mail. I wanted to hit reply :(
 
  On Thu, May 14, 2009 at 3:37 PM, nk 11 nick.cass...@gmail.com wrote:
 
  oh, so the configuration must be manualy changed?
  Can't something be passed at (re)start time?
 
  2009/5/14 Noble Paul നോബിള്‍ नोब्ळ् noble.p...@corp.aol.com
 
  On Thu, May 14, 2009 at 4:07 PM, nk 11 nick.cass...@gmail.com
 wrote:
   Ok so the VIP will point to the new master. but what makes a slave
   promoted
   to a master? Only the fact that it will receive add/update requests?
   And I suppose that this hot promotion is possible only if the
 slave
   is
   convigured as master also...
  right.. By default you can setup all slaves to be master also. It does
  not cost anything if it is not serving any requests.
 
  so , if you have such a setting you will have to disable that slave to
  be a slave and restart it and you will have to make the VIP point to
  this new slave as master.
 
  so hot promotion is still not possible.
  
   2009/5/14 Noble Paul നോബിള്‍ नोब्ळ् noble.p...@corp.aol.com
  
   ideally , we don't do that.
   you can just keep the master host behind a VIP so if you wish to
   change the master make the VIP point to the new host
  
   On Wed, May 13, 2009 at 10:52 PM, nk 11 nick.cass...@gmail.com
   wrote:
This is more interesting.Such a procedure would involve taking
 down
and
reconfiguring the slave?
   
On Wed, May 13, 2009 at 7:55 PM, Bryan Talbot
btal...@aeriagames.comwrote:
   
Or ...
   
1. Promote existing slave to new master
2. Add new slave to cluster
   
   
   
   
-Bryan
   
   
   
   
   
On May 13, 2009, at May 13, 9:48 AM, Jay Hill wrote:
   
 - Migrate configuration files from old master (or backup) to
 new
master.
- Replicate from a slave to the new master.
- Resume indexing to new master.
   
-Jay
   
On Wed, May 13, 2009 at 4:26 AM, nk 11 nick.cass...@gmail.com
 
wrote:
   
 Nice.
What if the master fails permanently (like a disk crash...)
 and
the
new
master is a clean machine?
2009/5/13 Noble Paul നോബിള്‍ नोब्ळ् noble.p...@corp.aol.com
   
 On Wed, May 13, 2009 at 12:10 PM, nk 11 
 nick.cass...@gmail.com
wrote:
   
Hello
   
I'm kind of new to Solr and I've read about replication, and
the
fact
   
that a
   
node can act as both master and slave.
I a replica fails and then comes back on line I suppose that
 it
will
   
resyncs
   
with the master.
   
right
   
   
But what happnes if the master fails? A slave that is
configured as
   
master
   
will kick in? What if that slave is not yes fully sync'ed
 with
the
   
failed
   
master and has old data?
   
if the master fails you can't index the data. but the slaves
will
continue serving the requests with the last index. You an
 bring
back
the master up and resume indexing.
   
   
What happens when the original master comes back on line? He
will
   
remain
   
a
   
slave because there is another node with the master role?
   
Thank you!
   
   
   
   
--
-
Noble Paul | Principal Engineer| AOL | http://aol.com
   
   
   
   
   
  
  
  
   --
   -
   Noble Paul | Principal Engineer| AOL | http://aol.com
  
  
 
 
 
  --
  -
  Noble Paul | Principal Engineer| AOL | http://aol.com
 
 
 



 --
  -
 Noble Paul | Principal Engineer| AOL | http://aol.com





Customizing SOLR-236 field collapsing

2009-05-21 Thread Marc Sturlese

Hey there,
I have been testing the latest adjacent field collapsing patch in trunk and
it seems to work perfectly. I am trying to modify how it works but don't
know exactly how to do it. What I would like to do is, instead of collapsing
the results, send them to the end of the results queue.
Apparently it is not possible to do that due to the way it is implemented. I
have noticed that you get a DocSet of the ids that survived the collapsing
and that match the query and filters (collapseFilterDocSet =
collapseFilter.getDocSet();), which you get in CollapseComponent.java.
Once that is done, the search is executed again, this time with the DocSet
obtained before passed as a filter:

DocListAndSet results = searcher.getDocListAndSet(rb.getQuery(),
                                                  collapseFilterDocSet == null ? rb.getFilters() : null,
                                                  collapseFilterDocSet,
                                                  rb.getSortSpec().getSort(),
                                                  rb.getSortSpec().getOffset(),
                                                  rb.getSortSpec().getCount(),
                                                  rb.getFieldFlags());

The result of this search gives you the final result (with the correct
offset and start).
I have thought about saving the collapsed docs in another DocSet and doing
something with them afterwards... but I don't know how to manage it.
Any clue about how I could reach the goal?
Thanks in advance
-- 
View this message in context: 
http://www.nabble.com/Customizing-SOLR-236-field-collapsing-tp23653220p23653220.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: How to index large set data

2009-05-21 Thread Erick Erickson
This isn't much data to go on. Do you have any idea what your throughput is?
How many documents are you indexing? One 45G doc or 4.5 billion 10-character
docs?
Have you looked at any profiling data to see how much memory is being
consumed?
Are you IO bound or CPU bound?

Best
Erick

On Thu, May 21, 2009 at 2:18 AM, Jianbin Dai djian...@yahoo.com wrote:


 Hi,

 I have about 45GB of XML files to be indexed. I am using DataImportHandler. I
 started the full import 4 hours ago, and it's still running.
 My computer has 4GB of memory. Any suggestions?
 Thanks!

 JB
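
Not an answer from the thread itself, but one quick way to get at Erick's
throughput question is to index a batch of small test documents through SolrJ
(bundled with Solr 1.3) and time it. The server URL and field names below are
assumptions for illustration only:

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class ThroughputTest {
  public static void main(String[] args) throws Exception {
    SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
    int batch = 10000;
    long start = System.currentTimeMillis();
    for (int i = 0; i < batch; i++) {
      SolrInputDocument doc = new SolrInputDocument();
      doc.addField("id", "test-" + i);                    // assumes an 'id' uniqueKey
      doc.addField("text", "sample document number " + i);
      server.add(doc);
    }
    server.commit();
    long elapsed = System.currentTimeMillis() - start;
    System.out.println(batch * 1000.0 / elapsed + " docs/sec");
  }
}

Comparing that number against the DataImportHandler run gives a rough idea of
whether the bottleneck is Solr itself or the XML parsing/transformation side.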







Re: Plugin Not Found

2009-05-21 Thread Jeff Newburn
Nothing else is in the lib directory but this one jar.

Additionally, the logs seem to say that it finds the lib as shown below
INFO: Solr home set to '/home/zetasolr/'
May 20, 2009 10:16:56 AM org.apache.solr.core.SolrResourceLoader
createClassLoader
INFO: Adding 'file:/home/zetasolr/lib/FacetCubeComponent.jar' to Solr
classloader

However as soon as it tries the component it cannot find the class.

-- 
Jeff Newburn
Software Engineer, Zappos.com
jnewb...@zappos.com - 702-943-7562


 From: Noble Paul നോബിള്‍  नोब्ळ् noble.p...@corp.aol.com
 Reply-To: solr-user@lucene.apache.org
 Date: Thu, 21 May 2009 10:19:19 +0530
 To: solr-user@lucene.apache.org
 Subject: Re: Plugin Not Found
 
 what else is there in the solr.home/lib other than this component?
 
 On Wed, May 20, 2009 at 9:08 PM, Jeff Newburn jnewb...@zappos.com wrote:
 I tried to change the package name to com.zappos.solr.
 
 When I declared the search component with:
  <searchComponent name="facetcube" class="com.zappos.solr.FacetCubeComponent"/>
 
 I get:
 SEVERE: org.apache.solr.common.SolrException: Unknown Search Component:
 facetcube
    at org.apache.solr.core.SolrCore.getSearchComponent(SolrCore.java:874)
    at
 org.apache.solr.handler.component.SearchHandler.inform(SearchHandler.java:12
 7)
    at
 
 
 When I declare the component with solr.FacetCubeComponent I get the same
 error message.
 
 When we turned on trace we got the same exception plus
 Caused by: java.lang.ClassNotFoundException:
 com.zappos.solr.FacetCubeComponent
    at
 org.apache.catalina.loader.WebappClassLoader.loadClass(WebappClassLoader.jav
 a:1360)
    at
 org.apache.catalina.loader.WebappClassLoader.loadClass(WebappClassLoader.jav
 a:1206)
    at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:319)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:247)
    at
 org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:29
 4)
    ... 27 more
 
 
 
 --
 Jeff Newburn
 Software Engineer, Zappos.com
 jnewb...@zappos.com - 702-943-7562
 
 
 From: Grant Ingersoll gsing...@apache.org
 Reply-To: solr-user@lucene.apache.org
 Date: Wed, 20 May 2009 10:38:30 -0400
 To: solr-user@lucene.apache.org
 Subject: Re: Plugin Not Found
 
 Just a wild guess here, but...
 
 Try doing one of two things:
 1. change the package name to be something other than o.a.s
 2. Change your config to use solr.FacetCubeComponent
 
 You might also try turning on trace level logging for the
 SolrResourceLoader and report back the output.
 
 -Grant
 
 On May 20, 2009, at 10:20 AM, Jeff Newburn wrote:
 
 Error is below. This error does not appear when I manually copy the
 jar file
 into the tomcat webapp directory only when I try to put it in the
 solr.home
 lib directory.
 
 SEVERE: org.apache.solr.common.SolrException: Error loading class
 'org.apache.solr.handler.component.FacetCubeComponent'
    at
 org
 .apache
 .solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:31
 0)
    at
 org
 .apache
 .solr.core.SolrResourceLoader.newInstance(SolrResourceLoader.java:
 325)
    at
 org
 .apache
 .solr.util.plugin.AbstractPluginLoader.create(AbstractPluginLoader
 .java:84)
    at
 org
 .apache
 .solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.j
 ava:141)
    at
 org.apache.solr.core.SolrCore.loadSearchComponents(SolrCore.java:841)
    at org.apache.solr.core.SolrCore.init(SolrCore.java:528)
    at org.apache.solr.core.CoreContainer.create(CoreContainer.java:
 350)
    at org.apache.solr.core.CoreContainer.load(CoreContainer.java:227)
    at
 org.apache.solr.core.CoreContainer
 $Initializer.initialize(CoreContainer.java
 :107)
    at
 org
 .apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:
 69)
    at
 org
 .apache
 .catalina.core.ApplicationFilterConfig.getFilter(ApplicationFilter
 Config.java:275)
    at
 org
 .apache
 .catalina.core.ApplicationFilterConfig.setFilterDef(ApplicationFil
 terConfig.java:397)
    at
 org
 .apache
 .catalina.core.ApplicationFilterConfig.init(ApplicationFilterCon
 fig.java:108)
    at
 org
 .apache
 .catalina.core.StandardContext.filterStart(StandardContext.java:37
 09)
    at
 org.apache.catalina.core.StandardContext.start(StandardContext.java:
 4356)
    at
 org
 .apache
 .catalina.core.ContainerBase.addChildInternal(ContainerBase.java:7
 91)
    at
 org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:
 771)
    at
 org.apache.catalina.core.StandardHost.addChild(StandardHost.java:525)
    at
 org.apache.catalina.startup.HostConfig.deployWAR(HostConfig.java:829)
    at
 org.apache.catalina.startup.HostConfig.deployWARs(HostConfig.java:718)
    at
 org.apache.catalina.startup.HostConfig.deployApps(HostConfig.java:490)
    at org.apache.catalina.startup.HostConfig.start(HostConfig.java:
 1147)
    at
 org
 .apache.catalina.startup.HostConfig.lifecycleEvent(HostConfig.java:
 311)
    at
 org
 .apache
 .catalina.util.LifecycleSupport.fireLifecycleEvent(LifecycleSuppor
 

Re: master/slave failure scenario

2009-05-21 Thread Bryan Talbot
Indexing is usually much more expensive than replication, so it won't  
scale well as you add more servers.  Also, what would a client do if  
it was able to send the update to only some of the servers because  
others were down (for maintenance, etc.)?




-Bryan




On May 21, 2009, at May 21, 6:04 AM, nk 11 wrote:

Just curious. What would be the disadvantages of a no replication /  
multi

master (no slave) setup?
The client code should do the updates for evey master ofc, but if one
machine would fail then I can imediatly continue the indexing  
process and

also I can query the index on any machine for a valid result.
I might be missing something...
On Thu, May 14, 2009 at 4:19 PM, nk 11 nick.cass...@gmail.com wrote:


wow! that was just a couple of days old!
thanks as lot!
 2009/5/14 Noble Paul നോബിള്‍ नोब्ळ्  
noble.p...@corp.aol.com



yeah there is a hack

https://issues.apache.org/jira/browse/SOLR-1154?focusedCommentId=12708316page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel 
#action_12708316


On Thu, May 14, 2009 at 6:07 PM, nk 11 nick.cass...@gmail.com  
wrote:

sorry for the mail. I wanted to hit reply :(

On Thu, May 14, 2009 at 3:37 PM, nk 11 nick.cass...@gmail.com  
wrote:


oh, so the configuration must be manualy changed?
Can't something be passed at (re)start time?

2009/5/14 Noble Paul നോബിള്‍ नोब्ळ्  
noble.p...@corp.aol.com


On Thu, May 14, 2009 at 4:07 PM, nk 11 nick.cass...@gmail.com

wrote:
Ok so the VIP will point to the new master. but what makes a  
slave

promoted
to a master? Only the fact that it will receive add/update  
requests?

And I suppose that this hot promotion is possible only if the

slave

is
convigured as master also...
right.. By default you can setup all slaves to be master also.  
It does

not cost anything if it is not serving any requests.

so , if you have such a setting you will have to disable that  
slave to
be a slave and restart it and you will have to make the VIP  
point to

this new slave as master.

so hot promotion is still not possible.


2009/5/14 Noble Paul നോബിള്‍ नोब्ळ्  
noble.p...@corp.aol.com


ideally , we don't do that.
you can just keep the master host behind a VIP so if you wish  
to

change the master make the VIP point to the new host

On Wed, May 13, 2009 at 10:52 PM, nk 11 nick.cass. 
1...@gmail.com

wrote:

This is more interesting.Such a procedure would involve taking

down

and
reconfiguring the slave?

On Wed, May 13, 2009 at 7:55 PM, Bryan Talbot
btal...@aeriagames.comwrote:


Or ...

1. Promote existing slave to new master
2. Add new slave to cluster




-Bryan





On May 13, 2009, at May 13, 9:48 AM, Jay Hill wrote:

- Migrate configuration files from old master (or backup) to

new

master.

- Replicate from a slave to the new master.
- Resume indexing to new master.

-Jay

On Wed, May 13, 2009 at 4:26 AM, nk 11 nick.cass...@gmail.com



wrote:

Nice.

What if the master fails permanently (like a disk crash...)

and

the
new
master is a clean machine?
2009/5/13 Noble Paul നോബിള്‍ नो 
ब्ळ् noble.p...@corp.aol.com


On Wed, May 13, 2009 at 12:10 PM, nk 11 

nick.cass...@gmail.com

wrote:



Hello

I'm kind of new to Solr and I've read about  
replication, and

the
fact


that a


node can act as both master and slave.
I a replica fails and then comes back on line I suppose  
that

it

will


resyncs


with the master.


right



But what happnes if the master fails? A slave that is
configured as


master


will kick in? What if that slave is not yes fully sync'ed

with

the


failed



master and has old data?


if the master fails you can't index the data. but the  
slaves

will
continue serving the requests with the last index. You an

bring

back
the master up and resume indexing.


What happens when the original master comes back on  
line? He

will


remain



a


slave because there is another node with the master role?

Thank you!





--
-
Noble Paul | Principal Engineer| AOL | http://aol.com












--
-
Noble Paul | Principal Engineer| AOL | http://aol.com







--
-
Noble Paul | Principal Engineer| AOL | http://aol.com









--
-
Noble Paul | Principal Engineer| AOL | http://aol.com








RE: Creating a distributed search in a searchComponent

2009-05-21 Thread siping liu

I was looking for an answer to the same question, and have a similar concern.
It looks like any serious customization work requires developing a custom
SearchComponent, but it's not clear to me how the Solr designers intended this
to be done. I am more confident either doing it at the Lucene level, or staying
on the client side and using something like multi-core (as discussed here:
http://wiki.apache.org/solr/MultipleIndexes).


 
 Date: Wed, 20 May 2009 13:47:20 -0400
 Subject: RE: Creating a distributed search in a searchComponent
 From: nicholas.bai...@rackspace.com
 To: solr-user@lucene.apache.org
 
 It seems I sent this out a bit too soon. After looking at the source, it seems 
 there are two separate paths for distributed and regular queries; however, the 
 prepare method for all components is run before the shards parameter is 
 checked. So I can build the shards portion in the prepare method of my own 
 search component. 
 
 However, I'm not sure if this is the greatest idea in case Solr changes at 
 some point.
 
 -Nick
 
 -Original Message-
 From: Nick Bailey nicholas.bai...@rackspace.com
 Sent: Wednesday, May 20, 2009 1:29pm
 To: solr-user@lucene.apache.org
 Subject: Creating a distributed search in a searchComponent
 
 Hi,
 
 I am wondering if it is possible to basically add the distributed portion of 
 a search query inside of a searchComponent.
 
 I am hoping to build my own component and add it as a first-component to the 
 StandardRequestHandler. Then hopefully I will be able to use this component 
 to build the shards parameter of the query and have the Handler then treat 
 the query as a distributed search. Anyone have any experience or know if this 
 is possible?
 
 Thanks,
 Nick
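
A minimal sketch of what Nick describes, against the Solr 1.3 SearchComponent
API. The class name and the hard-coded shard list are made up for illustration;
a real component would look the shard list up somewhere:

import java.io.IOException;
import org.apache.solr.common.params.ModifiableSolrParams;
import org.apache.solr.handler.component.ResponseBuilder;
import org.apache.solr.handler.component.SearchComponent;

public class ShardInjectorComponent extends SearchComponent {

  // As noted above, prepare() runs for all components before the shards
  // parameter is checked, so a shards value set here should be honored.
  public void prepare(ResponseBuilder rb) throws IOException {
    ModifiableSolrParams params = new ModifiableSolrParams(rb.req.getParams());
    params.set("shards", "host1:8983/solr,host2:8983/solr");
    rb.req.setParams(params);
  }

  public void process(ResponseBuilder rb) throws IOException {
    // nothing to do at process time
  }

  public String getDescription() { return "injects the shards parameter"; }
  public String getSource() { return "$URL$"; }
  public String getSourceId() { return "$Id$"; }
  public String getVersion() { return "1.0"; }
}

Registered as a first-component on the standard request handler, as Nick
suggests, this runs before the shards check; as he also notes, it relies on
the current ordering inside SearchHandler and could break if that changes.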
 
 
 

_
Hotmail® has ever-growing storage! Don’t worry about storage limits.
http://windowslive.com/Tutorial/Hotmail/Storage?ocid=TXT_TAGLM_WL_HM_Tutorial_Storage1_052009

Re: Plugin Not Found

2009-05-21 Thread Mark Miller

Jeff Newburn wrote:

Nothing else is in the lib directory but this one jar.

Additionally, the logs seem to say that it finds the lib as shown below
INFO: Solr home set to '/home/zetasolr/'
May 20, 2009 10:16:56 AM org.apache.solr.core.SolrResourceLoader
createClassLoader
INFO: Adding 'file:/home/zetasolr/lib/FacetCubeComponent.jar' to Solr
classloader

However as soon as it tries the component it cannot find the class.

  
Something must be wacky. I just did a quick custom component with 1.3 
and trunk, and it loaded no problem in both cases.


Anything odd about your Component? Your sure it extends SearchComponent?

As Noble mentioned, you will not be able to find other classes/jars in 
the solr.home/lib directory from a class/jar in the solr.home/lib 
directory. But this, oddly, doesn't appear to be the issue you're facing.


Do share if you have anything else you can add.

--
- Mark

http://www.lucidimagination.com





Re: Creating a distributed search in a searchComponent

2009-05-21 Thread Shalin Shekhar Mangar
On Wed, May 20, 2009 at 10:59 PM, Nick Bailey nicholas.bai...@rackspace.com
 wrote:

 Hi,

 I am wondering if it is possible to basically add the distributed portion
 of a search query inside of a searchComponent.

 I am hoping to build my own component and add it as a first-component to
 the StandardRequestHandler.  Then hopefully I will be able to use this
 component to build the shards parameter of the query and have the Handler
 then treat the query as a distributed search.  Anyone have any experience or
 know if this is possible?


You can also add a ServletFilter before SolrDispatchFilter and add the
parameters before Solr processes the query.

-- 
Regards,
Shalin Shekhar Mangar.
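
A rough sketch of the servlet-filter approach Shalin mentions, assuming Solr
runs as the stock webapp and that the dispatch filter reads GET parameters
from the query string. The filter name and shard list are hypothetical; it
would be mapped before SolrDispatchFilter in web.xml:

import java.io.IOException;
import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletRequestWrapper;

public class ShardsParamFilter implements Filter {

  public void init(FilterConfig config) {}
  public void destroy() {}

  public void doFilter(ServletRequest req, ServletResponse res, FilterChain chain)
      throws IOException, ServletException {
    HttpServletRequestWrapper wrapped =
        new HttpServletRequestWrapper((HttpServletRequest) req) {
          // append the shards parameter to whatever the client sent
          public String getQueryString() {
            String qs = super.getQueryString();
            String shards = "shards=host1:8983/solr,host2:8983/solr";
            return qs == null ? shards : qs + "&" + shards;
          }
        };
    chain.doFilter(wrapped, res);
  }
}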


Re: Creating a distributed search in a searchComponent

2009-05-21 Thread Shalin Shekhar Mangar
Also look at SOLR-565 and see if that helps you.

https://issues.apache.org/jira/browse/SOLR-565

On Thu, May 21, 2009 at 9:58 PM, Shalin Shekhar Mangar 
shalinman...@gmail.com wrote:


 On Wed, May 20, 2009 at 10:59 PM, Nick Bailey 
 nicholas.bai...@rackspace.com wrote:

 Hi,

 I am wondering if it is possible to basically add the distributed portion
 of a search query inside of a searchComponent.

 I am hoping to build my own component and add it as a first-component to
 the StandardRequestHandler.  Then hopefully I will be able to use this
 component to build the shards parameter of the query and have the Handler
 then treat the query as a distributed search.  Anyone have any experience or
 know if this is possible?


 You can also add a ServletFilter before SolrDispatchFilter and add the
 parameters before Solr processes the query.

 --
 Regards,
 Shalin Shekhar Mangar.




-- 
Regards,
Shalin Shekhar Mangar.


Re: what does the version parameter in the query mean?

2009-05-21 Thread Jay Hill
I was interested in this recently and also couldn't find anything on the
wiki. I found this in the list archive:

The version parameter determines the XML protocol used in the response.
Clients are strongly encouraged to always specify the protocol version,
so as to ensure that the format of the response they receive does not change
unexpectedly if/when the Solr server is upgraded.

Here is a link to the archive:
http://www.mail-archive.com/solr-comm...@lucene.apache.org/msg00518.html

-Jay


On Thu, May 21, 2009 at 1:06 AM, Anshuman Manur anshuman_ma...@stragure.com
 wrote:

 Hello all,

  I'm using Solr 1.3.0, and when I query my index for "solr" using the admin
  page, the query string in the address bar of my browser reads like this:

  http://localhost:8080/solr/select/?q=solr&version=2.2&start=0&rows=10&indent=on

  Now, I don't know what version=2.2 means, and neither the wiki nor the docs
  tell me. Could someone enlighten me?

 Thank You
 Anshuman Manur



No sanity checks before replicating files?

2009-05-21 Thread Damien Tournoud
Hi list,

We have deployed an experimental Solr 1.4 cluster (a master/slave
setup, with automatic promotion of the slave as a master in case of
failure) on drupal.org, to manage our medium size index (3GB, about
400K documents).

One of the problems we are facing is that there seem to be no sanity
checks before downloading files. Take the following scenario:

 - initial situation: s1 is master, s2 is slave
 - s1 fails, the virtual IP falls back to s2
 - some updates happen on s2
 - suppose now that s1 gets back online, s2 tries to replicate from
s1, but after replicating all the files (3GB), the commit fails
because the local index has been locally updated, the replication
fails, but the process restarts at the next poll (redownload all the
index files, fails again...) and so on

We are considering configuring each server to replicate from the
virtual IP, which should solve that issue for us, but couldn't the
slave do some sanity checks before trying to download all the files
from the master?

Thanks in advance for any help you could provide,

Damien Tournoud


Re: master/slave failure scenario

2009-05-21 Thread nk 11
You are right... I just don't like the idea of stopping the indexing process
if the master fails until a new one is started (more or less by hand).

On Thu, May 21, 2009 at 6:49 PM, Bryan Talbot btal...@aeriagames.comwrote:

 Indexing is usually much more expensive that replication so it won't scale
 well as you add more servers.  Also, what would a client do if it was able
 to send the update to only some of the servers because others were down (for
 maintenance, etc)?



 -Bryan





 On May 21, 2009, at May 21, 6:04 AM, nk 11 wrote:

  Just curious. What would be the disadvantages of a no replication / multi
 master (no slave) setup?
 The client code should do the updates for evey master ofc, but if one
 machine would fail then I can imediatly continue the indexing process and
 also I can query the index on any machine for a valid result.
 I might be missing something...
 On Thu, May 14, 2009 at 4:19 PM, nk 11 nick.cass...@gmail.com wrote:

  wow! that was just a couple of days old!
 thanks as lot!
  2009/5/14 Noble Paul നോബിള്‍ नोब्ळ् noble.p...@corp.aol.com

  yeah there is a hack


 https://issues.apache.org/jira/browse/SOLR-1154?focusedCommentId=12708316page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel
 #action_12708316

 On Thu, May 14, 2009 at 6:07 PM, nk 11 nick.cass...@gmail.com wrote:

 sorry for the mail. I wanted to hit reply :(

 On Thu, May 14, 2009 at 3:37 PM, nk 11 nick.cass...@gmail.com wrote:


 oh, so the configuration must be manualy changed?
 Can't something be passed at (re)start time?

 2009/5/14 Noble Paul നോബിള്‍ नोब्ळ् noble.p...@corp.aol.com


 On Thu, May 14, 2009 at 4:07 PM, nk 11 nick.cass...@gmail.com

 wrote:

 Ok so the VIP will point to the new master. but what makes a slave
 promoted
 to a master? Only the fact that it will receive add/update requests?
 And I suppose that this hot promotion is possible only if the

 slave

 is
 convigured as master also...

 right.. By default you can setup all slaves to be master also. It
 does
 not cost anything if it is not serving any requests.

 so , if you have such a setting you will have to disable that slave
 to
 be a slave and restart it and you will have to make the VIP point to
 this new slave as master.

 so hot promotion is still not possible.


 2009/5/14 Noble Paul നോബിള്‍ नोब्ळ् noble.p...@corp.aol.com


 ideally , we don't do that.
 you can just keep the master host behind a VIP so if you wish to
 change the master make the VIP point to the new host

 On Wed, May 13, 2009 at 10:52 PM, nk 11 nick.cass...@gmail.com
 wrote:

 This is more interesting.Such a procedure would involve taking

 down

 and
 reconfiguring the slave?

 On Wed, May 13, 2009 at 7:55 PM, Bryan Talbot
 btal...@aeriagames.comwrote:

  Or ...

 1. Promote existing slave to new master
 2. Add new slave to cluster




 -Bryan





 On May 13, 2009, at May 13, 9:48 AM, Jay Hill wrote:

 - Migrate configuration files from old master (or backup) to

 new

 master.

 - Replicate from a slave to the new master.
 - Resume indexing to new master.

 -Jay

 On Wed, May 13, 2009 at 4:26 AM, nk 11 nick.cass...@gmail.com


  wrote:

 Nice.

 What if the master fails permanently (like a disk crash...)

 and

 the
 new
 master is a clean machine?
 2009/5/13 Noble Paul നോബിള്‍ नोब्ळ् noble.p...@corp.aol.com

 On Wed, May 13, 2009 at 12:10 PM, nk 11 

 nick.cass...@gmail.com

 wrote:


  Hello

 I'm kind of new to Solr and I've read about replication, and
 the
 fact

  that a

  node can act as both master and slave.
 I a replica fails and then comes back on line I suppose that

 it

 will

  resyncs

  with the master.

  right


 But what happnes if the master fails? A slave that is
 configured as

  master

  will kick in? What if that slave is not yes fully sync'ed

 with

 the

  failed


  master and has old data?


  if the master fails you can't index the data. but the slaves
 will
 continue serving the requests with the last index. You an

 bring

 back
 the master up and resume indexing.


  What happens when the original master comes back on line? He
 will

  remain


  a

  slave because there is another node with the master role?

 Thank you!




 --
 -
 Noble Paul | Principal Engineer| AOL | http://aol.com








 --
 -
 Noble Paul | Principal Engineer| AOL | http://aol.com






 --
 -
 Noble Paul | Principal Engineer| AOL | http://aol.com







 --
 -
 Noble Paul | Principal Engineer| AOL | http://aol.com







Re: Customizing SOLR-236 field collapsing

2009-05-21 Thread Marc Sturlese

Yes, I have tried it, but I see a couple of problems doing that.

I will have to do more searches, so response time will increase.

The second thing is this: imagine I show the collapsed results on page one
and put a button to see the non-collapsed results. If results for the
second page are then requested, some results from the non-collapsed request
would be the same as results that already appeared on the first page with
collapsing:

collapsing page 1 shows docs:
1-2-3-6-7

non collapsing results page 1 shows docs:
1-2-3-4-5

collapsing results page 2 shows docs:
8-9-10-11-12

non collapsing results page 2 show docs:
6-7-8-9-10

I want to avoid that and make the response as fast as possible. That is the
reason why I want to send the collapsed docs to the end of the queue...

Thanks



Thomas Traeger-2 wrote:
 
 Is adding QueryComponent to your SearchComponents an option? When 
 combined with the CollapseComponent this approach would return the 
 collapsed and the complete result set.
 
 i.e.:
 
 <arr name="components">
   <str>collapse</str>
   <str>query</str>
   <str>facet</str>
   <str>mlt</str>
   <str>highlight</str>
 </arr>
 
 Thomas
 
 Marc Sturlese schrieb:
 Hey there,
 I have been testing the last adjacent field collapsing patch in trunk and
 seems to work perfectly. I am trying to modify the function of it but
 don't
 know exactly how to do it. What I would like to do is instead of collapse
 the results send them to the end of the results cue.
 Aparently it is not possible to do that due to the way it is implemented.
 I
 have noticed that you get a DocSet of the ids that survived the
 collapsing
 and that match the query and filters (collapseFilterDocSet =
 collapseFilter.getDocSet();, you get it in CollapseComponent.java.
 Once it is done the search is excuted again, this time the DocSet
 obtained
 before is passed as a filter:

 DocListAndSet results = searcher.getDocListAndSet(rb.getQuery(),
                                                   collapseFilterDocSet == null ? rb.getFilters() : null,
                                                   collapseFilterDocSet,
                                                   rb.getSortSpec().getSort(),
                                                   rb.getSortSpec().getOffset(),
                                                   rb.getSortSpec().getCount(),
                                                   rb.getFieldFlags());

 The result of this search will give you the final result (with the
 correct
 offset and start).
 I have thought in saving the collapsed docs in another DocSet and after
 do
 something with them... but don't know how to manage it.
 Any clue about how could I reach the goal?
 Thanks in advance
   
 
 
 

-- 
View this message in context: 
http://www.nabble.com/Customizing-SOLR-236-field-collapsing-tp23653220p23656522.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: master/slave failure scenario

2009-05-21 Thread Otis Gospodnetic

Hi,

You should be able to do the following.
Put masters behind a load balancer (LB).
Create a LB VIP and a pool with 2 masters, masterA & masterB, with a rule that 
all requests always go to A unless A is down.  If A is down they go to B.
Bring up master instances A and B on 2 servers and make them point to the 
shared storage.

masterA \
         \-- shared storage
         /
masterB /

Your indexing client doesn't talk to the servers directly. It talks through the 
VIP you created in LB.
At any one time only one of the masters is active.
If A goes down, LB detects it and makes B active.
Your indexer may have to reconnect if it detects a failure, maybe it would need 
to reindex some number of documents if they didn't make it to disk before A 
died, maybe even some lock file cleanup might be needed, but the above should 
be doable with little effort.

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
 From: nk 11 nick.cass...@gmail.com
 To: solr-user@lucene.apache.org
 Sent: Thursday, May 21, 2009 12:44:55 PM
 Subject: Re: master/slave failure scenario
 
 You are right... I just don't like the idea of stopping the indexing process
 if the master fails until a new one is started (more or less by hand).
 
 On Thu, May 21, 2009 at 6:49 PM, Bryan Talbot wrote:
 
  Indexing is usually much more expensive that replication so it won't scale
  well as you add more servers.  Also, what would a client do if it was able
  to send the update to only some of the servers because others were down (for
  maintenance, etc)?
 
 
 
  -Bryan
 
 
 
 
 
  On May 21, 2009, at May 21, 6:04 AM, nk 11 wrote:
 
   Just curious. What would be the disadvantages of a no replication / multi
  master (no slave) setup?
  The client code should do the updates for evey master ofc, but if one
  machine would fail then I can imediatly continue the indexing process and
  also I can query the index on any machine for a valid result.
  I might be missing something...
  On Thu, May 14, 2009 at 4:19 PM, nk 11 wrote:
 
   wow! that was just a couple of days old!
  thanks as lot!
   2009/5/14 Noble Paul നോബിള്‍ नोब्ळ् 
 
   yeah there is a hack
 
 
  
 https://issues.apache.org/jira/browse/SOLR-1154?focusedCommentId=12708316page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel
  #action_12708316
 
  On Thu, May 14, 2009 at 6:07 PM, nk 11 wrote:
 
  sorry for the mail. I wanted to hit reply :(
 
  On Thu, May 14, 2009 at 3:37 PM, nk 11 wrote:
 
 
  oh, so the configuration must be manualy changed?
  Can't something be passed at (re)start time?
 
  2009/5/14 Noble Paul നോബിള്‍ नोब्ळ् 
 
 
  On Thu, May 14, 2009 at 4:07 PM, nk 11 
 
  wrote:
 
  Ok so the VIP will point to the new master. but what makes a slave
  promoted
  to a master? Only the fact that it will receive add/update requests?
  And I suppose that this hot promotion is possible only if the
 
  slave
 
  is
  convigured as master also...
 
  right.. By default you can setup all slaves to be master also. It
  does
  not cost anything if it is not serving any requests.
 
  so , if you have such a setting you will have to disable that slave
  to
  be a slave and restart it and you will have to make the VIP point to
  this new slave as master.
 
  so hot promotion is still not possible.
 
 
  2009/5/14 Noble Paul നോബിള്‍ नोब्ळ् 
 
 
  ideally , we don't do that.
  you can just keep the master host behind a VIP so if you wish to
  change the master make the VIP point to the new host
 
  On Wed, May 13, 2009 at 10:52 PM, nk 11 
  wrote:
 
  This is more interesting.Such a procedure would involve taking
 
  down
 
  and
  reconfiguring the slave?
 
  On Wed, May 13, 2009 at 7:55 PM, Bryan Talbot
  wrote:
 
   Or ...
 
  1. Promote existing slave to new master
  2. Add new slave to cluster
 
 
 
 
  -Bryan
 
 
 
 
 
  On May 13, 2009, at May 13, 9:48 AM, Jay Hill wrote:
 
  - Migrate configuration files from old master (or backup) to
 
  new
 
  master.
 
  - Replicate from a slave to the new master.
  - Resume indexing to new master.
 
  -Jay
 
  On Wed, May 13, 2009 at 4:26 AM, nk 11 
 
 
   wrote:
 
  Nice.
 
  What if the master fails permanently (like a disk crash...)
 
  and
 
  the
  new
  master is a clean machine?
  2009/5/13 Noble Paul നോബിള്‍ नोब्ळ् 
 
  On Wed, May 13, 2009 at 12:10 PM, nk 11 
 
  nick.cass...@gmail.com
 
  wrote:
 
 
   Hello
 
  I'm kind of new to Solr and I've read about replication, and
  the
  fact
 
   that a
 
   node can act as both master and slave.
  I a replica fails and then comes back on line I suppose that
 
  it
 
  will
 
   resyncs
 
   with the master.
 
   right
 
 
  But what happnes if the master fails? A slave that is
  configured as
 
   master
 
   will kick in? What if that slave is not yes fully sync'ed
 
  with
 
  the
 
   failed
 
 
   master and has old data?
 
 
   if the master fails you can't index the data. but the slaves
  will
  continue 

Re: No sanity checks before replicating files?

2009-05-21 Thread Otis Gospodnetic

Hi Damien,

Interesting, this is similar to my suggestion to another person I just replied 
to here on solr-user.
Have you actually run into this problem?  I haven't tried it, but I'd think the 
next replication (copying the index from s1 to s2) would not necessarily 
fail, but would simply overwrite any changes that were made on s2 while it was 
serving as the master.  Is that not what happens?  If that's what happens, then 
I think what you'd simply have to do is to:

1) bring s1 back up, but don't make it a master immediately
2) take away the master role from s2
3) make s1 copy the index from s2, since s2 might have a more up to date index 
now
4) make s1 the master


Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
 From: Damien Tournoud dam...@tournoud.net
 To: solr-user@lucene.apache.org
 Sent: Thursday, May 21, 2009 12:37:10 PM
 Subject: No sanity checks before replicating files?
 
 Hi list,
 
 We have deployed an experimental Solr 1.4 cluster (a master/slave
 setup, with automatic promotion of the slave as a master in case of
 failure) on drupal.org, to manage our medium size index (3GB, about
 400K documents).
 
 One of the problem we are facing is that there seems to be no sanity
 checks before downloading files. Take the following scenario:
 
 - initial situation: s1 is master, s2 is slave
 - s1 fails, the virtual IP falls back to s2
 - some updates happen on s2
 - suppose now that s1 gets back online, s2 tries to replicate from
 s1, but after replicating all the files (3GB), the commit fails
 because the local index has been locally updated, the replication
 fails, but the process restarts at the next poll (redownload all the
 index files, fails again...) and so on
 
 We are considering configuring each server to replicate from the
 virtual IP, which should solve that issue for us, but couldn't the
 slave do some sanity checks before trying to download all the files
 from the master?
 
 Thanks in advance for any help you could provide,
 
 Damien Tournoud



clustering SOLR-769

2009-05-21 Thread Allahbaksh Asadullah
Hi,
I built Solr from SVN this morning. I am using the clustering example, and I
have added my own schema.xml.

The problem is that even though I change the carrot.snippet field from
'features' to 'filecontent', the clustering results do not change at all.
Please note that the 'features' field is also present in my documents.

   <str name="carrot.title">name</str>
   <!-- The field to cluster on -->
   <str name="carrot.snippet">features</str>
   <str name="carrot.url">id</str>

Why do I get the same clusters even though I have changed carrot.snippet? Is
there some problem with my understanding?

Regards,
allahbaksh


Re: No sanity checks before replicating files?

2009-05-21 Thread Damien Tournoud
Hi Otis,

Thanks for your answer.

On Thu, May 21, 2009 at 7:14 PM, Otis Gospodnetic
otis_gospodne...@yahoo.com wrote:
 Interesting, this is similar to my suggestion to another person I just 
 replied to here on solr-user.
 Have you actually run into this problem?  I haven't tried it, but I'd think 
 the first next replication (copying index from s1 to s2) would not 
 necessarily fail, but would simply overwrite any changes that were made on s2 
 while it was serving as the master.  Is that not what happens?

No it doesn't. For some reason, Solr downloads all the files of the
index, but fails to commit the changes locally. At the next poll, the
process restarts. Not only does this clog the network, but it also
unnecessarily uses resources on the newly promoted slave, until we
change its configuration.

 If that's what happens, then I think what you'd simply have to do is to:

 1) bring s1 back up, but don't make it a master immediately
 2) take away the master role from s2
 3) make s1 copy the index from s2, since s2 might have a more up to date 
 index now
 4) make s1 the master

Once s2 is the master, we want it to stay that way. We will reassign
s1 as the slave at a later stage, when resources allow. What worries
me is the strange behavior of Solr 1.4 replication when the slave
index is fresher than the master one.

Damien


Re: How to change the weight of the fields ?

2009-05-21 Thread Otis Gospodnetic

Hi,

I'm not sure why the rest of the scoring explanation is not shown, but your 
query *was* expanded to search on text and title_s, and id fields, so I think 
that expanded/rewritten query is what went to the index.


Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
 From: Vincent Pérès vincent.pe...@gmail.com
 To: solr-user@lucene.apache.org
 Sent: Thursday, May 21, 2009 4:34:00 AM
 Subject: Re: How to change the weight of the fields ?
 
 
 It seems I can only search on the field 'text'. With the following url :
 http://localhost:8983/solr/select/?q=novelqt=dismaxfl=title_s,idversion=2.2start=0rows=10indent=ondebugQuery=on
 
 I get answers, but on the debug area, it seems it's only searching on the
 'text' field (with or without 'qt' the results are displayed within the same
 order) :
 
 
 novel
 novel
 −
 
 +DisjunctionMaxQuery((text:novel^0.5 | title_s:novel^5.0 |
 id:novel^10.0)~0.01) ()
 
 −
 
 +(text:novel^0.5 | title_s:novel^5.0 | id:novel^10.0)~0.01 ()
 
 −
 
 −
 
 
 0.014641666 = (MATCH) sum of:
   0.014641666 = (MATCH) max plus 0.01 times others of:
 0.014641666 = (MATCH) weight(text:novel^0.5 in 114927), product of:
   0.01362607 = queryWeight(text:novel^0.5), product of:
 0.5 = boost
 3.4734163 = idf(docFreq=10634, numDocs=43213)
 0.007845918 = queryNorm
   1.0745333 = (MATCH) fieldWeight(text:novel in 114927), product of:
 1.4142135 = tf(termFreq(text:novel)=2)
 3.4734163 = idf(docFreq=10634, numDocs=43213)
 0.21875 = fieldNorm(field=text, doc=114927)
 
 etc.
 
 I should have a debug below with a search of the term into 'title_s' and
 'id' no?
 
 Thanks for your answers !
 Vincent
 -- 
 View this message in context: 
 http://www.nabble.com/How-to-change-the-weight-of-the-fields---tp23619971p23649624.html
 Sent from the Solr - User mailing list archive at Nabble.com.
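
A quick sketch for the thread above, assuming a stock dismax handler and the
SolrJ API that ships with Solr 1.3/1.4 (the field boosts are the ones from the
debug output; the server URL is illustrative). Passing qf and debugQuery per
request is an easy way to confirm which fields a term is matched against:

  import org.apache.solr.client.solrj.SolrQuery;
  import org.apache.solr.client.solrj.SolrServer;
  import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
  import org.apache.solr.client.solrj.response.QueryResponse;

  public class DismaxBoostCheck {
    public static void main(String[] args) throws Exception {
      SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
      SolrQuery q = new SolrQuery("novel");
      q.setQueryType("dismax");                     // route to the dismax handler
      q.set("qf", "id^10.0 title_s^5.0 text^0.5");  // per-field boosts
      q.set("debugQuery", "true");                  // include the scoring explanation
      QueryResponse rsp = server.query(q);
      System.out.println("hits: " + rsp.getResults().getNumFound());
      System.out.println(rsp.getDebugMap());        // per-document explain entries
    }
  }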



Re: Phrase Search Issue

2009-05-21 Thread Otis Gospodnetic

Amit,

Append debugQuery=true to the search request URL and you'll see how your query 
string was interpreted.

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
 From: dabboo ag...@sapient.com
 To: solr-user@lucene.apache.org
 Sent: Thursday, May 21, 2009 3:48:45 AM
 Subject: Re: Phrase Search Issue
 
 
 This problem is related with the default operator in dismax. Currently OR is
 the default operator and it is behaving perfectly fine. I have changed the
 default operator in schema.xml to AND, I also have changed the minimum match
 to 100%.
 
 But it seems like AND as default operator doesnt work with Dismax.
 Please suggest.
 
 Thanks,
 Amit Garg
 
 
 
 dabboo wrote:
  
  Hi,
  
  I am facing one issue in phrase query. I am entering 'Top of the world' as
  my search criteria. I am expecting it to return all the records in which,
  one field should all these words in any order. 
  
  But it is treating as OR and returning all the records, which are having
  either of these words. I am doing this using dismax request. 
  
  I would appreciate if somebody can provide me some pointers.
  
  Thanks,
  Amit Garg
  
 
 -- 
 View this message in context: 
 http://www.nabble.com/Phrase-Search-Issue-tp23648813p23649189.html
 Sent from the Solr - User mailing list archive at Nabble.com.
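
One point worth making explicit for this thread: as far as I can tell, dismax
does not look at the schema's defaultOperator at all; the equivalent knob is
the mm (minimum should match) parameter. A rough SolrJ sketch, with
illustrative field names and URL, that asks for every term to match:

  import org.apache.solr.client.solrj.SolrQuery;
  import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

  public class DismaxAllTermsMustMatch {
    public static void main(String[] args) throws Exception {
      SolrQuery q = new SolrQuery("Top of the world");
      q.setQueryType("dismax");       // dismax ignores q.op / defaultOperator
      q.set("qf", "title body");      // illustrative field list
      q.set("mm", "100%");            // require all terms to match (AND-like)
      q.set("debugQuery", "true");    // verify how the query was actually parsed
      System.out.println(new CommonsHttpSolrServer("http://localhost:8983/solr")
          .query(q).getResults().getNumFound());
    }
  }

The debugQuery output is the thing to trust: it shows whether mm was really
applied to the request.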



Re: No sanity checks before replicating files?

2009-05-21 Thread Otis Gospodnetic

Aha, I see.  Perhaps you can post the error message/stack trace?

As for the sanity check, I bet a call to 
http://host:port/solr/replication?command=indexversion could be used to ensure 
only newer versions of the index are being pulled.  We'll see what Paul says 
when he wakes up. :)

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
 From: Damien Tournoud dam...@tournoud.net
 To: solr-user@lucene.apache.org
 Sent: Thursday, May 21, 2009 1:26:30 PM
 Subject: Re: No sanity checks before replicating files?
 
 Hi Otis,
 
 Thanks for your answer.
 
 On Thu, May 21, 2009 at 7:14 PM, Otis Gospodnetic
 wrote:
  Interesting, this is similar to my suggestion to another person I just 
  replied 
 to here on solr-user.
  Have you actually run into this problem?  I haven't tried it, but I'd think 
 the first next replication (copying index from s1 to s2) would not 
 necessarily 
 fail, but would simply overwrite any changes that were made on s2 while it 
 was 
 serving as the master.  Is that not what happens?
 
 No it doesn't. For some reason, Solr download all the files of the
 index, but fails to commit the changes locally. At the next poll, the
 process restarts. Not only does this clogs the network, but it also
 unnecessarily uses resources on the newly promoted slave, until we
 change its configuration.
 
  If that's what happens, then I think what you'd simply have to do is to:
 
  1) bring s1 back up, but don't make it a master immediately
  2) take away the master role from s2
  3) make s1 copy the index from s2, since s2 might have a more up to date 
  index 
 now
  4) make s1 the master
 
 Once s2 is the master, we want it to stay this way. We will reassign
 s1 as the slave at a later stage, when resources allows. What worries
 me is that strange behavior of Solr 1.4 replication when the slave
 index is fresher then the master one.
 
 Damien
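
As a rough illustration of the sanity check Otis mentions above -- purely a
sketch, assuming the standard /solr/replication handler on the master and that
the caller knows its own local generation (the host name is made up) -- a
slave-side check could fetch the master's indexversion response and only
trigger a pull when the master is actually ahead:

  import java.io.BufferedReader;
  import java.io.InputStreamReader;
  import java.net.URL;

  public class IndexVersionCheck {
    public static void main(String[] args) throws Exception {
      // command=indexversion reports the master's replicable version/generation
      URL url = new URL("http://master:8983/solr/replication?command=indexversion");
      BufferedReader in = new BufferedReader(
          new InputStreamReader(url.openStream(), "UTF-8"));
      StringBuilder body = new StringBuilder();
      for (String line; (line = in.readLine()) != null; ) {
        body.append(line).append('\n');
      }
      in.close();
      // A real check would parse the generation out of this response and
      // compare it with the slave's local generation before replicating.
      System.out.println(body);
    }
  }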



Re: Plugin Not Found

2009-05-21 Thread Jeff Newburn
One additional note: we are on 1.4 trunk as of 5/7/2009. Just not sure why it
won't load, since it obviously works fine if inserted directly into the
WEB-INF directory.
-- 
Jeff Newburn
Software Engineer, Zappos.com
jnewb...@zappos.com - 702-943-7562


 From: Mark Miller markrmil...@gmail.com
 Reply-To: solr-user@lucene.apache.org
 Date: Thu, 21 May 2009 12:19:47 -0400
 To: solr-user@lucene.apache.org
 Subject: Re: Plugin Not Found
 
 Jeff Newburn wrote:
 Nothing else is in the lib directory but this one jar.
 
 Additionally, the logs seem to say that it finds the lib as shown below
 INFO: Solr home set to '/home/zetasolr/'
 May 20, 2009 10:16:56 AM org.apache.solr.core.SolrResourceLoader
 createClassLoader
 INFO: Adding 'file:/home/zetasolr/lib/FacetCubeComponent.jar' to Solr
 classloader
 
 However as soon as it tries the component it cannot find the class.
 
   
 Something must be wacky. I just did a quick custom component with 1.3
 and trunk, and it loaded no problem in both cases.
 
 Anything odd about your Component? You're sure it extends SearchComponent?
 
 As Noble mentioned, you will not be able to find other classes/jars in
 the solr.home/lib directory from a class/jar in the solr.home/lib
 directory. But this, oddly, doesn't appear to be the issue your facing.
 
 Do share if you have anything else you can add.
 
 -- 
 - Mark
 
 http://www.lucidimagination.com
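
For reference while debugging this, a bare-bones component along these lines
compiles against the 1.4 SearchComponent API as I understand it (the class
name is made up, not the actual FacetCubeComponent); if the real class differs
much from this shape, that could explain a load failure:

  import java.io.IOException;
  import org.apache.solr.handler.component.ResponseBuilder;
  import org.apache.solr.handler.component.SearchComponent;

  public class MyComponent extends SearchComponent {

    @Override
    public void prepare(ResponseBuilder rb) throws IOException {
      // no-op for this skeleton
    }

    @Override
    public void process(ResponseBuilder rb) throws IOException {
      // add something trivial to the response so the component is visible
      rb.rsp.add("my-component", "hello");
    }

    @Override
    public String getDescription() { return "minimal example component"; }

    @Override
    public String getSource() { return "illustrative"; }

    @Override
    public String getSourceId() { return "illustrative"; }

    @Override
    public String getVersion() { return "1.0"; }
  }

It still has to be declared as a searchComponent in solrconfig.xml and
referenced from a request handler, of course.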
 
 
 



Regarding Delta-Import Query in DIH

2009-05-21 Thread jayakeerthi s
Hi All,

I understand from the details provided under
http://wiki.apache.org/solr/DataImportHandler regarding Delta-import that
there should be an additional column *last_modified* of timestamp type in
the table.

Is there any other way/method the same can be achieved without creating the
additional column *last_modified* in the tables?? please advise.


Thanks in advance


Re: Plugin Not Found

2009-05-21 Thread Grant Ingersoll
Can you share your full log (at least through startup) as well as the  
config for both the component and the ReqHandler that is using it?


-Grant

On May 21, 2009, at 3:37 PM, Jeff Newburn wrote:

One additional note we are on 1.4 tunk as of 5/7/2009.  Just not  
sure why it

won't load since it obviously works fine if directly inserted into the
WEB-INF directory.
--
Jeff Newburn
Software Engineer, Zappos.com
jnewb...@zappos.com - 702-943-7562



From: Mark Miller markrmil...@gmail.com
Reply-To: solr-user@lucene.apache.org
Date: Thu, 21 May 2009 12:19:47 -0400
To: solr-user@lucene.apache.org
Subject: Re: Plugin Not Found

Jeff Newburn wrote:

Nothing else is in the lib directory but this one jar.

Additionally, the logs seem to say that it finds the lib as shown  
below

INFO: Solr home set to '/home/zetasolr/'
May 20, 2009 10:16:56 AM org.apache.solr.core.SolrResourceLoader
createClassLoader
INFO: Adding 'file:/home/zetasolr/lib/FacetCubeComponent.jar' to  
Solr

classloader

However as soon as it tries the component it cannot find the class.



Something must be wacky. I just did a quick custom component with 1.3
and trunk, and it loaded no problem in both cases.

Anything odd about your Component? Your sure it extends  
SearchComponent?


As Noble mentioned, you will not be able to find other classes/jars  
in

the solr.home/lib directory from a class/jar in the solr.home/lib
directory. But this, oddly, doesn't appear to be the issue your  
facing.


Do share if you have anything else you can add.

--
- Mark

http://www.lucidimagination.com







--
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)  
using Solr/Lucene:

http://www.lucidimagination.com/search



Re: clustering SOLR-769

2009-05-21 Thread Stanislaw Osinski
Hi.


 I built Solr from SVN today morning. I am using Clustering example. I
 have added my own schema.xml.

 The problem is the even though I change carrot.snippet field from
 features to filecontent the clustering results are not changed a bit.
 Please note features field is also there in my document.

   <str name="carrot.title">name</str>
   <!-- The field to cluster on -->
   <str name="carrot.snippet">features</str>
   <str name="carrot.url">id</str>

 Why I get the same cluster even though I have changed the
 carrot.snippet. Whether there is some problem with my understarnding?


If you get back to the clustering dir in examples and change

<str name="carrot.snippet">features</str>

to

<str name="carrot.snippet">manu</str>

do you see any change in clusters?

Cheers,

Staszek

--
http://carrot2.org
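
In case it helps while experimenting: the carrot.* settings above appear to be
ordinary request parameters (defaults in the handler config), so -- as far as
I can tell, and treating the handler name here as an assumption about your
setup -- they can also be overridden per query from SolrJ instead of editing
solrconfig.xml each time:

  import org.apache.solr.client.solrj.SolrQuery;
  import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

  public class ClusteringFieldExperiment {
    public static void main(String[] args) throws Exception {
      SolrQuery q = new SolrQuery("*:*");
      q.setQueryType("standard");              // assumption: the handler that has
                                               // the clustering component configured
      q.set("clustering", "true");
      q.set("carrot.title", "name");
      q.set("carrot.snippet", "filecontent");  // field to cluster on, per request
      q.set("carrot.url", "id");
      q.set("rows", "100");
      System.out.println(new CommonsHttpSolrServer("http://localhost:8983/solr")
          .query(q).getResponse().get("clusters"));  // response section name as I recall it
    }
  }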


Re: java.lang.RuntimeException: after flush: fdx size mismatch

2009-05-21 Thread James X
Hi Mike,

Documents are web pages: about 20 fields, mostly strings, a couple
of integers, booleans and one HTML field (for the document body content).

I do have a multi-threaded client pushing docs to Solr, so yes, I suppose
that would mean I have several active Solr worker threads.

The only exceptions I have are the RuntimeException flush errors, followed
by a handful (normally 10-20) of LockObtainFailedExceptions, which I
presumed were being caused by the faulty threads dying and failing to
release locks.

Oh wait, I am getting WstxUnexpectedCharException exceptions every now and
then:
SEVERE: com.ctc.wstx.exc.WstxUnexpectedCharException: Illegal character
((CTRL-CHAR, code 8))
 at [row,col {unknown-source}]: [1,26070]
at
com.ctc.wstx.sr.StreamScanner.throwInvalidSpace(StreamScanner.java:675)
at
com.ctc.wstx.sr.BasicStreamReader.readTextSecondary(BasicStreamReader.java:4668)
at
com.ctc.wstx.sr.BasicStreamReader.readCoalescedText(BasicStreamReader.java:4126)
at
com.ctc.wstx.sr.BasicStreamReader.finishToken(BasicStreamReader.java:3701)
at
com.ctc.wstx.sr.BasicStreamReader.safeFinishToken(BasicStreamReader.java:3649)
at
com.ctc.wstx.sr.BasicStreamReader.getText(BasicStreamReader.java:809)
at
org.apache.solr.handler.XmlUpdateRequestHandler.readDoc(XmlUpdateRequestHandler.java:327)

I presumed these were caused by character encoding issues, but haven't
looked into them at all yet.

Thanks again for your help! I'll make some time this afternoon to build some
patched Lucene jars and get you the results.
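
On the "CTRL-CHAR, code 8" errors: those look like raw control characters
(backspace, in this case) getting into the posted XML, which Woodstox rightly
rejects. Probably unrelated to the fdx issue, but scrubbing field values
before building the update XML avoids them. A rough sketch of one common
approach (note it also drops surrogate pairs, so supplementary characters
would need extra handling):

  /**
   * Removes characters that are not legal in XML 1.0 documents
   * (everything outside #x9, #xA, #xD, #x20-#xD7FF, #xE000-#xFFFD).
   */
  public class XmlCharScrubber {
    public static String strip(String s) {
      StringBuilder out = new StringBuilder(s.length());
      for (int i = 0; i < s.length(); i++) {
        char c = s.charAt(i);
        boolean legal = c == 0x9 || c == 0xA || c == 0xD
            || (c >= 0x20 && c <= 0xD7FF)
            || (c >= 0xE000 && c <= 0xFFFD);
        if (legal) {
          out.append(c);
        }
      }
      return out.toString();
    }

    public static void main(String[] args) {
      // the backspace (code 8) from the stack trace above gets removed
      System.out.println(strip("bad\bvalue").equals("badvalue"));
    }
  }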


On Thu, May 21, 2009 at 5:06 AM, Michael McCandless 
luc...@mikemccandless.com wrote:

 Another question: are there any other exceptions in your logs?  Eg
 problems adding certain documents, or anything?

 Mike

 On Wed, May 20, 2009 at 11:18 AM, James X
 hello.nigerian.spamm...@gmail.com wrote:
  Hi Mike, thanks for the quick response:
 
  $ java -version
  java version 1.6.0_11
  Java(TM) SE Runtime Environment (build 1.6.0_11-b03)
  Java HotSpot(TM) 64-Bit Server VM (build 11.0-b16, mixed mode)
 
  I hadn't noticed the 268m trigger for LUCENE-1521 - I'm definitely not
  hitting that yet!
 
  The exception always reports 0 length, but the number of of docs varies,
  heavily weighted towards 1 or two docs. Of the last 130 or so exceptions:
  89 1 docs vs 0 length
  20 2 docs vs 0 length
   9 3 docs vs 0 length
   1 4 docs vs 0 length
   3 5 docs vs 0 length
   2 6 docs vs 0 length
   1 7 docs vs 0 length
   1 9 docs vs 0 length
   1 10 docs vs 0 length
 
  The only unusual thing I can think of that we're doing with Solr is
  aggressively CREATE-ing and UNLOAD-ing cores. I've not been able to spot
 a
  pattern between core admin operations and these exceptions, however...
 
  James
 
  On Wed, May 20, 2009 at 2:37 AM, Michael McCandless 
  luc...@mikemccandless.com wrote:
 
  Hmm... somehow Lucene is flushing a new segment on closing the
  IndexWriter, and thinks 1 doc had been added to the stored fields
  file, yet the fdx file is the wrong size (0 bytes).  This check (
  exception) are designed to prevent corruption from entering the index,
  so it's at least good to see CheckIndex passes after this.
 
  I don't think you're hitting LUCENE-1521: that issue only happens if a
  single segment has more than ~268 million docs.
 
  Which exact JRE version are you using?
 
  When you hit this exception, is it always 1 docs vs 0 length in bytes?
 
  Mike
 
  On Wed, May 20, 2009 at 3:19 AM, James X
  hello.nigerian.spamm...@gmail.com wrote:
   Hello all,I'm running Solr 1.3 in a multi-core environment. There are
 up
  to
   2000 active cores in each Solr webapp instance at any given time.
  
   I've noticed occasional errors such as:
   SEVERE: java.lang.RuntimeException: after flush: fdx size mismatch: 1
  docs
   vs 0 length in bytes of _h.fdx
  at
  
 
 org.apache.lucene.index.StoredFieldsWriter.closeDocStore(StoredFieldsWriter.java:94)
  at
  
 
 org.apache.lucene.index.DocFieldConsumers.closeDocStore(DocFieldConsumers.java:83)
  at
  
 
 org.apache.lucene.index.DocFieldProcessor.closeDocStore(DocFieldProcessor.java:47)
  at
  
 
 org.apache.lucene.index.DocumentsWriter.closeDocStore(DocumentsWriter.java:367)
  at
  
 org.apache.lucene.index.DocumentsWriter.flush(DocumentsWriter.java:567)
  at
   org.apache.lucene.index.IndexWriter.doFlush(IndexWriter.java:3540)
  at
  org.apache.lucene.index.IndexWriter.flush(IndexWriter.java:3450)
  at
  
 org.apache.lucene.index.IndexWriter.closeInternal(IndexWriter.java:1638)
  at
  org.apache.lucene.index.IndexWriter.close(IndexWriter.java:1602)
  at
  org.apache.lucene.index.IndexWriter.close(IndexWriter.java:1578)
  at
   org.apache.solr.update.SolrIndexWriter.close(SolrIndexWriter.java:153)
  
   during commit / optimise operations.
  
   These errors then cause cascading errors during updates on the
 

Re: Solr statistics of top searches and results returned

2009-05-21 Thread Grant Ingersoll


On May 20, 2009, at 4:33 AM, Shalin Shekhar Mangar wrote:


On Wed, May 20, 2009 at 1:31 PM, Plaatje, Patrick 
patrick.plaa...@getronics.com wrote:



At the moment Solr does not have such functionality. I have written a
plugin for Solr though which uses a second Solr core to store/index  
the
searches. If you're interested, send me an email and I'll get you  
the source

for the plugin.


Patrick, this will be a useful addition. However instead of doing  
this with

another core, we can keep running statistics which can be shown on the
statistics page itself. What do you think?


I think you will want some type of persistence mechanism; otherwise you
will end up consuming a lot of resources keeping track of all the
query strings, unless I'm missing something. Either a Lucene index
(Solr core) or the option of embedding a DB. Ideally, it would be
pluggable such that people could choose their storage mechanism. Most
people do this kind of thing offline via log analysis, as logs can grow
quite large quite quickly.





A related approach for showing slow queries was discussed recently.  
There's

an issue open which has more details:

https://issues.apache.org/jira/browse/SOLR-1101

--
Regards,
Shalin Shekhar Mangar.


--
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)  
using Solr/Lucene:

http://www.lucidimagination.com/search
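
To make the second-core idea a bit more concrete -- this is not Patrick's
plugin, just a rough sketch of the logging side, with the core name and field
names invented for illustration -- the searching application (or a custom
component) could push each executed query into a separate "searchlog" core:

  import java.util.Date;
  import org.apache.solr.client.solrj.SolrServer;
  import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
  import org.apache.solr.common.SolrInputDocument;

  public class QueryLogger {
    private final SolrServer logCore;

    public QueryLogger(String logCoreUrl) throws Exception {
      this.logCore = new CommonsHttpSolrServer(logCoreUrl);
    }

    /** Index one executed query together with its hit count. */
    public void log(String queryString, long numFound) throws Exception {
      SolrInputDocument doc = new SolrInputDocument();
      doc.addField("id", System.nanoTime());   // illustrative unique key
      doc.addField("q", queryString);
      doc.addField("numFound", numFound);
      doc.addField("timestamp", new Date());
      logCore.add(doc);
      // commit on a timer rather than per query in practice
    }

    public static void main(String[] args) throws Exception {
      new QueryLogger("http://localhost:8983/solr/searchlog")
          .log("top of the world", 42);
    }
  }

Top searches then become a simple facet on the q field (or a string copy
of it) in that core.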



Re: clustering SOLR-769

2009-05-21 Thread Allahbaksh Asadullah
Hi,
I will try this. When I tried it with a field I declared myself there was
no change. I will check this out and let you know.
Is it possible to specify more than one snippet field, or should I use
copyField to copy two or three fields into a single field and specify that
as the snippet field?
Regards,
Allahbaksh

On Fri, May 22, 2009 at 2:24 AM, Stanislaw Osinski
stanis...@osinski.namewrote:

 Hi.


 I built Solr from SVN today morning. I am using Clustering example. I
 have added my own schema.xml.

 The problem is the even though I change carrot.snippet field from
 features to filecontent the clustering results are not changed a bit.
 Please note features field is also there in my document.

   <str name="carrot.title">name</str>
   <!-- The field to cluster on -->
   <str name="carrot.snippet">features</str>
   <str name="carrot.url">id</str>

 Why I get the same cluster even though I have changed the
 carrot.snippet. Whether there is some problem with my understarnding?


 If you get back to the clustering dir in examples and change

  <str name="carrot.snippet">features</str>

 to

  <str name="carrot.snippet">manu</str>

 do you see any change in clusters?

 Cheers,

 Staszek

 --
 http://carrot2.org




-- 
Allahbaksh Mohammedali Asadullah,
Software Engineering  Technology Labs,
Infosys Technolgies Limited, Electronic City,
Hosur Road, Bangalore 560 100, India.
(Board: 91-80-28520261 | Extn: 73927 | Direct: 41173927.
Fax: 91-80-28520362 | Mobile: 91-9845505322.


getting all rows from SOLRJ client using setRows method

2009-05-21 Thread darniz

Hello 
Is there a way you can get all the results back from Solr when querying
with the SolrJ client?

My gut feeling was that this might work:
query.setRows(-1)

The alternative is to change the configuration XML file, but that is like
hard-coding the configuration, and there I also have to set some specific
number; I can't say "return all rows".

Is there a way to do this through the query?

Thanks
rashid


-- 
View this message in context: 
http://www.nabble.com/getting-all-rows-from-SOLRJ-client-using-setRows-method-tp23662668p23662668.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: getting all rows from SOLRJ client using setRows method

2009-05-21 Thread Ryan McKinley


careful what you ask for...  what if you have a million docs?  will  
you get an OOM?


Maybe a better solution is to run a loop where you grab a bunch of  
docs and then increase the start value.


but you can always use:
query.setRows( Integer.MAX_VALUE )

ryan


On May 21, 2009, at 8:37 PM, darniz wrote:



Hello
is there a way you can get all the results back from SOLR when  
querying

solrJ client

my gut feeling was that this might work
query.setRows(-1)

The way is to change the configuration xml file, but that like hard  
coding
the configuration, and there also i have to set some valid number, i  
cant

say return all rows.

Is there a way to done through query.

Thanks
rashid


--
View this message in context: 
http://www.nabble.com/getting-all-rows-from-SOLRJ-client-using-setRows-method-tp23662668p23662668.html
Sent from the Solr - User mailing list archive at Nabble.com.
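
Along the lines of Ryan's loop suggestion, a minimal SolrJ sketch (the URL,
query and page size are illustrative, and this assumes the 1.3/1.4 SolrJ API):

  import org.apache.solr.client.solrj.SolrQuery;
  import org.apache.solr.client.solrj.SolrServer;
  import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
  import org.apache.solr.common.SolrDocumentList;

  public class PageThroughResults {
    public static void main(String[] args) throws Exception {
      SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
      int pageSize = 500;          // fetch in manageable chunks instead of rows=-1
      long start = 0;
      while (true) {
        SolrQuery q = new SolrQuery("*:*");
        q.setStart((int) start);
        q.setRows(pageSize);
        SolrDocumentList page = server.query(q).getResults();
        // ... process the documents in 'page' here ...
        start += page.size();
        if (page.size() == 0 || start >= page.getNumFound()) {
          break;
        }
      }
    }
  }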





Re: what does the version parameter in the query mean?

2009-05-21 Thread Anshuman Manur
Ah, I see! Thank you so much for the response!

I'm using SolrJ, so I probably don't need to set the XML version, since the
wiki tells me that it uses binary as the default!

On Thu, May 21, 2009 at 10:00 PM, Jay Hill jayallenh...@gmail.com wrote:

 I was interested in this recently and also couldn't find anything on the
 wiki. I found this in the list archive:

 The version parameter determines the XML protocol used in the response.
 Clients are strongly encouraged to ''always'' specify the protocol version,
 so as to ensure that the format of the response they receive does not
 change
 unexpectedly if/when the Solr server is upgraded.

 Here is a link to the archive:
 http://www.mail-archive.com/solr-comm...@lucene.apache.org/msg00518.html

 -Jay


 On Thu, May 21, 2009 at 1:06 AM, Anshuman Manur 
 anshuman_ma...@stragure.com
  wrote:

  Hello all,
 
  I'm using Solr 1.3.0, and when I query my index for solr using the
 admin
  page, the query string in the address bar of my browser reads like this:
 
 
 
 http://localhost:8080/solr/select/?q=solrversion=2.2start=0rows=10indent=on
 
  Now, I don't know what version=2.2 means, and the wiki or the docs don't
  tell me. Could someone enlighten me?
 
  Thank You
  Anshuman Manur
 



lock problem

2009-05-21 Thread Ashish P

Hi, 
The scenario is that I have 2 different Solr instances running at different
locations concurrently. The data location for both instances is the same:
\\hostname\FileServer\CoreTeam\Research\data.
Both instances use EmbeddedSolrServer, and the lock type for both instances is
'single'.

I am getting the following exception:
Cannot overwrite: \\hostname\FileServer\CoreTeam\Research\data\index\_1.fdt
at 
org.apache.lucene.store.FSDirectory.createOutput(FSDirectory.java:440)
at org.apache.lucene.index.FieldsWriter.<init>(FieldsWriter.java:64)
at
org.apache.lucene.index.StoredFieldsWriter.initFieldsWriter(StoredFieldsWriter.java:73)

I tried the 'simple' lock type also, but it shows a timeout exception when
writing to the index.
Please help me out.
Thanks,
Ashish


-- 
View this message in context: 
http://www.nabble.com/lock-problem-tp23663558p23663558.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Regarding Delta-Import Query in DIH

2009-05-21 Thread Noble Paul നോബിള്‍ नोब्ळ्
The last_modified column is just one way. The query has to be
intelligent enough to detect the delta; it doesn't matter how you do
it.

On Fri, May 22, 2009 at 1:32 AM, jayakeerthi s mail2keer...@gmail.com wrote:
 Hi All,

 I understand from the details provided under
 http://wiki.apache.org/solr/DataImportHandler regarding Delta-import that
 there should be an additional column *last_modified* of timestamp type in
 the table.

 Is there any other way/method the same can be achieved without creating the
 additional column *last_modified* in the tables?? please advise.


 Thanks in advance




-- 
-
Noble Paul | Principal Engineer| AOL | http://aol.com


Re: No sanity checks before replicating files?

2009-05-21 Thread Noble Paul നോബിള്‍ नोब्ळ्
Let us see what the desired behavior is.

When s1 comes back up online, s2 must download a fresh copy of the index
from s1, because s2 is the slave and s2 has a newer version of the index
than s1.

Are you suggesting that s2 downloads the index files and then the commit
fails? The code is written as follows:

boolean freshDownloadneeded = myIndexGeneration >= mastersIndexgeneration;

If so, then it should be a problem.

Can you post the stack trace?

On Thu, May 21, 2009 at 11:45 PM, Otis Gospodnetic
otis_gospodne...@yahoo.com wrote:

 Aha, I see.  Perhaps you can post the error message/stack trace?

 As for the sanity check, I bet a call to 
 http://host:port/solr/replication?command=indexversion could be used ensure 
 only newer versions of the index are being pulled.  We'll see what Paul says 
 when he wakes up. :)

 Otis
 --
 Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



 - Original Message 
 From: Damien Tournoud dam...@tournoud.net
 To: solr-user@lucene.apache.org
 Sent: Thursday, May 21, 2009 1:26:30 PM
 Subject: Re: No sanity checks before replicating files?

 Hi Otis,

 Thanks for your answer.

 On Thu, May 21, 2009 at 7:14 PM, Otis Gospodnetic
 wrote:
  Interesting, this is similar to my suggestion to another person I just 
  replied
 to here on solr-user.
  Have you actually run into this problem?  I haven't tried it, but I'd think
 the first next replication (copying index from s1 to s2) would not 
 necessarily
 fail, but would simply overwrite any changes that were made on s2 while it 
 was
 serving as the master.  Is that not what happens?

 No it doesn't. For some reason, Solr download all the files of the
 index, but fails to commit the changes locally. At the next poll, the
 process restarts. Not only does this clogs the network, but it also
 unnecessarily uses resources on the newly promoted slave, until we
 change its configuration.

  If that's what happens, then I think what you'd simply have to do is to:
 
  1) bring s1 back up, but don't make it a master immediately
  2) take away the master role from s2
  3) make s1 copy the index from s2, since s2 might have a more up to date 
  index
 now
  4) make s1 the master

 Once s2 is the master, we want it to stay this way. We will reassign
 s1 as the slave at a later stage, when resources allows. What worries
 me is that strange behavior of Solr 1.4 replication when the slave
 index is fresher then the master one.

 Damien





-- 
-
Noble Paul | Principal Engineer| AOL | http://aol.com


Re: How to index large set data

2009-05-21 Thread Noble Paul നോബിള്‍ नोब्ळ्
Check the status page of DIH and see if it is working properly, and
if yes, what is the rate of indexing?

On Thu, May 21, 2009 at 11:48 AM, Jianbin Dai djian...@yahoo.com wrote:

 Hi,

 I have about 45GB xml files to be indexed. I am using DataImportHandler. I 
 started the full import 4 hours ago, and it's still running
 My computer has 4GB memory. Any suggestion on the solutions?
 Thanks!

 JB








-- 
-
Noble Paul | Principal Engineer| AOL | http://aol.com


Re: How to index large set data

2009-05-21 Thread Jianbin Dai

Hi Paul,

Thank you so much for answering my questions. It really helped.
After some adjustment, basically setting mergeFactor to 1000 from the default 
value of 10, I could finish the whole job in 2.5 hours. I checked that while it 
was running, only around 18% of memory was being used, and VIRT was always 
1418m. I am thinking it may be restricted by the JVM memory setting. But I run 
the data import command through the web, i.e.,
http://host:port/solr/dataimport?command=full-import, so how can I set the 
memory allocation for the JVM?
Thanks again!

JB

--- On Thu, 5/21/09, Noble Paul നോബിള്‍  नोब्ळ् noble.p...@corp.aol.com wrote:

 From: Noble Paul നോബിള്‍  नोब्ळ् noble.p...@corp.aol.com
 Subject: Re: How to index large set data
 To: solr-user@lucene.apache.org
 Date: Thursday, May 21, 2009, 9:57 PM
 check the status page of DIH and see
 if it is working properly. and
 if, yes what is the rate of indexing
 
 On Thu, May 21, 2009 at 11:48 AM, Jianbin Dai djian...@yahoo.com
 wrote:
 
  Hi,
 
  I have about 45GB xml files to be indexed. I am using
 DataImportHandler. I started the full import 4 hours ago,
 and it's still running
  My computer has 4GB memory. Any suggestion on the
 solutions?
  Thanks!
 
  JB
 
 
 
 
 
 
 
 
 -- 
 -
 Noble Paul | Principal Engineer| AOL | http://aol.com
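
(A note on the JVM question above: since DIH runs inside the Solr webapp, its
heap is whatever the servlet container was started with, so it is set on the
Java command line that starts Solr rather than through the dataimport URL.
With the bundled Jetty example it would be something along these lines, with
illustrative sizes:

  java -Xms512m -Xmx2048m -jar start.jar
)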
 






Re: How to index large set data

2009-05-21 Thread Noble Paul നോബിള്‍ नोब्ळ्
What is the total no. of docs created? I guess it may not be memory
bound; indexing is mostly an IO-bound operation. You may be able to
get better performance if an SSD (solid state disk) is used.

On Fri, May 22, 2009 at 10:46 AM, Jianbin Dai djian...@yahoo.com wrote:

 Hi Paul,

 Thank you so much for answering my questions. It really helped.
 After some adjustment, basically setting mergeFactor to 1000 from the default 
 value of 10, I can finished the whole job in 2.5 hours. I checked that during 
 running time, only around 18% of memory is being used, and VIRT is always 
 1418m. I am thinking it may be restricted by JVM memory setting. But I run 
 the data import command through web, i.e.,
 http://host:port/solr/dataimport?command=full-import, how can I set the 
 memory allocation for JVM?
 Thanks again!

 JB

 --- On Thu, 5/21/09, Noble Paul നോബിള്‍  नोब्ळ् noble.p...@corp.aol.com 
 wrote:

 From: Noble Paul നോബിള്‍  नोब्ळ् noble.p...@corp.aol.com
 Subject: Re: How to index large set data
 To: solr-user@lucene.apache.org
 Date: Thursday, May 21, 2009, 9:57 PM
 check the status page of DIH and see
 if it is working properly. and
 if, yes what is the rate of indexing

 On Thu, May 21, 2009 at 11:48 AM, Jianbin Dai djian...@yahoo.com
 wrote:
 
  Hi,
 
  I have about 45GB xml files to be indexed. I am using
 DataImportHandler. I started the full import 4 hours ago,
 and it's still running
  My computer has 4GB memory. Any suggestion on the
 solutions?
  Thanks!
 
  JB
 
 
 
 
 



 --
 -
 Noble Paul | Principal Engineer| AOL | http://aol.com









-- 
-
Noble Paul | Principal Engineer| AOL | http://aol.com