Re: Problem with caps and star symbol

2011-05-30 Thread Saumitra Chowdhury
I am sending some XML to illustrate the scenario.

Indexed term = ROLE_DELETE
Search Term = roledelete
<response>
<lst name="responseHeader">
  <int name="status">0</int>
  <int name="QTime">4</int>
  <lst name="params">
    <str name="indent">on</str>
    <str name="start">0</str>
    <str name="q">name : roledelete</str>
    <str name="version">2.2</str>
    <str name="rows">10</str>
  </lst>
</lst>
<result name="response" numFound="1" start="0">


Indexed term = ROLE_DELETE
Search Term = role
<response>
<lst name="responseHeader">
  <int name="status">0</int>
  <int name="QTime">5</int>
  <lst name="params">
    <str name="indent">on</str>
    <str name="start">0</str>
    <str name="q">name : role</str>
    <str name="version">2.2</str>
    <str name="rows">10</str>
  </lst>
</lst>
<result name="response" numFound="1" start="0">
  <doc>
    <str name="creationDate">Mon May 30 13:09:14 BDST 2011</str>
    <str name="displayName">Global Role for Deletion</str>
    <str name="id">role:9223372036854775802</str>
    <str name="lastModifiedDate">Mon May 30 13:09:14 BDST 2011</str>
    <str name="name">ROLE_DELETE</str>
  </doc>
</result>
</response>



Indexed term = ROLE_DELETE
Search Term = role*
<response>
<lst name="responseHeader">
  <int name="status">0</int>
  <int name="QTime">4</int>
  <lst name="params">
    <str name="indent">on</str>
    <str name="start">0</str>
    <str name="q">name : role*</str>
    <str name="version">2.2</str>
    <str name="rows">10</str>
  </lst>
</lst>
<result name="response" numFound="1" start="0">
  <doc>
    <str name="creationDate">Mon May 30 13:09:14 BDST 2011</str>
    <str name="displayName">Global Role for Deletion</str>
    <str name="id">role:9223372036854775802</str>
    <str name="lastModifiedDate">Mon May 30 13:09:14 BDST 2011</str>
    <str name="name">ROLE_DELETE</str>
  </doc>
</result>
</response>



Indexed term = ROLE_DELETE
Search Term = Role*

<response>
<lst name="responseHeader">
  <int name="status">0</int>
  <int name="QTime">4</int>
  <lst name="params">
    <str name="indent">on</str>
    <str name="start">0</str>
    <str name="q">name : Role*</str>
    <str name="version">2.2</str>
    <str name="rows">10</str>
  </lst>
</lst>
<result name="response" numFound="0" start="0"/>
</response>



Indexed term = ROLE_DELETE
Search Term = ROLE_DELETE*

<response>
<lst name="responseHeader">
  <int name="status">0</int>
  <int name="QTime">4</int>
  <lst name="params">
    <str name="indent">on</str>
    <str name="start">0</str>
    <str name="q">name : ROLE_DELETE*</str>
    <str name="version">2.2</str>
    <str name="rows">10</str>
  </lst>
</lst>
<result name="response" numFound="0" start="0"/>
</response>

I am also attaching an analysis HTML page.
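
By the way, wildcard and prefix queries are not passed through the analysis
chain in this Solr version, which matches the results above: role* matches the
indexed token roledelete, while Role* and ROLE_DELETE* keep their caps and
underscore and therefore match nothing. A client-side workaround is to apply
the same normalization as the index analyzer before appending the star; a
minimal SolrJ sketch (the field name, URL and normalization rule here are
assumptions, not part of the original setup):

import java.util.Locale;

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class WildcardPrefixSearch {
    public static void main(String[] args) throws Exception {
        CommonsHttpSolrServer server =
            new CommonsHttpSolrServer("http://localhost:8983/solr");
        String userInput = "ROLE_DELETE";
        // Wildcard terms skip analysis at query time, so mimic the
        // index-side chain (lowercase + drop the delimiters) ourselves.
        String prefix =
            userInput.toLowerCase(Locale.ROOT).replaceAll("[^a-z0-9]", "");
        QueryResponse rsp = server.query(new SolrQuery("name:" + prefix + "*"));
        System.out.println("numFound = " + rsp.getResults().getNumFound());
    }
}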



On Mon, May 30, 2011 at 7:19 AM, Erick Erickson erickerick...@gmail.com wrote:

 I'd start by looking at the analysis page in the Solr admin UI. That
 will give you an idea of the transformations the various steps carry
 out; it's invaluable!

 Best
 Erick
 On May 26, 2011 12:53 AM, Saumitra Chowdhury 
 saumi...@smartitengineering.com wrote:
 Hi all,
 In my schema.xml I am using WordDelimiterFilterFactory,
 LowerCaseFilterFactory and StopFilterFactory for the index analyzer, plus an
 extra SynonymFilterFactory for the query analyzer. I am indexing a field
 named 'name'. Now, if a value in all caps like NAME_BILL is indexed, I am
 able to get this document as a search result with the terms "name_bill",
 "NAME_BILL", "namebill", "namebill*" and "nameb*". But for terms like
 "NAME_BILL*", "name_bill*" and "NAME*" the result does not show this
 document. Can anyone please explain why this is happening? In fact the
 star (*) gives no results in many cases, especially when it is used after
 the full value of a field.
 
 A portion of my schema is given below.
 
 <fieldType name="text_ws" class="solr.TextField" positionIncrementGap="100">
   <analyzer>
     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
   </analyzer>
 </fieldType>

 <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
   <analyzer type="index">
     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
     <filter class="solr.WordDelimiterFilterFactory" generateWordParts="0"
             generateNumberParts="0" catenateWords="1" catenateNumbers="1"
             catenateAll="0"/>
     <filter class="solr.LowerCaseFilterFactory"/>
     <filter class="solr.StopFilterFactory" ignoreCase="true"
             words="stopwords.txt" enablePositionIncrements="true"/>
   </analyzer>
   <analyzer type="query">
     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
     <filter class="solr.WordDelimiterFilterFactory" generateWordParts="0"
             generateNumberParts="0" catenateWords="1" catenateNumbers="1"
             catenateAll="0"/>
     <filter class="solr.LowerCaseFilterFactory"/>
     <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
             ignoreCase="true" expand="true"/>
     <filter class="solr.StopFilterFactory" ignoreCase="true"
             words="stopwords.txt" enablePositionIncrements="true"/>
   </analyzer>
 </fieldType>

 <fieldType name="textTight" class="solr.TextField"
            positionIncrementGap="100">
   <analyzer>
     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
     <filter class="solr.WordDelimiterFilterFactory" generateWordParts="0"
             generateNumberParts="0" catenateWords="1" catenateNumbers="1"
             catenateAll="0"/>
     <filter class="solr.LowerCaseFilterFactory"/>
     <filter

Re: Bulk indexing, UpdateProcessor overwriteDupes and poor IO performances

2011-05-30 Thread Tanguy Moal

Hello,

Sorry for re-posting this, but it seems my message got lost in the
mailing list's stream without catching anyone's attention... =D


In short: has anyone experienced dramatic indexing slowdowns during
large bulk imports with overwriteDupes turned on and a fairly high
duplicate rate (around 4-8x)?


It seems to produce a lot of deletions, which in turn appear to make the
merging of segments pretty slow by greatly increasing the number of
small read operations occurring simultaneously with the regular large
write operations of the merge. Combined with the poor IO performance of
a commodity SATA drive, indexing takes ages.


I temporarily bypassed that limitation by disabling the overwriting of
duplicates, but that changes the way I query the index, requiring me
to turn on field collapsing at search time.
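
For context, the relevant configuration is a SignatureUpdateProcessorFactory
chain roughly like the following sketch (the field names are illustrative,
overwriteDupes=false reflects the workaround described above, and the chain
is hooked to the update handler via the update.processor / update.chain
request parameter depending on the Solr version):

<updateRequestProcessorChain name="dedupe">
  <processor class="solr.processor.SignatureUpdateProcessorFactory">
    <bool name="enabled">true</bool>
    <str name="signatureField">signature</str>
    <bool name="overwriteDupes">false</bool>
    <str name="fields">name,content</str>
    <str name="signatureClass">solr.processor.MD5Signature</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>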


Is this a known limitation?

Does anyone have a few hints on how to optimize the handling of
index-time deduplication?


More details on my setup and the state of my understanding are in my
previous message, quoted hereafter.


Thank you very much in advance.

Regards,

Tanguy

On 05/25/11 15:35, Tanguy Moal wrote:

Dear list,

I'm posting here after some unsuccessful investigations.
In my setup I push documents to Solr using the StreamingUpdateSolrServer.

I'm sending a comfortably large initial amount of documents (~250M) and
wanted to overwrite duplicated documents at index time, during the update,
taking advantage of the UpdateProcessorChain.


At the beginning of the indexing stage everything is quite fast;
documents arrive at a rate of about 1000 docs/s.
The only extra processing during the import is the computation of a
couple of hashes used to uniquely identify documents by their content,
using both stock (MD5Signature) and custom (derived from
Lookup3Signature) update processors.

I send a commit command to the server every 500k documents sent.

At first, the server is CPU bound. After a short while (~10 minutes),
the rate at which documents are received starts to fall dramatically,
and the server becomes IO bound.
I first thought this was the normal speed decrease during a commit,
while my push client waits for the flush to occur. That would have
been a normal slowdown.


What caught my attention was that, unexpectedly, the server was
performing a lot of small reads, far more than the number of writes,
which appear to be larger.
The combination of the many small reads with the constant amount of
bigger writes seems to create a lot of IO contention on my commodity
SATA drive, and the ETA of my index build started to increase scarily =D


I then restarted the JVM with JMX enabled so I could investigate a
little more, and realized that the UpdateHandler was performing many
reads while processing the update request.


Are there any known limitations around the UpdateProcessorChain when
overwriteDupes is set to true?
I turned it off, which of course breaks the intent of my index, but
it is useful for comparison purposes.


That did the trick, indexing is fast again, even with the periodic 
commits.


I therefore have two questions, an interesting first one and a boring
second one:


1 / What's the workflow of the UpdateProcessorChain when one or more
processors have overwriting of duplicates turned on? What happens
under the hood?


I tried to answer that myself by looking at DirectUpdateHandler2, and my
understanding stopped at the following:

- The document is added to the Lucene IndexWriter.
- The duplicates are deleted from the Lucene IndexWriter.

The dark magic I couldn't understand seems to occur around the idTerm
and updateTerm things in the addDoc method. The deletions seem to be
buffered somewhere; I just didn't get it :-)


I might be wrong, since I didn't read the code more than that, but the
crux might be how Solr handles deletions, which is something still
unclear to me. In any case, a lot of reads seem to occur for that
precise task, and it tends to produce a lot of IO, killing indexing
performance when overwriteDupes is on. I don't even understand why so
many read operations occur at this stage, since my process has a
comfortable amount of RAM (Xms=Xmx=8GB), with only 4.5GB used so far.


Any help, recommendation or idea is welcome :-)

2 / In case there isn't a simple fix for this, I'll have to live with
duplicates in my index. I don't mind, since Solr offers a great
grouping feature, which I already use in some other applications. The
only thing I don't know yet: if I rely on grouping at search time, in
combination with the Stats component (which is the intent of that
index), and limit the results to 1 document per group, will the
computed statistics take those duplicates into account or not?
In short, how well does the Stats component behave when combined with
hits collapsing?


I had first implemented my solution using overwriteDupes

Re: Problem with spellchecking, dont want multiple request to SOLR

2011-05-30 Thread Jan Høydahl
Hi,

Define two searchComponents with different names. Then refer to both in 
last-components in your Search Request Handler config.
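
Something like the following sketch, reusing the two dictionaries from the
config quoted below (the component and handler names are illustrative, and
your usual spellcheck.* defaults still apply):

<searchComponent name="spellcheck_what" class="solr.SpellCheckComponent">
  <lst name="spellchecker">
    <str name="name">spell_what</str>
    <str name="field">spell_what</str>
    <str name="buildOnOptimize">true</str>
    <str name="spellcheckIndexDir">spellchecker_what</str>
  </lst>
</searchComponent>

<searchComponent name="spellcheck_where" class="solr.SpellCheckComponent">
  <lst name="spellchecker">
    <str name="name">spell_where</str>
    <str name="field">spell_where</str>
    <str name="buildOnOptimize">true</str>
    <str name="spellcheckIndexDir">spellchecker_where</str>
  </lst>
</searchComponent>

<requestHandler name="/search" class="solr.SearchHandler">
  <arr name="last-components">
    <str>spellcheck_what</str>
    <str>spellcheck_where</str>
  </arr>
</requestHandler>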

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

On 27. mai 2011, at 10.01, roySolr wrote:

 Mm, OK. I configured 2 spellcheckers:
 
 <searchComponent name="spellcheck" class="solr.SpellCheckComponent">
   <lst name="spellchecker">
     <str name="name">spell_what</str>
     <str name="field">spell_what</str>
     <str name="buildOnOptimize">true</str>
     <str name="spellcheckIndexDir">spellchecker_what</str>
   </lst>
   <lst name="spellchecker">
     <str name="name">spell_where</str>
     <str name="field">spell_where</str>
     <str name="buildOnOptimize">true</str>
     <str name="spellcheckIndexDir">spellchecker_where</str>
   </lst>
 </searchComponent>
 
 How can I enable it in my search request handler and search both in one
 request?
 



Re: wildcards and German umlauts

2011-05-30 Thread Jan Høydahl
Hi,

Agree that this is annoying for foreign languages. I get the idea behind the 
original behaviour, but there could be more elegant ways of handling it. It 
would make sense to always run the CharFilters. Perhaps a mechanism where 
TokenFilters can be tagged for exclusion from wildcard terms would be an idea. 
That way we can skip stemming, synonym and phonetic for wildcard terms, but 
still do lowercasing and characterNormalization.

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

On 29. mai 2011, at 19.24, mdz-munich wrote:

 Ah, NOW I got it. It's not a bug, it's a feature.

 But that would mean that every character manipulation (e.g.
 char mapping/replacement, the Porter stemmer in some cases ...) would cause
 a wildcard query to fail. That's too bad.

 But why? What's the problem with passing the prefix through the
 analyzer/filter chain?
 
 Greetz,
 
 Sebastian
 



n-gram speed

2011-05-30 Thread Denis Kuzmenok
I have a database with an n-gram field, about 5 million documents. QTime
is about 200-1000 ms; the database is not optimized because it must answer
queries at all times and the data are updated often. Is this normal?
Solr: 3.1, java -Xms2048M -Xmx4096M
Server: i7, 12GB




collapse component with pivot faceting

2011-05-30 Thread Isha Garg

Hi All!

 Can anyone tell me how pivot faceting works in combination with field
collapsing?

Please guide me in this respect.


Thanks!
Isha Garg


SOLR-1155 on 3.1

2011-05-30 Thread Ofer Fort
Hey all,
In the last comment on SOLR-1155, Jayson Minard says (
https://issues.apache.org/jira/browse/SOLR-1155?focusedCommentId=13019955&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13019955
):
"I'll look at updating this for 3.1"
Was it integrated into 3.1? If not, is there a patch one can use?
thanks


Can we stream binary data with StreamingUpdateSolrServer ?

2011-05-30 Thread pravesh
Hi,

I'm using StreamingUpdateSolrServer to post a batch of content to Solr 1.4.1.
By looking at the StreamingUpdateSolrServer code, it looks like it only
allows the content to be streamed in XML format.

Can we use it to stream data in binary format?





Re: n-gram speed

2011-05-30 Thread Tor Henning Ueland
2011/5/30 Denis Kuzmenok forward...@ukr.net:
 I have a database with an n-gram field, about 5 million documents. QTime
 is about 200-1000 ms; the database is not optimized because it must answer
 queries at all times and the data are updated often. Is this normal?
 Solr: 3.1, java -Xms2048M -Xmx4096M
 Server: i7, 12GB

Start by optimizing it; it won't stop working due to an optimize. Other
vital info would be the size of the index, the disk type used, etc. (SSD,
SATA, IDE..)
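
For example, against a stock install (URL and core name are assumptions):

curl "http://localhost:8983/solr/update?optimize=true"

or, equivalently, post an <optimize/> message to the update handler.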

-- 
Mvh
Tor Henning Ueland


DataImportHandler

2011-05-30 Thread adpablos
Hi,

I've tried to install the DataImportHandler, but I have some problems when
starting up Solr.


GRAVE: org.apache.solr.common.SolrException: Error Instantiating Request
Handler, 
org.apache.solr.handler.dataimport.DataImportHandler is not a
org.apache.solr.request.SolrRequestHandler

This is the log.

I have

  <requestHandler name="/dataimport"
                  class="org.apache.solr.handler.dataimport.DataImportHandler">
    <lst name="defaults">
      <str name="config">db-data-config.xml</str>
    </lst>
  </requestHandler>

in my solrconfig.xml

I'm working in a Java project, and in my Eclipse project I can write
something like this without problems:
SolrRequestHandler srh = new DataImportHandler();

Sorry about my English, and thank you in advance.



is replication eating up OldGen space

2011-05-30 Thread Bernd Fehling

Dear list,
after switching from FAST to Solr I get the first _real_ data.
This includes search times, memory consumption, perfomance of solr,...

What I have noticed so far is that something eats up my OldGen, and
I assume it might be replication.

Current Data:
one master - indexing only
two slaves - search only
over 28 million docs
single instance
single core
index size 140g
current heap size 16g

After startup I have about 4g of heap in use and about 3.5g of OldGen.
After one week and some replications, OldGen is filled close to 100 percent.
If I start an optimize under this condition I get an OOM of the heap.
So my assumption is that something is eating up my heap.

Any idea how to trace this down?

Maybe a memory leak somewhere?
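
One generic way to see what is piling up in OldGen is a class histogram from
the standard JDK tooling (the PID below is a placeholder), plus GC logging
across a week of replications to check whether OldGen ever shrinks:

jmap -histo:live <solr-pid> | head -30

java -verbose:gc -XX:+PrintGCDetails -Xloggc:gc.log ...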

Best regards
Bernd

--
*
Bernd Fehling                     Universitätsbibliothek Bielefeld
Dipl.-Inform. (FH)                Universitätsstr. 25
Tel. +49 521 106-4060             Fax. +49 521 106-4052
bernd.fehl...@uni-bielefeld.de    33615 Bielefeld

BASE - Bielefeld Academic Search Engine - www.base-search.net
*


Spellcheck component not returned with numeric queries

2011-05-30 Thread Markus Jelsma
Hi,

The spell check component's output is not written when sending queries that 
consist of numbers only. Clients depending on the availability of the 
spellcheck output need to check if the output is actually there.

This is with a very recent Solr 3.x checkout. Is this a feature or a bug?
Should I file an issue?

Cheers,
-- 
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350


Re: return unaltered complete multivalued fields with Highlighted results

2011-05-30 Thread alexei
Thank you for the reply, Erick.
I can return the stored content, but I would like to show the highlighted
results.
With multivalued fields there seems to be some sorting of highlighted results
(in order of importance?) going on.
The problem is:
1 - I could not find a way to keep the original order of my text.
2 - I could not display all of the values in my multivalued field.

So if I have a multivalued field with four values:
value1
value2 with text
value3 
value4 and something

and the search is: value2 something

the highlighted result would be:
value2 with text
value4 and something

value1 and value3 will be skipped completely. When a field is not
multivalued everything works as advertised.

Any suggestions? 

Regards,
Alexei



Solr 3.1 commit errors

2011-05-30 Thread Denis Kuzmenok
After a restart I get these errors every time I do a commit via post.jar.

Config: multicore / 5 cores, Solr 3.1

Lock obtain timed out:
SimpleFSLock@/home/ava/solr/example/multicore/context/data/index/write.lock
org.apache.lucene.store.LockObtainFailedException: Lock obtain timed out:
SimpleFSLock@/home/ava/solr/example/multicore/context/data/index/write.lock
  at org.apache.lucene.store.Lock.obtain(Lock.java:84)
  at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:1097)
  at org.apache.solr.update.SolrIndexWriter.<init>(SolrIndexWriter.java:83)
  at org.apache.solr.update.UpdateHandler.createMainIndexWriter(UpdateHandler.java:102)
  at org.apache.solr.update.DirectUpdateHandler2.openWriter(DirectUpdateHandler2.java:174)
  at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:222)
  at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:61)
  at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:147)
  at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:77)
  at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:55)
  at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
  at org.apache.solr.core.SolrCore.execute(SolrCore.java:1360)
  at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356)
  at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252)
  at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
  at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
  at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
  at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
  at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
  at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
  at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
  at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
  at org.mortbay.j

I tried to Google a bit, but without any luck...



Re: return unaltered complete multivalued fields with Highlighted results

2011-05-30 Thread lboutros
Hi Alexei,

We have the same issue/behavior.
The highlighting component fragments the fields to highlight and chooses the
best fragments to be returned and highlighted.
You can return all fragments with the maximum size for each one, but it will
never return fragments with a score of 0, i.e. without any matching words.
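
For example, something along these lines (the field name is illustrative):

http://localhost:8983/solr/select?q=value2+something&hl=true&hl.fl=myfield&hl.snippets=10&hl.fragsize=0

where hl.snippets raises the number of fragments returned and hl.fragsize=0
asks for whole field values instead of fragments; per the caveat above,
values without any matching words are still dropped.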

To return the whole multivalued field, the highlighting component needs to
be modified for this specific case.
That is something we should do in the next few weeks.

If I missed something, I would be happy to find another solution too :)

Ludovic.

-
Jouve
France.


Re: DataImportHandler

2011-05-30 Thread Jeffrey Chang
I faced the same problem before; in my case it was because a parent
classloader had loaded the DataImportHandler class instead of using the
SolrResourceLoader's delegated classloader.

How are you starting your Solr? Via Eclipse? If you start Solr from the
command line, do you encounter the same issue?
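
For what it's worth, the usual fix is to make sure the DIH jar is loaded only
through Solr's resource loader, e.g. via a lib directive in solrconfig.xml
(the dir value is an assumption for your layout; dropping the jar into
solr.home/lib works too), rather than putting it on the webapp or system
classpath:

<lib dir="/path/to/solr/dist" regex="apache-solr-dataimporthandler-.*\.jar" />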



On May 30, 2011, at 9:28 PM, adpablos adpab...@molinodeideas.es wrote:

 Hi,
 
 I've tried to install the DataImportHandler, but I have some problems when
 starting up Solr.
 
 
 GRAVE: org.apache.solr.common.SolrException: Error Instantiating Request
 Handler, 
 org.apache.solr.handler.dataimport.DataImportHandler is not a
 org.apache.solr.request.SolrRequestHandler
 
 This is the log.
 
 I have
 
  <requestHandler name="/dataimport"
                  class="org.apache.solr.handler.dataimport.DataImportHandler">
    <lst name="defaults">
      <str name="config">db-data-config.xml</str>
    </lst>
  </requestHandler>
 
 in my solrconfig.xml
 
 I'm working in a Java project, and in my Eclipse project I can write
 something like this without problems:
 SolrRequestHandler srh = new DataImportHandler();
 
 Sorry about my English, and thank you in advance.
 


Re: n-gram speed

2011-05-30 Thread Otis Gospodnetic
Denis,

Also, what are your documents and queries like? Maybe give a few examples
so we can help.

Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/



- Original Message 
 From: Tor Henning Ueland tor.henn...@gmail.com
 To: solr-user@lucene.apache.org
 Sent: Mon, May 30, 2011 8:40:34 AM
 Subject: Re: n-gram speed
 
 2011/5/30 Denis Kuzmenok forward...@ukr.net:
  I have a database with an n-gram field, about 5 million documents. QTime
  is about 200-1000 ms; the database is not optimized because it must answer
  queries at all times and the data are updated often. Is this normal?
  Solr: 3.1, java -Xms2048M -Xmx4096M
  Server: i7, 12GB
 
 Start by optimizing it; it won't stop working due to an optimize. Other
 vital info would be the size of the index, the disk type used, etc. (SSD,
 SATA, IDE..)
 
 -- 
 Mvh
 Tor  Henning Ueland
 


Re: Can we stream binary data with StreamingUpdateSolrServer ?

2011-05-30 Thread Otis Gospodnetic
I'm not looking at the source code, but this doesn't sound right.  I think it 
uses javabin.
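
If I recall correctly, with SolrJ you can switch the update payload to
javabin through a request writer, along these lines (a sketch; worth
verifying that your SolrJ version, and StreamingUpdateSolrServer in
particular, honors the writer -- older 1.4 builds reportedly hardcoded XML):

import org.apache.solr.client.solrj.impl.BinaryRequestWriter;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

public class BinaryUpdates {
    public static void main(String[] args) throws Exception {
        CommonsHttpSolrServer server =
            new CommonsHttpSolrServer("http://localhost:8983/solr");
        // Send update requests as javabin instead of the default XML
        server.setRequestWriter(new BinaryRequestWriter());
    }
}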

Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/



- Original Message 
 From: pravesh suyalprav...@yahoo.com
 To: solr-user@lucene.apache.org
 Sent: Mon, May 30, 2011 8:40:28 AM
 Subject: Can we stream binary data with StreamingUpdateSolrServer ?
 
 Hi,
 
  I'm using StreamingUpdateSolrServer to post a batch of content to Solr 1.4.1.
  By looking at the StreamingUpdateSolrServer code, it looks like it only
  allows the content to be streamed in XML format.

  Can we use it to stream data in binary format?
 
 
 
 


Re: SOLR-1155 on 3.1

2011-05-30 Thread Otis Gospodnetic
I think the answers to both are negative.

Vote for it!

Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/



- Original Message 
 From: Ofer Fort ofer...@gmail.com
 To: solr-user@lucene.apache.org
 Sent: Mon, May 30, 2011 7:50:15 AM
 Subject: SOLR-1155 on 3.1
 
  Hey all,
  In the last comment on SOLR-1155, Jayson Minard says (
  https://issues.apache.org/jira/browse/SOLR-1155?focusedCommentId=13019955&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13019955
  ):
  "I'll look at updating this for 3.1"
  Was it integrated into 3.1? If not, is there a patch one can use?
  thanks
 


Explain the difference in similarity and similarityProvider

2011-05-30 Thread Brian Lamb
I'm looking over the patch notes from
https://issues.apache.org/jira/browse/SOLR-2338 and I do not understand the
difference between

<similarity class="com.example.solr.CustomSimilarityFactory">
  <str name="paramkey">param value</str>
</similarity>

and

<similarityProvider
    class="org.apache.solr.schema.CustomSimilarityProviderFactory">
  <str name="echo">is there an echo?</str>
</similarityProvider>

When would I use one over the other?

Thanks,

Brian Lamb


Solr Dismax bf bq vs. q:{boost ...}

2011-05-30 Thread chazzuka
I tried to do this:

#1. search phrases in title^3 and text^1
#2. based on the result of #1, add a boost for field closed:0^2
#3. based on the result of #2, boost based on last_modified
 
and I tried it like this:

/solr/select
?q={!boost b=$dateboost v=$qq defType=dismax}
&dateboost=recip(ms(NOW/HOUR,modified),8640,2,1)
&qq=video
&qf=title^3+text
&pf=title^3+text
&bq=closed:0^2
&debugQuery=true

Then I tried differently, by changing solrconfig like this:

<str name="qf">title^3 text</str>
<str name="pf">title^3 text</str>
<str name="bf">recip(ms(NOW/HOUR,modified),8640,2,1)</str>
<str name="bq">closed:0^2</str>

with the query:

/solr/select?q=video&debugQuery=true

Both seem to give wrong results. Does anyone have an idea how to accomplish
those tasks?

Thanks in advance





Indexing files Solr cell and Amazon S3

2011-05-30 Thread Greg Georges
Hello everyone,

We have our infrastructure on Amazon cloud servers, and we use the S3 file
system. We need to index files using Solr Cell. From what I have read, we need
to stream files to Solr in order for it to extract the metadata into the
index. If we stream data through a public URL, there will be costs associated
with the transfer on the Amazon cloud. We plan to have a directory with the
files; is it possible to tell Solr to add documents from a specific folder
location, or must we stream them into Solr? In SolrJ I see that the only
option is streaming. Thank you very much.

Greg


Replication Error - Index fetch failed - File Not Found OverlappingFileLockException

2011-05-30 Thread Renaud Delbru

Hi,

For months we were using Apache Solr 3.1.0 snapshots without problems.
Recently, we upgraded our index to Apache Solr 3.1.0,
and also moved to a multi-core infrastructure (4 cores per node, each
core having its own index).


We found that one of the index slaves started to show failures, i.e.,
query errors. By looking at the log, we observed some errors during the
latest snappull, due to two types of exceptions:

- java.io.FileNotFoundException: File does not exist ...
and
- java.nio.channels.OverlappingFileLockException: null

Then, after the failed pull, the index started to show some index-related
failures:


java.io.IOException: read past EOF at 
org.apache.lucene.store.BufferedIndexInput.refill(BufferedIndexInput.java:207)]


However, after manually restarting the node, everything went back to normal.

You can find a more detailed log at [1].

We are afraid of seeing this problem occur again. Do you have any idea what
the cause might be, or a solution to avoid such a problem?


[1] http://pastebin.com/vbnyrUgJ

Thanks in advance
--
Renaud Delbru


Resolved- Re: Replication Error - Index fetch failed - File Not Found OverlappingFileLockException

2011-05-30 Thread Renaud Delbru

Hi,

I found the problem myself.
The reason was a bad deployment of Solr on Tomcat. Two instances of
Solr were instantiated instead of one. The two instances were managing
the same indexes, and were therefore trying to write at the same time.


My apologies for the noise created on the mailing list,
--
Renaud Delbru

On 30/05/11 21:52, Renaud Delbru wrote:

Hi,

For months we were using Apache Solr 3.1.0 snapshots without problems.
Recently, we upgraded our index to Apache Solr 3.1.0,
and also moved to a multi-core infrastructure (4 cores per node, each
core having its own index).

We found that one of the index slaves started to show failures, i.e.,
query errors. By looking at the log, we observed some errors during the
latest snappull, due to two types of exceptions:
- java.io.FileNotFoundException: File does not exist ...
and
- java.nio.channels.OverlappingFileLockException: null

Then, after the failed pull, the index started to show some index-related
failures:

java.io.IOException: read past EOF at
org.apache.lucene.store.BufferedIndexInput.refill(BufferedIndexInput.java:207)]


However, after manually restarting the node, everything went back to
normal.

You can find a more detailed log at [1].

We are afraid of seeing this problem occur again. Do you have any idea what
the cause might be, or a solution to avoid such a problem?

[1] http://pastebin.com/vbnyrUgJ

Thanks in advance




Re: Indexing files Solr cell and Amazon S3

2011-05-30 Thread Jan Høydahl
Hi,

You can use the stream.file parameter to tell Solr to read the file from
local disk instead of streaming it across the network:
http://lucene.472066.n3.nabble.com/Example-of-using-quot-stream-file-quot-to-post-a-binary-file-to-solr-td781172.html
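
A minimal sketch (the path, literal id and core URL are assumptions; remote
streaming must be enabled in solrconfig.xml, and the file must be readable
from the machine running Solr):

curl "http://localhost:8983/solr/update/extract?stream.file=/mnt/data/docs/report.pdf&literal.id=doc1&commit=true"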

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

On 30. mai 2011, at 22.46, Greg Georges wrote:

 Hello everyone,
 
 We have our infrastructure on Amazon cloud servers, and we use the S3 file
 system. We need to index files using Solr Cell. From what I have read, we
 need to stream files to Solr in order for it to extract the metadata into
 the index. If we stream data through a public URL, there will be costs
 associated with the transfer on the Amazon cloud. We plan to have a
 directory with the files; is it possible to tell Solr to add documents from
 a specific folder location, or must we stream them into Solr? In SolrJ I
 see that the only option is streaming. Thank you very much.
 
 Greg