Re: how to disable segmentation when querying?

2012-06-28 Thread wangjing
The text in a Field must be analyzed before it can be indexed.


On Thu, Jun 28, 2012 at 12:02 PM, Sheng LUO sheng.peisi@gmail.com wrote:
 Hi there,

 how can I disable segmentation when querying?
 I tried to delete <analyzer type="query">...</analyzer> from schema.xml.
 But it will use default analyzer instead.

 Any ideas?

 Thanks


Re: Solr seems to hang

2012-06-28 Thread Arkadi Colson

It has now been hanging for 15 hours and nothing has changed in the index directory.

Any tips for further debugging?

On 06/27/2012 03:50 PM, Arkadi Colson wrote:
I'm sending files to Solr with the PHP Solr library. I'm doing a 
commit every 1000 documents:

  <autoCommit>
    <maxDocs>1000</maxDocs>
    <!-- <maxTime>1000</maxTime> -->
  </autoCommit>

Hard to say how long it's hanging. At least for 1 hour. After that I 
restarted Tomcat to continue... I will have a look at the indexes next 
time it's hanging. Thanks for the tip!


SOLR: 3.6
TOMCAT: 7.0.28
JAVA: 1.7.0_05-b05


On 06/27/2012 03:13 PM, Erick Erickson wrote:

How long is it hanging? And how are you sending files to Tika, and
especially how often do you commit? One problem that people
run into is that they commit too often, causing segments to be
merged and occasionally that just takes a while and people
think that Solr is hung.

18G isn't very large as indexes go, so it's unlikely that's your problem,
except if merging is going on, in which case you might be copying a bunch
of data. So try seeing if you're getting a bunch of disk activity; you can
get a crude idea of what's going on if you just look at the index directory
on your Solr server while it's hung.

What version of Solr are you using? Details matter

Best
Erick

On Wed, Jun 27, 2012 at 7:51 AM, Arkadi Colson ark...@smartbit.be 
wrote:

Anybody have an idea?

The thread dump looks like this:

Full thread dump Java HotSpot(TM) 64-Bit Server VM (20.1-b02 mixed 
mode):


"http-8983-6" daemon prio=10 tid=0x41126000 nid=0x5c1 in Object.wait() [0x7fa0ad197000]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        - waiting on <0x00070abf4ad0> (a org.apache.tomcat.util.net.JIoEndpoint$Worker)
        at java.lang.Object.wait(Object.java:485)
        at org.apache.tomcat.util.net.JIoEndpoint$Worker.await(JIoEndpoint.java:458)
        - locked <0x00070abf4ad0> (a org.apache.tomcat.util.net.JIoEndpoint$Worker)
        at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:484)
        at java.lang.Thread.run(Thread.java:662)

"pool-4-thread-1" prio=10 tid=0x7fa0a054d800 nid=0x5be waiting on condition [0x7f9f962f4000]
   java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for <0x000702598b30> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987)
        at java.util.concurrent.DelayQueue.take(DelayQueue.java:160)
        at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:609)
        at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:602)
        at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:947)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)
        at java.lang.Thread.run(Thread.java:662)

"http-8983-5" daemon prio=10 tid=0x412d2800 nid=0x5bd runnable [0x7f9f94171000]
   java.lang.Thread.State: RUNNABLE
        at java.net.SocketInputStream.socketRead0(Native Method)
        at java.net.SocketInputStream.read(SocketInputStream.java:129)
        at org.apache.coyote.http11.InternalInputBuffer.fill(InternalInputBuffer.java:735)
        at org.apache.coyote.http11.InternalInputBuffer.parseRequestLine(InternalInputBuffer.java:366)
        at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:814)
        at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:602)
        at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)
        at java.lang.Thread.run(Thread.java:662)

"http-8983-4" daemon prio=10 tid=0x41036000 nid=0x5b1 in Object.wait() [0x7f9f966c9000]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        - waiting on <0x00070b6e4790> (a org.apache.lucene.index.DocumentsWriter)
        at java.lang.Object.wait(Object.java:485)
        at org.apache.lucene.index.DocumentsWriter.waitIdle(DocumentsWriter.java:986)
        - locked <0x00070b6e4790> (a org.apache.lucene.index.DocumentsWriter)
        at org.apache.lucene.index.DocumentsWriter.flush(DocumentsWriter.java:524)
        - locked <0x00070b6e4790> (a org.apache.lucene.index.DocumentsWriter)
        at org.apache.lucene.index.IndexWriter.doFlush(IndexWriter.java:3580)
        - locked <0x00070b6e4858> (a org.apache.solr.update.SolrIndexWriter)
        at org.apache.lucene.index.IndexWriter.flush(IndexWriter.java:3545)
        at

Re: Solr seems to hang

2012-06-28 Thread Li Li
Could you please use jstack to dump the call stacks?

On Thu, Jun 28, 2012 at 2:53 PM, Arkadi Colson ark...@smartbit.be wrote:
 It now hanging for 15 hour and nothing changes in the index directory.

 Tips for further debugging?



RE: what is precisionStep and positionIncrementGap

2012-06-28 Thread ZHANG Liang F
Thanks a lot, but precisionStep is still very vague to me! Could you give 
me an example?

-Original Message-
From: Li Li [mailto:fancye...@gmail.com] 
Sent: June 28, 2012 11:25
To: solr-user@lucene.apache.org
Subject: Re: what is precisionStep and positionIncrementGap

1. precisionStep is used for range queries on numeric fields. See
http://lucene.apache.org/core/old_versioned_docs/versions/3_5_0/api/all/org/apache/lucene/search/NumericRangeQuery.html
2. positionIncrementGap is used for phrase queries on multi-valued fields. E.g.,
doc1 has two titles:
   title1: ab cd
   title2: xy zz
   If your positionIncrementGap is 0, then the positions of the 4 terms are
0, 1, 2, 3.
   If you search for the phrase "cd xy", it will hit. But you may think it
should not match, so you can adjust positionIncrementGap to a larger value,
e.g. 100. Then the positions are 0, 1, 100, 101, and the phrase query will
not match.
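Restored as schema.xml markup (the values and type names here are illustrative, not from the original mail), the two attributes look like this:

```xml
<!-- precisionStep: index extra terms at reduced precision so numeric range
     queries can cover large ranges with few terms; precisionStep="0"
     disables the extra terms (smallest index, slower range queries) -->
<fieldType name="int"  class="solr.TrieIntField" precisionStep="0" omitNorms="true"/>
<fieldType name="tint" class="solr.TrieIntField" precisionStep="8" omitNorms="true"/>

<!-- positionIncrementGap: the position gap inserted between consecutive
     values of a multiValued field, so a phrase query cannot match across
     two different values (the "cd xy" case described above) -->
<fieldType name="text_gap" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  </analyzer>
</fieldType>
```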

On Thu, Jun 28, 2012 at 10:00 AM, ZHANG Liang F 
liang.f.zh...@alcatel-sbell.com.cn wrote:
 Hi,
 in the schema.xml, usually there will be a fieldType definition like
 this: <fieldType name="int" class="solr.TrieIntField"
 precisionStep="0" omitNorms="true" positionIncrementGap="0"/>

 precisionStep and positionIncrementGap are not very clear to me. Could you
 please elaborate on these two?

 Thanks!

 Liang


Re: what is precisionStep and positionIncrementGap

2012-06-28 Thread Li Li
Read the "How it works" section of
http://lucene.apache.org/core/old_versioned_docs/versions/3_5_0/api/all/org/apache/lucene/search/NumericRangeQuery.html
If you can read Chinese, I have a blog post explaining the details of the
implementation:
http://blog.csdn.net/fancyerii/article/details/7256379



RE: what is precisionStep and positionIncrementGap

2012-06-28 Thread ZHANG Liang F
I read your blog, it's really well written!
I have a website, www.ecmkit.com, focused on content management. Let's keep in touch!

-Original Message-
From: Li Li [mailto:fancye...@gmail.com] 
Sent: June 28, 2012 15:54
To: solr-user@lucene.apache.org
Subject: Re: what is precisionStep and positionIncrementGap

Read the "How it works" section of
http://lucene.apache.org/core/old_versioned_docs/versions/3_5_0/api/all/org/apache/lucene/search/NumericRangeQuery.html
If you can read Chinese, I have a blog post explaining the details of the
implementation:
http://blog.csdn.net/fancyerii/article/details/7256379



writing unit test for a search component which works only with distributed search

2012-06-28 Thread srinir
I have written a custom search component, but the code only supports
distributed search. Since we don't use non-distributed search, and search
works differently in the non-distributed case, we decided to concentrate only
on distributed search.

I am trying to write a unit test for my custom component. I can see that
BaseDistributedSearchTestCase's query method compares the results of a
single-sharded control and a multi-sharded test. I cannot use this method, as
my component only works for multi-sharded search. Ideally I would like to use
something like SolrTestCaseJ4's assertQ method, where I can use an XPath
expression to validate the results. Does SolrTestCaseJ4 already support
distributed search, or do I need to customize it? Or is there any other way
to write a unit test that works for distributed-only cases?

Thanks for your help!
Srini



--
View this message in context: 
http://lucene.472066.n3.nabble.com/writing-unit-test-for-a-search-component-which-works-only-with-distributed-search-tp3991795.html
Sent from the Solr - User mailing list archive at Nabble.com.


searching for more than one word

2012-06-28 Thread Arkadi Colson

Hi

I indexed following strings:

abcdefg hijklmnop

When searching for abcdefg hijklmnop Solr returns the result but when 
searching for abcdefg hijklmnop Solr returns nothing.


Any idea how to search for more than one word?

[params] => SolrObject Object
(
    [debugQuery] => true
    [shards] => solr03-gs.intnet.smartbit.be:8983/solr,solr04-gs.intnet.smartbit.be:8983/solr,solr03-dcg.intnet.smartbit.be:8983/solr,solr04-dcg.intnet.smartbit.be:8983/solr
    [fl] => id,smsc_module,smsc_modulekey,smsc_userid,smsc_ssid,smsc_description,smsc_content,smsc_courseid,smsc_lastdate,score
    [indent] => on
    [start] => 0
    [q] => (smsc_content:abcdefg hijklmnop || smsc_description:abcdefg hijklmnop) && (smsc_lastdate:[2008-05-28T08:45:50Z TO 2012-06-28T08:45:50Z])
    [distrib] => true
    [wt] => xml
    [version] => 2.2
    [rows] => 50
)


<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords_en.txt,stopwords_du.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1" catenateWords="1"
            catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="Dutch"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <!-- <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/> -->
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords_en.txt,stopwords_du.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1" catenateWords="0"
            catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="Dutch"/>
  </analyzer>
</fieldType>


Thanks!

--
Smartbit bvba
Hoogstraat 13
B-3670 Meeuwen
T: +32 11 64 08 80
F: +32 89 46 81 10
W: http://www.smartbit.be
E: ark...@smartbit.be



LineEntityProcessor Usage

2012-06-28 Thread kiran kumar
Hello,
I have a question regarding configuration of LineEntityProcessor. How do we
configure LineEntityProcessor to read a line of text from a file, parse the
line, and assign it to specific fields in the schema?
How exactly does the text in a line get mapped to fields in the schema? I have
searched a lot and didn't find any example of how to do that. Can somebody
please give me an example of how to do this? Also please help me
understand the concept of LineEntityProcessor.

Thanks & Regards,
Kiran Bushireddy


Re: SolrJ Response

2012-06-28 Thread Sachin Aggarwal
Use the plain Java URL-connection API and print the stream you receive.
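A minimal sketch of that suggestion (the base URL, core, and query string are placeholders; fetch() needs a running Solr, so only the URL construction is exercised here):

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class RawSolrUrl {
    // Ask Solr itself to render the response (wt=json or wt=xml),
    // bypassing SolrJ's parsed QueryResponse object.
    static String buildUrl(String baseUrl, String query) {
        String q = URLEncoder.encode(query, StandardCharsets.UTF_8);
        return baseUrl + "/select?q=" + q + "&wt=json&indent=on";
    }

    // Print the raw response stream, as suggested above.
    static void fetch(String url) throws Exception {
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(new URL(url).openStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                System.out.println(line);
            }
        }
    }

    public static void main(String[] args) {
        // fetch(...) is not called here because it requires a live server
        System.out.println(buildUrl("http://localhost:8983/solr", "title:solr"));
    }
}
```

This keeps SolrJ for issuing updates while letting the display path show exactly what Solr sent.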

On Thu, Jun 28, 2012 at 2:24 PM, Shanu Jha shanuu@gmail.com wrote:

 Hi,

 I am getting Solr documents after querying Solr using SolrJ. I think SolrJ
 parses the response to generate its own response object. I want to display
 the response in JSON/XML as it comes from Solr.

 Please help.

 AJ




-- 

Thanks & Regards

Sachin Aggarwal
7760502772


Re: SolrJ Response

2012-06-28 Thread Jochen Just

Hi,

 I want to display response in json/xml as it comes from solr.
Why don't you use the JSON QueryResponseWriter from Solr directly?
http://wiki.apache.org/solr/SolJSON should give you all you need to
get started.

Jochen


-- 
Jochen Just   Fon:   (++49) 711/28 07 57-193
avono AG  Mobil: (++49) 172/73 85 387
Breite Straße 2   Mail:  jochen.j...@avono.de
70173 Stuttgart   WWW:   http://www.avono.de




WordBreakSolrSpellChecker ignores MinBreakWordLength?

2012-06-28 Thread Carrie Coy
I set MinBreakWordLength = 3 thinking it would prevent 
WordBreakSolrSpellChecker from suggesting corrections made up of 
subwords shorter than 3 characters, but I still get suggestions like this:


query: Touch N' Match
suggestion: (t o u ch) 'n (m a t ch)

Can someone help me understand why?  Here is the relevant portion of 
solrconfig.xml:


<str name="spellcheck.dictionary">default</str>
<str name="spellcheck.dictionary">wordbreak</str>
<str name="spellcheck.count">10</str>
<str name="spellcheck.collate">true</str>
<str name="spellcheck.maxCollations">15</str>
<str name="spellcheck.maxCollationTries">100</str>
<str name="spellcheck.alternativeTermCount">4</str>
<str name="spellcheck.collateParam.mm">100%</str>
<str name="spellcheck.collateExtendedResults">true</str>
<str name="spellcheck.extendedResults">true</str>
<str name="spellcheck.maxResultsForSuggest">5</str>
<str name="spellcheck.MinBreakWordLength">3</str>
<str name="spellcheck.maxChanges">3</str>



Re: Query Logic Question

2012-06-28 Thread Rublex
Jack,

Thank you, the *:* solution seems to work.



RE: LineEntityProcessor Usage

2012-06-28 Thread Dyer, James
LineEntityProcessor outputs the entire line in a field called rawLine. You
then need to write a transformer that parses out the data. But see
https://issues.apache.org/jira/browse/SOLR-2549 for enhancements that will
parse the data without needing a transformer, if the data is in fixed-width
or delimited format.
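As an untested sketch (the file path, regex, and the field names id/title are invented for illustration), a data-config.xml entry that pulls fields out of rawLine with a RegexTransformer could look like:

```xml
<dataConfig>
  <dataSource type="FileDataSource" encoding="UTF-8"/>
  <document>
    <entity name="line" processor="LineEntityProcessor"
            url="/path/to/data.txt" rootEntity="true"
            transformer="RegexTransformer">
      <!-- rawLine holds the whole line; each regex capture group
           below is copied into a schema field -->
      <field column="id"    sourceColName="rawLine" regex="^([^|]*)\|.*$"/>
      <field column="title" sourceColName="rawLine" regex="^[^|]*\|(.*)$"/>
    </entity>
  </document>
</dataConfig>
```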

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311



RE: WordBreakSolrSpellChecker ignores MinBreakWordLength?

2012-06-28 Thread Dyer, James
Carrie,

Try taking the wordbreak parameters out of the request handler configuration
and instead put them in the spellchecker configuration. You also need to
remove the "spellcheck." prefix. Also, the correct spelling for this
parameter is minBreakLength. Here's an example.

<lst name="spellchecker">
  <str name="name">wordbreak</str>
  <str name="classname">solr.WordBreakSolrSpellChecker</str>
  <str name="field">{your field name here}</str>
  <str name="combineWords">true</str>
  <str name="breakWords">true</str>
  <int name="maxChanges">3</int>
  <int name="minBreakLength">3</int>
</lst>

All of the parameters in the following source file go in the spellchecker 
configuration like this:
http://svn.apache.org/repos/asf/lucene/dev/branches/branch_4x/solr/core/src/java/org/apache/solr/spelling/WordBreakSolrSpellChecker.java

Descriptions of each of these parameters can be found in this source file:
http://svn.apache.org/repos/asf/lucene/dev/branches/branch_4x/lucene/suggest/src/java/org/apache/lucene/search/spell/WordBreakSpellChecker.java

Let me know if this works out for you.  Any more feedback you can provide on 
the newer spellcheck features you're using is appreciated.  Thanks.

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311





Re: how to disable segmentation when querying?

2012-06-28 Thread Sheng LUO
Thanks for the reply, I understand that. I have already found a way to avoid word segmentation at query time: using solr.whitespacetokenizerfactory or solr.keywordstokenizerfactory does the trick.
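Spelled out as schema.xml markup (a sketch; the type name and the index-time chain are placeholders), the approach described above is:

```xml
<fieldType name="text_nosegment" class="solr.TextField">
  <analyzer type="index">
    <!-- index-time analysis stays whatever your index needs -->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  </analyzer>
  <analyzer type="query">
    <!-- the whole query string is kept as a single token: no segmentation -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
  </analyzer>
</fieldType>
```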

2012/6/28 wangjing ppm10...@gmail.com

 The text in a Field must be analyzed before it can be indexed.





edismax parser ignores mm parameter when tokenizer splits tokens (hypenated words, WDF splitting etc)

2012-06-28 Thread Tom Burton-West
Hello,

My previous e-mail with a CJK example has received no replies. I verified
that this problem also occurs for English. For example, in the case of the
word fire-fly, the ICUTokenizer and the WordDelimiterFilter both split
this into two tokens, fire and fly.
With an edismax query and a must-match of 2 (q={!edismax mm=2}), if the
words are entered separately as [fire fly], the edismax parser honors the
mm parameter and does the equivalent of a Boolean AND query. However, if
the words are entered as the hyphenated word [fire-fly], the tokenizer
splits it into two tokens, fire and fly, and the edismax parser does the
equivalent of a Boolean OR query.

I'm not sure I understand the output of the debugQuery, but judging by the
number of hits returned it appears that edismax is not honoring the mm
parameter. Am I missing something, or is this a bug?

 I'd like to file a JIRA issue, but want to find out if I am missing
something here.

Details of several queries are appended below.

Tom Burton-West

edismax query, mm=2, with the hyphenated word [fire-fly]:

<lst name="debug">
<str name="rawquerystring">{!edismax mm=2}fire-fly</str>
<str name="querystring">{!edismax mm=2}fire-fly</str>
<str name="parsedquery">+DisjunctionMaxQuery(((ocr:fire ocr:fly)))</str>
<str name="parsedquery_toString">+((ocr:fire ocr:fly))</str>
</lst>

Entered as separate words [fire fly], edismax mm=2: numFound=184962

<lst name="debug">
<str name="rawquerystring">{!edismax mm=2}fire fly</str>
<str name="querystring">{!edismax mm=2}fire fly</str>
<str name="parsedquery">+((DisjunctionMaxQuery((ocr:fire)) DisjunctionMaxQuery((ocr:fly)))~2)</str>
</lst>

Regular Boolean AND query [fire AND fly]: numFound=184962

<str name="rawquerystring">fire AND fly</str>
<str name="querystring">fire AND fly</str>
<str name="parsedquery">+ocr:fire +ocr:fly</str>
<str name="parsedquery_toString">+ocr:fire +ocr:fly</str>

Regular Boolean OR query [fire OR fly]: numFound=366047

<lst name="debug">
<str name="rawquerystring">fire OR fly</str>
<str name="querystring">fire OR fly</str>
<str name="parsedquery">ocr:fire ocr:fly</str>
<str name="parsedquery_toString">ocr:fire ocr:fly</str>
</lst>
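One related knob, offered as an aside rather than an answer from the thread: fieldType accepts autoGeneratePhraseQueries="true" (Solr 3.x and later), which makes the parser turn a single whitespace-delimited term that analyzes into multiple tokens into a phrase query, so [fire-fly] would parse as the phrase "fire fly" instead of (fire OR fly). This sidesteps the symptom without addressing how mm is applied. A sketch (the analysis chain is a guess at the ocr field, not taken from the thread):

```xml
<fieldType name="text_ocr" class="solr.TextField"
           autoGeneratePhraseQueries="true" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```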


Re: searching for more than one word

2012-06-28 Thread Kissue Kissue
The analysis page is your best friend in these circumstances. Use the
analysis page in the Solr admin UI, turn on verbose output for both index
and query, and see what the analysis chain looks like. You may be able to
find the culprit.
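One concrete thing to check in the schema posted earlier in this thread (my observation, not part of the original reply): both analyzers use KeywordTokenizerFactory, which emits the entire input as a single token, so a query can only match a whole indexed value. A whitespace-based query chain is the usual alternative:

```xml
<!-- illustrative: tokenize the query on whitespace so each word can match -->
<analyzer type="query">
  <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  <filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
```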







Re: Solved: WordBreakSolrSpellChecker ignores MinBreakWordLength?

2012-06-28 Thread Carrie Coy
Thanks! The combination of these two suggestions (relocating the 
wordbreak parameters to the spellchecker configuration and correcting 
the spelling of the parameter to minBreakLength) fixed the problem I 
was having.


On 06/28/2012 10:22 AM, Dyer, James wrote:

Carrie,

Try taking the wordbreak parameters out of the request handler configuration and instead put them 
in the spellchecker configuration.  You also need to remove the "spellcheck." prefix.  Also, the 
correct spelling for this parameter is "minBreakLength".  Here's an example.

<lst name="spellchecker">
  <str name="name">wordbreak</str>
  <str name="classname">solr.WordBreakSolrSpellChecker</str>
  <str name="field">{your field name here}</str>
  <str name="combineWords">true</str>
  <str name="breakWords">true</str>
  <int name="maxChanges">3</int>
  <int name="minBreakLength">3</int>
</lst>

All of the parameters in the following source file go in the spellchecker 
configuration like this:
http://svn.apache.org/repos/asf/lucene/dev/branches/branch_4x/solr/core/src/java/org/apache/solr/spelling/WordBreakSolrSpellChecker.java

Descriptions of each of these parameters can be found in this source file:
http://svn.apache.org/repos/asf/lucene/dev/branches/branch_4x/lucene/suggest/src/java/org/apache/lucene/search/spell/WordBreakSpellChecker.java

Let me know if this works out for you.  Any more feedback you can provide on 
the newer spellcheck features you're using is appreciated.  Thanks.

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311


-Original Message-
From: Carrie Coy [mailto:c...@ssww.com]
Sent: Thursday, June 28, 2012 8:20 AM
To: solr-user@lucene.apache.org
Subject: WordBreakSolrSpellChecker ignores MinBreakWordLength?

I set MinBreakWordLength = 3 thinking it would prevent
WordBreakSolrSpellChecker from suggesting corrections made up of
subwords shorter than 3 characters, but I still get suggestions like this:

query: Touch N' Match
suggestion: (t o u ch) 'n (m a t ch)

Can someone help me understand why?  Here is the relevant portion of
solrconfig.xml:

<str name="spellcheck.dictionary">default</str>
<str name="spellcheck.dictionary">wordbreak</str>
<str name="spellcheck.count">10</str>
<str name="spellcheck.collate">true</str>
<str name="spellcheck.maxCollations">15</str>
<str name="spellcheck.maxCollationTries">100</str>
<str name="spellcheck.alternativeTermCount">4</str>
<str name="spellcheck.collateParam.mm">100%</str>
<str name="spellcheck.collateExtendedResults">true</str>
<str name="spellcheck.extendedResults">true</str>
<str name="spellcheck.maxResultsForSuggest">5</str>
<str name="spellcheck.MinBreakWordLength">3</str>
<str name="spellcheck.maxChanges">3</str>



Error loading class 'org.apache.solr.handler.dataimport.DataImportHandler'

2012-06-28 Thread derohit
Hi All,

I am facing an exception while trying to use DataImportHandler for indexing.
My solrconfig.xml is:
<?xml version="1.0" encoding="UTF-8" ?>
<config>
  <abortOnConfigurationError>${solr.abortOnConfigurationError:true}</abortOnConfigurationError>
  <luceneMatchVersion>LUCENE_36</luceneMatchVersion>
  <lib dir="../../dist/" regex="apache-solr-dataimporthandler-\d.*\.jar" />
  <directoryFactory name="DirectoryFactory"
                    class="${solr.directoryFactory:solr.StandardDirectoryFactory}"/>
  <updateHandler class="solr.DirectUpdateHandler2" />
  <requestDispatcher handleSelect="true">
    <requestParsers enableRemoteStreaming="false" />
  </requestDispatcher>
  <requestHandler name="standard" class="solr.StandardRequestHandler" default="true" />
  <requestHandler name="/update" class="solr.JsonUpdateRequestHandler" startup="lazy" />
  <requestHandler name="/admin/" class="solr.admin.AdminHandlers" />
  <requestHandler name="/admin/ping" class="solr.PingRequestHandler">
    <lst name="invariants">
      <str name="qt">search</str>
      <str name="q">solrpingquery</str>
    </lst>
    <lst name="defaults">
      <str name="echoParams">all</str>
    </lst>
  </requestHandler>
  <requestHandler name="/dataimport"
                  class="org.apache.solr.handler.dataimport.DataImportHandler">
    <lst name="defaults">
      <str name="config">data-config.xml</str>
    </lst>
  </requestHandler>
  <admin>
    <defaultQuery>solr</defaultQuery>
  </admin>
</config>
 

and the jar name is apache-solr-dataimporthandler-3.6.0.jar

Please revert if someone has the solution to it.

Regards
Rohit




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Error-loading-class-org-apache-solr-handler-dataimport-DataImportHandler-tp3991940.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: LineEntityProcessor Usage

2012-06-28 Thread Lance Norskog
It creates one field, 'rawLine', containing the line as a string.
http://www.lucidimagination.com/search/link?url=http://wiki.apache.org/solr/DataImportHandler

To make other fields from the contents of the string, you can use the
RegexTransformer to pull text out of the string.
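To make that concrete, here is an untested sketch of a data-config.xml that
combines LineEntityProcessor with RegexTransformer. The file path, the
pipe-delimited layout, and the field names (id, title, price) are all made up
for illustration:

```xml
<dataConfig>
  <dataSource type="FileDataSource" encoding="UTF-8" />
  <document>
    <!-- LineEntityProcessor emits one row per line of the file,
         placing the text in the implicit 'rawLine' column -->
    <entity name="lines"
            processor="LineEntityProcessor"
            url="/path/to/records.txt"
            rootEntity="true"
            transformer="RegexTransformer">
      <!-- Hypothetical line layout: id|title|price -->
      <field column="id"    sourceColName="rawLine" regex="^([^|]*)\|[^|]*\|[^|]*$" />
      <field column="title" sourceColName="rawLine" regex="^[^|]*\|([^|]*)\|[^|]*$" />
      <field column="price" sourceColName="rawLine" regex="^[^|]*\|[^|]*\|([^|]*)$" />
    </entity>
  </document>
</dataConfig>
```

Each regex captures one group, which RegexTransformer assigns to the named
column; the columns then map onto schema fields of the same name.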

On Thu, Jun 28, 2012 at 9:55 AM, kiran kumar kirankumarsm...@gmail.com wrote:
 Hello,
 I have a question regarding configuration of LineEntityProcessor. How do we
 configure LineEntityProcessor to read a line of text from a file,parse the
 line and assign it to specific fields in schema.
 How exactly is the text in a line gets mapped to fields in schema. I have
 searched a lot and didn't find any example of how to do that. Can somebody
 please give me an example of how to do this.Also please help me in
 understanding the concept of lineEntityProcessor.

 Thanks  Regards,
 Kiran Bushireddy



-- 
Lance Norskog
goks...@gmail.com


Re: Autocomplete using facets

2012-06-28 Thread David Smiley (@MITRE.org)
Ugo,
I suggest simply manually filtering out red from the facet.prefix results
you get back.  Not ideal, but it's easy and your problem seems like an
infrequent event and a minor nuisance.
~ David Smiley
p.s. thanks for buying my book

-
 Author: http://www.packtpub.com/apache-solr-3-enterprise-search-server/book
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Autocomplete-using-facets-tp3991377p3991953.html
Sent from the Solr - User mailing list archive at Nabble.com.


index writer in searchComponent

2012-06-28 Thread Peyman Faratin
Hi

Is it possible to add a new document to the index in a custom SearchComponent 
(that also implements a SolrCoreAware)? I can get a reference to the 
indexReader via the ResponseBuilder parameter of the process() method using

rb.req.getSearcher().getReader()

But is it possible to actually add a new document to the index _after_ 
searching the index? I.e accessing the indexWriter?

thank you

Peyman

How do we use HTMLStripCharFilterFactory

2012-06-28 Thread derohit
Hi All,

I am new to SOLR. Please help me with the configuration of
HTMLStripCharFilterFactory.

If there is a tutorial, it would be of great help.

Regards
Rohit

--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-do-we-use-HTMLStripCharFilterFactory-tp3991955.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: soft commits in EmbeddedSolrServer

2012-06-28 Thread Raimon Bosch
Yes,

This worked for me:

// Solr Server initialization
System.setProperty("solr.solr.home", solrHome);
CoreContainer.Initializer initializer = new CoreContainer.Initializer();
coreContainer = initializer.initialize();
server = new EmbeddedSolrServer(coreContainer, "your_corename");

// Create your SolrInputDocument doc
...

// Soft commit
UpdateRequest req = new UpdateRequest();
req.setAction(ACTION.COMMIT, false, false, true);
req.add(doc);
UpdateResponse rsp = req.process(server);

Regards,
Raimon Bosch.

2012/6/26 Mark Miller markrmil...@gmail.com

 Yes - just pass the param same as you would if not using embedded

 On Jun 25, 2012, at 4:40 PM, Raimon Bosch wrote:

  Old question but I'm still wondering if this is possible. I'm using Solr
  4.0.
 
  Can I use the EmbeddedSolrServer to perform soft commits?
 
  2011/9/16 Raimon Bosch raimon.bo...@gmail.com
 
  Hi all,
 
  I'm checking how to do soft commits with the new version of Solr. I'm
  using EmbeddedSolrServer to add documents to my index. How can I
 perform a
  soft commit using this class? Is it possible? Or should I use the trunk?
 
  http://wiki.apache.org/solr/NearRealtimeSearch
 
 
 http://lucene.apache.org/solr/api/org/apache/solr/client/solrj/embedded/EmbeddedSolrServer.html
 
  Thanks in advance,
  Raimon Bosch.
 

 - Mark Miller
 lucidimagination.com














avgTimePerRequest JMX M-Bean displays with NaN instead of 0 - when no activity

2012-06-28 Thread geeky2
hello all,

environment: solr 3.5, jboss, wily

we have been setting up jmx monitoring for our solr installation.

while running tests - i noticed that of the 6 JMX M-Beans
(avgRequestsPerSecond, avgTimePerRequest, errors, requests, timeouts,
totalTime) ...

the avgTimePerRequest M-Bean was producing NaN when there was no search
activity.

all of the other M-Beans displayed a 0 (zero) when there was no search
activity.

we were able to compensate for this issue with custom scripting in wily on
our side.

can someone help me understand this inconsistency?

is this just WAD (works as designed)?

thanks for any help or insight



--
View this message in context: 
http://lucene.472066.n3.nabble.com/avgTimePerRequest-JMX-M-Bean-displays-with-NaN-instead-of-0-when-no-activity-tp3991962.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: FileNotFoundException during commit. concurrences process?!

2012-06-28 Thread Karthik Muthuswami

On Jun 26, 2012, at 7:35 AM, stockii wrote:

 Hello again.
 
 this is my Exception.
 with SolrVersion: 4.0.0.2012.04.26.09.00.41
 
 SEVERE: Exception while solr commit.
 java.io.FileNotFoundException: _8l.cfs
   at org.apache.lucene.store.FSDirectory.fileLength(FSDirectory.java:266)
   at org.apache.lucene.index.SegmentInfo.sizeInBytes(SegmentInfo.java:216)
   at
 org.apache.lucene.index.TieredMergePolicy.size(TieredMergePolicy.java:640)
   at
 org.apache.lucene.index.TieredMergePolicy.useCompoundFile(TieredMergePolicy.java:616)
   at
 org.apache.lucene.index.IndexWriter.useCompoundFile(IndexWriter.java:2078)
   at
 org.apache.lucene.index.IndexWriter.prepareFlushedSegment(IndexWriter.java:1968)
   at
 org.apache.lucene.index.DocumentsWriter.publishFlushedSegment(DocumentsWriter.java:497)
   at
 org.apache.lucene.index.DocumentsWriter.finishFlush(DocumentsWriter.java:477)
   at
 org.apache.lucene.index.DocumentsWriterFlushQueue$SegmentFlushTicket.publish(DocumentsWriterFlushQueue.java:201)
   at
 org.apache.lucene.index.DocumentsWriterFlushQueue.innerPurge(DocumentsWriterFlushQueue.java:119)
   at
 org.apache.lucene.index.DocumentsWriterFlushQueue.tryPurge(DocumentsWriterFlushQueue.java:148)
   at
 org.apache.lucene.index.DocumentsWriter.doFlush(DocumentsWriter.java:438)
   at
 org.apache.lucene.index.DocumentsWriter.flushAllThreads(DocumentsWriter.java:553)
   at 
 org.apache.lucene.index.IndexWriter.prepareCommit(IndexWriter.java:2416)
   at
 org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:2548)
   at org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:2530)
   at
 org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:414)
   at
 org.apache.solr.update.processor.RunUpdateProcessor.processCommit(RunUpdateProcessorFactory.java:82)
   at
 org.apache.solr.update.processor.UpdateRequestProcessor.processCommit(UpdateRequestProcessor.java:64)
   at
 org.apache.solr.update.processor.DistributedUpdateProcessor.processCommit(DistributedUpdateProcessor.java:783)
   at
 org.apache.solr.update.processor.LogUpdateProcessor.processCommit(LogUpdateProcessorFactory.java:154)
   at
 org.apache.solr.handler.dataimport.SolrWriter.commit(SolrWriter.java:107)
   at
 org.apache.solr.handler.dataimport.DocBuilder.finish(DocBuilder.java:286)
   at
 org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:246)
   at
 org.apache.solr.handler.dataimport.DataImporter.doDeltaImport(DataImporter.java:404)
   at
 org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:443)
   at
 org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:422)
 Jun 26, 2012 4:28:05 PM
 org.apache.solr.handler.dataimport.SimplePropertiesWriter
 readIndexerProperties
 
 
 My Architecture is.
 2 Solr Instances.
 One Instance update a index (updater, and another Instance is only for
 searching. (searcher)
 every minute is coming an update.
 
 
 - the updater runs without problems
 - after the updater's commit, all changes are available in the
 updater instance
 - NOW my searcher comes along and starts a commit=true on each of its
 cores to refresh the changes.
 
 NOW I SOMETIMES get my Exception =(  
 Anybody an idea?
 
 here is a part of my solrconfig.xml (updater AND searcher)
 -
  <indexConfig>
    <useCompoundFile>true</useCompoundFile>
    <ramBufferSizeMB>128</ramBufferSizeMB>
    <mergeFactor>2</mergeFactor>

    <lockType>single</lockType>
    <writeLockTimeout>1000</writeLockTimeout>
    <commitLockTimeout>1</commitLockTimeout>
    <unlockOnStartup>false</unlockOnStartup>

    <reopenReaders>true</reopenReaders>
    <infoStream file="INFOSTREAM.txt">false</infoStream>

    <deletionPolicy class="solr.SolrDeletionPolicy">
      <str name="maxCommitsToKeep">1</str>
      <str name="maxOptimizedCommitsToKeep">0</str>
    </deletionPolicy>
  </indexConfig>

  <updateHandler class="solr.DirectUpdateHandler2" />
 
 
 
 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/FileNotFoundException-during-commit-concurrences-process-tp3991384.html
 Sent from the Solr - User mailing list archive at Nabble.com.

Regards,
Karthik



Strange behaviour with default request handler

2012-06-28 Thread dbenjamin
Hi,

I have a strange behaviour with the default request handler.

In the index I have:

<doc>
  <str name="date">2012-06-28T10:22:51Z</str>
  <str name="description"/>
  <str name="firstName">Sophie</str>
  <str name="id">user-6</str>
  <str name="lastName">Michel</str>
  <str name="screenName">Sophie</str>
  <str name="slug">sophie</str>
</doc>
<doc>
  <str name="date">2012-06-28T10:22:51Z</str>
  <str name="description"/>
  <str name="firstName">Sophia</str>
  <str name="id">user-7</str>
  <str name="lastName">Martinez</str>
  <str name="screenName">Sophia</str>
  <str name="slug">sophia</str>
</doc>

And when I search for "soph", I only get "Sophie" in the results and not
"Sophia".
When I search for *:*, I get everything.

Why is that? Did I miss a basic configuration option?

My schema looks like :

<fields>
   <field name="id" type="string" indexed="true" stored="true"
          required="true" />
   <field name="who" type="text" indexed="true" stored="false"
          multiValued="true"/>
   <field name="screenName" type="string" indexed="false" stored="true"
          required="true" />
   <field name="slug" type="string" indexed="false" stored="true"
          required="true" />
   <field name="firstName" type="string" indexed="false" stored="true" />
   <field name="lastName" type="string" indexed="false" stored="true" />
   <field name="description" type="string" indexed="false" stored="true" />
   <field name="date" type="string" indexed="false" stored="true"
          required="true"/>
</fields>

<uniqueKey>id</uniqueKey>

<defaultSearchField>who</defaultSearchField>
<solrQueryParser defaultOperator="OR"/>
<copyField source="screenName" dest="who"/>
<copyField source="firstName" dest="who"/>
<copyField source="lastName" dest="who"/>

Any advice ?
Thanks ! ;-)

Cya,
benjamin.

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Strange-behaviour-with-default-request-handler-tp3991976.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: How do we use HTMLStripCharFilterFactory

2012-06-28 Thread kiran kumar
Hi
Specify transformer="HTMLStripTransformer" at the entity level, and for the
field you want to strip HTML from, set stripHTML="true".
It should work..
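For completeness, the char filter named in the subject line takes a different
route: HTMLStripCharFilterFactory is declared inside an analyzer in schema.xml
rather than in the DIH config. A minimal, untested sketch (the field type name
and the tokenizer choice are illustrative):

```xml
<fieldType name="text_html" class="solr.TextField">
  <analyzer>
    <!-- Strips HTML/XML markup from the raw field value before tokenization -->
    <charFilter class="solr.HTMLStripCharFilterFactory" />
    <tokenizer class="solr.WhitespaceTokenizerFactory" />
    <filter class="solr.LowerCaseFilterFactory" />
  </analyzer>
</fieldType>
```

Any field using this type then has markup removed at index (and query)
analysis time, independent of how the documents are loaded.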

Kiran

On Thu, Jun 28, 2012 at 4:09 PM, derohit mailrohi...@gmail.com wrote:

 Hi All,

 I am new to SOLR. Please hellp me with configuration of
 HTMLStripCharFilterFactory.

 If some tutorial is there, will be of great help.

 Regards
 Rohit

 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/How-do-we-use-HTMLStripCharFilterFactory-tp3991955.html
 Sent from the Solr - User mailing list archive at Nabble.com.




-- 
Thanks & Regards,
Kiran Kumar


core isolation

2012-06-28 Thread Dominique Bejean

Hi,

In Solr 3.x the parameter abortOnConfigurationError=false allows cores to 
continue working even if another core fails due to a configuration error.


This parameter doesn't exist anymore in Solr 4.0, but after some tests, 
it looks like cores are isolated from each other. By isolated, I mean if 
a core fails due to a configuration error or an error like 
ClassNotFoundException, the other cores continue to work.


On the other hand, I think there are some errors that will make all cores 
hang:


* OutOfMemoryError
* OutOfMemoryError : PermGen space
* Too many open files
* ...


I am using Tomcat 6. Can somebody confirm this isolation in Solr 4.0? 
Which errors do not impact other cores, and which errors do?


Regards

Dominique


Strange spikes in query response times...any ideas where else to look?

2012-06-28 Thread solr

Greetings all,

We are working on building up a large Solr index for over 300 million  
records...and this is our first look at Solr. We are currently running  
a set of unique search queries against a single server (so no  
replication, no indexing going on at the same time, and no distributed  
search) with a set number of records (in our case, 10 million records  
in the index) for about 30 minutes, with nearly all of our searches  
being unique (I say nearly because our set of queries is unique, but  
I have not yet confirmed that JMeter is selecting these queries with  
no replacement).


We are striving for a 2 second response time on the average, and  
indeed we are pretty darned close. In fact, if you look at the average  
response time, we are well under the 2 seconds per query.  
Unfortunately, we are seeing that about once every 6 minutes or so  
(and it is not a regular event...exactly six minutes apart...it is  
about six minutes but it fluctuates) we get a single query that  
returns in something like 15 to 20 seconds


We have been trying to identify what is causing this spike every so  
often and we are completely baffled. What we have done thus far:


1) Looked through the SAR logs and have not seen anything that  
correlates to this issue
2) Tracked the JVM statistics...especially the garbage  
collections...no correlations there either

3) Examined the queries...no pattern obvious there
4) Played with the JVM memory settings (heap settings, cache settings,  
and any other settings we could find)
5) Changed hardware: Brand new 4 processor, 8 gig RAM server with a  
fresh install of Redhat 5.7 enterprise, tried on a large instance of  
AWS EC2, tried on a fresh instance of a VMWare based virtual machine  
from our own data center) and still nothing is giving us a clue as to  
what is causing these spikes

5) No correlation found between the number of hits returned and the spikes


Our data is very simple and so are the queries. The schema consists of  
40 fields, most of which are string fields, 2 of which are  
location fields, and a small handful of which are integer fields.  
All fields are indexed and all fields are stored.


Our queries are also rather simple. Many of the queries are a simple  
one-field search. The most complex query we have is a 3-field search.  
Again, no correlation has been established between the query and these  
spikes. Also, about 60% of our queries return zero hits (on the  
assumption that we want to make solr search its entire index every so  
often. 60% is more than we intended and we will fix that soon...but  
that is what is currently happening. Again, no correlation found  
between spikes and 0-hit returned queries).


For some time we were testing with 100 million records in the index  
and the aggregate data looked quite good. Most queries were returning  
in under 2 seconds. Unfortunately, it was when we looked at the  
individual data points that we found spikes every 6-8 minutes or so  
hitting sometimes as high as 150 seconds!


We have been testing with 100 million records in the index, 50 million  
records in the index, 25 million, 20 million, 15 million, and 10  
million records. As I  indicated at the start, we are now at 10  
million records with 15-20 seconds spikes.


As we have decreased the number of records in the index, the size (but  
not the frequency) of the spikes has been dropping.


My question is: Is this type of behavior normal for Solr when it is  
being overstressed? I've read of lots of people with far more  
complicated schemas running MORE than 10 million records in an index  
and never once complained about these spikes. Since I am new at this,  
I am not sure what Solr's failure mode looks like when it has too  
many records to search.


I am hoping someone looking at this note can at least give me another  
direction to look. 10 million records searched in less than 2 seconds  
most of the time is great...but those 10 and 20 seconds spikes are not  
going to go over well with our customers...and I somehow think there  
is more we should be able to do here.


Thanks.

Peter S. Lee
ProQuest



RE: Strange spikes in query response times...any ideas where else to look?

2012-06-28 Thread Michael Ryan
A few questions...

1) Do you only see these spikes when running JMeter? I.e., do you ever see a 
spike when you manually run a query?

2) How are you measuring the response time? In my experience there are three 
different ways to measure query speed. Usually all of them will be 
approximately equal, but in some situations they can be quite different, and 
this difference can be a clue as to where the bottleneck is:
  1) The response time as seen by the end user (in this case, JMeter)
  2) The response time as seen by the container (for example, in Jetty you can 
get this by enabling logLatency in jetty.xml)
  3) The QTime as returned in the Solr response

3) Are you running multiple queries concurrently, or are you just using a 
single thread in JMeter?

-Michael

-Original Message-
From: s...@isshomefront.com [mailto:s...@isshomefront.com] 
Sent: Thursday, June 28, 2012 7:56 PM
To: solr-user@lucene.apache.org
Subject: Strange spikes in query response times...any ideas where else to 
look?

Greetings all,

We are working on building up a large Solr index for over 300 million  
records...and this is our first look at Solr. We are currently running  
a set of unique search queries against a single server (so no  
replication, no indexing going on at the same time, and no distributed  
search) with a set number of records (in our case, 10 million records  
in the index) for about 30 minutes, with nearly all of our searches  
being unique (I say nearly because our set of queries is unique, but  
I have not yet confirmed that JMeter is selecting these queries with  
no replacement).

We are striving for a 2 second response time on the average, and  
indeed we are pretty darned close. In fact, if you look at the average  
responses time, we are well under the 2 seconds per query.  
Unfortunately, we are seeing that about once every 6 minutes or so  
(and it is not a regular event...exactly six minutes apart...it is  
about six minutes but it fluctuates) we get a single query that  
returns in something like 15 to 20 seconds

We have been trying to identify what is causing this spike every so  
often and we are completely baffled. What we have done thus far:

1) Looked through the SAR logs and have not seen anything that  
correlates to this issue
2) Tracked the JVM statistics...especially the garbage  
collections...no correlations there either
3) Examined the queries...no pattern obvious there
4) Played with the JVM memory settings (heap settings, cache settings,  
and any other settings we could find)
5) Changed hardware: Brand new 4 processor, 8 gig RAM server with a  
fresh install of Redhat 5.7 enterprise, tried on a large instance of  
AWS EC2, tried on a fresh instance of a VMWare based virtual machine  
from our own data center) an still nothing is giving us a clue as to  
what is causing these spikes
5) No correlation found between the number of hits returned and the spikes


Our data is very simple and so are the queries. The schema consists of  
40 fields, most of which are string fields, 2 of which are  
location fields, and a small handful of which are integer fields.  
All fields are indexed and all fields are stored.

Our queries are also rather simple. Many of the queries are a simple  
one-field search. The most complex query we have is a 3-field search.  
Again, no correlation has been established between the query and these  
spikes. Also, about 60% of our queries return zero hits (on the  
assumption that we want to make solr search its entire index every so  
often. 60% is more than we intended and we will fix that soon...but  
that is what is currently happening. Again, no correlation found  
between spikes and 0-hit returned queries).

For some time we were testing with 100 million records in the index  
and the aggregate data looked quite good. Most queries were returning  
in under 2 seconds. Unfortunately, it was when we looked at the  
individual data points that we found spikes every 6-8 minutes or so  
hitting sometimes as high as 150 seconds!

We have been testing with 100 million records in the index, 50 million  
records in the index, 25 million, 20 million, 15 million, and 10  
million records. As I  indicated at the start, we are now at 10  
million records with 15-20 seconds spikes.

As we have decreased the number of records in the index,the size (but  
not the frequency) of the spikes has been dropping.

My question is: Is this type of behavior normal for Solr when it is  
being overstressed? I've read of lots of people with far more  
complicated schemas running MORE than 10 million records in an index  
and never once complained about these spikes. Since I am new at this,  
I am not sure what Solr's failure mode looks like when it has too  
many records to search.

I am hoping someone looking at this note can at least give me another  
direction to look. 10 million records searched in less than 2 seconds  
most of the time is great...but those 10 and 20 seconds 

RE: Strange spikes in query response times...any ideas where else to look?

2012-06-28 Thread solr

Michael,

Thank you for responding...and for the excellent questions.

1) We have never seen this response time spike with a user-interactive  
search. However, in the span of about 40 minutes, which included about  
82,000 queries, we only saw a handful of near-equally distributed  
spikes. We have tried sending queries from the admin tool while the  
test was running, but given those odds, I'm not surprised we've never  
hit on one of those few spikes we are seeing in the test results.


2) Good point and I should have mentioned this. We are using multiple  
methods to track these response times.
  a) Looking at the catalina.out file and plotting the response times  
recorded there (I think this is logging the QTime as seen by Solr).
  b) Looking at what JMeter is reporting as response times. In  
general, these are very close if not identical to what is being seen  
in the Catalina.out file. I have not run a line-by-line comparison,  
but putting the query response graphs next to each other shows them to  
be nearly (or possibly exactly) the same. Nothing looked out of the  
ordinary.


3) We are using multiple threads. Before your email I was looking at  
the results, doing some math, and double checking the reports from  
JMeter. I did notice that our throughput is much higher than we meant  
for it to be. JMeter is set up to run 15 threads from a single test  
machine...but I noticed that the JMeter report is showing close to 47  
queries per second. We are only targeting TWO to FIVE queries per  
second. This is up next on our list of things to look at and how to  
control more effectively. We do have three separate machines set up  
for JMeter testing and we are investigating to see if perhaps all  
three of these machines are inadvertently being launched during the  
test at one time and overwhelming the server. This *might* be one  
facet of the problem. Agreed on that.


Even as we investigate this last item regarding the number of  
users/threads, I wouldn't mind any other thoughts you or anyone else  
had to offer. We are checking on this user/threads issue and for the  
sake of anyone else who finds this discussion useful, I'll note what we  
find.


Thanks again.

Peter S. Lee
ProQuest

Quoting Michael Ryan mr...@moreover.com:


A few questions...

1) Do you only see these spikes when running JMeter? I.e., do you  
ever see a spike when you manually run a query?


2) How are you measuring the response time? In my experience there  
are three different ways to measure query speed. Usually all of them  
will be approximately equal, but in some situations they can be  
quite different, and this difference can be a clue as to where the  
bottleneck is:

  1) The response time as seen by the end user (in this case, JMeter)
  2) The response time as seen by the container (for example, in  
Jetty you can get this by enabling logLatency in jetty.xml)

  3) The QTime as returned in the Solr response

3) Are you running multiple queries concurrently, or are you just  
using a single thread in JMeter?


-Michael

-Original Message-
From: s...@isshomefront.com [mailto:s...@isshomefront.com]
Sent: Thursday, June 28, 2012 7:56 PM
To: solr-user@lucene.apache.org
Subject: Strange spikes in query response times...any ideas where  
else to look?


Greetings all,

We are working on building up a large Solr index for over 300 million
records...and this is our first look at Solr. We are currently running
a set of unique search queries against a single server (so no
replication, no indexing going on at the same time, and no distributed
search) with a set number of records (in our case, 10 million records
in the index) for about 30 minutes, with nearly all of our searches
being unique (I say nearly because our set of queries is unique, but
I have not yet confirmed that JMeter is selecting these queries with
no replacement).

We are striving for a 2 second response time on the average, and
indeed we are pretty darned close. In fact, if you look at the average
responses time, we are well under the 2 seconds per query.
Unfortunately, we are seeing that about once every 6 minutes or so
(and it is not a regular event...exactly six minutes apart...it is
about six minutes but it fluctuates) we get a single query that
returns in something like 15 to 20 seconds

We have been trying to identify what is causing this spike every so
often and we are completely baffled. What we have done thus far:

1) Looked through the SAR logs and have not seen anything that
correlates to this issue
2) Tracked the JVM statistics...especially the garbage
collections...no correlations there either
3) Examined the queries...no pattern obvious there
4) Played with the JVM memory settings (heap settings, cache settings,
and any other settings we could find)
5) Changed hardware: Brand new 4 processor, 8 gig RAM server with a
fresh install of Redhat 5.7 enterprise, tried on a large instance of
AWS EC2, tried on a fresh instance of a VMWare 

Re: Strange spikes in query response times...any ideas where else to look?

2012-06-28 Thread Otis Gospodnetic
Peter,

These could be JVM, or it could be index reopening and warmup queries, or  
Grab SPM for Solr - http://sematext.com/spm - in 24-48h we'll release an agent 
that tracks and graphs errors and timings of each Solr search component, which 
may reveal interesting stuff.  In the mean time, look at the graph with IO as 
well as graph with caches.  That's where I'd first look for signs.

Re users/threads question - if I understand correctly, this is the problem: 
"JMeter is set up to run 15 threads from a single test machine...but I noticed 
that the JMeter report is showing close to 47 queries per second."  It sounds 
like you're equating # of threads to QPS, which isn't right.  Imagine you had 
10 threads and each query took 0.1 seconds (processed by a single CPU core) and 
the server had 10 CPU cores.  That would mean that each single thread could run 10 
queries per second utilizing just 1 CPU core, and 10 threads would utilize all 
10 CPU cores and would give you 10x higher throughput - 10x10=100 QPS.
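That back-of-the-envelope math, written out as a small hypothetical helper
(not part of JMeter or Solr; it assumes CPU-bound queries and at least as many
cores as threads):

```java
public class QpsEstimate {
    // Aggregate throughput when each of `threads` workers issues queries
    // back-to-back and every query takes `latencySeconds` on one core
    // (only valid while threads <= available CPU cores).
    static double expectedQps(int threads, double latencySeconds) {
        return threads / latencySeconds;
    }

    // Threads needed to approximate a target QPS at a given latency.
    static int threadsFor(double targetQps, double latencySeconds) {
        return (int) Math.ceil(targetQps * latencySeconds);
    }

    public static void main(String[] args) {
        // The example above: 10 threads x 0.1 s/query on 10 cores -> 100 QPS
        System.out.println(expectedQps(10, 0.1));
        // To simulate ~5 QPS with ~0.3 s queries, about 2 threads suffice
        System.out.println(threadsFor(5, 0.3));
    }
}
```

So a load generator's thread count sets an upper bound on concurrency, while
observed QPS is threads divided by per-query latency.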

So if you need to simulate just 2-5 QPS, just lower the number of threads.  
What that number should be depends on query complexity and hw resources (cores 
or IO).
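The arithmetic above can be sketched as a quick back-of-the-envelope check. This is a hypothetical helper for sizing a load test, not part of JMeter or Solr:

```python
def threads_for_target_qps(target_qps: float, avg_latency_s: float) -> float:
    """Little's law for a closed-loop load test: each thread sustains
    roughly 1/avg_latency queries per second, so hitting target_qps
    needs about target_qps * avg_latency threads."""
    return target_qps * avg_latency_s

# 10 threads at 0.1 s per query saturate ~100 QPS, matching the example above.
print(threads_for_target_qps(100, 0.1))

# For a 2-5 QPS target at the same latency, a single thread is more than enough.
print(threads_for_target_qps(5, 0.1))
```

In practice average latency has to be measured first, so the thread count is an estimate to refine as the test runs.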

Otis

Performance Monitoring for Solr / ElasticSearch / HBase - 
http://sematext.com/spm 




 From: s...@isshomefront.com s...@isshomefront.com
To: solr-user@lucene.apache.org 
Sent: Thursday, June 28, 2012 9:20 PM
Subject: RE: Strange spikes in query response times...any ideas where else 
to look?
 
Michael,

Thank you for responding...and for the excellent questions.

1) We have never seen this response time spike with a user-interactive search. 
However, in the span of about 40 minutes, which included about 82,000 queries, 
we only saw a handful of near-equally distributed spikes. We have tried 
sending queries from the admin tool while the test was running, but given 
those odds, I'm not surprised we've never hit on one of those few spikes we 
are seeing in the test results.

2) Good point and I should have mentioned this. We are using multiple methods 
to track these response times.
  a) Looking at the catalina.out file and plotting the response times recorded 
there (I think this is logging the QTime as seen by Solr).
  b) Looking at what JMeter is reporting as response times. In general, these 
are very close if not identical to what is being seen in the Catalina.out 
file. I have not run a line-by-line comparison, but putting the query response 
graphs next to each other shows them to be nearly (or possibly exactly) the 
same. Nothing looked out of the ordinary.
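For plotting, the QTime values can be pulled straight out of catalina.out. A quick sketch, assuming the typical Solr 3.x request-log line shape shown in the sample (adjust the regex to the actual log format):

```python
import re

# Matches the "QTime=N" field Solr appends to each request-log line.
QTIME_RE = re.compile(r"\bQTime=(\d+)")

def extract_qtimes(lines):
    """Return the QTime values (in ms) found in an iterable of log lines."""
    return [int(m.group(1)) for line in lines if (m := QTIME_RE.search(line))]

sample = [
    "INFO: [] webapp=/solr path=/select params={q=foo} hits=12 status=0 QTime=38",
    "INFO: [] webapp=/solr path=/select params={q=bar} hits=3 status=0 QTime=17542",
]
print(extract_qtimes(sample))  # -> [38, 17542]
```

Feeding a whole catalina.out through this and plotting the result makes the periodic spikes easy to line up against other metrics.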

3) We are using multiple threads. Before your email I was looking at the 
results, doing some math, and double checking the reports from JMeter. I did 
notice that our throughput is much higher than we meant for it to be. JMeter 
is set up to run 15 threads from a single test machine...but I noticed that 
the JMeter report is showing close to 47 queries per second. We are only 
targeting TWO to FIVE queries per second. This is up next on our list of 
things to look at and how to control more effectively. We do have three 
separate machines set up for JMeter testing and we are investigating to see if 
perhaps all three of these machines are inadvertently being launched during 
the test at one time and overwhelming the server. This *might* be one facet of 
the problem. Agreed on that.

Even as we investigate this last item regarding the number of users/threads, I 
wouldn't mind any other thoughts you or anyone else had to offer. We are 
checking on this user/threads issue, and for the sake of anyone else who finds 
this discussion useful I'll note what we find.

Thanks again.

Peter S. Lee
ProQuest

Quoting Michael Ryan mr...@moreover.com:

 A few questions...
 
 1) Do you only see these spikes when running JMeter? I.e., do you ever see a 
 spike when you manually run a query?
 
 2) How are you measuring the response time? In my experience there are three 
 different ways to measure query speed. Usually all of them will be 
 approximately equal, but in some situations they can be quite different, and 
 this difference can be a clue as to where the bottleneck is:
   1) The response time as seen by the end user (in this case, JMeter)
   2) The response time as seen by the container (for example, in Jetty you 
can get this by enabling logLatency in jetty.xml)
   3) The QTime as returned in the Solr response
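 The gap between those three measurements is itself diagnostic, since QTime 
 covers only Solr's search components. A rough sketch with made-up numbers 
 (the bucket names are illustrative, not a Solr API):

```python
def split_response_time(client_ms, container_ms, qtime_ms):
    """Split the end-to-end time into rough buckets: QTime covers Solr's
    search components; the container adds response writing; the client
    adds network transfer and any queuing on its side."""
    return {
        "search_ms": qtime_ms,
        "container_overhead_ms": container_ms - qtime_ms,
        "network_ms": client_ms - container_ms,
    }

# Made-up spike: QTime is tiny while the client waits ~15 s, which would
# point at response writing / GC / network rather than the search itself.
print(split_response_time(client_ms=15000, container_ms=14800, qtime_ms=40))
```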
 
 3) Are you running multiple queries concurrently, or are you just using a 
 single thread in JMeter?
 
 -Michael
 
 -Original Message-
 From: s...@isshomefront.com [mailto:s...@isshomefront.com]
 Sent: Thursday, June 28, 2012 7:56 PM
 To: solr-user@lucene.apache.org
 Subject: Strange spikes in query response times...any ideas where else to 
 look?
 
 Greetings all,
 
 We are working on building up a large Solr index for over 300 million
 records...and this is our first look at Solr. We are currently running
 a set of unique search 

Re: SolrJ Response

2012-06-28 Thread Sachin Aggarwal
Hey, one more thing to add: when you query the server you need to specify the
response type you want. Have a look at this page.

http://lucidworks.lucidimagination.com/display/solr/Response+Writers
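For the raw HTTP interface, that just means passing a `wt` parameter on the query string. A minimal sketch (the host and core name are placeholders):

```python
from urllib.parse import urlencode

def solr_select_url(base_url, query, wt="json"):
    """Build a Solr /select URL; the wt parameter picks the response
    writer (json, xml, ...) documented on the page above."""
    return f"{base_url}/select?{urlencode({'q': query, 'wt': wt})}"

# Placeholder host and core name; wt=json asks Solr for a JSON response.
print(solr_select_url("http://localhost:8983/solr/collection1", "title:solr"))
```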

On Thu, Jun 28, 2012 at 6:14 PM, Jochen Just jochen.j...@avono.de wrote:

 Hi,

  I want to display the response in JSON/XML as it comes from Solr.
 Why don't you use the JSON QueryResponseWriter from Solr directly?
 http://wiki.apache.org/solr/SolJSON should give you all you need to
 get started.

 Jochen


 --
 Jochen Just   Fon:   (++49) 711/28 07 57-193
 avono AG  Mobil: (++49) 172/73 85 387
 Breite Straße 2   Mail:  jochen.j...@avono.de
 70173 Stuttgart   WWW:   http://www.avono.de






-- 

Thanks & Regards

Sachin Aggarwal
7760502772