Re: Matching + and &
On 24 November 2011 15:18, Tomasz Wegrzanowski tomasz.wegrzanow...@gmail.com wrote:
On 22 November 2011 14:28, Jan Høydahl jan@cominvent.com wrote:
Why do you need spaces in the replacement? Try pattern="\+" replacement="plus" - it will cause the transformed charstream to contain as many tokens as the original and avoid the highlighting crash.

I tried that, it still crashes. Replacing it with a single character, including a single non-ASCII character, doesn't cause a crash. I'm sort of tempted to just reuse some CJK character, and synonym-filter it to mean plus.

In case anybody else runs into this problem, I found a solution. The only thing that works and doesn't seem to crash Solr is CJK expansions:

<!-- they're not random, that's just what these characters mean -->
<charFilter class="solr.PatternReplaceCharFilterFactory" pattern="\+" replacement="加"/>
<charFilter class="solr.PatternReplaceCharFilterFactory" pattern="&amp;" replacement="和"/>

Followed by un-CJK-ing in a synonym filter:

# General rules
加 => plus
和 => and
# And any special synonyms you want:
r and d, r 和 d => r and d, research and development
s and p, s 和 p => s and p, standard and poor's
at and t, at 和 t => at and t, american telephone and telegraph

The user never sees these CJK characters; they only exist for a brief time within the Solr pipeline to make the tokenizer happy. I also tried private-use Unicode characters, but they're ignored by the tokenizer.
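A minimal Java sketch of a different way to attack the same problem - mapping "+" and "&" to placeholder words on the client side before the text ever reaches Solr - assuming you control the indexing and query clients. The class name, mapping table and example string are illustrative, not part of the thread:

import java.util.LinkedHashMap;
import java.util.Map;

public class SymbolNormalizer {
    // Symbols to map to words before sending text to Solr.
    // The replacement words are assumptions; align them with your synonym rules.
    private static final Map<String, String> SYMBOLS = new LinkedHashMap<String, String>();
    static {
        SYMBOLS.put("+", " plus ");
        SYMBOLS.put("&", " and ");
    }

    public static String normalize(String text) {
        String out = text;
        for (Map.Entry<String, String> e : SYMBOLS.entrySet()) {
            out = out.replace(e.getKey(), e.getValue());
        }
        // collapse the extra spaces introduced by the replacements
        return out.trim().replaceAll("\\s+", " ");
    }

    public static void main(String[] args) {
        System.out.println(normalize("R&D budget + travel"));  // prints: R and D budget plus travel
    }
}

This sidesteps the tokenizer entirely, at the cost of repeating the normalization in every client; the charFilter/synonym route above keeps the logic inside Solr.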
Tidying files after optimize. Is a service restart mandatory?
Hello. Brief question: How can I clean up excess files after performing an optimize, without restarting the Tomcat service?

Detail follows: I've been running several Solr cores for approx 12 months and have recently noticed the disk usage of one of them is growing considerably faster than the rate at which documents are being added.
- 1,200,000 docs 12 months ago used a 45 GB index
- 1,700,000 docs today use an 87 GB index
- There may have been _some_ deletions, almost certainly < 100,000
- The documents are of a broadly uniform style, approx 1000 words
So, approximately 45% growth in documents has grown the disk usage by approx 100%. I took a server out of production (I've 1 master, 7 slaves) and did the following:
- I ran http://server/corename/update?stream.body=<optimize/> on this core, which added 49.4 GB to the index folder
- No previously existing files were deleted
- I restarted the Tomcat service
- ONLY the files generated by the optimize remained. All older files were deleted.
This is the result I want, but not quite the method I'd prefer. How can I get to this position without restarting the service? Many thanks in advance for any advice you can give.
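For reference, the same optimize can be issued from SolrJ rather than through stream.body; a minimal sketch, assuming a SolrJ 1.4/3.x client and an illustrative core URL. As the message above describes, this triggers the merge but does not by itself force the old files to be released:

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

public class OptimizeCore {
    public static void main(String[] args) throws Exception {
        // URL of the core to optimize (illustrative)
        SolrServer server = new CommonsHttpSolrServer("http://server:8080/solr/corename");
        // waitFlush=true, waitSearcher=true: block until the optimized index is live
        server.optimize(true, true);
    }
}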
Re: Unable to index documents using DataImportHandler with MSSQL
Right. This is REALLY weird - I've now started from scratch on another machine (this time Windows 7), and got _exactly_ the same problem!?

On Mon, Nov 28, 2011 at 7:37 AM, Husain, Yavar yhus...@firstam.com wrote:
Hi Ian
I am having exactly the same problem that you are having, on Win 7 and 2008 Server: http://lucene.472066.n3.nabble.com/DIH-Strange-Problem-tc3530370.html
I still have not received any replies which could solve my problem till now. Please do let me know if you have arrived at some solution for your problem. Thanks.
Regards, Yavar

-Original Message-
From: Ian Grainger [mailto:i...@isfluent.com]
Sent: Friday, November 25, 2011 10:59 PM
To: solr-user@lucene.apache.org
Subject: Re: Unable to index documents using DataImportHandler with MSSQL

Update on this: I've established:
* It's not a problem in the DB (I can index from this DB into a Solr instance on another server)
* It's not Tomcat (I get the same problem in Jetty)
* It's not the schema (I have simplified it to one field)
That leaves SolrConfig.xml and data-config. The only thing changed in SolrConfig.xml is adding:

<lib dir="D:/Software/Solr/example/solr/dist/" regex="apache-solr-cell-\d.*\.jar" />
<lib dir="D:/Software/Solr/example/solr/dist/" regex="apache-solr-clustering-\d.*\.jar" />
<lib dir="D:/Software/Solr/example/solr/dist/" regex="apache-solr-dataimporthandler-\d.*\.jar" />
<requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">D:/Software/Solr/example/solr/conf/data-config.xml</str>
  </lst>
</requestHandler>

And data-config.xml is pretty much as attached - except simpler. Any help or any advice on how to diagnose would be appreciated!

On Fri, Nov 25, 2011 at 12:29 PM, Ian Grainger i...@isfluent.com wrote:
Hi - I have copied my Solr config from a working Windows server to a new one, and it can't seem to run an import. They're both using Win Server 2008 and SQL 2008 R2. This is the data importer config:

<dataConfig>
  <dataSource type="JdbcDataSource" name="ds1" driver="com.microsoft.sqlserver.jdbc.SQLServerDriver" url="jdbc:sqlserver://localhost;databaseName=DB" user="Solr" password="pwd"/>
  <document name="datas">
    <entity name="data" dataSource="ds1" pk="key"
        query="EXEC SOLR_COMPANY_SEARCH_DATA"
        deltaImportQuery="SELECT * FROM Company_Search_Data WHERE [key]='${dataimporter.delta.key}'"
        deltaQuery="SELECT [key] FROM Company_Search_Data WHERE modify_dt > '${dataimporter.last_index_time}'">
      <field column="WorkDesc_Comments" name="WorkDesc_Comments_Split" />
      <field column="WorkDesc_Comments" name="WorkDesc_Comments_Edge" />
    </entity>
  </document>
</dataConfig>

I can use MS SQL Profiler to watch the Solr user log in successfully, but then nothing. It doesn't seem to even try and execute the stored procedure. Any ideas why this would be working on one server and not on another? FTR the only thing in the Tomcat catalina log is:

org.apache.solr.handler.dataimport.JdbcDataSource$1 call
INFO: Creating a connection for entity data with URL: jdbc:sqlserver://localhost;databaseName=CATLive

--
Ian
i...@isfluent.com
+44 (0)1223 257903
--
Ian
i...@isfluent.com
+44 (0)1223 257903
Re: highlighting on range query
I tried this URL:
http://localhost:8983/solr/select?q=rangefld:[5000%20TO%206000]&fl=lily.id,rangefld&hl=on&rows=5&wt=json&indent=on&hl.fl=*,rangefld&hl.highlightMultiTerm=true&hl.usePhraseHighlighter=true&hl.useFastVectorHighlighter=false

and the output is:

{
  "responseHeader":{
    "status":0,
    "QTime":4,
    "params":{
      "hl.highlightMultiTerm":"true",
      "fl":"lily.id,rangefld",
      "indent":"on",
      "hl.useFastVectorHighlighter":"false",
      "q":"rangefld:[5000 TO 6000]",
      "hl.fl":"*,rangefld",
      "wt":"json",
      "hl.usePhraseHighlighter":"true",
      "hl":"on",
      "rows":"5"}},
  "response":{"numFound":64,"start":0,"docs":[
      {"lily.id":"UUID.c5f00cd3-343a-47c1-ab16-ace104b2540f","rangefld":5948},
      {"lily.id":"UUID.ed69ece0-1b24-4829-afb6-22eb242939f2","rangefld":5749},
      {"lily.id":"UUID.afa0c654-2f26-4c5b-9fda-8b51c5ec080d","rangefld":5739},
      {"lily.id":"UUID.d92b405d-f41e-4c85-9014-1b89a986ec42","rangefld":5783},
      {"lily.id":"UUID.102adde5-cbff-4ca6-acb1-426bb14fb579","rangefld":5753}]},
  "highlighting":{
    "UUID.c5f00cd3-343a-47c1-ab16-ace104b2540f":{},
    "UUID.ed69ece0-1b24-4829-afb6-22eb242939f2":{},
    "UUID.afa0c654-2f26-4c5b-9fda-8b51c5ec080d":{},
    "UUID.d92b405d-f41e-4c85-9014-1b89a986ec42":{},
    "UUID.102adde5-cbff-4ca6-acb1-426bb14fb579":{}}}

Why is rangefld not coming back in the highlighting result?

On Mon, Nov 28, 2011 at 12:47 PM, Ahmet Arslan iori...@yahoo.com wrote:
Any other suggestion, as these suggestions are not working.
Could it be that you are using FastVectorHighlighter? What happens when you add hl.useFastVectorHighlighter=false to your search URL?

--
Thanks & Regards
Rahul Mehta
RE: Unable to index documents using DataImportHandler with MSSQL
Hi Ian I downloaded and build latest Solr (3.4) from sources and finally hit following line of code in Solr (where I put my debug statement) : if(url != null){ LOG.info(Yavar: getting handle to driver manager:); c = DriverManager.getConnection(url, initProps); LOG.info(Yavar: got handle to driver manager:); } The call to Driver Manager was not returning. Here was the error!! The Driver we were using was Microsoft Type 4 JDBC driver for SQL Server. I downloaded another driver called jTDS jDBC driver and installed that. Problem got fixed!!! So please follow the following steps: 1. Download jTDS jDBC driver from http://jtds.sourceforge.net/ 2. Put the driver jar file into your Solr/lib directory where you had put Microsoft JDBC driver. 3. In the data-config.xml use this statement: driver=net.sourceforge.jtds.jdbc.Driver 4. Also in data-config.xml mention url like this: url=jdbc:jTDS:sqlserver://localhost:1433;databaseName=XXX 5. Now run your indexing. It should solve the problem. Regards, Yavar -Original Message- From: Ian Grainger [mailto:i...@isfluent.com] Sent: Monday, November 28, 2011 4:11 PM To: Husain, Yavar Cc: solr-user@lucene.apache.org Subject: Re: Unable to index documents using DataImportHandler with MSSQL Right. This is REALLY weird - I've now started from scratch on another machine (this time Windows 7), and got _exactly_ the same problem !? On Mon, Nov 28, 2011 at 7:37 AM, Husain, Yavar yhus...@firstam.com wrote: Hi Ian I am having exactly the same problem what you are having on Win 7 and 2008 Server http://lucene.472066.n3.nabble.com/DIH-Strange-Problem-tc3530370.html I still have not received any replies which could solve my problem till now. Please do let me know if you have arrived at some solution for your problem. Thanks. Regards, Yavar -Original Message- From: Ian Grainger [mailto:i...@isfluent.com] Sent: Friday, November 25, 2011 10:59 PM To: solr-user@lucene.apache.org Subject: Re: Unable to index documents using DataImportHandler with MSSQL Update on this: I've established: * It's not a problem in the DB (I can index from this DB into a Solr instance on another server) * It's not Tomcat (I get the same problem in Jetty) * It's not the schema (I have simplified it to one field) That leaves SolrConfig.xml and data-config. Only thing changed in SolrConfig.xml is adding: lib dir=D:/Software/Solr/example/solr/dist/ regex=apache-solr-cell-\d.*\.jar / lib dir=D:/Software/Solr/example/solr/dist/ regex=apache-solr-clustering-\d.*\.jar / lib dir=D:/Software/Solr/example/solr/dist/ regex=apache-solr-dataimporthandler-\d.*\.jar / requestHandler name=/dataimport class=org.apache.solr.handler.dataimport.DataImportHandler lst name=defaults str name=configD:/Software/Solr/example/solr/conf/data-config.xml/str /lst /requestHandler And data-config.xml is pretty much as attached - except simpler. Any help or any advice on how to diagnose would be appreciated! On Fri, Nov 25, 2011 at 12:29 PM, Ian Grainger i...@isfluent.com wrote: Hi I have copied my Solr config from a working Windows server to a new one, and it can't seem to run an import. They're both using win server 2008 and SQL 2008R2. 
This is the data importer config dataConfig dataSource type=JdbcDataSource name=ds1 driver=com.microsoft.sqlserver.jdbc.SQLServerDriver url=jdbc:sqlserver://localhost;databaseName=DB user=Solr password=pwd/ document name=datas entity name=data dataSource=ds1 pk=key query=EXEC SOLR_COMPANY_SEARCH_DATA deltaImportQuery=SELECT * FROM Company_Search_Data WHERE [key]='${dataimporter.delta.key}' deltaQuery=SELECT [key] FROM Company_Search_Data WHERE modify_dt '${dataimporter.last_index_time}' field column=WorkDesc_Comments name=WorkDesc_Comments_Split / field column=WorkDesc_Comments name=WorkDesc_Comments_Edge / /entity /document /dataConfig I can use MS SQL Profiler to watch the Solr user log in successfully, but then nothing. It doesn't seem to even try and execute the stored procedure. Any ideas why this would be working one server and not on another? FTR the only thing in the tomcat catalina log is: org.apache.solr.handler.dataimport.JdbcDataSource$1 call INFO: Creating a connection for entity data with URL: jdbc:sqlserver://localhost;databaseName=CATLive -- Ian i...@isfluent.com +44 (0)1223 257903 -- Ian i...@isfluent.com +44 (0)1223 257903 ** This message may contain confidential or proprietary information intended only for the use of the addressee(s) named above or may contain information that is legally privileged. If you are not the intended addressee, or the person responsible for delivering it to
RE: DIH Strange Problem
I figured out the solution, and Microsoft, not Solr, is the problem here :)

I downloaded and built the latest Solr (3.4) from sources and finally hit the following line of code in Solr (where I put my debug statements):

if (url != null) {
    LOG.info("Yavar: getting handle to driver manager:");
    c = DriverManager.getConnection(url, initProps);
    LOG.info("Yavar: got handle to driver manager:");
}

The call to DriverManager was not returning. Here was the error!! The driver we were using was the Microsoft Type 4 JDBC driver for SQL Server. I downloaded another driver called the jTDS JDBC driver and installed that. Problem got fixed!!!

So please follow these steps:
1. Download the jTDS JDBC driver from http://jtds.sourceforge.net/
2. Put the driver jar file into your Solr/lib directory where you had put the Microsoft JDBC driver.
3. In data-config.xml use this statement: driver="net.sourceforge.jtds.jdbc.Driver"
4. Also in data-config.xml specify the url like this: url="jdbc:jTDS:sqlserver://localhost:1433;databaseName=XXX"
5. Now run your indexing. It should solve the problem.

-Original Message-
From: Husain, Yavar
Sent: Thursday, November 24, 2011 12:38 PM
To: solr-user@lucene.apache.org; Shawn Heisey
Subject: RE: DIH Strange Problem

Hi
Thanks for your replies. I carried out these 2 steps (it did not solve my problem):
1. I tried setting responseBuffering to adaptive. Did not work.
2. To check the database connection I wrote a simple Java program to connect to the database and fetch some results with the same driver that I use for Solr. It worked. So it does not seem to be a problem with the connection.
Now I am stuck where the Tomcat log says "Creating a connection for entity ..." and does nothing. I mean, after this log we usually get the "getConnection() took x milliseconds" line, however I don't get that; I can just see the time moving with no records getting fetched.

Original problem listed again:
I am using Solr 1.4.1 on Windows/MS SQL Server and am using DIH for importing data. Indexing and all was working perfectly fine. However, today when I started full indexing again, Solr halts/sticks at the line "Creating a connection for entity". There are no further messages after that. I can see that DIH is busy, and on the DIH console I can see "A command is still running"; I can also see total rows fetched = 0 and total requests made to datasource = 1, and the time is increasing, however it is not doing anything. This is the exact configuration that worked for me before. I am not really able to understand the problem here. Also, in the index directory where I am storing the index there are just 3 files: 2 segment files + 1 lucene*-write.lock file.
...
data-config.xml:

<dataSource type="JdbcDataSource" driver="com.microsoft.sqlserver.jdbc.SQLServerDriver" url="jdbc:sqlserver://127.0.0.1:1433;databaseName=SampleOrders" user="testUser" password="password"/>
<document>
. .
Logs: INFO: Server startup in 2016 ms Nov 23, 2011 4:11:27 PM org.apache.solr.handler.dataimport.DataImporter doFullImport INFO: Starting Full Import Nov 23, 2011 4:11:27 PM org.apache.solr.core.SolrCore execute INFO: [] webapp=/solr path=/dataimport params={command=full-import} status=0 QTime=11 Nov 23, 2011 4:11:27 PM org.apache.solr.handler.dataimport.SolrWriter readIndexerProperties INFO: Read dataimport.properties Nov 23, 2011 4:11:27 PM org.apache.solr.update.DirectUpdateHandler2 deleteAll INFO: [] REMOVING ALL DOCUMENTS FROM INDEX Nov 23, 2011 4:11:27 PM org.apache.solr.core.SolrDeletionPolicy onInit INFO: SolrDeletionPolicy.onInit: commits:num=1 commit{dir=C:\solrindexes\index,segFN=segments_6,version=1322041133719,generation=6,filenames=[segments_6] Nov 23, 2011 4:11:27 PM org.apache.solr.core.SolrDeletionPolicy updateCommits INFO: newest commit = 1322041133719 Nov 23, 2011 4:11:27 PM org.apache.solr.handler.dataimport.JdbcDataSource$1 call INFO: Creating a connection for entity SampleText with URL: jdbc:sqlserver://127.0.0.1:1433;databaseName=SampleOrders -Original Message- From: Shawn Heisey [mailto:s...@elyograg.org] Sent: Wednesday, November 23, 2011 7:36 PM To: solr-user@lucene.apache.org Subject: Re: DIH Strange Problem On 11/23/2011 5:21 AM, Chantal Ackermann wrote: Hi Yavar, my experience with similar problems was that there was something wrong with the database connection or the database. Chantal It's also possible that your JDBC driver might be trying to buffer the entire result set. There's a link on the wiki specifically for this problem on MS SQL server. Hopefully it's that, but Chantal could be right too. http://wiki.apache.org/solr/DataImportHandlerFaq Here's the URL to the specific paragraph, but it's likely that it won't survive the email trip in a clickable form:
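A standalone connectivity check along the lines of the test program mentioned above makes it obvious whether the hang is in the driver or in DIH; a rough sketch, assuming the jTDS jar is on the classpath and that the host, database name and credentials are placeholders (the jTDS documentation uses the lower-case jdbc:jtds: URL prefix):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class JdbcSmokeTest {
    public static void main(String[] args) throws Exception {
        // Load the jTDS driver explicitly (not required on JDBC 4, but harmless)
        Class.forName("net.sourceforge.jtds.jdbc.Driver");
        String url = "jdbc:jtds:sqlserver://localhost:1433;databaseName=XXX";
        Connection c = DriverManager.getConnection(url, "testUser", "password");
        Statement st = c.createStatement();
        ResultSet rs = st.executeQuery("SELECT 1");
        while (rs.next()) {
            System.out.println("Connected, got: " + rs.getInt(1));
        }
        rs.close();
        st.close();
        c.close();
    }
}

If this returns immediately with jTDS but hangs with the Microsoft driver on the same machine, the problem is in the driver/JVM combination rather than in DIH or the data-config.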
Using data import handler to clean up db
Hi, I am using a MySQL db to store all my data. I had finished configuring my data import handler to get data into Solr and then realized I needed to take care of deletes. This is what I did to handle deletes:
1) a MySQL table 'DeletedContentMapping' with the deleted id's
2) deletedPkQuery - to fetch all id's from that table.
The problem I face now is how to remove data from the 'DeletedContentMapping' table. I used postImportDeleteQuery to issue a delete but it doesn't seem to work. I know a better solution would be to add a timestamp field to the 'DeletedContentMapping' table, but that is not possible as the tables cannot be changed. Thanks for the replies in advance.
--
View this message in context: http://lucene.472066.n3.nabble.com/Using-data-import-handler-to-clean-up-db-tp3542026p3542026.html
Sent from the Solr - User mailing list archive at Nabble.com.
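As I understand it, postImportDeleteQuery is a Solr delete query applied to the index after the import, not SQL run against the source database, which would explain why it does not empty the MySQL table. If the goal is simply to purge the processed rows once each import finishes, one option (assuming an external step you run after the delta-import; connection details and table name below are placeholders mirroring the thread) is a small JDBC cleanup:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class PurgeDeletedMapping {
    public static void main(String[] args) throws Exception {
        // MySQL connection details are illustrative
        Connection c = DriverManager.getConnection(
                "jdbc:mysql://localhost:3306/mydb", "solr", "secret");
        try {
            // Remove the ids that deletedPkQuery has already fed to Solr
            PreparedStatement ps = c.prepareStatement("DELETE FROM DeletedContentMapping");
            int removed = ps.executeUpdate();
            System.out.println("Purged " + removed + " rows");
            ps.close();
        } finally {
            c.close();
        }
    }
}

Run it only after confirming the delta-import completed successfully; otherwise deletes that Solr never saw would be lost.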
help no segment in my lucene index!!!
Hi all, after a power supply interruption my Lucene index (about 28 GB) looks like this:

18/11/2011 20:29  2.016.961.997 _3d.fdt
18/11/2011 20:29      1.816.004 _3d.fdx
18/11/2011 20:29             89 _3d.fnm
18/11/2011 20:30    197.323.436 _3d.frq
18/11/2011 20:30      1.816.004 _3d.nrm
18/11/2011 20:30    358.016.461 _3d.prx
18/11/2011 20:30        637.604 _3d.tii
18/11/2011 20:30     48.565.519 _3d.tis
18/11/2011 20:31        454.004 _3d.tvd
18/11/2011 20:31  1.695.380.935 _3d.tvf
18/11/2011 20:31      3.632.004 _3d.tvx
18/11/2011 23:33  2.048.500.822 _6g.fdt
18/11/2011 23:33      3.032.004 _6g.fdx
18/11/2011 23:33             89 _6g.fnm
18/11/2011 23:34    221.593.644 _6g.frq
18/11/2011 23:34      3.032.004 _6g.nrm
18/11/2011 23:34    350.136.996 _6g.prx
18/11/2011 23:34        683.668 _6g.tii
18/11/2011 23:34     52.224.328 _6g.tis
18/11/2011 23:36        758.004 _6g.tvd
18/11/2011 23:36  1.758.786.158 _6g.tvf
18/11/2011 23:36      6.064.004 _6g.tvx
19/11/2011 03:29  1.966.167.843 _9j.fdt
19/11/2011 03:29      3.832.004 _9j.fdx
19/11/2011 03:28             89 _9j.fnm
19/11/2011 03:30    222.733.606 _9j.frq
19/11/2011 03:30      3.832.004 _9j.nrm
19/11/2011 03:30    324.722.843 _9j.prx
19/11/2011 03:30        715.441 _9j.tii
19/11/2011 03:30     54.488.546 _9j.tis

without any segments files! I tried to fix it with the CheckIndex utility in Lucene, but I got the following message:

ERROR: could not read any segments file in directory
org.apache.lucene.index.IndexNotFoundException: no segments* file found in org.apache.lucene.store.MMapDirectory@E:\recover_me lockFactory=org.apache.lucene.store.NativeFSLockFactory@5d36d1d7: files: [_3d.fdt, _3d.fdx, _3d.fnm, _3d.frq, _3d.nrm, _3d.prx, _3d.tii, _3d.tis, _3d.tvd, _3d.tvf, _3d.tvx, _6g.fdt, _6g.fdx, _6g.fnm, _6g.frq, _6g.nrm, _6g.prx, _6g.tii, _6g.tis, _6g.tvd, _6g.tvf, _6g.tvx, _9j.fdt, _9j.fdx, _9j.fnm, _9j.frq, _9j.nrm, _9j.prx, _9j.tii, _9j.tis, _9j.tvd, _9j.tvf, _9j.tvx, _cf.cfs, _cm.fdt, _cm.fdx, _cm.fnm, _cm.frq, _cm.nrm, _cm.prx, _cm.tii, _cm.tis, _cm.tvd, _cm.tvf, _cm.tvx, _ff.fdt, _ff.fdx, _ff.fnm, _ff.frq, _ff.nrm, _ff.prx, _ff.tii, _ff.tis, _ff.tvd, _ff.tvf, _ff.tvx, _ii.fdt, _ii.fdx, _ii.fnm, _ii.frq, _ii.nrm, _ii.prx, _ii.tii, _ii.tis, _ii.tvd, _ii.tvf, _ii.tvx, _lc.cfs, _ll.fdt, _ll.fdx, _ll.fnm, _ll.frq, _ll.nrm, _ll.prx, _ll.tii, _ll.tis, _ll.tvd, _ll.tvf, _ll.tvx, _lo.cfs, _lp.cfs, _lq.cfs, _lr.cfs, _ls.cfs, _lt.cfs, _lu.cfs, _lv.cfs, _lw.fdt, _lw.fdx, _lw.tvd, _lw.tvf, _lw.tvx, _m.fdt, _m.fdx, _m.fnm, _m.frq, _m.nrm, _m.prx, _m.tii, _m.tis, _m.tvd, _m.tvf, _m.tvx]
        at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:712)
        at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:593)
        at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:359)
        at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:327)
        at org.apache.lucene.index.CheckIndex.main(CheckIndex.java:995)

Is there a way to recover this index?

Cheers
Rob
fuzzy search with prefix
Hi All, I am making a fuzzy search in my Solr application like this:
q:squre~0.6
I want some prefix of the term to be excluded from fuzzy matching; say, in this example, I want my fuzzy query not to try to match 'squ', and only the rest of the term to go through the fuzzy search. I am doing it by combining a wildcard query with the fuzzy query, like this:
q:squre~0.6 AND squ*
I want to know: is there any better way of doing this? From what I have read, we can set a prefix length on a fuzzy query for the number of characters we don't want to fuzzy-match, but I didn't find anything on how to set it in my Solr fuzzy query. Thanks in Advance. Meghana
--
View this message in context: http://lucene.472066.n3.nabble.com/fuzzy-search-with-prefix-tp3542064p3542064.html
Sent from the Solr - User mailing list archive at Nabble.com.
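The prefix-length setting the message refers to does exist at the Lucene level in FuzzyQuery; as far as I know the Solr 3.x query syntax does not expose it, so using it inside Solr would mean a small custom query-parser plugin. A minimal Lucene sketch of the setting itself, with an illustrative field name:

import org.apache.lucene.index.Term;
import org.apache.lucene.search.FuzzyQuery;

public class FuzzyWithPrefix {
    public static void main(String[] args) {
        // minimumSimilarity 0.6, prefixLength 3: the first three characters ("squ")
        // must match exactly; only the remainder of the term is fuzzy-matched.
        FuzzyQuery q = new FuzzyQuery(new Term("text", "squre"), 0.6f, 3);
        System.out.println(q);
    }
}

The wildcard-AND trick above approximates the same effect purely in query syntax, but the prefixLength version is cheaper because it never enumerates terms outside the prefix.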
make fuzzy search for phrase
Hi All, I am doing fuzzy search in my Solr; it works well for a single term, but when searching for phrases I get either a flood of data or very little data. Is there any good way of getting a satisfactory amount of data with decent accuracy?
1) q:kenny zemanski : 9 records
2) keny~0.7 zemansi~0.7 AND ken* : 22948 records
I want to get an amount of data that is accurate and somewhere near my actual results. By requiring more accuracy than 0.7, I get very little data and none of it matches my desired result. Anybody have any idea? Any help much appreciated. Meghana
--
View this message in context: http://lucene.472066.n3.nabble.com/make-fuzzy-search-for-phrase-tp3542079p3542079.html
Sent from the Solr - User mailing list archive at Nabble.com.
[newbie] solrj SolrQuery indent response
Hi List, I am new to the Solr and Lucene world. I have a simple question. I wrote the code segment below and it works.

public class SolrjTest {
    public static void main(String[] args) throws MalformedURLException, SolrServerException {
        ClassPathXmlApplicationContext c = new ClassPathXmlApplicationContext("/patrades-search-solrj-test-beans.xml");
        SolrServer server = (SolrServer) c.getBean("solrServer");
        SolrQuery query = new SolrQuery();
        query.setQuery("*:*")
             .setFacet(true);
        QueryResponse rsp = server.query(query);
        System.err.println(rsp.toString());
    }
}

I want to indent the response string, but couldn't find any answer. I looked at the book, the mail archive and Google. The most relevant link I found is http://wiki.apache.org/solr/SimpleFacetParameters but I don't know how to use it. The API is hard to read, by the way.

regards,
-Halil AĞIN
RE: DIH Strange Problem
Do you use Java 6 update 29? There is a known issue with the latest mssql driver: http://blogs.msdn.com/b/jdbcteam/archive/2011/11/07/supported-java-versions-november-2011.aspx In addition, there are known connection failure issues with Java 6 update 29, and the developer preview (non production) versions of Java 6 update 30 and Java 6 update 30 build 12. We are in contact with Java on these issues and we will update this blog once we have more information. Should work with update 28. Kai -Original Message- From: Husain, Yavar [mailto:yhus...@firstam.com] Sent: Monday, November 28, 2011 1:02 PM To: solr-user@lucene.apache.org; Shawn Heisey Subject: RE: DIH Strange Problem I figured out the solution and Microsoft and not Solr is the problem here :): I downloaded and build latest Solr (3.4) from sources and finally hit following line of code in Solr (where I put my debug statement) : if(url != null){ LOG.info(Yavar: getting handle to driver manager:); c = DriverManager.getConnection(url, initProps); LOG.info(Yavar: got handle to driver manager:); } The call to Driver Manager was not returning. Here was the error!! The Driver we were using was Microsoft Type 4 JDBC driver for SQL Server. I downloaded another driver called jTDS jDBC driver and installed that. Problem got fixed!!! So please follow the following steps: 1. Download jTDS jDBC driver from http://jtds.sourceforge.net/ 2. Put the driver jar file into your Solr/lib directory where you had put Microsoft JDBC driver. 3. In the data-config.xml use this statement: driver=net.sourceforge.jtds.jdbc.Driver 4. Also in data-config.xml mention url like this: url=jdbc:jTDS:sqlserver://localhost:1433;databaseName=XXX 5. Now run your indexing. It should solve the problem. -Original Message- From: Husain, Yavar Sent: Thursday, November 24, 2011 12:38 PM To: solr-user@lucene.apache.org; Shawn Heisey Subject: RE: DIH Strange Problem Hi Thanks for your replies. I carried out these 2 steps (it did not solve my problem): 1. I tried setting responseBuffering to adaptive. Did not work. 2. For checking Database connection I wrote a simple java program to connect to database and fetch some results with the same driver that I use for solr. It worked. So it does not seem to be a problem with the connection. Now I am stuck where Tomcat log says: Creating a connection for entity . and does nothing, I mean after this log we usually get the getConnection() took x millisecond however I dont get that ,I can just see the time moving with no records getting fetched. Original Problem listed again: I am using Solr 1.4.1 on Windows/MS SQL Server and am using DIH for importing data. Indexing and all was working perfectly fine. However today when I started full indexing again, Solr halts/stucks at the line Creating a connection for entity. There are no further messages after that. I can see that DIH is busy and on the DIH console I can see A command is still running, I can also see total rows fetched = 0 and total request made to datasource = 1 and time is increasing however it is not doing anything. This is the exact configuration that worked for me. I am not really able to understand the problem here. Also in the index directory where I am storing the index there are just 3 files: 2 segment files + 1 lucene*-write.lock file. ... data-config.xml: dataSource type=JdbcDataSource driver=com.microsoft.sqlserver.jdbc.SQLServerDriver url=jdbc:sqlserver://127.0.0.1:1433;databaseName=SampleOrders user=testUser password=password/ document . . 
Logs: INFO: Server startup in 2016 ms Nov 23, 2011 4:11:27 PM org.apache.solr.handler.dataimport.DataImporter doFullImport INFO: Starting Full Import Nov 23, 2011 4:11:27 PM org.apache.solr.core.SolrCore execute INFO: [] webapp=/solr path=/dataimport params={command=full-import} status=0 QTime=11 Nov 23, 2011 4:11:27 PM org.apache.solr.handler.dataimport.SolrWriter readIndexerProperties INFO: Read dataimport.properties Nov 23, 2011 4:11:27 PM org.apache.solr.update.DirectUpdateHandler2 deleteAll INFO: [] REMOVING ALL DOCUMENTS FROM INDEX Nov 23, 2011 4:11:27 PM org.apache.solr.core.SolrDeletionPolicy onInit INFO: SolrDeletionPolicy.onInit: commits:num=1 commit{dir=C:\solrindexes\index,segFN=segments_6,version=1322041133719,generation=6,filenames=[segments_6] Nov 23, 2011 4:11:27 PM org.apache.solr.core.SolrDeletionPolicy updateCommits INFO: newest commit = 1322041133719 Nov 23, 2011 4:11:27 PM org.apache.solr.handler.dataimport.JdbcDataSource$1 call INFO: Creating a connection for entity SampleText with URL: jdbc:sqlserver://127.0.0.1:1433;databaseName=SampleOrders -Original Message- From: Shawn Heisey [mailto:s...@elyograg.org] Sent: Wednesday, November 23, 2011 7:36 PM To: solr-user@lucene.apache.org Subject: Re: DIH Strange Problem On 11/23/2011 5:21 AM, Chantal Ackermann wrote:
Re: [newbie] solrj SolrQuery indent response
I got one step further, but still no indent. I wrote the code segment below:

query.setQuery("marka_s:atak*")
     .setFacet(true)
     .setParam("indent", "on");

and here is the resulting query string: q=marka_s%3Aatak*&facet=true&indent=on

-halil agin.

On Mon, Nov 28, 2011 at 3:07 PM, halil halil.a...@gmail.com wrote:
Hi List, I am new to the Solr and Lucene world. I have a simple question. I wrote the code segment below and it works.

public class SolrjTest {
    public static void main(String[] args) throws MalformedURLException, SolrServerException {
        ClassPathXmlApplicationContext c = new ClassPathXmlApplicationContext("/patrades-search-solrj-test-beans.xml");
        SolrServer server = (SolrServer) c.getBean("solrServer");
        SolrQuery query = new SolrQuery();
        query.setQuery("*:*")
             .setFacet(true);
        QueryResponse rsp = server.query(query);
        System.err.println(rsp.toString());
    }
}

I want to indent the response string, but couldn't find any answer. I looked at the book, the mail archive and Google. The most relevant link I found is http://wiki.apache.org/solr/SimpleFacetParameters but I don't know how to use it. The API is hard to read, by the way.

regards,
-Halil AĞIN
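The indent parameter only affects the raw response that Solr's response writer produces; SolrJ parses that response into objects (by default over the binary javabin format), so rsp.toString() will never come back indented no matter what parameters are set. A rough sketch of fetching the indented form directly over HTTP instead, assuming the default example URL and the query from this thread:

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLEncoder;

public class RawIndentedResponse {
    public static void main(String[] args) throws Exception {
        String q = URLEncoder.encode("marka_s:atak*", "UTF-8");
        // wt=json (or wt=xml) plus indent=on is honoured by the response writer itself
        URL url = new URL("http://localhost:8983/solr/select?q=" + q
                + "&facet=true&wt=json&indent=on");
        BufferedReader in = new BufferedReader(new InputStreamReader(url.openStream(), "UTF-8"));
        String line;
        while ((line = in.readLine()) != null) {
            System.out.println(line);   // pretty-printed response straight from Solr
        }
        in.close();
    }
}

If the structured SolrJ objects are what you actually need, keep using QueryResponse and format the output yourself; indent=on is mainly useful for eyeballing the raw response in a browser or a script.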
Search over multiple indexes
Hello, I'm trying to implement automatic document classification and store the classified attributes as an additional field in Solr document. Then the search goes against that field like q=classified_category:xyz. The document classification is currently implemented as an UpdateRequestProcessor and works quite well. The only problem: for each change in the classification algorithm every document has to be re-indexed which, of course, makes tests and experimentation difficult and binds resources (other than Solr) for several hours. So, my idea would be to store classified attributes in a meta-index and search over the main and meta indexes simultaneously. For example: main index has got fields like color and meta index has got classified_category. The query q=classified_category:xyz AND color:black should be then split over the main and meta index. This way, the classification could run on Solr over the main index and store classified fields in the meta index so that only Solr resources are bound. Has anybody already done something like that? It's a little bit like sharding but different in that each shard would process its part of the query and live in the same Solr instance. Regards, Valeriy
Re: Huge Performance: Solr distributed search
Hi all again. Thanks to all for your replies. On this weekend I'd made some interesting tests, and I would like to share it with you. First of all I made speed test of my hdd: root@LSolr:~# hdparm -t /dev/sda9 /dev/sda9: Timing buffered disk reads: 146 MB in 3.01 seconds = 48.54 MB/sec Then with iperf I had tested my network: [ 4] 0.0-18.7 sec 2.00 GBytes917 Mbits/sec Then, I tried to post my quesries using shard parameter with one shard, so my queries were like: http://localhost:8080/solr1/select/?q=(test)qt=requestShards http://localhost:8080/solr1/select/?q=%28test%29qt=requestShards where requestShards is: requestHandler name=requestShards class=solr.SearchHandler default=false lst name=defaults str name=echoParamsexplicit/str int name=rows10/int str name=shards127.0.0.1:8080/solr1 http://127.0.0.1:8080/solr1/str /lst /requestHandler Maybe its not correct, but: INFO: [] webapp=/solr1 path=/select/params={fl=*,scoreident=truestart=0q=(genuflections)qt=requestShardsrows=2000}status=0 QTime=6525 INFO: [] webapp=/solr1 path=/select/params={fl=*,scoreident=truestart=0q=(tunefulness)qt=requestShardsrows=2000} status=0 QTime=20170 INFO: [] webapp=/solr1 path=/select/params={fl=*,scoreident=truestart=0q=(societal)qt=requestShardsrows=2000} status=0 QTime=44958 INFO: [] webapp=/solr1 path=/select/params={fl=*,scoreident=truestart=0q=(euchre's)qt=requestShardsrows=2000} status=0 QTime=32161 INFO: [] webapp=/solr1 path=/select/params={fl=*,scoreident=truestart=0q=(monogram's)qt=requestShardsrows=2000} status=0 QTime=85252 When I posted similar queries direct to solr1 without requestShards I had: INFO: [] webapp=/solr1 path=/select/params={fl=*,scoreident=truestart=0q=(reopening)rows=2000} hits=712 status=0 QTime=10 INFO: [] webapp=/solr1 path=/select/params={fl=*,scoreident=truestart=0q=(housemothers)rows=2000} hits=0 status=0 QTime=446 INFO: [] webapp=/solr1 path=/select/params={fl=*,scoreident=truestart=0q=(harpooners)rows=2000} hits=76 status=0 QTime=399 INFO: [] webapp=/solr1 path=/select/ params={fl=*,scoreident=truestart=0q=(coaxing)rows=2000} hits=562 status=0 QTime=2820 INFO: [] webapp=/solr1 path=/select/ params={fl=*,scoreident=truestart=0q=(superstar's)rows=2000} hits=4748 status=0 QTime=672 INFO: [] webapp=/solr1 path=/select/ params={fl=*,scoreident=truestart=0q=(sedateness's)rows=2000} hits=136 status=0 QTime=923 INFO: [] webapp=/solr1 path=/select/ params={fl=*,scoreident=truestart=0q=(petrolatum)rows=2000} hits=8 status=0 QTime=6183 INFO: [] webapp=/solr1 path=/select/ params={fl=*,scoreident=truestart=0q=(everlasting's)rows=2000} hits=1522 status=0 QTime=2625 And finally I found a bug: https://issues.apache.org/jira/browse/SOLR-1524 https://issues.apache.org/jira/browse/SOLR-1524 Why is no activity on it? Its not actual? Today I wrote a bash script: #!/bin/bash ds=$(date +%s.%N) echo START: $ds ./data/east_2000 curl http://127.0.0.1:8080/solr1/select/?fl=*,scoreident=truestart=0q=(east)rows=2000 http://127.0.0.1:8080/solr1/select/?fl=*,scoreident=truestart=0q=%28east%29rows=2000-s -s-H 'Content-type:text/xml; charset=utf-8' ./data/east_2000 de=$(date +%s.%N) ddf=$(echo $de - $ds | bc) echo END: $de ./data/east_2000 echo DIFF: $ddf ./data/east_2000 Before runing a Tomcat I'd dropped cache: root@LSolr:~# echo 3 /proc/sys/vm/drop_caches Then I started Tomcat and run the script. Result is bellow: START: 1322476131.783146691 ?xml version=1.0 encoding=UTF-8? 
response lst name=responseHeaderint name=status0/intint name=QTime125/intlst name=paramsstr name=fl*,score/strstr name=identtrue/strstr name=start0/strstr name=q(east)/strstr name=rows2000/str/lst/lstresult name=response numFound=21439 start=0 maxScore=4.387605 ... /response END: 1322476180.262770244 DIFF: 48.479623553 File size is: root@LSolr:~# ls -l | grep east -rw-r--r-- 1 root root 1063579 Nov 28 12:29 east_2000 I'm using nmon to monitor a HDD activity. It was near 100% when I run the script. But when I tried to run it again the result was: DIFF: .063678709 and no much HDD activity at nmon. I can't undestand one thing: is this my huge hardware such as slow HDDor its a Solr troubles? And why is no activity on bug https://issues.apache.org/jira/browse/SOLR-1524 https://issues.apache.org/jira/browse/SOLR-1524 since 27/Oct/09 07:19? On 11/25/2011 10:02 AM, Dmitry Kan wrote: 45 000 000 per shard approx, Tomcat, caching was tweaked in solrconfig and shard given 12GB of RAM max. !-- Filter Cache Cache used by SolrIndexSearcher for filters (DocSets), unordered sets of *all* documents that match a query. When a new searcher is opened, its caches may be prepopulated or autowarmed using data from caches in the old searcher. autowarmCount is the number of items to prepopulate. For LRUCache, the autowarmed items will be
Fuzzy search with slop
Hi, Can I apply a fuzzy query and slop together, like q="hello world"~0.5~3? I am getting an error when applying it like this. I want to make both fuzzy search and slop work. How can I do this, can anybody help me? Thanks in Advance. Meghana
--
View this message in context: http://lucene.472066.n3.nabble.com/Fuzzy-search-with-slop-tp3542280p3542280.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: help no segment in my lucene index!!!
Which version of Solr/Lucene were you using when you hit power loss? There was a known bug that could allow power loss to cause corruption, but this was fixed in Lucene 3.4.0. Unfortunately, there is no easy way to recreate the segments_N file... in principle it should be possible and maybe not too much work but nobody has created such a tool yet, that I know of. Mike McCandless http://blog.mikemccandless.com On Mon, Nov 28, 2011 at 5:54 AM, Roberto Iannone roberto.iann...@gmail.com wrote: Hi all, after a power supply inperruption my lucene index (about 28 GB) looks like this: 18/11/2011 20:29 2.016.961.997 _3d.fdt 18/11/2011 20:29 1.816.004 _3d.fdx 18/11/2011 20:29 89 _3d.fnm 18/11/2011 20:30 197.323.436 _3d.frq 18/11/2011 20:30 1.816.004 _3d.nrm 18/11/2011 20:30 358.016.461 _3d.prx 18/11/2011 20:30 637.604 _3d.tii 18/11/2011 20:30 48.565.519 _3d.tis 18/11/2011 20:31 454.004 _3d.tvd 18/11/2011 20:31 1.695.380.935 _3d.tvf 18/11/2011 20:31 3.632.004 _3d.tvx 18/11/2011 23:33 2.048.500.822 _6g.fdt 18/11/2011 23:33 3.032.004 _6g.fdx 18/11/2011 23:33 89 _6g.fnm 18/11/2011 23:34 221.593.644 _6g.frq 18/11/2011 23:34 3.032.004 _6g.nrm 18/11/2011 23:34 350.136.996 _6g.prx 18/11/2011 23:34 683.668 _6g.tii 18/11/2011 23:34 52.224.328 _6g.tis 18/11/2011 23:36 758.004 _6g.tvd 18/11/2011 23:36 1.758.786.158 _6g.tvf 18/11/2011 23:36 6.064.004 _6g.tvx 19/11/2011 03:29 1.966.167.843 _9j.fdt 19/11/2011 03:29 3.832.004 _9j.fdx 19/11/2011 03:28 89 _9j.fnm 19/11/2011 03:30 222.733.606 _9j.frq 19/11/2011 03:30 3.832.004 _9j.nrm 19/11/2011 03:30 324.722.843 _9j.prx 19/11/2011 03:30 715.441 _9j.tii 19/11/2011 03:30 54.488.546 _9j.tis without any segment files! I tried to fix with CheckIndex utility in lucene, but I got the following message: ERROR: could not read any segments file in directory org.apache.lucene.index.IndexNotFoundException: no segments* file found in org.a pache.lucene.store.MMapDirectory@E:\recover_me lockFactory=org.apache.lucene.sto re.NativeFSLockFactory@5d36d1d7: files: [_3d.fdt, _3d.fdx, _3d.fnm, _3d.frq, _3d .nrm, _3d.prx, _3d.tii, _3d.tis, _3d.tvd, _3d.tvf, _3d.tvx, _6g.fdt, _6g.fdx, _6 g.fnm, _6g.frq, _6g.nrm, _6g.prx, _6g.tii, _6g.tis, _6g.tvd, _6g.tvf, _6g.tvx, _ 9j.fdt, _9j.fdx, _9j.fnm, _9j.frq, _9j.nrm, _9j.prx, _9j.tii, _9j.tis, _9j.tvd, _9j.tvf, _9j.tvx, _cf.cfs, _cm.fdt, _cm.fdx, _cm.fnm, _cm.frq, _cm.nrm, _cm.prx, _cm.tii, _cm.tis, _cm.tvd, _cm.tvf, _cm.tvx, _ff.fdt, _ff.fdx, _ff.fnm, _ff.frq , _ff.nrm, _ff.prx, _ff.tii, _ff.tis, _ff.tvd, _ff.tvf, _ff.tvx, _ii.fdt, _ii.fd x, _ii.fnm, _ii.frq, _ii.nrm, _ii.prx, _ii.tii, _ii.tis, _ii.tvd, _ii.tvf, _ii.t vx, _lc.cfs, _ll.fdt, _ll.fdx, _ll.fnm, _ll.frq, _ll.nrm, _ll.prx, _ll.tii, _ll. tis, _ll.tvd, _ll.tvf, _ll.tvx, _lo.cfs, _lp.cfs, _lq.cfs, _lr.cfs, _ls.cfs, _lt .cfs, _lu.cfs, _lv.cfs, _lw.fdt, _lw.fdx, _lw.tvd, _lw.tvf, _lw.tvx, _m.fdt, _m. fdx, _m.fnm, _m.frq, _m.nrm, _m.prx, _m.tii, _m.tis, _m.tvd, _m.tvf, _m.tvx] at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfo s.java:712) at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfo s.java:593) at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:359) at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:327) at org.apache.lucene.index.CheckIndex.main(CheckIndex.java:995) There's a way to recover this index ? Cheers Rob
Re: highlighting on range query
and output is { responseHeader:{ status:0, QTime:4, params:{ hl.highlightMultiTerm:true, fl:lily.id,rangefld, indent:on, hl.useFastVectorHighlighter:false, q:rangefld:[5000 TO 6000], hl.fl:*,rangefld, I don't think hl.fl parameter accepts * value. Please try hl.fl=rangefld
Re: make fuzzy search for phrase
I am doing fuzzy search in my solr , its working good for signle term , but when searching for phrases i get either bulk of data or very less data. is there any good way for getting satisfactory amount of data with nice accuracy. 1) q:kenny zemanski : 9 recors 2) keny~0.7 zemansi~0.7 AND ken* : 22948 records. You can do it with https://issues.apache.org/jira/browse/SOLR-1604 q=keny~0.7 zemansi~0.7 AND ken*
Re: Fuzzy search with slop
Can I apply a fuzzy query and slop together, like q="hello world"~0.5~3? I am getting an error when applying it like this. I want to make both fuzzy search and slop work. How can I do this, can anybody help me?

It is possible with this plugin: https://issues.apache.org/jira/browse/SOLR-1604
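For anyone who cannot apply the SOLR-1604 patch directly, the underlying Lucene class is, I believe, ComplexPhraseQueryParser from the contrib modules, which allows fuzzy terms and slop inside a single quoted phrase. A hedged sketch, assuming Lucene 3.x on the classpath and an illustrative field and analyzer; the exact phrase syntax accepted may differ slightly between versions:

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.queryParser.complexPhrase.ComplexPhraseQueryParser;
import org.apache.lucene.search.Query;
import org.apache.lucene.util.Version;

public class ComplexPhraseExample {
    public static void main(String[] args) throws Exception {
        ComplexPhraseQueryParser parser = new ComplexPhraseQueryParser(
                Version.LUCENE_33, "text", new StandardAnalyzer(Version.LUCENE_33));
        // Fuzzy terms inside a phrase, with a slop of 3 on the whole phrase
        Query q = parser.parse("\"hello~0.5 world~0.5\"~3");
        System.out.println(q);
    }
}

The SOLR-1604 patch essentially wires this parser into Solr as a QParserPlugin so the same syntax can be used from a normal q parameter.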
how index words with their perfix in solr?
I use Solr 3.3. I want Solr to index words with their suffixes. When I index 'book' and 'books' and search for 'book', Solr shows any document that has 'book' or 'books', but when I index 'rain' and 'rainy' and search for 'rain', Solr only shows documents that have 'rain'. I want Solr to show any document that has 'rain' or 'rainy'. Please help me.
--
View this message in context: http://lucene.472066.n3.nabble.com/how-index-words-with-their-perfix-in-solr-tp3542300p3542300.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: highlighting on range query
Tried below url and got the same output. Any other suggestion . http://localhost:8983/solr/select?q=rangefld:[5000%20TO%206000]fl=lily.id,rangefldhl=onrows=5wt=jsonindent=onhl.fl=rangefldhl.highlightMultiTerm=truehl.usePhraseHighlighter=truehl.useFastVectorHighlighter=false On Mon, Nov 28, 2011 at 8:10 PM, Ahmet Arslan iori...@yahoo.com wrote: and output is { responseHeader:{ status:0, QTime:4, params:{ hl.highlightMultiTerm:true, fl:lily.id,rangefld, indent:on, hl.useFastVectorHighlighter:false, q:rangefld:[5000 TO 6000], hl.fl:*,rangefld, I don't think hl.fl parameter accepts * value. Please try hl.fl=rangefld -- Thanks Regards Rahul Mehta
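To rule out URL-construction problems, the same request can be assembled with SolrJ; a small sketch, assuming SolrJ 3.x and the field names from this thread (server URL is illustrative):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class RangeHighlightQuery {
    public static void main(String[] args) throws Exception {
        SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
        SolrQuery query = new SolrQuery("rangefld:[5000 TO 6000]");
        query.setFields("lily.id", "rangefld");
        query.setRows(5);
        query.setHighlight(true);
        query.addHighlightField("rangefld");
        // multi-term highlighting is what range/wildcard queries need
        query.set("hl.highlightMultiTerm", "true");
        query.set("hl.usePhraseHighlighter", "true");
        QueryResponse rsp = server.query(query);
        System.out.println(rsp.getHighlighting());
    }
}

Note that highlighting generally needs a stored, text-analyzed field to mark up; if rangefld is a purely numeric/trie field, the highlighter may simply have nothing to emphasize, which would also explain the empty highlighting entries above.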
Re: how index words with their perfix in solr?
It looks like you are using the plural stemmer, you might want to look into using the Porter stemmer instead: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#Stemming François On Nov 28, 2011, at 9:14 AM, mina wrote: I use solr 3.3,I want solr index words with their suffixes. when i index 'book' and 'books' and search 'book', solr show any document that has 'book' or 'books' but when I index 'rain' and 'rainy' and search 'rain', solr show any document that has 'rain' but i whant that solr show any document that has 'rain' or 'rainy'.help me. -- View this message in context: http://lucene.472066.n3.nabble.com/how-index-words-with-their-perfix-in-solr-tp3542300p3542300.html Sent from the Solr - User mailing list archive at Nabble.com.
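A quick way to see exactly what a stemmer does to your terms, before changing the schema, is to run them through the filter in a few lines of Lucene. A minimal sketch, assuming Lucene 3.3 on the classpath and whitespace-separated input; the sample words mirror the question:

import java.io.StringReader;
import org.apache.lucene.analysis.LowerCaseFilter;
import org.apache.lucene.analysis.PorterStemFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.WhitespaceTokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.util.Version;

public class StemCheck {
    public static void main(String[] args) throws Exception {
        String text = "book books rain rainy";
        TokenStream ts = new PorterStemFilter(
                new LowerCaseFilter(Version.LUCENE_33,
                        new WhitespaceTokenizer(Version.LUCENE_33, new StringReader(text))));
        CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
        ts.reset();
        while (ts.incrementToken()) {
            System.out.println(term.toString());   // the form that would be indexed
        }
        ts.close();
    }
}

Solr's analysis.jsp page does the same job interactively. If a stemmer does not conflate a pair you care about, an index-time synonym entry (for example, rainy => rain) is the usual fallback.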
Re: help no segment in my lucene index!!!
Hi Michael, thx for your help :) 2011/11/28 Michael McCandless luc...@mikemccandless.com Which version of Solr/Lucene were you using when you hit power loss? I'm using Lucene 3.4. There was a known bug that could allow power loss to cause corruption, but this was fixed in Lucene 3.4.0. Unfortunately, there is no easy way to recreate the segments_N file... in principle it should be possible and maybe not too much work but nobody has created such a tool yet, that I know of. some hints about how could I write this code by myself ? Cheers Rob
Re: turning off solr server verbosity
Hi Ahmet, thanks. Is this not then a jetty setting? I'll search for that. RR Ahmet Arslan wrote: I have not managed to figure out how to prevent verbose output of the solr server. I assume the verbosity on the server side slows down the response and it would be preferable to turn it off? If anyone knows how to achieve this, advice would be appreciated. Fuad reported such improvement gained by disabling info log level. Here is the original post : http://search-lucene.com/m/VBFAXnwp6x1
Re: Unable to index documents using DataImportHandler with MSSQL
Hah, I've just come on here to suggest you do the same thing! Thanks for getting back to me - and interesting we both came up with the same solution! Now I have the problem that running a delta update updates the 'dataimport.properties' file - but then just re-fetches all the data regardless! Weird! On Mon, Nov 28, 2011 at 11:59 AM, Husain, Yavar yhus...@firstam.com wrote: Hi Ian I downloaded and build latest Solr (3.4) from sources and finally hit following line of code in Solr (where I put my debug statement) : if(url != null){ LOG.info(Yavar: getting handle to driver manager:); c = DriverManager.getConnection(url, initProps); LOG.info(Yavar: got handle to driver manager:); } The call to Driver Manager was not returning. Here was the error!! The Driver we were using was Microsoft Type 4 JDBC driver for SQL Server. I downloaded another driver called jTDS jDBC driver and installed that. Problem got fixed!!! So please follow the following steps: 1. Download jTDS jDBC driver from http://jtds.sourceforge.net/ 2. Put the driver jar file into your Solr/lib directory where you had put Microsoft JDBC driver. 3. In the data-config.xml use this statement: driver=net.sourceforge.jtds.jdbc.Driver 4. Also in data-config.xml mention url like this: url=jdbc:jTDS:sqlserver://localhost:1433;databaseName=XXX 5. Now run your indexing. It should solve the problem. Regards, Yavar -Original Message- From: Ian Grainger [mailto:i...@isfluent.com] Sent: Monday, November 28, 2011 4:11 PM To: Husain, Yavar Cc: solr-user@lucene.apache.org Subject: Re: Unable to index documents using DataImportHandler with MSSQL Right. This is REALLY weird - I've now started from scratch on another machine (this time Windows 7), and got _exactly_ the same problem !? On Mon, Nov 28, 2011 at 7:37 AM, Husain, Yavar yhus...@firstam.com wrote: Hi Ian I am having exactly the same problem what you are having on Win 7 and 2008 Server http://lucene.472066.n3.nabble.com/DIH-Strange-Problem-tc3530370.html I still have not received any replies which could solve my problem till now. Please do let me know if you have arrived at some solution for your problem. Thanks. Regards, Yavar -Original Message- From: Ian Grainger [mailto:i...@isfluent.com] Sent: Friday, November 25, 2011 10:59 PM To: solr-user@lucene.apache.org Subject: Re: Unable to index documents using DataImportHandler with MSSQL Update on this: I've established: * It's not a problem in the DB (I can index from this DB into a Solr instance on another server) * It's not Tomcat (I get the same problem in Jetty) * It's not the schema (I have simplified it to one field) That leaves SolrConfig.xml and data-config. Only thing changed in SolrConfig.xml is adding: lib dir=D:/Software/Solr/example/solr/dist/ regex=apache-solr-cell-\d.*\.jar / lib dir=D:/Software/Solr/example/solr/dist/ regex=apache-solr-clustering-\d.*\.jar / lib dir=D:/Software/Solr/example/solr/dist/ regex=apache-solr-dataimporthandler-\d.*\.jar / requestHandler name=/dataimport class=org.apache.solr.handler.dataimport.DataImportHandler lst name=defaults str name=configD:/Software/Solr/example/solr/conf/data-config.xml/str /lst /requestHandler And data-config.xml is pretty much as attached - except simpler. Any help or any advice on how to diagnose would be appreciated! On Fri, Nov 25, 2011 at 12:29 PM, Ian Grainger i...@isfluent.com wrote: Hi I have copied my Solr config from a working Windows server to a new one, and it can't seem to run an import. They're both using win server 2008 and SQL 2008R2. 
This is the data importer config dataConfig dataSource type=JdbcDataSource name=ds1 driver=com.microsoft.sqlserver.jdbc.SQLServerDriver url=jdbc:sqlserver://localhost;databaseName=DB user=Solr password=pwd/ document name=datas entity name=data dataSource=ds1 pk=key query=EXEC SOLR_COMPANY_SEARCH_DATA deltaImportQuery=SELECT * FROM Company_Search_Data WHERE [key]='${dataimporter.delta.key}' deltaQuery=SELECT [key] FROM Company_Search_Data WHERE modify_dt '${dataimporter.last_index_time}' field column=WorkDesc_Comments name=WorkDesc_Comments_Split / field column=WorkDesc_Comments name=WorkDesc_Comments_Edge / /entity /document /dataConfig I can use MS SQL Profiler to watch the Solr user log in successfully, but then nothing. It doesn't seem to even try and execute the stored procedure. Any ideas why this would be working one server and not on another? FTR the only thing in the tomcat catalina log is: org.apache.solr.handler.dataimport.JdbcDataSource$1 call INFO: Creating a connection for entity data with URL: jdbc:sqlserver://localhost;databaseName=CATLive -- Ian i...@isfluent.com +44 (0)1223 257903
RE: DIH Strange Problem
Thanks Kai for sharing this. Ian encountered the same problem so marking him in the mail too. From: Kai Gülzau [kguel...@novomind.com] Sent: Monday, November 28, 2011 6:55 PM To: solr-user@lucene.apache.org Subject: RE: DIH Strange Problem Do you use Java 6 update 29? There is a known issue with the latest mssql driver: http://blogs.msdn.com/b/jdbcteam/archive/2011/11/07/supported-java-versions-november-2011.aspx In addition, there are known connection failure issues with Java 6 update 29, and the developer preview (non production) versions of Java 6 update 30 and Java 6 update 30 build 12. We are in contact with Java on these issues and we will update this blog once we have more information. Should work with update 28. Kai -Original Message- From: Husain, Yavar [mailto:yhus...@firstam.com] Sent: Monday, November 28, 2011 1:02 PM To: solr-user@lucene.apache.org; Shawn Heisey Subject: RE: DIH Strange Problem I figured out the solution and Microsoft and not Solr is the problem here :): I downloaded and build latest Solr (3.4) from sources and finally hit following line of code in Solr (where I put my debug statement) : if(url != null){ LOG.info(Yavar: getting handle to driver manager:); c = DriverManager.getConnection(url, initProps); LOG.info(Yavar: got handle to driver manager:); } The call to Driver Manager was not returning. Here was the error!! The Driver we were using was Microsoft Type 4 JDBC driver for SQL Server. I downloaded another driver called jTDS jDBC driver and installed that. Problem got fixed!!! So please follow the following steps: 1. Download jTDS jDBC driver from http://jtds.sourceforge.net/ 2. Put the driver jar file into your Solr/lib directory where you had put Microsoft JDBC driver. 3. In the data-config.xml use this statement: driver=net.sourceforge.jtds.jdbc.Driver 4. Also in data-config.xml mention url like this: url=jdbc:jTDS:sqlserver://localhost:1433;databaseName=XXX 5. Now run your indexing. It should solve the problem. -Original Message- From: Husain, Yavar Sent: Thursday, November 24, 2011 12:38 PM To: solr-user@lucene.apache.org; Shawn Heisey Subject: RE: DIH Strange Problem Hi Thanks for your replies. I carried out these 2 steps (it did not solve my problem): 1. I tried setting responseBuffering to adaptive. Did not work. 2. For checking Database connection I wrote a simple java program to connect to database and fetch some results with the same driver that I use for solr. It worked. So it does not seem to be a problem with the connection. Now I am stuck where Tomcat log says: Creating a connection for entity . and does nothing, I mean after this log we usually get the getConnection() took x millisecond however I dont get that ,I can just see the time moving with no records getting fetched. Original Problem listed again: I am using Solr 1.4.1 on Windows/MS SQL Server and am using DIH for importing data. Indexing and all was working perfectly fine. However today when I started full indexing again, Solr halts/stucks at the line Creating a connection for entity. There are no further messages after that. I can see that DIH is busy and on the DIH console I can see A command is still running, I can also see total rows fetched = 0 and total request made to datasource = 1 and time is increasing however it is not doing anything. This is the exact configuration that worked for me. I am not really able to understand the problem here. Also in the index directory where I am storing the index there are just 3 files: 2 segment files + 1 lucene*-write.lock file. ... 
data-config.xml: dataSource type=JdbcDataSource driver=com.microsoft.sqlserver.jdbc.SQLServerDriver url=jdbc:sqlserver://127.0.0.1:1433;databaseName=SampleOrders user=testUser password=password/ document . . Logs: INFO: Server startup in 2016 ms Nov 23, 2011 4:11:27 PM org.apache.solr.handler.dataimport.DataImporter doFullImport INFO: Starting Full Import Nov 23, 2011 4:11:27 PM org.apache.solr.core.SolrCore execute INFO: [] webapp=/solr path=/dataimport params={command=full-import} status=0 QTime=11 Nov 23, 2011 4:11:27 PM org.apache.solr.handler.dataimport.SolrWriter readIndexerProperties INFO: Read dataimport.properties Nov 23, 2011 4:11:27 PM org.apache.solr.update.DirectUpdateHandler2 deleteAll INFO: [] REMOVING ALL DOCUMENTS FROM INDEX Nov 23, 2011 4:11:27 PM org.apache.solr.core.SolrDeletionPolicy onInit INFO: SolrDeletionPolicy.onInit: commits:num=1 commit{dir=C:\solrindexes\index,segFN=segments_6,version=1322041133719,generation=6,filenames=[segments_6] Nov 23, 2011 4:11:27 PM org.apache.solr.core.SolrDeletionPolicy updateCommits INFO: newest commit = 1322041133719 Nov 23, 2011 4:11:27 PM org.apache.solr.handler.dataimport.JdbcDataSource$1 call INFO: Creating a connection for entity SampleText with URL:
Re: DIH Strange Problem
Aha! That sounds like it might be it! On Mon, Nov 28, 2011 at 4:16 PM, Husain, Yavar yhus...@firstam.com wrote: Thanks Kai for sharing this. Ian encountered the same problem so marking him in the mail too. From: Kai Gülzau [kguel...@novomind.com] Sent: Monday, November 28, 2011 6:55 PM To: solr-user@lucene.apache.org Subject: RE: DIH Strange Problem Do you use Java 6 update 29? There is a known issue with the latest mssql driver: http://blogs.msdn.com/b/jdbcteam/archive/2011/11/07/supported-java-versions-november-2011.aspx In addition, there are known connection failure issues with Java 6 update 29, and the developer preview (non production) versions of Java 6 update 30 and Java 6 update 30 build 12. We are in contact with Java on these issues and we will update this blog once we have more information. Should work with update 28. Kai -Original Message- From: Husain, Yavar [mailto:yhus...@firstam.com] Sent: Monday, November 28, 2011 1:02 PM To: solr-user@lucene.apache.org; Shawn Heisey Subject: RE: DIH Strange Problem I figured out the solution and Microsoft and not Solr is the problem here :): I downloaded and build latest Solr (3.4) from sources and finally hit following line of code in Solr (where I put my debug statement) : if(url != null){ LOG.info(Yavar: getting handle to driver manager:); c = DriverManager.getConnection(url, initProps); LOG.info(Yavar: got handle to driver manager:); } The call to Driver Manager was not returning. Here was the error!! The Driver we were using was Microsoft Type 4 JDBC driver for SQL Server. I downloaded another driver called jTDS jDBC driver and installed that. Problem got fixed!!! So please follow the following steps: 1. Download jTDS jDBC driver from http://jtds.sourceforge.net/ 2. Put the driver jar file into your Solr/lib directory where you had put Microsoft JDBC driver. 3. In the data-config.xml use this statement: driver=net.sourceforge.jtds.jdbc.Driver 4. Also in data-config.xml mention url like this: url=jdbc:jTDS:sqlserver://localhost:1433;databaseName=XXX 5. Now run your indexing. It should solve the problem. -Original Message- From: Husain, Yavar Sent: Thursday, November 24, 2011 12:38 PM To: solr-user@lucene.apache.org; Shawn Heisey Subject: RE: DIH Strange Problem Hi Thanks for your replies. I carried out these 2 steps (it did not solve my problem): 1. I tried setting responseBuffering to adaptive. Did not work. 2. For checking Database connection I wrote a simple java program to connect to database and fetch some results with the same driver that I use for solr. It worked. So it does not seem to be a problem with the connection. Now I am stuck where Tomcat log says: Creating a connection for entity . and does nothing, I mean after this log we usually get the getConnection() took x millisecond however I dont get that ,I can just see the time moving with no records getting fetched. Original Problem listed again: I am using Solr 1.4.1 on Windows/MS SQL Server and am using DIH for importing data. Indexing and all was working perfectly fine. However today when I started full indexing again, Solr halts/stucks at the line Creating a connection for entity. There are no further messages after that. I can see that DIH is busy and on the DIH console I can see A command is still running, I can also see total rows fetched = 0 and total request made to datasource = 1 and time is increasing however it is not doing anything. This is the exact configuration that worked for me. I am not really able to understand the problem here. 
Also in the index directory where I am storing the index there are just 3 files: 2 segment files + 1 lucene*-write.lock file.
...
data-config.xml:

    <dataSource type="JdbcDataSource"
                driver="com.microsoft.sqlserver.jdbc.SQLServerDriver"
                url="jdbc:sqlserver://127.0.0.1:1433;databaseName=SampleOrders"
                user="testUser" password="password"/>
    <document>
    .
    .

Logs:

    INFO: Server startup in 2016 ms
    Nov 23, 2011 4:11:27 PM org.apache.solr.handler.dataimport.DataImporter doFullImport
    INFO: Starting Full Import
    Nov 23, 2011 4:11:27 PM org.apache.solr.core.SolrCore execute
    INFO: [] webapp=/solr path=/dataimport params={command=full-import} status=0 QTime=11
    Nov 23, 2011 4:11:27 PM org.apache.solr.handler.dataimport.SolrWriter readIndexerProperties
    INFO: Read dataimport.properties
    Nov 23, 2011 4:11:27 PM org.apache.solr.update.DirectUpdateHandler2 deleteAll
    INFO: [] REMOVING ALL DOCUMENTS FROM INDEX
    Nov 23, 2011 4:11:27 PM org.apache.solr.core.SolrDeletionPolicy onInit
    INFO: SolrDeletionPolicy.onInit: commits:num=1
    commit{dir=C:\solrindexes\index,segFN=segments_6,version=1322041133719,generation=6,filenames=[segments_6]
    Nov 23, 2011 4:11:27 PM
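Putting steps 3 and 4 from the reply above together, the switched dataSource element would look roughly like the sketch below. The server address, database name and credentials are placeholders rather than values from the thread:

    <dataSource type="JdbcDataSource"
                driver="net.sourceforge.jtds.jdbc.Driver"
                url="jdbc:jtds:sqlserver://localhost:1433;databaseName=SampleOrders"
                user="testUser" password="password"/>

The rest of the document/entity definitions in data-config.xml stay unchanged; only the driver class and the URL scheme differ from the Microsoft driver configuration shown above.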
Re: Tidying files after optimize. Is a service restart mandatory?
On 11/28/2011 3:26 AM, Jones, Graham wrote:

Hello. Brief question: How can I clean up excess files after performing an optimize, without restarting the Tomcat service? Detail follows: I've been running several SOLR cores for approx 12 months and have recently noticed the disk usage of one of them is growing considerably faster than the rate at which documents are being added.
- 1,200,000 docs 12 months ago used a 45 GB index
- 1,700,000 docs today use an 87 GB index
- There may have been _some_ deletions, almost certainly < 100,000
- The documents are of a broadly uniform style, approx 1000 words
So, approximately 45% growth in documents has grown the disk usage by approx 100%. I took a server out of production (I've 1 master, 7 slaves) and did the following. I ran http://server/corename/update?stream.body=<optimize/> on this core, which added 49.4 GB to the index folder. No previously existing files were deleted. I restarted the Tomcat service. ONLY the files generated by the optimize remained; all older files were deleted. This is the result I want, but not quite the method I'd prefer. How can I get to this position without restarting the service?

Based on this description, it seems likely that you are running Solr on Windows. On Windows, if you have a file open for any reason (even just reading) it's not possible to delete that file. Solr keeps the old index files open to serve queries until the new index is fully committed and ready to take over, which can often be quite a while in software terms.

On Unix/Linux, deleting a file just removes the link to that file in the filesystem directory. When the last link is gone, the space is reclaimed. When a program opens a file, the OS creates an internal link to that file. If you delete the file while it's still open, it is still there, but only accessible via the internal link. This is what happens during an optimize - the files are removed from the directory, but part of Solr still has them open until the newly created index is completely online and all queries against the old one are complete. Once they are closed, the OS reclaims the space. I'm fairly sure that there is little communication between the processes that serve queries and the processes that update and merge the index. I've checked previous messages on this.

If you can arrange to run the optimize a second time before any documents are added or deleted, it will complete instantaneously and the extra files will be deleted. If the index is changed at all between the two optimizes, it won't really help, as you'll have a new set of old files that won't get deleted.

I am not in a position to test it, but it's possible that issuing a RELOAD command to the CoreAdmin might also take care of deleting the old files. I'm pretty sure that such an action is potentially disruptive, but in my experience the index is back online within a second or two, much, much faster than a full restart. http://wiki.apache.org/solr/CoreAdmin#RELOAD

This has been a known problem for quite a while, but I do not believe that it is a major priority for most Solr users. Most people I've seen posting to this list do not run on Windows. I found the following bug filed on Solr: https://issues.apache.org/jira/browse/SOLR-1691

Thanks, Shawn
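For reference, the RELOAD that Shawn mentions is just an HTTP call to the CoreAdmin handler. Assuming the default admin path and a core actually named "corename" (both placeholders here), the request would look something like:

    curl "http://server:8080/solr/admin/cores?action=RELOAD&core=corename"

Whether this releases the old files on Windows is, as Shawn says, untested; this only shows the shape of the request described on the wiki page he links.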
Re: turning off solr server verbosity
Thanks. Is this not then a Jetty setting? I'll search for that.

I don't use Jetty, but there is a logging section here: http://wiki.apache.org/solr/SolrJetty
PatternTokenizer failure
Hi all, I'm trying to use PatternTokenizer and not getting the expected results. Not sure where the failure lies. What I'm trying to do is split my input on whitespace except in cases where the whitespace is preceded by a hyphen character. To do this I'm using a negative lookbehind assertion in the pattern, e.g. (?<!-)\s+.

Expected behavior:
foo bar - [foo,bar] - OK
foo \n bar - [foo,bar] - OK
foo- bar - [foo- bar] - OK
foo-\nbar - [foo-\nbar] - OK
foo- \n bar - [foo- \n bar] - FAILS

Here's a test case that demonstrates the failure:

    public void testPattern() throws Exception {
      Map<String,String> args = new HashMap<String,String>();
      args.put( PatternTokenizerFactory.GROUP, "-1" );
      args.put( PatternTokenizerFactory.PATTERN, "(?<!-)\\s+" );
      Reader reader = new StringReader("blah \n foo bar- baz\nfoo-\nbar- baz foo- \n bar");
      PatternTokenizerFactory tokFactory = new PatternTokenizerFactory();
      tokFactory.init( args );
      TokenStream stream = tokFactory.create( reader );
      assertTokenStreamContents(stream,
          new String[] { "blah", "foo", "bar- baz", "foo-\nbar- baz", "foo- \n bar" });
    }

This fails with the following output:

    org.junit.ComparisonFailure: term 4 expected:<foo- [\n bar]> but was:<foo- []>

Am I doing something wrong? Incorrect expectations? Or could this be a bug? Thanks, --jay
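For completeness, wiring that same pattern into a schema.xml field type would look roughly like the sketch below (the type name is made up). Note that inside an XML attribute the < of the lookbehind has to be escaped as &lt;:

    <fieldType name="text_hyphen_aware" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <!-- split on whitespace unless it is immediately preceded by a hyphen -->
        <tokenizer class="solr.PatternTokenizerFactory" pattern="(?&lt;!-)\s+" group="-1"/>
      </analyzer>
    </fieldType>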
DirectSolrSpellChecker on request specified field.
Hi, can the DirectSolrSpellChecker be used for autosuggest but defer to request time the name of the field used to build the dictionary? That way I don't have to define spellcheckers specific to each field, which for me is not really possible as the fields I wish to spellcheck are dynamic fields. I could copy all dynamic fields into a single 'spellcheck' field, but then I could get false suggestions if I use it to get suggestions for a particular dynamic field and a returned term derives from a different field. Phil
Re: [newbie] solrj SolrQuery indent response
I'm not sure what you're really after here. Indent how? The indent parameter is there to make the reply readable; it really has nothing to do with printing the query. Could you show an example of what you want for output?

Best
Erick

On Mon, Nov 28, 2011 at 8:42 AM, halil halil.a...@gmail.com wrote:

I got one step further, but still no indent. I wrote the code segment below:

    query.setQuery( "marka_s:atak*" )
         .setFacet(true)
         .setParam("indent", "on");

and here is the resulting query string: q=marka_s%3Aatak*&facet=true&indent=on

-halil agin.

On Mon, Nov 28, 2011 at 3:07 PM, halil halil.a...@gmail.com wrote:

Hi List, I am new to the Solr and Lucene world. I have a simple question. I wrote the code segment below and it works.

    public class SolrjTest {
      public static void main(String[] args) throws MalformedURLException, SolrServerException {
        ClassPathXmlApplicationContext c =
            new ClassPathXmlApplicationContext("/patrades-search-solrj-test-beans.xml");
        SolrServer server = (SolrServer) c.getBean("solrServer");
        SolrQuery query = new SolrQuery();
        query.setQuery( "*:*" )
             .setFacet(true);
        QueryResponse rsp = server.query( query );
        System.err.println(rsp.toString());
      }
    }

I want to indent the response string, but couldn't find any answer. I looked at the book, the mail archive and Google. The most relevant link is below: http://wiki.apache.org/solr/SimpleFacetParameters but I don't know how to use it. The API is hard to read, by the way.

regards,
-Halil AĞIN
Re: how to apply fuzzy search with slop
Interestingly, Ahmet Arslan just answered a virtually identical question: "It is possible with this plugin. https://issues.apache.org/jira/browse/SOLR-1604"

Best
Erick

On Mon, Nov 28, 2011 at 9:09 AM, vrpar...@gmail.com vrpar...@gmail.com wrote:

Hello all, I want to search on a phrase with fuzzy, e.g. q=word1 word2~ and also want to apply slop to both words in the phrase, q=(word1 word2~)~2, but that doesn't work. How can I do this?

Thanks, Vishal Parekh

--
View this message in context: http://lucene.472066.n3.nabble.com/how-to-apply-fuzzy-search-with-slop-tp3542286p3542286.html
Sent from the Solr - User mailing list archive at Nabble.com.
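If that plugin (the ComplexPhraseQueryParser from SOLR-1604) is installed and registered in solrconfig.xml under a name such as complexphrase - an assumption, since the exact registration depends on how the patch is applied - then, as far as I understand the parser, the kind of query asked about would be written along these lines:

    q={!complexphrase}"word1~ word2~"~2

i.e. fuzzy terms inside the quoted phrase, with the phrase slop appended after the closing quote. Treat this as a sketch to verify against the JIRA issue rather than a tested example.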
Re: DirectSolrSpellChecker on request specified field.
technically it could? I'm just not sure if the current spellchecking apis allow for it? But maybe someone has a good idea on how to easily expose this. I think its a good idea. Care to open a JIRA issue? On Mon, Nov 28, 2011 at 1:31 PM, Phil Hoy p...@friendsreunited.co.uk wrote: Hi, Can the DirectSolrSpellChecker be used for autosuggest but defer to request time the name of the field to use to create the dictionary. That way I don't have to define spellcheckers specific to each field which for me is not really possible as the fields I wish to spell check are DynamicFields. I could copy all dynamic fields into a 'spellcheck' field but then I could get false suggestions if I use it to get suggestions for a particular dynamic field where a term returned derives from a different field. Phil -- lucidimagination.com
Re: Faceting is not Using Field Value Cache . . ?
: To Erick's Point: Can you be more specific than 'certain circumstances'?
:
: Can anyone provide an example of when fieldValueCache would be used?

Either FC or FVC is used most of the time -- which one is used depends on whether the field is multivalued or not, and on whether it's tokenized or not: i.e. max 1 term per doc == FC, else FVC. The "most of the time" depends on facet.method...

https://wiki.apache.org/solr/SimpleFacetParameters#facet.method

...if the enum method is used, then the filterCache is used. Yonik discussed a lot of these subtleties in his facet talk @ EuroCon...

http://www.lucidimagination.com/devzone/events/conferences/ApacheLuceneEurocon2011/many-facets-apache-solr

: Christopher
:
: On 2:59 PM, Erick Erickson wrote:
: In addition to Samuel's comment, the filterCache is also used under
: certain circumstances
:
: Best
: Erick
:
: 2011/11/22 Samuel García Martínez samuelgmarti...@gmail.com:
: AFAIK, FieldValueCache is only used for faceting on tokenized fields.
: Maybe you are getting confused with FieldCache (
: http://lucene.apache.org/java/3_0_2/api/all/org/apache/lucene/search/FieldCache.html)?
: That is used for common facets (facet.method=fc on non-tokenized fields).
: Does this make any sense for you?
:
: On Tue, Nov 22, 2011 at 7:21 PM, CRB sub.scripti...@metaheuristica.com wrote:
:
: Seeing something odd going on with faceting . . . we execute facets with
: every query and yet the fieldValueCache is not being used:
:
: name: fieldValueCache
: class: org.apache.solr.search.FastLRUCache
: version: 1.0
: description: Concurrent LRU Cache(maxSize=10000, initialSize=10,
: minSize=9000, acceptableSize=9500, cleanupThread=false)
: stats: lookups : 0
: hits : 0
: hitratio : 0.00
: inserts : 0
: evictions : 0
: size : 0
: warmupTime : 0
: cumulative_lookups : 0
: cumulative_hits : 0
: cumulative_hitratio : 0.00
: cumulative_inserts : 0
: cumulative_evictions : 0
:
: I was under the impression the fieldValueCache was an implicit cache (if
: you don't define it, it will still exist).
:
: We are running Solr v3.3 (and NOT using {!cache=false}).
:
: Thoughts?
:
: --
: Un saludo,
: Samuel García.

-Hoss
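To make the distinction above concrete, the method can be forced per request (or per field with the f.<fieldname>. prefix); the field name here is only an example:

    # term counts come from the un-inverted field (FieldCache / fieldValueCache)
    .../select?q=*:*&facet=true&facet.field=category&facet.method=fc

    # term counts come from one filterCache entry per term
    .../select?q=*:*&facet=true&facet.field=category&facet.method=enum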
RE: DirectSolrSpellChecker on request specified field.
Added issue: https://issues.apache.org/jira/browse/SOLR-2926 Please let me know if more information needs adding to the JIRA issue.

Phil

-Original Message-
From: Robert Muir [mailto:rcm...@gmail.com]
Sent: 28 November 2011 19:32
To: solr-user@lucene.apache.org
Subject: Re: DirectSolrSpellChecker on request specified field.

technically it could? I'm just not sure if the current spellchecking apis allow for it? But maybe someone has a good idea on how to easily expose this. I think its a good idea. Care to open a JIRA issue?

On Mon, Nov 28, 2011 at 1:31 PM, Phil Hoy p...@friendsreunited.co.uk wrote: Hi, Can the DirectSolrSpellChecker be used for autosuggest but defer to request time the name of the field to use to create the dictionary. That way I don't have to define spellcheckers specific to each field which for me is not really possible as the fields I wish to spell check are DynamicFields. I could copy all dynamic fields into a 'spellcheck' field but then I could get false suggestions if I use it to get suggestions for a particular dynamic field where a term returned derives from a different field. Phil

--
lucidimagination.com
Re: DirectSolrSpellChecker on request specified field.
On Mon, Nov 28, 2011 at 4:36 PM, Phil Hoy p...@friendsreunited.co.uk wrote: Added issue: https://issues.apache.org/jira/browse/SOLR-2926 Please let me know if more information needs adding to JIRA. Phil Thanks, I'll followup on the issue -- lucidimagination.com
Re: Huge Performance: Solr distributed search
Problem has been resolved. My disk subsystem had been the bottleneck for quick searches. I put my indexes in RAM and I now see very nice QTimes :) Sorry for your time, guys.

On Mon, Nov 28, 2011 at 4:02 PM, Artem Lokotosh arco...@gmail.com wrote:

Hi all again. Thanks to all for your replies. This weekend I made some interesting tests, and I would like to share them with you. First of all I made a speed test of my hdd:

    root@LSolr:~# hdparm -t /dev/sda9
    /dev/sda9:
    Timing buffered disk reads: 146 MB in 3.01 seconds = 48.54 MB/sec

Then with iperf I tested my network:

    [ 4] 0.0-18.7 sec 2.00 GBytes 917 Mbits/sec

Then I tried to post my queries using the shards parameter with one shard, so my queries were like:

    http://localhost:8080/solr1/select/?q=(test)&qt=requestShards

where requestShards is:

    <requestHandler name="requestShards" class="solr.SearchHandler" default="false">
      <lst name="defaults">
        <str name="echoParams">explicit</str>
        <int name="rows">10</int>
        <str name="shards">127.0.0.1:8080/solr1</str>
      </lst>
    </requestHandler>

Maybe it's not correct, but:

    INFO: [] webapp=/solr1 path=/select/ params={fl=*,score&ident=true&start=0&q=(genuflections)&qt=requestShards&rows=2000} status=0 QTime=6525
    INFO: [] webapp=/solr1 path=/select/ params={fl=*,score&ident=true&start=0&q=(tunefulness)&qt=requestShards&rows=2000} status=0 QTime=20170
    INFO: [] webapp=/solr1 path=/select/ params={fl=*,score&ident=true&start=0&q=(societal)&qt=requestShards&rows=2000} status=0 QTime=44958
    INFO: [] webapp=/solr1 path=/select/ params={fl=*,score&ident=true&start=0&q=(euchre's)&qt=requestShards&rows=2000} status=0 QTime=32161
    INFO: [] webapp=/solr1 path=/select/ params={fl=*,score&ident=true&start=0&q=(monogram's)&qt=requestShards&rows=2000} status=0 QTime=85252

When I posted similar queries directly to solr1 without requestShards I had:

    INFO: [] webapp=/solr1 path=/select/ params={fl=*,score&ident=true&start=0&q=(reopening)&rows=2000} hits=712 status=0 QTime=10
    INFO: [] webapp=/solr1 path=/select/ params={fl=*,score&ident=true&start=0&q=(housemothers)&rows=2000} hits=0 status=0 QTime=446
    INFO: [] webapp=/solr1 path=/select/ params={fl=*,score&ident=true&start=0&q=(harpooners)&rows=2000} hits=76 status=0 QTime=399
    INFO: [] webapp=/solr1 path=/select/ params={fl=*,score&ident=true&start=0&q=(coaxing)&rows=2000} hits=562 status=0 QTime=2820
    INFO: [] webapp=/solr1 path=/select/ params={fl=*,score&ident=true&start=0&q=(superstar's)&rows=2000} hits=4748 status=0 QTime=672
    INFO: [] webapp=/solr1 path=/select/ params={fl=*,score&ident=true&start=0&q=(sedateness's)&rows=2000} hits=136 status=0 QTime=923
    INFO: [] webapp=/solr1 path=/select/ params={fl=*,score&ident=true&start=0&q=(petrolatum)&rows=2000} hits=8 status=0 QTime=6183
    INFO: [] webapp=/solr1 path=/select/ params={fl=*,score&ident=true&start=0&q=(everlasting's)&rows=2000} hits=1522 status=0 QTime=2625

And finally I found a bug: https://issues.apache.org/jira/browse/SOLR-1524 Why is there no activity on it? Is it no longer relevant?
Today I wrote a bash script:

    #!/bin/bash
    ds=$(date +%s.%N)
    echo "START: $ds" >> ./data/east_2000
    curl "http://127.0.0.1:8080/solr1/select/?fl=*,score&ident=true&start=0&q=(east)&rows=2000" -s -H 'Content-type:text/xml; charset=utf-8' >> ./data/east_2000
    de=$(date +%s.%N)
    ddf=$(echo "$de - $ds" | bc)
    echo "END: $de" >> ./data/east_2000
    echo "DIFF: $ddf" >> ./data/east_2000

Before starting Tomcat I dropped the OS cache:

    root@LSolr:~# echo 3 > /proc/sys/vm/drop_caches

Then I started Tomcat and ran the script. The result is below:

    START: 1322476131.783146691
    <?xml version="1.0" encoding="UTF-8"?>
    <response>
    <lst name="responseHeader"><int name="status">0</int><int name="QTime">125</int><lst name="params"><str name="fl">*,score</str><str name="ident">true</str><str name="start">0</str><str name="q">(east)</str><str name="rows">2000</str></lst></lst><result name="response" numFound="21439" start="0" maxScore="4.387605"> ...
    </response>
    END: 1322476180.262770244
    DIFF: 48.479623553

The file size is:

    root@LSolr:~# ls -l | grep east
    -rw-r--r-- 1 root root 1063579 Nov 28 12:29 east_2000

I'm using nmon to monitor HDD activity. It was near 100% when I ran the script. But when I tried to run it again, the result was:

    DIFF: .063678709

and not much HDD activity in nmon. I can't understand one thing: is this my hardware, i.e. a slow HDD, or is it a Solr problem? And why has there been no activity on bug https://issues.apache.org/jira/browse/SOLR-1524 since 27/Oct/09 07:19?

On 11/25/2011 10:02 AM, Dmitry Kan wrote: 45 000 000 per shard approx, Tomcat, caching was tweaked in solrconfig and shard given 12GB of RAM max. <!-- Filter Cache Cache used by SolrIndexSearcher for filters (DocSets),
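For anyone who wants to reproduce the "indexes in RAM" step from the resolution above, one common way on Linux is a tmpfs mount. The mount point, size and paths below are illustrative only, and because tmpfs is volatile the copy has to be redone after every reboot:

    mkdir -p /mnt/solr-ram
    mount -t tmpfs -o size=40g tmpfs /mnt/solr-ram
    cp -a /path/to/solr/data/index /mnt/solr-ram/
    # then point <dataDir> in solrconfig.xml at /mnt/solr-ram and restart or reload the core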
Re: help no segment in my lucene index!!!
On Mon, Nov 28, 2011 at 10:49 AM, Roberto Iannone iann...@crmpa.unisa.it wrote:

Hi Michael, thx for your help :)

You're welcome!

2011/11/28 Michael McCandless luc...@mikemccandless.com
Which version of Solr/Lucene were you using when you hit the power loss?

I'm using Lucene 3.4.

Hmm, which OS/filesystem? Unexpected power loss (or OS crash, or JVM crash) in 3.4.0 should not cause corruption, as long as the IO system properly implements fsync. There was a known bug that could allow power loss to cause corruption, but this was fixed in Lucene 3.4.0.

Unfortunately, there is no easy way to recreate the segments_N file... in principle it should be possible, and maybe not too much work, but nobody has created such a tool yet, that I know of.

some hints about how could I write this code by myself ?

Well, you'd need to take a listing of all files, aggregate those into unique segment names, open a SegmentReader on each segment name, and from that SegmentReader reconstruct what you can (numDocs, delCount, isCompoundFile, etc.) about each SegmentInfo. Add all the resulting SegmentInfo instances into a new SegmentInfos and write it to the directory.

Was the index newly created in 3.4.x? If not (if you inherited segments from earlier Lucene versions) you might also have to reconstruct shared doc stores (stored fields, term vectors) files, which will be trickier...

Mike
Re: conditionally update document on unique id
I wanted something similar for a file crawler/uploader in C#, but I don't even want to upload the document if it already exists... I'm currently querying Solr first... Is this optimal, silly, or otherwise?

    var url = "http://solr/select?q=myid.doc&rows=0";
    var txt = webclient.DownloadString(url);
    if (txt.Contains("numFound=\"0\""))
    {
        //upload the file
    }

--
View this message in context: http://lucene.472066.n3.nabble.com/conditionally-update-document-on-unique-id-tp3119302p3543866.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: conditionally update document on unique id
oops... the query looks more like this:

    http://solr/select?q=id:myid.doc&rows=0

--
View this message in context: http://lucene.472066.n3.nabble.com/conditionally-update-document-on-unique-id-tp3119302p3543871.html
Sent from the Solr - User mailing list archive at Nabble.com.
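For anyone doing the same existence check from SolrJ instead of C#, a minimal sketch of the same idea - query on the unique key with rows=0 and test numFound - might look like the following. The server URL and the id value are placeholders, and this is an illustration rather than a recommendation over the C# approach:

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
    import org.apache.solr.client.solrj.response.QueryResponse;

    public class ExistsCheck {
        public static void main(String[] args) throws Exception {
            // Placeholder URL -- adjust to your Solr instance.
            SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");

            SolrQuery q = new SolrQuery("id:\"myid.doc\"");
            q.setRows(0); // we only need the count, not the stored documents

            QueryResponse rsp = server.query(q);
            if (rsp.getResults().getNumFound() == 0) {
                // document is not in the index yet -- upload it here
            }
        }
    }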
Re: Index a null text field
: I am indexing a table that has a field by the name of solr_keywords of type
: text in mysql. And it contains null values also. While creating the index in
: solr, this field is not getting indexed.

What exactly is the problem you are seeing?

If your documents are being indexed without error, but documents with a null in the solr_keywords database field are not getting any (stored or indexed) values in the resulting Solr index, then it sounds like everything is working properly. There is no concept of a "null" value in a Solr index. Documents either have a field value or they do not -- if you want to index the string "null" (or any other special string for that matter) when a document has no value for a field, then there are a few different ways to do that. The simplest in your case would probably be adding a "default" property on the field in your schema, or using something like the COALESCE function in your SQL.

-Hoss
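As a sketch of those two options side by side (the field name comes from the question; the table name and the choice of the literal string "null" are only illustrations):

    <!-- schema.xml: store the literal string "null" whenever a document supplies no value -->
    <field name="solr_keywords" type="text" indexed="true" stored="true" default="null"/>

    <!-- data-config.xml: or do the substitution in SQL instead -->
    <entity name="data" query="SELECT id, COALESCE(solr_keywords, 'null') AS solr_keywords FROM my_table"/>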
Incomplete logging on local machine
I'm stumped. For some reason on my local set up, Solr is not logging all that it should. None of the searches, updates, errors are logged at all. I just did a fresh install of Tomcat 7, Solr 3.5 and it's all the same. No logging. The *only* thing I change to the default configuration is the location of my Solr Home in web.xml When I go into solr/admin/logging, I can see the problem. Only three solr logging options are available. org.apache.solr org.apache.solr.servlet org.apache.solr.servlet.LogLevelSelection That's all. And they're all set to INFO. If I compare that to the production server, I can see there that there's several dozen other Solr logging categories. Why would these not be available to me on my local machine? I've combed the internet for anything, and no-one else seems to have this issue :( I'm running on a Mac (10.5.8), Tomcat 7, Solr 3.5 -- View this message in context: http://lucene.472066.n3.nabble.com/Incomplete-logging-on-local-machine-tp3543960p3543960.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: UpdateRequestProcessor - processCommit
: I'm assuming the processCommit method is called for each
: UpdateRequestProcessor chain class when the records are being committed to
: the Lucene index.

Not exactly. RequestHandlers that want to modify the index do so by asking the SolrCore for a processor chain (either by name, or just getting the default), and then they execute methods on that chain, passing in instances of UpdateCommand objects that model the type of index update they want to perform. The first element in the chain decides if/when to pass the UpdateCommand on to subsequent members of the chain, and most processor chains include a RunUpdateProcessorFactory instance, which is responsible for actually performing the update on the UpdateHandler used by the SolrCore.

Which means:

1) there is no guarantee that processCommit is called on every UpdateRequestProcessor in the chain -- that's entirely dependent on what comes before RunUpdateProcessorFactory in the chain.

2) RunUpdateProcessorFactory itself is what tells the underlying IndexWriter to commit (ie: processCommit is not a callback method invoked after the underlying commit happens -- you may be confusing UpdateRequestProcessor with the SolrEventListener API)

: I'm debugging the processor chain using the debug functionality in the
: dataimport.jsp page, and I have selected verbose and commit as options.
: When I import 10 records,
: the processAdd methods are getting called, but the processCommit methods
: aren't.
...
: I'm using SOLR 1.4

Hmmm... I can confirm the behavior you describe in Solr 1.4.1, but using Solr 3.5.0 I can see that the processCommit method is definitely getting called by DIH when using the "Debug Now" button of the DIH console with the commit checkbox checked (FWIW: I tested using the 'rss' core of example-DIH/solr and watching the logs for the log messages from LogUpdateProcessorFactory)

So please consider upgrading -- besides this evident fix, there have been a *TON* of other bug fixes and other improvements between Solr 1.4 and Solr 3.5.

-Hoss
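For readers who have not set one up, a chain along these lines in solrconfig.xml shows where RunUpdateProcessorFactory sits; the chain name is made up, and any custom processors would go before the log/run pair:

    <updateRequestProcessorChain name="mychain" default="true">
      <!-- custom UpdateRequestProcessorFactory instances would go here -->
      <processor class="solr.LogUpdateProcessorFactory"/>
      <processor class="solr.RunUpdateProcessorFactory"/>
    </updateRequestProcessorChain>

Whether processCommit reaches a given processor depends, as described above, on everything earlier in the chain passing the command along.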
Re: solrQueryParser defaultOperator
: are you using either dismax or edismax? They don't respect
: the defaultOperator. Use the mm param to get this kind
: of behavior.

FWIW: that has not been true since Solr 3.1 ... mm's default value is now based on q.op (which gets its default from defaultOperator in the schema.xml)

But Erick's point is still valid: we need all the details of the request you are executing, and what the request handler config looks like, and what the debugQuery output for that request looks like, etc. before we can make a guess as to why you are getting the results you are getting.

-Hoss
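For anyone following the mm route Erick suggests, a request handler carrying it as a default might look like this sketch; the handler name and qf fields are placeholders:

    <requestHandler name="/search" class="solr.SearchHandler">
      <lst name="defaults">
        <str name="defType">edismax</str>
        <str name="qf">title text</str>
        <!-- require all query terms to match, i.e. AND-like behaviour -->
        <str name="mm">100%</str>
      </lst>
    </requestHandler>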
Re: make fuzzy search for phrase
This seems to be the solution to my problem... I'll definitely try this. Thanks for your reply.

Meghana

--
View this message in context: http://lucene.472066.n3.nabble.com/make-fuzzy-search-for-phrase-tp3542079p3544239.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: how index words with their perfix in solr?
Thank you for your answer. I read it, and I use this filter in my schema.xml in Solr:

    <filter class="solr.PorterStemFilterFactory"/>

but this filter doesn't handle all words with their suffixes and prefixes. This means that when I search for 'rain', Solr doesn't show me any document that has 'rainy'.

--
View this message in context: http://lucene.472066.n3.nabble.com/how-index-words-with-their-perfix-in-solr-tp3542300p3544319.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Returning and faceting on some of the field's values
Well, here's something that might just work. Using the Solr 3.4+ facet.prefix parameter, as well as prefixing the values of the particular field I want to facet on with the node neighbor ID, I get what I need. Adding the field:

    <field name="n_directionalityFacet" type="string" indexed="true" stored="false" multiValued="true" omitNorms="true"/>

Then, for each value, I prefix it with "{nodeId}-". For example, using the focus node ID of ING:afa, I can get as a result document set all of the neighbors of that node ID. Then, I also tell Solr to facet using that same focus node ID prefix:

    http://localhost:8091/solr/ing-content/select/?qt=partner-tmo&fq=type%3Anode&fq=n_neighborof_id%3AING\:afa&rows=0&facet=true&facet.mincount=1&facet.field=n_directionalityFacet&f.n_directionalityFacet.facet.prefix=ING%3Aafa

And, for that particular facet, I get only the values and counts relevant to the focus node ID:

    <lst name="facet_fields">
      <lst name="n_directionalityFacet">
        <int name="ING:afa-D">82</int>
        <int name="ING:afa-B">2</int>
        <int name="ING:afa-A">1</int>
        <int name="ING:afa-U">1</int>
      </lst>
    </lst>

My app can then take this response and remove the prefix before returning the values and counts to the client. It may inflate the size of the index some, but it sure beats my alternative proposals...

Cheers, Jeff

On Nov 26, 2011, at 1:22 PM, Jeff Schmidt wrote:

Hello: I'm still not finding much joy with this issue. For one, it looks like FacetComponent (via SimpleFacets.getFieldCacheCounts()) goes directly to the Lucene FieldCache (non-enum, multi-valued field, single string token) in order to get terms to count. So, even if it were possible for me to somehow modify the ResponseBuilder in between the QueryComponent and FacetComponent, that won't do much good. I'd rather not modify Solr/Lucene code and have a custom build (though that's not impossible in the short term), but QueryComponent does not provide sufficient access. I suppose I could further investigate going the RequestHandler route. But, let me know if this is crazy talk:

From what I can tell in org.apache.solr.request.SimpleFacets, line 366 (sorry, no SCM info in the source file, but it is from the 3.4.0 source distribution):

    FieldCache.StringIndex si = FieldCache.DEFAULT.getStringIndex(searcher.getReader(), fieldName);
    final String[] terms = si.lookup;
    final int[] termNum = si.order;

SimpleFacets.getFieldCacheCounts() uses the response from the Lucene FieldCache to do its work. My thought is to use AspectJ to place "after" advice on the Lucene method (org.apache.lucene.search.FieldCacheImpl), to modify the response. I don't want to muck with the field cache itself. After all, the field values I don't want to count for this focusNodeId I may well want for another. Given the FieldCacheImpl method:

    // inherit javadocs
    public StringIndex getStringIndex(IndexReader reader, String field) throws IOException {
      return (StringIndex) caches.get(StringIndex.class).get(reader, new Entry(field, (Parser)null));
    }

it seems I could take the returned StringIndex instance and create a new filtered one, leaving the cached original intact. StringIndex (defined in FieldCache) is a public static class with a public constructor. Then SimpleFacets will facet on what I provided it. The other trick is to inform my aspect within Lucene just what the focusNodeId is, so it knows how to filter. This is request specific. I'm running Solr within Tomcat. I've not looked exhaustively into how Solr threading works.
But, if the current app server request thread is used synchronously to satisfy any given SolrJ request, then I could provide a SearchComponent that looked for some special parameter that indicates the focusNodeId of interest, and then place it in a ThreadLocal which the interceptor could pick up. If the ThreadLocal is not defined, then the interceptor does not filter (a definite scenario) and returns Lucene's StringIndex instance. If there is another thread involved in handling the request, then more investigation is needed. Any inside information would be appreciated. Or, firmly stated I should not go there would also be appreciated. :) Cheers, Jeff On Nov 21, 2011, at 4:31 PM, Jeff Schmidt wrote: Hello: Solr version: 3.4.0 I'm trying to figure out if it's possible to both return (retrieval) as well as facet on certain values of a multivalued field. The scenario is a life science app comprised of a graph of nodes (genes, chemicals etc.) and each node has a neighborhood consisting of one or more nodes with which it has a relationships defined as processes (inhibition, phosphorylation etc.). What I've done is add a number of multi-valued fields to each node consisting of the neighbor node ID (neighbor's document ID), process, and couple of other related items. For a given node, it'll have multiple neighbors, as well as multiple
Seek past EOF
Hi all

After upgrading to Solr 3.4 we are having trouble with replication. The setup is one indexing master with a few slaves that replicate the indexes once every night. The largest index is 20 GB and the master and slaves are in the same DMZ. Almost every night one of the indexes (17 in total) fails after the replication with a "seek past EOF" error.

    SEVERE: Error during auto-warming of key:org.apache.solr.search.QueryResultKey@bda006e3:java.io.IOException: seek past EOF
    at org.apache.lucene.store.MMapDirectory$MMapIndexInput.seek(MMapDirectory.java:347)
    at org.apache.lucene.index.SegmentTermEnum.seek(SegmentTermEnum.java:114)
    at org.apache.lucene.index.TermInfosReader.seekEnum(TermInfosReader.java:203)
    at org.apache.lucene.index.TermInfosReader.get(TermInfosReader.java:273)
    at org.apache.lucene.index.TermInfosReader.get(TermInfosReader.java:210)
    at org.apache.lucene.index.SegmentReader.docFreq(SegmentReader.java:507)
    at org.apache.solr.search.SolrIndexReader.docFreq(SolrIndexReader.java:309)
    at org.apache.lucene.search.TermQuery$TermWeight$1.add(TermQuery.java:56)
    at org.apache.lucene.util.ReaderUtil$Gather.run(ReaderUtil.java:77)
    at org.apache.lucene.util.ReaderUtil$Gather.run(ReaderUtil.java:82)

After a restart the errors are gone. Has anyone else seen this?

Thanks
Ruben Chadien
Re: [newbie] solrj SolrQuery indent response
Yes, you are right. I was trying to indent the Solr JSON response. Actually, the Solr JSON response is not exactly JSON; I couldn't understand the output format. But I found a solution to the problem, i.e. producing JSON and indenting the Solr result set. Here is the code segment:

    public class SolrjTest {
      public static void main(String[] args) throws Exception {
        ClassPathXmlApplicationContext c =
            new ClassPathXmlApplicationContext("/patrades-search-solrj-test-beans.xml");
        SolrServer server = (SolrServer) c.getBean("solrServer");
        SolrQuery query = new SolrQuery();
        query.setQuery( "marka_s:atak*" );
        System.err.println(query.toString());
        QueryResponse rsp = server.query( query );
        List<PatradesSolrBean> beans = rsp.getBeans(PatradesSolrBean.class);
        ObjectMapper om = new ObjectMapper();
        String s = om.defaultPrettyPrintingWriter().writeValueAsString(beans);
        System.err.println(s);
      }
    }

On Mon, Nov 28, 2011 at 9:10 PM, Erick Erickson erickerick...@gmail.com wrote: I'm not sure what you're really after here. Indent how? The indent parameter is to make the reply readable, it really has nothing to do with printing the query. Could you show an example of what you want for output? Best Erick

On Mon, Nov 28, 2011 at 8:42 AM, halil halil.a...@gmail.com wrote: I got one step further, but still no indent. I wrote this code segment: query.setQuery( "marka_s:atak*" ).setFacet(true).setParam("indent", "on"); and here is the resulting query string: q=marka_s%3Aatak*&facet=true&indent=on -halil agin.

On Mon, Nov 28, 2011 at 3:07 PM, halil halil.a...@gmail.com wrote: Hi List, I am new to the Solr and Lucene world. I have a simple question. I wrote the code segment below and it works. public class SolrjTest { public static void main(String[] args) throws MalformedURLException, SolrServerException { ClassPathXmlApplicationContext c = new ClassPathXmlApplicationContext("/patrades-search-solrj-test-beans.xml"); SolrServer server = (SolrServer) c.getBean("solrServer"); SolrQuery query = new SolrQuery(); query.setQuery( "*:*" ).setFacet(true); QueryResponse rsp = server.query( query ); System.err.println(rsp.toString()); } } I want to indent the response string, but couldn't find any answer. I looked at the book, the mail archive and Google. The most relevant link is below: http://wiki.apache.org/solr/SimpleFacetParameters but I don't know how to use it. The API is hard to read, by the way. regards, -Halil AĞIN