Re: logical relation among filter queries

2011-03-08 Thread cyang2010
Right, I can combine that into one fq query.

The only thing is that I want to reduce the cache size.

I remember this is what I read from the wiki.

fq=rating:R         (filter query cache A)
fq=rating:PG-13  (filter query cache B)
fq=rating:(R OR PG-13)  --  (It won't be able to leverage the filter query
cache A and B above, instead it will create another whole new filter query
cache C)

fq=rating:R&fq=rating:PG-13  -- (Will be able to leverage filter query cache
A and B)


I will have a lot of queries with different combinations of values from the
same field, rating.   Therefore, I thought that if the logical relation among
filter queries were OR, it would keep the number of distinct cache entries
down to the number of distinct rating values.


Does it matter?

--
View this message in context: 
http://lucene.472066.n3.nabble.com/logical-relation-among-filter-queries-tp2649142p2649904.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: StreamingUpdateSolrServer

2011-03-08 Thread Lance Norskog
Yes. Each thread uses its own connection, and each becomes a new
thread in the servlet container.
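
(For reference, a minimal SolrJ sketch of the setup being discussed; the URL
is a placeholder, and the queue/thread settings match the question below --
this is an illustration, not code from the thread:)

    import org.apache.solr.client.solrj.impl.StreamingUpdateSolrServer;
    import org.apache.solr.common.SolrInputDocument;

    public class StreamingUpdateDemo {
        public static void main(String[] args) throws Exception {
            // queueSize=5, threadCount=4, as in the question; each of the
            // 4 background threads opens its own HTTP connection
            StreamingUpdateSolrServer server =
                new StreamingUpdateSolrServer("http://localhost:8983/solr", 5, 4);
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", "1");
            server.add(doc);   // queued; sent by a background thread
            server.commit();   // drains the queue before committing
        }
    }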

On Mon, Mar 7, 2011 at 2:54 AM, Isan Fulia isan.fu...@germinait.com wrote:
 Hi all,
 I am using StreamingUpdateSolrServer with queueSize = 5 and threadCount = 4.
 The number of connections created is the same as threadCount.
 Is it that it creates a new connection for every thread?


 --
 Thanks & Regards,
 Isan Fulia.




-- 
Lance Norskog
goks...@gmail.com


Synonyms question

2011-03-08 Thread Darx Oman
Hi guys

How to put this in synonyms.txt

US

USA

United States of America


Re: Synonyms question

2011-03-08 Thread Jan Høydahl
http://lmgtfy.com/?q=solr+synonym

(First hit gives many examples)
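
(For the archive, a minimal sketch of what the synonyms.txt entry could look
like, assuming the default SynonymFilterFactory with expand="true"; note that
multi-word synonyms like this are safest applied at index time:)

    US, USA, United States of America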

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

On 8. mars 2011, at 10.06, Darx Oman wrote:

 Hi guys
 
 How to put this in synonyms.txt
 
 US
 
 USA
 
 United States of America



How to Index and query URLs as fields

2011-03-08 Thread Robert Krüger
Hi,

I've run into problems trying to achieve a seemingly simple thing. I'm indexing 
a bunch of files (local ones and potentially some accessible via other 
protocols like http or ftp) and have an index field with the url to the file, 
e.g. file:/home/foo/bar.pdf. Now I want to perform two simple types of 
queries on this, i.e. retrieve all file records located under a certain path 
(e.g. file:/home/foo/*) or find the file record for an exact URL.

What I naively tried was to index the file URL in a field (fileURL) of type 
string and simply perform queries like

fileURL:file\:/home/foo/*

and 

fileURL:file\:/home/foo/bar.pdf

and neither one returned results.

the type is defined as

<fieldType name="string" class="solr.StrField" sortMissingLast="true"
omitNorms="true"/>

and the field as

<field name="fileURL" type="string" indexed="true" stored="true"
multiValued="false" />

I am using Solr 1.4.1 and SolrJ to do the indexing and querying.

This seems like a rather basic requirement and obviously I am doing something 
wrong. I didn't find anything in the docs or the mailing list archive so far.

Any help, hints, pointers would be appreciated.

Robert
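
(For the archive, a SolrJ sketch of the two query types described above; the
server URL is a placeholder and, as the follow-up in this thread notes, the
approach itself works once the escaping is right:)

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
    import org.apache.solr.client.solrj.response.QueryResponse;

    public class FileUrlQueryDemo {
        public static void main(String[] args) throws Exception {
            CommonsHttpSolrServer server =
                new CommonsHttpSolrServer("http://localhost:8983/solr");

            // exact match on the string field; "\\:" in the Java source
            // puts an escaped colon into the query
            QueryResponse exact = server.query(
                new SolrQuery("fileURL:file\\:/home/foo/bar.pdf"));

            // prefix (wildcard) match for everything under a path
            QueryResponse underPath = server.query(
                new SolrQuery("fileURL:file\\:/home/foo/*"));

            System.out.println(exact.getResults().getNumFound());
            System.out.println(underPath.getResults().getNumFound());
        }
    }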



Re: Use of multiple tomcat instance and shards.

2011-03-08 Thread Tommaso Teofili
Hi,
from my experience, when you have to scale in the number of documents it's a
good idea to use shards (so one schema and N shards, each containing
(1/N)*total#docs), while if the requirement is sustaining a high query volume
you could get a significant boost from replicating the same index on 2 or
more machines and doing load balancing across those machines (consider that
in most cases a round-robin LB works pretty well).
So I think you should look at the replication wiki page [1].
To check your Tomcat installation the related wiki page may also be useful
[2].
My 2 cents,
Tommaso


[1] : http://wiki.apache.org/solr/SolrReplication
[2] : http://wiki.apache.org/solr/SolrTomcat



2011/3/8 rajini maski rajinima...@gmail.com

  In order to increase the Java heap memory: I have only 2GB RAM, so
 my default memory configuration is --JvmMs 128 --JvmMx 512.  I have a
 single Solr data index of up to 6GB. Now when I fire searches very
 often against this index, after some time I get a "java heap space
 out of memory" error and the search does not return results. What are the
 possibilities to fix this error? (I cannot increase the heap memory.) How
 about having another Tomcat instance running (how would this work?), or is
 it done by configuring shards? What might help me fix this search failure?


 Rajani



Re: Use of multiple tomcat instance and shards.

2011-03-08 Thread Jan Høydahl
Having 2GB physical memory on the box, I would allocate -Xmx1024m to Java as a 
starting point.

The other thing you could do is try to trim your config to use less memory. Are 
you using many facets? String sorts? Wildcards? Fuzzy? Storing or returning 
more fields than needed?

http://wiki.apache.org/solr/SolrPerformanceFactors#RAM_Usage_Considerations

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

On 8. mars 2011, at 07.40, rajini maski wrote:

  In order to increase the Java heap memory: I have only 2GB RAM, so
 my default memory configuration is --JvmMs 128 --JvmMx 512.  I have a
 single Solr data index of up to 6GB. Now when I fire searches very
 often against this index, after some time I get a "java heap space
 out of memory" error and the search does not return results. What are the
 possibilities to fix this error? (I cannot increase the heap memory.) How
 about having another Tomcat instance running (how would this work?), or is
 it done by configuring shards? What might help me fix this search failure?
 
 
 Rajani



Difference between Faceting & Fieldcollapsing

2011-03-08 Thread Isha Garg

Hi,

 Can anyone explain in which scenarios faceting & field collapsing are
used, and what is the difference between these two?



Best Regards!
Isha


Re: How to Index and query URLs as fields

2011-03-08 Thread Robert Krüger

My mistake. The error turned out to be somewhere else and the described 
approach seems to work.

Sorry for the wasted bandwidth.


On Mar 8, 2011, at 11:06 AM, Robert Krüger wrote:

 Hi,
 
 I've run into problems trying to achieve a seemingly simple thing. I'm 
 indexing a bunch of files (local ones and potentially some accessible via 
 other protocols like http or ftp) and have an index field with the url to the 
 file, e.g. file:/home/foo/bar.pdf. Now I want to perform two simple types 
 of queries on this, i.e. retrieve all file records located under a certain 
 path (e.g. file:/home/foo/*) or find the file record for an exact URL.
 
 What I naively tried was to index the file URL in a field (fileURL) of type 
 string and simply perform queries like
 
 fileURL:file\:/home/foo/*
 
 and 
 
 fileURL:file\:/home/foo/bar.pdf
 
 and neither one returned results.
 
 the type is defined as
 
 <fieldType name="string" class="solr.StrField" sortMissingLast="true"
 omitNorms="true"/>
 
 and the field as
 
 <field name="fileURL" type="string" indexed="true" stored="true"
 multiValued="false" />
 
 I am using Solr 1.4.1 and SolrJ to do the indexing and querying.
 
 This seems like a rather basic requirement and obviously I am doing something 
 wrong. I didn't find anything in the docs or the mailing list archive so far.
 
 Any help, hints, pointers would be appreciated.
 
 Robert
 



Re: Difference between Faceting & Fieldcollapsing

2011-03-08 Thread Jan Høydahl
Faceting is returned independently of your result set, telling you how many 
documents contain each facet value.

Field collapsing / grouping modifies your result set to roll up multiple hits 
sharing the same collapse key, much like Google does to hide further results 
from the same site.

You may use a field both for faceting and collapsing, but for different reasons.
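
(To make that concrete, a sketch of the two request styles; the host and the
"site" field are stand-ins, and the group parameters come from the
field-collapsing work, so they are only available in trunk/patched builds at
the time of this thread:)

    # facet counts returned alongside the full result set
    http://localhost:8983/solr/select?q=camera&facet=true&facet.field=site

    # one representative hit per site within the result set itself
    http://localhost:8983/solr/select?q=camera&group=true&group.field=site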

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

On 8. mars 2011, at 12.50, Isha Garg wrote:

 Hi,
 
 Can anyone explain in which scenarios faceting & field collapsing are used,
 and what is the difference between these two?
 
 
 Best Regards!
 Isha



Re: logical relation among filter queries

2011-03-08 Thread Erick Erickson
The filter queries are interpreted as an intersection. That is, each
fq clause is intersected with the result set. There's no way I know
of to combine separate filter queries with an OR operator.
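
(A sketch of the semantics, using the field and values from the quoted mail:)

    fq=rating:R&fq=rating:PG-13   -- two cache entries, but intersected:
                                     only docs rated both R and PG-13 match
    fq=rating:(R OR PG-13)        -- one cache entry holding the union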

Best
Erick

On Tue, Mar 8, 2011 at 2:59 AM, cyang2010 ysxsu...@hotmail.com wrote:
 Right, i can combine that into one fq query.

 The only thing is that i want to reduce the cache size.

 I remember this is what i read from wiki.

 fq=rating:R         (filter query cache A)
 fq=rating:PG-13  (filter query cache B)
 fq=rating:(R OR PG-13)  --  (It won't be able to leverage the filter query
 cache A and B above, instead it will create another whole new filter query
 cache C)

 fq=rating:R&fq=rating:PG-13  -- (Will be able to leverage filter query cache
 A and B)


 I will have a lot of queries with different combinations of values from the
 same field, rating.   Therefore, I thought that if the logical relation among
 filter queries were OR, it would keep the number of distinct cache entries
 down to the number of distinct rating values.


 Does it matter?

 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/logical-relation-among-filter-queries-tp2649142p2649904.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: Use of multiple tomcat instance and shards.

2011-03-08 Thread rajini maski
I have considered the RAM usage points from the Solr wiki, and yes, I have
many facet queries fired every time, which might be one of the reasons. I did
set -Xmx1024m and the error still occurred, though only 2-3 times after many
search queries were fired. But then the system slows down, so I need an
alternative.

Tommaso, could you please share any link that explains how to enable
load balancing on the machines you mentioned above?





On Tue, Mar 8, 2011 at 4:11 PM, Jan Høydahl jan@cominvent.com wrote:

 Having 2Gb physical memory on the box I would allocate -Xmx1024m to Java as
 a starting point.

 The other thing you could do is try to trim your config to use less memory.
 Are you using many facets? String sorts? Wildcards? Fuzzy? Storing or
 returning more fields than needed?

 http://wiki.apache.org/solr/SolrPerformanceFactors#RAM_Usage_Considerations

 --
 Jan Høydahl, search solution architect
 Cominvent AS - www.cominvent.com

 On 8. mars 2011, at 07.40, rajini maski wrote:

    In order to increase the Java heap memory: I have only 2GB RAM, so
  my default memory configuration is --JvmMs 128 --JvmMx 512.  I have a
  single Solr data index of up to 6GB. Now when I fire searches very
  often against this index, after some time I get a "java heap space
  out of memory" error and the search does not return results. What are the
  possibilities to fix this error? (I cannot increase the heap memory.) How
  about having another Tomcat instance running (how would this work?), or is
  it done by configuring shards? What might help me fix this search failure?
 
 
  Rajani




Re: Use of multiple tomcat instance and shards.

2011-03-08 Thread Erick Erickson
Have you looked at your cache usage statistics from the admin page? That should
give you some sense of whether your caches are experiencing evictions, which
would also lead to excessive garbage collection, and it should give you some
additional information to work with.

Also, what version of Solr are you using? 1.4.1?

Best
Erick

On Tue, Mar 8, 2011 at 7:52 AM, rajini maski rajinima...@gmail.com wrote:
 I have considered the RAM usage points from the Solr wiki, and yes, I have
 many facet queries fired every time, which might be one of the reasons. I did
 set -Xmx1024m and the error still occurred, though only 2-3 times after many
 search queries were fired. But then the system slows down, so I need an
 alternative.

 Tommaso, could you please share any link that explains how to enable
 load balancing on the machines you mentioned above?





 On Tue, Mar 8, 2011 at 4:11 PM, Jan Høydahl jan@cominvent.com wrote:

 Having 2Gb physical memory on the box I would allocate -Xmx1024m to Java as
 a starting point.

 The other thing you could do is try to trim your config to use less memory.
 Are you using many facets? String sorts? Wildcards? Fuzzy? Storing or
 returning more fields than needed?

 http://wiki.apache.org/solr/SolrPerformanceFactors#RAM_Usage_Considerations

 --
 Jan Høydahl, search solution architect
 Cominvent AS - www.cominvent.com

 On 8. mars 2011, at 07.40, rajini maski wrote:

           In order to increase the Java heap memory: I have only 2GB RAM, so
  my default memory configuration is --JvmMs 128 --JvmMx 512.  I have a
  single Solr data index of up to 6GB. Now when I fire searches very
  often against this index, after some time I get a "java heap space
  out of memory" error and the search does not return results. What are the
  possibilities to fix this error? (I cannot increase the heap memory.) How
  about having another Tomcat instance running (how would this work?), or is
  it done by configuring shards? What might help me fix this search failure?
 
 
  Rajani





Re: Use of multiple tomcat instance and shards.

2011-03-08 Thread Tommaso Teofili
Hi Rajani,

i


2011/3/8 rajini maski rajinima...@gmail.com


 Tommaso, could you please share any link that explains how to enable
 load balancing on the machines you mentioned above?




if you're querying Solr via SolrJ [1] you could use the LBHttpSolrServer [2];
otherwise, if you still want Solr to be responsible for load balancing,
implement a custom handler which wraps it (see [3]).
Consider also that this load balancing often gets done using a VIP [4] or an
Apache HTTP server in front of Solr.
Hope this helps,
Tommaso


[1] : http://wiki.apache.org/solr/Solrj
[2] : http://wiki.apache.org/solr/LBHttpSolrServer
[3] : http://markmail.org/thread/25jrko5s7wlmzjf7
[4] : http://en.wikipedia.org/wiki/Virtual_IP_address
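
(A minimal SolrJ sketch of the LBHttpSolrServer option; the replica URLs are
placeholders:)

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.LBHttpSolrServer;
    import org.apache.solr.client.solrj.response.QueryResponse;

    public class LoadBalancedQueryDemo {
        public static void main(String[] args) throws Exception {
            // round-robins requests across the listed replicas and
            // temporarily skips servers that stop responding
            LBHttpSolrServer lb = new LBHttpSolrServer(
                "http://solr1:8983/solr", "http://solr2:8983/solr");
            QueryResponse rsp = lb.query(new SolrQuery("*:*"));
            System.out.println(rsp.getResults().getNumFound());
        }
    }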


getting much double-Values from solr -- timeout

2011-03-08 Thread stockii
Hello.

I have 34,000,000 documents in my index and each doc has a field with a
double value. I want the sum of these fields. I tested it with the
StatsComponent but this is not usable!! So I get all my values directly
from Solr, from the index, and with PHP's sum() I get my sum.

That works fine, but when a user searches over really many documents
(~30,000), my script needs longer than 30 seconds and PHP kills it.


How can I tune Solr to get these double values from the index much
faster?

-
--- System 

One Server, 12 GB RAM, 2 Solr Instances, 7 Cores, 
1 Core with 31 Million Documents, other Cores < 100,000

- Solr1 for Search-Requests - commit every Minute  - 4GB Xmx
- Solr2 for Update-Request  - delta every 2 Minutes - 4GB Xmx
--
View this message in context: 
http://lucene.472066.n3.nabble.com/getting-much-double-Values-from-solr-timeout-tp2650981p2650981.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Use of multiple tomcat instance and shards.

2011-03-08 Thread Tommaso Teofili
Just one more hint: I didn't mention it in the previous email since I
imagine the scenario you explained doesn't allow it, but anyway you could
also check out SolrCloud and its distributed requests [1].
Cheers,
Tommaso

[1] : http://wiki.apache.org/solr/SolrCloud#Distributed_Requests

2011/3/8 Tommaso Teofili tommaso.teof...@gmail.com

 Hi Rajani,

 i


 2011/3/8 rajini maski rajinima...@gmail.com


 Tommaso, could you please share any link that explains how to
 enable load balancing on the machines you mentioned above?




 if you're querying Solr via SolrJ [1] you could use the LBHttpSolrServer
 [2]; otherwise, if you still want Solr to be responsible for load balancing,
 implement a custom handler which wraps it (see [3]).
 Consider also that this load balancing often gets done using a VIP [4] or
 an Apache HTTP server in front of Solr.
 Hope this helps,
 Tommaso


 [1] : http://wiki.apache.org/solr/Solrj
 [2] : http://wiki.apache.org/solr/LBHttpSolrServer
 [3] : http://markmail.org/thread/25jrko5s7wlmzjf7
 [4] : http://en.wikipedia.org/wiki/Virtual_IP_address





Re: logical relation among filter queries

2011-03-08 Thread cyang2010
Erick,

Thanks for reply.

Is there any way that I can instruct Solr to combine separate filter queries
with a UNION result, without creating the third filter query cache entry as I
described above?

If not, shall I give up using filter queries for such a scenario (where I
query the same field with multiple values using OR) and use a normal Solr
query instead?  At least the Solr query cache is lighter weight than the
filter query cache.

What do you think?  Thanks,


Carole

--
View this message in context: 
http://lucene.472066.n3.nabble.com/logical-relation-among-filter-queries-tp2649142p2651639.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: How to handle searches across traditional and simplified Chinese?

2011-03-08 Thread Burton-West, Tom
This page discusses the reasons why it's not a simple one-to-one mapping:

http://www.kanji.org/cjk/c2c/c2cbasis.htm

Tom
-Original Message-
 I have documents that contain both simplified and traditional Chinese 
 characters. Is there any way to search across them? For example, if someone 
 searches for 类 (simplified Chinese), I'd like to be able to recognize that 
 the equivalent character is 類 in traditional Chinese and search for 类 or 類 in 
 the documents


docBoost

2011-03-08 Thread Brian Lamb
Hi all,

I am using dataimport to create my index and I want to use docBoost to
assign some higher weights to certain docs. I understand the concept behind
docBoost but I haven't been able to find an example anywhere that shows how
to implement it. Assuming the following config file:

<document>
  <entity name="animal"
          dataSource="animals"
          pk="id"
          query="SELECT * FROM animals">
    <field column="id" name="id" />
    <field column="genus" name="genus" />
    <field column="species" name="species" />
    <entity name="boosters"
            dataSource="boosts"
            query="SELECT boost_score FROM boosts WHERE animal_id = ${animal.id}">
      <field column="boost_score" name="boost_score" />
    </entity>
  </entity>
</document>

How do I add in a docBoost score? The boost score is currently in a separate
table as shown above.
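
(One approach, sketched here under the assumption that DIH's special $docBoost
command is available in this version: alias the boost column to $docBoost in
the SQL so it lands in the row map. The JOIN shown replaces the sub-entity and
is illustrative, not from the original mail:)

    <entity name="animal"
            dataSource="animals"
            pk="id"
            query="SELECT a.*, b.boost_score AS '$docBoost'
                   FROM animals a LEFT JOIN boosts b ON b.animal_id = a.id">
      <field column="id" name="id" />
      <field column="genus" name="genus" />
      <field column="species" name="species" />
    </entity>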


Re: Problem adding new requesthandler to solr branch_3x

2011-03-08 Thread Chris Hostetter

: 1.  Why the problem occurs (has something changed between 1.4.1 and 3x)?

Various pieces of code dealing with config parsing have changed since 
1.4.1 to be better about verifying that configs are meaningful, and 
reporting errors when unexpected things are encountered.  I'm not sure of 
the specific change, but the underlying point is: if 1.4.1 wasn't giving 
you an error for that syntax, it's because it was completely ignoring it.


-Hoss

Smart Pagination queries

2011-03-08 Thread javaxmlsoapdev
e.g. there are 4,000 Solr documents that were found for a particular word
search. My app has entitlement rules applied to those 4,000 documents, and
it's quite possible that the user is only eligible to view 3,000 results out
of the 4K. This is achieved through post-filtering application logic.

My question related to Solr pagination is:
In order to paint "Next" links, the app would have to know the total number of
records that the user is eligible to read. getNumFound() will tell me that
there are 4K records in total that Solr returned. If there weren't any
entitlement rules, it would be easy to determine how many "Next" links to
paint and, when the user clicks "Next", to pass the start position
appropriately in the Solr query. Since I have to apply the post filter as
and when results are fetched from Solr, is there a better way to achieve
this? E.g. because of the post filtering I wouldn't know whether to paint a
"Next" link until the results for the next pages are pre-fetched and filtered.
Pre-fetching won't work, as that would kill performance and defeat the purpose
of Solr pagination. Any better suggestions?

Thanks,

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Smart-Pagination-queries-tp2652273p2652273.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: -ignore words not working?

2011-03-08 Thread Chris Hostetter
: AND ((-title:men) AND (-keywords:men) AND (-description:men))
...
: As soon as I put in -field:value it yeilds no results... even though there
: are a ton of results that match the criteria :/

you didn't add -field:value ... you added (-field:value)

the parens are significant.

the parens create a boolean query, and inside that boolean query you have 
one clause which is purely negative.

a boolean query with all negative clauses by definition matches nothing.

in your outer query, you have then made that boolean query mandatory 
(because of the AND) which means your outer query can't match anything 
either.

removing the parens would probably work, or using a form like (*:* 
-keywords:men) would probably work.

(Solr does a good job of helping you with pure negative queries at the 
top level of your syntax (ie: fq=-field:value) but it doesn't traverse 
the entire query looking for things that are structurally valid but don't 
actually match anything ... that might have been your point when you wrote 
it)
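
(A before/after sketch of the fix, using the clauses quoted above:)

    AND ((-title:men) AND (-keywords:men) AND (-description:men))
        -- all-negative clauses: matches nothing
    AND ((*:* -title:men) AND (*:* -keywords:men) AND (*:* -description:men))
        -- matches all docs without "men" in those fields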


-Hoss


Re: Error during auto-warming of key

2011-03-08 Thread Markus Jelsma
Anyone here with some thoughts on this issue?

 Hi,
 
 Yesterday's error log contains something peculiar:
 
  ERROR [solr.search.SolrCache] - [pool-29-thread-1] - : Error during auto-
 warming of key:+*:*
 (1.0/(7.71E-8*float(ms(const(1298682616680),date(sort_date)))+1.0))^20.0:
 java.lang.NullPointerException
   at org.apache.lucene.util.StringHelper.intern(StringHelper.java:36)
   at org.apache.lucene.search.FieldCacheImpl$Entry.<init>(FieldCacheImpl.java:275)
   at org.apache.lucene.search.FieldCacheImpl.getLongs(FieldCacheImpl.java:525)
   at org.apache.solr.search.function.LongFieldSource.getValues(LongFieldSource.java:57)
   at org.apache.solr.search.function.DualFloatFunction.getValues(DualFloatFunction.java:48)
   at org.apache.solr.search.function.ReciprocalFloatFunction.getValues(ReciprocalFloatFunction.java:61)
   at org.apache.solr.search.function.FunctionQuery$AllScorer.<init>(FunctionQuery.java:123)
   at org.apache.solr.search.function.FunctionQuery$FunctionWeight.scorer(FunctionQuery.java:93)
   at org.apache.lucene.search.BooleanQuery$BooleanWeight.scorer(BooleanQuery.java:297)
   at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:246)
   at org.apache.lucene.search.Searcher.search(Searcher.java:171)
   at org.apache.solr.search.SolrIndexSearcher.getDocSetNC(SolrIndexSearcher.java:651)
   at org.apache.solr.search.SolrIndexSearcher.getDocSet(SolrIndexSearcher.java:545)
   at org.apache.solr.search.SolrIndexSearcher.cacheDocSet(SolrIndexSearcher.java:520)
   at org.apache.solr.search.SolrIndexSearcher$2.regenerateItem(SolrIndexSearcher.java:296)
   at org.apache.solr.search.FastLRUCache.warm(FastLRUCache.java:168)
   at org.apache.solr.search.SolrIndexSearcher.warm(SolrIndexSearcher.java:1481)
   at org.apache.solr.core.SolrCore$2.call(SolrCore.java:1131)
   at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
   at java.util.concurrent.FutureTask.run(FutureTask.java:138)
   at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
   at java.lang.Thread.run(Thread.java:619)
 
 
 Well, I use Dismax's bf parameter to boost very recent documents. I'm not
 using the queryResultCache or documentCache, only the filterCache and the
 Lucene fieldCache. I've checked LUCENE-1890 but am unsure if that's the
 issue. Any thoughts on this one?
 
 https://issues.apache.org/jira/browse/LUCENE-1890
 
 Cheers,


Re: getting much double-Values from solr -- timeout

2011-03-08 Thread Jan Høydahl
Are you using shards or have everything in same index?

What problem did you experience with the StatsComponent? How did you use it? I 
think the right approach would be to optimize StatsComponent to do a quick sum().
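
(For reference, the kind of request StatsComponent answers; "amount" is a
stand-in for the double field, and the "sum" entry of the stats section is
computed server-side, so no values need shipping to PHP:)

    http://localhost:8983/solr/select?q=*:*&rows=0&stats=true&stats.field=amount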

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

On 8. mars 2011, at 16.52, stockii wrote:

 Hello.
 
 I have 34,000,000 documents in my index and each doc has a field with a
 double value. I want the sum of these fields. I tested it with the
 StatsComponent but this is not usable!! So I get all my values directly
 from Solr, from the index, and with PHP's sum() I get my sum.
 
 That works fine, but when a user searches over really many documents
 (~30,000), my script needs longer than 30 seconds and PHP kills it.
 
 
 How can I tune Solr to get these double values from the index much
 faster?
 
 -
 --- System 
 
 
 One Server, 12 GB RAM, 2 Solr Instances, 7 Cores, 
 1 Core with 31 Million Documents, other Cores < 100,000
 
 - Solr1 for Search-Requests - commit every Minute  - 4GB Xmx
 - Solr2 for Update-Request  - delta every 2 Minutes - 4GB Xmx
 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/getting-much-double-Values-from-solr-timeout-tp2650981p2650981.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: logical relation among filter queries

2011-03-08 Thread Erick Erickson
Can't really answer that question in the abstract. About all you can
really do is monitor your caches (the admin stats page helps) and
note if/when you start getting cache evictions and adjust then.

I really wouldn't worry about this unless and until you start getting
query slowdowns; just go ahead and use combined filter queries
instead (i.e. fq=(A OR B OR C)).

Best
Erick

On Tue, Mar 8, 2011 at 12:15 PM, cyang2010 ysxsu...@hotmail.com wrote:
 Erick,

 Thanks for reply.

 Is there any way that I can instruct Solr to combine separate filter queries
 with a UNION result, without creating the third filter query cache entry as I
 described above?

 If not, shall I give up using filter queries for such a scenario (where I
 query the same field with multiple values using OR) and use a normal Solr
 query instead?  At least the Solr query cache is lighter weight than the
 filter query cache.

 What do you think?  Thanks,


 Carole

 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/logical-relation-among-filter-queries-tp2649142p2651639.html
 Sent from the Solr - User mailing list archive at Nabble.com.



NRT in Solr

2011-03-08 Thread Jae Joo
Hi,
Is NRT in Solr 4.0 from trunk? I have checked it out from trunk, but could not
find the configuration for NRT.

Regards

Jae


two QueryHandler components in one schema?

2011-03-08 Thread Paul Libbrecht
hello list,

in my schema I have 

<searchComponent name="query"
    class="org.curriki.solr.handlers.CurrikiSolrQueryComponent" />

which, as I understand it, allows all requestHandlers to use my 
query-component. That is useful but I wonder if there's a way for me to have 
one request-handler that uses my query-component and another to use the default 
one?

Formulated differently, my question is whether 
- search-components can be defined by name within the requestHandler element of 
the schema
- or whether a differently named query search-component would still be used as 
the query-component

thanks in advance

paul

Re: two QueryHandler components in one schema?

2011-03-08 Thread Chris Hostetter

: in my schema I have 

First off, a bit of terminology clarification: Search Components are 
declared in the solrconfig.xml file.  schema.xml is where you define 
what, inherently, the data in your index *is*.  solrconfig.xml is where 
you define how you want people to be able to interact with the data in 
your index.

: Formulated, differently, my question is whether 
: - search-components can be defined by name within the requestHandler element 
of the schema
: - or whether a differently named query search-component would still be used 
as query-component

yes, and yes.

SearchHandler references Search Components by name, using the component 
list it is configured with.  So you can leave the name "query" for the 
default instance of QueryComponent and then give your custom component 
its own name, and refer to it by name when configuring the 
SearchHandlers you want to use it in...

http://wiki.apache.org/solr/SearchHandler
http://wiki.apache.org/solr/SearchComponent




-Hoss


Re: two QueryHandler components in one schema?

2011-03-08 Thread Paul Libbrecht

Le 8 mars 2011 à 23:03, Chris Hostetter a écrit :

 : in my schema I have 
 
  First off, a bit of terminology clarification: Search Components are 
  declared in the solrconfig.xml file.  schema.xml is where you define 
  what, inherently, the data in your index *is*.  solrconfig.xml is where 
  you define how you want people to be able to interact with the data in 
  your index.

Sorry, this is absolutely true. I should have said in my config.

 : Formulated differently, my question is whether 
 : - search-components can be defined by name within the requestHandler 
 element of the schema
 : - or whether a differently named query search-component would still be used 
 as query-component
 
 yes, and yes.
 
  SearchHandler references Search Components by name, using the component 
  list it is configured with.  So you can leave the name "query" for the 
  default instance of QueryComponent and then give your custom component 
  its own name, and refer to it by name when configuring the 
  SearchHandlers you want to use it in...

So how do I define, for a given request-handler, a special query component?
I did not find in this in the schema.

paul

Re: two QueryHandler components in one schema?

2011-03-08 Thread Markus Jelsma
A request handler can have first-components and last-components and also just 
plain components. List all your stuff in components and voila. Don't forget to 
also add debug, facet and other default components if you need them.

 Le 8 mars 2011 à 23:03, Chris Hostetter a écrit :
  : in my schema I have
  
   First off, a bit of terminology clarification: Search Components are
   declared in the solrconfig.xml file.  schema.xml is where you define
   what, inherently, the data in your index *is*.  solrconfig.xml is where
   you define how you want people to be able to interact with the data in
   your index.
 
 Sorry, this is absolutely true. I should have said in my config.
 
  : Formulated differently, my question is whether
  : - search-components can be defined by name within the requestHandler
  : element of the schema - or whether a differently named query
  : search-component would still be used as query-component
  
  yes, and yes.
  
  SearchHandler references Search Components by name, using the component
  list it is configured with.  So you can leave the name "query" for the
  default instance of QueryComponent and then give your custom component
  its own name, and refer to it by name when configuring the
  SearchHandlers you want to use it in...
 
 So how do I define, for a given request-handler, a special query component?
 I did not find in this in the schema.
 
 paul


Solr Hanging all of a sudden with update/csv

2011-03-08 Thread danomano
Hi folks, I've been using solr for about 3 months.

Our Solr install is a single node, and we have been injecting logging data
into the Solr server every couple of minutes, with each update taking a few
minutes.

Everything was working fine until this morning, at which point it appeared that
all updates were hung.

Restarting the Solr server did not help, as all updaters immediately 'hung'
again.

Poking around in the threads, and strace, I do in fact see stuff happening.

The index size itself is about 270GB (we are hoping to support up to
500GB-1TB), and we have supplied the system with ~3TB disk space.

Any tips on what could be happening?
notes: we have never run an optimize yet.
  we have never deleted from the system yet.

The merge thread appears to be the one 'never returning':
Lucene Merge Thread #0 - Thread t@41
   java.lang.Thread.State: RUNNABLE
at sun.nio.ch.FileDispatcher.pread0(Native Method)
at sun.nio.ch.FileDispatcher.pread(FileDispatcher.java:31)
at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:234)
at sun.nio.ch.IOUtil.read(IOUtil.java:210)
at sun.nio.ch.FileChannelImpl.read(FileChannelImpl.java:622)
at
org.apache.lucene.store.NIOFSDirectory$NIOFSIndexInput.readInternal(NIOFSDirectory.java:161)
at
org.apache.lucene.store.BufferedIndexInput.readBytes(BufferedIndexInput.java:139)
at
org.apache.lucene.store.BufferedIndexInput.readBytes(BufferedIndexInput.java:94)
at org.apache.lucene.store.DataOutput.copyBytes(DataOutput.java:176)
at
org.apache.lucene.index.FieldsWriter.addRawDocuments(FieldsWriter.java:209)
at
org.apache.lucene.index.SegmentMerger.copyFieldsNoDeletions(SegmentMerger.java:424)
at
org.apache.lucene.index.SegmentMerger.mergeFields(SegmentMerger.java:332)
at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:153)
at 
org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4053)
at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3645)
at
org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:339)
at
org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:407)


Some ptrace output:
23178 pread(172,
\270\316\276\2\245\371\274\2\271\316\276\2\272\316\276\2\273\316\276\2\274\316\276\2\275\316\276\2\276\316\276\2...,
4096, 98004192) = 4096 0.09
23178 pread(172,
\245\371\274\2\271\316\276\2\272\316\276\2\273\316\276\2\274\316\276\2\275\316\276\2\276\316\276\2\277\316\276\2...,
4096, 98004196) = 4096 0.09
23178 pread(172,
\271\316\276\2\272\316\276\2\273\316\276\2\274\316\276\2\275\316\276\2\276\316\276\2\277\316\276\2\300\316\276\2...,
4096, 98004200) = 4096 0.08
23178 pread(172,
\272\316\276\2\273\316\276\2\274\316\276\2\275\316\276\2\276\316\276\2\277\316\276\2\300\316\276\2\301\316\276\2...,
4096, 98004204) = 4096 0.08
23178 pread(172,
\273\316\276\2\274\316\276\2\275\316\276\2\276\316\276\2\277\316\276\2\300\316\276\2\301\316\276\2\302\316\276\2...,
4096, 98004208) = 4096 0.08
23178 pread(172,
\274\316\276\2\275\316\276\2\276\316\276\2\277\316\276\2\300\316\276\2\301\316\276\2\302\316\276\2\367\343\274\2...,
4096, 98004212) = 4096 0.09
23178 pread(172,
\275\316\276\2\276\316\276\2\277\316\276\2\300\316\276\2\301\316\276\2\302\316\276\2\367\343\274\2\246\371\274\2...,
4096, 98004216) = 4096 0.08
23178 pread(172,
\276\316\276\2\277\316\276\2\300\316\276\2\301\316\276\2\302\316\276\2\367\343\274\2\246\371\274\2\303\316\276\2...,
4096, 98004220) = 4096 0.09
23178 pread(172,
\277\316\276\2\300\316\276\2\301\316\276\2\302\316\276\2\367\343\274\2\246\371\274\2\303\316\276\2\304\316\276\2...,
4096, 98004224) = 4096 0.13
22688 ... futex resumed ) = -1 ETIMEDOUT (Connection timed
out) 0.051276
23178 pread(172,
\300\316\276\2\301\316\276\2\302\316\276\2\367\343\274\2\246\371\274\2\303\316\276\2\304\316\276\2\305\316\276\2...,
4096, 98004228) = 4096 0.10
22688 futex(0x464a9f28, FUTEX_WAKE_PRIVATE, 1 
23178 pread(172,
\301\316\276\2\302\316\276\2\367\343\274\2\246\371\274\2\303\316\276\2\304\316\276\2\305\316\276\2\306\316\276\2...,
4096, 98004232) = 4096 0.10
22688 ... futex resumed ) = 0 0.51
23178 pread(172,
\302\316\276\2\367\343\274\2\246\371\274\2\303\316\276\2\304\316\276\2\305\316\276\2\306\316\276\2\307\316\276\2...,
4096, 98004236) = 4096 0.10
22688 clock_gettime(CLOCK_MONOTONIC,  
23178 pread(172,
\367\343\274\2\246\371\274\2\303\316\276\2\304\316\276\2\305\316\276\2\306\316\276\2\307\316\276\2\310\316\276\2...,
4096, 98004240) = 4096 0.10
22688 ... clock_gettime resumed {1900472, 454038316}) = 0 0.54
23178 pread(172,
\246\371\274\2\303\316\276\2\304\316\276\2\305\316\276\2\306\316\276\2\307\316\276\2\310\316\276\2\311\316\276\2...,
4096, 98004244) = 4096 0.11
22688 clock_gettime(CLOCK_MONOTONIC,  
23178 pread(172,

Re: two QueryHandler components in one schema?

2011-03-08 Thread Chris Hostetter

: So how do I define, for a given request-handler, a special query component?
: I did not find in this in the schema.

you mean solrconfig.xml, again.

Taken directly from the SearchHandler URL i sent you...

 If you want to have a custom list of components (either omitting 
 defaults or adding custom components) you can specify the components for 
 a handler directly:
 
<arr name="components">
  <str>query</str>
  <str>facet</str>
  <str>mlt</str>
  <str>highlight</str>
  <str>debug</str>
  <str>someothercomponent</str>
</arr>

...so if you don't want to use "query" and you want to use 
"mySpecialQueryComponent" it would be ...

   <arr name="components">
     <str>mySpecialQueryComponent</str>
     <str>facet</str>
     <str>mlt</str>
     <str>highlight</str>
     <str>debug</str>
   </arr>

...the SearchComponent URL i sent, as well as the 
example/solr/conf/solrconfig.xml file that ships with solr also has 
examples of how/when you can specify an explicit components list



-Hoss


Re: two QueryHandler components in one schema?

2011-03-08 Thread Paul Libbrecht
Erm,

did you, Hoss, not say that components are referred to by name?
How could the search result be read from the query mySpecialQueryComponent if 
it cannot be named? Simply through the pool of SolrParams?

If yes, that's the great magic of solr.

paul


Le 8 mars 2011 à 23:19, Chris Hostetter a écrit :

 ...so if you don't want to use "query" and you want to use 
 "mySpecialQueryComponent" it would be ...
 
   <arr name="components">
     <str>mySpecialQueryComponent</str>
     <str>facet</str>
     <str>mlt</str>
     <str>highlight</str>
     <str>debug</str>
   </arr>
 
 ...the SearchComponent URL i sent, as well as the 
 example/solr/conf/solrconfig.xml file that ships with solr also has 
 examples of how/when you can specify an explicit components list



How to intercept the http request made by solrj

2011-03-08 Thread cyang2010
Hi,

Anyone knows how to intercept the http request made by solrj?

I only see the URL being printed out when the request is invalid.  But still,
as part of the development/debugging process, I want to verify what HTTP
request it sends out to the Solr server.

Thanks.

CY
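
(One hedged approach, assuming SolrJ's CommonsHttpSolrServer is in use: it
talks HTTP via Apache Commons HttpClient 3.x, whose wire logging shows the
full requests. With an slf4j-log4j binding on the classpath, lines like these
in log4j.properties should surface them:)

    log4j.logger.httpclient.wire.header=DEBUG
    log4j.logger.org.apache.commons.httpclient=DEBUG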




--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-intercept-the-http-request-made-by-solrj-tp2652951p2652951.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: error in log INFO org.apache.solr.core.SolrCore - webapp=/solr path=/admin/ping params={} status=0 QTime=1

2011-03-08 Thread Chris Hostetter

: I am using solr under jboss, so this might be more of a jboss config
: issue, not really sure.  But my logs keep getting spammed, because
: solr sends it as ERROR [STDERR] INFO org.apache.solr.core.SolrCore -
: webapp=/solr path=/admin/ping params={} status=0 QTime=1
: 
: Has anyone seen this and found a workaround to not send this as an Error?

that's not an error -- that's Solr logging a message using the INFO 
level which some other code is then prepending ERROR [STDERR]  in front 
of.

My guess: your installation is set up so that Java Util Logging goes to
System.err by default, and then something in JBoss has remapped System.err 
to an internal stream that it then processes/redirects and lets you know 
that those lines were written to STDERR (and treats them as an error) ... 
most likely everything Solr ever logs is being written out that way (not 
just those INFO messages from SolrCore).

Solr uses the SLF4J abstraction to do its logging, and by default ships 
with the SLF4J-to-JUL bridge (because JUL logging is the one type of 
logging guaranteed to be supported by every servlet container w/o any 
external dependencies or risk of class path collision).  You should 
investigate how to configure JUL logging for your JBoss installation to 
get those messages somewhere more useful than STDERR, and/or change the 
SLF4J bindings that are in use in your Solr installation...

http://wiki.apache.org/solr/SolrLogging
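
(A minimal JUL sketch of the sort of thing that wiki page covers; the log file
path is a placeholder, and the file would be passed to the JVM with
-Djava.util.logging.config.file=/path/to/logging.properties:)

    handlers = java.util.logging.FileHandler
    .level = INFO
    java.util.logging.FileHandler.pattern = /var/log/solr/solr.log
    java.util.logging.FileHandler.formatter = java.util.logging.SimpleFormatter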







-Hoss


Re: two QueryHandler components in one schema?

2011-03-08 Thread Chris Hostetter

: did you, Hoss, not say that components are referred to by name? How 
: could the search result be read from the query mySpecialQueryComponent 
: if it cannot be named? Simply through the pool of SolrParams?

in the example i gave, mySpecialQueryComponent *is* the name of some 
component you have already defined -- instead of using the component named 
query which has also already been defined (either implicitly as a 
default or explicitly in the config)

As i keep saying: if you look at the 1.4.1 example solrconfig.xml, there 
are several examples of this (and the example solrconfig.xml that will be 
in the Solr 3.1 is even better)...

From 1.4.1...

   By default, the following components are available:

    <searchComponent name="query"
    class="org.apache.solr.handler.component.QueryComponent" />

...
   
   Default configuration in a requestHandler would look like:
<arr name="components">
  <str>query</str>
  <str>facet</str>
  <str>mlt</str>
  <str>highlight</str>
  <str>stats</str>
  <str>debug</str>
</arr>

If you register a searchComponent to one of the standard names, that 
will be used instead. 

...

  <!-- A component to return terms and document frequency of those terms.
   This component does not yet support distributed search. -->
  <searchComponent name="termsComponent"
  class="org.apache.solr.handler.component.TermsComponent"/>

  <requestHandler name="/terms"
  class="org.apache.solr.handler.component.SearchHandler">
 <lst name="defaults">
  <bool name="terms">true</bool>
 </lst>
 <arr name="components">
   <str>termsComponent</str>
 </arr>
  </requestHandler>



-Hoss


Re: dataimport

2011-03-08 Thread Chris Hostetter

: INFO: Creating a connection for entity id with URL:
: 
jdbc:mysql://localhost/researchsquare_beta_library?characterEncoding=UTF8zeroDateTimeBehavior=convertToNull
: Feb 24, 2011 8:58:25 PM org.apache.solr.handler.dataimport.JdbcDataSource$1
: call
: INFO: Time taken for getConnection(): 137
: Killed
: 
: So it looks like for whatever reason, the server crashes trying to do a full
: import. When I add a LIMIT clause on the query, it works fine when the LIMIT
: is only 250 records but if I try to do 500 records, I get the same message.

...wow.  that's ... weird.

I've never seen a java process just log Killed like that.

The only time I've ever seen a process log "Killed" is if it was 
terminated by the OS (ie: kill -9 <pid>)

What OS are you using? How are you running Solr? (ie: are you using the 
simple jetty example "java -jar start.jar" or are you using a different 
servlet container?) ... are you absolutely certain your machine doesn't 
have some sort of monitoring in place that kills jobs if they take too 
long, or use too much CPU?


-Hoss


Re: Help with explain query syntax

2011-03-08 Thread Chris Hostetter

: <str name="parsedquery">
: +DisjunctionMaxQuery((company_name:"(linguajob.pl linguajob) pl")~0.01) ()
: </str>

you can see the crux of your problem in this query string

it seems you have a query time synonym in place to *expand* "linguajob.pl" 
into ["linguajob.pl"] and ["linguajob"] ["pl"], but query time synonym expansion 
of multiword queries doesn't work -- what it is ultimately requiring is 
that a doc contain "linguajob.pl" and "linguajob" at the same term 
position, followed by "pl".

this is not what you have indexed.

This type of specific example is warned against on the wiki...

http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.SynonymFilterFactory
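
(The usual workaround described on that wiki page, sketched here rather than
taken from the mail: do the multi-word expansion at index time instead of
query time, e.g. in the field type's index analyzer:)

    <analyzer type="index">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
              ignoreCase="true" expand="true"/>
    </analyzer>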


-Hoss


Re: Solr Hanging all of a sudden with update/csv

2011-03-08 Thread Jonathan Rochkind
My guess is that you're running out of RAM.  Actual Java profiling is 
beyond me, but I have seen issues on updating that were solved by more RAM.


If you are updating every few minutes, and your new index takes more 
than a few minutes to warm, you could be running into overlapping 
warming indexes issues. There's some more info on what I mean by this in this 
FAQ, although the FAQ isn't actually targeted at this case exactly: 
http://wiki.apache.org/solr/FAQ#What_does_.22exceeded_limit_of_maxWarmingSearchers.3DX.22_mean.3F


Overlapping warming indexes can result in excessive RAM and/or CPU usage.

If you haven't given your JVM options to tune the JVM garbage 
collection, that can also help things, using the options for concurrent 
GC.  But if your fundamental problem is overlapping warming 
searchers, you probably need to make that stop.


On 3/8/2011 5:17 PM, danomano wrote:

Hi folks, I've been using solr for about 3 months.

Our Solr install is a single node, and we have been injecting logging data
into the Solr server every couple of minutes, with each update taking a few
minutes.

Everything was working fine until this morning, at which point it appeared that
all updates were hung.

Restarting the Solr server did not help, as all updaters immediately 'hung'
again.

Poking around in the threads, and strace, I do in fact see stuff happening.

The index size itself is about 270GB (we are hoping to support up to
500GB-1TB), and we have supplied the system with ~3TB disk space.

Any tips on what could be happening?
notes: we have never run an optimize yet.
   we have never deleted from the system yet.


The merge thread appears to be the one 'never returning':
Lucene Merge Thread #0 - Thread t@41
java.lang.Thread.State: RUNNABLE
at sun.nio.ch.FileDispatcher.pread0(Native Method)
at sun.nio.ch.FileDispatcher.pread(FileDispatcher.java:31)
at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:234)
at sun.nio.ch.IOUtil.read(IOUtil.java:210)
at sun.nio.ch.FileChannelImpl.read(FileChannelImpl.java:622)
at
org.apache.lucene.store.NIOFSDirectory$NIOFSIndexInput.readInternal(NIOFSDirectory.java:161)
at
org.apache.lucene.store.BufferedIndexInput.readBytes(BufferedIndexInput.java:139)
at
org.apache.lucene.store.BufferedIndexInput.readBytes(BufferedIndexInput.java:94)
at org.apache.lucene.store.DataOutput.copyBytes(DataOutput.java:176)
at
org.apache.lucene.index.FieldsWriter.addRawDocuments(FieldsWriter.java:209)
at
org.apache.lucene.index.SegmentMerger.copyFieldsNoDeletions(SegmentMerger.java:424)
at
org.apache.lucene.index.SegmentMerger.mergeFields(SegmentMerger.java:332)
at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:153)
at 
org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4053)
at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3645)
at
org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:339)
at
org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:407)


Some ptrace output:
23178 pread(172,
\270\316\276\2\245\371\274\2\271\316\276\2\272\316\276\2\273\316\276\2\274\316\276\2\275\316\276\2\276\316\276\2...,
4096, 98004192) = 4096 0.09
23178 pread(172,
\245\371\274\2\271\316\276\2\272\316\276\2\273\316\276\2\274\316\276\2\275\316\276\2\276\316\276\2\277\316\276\2...,
4096, 98004196) = 4096 0.09
23178 pread(172,
\271\316\276\2\272\316\276\2\273\316\276\2\274\316\276\2\275\316\276\2\276\316\276\2\277\316\276\2\300\316\276\2...,
4096, 98004200) = 4096 0.08
23178 pread(172,
\272\316\276\2\273\316\276\2\274\316\276\2\275\316\276\2\276\316\276\2\277\316\276\2\300\316\276\2\301\316\276\2...,
4096, 98004204) = 4096 0.08
23178 pread(172,
\273\316\276\2\274\316\276\2\275\316\276\2\276\316\276\2\277\316\276\2\300\316\276\2\301\316\276\2\302\316\276\2...,
4096, 98004208) = 4096 0.08
23178 pread(172,
\274\316\276\2\275\316\276\2\276\316\276\2\277\316\276\2\300\316\276\2\301\316\276\2\302\316\276\2\367\343\274\2...,
4096, 98004212) = 4096 0.09
23178 pread(172,
\275\316\276\2\276\316\276\2\277\316\276\2\300\316\276\2\301\316\276\2\302\316\276\2\367\343\274\2\246\371\274\2...,
4096, 98004216) = 4096 0.08
23178 pread(172,
\276\316\276\2\277\316\276\2\300\316\276\2\301\316\276\2\302\316\276\2\367\343\274\2\246\371\274\2\303\316\276\2...,
4096, 98004220) = 4096 0.09
23178 pread(172,
\277\316\276\2\300\316\276\2\301\316\276\2\302\316\276\2\367\343\274\2\246\371\274\2\303\316\276\2\304\316\276\2...,
4096, 98004224) = 4096 0.13
22688 ... futex resumed ) = -1 ETIMEDOUT (Connection timed
out) 0.051276
23178 pread(172,
\300\316\276\2\301\316\276\2\302\316\276\2\367\343\274\2\246\371\274\2\303\316\276\2\304\316\276\2\305\316\276\2...,
4096, 98004228) = 4096 0.10
22688 futex(0x464a9f28, FUTEX_WAKE_PRIVATE, 1
23178 pread(172,

Re: Solr Hanging all of a sudden with update/csv

2011-03-08 Thread danomano
Actually this is definitely not a RAM issue.  I have VisualVM connected, and
the max RAM available to the JVM is ~7GB, but the system is only using
~5.5GB, with a max so far of 6.5GB consumed.

I think... well, I'm guessing the system hit a merge threshold, but I can't
tell for sure. I have seen the index size grow rapidly today (much more than
normal; in the last 3 hours the index size has increased by about 50%).
From various posts I see that during the 'optimize' (which I have not
called), or perhaps the merging of segments, it is normal for the disk
space requirements to temporarily increase by 2x to 3x.  As such my only
assumption is that it must be conducting a merge.
Note: since I restarted the Solr server, I have had only 1 client thread
pushing data in (it already transmitted the data (~2MB)), and it has been held
up for about 4 hours now. I believe it's stuck waiting for the merge thread to
complete.

Is there a better way to handle merging? Or at least a way of predicting when
it will occur? (I'm essentially using the defaults: mergeFactor 10, ramBuffer
32MB.)

I'm totally new to Solr/Lucene/indexing in general, so I'm somewhat clueless
about all this.
It should be noted we have millions of documents, all of which are generally
< 4KB.
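
(For reference, a sketch of where those defaults live in solrconfig.xml,
assuming the stock Solr 1.4 example config:)

    <indexDefaults>
      <mergeFactor>10</mergeFactor>
      <ramBufferSizeMB>32</ramBufferSizeMB>
    </indexDefaults>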




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-Hanging-all-of-sudden-with-update-csv-tp2652903p2653423.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr Hanging all of a sudden with update/csv

2011-03-08 Thread Jason Rutherglen
 The index size itself is about 270GB (we are hoping to support up to
 500GB-1TB), and we have supplied the system with ~3TB disk space.

That's simply massive for a single node.  When the system tries to
merge the segments the queries are probably not working?  And the
merges will take quite a while.  How long is OK for a single query to
return in?

On Tue, Mar 8, 2011 at 2:17 PM, danomano dshopk...@earthlink.net wrote:
 Hi folks, I've been using solr for about 3 months.

 Our Solr install is a single node, and we have been injecting logging data
 into the Solr server every couple of minutes, with each update taking a few
 minutes.

 Everything was working fine until this morning, at which point it appeared that
 all updates were hung.

 Restarting the Solr server did not help, as all updaters immediately 'hung'
 again.

 Poking around in the threads, and strace, I do in fact see stuff happening.

 The index size itself is about 270GB (we are hoping to support up to
 500GB-1TB), and we have supplied the system with ~3TB disk space.

 Any tips on what could be happening?
 notes: we have never run an optimize yet.
          we have never deleted from the system yet.


 The merge thread appears to be the one 'never returning':
 Lucene Merge Thread #0 - Thread t@41
   java.lang.Thread.State: RUNNABLE
        at sun.nio.ch.FileDispatcher.pread0(Native Method)
        at sun.nio.ch.FileDispatcher.pread(FileDispatcher.java:31)
        at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:234)
        at sun.nio.ch.IOUtil.read(IOUtil.java:210)
        at sun.nio.ch.FileChannelImpl.read(FileChannelImpl.java:622)
        at
 org.apache.lucene.store.NIOFSDirectory$NIOFSIndexInput.readInternal(NIOFSDirectory.java:161)
        at
 org.apache.lucene.store.BufferedIndexInput.readBytes(BufferedIndexInput.java:139)
        at
 org.apache.lucene.store.BufferedIndexInput.readBytes(BufferedIndexInput.java:94)
        at org.apache.lucene.store.DataOutput.copyBytes(DataOutput.java:176)
        at
 org.apache.lucene.index.FieldsWriter.addRawDocuments(FieldsWriter.java:209)
        at
 org.apache.lucene.index.SegmentMerger.copyFieldsNoDeletions(SegmentMerger.java:424)
        at
 org.apache.lucene.index.SegmentMerger.mergeFields(SegmentMerger.java:332)
        at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:153)
        at 
 org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4053)
        at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3645)
        at
 org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:339)
        at
 org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:407)


 Some ptrace output:
 23178 pread(172,
 \270\316\276\2\245\371\274\2\271\316\276\2\272\316\276\2\273\316\276\2\274\316\276\2\275\316\276\2\276\316\276\2...,
 4096, 98004192) = 4096 0.09
 23178 pread(172,
 \245\371\274\2\271\316\276\2\272\316\276\2\273\316\276\2\274\316\276\2\275\316\276\2\276\316\276\2\277\316\276\2...,
 4096, 98004196) = 4096 0.09
 23178 pread(172,
 \271\316\276\2\272\316\276\2\273\316\276\2\274\316\276\2\275\316\276\2\276\316\276\2\277\316\276\2\300\316\276\2...,
 4096, 98004200) = 4096 0.08
 23178 pread(172,
 \272\316\276\2\273\316\276\2\274\316\276\2\275\316\276\2\276\316\276\2\277\316\276\2\300\316\276\2\301\316\276\2...,
 4096, 98004204) = 4096 0.08
 23178 pread(172,
 \273\316\276\2\274\316\276\2\275\316\276\2\276\316\276\2\277\316\276\2\300\316\276\2\301\316\276\2\302\316\276\2...,
 4096, 98004208) = 4096 0.08
 23178 pread(172,
 \274\316\276\2\275\316\276\2\276\316\276\2\277\316\276\2\300\316\276\2\301\316\276\2\302\316\276\2\367\343\274\2...,
 4096, 98004212) = 4096 0.09
 23178 pread(172,
 \275\316\276\2\276\316\276\2\277\316\276\2\300\316\276\2\301\316\276\2\302\316\276\2\367\343\274\2\246\371\274\2...,
 4096, 98004216) = 4096 0.08
 23178 pread(172,
 \276\316\276\2\277\316\276\2\300\316\276\2\301\316\276\2\302\316\276\2\367\343\274\2\246\371\274\2\303\316\276\2...,
 4096, 98004220) = 4096 0.09
 23178 pread(172,
 \277\316\276\2\300\316\276\2\301\316\276\2\302\316\276\2\367\343\274\2\246\371\274\2\303\316\276\2\304\316\276\2...,
 4096, 98004224) = 4096 0.13
 22688 ... futex resumed )             = -1 ETIMEDOUT (Connection timed
 out) 0.051276
 23178 pread(172,
 \300\316\276\2\301\316\276\2\302\316\276\2\367\343\274\2\246\371\274\2\303\316\276\2\304\316\276\2\305\316\276\2...,
 4096, 98004228) = 4096 0.10
 22688 futex(0x464a9f28, FUTEX_WAKE_PRIVATE, 1
 23178 pread(172,
 \301\316\276\2\302\316\276\2\367\343\274\2\246\371\274\2\303\316\276\2\304\316\276\2\305\316\276\2\306\316\276\2...,
 4096, 98004232) = 4096 0.10
 22688 ... futex resumed )             = 0 0.51
 23178 pread(172,
 \302\316\276\2\367\343\274\2\246\371\274\2\303\316\276\2\304\316\276\2\305\316\276\2\306\316\276\2\307\316\276\2...,
 4096, 98004236) = 4096 0.10
 22688 clock_gettime(CLOCK_MONOTONIC,
 23178 

Re: Help with explain query syntax

2011-03-08 Thread Yonik Seeley
It's probably the WordDelimiterFilter:

 org.apache.solr.analysis.WordDelimiterFilterFactory args:{preserveOriginal:
 1 splitOnCaseChange: 1 generateNumberParts: 1 catenateWords: 0
 generateWordParts: 1 catenateAll: 0 catenateNumbers: 0 }

Get rid of the preserveOriginal=1 in the query analyzer.
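
(In other words, a query-analyzer filter line like this, sketched from the
args quoted above with preserveOriginal turned off:)

    <filter class="solr.WordDelimiterFilterFactory" preserveOriginal="0"
            splitOnCaseChange="1" generateWordParts="1" generateNumberParts="1"
            catenateWords="0" catenateNumbers="0" catenateAll="0"/>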

-Yonik
http://lucidimagination.com

On Tue, Mar 1, 2011 at 9:01 AM, Glòria Martínez
gloria.marti...@careesma.com wrote:
 Hello,

 I can't understand why this query is not matching anything. Could someone
 help me please?

 *Query*
 http://localhost:8894/solr/select?q=linguajob.pl&qf=company_name&wt=xml&qt=dismax&debugQuery=on&explainOther=id%3A1

 <response>
 <lst name="responseHeader">
 <int name="status">0</int>
 <int name="QTime">12</int>
 <lst name="params">
 <str name="explainOther">id:1</str>
 <str name="debugQuery">on</str>
 <str name="q">linguajob.pl</str>
 <str name="qf">company_name</str>
 <str name="wt">xml</str>
 <str name="qt">dismax</str>
 </lst>
 </lst>
 <result name="response" numFound="0" start="0"/>
 <lst name="debug">
 <str name="rawquerystring">linguajob.pl</str>
 <str name="querystring">linguajob.pl</str>
 <str name="parsedquery">
 +DisjunctionMaxQuery((company_name:"(linguajob.pl linguajob) pl")~0.01) ()
 </str>
 <str name="parsedquery_toString">
 +(company_name:"(linguajob.pl linguajob) pl")~0.01 ()
 </str>
 <lst name="explain"/>
 <str name="otherQuery">id:1</str>
 <lst name="explainOther">
 <str name="1">

 0.0 = (NON-MATCH) Failure to meet condition(s) of required/prohibited
 clause(s)
   0.0 = no match on required clause (company_name:"(linguajob.pl linguajob)
 pl") *- What does this syntax (field:"(token1 token2) token3") mean?*
     0.0 = (NON-MATCH) fieldWeight(company_name:"(linguajob.pl linguajob) pl"
 in 0), product of:
       0.0 = tf(phraseFreq=0.0)
       1.6137056 = idf(company_name:"(linguajob.pl linguajob) pl")
       0.4375 = fieldNorm(field=company_name, doc=0)
 </str>
 </lst>
 <str name="QParser">DisMaxQParser</str>
 <null name="altquerystring"/>
 <null name="boostfuncs"/>
 <lst name="timing">
 ...
 </response>



 There's only one document indexed:

 *Document*
 http://localhost:8894/solr/select?q=1&qf=id&wt=xml&qt=dismax
 <response>
   <lst name="responseHeader">
     <int name="status">0</int>
     <int name="QTime">2</int>
     <lst name="params">
       <str name="qf">id</str>
       <str name="wt">xml</str>
       <str name="qt">dismax</str>
       <str name="q">1</str>
     </lst>
   </lst>
   <result name="response" numFound="1" start="0">
     <doc>
       <str name="company_name">LinguaJob.pl</str>
       <str name="id">1</str>
       <int name="status">6</int>
       <date name="timestamp">2011-03-01T11:14:24.553Z</date>
     </doc>
   </result>
 </response>

 *Solr Admin Schema*
 Field: company_name
 Field Type: text
 Properties: Indexed, Tokenized, Stored
 Schema: Indexed, Tokenized, Stored
 Index: Indexed, Tokenized, Stored

 Position Increment Gap: 100

 Index Analyzer: org.apache.solr.analysis.TokenizerChain Details
 Tokenizer Class: org.apache.solr.analysis.WhitespaceTokenizerFactory
 Filters:
 schema.UnicodeNormalizationFilterFactory args:{composed: false
 remove_modifiers: true fold: true version: java6 remove_diacritics: true }
 org.apache.solr.analysis.StopFilterFactory args:{words: stopwords.txt
 ignoreCase: true enablePositionIncrements: true }
 org.apache.solr.analysis.WordDelimiterFilterFactory args:{preserveOriginal:
 1 splitOnCaseChange: 1 generateNumberParts: 1 catenateWords: 1
 generateWordParts: 1 catenateAll: 0 catenateNumbers: 1 }
 org.apache.solr.analysis.LowerCaseFilterFactory args:{}
 org.apache.solr.analysis.RemoveDuplicatesTokenFilterFactory args:{}

 Query Analyzer: org.apache.solr.analysis.TokenizerChain Details
 Tokenizer Class: org.apache.solr.analysis.WhitespaceTokenizerFactory
 Filters:
 schema.UnicodeNormalizationFilterFactory args:{composed: false
 remove_modifiers: true fold: true version: java6 remove_diacritics: true }
 org.apache.solr.analysis.SynonymFilterFactory args:{synonyms: synonyms.txt
 expand: true ignoreCase: true }
 org.apache.solr.analysis.StopFilterFactory args:{words: stopwords.txt
 ignoreCase: true }
 org.apache.solr.analysis.WordDelimiterFilterFactory args:{preserveOriginal:
 1 splitOnCaseChange: 1 generateNumberParts: 1 catenateWords: 0
 generateWordParts: 1 catenateAll: 0 catenateNumbers: 0 }
 org.apache.solr.analysis.LowerCaseFilterFactory args:{}
 org.apache.solr.analysis.RemoveDuplicatesTokenFilterFactory args:{}

 Docs: 1
 Distinct: 5
 Top 5 terms
 term frequency
 lingua 1
 linguajob.pl 1
 linguajobpl 1
 pl 1
 job 1

 *Solr Analysis*
 Field name: company_name
 Field value (Index): LinguaJob.pl
 Field value (Query): linguajob.pl

 *Index Analyzer*

 org.apache.solr.analysis.WhitespaceTokenizerFactory {}
 term position 1
 term text LinguaJob.pl
 term type word
 source start,end 0,12
 payload

 schema.UnicodeNormalizationFilterFactory {composed=false,
 remove_modifiers=true, fold=true, version=java6, remove_diacritics=true}
 term position 1
 term text LinguaJob.pl
 term type word
 source start,end 0,12
 payload

 org.apache.solr.analysis.StopFilterFactory {words=stopwords.txt,
 ignoreCase=true, enablePositionIncrements=true}
 term position 1
 term text LinguaJob.pl
 term type word
 source start,end 

Custom search filters

2011-03-08 Thread Mark
Hi all, I am trying to use a custom search filter 
(org.apache.lucene.search.Filter) but I am unsure of where I should 
configure this.


Would I have to create my own SearchHandler to wrap this logic in? Any 
examples/suggestions out there?


Thanks
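
One possible route (a sketch assuming Solr 1.4-era APIs, not the only way):
rather than a custom SearchHandler, expose the Filter through a custom
QParserPlugin that wraps it in a ConstantScoreQuery. The stand-in Filter below
matches every document and is purely illustrative, and the plugin/class names
are made up:

  import java.io.IOException;
  import org.apache.lucene.index.IndexReader;
  import org.apache.lucene.queryParser.ParseException;
  import org.apache.lucene.search.ConstantScoreQuery;
  import org.apache.lucene.search.DocIdSet;
  import org.apache.lucene.search.Filter;
  import org.apache.lucene.search.Query;
  import org.apache.lucene.util.OpenBitSet;
  import org.apache.solr.common.params.SolrParams;
  import org.apache.solr.common.util.NamedList;
  import org.apache.solr.request.SolrQueryRequest;
  import org.apache.solr.search.QParser;
  import org.apache.solr.search.QParserPlugin;

  public class MyFilterQParserPlugin extends QParserPlugin {
    public void init(NamedList args) {}

    public QParser createParser(String qstr, SolrParams localParams,
                                SolrParams params, SolrQueryRequest req) {
      return new QParser(qstr, localParams, params, req) {
        public Query parse() throws ParseException {
          // Stand-in Filter that matches everything; swap in your own
          // org.apache.lucene.search.Filter implementation here.
          Filter custom = new Filter() {
            public DocIdSet getDocIdSet(IndexReader reader) throws IOException {
              OpenBitSet bits = new OpenBitSet(reader.maxDoc());
              bits.set(0, reader.maxDoc());
              return bits;
            }
          };
          // ConstantScoreQuery lets the Filter run anywhere a Query can.
          return new ConstantScoreQuery(custom);
        }
      };
    }
  }

Registered in solrconfig.xml with
<queryParser name="myfilter" class="com.example.MyFilterQParserPlugin"/>,
it should then be usable as fq={!myfilter} on any stock handler, with no
custom SearchHandler needed.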


Re: Use of multiple tomcat instance and shards.

2011-03-08 Thread rajini maski
Thank you all.

Tommaso, thanks. I will follow the links you suggested.
Erick, it is Solr 1.4.1.

Regards,
Rajani Maski






On Tue, Mar 8, 2011 at 10:16 PM, Tommaso Teofili
tommaso.teof...@gmail.comwrote:

 Just one more hint: I didn't mention it in the previous email since I
 imagine the scenario you explained doesn't allow it, but you could also
 check SolrCloud and its distributed requests [1].
 Cheers,
 Tommaso

 [1] : http://wiki.apache.org/solr/SolrCloud#Distributed_Requests

 2011/3/8 Tommaso Teofili tommaso.teof...@gmail.com

  Hi Rajani,
 
 
 
  2011/3/8 rajini maski rajinima...@gmail.com
 
 
  Tommaso, please can you share any link that explains how to enable and do
  load balancing on the machines you mentioned above?
 
 
 
 
  if you're querying Solr via SolrJ [1] you could use the LBHttpSolrServer [2];
  otherwise, if you still want Solr to be responsible for load balancing,
  implement a custom handler which wraps it (see [3], and the sketch after the
  link list below).
  Consider also that this load balancing often gets done using a VIP [4] or
  an Apache HTTP server in front of Solr.
  Hope this helps,
  Tommaso
 
 
  [1] : http://wiki.apache.org/solr/Solrj
  [2] : http://wiki.apache.org/solr/LBHttpSolrServer
  [3] : http://markmail.org/thread/25jrko5s7wlmzjf7
  [4] : http://en.wikipedia.org/wiki/Virtual_IP_address
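
To make [2] concrete, a minimal SolrJ sketch of querying through
LBHttpSolrServer (both host URLs are hypothetical):

  import org.apache.solr.client.solrj.SolrQuery;
  import org.apache.solr.client.solrj.impl.LBHttpSolrServer;
  import org.apache.solr.client.solrj.response.QueryResponse;

  public class LBExample {
    public static void main(String[] args) throws Exception {
      // Requests are round-robined across the listed instances; servers
      // that stop responding are skipped until they come back.
      LBHttpSolrServer lb = new LBHttpSolrServer(
          "http://solr1:8080/solr", "http://solr2:8080/solr");
      QueryResponse rsp = lb.query(new SolrQuery("*:*"));
      System.out.println("hits: " + rsp.getResults().getNumFound());
    }
  }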
 
 
 



True master-master fail-over without data gaps

2011-03-08 Thread Otis Gospodnetic
Hello,

What are some common or good ways to handle indexing (master) fail-over?
Imagine you have a continuous stream of incoming documents that you have to 
index without losing any of them (or with losing as few of them as possible).  
How do you set up your masters?
In other words, you can't just have 2 masters where the secondary is the 
Repeater (or Slave) of the primary master and replicates the index 
periodically: 
you need to have 2 masters that are in sync at all times!
How do you achieve that?

* Do you just put N masters behind an LB VIP, configure them all to point to the 
index on some shared storage (e.g. SAN), and count on the LB to fail over to the 
secondary master when the primary becomes unreachable?
If so, how do you deal with index locks?  You use the Native lock and count on 
it disappearing when the primary master goes down?  That means you count on the 
whole JVM process dying, which may not be the case...
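
For reference, the lock type in question is the one chosen in solrconfig.xml;
a minimal sketch of that setting:

  <mainIndex>
    ...
    <lockType>native</lockType>  <!-- alternatives: simple, single -->
  </mainIndex>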

* Or do you use tools like DRBD, Corosync, Pacemaker, etc. to keep 2 masters 
with 2 separate indices in sync, while making sure you write to only 1 of them 
via LB VIP or otherwise?

* Or ...


This thread is on a similar topic, but is inconclusive:
  http://search-lucene.com/m/aOsyN15f1qd1

Here is another similar thread, but this one doesn't cover how 2 masters are 
kept in sync at all times:
  http://search-lucene.com/m/aOsyN15f1qd1

Thanks,
Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/



Re: NRT in Solr

2011-03-08 Thread Otis Gospodnetic
I think once this starts yielding matches:

trunk/solr$ find . -name \*java | xargs grep IndexReader | grep IndexWriter

...we'll know NRT has landed.

Until then: http://wiki.apache.org/solr/NearRealtimeSearch

Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/



- Original Message 
 From: Jae Joo jaejo...@gmail.com
 To: solr-user@lucene.apache.org
 Sent: Tue, March 8, 2011 4:27:41 PM
 Subject: NRT in Solr
 
 Hi,
 Is NRT in Solr 4.0 from trunk? I have checkouted from Trunk, but could  not
 find the configuration for NRT.
 
 Regards
 
 Jae
 


RE: True master-master fail-over without data gaps

2011-03-08 Thread Jonathan Rochkind
I'd honestly think about buffering the incoming documents in some store that's 
actually made for fail-over persistence reliability, maybe CouchDB or something. 
That takes care of not losing anything, and the problem then becomes keeping the 
Solr master indexes in sync with the actual persistent store; I'm still not sure 
how, but I think it's a simpler problem. The right tool for the right job: that 
kind of failover persistence is not Solr's specialty. 
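
A minimal sketch of that buffer-and-replay idea (the DurableBuffer interface is
hypothetical and stands in for CouchDB or whatever crash-safe store you pick):

  import java.util.List;
  import org.apache.solr.client.solrj.SolrServer;
  import org.apache.solr.common.SolrInputDocument;

  // Hypothetical durable-buffer API; not a real library.
  interface DurableBuffer {
    List<SolrInputDocument> fetchUnindexed(int max);
    void markIndexed(List<SolrInputDocument> docs);
  }

  public class BufferedIndexer {
    // Drain loop, run against whichever master currently holds the VIP.
    public static void drain(DurableBuffer buffer, SolrServer solr) throws Exception {
      while (true) {
        List<SolrInputDocument> batch = buffer.fetchUnindexed(100);
        if (batch.isEmpty()) { Thread.sleep(500); continue; }
        solr.add(batch);
        solr.commit();
        // Mark as indexed only after Solr acknowledges the batch; a crash
        // between add() and here just causes a harmless re-index on replay.
        buffer.markIndexed(batch);
      }
    }
  }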

From: Otis Gospodnetic [otis_gospodne...@yahoo.com]
Sent: Tuesday, March 08, 2011 11:45 PM
To: solr-user@lucene.apache.org
Subject: True master-master fail-over without data gaps

[snip -- message quoted in full earlier in this digest]