RE: Document row in solr Result

2011-09-12 Thread Pierre GOSSE
Hi Eric,

If you want a query informing a customer of its product's row at any given 
time, the easiest way is to filter on submission dates greater than this 
product's and return the result count. If 500 products have a later 
submission date, your row number is 501.
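A minimal sketch of that counting trick, using the field names from Eric's example below. The endpoint path and date format are assumptions, and only the URL building and the arithmetic are shown, not an actual HTTP call:

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class ProductRow {
    // rows=0 query counting products in the category submitted strictly
    // after the given product's own submission date (exclusive range).
    static String countQuery(String category, String submissionDate) {
        String q = "category:" + category;
        String fq = "submissiondate:{" + submissionDate + " TO *}";
        return "/select?q=" + URLEncoder.encode(q, StandardCharsets.UTF_8)
             + "&fq=" + URLEncoder.encode(fq, StandardCharsets.UTF_8)
             + "&rows=0";
    }

    // numFound of that query is the number of products ranked above ours
    // when sorting by submissiondate desc, so the 1-based row is numFound + 1.
    static long row(long numFound) {
        return numFound + 1;
    }

    public static void main(String[] args) {
        System.out.println(countQuery("iphone", "2011-08-11T17:22:00Z"));
        System.out.println(row(9862)); // 9862 newer products -> row 9863
    }
}
```

Because the count request uses rows=0, no documents are fetched; Solr only returns the hit count, which is much cheaper than scanning the result list client-side.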

Hope this helps,

Pierre


-Original Message-
From: Eric Grobler [mailto:impalah...@googlemail.com]
Sent: Monday, September 12, 2011 11:00
To: solr-user@lucene.apache.org
Subject: Re: Document row in solr Result

Hi Manish,

Thank you for your time.

For upselling reasons I want to inform the customer that:
"Your product is on the last page of the search result. However, click here
to put your product back on the first page..."


Here is an example:
I have a phone with productid 635001 in the iphone category.
When I sort this category by submissiondate this product will be near the
end of the result (on row 9863 in this example).
At the moment I have to scan nearly 10,000 rows in the client to determine
the position of this product.
Is there a more efficient way to find the position of a specific document in
a resultset without returning the full result?

q=category:iphone
fl=productid
sort=submissiondate desc
rows=10000

 row  productid  submissiondate
   1  656569     2011-09-12 08:12
   2  656468     2011-09-12 08:03
   3  656201     2011-09-11 23:41
 ...
9863  635001     2011-08-11 17:22
 ...
9922  634423     2011-08-10 21:51

Regards
Ericz

On Mon, Sep 12, 2011 at 9:38 AM, Manish Bafna manish.bafna...@gmail.com wrote:

 You might not be able to find the row index.
 Can you post your query in detail, with the kind of inputs and outputs you
 are expecting?

 On Mon, Sep 12, 2011 at 2:01 PM, Eric Grobler impalah...@googlemail.com
 wrote:

  Hi Manish,
 
  Thanks for your reply - but how will that return the row index of the
  original query?
 
  Regards
  Ericz
 
  On Mon, Sep 12, 2011 at 9:24 AM, Manish Bafna manish.bafna...@gmail.com
  wrote:
 
   fq - filter query parameter searches within the results.
  
   On Mon, Sep 12, 2011 at 1:49 PM, Eric Grobler 
 impalah...@googlemail.com
   wrote:
  
 Hi Solr experts,

 If you have a site with products sorted by submission date, the product of a
 customer might be on page 1 on the first day, and then move down to page x
 as other customers submit newer entries.

 To find the row of a product you can of course run the query and loop
 through the result until you find the specific productid, like:
 q=category:myproducttype
 fl=productid
 sort=submissiondate desc
 rows=10000

 But is there perhaps a more efficient way to do this? Maybe a special
 syntax to search within the result.

 Thanks
 Ericz
   
  
 



RE: What is the different?

2011-07-22 Thread Pierre GOSSE
Hi,

Have you checked the queries using the debugQuery=true parameter? This could 
give some hints about what is searched in both cases.
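As a sketch of what that check might look like (URL building only, no HTTP call; the path is an assumption based on a default Solr install), issue both queries with debugQuery=true and compare the parsedquery entries in the two responses:

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class DebugQueries {
    // Adds debugQuery=true so the response carries rawquerystring,
    // parsedquery and per-document score explanations.
    static String debugUrl(String q) {
        return "/solr/select?q=" + URLEncoder.encode(q, StandardCharsets.UTF_8)
             + "&rows=0&debugQuery=true";
    }

    public static void main(String[] args) {
        // Comparing the parsedquery of these two usually explains
        // surprising differences in hit counts.
        System.out.println(debugUrl("(change management)"));
        System.out.println(debugUrl("(change management) AND domain_ids:(0^1.3 OR 1)"));
    }
}
```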

Pierre

-Original Message-
From: cnyee [mailto:yeec...@gmail.com]
Sent: Friday, July 22, 2011 05:14
To: solr-user@lucene.apache.org
Subject: What is the different?

Hi,

I have two queries:

(1) q = (change management)
(2) q = (change management) AND domain_ids:(0^1.3 OR 1)

The purpose of (2) is to boost the records with domain_ids=0.
In my database all records have domain_ids = 0 or 1, so domain_ids:(0 OR 1)
will always return the full database.

Now my question is: query (2) returns 5000+ results, but query (1) returns
700+ results.

Can somebody enlighten me on the reason behind such a vast
difference in the number of results?

Many thanks in advance.

Yee

--
View this message in context: 
http://lucene.472066.n3.nabble.com/What-is-the-different-tp3190278p3190278.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: commit time and lock

2011-07-22 Thread Pierre GOSSE
Solr still responds to search queries during a commit; only new indexing 
requests will have to wait (until the end of the commit?). So I don't think your users 
will experience increased response time during commits (unless your server is 
much undersized).

Pierre

-Original Message-
From: Jonty Rhods [mailto:jonty.rh...@gmail.com]
Sent: Thursday, July 21, 2011 20:27
To: solr-user@lucene.apache.org
Subject: Re: commit time and lock

Actually I am worried about the response time. I am committing around 500
docs every 5 minutes. As I understand it (correct me if I am wrong), at the
time of committing the Solr server stops responding. My concern is how to
minimize the response time so users do not need to wait, or whether some other
logic is required for my case. Please suggest.

regards
jonty

On Tuesday, June 21, 2011, Erick Erickson erickerick...@gmail.com wrote:
 What is it you want help with? You haven't told us what the
 problem you're trying to solve is. Are you asking how to
 speed up indexing? What have you tried? Have you
 looked at: http://wiki.apache.org/solr/FAQ#Performance?

 Best
 Erick

 On Tue, Jun 21, 2011 at 2:16 AM, Jonty Rhods jonty.rh...@gmail.com wrote:
 I am using solrj to index the data. I have around 5000 docs indexed. As at
 the time of commit, due to the lock, the server stops giving responses, so I was
 calculating commit time:

 double starttemp = System.currentTimeMillis();
 server.add(docs);
 server.commit();
 System.out.println("total time in commit = " + (System.currentTimeMillis() -
 starttemp)/1000);

 It is taking around 9 seconds to commit the 5000 docs with 15 fields. However I
 am not sure about the lock time of the index, whether it starts
 at server.add(docs) or only at server.commit().

 If I am changing from above to following

 server.add(docs);
 double starttemp = System.currentTimeMillis();
 server.commit();
 System.out.println("total time in commit = " + (System.currentTimeMillis() -
 starttemp)/1000);

 then the commit time becomes less than 1 second. I am not sure which one is
 right.

 please help.

 regards
 Jonty
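The two measurements above disagree most likely because the first one also times server.add(docs). Timing each call separately makes this visible; a self-contained sketch, with the SolrJ calls replaced by placeholders so it runs on its own:

```java
public class CommitTiming {
    // Measures one step in milliseconds. In Jonty's code the two steps
    // would be server.add(docs) and server.commit(); the SolrJ calls are
    // omitted here so the sketch stays self-contained.
    static long timeMillis(Runnable step) {
        long start = System.nanoTime();
        step.run();
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        long addMs = timeMillis(() -> { /* server.add(docs) would go here */ });
        long commitMs = timeMillis(() -> { /* server.commit() would go here */ });
        System.out.println("add = " + addMs + " ms, commit = " + commitMs + " ms");
    }
}
```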




RE: commit time and lock

2011-07-22 Thread Pierre GOSSE
Solr will still respond to searches during optimization, but commits will have to 
wait for the end of the optimization process.

During optimization a new index is generated on disk by merging every single 
file of the current index into one big file, so your server will be busy, 
especially regarding disk access. This may alter your response time and has a 
very negative effect on the replication of the index if you have a master/slave 
architecture.

I've read here that optimization is not always a requirement for an 
efficient index, due to some low-level changes in Lucene 3.x, so maybe you 
don't really need optimization. What version of Solr are you using? Maybe 
someone can point toward a relevant link about optimization other than the Solr 
wiki:
http://wiki.apache.org/solr/SolrPerformanceFactors#Optimization_Considerations

Pierre
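One way to decide whether a scheduled optimize is worth its 13 minutes is to look at how many deleted documents the index actually carries. A sketch of such a check; the numDocs/maxDoc names mirror the statistics Solr exposes, while the threshold value is an arbitrary assumption:

```java
public class OptimizePolicy {
    // maxDoc still counts deleted documents; numDocs does not. A simple
    // rule of thumb is to optimize only when the deleted fraction exceeds
    // some threshold (the 5% used below is made up, not a recommendation).
    static boolean shouldOptimize(int numDocs, int maxDoc, double maxDeletedRatio) {
        int deleted = maxDoc - numDocs;
        return maxDoc > 0 && (double) deleted / maxDoc > maxDeletedRatio;
    }

    public static void main(String[] args) {
        // 1000 of 10000 docs deleted (10%) exceeds a 5% threshold.
        System.out.println(shouldOptimize(9000, 10000, 0.05));
    }
}
```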


-Original Message-
From: Jonty Rhods [mailto:jonty.rh...@gmail.com]
Sent: Friday, July 22, 2011 12:45
To: solr-user@lucene.apache.org
Subject: Re: commit time and lock

Thanks for the clarity.

One more thing I want to know about optimization.

Right now I am planning to optimize the server every 24 hours. Optimization
also takes time (last time it took around 13 minutes), so I want to know:

1. When optimization is in process, will the Solr server respond or not?
2. If the server will not respond, how can the optimization be made faster, or is
there another way to do it so our users will not have to wait for the
optimization process to finish?

regards
Jonty





RE: commit time and lock

2011-07-22 Thread Pierre GOSSE
Hi Mark

I've read that in a thread titled "Weird optimize performance degradation", 
where Erick Erickson states that "Older versions of Lucene would search faster 
on an optimized index, but this is no longer necessary", and more recently in 
a thread you initiated a month ago, "Question about optimization".

I'll also be very interested if anyone has more precise ideas/data on the 
benefits and tradeoffs of optimize vs merge...

Pierre


-Original Message-
From: Marc SCHNEIDER [mailto:marc.schneide...@gmail.com]
Sent: Friday, July 22, 2011 15:45
To: solr-user@lucene.apache.org
Subject: Re: commit time and lock

Hello,

Pierre, can you tell us where you read that?
"I've read here that optimization is not always a requirement to have an
efficient index, due to some low level changes in lucene 3.xx"

Marc.




RE: commit time and lock

2011-07-22 Thread Pierre GOSSE
Merging does not happen often enough to keep deleted documents to a low enough 
count?

Maybe there's a need for partial optimization in Solr, meaning 
that segments with too many deleted documents could be copied to a new file 
without the unnecessary data. That way, cleaning deleted data could be compatible 
with having light replications.

I'm worried by this idea of deleted documents influencing relevance scores; any 
pointer to how important this influence may be?

Pierre

-Original Message-
From: Shawn Heisey [mailto:s...@elyograg.org]
Sent: Friday, July 22, 2011 16:42
To: solr-user@lucene.apache.org
Subject: Re: commit time and lock

On 7/22/2011 8:23 AM, Pierre GOSSE wrote:
 I've read that in a thread title  Weird optimize performance degradation, 
 where Erick Erickson states that Older versions of Lucene would search 
 faster on an optimized index, but this is no longer necessary., and more 
 recently in a thread you initiated a month ago Question about optimization.

 I'll also be very interested if anyone had a more precise idea/datas of 
 benefits and tradeoff of optimize vs merge ...

My most recent testing has been with Solr 3.2.0.  I have noticed some 
speedup after optimizing an index, but the gain is not 
earth-shattering.  My index consists of 7 shards.  One of them is small, 
and receives all new documents every two minutes.  The others are large, 
and aside from deletes, are mostly static.  Once a day, the oldest data 
is distributed from the small shard to its proper place in the other six 
shards.

The small shard is optimized once an hour, and usually takes less than a 
minute.  I optimize one large shard every day, so each one gets 
optimized once every six days.  That optimize takes 10-15 minutes.  The 
only reason that I optimize is to remove deleted documents, whatever 
speedup I get is just icing on the cake.  Deleted documents take up 
space and continue to influence the relevance scoring of queries, so I 
want to remove them.
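Shawn's one-large-shard-per-day schedule can be sketched as a simple round-robin pick based on the current date; the shard names below are invented, not taken from his setup:

```java
import java.time.LocalDate;

public class ShardOptimizeRotation {
    // Picks which large shard to optimize today, cycling through them so
    // each shard is optimized once every shards.length days.
    static String shardForToday(String[] shards, LocalDate today) {
        int idx = (int) (today.toEpochDay() % shards.length);
        return shards[idx];
    }

    public static void main(String[] args) {
        String[] large = {"shard1", "shard2", "shard3", "shard4", "shard5", "shard6"};
        System.out.println(shardForToday(large, LocalDate.now()));
    }
}
```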

Thanks,
Shawn



RE: searching a subset of SOLR index

2011-07-05 Thread Pierre GOSSE
The limit will always be logical if you have all documents in the same index, 
but filters are very efficient when working with a subset of your index, 
especially if you reuse the same filter for many queries, since there is a cache.

If your subsets are always the same subsets, maybe you could use shards. But 
we would need to know more about what you intend to do to point to an adequate 
solution.
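A sketch of the filter-query approach for Jame's id-range subset; the field name id and the URL layout are assumptions, and the id field would need to be indexed as a type on which range queries behave as expected:

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class SubsetSearch {
    // Restricts a search to documents whose id is in [lowId, highId] via a
    // filter query. Solr caches fq results, so reusing the same subset
    // filter across many queries is cheap after the first one.
    static String subsetUrl(String userQuery, long lowId, long highId) {
        String fq = "id:[" + lowId + " TO " + highId + "]";
        return "/solr/select?q=" + URLEncoder.encode(userQuery, StandardCharsets.UTF_8)
             + "&fq=" + URLEncoder.encode(fq, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(subsetUrl("some query string", 100, 1000));
    }
}
```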

Pierre

-Original Message-
From: Jame Vaalet [mailto:jvaa...@capitaliq.com]
Sent: Tuesday, July 5, 2011 11:10
To: solr-user@lucene.apache.org
Subject: RE: searching a subset of SOLR index

Thanks.
But does this range query just limit the universe logically, or does it have any 
mechanism to limit it physically as well? Do we gain any time by using 
the range query?

Regards,
JAME VAALET


-Original Message-
From: shashi@gmail.com [mailto:shashi@gmail.com] On Behalf Of Shashi 
Kant
Sent: Tuesday, July 05, 2011 2:26 PM
To: solr-user@lucene.apache.org
Subject: Re: searching a subset of SOLR index

Range query


On Tue, Jul 5, 2011 at 4:37 AM, Jame Vaalet jvaa...@capitaliq.com wrote:
 Hi,
 Let's say I have 10^10 documents in an index, with the unique id being the document
 id, which is assigned to each of them from 1 to 10^10.
 Now I want to search for a particular query string in a subset of these documents,
 say document ids 100 to 1000.

 The question here is: will Solr be able to search just in this set of documents
 rather than the entire index? If yes, what should the query be to limit the search
 to this subset?

 Regards,
 JAME VAALET
 Software Developer
 EXT :8108
 Capital IQ




RE: searching a subset of SOLR index

2011-07-05 Thread Pierre GOSSE
From what you tell us, I guess a separate index for the website docs would be 
best. If you fear that requests from the Windows service would cripple your web 
site's performance, why not have a totally separate index on another server, 
and have your website documents indexed in both indexes?

Pierre

-Original Message-
From: Jame Vaalet [mailto:jvaa...@capitaliq.com]
Sent: Tuesday, July 5, 2011 13:14
To: solr-user@lucene.apache.org
Subject: RE: searching a subset of SOLR index

I have got two applications:

1. Website
	The website will enable any user to search the document repository,
and the set they search on is known as "website presentable".
2. Windows service
	The windows service will search all the documents in the repository
for a fixed set of keywords and store the results in a database. This set
is the universal set of documents in the doc repository, including the
website presentable ones.

The website is a high-priority app which should work smoothly without any
interference, whereas the windows service should run all day long, continuously,
without a break, to save results from incoming docs.
The problem here is that the website set is predefined, and I don't want the
windows service requests to Solr to slow down website requests.

Suppose I segregate the website presentable docs into a particular
core and the rest into a different core; will that solve the problem?
I have also read about multiple ports for listening to requests from different
apps; can this be used?



Regards,
JAME VAALET





RE: searching a subset of SOLR index

2011-07-05 Thread Pierre GOSSE
It is redundancy. You have to balance the cost of redundancy against the cost in 
performance of having your web index queried by your windows service. If your 
windows service is not too aggressive in its requests, go for shards.

Pierre

-Original Message-
From: Jame Vaalet [mailto:jvaa...@capitaliq.com]
Sent: Tuesday, July 5, 2011 15:05
To: solr-user@lucene.apache.org
Subject: RE: searching a subset of SOLR index

But in case the website docs contribute around 50% of the entire doc set, why 
recreate the indexes? Don't you think it's redundancy?
Can two web apps (Solr instances) share a single index file and search on it 
without interfering with each other?


Regards,
JAME VAALET
Software Developer 
EXT :8108
Capital IQ




RE: Multiple indexes

2011-06-17 Thread Pierre GOSSE
 I think there are reasons to use separate indexes for each document type
 but do combined searches on these indexes
 (for example if you need separate TFs for each document type).

I wonder if, in this precise case, it wouldn't be pertinent to have a single 
index with the various document types, each having their own field set. 
Isn't TF calculated field by field?


RE: Document match with no highlight

2011-05-13 Thread Pierre GOSSE
In WordDelimiterFilter the parameters catenateNumbers, catenateWords and 
catenateAll are set to 1. These parameters add overlapping tokens, which could 
explain why you hit the bug described in the Jira issue I mentioned.

As I understand WordDelimiterFilter:
"0176 R3 1.5 TO" should be tokenized with token "R3" overlapping "R" and 
"3", and "15" overlapping "1" and "5".

These parameters are set to 0 for the query analyzer, but having them set to 1 
should not correct your problem unless you search for "R3 1.5".
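A rough, self-contained illustration of why the catenate parameters produce overlapping tokens; this is not Solr's actual WordDelimiterFilter code, just the splitting idea reduced to letter/digit boundaries:

```java
import java.util.ArrayList;
import java.util.List;

public class WordDelimiterSketch {
    // Splits a word at letter/digit boundaries; with catenate enabled, the
    // original word is emitted as an extra token overlapping its parts,
    // which is the situation that trips up the highlighter.
    static List<String> tokens(String word, boolean catenate) {
        List<String> out = new ArrayList<>();
        StringBuilder part = new StringBuilder();
        for (int i = 0; i < word.length(); i++) {
            char c = word.charAt(i);
            boolean boundary = part.length() > 0
                && Character.isDigit(c) != Character.isDigit(part.charAt(part.length() - 1));
            if (boundary) { out.add(part.toString()); part.setLength(0); }
            part.append(c);
        }
        if (part.length() > 0) out.add(part.toString());
        if (catenate && out.size() > 1) out.add(word); // overlapping catenated token
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tokens("R3", true));  // [R, 3, R3]
        System.out.println(tokens("R3", false)); // [R, 3]
    }
}
```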

I think you have to either:
 - set these parameters to 0 in the index analyzer, but then your query won't match anymore,
 - wait for the correction to be released in a new Solr version,
 - use Solr trunk,
 - or backport the modifications to the lucene-highlighter version you use.

I did a backport for Solr 1.4.1 since I won't move to 3.0 for some time, so 
please ask if you have questions about how to do this.

Pierre


-Original Message-
From: Phong Dais [mailto:phong.gd...@gmail.com]
Sent: Thursday, May 12, 2011 20:06
To: solr-user@lucene.apache.org
Subject: Re: Document match with no highlight

Hi,

I read the link provided and I'll need some time to digest what it is
saying.

Here's my text fieldtype.

<fieldtype name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
            generateNumberParts="1" catenateWords="1" catenateNumbers="1"
            catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English"
            protected="protwords.txt"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
            generateNumberParts="1" catenateWords="0" catenateNumbers="0"
            catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English"
            protected="protwords.txt"/>
  </analyzer>
</fieldtype>
Also, I figured out what value in DOC_TEXT causes this issue to occur.
With a DOC_TEXT of (without the quotes):
"0176 R3 1.5 TO"

Searching for "3 1 15" returns a match with an empty highlight.
Searching for "3 1 15"~1 returns a match with a highlight.

Can anyone see anything that I'm missing?

Thanks,
P.


On Thu, May 12, 2011 at 12:27 PM, Pierre GOSSE pierre.go...@arisem.com wrote:

  "Since you're using the standard text field, this should NOT be your case."

 Sorry for the missing NOT in the previous phrase. You should not have the same
 issue given what you said, but still, it sounds very similar.

 Are you sure your fieldtype "text" has nothing special? A tokenizer or
 filter that could add some tokens in your indexed text but not in your query,
 like for example a WordDelimiter present at index time but not query time?

 Pierre

RE: Document match with no highlight

2011-05-12 Thread Pierre GOSSE
 In fact if I did "3 1 15"~1 I do get a snippet also.

Strange, I had a very similar problem, but with overlapping tokens. Since 
you're using the standard text field, this should be your case.

Maybe you could have a look at this issue, since it sounds very familiar to me:
https://issues.apache.org/jira/browse/LUCENE-3087

Pierre

-Original Message-
From: Phong Dais [mailto:phong.gd...@gmail.com]
Sent: Thursday, May 12, 2011 17:26
To: solr-user@lucene.apache.org
Subject: Re: Document match with no highlight

Hi,

<field name="DOC_TEXT" type="text" indexed="true" stored="true"/>

The type "text" is the default one that came with the default Solr 1.4
install without any modifications.

If I remove the quotes I do get snippets. In fact if I do "3 1 15"~1 I do
get a snippet also.

Hope that helps.

P.

On Thu, May 12, 2011 at 9:09 AM, Ahmet Arslan iori...@yahoo.com wrote:

   URL:
 
 http://localhost:8983/solr/select?indent=on&version=2.2&q=DOC_TEXT%3A%223+1+15%22&fq=&start=0&rows=10&fl=DOC_TEXT%2Cscore&qt=standard&wt=standard&debugQuery=on&explainOther=&hl=on&hl.fl=DOC_TEXT&hl.maxAnalyzedChars=-1
 
  XML:
  <?xml version="1.0" encoding="UTF-8"?>
  <response>
    <lst name="responseHeader">
      <int name="status">0</int>
      <int name="QTime">19</int>
      <lst name="params">
        <str name="explainOther"/>
        <str name="indent">on</str>
        <str name="hl.fl">DOC_TEXT</str>
        <str name="wt">standard</str>
        <str name="hl.maxAnalyzedChars">-1</str>
        <str name="hl">on</str>
        <str name="rows">10</str>
        <str name="version">2.2</str>
        <str name="debugQuery">on</str>
        <str name="fl">DOC_TEXT,score</str>
        <str name="start">0</str>
        <str name="q">DOC_TEXT:"3 1 15"</str>
        <str name="qt">standard</str>
        <str name="fq"/>
      </lst>
    </lst>
    <result name="response" numFound="1" start="0" maxScore="0.035959315">
      <doc>
        <float name="score">0.035959315</float>
        <arr name="DOC_TEXT"><str> ... </str></arr>
      </doc>
    </result>
    <lst name="highlighting">
      <lst name="123456"/>
    </lst>
    <lst name="debug">
      <str name="rawquerystring">DOC_TEXT:"3 1 15"</str>
      <str name="querystring">DOC_TEXT:"3 1 15"</str>
      <str name="parsedquery">PhraseQuery(DOC_TEXT:"3 1 15")</str>
      <str name="parsedquery_toString">DOC_TEXT:"3 1 15"</str>
      <lst name="explain">
        <str name="123456">
          0.035959315 = fieldWeight(DOC_TEXT:"3 1 15" in 0), product of:
            1.0 = tf(phraseFreq=1.0)
            0.92055845 = idf(DOC_TEXT: 3=1 1=1 15=1)
            0.0390625 = fieldNorm(field=DOC_TEXT, doc=0)
        </str>
      </lst>
      <str name="QParser">LuceneQParser</str>
      <arr name="filter_queries">
        <str/>
      </arr>
      <arr name="parsed_filter_queries"/>
      <lst name="timing">
        ...
      </lst>
    </lst>
  </response>


 Nothing looks suspicious.

 Can you provide two more things:
 the fieldType of DOC_TEXT
 and
 the field definition of DOC_TEXT.

 Also, do you get a snippet from the same doc when you remove the quotes from
 your query?




RE: Document match with no highlight

2011-05-12 Thread Pierre GOSSE
 Since you're using the standard text field, this should NOT be your case.

Sorry for the missing NOT in the previous phrase. Given what you said, you 
should not have the same issue, but still, it sounds very similar. 

Are you sure your fieldtype "text" has nothing special? A tokenizer or filter 
that could add some tokens to your indexed text but not to your query, like for 
example a WordDelimiterFilter present at index time and not at query time?
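If such a mismatch were the cause, it would look something like this hypothetical analyzer pair (the tokenizer and filter choices below are made up for illustration, not taken from the poster's schema):

```xml
<!-- Hypothetical index/query mismatch: the index analyzer emits extra
     catenated tokens (e.g. "3-1" -> "3", "1", "31") that the query
     analyzer never produces, so positions and offsets can differ. -->
<analyzer type="index">
  <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  <filter class="solr.WordDelimiterFilterFactory" catenateWords="1"/>
</analyzer>
<analyzer type="query">
  <tokenizer class="solr.WhitespaceTokenizerFactory"/>
</analyzer>
```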

Pierre





RE: Allowing looser matches

2011-04-13 Thread Pierre GOSSE
For (a), I don't think anything exists today providing this mechanism. 
But (b) is a good description of the dismax handler with an mm (minimum match) parameter of 66%. 
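As a sketch of (b), assuming a hypothetical core URL and field names (none of these are from the thread), the dismax request could look like:

```
# Hypothetical request; requiring 2 of the 3 terms (mm=2, or mm=66%)
# lets "Blue Wool Rugs" still match blue rugs even though "Wool"
# matches nothing, just scored lower than full matches.
http://localhost:8983/solr/select?defType=dismax&q=Blue+Wool+Rugs&qf=name+description&mm=2
```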

Pierre

-Message d'origine-
De : Mark Mandel [mailto:mark.man...@gmail.com] 
Envoyé : mercredi 13 avril 2011 10:04
À : solr-user@lucene.apache.org
Objet : Allowing looser matches

Not sure if the title explains it all, or if what I want is even possible,
but figured I would ask.

Say, I have a series of products I'm selling, and a search of:

Blue Wool Rugs

Comes in.  This returns 0 results, as Blue and Rugs match terms that are
indexed, but Wool does not.

Is there a way to configure my index/searchHandler, to either:

(a) if no documents are returned, look to partial matches of the search
(e.g. return results with 'Blue rugs', in this case)
(b) add results to the overall search, but at a lower score, that have only
*some* of the terms being searched in them (in this case, maybe 2/3)

Is that even possible?

Thanks,

Mark

-- 
E: mark.man...@gmail.com
T: http://www.twitter.com/neurotic
W: www.compoundtheory.com

cf.Objective(ANZ) - Nov 17, 18 - Melbourne Australia
http://www.cfobjective.com.au

Hands-on ColdFusion ORM Training
www.ColdFusionOrmTraining.com


RE: Highlighting Problem

2011-03-29 Thread Pierre GOSSE
Looks like special chars are filtered out at index time and not replaced by spaces 
that would keep the correct offsets of the terms. Can you paste here the definition of 
the fieldtype in your schema.xml?
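One possible approach, sketched under the assumption that the stripping can move into a CharFilter (the type name and analyzer chain below are made up, not the poster's schema): char filters run before tokenization and keep an offset-correction map back to the original text, which is exactly what highlighting needs.

```xml
<!-- Sketch only: strip ( ) [ ] before tokenizing; the char filter
     maps token offsets back to the raw stored text, so highlight
     tags land on the right characters. -->
<fieldType name="text_stripped" class="solr.TextField">
  <analyzer>
    <charFilter class="solr.PatternReplaceCharFilterFactory"
                pattern="[()\[\]]" replacement=""/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```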


Pierre

-Message d'origine-
De : pottw...@freenet.de [mailto:pottw...@freenet.de] 
Envoyé : lundi 28 mars 2011 11:16
À : solr-user@lucene.apache.org
Objet : Highlighting Problem

dear solr specialists,

my data looks like this:

j]s(dh)fjk [hf]sjkadh asdj(kfh) [skdjfh aslkfjhalwe uigfrhj bsd bsdfga sjfg 
asdlfj.

if I want to query for the first word, the following queries must match:

j]s(dh)fjk
j]s(dhfjk
j]sdhfjk
jsdhfjk
dhf

So the matching should ignore some characters like ( ) [ ] and should match 
substrings.

So far I have the following field definition in the schema.xml:

    
  
    
    
    
    
     
  
  
    
    
      
    
     
  
    


With this definition the matching works as planned. But not for highlighting: 
there the special characters seem to move the highlight tags to wrong positions. For 
example, searching for jsdhfjk misses the last 3 letters of the word (= the 3 
special characters removed by the PatternReplaceFilterFactory):

j]s(dh)fjk

Solr has so many bells and whistles - what must I do to get correctly working 
highlighting?

kind regards,
F.




RE: Using Solr 1.4.1 on most recent Tomcat 7.0.11

2011-03-17 Thread Pierre GOSSE
I do have the xml preamble <?xml version="1.0" encoding="UTF-8"?> in my config 
file in conf/Catalina/localhost/ and Solr starts OK with Tomcat 7.0.8. Haven't 
tried with 7.0.11 yet.

I wonder why your exception points to line 4 column 6, however. Shouldn't it 
point to line 1 column 1? Do you have some blank lines at the start of your 
XML file, or some non-blank lines before the preamble?

Pierre

-Message d'origine-
De : François Schiettecatte [mailto:fschietteca...@gmail.com] 
Envoyé : jeudi 17 mars 2011 14:48
À : solr-user@lucene.apache.org
Objet : Re: Using Solr 1.4.1 on most recent Tomcat 7.0.11 

Lewis

My update from Tomcat 7.0.8 to 7.0.11 went with no hitches. I checked my 
context file and it does not have the xml preamble yours has, specifically: 
'<?xml version="1.0" encoding="utf-8"?>'. 


Here is my context file:

<Context docBase="/home/omim/lib/java/apache-solr-4.0-2011-02-09_08-06-20.war" 
debug="0" crossContext="true">
   <Environment name="solr/home" type="java.lang.String" 
value="/home/omim/index/" override="true"/>
</Context>
---

Hope this helps.

Cheers

François


On Mar 16, 2011, at 2:38 PM, McGibbney, Lewis John wrote:

 Hello list,
 
 Is anyone running Solr (in my case 1.4.1) on above Tomcat dist? In the
 past I have been using guidance in accordance with
 http://wiki.apache.org/solr/SolrTomcat#Installing_Solr_instances_under_Tomcat
 but having upgraded from Tomcat 7.0.8 to 7.0.11 I am having problems
 E.g.
 
 INFO: Deploying configuration descriptor wombra.xml  This is my context
 fragment
 from /home/lewis/Downloads/apache-tomcat-7.0.11/conf/Catalina/localhost
 16-Mar-2011 16:57:36 org.apache.tomcat.util.digester.Digester fatalError
 SEVERE: Parse Fatal Error at line 4 column 6: The processing instruction
 target matching [xX][mM][lL] is not allowed.
 org.xml.sax.SAXParseException: The processing instruction target
 matching [xX][mM][lL] is not allowed.
 ...
 16-Mar-2011 16:57:36 org.apache.catalina.startup.HostConfig
 deployDescriptor
 SEVERE: Error deploying configuration descriptor wombra.xml
 org.xml.sax.SAXParseException: The processing instruction target
 matching [xX][mM][lL] is not allowed.
 ...
 some more
 ...
 
 My configuration descriptor is as follows
 <?xml version="1.0" encoding="utf-8"?>
 <Context docBase="/home/lewis/Downloads/wombra/wombra.war"
 crossContext="true">
  <Environment name="solr/home" type="java.lang.String"
 value="/home/lewis/Downloads/wombra" override="true"/>
 </Context>
 
 Preferably I would upload a WAR file, but I have been working well with
 the configuration I have been using up until now, therefore I didn't
 question changing it.
 I am unfamiliar with the above errors. Can anyone please point me in the
 right direction?
 
 Thank you
 Lewis
 



RE: Concurrent updates/commits

2011-02-09 Thread Pierre GOSSE
 However, the Solr book, in the Commit, Optimise, Rollback section reads:
 if more than one Solr client were to submit modifications and commit them 
 at similar times, it is possible for part of one client's set of changes to 
 be committed before that client told Solr to commit
 which suggests that requests are *not* serialised.

I read this as: "If two clients submit modifications and commits every couple of 
minutes, it could happen that modifications from client1 get committed by 
client2's commit before client1 asks for a commit."

As far as I understand Solr commits, they are serialized by design. And 
committing too often could lead you to trouble if you have many warm-up queries 
(?).
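A toy model (plain Python, not Solr code; the class name is made up) of the behavior described above: updates from any client accumulate in one pending set, whichever client commits first publishes all of them, and commits are serialized by a lock:

```python
import threading

# Toy model of the described behavior: a commit makes *all* pending
# updates visible, regardless of which client submitted them.
class ToyIndex:
    def __init__(self):
        self._lock = threading.Lock()   # commits are serialized by design
        self._pending = []
        self.visible = []

    def add(self, doc):
        with self._lock:
            self._pending.append(doc)

    def commit(self):
        with self._lock:                # one commit at a time
            self.visible.extend(self._pending)
            self._pending.clear()

idx = ToyIndex()
idx.add("doc-from-client1")
idx.add("doc-from-client2")
idx.commit()        # client2's commit also publishes client1's doc
print(idx.visible)  # → ['doc-from-client1', 'doc-from-client2']
```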

Hope this helps,

Pierre
-Message d'origine-
De : Savvas-Andreas Moysidis [mailto:savvas.andreas.moysi...@googlemail.com] 
Envoyé : mercredi 9 février 2011 16:34
À : solr-user@lucene.apache.org
Objet : Concurrent updates/commits

Hello,

This topic has probably been covered before here, but we're still not very
clear about how multiple commits work in Solr.
We currently have a requirement to make our domain objects searchable
immediately after they get updated in the database by some user action. This
could potentially cause multiple updates/commits to be fired to Solr and we
are trying to investigate how Solr handles those multiple requests.

This thread:
http://search-lucene.com/m/0cab31f10Mh/concurrent+commits&subj=commit+concurrency+full+text+search

suggests that Solr will handle all of the lower level details, and that "before
a *COMMIT* is done, a lock is obtained and it's released after the
operation",
which in my understanding means that Solr will serialise all update/commit
requests?

However, the Solr book, in the Commit, Optimise, Rollback section reads:
if more than one Solr client were to submit modifications and commit them
at similar times, it is possible for part of one client's set of changes to
be committed before that client told Solr to commit
which suggests that requests are *not* serialised.

Our questions are:
- Does Solr handle concurrent requests or do we need to add synchronisation
logic around our code?
- If Solr *does* handle concurrent requests, does it serialise each request
or has some other strategy for processing those?


Thanks,
- Savvas


RE: Concurrent updates/commits

2011-02-09 Thread Pierre GOSSE
Well, Jonathan's explanations are much more accurate than mine. :)

I took the word "serialization" as meaning a kind of isolation between commits, 
which is not very smart. Sorry to have introduced more confusion in this.

Pierre

-Message d'origine-
De : Savvas-Andreas Moysidis [mailto:savvas.andreas.moysi...@googlemail.com] 
Envoyé : mercredi 9 février 2011 17:04
À : solr-user@lucene.apache.org
Objet : Re: Concurrent updates/commits

Hello,

Thanks very much for your quick replies.

So, according to Pierre, all updates will be immediately posted to Solr, but
all commits will be serialised. But doesn't that contradict Jonathan's
example where you can end up with FIVE 'new indexes' being warmed? If
commits are serialised, then there can only ever be one Index Searcher being
auto-warmed at a time, or have I got this wrong?

The reason we are investigating commit serialisation, is because we want to
know whether the commit requests will be blocked until the previous ones
finish.

Cheers,
- Savvas




RE: Problem in faceting

2011-02-04 Thread Pierre GOSSE
Using a facet query like

facet.query=+water +treatment +plant 

... should give a count of 0 to documents not having all three terms. This could 
do the trick, if I understand how this parameter works.
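As a sketch (the core URL and base query below are assumptions, not from the thread), the full request could look like:

```
# Hypothetical request; facet.query reports how many documents in the
# result set also match "+water +treatment +plant" (all three required),
# so a document missing a term simply isn't counted.
http://localhost:8983/solr/select?q=plant&facet=true&facet.query=%2Bwater+%2Btreatment+%2Bplant
```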