RE: Reverse sort facet query [SOLR-1672]

2010-01-04 Thread Peter 4U


 

 Date: Sun, 3 Jan 2010 22:18:33 -0800
 From: hossman_luc...@fucit.org
 To: solr-user@lucene.apache.org
 Subject: RE: Reverse sort facet query [SOLR-1672]
 
 
 : Yes, I thought about adding some 'new syntax', but I opted for a separate 
 'facet.sortorder' parameter,
 : 
 : mainly because I'm not familiar enough with the codebase to know what 
 effect this might have on
 : 
 : backward compatibility. It would be easy enough to modify the patch I 
 created to do it this way.
 
 it shouldn't really affect anything -- it wouldn't really be new syntax, 
 just extending the existing sort param syntax to apply to the 
 facet.sort param. The only back compat concern is making sure we 
 continue to support true/false as aliases, and having the default order 
 match the current behavior if asc/desc aren't specified.
 
 
 -Hoss
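
A minimal sketch of the back-compat rules described above, written against plain Java rather than the Solr codebase. The class name FacetSortSpec and its fields are hypothetical; only the values true/false, count/index and asc/desc come from the thread. It assumes the current default is highest count first for count sorting and ascending order for index sorting.

```java
import java.util.Locale;

/** Hypothetical parser for an extended facet.sort value; not Solr's actual code. */
final class FacetSortSpec {
    final String key;          // "count" or "index"
    final boolean descending;  // true = largest first

    private FacetSortSpec(String key, boolean descending) {
        this.key = key;
        this.descending = descending;
    }

    static FacetSortSpec parse(String raw) {
        String[] parts = raw.trim().toLowerCase(Locale.ROOT).split("\\s+");
        if (parts.length > 2 || parts[0].isEmpty()) {
            throw new IllegalArgumentException("Bad facet.sort value: " + raw);
        }
        String key;
        switch (parts[0]) {
            case "true":   // legacy boolean alias for count
            case "count":
                key = "count";
                break;
            case "false":  // legacy boolean alias for index
            case "index":
                key = "index";
                break;
            default:
                throw new IllegalArgumentException("Unknown sort key: " + parts[0]);
        }
        // Assumed defaults when no direction is given: counts high-to-low, index order ascending.
        boolean descending = key.equals("count");
        if (parts.length == 2) {
            switch (parts[1]) {
                case "asc":
                    descending = false;
                    break;
                case "desc":
                    descending = true;
                    break;
                default:
                    throw new IllegalArgumentException("Unknown sort direction: " + parts[1]);
            }
        }
        return new FacetSortSpec(key, descending);
    }
}
```

With this shape, FacetSortSpec.parse("true") and FacetSortSpec.parse("count") resolve to the same spec, so existing requests keep their meaning, while "count asc" selects the rarest-first order.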
 


Yes, agreed. The current patch doesn't touch the b/w true/false aliasing, and 
any move to adding a new attr can keep all that intact.

I've been using the current patch extensively in our testing, and that's 
working well. The only caveat to this is that the reverse sort results

don't include 0-count facets (see notes in SOLR-1672), so reverse sort results 
start with the first count=1. This could be confusing as

there could well be many facets whose count is 0, and it might be expected that 
these be returned in the first instance.

From my admittedly cursory look into the codebase regarding this, I believe 
patching to include 0 counts could open a can of worms in terms

of b/w compat and performance, as 0 counts look to be skipped (by default). I 
could be wrong, and you may know better how changes to 
SimpleFacets/UnInvertedField would affect performance and compatibility.

If there is indeed a performance optimization in facet counting iteration, it 
would, imo, be preferable to have the optimization, rather than the 0-counts.

 

Would you like me to go ahead and amend the patch (w/o 0-counts) to define a 
new 'sort' parameter? 

For naming, I would propose an extension of FacetParams.FACET_SORT_COUNT ala:

 

public static final String FACET_SORT_COUNT_REVERSE = "count.reverse";

 

I can then easily modify the patch to detect/use this value to invoke the new 
behaviour.

Comments? Suggestions?

 

Thanks,

Peter

 

 

 

 
  

RE: Reverse sort facet query [SOLR-1672]

2009-12-28 Thread Peter 4U

 in Solr 1.4 the boolean syntax was deprecated in place of keywords that 
 are more meaningful...
 http://wiki.apache.org/solr/SimpleFacetParameters#facet.sort
 
 ... count and index replaced true and false


Yes, I thought about adding some 'new syntax', but I opted for a separate 
'facet.sortorder' parameter,

mainly because I'm not familiar enough with the codebase to know what effect 
this might have on

backward compatibility. It would be easy enough to modify the patch I created 
to do it this way.

[see SOLR-1672]

 

Thanks,

Peter

 


 
 Date: Thu, 24 Dec 2009 22:24:25 -0800
 From: hossman_luc...@fucit.org
 To: solr-user@lucene.apache.org
 Subject: RE: Reverse sort facet query
 
 
 : I'll have a look at SimpleFacets.java to look at patching it. I should 
 : think the sorting bit will be relatively straightforward. The tricky bit 
 : is how to submit the request via the query interface - there's only a 
 : boolean
 
 in Solr 1.4 the boolean syntax was deprecated in place of keywords that 
 are more meaningful...
 http://wiki.apache.org/solr/SimpleFacetParameters#facet.sort
 
 ... count and index replaced true and false
 
 we could always start supporting count desc and count asc (with 
 count as an alias for count desc)
 
 : The reverse facet query is for when you want to know which event (or 
 : group of event types) has happened the least
 
 got it, thanks.
 
 
 
 -Hoss
 
  

RE: Reverse sort facet query

2009-12-16 Thread Peter 4U

Hello,

 

Thanks very much for your answer.

 

I'll have a look at SimpleFacets.java to look at patching it. I should think 
the sorting bit will be relatively straightforward. The tricky bit is how to 
submit the request via the query interface - there's only a boolean

for facet sorting - would probably require a new parameter so as to maintain bw 
compatibility [e.g. facet.reversesort=true] (if you have any thoughts on how 
you would like to see such functionality integrated into a query, let me know). 
When I have something working, I'll probably have to ask you the best way to 
submit a patch for this.

 

The use case is pretty straightforward, really:

 

In my case, the index is collecting/storing network events (logs, firewall 
events, Win event logs etc.).

 

The reverse facet query is for when you want to know which event (or group of 
event types) has happened the least

over a given period of time.

 

As a simple example:

Let's say you want to look at who has been logging in to a secure server over 
the past week, and this server is normally

accessed by only a handful of users.

But you don't want to know the 'typical' users that have logged in, you want to 
know who's only logged-in once, at say

3 o'clock in the morning on Wednesday. Hmmm, why's he/she doing that?

 

Here, a 'rare' query will show you the atypical behaviour.

 

Capacity Planning and Performance Monitoring is another example - where you 
might want to know which machines have 

produced the least number of errors or the least amount of traffic.

 

Outside of networking, stock control would be another example - 'what items are 
we about to run out of?'

 

Thanks,

Peter

 

 

 
 Date: Tue, 15 Dec 2009 13:12:44 -0800
 From: hossman_luc...@fucit.org
 To: solr-user@lucene.apache.org
 Subject: Re: Reverse sort facet query
 
 
 : Does anyone know of a good way to perform a reverse-sorted facet query 
 (i.e. rarest first)?
 
 I'm fairly confident that code doesn't exist at the moment. 
 
 If i remember correctly, it would be fairly simple to implement if you'd 
 like to submit a patch: when sorting by count a simple bounded priority 
 queue is used, so we'd just have to change the comparator. If you're 
 interested in working on a patch it should be in SimpleFacets.java. I 
 think the queue is called BoundedTreeSet
 
 
 (that's a pretty novel request actually ... i don't remember anyone else 
 ever asking for anything like this before .. can you describe your use 
 case a bit -- i'm curious as to how/when you would use this data)
 
 
 
 -Hoss
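
The comparator change suggested above can be sketched outside Solr with a plain java.util.TreeSet that is trimmed by hand whenever it grows past the limit. This is an illustration of the idea, not the code in SimpleFacets.java; the class BoundedFacetSort, the method top and the minCount parameter are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

/** Illustrative bounded sort over term counts; not the Solr implementation. */
final class BoundedFacetSort {

    /** Returns at most limit entries, rarest first when ascending is true. */
    static List<Map.Entry<String, Integer>> top(Map<String, Integer> counts,
                                                int limit, int minCount, boolean ascending) {
        Comparator<Map.Entry<String, Integer>> byCount =
                Map.Entry.<String, Integer>comparingByValue();
        if (!ascending) {
            byCount = byCount.reversed();  // the only change between the two orders
        }
        // Tie-break on the term so distinct terms with equal counts are not collapsed by the set.
        Comparator<Map.Entry<String, Integer>> order =
                byCount.thenComparing(Map.Entry.<String, Integer>comparingByKey());

        TreeSet<Map.Entry<String, Integer>> queue = new TreeSet<>(order);
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getValue() < minCount) {
                continue;  // skip terms below the minimum count
            }
            queue.add(e);
            if (queue.size() > limit) {
                queue.pollLast();  // drop the entry that sorts last, keeping the set bounded
            }
        }
        return new ArrayList<>(queue);
    }
}
```

Because the set never holds more than limit + 1 entries, memory stays bounded however many terms the field has, which is the property that makes the bounded queue preferable to sorting the full list.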
 
  

Reverse sort facet query

2009-12-10 Thread Peter 4U

Hello Forum,

 

I've had a search in the mail archives and on the 'net, but I'm sure I wouldn't 
be the first to have a requirement for this:

 

Does anyone know of a good way to perform a reverse-sorted facet query (i.e. 
rarest first)?

 

As you know facet.sort toggles between sorting on count or field name, but 
there's no built-in method for reverse count.

 

One way I've found to do this is to set facet.limit=-1 (and facet.mincount) to 
get the entire list, then take 'bottom-5' to get a 'rare' list.

This works, but it's not great for very large lists.
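
For reference, the client-side step of that workaround can be sketched as follows. It assumes the facet values have already been read from the response into a list ordered by count, highest first; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Client-side "bottom n" over a facet list that is already sorted by count, highest first. */
final class BottomFacets {

    /** Returns the last n items of the list, reordered so the rarest comes first. */
    static <T> List<T> rarest(List<T> sortedByCountDesc, int n) {
        int size = sortedByCountDesc.size();
        int from = Math.max(0, size - Math.max(0, n));
        List<T> tail = new ArrayList<>(sortedByCountDesc.subList(from, size));
        Collections.reverse(tail);
        return tail;
    }
}
```

The cost is in fetching the list, not in this step: with facet.limit=-1 every term is returned over the wire just to keep the last few.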

 

Does anyone know of a better way?

 

Many thanks,

Peter

 
  

RE: Facet query with special characters

2009-12-09 Thread Peter 4U

Hi,

 

Thanks for your help and answers. I believe I have isolated the issue, and yes, 
it was 'schema/write'-related.

 

Basically, the issue was this:

All indexing is performed via solrj objects (to an EmbeddedSolrServer 
instance), and this was ported over from 'raw' Lucene java indexing code. When 
I moved over to SolrJ, I hadn't realized that the schema.xml file will then 
affect all writes for the given type. Once I sorted out my schema properly, and 
reindexed - queries started behaving as expected.

 

Thank you very much for your excellent insight - I'm quite new to Solr, so it's 
really great to have an expert show me the error of my ways. I had only recently 
discovered the power of debugQuery=true - awesomely good!

 

Many thanks again,

Peter

 

 
 Date: Tue, 8 Dec 2009 09:35:31 -0800
 From: hossman_luc...@fucit.org
 To: solr-user@lucene.apache.org
 Subject: RE: Facet query with special characters
 
 
 : Note that I am (supposed to be) indexing/searching without analysis 
 : tokenization (if that's the correct term) - i.e. field values like 
 : 'pds-comp.domain' shouldn't be (and I believe aren't) broken up as in 
 : 'pds', 'comp' 'domain' etc. (e.g. using the 'text_ws' fieldtype).
 ...
 : What would be your opinion on the best way to index/analyze/not-analyze 
 such fields?
 
 a whitespace tokenizer is probably the best bet, but in order to be 
 certain what's going on, you would need to look at a few things (and if 
 you wanted help from other people, you would need to post those things) 
 that i mentioned before
 
 :  check your analysis configuration for this fieldtype, in particular look 
 :  at what debugQuery produces for your parsed query, and look at what 
 :  analysis.jsp says it will do at query time with the input string 
 :  pds-comp.domain ... because it sounds like you have a disconnect 
 between 
 :  how the text is indexed and how it is searched. adding a * to your 
 
 ...so what does your schema look like, what is the output from debugQuery, 
 what is the output from analysis.jsp, etc...
 
 -Hoss
 
  

RE: Facet query with special characters

2009-12-08 Thread Peter 4U

Hello Hoss,

 

Many thanks for your answer.

That's very interesting.

So, are you saying this is an issue on the index side, rather than the query 
side?

Note that I am (supposed to be) indexing/searching without analysis 
tokenization (if that's the correct term) - i.e. field values like 
'pds-comp.domain' shouldn't be (and I believe aren't) broken up as in 'pds', 
'comp' 'domain' etc. (e.g. using the 'text_ws' fieldtype).

 

What would be your opinion on the best way to index/analyze/not-analyze such 
fields?

 

Thanks!

Peter


 
 Date: Mon, 7 Dec 2009 15:30:47 -0800
 From: hossman_luc...@fucit.org
 To: solr-user@lucene.apache.org
 Subject: Re: Facet query with special characters
 
 
 
 : When performing a facet query where part of the value portion has a 
 : special character (a minus sign in this case), the query returns zero 
 : results unless I put a wildcard (*) at the end.
 
 check your analysis configuration for this fieldtype, in particular look 
 at what debugQuery produces for your parsed query, and look at what 
 analysis.jsp says it will do at query time with the input string 
 pds-comp.domain ... because it sounds like you have a disconnect between 
 how the text is indexed and how it is searched. adding a * to your 
 input query forces it to make a WildcardQuery which doesn't use analysis, 
 so you get a match on the literal token.
 
 in short: i suspect your problem has nothing to do with query string 
 escaping, and everything to do with field tokenization.
 
 
 -Hoss
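
A toy model of the mismatch described above, in plain Java with no Lucene classes. It assumes the index holds the literal token pds-comp.domain while the query-time analyzer splits on punctuation; the actual schema is not shown in the thread, so both assumptions are for illustration only, and all names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

/** Toy model of an index/query analysis mismatch; not Lucene code. */
final class AnalysisMismatchDemo {

    /** Stand-in for an analyzer that splits on anything other than letters and digits. */
    static List<String> splitOnPunctuation(String text) {
        return Arrays.asList(text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+"));
    }

    /** Toy term query: the query text is analyzed, and every token must equal an indexed term. */
    static boolean termQueryMatches(List<String> indexedTerms, String queryText) {
        return indexedTerms.containsAll(splitOnPunctuation(queryText));
    }

    /** Toy wildcard query: no analysis, just a literal prefix test against each indexed term. */
    static boolean prefixQueryMatches(List<String> indexedTerms, String prefix) {
        for (String term : indexedTerms) {
            if (term.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }
}
```

With the single indexed term pds-comp.domain, the analyzed query looks for pds, comp and domain and finds none of them, while the unanalyzed prefix test matches the literal term. That is the same pattern as the zero-hit and 28-hit queries reported in this thread.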
 
  

RE: Embedded for write, HTTP for read - cache aging

2009-12-07 Thread Peter 4U

Hi Erik,

 

Thanks for your answer.

 

Yes, I've done an /update to the http server, which certainly works as far as 
the 'reading' goes.

This sends the update to the back-end index though, which essentially defeats 
the purpose of having the embedded instance do the write (as writes are always 
local, but reads might be remote, the goal is for super-fast writes, at the 
potential cost of slower reads). Maybe the http server can be set as 
'Read-only' (redirected /update handler) so that it doesn't hit the back-end 
indexer, but still tells it to check the index on the next read?

 

The main performance bottleneck isn't Solr itself, but the HTTP 
wrapping/transmission.

At low traffic rates, it really makes no difference at all.

But when you get into 1000s of writes/sec the http wrapping and transmission 
becomes more and more significant as the traffic rate rises. On average, we've 
seen ~3-8% efficiency increase at very high rates (using a typical Windows TCP 
stack). This might not seem like much, but at really high screaming input 
rates, it does make a difference.

The EmbeddedSolr instance itself wraps each request into an XML request, so I 
believe the performance of the EmbeddedSolr instance could be increased if it 
handled requests without any wrapping at all (NamedList).

 

Thanks,

Peter

 


 
 From: erik.hatc...@gmail.com
 To: solr-user@lucene.apache.org
 Subject: Re: Embedded for write, HTTP for read - cache aging
 Date: Mon, 7 Dec 2009 05:49:01 +0100
 
 
 On Dec 5, 2009, at 12:56 PM, Peter 4U wrote:
  Does anyone know of a way to tell an http SolrServer to reload its 
  back-end index (mark cache as dirty) periodically?
 
 Send a <commit/> to the HTTP SolrServer.
 
  I have a scenario where an EmbeddedSolrServer is used for writing 
  (for fast indexing), and an
 
  CommonsHttpSolrServer for reading (for remote access).
 
 I'm curious, how much faster is it in your situation?
 
 Erik
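
A rough sketch of sending that commit from the writing process using only the JDK, so that no SolrJ HTTP client is needed on the write path. The class CommitNotifier is hypothetical, and the base URL is an assumption taken from the localhost:8983 address used elsewhere in this archive.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;

/** Posts a bare commit to a Solr HTTP endpoint so that readers see a fresh view of the index. */
final class CommitNotifier {

    static final String COMMIT_BODY = "<commit/>";

    /** Builds the update handler URL from a base such as http://localhost:8983/solr. */
    static String updateUrl(String solrBaseUrl) {
        return solrBaseUrl.endsWith("/") ? solrBaseUrl + "update" : solrBaseUrl + "/update";
    }

    /** Sends the commit and returns the HTTP status code. */
    static int sendCommit(String solrBaseUrl) throws IOException {
        HttpURLConnection conn =
                (HttpURLConnection) URI.create(updateUrl(solrBaseUrl)).toURL().openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", "text/xml; charset=UTF-8");
        try (OutputStream out = conn.getOutputStream()) {
            out.write(COMMIT_BODY.getBytes(StandardCharsets.UTF_8));
        }
        try {
            return conn.getResponseCode();
        } finally {
            conn.disconnect();
        }
    }
}
```

Calling CommitNotifier.sendCommit("http://localhost:8983/solr") after a batch of embedded writes costs one small HTTP request per batch rather than one per document, which keeps the write path local.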
 
  

Embedded for write, HTTP for read - cache aging

2009-12-05 Thread Peter 4U

Hello,

 

Does anyone know of a way to tell an http SolrServer to reload its back-end 
index (mark cache as dirty) periodically?

 

I have a scenario where an EmbeddedSolrServer is used for writing (for fast 
indexing), and an

CommonsHttpSolrServer for reading (for remote access).

 

If the http server is used for writing, reading clients pick up any updates, as 
the /update has gone 'through' the http server.

For very high indexing rates, I'd rather not have to build an http request for 
every write (or group of writes), since the writer is always on the same 
machine as the index.

 

Any help on this is much appreciated.

 

Thanks,

Peter

 
  

Question: Write to Solr but not via http, and still store date_format

2009-12-04 Thread Peter 4U

Hi Solr team,

 

Has anyone been able to write to Solr, keeping things like 'date_format', but 
indexing directly, rather than via http?

 

I've been indexing using Lucene Java, and this works well and is very fast, 
except that any data indexed this way doesn't store date_format et al 
information (date.format results always return 0).

I like indexing directly into Lucene, rather than via http requests, as it is 
much faster, particularly at very high input rates.

 

Anyone encountered this and managed to solve it?

 

Many thanks,

peter

 
  

Answer: RE: Question: Write to Solr but not via http, and still store date_format

2009-12-04 Thread Peter 4U

Oops, of course the answer was staring me in the face!

   -- Use the EmbeddedSolrServer, rather than the CommonsHttpSolrServer.

 

Live and learn. Live. and learn.

 

Thanks,

Peter

 


 
 From: pete...@hotmail.com
 To: solr-user@lucene.apache.org
 Subject: Question: Write to Solr but not via http, and still store date_format
 Date: Fri, 4 Dec 2009 20:09:19 +
 
 
 Hi Solr team,
 
 
 
 Has anyone been able to write to Solr, keeping things like 'date_format', but 
 indexing directly, rather than via http?
 
 
 
 I've been indexing using Lucene Java, and this works well and is very fast, 
 except that any data indexed this way doesn't store date_format et al 
 information (date.format results always return 0).
 
 I like indexing directly into Lucene, rather than via http requests, as it is 
 much faster, particularly at very high input rates.
 
 
 
 Anyone encountered this and managed to solve it?
 
 
 
 Many thanks,
 
 peter
 
 
 

Facet query with special characters

2009-12-03 Thread Peter 4U

Hello,
 
I've encountered some strange behaviour in Solr facet querying, and I've not 
been able to find anything on this on the web.
Perhaps someone can shed some light on this?
 
The problem:
When performing a facet query where part of the value portion has a special 
character (a minus sign in this case), the query returns zero results unless I 
put a wildcard (*) at the end.


 
Here is my query:
 
This produces zero 'numFound':
http://localhost:8983/solr/select/?wt=xml&indent=on&rows=20&q=((signature:3083 
AND host:pds-comp.domain)) AND _time:[091119124039 TO 
091203124039]&facet=true&facet.field=host&facet.field=sourcetype&facet.field=user&facet.field=signature
 
This produces 28 'numFound':
http://localhost:8983/solr/select/?wt=xml&indent=on&rows=20&q=((signature:3083 
AND host:pds-comp.domain*)) AND _time:[091119124039 TO 
091203124039]&facet=true&facet.field=host&facet.field=sourcetype&facet.field=user&facet.field=signature


(Note: all hit results are for <host>pds-comp.domain</host> - there are no 
other characters in the resulting field values)
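
When building such a request in code, the query string needs URL encoding, which is a separate concern from query-syntax escaping. A small sketch using only the JDK follows; the helper class FacetQueryUrl is hypothetical, and the fixed wt/indent/rows parameters simply mirror the request above.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

/** Builds a select URL with URL-encoded parameter values. */
final class FacetQueryUrl {

    private static String enc(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required on every JVM, so this branch is not expected to run.
            throw new IllegalStateException(e);
        }
    }

    static String build(String baseUrl, String query, String... facetFields) {
        StringBuilder url = new StringBuilder(baseUrl)
                .append("?wt=xml&indent=on&rows=20")
                .append("&q=").append(enc(query))
                .append("&facet=true");
        for (String field : facetFields) {
            url.append("&facet.field=").append(enc(field));
        }
        return url.toString();
    }
}
```

URL encoding leaves the minus sign and the dot untouched, so it changes nothing about how pds-comp.domain is treated once the query reaches the query parser.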


I've tried escaping the minus sign in various ways, encoding etc., but nothing 
seems to work.
Can anyone help?
 
Many thanks,
Peter

  

RE: Facet query with special characters

2009-12-03 Thread Peter 4U

Hello Solr Forum,

 

I believe I have found a solution (workaround?) for performing an explicit 
(non-wildcarded) field query with values that contain special (escaped) 
characters.

 

Instead of:

  field:value-with-escape-chars

change this to:

  field:[value-with-escape-chars TO value-with-escape-chars]

 

(Note that for SolrJ, use QueryParser.escape(), to ultimately turn this into:  
field:[\value\-with\-escape\-chars\ TO \value\-with\-escape\-chars\])
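
A sketch of building the range form in plain Java. The escape method here is a hand-written stand-in for QueryParser.escape(), and its character list is an assumption modelled on the query syntax specials; in real code the library method would be the one to call. The class and method names are hypothetical.

```java
/** Builds the single-value range form of a field query; the escaping is a simplified stand-in. */
final class RangeWorkaround {

    // Characters treated as query syntax here. This list is an assumption for illustration.
    private static final String SPECIALS = "\\+-!():^[]\"{}~*?|&";

    /** Prefixes each special character with a backslash. */
    static String escape(String value) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            if (SPECIALS.indexOf(c) >= 0) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    /** Produces field:[value TO value], with the value escaped on both sides. */
    static String exactMatchAsRange(String field, String value) {
        String escaped = escape(value);
        return field + ":[" + escaped + " TO " + escaped + "]";
    }
}
```

For the host value from the earlier message, exactMatchAsRange("host", "pds-comp.domain") yields host:[pds\-comp.domain TO pds\-comp.domain].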

 

If the value being queried has no special characters (e.g. host:localhost), the 
above is not necessary, which leads me to believe this is more of a workaround 
than the 'supported way'. Please do correct me/clarify if you know differently, 
or know of a better/more efficient method.

 

In early tests with 200,000+ hits, there appears no performance hit for using 
the range form. Not sure if this affects performance for millions+ hits.

 

Thanks,

Peter

 


 
 From: pete...@hotmail.com
 To: solr-user@lucene.apache.org
 Subject: Facet query with special characters
 Date: Thu, 3 Dec 2009 13:29:45 +
 
 
 Hello,
 
 I've encountered some strange behaviour in Solr facet querying, and I've not 
 been able to find anything on this on the web.
 Perhaps someone can shed some light on this?
 
 The problem:
 When performing a facet query where part of the value portion has a special 
 character (a minus sign in this case), the query returns zero results unless 
 I put a wildcard (*) at the end.
 
 
 
 Here is my query:
 
 This produces zero 'numFound':
 http://localhost:8983/solr/select/?wt=xml&indent=on&rows=20&q=((signature:3083
  AND host:pds-comp.domain)) AND _time:[091119124039 TO 
 091203124039]&facet=true&facet.field=host&facet.field=sourcetype&facet.field=user&facet.field=signature
 
 This produces 28 'numFound':
 http://localhost:8983/solr/select/?wt=xml&indent=on&rows=20&q=((signature:3083
  AND host:pds-comp.domain*)) AND _time:[091119124039 TO 
 091203124039]&facet=true&facet.field=host&facet.field=sourcetype&facet.field=user&facet.field=signature
 
 
 (Note: all hit results are for <host>pds-comp.domain</host> - there are no 
 other characters in the resulting field values)
 
 
 I've tried escaping the minus sign in various ways, encoding etc., but 
 nothing seems to work.
 Can anyone help?
 
 Many thanks,
 Peter
 
 