Re: NRT and warmupTime of filterCache

2011-03-10 Thread stockii
 it'll negatively impact the desired goal of low latency new index readers?
- yes, I think so; that's the reason why I don't understand the
wiki article ...

I set the warmupCount to 500 and I got no error messages saying that Solr isn't
available ...
but solr-stats.jsp shows me a warmup time of warmupTime : 12174 ... why?

Is the warmupTime in solrconfig.xml the maximum time in ms for autowarming,
or what does it really mean?

-
--- System 

One Server, 12 GB RAM, 2 Solr Instances, 7 Cores, 
1 Core with 31 Million Documents, other Cores < 100.000

- Solr1 for Search-Requests - commit every Minute  - 5GB Xmx
- Solr2 for Update-Request  - delta every Minute - 4GB Xmx
--
View this message in context: 
http://lucene.472066.n3.nabble.com/NRT-and-warmupTime-of-filterCache-tp2654886p2659560.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: NRT and warmupTime of filterCache

2011-03-10 Thread stockii
okay, so it's not the time ... it's the items ...

-
--- System 

One Server, 12 GB RAM, 2 Solr Instances, 7 Cores, 
1 Core with 31 Million Documents, other Cores < 100.000

- Solr1 for Search-Requests - commit every Minute  - 5GB Xmx
- Solr2 for Update-Request  - delta every Minute - 4GB Xmx
--
View this message in context: 
http://lucene.472066.n3.nabble.com/NRT-and-warmupTime-of-filterCache-tp2654886p2659562.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: SolrJ and digest authentication

2011-03-10 Thread Erlend Garåsen


I figured it out. Since this Solr server does not have an SSL interface, 
I had to change the following line from 443 to 80:

AuthScope scope = new AuthScope(host, 80, "resin");
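
For reference, the complete setup with that change applied looks like this (a
sketch assuming Commons HttpClient 3.x and CommonsHttpSolrServer; host,
username, password and solrUrl are placeholders you'd fill in):

import java.util.ArrayList;
import java.util.List;
import org.apache.commons.httpclient.HttpClient;
import org.apache.commons.httpclient.UsernamePasswordCredentials;
import org.apache.commons.httpclient.auth.AuthPolicy;
import org.apache.commons.httpclient.auth.AuthScope;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

// prefer digest over basic authentication
HttpClient client = new HttpClient();
List<String> authPrefs = new ArrayList<String>();
authPrefs.add(AuthPolicy.DIGEST);
client.getParams().setParameter(AuthPolicy.AUTH_SCHEME_PRIORITY, authPrefs);

// plain HTTP on port 80 (not 443), realm "resin"
AuthScope scope = new AuthScope(host, 80, "resin");
client.getState().setCredentials(scope,
    new UsernamePasswordCredentials(username, password));
client.getParams().setAuthenticationPreemptive(true);

SolrServer server = new CommonsHttpSolrServer(solrUrl, client);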

Erlend

On 09.03.11 17.09, Erlend Garåsen wrote:


I'm trying to do a search with SolrJ using digest authentication, but
I'm getting the following error:
org.apache.solr.common.SolrException: Unauthorized

I'm setting up SolrJ this way:

HttpClient client = new HttpClient();
List<String> authPrefs = new ArrayList<String>();
authPrefs.add(AuthPolicy.DIGEST);
client.getParams().setParameter(AuthPolicy.AUTH_SCHEME_PRIORITY,
authPrefs);
AuthScope scope = new AuthScope(host, 443, "resin");
client.getState().setCredentials(scope, new
UsernamePasswordCredentials(username, password));
client.getParams().setAuthenticationPreemptive(true);
SolrServer server = new CommonsHttpSolrServer(solrUrl, client);

Is this something which is not supported by SolrJ or have I written
something wrong in the code above?

Erlend




--
Erlend Garåsen
Center for Information Technology Services
University of Oslo
P.O. Box 1086 Blindern, N-0317 OSLO, Norway
Ph: (+47) 22840193, Fax: (+47) 22852970, Mobile: (+47) 91380968, VIP: 31050


Possible to sort in .xml file?

2011-03-10 Thread Andy Newby
Hi,

I'm trying to set up Solr so that we can sort using:

document_views asc,score

...is this possible via the solrconfig.xml/schema.xml file?

I know it's possible to do by adding sort=, but the Perl module
(WebService::Solr) doesn't seem to offer the option to pass in this value :(

TIA
-- 
Andy Newby
a...@ultranerds.com


Re: Possible to sort in .xml file?

2011-03-10 Thread Markus Jelsma
Is there no generic parameter store in the Solr module you can use for passing 
the sort parameter? If not, you can define your sort parameter as a default in 
the request handler you use in solrconfig. See the shipped config for 
examples.

On Thursday 10 March 2011 11:25:01 Andy Newby wrote:
 Hi,
 
 I'm trying to set up Solr so that we can sort using:
 
 document_views asc,score
 
 ...is this possible via the solrconfig.xml/schema.xml file?
 
 I know it's possible to do by adding sort=, but the Perl module
 (WebService::Solr) doesn't seem to offer the option to pass in this value
 :(
 
 TIA

-- 
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350


Re: Possible to sort in .xml file?

2011-03-10 Thread Markus Jelsma
No, look for request handlers.

  <requestHandler name="search" class="solr.SearchHandler" default="true">
    <!-- default values for query parameters can be specified, these
         will be overridden by parameters in the request
      -->
     <lst name="defaults">
       <str name="echoParams">explicit</str>
       <int name="rows">10</int>
     </lst>
    <!-- In addition to defaults, "appends" params can be specified
         to identify values which should be appended to the list of
         multi-val params from the query (or the existing defaults).
      -->
    <!-- In this example, the param "fq=instock:true" would be appended to
         any query time fq params the user may specify, as a mechanism for
         partitioning the index, independent of any user selected filtering
         that may also be desired (perhaps as a result of faceted searching).

         NOTE: there is *absolutely* nothing a client can do to prevent these
               "appends" values from being used, so don't use this mechanism
               unless you are sure you always want it.
      -->
    <!--
       <lst name="appends">
         <str name="fq">inStock:true</str>
       </lst>
      -->


etc... You can add any valid parameter there as a default.

http://svn.apache.org/repos/asf/lucene/dev/trunk/solr/example/solr/conf/solrconfig.xml
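
Applied to the original question, a minimal sketch (untested) would be to add
the sort to the defaults of whichever handler you query:

<requestHandler name="search" class="solr.SearchHandler" default="true">
  <lst name="defaults">
    <!-- document_views is the field from the original question;
         note that score needs an explicit direction here -->
    <str name="sort">document_views asc, score desc</str>
  </lst>
</requestHandler>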

On Thursday 10 March 2011 11:34:47 Andy Newby wrote:
 Hi,
 
 Thanks for the quick reply!
 
 I did a quick look in the solrconfig.xml file, but can't see anything about
 sort, apart from:
 
<!-- An optimization that attempts to use a filter to satisfy a search.
     If the requested sort does not include score, then the filterCache
     will be checked for a filter matching the query. If found, the filter
     will be used as the source of document ids, and then the sort will
     be applied to that.
     <useFilterForSortedQuery>true</useFilterForSortedQuery>
  -->
 
 
 TIA
 
 Andy
 
 On Thu, Mar 10, 2011 at 10:33 AM, Markus Jelsma
 
markus.jel...@openindex.io wrote:
  Is there no generic parameter store in the Solr module you can use for
  passing
  the sort parameter? If not, you can define your sort parameter as default
  in
  the request handler you use in solrconfig. See the shipped config for
  examples.
  
  On Thursday 10 March 2011 11:25:01 Andy Newby wrote:
   Hi,
   
   I'm trying to setup Solr so that we can sort using:
   
   document_views asc,score
   
   ...is this possible via the solrconfig.xml/schema.xml file?
   
  I know it's possible to do by adding sort=, but the Perl module
   (WebService::Solr) doesn't seem to offer the option to pass in this
   value
   
   :(
   
   TIA
  
  --
  Markus Jelsma - CTO - Openindex
  http://www.linkedin.com/in/markus17
  050-8536620 / 06-50258350

-- 
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350


disquery - difference qf qs / pf ps

2011-03-10 Thread Gastone Penzo
Hi
I understand what the qf and qs parameters are,
but I can't understand what pf and ps are exactly.
Can someone explain them to me?

For example:

qf=title^2 name^1.2 surname^1
qs=3

This means I search in the title field with boost 2, or in the name field with
boost 1.2, or in the surname field with boost 1,
and the maximum slop between terms to match is 3.

Right?

And ps? pf? (phrase fields and phrase slop)?
Can I use all 4 parameters together?

Thanx

-- 
Gastone Penzo


Re: disquery - difference qf qs / pf ps

2011-03-10 Thread Ahmet Arslan
 Hi
 I understand what the qf and qs parameters are,
 but I can't understand what pf and ps are exactly.
 Can someone explain them to me?
 
 For example:
 
 qf=title^2 name^1.2 surname^1
 qs=3
 
 This means I search in the title field with boost 2, or in the name
 field with boost
 1.2, or in the surname field with boost 1,
 and the maximum slop between terms to match is 3.
 
 Right?
 
 And ps? pf? (phrase fields and phrase slop)?
 Can I use all 4 parameters together?

Yes, you can use all 4 parameters together. Please see this similar discussion:
http://search-lucene.com/m/KWkYf2kE4Ng1/


  


Re: disquery - difference qf qs / pf ps

2011-03-10 Thread Gastone Penzo
Thank you very much. I understand the difference between qs and ps, but not
what pf is... Is it necessary to use ps?


  Yes you can use all 4 parameters together. Please see similar discussion:
 http://search-lucene.com/m/KWkYf2kE4Ng1/






-- 
Gastone Penzo


Re: Math-generated fields during query

2011-03-10 Thread Markus Jelsma
Not at the moment, if I'm not mistaken. The same issue exists in Solr 3.1, where 
relative distances are not returned as a field value when doing spatial 
filtering. To retrieve the value, one must use the score as a sort of pseudo 
field.

http://wiki.apache.org/solr/SpatialSearch#Returning_the_distance
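
The approach described there looks roughly like this (the sfield/pt values are
just illustrative):

q={!func}geodist()&sfield=store&pt=45.15,-93.85&fl=*,score

i.e. the function value comes back as the score of each document.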

On Wednesday 09 March 2011 23:06:33 Peter Sturge wrote:
 Hi,
 
 I was wondering if it is possible during a query to create a returned
 field 'on the fly' (like function query, but for concrete values, not
 score).
 
 For example, if I input this query:
    q=_val_:product(15,3)&fl=*,score
 
 For every returned document, I get score = 45.
 
 If I change it slightly to add *:* like this:
    q=*:* _val_:product(15,3)&fl=*,score
 
 I get score = 32.526913.
 
 If I try my use case of _val_:product(qty_ordered,unit_price), I get
 varying scores depending on...well depending on something.
 
 I understand this is doing relevance scoring, but it doesn't seem to
 tally with the FunctionQuery Wiki
 [example at the bottom of the page]:
 
    q=boxname:findbox+_val_:product(product(x,y),z)&fl=*,score
 ...where score will contain the resultant volume.
 
 Is there a trick to getting not a score, but the actual value of
 quantity*price (e.g. product(5,2.21) == 11.05)?
 
 Many thanks

-- 
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350


Re: disquery - difference qf qs / pf ps

2011-03-10 Thread Ahmet Arslan


 I understand the
 difference between qs and ps but not
 what pf is... is it necessary to use ps?

pf (Phrase Fields) and ps (Phrase Slop) are related to each other.

Let's say you have q=term1 term2&pf=title text&ps=10

We can think of it as if dismax adds title:"term1 term2"~10 text:"term1 term2"~10 
as imaginary optional clauses to your original query. Optional means they affect 
the order of documents.

http://wiki.apache.org/solr/DisMaxQParserPlugin#pf_.28Phrase_Fields.29
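
Putting it together with the earlier example, a full request might look like
this (values are illustrative):

q=john smith&defType=dismax&qf=title^2 name^1.2 surname^1&qs=3&pf=title name&ps=10

qf/qs control how the individual terms match; pf/ps only re-rank documents in
which the whole query also appears as a (sloppy) phrase.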


  


Re: NRT in Solr

2011-03-10 Thread Jason Rutherglen
Bill,

I think all of the improvements can be made, however they are fairly
large structural changes that would require perhaps several patches.
The other issue is we'll likely land RT this year (or next) and then
the cached values need to be appended to as the documents are added,
that and they'll be across several DWPTs (see LUCENE-2324).  So one
could easily do work for per-segment caching, and then need to go back
and do per-segment, append caches.  I'm not sure caching is needed at
all, especially with the recent speed improvements, except for facets
which resemble field caches, and probably should be subsumed there.

Jason

On Wed, Mar 9, 2011 at 8:27 PM, Bill Bell billnb...@gmail.com wrote:
 So it looks like it can handle adding new documents, and expiring old
 documents. Updating a document is not part of the game.
 This would work well for message boards or tweet type solutions.

 Solr can do this as well directly. Why wouldn't you just improve the
 document and facet caching so that when you append there is not a huge hit
 to Solr? Also we could add an expiration to documents as well.

 The big issue for me is that when I update Solr I need to replicate that
 change quickly to all slaves. If we changed replication to stream to the
 slaves in Near Real Time and not have to create a whole new index version,
 warming, etc, that would be awesome. That combined with better caching
 smarts and we have a near perfect solution.

 Thanks.

 On 3/9/11 3:29 PM, Smiley, David W. dsmi...@mitre.org wrote:

Zoie adds NRT to Solr:
http://snaprojects.jira.com/wiki/display/ZOIE/Zoie+Solr+Plugin

I haven't tried it yet but looks cool.

~ David Smiley
Author: http://www.packtpub.com/solr-1-4-enterprise-search-server/

On Mar 9, 2011, at 9:01 AM, Jason Rutherglen wrote:

 Jae,

 NRT hasn't been implemented in Solr as of yet, I think partially
 because major features such as replication, caching, and uninverted
 faceting suddenly are no longer viable, e.g., it's another round of
 testing etc.  It's doable, however I think the best approach is a
 separate request call path, to avoid altering the current [working]
 API.

 On Tue, Mar 8, 2011 at 1:27 PM, Jae Joo jaejo...@gmail.com wrote:
 Hi,
 Is NRT in Solr 4.0 from trunk? I have checked out from trunk, but could
not
 find the configuration for NRT.

 Regards

 Jae


Re: FunctionQueries and FieldCache and OOM

2011-03-10 Thread Markus Jelsma
Well, it's quite hard to debug because the values listed on the stats page in 
the fieldCache section don't make much sense. Reducing precision with 
NOW/HOUR, however, does seem to make a difference.

It is hard (or impossible) to reproduce this in a test setup with the same 
index but without continuous updates and without stress tests. Firing manual 
queries with different values for the bf parameter doesn't show any difference 
in the values listed on the stats page.

Does anyone care to provide an explanation?

Thanks

On Wednesday 09 March 2011 22:21:19 Markus Jelsma wrote:
 Hi,
 
 In one of the environments i'm working on (4 Solr 1.4.1. nodes with
 replication, 3+ million docs, ~5.5GB index size, high commit rate
 (~1-2min), high query rate (~50q/s), high number of updates
 (~1000docs/commit)) the nodes continuously run out of memory.
 
 During development we frequently ran excessive stress tests and after
 tuning JVM and Solr settings all ran fine. A while ago i added the DisMax
 bq parameter for boosting recent documents, documents older than a day
 receive 50% less boost, similar to the example but with a much steeper
 slope. For clarity, i'm not using the ordinal function but the reciprocal
 version in the bq parameter which is warned against when using Solr 1.4.1
 according to the wiki.
 
 This week we started the stress tests and nodes are going down again. I've
 reconfigured the nodes to have different settings for the bq parameter (or
 no bq parameter).
 
 It seems the bq is the cause of the misery.
 
 Issue SOLR- keeps popping up but it has not been resolved. Is there
 anyone who can confirm one of those patches fixes this issue before i
 waste hours of work finding out it doesn't? ;)
 
 Am I correct when I assume that Lucene FieldCache entries are added for
 each unique function query?  In that case, every query is a unique cache
 entry because it operates on milliseconds. If all else fails I might be
 able to reduce precision by operating on minutes or even more instead of
 milliseconds. I, however, cannot use other nice math functions in the ms()
 parameter so that might make things difficult.
 
 However, date math seems available (NOW/HOUR) so i assume it would also
 work for SOME_DATE_FIELD/HOUR as well. This way i just might prevent
 useless entries.
 
 My apologies for this long mail but it may prove useful for other users and
 hopefully we find the solution and can update the wiki to add this warning.
 
 Cheers,

-- 
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350


Re: NRT and warmupTime of filterCache

2011-03-10 Thread Jason Rutherglen
 - yes, i think so, thats the reason because i dont understand the
 wiki-article ...

Maybe the article is out of date?  I think it's grossly inefficient to
warm the searchers at all in the NRT case.  Queries are being
performed across *all* segments, even though there should only be one
new segment that may require warming.  However, given the new segment is
so small, there should be no reason to warm it at all?

On Thu, Mar 10, 2011 at 12:14 AM, stockii stock.jo...@googlemail.com wrote:
 it'll negatively impact the desired goal of low latency new index readers?
 - yes, I think so; that's the reason why I don't understand the
 wiki article ...

 I set the warmupCount to 500 and I got no error messages saying that Solr isn't
 available ...
 but solr-stats.jsp shows me a warmup time of warmupTime : 12174 ... why?

 Is the warmupTime in solrconfig.xml the maximum time in ms for autowarming,
 or what does it really mean?

 -
 --- System 
 

 One Server, 12 GB RAM, 2 Solr Instances, 7 Cores,
 1 Core with 31 Million Documents, other Cores < 100.000

 - Solr1 for Search-Requests - commit every Minute  - 5GB Xmx
 - Solr2 for Update-Request  - delta every Minute - 4GB Xmx
 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/NRT-and-warmupTime-of-filterCache-tp2654886p2659560.html
 Sent from the Solr - User mailing list archive at Nabble.com.



DIH : modify document in sibling entity of root entity

2011-03-10 Thread Chantal Ackermann
Dear all,

in DIH, is it possible to have two sibling entities where:

- the first one is the root entity that creates the documents by
iterating over a table that has one row per document.
- the second one is executed after the completion of the first entity
iteration, and it provides more data that is added to the newly created
documents.


I've set up such a dih configuration, and the second entity is executed,
but no data is written into the index apart from the data extracted by
the root entity  (=no document is modified?).

Documents are identified by the unique key 'id' which is defined by
pk=id on both entities.

Is this supposed to work at all? I haven't found anything so far on the
net but I could have used the wrong keywords for searching, of course.

To answer the perhaps obvious question of why I'm not using a subentity:
I thought that this solution might be faster because it iterates over
the second data source instead of hitting it with a query per
document.

Anyway, the main reason I tried this is that I want to know whether
it works. I'm still not sure whether it should work and I'm just doing
something wrong...


Thanks!
Chantal



Re: FunctionQueries and FieldCache and OOM

2011-03-10 Thread Markus Jelsma
Alright, i can now confirm the issue has been resolved by reducing precision. 
The garbage collector on nodes without reduced precision has a real hard time 
keeping up and clearly shows a very different graph of heap consumption.

Consider using MINUTE, HOUR or DAY as precision in case you suffer from 
excessive memory consumption:

recip(ms(NOW/PRECISION,DATE_FIELD),TIME_FRACTION,1,1)
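
For example, a concrete bf parameter with hourly precision might look like this
(the field name is illustrative; 3.16e-11 is roughly 1/ms-per-year, as in the
wiki's date-boosting example):

bf=recip(ms(NOW/HOUR,created_date),3.16e-11,1,1)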

On Thursday 10 March 2011 15:14:25 Markus Jelsma wrote:
 Well, it's quite hard to debug because the values listed on the stats page
 in the fieldCache section don't make much sense. Reducing precision with
 NOW/HOUR, however, does seem to make a difference.
 
 It is hard (or impossible) to reproduce this in a test setup with the same
 index but without continuous updates and without stress tests. Firing manual
 queries with different values for the bf parameter doesn't show any
 difference in the values listed on the stats page.
 
 Does anyone care to provide an explanation?
 
 Thanks
 
 On Wednesday 09 March 2011 22:21:19 Markus Jelsma wrote:
  Hi,
  
  In one of the environments i'm working on (4 Solr 1.4.1. nodes with
  replication, 3+ million docs, ~5.5GB index size, high commit rate
  (~1-2min), high query rate (~50q/s), high number of updates
  (~1000docs/commit)) the nodes continuously run out of memory.
  
  During development we frequently ran excessive stress tests and after
  tuning JVM and Solr settings all ran fine. A while ago i added the DisMax
  bq parameter for boosting recent documents, documents older than a day
  receive 50% less boost, similar to the example but with a much steeper
  slope. For clarity, i'm not using the ordinal function but the reciprocal
  version in the bq parameter which is warned against when using Solr 1.4.1
  according to the wiki.
  
  This week we started the stress tests and nodes are going down again.
  I've reconfigured the nodes to have different settings for the bq
  parameter (or no bq parameter).
  
  It seems the bq is the cause of the misery.
  
  Issue SOLR- keeps popping up but it has not been resolved. Is there
  anyone who can confirm one of those patches fixes this issue before i
  waste hours of work finding out it doesn't? ;)
  
  Am I correct when I assume that Lucene FieldCache entries are added for
  each unique function query?  In that case, every query is a unique cache
  entry because it operates on milliseconds. If all else fails I might be
  able to reduce precision by operating on minutes or even more instead of
  milliseconds. I, however, cannot use other nice math functions in the
  ms() parameter so that might make things difficult.
  
  However, date math seems available (NOW/HOUR) so i assume it would also
  work for SOME_DATE_FIELD/HOUR as well. This way i just might prevent
  useless entries.
  
  My apologies for this long mail but it may prove useful for other users
  and hopefully we find the solution and can update the wiki to add this
  warning.
  
  Cheers,

-- 
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350


Re: NRT and warmupTime of filterCache

2011-03-10 Thread stockii
 Maybe the article is out of date?
  - maybe ... I don't know.

In my case it makes no sense, so I use a different configuration ...


-
--- System 

One Server, 12 GB RAM, 2 Solr Instances, 7 Cores, 
1 Core with 31 Million Documents, other Cores < 100.000

- Solr1 for Search-Requests - commit every Minute  - 5GB Xmx
- Solr2 for Update-Request  - delta every Minute - 4GB Xmx
--
View this message in context: 
http://lucene.472066.n3.nabble.com/NRT-and-warmupTime-of-filterCache-tp2654886p2660814.html
Sent from the Solr - User mailing list archive at Nabble.com.


Error on string searching # [STRANGE]

2011-03-10 Thread Dario Rigolin
I have a text field indexed using the WordDelimiter filter,
indexed this way:
<doc>
<field name="myfield">S.#L.W.VI.37</field>
...
</doc>

Searching this way:
http://192.168.3.3:8983/solr3.1/core0/select?q=myfield:("S.#L.W.VI.37")

makes this error:

org.apache.lucene.queryParser.ParseException: Cannot parse 'myfield:("S.': 
Lexical error at line 1, column 17.  Encountered: <EOF> after : "\"S."

It seems that # is a wrong character for a query... I tried urlencoding, adding a 
backslash before it, and removing the quotes, but other errors come up:

http://192.168.3.3:8983/solr3.1/core0/select?q=myfield:(S.#L.W.VI.37)

org.apache.lucene.queryParser.ParseException: Cannot parse 'myfield:(S.': 
Encountered "<EOF>" at line 1, column 15.
Was expecting one of:
    <AND> ...
    <OR> ...
    <NOT> ...
    "+" ...
    "-" ...
    "(" ...
    ")" ...
    "*" ...
    "^" ...
    <QUOTED> ...
    <TERM> ...
    <FUZZY_SLOP> ...
    <PREFIXTERM> ...
    <WILDTERM> ...
    "[" ...
    "{" ...
    <NUMBER> ...


Any idea how to solve this?
Maybe a bug? Or probably I'm missing something.

Dario.


Re: Error on string searching # [STRANGE]

2011-03-10 Thread Juan Grande
I think that the problem is with the # symbol, because it has a special
meaning when used inside a URL. Try replacing it with %23, like this:
http://192.168.3.3:8983/solr3.1/core0/select?q=myfield:("S.%23L.W.VI.37")
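
In general, everything after an unencoded # is treated as a URL fragment and
never reaches Solr at all, which is why the parser sees a truncated query. If
you're testing from a shell, you can let curl do the encoding for you, e.g.:

curl -G http://192.168.3.3:8983/solr3.1/core0/select --data-urlencode 'q=myfield:("S.#L.W.VI.37")'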

Regards,
*
Juan G. Grande*
-- Solr Consultant @ http://www.plugtree.com
-- Blog @ http://juanggrande.wordpress.com


On Thu, Mar 10, 2011 at 12:45 PM, Dario Rigolin
dario.rigo...@comperio.it wrote:

 I have a text field indexed using the WordDelimiter filter,
 indexed this way:
 <doc>
 <field name="myfield">S.#L.W.VI.37</field>
 ...
 </doc>

 Searching this way:
 http://192.168.3.3:8983/solr3.1/core0/select?q=myfield:("S.#L.W.VI.37")

 makes this error:

 org.apache.lucene.queryParser.ParseException: Cannot parse 'myfield:("S.':
 Lexical error at line 1, column 17.  Encountered: <EOF> after : "\"S."

 It seems that # is a wrong character for a query... I tried urlencoding,
 adding a
 backslash before it, and removing the quotes, but other errors come up:

 http://192.168.3.3:8983/solr3.1/core0/select?q=myfield:(S.#L.W.VI.37)

 org.apache.lucene.queryParser.ParseException: Cannot parse 'myfield:(S.':
 Encountered "<EOF>" at line 1, column 15.
 Was expecting one of:
    <AND> ...
    <OR> ...
    <NOT> ...
    "+" ...
    "-" ...
    "(" ...
    ")" ...
    "*" ...
    "^" ...
    <QUOTED> ...
    <TERM> ...
    <FUZZY_SLOP> ...
    <PREFIXTERM> ...
    <WILDTERM> ...
    "[" ...
    "{" ...
    <NUMBER> ...


 Any idea how to solve this?
 Maybe a bug? Or probably I'm missing something.

 Dario.



Re: Math-generated fields during query

2011-03-10 Thread dan sutton
As a workaround, can you not have a search component run after the
QueryComponent, have qty_ordered and unit_price as stored fields
returned with the fl parameter, and have your custom component do
the calc - unless you need to sort by this value too?

Dan

On Wed, Mar 9, 2011 at 10:06 PM, Peter Sturge peter.stu...@gmail.com wrote:
 Hi,

 I was wondering if it is possible during a query to create a returned
 field 'on the fly' (like function query, but for concrete values, not
 score).

 For example, if I input this query:
   q=_val_:product(15,3)&fl=*,score

 For every returned document, I get score = 45.

 If I change it slightly to add *:* like this:
   q=*:* _val_:product(15,3)&fl=*,score

 I get score = 32.526913.

 If I try my use case of _val_:product(qty_ordered,unit_price), I get
 varying scores depending on...well depending on something.

 I understand this is doing relevance scoring, but it doesn't seem to
 tally with the FunctionQuery Wiki
 [example at the bottom of the page]:

   q=boxname:findbox+_val_:product(product(x,y),z)&fl=*,score
 ...where score will contain the resultant volume.

 Is there a trick to getting not a score, but the actual value of
 quantity*price (e.g. product(5,2.21) == 11.05)?

 Many thanks



Re: Error on string searching # [STRANGE]

2011-03-10 Thread Dario Rigolin
On Thursday, March 10, 2011 04:53:51 pm Juan Grande wrote:
 I think that the problem is with the # symbol, because it has a special
 meaning when used inside a URL. Try replacing it with %23, like this:
  http://192.168.3.3:8983/solr3.1/core0/select?q=myfield:("S.%23L.W.VI.37")

If I do the urlencoding, changing # into %23, I get this error:

java.lang.ArrayIndexOutOfBoundsException: 3
at 
org.apache.lucene.search.MultiPhraseQuery$MultiPhraseWeight.scorer(MultiPhraseQuery.java:185)
at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:208)
at org.apache.lucene.search.Searcher.search(Searcher.java:88)

 
 Regards,
 *
 Juan G. Grande*
 -- Solr Consultant @ http://www.plugtree.com
 -- Blog @ http://juanggrande.wordpress.com
 
 
 On Thu, Mar 10, 2011 at 12:45 PM, Dario Rigolin
 
  dario.rigo...@comperio.it wrote:
  I have a text field indexed using the WordDelimiter filter,
  indexed this way:
  <doc>
  <field name="myfield">S.#L.W.VI.37</field>
  ...
  </doc>
  
  Searching this way:
  http://192.168.3.3:8983/solr3.1/core0/select?q=myfield:("S.#L.W.VI.37")
  
  makes this error:
  
  org.apache.lucene.queryParser.ParseException: Cannot parse
  'myfield:("S.': Lexical error at line 1, column 17.  Encountered: <EOF>
  after : "\"S."
  
  It seems that # is a wrong character for a query... I tried urlencoding,
  adding a
  backslash before it, and removing the quotes, but other errors come up:
  
  http://192.168.3.3:8983/solr3.1/core0/select?q=myfield:(S.#L.W.VI.37)
  
  org.apache.lucene.queryParser.ParseException: Cannot parse 'myfield:(S.':
  Encountered "<EOF>" at line 1, column 15.
  
  Was expecting one of:
     <AND> ...
     <OR> ...
     <NOT> ...
     "+" ...
     "-" ...
     "(" ...
     ")" ...
     "*" ...
     "^" ...
     <QUOTED> ...
     <TERM> ...
     <FUZZY_SLOP> ...
     <PREFIXTERM> ...
     <WILDTERM> ...
     "[" ...
     "{" ...
     <NUMBER> ...
  
  Any idea how to solve this?
  Maybe a bug? Or probably I'm missing something.
  
  Dario.


Re: DIH : modify document in sibling entity of root entity

2011-03-10 Thread Stefan Matheis
Hi Chantal,

i'm not sure if i understood you correctly (if at all)? Two entities,
not arranged as sub-entitiy, but using values from the previous
entity? Could you paste your dataimport  the relevant part of the
logging-output?

Regards
Stefan

On Thu, Mar 10, 2011 at 4:12 PM, Chantal Ackermann
chantal.ackerm...@btelligent.de wrote:
 Dear all,

 in DIH, is it possible to have two sibling entities where:

 - the first one is the root entity that creates the documents by
 iterating over a table that has one row per document.
 - the second one is executed after the completion of the first entity
 iteration, and it provides more data that is added to the newly created
 documents.


 I've set up such a dih configuration, and the second entity is executed,
 but no data is written into the index apart from the data extracted by
 the root entity  (=no document is modified?).

 Documents are identified by the unique key 'id' which is defined by
 pk=id on both entities.

 Is this supposed to work at all? I haven't found anything so far on the
 net but I could have used the wrong keywords for searching, of course.

 As answer to the maybe obvious question why I'm not using a subentity:
 I thought that this solution might be faster because it iterates over
 the second data source instead of hitting it with a query per each
 document.

 Anyway, the main reason I tried this is because I want to know whether
 it works. I'm still not sure whether it should work but I'm doing
 something wrong...


 Thanks!
 Chantal




Re: True master-master fail-over without data gaps (choosing CA in CAP)

2011-03-10 Thread Otis Gospodnetic
Hi,



- Original Message 
 From: Jake Luciani jak...@gmail.com
 To: solr-user@lucene.apache.org
 Sent: Wed, March 9, 2011 8:07:00 PM
 Subject: Re: True master-master fail-over without data gaps (choosing CA in 
CAP)
 
 Yeah sure. Let me update this on the Solandra wiki. I'll send across the link.

Excellent.  You could include ES there, too, if you feel extra adventurous. ;)

 I think you hit the main two shortcomings atm.

- Grandma, why are your eyes so big? 
- To see you better.

Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/


 -Jake
 
 On Wed, Mar 9, 2011 at 6:17 PM, Otis Gospodnetic otis_gospodne...@yahoo.com
   wrote:
 
  Jake,
 
  Maybe it's time to come up with the Solandra/Solr matrix so we can see
  Solandra's strengths (e.g. RT, no replication) and weaknesses (e.g. I think
  I saw a mention of some big indices?) or missing features (e.g. no delete by
  query), etc.
 
  Thanks!
  Otis
  
  Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
  Lucene ecosystem search :: http://search-lucene.com/
 
 
- Original Message -
 From: Jake Luciani jak...@gmail.com
 To: solr-user@lucene.apache.org
 Sent: Wed, March 9, 2011 6:04:13 PM
 Subject: Re: True master-master fail-over without data gaps (choosing CA
 in CAP)

 Jason,

 Its predecessor did, Lucandra. But Solandra is a new approach that
 manages shards of documents across the cluster for you and uses Solr's
 distributed search to query indexes.

 Jake

 On Mar 9, 2011, at 5:15 PM, Jason Rutherglen
 jason.rutherg...@gmail.com wrote:

  Doesn't Solandra partition by term instead of document?

  On Wed, Mar 9, 2011 at 2:13 PM, Smiley, David W. dsmi...@mitre.org
  wrote:
   I was just about to jump in this conversation to mention Solandra and
   go fig, Solandra's committer comes in. :-) It was nice to meet you at
   Strata, Jake.

   I haven't dug into the code yet but Solandra strikes me as a killer
   way to scale Solr. I'm looking forward to playing with it; particularly
   looking at disk requirements and performance measurements.

   ~ David Smiley

   On Mar 9, 2011, at 3:14 PM, Jake Luciani wrote:

    Hi Otis,

    Have you considered using Solandra with Quorum writes
    to achieve master/master with CA semantics?

    -Jake

    On Wed, Mar 9, 2011 at 2:48 PM, Otis Gospodnetic
    otis_gospodne...@yahoo.com wrote:

     Hi,

      Original Message 
      From: Robert Petersen rober...@buy.com

      Can't you skip the SAN and keep the indexes locally? Then you
      would have two redundant copies of the index and no lock issues.

     I could, but then I'd have the issue of keeping them in sync, which
     seems more fragile. I think SAN makes things simpler overall.

      Also, can't master02 just be a slave to master01 (in the master
      farm and separate from the slave farm) until such time as master01
      fails? Then

     No, because it wouldn't be in sync. It would always be N minutes
     behind, and when the primary master fails, the secondary would not
     have all the docs - data loss.

      master02 would start receiving the new documents with an index
      complete up to the last replication at least, and the other slaves
      would be directed by the LB to poll master02 also...

     Yeah, complete up to the last replication is the problem. It's a
     data gap that now needs to be filled somehow.

     Otis
     
     Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
     Lucene ecosystem search :: http://search-lucene.com/

      -Original Message-
      From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com]
      Sent: Wednesday, March 09, 2011 9:47 AM
      To: solr-user@lucene.apache.org
      Subject: Re: True master-master fail-over without data gaps
      (choosing CA in CAP)

      Hi,

      - Original Message -
       From: Walter Underwood wun...@wunderwood.org

       On Mar 9, 2011, at 9:02 AM, Otis Gospodnetic wrote:

        You mean it's not possible to have 2 masters that are in
        nearly real-time sync? How about with DRBD? I know people use
        DRBD to keep 2 Hadoop NNs (their edit logs) in sync to avoid
        the current NN SPOF, for example, so I'm thinking this could
        be doable with Solr masters, too, no?

       If you add fault-tolerant, you run into the CAP Theorem.
       Consistency, availability, partition: choose two. You cannot
       have it all.

      Right, so I'll take Consistency and Availability, and I'll put my
      2 masters in the same rack (which has redundant switches, power
      supply,
question regarding proper placement of geofilt in fq=

2011-03-10 Thread Jerry Mindek
Hi,

I am using rev 1036236 of solr trunk running as a servlet in Tomcat 7.
The doc set is sharded over 11 shards.
Currently, I have all the shards running in a single tomcat.

Please see the bottom of the email for the bits of my schema.xml and 
solrconfig.xml that might help you understand my configuration.

I am seeing what I think is strange behavior when I try to use the geofilt in a 
filter query.
Here's what I am seeing:


1. If I put the {!geofilt} as the last argument of the fq= parameter and I 
send the following distributed query to my sharded index:
/select?start=0&rows=30&q=food&fq=b_type:shops AND 
{!geofilt}&qt=spatialdismax&fl=*%2Cscore&facet=false&pt=38.029191,-78.479266&sfield=lat_long&d=80&shards=...
I get a syntax error, which seems odd to me.


2. If I move the {!geofilt} to the first position in the fq= and send the 
following distributed query:
/select?start=0&rows=30&q=food&fq={!geofilt} AND 
b_type:T01&qt=spatialdismax&fl=*%2Cscore&facet=false&pt=38.029191,-78.479266&sfield=lat_long&d=80&shards=...
then only the geofilt is applied, not b_type:T01. Which seems odd to me; I 
would expect both filters to be applied.


3. Finally, when I submit this query:
/select?start=0&rows=30&q=food&fq=_query_:"{!geofilt}" AND 
b_type:T01&qt=spatialdismax&fl=*%2Cscore&facet=false&pt=38.029191,-78.479266&sfield=lat_long&d=80&shards=...
this works as I had hoped, i.e. both the geofilt and the b_type filters are 
applied.

Am I trying to use geofilt in the wrong way or is this possibly a bug?

Thanks,
Jerry Mindek


<!-- schema.xml -->
<field name="cn" type="text" indexed="true" stored="true" required="true" />
<field name="dn" type="string" indexed="true" stored="true" required="false" />
<field name="t1" type="text" indexed="true" stored="true" />
<field name="ts" type="string" indexed="true" stored="true"/>
<field name="lb" type="text" indexed="true" stored="false" />
<field name="sim" type="string" indexed="true" stored="true" />
<field name="s4_s" type="text" indexed="true" stored="false" />
<field name="stat" type="string" indexed="true" stored="true" />
<field name="pst" type="text" indexed="true" stored="true" />
<fieldType name="location" class="solr.LatLonType"
    subFieldSuffix="_coordinate"/>
...
<field name="b_type" type="string" indexed="true" stored="true"/>
<field name="lat_long" type="location" indexed="true" stored="true" />
<!-- end snippet schema.xml -->

<!-- solrconfig.xml -->
<requestHandler name="spatialdismax" class="solr.DisMaxRequestHandler">
  <lst name="defaults">
    <str name="sort">score desc</str>
    <str name="facet">true</str>
    <str name="facet.mincount">1</str>
    <str name="echoParams">explicit</str>
    <int name="rows">20</int>
    <float name="tie">0.01</float>
    <str name="qf">
      cn^2.0 t1^2.0 ts^2.0 lb^2.0 s4_s^2.0 sim^2.0
    </str>
    <str name="pf">
      cn^2.0 t1^2.0 ts^2.0 lb^2.0 s4_s^2.0 sim^2.0
    </str>
    <str name="fl">dn, cn, t1, stat, pst, pct, ts, sv, score</str>
    <str name="mm">
      2&lt;-1 5&lt;-2 6&lt;90%
    </str>
    <int name="ps">100</int>
    <str name="q.alt">*:*</str>
  </lst>
</requestHandler>
<!-- end snippet solrconfig.xml -->



Re: Error on string searching # [STRANGE] [FIX]

2011-03-10 Thread Dario Rigolin
On Thursday, March 10, 2011 04:58:43 pm Dario Rigolin wrote:

It seems to be fixed by setting, in the WordDelimiterFilter,

catenateWords="0" catenateNumbers="0"

instead of 1 for both...

Nice to know...
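
For reference, the relevant filter line in the fieldType would then look
something like this (a sketch showing only the catenate attributes plus common
defaults):

<filter class="solr.WordDelimiterFilterFactory"
        generateWordParts="1" generateNumberParts="1"
        catenateWords="0" catenateNumbers="0" catenateAll="0"/>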


 On Thursday, March 10, 2011 04:53:51 pm Juan Grande wrote:
  I think that the problem is with the # symbol, because it has a special
  meaning when used inside a URL. Try replacing it with %23, like this:
  http://192.168.3.3:8983/solr3.1/core0/select?q=myfield:("S.%23L.W.VI.37")
 
 If I do the urlencoding, changing # into %23, I get this error
 
 
 java.lang.ArrayIndexOutOfBoundsException: 3
   at
 org.apache.lucene.search.MultiPhraseQuery$MultiPhraseWeight.scorer(MultiPhraseQuery.java:185)
   at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:208)
   at org.apache.lucene.search.Searcher.search(Searcher.java:88)
 
 
  Regards,
  *
  Juan G. Grande*
  -- Solr Consultant @ http://www.plugtree.com
  -- Blog @ http://juanggrande.wordpress.com
  
  
  On Thu, Mar 10, 2011 at 12:45 PM, Dario Rigolin
  
   dario.rigo...@comperio.it wrote:
   I have a text field indexed using the WordDelimiter filter,
   indexed this way:
   <doc>
   <field name="myfield">S.#L.W.VI.37</field>
   ...
   </doc>
   
   Searching this way:
   http://192.168.3.3:8983/solr3.1/core0/select?q=myfield:("S.#L.W.VI.37")
   
   makes this error:
   
   org.apache.lucene.queryParser.ParseException: Cannot parse
   'myfield:("S.': Lexical error at line 1, column 17.  Encountered: <EOF>
   after : "\"S."
   
   It seems that # is a wrong character for a query... I tried urlencoding,
   adding a
   backslash before it, and removing the quotes, but other errors come up:
   
   http://192.168.3.3:8983/solr3.1/core0/select?q=myfield:(S.#L.W.VI.37)
   
   org.apache.lucene.queryParser.ParseException: Cannot parse
   'myfield:(S.': Encountered "<EOF>" at line 1, column 15.
   
   Was expecting one of:
      <AND> ...
      <OR> ...
      <NOT> ...
      "+" ...
      "-" ...
      "(" ...
      ")" ...
      "*" ...
      "^" ...
      <QUOTED> ...
      <TERM> ...
      <FUZZY_SLOP> ...
      <PREFIXTERM> ...
      <WILDTERM> ...
      "[" ...
      "{" ...
      <NUMBER> ...
   
   Any idea how to solve this?
   Maybe a bug? Or probably I'm missing something.
   
   Dario.


Re: DIH : modify document in sibling entity of root entity

2011-03-10 Thread Gora Mohanty
On Thu, Mar 10, 2011 at 8:42 PM, Chantal Ackermann
chantal.ackerm...@btelligent.de wrote:
[...]
 Is this supposed to work at all? I haven't found anything so far on the
 net but I could have used the wrong keywords for searching, of course.

 As answer to the maybe obvious question why I'm not using a subentity:
 I thought that this solution might be faster because it iterates over
 the second data source instead of hitting it with a query per each
 document.
[...]

I think that what you are after can be handled by Solr's
CachedSqlEntityProcessor:
http://wiki.apache.org/solr/DataImportHandler#CachedSqlEntityProcessor
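
From the wiki, the general usage pattern looks something like this (an untested
sketch with illustrative table and column names; x is the parent entity):

<entity name="x" query="select * from x">
    <entity name="y" query="select * from y"
            processor="CachedSqlEntityProcessor"
            where="xid=x.id"/>
</entity>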

Two major caveats here:
* I am not 100% sure that I have understood your requirements.
* The documentation for CachedSqlEntityProcessor needs to be improved.
  Will see if I can test it, and come up with a better example. As I have
  not actually used this, it could be that I have misunderstood its purpose.

Regards,
Gora


Re: disquery - difference qf qs / pf ps

2011-03-10 Thread Jonathan Rochkind

On 3/10/2011 8:15 AM, Gastone Penzo wrote:

Thank you very much. I understand the difference between qs and ps but not
what pf is... Is it necessary to use ps?


It's not necessary to use anything, including Solr.

pf: will take the entire query the user entered, make it into a single 
phrase, and boost documents within the already existing result set that 
match that phrase. pf does not change the result set, it just changes 
the ranking.
ps: will set the phrase query slop on that pf query of the entire entered 
search string, which affects the boosting.





Re: DIH : modify document in sibling entity of root entity

2011-03-10 Thread Chantal Ackermann
Hi Stefan,

thanks for your time!

No, the second entity is not reusing values from the previous one. It
just provides more fields for the documents, and, of course, the unique
identifier - which in the case of the second entity is not unique:

<document name="contributor">
    <entity name="contributor" pk="id" rootEntity="true"
        query="select CONTRIBUTOR_ID as id,
                      CONTRIBUTOR_NAME as name,
                      EXT_ID as extid
               from   DIM_CONTRIBUTOR">
    </entity>
    <entity name="appearance" pk="id" rootEntity="false"
        transformer="RegexTransformer"
        query="select CONTENTID as contentid,
                      SUBVALUE
               from   CONTENT_VALUE
               where  ID_ATTRIBUTE=170">
        <field column="ignore" sourceColName="SUBVALUE"
            groupNames="id,type,pos,character"
            regex="(\d+);(\d+);(\d+);([^;]*);\d*;[A-Z0-9]*;\d*" />
    </entity>
</document>


and here are the fields:

<field name="id" type="slong" indexed="true" stored="true"
    required="true" />
<field name="name" type="string" indexed="true" stored="true"
    required="true" termVectors="true" />
<field name="contentid" type="slong" indexed="true" stored="true"
    multiValued="true" />
<field name="character" type="string" indexed="true" stored="true"
    multiValued="true" termVectors="true" />
<field name="type" type="sint" indexed="true" stored="true"
    multiValued="true" />

(For the sake of simplicity I've removed some fields that would be
created using copyfield instructions and transformers.)

I'm currently trying to run this with a subentity, using the SQL
restriction SUBVALUE like '${contributor.id};%', but this takes ages...

The sibling version finished in under a minute (and it did actually process
the second entity, I think; it just didn't modify the index). The
current one has been running for about 30min and has only processed 22,000
documents out of more than 390,000. (Of course, there is probably no
index on that column.)


Thanks for any suggestions!
Chantal




On Thu, 2011-03-10 at 17:13 +0100, Stefan Matheis wrote:
 Hi Chantal,
 
 i'm not sure if i understood you correctly (if at all)? Two entities,
 not arranged as sub-entitiy, but using values from the previous
 entity? Could you paste your dataimport  the relevant part of the
 logging-output?
 
 Regards
 Stefan
 
 On Thu, Mar 10, 2011 at 4:12 PM, Chantal Ackermann
 chantal.ackerm...@btelligent.de wrote:
  Dear all,
 
  in DIH, is it possible to have two sibling entities where:
 
  - the first one is the root entity that creates the documents by
  iterating over a table that has one row per document.
  - the second one is executed after the completion of the first entity
  iteration, and it provides more data that is added to the newly created
  documents.
 
 
  I've set up such a dih configuration, and the second entity is executed,
  but no data is written into the index apart from the data extracted by
  the root entity  (=no document is modified?).
 
  Documents are identified by the unique key 'id' which is defined by
  pk=id on both entities.
 
  Is this supposed to work at all? I haven't found anything so far on the
  net but I could have used the wrong keywords for searching, of course.
 
  As answer to the maybe obvious question why I'm not using a subentity:
  I thought that this solution might be faster because it iterates over
  the second data source instead of hitting it with a query per each
  document.
 
  Anyway, the main reason I tried this is because I want to know whether
  it works. I'm still not sure whether it should work but I'm doing
  something wrong...
 
 
  Thanks!
  Chantal
 
 



Re: Math-generated fields during query

2011-03-10 Thread Peter Sturge
Hi Dan,

Yes, you're right - in fact that was precisely what I was thinking of
doing! Also looking at SOLR-1298 & SOLR-1566 - which would be good for
applying functions generically rather than on a per-use-case basis.

Thanks!
Peter


On Thu, Mar 10, 2011 at 3:58 PM, dan sutton danbsut...@gmail.com wrote:
 As a workaround, can you not have a search component run after the
 QueryComponent, have qty_ordered and unit_price as stored fields
 returned with the fl parameter, and have your custom component do
 the calc - unless you need to sort by this value too?

 Dan

 On Wed, Mar 9, 2011 at 10:06 PM, Peter Sturge peter.stu...@gmail.com wrote:
 Hi,

 I was wondering if it is possible during a query to create a returned
 field 'on the fly' (like function query, but for concrete values, not
 score).

 For example, if I input this query:
   q=_val_:product(15,3)&fl=*,score

 For every returned document, I get score = 45.

 If I change it slightly to add *:* like this:
   q=*:* _val_:product(15,3)&fl=*,score

 I get score = 32.526913.

 If I try my use case of _val_:product(qty_ordered,unit_price), I get
 varying scores depending on...well depending on something.

 I understand this is doing relevance scoring, but it doesn't seem to
 tally with the FunctionQuery Wiki
 [example at the bottom of the page]:

   q=boxname:findbox+_val_:product(product(x,y),z)&fl=*,score
 ...where score will contain the resultant volume.

 Is there a trick to getting not a score, but the actual value of
 quantity*price (e.g. product(5,2.21) == 11.05)?

 Many thanks




Re: DIH : modify document in sibling entity of root entity

2011-03-10 Thread Chantal Ackermann
Hi Gora,

thanks for making me read this part of the documentation again!
This processor probably cannot do what I need out of the box but I will
try to extend it to allow specifying a regular expression in its where
attribute.

Thanks!
Chantal

On Thu, 2011-03-10 at 17:39 +0100, Gora Mohanty wrote:
 On Thu, Mar 10, 2011 at 8:42 PM, Chantal Ackermann
 chantal.ackerm...@btelligent.de wrote:
 [...]
  Is this supposed to work at all? I haven't found anything so far on the
  net but I could have used the wrong keywords for searching, of course.
 
  As answer to the maybe obvious question why I'm not using a subentity:
  I thought that this solution might be faster because it iterates over
  the second data source instead of hitting it with a query per each
  document.
 [...]
 
 I think that what you are after can be handled by Solr's
 CachedSqlEntityProcessor:
 http://wiki.apache.org/solr/DataImportHandler#CachedSqlEntityProcessor
 
 Two major caveats here:
 * I am not 100% sure that I have understood your requirements.
 * The documentation for CachedSqlEntityProcessor needs to be improved.
   Will see if I can test it, and come up with a better example. As I have
   not actually used this, it could be that I have misunderstood its purpose.
 
 Regards,
 Gora



Custom fieldtype with sharding?

2011-03-10 Thread Peter Cline

Hi all,
I'm having an issue with using a custom fieldtype with distributed 
search.  It may be the case that what I'm looking for could be 
accomplished in a different way, but this is my first stab at it.


I'm looking to store XML in a field.  What I've done, which works fine, 
is to:

- on ingest, wrap the XML in a CDATA tag
- write a simple class that extends org.apache.solr.schema.TextField, 
which writes an XML node much in the way that a textfield would, but 
without escaping the contents


It looks like this:
public class XMLField extends TextField {
    @Override
    public void write(TextResponseWriter xmlWriter, String name,
                      Fieldable f) throws java.io.IOException {
        Writer writer = xmlWriter.getWriter();
        // open an unescaped <xml name="..."> element
        writer.write("<xml name=" + '"' + name + '"' + '>');
        // write the raw field value without escaping it
        writer.write(f.stringValue(), 0,
                     f.stringValue() == null ? 0 : f.stringValue().length());
        writer.write("</xml>");
    }
}

Like I said, simple.  Not especially pretty, but it does the job.  Works 
fine for normal searching, I get back a response like:

<xml name="xmlField"><xml-contents-unescaped/></xml>

When I try to use this with distributed searching, though, it comes back 
written as a normal textfield, like:

<str name="xmlField">&lt;xml-contents-have-been-escaped/&gt;</str>

It looks like it doesn't know anything about my custom fieldtype at all, 
and is defaulting to writing it as a StrField or TextField instead.


So, my question:
- is there a better way to do this?  I'd be fine if it came back with a 
'str' element name, as long as it's not escaped.
- is there perhaps a different class I should extend to do this with 
sharded searching?
- should I just bite the bullet and manually unescape the xml after 
receiving the response?  I'd really prefer not to do this if I can get 
around it.


Thanks in advance for any help.

Peter


Re: question regarding proper placement of geofilt in fq=

2011-03-10 Thread Bill Bell
Can you use 2 fq parameters? The default op is usually set to AND.
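
For example (untested), splitting them into two fq parameters:

/select?q=food&fq={!geofilt}&fq=b_type:T01&pt=38.029191,-78.479266&sfield=lat_long&d=80&qt=spatialdismax

Each fq is applied as a separate filter, so both are always intersected and you
avoid mixing local-params syntax with boolean operators.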

Bill Bell
Sent from mobile
 

On Mar 10, 2011, at 9:33 AM, Jerry Mindek jmin...@manta.com wrote:

 Hi,
 
 I am using rev 1036236 of solr trunk running as a servlet in Tomcat 7.
 The doc set is sharded over 11 shards.
 Currently, I have all the shards running in a single tomcat.
 
 Please see the bottom of the email for the bits of my schema.xml and 
 solrconfig.xml that might help you understand my configuration.
 
 I am seeing what I think is strange behavior when I try to use the geofilt in 
 a filter query.
 Here's what I am seeing:
 
 
 1. If I put the {!geofilt} as the last argument of the fq= parameter and
 I send the following distributed query to my sharded index:
 /select?start=0&rows=30&q=food&fq=b_type:shops AND 
 {!geofilt}&qt=spatialdismax&fl=*%2Cscore&facet=false&pt=38.029191,-78.479266&sfield=lat_long&d=80&shards=...
 I get a syntax error, which seems odd to me.
 
 
 2. If I move the {!geofilt} to the first position in the fq= and send
 the following distributed query:
 /select?start=0&rows=30&q=food&fq={!geofilt} AND 
 b_type:T01&qt=spatialdismax&fl=*%2Cscore&facet=false&pt=38.029191,-78.479266&sfield=lat_long&d=80&shards=...
 then only the geofilt is applied, not b_type:T01. Which seems odd to me; I 
 would expect both filters to be applied.
 
 
 3. Finally, when I submit this query:
 /select?start=0&rows=30&q=food&fq=_query_:"{!geofilt}" AND 
 b_type:T01&qt=spatialdismax&fl=*%2Cscore&facet=false&pt=38.029191,-78.479266&sfield=lat_long&d=80&shards=...
 this works as I had hoped, i.e. both the geofilt and the b_type filters are 
 applied.
 
 Am I trying to use geofilt in the wrong way or is this possibly a bug?
 
 Thanks,
 Jerry Mindek
 
 
 <!-- schema.xml -->
 <field name="cn" type="text" indexed="true" stored="true" required="true" />
 <field name="dn" type="string" indexed="true" stored="true" required="false" />
 <field name="t1" type="text" indexed="true" stored="true" />
 <field name="ts" type="string" indexed="true" stored="true"/>
 <field name="lb" type="text" indexed="true" stored="false" />
 <field name="sim" type="string" indexed="true" stored="true" />
 <field name="s4_s" type="text" indexed="true" stored="false" />
 <field name="stat" type="string" indexed="true" stored="true" />
 <field name="pst" type="text" indexed="true" stored="true" />
 <fieldType name="location" class="solr.LatLonType"
     subFieldSuffix="_coordinate"/>
 ...
 <field name="b_type" type="string" indexed="true" stored="true"/>
 <field name="lat_long" type="location" indexed="true" stored="true" />
 <!-- end snippet schema.xml -->
 
 <!-- solrconfig.xml -->
 <requestHandler name="spatialdismax" class="solr.DisMaxRequestHandler">
   <lst name="defaults">
     <str name="sort">score desc</str>
     <str name="facet">true</str>
     <str name="facet.mincount">1</str>
     <str name="echoParams">explicit</str>
     <int name="rows">20</int>
     <float name="tie">0.01</float>
     <str name="qf">
       cn^2.0 t1^2.0 ts^2.0 lb^2.0 s4_s^2.0 sim^2.0
     </str>
     <str name="pf">
       cn^2.0 t1^2.0 ts^2.0 lb^2.0 s4_s^2.0 sim^2.0
     </str>
     <str name="fl">dn, cn, t1, stat, pst, pct, ts, sv, score</str>
     <str name="mm">
       2&lt;-1 5&lt;-2 6&lt;90%
     </str>
     <int name="ps">100</int>
     <str name="q.alt">*:*</str>
   </lst>
 </requestHandler>
 <!-- end snippet solrconfig.xml -->
 


Re: question regarding proper placement of geofilt in fq=

2011-03-10 Thread Bill Bell
Also, _query_ is the right approach when using fq with 2 boolean clauses. Just 
make sure you double-quote the {!geofilt} when using that.

Bill Bell
Sent from mobile


On Mar 10, 2011, at 9:33 AM, Jerry Mindek jmin...@manta.com wrote:

 Hi,
 
 I am using rev 1036236 of solr trunk running as a servlet in Tomcat 7.
 The doc set is sharded over 11 shards.
 Currently, I have all the shards running in a single tomcat.
 
 Please see the bottom of the email for the bits of my schema.xml and 
 solrconfig.xml that might help you understand my configuration.
 
 I am seeing what I think is strange behavior when I try to use the geofilt in 
 a filter query.
 Here's what I am seeing:
 
 
 1. If I put the {!geofilt} as the last argument of the fq= parameter and
 I send the following distributed query to my sharded index:
 /select?start=0&rows=30&q=food&fq=b_type:shops AND 
 {!geofilt}&qt=spatialdismax&fl=*%2Cscore&facet=false&pt=38.029191,-78.479266&sfield=lat_long&d=80&shards=...
 I get a syntax error, which seems odd to me.
 
 
 2. If I move the {!geofilt} to the first position in the fq= and send
 the following distributed query:
 /select?start=0&rows=30&q=food&fq={!geofilt} AND 
 b_type:T01&qt=spatialdismax&fl=*%2Cscore&facet=false&pt=38.029191,-78.479266&sfield=lat_long&d=80&shards=...
 then only the geofilt is applied, not b_type:T01. Which seems odd to me; I 
 would expect both filters to be applied.
 
 
 3. Finally, when I submit this query:
 /select?start=0&rows=30&q=food&fq=_query_:"{!geofilt}" AND 
 b_type:T01&qt=spatialdismax&fl=*%2Cscore&facet=false&pt=38.029191,-78.479266&sfield=lat_long&d=80&shards=...
 this works as I had hoped, i.e. both the geofilt and the b_type filters are 
 applied.
 
 Am I trying to use geofilt in the wrong way or is this possibly a bug?
 
 Thanks,
 Jerry Mindek
 
 
 <!-- schema.xml -->
 <field name="cn" type="text" indexed="true" stored="true" required="true" />
 <field name="dn" type="string" indexed="true" stored="true" required="false" />
 <field name="t1" type="text" indexed="true" stored="true" />
 <field name="ts" type="string" indexed="true" stored="true"/>
 <field name="lb" type="text" indexed="true" stored="false" />
 <field name="sim" type="string" indexed="true" stored="true" />
 <field name="s4_s" type="text" indexed="true" stored="false" />
 <field name="stat" type="string" indexed="true" stored="true" />
 <field name="pst" type="text" indexed="true" stored="true" />
 <fieldType name="location" class="solr.LatLonType"
     subFieldSuffix="_coordinate"/>
 ...
 <field name="b_type" type="string" indexed="true" stored="true"/>
 <field name="lat_long" type="location" indexed="true" stored="true" />
 <!-- end snippet schema.xml -->
 
 <!-- solrconfig.xml -->
 <requestHandler name="spatialdismax" class="solr.DisMaxRequestHandler">
   <lst name="defaults">
     <str name="sort">score desc</str>
     <str name="facet">true</str>
     <str name="facet.mincount">1</str>
     <str name="echoParams">explicit</str>
     <int name="rows">20</int>
     <float name="tie">0.01</float>
     <str name="qf">
       cn^2.0 t1^2.0 ts^2.0 lb^2.0 s4_s^2.0 sim^2.0
     </str>
     <str name="pf">
       cn^2.0 t1^2.0 ts^2.0 lb^2.0 s4_s^2.0 sim^2.0
     </str>
     <str name="fl">dn, cn, t1, stat, pst, pct, ts, sv, score</str>
     <str name="mm">
       2&lt;-1 5&lt;-2 6&lt;90%
     </str>
     <int name="ps">100</int>
     <str name="q.alt">*:*</str>
   </lst>
 </requestHandler>
 <!-- end snippet solrconfig.xml -->
 


Re: docBoost

2011-03-10 Thread Brian Lamb
Okay I think I have the idea:

<dataConfig>
  <dataSource type="JdbcDataSource"
      name="animals"
      batchSize="-1"
      driver="com.mysql.jdbc.Driver"
      url="jdbc:mysql://localhost/animals?characterEncoding=UTF8&amp;zeroDateTimeBehavior=convertToNull"
      user="user"
      password="pass"/>
  <script><![CDATA[
    function BoostScores(row) {
      // if searching for recommendations add in the boost score
      if (some_condition) {
        row.put('$docBoost', row.get('boost_score'));
      } // end if (some_condition)
      return row;
    } // end function BoostScores(row)
  ]]></script>
  <document>
    <entity name="animal"
        dataSource="animals"
        pk="id"
        transformer="script:BoostScores"
        query="SELECT * FROM animals">
      <field column="id" name="id" />
      <field column="genus" name="genus" />
      <field column="species" name="species" />
      <entity name="boosters"
          dataSource="boosts"
          query="SELECT boost_score FROM boosts WHERE animal_id=${animal.id}">
        <field column="boost_score" name="boost_score" />
      </entity>
    </entity>
  </document>
</dataConfig>

Now, am I right in thinking that the boost score is only applied when the data is
loaded? If so, that's close to what I want to do but not exactly. I would
like to load all the data without boosting any scores, but store what the
boost score would be. And then, depending on the search, boost scores by that
value.

For example, if a user searches for dog, they would get search results that
were unboosted.

However, I would also want the option to pass in a flag of some kind so that
if a user searches for dog, they would get search results with the boost
score factored in. Ideally it would be something like:

Regular search: http://localhost/solr/search/?q=dog
Boosted search: http://localhost/solr/search?q=dog&boost=true

To achieve this, would it be applied in the data import handler? If so, what
would I need to put in for some_condition?

Thanks for all the help so far. I truly do appreciate it.

Thanks,

Brian Lamb

On Wed, Mar 9, 2011 at 11:50 PM, Bill Bell billnb...@gmail.com wrote:

 Yes, just add an if statement based on a field type, and do a row.put() only if
 that other value is a certain value.



 On 3/9/11 1:39 PM, Brian Lamb brian.l...@journalexperts.com wrote:

 That makes sense. As a follow up, is there a way to only conditionally use
 the boost score? For example, in some cases I want to use the boost score
 and in other cases I want all documents to be treated equally.
 
 On Wed, Mar 9, 2011 at 2:42 PM, Jayendra Patil
 jayendra.patil@gmail.com
  wrote:
 
  you can use the ScriptTransformer to perform the boost calcualtion and
  addition.
  http://wiki.apache.org/solr/DataImportHandler#ScriptTransformer
 
   <dataConfig>
      <script><![CDATA[
      function f1(row) {
          // Add boost
          row.put('$docBoost', 1.5);
          return row;
      }
      ]]></script>
      <document>
      <entity name="e" pk="id" transformer="script:f1"
       query="select * from X">
      </entity>
      </document>
   </dataConfig>
 
  Regards,
  Jayendra
 
 
  On Wed, Mar 9, 2011 at 2:01 PM, Brian Lamb
  brian.l...@journalexperts.com wrote:
   Anyone have any clue on this on?
  
   On Tue, Mar 8, 2011 at 2:11 PM, Brian Lamb 
  brian.l...@journalexperts.comwrote:
  
   Hi all,
  
   I am using dataimport to create my index and I want to use docBoost
 to
   assign some higher weights to certain docs. I understand the concept
  behind
   docBoost but I haven't been able to find an example anywhere that
 shows
  how
   to implement it. Assuming the following config file:
  
    <document>
       <entity name="animal"
          dataSource="animals"
          pk="id"
          query="SELECT * FROM animals">
        <field column="id" name="id" />
        <field column="genus" name="genus" />
        <field column="species" name="species" />
        <entity name="boosters"
           dataSource="boosts"
           query="SELECT boost_score FROM boosts WHERE animal_id = ${animal.id}">
          <field column="boost_score" name="boost_score" />
        </entity>
       </entity>
    </document>
  
   How do I add in a docBoost score? The boost score is currently in a
   separate table as shown above.
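
On the toggle-per-request part of the question: $docBoost is an index-time
boost, fixed when the document is imported, so it cannot be switched on and
off per query. A hedged sketch of the usual alternative (bf is a dismax
parameter; boost_score is the field from the config above): index
boost_score as an ordinary field, without $docBoost, and add it as a boost
function only on the boosted variant of the search:

Regular search: http://localhost/solr/search/?q=dog
Boosted search: http://localhost/solr/search/?q=dog&defType=dismax&bf=boost_score

With bf, the function's value is added to the relevance score, so the
regular search stays untouched.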
  
  
 





Re: Sorting

2011-03-10 Thread Brian Lamb
Any ideas on this one?

On Wed, Mar 9, 2011 at 2:00 PM, Brian Lamb brian.l...@journalexperts.comwrote:

 Hi all,

 I know that I can add sort=score desc to the url to sort in descending
 order. However, I would like to sort a MoreLikeThis response which returns
 records like this:

 <lst name="moreLikeThis">
   <result name="3" numFound="113611" start="0" maxScore="0.4392774"/>
   <result name="2" numFound="" start="0" maxScore="0.5392774"/>
 </lst>

 I don't want them grouped by result; I would just like to have them all thrown
 together and then sorted by score. I have an XSLT which does put
 them all together and returns the following:

 <moreLikeThis>
   <similar>
     <score>x.</score>
     <id>some_id</id>
   </similar>
 </moreLikeThis>

 However it appears that it basically applies the stylesheet to <result name="3">
 and then to <result name="2">.

 How can I make it so that with my XSLT, the results appear sorted by
 score?
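
One way to do this in the stylesheet itself is xsl:sort. A minimal sketch,
assuming the default XML response writer layout (doc elements sit under
lst[@name='moreLikeThis']/result, each carrying a float[@name='score']);
the template structure and the id field name are illustrative, not from
the original post:

<xsl:template match="lst[@name='moreLikeThis']">
  <moreLikeThis>
    <!-- Select the docs from all result elements at once, then sort the
         combined set by score, descending. -->
    <xsl:apply-templates select="result/doc">
      <xsl:sort select="float[@name='score']" data-type="number" order="descending"/>
    </xsl:apply-templates>
  </moreLikeThis>
</xsl:template>

<xsl:template match="doc">
  <similar>
    <score><xsl:value-of select="float[@name='score']"/></score>
    <id><xsl:value-of select="str[@name='id']"/></id>
  </similar>
</xsl:template>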



Solr

2011-03-10 Thread yazhini.k vini
Hi ,

I need notes and details about Solr. I am now working with Solr, so I need help.


Regards ,

Yazhini . K
 NCSI ,
 M.Sc ( Software Engineering ) .


Re: Possible to sort in .xml file?

2011-03-10 Thread Chris Hostetter

: I know its possible to do via adding sort= , but the Perl module
: (WebService::Solr) doesn't seem to offer the option to pass in this value :(

according to the docs, you can pass any query params you want to the search 
method...

http://search.cpan.org/~bricas/WebService-Solr-0.11/lib/WebService/Solr.pm#search%28_$query,_\%options_%29

 All key-value pairs supplied in \%options are serialized in the request 
 URL.


-Hoss
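
For the record, a short sketch with WebService::Solr building on the doc
quoted above (the exact sort string is an assumption; note that each sort
field, including score, needs an explicit direction):

use WebService::Solr;

my $solr = WebService::Solr->new('http://localhost:8983/solr');
# Extra key/value pairs in the options hashref are serialized onto the
# request URL, so 'sort' passes straight through to Solr.
my $response = $solr->search('dog', { sort => 'document_views asc, score desc' });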


Re: Solr

2011-03-10 Thread Geert-Jan Brits
Start by reading  http://wiki.apache.org/solr/FrontPage and the provided
links (introduction, tutorial, etc. )

2011/3/10 yazhini.k vini yazhini@gmail.com

 Hi ,

 I need notes and details about Solr. I am now working with Solr, so I need help.


 Regards ,

 Yazhini . K
  NCSI ,
  M.Sc ( Software Engineering ) .



If statements in DataImportHandler?

2011-03-10 Thread Jason Rutherglen
Is it possible to conditionally load sub-entities in
DataImportHandler, based on the gathered value of parent entities?


Re: New PHP API for Solr (Logic Solr API)

2011-03-10 Thread Liam O'Boyle
How about the Solr PHP Client (http://code.google.com/p/solr-php-client/)?
 We use this and have been quite happy with it, and it seems that it
addresses all of the concerns you expressed.

What advantages does yours offer?

Liam

On 8 March 2011 17:02, Burak burak...@gmail.com wrote:

 On 03/07/2011 12:43 AM, Stefan Matheis wrote:

 Burak,

 what's wrong with the existing PHP-Extension
 (http://php.net/manual/en/book.solr.php)?

 I think wrong is not the appropriate word here. But if I had to summarize
 why I wrote this API:

 * Not everybody is enthusiastic about adding another item to an already
 long list of server dependencies. I just wanted a pure PHP option.
 * I am not a C programmer either so the ability to understand the source
 code and modify it according to my needs is another advantage.
 * Yes, a PECL package would be faster. However, in 99% of the cases, after
 everything is said, coded, and byte-code cached, my biggest bottlenecks end
 up being the database and network.
 * Last of all, choice is what open source means to me.

 Burak


-- 
Liam O'Boyle

IntelligenceBank Pty Ltd
Level 1, 31 Coventry Street Southbank, Victoria 3006, Australia
P:   +613 8618 7810   F:   +613 8618 7899   M: +61 403 88 66 44

*Awarded 2010 Best New Business and Business of the Year - Business3000
Awards*

This email and any attachments are confidential and may contain legally
privileged information or copyright material. If you are not an intended
recipient, please contact us at once by return email and then delete both
messages. We do not accept liability in connection with transmission of
information using the internet.


Solr and Permissions

2011-03-10 Thread Liam O'Boyle
Morning,

We use solr to index a range of content to which, within our application,
access is restricted by a system of user groups and permissions.  In order
to ensure that search results don't reveal information about items which the
user doesn't have access to, we need to somehow filter the results; this
needs to be done within Solr itself, rather than after retrieval, so that
the facet and result counts are correct.

Currently we do this by creating a filter query which specifies all of the
items which may be allowed to match (e.g. id: (foo OR bar OR blarg OR ...)),
but this has definite scalability issues - we're starting to run into
issues, as this can be a set of ORs of potentially unlimited size (and
practically, we're hitting the low thousands sometimes).  While we can
adjust maxBooleanClauses upwards, I understand that this has performance
implications...

So, has anyone had to implement something similar in the past?  Any
suggestions for a more scalable approach?  Any advice on safe and sensible
limits on how far I can push maxBooleanClauses?

Thanks for your advice,

Liam
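
A common pattern for this, sketched below under assumptions (the acl field
and the group ids are illustrative, not from the original post): index the
authorization data on each document, e.g. a multivalued "acl" field holding
the ids of the groups allowed to see it, and filter on the current user's
groups. The fq then grows with the user's group count rather than with the
number of visible items.

import org.apache.solr.client.solrj.SolrQuery;

import java.util.Arrays;
import java.util.List;

public class AclFilter {
    // Build an fq such as acl:(sales OR managers) from the user's groups,
    // so the clause count tracks group membership, not item count.
    public static String forGroups(List<String> groupIds) {
        StringBuilder fq = new StringBuilder("acl:(");
        for (int i = 0; i < groupIds.size(); i++) {
            if (i > 0) fq.append(" OR ");
            fq.append(groupIds.get(i));
        }
        return fq.append(")").toString();
    }

    public static void main(String[] args) {
        SolrQuery q = new SolrQuery("annual report");
        q.addFilterQuery(forGroups(Arrays.asList("sales", "managers")));
        System.out.println(q);  // roughly: q=annual+report&fq=acl%3A%28sales+OR+managers%29
    }
}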


Re: Solr and Permissions

2011-03-10 Thread Sujit Pal
How about assigning content types to documents in the index, and map
users to a set of content types they are allowed to access? That way you
will pass in fewer parameters in the fq.

-sujit

On Fri, 2011-03-11 at 11:53 +1100, Liam O'Boyle wrote:
 Morning,
 
 We use solr to index a range of content to which, within our application,
 access is restricted by a system of user groups and permissions.  In order
 to ensure that search results don't reveal information about items which the
 user doesn't have access to, we need to somehow filter the results; this
 needs to be done within Solr itself, rather than after retrieval, so that
 the facet and result counts are correct.
 
 Currently we do this by creating a filter query which specifies all of the
 items which may be allowed to match (e.g. id: (foo OR bar OR blarg OR ...)),
 but this has definite scalability issues - we're starting to run into
 issues, as this can be a set of ORs of potentially unlimited size (and
 practically, we're hitting the low thousands sometimes).  While we can
 adjust maxBooleanClauses upwards, I understand that this has performance
 implications...
 
 So, has anyone had to implement something similar in the past?  Any
 suggestions for a more scalable approach?  Any advice on safe and sensible
 limits on how far I can push maxBooleanClauses?
 
 Thanks for your advice,
 
 Liam



Re: If statements in DataImportHandler?

2011-03-10 Thread Gora Mohanty
On Fri, Mar 11, 2011 at 4:48 AM, Jason Rutherglen
jason.rutherg...@gmail.com wrote:
 Is it possible to conditionally load sub-entities in
 DataImportHandler, based on the gathered value of parent entities?

Probably the easiest way to do that is with a transformer.
Please see the DIH Wiki page for details:
http://wiki.apache.org/solr/DataImportHandler#Transformer

Regards,
Gora


Re: If statements in DataImportHandler?

2011-03-10 Thread Jason Rutherglen
Right, that's not within the XML though, and it's unclear how to
access the upper-level entities that have already been instantiated,
e.g., beyond the given 'transform' row.

On Thu, Mar 10, 2011 at 8:02 PM, Gora Mohanty g...@mimirtech.com wrote:
 On Fri, Mar 11, 2011 at 4:48 AM, Jason Rutherglen
 jason.rutherg...@gmail.com wrote:
 Is it possible to conditionally load sub-entities in
 DataImportHandler, based on the gathered value of parent entities?

 Probably the easiest way to do that is with a transformer.
 Please see the DIH Wiki page for details:
 http://wiki.apache.org/solr/DataImportHandler#Transformer

 Regards,
 Gora



Re: Solr and Permissions

2011-03-10 Thread go canal
I have similar requirements.

Content type is one solution, but there are also other use cases where this is not 
enough.

Another requirement: when the access permission is changed, we need to update 
the field - my understanding is that we cannot, unless we re-index the whole document 
again. Am I correct?
 thanks,
canal





From: Sujit Pal sujit@comcast.net
To: solr-user@lucene.apache.org
Sent: Fri, March 11, 2011 10:39:27 AM
Subject: Re: Solr and Permissions

How about assigning content types to documents in the index, and map
users to a set of content types they are allowed to access? That way you
will pass in fewer parameters in the fq.

-sujit

On Fri, 2011-03-11 at 11:53 +1100, Liam O'Boyle wrote:
 Morning,
 
 We use solr to index a range of content to which, within our application,
 access is restricted by a system of user groups and permissions.  In order
 to ensure that search results don't reveal information about items which the
 user doesn't have access to, we need to somehow filter the results; this
 needs to be done within Solr itself, rather than after retrieval, so that
 the facet and result counts are correct.
 
 Currently we do this by creating a filter query which specifies all of the
 items which may be allowed to match (e.g. id: (foo OR bar OR blarg OR ...)),
 but this has definite scalability issues - we're starting to run into
 issues, as this can be a set of ORs of potentially unlimited size (and
 practically, we're hitting the low thousands sometimes).  While we can
 adjust maxBooleanClauses upwards, I understand that this has performance
 implications...
 
 So, has anyone had to implement something similar in the past?  Any
 suggestions for a more scalable approach?  Any advice on safe and sensible
 limits on how far I can push maxBooleanClauses?
 
 Thanks for your advice,
 
 Liam


  

Re: If statements in DataImportHandler?

2011-03-10 Thread Gora Mohanty
On Fri, Mar 11, 2011 at 10:23 AM, Jason Rutherglen
jason.rutherg...@gmail.com wrote:
 Right that's not within the XML however, and it's unclear how to
 access the upper level entities that have already been instantiated,
 eg, beyond the given 'transform' row.

The second example for a ScriptTransformer in
http://wiki.apache.org/solr/DataImportHandler#Transformer
should give you an idea of how to proceed:
* row.get( 'category' ) gets the field 'category' from the
  current entity to which the ScriptTransformer is being
  applied.
* Fields from higher-level entities will need to be passed
  in using DIH variables. E.g., if you have a higher-level
  entity called 'parent', and are getting data from the current
  entity via a database select, e.g.,
     <entity ... query="select category from mytable">
  you will need to modify the query to something like
     <entity ... transformer="script:mytrans" query="select category,
  ${parent.id} as id from mytable">
  and add
     <field column="id" />
  inside the current entity (cannot remember now if this is
  required, or can be dispensed with).

Regards,
Gora
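
To make that concrete, here is a minimal sketch (table names, field names,
and the condition are all assumed) combining the two ideas above: the
parent's value is aliased into the child entity's query, and a
ScriptTransformer on the child sets the special $skipRow flag. Note this
filters child rows after the fact; it does not avoid running the
sub-entity query itself.

<dataConfig>
  <script><![CDATA[
    // Drop child rows unless the parent's category qualifies.
    // 'category' arrives via the ${animal.category} alias below.
    function filterChild(row) {
      if (row.get('category') != 'mammal') {   // assumed condition
        row.put('$skipRow', 'true');
      }
      return row;
    }
  ]]></script>
  <document>
    <entity name="animal" query="SELECT id, category FROM animals">
      <entity name="details"
              transformer="script:filterChild"
              query="SELECT note, '${animal.category}' AS category
                     FROM notes WHERE animal_id = ${animal.id}">
        <field column="note" name="note"/>
      </entity>
    </entity>
  </document>
</dataConfig>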


Re: DIH : modify document in sibling entity of root entity

2011-03-10 Thread Lance Norskog
The DIH is strictly tree-structured. Data flows down the tree. If the
first sibling is the root entity, nothing is used from the second
sibling. This is a configuration on which the DIH should fail.

On Thu, Mar 10, 2011 at 9:14 AM, Chantal Ackermann
chantal.ackerm...@btelligent.de wrote:
 Hi Gora,

 thanks for making me read this part of the documentation again!
 This processor probably cannot do what I need out of the box but I will
 try to extend it to allow specifying a regular expression in its where
 attribute.

 Thanks!
 Chantal

 On Thu, 2011-03-10 at 17:39 +0100, Gora Mohanty wrote:
 On Thu, Mar 10, 2011 at 8:42 PM, Chantal Ackermann
 chantal.ackerm...@btelligent.de wrote:
 [...]
  Is this supposed to work at all? I haven't found anything so far on the
  net but I could have used the wrong keywords for searching, of course.
 
  As answer to the maybe obvious question why I'm not using a subentity:
  I thought that this solution might be faster because it iterates over
  the second data source instead of hitting it with a query per each
  document.
 [...]

 I think that what you are after can be handled by Solr's
 CachedSqlEntityProcessor:
 http://wiki.apache.org/solr/DataImportHandler#CachedSqlEntityProcessor

 Two major caveats here:
 * I am not 100% sure that I have understood your requirements.
 * The documentation for CachedSqlEntityProcessor needs to be improved.
   Will see if I can test it, and come up with a better example. As I have
   not actually used this, it could be that I have misunderstood its purpose.

 Regards,
 Gora





-- 
Lance Norskog
goks...@gmail.com
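
For reference, the wiki pattern Gora points to looks roughly like this
(table and column names assumed): the child entity's full result set is
read once and cached, and the where attribute does an in-memory lookup per
parent row instead of issuing one query per document.

<entity name="x" query="SELECT * FROM x">
  <entity name="y"
          processor="CachedSqlEntityProcessor"
          query="SELECT * FROM y"
          where="xid=x.id">
  </entity>
</entity>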


Re: Solr and Permissions

2011-03-10 Thread Liam O'Boyle
As Canal points out, grouping into types is not always possible.

In our case, permissions are not at a per-type level, but either per
folder (of which there can be hundreds) or per item in some cases (of
which there can be... any number at all).

Reindexing is also too slow to really be an option; some of the items use
Tika to extract content, which means that we would need to re-extract the
content (a variable length of time; the average is about half a second, but
on some documents it will sit there until the connection times out).
Querying the document, modifying it, then resubmitting it without rerunning
content extraction is still faster, but involves sending even more data
over the network; either way is relatively slow.

Liam

On 11 March 2011 16:24, go canal goca...@yahoo.com wrote:

 I have similar requirements.

 Content type is one solution; but there are also other use cases where this
 not
 enough.

 Another requirement is, when the access permission is changed, we need to
 update
 the field - my understanding is we can not unless re-index the whole
 document
 again. Am I correct?
  thanks,
 canal




 
 From: Sujit Pal sujit@comcast.net
 To: solr-user@lucene.apache.org
 Sent: Fri, March 11, 2011 10:39:27 AM
 Subject: Re: Solr and Permissions

 How about assigning content types to documents in the index, and map
 users to a set of content types they are allowed to access? That way you
 will pass in fewer parameters in the fq.

 -sujit

 On Fri, 2011-03-11 at 11:53 +1100, Liam O'Boyle wrote:
  Morning,
 
  We use solr to index a range of content to which, within our application,
  access is restricted by a system of user groups and permissions.  In
 order
  to ensure that search results don't reveal information about items which
 the
  user doesn't have access to, we need to somehow filter the results; this
  needs to be done within Solr itself, rather than after retrieval, so that
  the facet and result counts are correct.
 
  Currently we do this by creating a filter query which specifies all of
 the
  items which may be allowed to match (e.g. id: (foo OR bar OR blarg OR
 ...)),
  but this has definite scalability issues - we're starting to run into
  issues, as this can be a set of ORs of potentially unlimited size (and
  practically, we're hitting the low thousands sometimes).  While we can
  adjust maxBooleanClauses upwards, I understand that this has performance
  implications...
 
  So, has anyone had to implement something similar in the past?  Any
  suggestions for a more scalable approach?  Any advice on safe and
 sensible
  limits on how far I can push maxBooleanClauses?
 
  Thanks for your advice,
 
  Liam







-- 
Liam O'Boyle

IntelligenceBank Pty Ltd
Level 1, 31 Coventry Street Southbank, Victoria 3006, Australia
P:   +613 8618 7810   F:   +613 8618 7899   M: +61 403 88 66 44

*Awarded 2010 Best New Business and Business of the Year - Business3000
Awards*



Re: Solr and Permissions

2011-03-10 Thread go canal
To be fair, I think there is a slight difference between a content management 
system and a search engine.

Access control at the per-document level or the per-type level, support for dynamic 
role changes, etc. are more like content management use cases, whereas a search 
solution like Solr focuses on a different set of use cases.

But in the real world, any content management system needs full-text search; so the 
question is how to support search with permission control.

JackRabbit integrates with Lucene/Tika; this could be one solution, but I do not 
know its performance and scalability.

CouchDB also integrates with Lucene/Tika - another option?

I have yet to see a search engine that provides the sort of content management 
features we are discussing here (Solr, Elastic Search?).

The last option is probably to build an application that pairs a document 
repository, with all the necessary content management features, with Solr providing 
search capability - and handle the permissions outside Solr?
thanks,
canal





From: Liam O'Boyle liam.obo...@intelligencebank.com
To: solr-user@lucene.apache.org
Cc: go canal goca...@yahoo.com
Sent: Fri, March 11, 2011 2:28:19 PM
Subject: Re: Solr and Permissions

As Canal points out,  grouping into types is not always possible.

In our case, permissions are not on a per-type level, but either on a per
folder (of which there can be hundreds) or per item in some cases (of
which there can be... any number at all).

Reindexing is also to slow to really be an option; some of the items use
Tika to extract content, which means that we need to reextract the content
(variable length of time; average is about half a second, but on some
documents it will sit there until the connection times out) .  Querying it,
modifying then resubmitting without rerunning content extraction is still
faster, but involves sending even more data over the network; either way is
relatively slow.

Liam

On 11 March 2011 16:24, go canal goca...@yahoo.com wrote:

 I have similar requirements.

 Content type is one solution; but there are also other use cases where this
 not
 enough.

 Another requirement is, when the access permission is changed, we need to
 update
 the field - my understanding is we can not unless re-index the whole
 document
 again. Am I correct?
  thanks,
 canal




 
 From: Sujit Pal sujit@comcast.net
 To: solr-user@lucene.apache.org
 Sent: Fri, March 11, 2011 10:39:27 AM
 Subject: Re: Solr and Permissions

 How about assigning content types to documents in the index, and map
 users to a set of content types they are allowed to access? That way you
 will pass in fewer parameters in the fq.

 -sujit

 On Fri, 2011-03-11 at 11:53 +1100, Liam O'Boyle wrote:
  Morning,
 
  We use solr to index a range of content to which, within our application,
  access is restricted by a system of user groups and permissions.  In
 order
  to ensure that search results don't reveal information about items which
 the
  user doesn't have access to, we need to somehow filter the results; this
  needs to be done within Solr itself, rather than after retrieval, so that
  the facet and result counts are correct.
 
  Currently we do this by creating a filter query which specifies all of
 the
  items which may be allowed to match (e.g. id: (foo OR bar OR blarg OR
 ...)),
  but this has definite scalability issues - we're starting to run into
  issues, as this can be a set of ORs of potentially unlimited size (and
  practically, we're hitting the low thousands sometimes).  While we can
  adjust maxBooleanClauses upwards, I understand that this has performance
  implications...
 
  So, has anyone had to implement something similar in the past?  Any
  suggestions for a more scalable approach?  Any advice on safe and
 sensible
  limits on how far I can push maxBooleanClauses?
 
  Thanks for your advice,
 
  Liam







-- 
Liam O'Boyle

IntelligenceBank Pty Ltd
Level 1, 31 Coventry Street Southbank, Victoria 3006, Australia
P:   +613 8618 7810   F:   +613 8618 7899   M: +61 403 88 66 44

*Awarded 2010 Best New Business and Business of the Year - Business3000
Awards*




  

Problem with copyfield

2011-03-10 Thread nidhi gupta
I want to implement a type-ahead feature for the description field. For that I 
defined the ngtext field type. I indexed description as text and then, using 
copyField, indexed it into the ngtext field. But I found out that it is not working.
If I put ngtext directly as the field's type value, without using copyField, it 
works fine.
I am not able to understand the reason behind this.

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"
            enablePositionIncrements="true" />
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1"
            catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"
            enablePositionIncrements="true" />
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1"
            catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
<fieldType name="ngtext" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="50"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
<field name="description" type="text" indexed="true" stored="true" />
<copyField source="id" dest="ng_text"/>
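
Two things stand out in the snippet above: the copyField copies id rather
than description, and no destination field is declared - copyField needs a
real <field> whose type is ngtext. A sketch of the likely intent (the
destination field name is assumed):

<field name="description_ng" type="ngtext" indexed="true" stored="false"/>
<copyField source="description" dest="description_ng"/>

copyField copies the raw source value, so the destination field's own
analyzer (KeywordTokenizer + lowercase + EdgeNGram) is what gets applied
at index time.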