Re: How does DIH multithreading work?

2010-10-27 Thread markwaddle

Anyone know how it works?
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/How-does-DIH-multithreading-work-tp1776111p1784419.html
Sent from the Solr - User mailing list archive at Nabble.com.


How does DIH multithreading work?

2010-10-26 Thread markwaddle

I understand that the thread count is specified on root entities only. Does
it spawn multiple threads per root entity? Or multiple threads per
descendant entity? Can someone give an example of how you would make a
database query in an entity with 4 threads that would select 1 row per
thread?

Thanks,
Mark
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/How-does-DIH-multithreading-work-tp1776111p1776111.html


Re: DIH with several Cores

2010-10-25 Thread markwaddle

Unfortunately, what you are asking for is not possible. The DIH needs to be
configured separately for each core. I have a similar situation with my Solr
application. I am solving it by creating a custom index feeder that is aware
of all of the cores and which documents to send to which cores.
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/DIH-wiht-several-Cores-tp1767883p1769794.html


Unexpected boolean query behavior

2010-01-14 Thread markwaddle

Here is my query:
(virt* AND machine fingerprinting) OR (virt* AND encryption) OR (virt* AND
anonymous) OR (virt* AND analytic*) AND owned:true

It can be broken down to:
(A) OR (B) OR (C) OR (D) AND E

A, B, C and D are themselves AND boolean clauses.

The E clause at the end is not behaving the way I would expect. No matter
how I order the A,B,C and D clauses, it always returns the equivalent of
((D) AND E).

When I add additional parentheses it behaves the way I expect. Like:
((A) OR (B) OR (C) OR (D)) AND E
or
(A) OR (B) OR (C) OR ((D) AND E)

Can anyone explain why it behaves the way it does without the parentheses?
Is there something I am missing in the way it processes boolean clauses?

Thanks,
Mark
-- 
View this message in context: 
http://old.nabble.com/Unexpected-boolean-query-behavior-tp27166967p27166967.html



Re: Unexpected boolean query behavior

2010-01-14 Thread markwaddle

That is a reasonable question. The problem here is that my users have already
created numerous queries just like this one, using ANDs and ORs. My users
are very technical and they have been using the results of these queries for
months now to perform analysis that drives business decisions. I need an
explanation for why this is happening, both so I can train them to use it
more effectively and so I can restore their trust in the search
application.

Does anyone understand this behavior? Or can you recommend a place for me to
look?


Otis Gospodnetic wrote:
 
 Mark,
 
 Does it help if you rewrite your query using +/- syntax (required,
 prohibited), or nothing for should?  Because that's what happens under
 the hood (terms are required, prohibited, or should occur).
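
 To make the required/should mapping concrete, here is a minimal sketch
 of how the top-level clauses of (A) OR (B) OR (C) OR (D) AND E could end
 up flagged. It assumes the classic Lucene query parser with OR as the
 default operator, and it is a simplified model rather than the parser's
 actual code: ClauseFlattener, Conj and flatten are hypothetical names
 made up for this illustration, and prohibited (NOT / -) clauses are left
 out.

```java
import java.util.ArrayList;
import java.util.List;

/** Simplified model of left-to-right occur-flag assignment (not Lucene code). */
class ClauseFlattener {
    enum Occur { SHOULD, MUST }
    enum Conj { NONE, AND, OR }

    /** One top-level clause and the flag it currently carries. */
    static final class Clause {
        final String text;
        Occur occur;
        Clause(String text, Occur occur) { this.text = text; this.occur = occur; }
    }

    /**
     * names[i] is the clause text and conjs[i] the operator written in front
     * of it (NONE for the first clause). Returns the +/- style rendering.
     */
    static String flatten(String[] names, Conj[] conjs) {
        List<Clause> clauses = new ArrayList<>();
        for (int i = 0; i < names.length; i++) {
            boolean introducedByAnd = conjs[i] == Conj.AND;
            // AND marks the clause just before it as required...
            if (introducedByAnd && !clauses.isEmpty()) {
                clauses.get(clauses.size() - 1).occur = Occur.MUST;
            }
            // ...and the clause after it as required; OR leaves it optional.
            clauses.add(new Clause(names[i], introducedByAnd ? Occur.MUST : Occur.SHOULD));
        }
        StringBuilder sb = new StringBuilder();
        for (Clause c : clauses) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(c.occur == Occur.MUST ? "+" : "").append(c.text);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String[] names = {"(A)", "(B)", "(C)", "(D)", "E"};
        Conj[] conjs = {Conj.NONE, Conj.OR, Conj.OR, Conj.OR, Conj.AND};
        System.out.println(flatten(names, conjs)); // prints (A) (B) (C) +(D) +E
    }
}
```

 Once any clause is required, the optional ones only influence scoring,
 so the matching set is the same as (D) AND E. Under this model, operators
 are applied pairwise from left to right with no precedence between AND
 and OR, which is why explicit parentheses change the result.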
 
 
 Otis
 --
 Sematext -- http://sematext.com/ -- Solr - Lucene - Nutch
 
 
 
 - Original Message 
 From: markwaddle m...@markwaddle.com
 To: solr-user@lucene.apache.org
 Sent: Thu, January 14, 2010 2:39:21 PM
 Subject: Unexpected boolean query behavior
 
 
 
 

-- 
View this message in context: 
http://old.nabble.com/Unexpected-boolean-query-behavior-tp27166967p27167750.html



Re: Unexpected boolean query behavior

2010-01-14 Thread markwaddle

That explains my exact problem, thank you! May I ask how you found that wiki
posting?


Otis Gospodnetic wrote:
 
 Hi Mark,
 
 Does this help?
 http://wiki.apache.org/lucene-java/BooleanQuerySyntax
 
 Otis
 --
 Sematext -- http://sematext.com/ -- Solr - Lucene - Nutch
 
-- 
View this message in context: 
http://old.nabble.com/Unexpected-boolean-query-behavior-tp27166967p27170172.html



Delete, commit, optimize doesn't reduce index file size

2009-12-29 Thread markwaddle

I have an index that used to have ~38M docs at 17.2GB. I deleted all but 13K
docs using a delete by query, commit and then optimize. A *:* query now
returns 13K docs. The problem is that the files on disk are still 17.1GB in
size. I expected the optimize to shrink the files. Is there a way I can
shrink them now that the index only has 13K docs?

Mark
-- 
View this message in context: 
http://old.nabble.com/Delete%2C-commit%2C-optimize-doesn%27t-reduce-index-file-size-tp26958067p26958067.html



Re: Delete, commit, optimize doesn't reduce index file size

2009-12-29 Thread markwaddle



Yonik Seeley-2 wrote:
 
 On Tue, Dec 29, 2009 at 1:23 PM, markwaddle m...@markwaddle.com wrote:
 I have an index that used to have ~38M docs at 17.2GB. I deleted all but
 13K
 docs using a delete by query, commit and then optimize. A *:* query now
 returns 13K docs. The problem is that the files on disk are still 17.1GB
 in
 size. I expected the optimize to shrink the files. Is there a way I can
 shrink them now that the index only has 13K docs?
 
 Are you on Windows?
 The IndexWriter can't delete files in use by the current IndexReader
 (like it can in UNIX) when the commit is done.
 If you make further changes to the index and do a commit, you should
 see the space go down.
 
 -Yonik
 http://www.lucidimagination.com
 
 

I am on Windows. Would a DataImportHandler delta-import with 1 or more
changes be a sufficient change to allow the files to be deleted?

Mark
-- 
View this message in context: 
http://old.nabble.com/Delete%2C-commit%2C-optimize-doesn%27t-reduce-index-file-size-tp26958067p26960857.html



Re: Delete, commit, optimize doesn't reduce index file size

2009-12-29 Thread markwaddle



Yonik Seeley-2 wrote:
 
 If you make further changes to the index and do a commit, you should
 see the space go down.
 

It worked. I added a bogus document using /update and then performed a
commit and now the files are down to 6MB.

http://.../core00/update?stream.body=%3Cadd%3E%3Cdoc%3E%3Cfield%20name=%22id%22%3E0%3C/field%3E%3C/doc%3E%3C/add%3E

http://.../core00/update?stream.body=%3Ccommit/%3E
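
For anyone who would rather generate these two requests than hand-encode
them, here is a minimal sketch that builds the same kind of URL with the
JDK's URLEncoder. The class name StreamBodyUrl, the method updateUrl and
the base URL http://localhost:8983/solr/core00 are placeholders I made up
for the example; substitute your own core URL.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

/** Builds /update URLs that carry an XML message in the stream.body parameter. */
class StreamBodyUrl {

    /** coreBaseUrl is the core's base URL without a trailing slash. */
    static String updateUrl(String coreBaseUrl, String xmlMessage) {
        try {
            return coreBaseUrl + "/update?stream.body="
                    + URLEncoder.encode(xmlMessage, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is a charset every Java runtime is required to support.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String core = "http://localhost:8983/solr/core00"; // hypothetical base URL
        // Step 1: add a throwaway document so the index changes.
        System.out.println(updateUrl(core,
                "<add><doc><field name=\"id\">0</field></doc></add>"));
        // Step 2: commit, after which the old index files can be released.
        System.out.println(updateUrl(core, "<commit/>"));
    }
}
```

URLEncoder writes a space as + and also encodes / and =, so the output
looks slightly different from the URLs above but decodes to the same XML.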

Thanks!
Mark
-- 
View this message in context: 
http://old.nabble.com/Delete%2C-commit%2C-optimize-doesn%27t-reduce-index-file-size-tp26958067p26960957.html



RE: Solr under tomcat - UTF-8 issue

2009-10-26 Thread markwaddle

I was originally using POST for the same reason, however I discovered that
Tomcat could easily be configured to accept any length URI. All it requires
is specifying the maxHttpHeaderSize attribute in your default Connector in
server.xml. I set my value to 1MB, which is certainly excessive, but it
ensures I will never hit the limit. As the other chap mentioned, I now have
the benefits of caching and most importantly, proper web logs!

I also have a similar situation where I constrain the search results based
on the user's role. I have only two roles to support, so my case is very
simple, but I could imagine having a multivalued role field that you could
perform facet queries on.
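
As a rough sketch of the multivalued-role idea, the snippet below builds
a clause that matches documents carrying at least one of the user's
roles. Everything named here is an assumption for illustration: the field
name role, the class RoleFilter and the sample role values are not from
my application. The resulting string could be sent as a filter query (fq)
alongside the user's own q.

```java
import java.util.Arrays;
import java.util.List;

/** Builds a role restriction such as role:("internal" OR "partner"). */
class RoleFilter {

    static String roleFilterQuery(String field, List<String> roles) {
        if (roles.isEmpty()) {
            // An empty group would not be a valid query, so refuse it outright.
            throw new IllegalArgumentException("at least one role is required");
        }
        StringBuilder sb = new StringBuilder(field).append(":(");
        for (int i = 0; i < roles.size(); i++) {
            if (i > 0) sb.append(" OR ");
            // Quote each role as a phrase and escape embedded backslashes and quotes.
            String escaped = roles.get(i).replace("\\", "\\\\").replace("\"", "\\\"");
            sb.append('"').append(escaped).append('"');
        }
        return sb.append(')').toString();
    }

    public static void main(String[] args) {
        // "role", "internal" and "partner" are hypothetical sample values.
        System.out.println(roleFilterQuery("role", Arrays.asList("internal", "partner")));
        // prints role:("internal" OR "partner")
    }
}
```

Keeping the role clause separate from the user's search terms also keeps
the terms themselves short, whichever HTTP method ends up being used.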

Mark


Glock, Thomas wrote:
 
 Thanks -
 
 I agree.  However my application requires results be trimmed to users
 based on roles.  The roles are repeating values on the documents.  Users
 have many different role combinations as do documents.
 I recognize this is going to hamper caching - but using a GET will tend to
 limit the size of search phrases when combined with the boolean role
 clause.  And I am concerned with hitting url limits.
 
 At any rate I solved it thanks to Yonik's recommendation.  
 
 My Flex client's HTTPService by default sets the Content-Type request
 header to just application/x-www-form-urlencoded. What Tomcat needed was
 the Content-Type request header set to
 application/x-www-form-urlencoded; charset=UTF-8
 
 If you have any suggestions regarding limiting results based on user and
 document role permutations - I'm all ears.  I've been to the Search Summit
 in NYC and no vendor could even seem to grasp the concept.  
 
 The problem case statement is this  - I have users globally who need to
 search for content tailored to them.  Users searching for 'Holiday' don't
 get any value from 1 documents having the word holiday. What they need
 are documents authored for that population.  The documents have the
 associated role information as metadata and therefore users will get only
 the documents they have access to and are relevant to them.  That's the
 plan anyway!  
 
 By chance I stumbled on Solr a month or so ago and I think it's awesome.  I
 got the book two days ago too - fantastic!
 
 Thanks again,
 Tom
 

-- 
View this message in context: 
http://www.nabble.com/Solr-under-tomcat---UTF-8-issue-tp26040052p26054942.html



Re: Core/shard preference

2009-10-21 Thread markwaddle

Thank you guys for your responses. That is what I suspected, that it was
going with the first instance of the document that it sees. I tried setting
up Solr in Eclipse and ran into a couple of issues blocking it from
compiling. I also did some reading, but none of the write-ups were very
comprehensive. Are there any good write-ups that you know of with
instructions on setting up Solr in Eclipse?

Thanks again,
Mark



Yonik Seeley-2 wrote:
 
 Although shards should be disjoint, Solr tolerates duplication
 (won't return duplicates in the main results list, but doesn't make
 any effort to correct facet counts, etc).
 
 Currently, whichever shard responds first wins.
 The relevant code is around line 420 in QueryComponent.java:
 
    String prevShard = uniqueDoc.put(id, srsp.getShard());
    if (prevShard != null) {
      // duplicate detected
      numFound--;

      // For now, just always use the first encountered since we can't currently
      // remove the previous one added to the priority queue. If we switched
      // to the Java5 PriorityQueue, this would be easier.
      continue;
      // make which duplicate is used deterministic based on shard
      // if (prevShard.compareTo(srsp.shard) >= 0) {
      //   TODO: remove previous from priority queue
      //   continue;
      // }
    }
 
 So it's certainly possible to make it deterministic, we just haven't
 done it yet.
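
 To illustrate the difference, here is a small standalone sketch, not
 Solr code, that contrasts "first response wins" with a deterministic
 rule where the lexicographically smallest shard name wins. ShardDedup,
 Hit and both method names are hypothetical. The sketch sidesteps the
 priority-queue removal problem mentioned in the comment above by keeping
 only an id-to-shard map.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy model of duplicate handling when two shards return the same document id. */
class ShardDedup {

    /** A document id together with the shard that returned it. */
    static final class Hit {
        final String id;
        final String shard;
        Hit(String id, String shard) { this.id = id; this.shard = shard; }
    }

    /** Arrival order decides: the first shard seen for an id keeps it. */
    static Map<String, String> firstWins(List<Hit> arrivals) {
        Map<String, String> winner = new LinkedHashMap<>();
        for (Hit h : arrivals) {
            winner.putIfAbsent(h.id, h.shard);
        }
        return winner;
    }

    /** Shard name decides: the smallest name wins whatever the arrival order. */
    static Map<String, String> lowestShardWins(List<Hit> arrivals) {
        Map<String, String> winner = new LinkedHashMap<>();
        for (Hit h : arrivals) {
            winner.merge(h.id, h.shard,
                    (prev, cur) -> prev.compareTo(cur) <= 0 ? prev : cur);
        }
        return winner;
    }

    public static void main(String[] args) {
        List<Hit> arrivals = new ArrayList<>();
        arrivals.add(new Hit("doc1", "core01")); // the large core answered first
        arrivals.add(new Hit("doc1", "core00")); // the delta core answered second
        System.out.println(firstWins(arrivals).get("doc1"));       // prints core01
        System.out.println(lowestShardWins(arrivals).get("doc1")); // prints core00
    }
}
```

 With a rule like the second one, naming the preferred core so that it
 sorts first (core00 before core01) would make it win every time.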
 
 -Yonik
 http://www.lucidimagination.com
 
 
 On Mon, Oct 19, 2009 at 7:30 PM, Lance Norskog goks...@gmail.com wrote:
 Distributed Search is designed only for disjoint cores.

 The document list from each core is returned sorted by the relevance
 score. The distributed searcher merges these sorted lists. Solr does
 not implement distributed IDF, which essentially means distributed
 coordinated scoring. All scoring happens inside each core, relative to
 that core's contents. The resulting score numbers are not coordinated
 with each other, and you will get random results.

 There is no way to say "use this core's results" because the searches
 are not compared all at once. Only the page of results fetched is
 compared, so there's no way to suppress a result in the second page if
 it was already found in the first.


 --
 Lance Norskog
 goks...@gmail.com

 
 

-- 
View this message in context: 
http://www.nabble.com/Core-shard-preference-tp25966791p26004203.html



Core/shard preference

2009-10-19 Thread markwaddle

I have a small core performing deltas quickly (core00), and a large core
performing deltas slowly (core01), both on the same set of documents. The
delta core is cleaned nightly. As you can imagine, at times there are two
versions of a document, one in each core. When I execute a query that
matches this document, sometimes it will come from the delta core, and
sometimes it will come from the large core. It almost seems random. Here is my
query:

http://porsche:8181/worldip5/core00/select?shards=porsche:8181/worldip5/core00/,porsche:8181/worldip5/core01/&start=0&rows=20&q=hazard+gas+countrycode:JP

When the delta documents from core00 are returned as desired the access logs
show:

10.36.34.150 - - [19/Oct/2009:15:22:37 -0700] POST /worldip5/core00/select HTTP/1.1 200 293 1
10.36.34.150 - - [19/Oct/2009:15:22:37 -0700] POST /worldip5/core01/select HTTP/1.1 200 506 1
10.36.34.150 - - [19/Oct/2009:15:22:37 -0700] POST /worldip5/core00/select HTTP/1.1 200 1151 1
10.36.34.150 - - [19/Oct/2009:15:22:37 -0700] POST /worldip5/core01/select HTTP/1.1 200 2597 1
10.36.34.151 - - [19/Oct/2009:15:22:37 -0700] GET /worldip5/core00/select?shards=porsche:8181/worldip5/core00/,porsche:8181/worldip5/core01/&start=0&rows=20&q=hazard+gas+countrycode:JP HTTP/1.1 200 11881 9

When the documents are returned from core01 the access logs show:
10.36.34.150 - - [19/Oct/2009:15:22:37 -0700] POST /worldip5/core00/select HTTP/1.1 200 289 1
10.36.34.150 - - [19/Oct/2009:15:22:37 -0700] POST /worldip5/core01/select HTTP/1.1 200 506 1
10.36.34.150 - - [19/Oct/2009:15:22:37 -0700] POST /worldip5/core01/select HTTP/1.1 200 3390 1
10.36.34.151 - - [19/Oct/2009:15:22:37 -0700] GET /worldip5/core00/select?shards=porsche:8181/worldip5/core00/,porsche:8181/worldip5/core01/&start=0&rows=20&q=hazard+gas+countrycode:JP HTTP/1.1 200 11873 9

Any ideas on why there is a difference in the requests made? Is there a way
I can tell Solr to prefer the documents in core00?

Mark
-- 
View this message in context: 
http://www.nabble.com/Core-shard-preference-tp25966791p25966791.html