Re: commas in synonyms.txt are not escaping

2011-08-26 Thread Yonik Seeley
On Fri, Aug 26, 2011 at 10:17 AM, Moore, Gary gary.mo...@ars.usda.gov wrote:

 I have a number of chemical names containing commas which I'm mapping in 
 index_synonyms.txt thusly:

 2\,4-D-butotyl=>Aqua-Kleen,BRN 1996617,Bladex-B,Brush killer 64,Butoxy-D 
 3,CCRIS 8562

 According to the sample synonyms.txt, the comma above should be escaped, i.e. 
 a\,a => b\,b.    The problem is that according to analysis.jsp the commas are 
 not being escaped.  If I paste in 2,4-D-butotyl, then no mappings.  If I 
 paste in 2\,4-D-butotyl, the mappings are done.


I can confirm that this works in 1.4, but no longer works in 3x or
trunk.  Can you open an issue?

-Yonik
http://www.lucidimagination.com


Re: commas in synonyms.txt are not escaping

2011-08-26 Thread Yonik Seeley
On Fri, Aug 26, 2011 at 11:16 AM, Yonik Seeley
yo...@lucidimagination.com wrote:
 On Fri, Aug 26, 2011 at 10:17 AM, Moore, Gary gary.mo...@ars.usda.gov wrote:

 I have a number of chemical names containing commas which I'm mapping in 
 index_synonyms.txt thusly:

 2\,4-D-butotyl=>Aqua-Kleen,BRN 1996617,Bladex-B,Brush killer 64,Butoxy-D 
 3,CCRIS 8562

 According to the sample synonyms.txt, the comma above should be escaped, i.e. 
 a\,a => b\,b.    The problem is that according to analysis.jsp the commas are 
 not being escaped.  If I paste in 2,4-D-butotyl, then no mappings.  If I 
 paste in 2\,4-D-butotyl, the mappings are done.


 I can confirm that this works in 1.4, but no longer works in 3x or
 trunk.  Can you open an issue?

Actually, I think I've tracked it to LUCENE-3233 where the parsing
rules were moved from Solr to Lucene (and changed the functionality in
the process).
I'll reopen that since I don't think it's been in a released version yet.

-Yonik
http://www.lucidimagination.com


Re: Query vs Filter Query Usage

2011-08-25 Thread Yonik Seeley
On Thu, Aug 25, 2011 at 5:19 PM, Michael Ryan mr...@moreover.com wrote:
 10,000,000 document index
 Internal Document id is 32 bit unsigned int
 Max Memory Used by a single cache slot in the filter cache = 32 bits x
 10,000,000 docs = 320,000,000 bits or 38 MB

 I think it depends on where exactly the result set was generated. I believe 
 the result set will usually be represented by a BitDocSet, which requires 1 
 bit per doc in your index (result set size doesn't matter), so in your case 
 it would be about 1.2MB.

Right - and Solr switches between the implementation depending on set
size... so if the number of documents in the set were 100, then it
would only take up 400 bytes.

-Yonik
http://www.lucidimagination.com
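
A rough back-of-the-envelope for the two representations discussed above
(4 bytes per doc for the small sorted-int set, 1 bit per doc for a
BitDocSet; the exact crossover point is an implementation detail):

  10,000,000-doc index, 100-doc result set:  100 x 4 bytes       =   400 bytes
  10,000,000-doc index, large result set:    10,000,000 bits / 8 =  ~1.2 MB

So the sorted-int form stays smaller whenever the set holds fewer than
roughly maxDoc/32 documents.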


Re: Query parameter changes from solr 1.4 to 3.3

2011-08-25 Thread Yonik Seeley
On Tue, Aug 23, 2011 at 7:11 AM, Samarendra Pratap samarz...@gmail.com wrote:
  We are upgrading solr 1.4 (with collapsing patch solr-236) to solr 3.3. I
 was looking for the required changes in query parameters (or parameter
 names) if any.

There should be very few (but check CHANGES.txt as Erick pointed out).
We try to keep the main HTTP APIs very stable, even across major versions.

  One thing I know for sure is that collapse and its sub-options are now
 known by group, but didn't find anything else.

Field collapsing/grouping was never in any 1.4 release.

-Yonik
http://www.lucidimagination.com


Re: Batch updates order guaranteed?

2011-08-23 Thread Yonik Seeley
On Tue, Aug 23, 2011 at 2:17 PM, Glenn s...@t2.zazu.com wrote:
 Question about batch updates (performing a delete and add in same
 request, as described at bottom
 of http://wiki.apache.org/solr/UpdateXmlMessages):
 is the order
 guaranteed?  If a delete is followed by an add, will the delete
 always be performed first?  I would assume so but would like to get
 confirmation.

Yes, if you're crafting the update message yourself in XML or JSON.
SolrJ is a different matter I think.

-Yonik
http://www.lucidimagination.com
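
For concreteness, a single update message carrying both commands in order
might look like this (shape per the UpdateXmlMessages page linked above;
the id and field values here are made up for illustration):

  <update>
    <delete><id>42</id></delete>
    <add>
      <doc>
        <field name="id">42</field>
        <field name="name">replacement document</field>
      </doc>
    </add>
  </update>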


Re: Batch updates order guaranteed?

2011-08-23 Thread Yonik Seeley
On Tue, Aug 23, 2011 at 3:38 PM, Yonik Seeley
yo...@lucidimagination.com wrote:
 On Tue, Aug 23, 2011 at 2:17 PM, Glenn s...@t2.zazu.com wrote:
 Question about batch updates (performing a delete and add in same
 request, as described at bottom
 of http://wiki.apache.org/solr/UpdateXmlMessages):
 is the order
 guaranteed?  If a delete is followed by an add, will the delete
 always be performed first?  I would assume so but would like to get
 confirmation.

 Yes, if you're crafting the update message yourself in XML or JSON.
 SolrJ is a different matter I think.

Found the SolrJ issue:
https://issues.apache.org/jira/browse/SOLR-1162

Looks like it sort of got dropped, but I think this is worth fixing.

-Yonik
http://www.lucidimagination.com


Re: Solr 3.3 crashes after ~18 hours?

2011-08-19 Thread Yonik Seeley
On Fri, Aug 19, 2011 at 10:36 AM, alexander sulz a.s...@digiconcept.net wrote:
 using lsof I think I pinned down the problem: too many open files!
 I already doubled from 512 to 1024 once but it seems there are many SOCKETS
 involved,
 which are listed as can't identify protocol, instead of real files.
 over time, the list grows and grows with these entries until.. it crashes.
 So I've read several times that the fix for this problem is to set the limit to a
 ridiculously high number, but
 that seems a little bit of a crude fix. Why so many open sockets in the
 first place?

What are you using as a client to talk to solr?
You need to look at both the update side and the query side.
Using persistent connections is the best all-around, but if not, be
sure to close the connections in the client.

-Yonik
http://www.lucidimagination.com
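
If the client is SolrJ, one common source of leaked sockets is building a
new server object (and its own HttpClient) per request instead of sharing
one. A minimal sketch against the 3.x-era SolrJ API (class name from that
era; treat this as illustrative, not a diagnosis):

  import org.apache.solr.client.solrj.SolrQuery;
  import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

  public class SearchClient {
    // one shared, thread-safe instance for the whole application,
    // so connections are pooled and reused instead of churning sockets
    private static final CommonsHttpSolrServer SOLR;
    static {
      try {
        SOLR = new CommonsHttpSolrServer("http://localhost:8983/solr");
      } catch (Exception e) {
        throw new RuntimeException(e);
      }
    }

    public static void main(String[] args) throws Exception {
      System.out.println(
          SOLR.query(new SolrQuery("*:*")).getResults().getNumFound());
    }
  }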


Re: solr keeps dying every few hours.

2011-08-17 Thread Yonik Seeley
On Wed, Aug 17, 2011 at 5:56 PM, Jason Toy jason...@gmail.com wrote:
 I've only set minimum memory and have not set maximum memory.  I'm doing
 more investigation and I see that I have 100+ dynamic fields for my
 documents, not the 10 fields I quoted earlier.  I also sort against those
 dynamic fields often,  I'm reading that this potentially uses a lot of
 memory.  Could this be the cause of my problems and if so what options do I
 have to deal with this?

Yes, that's most likely the problem.
Sorting on an integer field causes a FieldCache entry with an
int[maxDoc] (i.e. 4 bytes per document in the index, regardless of
whether it has a value for that field or not).
Sorting on a string field is 4 bytes per doc in the index (the ords)
plus the memory to store the actual unique string values.

-Yonik
http://www.lucidimagination.com
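
Putting rough numbers on that for the index described above (an estimate;
string fields additionally hold their unique values in memory):

  65,000,000 docs x 4 bytes   = ~260 MB of FieldCache per sorted field
  x ~100 dynamic sort fields  = on the order of 26 GB

which would easily exhaust the heap on a 7.5 GB machine.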



 On Wed, Aug 17, 2011 at 2:46 PM, Markus Jelsma
 markus.jel...@openindex.iowrote:

 Keep in mind that a commit warms up another searcher, potentially
 doubling RAM consumption in the background due to cache warming queries
 being executed (newSearcher event). Also, where is your Xmx switch? I
 don't know how your JVM will behave if you set Xms > Xmx.

 65m docs is quite a lot but it should run fine with 3GB heap allocation.

 It's good practice to use a master for indexing, without any caches and
 warm-up queries. When you exceed a certain number of documents, it will
 bite.

  I have a large ec2 instance(7.5 gb ram), it dies every few hours with out
  of heap memory issues.  I started upping the min memory required,
  currently I use -Xms3072M .
  I insert about 50k docs an hour and I currently have about 65 million
 docs
  with about 10 fields each. Is this already too much data for one box? How
  do I know when I've reached the limit of this server? I have no idea how
  to keep control of this issue.  Am I just supposed to keep upping the min
  ram used for solr? How do I know what the accurate amount of ram I should
  be using is? Must I keep adding more memory as the index size grows, I'd
  rather the query be a little slower if I can use constant memory and have
  the search read from disk.




 --
 - sent from my mobile
 6176064373



Re: sorting issue with solr 3.3

2011-08-12 Thread Yonik Seeley
On Fri, Aug 12, 2011 at 9:53 AM, Bernd Fehling
bernd.fehl...@uni-bielefeld.de wrote:
 It turned out that there is a sorting issue with solr 3.3.
 As far as I could trace it down currently:

 4 docs in the index and a search for *:*

 sorting on field dccreator_sort in descending order

 http://localhost:8983/solr/select?fsv=true&sort=dccreator_sort%20desc&indent=on&version=2.2&q=*%3A*&start=0&rows=10&fl=dccreator_sort

 result is:
 --
 <lst name="sort_values">
   <arr name="dccreator_sort">
     <str>convertitovistitutonazionaled</str>
     <str>莊國鴻chuangkuohung</str>
     <str>zyywwwxxx</str>
     <str>abdelhadiyasserabdelfattah</str>
   </arr>
 </lst>


Hmmm, are the docs sorted incorrectly too, or is it the sort_values
that are incorrect?
All variants of string sorting should be well tested... see TestSort.testSort()


 fieldType:
 --
 <fieldType name="alphaOnlySortLim" class="solr.TextField"
   sortMissingLast="true" omitNorms="true">
   <analyzer>
     <tokenizer class="solr.KeywordTokenizerFactory"/>
     <filter class="solr.LowerCaseFilterFactory"/>
     <filter class="solr.TrimFilterFactory"/>
     <filter class="solr.PatternReplaceFilterFactory"
       pattern="([\x20-\x2F\x3A-\x40\x5B-\x60\x7B-\x7E])" replacement=""
       replace="all"/>
     <filter class="solr.PatternReplaceFilterFactory"
       pattern="(.{1,30})(.{31,})" replacement="$1" replace="all"/>
   </analyzer>
 </fieldType>

 field:
 --
 <field name="dccreator_sort" type="alphaOnlySortLim" indexed="true"
   stored="true"/>


 According to documentation the sorting is UTF8 but _why_ is the first string
 at position 1 and _not_ at position 3 as it should be?


 Following sorting through the code is somewhat difficult.
 Any hint where to look for or where to start debugging?


Sorting.getStringSortField()

Can you reproduce this with a smaller test that we could use to debug/fix?

-Yonik
http://www.lucidimagination.com


Re: sorting issue with solr 3.3

2011-08-12 Thread Yonik Seeley
On Fri, Aug 12, 2011 at 1:04 PM, Yonik Seeley
yo...@lucidimagination.com wrote:
 On Fri, Aug 12, 2011 at 9:53 AM, Bernd Fehling
 bernd.fehl...@uni-bielefeld.de wrote:
 It turned out that there is a sorting issue with solr 3.3.
 As far as I could trace it down currently:

 4 docs in the index and a search for *:*

 sorting on field dccreator_sort in descending order

 http://localhost:8983/solr/select?fsv=true&sort=dccreator_sort%20desc&indent=on&version=2.2&q=*%3A*&start=0&rows=10&fl=dccreator_sort

 result is:
 --
  <lst name="sort_values">
    <arr name="dccreator_sort">
      <str>convertitovistitutonazionaled</str>
      <str>莊國鴻chuangkuohung</str>
      <str>zyywwwxxx</str>
      <str>abdelhadiyasserabdelfattah</str>
    </arr>
  </lst>


 Hmmm, are the docs sorted incorrectly too, or is it the sort_values
 that are incorrect?
 All variants of string sorting should be well tested... see 
 TestSort.testSort()

OK, something is very wrong with that test - I purposely introduced an
error into MissingLastOrdComparator and the test isn't failing.
I'll dig.

-Yonik
http://www.lucidimagination.com


Re: sorting issue with solr 3.3

2011-08-12 Thread Yonik Seeley
On Fri, Aug 12, 2011 at 2:08 PM, Yonik Seeley
yo...@lucidimagination.com wrote:
 On Fri, Aug 12, 2011 at 1:04 PM, Yonik Seeley
 yo...@lucidimagination.com wrote:
 On Fri, Aug 12, 2011 at 9:53 AM, Bernd Fehling
 bernd.fehl...@uni-bielefeld.de wrote:
 It turned out that there is a sorting issue with solr 3.3.
 As far as I could trace it down currently:

 4 docs in the index and a search for *:*

 sorting on field dccreator_sort in descending order

 http://localhost:8983/solr/select?fsv=true&sort=dccreator_sort%20desc&indent=on&version=2.2&q=*%3A*&start=0&rows=10&fl=dccreator_sort

 result is:
 --
  <lst name="sort_values">
    <arr name="dccreator_sort">
      <str>convertitovistitutonazionaled</str>
      <str>莊國鴻chuangkuohung</str>
      <str>zyywwwxxx</str>
      <str>abdelhadiyasserabdelfattah</str>
    </arr>
  </lst>


 Hmmm, are the docs sorted incorrectly too, or is it the sort_values
 that are incorrect?
 All variants of string sorting should be well tested... see 
 TestSort.testSort()

 OK, something is very wrong with that test - I purposely introduced an
 error into MissingLastOrdComparator and the test isn't failing.
 I'll dig.

Oops, scratch that.  It was a bug I just introduced into the test in
my local copy to try and reproduce your issue.

-Yonik
http://www.lucidimagination.com


Re: sorting issue with solr 3.3

2011-08-12 Thread Yonik Seeley
I've checked in an improved TestSort that adds deleted docs and
randomizes things a lot more (and fixes the previous reliance on doc
ids not being reordered).
I still can't reproduce this error though.
Is this stock solr?  Can you verify that the documents are in the
wrong order also (and not just the field sort values)?

-Yonik
http://www.lucidimagination.com


Re: frange not working in query

2011-08-11 Thread Yonik Seeley
On Wed, Aug 10, 2011 at 5:57 AM, Amit Sawhney sawhney.a...@gmail.com wrote:
 Hi All,

 I am trying to sort the results on a unix timestamp using this query.

 http://url.com:8983/solr/db/select/?indent=on&version=2.1&q={!frange%20l=0.25}query($qq)&qq=nokia&sort=unix-timestamp%20desc&start=0&rows=10&qt=dismax&wt=dismax&fl=*,score&hl=on&hl.snippets=1

 When I run this query, it says 'no field name specified in query and no 
 defaultSearchField defined in schema.xml'


The default query type for embedded queries is lucene, so your
qq=nokia is equivalent to qq={!lucene}nokia

So one way is to explicitly make it dismax:
   qq={!dismax}nokia
Another way is to declare the sub-query to be of type dismax:
  q={!frange l=0.25}query({!dismax v=$qq})&qq=nokia

-Yonik
http://www.lucidimagination.com


 As soon as I remove the frange query and run this, it starts working fine.

 http://url.com:8983/solr/db/select/?indent=on&version=2.1&q=nokia&sort=unix-timestamp%20desc&start=0&rows=10&qt=dismax&wt=dismax&fl=*,score&hl=on&hl.snippets=1

 Any pointers?


 Thanks,
 Amit


Re: Solr 3.3 crashes after ~18 hours?

2011-08-10 Thread Yonik Seeley
On Wed, Aug 10, 2011 at 11:00 AM, alexander sulz a.s...@digiconcept.net wrote:
 Okay, with this command it hangs.

It doesn't look like a hang from this thread dump.  It doesn't look
like any solr requests are executing at the time the dump was taken.

Did you do this from the command line?
curl "http://localhost:8983/solr/update?commit=true"

Are you saying that the curl command just hung and never returned?

-Yonik
http://www.lucidimagination.com

 Also: I managed to get a Thread Dump (attached).

 regards

 On 05.08.2011 15:08, Yonik Seeley wrote:

 On Fri, Aug 5, 2011 at 7:33 AM, alexander sulz a.s...@digiconcept.net
  wrote:

 Usually you get an XML response when doing commits or optimize, in this
 case
 I get nothing
 in return, but the site ( http://[...]/solr/update?optimize=true )
 DOESN'T
 load forever or anything.
 It doesn't hang! I just get a blank page / empty response.

 Sounds like you are doing it from a browser?
 Can you try it from the command line?  It should give back some sort
 of response (or hang waiting for a response).

 curl "http://localhost:8983/solr/update?commit=true"

 -Yonik
 http://www.lucidimagination.com


 I use the stuff in the example folder; the only changes I made were enabling
 logging and changing the port to 8985.
 I'll try getting a thread dump if it happens again!
 So far it's looking good with having allocated more memory to it.

 On 04.08.2011 16:08, Yonik Seeley wrote:

 On Thu, Aug 4, 2011 at 8:09 AM, alexander sulz a.s...@digiconcept.net
  wrote:

 Thank you for the many replies!

 Like I said, I couldn't find anything in logs created by solr.
 I just had a look at the /var/logs/messages and there wasn't anything
 either.

 What I mean by crash is that the process is still there and http GET
 pings
 would return 200
 but when I try visiting /solr/admin, I'd get a blank page! The server
 ignores any incoming updates or commits,

 "ignores" means what?  The request hangs?  If so, could you get a thread
 dump?

 Do queries work (like /solr/select?q=*:*) ?

 thus throwing no errors, no 503s... It's like the server has a
 blackout
 and
 stares blankly into space.

 Are you using a different servlet container than what is shipped with
 solr?
 If you did start with the solr example server, what jetty
 configuration changes have you made?

 -Yonik
 http://www.lucidimagination.com





Re: csv responsewriter and numfound

2011-08-08 Thread Yonik Seeley
On Mon, Aug 8, 2011 at 5:12 PM, Erik Hatcher erik.hatc...@gmail.com wrote:
 Great question.  But how would that get returned in the response?

 It is a drag that the header is lost when results are written in CSV, but 
 there really isn't an obvious spot for that information to be returned.

I guess a comment would be one option.

-Yonik
http://www.lucidimagination.com


Re: 4820 searchers opened?

2011-08-06 Thread Yonik Seeley
On Sat, Aug 6, 2011 at 11:31 AM, Paul Libbrecht p...@hoplahup.net wrote:

 On 6 Aug 2011, at 02:09, Yonik Seeley wrote:

 On Fri, Aug 5, 2011 at 7:30 PM, Paul Libbrecht p...@hoplahup.net wrote:
 my Solr is slowly reaching its memory limit (8GB), and the stats page 
 shows a reasonable fieldCache (1800) but 4820 searchers. That sounds like 
 a bit much to me; each has been opened at some point since the last 
 restart about two weeks ago.

 Definitely sounds like a reference leak.
 What version are you using?

 1.4.1.


 Is this stock Solr, or do you have any custom request handlers or
 anything else that could be forgetting to decrement the reference
 count of the searchers it uses?

 I have a custom query-handler and a custom response writer.

Do you always retrieve the searcher via
SolrQueryRequest.getSearcher()?  If so, there should be no problem...
but if you call SolrCore.getSearcher(), that is where leaks can happen
if you don't decref the reference returned.


 I also use the velocity response-writer (for debug purposes).
 None store a searcher or params, I believe.

 I have a query in the query handler that is a thread-local (it's a large 
 preferring query that I add to every query). Could this be the reason?

As long as it's a normal query that has not been rewritten or
weighted, it should have no state tied to any particular
reader/searcher and you should be fine.

-Yonik
http://www.lucidimagination.com

 I also have a thread-local that stores a date-formatter.

 Should I post my config?

 paul


Re: 4820 searchers opened?

2011-08-06 Thread Yonik Seeley
On Sat, Aug 6, 2011 at 1:35 PM, Paul Libbrecht p...@hoplahup.net wrote:

 On 6 Aug 2011, at 17:37, Yonik Seeley wrote:
 I have a custom query-handler and a custom response writer.

 Do you always retrieve the searcher via
 SolrQueryRequest.getSearcher()?  If so, there should be no problem...
 but if you call SolrCore.getSearcher(), that is where leaks can happen
 if you don't decref the reference returned.

 I've been using the following:

   rb.req.getCore().getSearcher().get().getReader()

Bingo!  Code should never do core.getSearcher().get()
since core.getSearcher returns a reference that must be decremented
when you are done.

Using req.getSearcher() is much easier since it ensures that the
searcher never changes during the scope of a single request
and it handles decrementing the reference when the request is closed.

-Yonik
http://www.lucidimagination.com
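
For reference, the leak-free pattern when code really does need to call
SolrCore.getSearcher() directly is to treat the result as a reference it
owns (a sketch using Solr's org.apache.solr.util.RefCounted):

  RefCounted<SolrIndexSearcher> ref = core.getSearcher();
  try {
    SolrIndexSearcher searcher = ref.get();
    // ... use searcher, e.g. searcher.getReader() ...
  } finally {
    ref.decref();  // without this, the searcher can never be closed
  }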


Re: 4820 searchers opened?

2011-08-06 Thread Yonik Seeley
On Sat, Aug 6, 2011 at 2:17 PM, Paul Libbrecht p...@hoplahup.net wrote:
 This is convincing me... I'd like to experiment and close.

 So, how can I be sure this is the right thing?
 I would have thought adding a document and committing would have created a 
 Searcher in my current usage but I do not see the reference list actually 
 being enlarged on my development machine.

It is creating a new searcher, but then closing the old searcher after
all currently running requests are done using it (that's what the
reference counting is for).
After the searcher is closed, it's removed from the list.

Pay attention to the address of the searcher on the stats page:
  searcherName : Searcher@7d0ade7e main

You should see the address change after a commit.

-Yonik
http://www.lucidimagination.com


Re: 4820 searchers opened?

2011-08-06 Thread Yonik Seeley
On Sat, Aug 6, 2011 at 2:30 PM, Paul Libbrecht p...@hoplahup.net wrote:

 On 6 Aug 2011, at 20:21, Yonik Seeley wrote:

 It is creating a new searcher, but then closing the old searcher after
 all currently running requests are done using it (that's what the
 reference counting is for).
 After the searcher is closed, it's removed from the list.

 Not if using:
          rb.req.getCore().getSearcher().get().getReader()
 right?

 Pay attention to the address of the searcher on the stats page:
  searcherName : Searcher@7d0ade7e main
 You should see the address change after a commit.


 I saw that one.
 But how can I see retention?

Oh, I see... you want to re-create the bug so you can see when it is fixed?
To trigger the bug, you need to hit a code path that uses the
getCore().getSearcher().get() code.

So first send a request that hits that buggy code, then add a doc and
do a commit, and then you should see
more than one searcher on the stats page.

-Yonik
http://www.lucidimagination.com


Re: Solr 3.3 crashes after ~18 hours?

2011-08-05 Thread Yonik Seeley
On Fri, Aug 5, 2011 at 7:33 AM, alexander sulz a.s...@digiconcept.net wrote:
 Usually you get an XML response when doing commits or optimize, in this case
 I get nothing
 in return, but the site ( http://[...]/solr/update?optimize=true ) DOESN'T
 load forever or anything.
 It doesn't hang! I just get a blank page / empty response.

Sounds like you are doing it from a browser?
Can you try it from the command line?  It should give back some sort
of response (or hang waiting for a response).

curl "http://localhost:8983/solr/update?commit=true"

-Yonik
http://www.lucidimagination.com


 I use the stuff in the example folder; the only changes I made were enabling
 logging and changing the port to 8985.
 I'll try getting a thread dump if it happens again!
 So far it's looking good with having allocated more memory to it.

 On 04.08.2011 16:08, Yonik Seeley wrote:

 On Thu, Aug 4, 2011 at 8:09 AM, alexander sulz a.s...@digiconcept.net
  wrote:

 Thank you for the many replies!

 Like I said, I couldn't find anything in logs created by solr.
 I just had a look at the /var/logs/messages and there wasn't anything
 either.

 What I mean by crash is that the process is still there and http GET
 pings
 would return 200
 but when I try visiting /solr/admin, I'd get a blank page! The server
 ignores any incoming updates or commits,

 "ignores" means what?  The request hangs?  If so, could you get a thread
 dump?

 Do queries work (like /solr/select?q=*:*) ?

 thus throwing no errors, no 503s... It's like the server has a blackout
 and
 stares blankly into space.

 Are you using a different servlet container than what is shipped with
 solr?
 If you did start with the solr example server, what jetty
 configuration changes have you made?

 -Yonik
 http://www.lucidimagination.com




Re: 4820 searchers opened?

2011-08-05 Thread Yonik Seeley
On Fri, Aug 5, 2011 at 7:30 PM, Paul Libbrecht p...@hoplahup.net wrote:
 my Solr is slowly reaching its memory limit (8GB), and the stats 
 page shows a reasonable fieldCache (1800) but 4820 searchers. That sounds like 
 a bit much to me; each has been opened at some point since the last restart 
 about two weeks ago.

Definitely sounds like a reference leak.
Is this stock Solr, or do you have any custom request handlers or
anything else that could be forgetting to decrement the reference
count of the searchers it uses?
What version are you using?

-Yonik
http://www.lucidimagination.com


Re: Solr 3.3 crashes after ~18 hours?

2011-08-04 Thread Yonik Seeley
On Thu, Aug 4, 2011 at 8:09 AM, alexander sulz a.s...@digiconcept.net wrote:
 Thank you for the many replies!

 Like I said, I couldn't find anything in logs created by solr.
 I just had a look at the /var/logs/messages and there wasn't anything
 either.

 What I mean by crash is that the process is still there and http GET pings
 would return 200
 but when I try visiting /solr/admin, I'd get a blank page! The server
 ignores any incoming updates or commits,

"ignores" means what?  The request hangs?  If so, could you get a thread dump?

Do queries work (like /solr/select?q=*:*) ?

 thus throwing no errors, no 503s... It's like the server has a blackout and
 stares blankly into space.

Are you using a different servlet container than what is shipped with solr?
If you did start with the solr example server, what jetty
configuration changes have you made?

-Yonik
http://www.lucidimagination.com


Re: Joining on multi valued fields

2011-08-04 Thread Yonik Seeley
On Thu, Aug 4, 2011 at 11:21 AM,  matthew.fow...@thomsonreuters.com wrote:
 Hi Yonik

 So I tested the join using the sample data below and the latest trunk. I 
 still got the same behaviour.

 HOWEVER! In this case it was nothing to do with the patch or solr version. It 
 was the tokeniser splitting G1 into G and 1.

Ah, glad you figured it out!

 So thank you for a nice patch and your suggestions.

 I do have a couple of questions for you: At what level does the join happen 
 and what do you expect the performance penalty to be. We might use this 
 extensively if the performance penalty isn't great.

With the current implementation, the performance is proportional to
the number of unique terms in the fields being joined.

-Yonik
http://www.lucidimagination.com


Re: Joining on multi valued fields

2011-08-03 Thread Yonik Seeley
Hmmm, if these are real responses from a solr server at rest (i.e.
documents not being changed between queries) then what you show
definitely looks like a bug.
That's interesting, since TestJoin implements a random test that
should cover cases like this pretty well.

I assume you are using a version of trunk (4.0-dev) and not just the
actual patch attached to the JIRA issue (which IIRC had at least one bug...
SOLR-2521).
Have you tried a more recent version of trunk?

-Yonik
http://www.lucidimagination.com



On Wed, Aug 3, 2011 at 7:00 AM,  matthew.fow...@thomsonreuters.com wrote:
 Hi Yonik

 Sorry for my late reply. I have been trying to get to the bottom of this
 but I'm getting inconsistent behaviour. Here's an example:

 Query = pi:rcs100     -       Here I'm going to use pid_rcs as the join
 value

 <result name="response" numFound="1" start="0">
   <doc>
     <str name="pi">rcs100</str>
     <str name="ct">rcs</str>
     <str name="pid_rcs">G1</str>
     <str name="name_rcs">Emerging Market Countries</str>
     <str name="definition_rcs">All business events relating to companies
       and other issuers of securities.</str>
   </doc>
 </result>

 Query = code:G1       -       See how many docs have G1 in their
 code field. Notice that code is multi valued

 <result name="response" numFound="2" start="0">
   <doc>
     <str name="ct">cat</str>
     <date name="maindocdate">2011-04-22T05:48:57Z</date>
     <str name="pin">CIF3wGpXk+1029782</str>
     <arr name="code">
       <str>G1</str>
       <str>G7U</str>
       <str>GK</str>
       <str>ME7</str>
       <str>ME8</str>
       <str>MN</str>
       <str>MR</str>
     </arr>
   </doc>
   <doc>
     <str name="ct">cat</str>
     <date name="maindocdate">2011-04-22T05:48:57Z</date>
     <str name="pin">CIF7YcLP+1029782</str>
     <arr name="code">
       <str>G1</str>
       <str>G7U</str>
       <str>GK</str>
       <str>ME7</str>
       <str>ME8</str>
       <str>MN</str>
       <str>MR</str>
     </arr>
   </doc>
 </result>

 Now for the join: http://10.15.39.137:8983/solr/file/select?q={!join
 from=pid_rcs to=code}pi:rcs100

 <result name="response" numFound="3" start="0">
   <doc>
     <str name="ct">cat</str>
     <date name="maindocdate">2011-04-22T05:48:57Z</date>
     <str name="pin">CIF3wGpXk+1029782</str>
     <arr name="code">
       <str>G1</str>
       <str>G7U</str>
       <str>GK</str>
       <str>ME7</str>
       <str>ME8</str>
       <str>MN</str>
       <str>MR</str>
     </arr>
   </doc>
   <doc>
     <str name="ct">cat</str>
     <date name="maindocdate">2011-04-22T05:48:57Z</date>
     <str name="pin">CIF7YcLP+1029782</str>
     <arr name="code">
       <str>G1</str>
       <str>G7U</str>
       <str>GK</str>
       <str>ME7</str>
       <str>ME8</str>
       <str>MN</str>
       <str>MR</str>
     </arr>
   </doc>
   <doc>
     <str name="ct">cat</str>
     <date name="maindocdate">2011-04-22T05:48:58Z</date>
     <str name="pin">CN1763203+1029782</str>
     <arr name="code">
       <str>A2</str>
       <str>A5</str>
       <str>A9</str>
       <str>AN</str>
       <str>B125</str>
       <str>B126</str>
       <str>B130</str>
       <str>BL63</str>
       <str>G41</str>
       <str>GK</str>
       <str>MZ</str>
     </arr>
   </doc>
 </result>

 So as you can see I get back 3 results when only 2 match the criteria,
 i.e. docs where G1 is present in the multi-valued code field. Why should
 the last document be included in the result of the join?

 Thank you,

 Matt


 -Original Message-
 From: ysee...@gmail.com [mailto:ysee...@gmail.com] On Behalf Of Yonik
 Seeley
 Sent: 01 August 2011 18:28
 To: solr-user@lucene.apache.org
 Subject: Re: Joining on multi valued fields

 On Mon, Aug 1, 2011 at 12:58 PM,  matthew.fow...@thomsonreuters.com
 wrote:
 I have been using the JOIN patch
 https://issues.apache.org/jira/browse/SOLR-2272 with great success.

 However I have hit a case where it doesn't seem to be working. It
 doesn't seem to work when joining to a multi-valued field.

 That should work (and the unit tests do test with multi-valued fields).
 Can you come up with a simple example where you are not getting the
 expected results?

 -Yonik
 http://www.lucidimagination.com




Re: External File Field

2011-08-01 Thread Yonik Seeley
On Mon, Aug 1, 2011 at 11:16 AM, Mark static.void@gmail.com wrote:
 We have around 10million documents that are in our index and about 10% of
 them have some extra statistics that are calculated on a daily basis which
 are then index and used in our function queries. This reindexing comes at
 the expense of doing multiple joins in DIH so I am thinking it may be faster
 to precompute these values and use external files rather than have to
 re-index 10% of our corpus daily. How many external file fields could one
 use before it becomes too many? Is this a valid use case or am I trying to
 fit a square into a circular hole?

Each external file field will take up maxDoc*4 bytes of RAM.
The other consideration is the time to load them (how often the index
needs to change).

-Yonik
http://www.lucidimagination.com
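
For the index size mentioned above, that works out to roughly:

  10,000,000 docs x 4 bytes = ~40 MB of RAM per external file field

so a handful of them is cheap; the reload time on commit is the more
likely bottleneck.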


Re: Joining on multi valued fields

2011-08-01 Thread Yonik Seeley
On Mon, Aug 1, 2011 at 12:58 PM,  matthew.fow...@thomsonreuters.com wrote:
 I have been using the JOIN patch
 https://issues.apache.org/jira/browse/SOLR-2272 with great success.

 However I have hit a case where it doesn't seem to be working. It
 doesn't seem to work when joining to a multi-valued field.

That should work (and the unit tests do test with multi-valued fields).
Can you come up with a simple example where you are not getting the
expected results?

-Yonik
http://www.lucidimagination.com


Re: what data type for geo fields?

2011-07-28 Thread Yonik Seeley
On Thu, Jul 28, 2011 at 10:24 AM, Peter Wolanin
peter.wola...@acquia.com wrote:
 Thanks for the feedback.  I'll have look more at how geohash works.

 Looking at the sample schema more closely, I see:

  <fieldType name="double" class="solr.TrieDoubleField"
    precisionStep="0" omitNorms="true" positionIncrementGap="0"/>

 So in fact double is also Trie, but just with precisionStep 0 in the 
 example.

Right, which means it's a normal numeric field with one token
indexed per value (i.e. no tradeoff to speed up range queries by
increasing index size).

-Yonik
http://www.lucidimagination.com
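
For comparison, the trie variant in the same example schema accepts that
tradeoff by indexing extra precision levels (snippet as in the stock 3.x
example schema; verify against your own copy):

  <fieldType name="tdouble" class="solr.TrieDoubleField" precisionStep="8"
             omitNorms="true" positionIncrementGap="0"/>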


Re: what data type for geo fields?

2011-07-27 Thread Yonik Seeley
On Wed, Jul 27, 2011 at 9:01 AM, Peter Wolanin peter.wola...@acquia.com wrote:
 Looking at the example schema:

 http://svn.apache.org/repos/asf/lucene/dev/branches/lucene_solr_3_3/solr/example/solr/conf/schema.xml

 the solr.PointType field type uses double (is this just an example
 field, or used for geo search?)

While you could possibly use PointType for geo search, it doesn't have
good support for it (it's more of a general n-dimensional point).
The LatLonType has all the geo support currently.

, while the solr.LatLonType field uses
 tdouble and it's unclear how the geohash is translated into lat/lon
 values or if the geohash itself might typically be used as a copyfield
 and use just for matching a query on a geohash?

There's no geohash used in LatLonType.
It is indexed as a lat and lon under the covers (using the suffix _d)

 Is there an advantage in terms of speed to using Trie fields for
 solr.LatLonType?

Currently only for explicit range queries... like point:[10,10 TO 20,20]

  I would assume so, e.g. for bbox operations.

It's a bit of an implementation detail, but bbox doesn't currently use
range queries.

-Yonik
http://www.lucidimagination.com
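
For example, a bbox filter against a LatLonType field (using the store
field from the stock example schema; syntax per the Solr 3.1+ spatial
support):

  http://localhost:8983/solr/select?q=*:*&fq={!bbox}&sfield=store&pt=45.15,-93.85&d=5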


Re: Solr vs ElasticSearch

2011-07-27 Thread Yonik Seeley
On Wed, Jul 27, 2011 at 7:17 AM, Tarjei Huse tar...@scanmine.com wrote:
 On 06/01/2011 08:22 AM, Jason Rutherglen wrote:
 Thanks Shashi, this is oddly coincidental with another issue being put
 into Solr (SOLR-2193) to help solve some of the NRT issues, the timing
 is impeccable.
 Hmm, does anyone have an idea on when this will be finished?

It's in trunk now... try it out!

-Yonik
http://www.lucidimagination.com


Re: Rounding errors in solr

2011-07-26 Thread Yonik Seeley
On Mon, Jul 25, 2011 at 10:12 AM, Brian Lamb
brian.l...@journalexperts.com wrote:
 Yes and that's causing some problems in my application. Is there a way to
 truncate the 7th decimal place in regards to sorting by the score?

Not built in.
With some Java coding, you could create a post filter that manipulates scores.

http://wiki.apache.org/solr/CommonQueryParameters#Caching_of_filters

-Yonik
http://www.lucidimagination.com



 On Fri, Jul 22, 2011 at 4:27 PM, Yonik Seeley yo...@lucidimagination.com
 wrote:

 On Fri, Jul 22, 2011 at 4:11 PM, Brian Lamb
 brian.l...@journalexperts.com wrote:
  I've noticed some peculiar scoring issues going on in my application.
  For
  example, I have a field that is multivalued and has several records that
  have the same value. For example,
 
  <arr name="references">
    <str>National Society of Animal Lovers</str>
    <str>Nat. Soc. of Ani. Lov.</str>
  </arr>
 
  I have about 300 records with that exact value.
 
  Now, when I do a search for references:(national society animal lovers),
  I
  get the following results:
 
  <id>252</id>
  <id>159</id>
  <id>82</id>
  <id>452</id>
  <id>105</id>
 
  When I do a search for references:(nat soc ani lov), I get the results
  ordered differently:
 
  <id>510</id>
  <id>122</id>
  <id>501</id>
  <id>82</id>
  <id>252</id>
 
  When I load all the records that match, I notice that at some point, the
  scores aren't the same but differ by only a little:
 
  1.471928 in one and the one before it was 1.471929

 32 bit floats only have 7 decimal digits of precision, and in floating
 point land (a+b+c) can be slightly different than (c+b+a)

 -Yonik
 http://www.lucidimagination.com




Re: Solr 3.3: Exception in thread Lucene Merge Thread #1

2011-07-22 Thread Yonik Seeley
 IBM J9 VM (build 2.4, JRE 1.6.0 IBM J9 2.4 Linux amd64-64

I'm confused why MMapDirectory is getting used with the IBM JVM... I
had thought it would default to NIOFSDirectory on Linux w/ a non
Oracle JVM.
Are you specifically selecting MMapDirectory in solrconfig.xml?

Can you try the Oracle JVM to see if that changes things?

-Yonik
http://www.lucidimagination.com



On Fri, Jul 22, 2011 at 5:58 AM, mdz-munich
sebastian.lu...@bsb-muenchen.de wrote:
 I was wrong.

 After rebooting tomcat we discovered a new sweetness:

 SEVERE: REFCOUNT ERROR: unreferenced org.apache.solr.core.SolrCore@3c753c75
 (core.name) has a reference count of 1
 22.07.2011 11:52:07 org.apache.solr.common.SolrException log
 SEVERE: java.lang.RuntimeException: java.io.IOException: Map failed
        at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1099)
        at org.apache.solr.core.SolrCore.<init>(SolrCore.java:585)
        at org.apache.solr.core.CoreContainer.create(CoreContainer.java:463)
        at org.apache.solr.core.CoreContainer.load(CoreContainer.java:316)
        at org.apache.solr.core.CoreContainer.load(CoreContainer.java:207)
        at
 org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:130)
        at
 org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:94)
        at
 org.apache.catalina.core.ApplicationFilterConfig.initFilter(ApplicationFilterConfig.java:273)
        at
 org.apache.catalina.core.ApplicationFilterConfig.getFilter(ApplicationFilterConfig.java:254)
        at
 org.apache.catalina.core.ApplicationFilterConfig.setFilterDef(ApplicationFilterConfig.java:372)
        at
 org.apache.catalina.core.ApplicationFilterConfig.<init>(ApplicationFilterConfig.java:98)
        at
 org.apache.catalina.core.StandardContext.filterStart(StandardContext.java:4584)
        at
 org.apache.catalina.core.StandardContext$2.call(StandardContext.java:5262)
        at
 org.apache.catalina.core.StandardContext$2.call(StandardContext.java:5257)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:314)
        at java.util.concurrent.FutureTask.run(FutureTask.java:149)
        at
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:897)
        at
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:919)
        at java.lang.Thread.run(Thread.java:736)
 Caused by: java.io.IOException: Map failed
        at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:782)
        at
 org.apache.lucene.store.MMapDirectory$MMapIndexInput.<init>(MMapDirectory.java:264)
        at 
 org.apache.lucene.store.MMapDirectory.openInput(MMapDirectory.java:216)
        at org.apache.lucene.index.FieldsReader.<init>(FieldsReader.java:129)
        at
 org.apache.lucene.index.SegmentCoreReaders.openDocStores(SegmentCoreReaders.java:244)
        at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:116)
        at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:92)
        at 
 org.apache.lucene.index.DirectoryReader.<init>(DirectoryReader.java:113)
        at
 org.apache.lucene.index.ReadOnlyDirectoryReader.<init>(ReadOnlyDirectoryReader.java:29)
        at
 org.apache.lucene.index.DirectoryReader$1.doBody(DirectoryReader.java:81)
        at
 org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:750)
        at 
 org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:75)
        at org.apache.lucene.index.IndexReader.open(IndexReader.java:428)
        at org.apache.lucene.index.IndexReader.open(IndexReader.java:371)
        at
 org.apache.solr.core.StandardIndexReaderFactory.newReader(StandardIndexReaderFactory.java:38)
        at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1088)
        ... 18 more
 Caused by: java.lang.OutOfMemoryError: Map failed
        at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:779)
        ... 33 more

 Any ideas and/or suggestions?

 Best regards  thank you,

 Sebastian




Re: Solr 3.3: Exception in thread Lucene Merge Thread #1

2011-07-22 Thread Yonik Seeley
On Fri, Jul 22, 2011 at 9:44 AM, Yonik Seeley
yo...@lucidimagination.com wrote:
 IBM J9 VM (build 2.4, JRE 1.6.0 IBM J9 2.4 Linux amd64-64

 I'm confused why MMapDirectory is getting used with the IBM JVM... I
 had thought it would default to NIOFSDirectory on Linux w/ a non
 Oracle JVM.

I verified that the MMapDirectory is selected by default with the IBM
JVM (it must also contain the right Sun internal classes).

Anyone else have experience with MMapDirectory w/ IBM's JVM?

-Yonik
http://www.lucidimagination.com


 Are you specifically selecting MMapDirectory in solrconfig.xml?

 Can you try the Oracle JVM to see if that changes things?

 -Yonik
 http://www.lucidimagination.com



 On Fri, Jul 22, 2011 at 5:58 AM, mdz-munich
 sebastian.lu...@bsb-muenchen.de wrote:
 I was wrong.

 After rebooting tomcat we discovered a new sweetness:

 SEVERE: REFCOUNT ERROR: unreferenced org.apache.solr.core.SolrCore@3c753c75
 (core.name) has a reference count of 1
 22.07.2011 11:52:07 org.apache.solr.common.SolrException log
 SEVERE: java.lang.RuntimeException: java.io.IOException: Map failed
        at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1099)
        at org.apache.solr.core.SolrCore.<init>(SolrCore.java:585)
        at org.apache.solr.core.CoreContainer.create(CoreContainer.java:463)
        at org.apache.solr.core.CoreContainer.load(CoreContainer.java:316)
        at org.apache.solr.core.CoreContainer.load(CoreContainer.java:207)
        at
 org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:130)
        at
 org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:94)
        at
 org.apache.catalina.core.ApplicationFilterConfig.initFilter(ApplicationFilterConfig.java:273)
        at
 org.apache.catalina.core.ApplicationFilterConfig.getFilter(ApplicationFilterConfig.java:254)
        at
 org.apache.catalina.core.ApplicationFilterConfig.setFilterDef(ApplicationFilterConfig.java:372)
        at
 org.apache.catalina.core.ApplicationFilterConfig.<init>(ApplicationFilterConfig.java:98)
        at
 org.apache.catalina.core.StandardContext.filterStart(StandardContext.java:4584)
        at
 org.apache.catalina.core.StandardContext$2.call(StandardContext.java:5262)
        at
 org.apache.catalina.core.StandardContext$2.call(StandardContext.java:5257)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:314)
        at java.util.concurrent.FutureTask.run(FutureTask.java:149)
        at
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:897)
        at
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:919)
        at java.lang.Thread.run(Thread.java:736)
 Caused by: java.io.IOException: Map failed
        at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:782)
        at
 org.apache.lucene.store.MMapDirectory$MMapIndexInput.<init>(MMapDirectory.java:264)
        at 
 org.apache.lucene.store.MMapDirectory.openInput(MMapDirectory.java:216)
        at org.apache.lucene.index.FieldsReader.<init>(FieldsReader.java:129)
        at
 org.apache.lucene.index.SegmentCoreReaders.openDocStores(SegmentCoreReaders.java:244)
        at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:116)
        at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:92)
        at 
 org.apache.lucene.index.DirectoryReader.<init>(DirectoryReader.java:113)
        at
 org.apache.lucene.index.ReadOnlyDirectoryReader.<init>(ReadOnlyDirectoryReader.java:29)
        at
 org.apache.lucene.index.DirectoryReader$1.doBody(DirectoryReader.java:81)
        at
 org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:750)
        at 
 org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:75)
        at org.apache.lucene.index.IndexReader.open(IndexReader.java:428)
        at org.apache.lucene.index.IndexReader.open(IndexReader.java:371)
        at
 org.apache.solr.core.StandardIndexReaderFactory.newReader(StandardIndexReaderFactory.java:38)
        at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1088)
        ... 18 more
 Caused by: java.lang.OutOfMemoryError: Map failed
        at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:779)
        ... 33 more

 Any ideas and/or suggestions?

 Best regards  thank you,

 Sebastian





Re: Solr 3.3: Exception in thread Lucene Merge Thread #1

2011-07-22 Thread Yonik Seeley
OK, best guess is that you're going over some per-process address space limit.

Try seeing what ulimit -a says.

-Yonik
http://www.lucidimagination.com

On Fri, Jul 22, 2011 at 12:51 PM, mdz-munich
sebastian.lu...@bsb-muenchen.de wrote:
 Hi Yonik,

 thanks for your reply!

 Are you specifically selecting MMapDirectory in solrconfig.xml?

 Nope.

 We installed Oracle's Runtime from

 http://java.com/de/download/linux_manual.jsp?locale=de

 java.runtime.name = Java(TM) SE Runtime Environment
 sun.boot.library.path = /usr/java/jdk1.6.0_26/jre/lib/amd64
 java.vm.version = 20.1-b02
 shared.loader =
 java.vm.vendor = Sun Microsystems Inc.
 enable.master = true
 java.vendor.url = http://java.sun.com/
 path.separator = :
 java.vm.name = Java HotSpot(TM) 64-Bit Server VM
 tomcat.util.buf.StringCache.byte.enabled = true
 file.encoding.pkg = sun.io
 java.util.logging.config.file =
 /local/master01_tomcat7x_solr33x/conf/logging.properties
 user.country = DE
 sun.java.launcher = SUN_STANDARD
 sun.os.patch.level = unknown
 java.vm.specification.name = Java Virtual Machine Specification
 user.dir = /local/master01_tomcat7x_solr33x/logs
 solr.abortOnConfigurationError = true
 java.runtime.version = 1.6.0_26-b03
 java.awt.graphicsenv = sun.awt.X11GraphicsEnvironment
 java.endorsed.dirs = /local/master01_tomcat7x_solr33x/endorsed
 os.arch = amd64
 java.io.tmpdir = /local/master01_tomcat7x_solr33x/temp
 line.separator =

 But no success with 1000 docs/batch, this was thrown during optimize:

 22.07.2011 18:44:05 org.apache.solr.core.SolrCore execute
 INFO: [core.digi20] webapp=/solr path=/update params={} status=500
 QTime=87540
 22.07.2011 18:44:05 org.apache.solr.common.SolrException log
 SEVERE: java.io.IOException: Map failed
        at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:748)
        at
 org.apache.lucene.store.MMapDirectory$MMapIndexInput.<init>(MMapDirectory.java:303)
        at 
 org.apache.lucene.store.MMapDirectory.openInput(MMapDirectory.java:217)
        at org.apache.lucene.index.FieldsReader.<init>(FieldsReader.java:129)
        at
 org.apache.lucene.index.SegmentCoreReaders.openDocStores(SegmentCoreReaders.java:245)
        at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:117)
        at 
 org.apache.lucene.index.IndexWriter$ReaderPool.get(IndexWriter.java:703)
        at 
 org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4196)
        at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3863)
        at
 org.apache.lucene.index.SerialMergeScheduler.merge(SerialMergeScheduler.java:37)
        at 
 org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:2715)
        at org.apache.lucene.index.IndexWriter.optimize(IndexWriter.java:2525)
        at org.apache.lucene.index.IndexWriter.optimize(IndexWriter.java:2462)
        at
 org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:410)
        at
 org.apache.solr.update.processor.RunUpdateProcessor.processCommit(RunUpdateProcessorFactory.java:85)
        at
 org.apache.solr.update.processor.LogUpdateProcessor.processCommit(LogUpdateProcessorFactory.java:154)
        at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:177)
        at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:77)
        at
 org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:67)
        at
 org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
        at org.apache.solr.core.SolrCore.execute(SolrCore.java:1368)
        at
 org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356)
        at
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252)
        at
 org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
        at
 org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
        at
 org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:240)
        at
 org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:164)
        at
 org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:462)
        at
 org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:164)
        at
 org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:100)
        at
 org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:563)
        at
 org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
        at
 org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:403)
        at
 org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:301)
        at
 org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:162)
        at
 

Re: Solr 3.3: Exception in thread Lucene Merge Thread #1

2011-07-22 Thread Yonik Seeley
Yep, there ya go... your OS configuration is limiting you to 27G of
virtual address space per process.  Consider setting that to
unlimited.

-Yonik
http://www.lucidimagination.com
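
The corresponding change would be to the -v limit (going by the "virtual
memory (kbytes, -v)" line in the output quoted below), set in the shell
that launches the servlet container, e.g.:

  ulimit -v unlimited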



On Fri, Jul 22, 2011 at 1:05 PM, mdz-munich
sebastian.lu...@bsb-muenchen.de wrote:
 It says:

 core file size          (blocks, -c) 0
 data seg size           (kbytes, -d) unlimited
 scheduling priority             (-e) 0
 file size               (blocks, -f) unlimited
 pending signals                 (-i) 257869
 max locked memory       (kbytes, -l) 64
 max memory size         (kbytes, -m) 28063940
 open files                      (-n) 8192
 pipe size            (512 bytes, -p) 8
 POSIX message queues     (bytes, -q) 819200
 real-time priority              (-r) 0
 stack size              (kbytes, -s) 8192
 cpu time               (seconds, -t) unlimited
 max user processes              (-u) 257869
 virtual memory          (kbytes, -v) 27216080
 file locks                      (-x) unlimited


 Best regards,

 Sebastian



 Yonik Seeley wrote:

 OK, best guess is that you're going over some per-process address space
 limit.

 Try seeing what ulimit -a says.

 -Yonik
 http://www.lucidimagination.com

 On Fri, Jul 22, 2011 at 12:51 PM, mdz-munich
 sebastian.lu...@bsb-muenchen.de wrote:
 Hi Yonik,

 thanks for your reply!

 Are you specifically selecting MMapDirectory in solrconfig.xml?

 Nope.

 We installed Oracle's Runtime from

 http://java.com/de/download/linux_manual.jsp?locale=de

 java.runtime.name = Java(TM) SE Runtime Environment
 sun.boot.library.path = /usr/java/jdk1.6.0_26/jre/lib/amd64
 java.vm.version = 20.1-b02
 shared.loader =
 java.vm.vendor = Sun Microsystems Inc.
 enable.master = true
 java.vendor.url = http://java.sun.com/
 path.separator = :
 java.vm.name = Java HotSpot(TM) 64-Bit Server VM
 tomcat.util.buf.StringCache.byte.enabled = true
 file.encoding.pkg = sun.io
 java.util.logging.config.file =
 /local/master01_tomcat7x_solr33x/conf/logging.properties
 user.country = DE
 sun.java.launcher = SUN_STANDARD
 sun.os.patch.level = unknown
 java.vm.specification.name = Java Virtual Machine Specification
 user.dir = /local/master01_tomcat7x_solr33x/logs
 solr.abortOnConfigurationError = true
 java.runtime.version = 1.6.0_26-b03
 java.awt.graphicsenv = sun.awt.X11GraphicsEnvironment
 java.endorsed.dirs = /local/master01_tomcat7x_solr33x/endorsed
 os.arch = amd64
 java.io.tmpdir = /local/master01_tomcat7x_solr33x/temp
 line.separator =

 But no success with 1000 docs/batch, this was thrown during optimize:

 22.07.2011 18:44:05 org.apache.solr.core.SolrCore execute
 INFO: [core.digi20] webapp=/solr path=/update params={} status=500
 QTime=87540
 22.07.2011 18:44:05 org.apache.solr.common.SolrException log
 SEVERE: java.io.IOException: Map failed
        at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:748)
        at
 org.apache.lucene.store.MMapDirectory$MMapIndexInput.<init>(MMapDirectory.java:303)
        at
 org.apache.lucene.store.MMapDirectory.openInput(MMapDirectory.java:217)
        at
 org.apache.lucene.index.FieldsReader.<init>(FieldsReader.java:129)
        at
 org.apache.lucene.index.SegmentCoreReaders.openDocStores(SegmentCoreReaders.java:245)
        at
 org.apache.lucene.index.SegmentReader.get(SegmentReader.java:117)
        at
 org.apache.lucene.index.IndexWriter$ReaderPool.get(IndexWriter.java:703)
        at
 org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4196)
        at
 org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3863)
        at
 org.apache.lucene.index.SerialMergeScheduler.merge(SerialMergeScheduler.java:37)
        at
 org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:2715)
        at
 org.apache.lucene.index.IndexWriter.optimize(IndexWriter.java:2525)
        at
 org.apache.lucene.index.IndexWriter.optimize(IndexWriter.java:2462)
        at
 org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:410)
        at
 org.apache.solr.update.processor.RunUpdateProcessor.processCommit(RunUpdateProcessorFactory.java:85)
        at
 org.apache.solr.update.processor.LogUpdateProcessor.processCommit(LogUpdateProcessorFactory.java:154)
        at
 org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:177)
        at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:77)
        at
 org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:67)
        at
 org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
        at org.apache.solr.core.SolrCore.execute(SolrCore.java:1368)
        at
 org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356)
        at
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252)
        at
 org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
        at
 org.apache.catalina.core.ApplicationFilterChain.doFilter

Re: Rounding errors in solr

2011-07-22 Thread Yonik Seeley
On Fri, Jul 22, 2011 at 4:11 PM, Brian Lamb
brian.l...@journalexperts.com wrote:
 I've noticed some peculiar scoring issues going on in my application. For
 example, I have a field that is multivalued and has several records that
 have the same value. For example,

 <arr name="references">
   <str>National Society of Animal Lovers</str>
   <str>Nat. Soc. of Ani. Lov.</str>
 </arr>

 I have about 300 records with that exact value.

 Now, when I do a search for references:(national society animal lovers), I
 get the following results:

 <id>252</id>
 <id>159</id>
 <id>82</id>
 <id>452</id>
 <id>105</id>

 When I do a search for references:(nat soc ani lov), I get the results
 ordered differently:

 <id>510</id>
 <id>122</id>
 <id>501</id>
 <id>82</id>
 <id>252</id>

 When I load all the records that match, I notice that at some point, the
 scores aren't the same but differ by only a little:

 1.471928 in one and the one before it was 1.471929

32 bit floats only have 7 decimal digits of precision, and in floating
point land (a+b+c) can be slightly different than (c+b+a)

-Yonik
http://www.lucidimagination.com
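
A tiny demonstration of that non-associativity, in plain Java (nothing
Solr-specific; 1e8f is chosen because its float ulp is 8, so adding 1.0f
to it is lost entirely):

  public class FloatOrder {
    public static void main(String[] args) {
      System.out.println((1e8f + 1f) - 1e8f);  // prints 0.0
      System.out.println((1e8f - 1e8f) + 1f);  // prints 1.0
    }
  }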


Re: Determine which field term was found?

2011-07-21 Thread Yonik Seeley
On Thu, Jul 21, 2011 at 4:47 PM, Olson, Ron rol...@lbpc.com wrote:
 Is there an easy way to find out which field matched a term in an OR query 
 using Solr? I have a document with names in two multi-valued fields and I am 
 searching for Smith, using the query A_NAMES:smith OR B_NAMES:smith. I 
 figure I could loop through both result arrays, but that seems weird to me to 
 have to search again for the value in a result.

That's pretty much the way lucene currently works - you don't know
what fields match a query.
If the query is simple, looping over the returned stored fields is
probably your best bet.

There are a couple other tricks you could use (although they are not
necessarily better):
1) with grouping by query (a trunk feature) you can essentially return
both queries with one request:
  q=*:*&group=true&group.query=A_NAMES:smith&group.query=B_NAMES:smith
  and optionally add a group.query=A_NAMES:smith OR B_NAMES:smith if
you need the combined list
2) use pseudo-fields (also trunk) in conjunction with the termfreq
function (the number of times a term appears in a field).  This
obviously only works with term queries.
  fl=*,count1:termfreq(A_NAMES,'smith'),count2:termfreq(B_NAMES,'smith')
  You can use parameter substitution to pull out the actual term and
simplify the query:
  fl=*,count1:termfreq(A_NAMES,$term),count2:termfreq(B_NAMES,$term)&term=smith


-Yonik
http://www.lucidimagination.com


Re: defType argument weirdness

2011-07-20 Thread Yonik Seeley
On Tue, Jul 19, 2011 at 11:41 PM, Jonathan Rochkind rochk...@jhu.edu wrote:
 Is it generally recognized that this terminology is confusing, or is it just 
 me?

 I do understand what they do (at least well enough to use them), but I find 
 it confusing that it's called defType as a main param, but type in a 
 LocalParam

When used as the main param, it is still just the default (i.e. it may
be overridden).
For example defType=lucene&q={!func}1

 (and then there's 'qt', often confused with defType/type by newbies, since 
 they guess it stands for 'query type', but which should probably actually 
 have been called 'requestHandler'/'rh' instead, since that's what it actually 
 chooses, no?  It gets very confusing).

Yeah, qt is very historical... before the QParserPlugin framework,
and before request handlers were used for many other things (including
updates).

-Yonik
http://www.lucidimagination.com


 If it's generally recognized it's confusing and perhaps a somewhat 
 inconsistent mental model being implied, I wonder if there'd be any interest 
 in renaming these to be more clear, leaving the old ones as aliases/synonyms 
 for backwards compatibility (perhaps with a long deprecation period, or 
 perhaps existing forever). I know it was very confusing to me to keep track 
 of these parameters and what they did for quite a while, and still trips me 
 up from time to time.

 Jonathan
 
 From: ysee...@gmail.com [ysee...@gmail.com] on behalf of Yonik Seeley 
 [yo...@lucidimagination.com]
 Sent: Tuesday, July 19, 2011 9:40 PM
 To: solr-user@lucene.apache.org
 Subject: Re: defType argument weirdness

 On Tue, Jul 19, 2011 at 1:25 PM, Naomi Dushay ndus...@stanford.edu wrote:
 Regardless, I thought that     defType=dismax&q=*:*   is supposed to be
 equivalent to  q={!defType=dismax}*:*  and also equivalent to q={!dismax}*:*

 Not quite - there is a very subtle distinction.

 {!dismax}  is short for {!type=dismax}, the type of the actual query,
 and this may not be overridden.

 The defType local param is only the default type for sub-queries (as
 opposed to the current query).
 It's useful in conjunction with the query  or nested query qparser:
 http://lucene.apache.org/solr/api/org/apache/solr/search/NestedQParserPlugin.html

 -Yonik
 http://www.lucidimagination.com



Re: Reading Solr's JSON

2011-07-20 Thread Yonik Seeley
On Wed, Jul 20, 2011 at 10:58 AM, Sowmya V.B. vbsow...@gmail.com wrote:
 Which is the best way to read Solr's JSON output, from a Java code?

You could use SolrJ - it handles parsing for you (and uses the most
efficient binary format by default).

 There seems to be a JSONParser in one of the jar files in SolrLib
 (org.apache.noggit..)...but I don't understand how to read the parsed output
 in this.

If you just want to deserialize into objects (Maps, Lists, etc) then it's easy:

ObjectBuilder.fromJSON(my_json_string)

-Yonik
http://www.lucidimagination.com
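
For example, a small self-contained sketch (assuming the noggit jar shipped in
Solr's lib is on the classpath); JSON objects deserialize to Maps and JSON
arrays to Lists:

import java.util.List;
import java.util.Map;
import org.apache.noggit.ObjectBuilder;

public class ParseSolrJson {
    public static void main(String[] args) throws Exception {
        String json = "{\"response\":{\"numFound\":1,\"docs\":[{\"id\":\"1\"}]}}";
        Map<?,?> top = (Map<?,?>) ObjectBuilder.fromJSON(json);
        Map<?,?> response = (Map<?,?>) top.get("response");
        List<?> docs = (List<?>) response.get("docs");
        // navigate the deserialized structure like any Map/List
        System.out.println("numFound=" + response.get("numFound")
            + ", first doc=" + docs.get(0));
    }
}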


Re: Wiki Error JSON syntax

2011-07-20 Thread Yonik Seeley
On Wed, Jul 20, 2011 at 12:16 PM, Remy Loubradou
remyloubra...@gmail.com wrote:
 Hi,
 I was writing a Solr Client API for Node and I found an error on this page
 http://wiki.apache.org/solr/UpdateJSON : in the section Update Commands, the
 JSON is not valid because there are duplicate keys (add and delete each
 appear twice).

It's a common misconception that it's invalid JSON.  Duplicate keys
are in fact legal.

-Yonik
http://www.lucidimagination.com

 I tried with an array and it doesn't work either; I got error
 400, I think that's because the syntax is bad.

 I don't really know if I am at the good place to talk about that but ...
 that the only place I found. Sorry if it's not.

 Thanks,

 And I love Solr :)



Re: Using FieldCache in SolrIndexSearcher - crazy idea?

2011-07-19 Thread Yonik Seeley
On Tue, Jul 19, 2011 at 3:20 PM, Chris Hostetter
hossman_luc...@fucit.org wrote:

 :  Quite probably ... you typically can't assume that a FieldCache can be
 :  constructed for *any* field, but it should be a safe assumption for the
 :  uniqueKey field, so for that initial request of the mutiphase distributed
 :  search it's quite possible it would speed things up.
 :
 : Ah, thanks Hoss - I had meant to respond to the original email, but
 : then I lost track of it.
 :
 : Via pseudo-fields, we actually already have the ability to retrieve
 : values via FieldCache.
 : fl=id:{!func}id

 isn't that kind of orthogonal to the question though? ... a user can use
 the new pseudo-field functionality to request values from the FieldCache
 instead of stored fields, but specifically in the case of distributed
 search, when the first request is only asking for the uniqueKey values and
 scores, shouldn't that use the FieldCache to get those values?  (w/o the
 user needing to jump through hoops in how the request is made/configured)

Well, I was pointing out that distributed search could be easily
modified to use the field-cache
by changing id to id:{!func}id

But I'm not sure we should do that by default - the memory of a full
fieldCache entry is non-trivial for some people.
Using a CSF id field would be better I think (the type where it doesn't
populate a fieldcache entry).

-Yonik
http://www.lucidimagination.com


Re: Using functions in fq

2011-07-19 Thread Yonik Seeley
On Tue, Jul 19, 2011 at 6:49 PM, solr nps solr...@gmail.com wrote:
 My documents have two prices retail_price and current_price. I want to
 get products which have a sale of x%, the x is dynamic and can be specified
 by the user. I was trying to achieve this by using fq.

 If I want all sony tv's that are at least 20% off, I want to write something
 like

 q=sony tv&fq=current_price:[0 TO product(retail_price,0.80)]

 this does not work as the function is not expected in fq.

 how else can I achieve this?

The frange query parser may do what you want.
http://www.lucidimagination.com/blog/2009/07/06/ranges-over-functions-in-solr-14/

fq={!frange l=0 u=0.8}div(current_price, retail_price)

-Yonik
http://www.lucidimagination.com
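
Since x is user-specified, the upper bound can also be passed as a separate
request parameter and dereferenced inside the local params (a sketch; maxratio
is a made-up parameter name):

fq={!frange l=0 u=$maxratio}div(current_price,retail_price)&maxratio=0.8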


Re: defType argument weirdness

2011-07-19 Thread Yonik Seeley
On Tue, Jul 19, 2011 at 1:25 PM, Naomi Dushay ndus...@stanford.edu wrote:
 Regardless, I thought that     defType=dismax&q=*:*   is supposed to be
 equivalent to  q={!defType=dismax}*:*  and also equivalent to q={!dismax}*:*

Not quite - there is a very subtle distinction.

{!dismax}  is short for {!type=dismax}, the type of the actual query,
and this may not be overridden.

The defType local param is only the default type for sub-queries (as
opposed to the current query).
It's useful in conjunction with the query  or nested query qparser:
http://lucene.apache.org/solr/api/org/apache/solr/search/NestedQParserPlugin.html

-Yonik
http://www.lucidimagination.com


Re: NRT and commit behavior

2011-07-18 Thread Yonik Seeley
On Mon, Jul 18, 2011 at 10:53 AM, Nicholas Chase nch...@earthlink.net wrote:
 Very glad to hear that NRT is finally here!  But my question is this: will
 things still come to a standstill during a commit?

New updates can now proceed in parallel with a commit, and
searches have always been completely asynchronous w.r.t. commits.

-Yonik
http://www.lucidimagination.com


Re: Join performance?

2011-07-18 Thread Yonik Seeley
On Mon, Jul 18, 2011 at 12:48 PM, Kanduru, Ajay (NIH/NLM/LHC) [C]
akand...@mail.nih.gov wrote:
 I am trying to optimize performance of solr with our collection. The 
 collection has 208M records with index size of about 80GB. The machine has 
 16GB and I am allocating about 14GB to solr.

 I am using self join statement in filter query like this:
 q=(general search term)
 fq={!join from=join_field to=join_field}(field1:(field1 search term) AND 
 field2:(field2 search term) AND field3:(field3 search term))
 ...

 Field definitions:
 join_field: string type (Has ~27K terms)
 field1: text type
 field2: double type
 field3: string type

 The response time of the fq with join is about ten times that of the fq without
 join (~10 sec vs ~1 sec). Is this expected?

Yep... the initial join implementation is O(nterms), so it's expected
to be slow when the number of unique terms is high.
Given your index size, I would have almost expected it to be slower!

As with faceting, I expect there to be other implementations in the
future, but nothing right now...

-Yonik
http://www.lucidimagination.com

 In general what parameters, if any, can be tweaked? The intention is to use 
 such multiple filter queries, hence the need for optimization. Sharding and 
 more horse power are obvious solutions, but more interested in optimizing for 
 a given host and a given data collection.

 Appreciate any insight in this regard.

 -Ajay



Re: Solr search starting with 1 character spin endlessly

2011-07-18 Thread Yonik Seeley
On Mon, Jul 18, 2011 at 3:44 PM, Timothy Tagge tplimi...@gmail.com wrote:
 Solr version:  1.4.1

 I'm having some trouble with certain queries run against my Solr
 index.  When a query starts with a single letter followed by a space,
 followed by another search term, the query runs endlessly and never
 comes back.  An example problem query string...

 /customer/select/?q=name%3At+j+reynolds&version=2.2&start=0&rows=10&indent=on


 However, if I switch the order of the search values, putting the
 longer search term before the single character, I get quick, accurate
 results

 /customer/select/?q=name%3AReynolds+T+J&version=2.2&start=0&rows=10&indent=on


Note that a query of name:t j reynolds
is actually equivalent to name:t default_field:j default_field:reynolds

You probably want a query of name:"t j reynolds"
or name:(t j reynolds)

The query probably doesn't hang, but may just take a long time if you
have a big index, or if you don't have enough RAM and the default
field isn't one that is normally searched (causing much real disk IO
to satisfy the query).

-Yonik
http://www.lucidimagination.com


 I've defined my name field as text.
 <field name="name" type="text" indexed="true" stored="true" required="true" />

 Where text is defined as
 <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.SynonymFilterFactory"
 synonyms="customer-synonyms.txt" ignoreCase="true" expand="true"/>
        <filter class="solr.StopFilterFactory"
                ignoreCase="true"
                words="stopwords.txt"
                enablePositionIncrements="true"
                />
        <filter class="solr.WordDelimiterFilterFactory"
 generateWordParts="1" generateNumberParts="1" catenateWords="1"
 catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.SnowballPorterFilterFactory"
 language="English" protected="protwords.txt"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.SynonymFilterFactory"
 synonyms="customer-synonyms.txt" ignoreCase="true" expand="true"/>
        <filter class="solr.StopFilterFactory"
                ignoreCase="true"
                words="stopwords.txt"
                enablePositionIncrements="true"/>
        <filter class="solr.WordDelimiterFilterFactory"
 generateWordParts="1" generateNumberParts="1" catenateWords="0"
 catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.SnowballPorterFilterFactory"
 language="English" protected="protwords.txt"/>
      </analyzer>
    </fieldType>

 Am I making a simple mistake somewhere?

 Thanks for your help.

 Tim T.



Re: Document IDs instead of count for facets?

2011-07-17 Thread Yonik Seeley
On Sun, Jul 17, 2011 at 10:38 AM, Jeff Schmidt j...@535consulting.com wrote:
 I don't want to query for a particular facet value, but rather have Solr do a 
 grouping of facet values. I'm not sure about the appropriate nomenclature 
 there. But, I have a multi-valued field named process that can have values 
 such as catalysis, activation, inhibition, expression, 
 modification, reaction etc.  About ~100K documents are indexed where this 
 field may have none or one or more of these processes.

 When the client makes a request, I need to tell it that for the process 
 catalysis, refer to documents 1,5,6,8,32 etc., and for modification, 
 documents 43545,22,2134, etc.

This sounds like grouping:
http://wiki.apache.org/solr/FieldCollapsing

Unfortunately it only works on single-valued fields, and you can't
sort based on numbers of matches either.

The closest you can get today is to issue 2 requests... the first a
faceting request to get the top constraints, and then a second that
uses group.query for each constraint you are interested in.

-Yonik
http://www.lucidimagination.com


Re: return distance in geo spatial query

2011-07-14 Thread Yonik Seeley
On Thu, Jul 14, 2011 at 8:42 AM, Zoltan Altfatter altfatt...@gmail.com wrote:
 Would be interested in the status of the development in returning the
 distance in a spatial query?

This is a feature in trunk (pseudo-fields).
For example:
  fl=id,score,geodist()


-Yonik
http://www.lucidimagination.com


Re: What's the fq= syntax for NumericRangeFilter?

2011-07-09 Thread Yonik Seeley
Something is wrong with your indexing.
Is wc an indexed field?  If not, change it so it is, then re-index your data.

If so, I'd recommend starting with the example data and filtering for
something like popularity:[6 TO 10] to convince yourself it works,
then figuring out what you did differently in your schema/data.

-Yonik
http://www.lucidimagination.com

On Sat, Jul 9, 2011 at 10:50 AM, Gabriele Kahlout
gabri...@mysimpatico.com wrote:
 http://localhost:8080/solr/select?indent=on&version=2.2&q=*%3A*&fq=wc%3A%5B255+TO+257%5D&start=0&rows=10&fl=*%2Cscore&qt=&wt=xml&explainOther=&hl.fl=

 The toString of the request:
 {explainOther=&fl=*,score&indent=on&start=0&q=*:*&hl.fl=&qt=&wt=xml&fq=wc:[255+TO+257]&rows=1&version=2.2}

 Even when the FilterQuery is constructed in Java it doesn't work (I get
 results that ignore the filter query completely).


 On Sat, Jul 9, 2011 at 3:40 PM, Ahmet Arslan iori...@yahoo.com wrote:

  I don't get it to work!
 
  If I specify no fq I get the first result with int
  name=wc256/int
 
  With wc:[255 TO 257] (fq=wc%3A%5B255+TO+257%5D) nothing
  comes out.

 If you give us the Full URL you are using, it can be helpful.

 Correct syntax is fq=wc:[255 TO 257]

 You can use more that fq in a request.




 --
 Regards,
 K. Gabriele

 --- unchanged since 20/9/10 ---
 P.S. If the subject contains [LON] or the addressee acknowledges the
 receipt within 48 hours then I don't resend the email.
 subject(this) ∈ L(LON*) ∨ ∃x. (x ∈ MyInbox ∧ Acknowledges(x, this) ∧ time(x)
  Now + 48h) ⇒ ¬resend(I, this).

 If an email is sent by a sender that is not a trusted contact or the email
 does not contain a valid code then the email is not received. A valid code
 starts with a hyphen and ends with X.
 ∀x. x ∈ MyInbox ⇒ from(x) ∈ MySafeSenderList ∨ (∃y. y ∈ subject(x) ∧ y ∈
 L(-[a-z]+[0-9]X)).



Re: Join and Range Queries

2011-07-09 Thread Yonik Seeley
On Sat, Jul 9, 2011 at 8:04 PM, Lance Norskog goks...@gmail.com wrote:
 Does the Join feature work with Range queries?

Not in any generic manner - joins are based on exact matches of
indexed tokens only.
But if you wanted something specific enough like same year, then you
could index that year for each document and do the join on that (it
would actually be a self join).

You could also get other resolutions by say indexing months...

doc1:
   description:...
   date:May
   close_dates:Apr May Jun

doc2:
  description...
  date: Jun
  close_dates: May Jun Jul

Then to find other events within 1 month of the selected events:
{!join from=date to=close_dates}description:octoberfest

Or to find other events within 2 months:
{!join from=close_dates to=close_dates}description:octoberfest

-Yonik
http://www.lucidimagination.com


 Given a time series of events stored as documents with time ranges, is
 it possible to do a search that finds certain events, and then add
 other documents whose time ranges overlap?

 --
 Lance Norskog
 goks...@gmail.com



Re: Exception when using result grouping and sorting by geodist() with Solr 3.3

2011-07-08 Thread Yonik Seeley
On Fri, Jul 8, 2011 at 4:11 AM, Thomas Heigl tho...@umschalt.com wrote:
 How should I proceed with this problem? Should I create a JIRA issue or
 should I cross-post on the dev mailing list? Any suggestions?

Yes, this definitely sounds like a bug in the 3.3 grouping (looks like
it forgets to weight the sorts).
Could you open a JIRA issue?

-Yonik
http://www.lucidimagination.com


Re: Is solrj 3.3.0 ready for field collapsing?

2011-07-05 Thread Yonik Seeley
On Mon, Jul 4, 2011 at 11:54 AM, Per Newgro per.new...@gmx.ch wrote:
 i've tried to add the params for group=true and group.field=myfield by using
 the SolrQuery.
 But the result is null. Do i have to configure something? In wiki part for
 field collapsing i couldn't
 find anything.

No specific (type-safe) support for grouping is in SolrJ currently.
But you should still have access to the complete generic solr response
via SolrJ regardless (i.e. use getResponse())

-Yonik
http://www.lucidimagination.com


Re: Using FieldCache in SolrIndexSearcher - crazy idea?

2011-07-05 Thread Yonik Seeley
On Tue, Jul 5, 2011 at 5:13 PM, Chris Hostetter
hossman_luc...@fucit.org wrote:
 : Correct me if I am wrong:  In a standard distributed search with
 : QueryComponent, the first query sent to the shards asks for
 : fl=myUniqueKey or fl=myUniqueKey,score.  When the response is being
 : generated to send back to the coordinator, SolrIndexSearcher.doc (int i,
 : Set<String> fields) is called for each document.  As I understand it,
 : this will read each document from the index _on disk_ and retrieve the
 : myUniqueKey field value for each document.
 :
 : My idea is to have a FieldCache for the myUniqueKey field in
 : SolrIndexSearcher (or somewhere else?) that would be used in cases where
 : the only field that needs to be retrieved is myUniqueKey.  Is this
 : something that would improve performance?

 Quite probably ... you typically can't assume that a FieldCache can be
 constructed for *any* field, but it should be a safe assumption for the
 uniqueKey field, so for that initial request of the mutiphase distributed
 search it's quite possible it would speed things up.

Ah, thanks Hoss - I had meant to respond to the original email, but
then I lost track of it.

Via pseudo-fields, we actually already have the ability to retrieve
values via FieldCache.
fl=id:{!func}id

But using CSF would probably be better here - no memory overhead for
the FieldCache entry.

-Yonik
http://www.lucidimagination.com



 if you want to try this and report back results, i'm sure a lot of people
 would be interested in a patch ... i would guess the best place to make
 the change would be in the QueryComponent so that it used the FieldCache
 (probably best to do it via getValueSource() on the uniqueKey's
 SchemaField) to put the ids in the response instead of using a
 SolrDocList.

 Hmm, actually...

 there's no reason why this kind of optimization would need to be specific
 to distributed queries, it could be done by the ResponseWriters directly
 -- if the field list they are being asked to return only contains the
 uniqueKeyField and computed values (like score) then don't bother calling
 SolrIndexSearcher.doc at all ... the only hitch is that with distributed
 search and using function values as pseudo fields and whatnot there are
 more places calling SolrIndexSearcher.doc than there used to be ... so
 maybe putting this change directly into SolrIndexSearcher.doc would make
 the most sense?



 -Hoss



Re: Custom Cache cleared after a commit?

2011-07-04 Thread Yonik Seeley
On Mon, Jul 4, 2011 at 2:07 AM, arian487 akarb...@tagged.com wrote:
 I guess I'll have to use something other then SolrCache to get what I want
 then.  Or I could use SolrCache and just change the code (I've already done
 so much of this anwyways...).  Anyways thanks for the reply.

You can specify a regenerator for your cache that examines items in
the old cache and pre-populates the new cache when a commit happens.

-Yonik
http://www.lucidimagination.com
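
A rough sketch of such a regenerator (the class name is made up; the interface
is org.apache.solr.search.CacheRegenerator, referenced via the regenerator
attribute of the cache declaration in solrconfig.xml). Blindly carrying raw
entries over is only safe if the cached values don't depend on the index
contents:

import java.io.IOException;
import org.apache.solr.search.CacheRegenerator;
import org.apache.solr.search.SolrCache;
import org.apache.solr.search.SolrIndexSearcher;

public class CopyingRegenerator implements CacheRegenerator {
    public boolean regenerateItem(SolrIndexSearcher newSearcher,
                                  SolrCache newCache, SolrCache oldCache,
                                  Object oldKey, Object oldVal) throws IOException {
        // recompute (or here, simply carry over) the old entry for the new searcher
        newCache.put(oldKey, oldVal);
        return true; // returning false stops the autowarming loop
    }
}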


Re: Custom Cache cleared after a commit?

2011-07-03 Thread Yonik Seeley
On Sun, Jul 3, 2011 at 10:52 PM, arian487 akarb...@tagged.com wrote:
 I know the queryResultCache and such live only until a commit happens,
 but I'm wondering if the custom caches are like this as well?  I'd actually
 rather have a custom cache which is not cleared at all.

That's not currently possible.
The nature of Solr's caches are that they are completely transparent -
it doesn't matter if a cache is used or not, the response should
always be the same.  This is analogous to caching the fact that 2*2 =
4.  Put another way, Solr's caches are only for increasing request
throughput, and should not affect what response a client receives.

-Yonik
http://www.lucidimagination.com


Re: Solr 3.2 filter cache warming taking longer than 1.4.1

2011-07-02 Thread Yonik Seeley
OK, I tried a quick test of 1.4.1 vs 3x on optimized indexes
(unoptimized had different numbers of segments so I didn't try that).
3x (as of today) was 28% faster at a large filter query (300 terms in
one  big disjunction, with each term matching ~1000 docs).

-Yonik
http://www.lucidimagination.com


On Thu, Jun 30, 2011 at 3:30 PM, Shawn Heisey s...@elyograg.org wrote:
 On 6/29/2011 10:16 PM, Shawn Heisey wrote:

 I was thinking perhaps I might actually decrease the termIndexInterval
 value below the default of 128.  I know from reading the Hathi Trust blog
 that memory usage for the tii file is much more than the size of the file
 would indicate, but if I increase it from 13MB to 26MB, it probably would
 still be OK.

 Decreasing the termIndexInterval to 64 almost doubled the tii file size, as
 expected.  It made the filterCache warming much faster, but made the
 queryResultCache warming very very slow.  Regular queries also seem like
 they're slower.

 I am trying again with 256.  I may go back to the default before I'm done.
  I'm guessing that a lot of trial and error was put into choosing the
 default value.

 It's been fun having a newer index available on my backup servers.  I've
 been able to do a lot of trials, learned a lot of things that don't work and
 a few that do.  I might do some experiments with trunk once I've moved off
 1.4.1.

 Thanks,
 Shawn




Re: pagination and groups

2011-07-02 Thread Yonik Seeley
2011/7/1 Tomás Fernández Löbbe tomasflo...@gmail.com:
 I'm not sure I understand what you want to do. To paginate with groups you
 can use start and rows as with ungrouped queries. with group.ngroups
 (Something I found a couple of days ago) you can show the total number of
 groups. group.limit tells Solr how many (max) documents you want to see
 for each group.

Right - just be aware that requesting the total number of groups (via
group.ngroups) is pretty memory and resource intensive - that's why
there is a separate option for it.

-Yonik
http://www.lucidimagination.com


Re: pagination and groups

2011-07-02 Thread Yonik Seeley
On Sat, Jul 2, 2011 at 7:34 PM, Benson Margulies bimargul...@gmail.com wrote:
 Hey, I don't suppose you could easily tell me the rev in which ngroups 
 arrived?

1137037 I believe.  Grouping originated in Solr, was refactored to a
shared lucene/solr module, including the ability to get the total
number of groups, and then Solr's implementation was cut over to that.

 Also, how does ngroups compare to the 'matches' value inside each group?

The units for matches is currently number of documents, while the
units for ngroups is number of groups.


-Yonik
http://www.lucidimagination.com
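
For reference, a request exercising both numbers might look like this (the
field name is illustrative); matches counts matching documents, while ngroups
counts distinct values of the grouping field:

q=*:*&group=true&group.field=manu&group.limit=2&group.ngroups=true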


Re: JOIN, query on the parent?

2011-07-01 Thread Yonik Seeley
On Thu, Jun 30, 2011 at 6:19 PM, Ryan McKinley ryan...@gmail.com wrote:
 Hello-

 I'm looking for a way to find all the links from a set of results.  Consider:

 <doc>
  id:1
  type:X
  link:a
  link:b
 </doc>

 <doc>
  id:2
  type:X
  link:a
  link:c
 </doc>

 <doc>
  id:3
  type:Y
  link:a
 </doc>

 Is there a way to search for all the links from stuff of type X -- in
 this case (a,b,c)

Do the links point to other documents somehow?
Let's assume that there are documents with ids of a,b,c

fq={!join from=link to=id}type:X

Basically, you start with the set of documents that match type:X, then
follow from link to id to arrive at the new set of documents.

-Yonik
http://www.lucidimagination.com


Re: Solr 3.2 filter cache warming taking longer than 1.4.1

2011-06-29 Thread Yonik Seeley
Hmmm, you could comment out the query and filter caches on both 1.4.1 and 3.2
and then run some of the queries to see if you can figure out which are slower?

Do any of the queries have stopwords in fields where you now index
those?  If so, that could entirely account for the difference.

-Yonik
http://www.lucidimagination.com

On Wed, Jun 29, 2011 at 10:59 AM, Shawn Heisey s...@elyograg.org wrote:
 I have noticed a significant difference in filter cache warming times on my
 shards between 3.2 and 1.4.1.  What can I do to troubleshoot this?  Please
 let me know what additional information you might need to look deeper.  I
 know this isn't enough.

 It takes about 3 seconds to do an autowarm count of 8 on 1.4.1 and 10-15
 seconds to do an autowarm count of 4 on 3.2.  The only explicit warming
 query is *:*, sorted descending by post_date, a tlong field containing a
 UNIX timestamp, precisionStep 16.  The indexes are not entirely identical,
 but the new one did evolve from the old one.  Perhaps one of the experts
 might spot something that makes for much slower filter cache warming, or
 some way to look deeper if this seems wrong?  Is there a way to see the
 search URL bits that populated the cache?

 Index differences: The new index has four extra small fields, is no longer
 removing stopwords, and has omitTermFreqAndPositions enabled on a
 significant number of fields.  Most of the fields are tokenized text, and
 now more than half of those don't have tf and tp enabled.  Naturally the
 largest text field where most of the matches happen still does have them
 enabled.

 To increase reindex speed, the new index has a termIndexInterval of 1024,
 the old one is at the default of 128.  In terms of raw size, the new index
 is less than one percent larger than the old one.  The old shards average
 out to 17.22GB, the new ones to 17.41GB.  Here's an overview of the
 differences of each type of file (comparing the huge optimized segment only,
 not the handful of tiny ones since then) on the index with the largest size
 gap, old value listed first:

 fdt: 6317180127/6055634923 (4.1% decrease)
 fdx: 76447972/75647412 (1% decrease)
 fnm: 382, 338 (44 bytes!  woohoo!)
 frq: 2828400926/2873249038 (1.5% increase)
 nrm: 28367782/38223988 (35% increase)
 prx: 2449154203/2684249069 (9.5% increase)
 tii: 1686298/13329832 (790% increase)  
 tis: 923045932/999294109 (8% increase)
 tvd: 18910972/19111840 (1% increase)
 tvf: 5867309063/5640332282 (3.9% decrease)
 tvx: 151294820/152895940 (1% increase)

 The tii and nrm files are the only ones that saw a significant size
 increase, but the tii file is MUCH bigger.

 Thanks,
 Shawn




Re: Solr just 'hangs' under load test - ideas?

2011-06-29 Thread Yonik Seeley
Can you get a thread dump to see what is hanging?

-Yonik
http://www.lucidimagination.com

On Wed, Jun 29, 2011 at 11:45 AM, Bob Sandiford
bob.sandif...@sirsidynix.com wrote:
 Hi, all.

 I'm hoping someone has some thoughts here.

 We're running Solr 3.1 (with the patch for SolrQueryParser.java to not do the 
 getLuceneVersion() calls, but use luceneMatchVersion directly).

 We're running in a Tomcat instance, 64 bit Java.  CATALINA_OPTS are: 
 -Xmx7168m -Xms7168m -XX:MaxPermSize=256M

 We're running 2 Solr cores, with the same schema.

 We use SolrJ to run our searches from a Java app running in JBoss.

 JBoss, Tomcat, and the Solr Index folders are all on the same server.

 In case it's relevant, we're using JMeter as a load test harness.

 We're running on Solaris, a 16 processor box with 48GB physical memory.

 I've run a successful load test at a 100 user load (at that rate there are 
 about 5-10 solr searches / second), and solr search responses were coming in 
 under 100ms.

 When I tried to ramp up, as far as I can tell, Solr is just hanging.  (We 
 have some logging statements around the SolrJ calls - just before, we log how 
 long our query construction takes, then we run the SolrJ query and log the 
 search times.  We're getting a number of the query construction logs, but no 
 corresponding search time logs).

 Symptoms:
 The Tomcat and JBoss processes show as well under 1% CPU, and they are still 
 the top processes.  CPU states show around 99% idle.   RES usage for the two 
 Java processes around 3GB each.  LWP under 120 for each.  STATE just shows as 
 sleep.  JBoss is still 'alive', as I can get into a piece of software that 
 talks to our JBoss app to get data.

 We set things up to use log4j logging for Solr - the log isn't showing any 
 errors or exceptions.

 We're not indexing - just searching.

 Back in January, we did load testing on a prototype, and had no problems 
 (though that was Solr 1.4 at the time).  It ramped up beautifully -
 bottlenecks were our apps, not Solr.  What I'm benchmarking now is a descendant
 of that prototype - a bit more complex on searches and more fields in the
 schema, but same basic search logic as far as SolrJ usage.

 Any ideas?  What else to look at?  Ringing any bells?

 I can send more details if anyone wants specifics...

 Bob Sandiford | Lead Software Engineer | SirsiDynix
 P: 800.288.8020 X6943 | bob.sandif...@sirsidynix.com
 www.sirsidynix.comhttp://www.sirsidynix.com/




Re: Solr 3.2 filter cache warming taking longer than 1.4.1

2011-06-29 Thread Yonik Seeley
On Wed, Jun 29, 2011 at 1:43 PM, Shawn Heisey s...@elyograg.org wrote:
 Just now, three of the six shards had documents deleted, and they took
 29.07, 27.57, and 28.66 seconds to warm.  The 1.4.1 counterpart to the 29.07
 second one only took 4.78 seconds, and it did twice as many autowarm
 queries.

Can you post the logs at the INFO level that covers the warming period?

-Yonik
http://www.lucidimagination.com


Re: conditionally update document on unique id

2011-06-29 Thread Yonik Seeley
On Wed, Jun 29, 2011 at 4:32 PM, eks dev eks...@googlemail.com wrote:
 req.getSearcher().getFirstMatch(t) != -1;

Yep, this is currently the fastest option we have.

-Yonik
http://www.lucidimagination.com
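
A rough sketch of wiring that check into an update processor chain so adds of
already-indexed ids are dropped (the class name and the uniqueKey field name
id are made up, and the AddUpdateCommand/package details vary a bit across
Solr versions):

import java.io.IOException;
import org.apache.lucene.index.Term;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.response.SolrQueryResponse;
import org.apache.solr.update.AddUpdateCommand;
import org.apache.solr.update.processor.UpdateRequestProcessor;
import org.apache.solr.update.processor.UpdateRequestProcessorFactory;

public class SkipExistingFactory extends UpdateRequestProcessorFactory {
    public UpdateRequestProcessor getInstance(final SolrQueryRequest req,
            SolrQueryResponse rsp, UpdateRequestProcessor next) {
        return new UpdateRequestProcessor(next) {
            public void processAdd(AddUpdateCommand cmd) throws IOException {
                String id = (String) cmd.solrDoc.getFieldValue("id");
                // only pass the add down the chain if no doc with this id exists
                if (req.getSearcher().getFirstMatch(new Term("id", id)) == -1) {
                    super.processAdd(cmd);
                }
            }
        };
    }
}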


Re: Solr 3.2 filter cache warming taking longer than 1.4.1

2011-06-29 Thread Yonik Seeley
On Wed, Jun 29, 2011 at 3:28 PM, Yonik Seeley
yo...@lucidimagination.com wrote:

 On Wed, Jun 29, 2011 at 1:43 PM, Shawn Heisey s...@elyograg.org wrote:
  Just now, three of the six shards had documents deleted, and they took
  29.07, 27.57, and 28.66 seconds to warm.  The 1.4.1 counterpart to the 29.07
  second one only took 4.78 seconds, and it did twice as many autowarm
  queries.

 Can you post the logs at the INFO level that covers the warming period?

OK, your filter queries have hundreds of terms in them (and that means
hundreds of term lookups, which uses the term index).
Thus, your termIndexInterval change is the leading suspect for the
slowdown.  A termIndexInterval of 1024 means that
a term lookup will seek to the closest 1024th term and then call
next() until the desired term is found.  Hence instead of calling
next()
an average of 64 times internally, it's now 512 times.

Of course there is still a mystery about why your tii (which is the
term index) would be so much bigger instead of smaller...

-Yonik
http://www.lucidimagination.com
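
For reference, the interval is configured in solrconfig.xml (shown here back
at the default; the same element exists under mainIndex as well):

<indexDefaults>
  ...
  <termIndexInterval>128</termIndexInterval>
</indexDefaults>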


Re: multiple spatial values

2011-06-25 Thread Yonik Seeley
On Sat, Jun 25, 2011 at 5:56 AM, marthinal jm.rodriguez.ve...@gmail.com wrote:
 sfield, pt and d can all be specified directly in the spatial
 functions/filters too, and that will override the global params.

 Unfortunately one must currently use lucene query syntax to do an OR.
 It just makes it look a bit messier.

 q=_query_:"{!geofilt}" _query_:"{!geofilt sfield=location_2}"

 -Yonik
 http://www.lucidimagination.com


 @Yonik it seems to work like this, I tried hundreds of other possibilities
 without success:

 q={!geofilt sfield=location_1 pt=36.62,-6.23 d=50}&fq={!geofilt
 sfield=location_2 pt=40.51,-5.91 d=500}

Ah, right.  I had thought you wanted docs that matched either geofilt
(hence OR), not docs that matched both.

-Yonik
http://www.lucidimagination.com


Re: multiple spatial values

2011-06-24 Thread Yonik Seeley
On Fri, Jun 24, 2011 at 2:11 PM, marthinal jm.rodriguez.ve...@gmail.com wrote:

 Yonik Seeley-2-2 wrote:

 On Tue, Sep 21, 2010 at 12:12 PM, dan sutton danbsut...@gmail.com
 wrote:
 I was looking at the LatLonType and how it might represent multiple
 lon/lat
 values ... it looks to me like the lat would go in
 {latlongfield}_0_LatLon
 and the long in {latlongfield}_1_LatLon ... how then if we have multiple
 lat/long points for a doc when filtering for example we choose the
 correct
 points.

 e.g. if thinking in cartisean coords and we have

 P1(3,4), P2(6,7) ... x is stored with 3,6 and y with 4,7 ...

 then how does it ensure we're not erroneously picking (3,7) or (6,4)
 whilst
 filtering with the spatial query?

 That's why it's a single-valued field only for now...

 don't we have to store both values together ? what am i missing here?

 The problem is that we don't have a way to query both values together,
 so we must index them separately.  The basic LatLonType uses numeric
 queries on the lat and lon fields separately.

 -Yonik
 http://lucenerevolution.org  Lucene/Solr Conference, Boston Oct 7-8


 I have in my index two different fields like you say Yonik (location_1,
 location_2) but the problem is when I want to filter results that have d=50
 for location_1 and d=50 for location_2. I really don't know how to build the
 query ...

 For example it works perfectly :

 q={!geofilt}&sfield=location_1&pt=36.62288966,-6.23211272&d=25

 but how do I add the sfield location_2?

sfield, pt and d can all be specified directly in the spatial
functions/filters too, and that will override the global params.

Unfortunately one must currently use lucene query syntax to do an OR.
It just makes it look a bit messier.

q=_query_:"{!geofilt}" _query_:"{!geofilt sfield=location_2}"

-Yonik
http://www.lucidimagination.com


Re: SEVERE: java.lang.NoSuchFieldError: core Solr branch3.x

2011-06-22 Thread Yonik Seeley
I just tried branch_3x and couldn't reproduce this.
Looks like maybe there is something wrong with your build, or some old
class files left over somewhere being picked up.

-Yonik
http://www.lucidimagination.com



On Wed, Jun 22, 2011 at 10:15 AM, Markus Jelsma
markus.jel...@openindex.io wrote:

 Hi,

 Today's checkout (Solr Specification Version: 3.4.0.2011.06.22.16.10.08)
 produces the exception below on start up. The same exception with very similar
 stack trace comes when committing an add. Example schema and docs will
 reproduce the error.

 Jun 22, 2011 4:11:57 PM org.apache.solr.common.SolrException log
 SEVERE: java.lang.NoSuchFieldError: core
        at
 org.apache.lucene.index.SegmentTermDocs.<init>(SegmentTermDocs.java:48)
        at
 org.apache.lucene.index.SegmentReader.termDocs(SegmentReader.java:491)
        at org.apache.lucene.index.IndexReader.termDocs(IndexReader.java:1005)
        at
 org.apache.lucene.index.SegmentReader.termDocs(SegmentReader.java:484)
        at
 org.apache.solr.search.SolrIndexReader.termDocs(SolrIndexReader.java:321)
        at
 org.apache.lucene.search.TermQuery$TermWeight.scorer(TermQuery.java:101)
        at
 org.apache.lucene.search.BooleanQuery$BooleanWeight.scorer(BooleanQuery.java:298)
        at
 org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:524)
        at
 org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:320)
        at
 org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:1178)
        at
 org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:1066)
        at
 org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:358)
        at
 org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:258)
        at
 org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:194)
        at
 org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
        at org.apache.solr.core.SolrCore.execute(SolrCore.java:1368)
        at
 org.apache.solr.core.QuerySenderListener.newSearcher(QuerySenderListener.java:54)
        at org.apache.solr.core.SolrCore$3.call(SolrCore.java:1177)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
        at java.util.concurrent.FutureTask.run(FutureTask.java:138)
        at
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:662)



 --
 Markus Jelsma - CTO - Openindex
 http://www.linkedin.com/in/markus17
 050-8536620 / 06-50258350



Re: sorting by termfreq on trunk doesn't work?

2011-06-22 Thread Yonik Seeley
Thanks for the problem report.  It turns out we didn't check for a
null pointer when there were no terms in a field for a segment.
I've just committed a fix to trunk.

-Yonik
http://www.lucidimagination.com



On Wed, Jun 22, 2011 at 10:28 PM, Jason Toy jason...@gmail.com wrote:
 I am trying to use sorting by the termfreq function using the trunk code
 since termfreq was added in the 4.0 code base.
 I run this query:
 http://127.0.0.1:8983/solr/select/?q=librarian&sort=termfreq(all_lists_text,librarian)%20desc

 but I get:

 HTTP ERROR 500

 Problem accessing /solr/select/. Reason:

    null

 java.lang.NullPointerException
        at 
 org.apache.solr.search.function.TermFreqValueSource$1.reset(TermFreqValueSource.java:53)
        at 
 org.apache.solr.search.function.TermFreqValueSource$1.<init>(TermFreqValueSource.java:49)
        at 
 org.apache.solr.search.function.TermFreqValueSource.getValues(TermFreqValueSource.java:44)
        at 
 org.apache.solr.search.function.ValueSource$ValueSourceComparator.setNextReader(ValueSource.java:188)
        at 
 org.apache.lucene.search.TopFieldCollector$OneComparatorNonScoringCollector.setNextReader(TopFieldCollector.java:97)
        at 
 org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:544)
        at 
 org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:313)
        at 
 org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:1190)
        at 
 org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:1078)
        at 
 org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:346)
        at 
 org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:400)
        at 
 org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:231)
        at 
 org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
        at org.apache.solr.core.SolrCore.execute(SolrCore.java:1308)
        at 
 org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:353)
        at 
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:248)
        at 
 org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
        at 
 org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
        at 
 org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
        at 
 org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
        at 
 org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
        at 
 org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
        at 
 org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
        at 
 org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
        at 
 org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
        at org.mortbay.jetty.Server.handle(Server.java:326)
        at 
 org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
        at 
 org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
        at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
        at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
        at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
        at 
 org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228)
        at 
 org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)





 Is termfreq stable and how can I run this query?


 --
 - sent from my mobile
 6176064373



Re: Problem with CSV update handler

2011-06-21 Thread Yonik Seeley
On Tue, Jun 21, 2011 at 2:15 AM, Rafał Kuć r@solr.pl wrote:
 Hello!

 Once again thanks for the response ;) So the solution is to generate
 the data files once again and either add the space after the doubled
 encapsulator

Maybe...
I can't tell if the file is encoded correctly or not since I don't
know what the decoded values are supposed to be from your example.

-Yonik
http://www.lucidimagination.com

 or change the encapsulator to a character that does
 not occur in the field values (of course the one that will be
 split).


 --
 Regards,
  Rafał Kuć
  http://solr.pl

 Multi-valued CSV fields are double encoded.

 We start with: aaa bbbccc'
 Then decoding one level, we get:  aaa bbbccc
 Decoding again to get individual values results in a decode error
 because the encapsulator appears unescaped in the middle of the second
 value (i.e. invalid CSV).

 An easier way to fix this is to use a different encapsulator for the
 sub-values of a multi-valued field by adding f.title.encapsulator=%27
 (a single quote char)

 But I can't really tell you exactly how to encode or specify options
 to the CSV loader when I don't know what the actual values you want
 after aaa bbbccc' is decoded.

 -Yonik
 http://www.lucidimagination.com



 On Mon, Jun 20, 2011 at 5:46 PM, Rafał Kuć r@solr.pl wrote:
 Hi!

  Yonik, thanks for the reply. I just realized that the example I gave
 was not complete - the error is returned by Solr only when the field is
 multivalued and the values in the fields are split. For example, the
 following curl command gives me the mentioned error:

 curl
 'http://localhost:8983/solr/update/csv?fieldnames=id,title&commit=true&encapsulator=%22&f.title.split=true&f.title.separator=%20' -H
 'Content-type:text/plain' -d '1,aaa bbbccc'

 while the following is executed without any problem:
 curl
 'http://localhost:8983/solr/update/csv?fieldnames=id,title&commit=true&encapsulator=%22&f.title.split=true&f.title.separator=%20' -H
 'Content-type:text/plain' -d '1,aaa bbb ccc'

 The only difference between those two is the additional space
 character in between bbb and ccc in the second example.

 Am I doing something wrong ? ;)

 --
 Regards,
  Rafał Kuć
  http://solr.pl

 This works fine for me:

 curl http://localhost:8983/solr/update/csv -H
 'Content-type:text/plain' -d 'id,name
 1,aaa bbb ccc'

 -Yonik
 http://www.lucidimagination.com


 On Mon, Jun 20, 2011 at 3:17 PM, Rafał Kuć r@solr.pl wrote:
 Hello!

  I have a question about the CSV update handler. Let's say I have the
 following file sent to the CSV update handler using curl:

 id,name
 1,aaa bbbccc

 It throws an error, saying that:
 Error 400 java.io.IOException: (line 0) invalid char between encapsulated 
 token end delimiter

 If I change the contents of the file to:

 id,name
 1,aaa bbb ccc

 it works without a problem. Has anyone encountered this? Is it known
 behavior?

 --
 Regards,
  Rafał Kuć















Re: Problem with CSV update handler

2011-06-20 Thread Yonik Seeley
This works fine for me:

curl http://localhost:8983/solr/update/csv -H
'Content-type:text/plain' -d 'id,name
1,aaa bbb ccc'

-Yonik
http://www.lucidimagination.com


On Mon, Jun 20, 2011 at 3:17 PM, Rafał Kuć r@solr.pl wrote:
 Hello!

  I have a question about the CSV update handler. Let's say I have the
 following file sent to the CSV update handler using curl:

 id,name
 1,aaa bbbccc

 It throws an error, saying that:
 Error 400 java.io.IOException: (line 0) invalid char between encapsulated 
 token end delimiter

 If I change the contents of the file to:

 id,name
 1,aaa bbb ccc

 it works without a problem. Has anyone encountered this? Is it known
 behavior?

 --
 Regards,
  Rafał Kuć





Re: Problem with CSV update handler

2011-06-20 Thread Yonik Seeley
Multi-valued CSV fields are double encoded.

We start with: aaa bbbccc'
Then decoding one level, we get:  aaa bbbccc
Decoding again to get individual values results in a decode error
because the encapsulator appears unescaped in the middle of the second
value (i.e. invalid CSV).

An easier way to fix this is to use a different encapsulator for the
sub-values of a multi-valued field by adding f.title.encapsulator=%27
(a single quote char)

But I can't really tell you exactly how to encode or specify options
to the CSV loader when I don't know what the actual values you want
after aaa bbbccc' is decoded.

-Yonik
http://www.lucidimagination.com



On Mon, Jun 20, 2011 at 5:46 PM, Rafał Kuć r@solr.pl wrote:
 Hi!

  Yonik, thanks for the reply. I just realized that the example I gave
 was not complete - the error is returned by Solr only when the field is
 multivalued and the values in the fields are split. For example, the
 following curl command gives me the mentioned error:

 curl
 'http://localhost:8983/solr/update/csv?fieldnames=id,title&commit=true&encapsulator=%22&f.title.split=true&f.title.separator=%20' -H
 'Content-type:text/plain' -d '1,aaa bbbccc'

 while the following is executed without any problem:
 curl
 'http://localhost:8983/solr/update/csv?fieldnames=id,titlecommit=trueen
 capsulator=%22f.title.split=truef.title.separator=%20' -H
 'Content-type:text/plain' -d '1,aaa bbb ccc'

 The only difference between those two is the additional space
 character in between bbb and ccc in the second example.

 Am I doing something wrong ? ;)

 --
 Regards,
  Rafał Kuć
  http://solr.pl

 This works fine for me:

 curl http://localhost:8983/solr/update/csv -H
 'Content-type:text/plain' -d 'id,name
 1,aaa bbb ccc'

 -Yonik
 http://www.lucidimagination.com


 On Mon, Jun 20, 2011 at 3:17 PM, Rafał Kuć r@solr.pl wrote:
 Hello!

  I have a question about the CSV update handler. Let's say I have the
 following file sent to the CSV update handler using curl:

 id,name
 1,aaa bbbccc

 It throws an error, saying that:
 Error 400 java.io.IOException: (line 0) invalid char between encapsulated 
 token end delimiter

 If I change the contents of the file to:

 id,name
 1,aaa bbb ccc

 it works without a problem. Has anyone encountered this? Is it known
 behavior?

 --
 Regards,
  Rafał Kuć










Re: Update JSON Invalid

2011-06-20 Thread Yonik Seeley
On Mon, Jun 20, 2011 at 11:25 PM, Shawn Heisey elyog...@elyograg.org wrote:
 On 6/20/2011 8:08 PM, entdeveloper wrote:

 Technically, yes, it's valid json, but most libraries treat the json
 objects
 as maps, and with multiple add elements as the keys, you cannot properly
 deserialize.

 As an example, try putting this into jsonlint.com, and notice it trims off
 one of the docs:
 {
  "add": {"doc": {"id" : "TestDoc1", "title" : "test1"} },
  "add": {"doc": {"id" : "TestDoc2", "title" : "another test"} }
 }

 Is there something I'm just not seeing? Should we consider cleaning up
 this
 format, possibly using some json arrays so that it makes more sense from a
 json perspective?

 This was brought up recently and should now be fixed in Solr 3.2.

 https://issues.apache.org/jira/browse/SOLR-2496

Thanks for the reminder, we obviously need to update the docs!

-Yonik
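
For anyone hitting this with a strict parser in the meantime: as of SOLR-2496
(Solr 3.2) the JSON update handler should also accept a top-level array of
documents, which avoids repeated keys entirely, e.g.:

[
  {"id" : "TestDoc1", "title" : "test1"},
  {"id" : "TestDoc2", "title" : "another test"}
]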


Re: SOlR -- Out of Memory exception

2011-06-17 Thread Yonik Seeley
On Fri, Jun 17, 2011 at 1:30 AM, pravesh suyalprav...@yahoo.com wrote:
 If you are sending whole CSV in a single HTTP request using curl, why not
 consider sending it in smaller chunks?

Smaller chunks should not matter - Solr streams from the input (i.e.
the whole thing is not buffered in memory).

It could be related to autoCommit.  Commits may be stacking up faster
than can be handled.   I'd recommend getting rid of autocommit if
possible, or at a minimum get rid of the maxDocs based autocommit.
Incremental updates can use commitWithin to guarantee a
time-of-visibility, and bulk updates like this CSV upload normally
shouldn't commit until the end.

-Yonik
http://www.lucidimagination.com
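
A sketch of that pattern with curl (URL and file name are illustrative):
stream the bulk CSV with no commit parameters, then issue a single explicit
commit at the end.

curl 'http://localhost:8983/solr/update/csv' -H 'Content-type:text/plain' --data-binary @bulk.csv
curl 'http://localhost:8983/solr/update?commit=true'

For the incremental case, adding commitWithin=60000 (milliseconds) to the
update request bounds visibility without autoCommit, assuming your version
supports the commitWithin request parameter.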


Re: problem with the new IndexSearcher when snpainstaller (and commit script) happen

2011-06-15 Thread Yonik Seeley
What version of Solr is this?
Can you show steps to reproduce w/ the example server and data?

-Yonik
http://www.lucidimagination.com


On Wed, Jun 15, 2011 at 7:25 AM, Marc Sturlese marc.sturl...@gmail.com wrote:
 Hey there,
 I've noticed a very odd behaviour with the snapinstaller and commit (using
 collectionDistribution scripts). The first time I install a new index
 everything works fine. But when installing a new one, I can't see the new
 documents. Checking the status page of the core tells me that the index
 version has changed but numDocs and maxDocs are the same. I have a simple
 script that gets the version from an index reader and this confirms that
 that's not true. numDocs and maxDocs are different in both indexes.
 The index I'm trying to install is a whole new index, generated with
 mergefactor = 2 and optimized with no compound file.

 I've tried manually to mv index to index.old and the snapshot.x to index
 (while tomcat is up) and manually execute:
  curl http://localhost:8080/trovit_solr/coreA/update?commit=true -H
 "Content-Type: text/xml"
 But the same is happening.
 Checking the logs I can see that apparently everything is fine. New searcher
 is registered and warming is properly done to it.

 I would think that the problem is with some reference opening the index
 searcher. But the fact that the indexVersion changes while numDocs and
 maxDocs don't leaves me completely puzzled.

 If I reload the core, numDocs and maxDocs changes and everything is fine.

 Any idea what could be happening here?
 Thanks in advance.

 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/problem-with-the-new-IndexSearcher-when-snpainstaller-and-commit-script-happen-tp3066902p3066902.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: High 100% CPU usage with SOLR 1.4.1

2011-06-15 Thread Yonik Seeley
On Wed, Jun 15, 2011 at 2:21 PM, pravesh suyalprav...@yahoo.com wrote:
 I would need some help in minimizing the CPU load on the new system. Could
 NIOFSDirectory possibly contribute to the high CPU?

Yes, it's a feature!  The CPU is only higher because the threads
aren't blocked on IO as much.
So the increase in CPU you are seeing is a good thing, not a bad thing
(i.e. the number of requests
processed in a given number of CPU busy CPU cycles should be greater
or equal to the old release).

-Yonik
http://www.lucidimagination.com


Re: Huge performance drop in distributed search w/ shards on the same server/container

2011-06-13 Thread Yonik Seeley
On Sun, Jun 12, 2011 at 9:10 PM, Johannes Goll johannes.g...@gmail.com wrote:
 However, Jetty 6.1.2X (shipped with Solr 3.1)
 sporadically throws Socket connect exceptions when executing distributed
 searches.

Are you using the exact jetty.xml that shipped with the solr example server,
or did you make any modifications?

-Yonik
http://www.lucidimagination.com


Re: how can I return function results in my query?

2011-06-10 Thread Yonik Seeley
On Thu, Jun 9, 2011 at 9:23 AM, Jason Toy jason...@gmail.com wrote:
 I want to be able to run a query  like idf(text, 'term') and have that data
 returned with my search results.  I've searched the docs, but I'm unable to
 find how to do it.  Is this possible and how can I do that ?

In trunk, there's a very new feature called pseudo-fields where (among
other things) you can include the results of arbitrary function
queries along with the stored fields for each document.

fl=id,idf(text,'term'),termfreq(text,'term')

Or if you want to alias the idf call to a different name:

fl=id,myidf:idf(text,'term'),mytermfreq:termfreq(text,'term')

Of course, in this specific case it's a bit of a waste since idf won't
change per document.

-Yonik
http://www.lucidimagination.com


Re: how can I return function results in my query?

2011-06-10 Thread Yonik Seeley
On Fri, Jun 10, 2011 at 8:31 AM, Markus Jelsma
markus.jel...@openindex.io wrote:
 Nice! Will SOLR-1298 with aliasing also work with an external file field since
 that can be a source of a function query as well?

Haven't tried it, but it definitely should!

-Yonik
http://www.lucidimagination.com


Re: Edismax sorting help

2011-06-09 Thread Yonik Seeley
2011/6/9 Denis Kuzmenok forward...@ukr.net:
 Hi, everyone.

 I have fields:
 text fields: name, title, text
 boolean field: isflag (true / false)
 int field: popularity (0 to 9)

 Now i do query:
 defType=edismax
 start=0
 rows=20
 fl=id,name
 q=lg optimus
 fq=
 qf=name^3 title text^0.3
 sort=score desc
 pf=name
 bf=isflag sqrt(popularity)
 mm=100%
 debugQuery=on


 If I do a query like Samsung, I want to see the most relevant results
 with isflag:true and bigger popularity first, but if I do a query like Nokia
 6500 and it has isflag:false, then it should rank higher because of the
 exact match.  Tried different combinations, but didn't find one that
 suits me.  Just got isflag/popularity sorting working, or
 isflag/relevancy sorting.

Multiplicative boosts tend to be more stable...

Perhaps try replacing
  bf=isflag sqrt(popularity)
with
  bq=isflag:true^10  // vary the boost to change how much
isflag counts vs the relevancy score of the main query
  boost=sqrt(popularity)  // this will multiply the result by
sqrt(popularity)... assumes that every document has a non-zero
popularity

You could get more creative in trunk where booleans have better
support in function queries.

-Yonik
http://www.lucidimagination.com
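
Putting that together with the parameters from the original message, the
request might look like this (a sketch; the boost factors would need tuning):

defType=edismax&q=nokia 6500&qf=name^3 title text^0.3&pf=name&mm=100%&bq=isflag:true^10&boost=sqrt(popularity)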


Re: Processing/Indexing CSV

2011-06-09 Thread Yonik Seeley
On Thu, Jun 9, 2011 at 3:31 PM, Helmut Hoffer von Ankershoffen
helmut...@googlemail.com wrote:
 Hi,

 there seems to be no way to index CSV using the DataImportHandler.

Looking over the features you want, it looks like you're starting from
a CSV file (as opposed to CSV stored in a database).
Is there a reason that you need to use DIH and can't directly use the
CSV loader?
http://wiki.apache.org/solr/UpdateCSV


-Yonik
http://www.lucidimagination.com



 Using a combination of LineEntityProcessor
 (http://wiki.apache.org/solr/DataImportHandler#LineEntityProcessor)
 and RegexTransformer
 (http://wiki.apache.org/solr/DataImportHandler#RegexTransformer), as
 proposed in
 http://robotlibrarian.billdueber.com/an-exercise-in-solr-and-dataimporthandler-hathitrust-data/,
 is not working for real world CSV files.

 E.g. many CSV files have double-quotes enclosing some but not all columns -
 there is no elegant way to segment this using a simple regular expression.

 As CSV is still very common esp. in E-Commerce scenarios, I propose that
 Solr provides a CSVEntityProcessor that:
 1) Handles CSV files where all, some, or none of the columns are double-quote
 enclosed
 2) Allows for a configurable column separator (';',',','\t' etc.)
 3) Allows for a leading row containing column headings
 4) If there is a leading row with column headings provides a possibility to
 address columns by their column names and map them to Solr fields (similar
 to the XPathEntityProcessor)
 5) Auto-detects encoding of the file (UTF-8 etc.)

 This would make it A LOT easier to use Solr for E-Commerce scenarios.

 If there is no such entity processor in the works i will develop one ... So
 please let me know.

 Regards



Re: Processing/Indexing CSV

2011-06-09 Thread Yonik Seeley
On Thu, Jun 9, 2011 at 4:07 PM, Helmut Hoffer von Ankershoffen
helmut...@googlemail.com wrote:
 Hi,
 yes, it's about CSV files loaded via HTTP from shops to be fed into a
 shopping search engine.
 The CSV Loader cannot map fields (only field values) etc.

You can provide your own list of fieldnames and optionally ignore the
first line of the CSV file (assuming it contains the field names).
http://wiki.apache.org/solr/UpdateCSV#fieldnames

-Yonik
http://www.lucidimagination.com


Re: Problem with boosting function

2011-06-08 Thread Yonik Seeley
The boost qparser should do the trick if you want a multiplicative boost.
http://lucene.apache.org/solr/api/org/apache/solr/search/BoostQParserPlugin.html

-Yonik
http://www.lucidimagination.com



On Wed, Jun 8, 2011 at 9:22 AM, Alex Grilo a...@umamao.com wrote:
 Hi,
 I'm trying to use the bf parameter in Solr queries but I'm having some problems.

 The context is: I have some topics and an integer weight of popularity
 (number of users that follow the topic). I'd like to boost the documents
 according to this weight field, and it changes (users may start following or
 unfollowing that topic). I thought the best way to do that is adding a bf
 parameter to the query.

 First of all I was trying to include it in a query processed by a default
 SearchHandler. I debugged the results and the scores didn't change. So I
 tried to change the defType of the SearchHandler to dismax (I didn't add any
 other field in solrconfig), and queries didn't work anymore.

 What is the best way to achieve what I want? Do I really need to use a
 dismax SearchHandler (I read about it, and I don't want to search in multiple
 fields - I want to search in one field and boost in another one)?

 Thanks in advance

 Alex Grilo



Re: Sorting on solr.TextField

2011-06-08 Thread Yonik Seeley
On Wed, Jun 8, 2011 at 1:21 PM, Jamie Johnson jej2...@gmail.com wrote:
 Thanks, exactly what I was looking for.

 With this new field used just for sorting, is there a way to have it be
 case-insensitive?

From the example schema:

<!-- lowercases the entire field value, keeping it as a single token. -->
<fieldType name="lowercase" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
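
To actually sort on it, you could declare a separate field of this type and
copy into it, e.g. (field names here are made up):

<field name="title_sort" type="lowercase" indexed="true" stored="false"/>
<copyField source="title" dest="title_sort"/>

...and then sort with sort=title_sort asc.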


-Yonik
http://www.lucidimagination.com


Re: Solr Cloud Query Question

2011-06-07 Thread Yonik Seeley
On Tue, Jun 7, 2011 at 9:35 AM, Jamie Johnson jej2...@gmail.com wrote:
 I am currently experimenting with the Solr Cloud code on trunk and just had
 a quick question.  Let's say my setup had 3 nodes: a, b, and c.  Node a has
 1000 results which meet a particular query, b has 2000 and c has 3000.  When
 executing this query and asking for row 900 what specifically happens?  From
 reading the Distributed Search Wiki I would expect that node a responds with
 900, node b responds with 900 and c responds with 900 and the coordinating
 node is responsible for taking the top-scored items and throwing away the
 rest.  Is this correct, or is there some additional coordination that happens
 where nodes a, b and c return back an id and a score and the coordinating
 node makes an additional request to get back the documents for the ids which
 make up the top list?

The latter is correct - the first phase only collects enough
information to merge ids from the shards, and then a second phase
requests the stored fields, highlighting, etc for the specific docs
that will be returned.

-Yonik
http://www.lucidimagination.com


Re: function queries scope

2011-06-07 Thread Yonik Seeley
One way is to use the boost qparser:
http://search-lucene.com/jd/solr/org/apache/solr/search/BoostQParserPlugin.html
q={!boost b=productValueField}shops in madrid

Or you can use the edismax parser, which has a boost parameter that
does the same thing:
defType=edismax&q=shops in madrid&boost=productValueField


-Yonik
http://www.lucidimagination.com


On Tue, Jun 7, 2011 at 6:53 AM, Marco Martinez
mmarti...@paradigmatecnologico.com wrote:
 Hi,

 I need to use function query operations with the score of a given
 query, but only on the docset that I get from the query, and I don't know if
 this is possible.

 Example:

 q=shops in madrid    returns  1 docs  with a specific score for each doc

 but now i need to do some stuff like

 q=sum(product(2,query("shops in madrid"),productValueField)) but this will
 return all the docs in my index.


 I know that I can do it via filter queries, e.g.
 q=sum(product(2,query("shops in madrid"),productValueField))&fq=shops in madrid,
 but this will run the query twice and I don't want that because performance
 is important to our application.


 Is there another approach to accomplish that?


 Thanks in advance,

 Marco Martínez Bautista
 http://www.paradigmatecnologico.com
 Avenida de Europa, 26. Ática 5. 3ª Planta
 28224 Pozuelo de Alarcón
 Tel.: 91 352 59 42



Re: Solr Cloud Query Question

2011-06-07 Thread Yonik Seeley
On Tue, Jun 7, 2011 at 1:01 PM, Jamie Johnson jej2...@gmail.com wrote:
 Thanks Yonik.  I have a follow-on now: how does Solr ensure consistent
 results across pages?  So for example, if we had my 3 theoretical solr
 instances again and a, b and c each returned 100 documents with the same
 score and the user only requested 100 documents, how are those 100 documents
 chosen from the set available from a, b and c if the documents have the same
 score?

Ties within a shard are broken by docid (just like lucene), and ties
across different shards are broken by comparing the shard ids... so
yes, it's consistent.

-Yonik
http://www.lucidimagination.com


Re: Question about tokenizing, searching and retrieving results.

2011-06-07 Thread Yonik Seeley
On Tue, Jun 7, 2011 at 12:34 PM, Luis Cappa Banda luisca...@gmail.com wrote:
 *Expression*: A B C D E F G H I

As written, this is equivalent to

*Expression*: A default_field:B default_field:C default_field:D
default_field:E default_field:F default_field:G default_field:H
default_field:I

Try *Expression*:( A B C D E F G H I)
or *Expression*:"A B C D E F G H I" for a phrase query.

Oh, and I highly recommend sticking to java identifiers for field
names - it will make your life much easier in the future.

-Yonik
http://www.lucidimagination.com


Re: Feature: skipping caches and info about cache use

2011-06-03 Thread Yonik Seeley
On Fri, Jun 3, 2011 at 1:02 PM, Otis Gospodnetic
otis_gospodne...@yahoo.com wrote:
 Is it just me, or would others like things like:
 * The ability to tell Solr (by passing some URL param?) to skip one or more of
 its caches and get data from the index

Yeah, we've needed this for a long time, and I believe there's a JIRA
issue open for it.
It really needs to be on a per query basis though... so a localParam
that has cache=true/false
would be ideal.
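
For example, something along the lines of (purely hypothetical syntax for
the proposed param):

fq={!cache=false}category:books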


-Yonik
http://www.lucidimagination.com


Re: fq null pointer exception

2011-06-03 Thread Yonik Seeley
Dan, this doesn't really have anything to do with your filter on the
Status field except that it causes different documents to be selected.
The root cause is a schema mismatch with your index.
A string field (or so the schema is saying it's a string field) is
returning null for a value, which is impossible (null values aren't
stored... they are simply missing).
This can happen when the field is actually stored as binary (as is the
case for numeric fields).  So my guess is that a field that was
previously a numeric field is now declared to be of type string by the
current schema.

You can try varying the fl parameter to see what field is causing
the issue, or try luke or the luke request handler for a lower-level
view of the index.
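
For example (illustrative): start with select/?q=*:*&fq=Status:Disabled&fl=id
and then add stored fields back one at a time until the NPE reappears.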

-Yonik
http://www.lucidimagination.com



On Fri, Jun 3, 2011 at 11:46 AM, dan whelan d...@adicio.com wrote:
 I am noticing something strange with our recent upgrade to solr 3.1 and want
 to see if anyone has experienced anything similar.

 I have a solr.StrField field named Status; the values are Enabled, Disabled,
 or ''

 When I facet on that field I get

 Enabled 4409565
 Disabled 29185
  112


 The issue is when I do a filter query

 This query works

 select/?q=*:*&fq=Status:Enabled

 But when I run this query I get a NPE

 select/?q=*:*&fq=Status:Disabled


 Here is part of the stack trace


 Problem accessing /solr/global_accounts/select/. Reason:
    null

 java.lang.NullPointerException
    at org.apache.solr.response.XMLWriter.writePrim(XMLWriter.java:828)
    at org.apache.solr.response.XMLWriter.writeStr(XMLWriter.java:686)
    at org.apache.solr.schema.StrField.write(StrField.java:49)
    at org.apache.solr.schema.SchemaField.write(SchemaField.java:125)
    at org.apache.solr.response.XMLWriter.writeDoc(XMLWriter.java:369)
    at org.apache.solr.response.XMLWriter$3.writeDocs(XMLWriter.java:545)
    at org.apache.solr.response.XMLWriter.writeDocuments(XMLWriter.java:482)
    at org.apache.solr.response.XMLWriter.writeDocList(XMLWriter.java:519)
    at org.apache.solr.response.XMLWriter.writeVal(XMLWriter.java:582)
    at org.apache.solr.response.XMLWriter.writeResponse(XMLWriter.java:131)
    at
 org.apache.solr.response.XMLResponseWriter.write(XMLResponseWriter.java:35)
 ...


 Thanks,

 Dan




Re: Faceting on distance in Solr: how do you generate links that search within a given range of distance?

2011-05-20 Thread Yonik Seeley
On Thu, May 19, 2011 at 6:40 PM, Chris Hostetter
hossman_luc...@fucit.org wrote:

 : It is fairly simple to generate facets for ranges or 'buckets' of
 : distance in Solr:
 : http://wiki.apache.org/solr/SpatialSearch#How_to_facet_by_distance.
 : What isn't described is how to generate the links for these facets

 any query you specify in a facet.query to generate a constraint count can
  be specified in an fq to actually apply that constraint.

 So if you use...
   facet.query={!frange l=5.001 u=3000}geodist()

Hmmm, seems like we could really do with a geofilt() function that's
just like geodist() but with the first parameter being distance so we
could avoid calculating the distance for every doc.  And of course the
new exists() method (it's just internal now, but should be exposed via
a function) would be false if the doc was outside of the distance.

The best we could do out of the box right now is try to utilize the
geofilt query and default it to a high number where it doesn't match:

facet.query={!frange l=5.001 u=3000}query($gf,1)&gf={!geofilt d=3000}

Of course if the lower bound is 0, you can use it directly!

facet.query={!geofilt d=3000}
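
A full request might look like (the sfield/pt values here are just
assumptions for illustration):

q=*:*&sfield=store&pt=45.15,-93.85&facet=true&facet.query={!geofilt d=3000}

...and the link for that facet just applies the same query as a filter:
fq={!geofilt d=3000}.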

-Yonik


Re: Spatial search with SolrJ 3.1 ? How to

2011-05-19 Thread Yonik Seeley
On Thu, May 19, 2011 at 8:52 AM, martin_groenhof
martin.groen...@yahoo.com wrote:
 How do you construct a query in Java for spatial search? Not via the default
 Solr REST interface

It depends on what you are trying to do - a spatial request (as
currently implemented in Solr) is typically more than just a query...
it can be filtering by a bounding box, filtering by a distance radius,
 or using a distance (geodist) function query in another way such as
sorting by it or using it as a factor in relevance.
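
As a rough SolrJ sketch of a distance-radius filter sorted by distance (the
field name, point, and distance below are made up):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class SpatialExample {
  public static void main(String[] args) throws Exception {
    CommonsHttpSolrServer server =
        new CommonsHttpSolrServer("http://localhost:8983/solr");
    SolrQuery query = new SolrQuery("*:*");
    query.addFilterQuery("{!geofilt}");  // filter by distance radius
    query.set("sfield", "store");        // the indexed location field
    query.set("pt", "45.15,-93.85");     // center point (lat,lon)
    query.set("d", "5");                 // distance in km
    query.addSortField("geodist()", SolrQuery.ORDER.asc);  // nearest first
    QueryResponse rsp = server.query(query);
    System.out.println("found: " + rsp.getResults().getNumFound());
  }
}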


-Yonik
http://www.lucenerevolution.org -- Lucene/Solr User Conference, May
25-26, San Francisco


Re: Facetting: Some questions concerning method:fc

2011-05-19 Thread Yonik Seeley
On Thu, May 19, 2011 at 9:56 AM, Erik Fäßler erik.faess...@uni-jena.de wrote:
 I have a few questions concerning the field cache method for faceting.
 The wiki says for enum method: This was the default (and only) method for
 faceting multi-valued fields prior to Solr 1.4. . And for fc method: This
 was the default method for single valued fields prior to Solr 1.4. .
 I just ran into a problem using fc for a field which can have multiple
 terms per document. The facet counts would be wrong, seemingly only
 counting the first term in the field of each document. I observed this in
 Solr 1.4.1 and in 3.1 with the same index.

That doesn't sound right... the results should always be identical
between facet.method=fc and facet.method=enum. Are you sure you didn't
index a multi-valued field and then change the fieldType in the schema
to be single valued? Are you sure the field is indexed the way you
think it is?  If so, is there an easy way for someone to reproduce
what you are seeing?

-Yonik
http://www.lucenerevolution.org -- Lucene/Solr User Conference, May
25-26, San Francisco


Re: Does every Solr request-response require a running server?

2011-05-18 Thread Yonik Seeley
On Wed, May 18, 2011 at 10:50 AM, Gabriele Kahlout
gabri...@mysimpatico.com wrote:
 Hello,

 I'm wondering if the Solr test framework at the end of the day always runs an
 embedded/jetty server (which is the only way to interact with Solr, i.e. no
 web server -- no Solr), or whether the tests interact without one, calling
 the underlying methods directly?

 The latter seems to be the case from trying to understand SolrTestCaseJ4. That
 would be more white-box than otherwise.

Solr does either, depending on the test.  Most tests start only an
embedded solr server w/ no web server, but others use an embedded
jetty server so one can talk HTTP to it.  JettySolrRunner is used for
the latter.
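
For instance, a minimal embedded-style test might look like (the
config/schema file names depend on your test resources):

import org.apache.solr.SolrTestCaseJ4;
import org.junit.BeforeClass;
import org.junit.Test;

public class MyFeatureTest extends SolrTestCaseJ4 {
  @BeforeClass
  public static void beforeClass() throws Exception {
    initCore("solrconfig.xml", "schema.xml");  // embedded core, no HTTP
  }

  @Test
  public void testBasicQuery() {
    assertU(adoc("id", "1"));  // index a document
    assertU(commit());
    assertQ(req("q", "id:1"), "//result[@numFound='1']");
  }
}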

-Yonik
http://www.lucenerevolution.org -- Lucene/Solr User Conference, May
25-26, San Francisco


Re: Does every Solr request-response require a running server?

2011-05-18 Thread Yonik Seeley
On Wed, May 18, 2011 at 11:14 AM, Gabriele Kahlout
gabri...@mysimpatico.com wrote:


 On Wed, May 18, 2011 at 5:09 PM, Yonik Seeley yo...@lucidimagination.com
 wrote:

 On Wed, May 18, 2011 at 10:50 AM, Gabriele Kahlout
 gabri...@mysimpatico.com wrote:
  Hello,
 
  I'm wondering if the Solr test framework at the end of the day always runs
  an embedded/jetty server (which is the only way to interact with Solr, i.e.
  no web server -- no Solr), or whether the tests interact without one,
  calling the underlying methods directly?

  The latter seems to be the case from trying to understand SolrTestCaseJ4.
  That would be more white-box than otherwise.

 Solr does either, depending on the test.

  Most tests start only an
 embedded solr server w/ no web server,

 What is confusing me is the Solr server. Is it SolrCore? In what aspects is
 it a 'server'? In my understanding it's the core of the Solr Web application
 which makes up the servlets interface, i.e. it's under the servlets not on
 top of them.

Look at TestHarness - it instantiates a CoreContainer.
When running as a webapp in a Jetty server, a DispatchFilter is
registered that instantiates the CoreContainer.

-Yonik
http://www.lucenerevolution.org -- Lucene/Solr User Conference, May
25-26, San Francisco





 but others use an embedded
 jetty server so one can talk HTTP to it.  JettySolrRunner is used for
 the latter.

 -Yonik
 http://www.lucenerevolution.org -- Lucene/Solr User Conference, May
 25-26, San Francisco



 --
 Regards,
 K. Gabriele

 --- unchanged since 20/9/10 ---
 P.S. If the subject contains [LON] or the addressee acknowledges the
 receipt within 48 hours then I don't resend the email.
 subject(this) ∈ L(LON*) ∨ ∃x. (x ∈ MyInbox ∧ Acknowledges(x, this) ∧ time(x)
  Now + 48h) ⇒ ¬resend(I, this).

 If an email is sent by a sender that is not a trusted contact or the email
 does not contain a valid code then the email is not received. A valid code
 starts with a hyphen and ends with X.
 ∀x. x ∈ MyInbox ⇒ from(x) ∈ MySafeSenderList ∨ (∃y. y ∈ subject(x) ∧ y ∈
 L(-[a-z]+[0-9]X)).




Re: JSON delete error with latest branch_3x

2011-05-18 Thread Yonik Seeley
On Wed, May 18, 2011 at 1:24 PM, Paul Dlug paul.d...@gmail.com wrote:
 I updated to the latest branch_3x (r1124339) and I'm now getting the
 error below when trying a delete by query or id. Adding documents with
 the new format works as do the commit and optimize commands. Possible
 regression due to SOLR-2496?

 curl 'http://localhost:8988/solr/update/json?wt=json' -H
 'Content-type:application/json' -d '{"delete":{"query":"*:*"}}'

 Error 400 meaningless command:
 delete:query=`*:*`,fromPending=false,fromCommitted=false

 Problem accessing /solr/update/json. Reason:
  meaningless command: delete:query=`*:*`,fromPending=false,fromCommitted=false

Hmmm, looks like unit tests must be inadequate for the JSON format.
I'll look into it.

-Yonik
http://www.lucenerevolution.org -- Lucene/Solr User Conference, May
25-26, San Francisco


Re: JSON delete error with latest branch_3x

2011-05-18 Thread Yonik Seeley
OK, I just fixed this on branch_3x.
Trunk is fine (it was an error in the 3x backport that wasn't caught
because the test doesn't go through the complete solr stack to the
update handler).

-Yonik
http://www.lucenerevolution.org -- Lucene/Solr User Conference, May
25-26, San Francisco


On Wed, May 18, 2011 at 1:29 PM, Yonik Seeley
yo...@lucidimagination.com wrote:
 On Wed, May 18, 2011 at 1:24 PM, Paul Dlug paul.d...@gmail.com wrote:
 I updated to the latest branch_3x (r1124339) and I'm now getting the
 error below when trying a delete by query or id. Adding documents with
 the new format works as do the commit and optimize commands. Possible
 regression due to SOLR-2496?

 curl 'http://localhost:8988/solr/update/json?wt=json' -H
 'Content-type:application/json' -d '{"delete":{"query":"*:*"}}'

 Error 400 meaningless command:
 delete:query=`*:*`,fromPending=false,fromCommitted=false

 Problem accessing /solr/update/json. Reason:
  meaningless command: 
 delete:query=`*:*`,fromPending=false,fromCommitted=false

 Hmmm, looks like unit tests must be inadequate for the JSON format.
 I'll look into it.

 -Yonik
 http://www.lucenerevolution.org -- Lucene/Solr User Conference, May
 25-26, San Francisco



Re: filter cache and negative filter query

2011-05-17 Thread Yonik Seeley
On Tue, May 17, 2011 at 6:07 PM, Burton-West, Tom tburt...@umich.edu wrote:
  If I have a query with a filter query such as q=art&fq=history and then 
 run a second query q=art&fq=-history, will Solr realize that it can use 
 the cached results of the previous filter query "history" (in the filter 
 cache)

Yep.

You should be able to verify with the filterCache section of the stats
admin page.

-Yonik
http://www.lucenerevolution.org -- Lucene/Solr User Conference, May
25-26, San Francisco


Re: filter cache and negative filter query

2011-05-17 Thread Yonik Seeley
On Tue, May 17, 2011 at 6:17 PM, Markus Jelsma
markus.jel...@openindex.io wrote:
 I'm not sure. The filter cache uses your filter as a key and a negation is a
 different key. You can check this easily in a controlled environment by
 issuing these queries and watching the filter cache statistics.

Gotta hate crossing emails ;-)
Anyway, this goes back to Solr 1.1

 5. SOLR-80: Negative queries are now allowed everywhere.  Negative queries
are generated and cached as their positive counterpart, speeding
generation and generally resulting in smaller sets to cache.
Set intersections in SolrIndexSearcher are more efficient,
starting with the smallest positive set, subtracting all negative
sets, then intersecting with all other positive sets.  (yonik)

-Yonik
http://www.lucenerevolution.org -- Lucene/Solr User Conference, May
25-26, San Francisco



  If I have a query with a filter query such as q=art&fq=history and
  then run a second query q=art&fq=-history, will Solr realize that it
  can use the cached results of the previous filter query "history" (in the
  filter cache) or will it not realize this and have to actually do a second
  filter query against the index for "not history"?

 Tom



Re: lucene parser, negative OR operands

2011-05-17 Thread Yonik Seeley
On Tue, May 17, 2011 at 6:57 PM, Jonathan Rochkind rochk...@jhu.edu wrote:
 (changed subject for this topic). Weird. I'm seeing it wrong myself, and
 have for a while -- I even wrote some custom pre-processor logic at my app
 level to work around it.  Weird, I dunno.

 Wait. Queries with -one OR -two return fewer documents than either operand
 does on its own.

This doesn't have to do with Solr's support of pure-negative top-level
queries, but does have to do with
a long-standing confusion about how the Lucene query parser handles
some of the operators (i.e. it's not really boolean logic).

In a Lucene BooleanQuery, clauses are mandatory, optional, or prohibited.
-foo OR -bar actually parses to a boolean query with two prohibited
clauses... essentially the
same as -foo AND -bar.  You can see this by adding debugQuery=true to
the request.
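
For example (assuming the default search field is text), q=-one OR -two with
debugQuery=true shows a parsedquery of -text:one -text:two -- two prohibited
clauses, not a boolean OR.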

-Yonik
http://www.lucenerevolution.org -- Lucene/Solr User Conference, May
25-26, San Francisco


Re: why query chinese character with bracket become phrase query by default?

2011-05-16 Thread Yonik Seeley
On Sun, May 15, 2011 at 1:48 PM, Michael McCandless
luc...@mikemccandless.com wrote:
 Could you please revert your commit, until we've reached some
 consensus on this discussion first?

Huh?
I thought everyone was in agreement that we needed more field types
for different languages?
I added my best guess about what a generic type for
non-whitespace-delimited languages might look like.
Since it's a new field type, it doesn't affect anything.  Hopefully it
only improves the situation
for someone trying to use one of these languages.

The only negative would seem to be if it's worse than nothing (i.e. a
very bad example
because it actually doesn't work for non-whitespace-delimited languages).

The issue about changing defaults on TextField and changing what "text" does in
the example schema by default is not dependent on this.  They are only related
by the fact that if another field is added/changed then _nwd may
become redundant
and can be removed.  For now, it only seems like an improvement?

Anyway... the whole language of "revert" seems unnecessarily confrontational.
Feel free to improve what's there (or delete *_nwd if people really
feel it adds no/negative value)

-Yonik


Re: why query chinese character with bracket become phrase query by default?

2011-05-16 Thread Yonik Seeley
On Mon, May 16, 2011 at 5:30 AM, Michael McCandless
luc...@mikemccandless.com wrote:
 To be clear, I'm asking that Yonik revert his commit from yesterday
 (rev 1103444), where he added text_nwd fieldType and dynamic fields
 *_nwd to the example schema.xml.

So... your position is that until the "text" fieldType is changed to
support non-whitespace-delimited languages better,
no other fieldType should be changed/added to better support
non-whitespace-delimited languages?
Man, that seems political, not technical.

Whatever... I'll revert.

-Yonik

