Re: commas in synonyms.txt are not escaping
On Fri, Aug 26, 2011 at 10:17 AM, Moore, Gary gary.mo...@ars.usda.gov wrote: I have a number of chemical names containing commas which I'm mapping in index_synonyms.txt thusly: 2\,4-D-butotyl=Aqua-Kleen,BRN 1996617,Bladex-B,Brush killer 64,Butoxy-D 3,CCRIS 8562 According to the sample synonyms.txt, the comma above should be escaped, i.e. a\,a=b\,b. The problem is that according to analysis.jsp the commas are not being escaped. If I paste in 2,4-D-butotyl, then no mappings. If I paste in 2\,4-D-butotyl, the mappings are done. I can confirm that this works in 1.4, but no longer works in 3x or trunk. Can you open an issue? -Yonik http://www.lucidimagination.com
Re: commas in synonyms.txt are not escaping
On Fri, Aug 26, 2011 at 11:16 AM, Yonik Seeley yo...@lucidimagination.com wrote: On Fri, Aug 26, 2011 at 10:17 AM, Moore, Gary gary.mo...@ars.usda.gov wrote: I have a number of chemical names containing commas which I'm mapping in index_synonyms.txt thusly: 2\,4-D-butotyl=Aqua-Kleen,BRN 1996617,Bladex-B,Brush killer 64,Butoxy-D 3,CCRIS 8562 According to the sample synonyms.txt, the comma above should be escaped, i.e. a\,a=b\,b. The problem is that according to analysis.jsp the commas are not being escaped. If I paste in 2,4-D-butotyl, then no mappings. If I paste in 2\,4-D-butotyl, the mappings are done. I can confirm that this works in 1.4, but no longer works in 3x or trunk. Can you open an issue? Actually, I think I've tracked it to LUCENE-3233, where the parsing rules were moved from Solr to Lucene (and changed the functionality in the process). I'll reopen that since I don't think it's been in a released version yet. -Yonik http://www.lucidimagination.com
Re: Query vs Filter Query Usage
On Thu, Aug 25, 2011 at 5:19 PM, Michael Ryan mr...@moreover.com wrote: 10,000,000 document index Internal Document id is 32 bit unsigned int Max Memory Used by a single cache slot in the filter cache = 32 bits x 10,000,000 docs = 320,000,000 bits or 38 MB I think it depends on where exactly the result set was generated. I believe the result set will usually be represented by a BitDocSet, which requires 1 bit per doc in your index (result set size doesn't matter), so in your case it would be about 1.2MB. Right - and Solr switches between implementations depending on set size... so if the number of documents in the set were 100, then it would only take up 400 bytes. -Yonik http://www.lucidimagination.com
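To make the memory math above concrete, here is a rough cost model as a Java sketch (illustrative only, not Solr's actual classes or cutoff heuristic):

    // Illustrative cost of the two filter-cache set representations.
    public class FilterCacheCost {
        // BitDocSet-style: one bit per document in the index, regardless of set size.
        static long bitDocSetBytes(long maxDoc) { return maxDoc / 8; }

        // Small-set style: roughly 4 bytes (one int doc id) per matching document.
        static long smallDocSetBytes(long numFound) { return numFound * 4; }

        public static void main(String[] args) {
            System.out.println(bitDocSetBytes(10000000)); // ~1.25 MB for a 10M-doc index
            System.out.println(smallDocSetBytes(100));    // 400 bytes for a 100-doc set
        }
    }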
Re: Query parameter changes from solr 1.4 to 3.3
On Tue, Aug 23, 2011 at 7:11 AM, Samarendra Pratap samarz...@gmail.com wrote: We are upgrading solr 1.4 (with collapsing patch solr-236) to solr 3.3. I was looking for the required changes in query parameters (or parameter names) if any. There should be very few (but check CHANGES.txt as Erick pointed out). We try to keep the main HTTP APIs very stable, even across major versions. One thing I know for sure is that collapse and its sub-options are now known by group, but didn't find anything else. Field collapsing/grouping was never in any 1.4 release. -Yonik http://www.lucidimagination.com
Re: Batch updates order guaranteed?
On Tue, Aug 23, 2011 at 2:17 PM, Glenn s...@t2.zazu.com wrote: Question about batch updates (performing a delete and add in the same request, as described at the bottom of http://wiki.apache.org/solr/UpdateXmlMessages): is the order guaranteed? If a delete is followed by an add, will the delete always be performed first? I would assume so but would like to get confirmation. Yes, if you're crafting the update message yourself in XML or JSON. SolrJ is a different matter I think. -Yonik http://www.lucidimagination.com
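For reference, a combined message of the kind that wiki page describes might look like the following (a sketch only: the element layout is assumed from the UpdateXmlMessages page, and the id values are made up). The commands are executed in the order they appear, so the delete runs before the add:

    <update>
      <delete><id>doc-42</id></delete>
      <add>
        <doc>
          <field name="id">doc-42</field>
          <field name="name">replacement document</field>
        </doc>
      </add>
    </update>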
Re: Batch updates order guaranteed?
On Tue, Aug 23, 2011 at 3:38 PM, Yonik Seeley yo...@lucidimagination.com wrote: On Tue, Aug 23, 2011 at 2:17 PM, Glenn s...@t2.zazu.com wrote: Question about batch updates (performing a delete and add in the same request, as described at the bottom of http://wiki.apache.org/solr/UpdateXmlMessages): is the order guaranteed? If a delete is followed by an add, will the delete always be performed first? I would assume so but would like to get confirmation. Yes, if you're crafting the update message yourself in XML or JSON. SolrJ is a different matter I think. Found the SolrJ issue: https://issues.apache.org/jira/browse/SOLR-1162 Looks like it sort of got dropped, but I think this is worth fixing. -Yonik http://www.lucidimagination.com
Re: Solr 3.3 crashes after ~18 hours?
On Fri, Aug 19, 2011 at 10:36 AM, alexander sulz a.s...@digiconcept.net wrote: using lsof I think I pinned down the problem: too many open files! I already doubled the limit from 512 to 1024 once, but it seems there are many SOCKETS involved, which are listed as can't identify protocol, instead of real files. Over time, the list grows and grows with these entries until... it crashes. So I've read several times that the fix for this problem is to set the limit to a ridiculously high number, but that seems a little bit of a crude fix. Why so many open sockets in the first place? What are you using as a client to talk to solr? You need to look at both the update side and the query side. Using persistent connections is the best all-around, but if not, be sure to close the connections in the client. -Yonik http://www.lucidimagination.com
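For SolrJ clients, the usual way to get persistent connections is to build one client instance at startup and share it, rather than constructing one per request. A minimal sketch (assuming SolrJ 3.x's CommonsHttpSolrServer, which pools and reuses HTTP connections internally; the URL is an example):

    import java.net.MalformedURLException;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

    public class SolrClientHolder {
        // One shared, thread-safe client: its pooled connections are reused
        // instead of opening (and possibly leaking) a new socket per request.
        private static final CommonsHttpSolrServer SERVER = create();

        private static CommonsHttpSolrServer create() {
            try {
                return new CommonsHttpSolrServer("http://localhost:8983/solr");
            } catch (MalformedURLException e) {
                throw new RuntimeException(e);
            }
        }

        public static CommonsHttpSolrServer get() {
            return SERVER;
        }
    }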
Re: solr keeps dying every few hours.
On Wed, Aug 17, 2011 at 5:56 PM, Jason Toy jason...@gmail.com wrote: I've only set the minimum memory and have not set maximum memory. I'm doing more investigation and I see that I have 100+ dynamic fields for my documents, not the 10 fields I quoted earlier. I also sort against those dynamic fields often, and I'm reading that this potentially uses a lot of memory. Could this be the cause of my problems, and if so what options do I have to deal with this? Yes, that's most likely the problem. Sorting on an integer field causes a FieldCache entry with an int[maxDoc] (i.e. 4 bytes per document in the index, regardless of whether it has a value for that field or not). Sorting on a string field is 4 bytes per doc in the index (the ords) plus the memory to store the actual unique string values. -Yonik http://www.lucidimagination.com On Wed, Aug 17, 2011 at 2:46 PM, Markus Jelsma markus.jel...@openindex.io wrote: Keep in mind that a commit warms up another searcher, potentially doubling RAM consumption in the background due to cache warming queries being executed (newSearcher event). Also, where is your Xmx switch? I don't know how your JVM will behave if you set Xms > Xmx. 65m docs is quite a lot, but it should run fine with a 3GB heap allocation. It's good practice to use a master for indexing without any caches and warm-up queries; once you exceed a certain number of documents, it will bite. I have a large ec2 instance (7.5 GB RAM); it dies every few hours with out-of-memory (heap) errors. I started upping the minimum memory required; currently I use -Xms3072M. I insert about 50k docs an hour and I currently have about 65 million docs with about 10 fields each. Is this already too much data for one box? How do I know when I've reached the limit of this server? I have no idea how to keep control of this issue. Am I just supposed to keep upping the min RAM used for solr? How do I know what the accurate amount of RAM I should be using is? Must I keep adding more memory as the index size grows? I'd rather the query be a little slower if I can use constant memory and have the search read from disk. -- - sent from my mobile 6176064373
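To put rough numbers on Yonik's FieldCache math for the setup described above (65 million docs, 100+ sorted dynamic fields), assuming integer sort fields:

    one sort field:   65,000,000 docs x 4 bytes  =  ~260 MB of FieldCache
    100 such fields:  100 x 260 MB               =  ~26 GB

which is far beyond a 3 GB heap on a 7.5 GB machine; sorting on even a dozen of those dynamic fields would exhaust it. String sort fields cost more still, since the unique values are held in memory too.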
Re: sorting issue with solr 3.3
On Fri, Aug 12, 2011 at 9:53 AM, Bernd Fehling bernd.fehl...@uni-bielefeld.de wrote: It turned out that there is a sorting issue with solr 3.3. As far as I could trace it down currently: 4 docs in the index and a search for *:* sorting on field dccreator_sort in descending order http://localhost:8983/solr/select?fsv=true&sort=dccreator_sort%20desc&indent=on&version=2.2&q=*%3A*&start=0&rows=10&fl=dccreator_sort result is:
<lst name="sort_values">
  <arr name="dccreator_sort">
    <str>convertitovistitutonazionaled</str>
    <str>莊國鴻chuangkuohung</str>
    <str>zyywwwxxx</str>
    <str>abdelhadiyasserabdelfattah</str>
  </arr>
</lst>
Hmmm, are the docs sorted incorrectly too, or is it the sort_values that are incorrect? All variants of string sorting should be well tested... see TestSort.testSort() fieldType:
<fieldType name="alphaOnlySortLim" class="solr.TextField" sortMissingLast="true" omitNorms="true">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.TrimFilterFactory"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="([\x20-\x2F\x3A-\x40\x5B-\x60\x7B-\x7E])" replacement="" replace="all"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="(.{1,30})(.{31,})" replacement="$1" replace="all"/>
  </analyzer>
</fieldType>
field:
<field name="dccreator_sort" type="alphaOnlySortLim" indexed="true" stored="true"/>
According to documentation the sorting is UTF8, but _why_ is the first string at position 1 and _not_ at position 3 as it should be? Following sorting through the code is somewhat difficult. Any hint where to look or where to start debugging? Sorting.getStringSortField() Can you reproduce this with a smaller test that we could use to debug/fix? -Yonik http://www.lucidimagination.com
Re: sorting issue with solr 3.3
On Fri, Aug 12, 2011 at 1:04 PM, Yonik Seeley yo...@lucidimagination.com wrote: On Fri, Aug 12, 2011 at 9:53 AM, Bernd Fehling bernd.fehl...@uni-bielefeld.de wrote: It turned out that there is a sorting issue with solr 3.3. As far as I could trace it down currently: 4 docs in the index and a search for *:* sorting on field dccreator_sort in descending order http://localhost:8983/solr/select?fsv=true&sort=dccreator_sort%20desc&indent=on&version=2.2&q=*%3A*&start=0&rows=10&fl=dccreator_sort result is:
<lst name="sort_values">
  <arr name="dccreator_sort">
    <str>convertitovistitutonazionaled</str>
    <str>莊國鴻chuangkuohung</str>
    <str>zyywwwxxx</str>
    <str>abdelhadiyasserabdelfattah</str>
  </arr>
</lst>
Hmmm, are the docs sorted incorrectly too, or is it the sort_values that are incorrect? All variants of string sorting should be well tested... see TestSort.testSort() OK, something is very wrong with that test - I purposely introduced an error into MissingLastOrdComparator and the test isn't failing. I'll dig. -Yonik http://www.lucidimagination.com
Re: sorting issue with solr 3.3
On Fri, Aug 12, 2011 at 2:08 PM, Yonik Seeley yo...@lucidimagination.com wrote: On Fri, Aug 12, 2011 at 1:04 PM, Yonik Seeley yo...@lucidimagination.com wrote: On Fri, Aug 12, 2011 at 9:53 AM, Bernd Fehling bernd.fehl...@uni-bielefeld.de wrote: It turned out that there is a sorting issue with solr 3.3. As far as I could trace it down currently: 4 docs in the index and a search for *:* sorting on field dccreator_sort in descending order http://localhost:8983/solr/select?fsv=true&sort=dccreator_sort%20desc&indent=on&version=2.2&q=*%3A*&start=0&rows=10&fl=dccreator_sort result is:
<lst name="sort_values">
  <arr name="dccreator_sort">
    <str>convertitovistitutonazionaled</str>
    <str>莊國鴻chuangkuohung</str>
    <str>zyywwwxxx</str>
    <str>abdelhadiyasserabdelfattah</str>
  </arr>
</lst>
Hmmm, are the docs sorted incorrectly too, or is it the sort_values that are incorrect? All variants of string sorting should be well tested... see TestSort.testSort() OK, something is very wrong with that test - I purposely introduced an error into MissingLastOrdComparator and the test isn't failing. I'll dig. Oops, scratch that. It was a bug I just introduced into the test in my local copy to try and reproduce your issue. -Yonik http://www.lucidimagination.com
Re: sorting issue with solr 3.3
I've checked in an improved TestSort that adds deleted docs and randomizes things a lot more (and fixes the previous reliance on doc ids not being reordered). I still can't reproduce this error though. Is this stock solr? Can you verify that the documents are in the wrong order also (and not just the field sort values)? -Yonik http://www.lucidimagination.com
Re: frange not working in query
On Wed, Aug 10, 2011 at 5:57 AM, Amit Sawhney sawhney.a...@gmail.com wrote: Hi All, I am trying to sort the results on a unix timestamp using this query. http://url.com:8983/solr/db/select/?indent=on&version=2.1&q={!frange%20l=0.25}query($qq)&qq=nokia&sort=unix-timestamp%20desc&start=0&rows=10&qt=dismax&wt=dismax&fl=*,score&hl=on&hl.snippets=1 When I run this query, it says 'no field name specified in query and no defaultSearchField defined in schema.xml'. The default query type for embedded queries is lucene, so your qq=nokia is equivalent to qq={!lucene}nokia So one way is to explicitly make it dismax: qq={!dismax}nokia Another way is to declare the sub-query to be of type dismax: q={!frange l=0.25}query({!dismax v=$qq})&qq=nokia -Yonik http://www.lucidimagination.com As soon as I remove the frange query and run this, it starts working fine. http://url.com:8983/solr/db/select/?indent=on&version=2.1&q=nokia&sort=unix-timestamp%20desc&start=0&rows=10&qt=dismax&wt=dismax&fl=*,score&hl=on&hl.snippets=1 Any pointers? Thanks, Amit
Re: Solr 3.3 crashes after ~18 hours?
On Wed, Aug 10, 2011 at 11:00 AM, alexander sulz a.s...@digiconcept.net wrote: Okay, with this command it hangs. It doesn't look like a hang from this thread dump. It doesn't look like any solr requests are executing at the time the dump was taken. Did you do this from the command line? curl "http://localhost:8983/solr/update?commit=true" Are you saying that the curl command just hung and never returned? -Yonik http://www.lucidimagination.com Also: I managed to get a Thread Dump (attached). regards On 05.08.2011 15:08, Yonik Seeley wrote: On Fri, Aug 5, 2011 at 7:33 AM, alexander sulz a.s...@digiconcept.net wrote: Usually you get an XML response when doing commits or optimize; in this case I get nothing in return, but the site ( http://[...]/solr/update?optimize=true ) DOESN'T load forever or anything. It doesn't hang! I just get a blank page / empty response. Sounds like you are doing it from a browser? Can you try it from the command line? It should give back some sort of response (or hang waiting for a response). curl "http://localhost:8983/solr/update?commit=true" -Yonik http://www.lucidimagination.com I use the stuff in the example folder; the only changes I made were enabling logging and changing the port to 8985. I'll try getting a thread dump if it happens again! So far it's looking good with having allocated more memory to it. On 04.08.2011 16:08, Yonik Seeley wrote: On Thu, Aug 4, 2011 at 8:09 AM, alexander sulz a.s...@digiconcept.net wrote: Thank you for the many replies! Like I said, I couldn't find anything in logs created by solr. I just had a look at /var/logs/messages and there wasn't anything either. What I mean by crash is that the process is still there and http GET pings would return 200, but when I try visiting /solr/admin, I'd get a blank page! The server ignores any incoming updates or commits, ignores means what? The request hangs? If so, could you get a thread dump? Do queries work (like /solr/select?q=*:*) ? though throwing no errors, no 503s... It's like the server has a blackout and stares blankly into space. Are you using a different servlet container than what is shipped with solr? If you did start with the solr example server, what jetty configuration changes have you made? -Yonik http://www.lucidimagination.com
Re: csv responsewriter and numfound
On Mon, Aug 8, 2011 at 5:12 PM, Erik Hatcher erik.hatc...@gmail.com wrote: Great question. But how would that get returned in the response? It is a drag that the header is lost when results are written in CSV, but there really isn't an obvious spot for that information to be returned. I guess a comment would be one option. -Yonik http://www.lucidimagination.com
Re: 4820 searchers opened?
On Sat, Aug 6, 2011 at 11:31 AM, Paul Libbrecht p...@hoplahup.net wrote: On 6 Aug 2011, at 02:09, Yonik Seeley wrote: On Fri, Aug 5, 2011 at 7:30 PM, Paul Libbrecht p...@hoplahup.net wrote: my solr is slowly reaching its memory limits (8Gb) and the stats page shows a reasonable fieldCache (1800) but 4820 searchers. That sounds a bit much to me; each has been opened in its own time since the last restart about two weeks ago. Definitely sounds like a reference leak. What version are you using? 1.4.1. Is this stock Solr, or do you have any custom request handlers or anything else that could be forgetting to decrement the reference count of the searchers it uses? I have a custom query-handler and a custom response writer. Do you always retrieve the searcher via SolrQueryRequest.getSearcher()? If so, there should be no problem... but if you call SolrCore.getSearcher(), that is where leaks can happen if you don't decref the reference returned. I also use the velocity response-writer (for debug purposes). None store a searcher or params, I believe. I have a query in the query handler that is a thread-local (it's a large preferring query that I add to every query). Could this be the reason? As long as it's a normal query that has not been rewritten or weighted, it should have no state tied to any particular reader/searcher and you should be fine. -Yonik http://www.lucidimagination.com I also have a thread-local that stores a date-formatter. Should I post my config? paul
Re: 4820 searchers opened?
On Sat, Aug 6, 2011 at 1:35 PM, Paul Libbrecht p...@hoplahup.net wrote: On 6 Aug 2011, at 17:37, Yonik Seeley wrote: I have a custom query-handler and a custom response writer. Do you always retrieve the searcher via SolrQueryRequest.getSearcher()? If so, there should be no problem... but if you call SolrCore.getSearcher(), that is where leaks can happen if you don't decref the reference returned. I've been using the following: rb.req.getCore().getSearcher().get().getReader() Bingo! Code should never do core.getSearcher().get() since core.getSearcher returns a reference that must be decremented when you are done. Using req.getSearcher() is much easier since it ensures that the searcher never changes during the scope of a single request and it handles decrementing the reference when the request is closed. -Yonik http://www.lucidimagination.com
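For anyone hitting the same leak, the safe pattern when code really must go through SolrCore looks something like this sketch (inside a request, rb.req.getSearcher() remains the simpler choice):

    import org.apache.lucene.index.IndexReader;
    import org.apache.solr.handler.component.ResponseBuilder;
    import org.apache.solr.search.SolrIndexSearcher;
    import org.apache.solr.util.RefCounted;

    public class SearcherUsage {
        static void useSearcher(ResponseBuilder rb) {
            // getSearcher() increments the searcher's reference count...
            RefCounted<SolrIndexSearcher> ref = rb.req.getCore().getSearcher();
            try {
                IndexReader reader = ref.get().getReader();
                // ... use the reader ...
            } finally {
                ref.decref(); // ...so it must be decremented, or the searcher never closes.
            }
        }
    }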
Re: 4820 searchers opened?
On Sat, Aug 6, 2011 at 2:17 PM, Paul Libbrecht p...@hoplahup.net wrote: This is convincing me... I'd like to experiment and close. So, how can I be sure this is the right thing? I would have thought adding a document and committing would have created a Searcher in my current usage but I do not see the reference list actually being enlarged on my development machine. It is creating a new searcher, but then closing the old searcher after all currently running requests are done using it (that's what the reference counting is for). After the searcher is closed, it's removed from the list. Pay attention to the address of the searcher on the stats page: searcherName : Searcher@7d0ade7e main You should see the address change after a commit. -Yonik http://www.lucidimagination.com
Re: 4820 searchers opened?
On Sat, Aug 6, 2011 at 2:30 PM, Paul Libbrecht p...@hoplahup.net wrote: On 6 Aug 2011, at 20:21, Yonik Seeley wrote: It is creating a new searcher, but then closing the old searcher after all currently running requests are done using it (that's what the reference counting is for). After the searcher is closed, it's removed from the list. Not if using: rb.req.getCore().getSearcher().get().getReader() right? Pay attention to the address of the searcher on the stats page: searcherName : Searcher@7d0ade7e main You should see the address change after a commit. I saw that one. But how can I see retention? Oh, I see... you want to re-create the bug so you can see when it is fixed? To trigger the bug, you need to hit a code path that uses the getCore().getSearcher().get() code. So first send a request that hits that buggy code, then add a doc and do a commit, and then you should see more than one searcher on the stats page. -Yonik http://www.lucidimagination.com
Re: Solr 3.3 crashes after ~18 hours?
On Fri, Aug 5, 2011 at 7:33 AM, alexander sulz a.s...@digiconcept.net wrote: Usually you get an XML response when doing commits or optimize; in this case I get nothing in return, but the site ( http://[...]/solr/update?optimize=true ) DOESN'T load forever or anything. It doesn't hang! I just get a blank page / empty response. Sounds like you are doing it from a browser? Can you try it from the command line? It should give back some sort of response (or hang waiting for a response). curl "http://localhost:8983/solr/update?commit=true" -Yonik http://www.lucidimagination.com I use the stuff in the example folder; the only changes I made were enabling logging and changing the port to 8985. I'll try getting a thread dump if it happens again! So far it's looking good with having allocated more memory to it. On 04.08.2011 16:08, Yonik Seeley wrote: On Thu, Aug 4, 2011 at 8:09 AM, alexander sulz a.s...@digiconcept.net wrote: Thank you for the many replies! Like I said, I couldn't find anything in logs created by solr. I just had a look at /var/logs/messages and there wasn't anything either. What I mean by crash is that the process is still there and http GET pings would return 200, but when I try visiting /solr/admin, I'd get a blank page! The server ignores any incoming updates or commits, ignores means what? The request hangs? If so, could you get a thread dump? Do queries work (like /solr/select?q=*:*) ? though throwing no errors, no 503s... It's like the server has a blackout and stares blankly into space. Are you using a different servlet container than what is shipped with solr? If you did start with the solr example server, what jetty configuration changes have you made? -Yonik http://www.lucidimagination.com
Re: 4820 searchers opened?
On Fri, Aug 5, 2011 at 7:30 PM, Paul Libbrecht p...@hoplahup.net wrote: my solr is slowly reaching its memory limits (8Gb) and the stats page shows a reasonable fieldCache (1800) but 4820 searchers. That sounds a bit much to me; each has been opened in its own time since the last restart about two weeks ago. Definitely sounds like a reference leak. Is this stock Solr, or do you have any custom request handlers or anything else that could be forgetting to decrement the reference count of the searchers it uses? What version are you using? -Yonik http://www.lucidimagination.com
Re: Solr 3.3 crashes after ~18 hours?
On Thu, Aug 4, 2011 at 8:09 AM, alexander sulz a.s...@digiconcept.net wrote: Thank you for the many replies! Like I said, I couldn't find anything in logs created by solr. I just had a look at /var/logs/messages and there wasn't anything either. What I mean by crash is that the process is still there and http GET pings would return 200, but when I try visiting /solr/admin, I'd get a blank page! The server ignores any incoming updates or commits, ignores means what? The request hangs? If so, could you get a thread dump? Do queries work (like /solr/select?q=*:*) ? though throwing no errors, no 503s... It's like the server has a blackout and stares blankly into space. Are you using a different servlet container than what is shipped with solr? If you did start with the solr example server, what jetty configuration changes have you made? -Yonik http://www.lucidimagination.com
Re: Joining on multi valued fields
On Thu, Aug 4, 2011 at 11:21 AM, matthew.fow...@thomsonreuters.com wrote: Hi Yonik, So I tested the join using the sample data below and the latest trunk. I still got the same behaviour. HOWEVER! In this case it had nothing to do with the patch or solr version. It was the tokeniser splitting G1 into G and 1. Ah, glad you figured it out! So thank you for a nice patch and your suggestions. I do have a couple of questions for you: At what level does the join happen, and what do you expect the performance penalty to be? We might use this extensively if the performance penalty isn't great. With the current implementation, the performance is proportional to the number of unique terms in the fields being joined. -Yonik http://www.lucidimagination.com
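The general lesson from the fix: join keys have to match as whole terms, so the join fields are best given an untokenized type. A hedged sketch of what that could look like in schema.xml (the field name is from this thread; "string" is the stock example schema's solr.StrField, which indexes the value verbatim instead of splitting G1 into G and 1):

    <field name="pid_rcs" type="string" indexed="true" stored="true"/>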
Re: Joining on multi valued fields
Hmmm, if these are real responses from a solr server at rest (i.e. documents not being changed between queries) then what you show definitely looks like a bug. That's interesting, since TestJoin implements a random test that should cover cases like this pretty well. I assume you are using a version of trunk (4.0-dev) and not just the patch attached to the JIRA issue (which IIRC had at least one bug... SOLR-2521). Have you tried a more recent version of trunk? -Yonik http://www.lucidimagination.com On Wed, Aug 3, 2011 at 7:00 AM, matthew.fow...@thomsonreuters.com wrote: Hi Yonik, Sorry for my late reply. I have been trying to get to the bottom of this but I'm getting inconsistent behaviour. Here's an example: Query = pi:rcs100 (here I'm going to use pid_rcs as the join value):
<result name="response" numFound="1" start="0">
  <doc>
    <str name="pi">rcs100</str>
    <str name="ct">rcs</str>
    <str name="pid_rcs">G1</str>
    <str name="name_rcs">Emerging Market Countries</str>
    <str name="definition_rcs">All business events relating to companies and other issuers of securities.</str>
  </doc>
</result>
Query = code:G1 (to see how many docs have G1 in their code field; notice that code is multi valued):
<result name="response" numFound="2" start="0">
  <doc>
    <str name="ct">cat</str>
    <date name="maindocdate">2011-04-22T05:48:57Z</date>
    <str name="pin">CIF3wGpXk+1029782</str>
    <arr name="code">
      <str>G1</str><str>G7U</str><str>GK</str><str>ME7</str><str>ME8</str><str>MN</str><str>MR</str>
    </arr>
  </doc>
  <doc>
    <str name="ct">cat</str>
    <date name="maindocdate">2011-04-22T05:48:57Z</date>
    <str name="pin">CIF7YcLP+1029782</str>
    <arr name="code">
      <str>G1</str><str>G7U</str><str>GK</str><str>ME7</str><str>ME8</str><str>MN</str><str>MR</str>
    </arr>
  </doc>
</result>
Now for the join: http://10.15.39.137:8983/solr/file/select?q={!join from=pid_rcs to=code}pi:rcs100
<result name="response" numFound="3" start="0">
  <doc>
    <str name="ct">cat</str>
    <date name="maindocdate">2011-04-22T05:48:57Z</date>
    <str name="pin">CIF3wGpXk+1029782</str>
    <arr name="code">
      <str>G1</str><str>G7U</str><str>GK</str><str>ME7</str><str>ME8</str><str>MN</str><str>MR</str>
    </arr>
  </doc>
  <doc>
    <str name="ct">cat</str>
    <date name="maindocdate">2011-04-22T05:48:57Z</date>
    <str name="pin">CIF7YcLP+1029782</str>
    <arr name="code">
      <str>G1</str><str>G7U</str><str>GK</str><str>ME7</str><str>ME8</str><str>MN</str><str>MR</str>
    </arr>
  </doc>
  <doc>
    <str name="ct">cat</str>
    <date name="maindocdate">2011-04-22T05:48:58Z</date>
    <str name="pin">CN1763203+1029782</str>
    <arr name="code">
      <str>A2</str><str>A5</str><str>A9</str><str>AN</str><str>B125</str><str>B126</str><str>B130</str><str>BL63</str><str>G41</str><str>GK</str><str>MZ</str>
    </arr>
  </doc>
</result>
So as you can see, I get back 3 results when only 2 match the criteria, i.e. docs where G1 is present in the multi-valued code field. Why should the last document be included in the result of the join? Thank you, Matt -----Original Message----- From: ysee...@gmail.com [mailto:ysee...@gmail.com] On Behalf Of Yonik Seeley Sent: 01 August 2011 18:28 To: solr-user@lucene.apache.org Subject: Re: Joining on multi valued fields On Mon, Aug 1, 2011 at 12:58 PM, matthew.fow...@thomsonreuters.com wrote: I have been using the JOIN patch https://issues.apache.org/jira/browse/SOLR-2272 with great success. However I have hit a case where it doesn't seem to be working. It doesn't seem to work when joining to a multi-valued field. That should work (and the unit tests do test with multi-valued fields). Can you come up with a simple example where you are not getting the expected results? -Yonik http://www.lucidimagination.com
Re: External File Field
On Mon, Aug 1, 2011 at 11:16 AM, Mark static.void@gmail.com wrote: We have around 10 million documents in our index, and about 10% of them have some extra statistics that are calculated on a daily basis, which are then indexed and used in our function queries. This reindexing comes at the expense of doing multiple joins in DIH, so I am thinking it may be faster to precompute these values and use external files rather than having to re-index 10% of our corpus daily. How many external file fields could one use before it becomes too many? Is this a valid use case, or am I trying to fit a square into a circular hole? Each external file field will take up maxDoc*4 bytes of RAM. The other consideration is the time to load them (how often the index needs to change). -Yonik http://www.lucidimagination.com
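Plugging in the numbers from this thread, the per-field cost works out to:

    10,000,000 docs x 4 bytes = ~40 MB of heap per external file field

so a handful of such fields is cheap, a few dozen adds up quickly, and each reload after a change pays the file-parsing time again.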
Re: Joining on multi valued fields
On Mon, Aug 1, 2011 at 12:58 PM, matthew.fow...@thomsonreuters.com wrote: I have been using the JOIN patch https://issues.apache.org/jira/browse/SOLR-2272 with great success. However I have hit a case where it doesn't seem to be working. It doesn't seem to work when joining to a multi-valued field. That should work (and the unit tests do test with multi-valued fields). Can you come up with a simple example where you are not getting the expected results? -Yonik http://www.lucidimagination.com
Re: what data type for geo fields?
On Thu, Jul 28, 2011 at 10:24 AM, Peter Wolanin peter.wola...@acquia.com wrote: Thanks for the feedback. I'll have a look more at how geohash works. Looking at the sample schema more closely, I see: <fieldType name="double" class="solr.TrieDoubleField" precisionStep="0" omitNorms="true" positionIncrementGap="0"/> So in fact double is also Trie, but just with precisionStep 0 in the example. Right, which means it's a normal numeric field with one token indexed per value (i.e. no tradeoff to speed up range queries by increasing index size). -Yonik http://www.lucidimagination.com
Re: what data type for geo fields?
On Wed, Jul 27, 2011 at 9:01 AM, Peter Wolanin peter.wola...@acquia.com wrote: Looking at the example schema: http://svn.apache.org/repos/asf/lucene/dev/branches/lucene_solr_3_3/solr/example/solr/conf/schema.xml the solr.PointType field type uses double (is this just an example field, or used for geo search?) While you could possibly use PointType for geo search, it doesn't have good support for it (it's more of a general n-dimensional point). The LatLonType has all the geo support currently. ...while the solr.LatLonType field uses tdouble, and it's unclear how the geohash is translated into lat/lon values, or if the geohash itself might typically be used as a copyfield and used just for matching a query on a geohash? There's no geohash used in LatLonType. It is indexed as a lat and lon under the covers (using the suffix _d). Is there an advantage in terms of speed to using Trie fields for solr.LatLonType? Currently only for explicit range queries... like point:[10,10 TO 20,20] I would assume so, e.g. for bbox operations. It's a bit of an implementation detail, but bbox doesn't currently use range queries. -Yonik http://www.lucidimagination.com
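For reference, a typical LatLonType setup and bbox filter might look like this (a sketch; the type and field names follow the stock example schema rather than anything specific to this thread):

    <fieldType name="location" class="solr.LatLonType" subFieldSuffix="_coordinate"/>
    <field name="store" type="location" indexed="true" stored="true"/>

    ...&q=*:*&fq={!bbox sfield=store pt=45.15,-93.85 d=5}

Here pt is the center point (lat,lon) and d is the distance in kilometers.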
Re: Solr vs ElasticSearch
On Wed, Jul 27, 2011 at 7:17 AM, Tarjei Huse tar...@scanmine.com wrote: On 06/01/2011 08:22 AM, Jason Rutherglen wrote: Thanks Shashi, this is oddly coincidental with another issue being put into Solr (SOLR-2193) to help solve some of the NRT issues, the timing is impeccable. Hmm, does anyone have an idea on when this will be finished? It's in trunk now... try it out! -Yonik http://www.lucidimagination.com
Re: Rounding errors in solr
On Mon, Jul 25, 2011 at 10:12 AM, Brian Lamb brian.l...@journalexperts.com wrote: Yes and that's causing some problems in my application. Is there a way to truncate the 7th decimal place in regards to sorting by the score? Not built in. With some Java coding, you could create a post filter that manipulates scores. http://wiki.apache.org/solr/CommonQueryParameters#Caching_of_filters -Yonik http://www.lucidimagination.com On Fri, Jul 22, 2011 at 4:27 PM, Yonik Seeley yo...@lucidimagination.com wrote: On Fri, Jul 22, 2011 at 4:11 PM, Brian Lamb brian.l...@journalexperts.com wrote: I've noticed some peculiar scoring issues going on in my application. For example, I have a field that is multivalued and has several records that have the same value. For example, arr name=references strNational Society of Animal Lovers/str strNat. Soc. of Ani. Lov./str /arr I have about 300 records with that exact value. Now, when I do a search for references:(national society animal lovers), I get the following results: id252/id id159/id id82/id id452/id id105/id When I do a search for references:(nat soc ani lov), I get the results ordered differently: id510/id id122/id id501/id id82/id id252/id When I load all the records that match, I notice that at some point, the scores aren't the same but differ by only a little: 1.471928 in one and the one before it was 1.471929 32 bit floats only have 7 decimal digits of precision, and in floating point land (a+b+c) can be slightly different than (c+b+a) -Yonik http://www.lucidimagination.com
Re: Solr 3.3: Exception in thread Lucene Merge Thread #1
IBM J9 VM (build 2.4, JRE 1.6.0 IBM J9 2.4 Linux amd64-64 I'm confused why MMapDirectory is getting used with the IBM JVM... I had thought it would default to NIOFSDirectory on Linux w/ a non Oracle JVM. Are you specifically selecting MMapDirectory in solrconfig.xml? Can you try the Oracle JVM to see if that changes things? -Yonik http://www.lucidimagination.com On Fri, Jul 22, 2011 at 5:58 AM, mdz-munich sebastian.lu...@bsb-muenchen.de wrote: I was wrong. After rebooting tomcat we discovered a new sweetness: /SEVERE: REFCOUNT ERROR: unreferenced org.apache.solr.core.SolrCore@3c753c75 (core.name) has a reference count of 1 22.07.2011 11:52:07 org.apache.solr.common.SolrException log SEVERE: java.lang.RuntimeException: java.io.IOException: Map failed at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1099) at org.apache.solr.core.SolrCore.init(SolrCore.java:585) at org.apache.solr.core.CoreContainer.create(CoreContainer.java:463) at org.apache.solr.core.CoreContainer.load(CoreContainer.java:316) at org.apache.solr.core.CoreContainer.load(CoreContainer.java:207) at org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:130) at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:94) at org.apache.catalina.core.ApplicationFilterConfig.initFilter(ApplicationFilterConfig.java:273) at org.apache.catalina.core.ApplicationFilterConfig.getFilter(ApplicationFilterConfig.java:254) at org.apache.catalina.core.ApplicationFilterConfig.setFilterDef(ApplicationFilterConfig.java:372) at org.apache.catalina.core.ApplicationFilterConfig.init(ApplicationFilterConfig.java:98) at org.apache.catalina.core.StandardContext.filterStart(StandardContext.java:4584) at org.apache.catalina.core.StandardContext$2.call(StandardContext.java:5262) at org.apache.catalina.core.StandardContext$2.call(StandardContext.java:5257) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:314) at java.util.concurrent.FutureTask.run(FutureTask.java:149) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:897) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:919) at java.lang.Thread.run(Thread.java:736) Caused by: java.io.IOException: Map failed at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:782) at org.apache.lucene.store.MMapDirectory$MMapIndexInput.init(MMapDirectory.java:264) at org.apache.lucene.store.MMapDirectory.openInput(MMapDirectory.java:216) at org.apache.lucene.index.FieldsReader.init(FieldsReader.java:129) at org.apache.lucene.index.SegmentCoreReaders.openDocStores(SegmentCoreReaders.java:244) at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:116) at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:92) at org.apache.lucene.index.DirectoryReader.init(DirectoryReader.java:113) at org.apache.lucene.index.ReadOnlyDirectoryReader.init(ReadOnlyDirectoryReader.java:29) at org.apache.lucene.index.DirectoryReader$1.doBody(DirectoryReader.java:81) at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:750) at org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:75) at org.apache.lucene.index.IndexReader.open(IndexReader.java:428) at org.apache.lucene.index.IndexReader.open(IndexReader.java:371) at org.apache.solr.core.StandardIndexReaderFactory.newReader(StandardIndexReaderFactory.java:38) at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1088) ... 
18 more Caused by: java.lang.OutOfMemoryError: Map failed at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:779) ... 33 more/ Any ideas and/or suggestions? Best regards thank you, Sebastian -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-3-3-Exception-in-thread-Lucene-Merge-Thread-1-tp3185248p3190976.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr 3.3: Exception in thread Lucene Merge Thread #1
On Fri, Jul 22, 2011 at 9:44 AM, Yonik Seeley yo...@lucidimagination.com wrote: IBM J9 VM (build 2.4, JRE 1.6.0 IBM J9 2.4 Linux amd64-64 I'm confused why MMapDirectory is getting used with the IBM JVM... I had thought it would default to NIOFSDirectory on Linux w/ a non Oracle JVM. I verified that the MMapDirectory is selected by default with the IBM JVM (it must also contain the right Sun internal classes). Anyone else have experience with MMapDirectory w/ IBM's JVM? -Yonik http://www.lucidimagination.com Are you specifically selecting MMapDirectory in solrconfig.xml? Can you try the Oracle JVM to see if that changes things? -Yonik http://www.lucidimagination.com On Fri, Jul 22, 2011 at 5:58 AM, mdz-munich sebastian.lu...@bsb-muenchen.de wrote: I was wrong. After rebooting tomcat we discovered a new sweetness: /SEVERE: REFCOUNT ERROR: unreferenced org.apache.solr.core.SolrCore@3c753c75 (core.name) has a reference count of 1 22.07.2011 11:52:07 org.apache.solr.common.SolrException log SEVERE: java.lang.RuntimeException: java.io.IOException: Map failed at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1099) at org.apache.solr.core.SolrCore.init(SolrCore.java:585) at org.apache.solr.core.CoreContainer.create(CoreContainer.java:463) at org.apache.solr.core.CoreContainer.load(CoreContainer.java:316) at org.apache.solr.core.CoreContainer.load(CoreContainer.java:207) at org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:130) at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:94) at org.apache.catalina.core.ApplicationFilterConfig.initFilter(ApplicationFilterConfig.java:273) at org.apache.catalina.core.ApplicationFilterConfig.getFilter(ApplicationFilterConfig.java:254) at org.apache.catalina.core.ApplicationFilterConfig.setFilterDef(ApplicationFilterConfig.java:372) at org.apache.catalina.core.ApplicationFilterConfig.init(ApplicationFilterConfig.java:98) at org.apache.catalina.core.StandardContext.filterStart(StandardContext.java:4584) at org.apache.catalina.core.StandardContext$2.call(StandardContext.java:5262) at org.apache.catalina.core.StandardContext$2.call(StandardContext.java:5257) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:314) at java.util.concurrent.FutureTask.run(FutureTask.java:149) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:897) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:919) at java.lang.Thread.run(Thread.java:736) Caused by: java.io.IOException: Map failed at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:782) at org.apache.lucene.store.MMapDirectory$MMapIndexInput.init(MMapDirectory.java:264) at org.apache.lucene.store.MMapDirectory.openInput(MMapDirectory.java:216) at org.apache.lucene.index.FieldsReader.init(FieldsReader.java:129) at org.apache.lucene.index.SegmentCoreReaders.openDocStores(SegmentCoreReaders.java:244) at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:116) at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:92) at org.apache.lucene.index.DirectoryReader.init(DirectoryReader.java:113) at org.apache.lucene.index.ReadOnlyDirectoryReader.init(ReadOnlyDirectoryReader.java:29) at org.apache.lucene.index.DirectoryReader$1.doBody(DirectoryReader.java:81) at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:750) at org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:75) at org.apache.lucene.index.IndexReader.open(IndexReader.java:428) at 
org.apache.lucene.index.IndexReader.open(IndexReader.java:371) at org.apache.solr.core.StandardIndexReaderFactory.newReader(StandardIndexReaderFactory.java:38) at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1088) ... 18 more Caused by: java.lang.OutOfMemoryError: Map failed at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:779) ... 33 more/ Any ideas and/or suggestions? Best regards thank you, Sebastian -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-3-3-Exception-in-thread-Lucene-Merge-Thread-1-tp3185248p3190976.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr 3.3: Exception in thread Lucene Merge Thread #1
OK, best guess is that you're going over some per-process address space limit. Try seeing what ulimit -a says. -Yonik http://www.lucidimagination.com On Fri, Jul 22, 2011 at 12:51 PM, mdz-munich sebastian.lu...@bsb-muenchen.de wrote: Hi Yonik, thanks for your reply! Are you specifically selecting MMapDirectory in solrconfig.xml? Nope. We installed Oracle's Runtime from http://java.com/de/download/linux_manual.jsp?locale=de
java.runtime.name = Java(TM) SE Runtime Environment
sun.boot.library.path = /usr/java/jdk1.6.0_26/jre/lib/amd64
java.vm.version = 20.1-b02
shared.loader =
java.vm.vendor = Sun Microsystems Inc.
enable.master = true
java.vendor.url = http://java.sun.com/
path.separator = :
java.vm.name = Java HotSpot(TM) 64-Bit Server VM
tomcat.util.buf.StringCache.byte.enabled = true
file.encoding.pkg = sun.io
java.util.logging.config.file = /local/master01_tomcat7x_solr33x/conf/logging.properties
user.country = DE
sun.java.launcher = SUN_STANDARD
sun.os.patch.level = unknown
java.vm.specification.name = Java Virtual Machine Specification
user.dir = /local/master01_tomcat7x_solr33x/logs
solr.abortOnConfigurationError = true
java.runtime.version = 1.6.0_26-b03
java.awt.graphicsenv = sun.awt.X11GraphicsEnvironment
java.endorsed.dirs = /local/master01_tomcat7x_solr33x/endorsed
os.arch = amd64
java.io.tmpdir = /local/master01_tomcat7x_solr33x/temp
line.separator =
But no success with 1000 docs/batch, this was thrown during optimize: 22.07.2011 18:44:05 org.apache.solr.core.SolrCore execute INFO: [core.digi20] webapp=/solr path=/update params={} status=500 QTime=87540 22.07.2011 18:44:05 org.apache.solr.common.SolrException log SEVERE: java.io.IOException: Map failed at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:748) at org.apache.lucene.store.MMapDirectory$MMapIndexInput.init(MMapDirectory.java:303) at org.apache.lucene.store.MMapDirectory.openInput(MMapDirectory.java:217) at org.apache.lucene.index.FieldsReader.init(FieldsReader.java:129) at org.apache.lucene.index.SegmentCoreReaders.openDocStores(SegmentCoreReaders.java:245) at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:117) at org.apache.lucene.index.IndexWriter$ReaderPool.get(IndexWriter.java:703) at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4196) at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3863) at org.apache.lucene.index.SerialMergeScheduler.merge(SerialMergeScheduler.java:37) at org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:2715) at org.apache.lucene.index.IndexWriter.optimize(IndexWriter.java:2525) at org.apache.lucene.index.IndexWriter.optimize(IndexWriter.java:2462) at org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:410) at org.apache.solr.update.processor.RunUpdateProcessor.processCommit(RunUpdateProcessorFactory.java:85) at org.apache.solr.update.processor.LogUpdateProcessor.processCommit(LogUpdateProcessorFactory.java:154) at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:177) at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:77) at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:67) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1368) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252) at
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210) at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:240) at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:164) at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:462) at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:164) at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:100) at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:563) at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118) at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:403) at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:301) at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:162) at
Re: Solr 3.3: Exception in thread Lucene Merge Thread #1
Yep, there ya go... your OS configuration is limiting you to 27G of virtual address space per process. Consider setting that to unlimited. -Yonik http://www.lucidimagination.com On Fri, Jul 22, 2011 at 1:05 PM, mdz-munich sebastian.lu...@bsb-muenchen.de wrote: It says:
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 257869
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) 28063940
open files                      (-n) 8192
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 257869
virtual memory          (kbytes, -v) 27216080
file locks                      (-x) unlimited
Best regards, Sebastian Yonik Seeley wrote: OK, best guess is that you're going over some per-process address space limit. Try seeing what ulimit -a says. -Yonik http://www.lucidimagination.com On Fri, Jul 22, 2011 at 12:51 PM, mdz-munich sebastian.lu...@bsb-muenchen.de wrote: Hi Yonik, thanks for your reply! Are you specifically selecting MMapDirectory in solrconfig.xml? Nope. We installed Oracle's Runtime from http://java.com/de/download/linux_manual.jsp?locale=de
java.runtime.name = Java(TM) SE Runtime Environment
sun.boot.library.path = /usr/java/jdk1.6.0_26/jre/lib/amd64
java.vm.version = 20.1-b02
shared.loader =
java.vm.vendor = Sun Microsystems Inc.
enable.master = true
java.vendor.url = http://java.sun.com/
path.separator = :
java.vm.name = Java HotSpot(TM) 64-Bit Server VM
tomcat.util.buf.StringCache.byte.enabled = true
file.encoding.pkg = sun.io
java.util.logging.config.file = /local/master01_tomcat7x_solr33x/conf/logging.properties
user.country = DE
sun.java.launcher = SUN_STANDARD
sun.os.patch.level = unknown
java.vm.specification.name = Java Virtual Machine Specification
user.dir = /local/master01_tomcat7x_solr33x/logs
solr.abortOnConfigurationError = true
java.runtime.version = 1.6.0_26-b03
java.awt.graphicsenv = sun.awt.X11GraphicsEnvironment
java.endorsed.dirs = /local/master01_tomcat7x_solr33x/endorsed
os.arch = amd64
java.io.tmpdir = /local/master01_tomcat7x_solr33x/temp
line.separator =
But no success with 1000 docs/batch, this was thrown during optimize: 22.07.2011 18:44:05 org.apache.solr.core.SolrCore execute INFO: [core.digi20] webapp=/solr path=/update params={} status=500 QTime=87540 22.07.2011 18:44:05 org.apache.solr.common.SolrException log SEVERE: java.io.IOException: Map failed at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:748) at org.apache.lucene.store.MMapDirectory$MMapIndexInput.init(MMapDirectory.java:303) at org.apache.lucene.store.MMapDirectory.openInput(MMapDirectory.java:217) at org.apache.lucene.index.FieldsReader.init(FieldsReader.java:129) at org.apache.lucene.index.SegmentCoreReaders.openDocStores(SegmentCoreReaders.java:245) at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:117) at org.apache.lucene.index.IndexWriter$ReaderPool.get(IndexWriter.java:703) at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4196) at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3863) at org.apache.lucene.index.SerialMergeScheduler.merge(SerialMergeScheduler.java:37) at org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:2715) at org.apache.lucene.index.IndexWriter.optimize(IndexWriter.java:2525) at org.apache.lucene.index.IndexWriter.optimize(IndexWriter.java:2462) at
org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:410) at org.apache.solr.update.processor.RunUpdateProcessor.processCommit(RunUpdateProcessorFactory.java:85) at org.apache.solr.update.processor.LogUpdateProcessor.processCommit(LogUpdateProcessorFactory.java:154) at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:177) at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:77) at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:67) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1368) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243) at org.apache.catalina.core.ApplicationFilterChain.doFilter
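As for actually raising the limit Yonik recommends above, the usual illustrative fix is to run something like the following in the environment that launches the servlet container (exactly where to put it depends on the init script or shell in use):

    ulimit -v unlimited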
Re: Rounding errors in solr
On Fri, Jul 22, 2011 at 4:11 PM, Brian Lamb brian.l...@journalexperts.com wrote: I've noticed some peculiar scoring issues going on in my application. For example, I have a field that is multivalued and has several records that have the same value. For example, arr name=references strNational Society of Animal Lovers/str strNat. Soc. of Ani. Lov./str /arr I have about 300 records with that exact value. Now, when I do a search for references:(national society animal lovers), I get the following results: id252/id id159/id id82/id id452/id id105/id When I do a search for references:(nat soc ani lov), I get the results ordered differently: id510/id id122/id id501/id id82/id id252/id When I load all the records that match, I notice that at some point, the scores aren't the same but differ by only a little: 1.471928 in one and the one before it was 1.471929 32 bit floats only have 7 decimal digits of precision, and in floating point land (a+b+c) can be slightly different than (c+b+a) -Yonik http://www.lucidimagination.com
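A tiny standalone demonstration of the non-associativity Yonik describes (the values are illustrative, nothing from the index in question):

    public class FloatOrder {
        public static void main(String[] args) {
            float a = 1.0e8f, b = -1.0e8f, c = 1.0f;
            // Same three values, different association:
            System.out.println((a + b) + c); // 1.0
            System.out.println(a + (b + c)); // 0.0 -- the 1.0 is absorbed in b + c
        }
    }

So the order in which Lucene happens to sum otherwise-identical score contributions can legitimately perturb the last digit or two of a score.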
Re: Determine which field term was found?
On Thu, Jul 21, 2011 at 4:47 PM, Olson, Ron rol...@lbpc.com wrote: Is there an easy way to find out which field matched a term in an OR query using Solr? I have a document with names in two multi-valued fields and I am searching for Smith, using the query A_NAMES:smith OR B_NAMES:smith. I figure I could loop through both result arrays, but that seems weird to me to have to search again for the value in a result. That's pretty much the way lucene currently works - you don't know what fields match a query. If the query is simple, looping over the returned stored fields is probably your best bet. There are a couple other tricks you could use (although they are not necessarily better): 1) with grouping by query (a trunk feature) you can essentially return both queries with one request: q=*:*&group=true&group.query=A_NAMES:smith&group.query=B_NAMES:smith and optionally add a group.query=A_NAMES:smith OR B_NAMES:smith if you need the combined list 2) use pseudo-fields (also trunk) in conjunction with the termfreq function (the number of times a term appears in a field). This obviously only works with term queries. fl=*,count1:termfreq(A_NAMES,'smith'),count2:termfreq(B_NAMES,'smith') You can use parameter substitution to pull out the actual term and simplify the query: fl=*,count1:termfreq(A_NAMES,$term),count2:termfreq(B_NAMES,$term)&term=smith -Yonik http://www.lucidimagination.com
Re: defType argument weirdness
On Tue, Jul 19, 2011 at 11:41 PM, Jonathan Rochkind rochk...@jhu.edu wrote: Is it generally recognized that this terminology is confusing, or is it just me? I do understand what they do (at least well enough to use them), but I find it confusing that it's called defType as a main param, but type in a LocalParam When used as the main param, it is still just the default (i.e. it may be overridden). For example defType=luceneq={!func}1 (and then there's 'qt', often confused with defType/type by newbies, since they guess it stands for 'query type', but which should probably actually have been called 'requestHandler'/'rh' instead, since that's what it actually chooses, no? It gets very confusing). Yeah, qt is very historical... before the QParserPlugin framework, and before request handlers were used for many other things (including updates). -Yonik http://www.lucidimagination.com If it's generally recognized it's confusing and perhaps a somewhat inconsistent mental model being implied, I wonder if there'd be any interest in renaming these to be more clear, leaving the old ones as aliases/synonyms for backwards compatibility (perhaps with a long deprecation period, or perhaps existing forever). I know it was very confusing to me to keep track of these parameters and what they did for quite a while, and still trips me up from time to time. Jonathan From: ysee...@gmail.com [ysee...@gmail.com] on behalf of Yonik Seeley [yo...@lucidimagination.com] Sent: Tuesday, July 19, 2011 9:40 PM To: solr-user@lucene.apache.org Subject: Re: defType argument weirdness On Tue, Jul 19, 2011 at 1:25 PM, Naomi Dushay ndus...@stanford.edu wrote: Regardless, I thought that defType=dismaxq=*:* is supposed to be equivalent to q={!defType=dismax}*:* and also equivalent to q={!dismax}*:* Not quite - there is a very subtle distinction. {!dismax} is short for {!type=dismax}, the type of the actual query, and this may not be overridden. The defType local param is only the default type for sub-queries (as opposed to the current query). It's useful in conjunction with the query or nested query qparser: http://lucene.apache.org/solr/api/org/apache/solr/search/NestedQParserPlugin.html -Yonik http://www.lucidimagination.com
Re: Reading Solr's JSON
On Wed, Jul 20, 2011 at 10:58 AM, Sowmya V.B. vbsow...@gmail.com wrote: Which is the best way to read Solr's JSON output from Java code? You could use SolrJ - it handles parsing for you (and uses the most efficient binary format by default). There seems to be a JSONParser in one of the jar files in SolrLib (org.apache.noggit...), but I don't understand how to read the parsed output in this. If you just want to deserialize into objects (Maps, Lists, etc.) then it's easy: ObjectBuilder.fromJSON(my_json_string) -Yonik http://www.lucidimagination.com
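A minimal sketch of the noggit approach (the JSON string here is a hand-made stand-in for a real Solr response; the field names follow Solr's standard JSON response layout):

    import java.io.IOException;
    import java.util.List;
    import java.util.Map;
    import org.apache.noggit.ObjectBuilder;

    public class ParseResponse {
        @SuppressWarnings("unchecked")
        public static void main(String[] args) throws IOException {
            String json = "{\"response\":{\"numFound\":1,\"start\":0,"
                        + "\"docs\":[{\"id\":\"1\"}]}}";
            // fromJSON deserializes into plain Maps, Lists, Strings, and numbers.
            Map<String, Object> top = (Map<String, Object>) ObjectBuilder.fromJSON(json);
            Map<String, Object> response = (Map<String, Object>) top.get("response");
            List<Map<String, Object>> docs =
                    (List<Map<String, Object>>) response.get("docs");
            System.out.println(docs.get(0).get("id")); // prints: 1
        }
    }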
Re: Wiki Error JSON syntax
On Wed, Jul 20, 2011 at 12:16 PM, Remy Loubradou remyloubra...@gmail.com wrote: Hi, I was writing a Solr client API for Node and I found an error on this page http://wiki.apache.org/solr/UpdateJSON, in the section Update Commands: the JSON is not valid because there are duplicate keys, two times each with add and delete. It's a common misconception that it's invalid JSON. Duplicate keys are in fact legal. -Yonik http://www.lucidimagination.com I tried with an array and it doesn't work either; I got a 400 error. I think that's because the syntax is bad. I don't really know if this is the right place to talk about that, but... it's the only place I found. Sorry if it's not. Thanks, And I love Solr :)
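For example, the wiki's update-command syntax leans on exactly that legality (a sketch with made-up documents):

    {
      "add":    { "doc": { "id": "1" } },
      "add":    { "doc": { "id": "2" } },
      "delete": { "id": "3" }
    }

Many strict parsers silently keep only the last value for a repeated key, which is why clients built on them trip over this format even though it is legal JSON.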
Re: Using FieldCache in SolrIndexSearcher - crazy idea?
On Tue, Jul 19, 2011 at 3:20 PM, Chris Hostetter hossman_luc...@fucit.org wrote: : Quite probably ... you typically can't assume that a FieldCache can be : constructed for *any* field, but it should be a safe assumption for the : uniqueKey field, so for that initial request of the multiphase distributed : search it's quite possible it would speed things up. : : Ah, thanks Hoss - I had meant to respond to the original email, but : then I lost track of it. : : Via pseudo-fields, we actually already have the ability to retrieve : values via FieldCache. : fl=id:{!func}id isn't that kind of orthogonal to the question though? ... a user can use the new pseudo-field functionality to request values from the FieldCache instead of stored fields, but specifically in the case of distributed search, when the first request is only asking for the uniqueKey values and scores, shouldn't that use the FieldCache to get those values? (w/o the user needing to jump through hoops in how the request is made/configured) Well, I was pointing out that distributed search could be easily modified to use the field-cache by changing id to id:{!func}id But I'm not sure we should do that by default - the memory of a full fieldCache entry is non-trivial for some people. Using a CSF id field would be better I think (the type where it doesn't populate a fieldcache entry). -Yonik http://www.lucidimagination.com
Re: Using functions in fq
On Tue, Jul 19, 2011 at 6:49 PM, solr nps solr...@gmail.com wrote: My documents have two prices, retail_price and current_price. I want to get products which have a sale of x%, where x is dynamic and can be specified by the user. I was trying to achieve this by using fq. If I want all sony tv's that are at least 20% off, I want to write something like q=sony tv&fq=current_price:[0 TO product(retail_price,0.80)] This does not work, as a function is not expected in fq. How else can I achieve this? The frange query parser may do what you want. http://www.lucidimagination.com/blog/2009/07/06/ranges-over-functions-in-solr-14/ fq={!frange l=0 u=0.8}div(current_price, retail_price) -Yonik http://www.lucidimagination.com
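The same filter from SolrJ, as a minimal sketch (field names as in the thread; addFilterQuery just populates fq):

import org.apache.solr.client.solrj.SolrQuery;

public class SaleQuery {
  // builds "at least 20% off": current_price/retail_price must fall in [0, 0.8]
  public static SolrQuery atLeast20PercentOff() {
    SolrQuery q = new SolrQuery("sony tv");
    q.addFilterQuery("{!frange l=0 u=0.8}div(current_price,retail_price)");
    return q;
  }
}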
Re: defType argument weirdness
On Tue, Jul 19, 2011 at 1:25 PM, Naomi Dushay ndus...@stanford.edu wrote: Regardless, I thought that defType=dismax&q=*:* is supposed to be equivalent to q={!defType=dismax}*:* and also equivalent to q={!dismax}*:* Not quite - there is a very subtle distinction. {!dismax} is short for {!type=dismax}, the type of the actual query, and this may not be overridden. The defType local param is only the default type for sub-queries (as opposed to the current query). It's useful in conjunction with the query or nested query qparser: http://lucene.apache.org/solr/api/org/apache/solr/search/NestedQParserPlugin.html -Yonik http://www.lucidimagination.com
Re: NRT and commit behavior
On Mon, Jul 18, 2011 at 10:53 AM, Nicholas Chase nch...@earthlink.net wrote: Very glad to hear that NRT is finally here! But my question is this: will things still come to a standstill during a commit? New updates can now proceed in parallel with a commit, and searches have always been completely asynchronous w.r.t. commits. -Yonik http://www.lucidimagination.com
Re: Join performance?
On Mon, Jul 18, 2011 at 12:48 PM, Kanduru, Ajay (NIH/NLM/LHC) [C] akand...@mail.nih.gov wrote: I am trying to optimize performance of solr with our collection. The collection has 208M records with index size of about 80GB. The machine has 16GB and I am allocating about 14GB to solr. I am using a self join statement in the filter query like this: q=(general search term)&fq={!join from=join_field to=join_field}(field1:(field1 search term) AND field2:(field2 search term) AND field3:(field3 search term)) ... Field definitions: join_field: string type (has ~27K terms) field1: text type field2: double type field3: string type The response time of fq with join is about ten times that of fq without join (~10 sec vs ~1 sec). Is this something on expected lines? Yep... the initial join implementation is O(nterms), so it's expected to be slow when the number of unique terms is high. Given your index size, I would have almost expected it to be slower! As with faceting, I expect there to be other implementations in the future, but nothing right now... -Yonik http://www.lucidimagination.com In general what parameters, if any, can be tweaked? The intention is to use multiple such filter queries, hence the need for optimization. Sharding and more horse power are obvious solutions, but I am more interested in optimizing for a given host and a given data collection. Appreciate any insight in this regard. -Ajay
Re: Solr search starting with 1 character spin endlessly
On Mon, Jul 18, 2011 at 3:44 PM, Timothy Tagge tplimi...@gmail.com wrote: Solr version: 1.4.1 I'm having some trouble with certain queries run against my Solr index. When a query starts with a single letter followed by a space, followed by another search term, the query runs endlessly and never comes back. An example problem query string... /customer/select/?q=name%3At+j+reynolds&version=2.2&start=0&rows=10&indent=on However, if I switch the order of the search values, putting the longer search term before the single character, I get quick, accurate results /customer/select/?q=name%3AReynolds+T+J&version=2.2&start=0&rows=10&indent=on Note that a query of name:t j reynolds is actually equivalent to name:t default_field:j default_field:reynolds You probably want a query of name:"t j reynolds" or name:(t j reynolds) The query probably doesn't hang, but may just take a long time if you have a big index, or if you don't have enough RAM and the default field isn't one that is normally searched (causing much real disk IO to satisfy the query). -Yonik http://www.lucidimagination.com I've defined my name field as text.
<field name="name" type="text" indexed="true" stored="true" required="true" />
Where text is defined as
<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="customer-synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="customer-synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>
  </analyzer>
</fieldType>
Am I making a simple mistake somewhere? Thanks for your help. Tim T.
Re: Document IDs instead of count for facets?
On Sun, Jul 17, 2011 at 10:38 AM, Jeff Schmidt j...@535consulting.com wrote: I don't want to query for a particular facet value, but rather have Solr do a grouping of facet values. I'm not sure about the appropriate nomenclature there. But, I have a multi-valued field named process that can have values such as catalysis, activation, inhibition, expression, modification, reaction etc. About 100K documents are indexed where this field may have none, one, or more of these processes. When the client makes a request, I need to tell it that for the process catalysis, refer to documents 1,5,6,8,32 etc., and for modification, documents 43545,22,2134, etc. This sounds like grouping: http://wiki.apache.org/solr/FieldCollapsing Unfortunately it only works on single-valued fields, and you can't sort based on number of matches either. The closest you can get today is to issue 2 requests... the first a faceting request to get the top constraints, and then a second that uses group.query for each constraint you are interested in. -Yonik http://www.lucidimagination.com
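A sketch of that two-request approach with SolrJ; the process field comes from the thread, while the server URL and the rest are illustrative:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.FacetField;
import org.apache.solr.client.solrj.response.QueryResponse;

public class ProcessBuckets {
  public static void main(String[] args) throws Exception {
    SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");

    // Request 1: get the top constraints for the multi-valued "process" field.
    SolrQuery facets = new SolrQuery("*:*");
    facets.setRows(0);
    facets.setFacet(true);
    facets.addFacetField("process");
    QueryResponse frsp = server.query(facets);

    // Request 2: one group.query per constraint of interest; each group's
    // doclist then holds the document ids for that process value.
    SolrQuery groups = new SolrQuery("*:*");
    groups.setFields("id");
    groups.set("group", true);
    for (FacetField.Count c : frsp.getFacetFields().get(0).getValues()) {
      groups.add("group.query", "process:\"" + c.getName() + "\"");
    }
    System.out.println(server.query(groups).getResponse().get("grouped"));
  }
}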
Re: return distance in geo spatial query
On Thu, Jul 14, 2011 at 8:42 AM, Zoltan Altfatter altfatt...@gmail.com wrote: Would be interested in the status of the development in returning the distance in a spatial query? This is a feature in trunk (pseudo-fields). For example: fl=id,score,geodist() -Yonik http://www.lucidimagination.com
Re: What's the fq= syntax for NumericRangeFilter?
Something is wrong with your indexing. Is wc an indexed field? If not, change it so it is, then re-index your data. If so, I'd recommend starting with the example data and filter for something like popularity:[6 TO 10] to convince yourself it works, then figuring out what you did differently in your schema/data. -Yonik http://www.lucidimagination.com On Sat, Jul 9, 2011 at 10:50 AM, Gabriele Kahlout gabri...@mysimpatico.com wrote: http://localhost:8080/solr/select?indent=on&version=2.2&q=*%3A*&fq=wc%3A%5B255+TO+257%5D&start=0&rows=10&fl=*%2Cscore&qt=&wt=xml&explainOther=&hl.fl= The toString of the request: {explainOther=&fl=*,score&indent=on&start=0&q=*:*&hl.fl=&qt=&wt=xml&fq=wc:[255+TO+257]&rows=1&version=2.2} Even when the FilterQuery is constructed in Java it doesn't work (I get results that ignore the filter query completely). On Sat, Jul 9, 2011 at 3:40 PM, Ahmet Arslan iori...@yahoo.com wrote: I don't get it to work! If I specify no fq I get the first result with <int name="wc">256</int> With wc:[255 TO 257] (fq=wc%3A%5B255+TO+257%5D) nothing comes out. If you give us the full URL you are using, it can be helpful. Correct syntax is fq=wc:[255 TO 257] You can use more than one fq in a request. -- Regards, K. Gabriele --- unchanged since 20/9/10 --- P.S. If the subject contains [LON] or the addressee acknowledges the receipt within 48 hours then I don't resend the email. subject(this) ∈ L(LON*) ∨ ∃x. (x ∈ MyInbox ∧ Acknowledges(x, this) ∧ time(x) < Now + 48h) ⇒ ¬resend(I, this). If an email is sent by a sender that is not a trusted contact or the email does not contain a valid code then the email is not received. A valid code starts with a hyphen and ends with X. ∀x. x ∈ MyInbox ⇒ from(x) ∈ MySafeSenderList ∨ (∃y. y ∈ subject(x) ∧ y ∈ L(-[a-z]+[0-9]X)).
Re: Join and Range Queries
On Sat, Jul 9, 2011 at 8:04 PM, Lance Norskog goks...@gmail.com wrote: Does the Join feature work with Range queries? Not in any generic manner - joins are based on exact matches of indexed tokens only. But if you wanted something specific enough like same year, then you could index that year for each document and do the join on that (it would actually be a self join). You could also get other resolutions by, say, indexing months...
doc1: description:... date:May close_dates:Apr May Jun
doc2: description:... date:Jun close_dates:May Jun Jul
Then to find other events within 1 month of the selected events: {!join from=date to=close_dates}description:octoberfest Or to find other events within 2 months: {!join from=close_dates to=close_dates}description:octoberfest -Yonik http://www.lucidimagination.com Given a time series of events stored as documents with time ranges, is it possible to do a search that finds certain events, and then add other documents whose time ranges overlap? -- Lance Norskog goks...@gmail.com
Re: Exception when using result grouping and sorting by geodist() with Solr 3.3
On Fri, Jul 8, 2011 at 4:11 AM, Thomas Heigl tho...@umschalt.com wrote: How should I proceed with this problem? Should I create a JIRA issue or should I cross-post on the dev mailing list? Any suggestions? Yes, this definitely sounds like a bug in the 3.3 grouping (looks like it forgets to weight the sorts). Could you open a JIRA issue? -Yonik http://www.lucidimagination.com
Re: Is solrj 3.3.0 ready for field collapsing?
On Mon, Jul 4, 2011 at 11:54 AM, Per Newgro per.new...@gmx.ch wrote: I've tried to add the params for group=true and group.field=myfield by using the SolrQuery. But the result is null. Do I have to configure something? In the wiki section on field collapsing I couldn't find anything. No specific (type-safe) support for grouping is in SolrJ currently. But you should still have access to the complete generic solr response via SolrJ regardless (i.e. use getResponse()) -Yonik http://www.lucidimagination.com
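Since there are no typed accessors yet, walking the generic NamedList is the way to go. A sketch (the field name myfield is from the thread; it's worth printing rsp.getResponse() once to confirm the exact shape on your version):

import java.util.List;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocumentList;
import org.apache.solr.common.util.NamedList;

public class GroupedResults {
  public static void print(SolrServer server) throws Exception {
    SolrQuery q = new SolrQuery("*:*");
    q.set("group", true);
    q.set("group.field", "myfield");
    QueryResponse rsp = server.query(q);

    // grouped -> myfield -> groups -> [ { groupValue, doclist }, ... ]
    NamedList<?> grouped = (NamedList<?>) rsp.getResponse().get("grouped");
    NamedList<?> byField = (NamedList<?>) grouped.get("myfield");
    List<?> groups = (List<?>) byField.get("groups");
    for (Object g : groups) {
      NamedList<?> group = (NamedList<?>) g;
      SolrDocumentList docs = (SolrDocumentList) group.get("doclist");
      System.out.println(group.get("groupValue") + ": " + docs.getNumFound());
    }
  }
}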
Re: Using FieldCache in SolrIndexSearcher - crazy idea?
On Tue, Jul 5, 2011 at 5:13 PM, Chris Hostetter hossman_luc...@fucit.org wrote: : Correct me if I am wrong: In a standard distributed search with : QueryComponent, the first query sent to the shards asks for : fl=myUniqueKey or fl=myUniqueKey,score. When the response is being : generated to send back to the coordinator, SolrIndexSearcher.doc(int i, : Set<String> fields) is called for each document. As I understand it, : this will read each document from the index _on disk_ and retrieve the : myUniqueKey field value for each document. : : My idea is to have a FieldCache for the myUniqueKey field in : SolrIndexSearcher (or somewhere else?) that would be used in cases where : the only field that needs to be retrieved is myUniqueKey. Is this : something that would improve performance? Quite probably ... you typically can't assume that a FieldCache can be constructed for *any* field, but it should be a safe assumption for the uniqueKey field, so for that initial request of the multiphase distributed search it's quite possible it would speed things up. Ah, thanks Hoss - I had meant to respond to the original email, but then I lost track of it. Via pseudo-fields, we actually already have the ability to retrieve values via FieldCache. fl=id:{!func}id But using CSF would probably be better here - no memory overhead for the FieldCache entry. -Yonik http://www.lucidimagination.com if you want to try this and report back results, i'm sure a lot of people would be interested in a patch ... i would guess the best place to make the change would be in the QueryComponent so that it used the FieldCache (probably best to do it via getValueSource() on the uniqueKey's SchemaField) to put the ids in the response instead of using a SolrDocList. Hmm, actually... there's no reason why this kind of optimization would need to be specific to distributed queries, it could be done by the ResponseWriters directly -- if the field list they are being asked to return only contains the uniqueKeyField and computed values (like score) then don't bother calling SolrIndexSearcher.doc at all ... the only hitch is that with distributed search and using function values as pseudo fields and what not there are more places calling SolrIndexSearcher.doc than there used to be ... so maybe putting this change directly into SolrIndexSearcher.doc would make the most sense? -Hoss
Re: Custom Cache cleared after a commit?
On Mon, Jul 4, 2011 at 2:07 AM, arian487 akarb...@tagged.com wrote: I guess I'll have to use something other than SolrCache to get what I want then. Or I could use SolrCache and just change the code (I've already done so much of this anyways...). Anyways thanks for the reply. You can specify a regenerator for your cache that examines items in the old cache and pre-populates the new cache when a commit happens. -Yonik http://www.lucidimagination.com
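A minimal regenerator sketch; the class name is hypothetical, and it would be wired in via the regenerator attribute on the cache declaration in solrconfig.xml:

import java.io.IOException;
import org.apache.solr.search.CacheRegenerator;
import org.apache.solr.search.SolrCache;
import org.apache.solr.search.SolrIndexSearcher;

public class MyRegenerator implements CacheRegenerator {
  // Called once per old-cache entry while the new searcher warms.
  public boolean regenerateItem(SolrIndexSearcher newSearcher, SolrCache newCache,
                                SolrCache oldCache, Object oldKey, Object oldVal)
      throws IOException {
    // Recompute the value for oldKey against newSearcher; copying the old
    // value verbatim (as below) is only safe if it doesn't depend on index contents.
    newCache.put(oldKey, oldVal);
    return true; // returning false aborts regeneration
  }
}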
Re: Custom Cache cleared after a commit?
On Sun, Jul 3, 2011 at 10:52 PM, arian487 akarb...@tagged.com wrote: I know the queryResultCache and such live only until a commit happens, but I'm wondering if custom caches are like this as well? I'd actually rather have a custom cache which is not cleared at all. That's not currently possible. The nature of Solr's caches is that they are completely transparent - it doesn't matter if a cache is used or not, the response should always be the same. This is analogous to caching the fact that 2*2 = 4. Put another way, Solr's caches are only for increasing request throughput, and should not affect what response a client receives. -Yonik http://www.lucidimagination.com
Re: Solr 3.2 filter cache warming taking longer than 1.4.1
OK, I tried a quick test of 1.4.1 vs 3x on optimized indexes (unoptimized had different numbers of segments so I didn't try that). 3x (as of today) was 28% faster at a large filter query (300 terms in one big disjunction, with each term matching ~1000 docs). -Yonik http://www.lucidimagination.com On Thu, Jun 30, 2011 at 3:30 PM, Shawn Heisey s...@elyograg.org wrote: On 6/29/2011 10:16 PM, Shawn Heisey wrote: I was thinking perhaps I might actually decrease the termIndexInterval value below the default of 128. I know from reading the Hathi Trust blog that memory usage for the tii file is much more than the size of the file would indicate, but if I increase it from 13MB to 26MB, it probably would still be OK. Decreasing the termIndexInterval to 64 almost doubled the tii file size, as expected. It made the filterCache warming much faster, but made the queryResultCache warming very very slow. Regular queries also seem like they're slower. I am trying again with 256. I may go back to the default before I'm done. I'm guessing that a lot of trial and error was put into choosing the default value. It's been fun having a newer index available on my backup servers. I've been able to do a lot of trials, learned a lot of things that don't work and a few that do. I might do some experiments with trunk once I've moved off 1.4.1. Thanks, Shawn
Re: pagination and groups
2011/7/1 Tomás Fernández Löbbe tomasflo...@gmail.com: I'm not sure I understand what you want to do. To paginate with groups you can use start and rows as with ungrouped queries. with group.ngroups (Something I found a couple of days ago) you can show the total number of groups. group.limit tells Solr how many (max) documents you want to see for each group. Right - just be aware that requesting the total number of groups (via group.ngroups) is pretty memory and resource intensive - that's why there is a separate option for it. -Yonik http://www.lucidimagination.com
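Putting those parameters together, a paginated grouped request looks something like this (the field name is illustrative); start/rows page through the groups while group.limit caps the documents shown inside each group:

...&q=foo&group=true&group.field=category&group.limit=3&group.ngroups=true&start=20&rows=10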
Re: pagination and groups
On Sat, Jul 2, 2011 at 7:34 PM, Benson Margulies bimargul...@gmail.com wrote: Hey, I don't suppose you could easily tell me the rev in which ngroups arrived? 1137037 I believe. Grouping originated in Solr, was refactored to a shared lucene/solr module, including the ability to get the total number of groups, and then Solr's implementation was cut over to that. Also, how does ngroups compare to the 'matches' value inside each group? The unit for matches is currently documents, while the unit for ngroups is groups. -Yonik http://www.lucidimagination.com
Re: JOIN, query on the parent?
On Thu, Jun 30, 2011 at 6:19 PM, Ryan McKinley ryan...@gmail.com wrote: Hello- I'm looking for a way to find all the links from a set of results. Consider: doc id:1 type:X link:a link:b /doc doc id:2 type:X link:a link:c /doc doc id:3 type:Y link:a /doc Is there a way to search for all the links from stuff of type X -- in this case (a,b,c) Do the links point to other documents somehow? Let's assume that there are documents with ids of a,b,c fq={!join from=link to=id}type:X Basically, you start with the set of documents that match type:X, then follow from link to id to arrive at the new set of documents. -Yonik http://www.lucidimagination.com
Re: Solr 3.2 filter cache warming taking longer than 1.4.1
Hmmm, you could comment out the query and filter caches on both 1.4.1 and 3.2 and then run some of the queries to see if you can figure out which are slower? Do any of the queries have stopwords in fields where you now index those? If so, that could entirely account for the difference. -Yonik http://www.lucidimagination.com On Wed, Jun 29, 2011 at 10:59 AM, Shawn Heisey s...@elyograg.org wrote: I have noticed a significant difference in filter cache warming times on my shards between 3.2 and 1.4.1. What can I do to troubleshoot this? Please let me know what additional information you might need to look deeper. I know this isn't enough. It takes about 3 seconds to do an autowarm count of 8 on 1.4.1 and 10-15 seconds to do an autowarm count of 4 on 3.2. The only explicit warming query is *:*, sorted descending by post_date, a tlong field containing a UNIX timestamp, precisionStep 16. The indexes are not entirely identical, but the new one did evolve from the old one. Perhaps one of the experts might spot something that makes for much slower filter cache warming, or some way to look deeper if this seems wrong? Is there a way to see the search URL bits that populated the cache? Index differences: The new index has four extra small fields, is no longer removing stopwords, and has omitTermFreqAndPositions enabled on a significant number of fields. Most of the fields are tokenized text, and now more than half of those don't have tf and tp enabled. Naturally the largest text field where most of the matches happen still does have them enabled. To increase reindex speed, the new index has a termIndexInterval of 1024; the old one is at the default of 128. In terms of raw size, the new index is less than one percent larger than the old one. The old shards average out to 17.22GB, the new ones to 17.41GB. Here's an overview of the differences for each type of file (comparing the huge optimized segment only, not the handful of tiny ones created since) on the index with the largest size gap, old value listed first:
fdt: 6317180127/6055634923 (4.1% decrease)
fdx: 76447972/75647412 (1% decrease)
fnm: 382/338 (44 bytes! woohoo!)
frq: 2828400926/2873249038 (1.5% increase)
nrm: 28367782/38223988 (35% increase)
prx: 2449154203/2684249069 (9.5% increase)
tii: 1686298/13329832 (790% increase)
tis: 923045932/999294109 (8% increase)
tvd: 18910972/19111840 (1% increase)
tvf: 5867309063/5640332282 (3.9% decrease)
tvx: 151294820/152895940 (1% increase)
The tii and nrm files are the only ones that saw a significant size increase, but the tii file is MUCH bigger. Thanks, Shawn
Re: Solr just 'hangs' under load test - ideas?
Can you get a thread dump to see what is hanging? -Yonik http://www.lucidimagination.com On Wed, Jun 29, 2011 at 11:45 AM, Bob Sandiford bob.sandif...@sirsidynix.com wrote: Hi, all. I'm hoping someone has some thoughts here. We're running Solr 3.1 (with the patch for SolrQueryParser.java to not do the getLuceneVersion() calls, but use luceneMatchVersion directly). We're running in a Tomcat instance, 64 bit Java. CATALINA_OPTS are: -Xmx7168m -Xms7168m -XX:MaxPermSize=256M We're running 2 Solr cores, with the same schema. We use SolrJ to run our searches from a Java app running in JBoss. JBoss, Tomcat, and the Solr Index folders are all on the same server. In case it's relevant, we're using JMeter as a load test harness. We're running on Solaris, a 16 processor box with 48GB physical memory. I've run a successful load test at a 100 user load (at that rate there are about 5-10 solr searches / second), and solr search responses were coming in under 100ms. When I tried to ramp up, as far as I can tell, Solr is just hanging. (We have some logging statements around the SolrJ calls - just before, we log how long our query construction takes, then we run the SolrJ query and log the search times. We're getting a number of the query construction logs, but no corresponding search time logs). Symptoms: The Tomcat and JBoss processes show as well under 1% CPU, and they are still the top processes. CPU states show around 99% idle. RES usage for the two Java processes around 3GB each. LWP under 120 for each. STATE just shows as sleep. JBoss is still 'alive', as I can get into a piece of software that talks to our JBoss app to get data. We set things up to use log4j logging for Solr - the log isn't showing any errors or exceptions. We're not indexing - just searching. Back in January, we did load testing on a prototype, and had no problems (though that was Solr 1.4 at the time). It ramped up beautifully - bottlenecks were our apps, not Solr. What I'm benchmarking now is a descendant of that prototype - a bit more complex on searches and more fields in the schema, but same basic search logic as far as SolrJ usage. Any ideas? What else to look at? Ringing any bells? I can send more details if anyone wants specifics... Bob Sandiford | Lead Software Engineer | SirsiDynix P: 800.288.8020 X6943 | bob.sandif...@sirsidynix.com www.sirsidynix.com
Re: Solr 3.2 filter cache warming taking longer than 1.4.1
On Wed, Jun 29, 2011 at 1:43 PM, Shawn Heisey s...@elyograg.org wrote: Just now, three of the six shards had documents deleted, and they took 29.07, 27.57, and 28.66 seconds to warm. The 1.4.1 counterpart to the 29.07 second one only took 4.78 seconds, and it did twice as many autowarm queries. Can you post the logs at the INFO level that covers the warming period? -Yonik http://www.lucidimagination.com
Re: conditionally update document on unique id
On Wed, Jun 29, 2011 at 4:32 PM, eks dev eks...@googlemail.com wrote: req.getSearcher().getFirstMatch(t) != -1; Yep, this is currently the fastest option we have. -Yonik http://www.lucidimagination.com
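For context, a sketch of how that check might sit in custom code that has a SolrQueryRequest available; the field name and helper class are hypothetical:

import java.io.IOException;
import org.apache.lucene.index.Term;
import org.apache.solr.request.SolrQueryRequest;

public class DocExists {
  // getFirstMatch returns an internal docid, or -1 if no document has the term
  public static boolean exists(SolrQueryRequest req, String id) throws IOException {
    return req.getSearcher().getFirstMatch(new Term("id", id)) != -1;
  }
}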
Re: Solr 3.2 filter cache warming taking longer than 1.4.1
On Wed, Jun 29, 2011 at 3:28 PM, Yonik Seeley yo...@lucidimagination.com wrote: On Wed, Jun 29, 2011 at 1:43 PM, Shawn Heisey s...@elyograg.org wrote: Just now, three of the six shards had documents deleted, and they took 29.07, 27.57, and 28.66 seconds to warm. The 1.4.1 counterpart to the 29.07 second one only took 4.78 seconds, and it did twice as many autowarm queries. Can you post the logs at the INFO level that covers the warming period? OK, your filter queries have hundreds of terms in them (and that means hundreds of term lookups, which use the term index). Thus, your termIndexInterval change is the leading suspect for the slowdown. A termIndexInterval of 1024 means that a term lookup will seek to the closest 1024th term and then call next() until the desired term is found. Hence instead of calling next() an average of 64 times internally, it's now 512 times. Of course there is still a mystery about why your tii (which is the term index) would be so much bigger instead of smaller... -Yonik http://www.lucidimagination.com
Re: multiple spatial values
On Sat, Jun 25, 2011 at 5:56 AM, marthinal jm.rodriguez.ve...@gmail.com wrote: sfield, pt and d can all be specified directly in the spatial functions/filters too, and that will override the global params. Unfortunately one must currently use lucene query syntax to do an OR. It just makes it look a bit messier. q=_query_:"{!geofilt}" _query_:"{!geofilt sfield=location_2}" -Yonik http://www.lucidimagination.com @Yonik it seems to work like this, I tried hundreds of other possibilities without success: q={!geofilt sfield=location_1 pt=36.62,-6.23 d=50}&fq={!geofilt sfield=location_2 pt=40.51,-5.91 d=500} Ah, right. I had thought you wanted docs that matched either geofilt (hence OR), not docs that only matched both. -Yonik http://www.lucidimagination.com
Re: multiple spatial values
On Fri, Jun 24, 2011 at 2:11 PM, marthinal jm.rodriguez.ve...@gmail.com wrote: Yonik Seeley-2-2 wrote: On Tue, Sep 21, 2010 at 12:12 PM, dan sutton <danbsut...@gmail.com> wrote: I was looking at the LatLonType and how it might represent multiple lon/lat values ... it looks to me like the lat would go in {latlongfield}_0_LatLon and the long in {latlongfield}_1_LatLon ... how then, if we have multiple lat/long points for a doc, do we choose the correct points when filtering? e.g. if thinking in cartesian coords and we have P1(3,4), P2(6,7) ... x is stored with 3,6 and y with 4,7 ... then how does it ensure we're not erroneously picking (3,7) or (6,4) whilst filtering with the spatial query? That's why it's a single-valued field only for now... don't we have to store both values together? what am i missing here? The problem is that we don't have a way to query both values together, so we must index them separately. The basic LatLonType uses numeric queries on the lat and lon fields separately. -Yonik http://lucenerevolution.org Lucene/Solr Conference, Boston Oct 7-8 I have in my index two different fields like you say Yonik (location_1, location_2) but the problem is when I want to filter results that have d <= 50 for location_1 and d <= 50 for location_2. I really don't know how to build the query ... For example it works perfectly: q={!geofilt}&sfield=location_1&pt=36.62288966,-6.23211272&d=25 but how do I add the sfield location_2? sfield, pt and d can all be specified directly in the spatial functions/filters too, and that will override the global params. Unfortunately one must currently use lucene query syntax to do an OR. It just makes it look a bit messier. q=_query_:"{!geofilt}" _query_:"{!geofilt sfield=location_2}" -Yonik http://www.lucidimagination.com
Re: SEVERE: java.lang.NoSuchFieldError: core Solr branch3.x
I just tried branch_3x and couldn't reproduce this. Looks like maybe there is something wrong with your build, or some old class files left over somewhere being picked up. -Yonik http://www.lucidimagination.com On Wed, Jun 22, 2011 at 10:15 AM, Markus Jelsma markus.jel...@openindex.io wrote: Hi, Today's checkout (Solr Specification Version: 3.4.0.2011.06.22.16.10.08) produces the exception below on start up. The same exception with a very similar stack trace comes when committing an add. Example schema and docs will reproduce the error. Jun 22, 2011 4:11:57 PM org.apache.solr.common.SolrException log SEVERE: java.lang.NoSuchFieldError: core
at org.apache.lucene.index.SegmentTermDocs.<init>(SegmentTermDocs.java:48)
at org.apache.lucene.index.SegmentReader.termDocs(SegmentReader.java:491)
at org.apache.lucene.index.IndexReader.termDocs(IndexReader.java:1005)
at org.apache.lucene.index.SegmentReader.termDocs(SegmentReader.java:484)
at org.apache.solr.search.SolrIndexReader.termDocs(SolrIndexReader.java:321)
at org.apache.lucene.search.TermQuery$TermWeight.scorer(TermQuery.java:101)
at org.apache.lucene.search.BooleanQuery$BooleanWeight.scorer(BooleanQuery.java:298)
at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:524)
at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:320)
at org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:1178)
at org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:1066)
at org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:358)
at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:258)
at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:194)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1368)
at org.apache.solr.core.QuerySenderListener.newSearcher(QuerySenderListener.java:54)
at org.apache.solr.core.SolrCore$3.call(SolrCore.java:1177)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
-- Markus Jelsma - CTO - Openindex http://www.linkedin.com/in/markus17 050-8536620 / 06-50258350
Re: sorting by termfreq on trunk doesn't work?
Thanks for the problem report. It turns out we didn't check for a null pointer when there were no terms in a field for a segment. I've just committed a fix to trunk. -Yonik http://www.lucidimagination.com On Wed, Jun 22, 2011 at 10:28 PM, Jason Toy jason...@gmail.com wrote: I am trying to use sorting by the termfreq function using the trunk code since termfreq was added in the 4.0 code base. I run this query: http://127.0.0.1:8983/solr/select/?q=librarian&sort=termfreq(all_lists_text,librarian)%20desc but I get: HTTP ERROR 500 Problem accessing /solr/select/. Reason: null java.lang.NullPointerException
at org.apache.solr.search.function.TermFreqValueSource$1.reset(TermFreqValueSource.java:53)
at org.apache.solr.search.function.TermFreqValueSource$1.<init>(TermFreqValueSource.java:49)
at org.apache.solr.search.function.TermFreqValueSource.getValues(TermFreqValueSource.java:44)
at org.apache.solr.search.function.ValueSource$ValueSourceComparator.setNextReader(ValueSource.java:188)
at org.apache.lucene.search.TopFieldCollector$OneComparatorNonScoringCollector.setNextReader(TopFieldCollector.java:97)
at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:544)
at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:313)
at org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:1190)
at org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:1078)
at org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:346)
at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:400)
at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:231)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1308)
at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:353)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:248)
at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
at org.mortbay.jetty.Server.handle(Server.java:326)
at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228)
at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
Is termfreq stable and how can I run this query? -- - sent from my mobile 6176064373
Re: Problem with CSV update handler
On Tue, Jun 21, 2011 at 2:15 AM, Rafał Kuć r@solr.pl wrote: Hello! Once again thanks for the response ;) So the solution is to generate the data files once again and either add the space after the doubled encapsulator Maybe... I can't tell if the file is encoded correctly or not since I don't know what the decoded values are supposed to be from your example. -Yonik http://www.lucidimagination.com or change the encapsulator to a character that does not occur in the field values (of course only for the fields that will be split). -- Regards, Rafał Kuć http://solr.pl Multi-valued CSV fields are double encoded. We start with: aaa bbbccc' Then decoding one level, we get: aaa bbbccc Decoding again to get individual values results in a decode error because the encapsulator appears unescaped in the middle of the second value (i.e. invalid CSV). One easier way to fix this is to use a different encapsulator for the sub-values of a multi-valued field by adding f.title.encapsulator=%27 (a single quote char) But I can't really tell you exactly how to encode or specify options to the CSV loader when I don't know what the actual values should be once aaa bbbccc' is decoded. -Yonik http://www.lucidimagination.com On Mon, Jun 20, 2011 at 5:46 PM, Rafał Kuć r@solr.pl wrote: Hi! Yonik, thanks for the reply. I just realized that the example I gave was not full - the error is returned by Solr only when the field is multivalued and the values in the fields are split. For example, the following curl command gives me the mentioned error: curl 'http://localhost:8983/solr/update/csv?fieldnames=id,title&commit=true&encapsulator=%22&f.title.split=true&f.title.separator=%20' -H 'Content-type:text/plain' -d '1,aaa bbbccc' while the following is executed without any problem: curl 'http://localhost:8983/solr/update/csv?fieldnames=id,title&commit=true&encapsulator=%22&f.title.split=true&f.title.separator=%20' -H 'Content-type:text/plain' -d '1,aaa bbb ccc' The only difference between those two is the additional space character in between bbb and ccc in the second example. Am I doing something wrong ? ;) -- Regards, Rafał Kuć http://solr.pl This works fine for me: curl http://localhost:8983/solr/update/csv -H 'Content-type:text/plain' -d 'id,name 1,aaa bbb ccc' -Yonik http://www.lucidimagination.com On Mon, Jun 20, 2011 at 3:17 PM, Rafał Kuć r@solr.pl wrote: Hello! I have a question about the CSV update handler. Let's say I have the following file sent to the CSV update handler using curl: id,name 1,aaa bbbccc It throws an error, saying that: Error 400 java.io.IOException: (line 0) invalid char between encapsulated token end delimiter If I change the contents of the file to: id,name 1,aaa bbb ccc it works without a problem. Has anyone encountered this? Is it known behavior? -- Regards, Rafał Kuć
Re: Problem with CSV update handler
This works fine for me: curl http://localhost:8983/solr/update/csv -H 'Content-type:text/plain' -d 'id,name 1,aaa bbb ccc' -Yonik http://www.lucidimagination.com On Mon, Jun 20, 2011 at 3:17 PM, Rafał Kuć r@solr.pl wrote: Hello! I have a question about the CSV update handler. Let's say I have the following file sent to the CSV update handler using curl: id,name 1,aaa bbbccc It throws an error, saying that: Error 400 java.io.IOException: (line 0) invalid char between encapsulated token end delimiter If I change the contents of the file to: id,name 1,aaa bbb ccc it works without a problem. Has anyone encountered this? Is it known behavior? -- Regards, Rafał Kuć
Re: Problem with CSV update handler
Multi-valued CSV fields are double encoded. We start with: aaa bbbccc' Then decoding one level, we get: aaa bbbccc Decoding again to get individual values results in a decode error because the encapsulator appears unescaped in the middle of the second value (i.e. invalid CSV). One easier way to fix this is to use a different encapsulator for the sub-values of a multi-valued field by adding f.title.encapsulator=%27 (a single quote char) But I can't really tell you exactly how to encode or specify options to the CSV loader when I don't know what the actual values should be once aaa bbbccc' is decoded. -Yonik http://www.lucidimagination.com On Mon, Jun 20, 2011 at 5:46 PM, Rafał Kuć r@solr.pl wrote: Hi! Yonik, thanks for the reply. I just realized that the example I gave was not full - the error is returned by Solr only when the field is multivalued and the values in the fields are split. For example, the following curl command gives me the mentioned error: curl 'http://localhost:8983/solr/update/csv?fieldnames=id,title&commit=true&encapsulator=%22&f.title.split=true&f.title.separator=%20' -H 'Content-type:text/plain' -d '1,aaa bbbccc' while the following is executed without any problem: curl 'http://localhost:8983/solr/update/csv?fieldnames=id,title&commit=true&encapsulator=%22&f.title.split=true&f.title.separator=%20' -H 'Content-type:text/plain' -d '1,aaa bbb ccc' The only difference between those two is the additional space character in between bbb and ccc in the second example. Am I doing something wrong ? ;) -- Regards, Rafał Kuć http://solr.pl This works fine for me: curl http://localhost:8983/solr/update/csv -H 'Content-type:text/plain' -d 'id,name 1,aaa bbb ccc' -Yonik http://www.lucidimagination.com On Mon, Jun 20, 2011 at 3:17 PM, Rafał Kuć r@solr.pl wrote: Hello! I have a question about the CSV update handler. Let's say I have the following file sent to the CSV update handler using curl: id,name 1,aaa bbbccc It throws an error, saying that: Error 400 java.io.IOException: (line 0) invalid char between encapsulated token end delimiter If I change the contents of the file to: id,name 1,aaa bbb ccc it works without a problem. Has anyone encountered this? Is it known behavior? -- Regards, Rafał Kuć
Re: Update JSON Invalid
On Mon, Jun 20, 2011 at 11:25 PM, Shawn Heisey elyog...@elyograg.org wrote: On 6/20/2011 8:08 PM, entdeveloper wrote: Technically, yes, it's valid json, but most libraries treat the json objects as maps, and with multiple add elements as the keys, you cannot properly deserialize. As an example, try putting this into jsonlint.com, and notice it trims off one of the docs: { add: {doc: {id : TestDoc1, title : test1} }, add: {doc: {id : TestDoc2, title : another test} } } Is there something I'm just not seeing? Should we consider cleaning up this format, possibly using some json arrays so that it makes more sense from a json perspective? This was brought up recently and should now be fixed in Solr 3.2. https://issues.apache.org/jira/browse/SOLR-2496 Thanks for the reminder, we obviously need to update the docs! -Yonik
Re: SOlR -- Out of Memory exception
On Fri, Jun 17, 2011 at 1:30 AM, pravesh suyalprav...@yahoo.com wrote: If you are sending whole CSV in a single HTTP request using curl, why not consider sending it in smaller chunks? Smaller chunks should not matter - Solr streams from the input (i.e. the whole thing is not buffered in memory). It could be related to autoCommit. Commits may be stacking up faster than can be handled. I'd recommend getting rid of autocommit if possible, or at a minimum get rid of the maxDocs based autocommit. Incremental updates can use commitWithin to guarantee a time-of-visibility, and bulk updates like this CSV upload normally shouldn't commit until the end. -Yonik http://www.lucidimagination.com
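For the incremental-update side, a sketch of setting commitWithin from SolrJ (setCommitWithin exists on UpdateRequest, though exact availability depends on your SolrJ version; the 10-second window is arbitrary):

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.request.UpdateRequest;
import org.apache.solr.common.SolrInputDocument;

public class IncrementalAdd {
  public static void add(SolrServer server, SolrInputDocument doc) throws Exception {
    UpdateRequest req = new UpdateRequest();
    req.add(doc);
    req.setCommitWithin(10000); // visible within 10s, without autoCommit stacking up
    req.process(server);
  }
}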
Re: problem with the new IndexSearcher when snapinstaller (and commit script) happen
What version of Solr is this? Can you show steps to reproduce w/ the example server and data? -Yonik http://www.lucidimagination.com On Wed, Jun 15, 2011 at 7:25 AM, Marc Sturlese marc.sturl...@gmail.com wrote: Hey there, I've noticed a very odd behaviour with the snapinstaller and commit (using collectionDistribution scripts). The first time I install a new index everything works fine. But when installing a new one, I can't see the new documents. Checking the status page of the core tells me that the index version has changed but numDocs and maxDocs are the same. I have a simple script that gets the version from an index reader and this confirms that that's not true: numDocs and maxDocs are different in both indexes. The index I'm trying to install is a whole new index, generated with mergefactor = 2 and optimized with no compound file. I've tried manually to mv index to index.old and the snapshot.x to index (while tomcat is up) and manually execute: curl 'http://localhost:8080/trovit_solr/coreA/update?commit=true' -H 'Content-Type: text/xml' But the same is happening. Checking the logs I can see that apparently everything is fine. The new searcher is registered and warming is properly done on it. I would think that the problem is with some reference opening the index searcher. But the fact that the indexVersion changes while numDocs and maxDocs don't makes me understand nothing. If I reload the core, numDocs and maxDocs change and everything is fine. Any idea what could be happening here? Thanks in advance.
Re: High 100% CPU usage with SOLR 1.4.1
On Wed, Jun 15, 2011 at 2:21 PM, pravesh suyalprav...@yahoo.com wrote: I would need some help in minimizing the CPU load on the new system. Could NIOFSDirectory possibly contribute to the high CPU? Yes, it's a feature! The CPU is only higher because the threads aren't blocked on IO as much. So the increase in CPU you are seeing is a good thing, not a bad thing (i.e. the number of requests processed in a given number of busy CPU cycles should be greater than or equal to the old release). -Yonik http://www.lucidimagination.com
Re: Huge performance drop in distributed search w/ shards on the same server/container
On Sun, Jun 12, 2011 at 9:10 PM, Johannes Goll johannes.g...@gmail.com wrote: However, Jetty 6.1.2X (shipped with Solr 3.1) sporadically throws Socket connect exceptions when executing distributed searches. Are you using the exact jetty.xml that shipped with the solr example server, or did you make any modifications? -Yonik http://www.lucidimagination.com
Re: how can I return function results in my query?
On Thu, Jun 9, 2011 at 9:23 AM, Jason Toy jason...@gmail.com wrote: I want to be able to run a query like idf(text, 'term') and have that data returned with my search results. I've searched the docs,but I'm unable to find how to do it. Is this possible and how can I do that ? In trunk, there's a very new feature called pseudo-fields where (among other things) you can include the results of arbitrary function queries along with the stored fields for each document. fl=id,idf(text,'term'),termfreq(text,'term') Or if you want to alias the idf call to a different name: fl=id,myidf:idf(text,'term'),mytermfreq:termfreq(text,'term') Of course, in this specific case it's a bit of a waste since idf won't change per document. -Yonik http://www.lucidimagination.com
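From SolrJ, the same fl can be set with setFields; a small sketch using the names from the thread:

import org.apache.solr.client.solrj.SolrQuery;

public class PseudoFields {
  public static SolrQuery build() {
    SolrQuery q = new SolrQuery("term");
    // pseudo-fields: alias function-query results alongside stored fields
    q.setFields("id", "myidf:idf(text,'term')", "mytermfreq:termfreq(text,'term')");
    return q;
  }
}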
Re: how can I return function results in my query?
On Fri, Jun 10, 2011 at 8:31 AM, Markus Jelsma markus.jel...@openindex.io wrote: Nice! Will SOLR-1298 with aliasing also work with an external file field since that can be a source of a function query as well? Haven't tried it, but it definitely should! -Yonik http://www.lucidimagination.com
Re: Edismax sorting help
2011/6/9 Denis Kuzmenok forward...@ukr.net: Hi, everyone. I have fields: text fields: name, title, text boolean field: isflag (true / false) int field: popularity (0 to 9) Now I do this query: defType=edismax start=0 rows=20 fl=id,name q=lg optimus fq= qf=name^3 title text^0.3 sort=score desc pf=name bf=isflag sqrt(popularity) mm=100% debugQuery=on If I do a query like Samsung I want to see first the most relevant results with isflag:true and bigger popularity, but if I do a query like Nokia 6500 and the exact match has isflag:false, then it should be higher because of the exact match. I tried different combinations, but didn't find one that suits me - I only got isflag/popularity sorting or isflag/relevancy sorting working. Multiplicative boosts tend to be more stable... Perhaps try replacing bf=isflag sqrt(popularity) with bq=isflag:true^10 // vary the boost to change how much isflag counts vs the relevancy score of the main query boost=sqrt(popularity) // this will multiply the result by sqrt(popularity)... assumes that every document has a non-zero popularity You could get more creative in trunk where booleans have better support in function queries. -Yonik http://www.lucidimagination.com
Re: Processing/Indexing CSV
On Thu, Jun 9, 2011 at 3:31 PM, Helmut Hoffer von Ankershoffen helmut...@googlemail.com wrote: Hi, there seems to be no way to index CSV using the DataImportHandler. Looking over the features you want, it looks like you're starting from a CSV file (as opposed to CSV stored in a database). Is there a reason that you need to use DIH and can't directly use the CSV loader? http://wiki.apache.org/solr/UpdateCSV -Yonik http://www.lucidimagination.com Using a combination of LineEntityProcessor (http://wiki.apache.org/solr/DataImportHandler#LineEntityProcessor) and RegexTransformer (http://wiki.apache.org/solr/DataImportHandler#RegexTransformer) as proposed in http://robotlibrarian.billdueber.com/an-exercise-in-solr-and-dataimporthandler-hathitrust-data/ is not working for real world CSV files. E.g. many CSV files have double-quotes enclosing some but not all columns - there is no elegant way to segment this using a simple regular expression. As CSV is still very common esp. in E-Commerce scenarios, I propose that Solr provides a CSVEntityProcessor that:
1) Handles the case of CSV files with/without and with some double-quote enclosed columns
2) Allows for a configurable column separator (';', ',', '\t' etc.)
3) Allows for a leading row containing column headings
4) If there is a leading row with column headings, provides a possibility to address columns by their column names and map them to Solr fields (similar to the XPathEntityProcessor)
5) Auto-detects the encoding of the file (UTF-8 etc.)
This would make it A LOT easier to use Solr for E-Commerce scenarios. If there is no such entity processor in the works I will develop one ... So please let me know. Regards
Re: Processing/Indexing CSV
On Thu, Jun 9, 2011 at 4:07 PM, Helmut Hoffer von Ankershoffen helmut...@googlemail.com wrote: Hi, yes, it's about CSV files loaded via HTTP from shops to be fed into a shopping search engine. The CSV Loader cannot map fields (only field values) etc. You can provide your own list of fieldnames and optionally ignore the first line of the CSV file (assuming it contains the field names). http://wiki.apache.org/solr/UpdateCSV#fieldnames -Yonik http://www.lucidimagination.com
Re: Problem with boosting function
The boost qparser should do the trick if you want a multiplicative boost. http://lucene.apache.org/solr/api/org/apache/solr/search/BoostQParserPlugin.html -Yonik http://www.lucidimagination.com On Wed, Jun 8, 2011 at 9:22 AM, Alex Grilo a...@umamao.com wrote: Hi, I'm trying to use the bf parameter in solr queries but I'm having some problems. The context is: I have some topics and an integer weight of popularity (number of users that follow the topic). I'd like to boost the documents according to this weight field, and it changes (users may start following or unfollowing that topic). I thought the best way to do that is adding a bf parameter to the query. First of all I was trying to include it in a query processed by a default SearchHandler. I debugged the results and the scores didn't change. So I tried to change the defType of the SearchHandler to dismax (I didn't add any other field in solrconfig), and queries didn't work anymore. What is the best way to achieve what I want? Do I really need to use a dismax SearchHandler (I read about it, and I don't want to search in multiple fields - I want to search in one field and boost on another one)? Thanks in advance Alex Grilo
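For concreteness, a hedged example of what that might look like for this thread (popularity stands in for the weight field, and the query is illustrative; sum(popularity,1) avoids multiplying relevance by zero for topics nobody follows yet):

q={!boost b=sum(popularity,1)}name:some+query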
Re: Sorting on solr.TextField
On Wed, Jun 8, 2011 at 1:21 PM, Jamie Johnson jej2...@gmail.com wrote: Thanks, exactly what I was looking for. With this new field used just for sorting, is there a way to have it be case insensitive? From the example schema:
<!-- lowercases the entire field value, keeping it as a single token. -->
<fieldType name="lowercase" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
-Yonik http://www.lucidimagination.com
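To keep searching the original field while sorting case-insensitively, the usual pattern is a parallel field of this type fed by a copyField (field names here are illustrative), then sorting with sort=name_sort asc:

<field name="name_sort" type="lowercase" indexed="true" stored="false"/>
<copyField source="name" dest="name_sort"/>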
Re: Solr Cloud Query Question
On Tue, Jun 7, 2011 at 9:35 AM, Jamie Johnson jej2...@gmail.com wrote: I am currently experimenting with the Solr Cloud code on trunk and just had a quick question. Lets say my setup had 3 nodes a, b and c. Node a has 1000 results which meet a particular query, b has 2000 and c has 3000. When executing this query and asking for row 900 what specifically happens? From reading the Distributed Search Wiki I would expect that node a responds with 900, node b responds with 900 and c responds with 900 and the coordinating node is responsible for taking the top scored items and throwing away the rest, is this correct or is there some additional coordination that happens where nodes a, b and c return back an id and a score and the coordinating node makes an additional request to get back the documents for the ids which make up the top list? The latter is correct - the first phase only collects enough information to merge ids from the shards, and then a second phase requests the stored fields, highlighting, etc for the specific docs that will be returned. -Yonik http://www.lucidimagination.com
Re: function queries scope
One way is to use the boost qparser: http://search-lucene.com/jd/solr/org/apache/solr/search/BoostQParserPlugin.html q={!boost b=productValueField}shops in madrid Or you can use the edismax parser, which has a boost parameter that does the same thing: defType=edismax&q=shops in madrid&boost=productValueField -Yonik http://www.lucidimagination.com On Tue, Jun 7, 2011 at 6:53 AM, Marco Martinez mmarti...@paradigmatecnologico.com wrote: Hi, I need to use the function query operations with the score of a given query, but only on the docset that I get from the query, and I don't know if this is possible. Example: q=shops in madrid returns 1 docs with a specific score for each doc but now I need to do something like q=sum(product(2,query(shops in madrid)),productValueField) but this will return all the docs in my index. I know that I can do it via filter queries, e.g. q=sum(product(2,query(shops in madrid)),productValueField)&fq=shops in madrid but this will run the query two times and I don't want this because performance is important to our application. Is there another approach to accomplish that? Thanks in advance, Marco Martínez Bautista http://www.paradigmatecnologico.com Avenida de Europa, 26. Ática 5. 3ª Planta 28224 Pozuelo de Alarcón Tel.: 91 352 59 42
Re: Solr Cloud Query Question
On Tue, Jun 7, 2011 at 1:01 PM, Jamie Johnson jej2...@gmail.com wrote: Thanks Yonik. I have a follow on now, how does Solr ensure consistent results across pages? So for example if we had my 3 theoretical solr instances again and a, b and c each returned 100 documents with the same score and the user only requested 100 documents, how are those 100 documents chosen from the set available from a, b and c if the documents have the same score? Ties within a shard are broken by docid (just like lucene), and ties across different shards are broken by comparing the shard ids... so yes, it's consistent. -Yonik http://www.lucidimagination.com
Re: Question about tokenizing, searching and retrieving results.
On Tue, Jun 7, 2011 at 12:34 PM, Luis Cappa Banda luisca...@gmail.com wrote: *Expression*: A B C D E F G H I As written, this is equivalent to *Expression*: A default_field:B default_field:C default_field:D default_field:E default_field:F default_field:G default_field:H default_field:I Try *Expression*:(A B C D E F G H I) or *Expression*:"A B C D E F G H I" for a phrase query. Oh, and I highly recommend sticking to java identifiers for field names - it will make your life much easier in the future. -Yonik http://www.lucidimagination.com
Re: Feature: skipping caches and info about cache use
On Fri, Jun 3, 2011 at 1:02 PM, Otis Gospodnetic otis_gospodne...@yahoo.com wrote: Is it just me, or would others like things like: * The ability to tell Solr (by passing some URL param?) to skip one or more of its caches and get data from the index Yeah, we've needed this for a long time, and I believe there's a JIRA issue open for it. It really needs to be on a per query basis though... so a localParam that has cache=true/false would be ideal. -Yonik http://www.lucidimagination.com
Re: fq null pointer exception
Dan, this doesn't really have anything to do with your filter on the Status field except that it causes different documents to be selected. The root cause is a schema mismatch with your index. A string field (or so the schema is saying it's a string field) is returning null for a value, which is impossible (null values aren't stored... they are simply missing). This can happen when the field is actually stored as binary (as is the case for numeric fields). So my guess is that a field that was previously a numeric field is now declared to be of type string by the current schema. You can try varying the fl parameter to see what field is causing the issue, or try luke or the luke request handler for a lower-level view of the index. -Yonik http://www.lucidimagination.com On Fri, Jun 3, 2011 at 11:46 AM, dan whelan d...@adicio.com wrote: I am noticing something strange with our recent upgrade to solr 3.1 and want to see if anyone has experienced anything similar. I have a solr.StrField field named Status. The values are Enabled, Disabled, or ''. When I facet on that field I get
Enabled 4409565
Disabled 29185
112
The issue is when I do a filter query. This query works select/?q=*:*&fq=Status:Enabled But when I run this query I get a NPE select/?q=*:*&fq=Status:Disabled Here is part of the stack trace: Problem accessing /solr/global_accounts/select/. Reason: null java.lang.NullPointerException
at org.apache.solr.response.XMLWriter.writePrim(XMLWriter.java:828)
at org.apache.solr.response.XMLWriter.writeStr(XMLWriter.java:686)
at org.apache.solr.schema.StrField.write(StrField.java:49)
at org.apache.solr.schema.SchemaField.write(SchemaField.java:125)
at org.apache.solr.response.XMLWriter.writeDoc(XMLWriter.java:369)
at org.apache.solr.response.XMLWriter$3.writeDocs(XMLWriter.java:545)
at org.apache.solr.response.XMLWriter.writeDocuments(XMLWriter.java:482)
at org.apache.solr.response.XMLWriter.writeDocList(XMLWriter.java:519)
at org.apache.solr.response.XMLWriter.writeVal(XMLWriter.java:582)
at org.apache.solr.response.XMLWriter.writeResponse(XMLWriter.java:131)
at org.apache.solr.response.XMLResponseWriter.write(XMLResponseWriter.java:35)
... Thanks, Dan
Re: Faceting on distance in Solr: how do you generate links that search withing a given range of distance?
On Thu, May 19, 2011 at 6:40 PM, Chris Hostetter hossman_luc...@fucit.org wrote: : It is fairly simple to generate facets for ranges or 'buckets' of : distance in Solr: : http://wiki.apache.org/solr/SpatialSearch#How_to_facet_by_distance. : What isn't described is how to generate the links for these facets any query you specify in a facet.query to generate a constraint count can be specified in an fq to actually apply that constraint. So if you use... facet.query={!frange l=5.001 u=3000}geodist() Hmmm, seems like we could really do with a geofilt() function that's just like geodist() but with the first parameter being distance so we could avoid calculating the distance for every doc. And of course the new exists() method (it's just internal now, but should be exposed via a function) would be false if the doc was outside of the distance. The best we could do out of the box right now is try to utilize the geofilt query and default it to a high number where it doesn't match: facet.query={!frange l=5.001 u=3000}query($gf,1)&gf={!geofilt d=3000} Of course if the lower bound is 0, you can use it directly! facet.query={!geofilt d=3000} -Yonik
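To close the loop on the original question about links: each facet.query used for counting can be applied verbatim as an fq when the user clicks that bucket. For example, something like (distances illustrative; the key localparam just renames the bucket in the response):

facet.query={!geofilt key=d10 d=10}&facet.query={!frange l=10.001 u=3000}query($gf,1)&gf={!geofilt d=3000}

and the clickable link for the first bucket then simply adds fq={!geofilt d=10}, with the same sfield and pt in effect.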
Re: Spatial search with SolrJ 3.1 ? How to
On Thu, May 19, 2011 at 8:52 AM, martin_groenhof martin.groen...@yahoo.com wrote: How do you construct a query in Java for spatial search? Not the default Solr REST interface. It depends on what you are trying to do - a spatial request (as currently implemented in Solr) is typically more than just a query... it can be filtering by a bounding box, filtering by a distance radius, or using a distance (geodist) function query in another way, such as sorting by it or using it as a factor in relevance. -Yonik http://www.lucenerevolution.org -- Lucene/Solr User Conference, May 25-26, San Francisco
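With SolrJ there is no separate spatial API in this era; you set the same parameters you would pass over HTTP. A minimal sketch, assuming an already-constructed SolrServer instance and a location field named store (both made up for illustration):

  import org.apache.solr.client.solrj.SolrQuery;
  import org.apache.solr.client.solrj.response.QueryResponse;

  // Filter to a 5 km radius around a point and sort by distance
  SolrQuery q = new SolrQuery("*:*");
  q.addFilterQuery("{!geofilt}");        // distance-radius filter; reads sfield/pt/d
  q.set("sfield", "store");              // the location field (assumed name)
  q.set("pt", "45.15,-93.85");           // center point
  q.set("d", "5");                       // radius in km
  q.addSortField("geodist()", SolrQuery.ORDER.asc);  // nearest first
  QueryResponse rsp = server.query(q);

Swapping {!geofilt} for {!bbox} would give the bounding-box variant Yonik mentions, with the same sfield/pt/d parameters.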
Re: Faceting: Some questions concerning method:fc
On Thu, May 19, 2011 at 9:56 AM, Erik Fäßler erik.faess...@uni-jena.de wrote: I have a few questions concerning the field cache method for faceting. The wiki says for the enum method: This was the default (and only) method for faceting multi-valued fields prior to Solr 1.4. And for the fc method: This was the default method for single-valued fields prior to Solr 1.4. I just ran into a problem using fc for a field which can have multiple terms per document. The facet counts would be wrong, seemingly only counting the first term in the field of each document. I observed this in Solr 1.4.1 and in 3.1 with the same index. That doesn't sound right... the results should always be identical between facet.method=fc and facet.method=enum. Are you sure you didn't index a multi-valued field and then change the fieldType in the schema to be single-valued? Are you sure the field is indexed the way you think it is? If so, is there an easy way for someone to reproduce what you are seeing? -Yonik http://www.lucenerevolution.org -- Lucene/Solr User Conference, May 25-26, San Francisco
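For reference, the mismatch Yonik describes would arise if documents were indexed under a declaration like the first line below and the schema was later edited to the second without reindexing (the field name is hypothetical):

  <field name="keywords" type="string" indexed="true" stored="true" multiValued="true"/>
  <field name="keywords" type="string" indexed="true" stored="true"/>

The index still holds multiple terms per document, but code that trusts the schema's single-valued claim may only count one of them. Reindexing after any such schema change avoids this class of problem.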
Re: Does every Solr request-response require a running server?
On Wed, May 18, 2011 at 10:50 AM, Gabriele Kahlout gabri...@mysimpatico.com wrote: Hello, I'm wondering if the Solr test framework at the end of the day always runs an embedded/jetty server (which is the only way to interact with Solr, i.e. no web server -- no Solr), or whether the tests interact without one, calling the underlying methods directly? The latter seems to be the case from trying to understand SolrTestCaseJ4. That would be more white-box than otherwise. Solr does either, depending on the test. Most tests start only an embedded Solr server w/ no web server, but others use an embedded Jetty server so one can talk HTTP to it. JettySolrRunner is used for the latter. -Yonik http://www.lucenerevolution.org -- Lucene/Solr User Conference, May 25-26, San Francisco
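A minimal sketch of the embedded (no web server) style Yonik describes, using the SolrTestCaseJ4 helpers; the config/schema file names and the test data are made up:

  import org.apache.solr.SolrTestCaseJ4;
  import org.junit.BeforeClass;
  import org.junit.Test;

  public class EmbeddedExampleTest extends SolrTestCaseJ4 {
    @BeforeClass
    public static void beforeClass() throws Exception {
      initCore("solrconfig.xml", "schema.xml");  // boots an embedded core, no Jetty
    }

    @Test
    public void testAddAndQuery() {
      assertU(adoc("id", "1"));  // index a document straight into the core
      assertU(commit());
      assertQ(req("q", "id:1"), "//result[@numFound='1']");  // query without HTTP
    }
  }

Everything here goes through the update and request handlers directly; a test that needed to exercise the HTTP layer would instead start JettySolrRunner and talk to it with an HTTP client.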
Re: Does every Solr request-response require a running server?
On Wed, May 18, 2011 at 11:14 AM, Gabriele Kahlout gabri...@mysimpatico.com wrote: What is confusing me is the solr server. Is it SolrCore? In what respects is it a 'server'? In my understanding it's the core of the Solr web application which backs the servlet interface, i.e. it's under the servlets, not on top of them. Look at TestHarness - it instantiates a CoreContainer. When running as a webapp in a Jetty server, a dispatch filter (SolrDispatchFilter) is registered that instantiates the CoreContainer. -Yonik http://www.lucenerevolution.org -- Lucene/Solr User Conference, May 25-26, San Francisco
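For anyone wanting to drive that same CoreContainer path outside of the test framework, here is a hedged sketch of the embedded setup in the 1.4/3.x line (paths and core name are placeholders):

  import org.apache.solr.client.solrj.SolrServer;
  import org.apache.solr.client.solrj.embedded.EmbeddedSolrServer;
  import org.apache.solr.core.CoreContainer;

  // Point at a Solr home containing conf/solrconfig.xml and conf/schema.xml
  System.setProperty("solr.solr.home", "/path/to/solr/home");
  CoreContainer.Initializer initializer = new CoreContainer.Initializer();
  CoreContainer coreContainer = initializer.initialize();  // same container the dispatch filter bootstraps
  SolrServer server = new EmbeddedSolrServer(coreContainer, "");  // "" selects the default core

No servlet container is involved; requests go straight to the core, which is why tests that only exercise query/update logic can skip Jetty entirely.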
Re: JSON delete error with latest branch_3x
On Wed, May 18, 2011 at 1:24 PM, Paul Dlug paul.d...@gmail.com wrote: I updated to the latest branch_3x (r1124339) and I'm now getting the error below when trying a delete by query or id. Adding documents with the new format works, as do the commit and optimize commands. Possible regression due to SOLR-2496? curl 'http://localhost:8988/solr/update/json?wt=json' -H 'Content-type:application/json' -d '{"delete":{"query":"*:*"}}' Error 400 meaningless command: delete:query=`*:*`,fromPending=false,fromCommitted=false Problem accessing /solr/update/json. Reason: meaningless command: delete:query=`*:*`,fromPending=false,fromCommitted=false Hmmm, looks like the unit tests must be inadequate for the JSON format. I'll look into it. -Yonik http://www.lucenerevolution.org -- Lucene/Solr User Conference, May 25-26, San Francisco
Re: JSON delete error with latest branch_3x
OK, I just fixed this on branch_3x. Trunk is fine (it was an error in the 3x backport that wasn't caught because the test doesn't go through the complete Solr stack to the update handler). -Yonik http://www.lucenerevolution.org -- Lucene/Solr User Conference, May 25-26, San Francisco On Wed, May 18, 2011 at 1:29 PM, Yonik Seeley yo...@lucidimagination.com wrote: On Wed, May 18, 2011 at 1:24 PM, Paul Dlug paul.d...@gmail.com wrote: I updated to the latest branch_3x (r1124339) and I'm now getting the error below when trying a delete by query or id. Adding documents with the new format works, as do the commit and optimize commands. Possible regression due to SOLR-2496? curl 'http://localhost:8988/solr/update/json?wt=json' -H 'Content-type:application/json' -d '{"delete":{"query":"*:*"}}' Error 400 meaningless command: delete:query=`*:*`,fromPending=false,fromCommitted=false Problem accessing /solr/update/json. Reason: meaningless command: delete:query=`*:*`,fromPending=false,fromCommitted=false Hmmm, looks like the unit tests must be inadequate for the JSON format. I'll look into it. -Yonik http://www.lucenerevolution.org -- Lucene/Solr User Conference, May 25-26, San Francisco
Re: filter cache and negative filter query
On Tue, May 17, 2011 at 6:07 PM, Burton-West, Tom tburt...@umich.edu wrote: If I have a query with a filter query such as: q=art&fq=history and then run a second query q=art&fq=-history, will Solr realize that it can use the cached results of the previous filter query history (in the filter cache)? Yep. You should be able to verify with the filterCache section of the stats admin page. -Yonik http://www.lucenerevolution.org -- Lucene/Solr User Conference, May 25-26, San Francisco
Re: filter cache and negative filter query
On Tue, May 17, 2011 at 6:17 PM, Markus Jelsma markus.jel...@openindex.io wrote: I'm not sure. The filter cache uses your filter as a key, and a negation is a different key. You can check this easily in a controlled environment by issuing these queries and watching the filter cache statistics. Gotta hate crossing emails ;-) Anyway, this goes back to Solr 1.1: 5. SOLR-80: Negative queries are now allowed everywhere. Negative queries are generated and cached as their positive counterpart, speeding generation and generally resulting in smaller sets to cache. Set intersections in SolrIndexSearcher are more efficient, starting with the smallest positive set, subtracting all negative sets, then intersecting with all other positive sets. (yonik) -Yonik http://www.lucenerevolution.org -- Lucene/Solr User Conference, May 25-26, San Francisco If I have a query with a filter query such as: q=art&fq=history and then run a second query q=art&fq=-history, will Solr realize that it can use the cached results of the previous filter query history (in the filter cache), or will it not realize this and have to actually do a second filter query against the index for not history? Tom
Re: lucene parser, negative OR operands
On Tue, May 17, 2011 at 6:57 PM, Jonathan Rochkind rochk...@jhu.edu wrote: (changed subject for this topic). Weird. I'm seeing it wrong myself, and have for a while -- I even wrote some custom pre-processor logic at my app level to work around it. Weird, I dunno. Wait. Queries with -one OR -two return fewer documents than either operand does on its own. This doesn't have to do with Solr's support of pure-negative top-level queries, but it does have to do with a long-standing confusion about how the Lucene query parser works with some of the operators (i.e. it's not really boolean logic). In a Lucene BooleanQuery, clauses are mandatory, optional, or prohibited. -foo OR -bar actually parses to a boolean query with two prohibited clauses... essentially the same as -foo AND -bar. You can see this by adding debugQuery=true to the request. -Yonik http://www.lucenerevolution.org -- Lucene/Solr User Conference, May 25-26, San Francisco
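A sketch of what the parser effectively builds for -foo OR -bar (Lucene 3.x API; the field name "text" is just an assumed default field):

  import org.apache.lucene.index.Term;
  import org.apache.lucene.search.BooleanClause.Occur;
  import org.apache.lucene.search.BooleanQuery;
  import org.apache.lucene.search.TermQuery;

  // Both clauses come out prohibited -- the OR is effectively ignored,
  // so this is the same query -foo AND -bar would produce.
  BooleanQuery bq = new BooleanQuery();
  bq.add(new TermQuery(new Term("text", "foo")), Occur.MUST_NOT);
  bq.add(new TermQuery(new Term("text", "bar")), Occur.MUST_NOT);

Since a document matching either prohibited clause is excluded, the combined query matches fewer documents than either single-operand query, which is exactly the behavior Jonathan observed.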
Re: why query Chinese character with bracket become phrase query by default?
On Sun, May 15, 2011 at 1:48 PM, Michael McCandless luc...@mikemccandless.com wrote: Could you please revert your commit, until we've reached some consensus on this discussion first? Huh? I thought everyone was in agreement that we needed more field types for different languages? I added my best guess at what a generic type for non-whitespace-delimited languages might look like. Since it's a new field type, it doesn't affect anything. Hopefully it only improves the situation for someone trying to use one of these languages. The only negative would seem to be if it's worse than nothing (i.e. a very bad example because it actually doesn't work for non-whitespace-delimited languages). The issue of changing the defaults on TextField and changing what text does in the example schema by default is not dependent on this. They are only related by the fact that if another field is added/changed, then _nwd may become redundant and can be removed. For now, it only seems like an improvement? Anyway... the whole language of revert seems unnecessarily confrontational. Feel free to improve what's there (or delete *_nwd if people really feel it adds no/negative value) -Yonik
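For context, a fieldType aimed at non-whitespace-delimited text generally swaps the whitespace-based analysis chain for a tokenizer that can segment CJK and similar scripts. The definition below is only an illustration of that idea, not the actual *_nwd definition from rev 1103444 (which isn't quoted in this thread):

  <fieldType name="text_cjk_example" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <!-- CJKTokenizer emits overlapping character bigrams, so no
           whitespace is needed to find token boundaries -->
      <tokenizer class="solr.CJKTokenizerFactory"/>
    </analyzer>
  </fieldType>

The whole debate in this thread is about whether such a type belongs in the example schema as an extra type or as a change to the default text type.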
Re: why query Chinese character with bracket become phrase query by default?
On Mon, May 16, 2011 at 5:30 AM, Michael McCandless luc...@mikemccandless.com wrote: To be clear, I'm asking that Yonik revert his commit from yesterday (rev 1103444), where he added the text_nwd fieldType and dynamic fields *_nwd to the example schema.xml. So... your position is that until the text fieldType is changed to support non-whitespace-delimited languages better, no other fieldType should be changed/added to better support them? Man, that seems political, not technical. Whatever... I'll revert. -Yonik