Hourly Faceting
Hi, I want to facet results on an hourly basis. The following query gives me an hourly breakdown, but with the date part included; I want just the hour part, aggregated across the days. Is there any other way of doing this?

[The XML of the query and responses was lost in archiving. The surviving parameter values appear to correspond to: facet=true, q=twitterId:191343557, facet.range=createdOnGMTDate, facet.range.start=2013-02-01T00:00:00Z-330MINUTES, facet.range.end=2013-02-08T23:59:59Z-330MINUTES, facet.range.gap=+1HOUR, facet.mincount=0. The "Result" and "Desired Result" listings showed only zero counts.]

Regards, Ayush
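Absent a server-side option, one workaround is to collapse the per-day hourly buckets into 24 hour-of-day buckets on the client. A minimal sketch in Python; the bucket keys and counts are hypothetical, shaped like the keys a +1HOUR range facet returns:

```python
from collections import Counter

def collapse_to_hour_of_day(range_facets):
    """Collapse per-day hourly range-facet buckets into hour-of-day buckets.

    range_facets: dict mapping bucket-start ISO timestamps
    (e.g. "2013-02-01T05:00:00Z") to document counts.
    """
    hourly = Counter()
    for bucket_start, count in range_facets.items():
        hour = int(bucket_start[11:13])  # "HH" portion of the ISO timestamp
        hourly[hour] += count
    return dict(hourly)
```

The same idea works with any gap that divides evenly into a day; only the key-slicing changes.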
Re: Trying to understand soft vs hard commit vs transaction log
A soft commit is just like a hard commit, but it doesn't do things like resolve deletes or call fsync on all the files that were written to disk. It will flush to disk, however. Hard commits are for durability if you are not using the update log. If you are using the update log, hard commits are about flushing the update log to disk (e.g. keeping update log RAM usage down). Soft commits are more about visibility - because a soft commit won't do things like fsync, it won't guarantee that the segment that was flushed to disk will survive a hard crash, but it will flush to disk and open a new view on that flushed segment.

- Mark

On Feb 7, 2013, at 11:29 PM, Alexandre Rafalovitch wrote:

> Hello,
>
> What actually happens when using soft (as opposed to hard) commit?
>
> I understand the very high-level picture (documents become available faster, but you may lose them on power loss). I don't care about low-level implementation details.
>
> But I am trying to understand what is happening at a medium level of detail.
>
> For example, what are the stages of a document if we are using all available transaction log, soft commit, and hard commit options? It feels like there are three stages:
> *) Uncommitted (soft or hard): accessible only via direct real-time get?
> *) Soft-committed: accessible through all search operations? (but not on disk? but where is it? in memory?)
> *) Hard-committed: all the same as soft-committed, but it is now on disk
>
> Similarly, in the performance section of the Wiki, it says: "A commit (including a soft commit) will free up almost all heap memory" - why would a soft commit free up heap memory? I thought it was not flushed to disk.
>
> Also, with soft commits and transaction log enabled, doesn't the transaction log allow replaying/recovering the latest state after a crash? I believe that's what the transaction log does for a database. If not, how does one recover, if at all?
>
> And where does openSearcher=false fit into that? Does it cause inconsistent results somehow?
>
> I am missing something, but I am not sure what or where. Any pointers in the right direction would be appreciated.
>
> Regards,
> Alex.
>
> Personal blog: http://blog.outerthoughts.com/
> LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
> - Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book)
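The visibility-versus-durability distinction discussed in this thread can be summarized as a toy state model. This is purely an illustration of the concepts, not Solr's actual implementation; class and flag names are invented:

```python
class ToyIndex:
    """Toy model: visibility vs. durability under soft/hard commits."""

    def __init__(self):
        self.docs = {}   # doc_id -> {"visible": bool, "durable": bool}
        self.tlog = []   # update log: replayed on startup after a crash

    def add(self, doc_id):
        # Every update lands in the update log first; realtime-get
        # can serve it from there before any commit.
        self.docs[doc_id] = {"visible": False, "durable": False}
        self.tlog.append(doc_id)

    def soft_commit(self):
        # Opens a new searcher: indexed docs become searchable,
        # but nothing is fsync'd, so durability still rests on the tlog.
        for flags in self.docs.values():
            flags["visible"] = True

    def hard_commit(self, open_searcher=True):
        # fsyncs segments and rolls the update log; with
        # openSearcher=false, visibility does not change.
        for flags in self.docs.values():
            flags["durable"] = True
            if open_searcher:
                flags["visible"] = True
        self.tlog = []

    def crash_and_recover(self):
        # Only durable docs survive the crash directly; the rest
        # are replayed from the update log on startup.
        survivors = {i: f for i, f in self.docs.items() if f["durable"]}
        for i in self.tlog:
            survivors.setdefault(i, {"visible": False, "durable": False})
        self.docs = survivors
```

Under this model, a soft-committed document is searchable but its crash-safety comes entirely from the transaction log, which matches Mark's description above.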
Custom update handler
Hi: I'm trying to build a custom update handler to accomplish one specific task. In our app we do query suggestions based on previous queries passed into our frontend app. The thing is that instead of getting these queries from the Solr logs, we store them in a separate core. So far so good, but one particular requirement is that not every query typed by the users in the search box appears as a suggestion, only the most popular ones. For this we created a field in the schema called count, and wrote code in our frontend to increase this value; to be honest, we don't like this. So we came up with the idea of writing a custom update handler that, before storing the query in the index, checks if the query exists and then adds 1 to the counter. The thing is that right now we have set up a dedupe component to avoid storing very similar queries. Is there any way of accessing the dedupe component from the custom update handler? Is there any documentation I can check out to see anything similar to this? Greetings
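The read-increment-write combined with a dedupe-style signature can be sketched independently of Solr's update-processor API. All names below are hypothetical illustrations; in Solr the dedupe side of this lives in the signature-based update processor configured in solrconfig.xml:

```python
import re

def signature(query):
    """Crude stand-in for a dedupe signature: lowercase, collapse whitespace."""
    return re.sub(r"\s+", " ", query.strip().lower())

def record_query(index, query):
    """Upsert a suggestion document keyed by its dedupe signature,
    incrementing its count instead of storing a near-duplicate."""
    sig = signature(query)
    doc = index.get(sig)
    if doc is None:
        index[sig] = {"query": query, "count": 1}
    else:
        doc["count"] += 1
    return index[sig]
```

The key design point is that the counter is keyed by the *signature*, not the raw query text, so "increment the popular query" and "don't store near-duplicates" become the same lookup.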
Re: Trying to understand soft vs hard commit vs transaction log
If you check the revision history of the wiki page, a Mr. jayqhacker added the quoted statement on November 26, 2012. I don't recognize his "name" as being a "known authority" on anything related to Solr, so maybe his uncorroborated comments should be taken with a grain of salt.

-- Jack Krupansky

-Original Message- From: Alexandre Rafalovitch Sent: Friday, February 08, 2013 6:11 PM To: solr-user@lucene.apache.org Subject: Re: Trying to understand soft vs hard commit vs transaction log

[quoted text trimmed; Alexandre's message and Shawn's full reply appear later in this thread]
Re: Solr query parser, needs to call setAutoGeneratePhraseQueries(true)
(Sorry for my split message)... See the text_en_splitting field type for an example: ...

-- Jack Krupansky

-Original Message- From: Zhang, Lisheng Sent: Friday, February 08, 2013 3:20 PM To: solr-user@lucene.apache.org Subject: Solr query parser, needs to call setAutoGeneratePhraseQueries(true)

Hi, In our application we need to call the method setAutoGeneratePhraseQueries(true) on the Lucene QueryParser; this is the way it used to work in earlier versions, and it seems to me the most natural way. But in current Solr 3.6.1, the only way to do so is to set LUCENE_30 in solrconfig.xml (if I read the source code correctly). I do not want to do that, because it would change the whole behavior of Lucene, and I only want to change this query parser behavior, not other Lucene features. Please guide me if there is a better way, other than changing the Solr source code? Thanks very much for your help, Lisheng
Re: Trying to understand soft vs hard commit vs transaction log
On 2/8/2013 4:11 PM, Alexandre Rafalovitch wrote:
> Sorry Shawn, Somehow I am still not quite grasping it. I would really appreciate it if somebody (or even you) could have another go at a very small part of this. Maybe it will clear it up:
>
> Similarly, in the performance section of the Wiki, it says: "A commit (including a soft commit) will free up almost all heap memory"
>
> Why? What is the "hard work" that a hard commit does and a soft commit does not, but still commits to disk? Is it some sort of Lucene segment finalization and new segment creation?

I don't know the answers to those questions, except to say that committing to disk involves I/O latency. With standard hard disks, it's a LOT of latency.

Thanks,
Shawn
Re: Trying to understand soft vs hard commit vs transaction log
Sorry Shawn,

Somehow I am still not quite grasping it. I would really appreciate it if somebody (or even you) could have another go at a very small part of this. Maybe it will clear it up:

> Similarly, in the performance section of the Wiki, it says: "A commit (including a soft commit) will free up almost all heap memory"

Why? What is the "hard work" that a hard commit does and a soft commit does not, but still commits to disk? Is it some sort of Lucene segment finalization and new segment creation?

Regards,
Alex.

Personal blog: http://blog.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book)

On Fri, Feb 8, 2013 at 2:57 AM, Shawn Heisey wrote:

> On 2/7/2013 9:29 PM, Alexandre Rafalovitch wrote:
>
>> Hello,
>>
>> What actually happens when using soft (as opposed to hard) commit?
>>
>> I understand the very high-level picture (documents become available faster, but you may lose them on power loss). I don't care about low-level implementation details.
>>
>> But I am trying to understand what is happening at a medium level of detail.
>>
>> For example, what are the stages of a document if we are using all available transaction log, soft commit, and hard commit options? It feels like there are three stages:
>> *) Uncommitted (soft or hard): accessible only via direct real-time get?
>> *) Soft-committed: accessible through all search operations? (but not on disk? but where is it? in memory?)
>> *) Hard-committed: all the same as soft-committed, but it is now on disk
>>
>> Similarly, in the performance section of the Wiki, it says: "A commit (including a soft commit) will free up almost all heap memory" - why would a soft commit free up heap memory? I thought it was not flushed to disk.
>> >> Also, with soft-commits and transaction log enabled, doesn't transaction >> log allows to replay/recover the latest state after crash? I believe >> that's >> what transaction log does for the database. If not, how does one recover, >> if at all? >> >> And where does openSearcher=false fits into that? Does it cause >> inconsistent results somehow? >> >> I am missing something, but I am not sure what or where. Any points in the >> right direction would be appreciated. >> > > Let's see if I can answer your questions without giving you incorrect > information. > > New indexed content is not searchable until you open a new searcher, > regardless of the type of commit that you do. > > A hard commit will close the current transaction log and start a new one. > It will also instruct the Directory implementation to flush to disk. If > you specify openSearcher=false, then the content that has just been > committed will NOT be searchable, as discussed in the previous paragraph. > The existing searcher will remain open and continue to serve queries > against the same index data. > > A soft commit does not flush the new content to disk, but it does open a > new searcher. I'm sure that the amount of memory available for caching > this content is not large, so it's possible that if you do a lot of > indexing with soft commits and your hard commits are too infrequent, you'll > end up flushing part of the cached data to disk anyway. I'd love to hear > from a committer about this, because I could be wrong. > > There's a caveat with that 'flush to disk' operation -- the default > Directory implementation in the Solr example config, which is > NRTCachingDirectoryFactory, will cache the last few megabytes of indexed > data and not flush it to disk even with a hard commit. If your commits are > small, then the net result is similar to a soft commit. If the server or > Solr were to crash, the transaction logs would be replayed on Solr startup, > recovering that last few megabytes. 
The transaction log may also recover > documents that were soft committed, but I'm not 100% sure about that. > > To take full advantage of NRT functionality, you can commit as often as > you like with soft commits. On some reasonable interval, say every one to > fifteen minutes, you can issue a hard commit with openSearcher set to > false, to flush things to disk and cycle through transaction logs before > they get huge. Solr will keep a few of the transaction logs around, and if > they are huge, it can take a long time to replay them. You'll want to > choose a hard commit interval that doesn't create giant transaction logs. > > If any of the info I've given here is wrong, someone should correct me! > > Thanks, > Shawn > >
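Shawn's suggested strategy (frequent soft commits for visibility, periodic hard commits with openSearcher=false for durability and transaction-log rotation) is typically wired up in solrconfig.xml roughly like this. The intervals are illustrative only, not recommendations:

```xml
<updateHandler class="solr.DirectUpdateHandler2">
  <updateLog>
    <str name="dir">${solr.ulog.dir:}</str>
  </updateLog>

  <!-- Hard commit: flush segments and rotate the transaction log,
       but keep serving queries from the current searcher. -->
  <autoCommit>
    <maxTime>300000</maxTime>  <!-- every 5 minutes -->
    <openSearcher>false</openSearcher>
  </autoCommit>

  <!-- Soft commit: open a new searcher so recent docs become visible. -->
  <autoSoftCommit>
    <maxTime>1000</maxTime>  <!-- every second -->
  </autoSoftCommit>
</updateHandler>
```

Explicit soft/hard commits sent by a client follow the same semantics; the config form just automates the cadence.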
Re: Can Solr analyze content and find dates and places
Hi Bart,

I did some work with UIMA, but this was to annotate the data before it goes to Lucene/Solr, i.e., not built as an UpdateRequestProcessor. I just looked through the SolrUIMA wiki page [http://wiki.apache.org/solr/SolrUIMA] and I believe you will have to set up your own aggregate analysis chain in place of the one currently configured.

Writing UIMA annotators is very simple (there is a tutorial here: [http://uima.apache.org/downloads/releaseDocs/2.1.0-incubating/docs/html/tutorials_and_users_guides/tutorials_and_users_guides.html]). You provide the XML description for the annotation and let UIMA generate the annotation bean. You write the Java code for the annotator and also the annotator XML descriptor; UIMA uses the annotator XML descriptor to instantiate and run your annotator. Overall, it sounds really complicated, but it's actually quite simple.

The tutorial has quite a few examples that you will find useful, but in case you need more, I have some on this github repository: [https://github.com/sujitpal/tgni/tree/master/src/main/java/com/mycompany/tgni/analysis/uima] The dictionary and pattern annotators may be similar to what you are looking for (date and city annotators).

Best regards, Sujit

On Feb 8, 2013, at 8:50 AM, Bart Rijpers wrote:

> Hi Alex,
>
> Indeed that is exactly what I am trying to achieve using wordcities. Date will be simple: 16-Jan becomes 16-Jan-2013 in a new dynamic field. But how do I integrate the Java library as UIMA? The documentation about changing schema.xml and solr.xml is not very detailed.
>
> Regards, Bart
>
> On 8 Feb 2013, at 16:57, Alexandre Rafalovitch wrote:
>
>> Hi Bart,
>>
>> I haven't done any UIMA work (I used other stuff for my NLP phase), so not sure I can help much further. But in general, you are venturing into pure research territory here.
>>
>> Even for dates, what do you actually mean? Just fixed expressions? Relative dates (e.g. last Tuesday)? What about times (7pm)?
>>
>> Same with cities.
If you want it offline, you need the gazetteer and >> disambiguation modules. Gazetteer for cities (worldwide) is huge and has a >> lot of duplicate names (Paris, Ontario is apparently a short drive from >> London, Ontario eh?). Something like >> http://www.maxmind.com/en/worldcities? And disambiguation usually >> requires training corpus that is similar to >> what your text will look like. >> >> Online services like OpenCalais are backed by gigantic databases and some >> serious corpus-training Machine Language disambiguation algorithms. >> >> So, no plug-and-play solution here. If you really need to get this done, I >> would recommend narrowing down the specification of exactly what you will >> settle for and looking for software that can do it. Once you have that, >> integration with Solr is your next - and smaller - concern. >> >> Regards, >> Alex. >> >> Personal blog: http://blog.outerthoughts.com/ >> LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch >> - Time is the quality of nature that keeps events from happening all at >> once. Lately, it doesn't seem to be working. (Anonymous - via GTD book) >> >> >> On Fri, Feb 8, 2013 at 10:41 AM, jazz wrote: >> >>> Thanks Alex, >>> >>> I checked the documentation but it seems there is only a webservice >>> (OpenCalais) available to extract dates and places. >>> >>> http://uima.apache.org/sandbox.html >>> >>> Do you know is there is a Solr Compatible UIMA add-on which detects dates >>> and places (cities) without a webservice? If not, how do you write one? >>> >>> Regards, Bart >>> >>> On 8 Feb 2013, at 15:29, Alexandre Rafalovitch wrote: >>> Yes, it is possible. You are looking at UIMA or OpenNLP integration, most probably in Update Request Processor pipeline. Have a look here as a start: https://wiki.apache.org/solr/SolrUIMA You will have to put some serious work into this, it is not all tied together and packaged. 
Mostly because the Natural Language Processing >>> (the field you are getting into) is kind of messy all of its own. Good luck, Alex. Personal blog: http://blog.outerthoughts.com/ LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch - Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book) On Fri, Feb 8, 2013 at 9:24 AM, jazz wrote: > Hi, > > I want to know if Solr can analyze text and recoginze dates and places. >>> If > yes, is it then possible to create new dynamic fields with these dates >>> and > places (e.g. city). > > Thanks, Bart > >>> >>>
RE: Solr query parser, needs to call setAutoGeneratePhraseQueries(true)
Thanks very much for your valuable help, it worked perfectly!!!

Lisheng

-Original Message- From: Jack Krupansky [mailto:j...@basetechnology.com] Sent: Friday, February 08, 2013 12:54 PM To: solr-user@lucene.apache.org Subject: Re: Solr query parser, needs to call setAutoGeneratePhraseQueries(true)

Simply add the "autoGeneratePhraseQueries" attribute with a value of "true" to all of your "text" field types in your schema.xml. See the text_en_splitting field type for an example: ...

-- Jack Krupansky

-Original Message- From: Jack Krupansky Sent: Friday, February 08, 2013 3:51 PM To: solr-user@lucene.apache.org Subject: Re: Solr query parser, needs to call setAutoGeneratePhraseQueries(true)

Simply add the "autoGeneratePhraseQueries" attribute with a value of "true" to all of your "text" field types in your schema.xml. See the text_

-- Jack Krupansky

-Original Message- From: Zhang, Lisheng Sent: Friday, February 08, 2013 3:20 PM To: solr-user@lucene.apache.org Subject: Solr query parser, needs to call setAutoGeneratePhraseQueries(true)

Hi, In our application we need to call the method setAutoGeneratePhraseQueries(true) on the Lucene QueryParser; this is the way it used to work in earlier versions, and it seems to me the most natural way. But in current Solr 3.6.1, the only way to do so is to set LUCENE_30 in solrconfig.xml (if I read the source code correctly). I do not want to do that, because it would change the whole behavior of Lucene, and I only want to change this query parser behavior, not other Lucene features. Please guide me if there is a better way, other than changing the Solr source code? Thanks very much for your help, Lisheng
Re: Solr query parser, needs to call setAutoGeneratePhraseQueries(true)
Simply add the "autoGeneratePhraseQueries" attribute with a value of "true" to all of your "text" field types in your schema.xml. See the text_en_splitting field type for an example:

<fieldType name="text_en_splitting" class="solr.TextField" positionIncrementGap="100" autoGeneratePhraseQueries="true"> ... </fieldType>

-- Jack Krupansky

-Original Message- From: Jack Krupansky Sent: Friday, February 08, 2013 3:51 PM To: solr-user@lucene.apache.org Subject: Re: Solr query parser, needs to call setAutoGeneratePhraseQueries(true)

Simply add the "autoGeneratePhraseQueries" attribute with a value of "true" to all of your "text" field types in your schema.xml. See the text_

-- Jack Krupansky

-Original Message- From: Zhang, Lisheng Sent: Friday, February 08, 2013 3:20 PM To: solr-user@lucene.apache.org Subject: Solr query parser, needs to call setAutoGeneratePhraseQueries(true)

Hi, In our application we need to call the method setAutoGeneratePhraseQueries(true) on the Lucene QueryParser; this is the way it used to work in earlier versions, and it seems to me the most natural way. But in current Solr 3.6.1, the only way to do so is to set LUCENE_30 in solrconfig.xml (if I read the source code correctly). I do not want to do that, because it would change the whole behavior of Lucene, and I only want to change this query parser behavior, not other Lucene features. Please guide me if there is a better way, other than changing the Solr source code? Thanks very much for your help, Lisheng
Re: Solr query parser, needs to call setAutoGeneratePhraseQueries(true)
Simply add the "autoGeneratePhraseQueries" attribute with a value of "true" to all of your "text" field types in your schema.xml. See the text_

-- Jack Krupansky

-Original Message- From: Zhang, Lisheng Sent: Friday, February 08, 2013 3:20 PM To: solr-user@lucene.apache.org Subject: Solr query parser, needs to call setAutoGeneratePhraseQueries(true)

Hi, In our application we need to call the method setAutoGeneratePhraseQueries(true) on the Lucene QueryParser; this is the way it used to work in earlier versions, and it seems to me the most natural way. But in current Solr 3.6.1, the only way to do so is to set LUCENE_30 in solrconfig.xml (if I read the source code correctly). I do not want to do that, because it would change the whole behavior of Lucene, and I only want to change this query parser behavior, not other Lucene features. Please guide me if there is a better way, other than changing the Solr source code? Thanks very much for your help, Lisheng
Solr query parser, needs to call setAutoGeneratePhraseQueries(true)
Hi, In our application we need to call the method setAutoGeneratePhraseQueries(true) on the Lucene QueryParser; this is the way it used to work in earlier versions, and it seems to me the most natural way. But in current Solr 3.6.1, the only way to do so is to set LUCENE_30 in solrconfig.xml (if I read the source code correctly). I do not want to do that, because it would change the whole behavior of Lucene, and I only want to change this query parser behavior, not other Lucene features. Please guide me if there is a better way, other than changing the Solr source code? Thanks very much for your help, Lisheng
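Jack's answer upthread amounts to a schema.xml change along these lines. The attributes match the stock example schema's text_en_splitting type; the analyzer contents here are only a placeholder sketch:

```xml
<fieldType name="text_en_splitting" class="solr.TextField"
           positionIncrementGap="100" autoGeneratePhraseQueries="true">
  <analyzer>
    <!-- actual analysis chain elided; any tokenizer/filters go here -->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

Because the attribute is set per field type, it changes only how queries against those fields are parsed, without switching the global luceneMatchVersion.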
RE: Can Solr analyze content and find dates and places
Bart,

For Apache Nutch we built a date extractor that relies on some regular expressions to extract sequences that resemble dates, and passes the extracted candidates through a list of Java date formats together with the identified language (DateFormat is locale aware). With it we can extract many exotic dates from arbitrary text in many languages. An older but working patch with example date formats and regular expressions exists for Apache Nutch. The relevant parts of the code should be easy to implement in your application if you're using Java. https://issues.apache.org/jira/browse/NUTCH-1414

If you're handling multiple languages, locale information is very important. That counts for a UIMA annotator as well.

Cheers, Markus

-Original message- > From:Bart Rijpers > Sent: Fri 08-Feb-2013 17:51 > To: solr-user@lucene.apache.org > Subject: Re: Can Solr analyze content and find dates and places > > Hi Alex, > > Indeed that is exactly what I am trying to achieve using wordcities. Date > will be simple: 16-Jan becomes 16-Jan-2013 in a new dynamic field. But how do > I integrate the Java library as UIMA? The documentation about changing > schema.xml and solr.xml is not very detailed. > > Regards, Bart > > On 8 Feb 2013, at 16:57, Alexandre Rafalovitch wrote: > > > Hi Bart, > > > > I haven't done any UIMA work (I used other stuff for my NLP phase), so not > > sure I can help much further. But in general, you are venturing into pure > > research territory here. > > > > Even for dates, what do you actually mean? Just fixed expression? Relative > > dates (e.g. last tuesday?). What about times (7pm?). > > > > Same with cities. If you want it offline, you need the gazetteer and > > disambiguation modules. Gazetteer for cities (worldwide) is huge and has a > > lot of duplicate names (Paris, Ontario is apparently a short drive from > > London, Ontario eh?). Something like > > http://www.maxmind.com/en/worldcities?
And disambiguation usually > > requires training corpus that is similar to > > what your text will look like. > > > > Online services like OpenCalais are backed by gigantic databases and some > > serious corpus-training Machine Language disambiguation algorithms. > > > > So, no plug-and-play solution here. If you really need to get this done, I > > would recommend narrowing down the specification of exactly what you will > > settle for and looking for software that can do it. Once you have that, > > integration with Solr is your next - and smaller - concern. > > > > Regards, > > Alex. > > > > Personal blog: http://blog.outerthoughts.com/ > > LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch > > - Time is the quality of nature that keeps events from happening all at > > once. Lately, it doesn't seem to be working. (Anonymous - via GTD book) > > > > > > On Fri, Feb 8, 2013 at 10:41 AM, jazz wrote: > > > >> Thanks Alex, > >> > >> I checked the documentation but it seems there is only a webservice > >> (OpenCalais) available to extract dates and places. > >> > >> http://uima.apache.org/sandbox.html > >> > >> Do you know is there is a Solr Compatible UIMA add-on which detects dates > >> and places (cities) without a webservice? If not, how do you write one? > >> > >> Regards, Bart > >> > >> On 8 Feb 2013, at 15:29, Alexandre Rafalovitch wrote: > >> > >>> Yes, it is possible. You are looking at UIMA or OpenNLP integration, most > >>> probably in Update Request Processor pipeline. > >>> > >>> Have a look here as a start: https://wiki.apache.org/solr/SolrUIMA > >>> > >>> You will have to put some serious work into this, it is not all tied > >>> together and packaged. Mostly because the Natural Language Processing > >> (the > >>> field you are getting into) is kind of messy all of its own. > >>> > >>> Good luck, > >>> Alex. 
> >>> > >>> Personal blog: http://blog.outerthoughts.com/ > >>> LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch > >>> - Time is the quality of nature that keeps events from happening all at > >>> once. Lately, it doesn't seem to be working. (Anonymous - via GTD book) > >>> > >>> > >>> On Fri, Feb 8, 2013 at 9:24 AM, jazz wrote: > >>> > Hi, > > I want to know if Solr can analyze text and recoginze dates and places. > >> If > yes, is it then possible to create new dynamic fields with these dates > >> and > places (e.g. city). > > Thanks, Bart > > >> > >> >
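The approach Markus describes (regex-extracted candidates run through a battery of date formats) can be sketched compactly. The regex and format list below are illustrative stand-ins, far smaller than what the NUTCH-1414 patch uses, and a real extractor would vary the formats and month-name parsing by detected locale:

```python
import re
from datetime import datetime

# Candidate extraction: sequences that merely *look* like dates.
CANDIDATE_RE = re.compile(r"\b\d{1,2}[-/ ](?:\d{1,2}|[A-Za-z]{3})[-/ ]\d{4}\b")

# Small battery of formats to try, in order.
FORMATS = ["%d-%b-%Y", "%d/%m/%Y", "%d-%m-%Y", "%d %b %Y"]

def extract_dates(text):
    """Return a datetime for every candidate that any known format accepts."""
    found = []
    for candidate in CANDIDATE_RE.findall(text):
        for fmt in FORMATS:
            try:
                found.append(datetime.strptime(candidate, fmt))
                break  # first format that parses wins
            except ValueError:
                continue
    return found
```

Candidates that fail every format (e.g. an impossible day number) are silently dropped, which is what makes the loose regex safe.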
Global .properties file for all Solr cores?
I've read the documentation about how you can configure a Solr core with a properties file. Is there any way to specify a properties file that will apply to all cores running on a server?

Here's my scenario. I have a solr setup where I have two cores, "foo" and "bar". I want to enable replication using properties, as is suggested on the wiki. http://wiki.apache.org/solr/SolrReplication#enable.2BAC8-disable_master.2BAC8-slave_in_a_node

I would like my master/slave settings to apply to all cores on a box, but I would still like to have separate solrcore.properties files so that other properties can be set per core. In other words, I would like a setup like this, with three files:

#solr.properties
# These properties should apply to all cores on a box
enable.master=true
enable.slave=false

#foo.solrcore.properties
# These properties only apply to core foo
filterCache.size=16384

#bar.solrcore.properties
# These properties only apply to core bar
filterCache.size=2048

What I'm trying to avoid is having to duplicate the global values across all solrcore.properties files. I've looked into having a .properties file that applies to the whole context, but we are running Tomcat, which does not make this easy. It seems the only way to do this with Tomcat is with the CATALINA_OPTS environment variable, and I would rather duplicate values across solrcore.properties files than use CATALINA_OPTS.

- Hayden
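One workaround for the layering Hayden wants (global defaults, per-core overrides) is to generate each core's solrcore.properties at deploy time from the two sources. A minimal sketch of the merge semantics; the file names and keys are taken from the example above, and the deploy-script framing is an assumption:

```python
def merged_core_properties(global_props, core_props):
    """Per-core view: global defaults, overridden by core-specific values."""
    merged = dict(global_props)   # start from solr.properties defaults
    merged.update(core_props)     # <core>.solrcore.properties wins on conflict
    return merged
```

A script applying this per core and writing the result out keeps the global values in exactly one place, at the cost of a regeneration step on each config change.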
Change client to http1.1
Good day. I am using an HTTP client that supports HTTP 1.1 and changed the HttpBase property useHttp11 to true, but capturing packets still shows the requests going out as HTTP 1.0. This seems to be affecting my crawling.

-- View this message in context: http://lucene.472066.n3.nabble.com/Change-client-to-http1-1-tp4039279.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Can Solr analyze content and find dates and places
Hi Alex, Indeed that is exactly what I am trying to achieve using wordcities. Date will be simple: 16-Jan becomes 16-Jan-2013 in a new dynamic field. But how do I integrate the Java library as UIMA? The documentation about changing schema.xml and solr.xml is not very detailed. Regards, Bart On 8 Feb 2013, at 16:57, Alexandre Rafalovitch wrote: > Hi Bart, > > I haven't done any UIMA work (I used other stuff for my NLP phase), so not > sure I can help much further. But in general, you are venturing into pure > research territory here. > > Even for dates, what do you actually mean? Just fixed expression? Relative > dates (e.g. last tuesday?). What about times (7pm?). > > Same with cities. If you want it offline, you need the gazetteer and > disambiguation modules. Gazetteer for cities (worldwide) is huge and has a > lot of duplicate names (Paris, Ontario is apparently a short drive from > London, Ontario eh?). Something like > http://www.maxmind.com/en/worldcities? And disambiguation usually > requires training corpus that is similar to > what your text will look like. > > Online services like OpenCalais are backed by gigantic databases and some > serious corpus-training Machine Language disambiguation algorithms. > > So, no plug-and-play solution here. If you really need to get this done, I > would recommend narrowing down the specification of exactly what you will > settle for and looking for software that can do it. Once you have that, > integration with Solr is your next - and smaller - concern. > > Regards, > Alex. > > Personal blog: http://blog.outerthoughts.com/ > LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch > - Time is the quality of nature that keeps events from happening all at > once. Lately, it doesn't seem to be working. (Anonymous - via GTD book) > > > On Fri, Feb 8, 2013 at 10:41 AM, jazz wrote: > >> Thanks Alex, >> >> I checked the documentation but it seems there is only a webservice >> (OpenCalais) available to extract dates and places. 
>> >> http://uima.apache.org/sandbox.html >> >> Do you know is there is a Solr Compatible UIMA add-on which detects dates >> and places (cities) without a webservice? If not, how do you write one? >> >> Regards, Bart >> >> On 8 Feb 2013, at 15:29, Alexandre Rafalovitch wrote: >> >>> Yes, it is possible. You are looking at UIMA or OpenNLP integration, most >>> probably in Update Request Processor pipeline. >>> >>> Have a look here as a start: https://wiki.apache.org/solr/SolrUIMA >>> >>> You will have to put some serious work into this, it is not all tied >>> together and packaged. Mostly because the Natural Language Processing >> (the >>> field you are getting into) is kind of messy all of its own. >>> >>> Good luck, >>> Alex. >>> >>> Personal blog: http://blog.outerthoughts.com/ >>> LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch >>> - Time is the quality of nature that keeps events from happening all at >>> once. Lately, it doesn't seem to be working. (Anonymous - via GTD book) >>> >>> >>> On Fri, Feb 8, 2013 at 9:24 AM, jazz wrote: >>> Hi, I want to know if Solr can analyze text and recoginze dates and places. >> If yes, is it then possible to create new dynamic fields with these dates >> and places (e.g. city). Thanks, Bart >> >>
Re: Can Solr analyze content and find dates and places
Hi Bart, I haven't done any UIMA work (I used other stuff for my NLP phase), so not sure I can help much further. But in general, you are venturing into pure research territory here. Even for dates, what do you actually mean? Just fixed expression? Relative dates (e.g. last tuesday?). What about times (7pm?). Same with cities. If you want it offline, you need the gazetteer and disambiguation modules. Gazetteer for cities (worldwide) is huge and has a lot of duplicate names (Paris, Ontario is apparently a short drive from London, Ontario eh?). Something like http://www.maxmind.com/en/worldcities? And disambiguation usually requires training corpus that is similar to what your text will look like. Online services like OpenCalais are backed by gigantic databases and some serious corpus-training Machine Language disambiguation algorithms. So, no plug-and-play solution here. If you really need to get this done, I would recommend narrowing down the specification of exactly what you will settle for and looking for software that can do it. Once you have that, integration with Solr is your next - and smaller - concern. Regards, Alex. Personal blog: http://blog.outerthoughts.com/ LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch - Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book) On Fri, Feb 8, 2013 at 10:41 AM, jazz wrote: > Thanks Alex, > > I checked the documentation but it seems there is only a webservice > (OpenCalais) available to extract dates and places. > > http://uima.apache.org/sandbox.html > > Do you know is there is a Solr Compatible UIMA add-on which detects dates > and places (cities) without a webservice? If not, how do you write one? > > Regards, Bart > > On 8 Feb 2013, at 15:29, Alexandre Rafalovitch wrote: > > > Yes, it is possible. You are looking at UIMA or OpenNLP integration, most > > probably in Update Request Processor pipeline. 
> > > > Have a look here as a start: https://wiki.apache.org/solr/SolrUIMA > > > > You will have to put some serious work into this, it is not all tied > > together and packaged. Mostly because the Natural Language Processing > (the > > field you are getting into) is kind of messy all of its own. > > > > Good luck, > >Alex. > > > > Personal blog: http://blog.outerthoughts.com/ > > LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch > > - Time is the quality of nature that keeps events from happening all at > > once. Lately, it doesn't seem to be working. (Anonymous - via GTD book) > > > > > > On Fri, Feb 8, 2013 at 9:24 AM, jazz wrote: > > > >> Hi, > >> > >> I want to know if Solr can analyze text and recoginze dates and places. > If > >> yes, is it then possible to create new dynamic fields with these dates > and > >> places (e.g. city). > >> > >> Thanks, Bart > >> > >
Re: Can Solr analyze content and find dates and places
Thanks Alex, I checked the documentation but it seems there is only a webservice (OpenCalais) available to extract dates and places. http://uima.apache.org/sandbox.html Do you know if there is a Solr-compatible UIMA add-on which detects dates and places (cities) without a webservice? If not, how do you write one? Regards, Bart On 8 Feb 2013, at 15:29, Alexandre Rafalovitch wrote: > Yes, it is possible. You are looking at UIMA or OpenNLP integration, most > probably in Update Request Processor pipeline. > > Have a look here as a start: https://wiki.apache.org/solr/SolrUIMA > > You will have to put some serious work into this, it is not all tied > together and packaged. Mostly because the Natural Language Processing (the > field you are getting into) is kind of messy all of its own. > > Good luck, > Alex. > > Personal blog: http://blog.outerthoughts.com/ > LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch > - Time is the quality of nature that keeps events from happening all at > once. Lately, it doesn't seem to be working. (Anonymous - via GTD book) > > > On Fri, Feb 8, 2013 at 9:24 AM, jazz wrote: > >> Hi, >> >> I want to know if Solr can analyze text and recognize dates and places. If >> yes, is it then possible to create new dynamic fields with these dates and >> places (e.g. city). >> >> Thanks, Bart >>
copy Field / postprocess Fields after analyze / dynamic analyzer config
Is there a way to postprocess a field after analysis? By postprocessing I mean renaming, moving or appending fields. Some more information: my schema.xml contains several language-suffixed fields (nouns_de, ...). Each of these is analyzed in a language-dependent way. When I do a faceted search I have to include every field_lang combination, since I do not know the language at query time: http://localhost:8983/solr/master/select?q=*:*&rows=0&facet=true&facet.field=nouns_de&facet.field=nouns_en&facet.field=nouns_fr&facet.field=nouns_nl ... So I have to merge all terms in my own business logic :-( Any idea / pointer on how to rename fields after analysis? This post says it's not possible with the current API: http://lucene.472066.n3.nabble.com/copyField-after-analyzer-td3900337.html Another approach would be to allow analyzer configuration depending on another field value (language). regards, Kai Gülzau
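Until field renaming after analysis is possible, the per-field facet counts usually have to be merged client-side. A minimal sketch of that merging logic in Python (field names and counts are illustrative, not from Kai's actual index):

```python
from collections import Counter

def merge_language_facets(facet_fields):
    """Merge per-language facet counts (nouns_de, nouns_en, ...) into
    a single term -> count mapping, summing counts for shared terms."""
    merged = Counter()
    for counts in facet_fields.values():
        # counts is a term -> count dict for one facet.field
        merged.update(counts)  # Counter.update() adds counts together
    return dict(merged)

# Shaped like the facet_fields section of a Solr JSON response:
facets = {
    "nouns_de": {"haus": 3, "auto": 1},
    "nouns_en": {"house": 2, "auto": 4},
}
print(merge_language_facets(facets))
```

Terms that appear under more than one language field (like "auto" above) are summed, which may or may not be the desired semantics for cross-language homographs.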
Re: Trying to understand soft vs hard commit vs transaction log
On 2/8/2013 3:12 AM, Isaac Hebsh wrote: Shawn, what about 'flush to disk' behaviour on MMapDirectoryFactory? MMapDirectoryFactory should flush everything to disk on a hard commit and not keep anything in RAM. I *think* that soft commits still end up in RAM with this implementation, but you'll want to wait for someone who actually knows to confirm or deny that. Just FYI, NRTCachingDirectoryFactory is a wrapper class - implementing caching functionality and using MMapDirectoryFactory to actually contact the disk. If indexing and/or startup performance concerns have led you to turn off the updateLog, MMapDirectoryFactory is the correct implementation to use. Using the NRT default without the updateLog will lead to data loss if anything crashes. Thanks, Shawn
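For reference, the directory implementation Shawn describes is selected in solrconfig.xml; a sketch of the two configurations (class names as shipped with Solr 4.x, the rest illustrative):

```xml
<!-- Default in the Solr 4.x example config: caches recent small
     flushes in RAM for NRT use; pair it with the updateLog. -->
<directoryFactory name="DirectoryFactory"
                  class="solr.NRTCachingDirectoryFactory"/>

<!-- If the updateLog is disabled, prefer a factory that flushes
     everything to disk on a hard commit:
<directoryFactory name="DirectoryFactory"
                  class="solr.MMapDirectoryFactory"/>
-->
```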
Re: Can Solr analyze content and find dates and places
Yes, it is possible. You are looking at UIMA or OpenNLP integration, most probably in Update Request Processor pipeline. Have a look here as a start: https://wiki.apache.org/solr/SolrUIMA You will have to put some serious work into this, it is not all tied together and packaged. Mostly because the Natural Language Processing (the field you are getting into) is kind of messy all of its own. Good luck, Alex. Personal blog: http://blog.outerthoughts.com/ LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch - Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book) On Fri, Feb 8, 2013 at 9:24 AM, jazz wrote: > Hi, > > I want to know if Solr can analyze text and recognize dates and places. If > yes, is it then possible to create new dynamic fields with these dates and > places (e.g. city). > > Thanks, Bart >
RE: which analyzer is used for facet.query?
> So it seems that facet.query is using the analyzer of type index. > Is it a bug or is there another analyzer type for the facet query? Nobody? Should I file a bug? Kai -Original Message- From: Kai Gülzau [mailto:kguel...@novomind.com] Sent: Tuesday, February 05, 2013 2:31 PM To: solr-user@lucene.apache.org Subject: which analyzer is used for facet.query? Hi all, which analyzer is used for the facet.query? This is my schema.xml: ... When doing a faceting search like: http://localhost:8983/solr/slave/select?q=*:*&fq=type:7&rows=0&wt=json&indent=true&facet=true&facet.query=albody_de:Klaus The UIMA whitespace tokenizer logs some infos: Feb 05, 2013 2:23:06 PM WhitespaceTokenizer process Information: "Whitespace tokenizer starts processing" Feb 05, 2013 2:23:06 PM WhitespaceTokenizer process Information: "Whitespace tokenizer finished processing" So it seems that facet.query is using the analyzer of type index. Is it a bug or is there another analyzer type for the facet query? Regards, Kai Gülzau
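For context, the index/query split Kai refers to lives in schema.xml: a field type may declare one analyzer for indexing and a different one for query strings, and the question is which of the two facet.query runs through. A sketch of such a declaration (classes illustrative, not Kai's actual UIMA chain):

```xml
<fieldType name="text_de" class="solr.TextField" positionIncrementGap="100">
  <!-- Used when documents are indexed -->
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <!-- Expected to be used for query strings, e.g. facet.query -->
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```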
Can Solr analyze content and find dates and places
Hi, I want to know if Solr can analyze text and recognize dates and places. If so, is it then possible to create new dynamic fields with these dates and places (e.g. city)? Thanks, Bart
Re: solr file based spell suggestions
Hi, thanks. I configured that using synonym mapping; it's now giving sII results on searching for s2. Thanks, regards, Rohan On Thu, Feb 7, 2013 at 7:15 PM, Jack Krupansky wrote: > Changing "x" to "y" (e.g., "s2" to "sII") is not a function of "spell > check" or "suggestion". > > Synonyms are a closer match, but can be difficult to configure properly. > Good luck. > > You may be better off preprocessing the query at the application level and > then generating the appropriate boolean logic, such as: "(s2 OR sII)". > > -- Jack Krupansky > > -Original Message- From: Rohan Thakur > Sent: Thursday, February 07, 2013 8:24 AM > To: solr-user@lucene.apache.org > Subject: solr file based spell suggestions > > > hi all > > I wanted to know how can I apply file based dictionary for spell > suggestions such that if I search for s2 in the query it would take it as > sII which also represent same thing in my indexed field...but as in search > it can also be interpreted as s2 please help anyone... > > thanks > regards > Rohan >
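For reference, an equivalence like the one Rohan set up looks roughly like this in synonyms.txt (terms taken from the thread; the exact file layout in his setup may differ):

```text
# Comma-separated terms are treated as equivalent; with expand=true
# a query for either form matches documents containing the other.
s2, sII

# One-way mapping variant: rewrite s2 to sII but not the reverse.
# s2 => sII
```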
ExtractingRequestHandler literals
Hi, I am trying to index some documents using ExtractingRequestHandler and tika. Solr 3.6 I would like to add some extra data coming from a different source using literal. My schema contains these fields My url http://dzoagent001:8080/solr/document/update/extract?commit=true&stream.file=//DZOAGENT001/ShareFolder/file.txt&literal.DocumentID=125 IT looks like literals are not working properly . Any idea? *Error* SEVERE: org.apache.solr.common.SolrException: [doc=null] missing required field: DocumentID at org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:355) at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:60) at org.apache.solr.update.processor.LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:115) at org.apache.solr.handler.extraction.ExtractingDocumentLoader.doAdd(ExtractingDocumentLoader.java:141) at org.apache.solr.handler.extraction.ExtractingDocumentLoader.addDoc(ExtractingDocumentLoader.java:146) at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:236) at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:58) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1376) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:365) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:260) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206) at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233) at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191) at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127) at 
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102) at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109) at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298) at org.apache.coyote.http11.Http11AprProcessor.process(Http11AprProcessor.java:865) at org.apache.coyote.http11.Http11AprProtocol$Http11ConnectionHandler.process(Http11AprProtocol.java:579) at org.apache.tomcat.util.net.AprEndpoint$Worker.run(AprEndpoint.java:1555) at java.lang.Thread.run(Thread.java:679) Thanks -- View this message in context: http://lucene.472066.n3.nabble.com/ExtractingRequestHandler-literals-tp4039222.html Sent from the Solr - User mailing list archive at Nabble.com.
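One thing worth checking — a guess from the error, not a confirmed diagnosis — is the lowernames setting in the /update/extract handler defaults: when it is true (as in the example config), field names are lowercased, so literal.DocumentID would arrive as documentid and the required DocumentID field would indeed be reported missing. The relevant solrconfig.xml section looks roughly like this:

```xml
<requestHandler name="/update/extract"
                class="solr.extraction.ExtractingRequestHandler">
  <lst name="defaults">
    <!-- true lowercases all field names, including those passed
         via literal.* parameters -->
    <str name="lowernames">false</str>
  </lst>
</requestHandler>
```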
Re: SolrCloud new zookeper node on different ip/ replicate between two clasters
I don't think it's so simple. First, I need at least 3 ZooKeeper nodes to keep failover for one server. Second, after one ZooKeeper node dies, I need to restart all Solr instances. Maybe I should simplify the question: two data centers — how do I replicate two Solr clusters between the two data centers? Without SolrCloud there is the repeater; if I connect all SolrCloud nodes into one cluster spanning both data centers, it will generate a lot of traffic between them, not to mention that I may eventually get a leader elected in the wrong data center. How can I have two Solr clusters and replicate between the two data centers? -- View this message in context: http://lucene.472066.n3.nabble.com/SolrCloud-new-zookeper-node-on-different-ip-replicate-between-two-clasters-tp4039101p4039217.html Sent from the Solr - User mailing list archive at Nabble.com.
unable to get the same results for ab and a b due to whitespace in solr
Hi all, I have a synonyms.txt file where "ab" should give the same results as "a b", but when I search for "ab" it gives 4 results and "a b" gives 104 results. I tried giving "a+b" but I don't know how to pass the + through schema.xml. Please help. I tried setting expand=false when indexing but it is the same. I have gone through the link http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#head-2c461ac74b4ddd82e453dc68fcfc92da77358d46 but it didn't help me. -- View this message in context: http://lucene.472066.n3.nabble.com/unable-to-get-the-same-results-for-ab-and-a-b-due-to-whitespace-in-solr-tp4039212.html Sent from the Solr - User mailing list archive at Nabble.com.
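For reference, a multi-word equivalence is written with a plain space in synonyms.txt, and is generally more reliable when the synonym filter runs at index time, because query parsers often split on whitespace before query-time synonyms get a chance to match. A sketch (terms from the thread):

```text
# "ab" and the two-word form "a b" treated as equivalent.
# Reindex after changing this file if the filter runs at index time.
ab, a b
```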
Re: Trying to understand soft vs hard commit vs transaction log
Shawn, what about 'flush to disk' behaviour on MMapDirectoryFactory? On Fri, Feb 8, 2013 at 11:12 AM, Prakhar Birla wrote: > Great explanation Shawn! BTW soft commited documents will be not be > recovered on JVM crash. > > On 8 February 2013 13:27, Shawn Heisey wrote: > > > On 2/7/2013 9:29 PM, Alexandre Rafalovitch wrote: > > > >> Hello, > >> > >> What actually happens when using soft (as opposed to hard) commit? > >> > >> I understand somewhat very high-level picture (documents become > available > >> faster, but you may loose them on power loss). > >> I don't care about low-level implementation details. > >> > >> But I am trying to understand what is happening on the medium level of > >> details. > >> > >> For example what are stages of a document if we are using all available > >> transaction log, soft commit, hard commit options? It feels like there > is > >> three stages: > >> *) Uncommitted (soft or hard): accessible only via direct real-time get? > >> *) Soft-committed: accessible through all search operatons? (but not on > >> disk? but where is it? in memory?) > >> *) Hard-committed: all the same as soft-committed but it is now on disk > >> > >> Similarly, in performance section of Wiki, it says: "A commit > (including > >> a > >> soft commit) will free up almost all heap memory" - why would soft > commit > >> free up heap memory? I thought it was not flushed to disk. > >> > >> Also, with soft-commits and transaction log enabled, doesn't transaction > >> log allows to replay/recover the latest state after crash? I believe > >> that's > >> what transaction log does for the database. If not, how does one > recover, > >> if at all? > >> > >> And where does openSearcher=false fits into that? Does it cause > >> inconsistent results somehow? > >> > >> I am missing something, but I am not sure what or where. Any points in > the > >> right direction would be appreciated. 
> >> > > > > Let's see if I can answer your questions without giving you incorrect > > information. > > > > New indexed content is not searchable until you open a new searcher, > > regardless of the type of commit that you do. > > > > A hard commit will close the current transaction log and start a new one. > > It will also instruct the Directory implementation to flush to disk. If > > you specify openSearcher=false, then the content that has just been > > committed will NOT be searchable, as discussed in the previous paragraph. > > The existing searcher will remain open and continue to serve queries > > against the same index data. > > > > A soft commit does not flush the new content to disk, but it does open a > > new searcher. I'm sure that the amount of memory available for caching > > this content is not large, so it's possible that if you do a lot of > > indexing with soft commits and your hard commits are too infrequent, > you'll > > end up flushing part of the cached data to disk anyway. I'd love to hear > > from a committer about this, because I could be wrong. > > > > There's a caveat with that 'flush to disk' operation -- the default > > Directory implementation in the Solr example config, which is > > NRTCachingDirectoryFactory, will cache the last few megabytes of indexed > > data and not flush it to disk even with a hard commit. If your commits > are > > small, then the net result is similar to a soft commit. If the server or > > Solr were to crash, the transaction logs would be replayed on Solr > startup, > > recovering that last few megabytes. The transaction log may also recover > > documents that were soft committed, but I'm not 100% sure about that. > > > > To take full advantage of NRT functionality, you can commit as often as > > you like with soft commits. 
On some reasonable interval, say every one > to > > fifteen minutes, you can issue a hard commit with openSearcher set to > > false, to flush things to disk and cycle through transaction logs before > > they get huge. Solr will keep a few of the transaction logs around, and > if > > they are huge, it can take a long time to replay them. You'll want to > > choose a hard commit interval that doesn't create giant transaction logs. > > > > If any of the info I've given here is wrong, someone should correct me! > > > > Thanks, > > Shawn > > > > > > > -- > Regards, > Prakhar Birla > +91 9739868086 >
Re: how-to configure mysql pool connection on Solr Server
Thanks for the help. It's a good idea to configure the datasource pool on Jetty or Tomcat and then reuse it in my custom plugin. This Jetty page explains how to configure different datasources: http://docs.codehaus.org/display/JETTY/DataSource+Examples Thanks again. On 07/02/2013 17:35, Michael Della Bitta wrote: Hello Miguel, If you set up a JNDI datasource in your servlet container, you can use that as your database config. Then you just need to use a pooling datasource: http://wiki.apache.org/solr/DataImportHandlerFaq#How_do_I_use_a_JNDI_DataSource.3F http://dev.mysql.com/tech-resources/articles/connection_pooling_with_connectorj.html Michael Della Bitta Appinions 18 East 41st Street, 2nd Floor New York, NY 10017-6271 www.appinions.com Where Influence Isn’t a Game On Thu, Feb 7, 2013 at 7:20 AM, Miguel wrote: Hi, I need to configure a MySQL connection pool on the Solr server for use in a custom plugin. I saw the DataImportHandler wiki: http://wiki.apache.org/solr/DataImportHandler , but it seems that DataImportHandler opens the connection when the handler is called and closes it when the import finishes, while I need to keep a pool open so I can reuse connections whenever I need them. I could not find anything in the Apache Solr documentation about defining a connection pool to a DB that can be reused from any Solr class. Any ideas? Thanks
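As a concrete example of the JNDI approach Michael suggests, a pooled MySQL datasource in Tomcat's context.xml looks roughly like this (all names, credentials and pool sizes illustrative; Jetty instead uses the jetty.xml syntax from the linked DataSource Examples page):

```xml
<Context>
  <!-- Exposed to the webapp as java:comp/env/jdbc/mydb -->
  <Resource name="jdbc/mydb" auth="Container"
            type="javax.sql.DataSource"
            driverClassName="com.mysql.jdbc.Driver"
            url="jdbc:mysql://localhost:3306/mydb"
            username="solr" password="secret"
            maxActive="10" maxIdle="4" maxWait="10000"/>
</Context>
```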
Re: Trying to understand soft vs hard commit vs transaction log
Great explanation Shawn! BTW soft commited documents will be not be recovered on JVM crash. On 8 February 2013 13:27, Shawn Heisey wrote: > On 2/7/2013 9:29 PM, Alexandre Rafalovitch wrote: > >> Hello, >> >> What actually happens when using soft (as opposed to hard) commit? >> >> I understand somewhat very high-level picture (documents become available >> faster, but you may loose them on power loss). >> I don't care about low-level implementation details. >> >> But I am trying to understand what is happening on the medium level of >> details. >> >> For example what are stages of a document if we are using all available >> transaction log, soft commit, hard commit options? It feels like there is >> three stages: >> *) Uncommitted (soft or hard): accessible only via direct real-time get? >> *) Soft-committed: accessible through all search operatons? (but not on >> disk? but where is it? in memory?) >> *) Hard-committed: all the same as soft-committed but it is now on disk >> >> Similarly, in performance section of Wiki, it says: "A commit (including >> a >> soft commit) will free up almost all heap memory" - why would soft commit >> free up heap memory? I thought it was not flushed to disk. >> >> Also, with soft-commits and transaction log enabled, doesn't transaction >> log allows to replay/recover the latest state after crash? I believe >> that's >> what transaction log does for the database. If not, how does one recover, >> if at all? >> >> And where does openSearcher=false fits into that? Does it cause >> inconsistent results somehow? >> >> I am missing something, but I am not sure what or where. Any points in the >> right direction would be appreciated. >> > > Let's see if I can answer your questions without giving you incorrect > information. > > New indexed content is not searchable until you open a new searcher, > regardless of the type of commit that you do. > > A hard commit will close the current transaction log and start a new one. 
> It will also instruct the Directory implementation to flush to disk. If > you specify openSearcher=false, then the content that has just been > committed will NOT be searchable, as discussed in the previous paragraph. > The existing searcher will remain open and continue to serve queries > against the same index data. > > A soft commit does not flush the new content to disk, but it does open a > new searcher. I'm sure that the amount of memory available for caching > this content is not large, so it's possible that if you do a lot of > indexing with soft commits and your hard commits are too infrequent, you'll > end up flushing part of the cached data to disk anyway. I'd love to hear > from a committer about this, because I could be wrong. > > There's a caveat with that 'flush to disk' operation -- the default > Directory implementation in the Solr example config, which is > NRTCachingDirectoryFactory, will cache the last few megabytes of indexed > data and not flush it to disk even with a hard commit. If your commits are > small, then the net result is similar to a soft commit. If the server or > Solr were to crash, the transaction logs would be replayed on Solr startup, > recovering that last few megabytes. The transaction log may also recover > documents that were soft committed, but I'm not 100% sure about that. > > To take full advantage of NRT functionality, you can commit as often as > you like with soft commits. On some reasonable interval, say every one to > fifteen minutes, you can issue a hard commit with openSearcher set to > false, to flush things to disk and cycle through transaction logs before > they get huge. Solr will keep a few of the transaction logs around, and if > they are huge, it can take a long time to replay them. You'll want to > choose a hard commit interval that doesn't create giant transaction logs. > > If any of the info I've given here is wrong, someone should correct me! 
> > Thanks, > Shawn > > -- Regards, Prakhar Birla +91 9739868086
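The pattern Shawn recommends — frequent soft commits for visibility plus periodic hard commits with openSearcher=false for durability and transaction-log rotation — is typically wired up in solrconfig.xml like this (intervals illustrative):

```xml
<updateHandler class="solr.DirectUpdateHandler2">
  <updateLog>
    <str name="dir">${solr.ulog.dir:}</str>
  </updateLog>
  <!-- Hard commit: flush segments, roll the transaction log,
       keep serving queries from the existing searcher. -->
  <autoCommit>
    <maxTime>300000</maxTime>            <!-- every 5 minutes -->
    <openSearcher>false</openSearcher>
  </autoCommit>
  <!-- Soft commit: open a new searcher so recently indexed
       documents become visible quickly. -->
  <autoSoftCommit>
    <maxTime>1000</maxTime>              <!-- every second -->
  </autoSoftCommit>
</updateHandler>
```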
Re: Updating data
I have a question: what if the id does not exist in the previous data? E.g.: [ { "id":"6", "is_good":{"add":"1"} } ] -- View this message in context: http://lucene.472066.n3.nabble.com/Updating-data-tp4038492p4039190.html Sent from the Solr - User mailing list archive at Nabble.com.
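For reference, the payload in the question uses Solr's atomic-update syntax; the common modifiers look like this (values illustrative). As for the question itself: in Solr 4.x an atomic update against a nonexistent id effectively creates a new document from just the supplied fields, which fails if other required fields are then missing — but that behavior is worth verifying against your version.

```json
[
  {
    "id": "6",
    "is_good": {"add": "1"},
    "views": {"inc": 1},
    "title": {"set": "new title"}
  }
]
```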