Solr Search fails
Hi all. I have been trying to implement a universal search on a field, but somehow it fails. When I run a full import everything is OK, and I can see the indexed field. But when I run a query like universal:Male, it returns no matches. Any ideas? -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-Search-fails-tp2907093p2907093.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Is it possible to use sub-fields or multivalued fields for boosting?
It seems like I will use dismax... I have tried some other ways, but dismax seems the best :)
Re: Solr Search fails
What is your field type and analysis chain? Thanks, Grijesh www.gettinhahead.co.in
Re: Solr Search fails
The type is string, and I use the standard analyzer (I am not sure what you mean by the word "chain").
Use Solr / Lucene to search in a Logfile
Hello, we want to search large log4j log files with Solr / Lucene and find any lines with a specific argument (for example, a given user ID). There should be one Solr input document for each line in the file. The log file grows continuously, and the search index must be refreshed. Has anybody already implemented something like this and can give me a tip? Greetings, Robert
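A minimal Python sketch of the one-document-per-line idea, assuming a log4j `PatternLayout` of `%d{ISO8601} %-5p [%t] %c - %m%n`, timestamps in UTC, and a hypothetical `userid=<value>` token inside the message. The field names (`id`, `timestamp`, `level`, `logger`, `message`, `userid`) are illustrative and would need matching fields in schema.xml. Remembering the byte offset after the last complete line lets a periodic job read only what was appended since its previous run; log rotation is not handled here.

```python
import re

# Assumed log4j layout: %d{ISO8601} %-5p [%t] %c - %m%n
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) +"
    r"(?P<level>[A-Z]+) +\[(?P<thread>[^\]]*)\] (?P<logger>\S+) - (?P<msg>.*)$"
)
# Hypothetical convention: the user id appears as "userid=<value>" in the message.
USERID_RE = re.compile(r"\buserid=(\w+)")


def line_to_doc(path, offset, line):
    """Turn one log line into a Solr-style document dict, or None if it doesn't match."""
    m = LINE_RE.match(line.rstrip("\n"))
    if m is None:
        return None  # e.g. a stack-trace continuation line
    # "2011-05-06 03:01:45,123" -> "2011-05-06T03:01:45.123Z" (assumes UTC logs)
    ts = m.group("ts").replace(" ", "T").replace(",", ".") + "Z"
    doc = {
        "id": f"{path}:{offset}",  # file + byte offset gives a stable unique key
        "timestamp": ts,
        "level": m.group("level"),
        "logger": m.group("logger"),
        "message": m.group("msg"),
    }
    uid = USERID_RE.search(m.group("msg"))
    if uid:
        doc["userid"] = uid.group(1)
    return doc


def new_docs(path, start_offset=0):
    """Yield (next_offset, doc) for complete lines appended since start_offset."""
    offset = start_offset
    with open(path, "rb") as f:
        f.seek(start_offset)
        for raw in f:
            if not raw.endswith(b"\n"):
                break  # last line is still being written; pick it up next run
            doc = line_to_doc(path, offset, raw.decode("utf-8", errors="replace"))
            offset += len(raw)
            if doc is not None:
                yield offset, doc
```

The dicts can then be sent to Solr in batches by whatever update client you use; because `path:offset` is the unique key, re-reading a line overwrites its document instead of duplicating it.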
How to convert date/timestamp to long in data-config.xml
Solr: 1.4.1. There are 1,300,000+ documents in the index. Sorting on a date field with a timestamp leads to an OutOfMemoryError, so we are looking for a way to copy the timestamp as a long value into a field and sort on that field. Can anyone help me with how to convert the timestamp to a long value in data-config.xml? Is there an existing transformer?
Re: Solr Search fails
If its type is string, then you can search only for the exact text, not for any part of the string, and the match is also case-sensitive. Thanks, Grijesh
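To illustrate the point, here is a rough Python model (not Solr code) of the two behaviours: a `string` field indexes the whole value as a single, unanalyzed term, so `universal:Male` only matches a document whose entire field value is exactly `Male`, while a tokenized, lowercased text field matches individual words in any case. `simple_text_terms` is a deliberately crude stand-in for a real analyzer.

```python
import re


def string_field_matches(indexed_value: str, query: str) -> bool:
    """A string field keeps the whole value as one term: exact and case-sensitive."""
    return indexed_value == query


def simple_text_terms(value: str) -> set:
    """Crude stand-in for a text analyzer: split on non-word characters, then lowercase."""
    return {t.lower() for t in re.findall(r"\w+", value)}


def text_field_matches(indexed_value: str, query_term: str) -> bool:
    """A single-term query matches if its lowercased form is one of the indexed terms."""
    return query_term.lower() in simple_text_terms(indexed_value)
```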
Re: Solr Search fails
Well, I have already done that, with both the exact text and a single word from the whole text... nothing changes.
How can I use a Solr-based search engine for my university?
I am a student at Jamia Millia Islamia (http://jmi.ac.in/index.htm), a central university in India. I want to use my search engine for the benefit of students. The university offers undergraduate, graduate, PhD and other courses, including Engineering. Earlier, one of my teachers suggested developing an intranet search (for the LAN), but I have not been able to figure out how to implement it. My university uses Google as its site search tool. I am in the Engineering department, and I see students (including me) using photocopies, previous years' papers, notes, etc. at exam time. People use the internet, or say Google, to learn about any topic that is not included in the book. Please give some valuable suggestions. Thanks - Kumar Anurag
Re: How can I use a Solr-based search engine for my university?
Have you looked at Nutch? Or any other web harvester? That seems to be the closest fit. Paul
Re: Solr Search fails
Please provide your schema so we can see more detail about your problem. Thanks, Grijesh
Re: How can I use a Solr-based search engine for my university?
Use Nutch for your intranet crawling. For more detail, see http://www.lucidimagination.com/blog/2009/03/09/nutch-solr/ Thanks, Grijesh
Re: How to convert date/timestamp to long in data-config.xml
Hi, you can convert the timestamp to a long by writing a custom transformer. But how will that help with the OutOfMemoryError? Any sorting uses the Lucene field cache, which takes a lot of memory when you have this much data. If you can, buy more RAM for your server. Thanks, Grijesh
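The value such a custom transformer would write is simply milliseconds since the Unix epoch (in Java, what `java.util.Date.getTime()` returns). The Python sketch below shows that arithmetic, assuming timestamps without a zone are UTC, plus a back-of-the-envelope size for the sort cache on a `long` field: roughly one 8-byte `long` per document, ignoring JVM and per-segment overhead. It does not measure what the existing date sort costs.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch_millis(ts: datetime) -> int:
    """Milliseconds since 1970-01-01T00:00:00Z, like java.util.Date.getTime()."""
    if ts.tzinfo is None:
        ts = ts.replace(tzinfo=timezone.utc)  # assumption: naive DB timestamps are UTC
    return (ts - EPOCH) // timedelta(milliseconds=1)


def long_sort_cache_bytes(num_docs: int) -> int:
    """Rough size of one long per document (8 bytes each), ignoring overhead."""
    return 8 * num_docs
```

For the 1,300,000 documents mentioned above, that is about 10.4 MB for the long values alone, so the long field is a small, predictable cost; whether it fixes the OutOfMemoryError depends on what else is held in memory.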
Re: How can I use a Solr-based search engine for my university?
In my search engine, Nutch and Solr are both integrated. I have also implemented an auto-crawling process: whenever anyone puts an HTTP link in a given box and submits it, that site is automatically crawled and indexed into Solr. Kumar Anurag
RE: Solr: org.apache.solr.common.SolrException: Invalid Date String:
--- On Fri, 5/6/11, Rohit ro...@in-rev.com wrote:
From: Rohit ro...@in-rev.com
Subject: RE: Solr: org.apache.solr.common.SolrException: Invalid Date String:
To: solr-user@lucene.apache.org
Date: Friday, May 6, 2011, 8:47 AM
Hi Craig, Thanks for the response. Actually, what we need to achieve is to see grouped results based on dates, like:
2011-01-01 23
2011-01-02 14
2011-01-03 40
2011-01-04 10
Now, the records in my table run into millions, and grouping the results by UTC date would not produce the right result, since the results should be grouped by the user's time zone. Is there any way we can achieve this in Solr?
The easiest way may be to create an additional string-typed field and use copyField to populate it (copy the first 10 characters of the (t)date into the string field), and then facet on that string field:
&facet=on&facet.field=SDATE
<field name="DATE" type="tdate" indexed="true" stored="true"/>
<field name="SDATE" type="string" indexed="true" stored="true"/>
<copyField source="DATE" dest="SDATE" maxChars="10"/>
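The copyField trick can be modelled in a few lines of Python. This sketch assumes the date values arrive as ISO-8601 UTC strings such as `2011-01-01T08:00:00Z`, so the first 10 characters copied into `SDATE` are the calendar day. It then counts documents per day the way `facet.field=SDATE` would. It models the behaviour only and is not Solr code.

```python
from collections import Counter


def copy_field(value: str, max_chars: int = 10) -> str:
    """Mimic <copyField ... maxChars="10"/>: keep only the first max_chars characters."""
    return value[:max_chars]


def facet_counts(date_values):
    """Count documents per SDATE value, as faceting on the string field would."""
    return Counter(copy_field(v) for v in date_values)
```

Because the values are UTC, a document at `2011-01-01T23:59:59Z` is counted under 2011-01-01 even though it is already 2 January for a user at UTC+1, which is exactly the time-zone caveat Rohit raised.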
uima fieldMappings and solr dynamicField
Hello, I'd like to use a dynamicField in the feature-to-field mapping of the UIMA update processor. It doesn't seem to be supported currently. Is it a bad idea in terms of how UIMA should be used? If not, I'd like to try a patch. Background: because my UIMA annotator can generate many types of named entities from a text, I don't want to implement that many types, but rather a single type, NamedEntity:
<typeSystemDescription>
  <types>
    <typeDescription>
      <name>com.rondhuit.uima.next.NamedEntity</name>
      <description/>
      <supertypeName>uima.tcas.Annotation</supertypeName>
      <features>
        <featureDescription>
          <name>name</name>
          <description/>
          <rangeTypeName>uima.cas.String</rangeTypeName>
        </featureDescription>
        <featureDescription>
          <name>entity</name>
          <description/>
          <rangeTypeName>uima.cas.String</rangeTypeName>
        </featureDescription>
      </features>
    </typeDescription>
  </types>
</typeSystemDescription>
Sample extracted named entities:
name=PERSON, entity=Barack Obama
name=TITLE, entity=the President
Now, I'd like to map these named entities to Solr fields like this:
PERSON_s:Barack Obama
TITLE_s:the President
Because there can be very many values of name (PERSON, TITLE, etc.), I'd like to use the dynamicField *_s, where * is replaced by the name feature of NamedEntity. I think this is a natural requirement from Solr's point of view, but I'm not sure whether my UIMA annotator implementation is correct. In other words, should I implement a separate type for each entity type (e.g. PersonEntity, TitleEntity, ... instead of NamedEntity)? Thank you! Koji -- http://www.rondhuit.com/en/
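A small Python model of the mapping Koji describes (not the UIMA update processor's actual code): each `NamedEntity` contributes its `entity` text to a field named after its `name` feature plus the `_s` suffix. Solr compares dynamic-field patterns case-sensitively, which `fnmatchcase` mirrors here, so a field called `PERSON_S` would not be caught by `*_s`. If several entities share a name in one document, the field would also need to be multiValued (an assumption about the schema).

```python
from collections import defaultdict
from fnmatch import fnmatchcase


def entities_to_fields(entities, suffix="_s"):
    """Build {solr_field: [values]} from (name, entity) pairs such as ("PERSON", "Barack Obama")."""
    fields = defaultdict(list)
    for name, entity in entities:
        fields[name + suffix].append(entity)  # the '*' in "*_s" becomes the name feature
    return dict(fields)


def matches_dynamic_field(field_name, pattern="*_s"):
    """Case-sensitive glob match, mirroring how a dynamicField pattern is applied."""
    return fnmatchcase(field_name, pattern)
```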
Re: Thoughts on Search Analytics?
1. Reports based on location: group by city / country
2. Total searches performed per hour / week / month
3. Frequently used search keywords
4. Analytics based on search keywords
Regards, Aditya www.findbestopensource.com
On Fri, May 6, 2011 at 3:55 AM, Otis Gospodnetic otis_gospodne...@yahoo.com wrote: Hi, I'd like to solicit your thoughts about search analytics, if you are doing any sort of analysis/reporting of search logs or click streams or anything related.
* Which information or reports do you find the most useful, and why?
* Which reports would you like to have but don't, for whatever reason (you don't have the needed data, it's too hard to produce such reports, or ...)?
* Which tool(s) or service(s) do you use and find the most useful?
I'm preparing a presentation on the topic of search analytics, so I'm trying to solicit opinions, practices, desires, etc. on this topic. Your thoughts would be greatly appreciated. If you could reply directly, that would be great, since this may be a bit OT for the list. Thanks! Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/
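Reports 1 to 3 in Aditya's list are simple aggregations over a search log. A hedged Python sketch, assuming a hypothetical log of `(timestamp, query, city)` records in which the city has already been resolved (for example from the client IP, which is out of scope here); keywords are taken as lowercased whitespace-separated words.

```python
from collections import Counter
from datetime import datetime


def summarize(records, top_n=3):
    """records: iterable of (iso_timestamp, query_string, city) tuples (hypothetical format)."""
    per_hour = Counter()
    per_city = Counter()
    keywords = Counter()
    for ts, query, city in records:
        hour = datetime.fromisoformat(ts).strftime("%Y-%m-%d %H:00")
        per_hour[hour] += 1                    # searches per hour
        per_city[city] += 1                    # searches per location
        keywords.update(query.lower().split())  # frequently used keywords
    return {
        "per_hour": dict(per_hour),
        "per_city": dict(per_city),
        "top_keywords": keywords.most_common(top_n),
    }
```

Weekly or monthly totals follow the same pattern with a different `strftime` bucket.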
Re: How can I use a Solr-based search engine for my university?
Hello Anurag, Google is always there for internet search. You need to support search for your university. My opinion would be: don't crawl the sites. You need only Solr, not Nutch. 1. Provide an interface for university students to upload documents. The documents could be previous years' question papers, notes, e-books, etc. Scan the documents, convert them to PDF and upload them. Providing search over these things would be more valuable than crawling the sites. Regards, Aditya www.findbestopensource.com
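If the uploaded PDFs are indexed with Solr alone, one option is Solr Cell (the `ExtractingRequestHandler`), which extracts text from rich documents. The Python sketch below only prepares the HTTP request; it assumes the handler is registered at `/update/extract` on a Solr instance at `http://localhost:8983/solr` with the Solr Cell libraries installed, and the document id used later is hypothetical. `literal.id` sets the unique key of the extracted document.

```python
from urllib.parse import urlencode
from urllib.request import Request

SOLR_URL = "http://localhost:8983/solr"  # assumption: default example setup


def build_extract_request(doc_id: str, pdf_bytes: bytes, commit: bool = False) -> Request:
    """Prepare (but do not send) a POST of a PDF to the /update/extract handler."""
    params = {"literal.id": doc_id}  # literal.<field> sets a field on the extracted doc
    if commit:
        params["commit"] = "true"
    url = f"{SOLR_URL}/update/extract?{urlencode(params)}"
    # Supplying data makes this a POST; the raw PDF bytes are the request body.
    return Request(url, data=pdf_bytes, headers={"Content-Type": "application/pdf"})
```

Sending it is then `urllib.request.urlopen(req)`. Further `literal.<field>` parameters (for example a course or year field, if your schema has one) can tag each paper for faceting.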
Re: UIMA analysisEngine path
Barry, I understand your need, and I agree with you that it'd be useful to be able to load AEs from the filesystem as well; I created SOLR-2501 [1] to track that requirement. Consider that loading AEs from relative paths, like using relative paths in general, is not good practice, since different environments could make the relative path start at different points in the filesystem; I think a good solution would be to use solr.home as the root of a relative path, because that is a Solr instance/core property. Regards, Tommaso [1]: https://issues.apache.org/jira/browse/SOLR-2501
2011/5/5 Barry Hathaway bhath...@nycap.rr.com Tommaso, Thanks. Now Solr finds the descriptor; however, I think this is very bad practice. Descriptors really aren't meant to be jarred up. They often contain relative paths. For example, in my case I have a directory that looks like:
appassemble
 |- desc
 |- pear
where the AnalysisEngine descriptor contained in desc is an aggregate analysis engine and refers to other analysis engines packaged as installed PEAR files in the pear subdirectory. As such, the descriptor contains relative paths pointing into the pear subdirectory. Grabbing the descriptor from the jar breaks that, since OverridingParamsAEProvider uses the XMLInputSource signature that takes no relative path. Barry
On 5/4/2011 6:16 AM, Tommaso Teofili wrote: Hello Barry, the main AnalysisEngine descriptor defined inside the <analysisEngine> element should be inside one of the jars imported with the <lib> elements. At the moment it cannot be taken from expanded directories, but it should be easy (and indeed useful) to support that by modifying the OverridingParamsAEProvider class [1] at line 57.
Hope this helps, Tommaso [1]: http://svn.apache.org/viewvc/lucene/dev/tags/lucene_solr_3_1/solr/contrib/uima/src/main/java/org/apache/solr/uima/processor/ae/OverridingParamsAEProvider.java?view=markup
2011/5/3 Barry Hathaway bhath...@nycap.rr.com I'm new to Solr and trying to get it to call a UIMA aggregate analysis engine, and not having much luck. The null pointer exception indicates that it can't find the XML file associated with the engine. I have tried a number of combinations of paths in the <analysisEngine> element, but nothing seems to work. In addition, I've put the directory containing the descriptor both on the classpath when starting the server and in a <lib> element in solrconfig.xml. So: What classpath does the <analysisEngine> tag effectively search to locate the descriptor? Do the <lib> entries in solrconfig.xml affect this classpath? Do the engine descriptors have to be in a jar, or can they be in an expanded directory? Thanks in advance. Barry
Re: Why does a query of Chinese characters with brackets become a phrase query by default?
On Thu, May 5, 2011 at 10:00 AM, Yonik Seeley yo...@lucidimagination.com wrote: 2011/5/5 Michael McCandless luc...@mikemccandless.com: The very first thing every Solr app for a non-whitespace language should do is turn off autoGeneratePhraseQueries! Luckily, this is configurable per FieldType... so if it doesn't exist yet, we should come up with a good CJK field type to add to the example schema. +1 Shouldn't we have field types in the example schema for the different languages? I.e., text_zh, text_th, text_en, text_ja, text_nl, etc. Mike http://blog.mikemccandless.com
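Why this matters can be shown with a toy Python model (not Lucene's QueryParser): the query parser splits only on whitespace, so an unspaced Chinese query reaches the analyzer as one chunk; a CJK bigram tokenizer turns that chunk into several tokens; and with `autoGeneratePhraseQueries` on, those tokens become a single phrase query. `cjk_bigrams` is only an approximation of what a CJK bigram tokenizer emits.

```python
def cjk_bigrams(text: str):
    """Overlapping two-character tokens, roughly what a CJK bigram tokenizer emits."""
    if len(text) < 2:
        return [text] if text else []
    return [text[i:i + 2] for i in range(len(text) - 1)]


def parse_whitespace_chunk(chunk: str, field: str, auto_phrase: bool) -> str:
    """Model of how one whitespace-delimited chunk becomes a query after analysis."""
    tokens = cjk_bigrams(chunk)
    if len(tokens) == 1:
        return f"{field}:{tokens[0]}"
    if auto_phrase:
        # All bigrams must appear adjacent and in order.
        return f'{field}:"{" ".join(tokens)}"'
    # Independent terms, combined with the default operator.
    return " ".join(f"{field}:{t}" for t in tokens)
```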
How many UpdateHandlers can a Solr config have?
Hello everyone, just a very basic question, but I haven't been able to find the answer in the Solr wiki: how many updateHandlers can one Solr config have? Just one? Or many? Thank you very much -Julian
Re: DIH for e-mails
Take a look at Transformers, perhaps a custom Transformer. They're surprisingly easy to add. Essentially, if you write your own, it gets a map representing the Solr document. That map contains all of the modifications to the document made by any other Transformers previously defined, and you can freely add/remove fields in the map. DIH will then pass the entire result off to Solr to be indexed. Best, Erick 2011/5/5 m _ 米蟲ы~ fangzhenp...@foxmail.com: I'm using the Data Import Handler to index e-mails. The problem is that I want to add my own field, such as security_number. Does anyone have any idea? Regards, -- James Bond Fang
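In DIH itself, such a transformer would be a Java class (or script) named in the entity's `transformer` attribute. The Python function below only models the row-map manipulation Erick describes, with a hypothetical rule that pulls `security_number` out of an e-mail subject line. The `subject` and `rawHeaders` keys are assumptions about what the entity produces, and `security_number` must also be declared in schema.xml.

```python
import re

# Hypothetical rule: a security number appears in the subject as "SEC-" plus digits.
SEC_RE = re.compile(r"\bSEC-(\d+)\b")


def transform_row(row: dict) -> dict:
    """Model of a DIH transformer step: read fields from the row map, add or remove keys."""
    subject = row.get("subject") or ""
    m = SEC_RE.search(subject)
    if m:
        row["security_number"] = m.group(1)  # new field added to the document
    row.pop("rawHeaders", None)  # fields can be removed too (hypothetical field name)
    return row
```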
Re: Use Solr / Lucene to search in a Logfile
Hi Robert, Have you considered just using Loggly.com? Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/
Re: Thoughts on Search Analytics?
Hi Aditya,
----- Original Message -----
From: findbestopensource findbestopensou...@gmail.com
1. Reports based on location: group by city / country
In other words, much like what one gets in Google Analytics?
2. Total searches performed per hour / week / month
3. Frequently used search keywords
4. Analytics based on search keywords
Could you please elaborate and be more specific about this last one? Thanks! Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/
Re: How can I use a Solr-based search engine for my university?
Thanks Aditya, I appreciate your suggestion. I will implement your suggestions. Besides these, is there any other useful aspect that I may not be taking into account? Thanks a lot. Kumar Anurag
Re: Solr Terms and Date field issues
OK, I'm reaching a little here, but I think there's a pretty good chance this is the issue you're seeing. Sure hope somebody jumps in and corrects me if I'm wrong (hint hint)... I haven't delved into the actual Trie code; this is just from looking with TermsComponent and Luke. Using Solr 1.4.1, BTW. What you're seeing is a consequence of the trie field type with a precision step other than 0. Trie fields with precisionStep > 0 add extra terms to the index to allow more efficient range queries. A hint about this is that your 5 documents with the tdate type produce 16 tokens rather than just 5. If you try your experiment with the date type (which is a trie type with precisionStep=0), you'll see exactly what you expect. So the long and short of it is that Solr's working as expected, and you can use your index without worrying. But if you're trying to do some lower-level term walking, you'll either have to filter stuff out, or copy your dates to something with precisionStep=0 and use that field, or... Best, Erick
On Thu, May 5, 2011 at 9:08 PM, Ahmet Arslan iori...@yahoo.com wrote: It is okay to see weird things in admin/schema.jsp or the terms component with trie-based types. Please see http://search-lucene.com/m/WEfSI1Yi4562/ If you really need the terms component, consider using copyField (tdate to string type).
Please find attached the schema and some test data (test.xml). Thanks for looking at this. Viswa
Date: Thu, 5 May 2011 19:08:31 -0400 Subject: Re: Solr Terms and Date field issues From: erickerick...@gmail.com To: solr-user@lucene.apache.org Hmmm, this is puzzling. If you could come up with a couple of XML files and a schema that illustrate this, I'll see what I can see... Thanks, Erick
On Wed, May 4, 2011 at 7:05 PM, Viswa S svis...@hotmail.com wrote: Erik, I suspected the same, and set up a test instance to reproduce this. The date field I used is set up to capture indexing time; in other words, the schema has a default value of NOW.
However, I have reproduced this issue with fields that do not have defaults, too. On the second one, I did a delete-commit (with expungeDeletes=true) and then an optimize. All other fields show updated terms except the date fields. I have also double-checked whether the Luke handler has any different terms, and it did not. Thanks Viswa
Date: Wed, 4 May 2011 08:17:39 -0400 Subject: Re: Solr Terms and Date field issues From: erickerick...@gmail.com To: solr-user@lucene.apache.org Hmmm, this *looks* like you've changed your schema without re-indexing all your data, so you're getting old (string?) values in that field, but that's just a guess. If this is really happening on a clean index, it's a problem. I'm also going to guess that you're not really deleting the documents you think you are. Are you committing after the deletes? Best, Erick
On Wed, May 4, 2011 at 2:18 AM, Viswa S svis...@hotmail.com wrote: Hello, The terms query for a date field seems to get populated with some weird dates; many of these dates (1970, 2009, 2011-04-23) are not present in the indexed data. Please see the sample data below. I also notice that a delete and optimize does not remove the relevant terms for date fields; the string fields seem to work fine. Thanks Viswa
Results from Terms component:
<int name="2011-05-04T02:01:32.928Z">3479</int>
<int name="2011-05-04T02:00:19.2Z">3479</int>
<int name="2011-05-03T22:34:58.432Z">3479</int>
<int name="2011-04-23T01:36:14.336Z">3479</int>
<int name="2009-03-13T13:23:01.248Z">3479</int>
<int name="1970-01-01T00:00:00Z">3479</int>
<int name="1970-01-01T00:00:00Z">3479</int>
<int name="1970-01-01T00:00:00Z">3479</int>
<int name="1970-01-01T00:00:00Z">3479</int>
<int name="2011-05-04T02:01:34.592Z">265</int>
Result from facet component, rounded to seconds:
<lst name="InsertTime">
<int name="2011-05-04T02:01:32Z">1</int>
<int name="2011-05-04T02:01:33Z">1148</int>
<int name="2011-05-04T02:01:34Z">2333</int>
<str name="gap">+1SECOND</str>
<date name="start">2011-05-03T06:14:14Z</date>
<date name="end">2011-05-04T06:14:14Z</date>
</lst>
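Erick's explanation can be checked with a simplified Python model of trie terms, assuming `tdate` uses `precisionStep=6` as in the example schema. Each date is stored as milliseconds since the epoch, and for every shift `0, 6, ..., 60` an extra term is indexed with the low `shift` bits zeroed. The model ignores Lucene's sign-bit flip and prefix coding, which do not change the shifted values for post-1970 dates.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_millis(dt: datetime) -> int:
    return (dt - EPOCH) // timedelta(milliseconds=1)


def from_millis(ms: int) -> datetime:
    return EPOCH + timedelta(milliseconds=ms)


def trie_terms(ms: int, precision_step: int = 6, bits: int = 64):
    """(shift, value with the low `shift` bits zeroed) for each indexed precision level."""
    return [(shift, (ms >> shift) << shift) for shift in range(0, bits, precision_step)]


def decoded_terms(dt: datetime, precision_step: int = 6):
    """Each indexed term decoded back into a date, as a terms listing would show it."""
    return [(shift, from_millis(v)) for shift, v in trie_terms(to_millis(dt), precision_step)]
```

For the first timestamp in Viswa's listing, the shift-30 and shift-36 terms decode to `2011-04-23T01:36:14.336Z` and `2009-03-13T13:23:01.248Z`, and the four terms at shifts 42 to 60 all decode to `1970-01-01T00:00:00Z`, matching the "weird dates" in the terms output. These lower-precision terms are shared by many documents, which is why they show large counts.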
Re: Testing the limits of non-Java Solr
You've hit it right on the head... if you can use the standard analyzers/filters/etc., you're in good shape. You have to process the output (XML, JSON, whatever), as Otis says, but that's in whatever language your app server uses. But when was the last time you were motivated to write a blog post like "just used the package and it all worked" :)? Perhaps one of the things you're seeing is that people are motivated to write about the nifty parts of what they do... Coupled with the fact that people write to the users' list exactly because they can't make the standard stuff do their particular task. It's nice to know you *can* extend it with plugins for those gnarly situations, though. So I say go for it! Best, Erick
On Thu, May 5, 2011 at 6:28 PM, Jack Repenning jrepenn...@collab.net wrote: What's the probability that I can build a non-trivial Solr app without writing any Java? I've been planning to use Solr, Lucene, and existing plug-ins, and sort of hoping not to write any Java (the app itself is Ruby / Rails). The docs (such as http://wiki.apache.org/solr/FAQ) seem encouraging. [I *can* write Java, but my planning's all been "no Java".] I'm just beginning the design work in earnest, and I suddenly notice that it seems every mail thread, blog, or example starts out Java-free but somehow ends up involving Java code. I'm not sure I yet understand all these snippets; conceivably some of the Java I see could just as easily be written in another language, but it makes me wonder. Is it realistic to plan a sizable Solr application without some Java programming? I know, I know, I know: everything depends on the details. I'd be interested even in anecdotes: has anyone ever achieved this before? Also, what are the clues I should look for that I need to step into the Java realm?
I understand, for example, that it's possible to write filters and tokenizers to do things not available in any standard one; in this case, the clue would be "I can't find what I want in the standard list", I guess. Are there other things I should look for? -==- Jack Repenning Technologist Codesion Business Unit CollabNet, Inc. 8000 Marina Boulevard, Suite 600 Brisbane, California 94005 office: +1 650.228.2562 twitter: http://twitter.com/jrep
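As Erick says, the non-Java part is mostly plain HTTP plus parsing the response. A Python sketch (standing in for the Ruby app) that builds a `/select` URL with `wt=json` and reads `numFound` and `docs` from the response; the base URL `http://localhost:8983/solr/select` is an assumption about a default single-core setup.

```python
import json
from urllib.parse import urlencode

SOLR_SELECT = "http://localhost:8983/solr/select"  # assumption: default single-core URL


def select_url(q: str, rows: int = 10, **extra) -> str:
    """Build a /select URL that asks Solr for JSON output."""
    params = {"q": q, "rows": rows, "wt": "json"}
    params.update(extra)  # e.g. fl="id,title", fq=...
    return f"{SOLR_SELECT}?{urlencode(params)}"


def parse_docs(body: str):
    """Return (numFound, docs) from a standard JSON response body."""
    resp = json.loads(body)["response"]
    return resp["numFound"], resp["docs"]
```

Fetching is then `urllib.request.urlopen(select_url(...)).read().decode("utf-8")`, with no Java involved on the client side.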
Re: UIMA analysisEngine path
Thanks for creating the case to track the requirement. I really don't agree with your comments about using relative paths, though. The only way to specify the AEs making up an aggregate AE is to use an <import location="..."/>, leaving you to choose an absolute path, a relative path, or a URL. None of these is that great. You are not allowed to use environment variables. The UIMA documentation clearly states that relative paths are relative to the location of the descriptor containing the import. That is the way XMLInputSource works. Solr's OverridingParamsAEProvider, in my opinion, is clearly broken. If it wants to pull a descriptor out of a jar, then it MUST call XMLInputSource using the signature in which both the descriptor name AND the path to the containing jar are passed in, so that XMLInputSource knows how to process the descriptor. Barry
Re: UIMA analysisEngine path
Hello Barry,

2011/5/6 Barry Hathaway bhath...@nycap.rr.com
> Thanks for creating the case to track the requirement. I really don't agree with your comments about using relative paths though. The only way to specify the AE's making up an aggregate AE is to use a import location ..., leaving you to choose either a absolute, relative, or a URL.

This is not true: you can also use <import name="..."/>, which handles classpaths and datapaths; have a look here: http://uima.apache.org/d/uimaj-2.3.1/references.html#ugr.ref.xml.component_descriptor.imports

> All of these are not that great. You are not allowed to use environment variables. The UIMA documentation clearly states that relative paths are relative with respect to the location of the descriptor containing the import.

I know that; I meant the relative path used to retrieve the main aggregate AE from Solr, not the relative path used in <import location="..."/> to get the delegate AEs from the aggregate AE. I am not proposing to introduce environment variables; I am just saying that if we want to support relative paths, then I think it'd be a nice idea to choose where the relative file URL starts.

> That is the way in which XMLInputSource works. Solr's OverridingParamsAEProvider, in my opinion, is clearly broken. If it wants to suck a descriptor out of a jar then it MUST call XMLInputSource using the signature in with both the descriptor name AND the path to the jar containing are passed in so that XMLInputSource knows how to process the descriptor.

The XMLInputSource offers a URL-based constructor which is useful to serve both scenarios [1].
I am OK with also supporting descriptors retrieved from the filesystem; this was not taken into account in the first implementation, since many existing annotators already deliver descriptors bundled inside their jars/PEARs, but this addition sounds like a good improvement, so, basically, let's do it ;-) Regards, Tommaso

[1] : http://uima.apache.org/d/uimaj-2.3.1/api/org/apache/uima/util/XMLInputSource.html#XMLInputSource(java.net.URL)
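To make the two kinds of relative path in this thread concrete (the main descriptor located relative to solr.home, as Tommaso proposes, and delegate imports located relative to the descriptor that contains them, as Barry describes), here is a minimal sketch using only java.nio. The class and method names, and the /opt/solr, Aggregate.xml and Delegate.xml paths, are hypothetical; this is not how Solr or UIMA actually resolve paths. The resulting file: URL is the kind of value that could be handed to the URL-based XMLInputSource constructor mentioned above.

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.nio.file.Path;
import java.nio.file.Paths;

/**
 * Hypothetical helper, not part of Solr or UIMA: anchors a configured
 * descriptor path at solr.home and resolves a relative import location
 * against the directory of the descriptor that contains it.
 */
class DescriptorPaths {

    /** Absolute paths are kept; relative ones are resolved against solr.home. */
    static Path resolveDescriptor(Path solrHome, String configured) {
        Path p = Paths.get(configured);
        return (p.isAbsolute() ? p : solrHome.resolve(p)).normalize();
    }

    /** A relative import location is resolved against the importing descriptor's directory. */
    static Path resolveImport(Path descriptor, String importLocation) {
        Path dir = descriptor.getParent();
        Path target = (dir == null) ? Paths.get(importLocation) : dir.resolve(importLocation);
        return target.normalize();
    }

    /** A file: URL of the kind that could be passed to a URL-based XMLInputSource constructor. */
    static URL toUrl(Path p) throws MalformedURLException {
        return p.toUri().toURL();
    }

    public static void main(String[] args) throws MalformedURLException {
        Path solrHome = Paths.get("/opt/solr"); // hypothetical solr.home (Unix-style path)
        Path aggregate = resolveDescriptor(solrHome, "appassemble/desc/Aggregate.xml");
        Path delegate = resolveImport(aggregate, "../pear/Delegate.xml");
        System.out.println(aggregate); // /opt/solr/appassemble/desc/Aggregate.xml
        System.out.println(delegate);  // /opt/solr/appassemble/pear/Delegate.xml
        System.out.println(toUrl(aggregate));
    }
}
```

Anchoring at solr.home keeps the lookup independent of the JVM's working directory, which is the portability concern Tommaso raises.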
Michigan Information Retrieval Enthusiasts Group Quarterly Meetup - May 19th 2011
Our next IR Meetup is at Cengage Learning on May 19, 2011. Please RSVP here: http://www.meetup.com/Michigan-Information-Retrieval-Enthusiasts-Group/events/17567795/

Presentations:

1. Bayesian Language Model
This talk presents a Bayesian language model, originally described by (Teh 2006), which uses a hierarchical Pitman-Yor process to describe the distribution of n-grams in an n-gram language model and which allows for a Bayesian back-off and smoothing strategy. The language model, which assumes a power-law prior over the n-gram space, compares favorably with language models based upon state-of-the-art empirical n-gram smoothing techniques. Because the background information required to understand the model is somewhat difficult, that material, most of which does not appear in (Teh 2006), is also presented in some detail. In particular, background information related to the Dirichlet distribution and the Dirichlet process is given. The Dirichlet process is then related to the Pitman-Yor process, and the hierarchical Pitman-Yor process is also presented.

2. Using GATE for Word Polarity in Context Classification
GATE (General Architecture for Text Engineering) is open source software for creating text processing workflows. Core GATE includes tools for solving many text engineering issues: modeling and persistence of specialized data structures; measurement, evaluation, benchmarking; visualization and editing of annotations, ontologies, parse trees, etc.; extraction of training instances for machine learning; and pluggable machine learning implementations. This tutorial will show how to use GATE for advanced machine learning applications. Detecting word polarity in context will be used as an example to show some of the GATE features.
The tutorial project is based on the latest sentiment analysis research, specifically the work by Theresa Wilson, Janyce Wiebe and Paul Hoffmann, "Recognizing Contextual Polarity: An Exploration of Features for Phrase-Level Sentiment Analysis" (2009). Using different features (words, part of speech, negations, etc.), an SVM classifier is trained and evaluated. Thank you, Ivan Provalov
RE: Solr: org.apache.solr.common.SolrException: Invalid Date String:
Thanks Ahmet, let me give this a shot. Regards, Rohit

-----Original Message-----
From: Ahmet Arslan [mailto:iori...@yahoo.com]
Sent: 06 May 2011 15:39
To: solr-user@lucene.apache.org
Subject: RE: Solr: org.apache.solr.common.SolrException: Invalid Date String:

--- On Fri, 5/6/11, Rohit ro...@in-rev.com wrote:
From: Rohit ro...@in-rev.com Subject: RE: Solr: org.apache.solr.common.SolrException: Invalid Date String: To: solr-user@lucene.apache.org Date: Friday, May 6, 2011, 8:47 AM
Hi Craig, Thanks for the response. Actually, what we need to achieve is to see group-by results based on dates, like:
2011-01-01 23
2011-01-02 14
2011-01-03 40
2011-01-04 10
Now, the records in my table run into millions, and grouping the result based on the UTC date would not produce the right result, since the result should be grouped on the user's timezone. Is there any way we can achieve this in Solr?

The easiest way may be to create an additional string-typed field and use copyField to populate it (copy the first 10 characters of the (t)date into the string), then facet on that string field:
facet=on&facet.field=SDATE
<field name="DATE" type="tdate" indexed="true" stored="true"/>
<field name="SDATE" type="string" indexed="true" stored="true"/>
<copyField source="DATE" dest="SDATE" maxChars="10"/>
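One caveat on the copyField approach: the first 10 characters of an ISO-8601 UTC timestamp are the UTC calendar day, so the SDATE buckets are UTC days rather than the user's local days that Rohit asked about. The small java.time sketch below shows the difference for one instant. It assumes the date reaches Solr as an ISO-8601 UTC string such as 2011-01-01T20:00:00Z; the class name and sample value are hypothetical.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneId;

class DayBuckets {

    /** Roughly what copyField maxChars="10" keeps from an ISO-8601 UTC value: the UTC day. */
    static String utcDayPrefix(String isoUtc) {
        return isoUtc.substring(0, 10);
    }

    /** The calendar day the same instant falls on in the user's time zone. */
    static LocalDate localDay(String isoUtc, ZoneId userZone) {
        return Instant.parse(isoUtc).atZone(userZone).toLocalDate();
    }

    public static void main(String[] args) {
        String indexed = "2011-01-01T20:00:00Z"; // hypothetical DATE value
        System.out.println(utcDayPrefix(indexed));                        // 2011-01-01
        System.out.println(localDay(indexed, ZoneId.of("Asia/Kolkata"))); // 2011-01-02
    }
}
```

For a user in UTC+05:30, this event belongs to January 2 locally but would be counted under 2011-01-01 in the SDATE facet.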
Replication question
I have replication set up with <str name="pollInterval">00:00:60</str>. I assumed that meant it would poll the master for updates once a minute, but my logs make it look like it is trying to sync up almost constantly. Below is an example of my log from just 1 minute in time. Am I reading this wrong? This is from one of the slaves; I have 2 of them, so my master's log file is double this. Is this normal?
May 6, 2011 1:34:14 PM org.apache.solr.handler.SnapPuller fetchLatestIndex INFO: Slave in sync with master.
May 6, 2011 1:34:14 PM org.apache.solr.handler.SnapPuller fetchLatestIndex INFO: Slave in sync with master.
May 6, 2011 1:34:14 PM org.apache.solr.handler.SnapPuller fetchLatestIndex INFO: Slave in sync with master.
May 6, 2011 1:34:14 PM org.apache.solr.handler.SnapPuller fetchLatestIndex INFO: Slave in sync with master.
May 6, 2011 1:34:14 PM org.apache.solr.handler.SnapPuller fetchLatestIndex INFO: Slave in sync with master.
May 6, 2011 1:34:14 PM org.apache.solr.handler.SnapPuller fetchLatestIndex INFO: Slave in sync with master.
May 6, 2011 1:35:05 PM org.apache.solr.handler.SnapPuller fetchLatestIndex INFO: Slave in sync with master.
May 6, 2011 1:35:05 PM org.apache.solr.handler.SnapPuller fetchLatestIndex INFO: Slave in sync with master.
May 6, 2011 1:35:05 PM org.apache.solr.handler.SnapPuller fetchLatestIndex INFO: Slave in sync with master.
May 6, 2011 1:35:05 PM org.apache.solr.handler.SnapPuller fetchLatestIndex INFO: Slave in sync with master.
May 6, 2011 1:35:05 PM org.apache.solr.handler.SnapPuller fetchLatestIndex INFO: Slave in sync with master.
May 6, 2011 1:35:05 PM org.apache.solr.handler.SnapPuller fetchLatestIndex INFO: Slave in sync with master.
May 6, 2011 1:35:05 PM org.apache.solr.handler.SnapPuller fetchLatestIndex INFO: Slave in sync with master.
May 6, 2011 1:35:05 PM org.apache.solr.handler.SnapPuller fetchLatestIndex INFO: Slave in sync with master.
May 6, 2011 1:35:05 PM org.apache.solr.handler.SnapPuller fetchLatestIndex INFO: Slave in sync with master.
May 6, 2011 1:35:05 PM org.apache.solr.handler.SnapPuller fetchLatestIndex INFO: Slave in sync with master.
May 6, 2011 1:35:05 PM org.apache.solr.handler.SnapPuller fetchLatestIndex INFO: Slave in sync with master.
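The pollInterval value is written as HH:mm:ss. The sketch below shows the conversion to seconds, assuming the three fields are simply summed (hours × 3600 + minutes × 60 + seconds). The class is hypothetical and is not Solr's own parser.

```java
/** Hypothetical converter for an "HH:mm:ss" interval string; not Solr's parser. */
class PollInterval {

    /** Sums the three fields, so "00:00:60" and "00:01:00" both come out as 60 seconds. */
    static long toSeconds(String hhmmss) {
        String[] parts = hhmmss.split(":");
        if (parts.length != 3) {
            throw new IllegalArgumentException("expected HH:mm:ss but got: " + hhmmss);
        }
        long hours = Long.parseLong(parts[0].trim());
        long minutes = Long.parseLong(parts[1].trim());
        long seconds = Long.parseLong(parts[2].trim());
        return hours * 3600 + minutes * 60 + seconds;
    }

    public static void main(String[] args) {
        System.out.println(toSeconds("00:00:60")); // 60
        System.out.println(toSeconds("00:03:00")); // 180
    }
}
```

If Solr reads the value the same way, 00:00:60 means a one-minute interval. By itself, that interval does not account for several "in sync" lines logged in the same second.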
How to pass resultset to stored procedure ? DataImportHandler
Hi, I am new to Solr. I wrote a stored procedure in Oracle and tried calling it from Solr, but the procedure is not getting executed, because it expects a result set (a cursor) as an OUT parameter.

CREATE OR REPLACE PROCEDURE GETSEARCHQUERY(p_cursor in out sys_refcursor) AS
BEGIN
  OPEN p_cursor FOR select * from X where X.id = 10730;
END GETSEARCHQUERY;

Dataconfig.xml:
<entity name="coreY" transformer="TemplateTransformer" pk="id" query="{call GETSEARCHQUERY()}" deltaQuery="{call GETSEARCHQUERY()}">
</entity>

Can someone help me? Thanks, Binoy
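For comparison, this is how plain JDBC consumes a cursor returned through a procedure parameter. The call string needs a "?" placeholder for that parameter, and the caller has to register it. The {call GETSEARCHQUERY()} used above passes no argument, so it does not match the procedure's one-parameter signature. The sketch makes several assumptions. It needs a driver that supports JDBC 4.2's Types.REF_CURSOR; older Oracle drivers use oracle.jdbc.OracleTypes.CURSOR instead. It binds the IN OUT parameter as OUT only. It is not a DataImportHandler configuration and does not show how DIH could be made to read the cursor.

```java
import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Types;

class RefCursorCall {
    // One "?" placeholder for the procedure's cursor parameter; "{call GETSEARCHQUERY()}" supplies none.
    static final String CALL = "{call GETSEARCHQUERY(?)}";

    /** Calls the procedure and counts the rows of the cursor it opens. */
    static int countRows(Connection conn) throws SQLException {
        try (CallableStatement cs = conn.prepareCall(CALL)) {
            // Assumes JDBC 4.2 REF_CURSOR support in the driver (see lead-in).
            cs.registerOutParameter(1, Types.REF_CURSOR);
            cs.execute();
            try (ResultSet rs = cs.getObject(1, ResultSet.class)) {
                int rows = 0;
                while (rs.next()) {
                    rows++;
                }
                return rows;
            }
        }
    }
}
```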
Re: Programmatic restructuring of a Solr cloud
Hello Jan, Thank you very much for the answer. Unfortunately, we don't use Amazon, and I doubt we will be able to persuade the customer to switch to it. Moreover, the amount of data will not allow us to store everything on a single master. However, having considered your design, I am starting to see the problem in a new light, so maybe it will still prove helpful ;) In the meantime, I'm still looking for other solutions... Best regards, Sergey Sazonov.

On 05/05/11 15:07, Jan Høydahl wrote: Hi, One approach if you're using Amazon is using BeanStalk:
* Create one master with 12 cores, named jan, feb, mar, etc.
* Every month, you clear the current month's index and switch indexing to it. You will only have one master, because you're only indexing to one month at a time.
* For each of the 12 months, set up an Amazon BeanStalk instance with a Solr replica pointing to its master. This way, Amazon will spin off replicas as needed. NOTE: Your replica could still be located at /solr/select even if it replicates from /solr/may/replication
* You only query the replicas, and the client will control whether to query one or more shards: shards=jan.elasticbeanstalk.com/solr,feb.elasticbeanstalk.com/solr,mar.elasticbeanstalk.com/solr
After this is set up, you have 0 config to worry about :)
-- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com

On 5. mai 2011, at 14.03, Sergey Sazonov wrote: Dear Solr Experts, First of all, I would like to thank you for your patience when answering questions of those who are less experienced. And now to the main topic: I would like to learn whether it is possible to restructure a Solr cloud programmatically. Let me describe the system we are designing to make the requirements clear. The indexed documents are certain log entries. We are planning to shard them by month, and only keep the last 12 months in the index. We are going to replicate each shard across several servers.
Now, the user is always required to search within a single month (= shard). Most importantly, we expect an absolute majority of the requests to query the current month, with only a minor load on the previous months. In order to utilise the cluster most efficiently, we would like a majority of the servers to contain replicas of the current month data, and have only one or two servers per older month. To this end, we are planning to have a set of slaves that migrate from master to master, depending on which master holds the data for the current month. When a new month starts, those slaves have to be reconfigured to hold the new shard and to replicate from the new master (their old master now holding the data for the previous month). Since this operation has to be done every month, we are naturally considering automating it. So my question is whether anyone has faced a similar problem before, and what is the best way to solve it. We are not committed to any solution, or even architecture, so feel free to propose different solutions. The only requirement is that a majority of the servers should be able to serve requests to the current month at any given moment. Thank you in advance for your answers. Best regards, Sergey Sazonov.
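The month bookkeeping behind Jan's layout can be scripted. Twelve cores named after the months are reused each year, the core for the new month is cleared and indexed into, and queries list whichever monthly shards they need. The sketch below computes a core name and a shards value. It assumes lower-case three-letter core names (jan ... dec) and the host pattern from Jan's example; both helpers are hypothetical and nothing here reconfigures Solr itself.

```java
import java.time.YearMonth;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical helper for the month-named core layout: jan, feb, ..., dec. */
class MonthShards {

    /** Core name for a month: the first three letters of its English name, lower-cased. */
    static String coreName(YearMonth ym) {
        return ym.getMonth().name().substring(0, 3).toLowerCase(Locale.ROOT);
    }

    /**
     * Value for the shards parameter covering `months` consecutive months ending at `current`.
     * hostPattern is a String.format pattern with one %s for the core name (assumed layout).
     */
    static String shardsParam(YearMonth current, int months, String hostPattern) {
        List<String> shards = new ArrayList<>();
        for (int i = 0; i < months; i++) {
            shards.add(String.format(hostPattern, coreName(current.minusMonths(i))));
        }
        return String.join(",", shards);
    }

    public static void main(String[] args) {
        YearMonth now = YearMonth.of(2011, 5);
        // The core to clear and index into when May starts (under this layout it held May 2010).
        System.out.println(coreName(now)); // may
        System.out.println(shardsParam(now, 3, "%s.elasticbeanstalk.com/solr"));
    }
}
```

Because Sergey's users always search a single month, a query would normally name just coreName(YearMonth.from(date)). The multi-month form corresponds to Jan's shards example.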
RE: Solr Terms and Date field issues
Thanks Erick, Ahmet, that helps.

Date: Fri, 6 May 2011 09:25:11 -0400 Subject: Re: Solr Terms and Date field issues From: erickerick...@gmail.com To: solr-user@lucene.apache.org
OK, I'm reaching a little here, but I think it's got a pretty good chance of being the issue you're seeing. Sure hope somebody jumps in and corrects me if I'm wrong (hint hint)... I haven't delved into the actual Trie code; this is just from looking with TermsComponent and Luke. Using Solr 1.4.1 BTW. What you're seeing is a consequence of the trie field type with a precision step other than 0. Trie fields with precisionStep > 0 add extra stuff to the index to allow more efficient range queries. A hint about this is that your 5 documents with the tdate type produce 16 tokens rather than just 5. If you try your experiment with the date type (which is a trie type with precisionStep=0), you'll see exactly what you expect. So the long and short of it is that Solr's working as expected, and you can use your index without worrying. But if you're trying to do some lower-level term walking, you'll either have to filter stuff out, copy your dates to something with precisionStep=0 and use that field, or... Best Erick

On Thu, May 5, 2011 at 9:08 PM, Ahmet Arslan iori...@yahoo.com wrote: It is okay to see weird things in admin/schema.jsp or the terms component with trie-based types. Please see http://search-lucene.com/m/WEfSI1Yi4562/ If you really need the terms component, consider using copyField (tdate to string type).

Please find attached the schema and some test data (test.xml). Thanks for looking into this. Viswa

Date: Thu, 5 May 2011 19:08:31 -0400 Subject: Re: Solr Terms and Date field issues From: erickerick...@gmail.com To: solr-user@lucene.apache.org Hmmm, this is puzzling. If you could come up with a couple of xml files and a schema that illustrate this, I'll see what I can see...
Thanks, Erick

On Wed, May 4, 2011 at 7:05 PM, Viswa S svis...@hotmail.com wrote: Erik, I suspected the same, and set up a test instance to reproduce this. The date field I used is set up to capture indexing time; in other words, the schema has a default value of NOW. However, I have reproduced this issue with fields which do not have defaults too. On the second one, I did a delete-commit (with expungeDeletes=true) and then an optimize. All other fields show updated terms except the date fields. I have also double-checked whether the Luke handler has any different terms, and it did not. Thanks Viswa

Date: Wed, 4 May 2011 08:17:39 -0400 Subject: Re: Solr Terms and Date field issues From: erickerick...@gmail.com To: solr-user@lucene.apache.org Hmmm, this *looks* like you've changed your schema without re-indexing all your data, so you're getting old (string?) values in that field, but that's just a guess. If this is really happening on a clean index, it's a problem. I'm also going to guess that you're not really deleting the documents you think. Are you committing after the deletes? Best Erick

On Wed, May 4, 2011 at 2:18 AM, Viswa S svis...@hotmail.com wrote: Hello, The terms query for a date field seems to get populated with some weird dates; many of these dates (1970, 2009, 2011-04-23) are not present in the indexed data. Please see sample data below. I also notice that a delete and optimize does not remove the relevant terms for date fields; the string fields seem to work fine.
Thanks Viswa

Results from Terms component:
<int name="2011-05-04T02:01:32.928Z">3479</int>
<int name="2011-05-04T02:00:19.2Z">3479</int>
<int name="2011-05-03T22:34:58.432Z">3479</int>
<int name="2011-04-23T01:36:14.336Z">3479</int>
<int name="2009-03-13T13:23:01.248Z">3479</int>
<int name="1970-01-01T00:00:00Z">3479</int>
<int name="1970-01-01T00:00:00Z">3479</int>
<int name="1970-01-01T00:00:00Z">3479</int>
<int name="1970-01-01T00:00:00Z">3479</int>
<int name="2011-05-04T02:01:34.592Z">265</int>

Result from facet component, rounded by seconds:
<lst name="InsertTime">
<int name="2011-05-04T02:01:32Z">1</int>
<int name="2011-05-04T02:01:33Z">1148</int>
<int name="2011-05-04T02:01:34Z">2333</int>
<str name="gap">+1SECOND</str>
<date name="start">2011-05-03T06:14:14Z</date>
<date name="end">2011-05-04T06:14:14Z</date>
</lst>
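Erick's explanation can be checked against the terms listed above with a simplified model of trie indexing. In this model, each value is also indexed at lower precisions (value >>> shift for shift = 0, step, 2·step, ...). A lower-precision term read back as a full date has its low-order bits zeroed. The sketch assumes precisionStep=6, as on tdate in the Solr example schema, and it is not Lucene's actual byte encoding. With those assumptions, 2011-05-04T02:01:32.928Z at shift 36 reads back as 2009-03-13T13:23:01.248Z, and shifts 42 and above read back as 1970-01-01T00:00:00Z, which matches the "weird" terms Viswa reported.

```java
import java.time.Instant;
import java.util.HashSet;
import java.util.Set;

/**
 * Simplified model of trie indexing (not Lucene's byte-level encoding):
 * each value is indexed once per shift 0, step, 2*step, ... below 64,
 * as the pair (shift, value >>> shift).
 */
class TrieTerms {

    static Set<String> terms(long[] values, int precisionStep) {
        // precisionStep=0 in a Solr schema means "full precision only": one term per value.
        int step = (precisionStep <= 0 || precisionStep >= 64) ? 64 : precisionStep;
        Set<String> out = new HashSet<>();
        for (long v : values) {
            for (int shift = 0; shift < 64; shift += step) {
                out.add(shift + ":" + (v >>> shift));
            }
        }
        return out;
    }

    /** What a lower-precision term looks like when read back as a full value: low bits zeroed. */
    static long lowPrecision(long value, int shift) {
        return (value >>> shift) << shift;
    }

    public static void main(String[] args) {
        long v = Instant.parse("2011-05-04T02:01:32.928Z").toEpochMilli();
        System.out.println(terms(new long[] {v}, 6).size());           // 11
        System.out.println(Instant.ofEpochMilli(lowPrecision(v, 36))); // 2009-03-13T13:23:01.248Z
        System.out.println(Instant.ofEpochMilli(lowPrecision(v, 42))); // 1970-01-01T00:00:00Z
    }
}
```

Because nearby timestamps share their high-shift terms, the number of distinct terms grows more slowly than documents × terms-per-value. The same extra terms also explain why these "dates" do not disappear with individual deletes.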
Re: Use Solr / Lucene to search in a Logfile
Hi, thanks for the reply. I did not know that. Is there still a way to use Solr or Lucene? Apache Nutch would not be bad either. Could I maybe write a customized DIH? Greetings, Robert 2011/5/6 Otis Gospodnetic otis_gospodne...@yahoo.com: Loggly.com
Re: Field names with a period (.)
: I remember the same, except I think I've seen the recommendation that you
: make all the letters lower-case. As I remember, there are some interesting
: edge cases that you might run into later with upper case.

i can't think of *any* reason why upper case characters in a field name would cause you problems.

In general, the low level guts of Solr don't care what characters you use in a field name. where people run into problems is that some specific features of solr either have limitations in what they can deal with in a field name, or work in some way that makes certain characters extremely frustrating to use.

the simplest example of frustration is in needing to URL escape any special characters when building a URL that contains a field name (ie: as the value of a facet.field param for example)

an example of a hard limitation is sorting: the sort param expects whitespace separated lists of fieldname asc|desc pairs -- if your field name contains whitespace, that can screw you up.

the lucene QueryParser is another situation where punctuation and whitespace are significant, so having those characters in your field names may cause you problems (i think in most cases they can be backslash escaped, but i'm not certain)

As far as the specific question about "." in field names -- i can't think of any feature that would break on that ... the only thing that comes to mind as a possibility is using per-field override params (ie: if the field name is foo.bar and you want to use facet.field=foo.bar&f.foo.bar.facet.prefix=xxx) ... but even then, i'm pretty sure it would work fine (you and the other people maintaining your code might get really confused however)

-Hoss
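Hoss's two examples can be made concrete with a toy version of each. The first is a parser for "field direction" clauses that splits on whitespace, which shows why a field name containing a space cannot be expressed while a dotted name is harmless. The second builds a per-field override parameter name for a dotted field. Both helpers are hypothetical simplifications for illustration, not Solr's actual sort parsing or parameter handling.

```java
import java.util.ArrayList;
import java.util.List;

class SortSpec {

    /** Simplified, hypothetical parser for "field dir, field dir"; not Solr's own code. */
    static List<String[]> parse(String sort) {
        List<String[]> out = new ArrayList<>();
        for (String clause : sort.split(",")) {
            String[] parts = clause.trim().split("\\s+");
            // Exactly two tokens are expected: a field name and asc|desc.
            if (parts.length != 2 || !(parts[1].equals("asc") || parts[1].equals("desc"))) {
                throw new IllegalArgumentException("cannot parse sort clause: '" + clause.trim() + "'");
            }
            out.add(parts);
        }
        return out;
    }

    /** Per-field override parameter name, e.g. f.foo.bar.facet.prefix for field foo.bar. */
    static String perFieldParam(String field, String param) {
        return "f." + field + "." + param;
    }

    public static void main(String[] args) {
        System.out.println(parse("foo.bar desc").get(0)[0]);             // foo.bar
        System.out.println(perFieldParam("foo.bar", "facet.prefix"));   // f.foo.bar.facet.prefix
        try {
            parse("my field asc");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The per-field name works mechanically, but a reader has to know where the field name ends inside f.foo.bar.facet.prefix. That is the confusion Hoss warns about.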
Re: Field names with a period (.)
On Sat, May 7, 2011 at 1:29 AM, Chris Hostetter hossman_luc...@fucit.org wrote: : I remember the same, except I think I've seen the recommendation that you : make all the letters lower-case. As I remember, there are some interesting : edge cases that you might run into later with upper case. i can't think of *any* reason why upper case character names i na field would cause you problems. [...] Will second that in so far as we have been using camelCase for several months now without issues. I would like to hear about any edge cases here. Other than that, we have always stuck to a-z, A-Z, so I cannot comment directly from experience about any issues with other characters. Regards, Gora
custom types file for WordDelimeterFilterFactory
Hi there, I would like to experiment with the custom types file introduced in SOLR-2059. I have copied the wdftypes.txt file from SVN and put it in my solrhome/solr/conf directory. However, it doesn't appear to me that the WordDelimiterFilterFactory is using it. Have I put it in the correct path? Is there an argument to the WordDelimiterFilterFactory that I must provide in order for the file to be used? How do I verify that it is in use? Will I see it in the analyzer? Thanks, Jerry Mindek
Replication Clarification Please
Hello, Pardon me if this has already been answered somewhere, and I apologize for a lengthy post. I was wondering if anybody could help me understand replication internals a bit more. We have a single master-slave setup (Solr 1.4.1) with the configurations shown below. Our environment is quite commit-heavy (hundreds of docs every 5 minutes); all indexing is done on the master and all searches go to the slave. We are seeing that slave replication performance gradually decreases, the speed drops below 1kbps, and ultimately replication gets backed up. Once we reload the core on the slave it works fine for some time, and then it gets backed up again. We have mergeFactor set to 10 and ramBufferSizeMB set to 32MB, Solr itself is running with 2GB of memory, and lockType is simple on both master and slave.

I am hoping that the following questions might help me understand the replication performance issue better (the replication configuration is given at the end of the email):
1. Does the slave get the whole index every time during replication, or just the delta since the last replication happened?
2. If a huge number of queries are being run on the slave, will it affect replication? How can I improve the performance? (See the replication details at the bottom of the page.)
3. Will the segment names be the same on master and slave after replication? I see that they are different. Is this correct? If it is correct, how does the slave know what to fetch the next time, i.e. the delta?
4. When and why does the index.TIMESTAMP folder get created? I see this type of folder getting created only on the slave, and the slave instance is pointing to it.
5. Does the replication process copy both the index and index.TIMESTAMP folders?
6. What happens if replication kicks off before the previous invocation has completed? Will the 2nd invocation block, or will it go through, causing more confusion?
7. If I have to prep a new master-slave combination, is it OK to copy the respective contents into the new master and slave and start Solr? Or do I have to wipe the new slave and let it replicate from its new master?
8. Doing an 'ls | wc -l' on the index folder of the master and the slave gave 194 and 17968 respectively... The slave has a lot of segments_xxx files. Is this normal?

MASTER
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="replicateAfter">startup</str>
    <str name="replicateAfter">commit</str>
    <str name="replicateAfter">optimize</str>
    <str name="confFiles">schema.xml,stopwords.txt</str>
    <str name="commitReserveDuration">00:00:10</str>
  </lst>
</requestHandler>

SLAVE
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <str name="masterUrl">master core url</str>
    <str name="pollInterval">00:03:00</str>
    <str name="compression">internal</str>
    <str name="httpConnTimeout">5000</str>
    <str name="httpReadTimeout">1</str>
  </lst>
</requestHandler>

REPLICATION DETAILS FROM PAGE
Master: master core url
Poll Interval: 00:03:00
Local Index: Index Version: 1296217104577, Generation: 20190
Location: /data/solr/core/search-data/index.20110429042508
Size: 2.1 GB
Times Replicated Since Startup: 672
Previous Replication Done At: Fri May 06 15:41:01 EDT 2011
Config Files Replicated At: null
Config Files Replicated: null
Times Config Files Replicated Since Startup: null
Next Replication Cycle At: Fri May 06 15:44:00 EDT 2011
Current Replication Status:
Start Time: Fri May 06 15:41:00 EDT 2011
Files Downloaded: 43 / 197
Downloaded: 477.08 KB / 588.82 MB [0.0%]
Downloading File: _hdm.prx, Downloaded: 9.3 KB / 9.3 KB [100.0%]
Time Elapsed: 967s, Estimated Time Remaining: 1221166s, Speed: 505 bytes/s

Ravi Kiran Bhaskar
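The status figures are consistent with two simple rules: speed = bytes downloaded / seconds elapsed, and estimated time remaining = bytes still to fetch / speed. The sketch below reproduces that arithmetic, assuming 1024-based KB and MB; the class is hypothetical, not Solr's code. At 505 bytes/s, the remaining part of the 588.82 MB download works out to roughly two weeks, which puts the backlog in perspective.

```java
/** Reproduces the arithmetic behind the replication status line (KB and MB taken as 1024-based). */
class TransferEstimate {

    /** Average speed so far: bytes downloaded divided by seconds elapsed. */
    static long bytesPerSecond(long downloadedBytes, long elapsedSeconds) {
        return elapsedSeconds <= 0 ? 0 : downloadedBytes / elapsedSeconds;
    }

    /** Remaining bytes divided by the current speed; "never" if nothing has arrived yet. */
    static long secondsRemaining(long totalBytes, long downloadedBytes, long bytesPerSecond) {
        return bytesPerSecond <= 0 ? Long.MAX_VALUE : (totalBytes - downloadedBytes) / bytesPerSecond;
    }

    public static void main(String[] args) {
        long downloaded = Math.round(477.08 * 1024);   // "Downloaded: 477.08 KB"
        long total = Math.round(588.82 * 1024 * 1024); // "/ 588.82 MB"
        long speed = bytesPerSecond(downloaded, 967);  // "Time Elapsed: 967s"
        long eta = secondsRemaining(total, downloaded, speed);
        System.out.println(speed + " bytes/s");         // 505 bytes/s
        System.out.printf("%d s (about %.1f days)%n", eta, eta / 86400.0);
    }
}
```

The small gap between this estimate and the reported 1221166s presumably comes from rounding in the displayed KB/MB figures.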
RE: Solr: org.apache.solr.common.SolrException: Invalid Date String:
: Thanks for the response, actually what we need to achieve is see group by
: results based on dates like,
:
: 2011-01-01 23
: 2011-01-02 14
: 2011-01-03 40
: 2011-01-04 10
:
: Now the records in my table run into millions, grouping the result based on
: UTC date would not produce the right result since the result should be
: grouped on users timezone. Is there anyway we can achieve this in Solr?

Date faceting is entirely driven by query params, so if you index your events using the true time that they happened at (formatted as a string in UTC) you can then select your date ranges using whatever timezone offset is specified by your user at query time as a UTC offset.

facet.range = dateField
facet.range.start = 2011-01-01T00:00:00Z+${useroffset}MINUTES
facet.range.gap = +1DAY
etc...

-Hoss
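Hoss's ${useroffset} is the number of minutes to add to UTC midnight to reach the user's local midnight. For a user east of UTC, that number is negative: local midnight in UTC+05:30 is UTC midnight minus 330 minutes. The sketch below builds the facet.range parameters from a java.time ZoneOffset. It assumes a fixed offset for the whole range, so daylight-saving transitions are not handled. The field name DATE and the class are hypothetical.

```java
import java.time.LocalDate;
import java.time.ZoneOffset;
import java.util.LinkedHashMap;
import java.util.Map;

class LocalDayFacets {

    /** Solr date math for the user's local midnight on `day`, relative to UTC midnight. */
    static String localMidnight(LocalDate day, ZoneOffset userOffset) {
        int shiftMinutes = -userOffset.getTotalSeconds() / 60; // UTC+05:30 -> -330, UTC-05:00 -> +300
        String sign = shiftMinutes < 0 ? "-" : "+";
        return day + "T00:00:00Z" + sign + Math.abs(shiftMinutes) + "MINUTES";
    }

    /** facet.range parameters for `days` one-day buckets starting at the user's local midnight. */
    static Map<String, String> params(String field, LocalDate from, int days, ZoneOffset userOffset) {
        String start = localMidnight(from, userOffset);
        Map<String, String> p = new LinkedHashMap<>();
        p.put("facet", "true");
        p.put("facet.range", field);
        p.put("facet.range.start", start);
        p.put("facet.range.end", start + "+" + days + "DAYS");
        p.put("facet.range.gap", "+1DAY");
        return p;
    }

    public static void main(String[] args) {
        LocalDate jan1 = LocalDate.of(2011, 1, 1);
        System.out.println(localMidnight(jan1, ZoneOffset.ofHoursMinutes(5, 30))); // 2011-01-01T00:00:00Z-330MINUTES
        System.out.println(params("DATE", jan1, 4, ZoneOffset.ofHours(-5)));
    }
}
```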
*:* query with dismax
I am using dismax and trying to use q=*:* to return all indexed documents. However, it is always returning 0 found. If I used the default select (not dismax) handler and try q=*:* then it returns all documents. There is nothing in the logs to indicate why this happening. Does anyone have any clues? Thanks, Jason
Re: *:* query with dismax
This is exactly what should be happening, as the dismax parser doesn't understand regular query syntax (and for good reason too). This tripped me up as well when I first started using dismax. The solution for me was to configure the handler to use *:* when the query is empty, so that you can still get back a full result set if you need it, say for faceting. HTH Mark On May 7, 2011 9:22 AM, Jason Chaffee jchaf...@ebates.com wrote: I am using dismax and trying to use q=*:* to return all indexed documents. However, it is always returning 0 found. If I used the default select (not dismax) handler and try q=*:* then it returns all documents. There is nothing in the logs to indicate why this happening. Does anyone have any clues? Thanks, Jason
RE: *:* query with dismax
Can you shed some light on what you did to configure it to handle *:*? I have the same issue that I need it to work for faceting, but I do need the dismax abilities as well. -Original Message- From: Mark Mandel [mailto:mark.man...@gmail.com] Sent: Friday, May 06, 2011 4:30 PM To: solr-user@lucene.apache.org Subject: Re: *:* query with dismax This is exactly what should be happening, as the dismax parser doesn't understand regular query syntax (and for good reason too). This tripped me up as well when I first started using dismax. Solution for me was to comfigure the handler to use *:* when the query is empty, so that you can still get back a full result set if you need it, say for faceting. HTH Mark On May 7, 2011 9:22 AM, Jason Chaffee jchaf...@ebates.com wrote: I am using dismax and trying to use q=*:* to return all indexed documents. However, it is always returning 0 found. If I used the default select (not dismax) handler and try q=*:* then it returns all documents. There is nothing in the logs to indicate why this happening. Does anyone have any clues? Thanks, Jason
edismax available in solr 3.1?
Hi, is edismax available in Solr 3.1? I don't see any documentation about it. If it is, does it support prefix and fuzzy queries? Thanks, cy
Re: *:* query with dismax
it does seem a little weird, but q.alt will get what you want: http://wiki.apache.org/solr/DisMaxQParserPlugin#q.alt hth, rc On Fri, May 6, 2011 at 7:41 PM, Jason Chaffee jchaf...@ebates.com wrote: Can you shed some light on what you did to configure it to handle *:*? I have the same issue that I need it to work for faceting, but I do need the dismax abilities as well. -Original Message- From: Mark Mandel [mailto:mark.man...@gmail.com] Sent: Friday, May 06, 2011 4:30 PM To: solr-user@lucene.apache.org Subject: Re: *:* query with dismax This is exactly what should be happening, as the dismax parser doesn't understand regular query syntax (and for good reason too). This tripped me up as well when I first started using dismax. Solution for me was to comfigure the handler to use *:* when the query is empty, so that you can still get back a full result set if you need it, say for faceting. HTH Mark On May 7, 2011 9:22 AM, Jason Chaffee jchaf...@ebates.com wrote: I am using dismax and trying to use q=*:* to return all indexed documents. However, it is always returning 0 found. If I used the default select (not dismax) handler and try q=*:* then it returns all documents. There is nothing in the logs to indicate why this happening. Does anyone have any clues? Thanks, Jason
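Following the q.alt suggestion, this is a small sketch of a dismax request that omits q and lets q.alt supply the match-all query. The DisMax documentation says q.alt is used when the main query is missing or blank and is parsed with the standard syntax by default. The facet field category is hypothetical, and the sketch uses the Charset overload of URLEncoder.encode, which needs Java 10 or later.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

class DismaxRequest {

    /** Parameters for a dismax request with no q: q.alt supplies the match-all query. */
    static Map<String, String> matchAll(String facetField) {
        Map<String, String> p = new LinkedHashMap<>();
        p.put("defType", "dismax");
        p.put("q.alt", "*:*");  // used when q is missing or blank
        p.put("rows", "0");     // only the facet counts are wanted here
        p.put("facet", "true");
        p.put("facet.field", facetField);
        return p;
    }

    /** URL-encodes the parameters into a query string. */
    static String queryString(Map<String, String> params) {
        StringJoiner sj = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            sj.add(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8) + "="
                    + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return sj.toString();
    }

    public static void main(String[] args) {
        // "category" is a hypothetical facet field.
        System.out.println("/select?" + queryString(matchAll("category")));
    }
}
```

The same q.alt value can instead be placed in the request handler's defaults in solrconfig.xml. That appears to be the handler-level configuration Mark describes, so clients can simply leave q out.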
Why special character is handled differently by standard/lucene query parser?
Hi, When user entered text contains special character, can this being taken care by the tokenizer/filter configured at the field? In application code, Do i need to parse the user input string and add the escape in front of those special character? If so, will those special characters differ for different language, such as english versus chinese? As of now, I didn't parse those special character. i am getting this inconsistent/strange behavior/error. For example: 1. search: title_name_en_US:(my! god) solr thinks the second term god is something NOT to include, why is that? lst name=debug str name=rawquerystringtitle_name_en_US:(my! god)/str str name=querystringtitle_name_en_US:(my! god)/str str name=parsedquerytitle_name_en_US:my -title_name_en_US:god/str str name=parsedquery_toStringtitle_name_en_US:my -title_name_en_US:god/str 2. search: title_name_en_US:my! solr return error instead, even worse: -- INFO: [titles] webapp=/solr path=/select params={explainOther=fl=*,scoredebugQ uery=onindent=onstart=0q=title_name_en_US:(Oh!)hl.fl=qt=standardwt=standar dfq=rows=10version=2.2} status=400 QTime=0 May 7, 2011 2:13:48 AM org.apache.solr.common.SolrException log SEVERE: org.apache.solr.common.SolrException: org.apache.lucene.queryParser.Pars eException: Cannot parse 'title_name_en_US:Oh!': Encountered EOF at line 1, column 20. Was expecting one of: ( ... * ... QUOTED ... TERM ... PREFIXTERM ... WILDTERM ... [ ... { ... NUMBER ... TERM ... * ... at org.apache.solr.handler.component.QueryComponent.prepare(QueryCompone nt.java:108) at org.apache.solr.handler.component.SearchHandler.handleRequestBody(Sea rchHandler.java:181) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandl erBase.java:131) Caused by: org.apache.lucene.queryParser.ParseException: Cannot parse 'title_nam e_en_US:my!': Encountered EOF at line 1, column 20. Was expecting one of: ( ... * ... QUOTED ... TERM ... PREFIXTERM ... WILDTERM ... [ ... { ... NUMBER ... TERM ... * ... 
    at org.apache.lucene.queryParser.QueryParser.parse(QueryParser.java:205)
Re: Why are special characters handled differently by the standard/Lucene query parser?
On Fri, May 6, 2011 at 10:35 PM, cyang2010 ysxsu...@hotmail.com wrote:
> When user-entered text contains special characters, can they be taken care of by the tokenizer/filter configured on the field? In application code, do I need to parse the user input string and add an escape in front of those special characters? If so, do those special characters differ between languages, such as English versus Chinese? As of now, I don't parse those special characters, and I am getting inconsistent/strange behavior and errors. For example:
> 1. Search: title_name_en_US:(my! god)
> Solr thinks the second term, god, is something NOT to include. Why is that?

! is a synonym for the NOT operator in Lucene query parser syntax. The fact that it's treated as an operator even when followed by whitespace is a bug. This was fixed by LUCENE-2566 (which is in the trunk version, but not in 3.1).

One workaround is to escape the ! or quote the term:

    title_name_en_US:(my\! god)
    title_name_en_US:("my!" god)

In general, the Lucene query parser isn't meant for directly handling literal user queries, since it has a stricter syntax (like SQL). Something like the dismax or edismax query parser may help (try adding defType=dismax to your request). They are designed to try never to throw exceptions.

-Yonik
http://www.lucenerevolution.org -- Lucene/Solr User Conference, May 25-26, San Francisco
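A minimal Java sketch of the two workarounds above (escaping the ! or quoting the term), assuming the application assembles the query string itself before sending it to Solr. The class and method names (QueryEscaper, escape, quote) are hypothetical; if SolrJ or Lucene is on the classpath, ClientUtils.escapeQueryChars or QueryParser.escape do similar escaping. The character list is the classic query parser's syntax set; it is plain ASCII, so the same escaping applies whether the surrounding text is English or Chinese.

```java
// Hypothetical helper: make user text safe for the classic (standard) query parser.
public final class QueryEscaper {

    // Syntax characters of the classic query parser (3.x era).
    // Newer Lucene versions also treat '/' as special (regex syntax).
    private static final String SPECIAL = "\\+-!():^[]\"{}~*?|&";

    private QueryEscaper() {}

    /** Prefixes every query-syntax character in the text with a backslash. */
    public static String escape(String text) {
        StringBuilder sb = new StringBuilder(text.length() + 8);
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (SPECIAL.indexOf(c) >= 0) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    /** Wraps the text in double quotes; only embedded quotes and backslashes need escaping. */
    public static String quote(String text) {
        StringBuilder sb = new StringBuilder(text.length() + 2).append('"');
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '"' || c == '\\') {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.append('"').toString();
    }

    public static void main(String[] args) {
        String field = "title_name_en_US";
        System.out.println(field + ":(" + escape("my!") + " god)"); // title_name_en_US:(my\! god)
        System.out.println(field + ":(" + quote("my!") + " god)");  // title_name_en_US:("my!" god)
    }
}
```

Escaping character by character leaves everything else untouched, so non-ASCII terms such as Chinese text pass through unchanged and only the parser's own operators lose their meaning.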
Re: Why are special characters handled differently by the standard/Lucene query parser?
I know about dismax, but with it I can't perform prefix and fuzzy queries. Can edismax handle prefix and fuzzy queries? My application logic just passes the user-entered text to the Solr server to perform term, phrase, prefix and fuzzy queries. And I don't want to escape the special characters by parsing the Java string, since I might deal with text in different languages. That is why I also asked whether those special characters are language-specific or language-agnostic. Looking forward to your answers.

cy
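For the prefix and fuzzy cases, one possible approach (a sketch, not something confirmed in this thread) is to escape the user's text first and then append the operator the application wants, so only that appended * or ~ is treated as syntax. The class and method names are hypothetical, the input is assumed to be a single word with no whitespace, and a bare ~ relies on the parser's default fuzziness.

```java
// Hypothetical builder: prefix and fuzzy clauses from raw, single-word user text.
public final class PrefixFuzzyBuilder {

    // Classic query parser syntax characters ('/' is also special in newer versions).
    private static final String SPECIAL = "\\+-!():^[]\"{}~*?|&";

    private PrefixFuzzyBuilder() {}

    /** Backslash-escapes every query-syntax character in the user's text. */
    static String escape(String text) {
        StringBuilder sb = new StringBuilder(text.length() + 8);
        for (char c : text.toCharArray()) {
            if (SPECIAL.indexOf(c) >= 0) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    /** Escaped text followed by an unescaped '*', which the parser reads as a prefix query. */
    static String prefix(String field, String userText) {
        return field + ":" + escape(userText) + "*";
    }

    /** Escaped text followed by an unescaped '~', which the parser reads as a fuzzy query. */
    static String fuzzy(String field, String userText) {
        return field + ":" + escape(userText) + "~";
    }

    public static void main(String[] args) {
        String field = "title_name_en_US";
        System.out.println(prefix(field, "my!")); // title_name_en_US:my\!*
        System.out.println(fuzzy(field, "my!"));  // title_name_en_US:my\!~
    }
}
```

Because the escaping only looks at a fixed set of ASCII syntax characters, it does not need to know which language the user typed in.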