Re: Which query parser and how to do full text on multiple fields
You said you were using a third-party plugin. What do you expect people here to know? Solr's plugins don't have parameters lat, long, radius and threadCount (they have pt and dist).

On Sun, Dec 12, 2010 at 4:47 PM, Dennis Gearon gear...@sbcglobal.net wrote:

Which query parser did my partner set up below, and how do I parse three fields in the index for scoring and returning results?

/solr/select?wt=json&indent=true&start=0&rows=20&q={!spatial%20lat=37.326375%20long=-121.892639%20radius=3%20unit=km%20threadCount=3}title:Art%20Loft

Dennis Gearon

Signature Warning: It is always a good idea to learn from your own mistakes. It is usually a better idea to learn from others' mistakes, so you do not have to make them yourself. From http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036 EARTH has a Right To Life, otherwise we all die.
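For comparison with the plugin syntax above, here is a sketch of building a request against the stock Solr spatial filter the reply alludes to, which takes a point and a distance rather than lat/long/radius. The {!geofilt} syntax and parameter names below are the standard ones from stock Solr, not the third-party plugin in question, and the base URL and field name are illustrative. Note that every parameter must be separated by "&"; the query in the original mail lost its separators.

```python
from urllib.parse import urlencode

def spatial_query_url(base, field, lat, lon, km, q="*:*"):
    """Build a Solr select URL using the stock geofilt syntax (sfield/pt/d)."""
    params = {
        "wt": "json",
        "indent": "true",
        "q": q,
        "fq": "{!geofilt}",       # spatial filter query
        "sfield": field,          # location field to filter on
        "pt": f"{lat},{lon}",     # centre point as "lat,lon"
        "d": km,                  # distance in km
    }
    return base + "/select?" + urlencode(params)

url = spatial_query_url("http://localhost:8983/solr", "store",
                        37.326375, -121.892639, 3)
```

urlencode takes care of escaping the braces and the comma, so the resulting URL is safe to paste into curl or a browser.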
Re: PDFBOX 1.3.1 Parsing Error
If the document is encrypted, maybe it isn't meant to be indexed and publicly visible after all?

On Sun, Dec 12, 2010 at 10:22 PM, pankaj bhatt panbh...@gmail.com wrote:

Hi All, while using PDFBOX 1.3.1 in APACHE TIKA 1.7 I am getting the following error when parsing a PDF document:

Error: Expected an integer type, actual='' at org.apache.pdfbox.pdfparser.BaseParser.readInt

This error occurs because of the SHA-256 encryption used by Adobe Acrobat 9. Is there any solution to this problem? I am stuck because of this. Issue PDFBOX-697 has been created in Jira against this: https://issues.apache.org/jira/browse/PDFBOX-697 Please help! / Pankaj Bhatt.
Re: Rollback can't be done after committing?
In some cases you can roll back to a named checkpoint. I am not too sure, but I think I read in the Lucene documentation that it supports named checkpointing.

On Thu, Nov 11, 2010 at 7:12 PM, gengshaoguang gengshaogu...@ceopen.cn wrote:

Hi, Kouta: No data store supports rollback AFTER commit; rollback works only BEFORE.

On Friday, November 12, 2010 12:34:18 am Kouta Osabe wrote:

Hi, all. I have a question about Solr and SolrJ's rollback. I try to roll back like below:

try {
  server.addBean(dto);
  server.commit();
} catch (Exception e) {
  if (server != null) { server.rollback(); }
}

I wonder: if any exception is thrown, the rollback process runs, so no data would be updated. But once committed, the rollback does not take effect. Is rollback only effective when the commit has not yet happened? Is Solr and SolrJ's rollback system not the same as an RDBMS's rollback?
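The semantics described above (rollback discards only changes made since the last commit) can be illustrated with a toy, in-memory sketch. This is not Solr code, just a model of the behavior:

```python
class ToyIndex:
    """Minimal model of Solr-style commit/rollback semantics."""

    def __init__(self):
        self.committed = []   # documents visible to searchers
        self.pending = []     # uncommitted updates

    def add(self, doc):
        self.pending.append(doc)

    def commit(self):
        # Makes pending updates permanent and visible.
        self.committed.extend(self.pending)
        self.pending = []

    def rollback(self):
        # Discards pending updates only; committed docs are untouched.
        self.pending = []

idx = ToyIndex()
idx.add("doc1")
idx.commit()      # doc1 is now permanent
idx.add("doc2")
idx.rollback()    # doc2 is discarded; doc1 survives
```

This is why the try/commit/rollback pattern in the quoted code only "works" for updates that have not yet been committed.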
Re: Looking for Developers
This is the second time he has sent this spam. Kill his subscription - is that possible?

On Tue, Oct 26, 2010 at 10:38 PM, Yuchen Wang yuc...@trulia.com wrote: UNSUBSCRIBE

On Tue, Oct 26, 2010 at 10:15 PM, Igor Chudov ichu...@gmail.com wrote: UNSUBSCRIBE

On Wed, Oct 27, 2010 at 12:14 AM, ST ST stst2...@gmail.com wrote:

Looking for developers experienced in Solr/Lucene and/or FAST search engines, from India (Pune). We are looking for offshore, India-based developers who are proficient in Solr/Lucene and/or the FAST search engine. Developers in the cities of Pune/Bombay in India are preferred. Development is for projects based in the US for a reputed firm. If you are proficient in Solr/Lucene/FAST and have 5 years minimum industry experience, with at least 3 years in search development, please send me your resume. Thanks
Re: command line to check if Solr is up and running
How about - please do not respond to 20 emails at one time?

On Wed, Oct 27, 2010 at 12:33 AM, Lance Norskog goks...@gmail.com wrote: Please start new threads for new topics.

Xin Li wrote: As we know, we can check whether Solr is running in a browser by going to http://$hostName:$portNumber/$masterName/admin, e.g. http://localhost:8080/solr1/admin. My question is: are there any ways to check it from the command line? I used curl "http://localhost:8080" to check my Tomcat, and it worked fine. However, I get no response if I try curl "http://localhost:8080/solr1/admin" (even when my Solr is running). Does anyone know any command-line alternatives? Thanks, Xin

This electronic mail message contains information that (a) is or may be CONFIDENTIAL, PROPRIETARY IN NATURE, OR OTHERWISE PROTECTED BY LAW FROM DISCLOSURE, and (b) is intended only for the use of the addressee(s) named herein. If you are not an intended recipient, please contact the sender immediately and take the steps necessary to delete the message completely from your computer system. Not Intended as a Substitute for a Writing: Notwithstanding the Uniform Electronic Transaction Act or any other law of similar effect, absent an express statement to the contrary, this e-mail message, its contents, and any attachments hereto are not intended to represent an offer or acceptance to enter into a contract and are not otherwise intended to bind this sender, barnesandnoble.com llc, barnesandnoble.com inc. or any other person or entity.
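A scripted health check can do the same thing as the browser visit. A minimal Python sketch follows; the /admin/ping path is the conventional Solr ping handler, but the exact URL depends on your deployment, so treat it as an assumption to verify:

```python
import urllib.error
import urllib.request

def is_solr_up(url, timeout=5):
    """Return True if the URL answers with HTTP 200, False otherwise."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False

# Example (hypothetical deployment):
# is_solr_up("http://localhost:8080/solr1/admin/ping")
```

The same check works from cron or a load balancer; anything other than a 200 (connection refused, timeout, 404) reports the server as down.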
Re: Highlighting for non-stored fields
Another way you can do this: after the search has completed, load the field in your application, write separate code to reanalyze that field/document, index it in RAM, and run it through the highlighter classes, all of this as part of your web application, outside of Solr. Considering the size of your data, it doesn't look advisable to store it, because you would almost double the size of your index (if you are looking to highlight on a field, it is probably full of content). -Pradeep

On Tue, Oct 26, 2010 at 8:32 AM, Phong Dais phong.gd...@gmail.com wrote:

Hi, I understand that I need to store the fields in order to use highlighting out of the box. I'm looking for a way to highlight using term offsets instead of the actual text, since the text is not stored. What I am asking is: is it possible to modify the response (through a custom implementation) to contain highlighted offsets instead of the actual matched text? Should I be writing my own DefaultHighlighter, or overriding some of its functionality? Can this be done this way, or am I way off? BTW, I'm using solr-1.4. Thanks, P.

On Tue, Oct 26, 2010 at 9:25 AM, Israel Ekpo israele...@gmail.com wrote:

Check out this link: http://wiki.apache.org/solr/FieldOptionsByUseCase You need to store the field if you want to use the highlighting feature. If you need to retrieve and display the highlighted snippets, then the field definitely needs to be stored. To use term offsets, it is a good idea to enable the following attributes for that field: termVectors, termPositions, termOffsets. The only issue here is that your storage costs will increase because of these extra features. Nevertheless, you definitely need to store the field if you need to retrieve it for highlighting purposes.

On Tue, Oct 26, 2010 at 6:50 AM, Phong Dais phong.gd...@gmail.com wrote:

Hi, I've been looking through the mailing archive for the past week and I haven't found any useful info regarding this issue. My requirement is to index a few terabytes worth of data to be searched. Due to the size of the data, I would like to index without storing, but I would like to use the highlighting feature. Is this even possible? What are my options? I've read about termOffsets and payloads that could possibly be used to do this, but I have no idea how it could be done. Any pointers greatly appreciated; someone please point me in the right direction. I don't mind having to write some code or digging through existing code to accomplish this task. Thanks, P.

-- °O° Good Enough is not good enough. To give anything less than your best is to sacrifice the gift. Quality First. Measure Twice. Cut Once. http://www.israelekpo.com/
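The "reanalyze in the application" idea suggested above can be sketched very naively: given the raw field text (fetched from your primary store, since it is not stored in Solr) and the query terms, compute the match spans and wrap them. This toy Python version ignores stemming and real tokenizer behavior, so it only approximates what a Lucene highlighter would produce:

```python
import re

def highlight(text, terms, pre="<em>", post="</em>"):
    """Wrap case-insensitive whole-word matches of any term in pre/post markers."""
    pattern = r"\b(" + "|".join(re.escape(t) for t in terms) + r")\b"
    return re.sub(pattern, lambda m: pre + m.group(0) + post,
                  text, flags=re.IGNORECASE)

snippet = highlight("Art lofts in the city", ["art", "loft"])
```

In a real setup the reanalysis step would have to run the same analyzer chain as the index-time field, otherwise stemmed or split tokens (like "lofts" here) will not line up with the query terms.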
Re: Multiple Word Facets
Use this field type:

<fieldType name="facetField" class="solr.TextField" sortMissingLast="true" omitNorms="true">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
  </analyzer>
</fieldType>

On Tue, Oct 26, 2010 at 6:43 PM, Adam Estrada estrada.a...@gmail.com wrote:

All, I am new to Solr faceting and stuck on how to get multiple-word facets returned from a standard Solr query. See below for what is currently being returned:

<lst name="facet_counts">
  <lst name="facet_queries"/>
  <lst name="facet_fields">
    <lst name="title">
      <int name="Federal">89</int>
      <int name="EFLHD">87</int>
      <int name="Eastern">87</int>
      <int name="Lands">87</int>
      <int name="Highways">84</int>
      <int name="FHWA">60</int>
      <int name="Transportation">32</int>
      <int name="GIS">22</int>
      <int name="Planning">19</int>
      <int name="Asset">15</int>
      <int name="Environment">15</int>
      <int name="Management">14</int>
      <int name="Realty">12</int>
      <int name="Highway">11</int>
      <int name="HEP">10</int>
      <int name="Program">9</int>
      <int name="HEPGIS">7</int>
      <int name="Resources">7</int>
      <int name="Roads">7</int>
      <int name="EEI">6</int>
      <int name="Environmental">6</int>
      <int name="Right">6</int>
      <int name="Way">6</int>
      ...etc...

There are many terms in there that are 2 or 3 word phrases. For example, "Eastern Federal Lands Highway Division" all gets broken down into the individual words that make up the total group of words. I've seen quite a few websites that do what I am trying to do here, so any suggestions at this point would be great. See my schema below (copied from the example schema).
<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="false"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>

The analyzer for type="query" is similar. Please advise on how to group or cluster document terms so that they can be used as facets. Many thanks in advance, Adam Estrada
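The difference the KeywordTokenizerFactory makes can be shown with a toy facet counter - faceting over whitespace-split tokens versus whole field values. This is a Python sketch of the effect, not Solr internals:

```python
from collections import Counter

titles = ["Eastern Federal Lands", "Eastern Federal Lands", "Asset Management"]

# What a whitespace-tokenized field produces: one facet count per word.
word_facets = Counter(word for t in titles for word in t.split())

# What a KeywordTokenizer-style field produces: one count per whole value.
phrase_facets = Counter(titles)
```

With the keyword tokenizer the entire field value becomes a single token, so multi-word titles survive as single facet entries, which is exactly what the reply's fieldType achieves.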
Re: How to use AND as opposed to OR as the default query operator.
Which query handler are you using? For the standard query handler you can set q.op per request, or set defaultOperator in schema.xml. For a dismax handler you will have to work with mm (minimum should match).

On Mon, Oct 25, 2010 at 6:41 AM, Swapnonil Mukherjee swapnonil.mukher...@gettyimages.com wrote:

Hi Everybody, I simply want to use AND as the default operator in queries. When a user searches for Jennifer Lopez, Solr converts this to a "Jennifer OR Lopez" query. I want Solr to treat it as "Jennifer AND Lopez" instead. In other words, I want a default AND behavior in multi-word queries instead of OR. I have seen in this presentation, http://www.slideshare.net/pittaya/using-apache-solr, on slide number 52, that this OR behavior is configurable. Could you please tell me where this configuration is located? I could not locate it in schema.xml. Swapnonil Mukherjee +91-40092712 +91-9007131999
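The behavioral difference between the two default operators comes down to whether a multi-term query requires all terms or any term. A toy model of the semantics (not Solr's actual query parsing):

```python
def matches(doc_text, query_terms, op="OR"):
    """Model default-operator semantics for a bag-of-words match."""
    words = set(doc_text.lower().split())
    hits = [t.lower() in words for t in query_terms]
    return all(hits) if op == "AND" else any(hits)

doc = "Jennifer Aniston filmography"
```

With q.op=AND (or defaultOperator="AND" in schema.xml), a document matching only one of the two terms is excluded, which is the behavior the poster wants.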
Re: Failing to successfully import international characters via DIH
What would you recommend changing or checking?

The Tomcat *Connector* URIEncoding. I have done this several times on Tomcat; I might be at a loss on other servers, though. - Pradeep
Re: Failing to successfully import international characters via DIH
Holy cow, you already have this in place. I apologize. This looked exactly like the kind of problem I have solved this way.

On Fri, Oct 22, 2010 at 8:38 AM, Pradeep Singh pksing...@gmail.com wrote:

What would you recommend changing or checking?

The Tomcat *Connector* URIEncoding. I have done this several times on Tomcat; I might be at a loss on other servers, though. - Pradeep
Re: Lucene vs Solr
Is that right? On Tue, Oct 19, 2010 at 11:08 PM, findbestopensource findbestopensou...@gmail.com wrote: Hello all, I have posted an article Lucene vs Solr http://www.findbestopensource.com/article-detail/lucene-vs-solr Please feel free to add your comments. Regards Aditya www.findbestopensource.com
Re: Mulitple facet - fq
fq=(category:corporate category:personal)

On Wed, Oct 20, 2010 at 7:39 AM, Yavuz Selim YILMAZ yvzslmyilm...@gmail.com wrote:

Under the category facet there are multiple selections, which can be personal, corporate or other. How can I get both the personal and the corporate ones? I tried fq=category:corporate&fq=category:personal. It looks easy, but I can't find the solution. -- Yavuz Selim YILMAZ
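The distinction matters because repeated fq parameters intersect, while a single fq with both clauses (under a default OR operator) unions. A sketch of the two encodings:

```python
from urllib.parse import urlencode

# Two fq params: documents must be corporate AND personal (intersection).
and_params = urlencode([("fq", "category:corporate"),
                        ("fq", "category:personal")])

# One fq with both clauses: corporate OR personal (union, assuming
# the default operator is OR).
or_params = urlencode({"fq": "(category:corporate category:personal)"})
```

The poster's attempt used the first form, which returns only documents in both categories at once; the reply's single-fq form returns documents in either.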
Re: Spatial
Thanks for your response, Grant. I already have the bounding-box based implementation in place, and on a document base of around 350K it is super fast. What about a document base of millions of documents? While a tier-based approach will narrow down the document space significantly, this concern might be misplaced, because there are other numeric range queries I am going to run anyway which don't have anything to do with the spatial query. But the keyword here is numeric range query based on NumericField, which is going to be significantly faster than regular number-based queries. I see that the dynamic field type _latLon is of type double and not tdouble by default. Can I have your input on that decision? -Pradeep

On Tue, Oct 19, 2010 at 6:10 PM, Grant Ingersoll gsing...@apache.org wrote:

On Oct 19, 2010, at 6:23 PM, Pradeep Singh wrote:

https://issues.apache.org/jira/browse/LUCENE-2519

If I change my code as per 2519 to have this -

public double[] coords(double latitude, double longitude) {
  double rlat = Math.toRadians(latitude);
  double rlong = Math.toRadians(longitude);
  double nlat = rlong * Math.cos(rlat);
  return new double[]{nlat, rlong};
}

return this -

x = (gamma - gamma[0]) cos(phi)
y = phi

would it make it give correct results? Correct projections, tier ids?

I'm not sure. I have a lot of doubt around that code. After making that correction, I spent several days trying to get the tests to pass and ultimately gave up. Does that mean it is wrong? I don't know. I just don't have enough confidence to recommend it, given that I could not verify through other tools what I was asking it to do. Personally, I would recommend seeing if one of the non-tier based approaches suffices for your situation and use that. -Grant
Re: Step by step tutorial for multi-language indexing and search
Here's what I would do - search all the fields every time, regardless of language. Use one handler and specify all of these in qf and pf: question_en, answer_en, question_fr, answer_fr, question_pl, answer_pl. The individual per-field analyzers will take care of appropriate tokenization, and you will get a match across all languages. Even with this setup, if you wanted, you could also have a separate field called language and use an fq to limit searches to that language only. -Pradeep

On Wed, Oct 20, 2010 at 6:03 AM, Jakub Godawa jakub.god...@gmail.com wrote:

Hi everyone! (my first post) I am new, but really curious about the usefulness of Lucene/Solr for document search from web applications. I use Ruby on Rails to create one, with the plugin acts_as_solr_reloaded that makes the connection between the web app and Solr easy. So I am at a point where I know that a good solution is to prepare multi-language documents with fields like question_en, answer_en, question_fr, answer_fr, question_pl, answer_pl, etc. I need to create an index that works with 6 languages: English, French, German, Russian, Ukrainian and Polish. My questions are: 1. Is it doable to have just one search field that behaves like Google's for all those documents? It can be an option to indicate a language to search. 2. How should I begin changing the solr/conf/schema.xml (or other) file to tailor it to my needs? As I am a real rookie here, I am still a bit confused about fields, fieldTypes and their connection with a particular field (e.g. answer_fr), and about the tokenizers and analyzers. If someone can provide a basic step-by-step tutorial on how to make it work in two languages, I would be more than happy. 3. Are all those languages supported (officially/unofficially) by Lucene/Solr? Thank you for help, Jakub Godawa.
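The setup described above - every language-specific field listed in qf, with each field's own analyzer doing the tokenization - can be sketched as request parameters. The field names follow the ones proposed in the thread; the handler type and the optional language filter are illustrative:

```python
LANGS = ["en", "fr", "de", "ru", "uk", "pl"]

# Search every language-specific field; per-field analyzers do the work.
qf = " ".join(f"question_{lang} answer_{lang}" for lang in LANGS)

params = {"defType": "dismax", "qf": qf, "q": "some user query"}

# Optionally constrain results to one language with a filter query:
params_fr = dict(params, fq="language:fr")
```

Generating qf from the language list keeps the handler configuration in sync as languages are added.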
Re: Uppercase and lowercase queries
Use a text field - solr.StrField ignores analyzers entirely, so the lowercase filter you added never runs. Switch the field to a type based on solr.TextField instead.

On Tue, Oct 19, 2010 at 3:19 AM, PeterKerk vettepa...@hotmail.com wrote:

I want to query on cityname. This works when I query, for example, "Boston". But when I query "boston" it doesn't show any results. In the database "Boston" is stored. So I thought: I should change the filter on this field to make everything lowercase. The field definition for city is:

<field name="city" type="string" indexed="true" stored="true"/>

So I changed its fieldtype string from:

<fieldType name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>

to:

<fieldType name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

But it still doesn't show any results when I query "boston"... why? -- View this message in context: http://lucene.472066.n3.nabble.com/Uppercase-and-lowercase-queries-tp1731349p1731349.html Sent from the Solr - User mailing list archive at Nabble.com.
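The underlying requirement is simply that the same case normalization runs at index time and at query time. A toy model of the analyzer chain (not Solr's actual implementation):

```python
def analyze(value):
    """Mimic a whitespace tokenizer followed by a lowercase filter."""
    return [tok.lower() for tok in value.split()]

index = {tuple(analyze("Boston"))}   # what gets indexed
query = tuple(analyze("boston"))     # what the user typed

found = query in index
```

Once both sides pass through the same chain, "Boston" and "boston" normalize to the same tokens and match. On a StrField that chain never runs, which is why the poster's change had no effect.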
Spatial
https://issues.apache.org/jira/browse/LUCENE-2519

If I change my code as per 2519 to have this -

public double[] coords(double latitude, double longitude) {
  double rlat = Math.toRadians(latitude);
  double rlong = Math.toRadians(longitude);
  double nlat = rlong * Math.cos(rlat);
  return new double[]{nlat, rlong};
}

return this -

x = (gamma - gamma[0]) cos(phi)
y = phi

would it make it give correct results? Correct projections, tier ids? I am not talking about changing Lucene/Solr code; I can duplicate the classes to create my own version. Just wanted to be sure about the results. Pradeep
Re: Spell checking question from a Solr novice
I think a spellchecker based on your index has clear advantages. You can spellcheck words specific to your domain which may not be available in an outside dictionary. You can always dump the word list from WordNet to get a starter English dictionary. But it also means that misspelled words from your domain can become the suggested "correct" word. Hmmm... you'll need to have a way to prune out such words. Even then, your own domain-based dictionary is a total go.

On Mon, Oct 18, 2010 at 1:55 PM, Jonathan Rochkind rochk...@jhu.edu wrote:

In general, the benefit of the built-in Solr spellcheck is that it can use a dictionary based on your actual index. If you want to use some external API, you certainly can, in your actual client app - but then it doesn't really need to involve Solr at all anymore, does it? Is there any benefit I'm not thinking of to doing that on the Solr side, instead of just in your client app? I think Yahoo (and maybe Microsoft?) have similar APIs with more generous terms of service, but I haven't looked in a while.

Xin Li wrote: Oops, never mind. Just read the Google API policy: a 1000-queries-per-day limit, for non-commercial use only.

-----Original Message----- From: Xin Li Sent: Monday, October 18, 2010 3:43 PM To: solr-user@lucene.apache.org Subject: Spell checking question from a Solr novice

Hi, I am looking for a quick solution to improve a search engine's spell checking performance. I was wondering if anyone has tried to integrate the Google SpellCheck API with a Solr search engine (if that is possible). Google spellcheck came to my mind for two reasons. First, it is costly to clean up the data to be used as a spell check baseline. Secondly, Google probably has the most complete set of misspelled search terms. That's why I would like to know if it is a feasible way to go.
Thanks, Xin
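An index-backed spellchecker of the kind discussed in this thread can be approximated with the standard library: suggestions are drawn from your own term dictionary rather than an external API. A toy sketch, where the term list stands in for the terms of a real index:

```python
import difflib

# In practice this would be the term dictionary extracted from your index.
index_terms = ["highlighting", "faceting", "tokenizer", "analyzer", "rollback"]

def suggest(word, n=3, cutoff=0.7):
    """Suggest close matches for a possibly misspelled query term."""
    return difflib.get_close_matches(word.lower(), index_terms,
                                     n=n, cutoff=cutoff)

suggestions = suggest("facetting")
```

This also illustrates the caveat raised above: if a misspelling is frequent enough to be indexed, it enters the dictionary and can itself be suggested, so some pruning (e.g. a minimum document frequency) is needed.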
Admin for spellchecker?
Do we need an admin screen for spellchecker? Where you can browse the words and delete the ones you don't like so that they don't get suggested?
Re: I need to indexing the first character of a field in another field
You can use the regular-expression-based transformer without writing a separate function. It's pretty easy to use.

On Mon, Oct 18, 2010 at 2:31 PM, Renato Wesenauer renato.wesena...@gmail.com wrote:

Hello guys, I need to index the first character of the field autor in another field, inicialautor. Example: autor = Mark Webber, inicialautor = M. I wrote a JavaScript function in the data import, but the field inicialautor is indexed empty. The function:

function InicialAutor(linha) {
  var aut = linha.get('autor');
  if (aut != null) {
    if (aut.length > 0) {
      var ch = aut.charAt(0);
      linha.put('inicialautor', ch);
    } else {
      linha.put('inicialautor', '');
    }
  } else {
    linha.put('inicialautor', '');
  }
  return linha;
}

What's wrong? Thanks, Renato Wesenauer
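The regex approach suggested in the reply amounts to capturing the first character of the source value. The equivalent logic, shown as a plain Python sketch rather than DIH configuration:

```python
import re

def initial(author):
    """First character of the author name, or empty string (regex ^(.))."""
    if not author:
        return ""
    m = re.match(r"^(.)", author)
    return m.group(1) if m else ""

inicial = initial("Mark Webber")
```

The same null and empty-string handling as in the JavaScript function is folded into the single `if not author` guard.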
facet.field: java.lang.NullPointerException
Faceting blows up when the field has no data, and this seems to be random: sometimes it works even with no data, other times not. Sometimes the error goes away if the field is set to multiValued="true" (even though it has one value every time), other times it doesn't. In all cases, setting facet.method to enum takes care of the problem; if this param is not set, the default leads to a NullPointerException.

09:18:52,218 SEVERE [SolrCore] Exception during facet.field of xyz: java.lang.NullPointerException
  at java.lang.System.arraycopy(Native Method)
  at org.apache.lucene.util.PagedBytes.copy(PagedBytes.java:247)
  at org.apache.solr.request.TermIndex$1.setTerm(UnInvertedField.java:1164)
  at org.apache.solr.request.NumberedTermsEnum.init(UnInvertedField.java:960)
  at org.apache.solr.request.TermIndex$1.init(UnInvertedField.java:1151)
  at org.apache.solr.request.TermIndex.getEnumerator(UnInvertedField.java:1151)
  at org.apache.solr.request.UnInvertedField.uninvert(UnInvertedField.java:204)
  at org.apache.solr.request.UnInvertedField.init(UnInvertedField.java:188)
  at org.apache.solr.request.UnInvertedField.getUnInvertedField(UnInvertedField.java:911)
  at org.apache.solr.request.SimpleFacets.getTermCounts(SimpleFacets.java:298)
  at org.apache.solr.request.SimpleFacets.getFacetFieldCounts(SimpleFacets.java:354)
  at org.apache.solr.request.SimpleFacets.getFacetCounts(SimpleFacets.java:190)
  at org.apache.solr.handler.component.FacetComponent.process(FacetComponent.java:72)
  at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:210)
  at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
  at org.apache.solr.core.SolrCore.execute(SolrCore.java:1323)
  at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:337)
  at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:240)
  at