Re: Indexing multiple entities
On Sun, Nov 1, 2009 at 5:30 AM, Avlesh Singh avl...@gmail.com wrote:

The use case on DocumentObjectBinder is that I could override toSolrInputDocument, and if field = ID, I could do setField(id, obj.getClass().getName() + obj.getId()) or something like that.

Unless I am missing something here, can't you write the getter of the id field in your Solr bean as below?

@Field
private String id;

public String getId() {
    return this.getClass().getName() + this.id;
}

I'm using a code generator for my entities, and I cannot modify the generation. I need to work out another option :(

Cheers
Avlesh

On Fri, Oct 30, 2009 at 1:33 PM, Christian López Espínola penyask...@gmail.com wrote:

On Fri, Oct 30, 2009 at 2:04 AM, Avlesh Singh avl...@gmail.com wrote:

One thing I thought about is whether I can define my own DocumentObjectBinder, so I can concatenate my entity names with the IDs during the XML creation. Does anyone know if something like this can be done without modifying the SolrJ sources? Is there any injection or plugin mechanism for this?

More details on the use-case please.

If I index a Book with ID=3, and then a Magazine with ID=3, I'll really be removing my Book3 and indexing Magazine3. I want both entities to be in the index. The use case on DocumentObjectBinder is that I could override toSolrInputDocument, and if field = ID, I could do setField(id, obj.getClass().getName() + obj.getId()) or something like that. The goal is to avoid hand-creating all the XML to be sent to Solr while still having the possibility of modifying it in some way. Do you know how I can do that, or a better way of achieving the same results?

Cheers
Avlesh

On Fri, Oct 30, 2009 at 2:16 AM, Christian López Espínola penyask...@gmail.com wrote:

Hi Israel,

Thanks for your suggestion.

On Thu, Oct 29, 2009 at 9:37 PM, Israel Ekpo israele...@gmail.com wrote:

On Thu, Oct 29, 2009 at 3:31 PM, Christian López Espínola penyask...@gmail.com wrote:

Hi, my name is Christian and I'm a newbie getting started with Solr (and SolrJ).
I'm working on a website where I want to index multiple entities, like Book or Magazine. The issue I'm facing is that both of them have an attribute ID, which I want to use as the uniqueKey in my schema, so I cannot uniquely identify a document (because ID is saved in a database too, and it's autonumeric). I'm sure this is a common pattern, but I can't find a way of solving it. How do you usually solve this? Thanks in advance.

--
Cheers, Christian López Espínola penyaskito

Hi Christian,

It looks like you are bringing data into Solr from a database with two separate tables: one for Books and another for Magazines. If this is the case, you could define your uniqueKey field in the Solr schema to be a string instead of an integer. You can then still load documents from both the books and magazines tables, but prefix the uniqueKey field with B for books and M for magazines, like so:

<field name="id" type="string" indexed="true" stored="true" required="true"/>
<uniqueKey>id</uniqueKey>

Then, when loading the books or magazines into Solr, you can create the documents with id fields like this:

<add>
  <doc><field name="id">B14000</field></doc>
  <doc><field name="id">M14000</field></doc>
  <doc><field name="id">B14001</field></doc>
  <doc><field name="id">M14001</field></doc>
</add>

I hope this helps.

This was my first thought, but in practice there aren't just Book and Magazine, but about 50 different entities, so I'm using the Field annotation of SolrJ to simplify my code (it manages the XML creation for me, etc). One thing I thought about is whether I can define my own DocumentObjectBinder, so I can concatenate my entity names with the IDs during the XML creation. Does anyone know if something like this can be done without modifying the SolrJ sources? Is there any injection or plugin mechanism for this? Thanks in advance.

--
Good Enough is not good enough. To give anything less than your best is to sacrifice the gift. Quality First. Measure Twice. Cut Once.
--
Cheers, Christian López Espínola penyaskito
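Several replies in this thread boil down to the same trick: build the Solr uniqueKey by prefixing the database id with the entity type. A minimal sketch of that prefixing logic in plain Java; the class and method names below are hypothetical illustrations, not part of SolrJ:

```java
// Hypothetical helper that builds a collision-free Solr id by prefixing
// the database id with the entity's class name, as discussed above.
public class UniqueIdBuilder {

    // Stand-ins for the generated entity classes mentioned in the thread.
    static class Book {}
    static class Magazine {}

    // "Book" + ":" + 3 -> "Book:3", so Book 3 and Magazine 3 no longer clash.
    public static String uniqueId(Class<?> entityClass, Object rawId) {
        return entityClass.getSimpleName() + ":" + rawId;
    }

    public static void main(String[] args) {
        System.out.println(uniqueId(Book.class, 3));     // Book:3
        System.out.println(uniqueId(Magazine.class, 3)); // Magazine:3
    }
}
```

In a custom DocumentObjectBinder subclass, an overridden toSolrInputDocument(Object) could call a helper like this and overwrite the id field on the resulting SolrInputDocument before returning it, which keeps the generated entity classes untouched.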
problems with PhraseHighlighter
Hello everyone, I am having problems with highlighting the complete text of a field. I have an xml field, and I am running proximity searches on this field: xml:( proximity1 AND/OR proximity2 AND/OR …). Results are returned successfully, satisfying the proximity query. However, when I request highlighting, sometimes it returns nothing and sometimes it returns with proximity terms missing.

I set my maxFieldLength to Integer.MAX_VALUE in solrconfig.xml: <maxFieldLength>2147483647</maxFieldLength>

I am using these highlighting parameters: hl.maxAnalyzedChars=2147483647 hl.fragsize=2147483647 hl.usePhraseHighlighter=true hl.requireFieldMatch=true hl.fl=xml hl=true

I tried combinations of hl.fragsize=0 and hl.requireFieldMatch=false but it didn't help. When I set hl.usePhraseHighlighter=false, highlighting comes back, but all query terms are highlighted. What value of hl.fragsize should I use to highlight the complete text of a field, 0 or 2147483647? What is the highest value I can set for hl.maxAnalyzedChars and hl.fragsize? I am querying the same field that I request in highlighting. Although a document matches the query, no highlighting comes back. What could be the reason? If a document matches a query, there should be highlighting returned, right? Any help or pointers are really appreciated.
Re: problems with PhraseHighlighter
Copy-paste your field definition for the field you are trying to highlight/search on.

Cheers
Avlesh

On Sun, Nov 1, 2009 at 8:24 PM, AHMET ARSLAN iori...@yahoo.com wrote:

[...]
Re: problems with PhraseHighlighter
Copy-paste your field definition for the field you are trying to highlight/search on. Cheers Avlesh

Thank you for your interest, Avlesh. My field type mostly contains custom filters and tokenizers:

<fieldType name="XMLText" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="XMLStripStandardTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms_index.txt" ignoreCase="true" expand="true"/>
    <filter class="CustomStemFilterFactory" protected="protwords.txt"/>
    <filter class="LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="CustomTokenizerFactory"/>
    <filter class="CustomDeasciifyFilterFactory"/>
    <filter class="CustomStemFilterFactory" protected="protwords.txt"/>
    <filter class="LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

Firstly I tried to use solr.HTMLStripCharFilterFactory to strip the xml tags; it works fine, but when it comes to highlighting, the em tags are placed at incorrect positions. Same with solr.HTMLStripStandardTokenizerFactory: the em tags are inserted, interestingly, exactly one character before the actual term. So I added a new token definition to StandardTokenizer's jflex file to recognize xml tags and ignore them. I confirmed that it is working with some test cases; it strips xml tags at the tokenizer level. I am doing this because I am displaying the original documents with xml + xslt, therefore I need to highlight the xml files for display. And I am using ComplexPhraseQueryParser [1]. But I reproduced the problem with defType=lucene&q="term1 term2"~5. I can see that term1 and term2 are within 5 terms of each other, therefore the document is returned, but the highlighting is empty. And there are no xml tags (stripped by the tokenizer) between those terms in the original document. The hl.maxAnalyzedChars parameter is about the original document, right? I mean, in my case, including the xml tags too.

[1] http://lucene.apache.org/java/2_9_0/api/contrib-misc/org/apache/lucene/queryParser/complexPhrase/package-summary.html
Question about DIH execution order
Hi folks, I have the following data-config.xml. Is there a way to let the transformation take place only after executing the SQL "select comment from Rating where Rating.CourseId = ${Course.CourseId}"? In the MySQL database, column CourseId in table Course is an integer (1, 2, etc); the template transformation will turn these into Course:1, Course:2; column CourseId in table Rating is also an integer (1, 2, etc). If the transformation happens before executing "select comment from Rating where Rating.CourseId = ${Course.CourseId}", then there will be no match when that SQL statement executes.

<document>
  <entity name="Course" transformer="TemplateTransformer" query="select * from Course">
    <field column="CourseId" template="Course:${Course.CourseId}" name="id"/>
    <entity name="Rating" query="select comment from Rating where Rating.CourseId = ${Course.CourseId}">
      <field column="comment" name="review"/>
    </entity>
  </entity>
</document>
RE: autocomplete
Hey Avlesh, thanks for your reply.

-Ankit

-----Original Message-----
From: Avlesh Singh [mailto:avl...@gmail.com]
Sent: Saturday, October 31, 2009 10:08 PM
To: solr-user@lucene.apache.org
Subject: Re: autocomplete

q=*:*&fq=ac:*all*&wt=json&rows=15&start=0&indent=on&omitHeader=true&json.wrf=?

Why is the json.wrf not specified? Without the callback function, the string that is returned is illegal JavaScript for the browser. You need to specify this parameter, which is a wrapper or callback function. If you specify json.wrf=foo, as soon as the browser gets a response it will call a function named foo (which needs to be defined already). Inside foo you can have your own implementation to interpret and render this data.

Cheers
Avlesh

On Sat, Oct 31, 2009 at 12:13 AM, Ankit Bhatnagar abhatna...@vantage.com wrote:

Hi guys, the Enterprise 1.4 Solr Book (AutoComplete) says this works. My query looks like:

q=*:*&fq=ac:*all*&wt=json&rows=15&start=0&indent=on&omitHeader=true&json.wrf=?

And it returns three results:

{"responseHeader":{"status":0,"QTime":38,"params":{"indent":"on","start":"0","q":"*:*","wt":"json","fq":"ac:*all*","rows":"15"}},"response":{"numFound":3,"start":0,"docs":[{"id":1,"ac":"Can you show me all the results"},{"id":2,"ac":"Can you show all companies"},{"id":3,"ac":"Can you list all companies"}]}}

But the browser reports a syntax error.

-- Ankit
latest lucene libraries in maven repo
Hi, It seems that the latest Lucene libraries are not up to date in the Solr maven repo (http://people.apache.org/repo/m2-snapshot-repository/org/apache/solr/solr-lucene-core/1.4-SNAPSHOT/) Can we expect them to be updated soon? Cheers, Uri
Programmatically configuring SLF4J for Solr 1.4?
So, I've spent a bit of the day banging my head against this, and can't get it sorted. I'm using a DirectSolrConnection embedded in a JRuby application, and everything works great, except I can't seem to get it to do anything except log to the console. I've tried pointing 'java.util.logging.config.file' to a properties file, as well as specifying a logfile as part of the constructor for DirectSolrConnection, but so far, nothing has really worked. What I'd like to do is programmatically direct the Solr logs to a logfile, so that I can have my app start up, parse its config, and throw the Solr logs where they need to go based on that. So, I don't suppose anybody has a code snippet (in Java) that sets up SLF4J for Solr logging (and that doesn't reference an external properties file)? Using the latest (1 Nov 2009) nightly build of Solr 1.4.0-dev
Re: Programmatically configuring SLF4J for Solr 1.4?
I'm sure it is possible to configure JDK logging (java.util.logging) programmatically... but I have never had much luck with it. It is very easy to configure log4j programmatically, and this works great with Solr. To use log4j rather than JDK logging, simply add slf4j-log4j12-1.5.8.jar (from http://www.slf4j.org/download.html) to your classpath.

ryan

On Nov 1, 2009, at 11:05 PM, Don Werve wrote:

[...]
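For completeness, JDK logging can in fact be redirected to a file programmatically, which matters here because the stock Solr 1.4 distribution uses the slf4j JDK14 binding by default. A minimal sketch using only the JDK (the log path and logger names below are placeholders, not anything Solr requires):

```java
import java.io.IOException;
import java.util.logging.FileHandler;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.Logger;
import java.util.logging.SimpleFormatter;

public class JdkLogSetup {

    // Route everything logged through java.util.logging (and therefore
    // through slf4j-jdk14, if that binding is on the classpath) to a file.
    public static void redirectToFile(String path) throws IOException {
        Logger root = Logger.getLogger("");
        // Drop the default console handler so output stops going to the console.
        for (Handler h : root.getHandlers()) {
            root.removeHandler(h);
        }
        FileHandler file = new FileHandler(path, true); // append mode
        file.setFormatter(new SimpleFormatter());
        root.addHandler(file);
        root.setLevel(Level.INFO);
    }

    public static void main(String[] args) throws IOException {
        redirectToFile("solr.log");
        Logger.getLogger("org.apache.solr").info("now logging to solr.log");
    }
}
```

Calling redirectToFile(...) once, before the first Solr call, puts the handlers in place before Solr's loggers start emitting, so no external properties file is needed.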
Re: Question about DIH execution order
On Sun, Nov 1, 2009 at 11:59 PM, Bertie Shen bertie.s...@gmail.com wrote:

[...]

Keep the CourseId column untouched and apply the template to a separate field instead, along these lines:

<field column="TmpCourseId" name="id" template="Course:${Course.CourseId}"/>

--
- Noble Paul | Principal Engineer | AOL | http://aol.com
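Putting that suggestion into the original data-config.xml, the parent entity might look like the sketch below (the TmpCourseId column name is taken from the reply above; the rest mirrors the poster's config). The point is that the TemplateTransformer writes its output into a field other than CourseId, so the ${Course.CourseId} placeholder in the child entity's query still resolves to the raw integer:

```xml
<document>
  <entity name="Course" transformer="TemplateTransformer"
          query="select * from Course">
    <!-- Template output goes to "id"; Course.CourseId keeps its raw value -->
    <field column="TmpCourseId" name="id" template="Course:${Course.CourseId}"/>
    <entity name="Rating"
            query="select comment from Rating where Rating.CourseId = ${Course.CourseId}">
      <field column="comment" name="review"/>
    </entity>
  </entity>
</document>
```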
Re: multiple sql queries for one index?
I don't particularly like the nested entities approach because from what I recall it will execute separate SQL queries for each top level record which, to me, doesn't seem very ideal for large scale indexing. I know it's a pain to do a ton of joins.. believe me our dataset has a boat load of joins too but I think it works out much better to have a GIANT SQL statement execute because you can do record level fetching and index faster (as opposed to waiting for the entire recordset to buffer and send to the client). Try using temporary tables in your connection to help reduce some of the data down or stored procedures if you have that control in your DB. Hope that helps! Amit On Thu, Oct 29, 2009 at 5:00 PM, Avlesh Singh avl...@gmail.com wrote: Read this example fully - http://wiki.apache.org/solr/DataImportHandler#Full_Import_Example nested entities is an answer to your question. The example has a sample. Cheers Avlesh On Fri, Oct 30, 2009 at 2:58 AM, Joel Nylund jnyl...@yahoo.com wrote: Hi, Its been hurting my brain all day to try to build 1 query for my index (joins upon joins upon joins). Is there a way I can do multiple queries to populate the same index? I have one main table that I can join everything back via ID, it should be theoretically possible If this can be done, can someone point me to an example? thanks Joel
Re: Solr YUI autocomplete
I've used the YUI autocomplete (albeit not with Solr, which shouldn't matter here) and it should work with JSON. I did one that simply made XHR calls over to a method on my server which returned pipe-delimited text, and that worked fine. Are you using the XHR DataSource, and if so, what type are you telling it to expect? One of the examples on the YUI site is text-based, and I'm sure you can specify TYPE_JSON or JS_ARRAY too.

- Amit

On Fri, Oct 30, 2009 at 7:04 AM, Ankit Bhatnagar abhatna...@vantage.com wrote:

Does Solr support JSONP (JSON with Padding) in the response?

-Ankit

-----Original Message-----
From: Ankit Bhatnagar [mailto:abhatna...@vantage.com]
Sent: Friday, October 30, 2009 10:27 AM
To: 'solr-user@lucene.apache.org'
Subject: Solr YUI autocomplete

Hi Guys, I have a question regarding how to specify the callback. I am using the YUI autocomplete widget and it expects a JSONP response.

http://localhost:8983/solr/select/?q=monitor&version=2.2&start=0&rows=10&indent=on&wt=json&json.wrf=

I am not sure how I should specify the json.wrf=function.

Thanks
Ankit
Re: Greater-than and less-than in data import SQL queries
A thought I had on this from a DIH design perspective: would it be better to have the SQL queries stored in an element rather than an attribute, so that you can wrap them in a CDATA block without having to mess up the look of the query with &lt; and &gt;? It makes debugging easier (I know find-and-replace is trivial, but it can be annoying when debugging SQL issues :-)).

On Wed, Oct 28, 2009 at 5:15 PM, Lance Norskog goks...@gmail.com wrote:

It is easier to put SQL select statements in a view, and just use that view from the DIH configuration file.

On Tue, Oct 27, 2009 at 12:30 PM, Andrew Clegg andrew.cl...@gmail.com wrote:

Heh, eventually I decided "where 4 > node_depth" was the most pleasing (if slightly WTF-ish) way of writing it...

Cheers, Andrew.

Erik Hatcher-4 wrote:

Use &lt; instead of < in that attribute. That should fix the issue. Remember, it's an XML file, so it has to obey XML encoding rules, which makes it ugly, but whatcha gonna do?

Erik

On Oct 27, 2009, at 11:50 AM, Andrew Clegg wrote:

Hi, if I have a DataImportHandler query with a greater-than sign in it, like this:

<entity name="higher_node" dataSource="database" query="select *, title as keywords from cathnode_text where node_depth > 4"/>

everything's fine. However, if it contains a less-than sign:

<entity name="higher_node" dataSource="database" query="select *, title as keywords from cathnode_text where node_depth < 4"/>

I get this exception:

INFO: Processing configuration from solrconfig.xml: {config=dataconfig.xml}
[Fatal Error] :240:129: The value of attribute "query" associated with an element type "null" must not contain the '<' character.
27-Oct-2009 15:30:49 org.apache.solr.handler.dataimport.DataImportHandler inform
SEVERE: Exception while loading DataImporter
org.apache.solr.handler.dataimport.DataImportHandlerException: Exception occurred while initializing context
    at org.apache.solr.handler.dataimport.DataImporter.loadDataConfig(DataImporter.java:184)
    at org.apache.solr.handler.dataimport.DataImporter.init(DataImporter.java:101)
    at org.apache.solr.handler.dataimport.DataImportHandler.inform(DataImportHandler.java:113)
    at org.apache.solr.core.SolrResourceLoader.inform(SolrResourceLoader.java:424)
    at org.apache.solr.core.SolrCore.init(SolrCore.java:588)
    at org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:137)
    at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:83)
    at org.apache.catalina.core.ApplicationFilterConfig.getFilter(ApplicationFilterConfig.java:275)
    at org.apache.catalina.core.ApplicationFilterConfig.setFilterDef(ApplicationFilterConfig.java:397)
    at org.apache.catalina.core.ApplicationFilterConfig.init(ApplicationFilterConfig.java:108)
    at org.apache.catalina.core.StandardContext.filterStart(StandardContext.java:3709)
    at org.apache.catalina.core.StandardContext.start(StandardContext.java:4356)
    at org.apache.catalina.manager.ManagerServlet.start(ManagerServlet.java:1244)
    at org.apache.catalina.manager.HTMLManagerServlet.start(HTMLManagerServlet.java:604)
    at org.apache.catalina.manager.HTMLManagerServlet.doGet(HTMLManagerServlet.java:129)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:690)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:803)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:290)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:175)
    at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:525)
    at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:568)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:286)
    at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:844)
    at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
    at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447)
    at
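Erik's fix can be illustrated with the failing config from this thread: escape the less-than as &lt; inside the query attribute, or flip the comparison so no escape is needed. Both variants below are sketches based on the entity shown earlier:

```xml
<!-- Escaped: &lt; is the XML entity for the '<' operator -->
<entity name="higher_node" dataSource="database"
        query="select *, title as keywords from cathnode_text where node_depth &lt; 4"/>

<!-- Flipped: '>' is legal inside an attribute value, so no escape is needed -->
<entity name="higher_node" dataSource="database"
        query="select *, title as keywords from cathnode_text where 4 > node_depth"/>
```

Note that CDATA is not an option inside an attribute value, which is why the suggestion above proposes moving the query into an element.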
Re: Iso accents and wildcards
Thanks for the explanation, now I can clearly understand why it doesn't work as I was expecting :)

jfmel...@free.fr wrote:

If the request contains any wildcard, then the filters are not called: no ISOLatin1AccentFilterFactory and no SnowballPorterFilterFactory! "économie" is indexed as "econom", so Solr doesn't find a term starting with "éco" (éco*) or a term starting with "economi" (economi*). If you index "manger", "mangé" and "mangue", the indexed terms will be "mang" and "mangu".

requests - results
manger - manger, mangé
mangé - manger, mangé
mang - manger, mangé
mangu - mangue
mang* - manger, mangé, mangue
mang? - mangue (and not mangé)
mangé* - nothing

Jean-François

----- Nicolas Leconte nicolas.ai...@aidel.com wrote:

Hi all, I have a field that contains accented characters, and I want to be able to search ignoring the accents. I have set up that field with:

<analyzer>
  <tokenizer class="solr.StandardTokenizerFactory"/>
  <filter class="solr.StandardFilterFactory"/>
  <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
  <filter class="solr.LowerCaseFilterFactory"/>
  <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
  <filter class="solr.SnowballPorterFilterFactory" language="French"/>
  <filter class="solr.LowerCaseFilterFactory"/>
  <filter class="solr.ISOLatin1AccentFilterFactory"/>
  <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
</analyzer>

In the index the word "économie" is translated to "econom": the accent is removed thanks to the ISOLatin1AccentFilterFactory, and the end of the word is removed thanks to the SnowballPorterFilterFactory.

When I request title:econ* I get the correct answers, but if I request title:écon* I get no answers. If I request title:économ (the exact word of the index) it works, so there might be something wrong with the wildcard. As far as I can understand, the analyzer should be used exactly the same way at both index and query time. I have tested changing the order of the filters (putting the ISOLatin1AccentFilterFactory on top) without any result. Could anybody help me with that and point out what may be wrong with my schema?
Re: Iso accents and wildcards
Thanks for the tips, I will try exactly what you suggest.

Avlesh Singh wrote:

When I request title:econ* I get the correct answers, but if I request title:écon* I get no answers. If I request title:économ (the exact word of the index) it works, so there might be something wrong with the wildcard. As far as I can understand, the analyzer should be used exactly the same way at both index and query time.

Wildcard queries are not analyzed, hence the inconsistent behaviour. The easiest way out is to define one more field, title_original, as an untokenized field. While querying, you can use both fields at the same time, e.g. q=(title:écon* title_original:écon*). In either case, you would get the desired matches.

Cheers
Avlesh

On Fri, Oct 30, 2009 at 9:19 PM, Nicolas Leconte nicolas.ai...@aidel.com wrote:

[...]
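Avlesh's suggestion could be sketched in schema.xml roughly as below. The field types and the copyField are assumptions for illustration (the thread only names title and title_original); the key point is that title_original is an untokenized string field, so the accented text is indexed verbatim and a literal wildcard like écon* can match it:

```xml
<!-- Analyzed field: accents stripped and stemmed at index time -->
<field name="title" type="text" indexed="true" stored="true"/>

<!-- Untokenized copy: the raw accented value, for wildcard queries -->
<field name="title_original" type="string" indexed="true" stored="false"/>

<copyField source="title" dest="title_original"/>
```

Queries then combine both fields, e.g. q=(title:écon* title_original:écon*), as suggested above.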