Re: Order of words in proximity search
the key phrase was this one :) : A sloppy phrase query specifies a maximum slop, or the number of positions tokens need to be moved to get a match. So you could search for "foo bar"~101 in your example. Ludovic. - Jouve France. -- View this message in context: http://lucene.472066.n3.nabble.com/Order-of-words-in-proximity-search-tp2938427p2946620.html Sent from the Solr - User mailing list archive at Nabble.com.
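As a quick illustration (the field name is only a placeholder), such a proximity query is passed directly in the q parameter:

q=text:"foo bar"~101

The number after the tilde is the maximum slop; with a slop of 2 or more the terms may also match in reversed order.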
Re: Order of words in proximity search
I would prefer to put a higher slop number instead of a boolean clause : 200 perhaps in your specific case. Ludovic. - Jouve France. -- View this message in context: http://lucene.472066.n3.nabble.com/Order-of-words-in-proximity-search-tp2938427p2946645.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: K-Stemmer for Solr 3.1
I don't know if it is allowed to modify Lucid code and add it to jira. If someone from Lucid would give me the permission and the Solr developers have nothing against it I won't mind adding the Lucid KStemmer to jira for Solr 3.x and 4.x. There are several Lucid KStemmer users which I can see from the many requests which I got. Also the Lucid KStemmer is faster than the standard KStemmer. Bernd Am 16.05.2011 06:33, schrieb Bill Bell: Did you upload the code to Jira? On 5/13/11 12:28 AM, Bernd Fehlingbernd.fehl...@uni-bielefeld.de wrote: I backported a Lucid KStemmer version from solr 4.0 which I found somewhere. Just changed from import org.apache.lucene.analysis.util.CharArraySet; // solr4.0 to import org.apache.lucene.analysis.CharArraySet; // solr3.1 Bernd Am 12.05.2011 16:32, schrieb Mark: java.lang.AbstractMethodError: org.apache.lucene.analysis.TokenStream.incrementToken()Z Would you mind explaining your modifications? Thanks On 5/11/11 11:14 PM, Bernd Fehling wrote: Am 12.05.2011 02:05, schrieb Mark: It appears that the older version of the Lucid Works KStemmer is incompatible with Solr 3.1. Has anyone been able to get this to work? If not, what are you using as an alternative? Thanks Lucid KStemmer works nice with Solr3.1 after some minor mods to KStemFilter.java and KStemFilterFactory.java. What problems do you have? Bernd -- * Bernd FehlingUniversitätsbibliothek Bielefeld Dipl.-Inform. (FH)Universitätsstr. 25 Tel. +49 521 106-4060 Fax. +49 521 106-4052 bernd.fehl...@uni-bielefeld.de33615 Bielefeld BASE - Bielefeld Academic Search Engine - www.base-search.net *
Opening a file at a page where I encounter a hit
Hi, I am using ASP.Net MVC and solrnet for my search tool. The files I index include pdf files, word docs, excel etc... I am able to search and retrieve all the docs with a hit. Now the problem lies in opening the files with a hit. When I open the file, it should open at the location where the hit is encountered. How do i manage this? It will be even more helpful if I can highlight the hit inside the opened document? Please help me in this regard. Regards Vignesh
Re: Opening a file at a page where I encounter a hit
On Mon, May 16, 2011 at 12:00 PM, Vignesh Raj vignesh...@greatminds.co.in wrote: Hi, I am using ASP.Net MVC and solrnet for my search tool. The files I index include pdf files, word docs, excel etc... I am able to search and retrieve all the docs with a hit. Now the problem lies in opening the files with a hit. When I open the file, it should open at the location where the hit is encountered. How do i manage this? It will be even more helpful if I can highlight the hit inside the opened document? One way to display the document text is to also store it in Solr. There are two issues with this: * The Solr index will grow considerably. However, the performance limits are still acceptable to us, with a ~60GB index size. * You will probably lose formatting from the documents. One can manage to retain much of the original formatting by pre- processing the text to format it before indexing into Solr. However, this is not perfect. The other way is to retain in Solr a path to the original document that you can then serve from the filesystem: * How to do this depends on how you are indexing into Solr. * Highlighting query terms, and opening the document at the right place has to be done by external programs (note that one document can have multiple matches, so that there is no a priori right place to open the document). Regards, Gora
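As a rough sketch of the two options in schema.xml (field names and types here are only illustrative, not from the original setup):

<!-- option 1: store the extracted text in Solr so it can be returned and highlighted -->
<field name="content" type="text" indexed="true" stored="true"/>
<!-- option 2: store only a pointer to the original file, and serve the file from the filesystem -->
<field name="filepath" type="string" indexed="false" stored="true"/>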
Re: Order of words in proximity search
Hi, The strange part is that I have actually tried a slop of 1000 (1K), and the results are still different. This is even when the test data has a limit of 10K for each sentence. (This means that a sloppy phrase should only give hits where the complete sentence is found, yet it is not the result...) Hope that explains the issue a bit better :) Regards Tor On Mon, May 16, 2011 at 8:08 AM, lboutros boutr...@gmail.com wrote: the key phrase was this one :) : A sloppy phrase query specifies a maximum slop, or the number of positions tokens need to be moved to get a match. So you could search for "foo bar"~101 in your example. Ludovic. - Jouve France. -- View this message in context: http://lucene.472066.n3.nabble.com/Order-of-words-in-proximity-search-tp2938427p2946620.html Sent from the Solr - User mailing list archive at Nabble.com. -- Mvh Tor Henning Ueland
Re: Order of words in proximity search
The analyzer of the field you are using could impact the Phrase Query Slop. Could you copy/paste the part of the schema ? Ludovic. - Jouve France. -- View this message in context: http://lucene.472066.n3.nabble.com/Order-of-words-in-proximity-search-tp2938427p2946764.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Want to Delete Existing Index create fresh index
It is by default commented in solrconfig.xml On Sat, May 14, 2011 at 10:49 PM, Gabriele Kahlout gabri...@mysimpatico.com wrote: I guess you are having issues with the datadir. Did you set the datadir in solrconfig.xml? On Sat, May 14, 2011 at 4:10 PM, Pawan Darira pawan.dar...@gmail.com wrote: Hi I am using Solr 1.4. had changed schema already. When i created the index for first time, the directory was automatically created index made perfectly fine. Now, i want to create the index from scratch, so I deleted the whole data/index directory ran the script. Now it is only creating empty directories NO index files inside that. Thanks Pawan On Sat, May 14, 2011 at 6:54 PM, Dmitry Kan dmitry@gmail.com wrote: Hi Pawan, Which SOLR version do you have installed? It should be absolutely normal for the data/ sub directory to create when starting up SOLR. So just go ahead and post your data into SOLR, if you have changed the schema already. -- Regards, Dmitry Kan On Sat, May 14, 2011 at 4:01 PM, Pawan Darira pawan.dar...@gmail.com wrote: I did that. Index directory is created but not contents in that 2011/5/14 François Schiettecatte fschietteca...@gmail.com You can also shut down solr/lucene, do: rm -rf /YourIndexName/data/index and restart, the index directory will be automatically recreated. François On May 14, 2011, at 1:53 AM, Gabriele Kahlout wrote: curl --fail $solrIndex/update?commit=true -d 'deletequery*:*/query/delete' #empty index [1 http://wiki.apache.org/nutch/Whole-Web%20Crawling%20incremental%20script ] did u try? On Sat, May 14, 2011 at 7:26 AM, Pawan Darira pawan.dar...@gmail.com wrote: Hi I had an existing index created months back. now my database schema has changed. i wanted to delete the current data/index directory re-create the fresh index but it is saying that segments file not found just create blank data/index directory. Please help -- Thanks, Pawan Darira -- Regards, K. Gabriele --- unchanged since 20/9/10 --- P.S. If the subject contains [LON] or the addressee acknowledges the receipt within 48 hours then I don't resend the email. subject(this) ∈ L(LON*) ∨ ∃x. (x ∈ MyInbox ∧ Acknowledges(x, this) ∧ time(x) Now + 48h) ⇒ ¬resend(I, this). If an email is sent by a sender that is not a trusted contact or the email does not contain a valid code then the email is not received. A valid code starts with a hyphen and ends with X. ∀x. x ∈ MyInbox ⇒ from(x) ∈ MySafeSenderList ∨ (∃y. y ∈ subject(x) ∧ y ∈ L(-[a-z]+[0-9]X)). -- Thanks, Pawan Darira -- Thanks, Pawan Darira -- Regards, K. Gabriele --- unchanged since 20/9/10 --- P.S. If the subject contains [LON] or the addressee acknowledges the receipt within 48 hours then I don't resend the email. subject(this) ∈ L(LON*) ∨ ∃x. (x ∈ MyInbox ∧ Acknowledges(x, this) ∧ time(x) Now + 48h) ⇒ ¬resend(I, this). If an email is sent by a sender that is not a trusted contact or the email does not contain a valid code then the email is not received. A valid code starts with a hyphen and ends with X. ∀x. x ∈ MyInbox ⇒ from(x) ∈ MySafeSenderList ∨ (∃y. y ∈ subject(x) ∧ y ∈ L(-[a-z]+[0-9]X)). -- Thanks, Pawan Darira
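For reference, the delete-all command quoted above is normally written with its XML body spelled out; assuming $solrIndex points at the core's base URL, it looks like:

curl --fail $solrIndex/update?commit=true -d '<delete><query>*:*</query></delete>'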
Can't seem to get External Field scoring running
I want to be able to dynamically change scores without having to update the entire document. For this, I started using the External File Field. I set a fieldType called idRankFile and a field called idRank in schema.xml : <fieldType name="idRankFile" keyField="id" defVal="0" stored="false" indexed="false" class="solr.ExternalFileField" valType="pfloat"/> <field name="idRank" type="idRankFile"/> Now I set the idRank for various id's in a file called external_idRank.txt in dataDir : F8V7067-APL-KIT = 1.0 IW-02 = 10.0 9885A004 = 100.0 Originally, the scores for these 3 id's (for my query) were in reverse order. Now, I query using the following : http://localhost:8983/solr/select?indent=on&q=car%20power%20adaptor&fl=id,name_val_:idRank However, the order of the results remains the same. It seems it hasn't taken the external field into account. Any ideas how to do this? Is my query correct?
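For comparison, values from an ExternalFileField normally only affect ranking when they are pulled in through a function query; two hedged sketches of how that is commonly done (the dismax variant and its parameters are assumptions, not from this thread):

q=car power adaptor _val_:"idRank"
q=car power adaptor&defType=dismax&bf=idRank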
Re: UIMA analysisEngine path
Hello, if you want to take the descriptor from a jar, provided that you configured the jar inside a lib element in solrconfig, then you just need to write the correct classpath in the analysisEngine element. For example if your descriptor resides in com/something/desc/ path inside the jar then you should set the analysisEngine element as /com/something/desc/descriptorname.xml If you instead need to get the descriptor from filesystem try the patch in SOLR-2501 [1]. Hope this helps, Tommaso [1] : https://issues.apache.org/jira/browse/SOLR-2501 2011/5/13 chamara chama...@gmail.com Hi, Is this code line 57 needs to be changed to the location where the jar files(library files) resides? URL url = this.getClass().getResource(location of the jar files); I did change it but no luck so far. Let me know what i am doing wrong? -- View this message in context: http://lucene.472066.n3.nabble.com/UIMA-analysisEngine-path-tp2895284p2935541.html Sent from the Solr - User mailing list archive at Nabble.com.
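For instance, keeping the placeholder path from above, the element in the Solr 3.x uimaConfig section of solrconfig.xml would read roughly:

<analysisEngine>/com/something/desc/descriptorname.xml</analysisEngine>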
Re: Order of words in proximity search
http://pastebin.com/svyefmM6 Pretty standard :) /Tor On Mon, May 16, 2011 at 9:18 AM, lboutros boutr...@gmail.com wrote: The analyzer of the field you are using could impact the Phrase Query Slop. Could you copy/paste the part of the schema ? Ludovic. - Jouve France. -- View this message in context: http://lucene.472066.n3.nabble.com/Order-of-words-in-proximity-search-tp2938427p2946764.html Sent from the Solr - User mailing list archive at Nabble.com. -- Mvh Tor Henning Ueland
[POLL] How do you (like to) do logging with Solr
Hi, This poll is to investigate how you currently do or would like to do logging with Solr when deploying solr.war to a SEPARATE java application server (such as Tomcat, Resin etc) outside of the bundled solr/example. For background on how things work in Solr now, see http://wiki.apache.org/solr/SolrLogging and for more info on the SLF4J framework, see http://www.slf4j.org/manual.html Please tick one of the options below with an [X]: [ ] I always use the JDK logging as bundled in solr.war, that's perfect [ ] I sometimes use log4j or another framework and am happy with re-packaging solr.war [ ] Give me solr.war WITHOUT an slf4j logger binding, so I can choose at deploy time [ ] Let me choose whether to bundle a binding or not at build time, using an ANT option [ ] What's wrong with the solr/example Jetty? I never run Solr elsewhere! [ ] What? Solr can do logging? How cool! Note that NOT bundling a logger binding with solr.war means defaulting to the NOP logger after outputting these lines to stderr: SLF4J: Failed to load class org.slf4j.impl.StaticLoggerBinder. SLF4J: Defaulting to no-operation (NOP) logger implementation SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details. -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com
Re: [POLL] How do you (like to) do logging with Solr
[X] I always use the JDK logging as bundled in solr.war, that's perfect [ ] I sometimes use log4j or another framework and am happy with re-packaging solr.war [ ] Give me solr.war WITHOUT an slf4j logger binding, so I can choose at deploy time [ ] Let me choose whether to bundle a binding or not at build time, using an ANT option [ ] What's wrong with the solr/example Jetty? I never run Solr elsewhere! [ ] What? Solr can do logging? How cool!
Re: [POLL] How do you (like to) do logging with Solr
[ ] I always use the JDK logging as bundled in solr.war, that's perfect [X ] I sometimes use log4j or another framework and am happy with re-packaging solr.war [ ] Give me solr.war WITHOUT an slf4j logger binding, so I can choose at deploy time [ ] Let me choose whether to bundle a binding or not at build time, using an ANT option [ ] What's wrong with the solr/example Jetty? I never run Solr elsewhere! [ ] What? Solr can do logging? How cool! Setting up log4j is easy but encountered issues with versions when switching to 3.1.
Re: why query chinese character with bracket become phrase query by default?
On Sun, May 15, 2011 at 7:44 PM, Mark Miller markrmil...@gmail.com wrote: Could you please revert your commit, until we've reached some consensus on this discussion first? Let's reach some consensus, but why revert? This has been the behavior - shouldn't the consensus onus be on changing it to begin with? That's how I see it. To be clear, I'm asking that Yonik revert his commit from yesterday (rev 1103444), where he added text_nwd fieldType and dynamic fields *_nwd to the example schema.xml. I agree we should reach consensus before changing what's already committed, that's exactly why I'm asking Yonik to revert -- we were in the middle of discussing this, and I had posted a patch on SOLR-2519, when he suddenly committed the text_nwd change, yesterday. Does anyone disagree that Yonik's commit was inappropriate? This is not how we work at Apache. I'm going to need to get back up to speed on this issue before I can comment more helpfully. Better out of the box support for other languages is important - I think it makes sense to discuss this issue again myself. +1 Solr, out of box, is just awful for non-whitespace languages (eg CJK, and others). And for every user who comes to the list asking for help (thank you cyang2010!), I imagine there are many others who simply gave up and walked away (from Solr) when they tried it on CJK content. Lucene has made awesome strides in having natural defaults that work well across many languages, thanks to the hard work of Robert and others (StandardAnalyzer now actually follows a standard (UAX #29 -- text segmentation), autophrase off in QP, etc.), and I think we should take advantage of this in Solr, just like ElasticSearch does. Really, the best solution (I think) would be to have language-specific fieldTypes (text_en, text_zh, etc.), but I suspect there's a good amount of work to reach that so in the meantime I think we should fix the defaults for the text fieldType to work well across many languages. Mike http://blog.mikemccandless.com
Re: [POLL] How do you (like to) do logging with Solr
On Mon, May 16, 2011 at 2:13 PM, Jan Høydahl jan@cominvent.com wrote: [...] Please tick one of the options below with an [X]: [ X] I always use the JDK logging as bundled in solr.war, that's perfect [ ] I sometimes use log4j or another framework and am happy with re-packaging solr.war [ ] Give me solr.war WITHOUT an slf4j logger binding, so I can choose at deploy time [ ] Let me choose whether to bundle a binding or not at build time, using an ANT option [ ] What's wrong with the solr/example Jetty? I never run Solr elsewhere! [ ] What? Solr can do logging? How cool! Regards, Gora
Re: [POLL] How do you (like to) do logging with Solr
[ ] I always use the JDK logging as bundled in solr.war, that's perfect [ ] I sometimes use log4j or another framework and am happy with re-packaging solr.war [X] Give me solr.war WITHOUT an slf4j logger binding, so I can choose at deploy time [ ] Let me choose whether to bundle a binding or not at build time, using an ANT option [ ] What's wrong with the solr/example Jetty? I never run Solr elsewhere! [ ] What? Solr can do logging? How cool! On 16 May 2011 11:32, Gora Mohanty g...@mimirtech.com wrote: On Mon, May 16, 2011 at 2:13 PM, Jan Høydahl jan@cominvent.com wrote: [...] Please tick one of the options below with an [X]: [ X] I always use the JDK logging as bundled in solr.war, that's perfect [ ] I sometimes use log4j or another framework and am happy with re-packaging solr.war [ ] Give me solr.war WITHOUT an slf4j logger binding, so I can choose at deploy time [ ] Let me choose whether to bundle a binding or not at build time, using an ANT option [ ] What's wrong with the solr/example Jetty? I never run Solr elsewhere! [ ] What? Solr can do logging? How cool! Regards, Gora -- Met vriendelijke groet, Martijn van Groningen
Re: [POLL] How do you (like to) do logging with Solr
Please tick one of the options below with an [X]: [ ] I always use the JDK logging as bundled in solr.war, that's perfect [X] I sometimes use log4j or another framework and am happy with re-packaging solr.war actually : not so happy because our operations team has to repackage it. But there is no option for [X] add the logger configuration to the server's classpath, no repackaging! [ ] Give me solr.war WITHOUT an slf4j logger binding, so I can choose at deploy time [ ] Let me choose whether to bundle a binding or not at build time, using an ANT option [ ] What's wrong with the solr/example Jetty? I never run Solr elsewhere! [ ] What? Solr can do logging? How cool!
Re: [POLL] How do you (like to) do logging with Solr
[X] I sometimes use log4j or another framework and am happy with re-packaging solr.war actually : not so happy because our operations team has to repackage it. But there is no option for [X] add the logger configuration to the server's classpath, no repackaging! That's what happens if we ship solr.war without any pre-set logger binding - it's the binding provided in your app-server's classpath which will be used. And now my vote: [ ] I always use the JDK logging as bundled in solr.war, that's perfect [ ] I sometimes use log4j or another framework and am happy with re-packaging solr.war [X] Give me solr.war WITHOUT an slf4j logger binding, so I can choose at deploy time [ ] Let me choose whether to bundle a binding or not at build time, using an ANT option [ ] What's wrong with the solr/example Jetty? I never run Solr elsewhere! [ ] What? Solr can do logging? How cool!
LockObtainedFailedException on solr update
My solr index is updated simultaneously by multiple clients via REST. I use commitWithing attribute in the add/add command to direct auto commits. I start getting this error after a couple of days of usage. How do i fix this ? Please find the error log below. Using solr 3.1 with tomcat Thanks -- HTTP Status 500 - Lock obtain timed out: NativeFSLock@ /var/lib/solr/data/index/write.lock org.apache.lucene.store.LockObtainFailedException: Lock obtain timed out: NativeFSLock@/var/lib/solr/data/index/write.lock at org.apache.lucene.store.Lock.obtain(Lock.java:84) at org.apache.lucene.index.IndexWriter.lt;initgt;(IndexWriter.java:1097) at org.apache.solr.update.SolrIndexWriter.lt;initgt;(SolrIndexWriter.java:83) at org.apache.solr.update.UpdateHandler.createMainIndexWriter(UpdateHandler.java:102) at org.apache.solr.update.DirectUpdateHandler2.openWriter(DirectUpdateHandler2.java:174) at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:222) at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:61) at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:147) at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:77) at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:55) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1360) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206) at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233) at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191) at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127) at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102) at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109) at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298) at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:859) at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:588) at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489) at java.lang.Thread.run(Thread.java:662) -- Regards, Nitesh Nandy
Re: Set Full-Import Clean=False
On Mon, May 16, 2011 at 5:29 PM, Jasneet Sabharwal jasneet.sabhar...@ngicorporation.com wrote: Hi Where do I set the default value of clean = false when a full-import is done. Append it to the URL, e.g., dataimport?command=full-import&clean=false Regards, Gora
Re: Set Full-Import Clean=False
I have been doing that, but I want to set it as False by default, so that even if the admin forgets to set clean=false in the URL, it doesn't do it on its own. On 16-05-2011 17:38, Gora Mohanty wrote: On Mon, May 16, 2011 at 5:29 PM, Jasneet Sabharwal jasneet.sabhar...@ngicorporation.com wrote: Hi Where do I set the default value of clean = false when a full-import is done. Append it to the URL, e.g., dataimport?command=full-importclean=false Regards, Gora -- Regards Jasneet Sabharwal Software Developer NextGen Invent Corporation +91-9871228582
Solr Cell and operations on metadata extracted
Hi, I have a question about Solr Cell please. I index some files. For example, if I want to extract the filename, apply a hash function like MD5 to it and then store the result in Solr: is the correct way to use Tika « manually » to extract the metadata I want, do the transformations on it and then send it to Solr? I can't use Solr Cell directly in this case because I can't modify the extracted metadata, right? Thanks, Olivier
Re: Set Full-Import Clean=False
Jasneet, what about defining the value as a default in the dataimport request-handler? like the sample at http://wiki.apache.org/solr/SolrRequestHandler does? Regards Stefan On Mon, May 16, 2011 at 2:10 PM, Jasneet Sabharwal jasneet.sabhar...@ngicorporation.com wrote: I have been doing that, but I want to set it as False by default, so that even if the admin forgets to set clean=false in the URL, it doesn't do it on its own. On 16-05-2011 17:38, Gora Mohanty wrote: On Mon, May 16, 2011 at 5:29 PM, Jasneet Sabharwal jasneet.sabhar...@ngicorporation.com wrote: Hi Where do I set the default value of clean = false when a full-import is done. Append it to the URL, e.g., dataimport?command=full-importclean=false Regards, Gora -- Regards Jasneet Sabharwal Software Developer NextGen Invent Corporation +91-9871228582
Re: why query chinese character with bracket become phrase query by default?
On May 16, 2011, at 5:30 AM, Michael McCandless wrote: Does anyone disagree that Yonik's commit was inappropriate? This is not how we work at Apache. Ah - dunno yet - I obviously missed part of the conversation here. I thought you were talking about reversing 'autophrase off' as the default, not these 'quick' new field types. Excuse me for a moment while I read... Yeah - seems a little hasty. Not a fan of 'text_nwd' as a field name either. Didn't seem malicious to me, but it does seem we should probably work together in JIRA/discussion before just shotgunning changes... Don't know that I care if it's reverted (if we fall back another 10 steps into that BS I quit everything and I'm moving to South America), but we should push on here either way. - Mark Miller lucidimagination.com Lucene/Solr User Conference May 25-26, San Francisco www.lucenerevolution.org
Re: Set Full-Import Clean=False
Stefan, I have added the DIH request handler in the solrconfig.xml. Do I have to add the clean=false in that or somewhere else ? Regards Jasneet On 16-05-2011 18:03, Stefan Matheis wrote: Jasneet, what about defining the value as a default in the dataimport request-handler? like the sample at http://wiki.apache.org/solr/SolrRequestHandler does? Regards Stefan On Mon, May 16, 2011 at 2:10 PM, Jasneet Sabharwal jasneet.sabhar...@ngicorporation.com wrote: I have been doing that, but I want to set it as False by default, so that even if the admin forgets to set clean=false in the URL, it doesn't do it on its own. On 16-05-2011 17:38, Gora Mohanty wrote: On Mon, May 16, 2011 at 5:29 PM, Jasneet Sabharwal jasneet.sabhar...@ngicorporation.comwrote: Hi Where do I set the default value of clean = false when a full-import is done. Append it to the URL, e.g., dataimport?command=full-importclean=false Regards, Gora -- Regards Jasneet Sabharwal Software Developer NextGen Invent Corporation +91-9871228582 -- Regards Jasneet Sabharwal Software Developer NextGen Invent Corporation +91-9871228582
Re: Set Full-Import Clean=False
Jasneet On Mon, May 16, 2011 at 3:10 PM, Jasneet Sabharwal jasneet.sabhar...@ngicorporation.com wrote: I have added the DIH request handler in the solrconfig.xml. Exactly there :) Regards Stefan
Getting Null pointer exception While doing a full import
Hi, I am doing a full import in one of the cores. But I am getting Null poniter exception and the import is failing again and again. I also tried clearing the indexes and started the full import, but still indexing failed. The full import request is prefect and I verified it with other full import requests too. Any Suggestion/Solution will be of great help. Thanks in advance. The exception is as follows: May 14, 2011 5:06:56 AM org.apache.solr.core.SolrCore execute INFO: [core6] webapp=/solr path=/dataimport params={wt=javabinversion=1} status=0 QTime=0 May 14, 2011 9:03:55 PM org.apache.solr.common.SolrException log SEVERE: java.lang.NullPointerException at java.io.StringReader.init(StringReader.java:33) at org.apache.lucene.queryParser.QueryParser.parse(QueryParser.java:197) at org.apache.solr.search.LuceneQParser.parse(LuceneQParserPlugin.java:78) at org.apache.solr.search.QParser.getQuery(QParser.java:137) at org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:85) at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:174) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1323) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:341) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:244) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089) at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365) at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216) at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181) at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712) at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405) at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211) at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114) at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139) at org.mortbay.jetty.Server.handle(Server.java:285) at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502) at org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:835) at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:641) at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:208) at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378) at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:226) at org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442) Thanks Regards, Sivaganesh Email id: sivaganesh_sel...@infosys.com -- View this message in context: http://lucene.472066.n3.nabble.com/Getting-Null-pointer-exception-While-doing-a-full-import-tp2947854p2947854.html Sent from the Solr - User mailing list archive at Nabble.com.
boolean versus non-boolean search
Dear list, Might have missed it from the literature and the list, sorry if so, but: SOLR 1.4.1 <solrQueryParser defaultOperator="AND"/> Consider the query: term1 term2 OR term1 term2 OR term1 term3 Problem: The query produces a hit containing only term1. Solution: Modified query, grouping with parenthesis (term1 term2) OR term1 term2 OR term1 term3 produces hits with both term1 and term2 present and other hits that are hit by OR'ed clauses. Problem 1. Another modified query, AND instead of parenthesis: term1 AND term2 OR term1 term2 OR term1 term3 produces same results as the original query and same debug output. Why is that? -- Regards, Dmitry Kan
Re: Set Full-Import Clean=False
Stefan

<requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">/home/jasneet/apache-solr-3.1.0/example/solr/conf/data-config.xml</str>
    <str name="clean">false</str>
  </lst>
</requestHandler>

Should it be like this ? On 16-05-2011 18:48, Stefan Matheis wrote: Jasneet On Mon, May 16, 2011 at 3:10 PM, Jasneet Sabharwal jasneet.sabhar...@ngicorporation.com wrote: I have added the DIH request handler in the solrconfig.xml. Exactly there :) Regards Stefan -- Regards Jasneet Sabharwal Software Developer NextGen Invent Corporation +91-9871228582
RE: SolrDispatchFilter
Yep that fixed my problem ...many thanks ! -Original Message- From: Chris Hostetter [mailto:hossman_luc...@fucit.org] Sent: Friday, May 13, 2011 6:37 PM To: solr-user@lucene.apache.org Subject: RE: SolrDispatchFilter : This problem is only occurring when using IE8 ( Chrome & FireFox fine ) if it only happens when using the form on the admin screen (and not when hitting the URL directly, via shift-reload for example), it may just be a different manifestation of this silly javascript bug... https://issues.apache.org/jira/browse/SOLR-2455 -Hoss
Re: why query chinese character with bracket become phrase query by default?
On Sun, May 15, 2011 at 1:48 PM, Michael McCandless luc...@mikemccandless.com wrote: Could you please revert your commit, until we've reached some consensus on this discussion first? Huh? I thought everyone was in agreement that we needed more field types for different languages? I added my best guess about what a generic type for non-whitespace-delimited might look like. Since it's a new field type, it doesn't affect anything. Hopefully it only improves the situation for someone trying to use one of these languages. The only negative would seem to be if it's worse than nothing (i.e. a very bad example because it actually doesn't work for non-whitespace-delimited languages). The issue about changing defaults on TextField and changing what text does in the example schema by default is not dependent on this. They are only related by the fact that if another field is added/changed then _nwd may become redundant and can be removed. For now, it only seems like an improvement? Anyway... the whole language of revert seems unnecessarily confrontational. Feel free to improve what's there (or delete *_nwd if people really feel it adds no/negative value) -Yonik
How to index and query C# as whole term?
Hi, I'm using Apache Solr v3.1. How do I configure/allow Solr to both index and query the term c# as a whole word/term? From Analysis page, I could see that the term c# is being reduced/converted into just c by solr.WordDelimiterFilterFactory. Regards, Gnanam
Re: [POLL] How do you (like to) do logging with Solr
[ ] I always use the JDK logging as bundled in solr.war, that's perfect [x] I sometimes use log4j or another framework and am happy with re-packaging solr.war [ ] Give me solr.war WITHOUT an slf4j logger binding, so I can choose at deploy time [ ] Let me choose whether to bundle a binding or not at build time, using an ANT option [ ] What's wrong with the solr/example Jetty? I never run Solr elsewhere! [ ] What? Solr can do logging? How cool! Péter
Re: Set Full-Import Clean=False
On Mon, May 16, 2011 at 3:27 PM, Jasneet Sabharwal jasneet.sabhar...@ngicorporation.com wrote: Should it be like this ? Never tried it myself, but what i guess from the Wiki ... Yes. doesn't work for you, or just asked to be sure, before integrating it?
Re: why query chinese character with bracket become phrase query by default?
On Mon, May 16, 2011 at 5:30 AM, Michael McCandless luc...@mikemccandless.com wrote: To be clear, I'm asking that Yonik revert his commit from yesterday (rev 1103444), where he added text_nwd fieldType and dynamic fields *_nwd to the example schema.xml. So... your position is that until the text fieldType is changed to support non-whitespace-delimited languages better, that no other fieldType should be changed/added to better support non-whitespace-delimited languages? Man, that seems political, not technical. Whatever... I'll revert. -Yonik
Re: why query chinese character with bracket become phrase query by default?
On Mon, May 16, 2011 at 3:51 PM, Yonik Seeley yo...@lucidimagination.com wrote: On Mon, May 16, 2011 at 5:30 AM, Michael McCandless luc...@mikemccandless.com wrote: To be clear, I'm asking that Yonik revert his commit from yesterday (rev 1103444), where he added text_nwd fieldType and dynamic fields *_nwd to the example schema.xml. So... your position is that until the text fieldType is changed to support non-whitespace-delimited languages better, that no other fieldType should be changed/added to better support non-whitespace-delimited languages? Man, that seems political, not technical. To me it seems like neither. It's rather the process of improving, aligned with outstanding issues. It shouldn't feel wrong. Simon Whatever... I'll revert. -Yonik
Re: why query chinese character with bracket become phrase query by default?
On Mon, May 16, 2011 at 9:51 AM, Yonik Seeley yo...@lucidimagination.com wrote: To be clear, I'm asking that Yonik revert his commit from yesterday (rev 1103444), where he added text_nwd fieldType and dynamic fields *_nwd to the example schema.xml. So... your position is that until the text fieldType is changed to support non-whitespace-delimited languages better, that no other fieldType should be changed/added to better support non-whitespace-delimited languages? No, that's not my position at all. My position is: please don't suddenly commit changes, with your way, while we're still discussing how to solve the issue. That's not the Apache way. This applies in general, not just this case (fixing Solr's out-of-the-box behavior with non-whitespace languages). So, it could very well be, after we iterate on SOLR-2519, that we all agree your baby step is great, in which case let's go forward with that. But we should all come to some consensus about that before you suddenly commit. Man, that seems political, not technical. I'm sorry you feel that way, but it's important to me that we all follow the Apache way here. I feel this will only make our community stronger. It's also important that any time another committer is uncomfortable with what just got committed, and asks for a revert, that it *not* be a big deal. It's not political, it was just a mistake and the revert is quick and painless. We are commit-then-review here, and if someone is uncomfortable, they should say so and whoever committed should simply revert it and re-iterate. This should be a simple free tool for all of us to use. Whatever... I'll revert. Thank you. Mike
Re: Show filename in search result using a FileListEntityProcessor
Hi, thanks for the reply. I tried a couple of things both in the tika-test entity and in the entity named 'f'. In the tika-test entity I tried: <field column="fileName" name="${f.fileName}" /> <field column="fileName" name="${f.file}" /> even <field column="fileName" name="${f.fileAbsolutePath}" /> I also tried doing things in the entity 'f' like: <field column="fileName" name="fileName"/> <field column="fileName" name="file"/> None of it works. I also added fileName to the schema like: <field name="fileName" type="string" indexed="true" stored="true" /> In <fields>. Doesn't help. Can anyone provide me with a working example? I'm pretty stuck here on something that seems really trivial and simple :-( On Sat, May 14, 2011 at 22:56, kbootz kbo...@caci.com wrote: There is a JIRA item (can't recall it atm) that addresses the issue with the docs. I'm running 3.1 and per your example you should be able to get it using ${f.file}. I think* it should also be in the entity desc. but I'm also new and that's just how I access it. GL -- View this message in context: http://lucene.472066.n3.nabble.com/Show-filename-in-search-result-using-a-FileListEntityProcessor-tp2939193p2941305.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: why query chinese character with bracket become phrase query by default?
On Mon, May 16, 2011 at 10:06 AM, Michael McCandless luc...@mikemccandless.com wrote: On Mon, May 16, 2011 at 9:51 AM, Yonik Seeley yo...@lucidimagination.com wrote: To be clear, I'm asking that Yonik revert his commit from yesterday (rev 1103444), where he added text_nwd fieldType and dynamic fields *_nwd to the example schema.xml. So... your position is that until the text fieldType is changed to support non-whitespace-delimited languages better, that no other fieldType should be changed/added to better support non-whitespace-delimited languages? No, that's not my position at all. My position is: please don't suddenly commit changes, with your way, while we're still discussing how to solve the issue. That's not the Apache way. Dude... everyone has always agreed we need more fieldtypes to support different languages (as you did earlier in this thread too). There's been a history of just adding stuff like that (half of the commits to the example schema have no associated JIRA issue). What happens to the default text field will have no bearing on that. We will still need more field types to support more languages. Would you be against me adding a text_cjk fieldtype too? My position: it's silly for a lack of consensus on the text field to block progress on any other fieldtype. -Yonik
Re: How to index and query C# as whole term?
I don't think you'd want to use the string type here. String type is almost never appropriate for a field you want to actually search on (it is appropriate for fields to facet on). But you may want to use Text type with different analyzers selected. You probably want Text type so the value is still split into different tokens on word boundaries; you just don't want an analyzer set that removes punctuation. On 5/16/2011 10:46 AM, Gora Mohanty wrote: On Mon, May 16, 2011 at 7:05 PM, Gnanakumargna...@zoniac.com wrote: Hi, I'm using Apache Solr v3.1. How do I configure/allow Solr to both index and query the term c# as a whole word/term? From Analysis page, I could see that the term c# is being reduced/converted into just c by solr.WordDelimiterFilterFactory. [...] Yes, as you have discovered the analyzers for the field type in question will affect the values indexed. To index c# exactly as is, you can use the string type, instead of the text type. However, what you probably want some filters to be applied, e.g., LowerCaseFilterFactory. Take a look at the definition of the fieldType text in schema.xml, define a new field type that has only the tokenizers and analyzers that you need, and use that type for your field. This Wiki page should be helpful: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters Regards, Gora
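A minimal sketch of such a field type (the name and exact filter chain are only an example; the point is that there is no WordDelimiterFilterFactory to strip the '#'):

<fieldType name="text_code" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- whitespace tokenizing keeps "c#" as one token -->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

With this chain, "c#" survives as a single token and can be matched exactly.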
Re: boolean versus non-boolean search
Why? Because of how the solr/lucene query parser parses? It parses into separate tokens/phrases, and then marks each unit as mandatory or optional. The operators joining the tokens/phrases are used to determine if a unit is mandatory or optional. Since your defaultOperator=AND, term1 term2 OR X is the same as: term1 AND term2 OR X because it used the defaultOperator in between term1 and term2, since no explicit operator was provided. Then we get to the one you specifically did add the AND in. I guess that it basically groups left-to-right. So: term1 AND term2 OR X OR Y is the same as: term1 AND (term2 OR (X OR Y)) But I guess you already figured this all out, yeah? On 5/16/2011 9:24 AM, Dmitry Kan wrote: Dear list, Might have missed it from the literature and the list, sorry if so, but: SOLR 1.4.1 <solrQueryParser defaultOperator="AND"/> Consider the query: term1 term2 OR term1 term2 OR term1 term3 Problem: The query produces a hit containing only term1. Solution: Modified query, grouping with parenthesis (term1 term2) OR term1 term2 OR term1 term3 produces hits with both term1 and term2 present and other hits that are hit by OR'ed clauses. Problem 1. Another modified query, AND instead of parenthesis: term1 AND term2 OR term1 term2 OR term1 term3 produces same results as the original query and same debug output. Why is that?
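One way to check how any of these variants is actually being interpreted is to add debugQuery=on to the request and compare the parsedquery section of the response; for example (parameter values are only illustrative):

...&q=term1 term2 OR term3&debugQuery=on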
RE: How to index and query C# as whole term?
I have always just converted terms like 'C#' or 'C++' into 'csharp' and 'cplusplus' before indexing them and similarly converted those terms if someone searched on them. That always has worked just fine for me... :) -Original Message- From: Jonathan Rochkind [mailto:rochk...@jhu.edu] Sent: Monday, May 16, 2011 8:28 AM To: solr-user@lucene.apache.org Subject: Re: How to index and query C# as whole term? I don't think you'd want to use the string type here. String type is almost never appropriate for a field you want to actually search on (it is appropriate for fields to facet on). But you may want to use Text type with different analyzers selected. You probably want Text type so the value is still split into different tokens on word boundaries; you just don't want an analyzer set that removes punctuation. On 5/16/2011 10:46 AM, Gora Mohanty wrote: On Mon, May 16, 2011 at 7:05 PM, Gnanakumargna...@zoniac.com wrote: Hi, I'm using Apache Solr v3.1. How do I configure/allow Solr to both index and query the term c# as a whole word/term? From Analysis page, I could see that the term c# is being reduced/converted into just c by solr.WordDelimiterFilterFactory. [...] Yes, as you have discovered the analyzers for the field type in question will affect the values indexed. To index c# exactly as is, you can use the string type, instead of the text type. However, what you probably want some filters to be applied, e.g., LowerCaseFilterFactory. Take a look at the definition of the fieldType text in schema.xml, define a new field type that has only the tokenizers and analyzers that you need, and use that type for your field. This Wiki page should be helpful: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters Regards, Gora
Re: How to index and query C# as whole term?
Before indexing so outside Solr? Using the SynonymFilter would be easier i guess. On Monday 16 May 2011 17:44:24 Robert Petersen wrote: I have always just converted terms like 'C#' or 'C++' into 'csharp' and 'cplusplus' before indexing them and similarly converted those terms if someone searched on them. That always has worked just fine for me... :) -Original Message- From: Jonathan Rochkind [mailto:rochk...@jhu.edu] Sent: Monday, May 16, 2011 8:28 AM To: solr-user@lucene.apache.org Subject: Re: How to index and query C# as whole term? I don't think you'd want to use the string type here. String type is almost never appropriate for a field you want to actually search on (it is appropriate for fields to facet on). But you may want to use Text type with different analyzers selected. You probably want Text type so the value is still split into different tokens on word boundaries; you just don't want an analyzer set that removes punctuation. On 5/16/2011 10:46 AM, Gora Mohanty wrote: On Mon, May 16, 2011 at 7:05 PM, Gnanakumargna...@zoniac.com wrote: Hi, I'm using Apache Solr v3.1. How do I configure/allow Solr to both index and query the term c# as a whole word/term? From Analysis page, I could see that the term c# is being reduced/converted into just c by solr.WordDelimiterFilterFactory. [...] Yes, as you have discovered the analyzers for the field type in question will affect the values indexed. To index c# exactly as is, you can use the string type, instead of the text type. However, what you probably want some filters to be applied, e.g., LowerCaseFilterFactory. Take a look at the definition of the fieldType text in schema.xml, define a new field type that has only the tokenizers and analyzers that you need, and use that type for your field. This Wiki page should be helpful: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters Regards, Gora -- Markus Jelsma - CTO - Openindex http://www.linkedin.com/in/markus17 050-8536620 / 06-50258350
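If the synonym route is taken, a minimal sketch would be an entry in synonyms.txt plus a SynonymFilterFactory in the field's analyzer; note the assumption that the tokenizer ahead of it keeps c# as a single token (e.g. a whitespace tokenizer), otherwise the mapping never sees the # character:

c# => csharp
c++ => cplusplus

<filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="false"/>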
Re: boolean versus non-boolean search
Hi Jonathan, Well, I clearly understand, why 'term1 term2 OR ...' gives exactly same results as 'term1 AND term2 OR ...', but what I do not get is, why grouping with parentheses is required to have both term1 and term2 in the same hit even though AND is the default operator and space between terms is expected to be treated as AND. Dmitry On Mon, May 16, 2011 at 6:33 PM, Jonathan Rochkind rochk...@jhu.edu wrote: Why? Becuase of how the solr/lucene query parser parses? It parses into seperate tokens/phrases, and then marks each unit as mandatory or optional. The operator's joining the tokens/phrases are used to determine if a unit is mandatory or optional. Since your defaultOperator=AND term1 term2 OR X is the same as: term1 AND term2 OR X because it used the defaultOperator in between term1 and term2, since no explicit operator was provided. Then we get to the one you specifically did add the AND in. I guess that it basically groups left-to-right. So: term1 AND term2 OR X OR Y is the same as: term1 AND (term2 OR (X OR Y)) But I guess you already figured this all out, yeah? On 5/16/2011 9:24 AM, Dmitry Kan wrote: Dear list, Might have missed it from the literature and the list, sorry if so, but: SOLR 1.4.1 solrQueryParser defaultOperator=AND/ Consider the query: term1 term2 OR term1 term2 OR term1 term3 Problem: The query produces a hit containing only term1. Solution: Modified query, grouping with parenthesis (term1 term2) OR term1 term2 OR term1 term3 produces hits with both term1 and term2 present and other hits that are hit by OR'ed clauses. Problem 1. Another modified query, AND instead of parenthesis: term1 AND term2 OR term1 term2 OR term1 term3 produces same results as the original query and same debug output. Why is that? -- Regards, Dmitry Kan
Re: document storage
On 05/15/2011 11:48 AM, Erick Erickson wrote: Where are the documents coming from? Because storing them ONLY in Solr risks losing them if your index is somehow hosed. In our case, we generally have source documents and can reproduce the index if need be, but that's a good point. Storing them externally only has the advantage that your index will be much smaller, which helps when replicating as you scale. The downside here is that highlighting will be more resource-intensive since you're re-analyzing text in order to highlight. I had been imagining that the Highlighter could use stored term positions so as to avoid re-analysis. Is this incompatible with external storage? We might conceivably need to replicate the documents anyway, even if they are stored externally, in order to make them available to a farm of servers, although a SAN is another possibility here. My main concern about storing internally was the cost of merging (optimizing) the index. Presumably that would be increased if the docs are stored in it. So, as usual, it depends (tm). What is the scale you need? What is the QPS you're thinking of supporting? Things are working well at a small scale, and in that environment I think all of these solutions work more or less equally well. We're worrying about 10's of millions of documents and QPS around 50, so I expect we will have some significant challenges in coordinating a cluster of servers, and we're trying to plan as well as we can for that. We expect updates to be performed in a batch mode - they don't have to be real-time, but they might need to be daily. -Mike
Problem with custom Similarity class
Hi, I'm new to Solr and I'm trying to use my custom Similarity class but I've not succeeded on that. I added some debug information and my class is loaded, but it is not used when queries are made. Does someone could help me? If any further information is relevant, I can provide it. Thanks in advance -- Alex Bredariol Grilo Developer - umamao.com
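For reference, in Solr 3.x a custom Similarity is registered globally near the end of schema.xml; a minimal sketch, with a placeholder class name standing in for the actual implementation:

<similarity class="com.example.MyCustomSimilarity"/>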
Re: why query chinese character with bracket become phrase query by default?
On Mon, May 16, 2011 at 10:22 AM, Yonik Seeley yo...@lucidimagination.com wrote: My position is: please don't suddenly commit changes, with your way, while we're still discussing how to solve the issue. That's not the Apache way. Dude... everyone has always agreed we need more fieldtypes to support different languages (as you did earlier in this thread too). +1, and I still agree that'd be best. In that ideal future we would have no more text fieldType, only text_zh, text_en, etc. There's been a history of just adding stuff like that (half of the commits to the example schema have no associated JIRA issue). I wasn't objecting to the lack of a referenced JIRA issue; I was objecting to you suddenly committing 'your way while we were still discussing what to do. What happens to the default text field will have no bearing on that. That's not really true? I think any changes we make to any default text* fieldTypes are strongly related. For example, if we fix the text fieldType to have good all-around defaults for all languages (ie, the patch on SOLR-2519) then we don't need separate text_nwd/*_nwd field types. Instead, maybe we could add text_autophrase fieldTypes? Or maybe text_en_autophrase? We will still need more field types to support more languages. Right. Would you be against me adding a text_cjk fieldtype too? text_cjk would be *awesome*, but text_zh, text_ja, text_ko would be even better! If we fix text fieldType to be generic for all languages (use StandardAnalyzer, turn off autophrase), but then go and add in specific languages over time (say text_en, text_cjk, etc.), I think that's a great way to iterate towards the ideal future where we have text_XX coverage for many languages. My position: it's silly for a lack of consensus on the text field to block progesss on any other fieldtype. I disagree; I think changes to text fieldType are very much tied up to what other text_* fieldTypes we want to introduce. This is a *really* important configuration file in Solr and we should present good defaults with it. People who first use Solr start with the schema.xml as their starting point. People who first start with ElasticSearch today get StandardAnalyzer and no autophrase as the default, which is the best overall default Lucene has to offer right now. I think Solr should do the same. So to sum up, I think we should: 1) Fix text fieldType to stop destroying non-whitespace languages, and use the best general defaults we have to offer today (switch from WhitespaceTokenizer - StandardTokenizer, and turn off autophrase); this is the patch on SOLR-2519. 2) Add in text_XX specific language field types for as many as we can now, iterating over time to add more as we can / people get the itch. We now have a fabulous analysis module (thank you Robert!), so we should take advantage of that and at least make text_XX for all the matching analyzers in there. Let's continue this on the issue... Mike http://blog.mikemccandless.com
Re: assit with the Clustering component in Solr/Lucene
Both of the clustering algorithms that ship with Solr (Lingo and STC) are designed to allow one document to appear in more than one cluster, which actually does make sense in many scenarios. There's no easy way to force them to produce hard clusterings because this would require a complete change in the way the algorithms work. If you need each document to belong to exactly one cluster, you'd have to post-process the clusters to remove the redundant document assignments. On the second thought, I have a simple implementation of k-means clustering that could do hard clustering for you. It's not available yet, it will most probably be part of the next major release of Carrot2 (the package that does the clustering). Please watch this issue http://issues.carrot2.org/browse/CARROT-791 to get updates on this. Just to let you know: Carrot2 3.5.0 has landed in Solr trunk and branch_3x, so you can use the bisecting k-means clustering algorithm (org.carrot2.clustering.kmeans.BisectingKMeansClusteringAlgorithm) which will produce non-overlapping clusters for you. The downside of this simple implementation of k-means is that, for the time being, it produces one-word cluster labels rather than phrases as Lingo and STC. Cheers, S.
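For reference, the algorithm is chosen where the clustering engine is configured in solrconfig.xml; a rough sketch, in which the engine name and surrounding searchComponent definition are assumptions and only the carrot.algorithm value comes from the message above:

<lst name="engine">
  <str name="name">default</str>
  <str name="carrot.algorithm">org.carrot2.clustering.kmeans.BisectingKMeansClusteringAlgorithm</str>
</lst>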
Re: Debugging same SOLR installation on 2 different servers
Thanks Erick ! As I re-checked the configuration files, it turns out someone had modified the /solr/conf/*stopwords.txt* on the production server, and now we know what problem we're dealing with, which seems to be related to: - http://lucene.472066.n3.nabble.com/Dismax-Minimum-Match-Stopwords-Bug-td493483.html#a493488 - http://stackoverflow.com/questions/3635096/dismax-feat-stopwords-synonyms-etc Now I've tried to get around that issue by changing <str name="mm">2<-35%</str> to <str name="mm">1</str> in *solrconfig.xml*, as suggested on http://drupal.org/node/1102646#comment-4249774 which actually gets us results for the incriminated queries, but it adds way too much *noise*... So I tried to make sure all my field types were using our StopFilterFactory (even <fieldType name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>), with no luck. I'll keep on looking for clues, meanwhile if there's a known way around that issue, I'd be really grateful to hear about it :) Cheers ! Paul Le 15/05/2011 16:48, Erick Erickson a écrit : What happens if you copy the index from one machine to the other? Probably from prod to test. If your results stay the same, that'd eliminate index differences as the culprit. What do you get by attaching debugQuery=on to the queries that differ? Is the parsed query any different? I'm wondering here if you somehow have a difference in the configuration, perhaps dismax? Anyway, if the parsed queries are identical, that eliminates that possibility. Next, what about synonym files? Stopwords? Are you absolutely sure they're identical? If you're using dismax, is it possible that the mm (minimum should match) is different? Perhaps this is all stuff you've done already, but this would at least narrow down where the problem might lie... Best Erick On Wed, May 11, 2011 at 12:10 PM, Paul Michalet p...@pix-l.fr wrote: Thanks for the hint :) We ruled that out after having tested special characters, and if it was an applicative bug, it wouldn't work consistently like it currently does for the majority of queries. The only difference we noticed was in the HTTP headers in the SOLR response: occasionally, the Content-length is present, but I've been told it was probably not causing our bug: = dev: headers = Array ( [0] => HTTP/1.1 200 OK [1] => Last-Modified: Fri, 29 Apr 2011 13:36:21 GMT [2] => ETag: MTFjZjU2MTgxNDgwMDAwMFNvbHI= [3] => Content-Type: text/plain; charset=utf-8 [4] => Server: Jetty(6.1.3) ) = production: headers = Array ( [0] => HTTP/1.1 200 OK [1] => Last-Modified: Fri, 06 May 2011 14:18:36 GMT [2] => ETag: OGI3ZWYyZDUxNDgwMDAwMFNvbHI= [3] => Content-Type: text/plain; charset=utf-8 [4] => Content-Length: 2558 [5] => Server: Jetty(6.1.3) ) Paul Michalet Le 11/05/2011 17:47, Paul Libbrecht a écrit : Could it be something in the transmission of the query? Or is it also identical? paul Le 11 mai 2011 à 17:19, Paul Michalet a écrit : Hello everyone We have successfully installed SOLR on 2 servers (development and production), using the same configuration files and paths. Both SOLR instances have indexed the same contents and most queries give identical results, but there's a few exceptions where the production instance returns 0 results (the development instance returns perfectly valid results for the same query). We checked the logs in both environments without finding anything suspicious (the queries are rigorously identical, and the index is built in the exact same way) and we've run out of options as to where to look for debugging these cases.
Our developpement server is Debian and the production is CentOS; the SOLR version installed in both environments is 1.4.0. The weird thing is that the few queries failing in the production instance contain very common terms (without quotes) which, when queried individually, return valid results... Any pointers would be greatly appreciated; thanks in advance ! Paul
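For anyone hitting the same dismax "mm vs. stopwords" mismatch, the two pieces of configuration this thread is juggling look roughly like the sketch below. It is only illustrative: the handler name, qf fields, and analyzer chain are placeholders, not Paul's actual setup, and the usual advice is to keep stopwords.txt identical on every machine and for every field listed in qf so that dismax counts the same number of query clauses everywhere.

  <!-- solrconfig.xml: dismax handler with a relaxed minimum-should-match -->
  <requestHandler name="/search" class="solr.SearchHandler">
    <lst name="defaults">
      <str name="defType">dismax</str>
      <str name="qf">title^2 body</str>
      <str name="mm">1</str>  <!-- the stricter 2&lt;-35% rule was the previous value -->
    </lst>
  </requestHandler>

  <!-- schema.xml: make sure the searched field types share one stopword list -->
  <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>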
Re: boolean versus non-boolean search
On 05/16/2011 09:24 AM, Dmitry Kan wrote: Dear list, Might have missed it from the literature and the list, sorry if so, but: SOLR 1.4.1 <solrQueryParser defaultOperator="AND"/> Consider the query: term1 term2 OR term1 term2 OR term1 term3 I think what's happening is that your query gets rewritten into something like: +term1 + (term2? term1 term2? term3?) where in my notation term? means term is optional, and + means required. So any document would match the second clause -Mike
Re: assit with the Clustering component in Solr/Lucene
Thanks much Stan, Ramdev On May 16, 2011, at 11:38 AM, Stanislaw Osinski wrote: Both of the clustering algorithms that ship with Solr (Lingo and STC) are designed to allow one document to appear in more than one cluster, which actually does make sense in many scenarios. There's no easy way to force them to produce hard clusterings because this would require a complete change in the way the algorithms work. If you need each document to belong to exactly one cluster, you'd have to post-process the clusters to remove the redundant document assignments. On second thought, I have a simple implementation of k-means clustering that could do hard clustering for you. It's not available yet; it will most probably be part of the next major release of Carrot2 (the package that does the clustering). Please watch this issue http://issues.carrot2.org/browse/CARROT-791 to get updates on this. Just to let you know: Carrot2 3.5.0 has landed in Solr trunk and branch_3x, so you can use the bisecting k-means clustering algorithm (org.carrot2.clustering.kmeans.BisectingKMeansClusteringAlgorithm), which will produce non-overlapping clusters for you. The downside of this simple implementation of k-means is that, for the time being, it produces one-word cluster labels rather than phrases as Lingo and STC do. Cheers, S.
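For reference, switching the clustering engine to the bisecting k-means algorithm is a solrconfig.xml change along these lines. This is a minimal sketch assuming the stock clustering component from the Solr example; the handler name and the title/snippet field names are illustrative:

  <searchComponent name="clustering" class="solr.clustering.ClusteringComponent">
    <lst name="engine">
      <str name="name">kmeans</str>
      <str name="carrot.algorithm">org.carrot2.clustering.kmeans.BisectingKMeansClusteringAlgorithm</str>
    </lst>
  </searchComponent>

  <requestHandler name="/clustering" class="solr.SearchHandler">
    <lst name="defaults">
      <bool name="clustering">true</bool>
      <str name="clustering.engine">kmeans</str>
      <str name="carrot.title">title</str>
      <str name="carrot.snippet">body</str>
    </lst>
    <arr name="last-components">
      <str>clustering</str>
    </arr>
  </requestHandler>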
Re: [POLL] How do you (like to) do logging with Solr
We use log4j explicitly and find it irritating to deal with the built-in JDK logging default. We also have conflicts with other packages that have their own ideas about how to bind slf4j, so the less of this the better, IMO. The 1.6.1 no-op default behavior seems a bit unfortunate as out-of-the-box behavior to me though. Not sure if there's anything to be done about that. Can you log to stderr when there's no logger available? -Mike On 05/16/2011 04:43 AM, Jan Høydahl wrote: Hi, This poll is to investigate how you currently do or would like to do logging with Solr when deploying solr.war to a SEPARATE java application server (such as Tomcat, Resin etc) outside of the bundled solr/example. For background on how things work in Solr now, see http://wiki.apache.org/solr/SolrLogging and for more info on the SLF4J framework, see http://www.slf4j.org/manual.html Please tick one of the options below with an [X]: [ ] I always use the JDK logging as bundled in solr.war, that's perfect [ ] I sometimes use log4j or another framework and am happy with re-packaging solr.war [X] Give me solr.war WITHOUT an slf4j logger binding, so I can choose at deploy time [ ] Let me choose whether to bundle a binding or not at build time, using an ANT option [ ] What's wrong with the solr/example Jetty? I never run Solr elsewhere! [ ] What? Solr can do logging? How cool! Note that NOT bundling a logger binding with solr.war means defaulting to the NOP logger after outputting these lines to stderr: SLF4J: Failed to load class org.slf4j.impl.StaticLoggerBinder. SLF4J: Defaulting to no-operation (NOP) logger implementation SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details. -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com
Re: UIMA analysisEngine path
Hi Tommaso, Thanks for the quick reply. I had copied the lib files and followed instructions on http://wiki.apache.org/solr/SolrUIMA#Installation. However i get this error. The AnalysisEngine has the default class path which is /org/apache/uima/desc/. SEVERE: org.apache.solr.common.SolrException: Error Instantiating UpdateRequestP rocessorFactory, org.apache.solr.uima.processor.UIMAUpdateRequestProcessorFactor y is not a org.apache.solr.update.processor.UpdateRequestProcessorFactory Regards, Chamara On Mon, May 16, 2011 at 9:17 AM, Tommaso Teofili [via Lucene] ml-node+2946920-843126873-399...@n3.nabble.com wrote: Hello, if you want to take the descriptor from a jar, provided that you configured the jar inside a lib element in solrconfig, then you just need to write the correct classpath in the analysisEngine element. For example if your descriptor resides in com/something/desc/ path inside the jar then you should set the analysisEngine element as /com/something/desc/descriptorname.xml If you instead need to get the descriptor from filesystem try the patch in SOLR-2501 [1]. Hope this helps, Tommaso [1] : https://issues.apache.org/jira/browse/SOLR-2501 2011/5/13 chamara [hidden email]http://user/SendEmail.jtp?type=nodenode=2946920i=0 Hi, Is this code line 57 needs to be changed to the location where the jar files(library files) resides? URL url = this.getClass().getResource(location of the jar files); I did change it but no luck so far. Let me know what i am doing wrong? -- View this message in context: http://lucene.472066.n3.nabble.com/UIMA-analysisEngine-path-tp2895284p2935541.html Sent from the Solr - User mailing list archive at Nabble.com. -- If you reply to this email, your message will be added to the discussion below: http://lucene.472066.n3.nabble.com/UIMA-analysisEngine-path-tp2895284p2946920.html To unsubscribe from UIMA analysisEngine path, click herehttp://lucene.472066.n3.nabble.com/template/NamlServlet.jtp?macro=unsubscribe_by_codenode=2895284code=Y2hhbWFyYXdAZ21haWwuY29tfDI4OTUyODR8MjY5ODM2NTMx. -- --- Chamara -- View this message in context: http://lucene.472066.n3.nabble.com/UIMA-analysisEngine-path-tp2895284p2948760.html Sent from the Solr - User mailing list archive at Nabble.com.
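A rough sketch of the wiring Tommaso describes, for readers following along. Exact element names differ between the 3.1 and trunk versions of the Solr-UIMA module, and the jar directory, descriptor path, and chain name below are only placeholders; the real configuration also needs the module's runtime parameters and field mappings:

  <!-- solrconfig.xml -->
  <lib dir="/path/to/uima/jars" regex=".*\.jar"/>

  <updateRequestProcessorChain name="uima">
    <processor class="org.apache.solr.uima.processor.UIMAUpdateRequestProcessorFactory">
      <!-- classpath-style path to the descriptor inside one of the jars above -->
      <!-- e.g. /com/something/desc/descriptorname.xml -->
    </processor>
    <processor class="solr.RunUpdateProcessorFactory"/>
  </updateRequestProcessorChain>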
Re: Problem with custom Similarity class
On Mon, May 16, 2011 at 10:04 PM, Alex Grilo a...@umamao.com wrote: Hi, I'm new to Solr and I'm trying to use my custom Similarity class but I've not succeeded at that. I added some debug information and my class is loaded, but it is not used when queries are made. Could someone help me? If any further information is relevant, I can provide it. [...] Have you overridden the default similarity class in schema.xml? Though, if your class is getting loaded, that should be the case. The code for the class should be pretty small, right? Please post it here, or better yet at pastebin.com, and send a link to this list. Regards, Gora
Re: solr velocity.log setting
I solved the problem of velocity.log by following this tutorial: http://kris-itproblems.blogspot.com/2010/11/velocitylog-permission-denied.html On Thu, May 12, 2011 at 6:36 PM, Yuhan Zhang yzh...@onescreen.com wrote: hi all, I'm new to solr, and trying to install it on tomcat. however, an exception was thrown when the page http://localhost/solr/browse was visited: *FileNotFoundException: velocity.log (Permission denied)* looks like solr is trying to create a velocity.log file in the tomcat root. so, how should I set the configuration file on solr to change the location that velocity.log is logging to? Thank you. Y
Re: UIMA analysisEngine path
The error you pasted doesn't seem to be related to a (class)path issue, but more likely to a mismatch between a Solr instance at 1.4.1/3.1.0 and a Solr-UIMA module at 3.1.0/4.0-SNAPSHOT (trunk); it seems that the error arises from the UpdateRequestProcessorFactory API having changed. Hope this helps, Tommaso On 16 May 2011, at 18:54, chamara wrote: Hi Tommaso, Thanks for the quick reply. I had copied the lib files and followed instructions on http://wiki.apache.org/solr/SolrUIMA#Installation. However i get this error. The AnalysisEngine has the default class path which is /org/apache/uima/desc/. SEVERE: org.apache.solr.common.SolrException: Error Instantiating UpdateRequestProcessorFactory, org.apache.solr.uima.processor.UIMAUpdateRequestProcessorFactory is not a org.apache.solr.update.processor.UpdateRequestProcessorFactory Regards, Chamara On Mon, May 16, 2011 at 9:17 AM, Tommaso Teofili [via Lucene] ml-node+2946920-843126873-399...@n3.nabble.com wrote: Hello, if you want to take the descriptor from a jar, provided that you configured the jar inside a lib element in solrconfig, then you just need to write the correct classpath in the analysisEngine element. For example if your descriptor resides in the com/something/desc/ path inside the jar then you should set the analysisEngine element as /com/something/desc/descriptorname.xml If you instead need to get the descriptor from the filesystem try the patch in SOLR-2501 [1]. Hope this helps, Tommaso [1] : https://issues.apache.org/jira/browse/SOLR-2501 2011/5/13 chamara [hidden email] Hi, Does this code at line 57 need to be changed to the location where the jar files (library files) reside? URL url = this.getClass().getResource("location of the jar files"); I did change it but no luck so far. Let me know what i am doing wrong? -- View this message in context: http://lucene.472066.n3.nabble.com/UIMA-analysisEngine-path-tp2895284p2935541.html Sent from the Solr - User mailing list archive at Nabble.com. -- If you reply to this email, your message will be added to the discussion below: http://lucene.472066.n3.nabble.com/UIMA-analysisEngine-path-tp2895284p2946920.html -- --- Chamara -- View this message in context: http://lucene.472066.n3.nabble.com/UIMA-analysisEngine-path-tp2895284p2948760.html Sent from the Solr - User mailing list archive at Nabble.com.
Problem indexing CSV
I’m pretty new to Solr and I have a question about indexing data using CSV. I have a Blacklight-application running on my Mac 10.6.7 and I configured the schema.xml and solrconfig.xml in the separate Apache-Solr-directory according to the guidelines on the Blacklight-website. I have added the RequestHandler to solrconfig.xml as well. But when I try to index the exemplary document books.csv (with Solr and the Blacklight script running in the background), I get an error saying that it came across an undefined field, cat. I assume it’s not just cat that isn’t recognised as a field. What should I do to make the indexing via CSV possible, both for the exemplary document as for further documents to follow? Kind regards
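For the record, the "undefined field cat" error means the CSV header names a column that the schema does not declare. A minimal sketch of both halves, assuming the stock example layout; the curl URL, file path, and the field definition shown are illustrative and should be adapted to the Blacklight schema actually in use (if you simply want to ignore a column instead of indexing it, the CSV handler's skip parameter, e.g. &skip=cat, does that too):

  # post the example file to the CSV update handler
  curl 'http://localhost:8983/solr/update/csv?commit=true' \
       --data-binary @example/exampledocs/books.csv \
       -H 'Content-type: text/plain; charset=utf-8'

  <!-- schema.xml: one way to satisfy the unknown "cat" column -->
  <field name="cat" type="text" indexed="true" stored="true" multiValued="true"/>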
Re: Problem with custom Similarity class
The code is here: http://pastebin.com/50ugqRfA and my schema.xml configuration entry for similarity is: <similarity class="com.umamao.solr.ShortFieldNormSimilarity"/> Thanks Alex On Mon, May 16, 2011 at 2:01 PM, Gora Mohanty g...@mimirtech.com wrote: On Mon, May 16, 2011 at 10:04 PM, Alex Grilo a...@umamao.com wrote: Hi, I'm new to Solr and I'm trying to use my custom Similarity class but I've not succeeded at that. I added some debug information and my class is loaded, but it is not used when queries are made. Could someone help me? If any further information is relevant, I can provide it. [...] Have you overridden the default similarity class in schema.xml? Though, if your class is getting loaded, that should be the case. The code for the class should be pretty small, right? Please post it here, or better yet at pastebin.com, and send a link to this list. Regards, Gora
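Since the pastebin link may not survive, here is what such a class typically looks like on Lucene/Solr 3.x. This is a guess at the intent based on the class name, not Alex's actual code; note also that length norms are computed at index time, so a Similarity change like this only becomes visible after re-indexing, which is a common reason a custom Similarity appears "unused" at query time.

  package com.umamao.solr;

  import org.apache.lucene.search.DefaultSimilarity;

  // Flattens length normalization so short fields are not over-rewarded
  // relative to longer ones.
  public class ShortFieldNormSimilarity extends DefaultSimilarity {
      @Override
      public float lengthNorm(String fieldName, int numTerms) {
          // Default is 1/sqrt(numTerms); a constant disables the
          // length-based component of the field norm.
          return 1.0f;
      }
  }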
Re: why query chinese character with bracket become phrase query by default?
: Does anyone disagree that Yonik's commit was inappropriate? This is : not how we work at Apache. FWIW: I don't see how Yonik's commit was inappropriate at all. He added some new example configuration to trunk that was unused, and in no way un-did or blocked any other attempts at improving the configs. It had no impact on any existing usage, and only served as an example (which could be iterated forward). I seriously don't see the problem here. -Hoss
RE: How to index and query C# as whole term?
Sorry I am also using a synonyms.txt for this in the analysis stack. I was not clear, sorry for any confusion. I am not doing it outside of Solr but on the way into the index it is converted... :) -Original Message- From: Markus Jelsma [mailto:markus.jel...@openindex.io] Sent: Monday, May 16, 2011 8:51 AM To: solr-user@lucene.apache.org Subject: Re: How to index and query C# as whole term? Before indexing so outside Solr? Using the SynonymFilter would be easier i guess. On Monday 16 May 2011 17:44:24 Robert Petersen wrote: I have always just converted terms like 'C#' or 'C++' into 'csharp' and 'cplusplus' before indexing them and similarly converted those terms if someone searched on them. That always has worked just fine for me... :) -Original Message- From: Jonathan Rochkind [mailto:rochk...@jhu.edu] Sent: Monday, May 16, 2011 8:28 AM To: solr-user@lucene.apache.org Subject: Re: How to index and query C# as whole term? I don't think you'd want to use the string type here. String type is almost never appropriate for a field you want to actually search on (it is appropriate for fields to facet on). But you may want to use Text type with different analyzers selected. You probably want Text type so the value is still split into different tokens on word boundaries; you just don't want an analyzer set that removes punctuation. On 5/16/2011 10:46 AM, Gora Mohanty wrote: On Mon, May 16, 2011 at 7:05 PM, Gnanakumargna...@zoniac.com wrote: Hi, I'm using Apache Solr v3.1. How do I configure/allow Solr to both index and query the term c# as a whole word/term? From Analysis page, I could see that the term c# is being reduced/converted into just c by solr.WordDelimiterFilterFactory. [...] Yes, as you have discovered the analyzers for the field type in question will affect the values indexed. To index c# exactly as is, you can use the string type, instead of the text type. However, what you probably want some filters to be applied, e.g., LowerCaseFilterFactory. Take a look at the definition of the fieldType text in schema.xml, define a new field type that has only the tokenizers and analyzers that you need, and use that type for your field. This Wiki page should be helpful: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters Regards, Gora -- Markus Jelsma - CTO - Openindex http://www.linkedin.com/in/markus17 050-8536620 / 06-50258350
Highlighting issue with Solr 3.1
All, I have just installed Solr 3.1 running on Tomcat 7. I am noticing a possible issue with Highlighting. I have a field in my index called story. The solr document that I am testing with has data in the story field that starts with the following snippet (remaining data in the field is not shown to keep things simple): <p><a idref="0" /></p><p>EN AMÉRICA LATINA, When I search for america with highlighting enabled on the 'story' field, here is what I get in the highlighting section of the response. I am using the ASCIIFoldingFilterFactory to make my searches accent insensitive. <lst name="highlighting"><lst name="2011_May_13_ _1c77033a"><arr name="story"><str>&lt;p&gt;&lt;a idref=&quot;0&quot; /&gt;&lt;/p&gt;&lt;p&gt;EN <em>AM&#201;RICA</em> LATINA, SE HAN PRODUCIDO AVANCES, CON RESPECTO A LA PROTECCI&#211;N</str></arr></lst>. The problem is the encoded HTML tags before the <em> showing up as raw HTML tags (because of the encoding) on my search results page. Just to make sure, I do want the HTML to be interpreted as HTML, not as text. In this particular situation I am not worried about the dangers of allowing such behavior. The same test performed on the same data running on a 1.4.1 index does not exhibit this behavior. Any help is appreciated. Please let me know if I need to post my field type definitions (index and query) from the SolrConfig.xml for the story field. Thanks in advance Raj
indexing directed graph
Hello, is it possible to index graph - named vertices and named edges? My target is, with text search to find whether particular node is connected(direct or indirect) with another. Thank you. -- View this message in context: http://lucene.472066.n3.nabble.com/indexing-directed-graph-tp2949556p2949556.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: indexing directed graph
Dani, I'm actually playing with Neo4j ... and they have Lucene indexing and plan to have Solr integration (no idea what the current state is). http://lists.neo4j.org/pipermail/user/2010-January/002372.html Regards Stefan On 16.05.2011 21:50, dani.b.angelov wrote: Hello, is it possible to index graph - named vertices and named edges? My target is, with text search to find whether particular node is connected(direct or indirect) with another. Thank you. -- View this message in context: http://lucene.472066.n3.nabble.com/indexing-directed-graph-tp2949556p2949556.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: [POLL] How do you (like to) do logging with Solr
[ ] I always use the JDK logging as bundled in solr.war, that's perfect [ ] I sometimes use log4j or another framework and am happy with re-packaging solr.war [X] Give me solr.war WITHOUT an slf4j logger binding, so I can choose at deploy time [ ] Let me choose whether to bundle a binding or not at build time, using an ANT option [ ] What's wrong with the solr/example Jetty? I never run Solr elsewhere! [ ] What? Solr can do logging? How cool! Actually, more specifically, the build distribution could build a war done either way, but I'd most like to see the war file WITHOUT a binding be deployed to Maven. As it stands, I've done both 1) deploy solr without logging to Maven and use it, and 2) deploy solr with jdk logging to Maven, then have a Maven build repackage to remove jdk and use my preferred implementation (logback). I've only done 2) at the preference of others who don't want me to deploy a modified war to our Maven repo. Stephen Duncan Jr www.stephenduncanjr.com
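For anyone repackaging the same way, the Maven side of option 2) usually looks something like the sketch below: a war overlay that drops the bundled JDK-logging binding and a dependency on the binding you prefer. The artifact coordinates, jar name, and version are illustrative and have to match the Solr and logback versions you actually pull in.

  <build>
    <plugins>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-war-plugin</artifactId>
        <configuration>
          <overlays>
            <overlay>
              <groupId>org.apache.solr</groupId>
              <artifactId>solr</artifactId>
              <!-- drop the bundled slf4j-to-JDK-logging binding -->
              <excludes>
                <exclude>WEB-INF/lib/slf4j-jdk14-*.jar</exclude>
              </excludes>
            </overlay>
          </overlays>
        </configuration>
      </plugin>
    </plugins>
  </build>

  <!-- then add the binding you prefer, e.g. logback -->
  <dependency>
    <groupId>ch.qos.logback</groupId>
    <artifactId>logback-classic</artifactId>
    <version>0.9.28</version>
  </dependency>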
Re: indexing directed graph
Thank you Gora, 1. Could you confirm that the context of IMHO is 'In My Humble Opinion'? 2. Could you point me to an example of a graph database? -- View this message in context: http://lucene.472066.n3.nabble.com/indexing-directed-graph-tp2949556p2949734.html Sent from the Solr - User mailing list archive at Nabble.com.
indexing directed graph
Hello, is it possible to index graph - named vertices and named edges? My target is, with text search to find whether particular node is connected(direct or indirect) with another. Thank you. -- View this message in context: http://lucene.472066.n3.nabble.com/indexing-directed-graph-tp2949553p2949553.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: How to index and query C# as whole term?
The other advantage to the synonyms approach is it will be much less of a headache down the road. For instance, imagine you've defined whitespacetokenizer and lowercasefilter. That'll fix your example just fine. It'll also cause all punctuation to be included in the tokens, so if you indexed try to find me. (note the period) and searched for me (without the period) you'd not get a hit. Then, let's say you get clever and do a regex manipulation via PatternReplaceCharFilterFactory to leave in '#' but remove other punctuation. Then any miscellaneous stream that contains a # will give surprising results. Consider 15# (for 15 pounds). Won't match 15 in a search now. So whatever solution you choose, think about it pretty carefully before you jump G.. Best Erick On Mon, May 16, 2011 at 2:10 PM, Robert Petersen rober...@buy.com wrote: Sorry I am also using a synonyms.txt for this in the analysis stack. I was not clear, sorry for any confusion. I am not doing it outside of Solr but on the way into the index it is converted... :) -Original Message- From: Markus Jelsma [mailto:markus.jel...@openindex.io] Sent: Monday, May 16, 2011 8:51 AM To: solr-user@lucene.apache.org Subject: Re: How to index and query C# as whole term? Before indexing so outside Solr? Using the SynonymFilter would be easier i guess. On Monday 16 May 2011 17:44:24 Robert Petersen wrote: I have always just converted terms like 'C#' or 'C++' into 'csharp' and 'cplusplus' before indexing them and similarly converted those terms if someone searched on them. That always has worked just fine for me... :) -Original Message- From: Jonathan Rochkind [mailto:rochk...@jhu.edu] Sent: Monday, May 16, 2011 8:28 AM To: solr-user@lucene.apache.org Subject: Re: How to index and query C# as whole term? I don't think you'd want to use the string type here. String type is almost never appropriate for a field you want to actually search on (it is appropriate for fields to facet on). But you may want to use Text type with different analyzers selected. You probably want Text type so the value is still split into different tokens on word boundaries; you just don't want an analyzer set that removes punctuation. On 5/16/2011 10:46 AM, Gora Mohanty wrote: On Mon, May 16, 2011 at 7:05 PM, Gnanakumargna...@zoniac.com wrote: Hi, I'm using Apache Solr v3.1. How do I configure/allow Solr to both index and query the term c# as a whole word/term? From Analysis page, I could see that the term c# is being reduced/converted into just c by solr.WordDelimiterFilterFactory. [...] Yes, as you have discovered the analyzers for the field type in question will affect the values indexed. To index c# exactly as is, you can use the string type, instead of the text type. However, what you probably want some filters to be applied, e.g., LowerCaseFilterFactory. Take a look at the definition of the fieldType text in schema.xml, define a new field type that has only the tokenizers and analyzers that you need, and use that type for your field. This Wiki page should be helpful: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters Regards, Gora -- Markus Jelsma - CTO - Openindex http://www.linkedin.com/in/markus17 050-8536620 / 06-50258350
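To make the synonym approach concrete, a minimal sketch of what Robert describes is shown below. The entries and field names are illustrative, and per Erick's warning the tokenizer choice decides which punctuation survives long enough for the synonym filter to see it; the same chain has to run at query time too so that a query for "C#" is rewritten the same way.

  # synonyms.txt
  c# => csharp
  c++ => cplusplus

  <!-- schema.xml -->
  <fieldType name="text_prog" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
              ignoreCase="true" expand="false"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
              catenateWords="1" splitOnCaseChange="1"/>
    </analyzer>
  </fieldType>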
Re: indexing directed graph
I am wondering whether the following idea is worthwhile. We can describe the graph with a series of triples. So can we create some bean with fields, for example: ... @Field String[] subjects; @Field String[] predicates; @Field String[] objects; @Field int[] level; ... or some other combination of metadata? We can index/search this bean. Based on the content of the found bean, we can infer interconnections between graph participants. What do you think? -- View this message in context: http://lucene.472066.n3.nabble.com/indexing-directed-graph-tp2949556p2949845.html Sent from the Solr - User mailing list archive at Nabble.com.
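A slightly fuller version of that bean, for the SolrJ annotation approach, is sketched below. It is illustrative only: the field names need matching declarations in schema.xml, and, as the replies below point out, transitive connectivity still has to be computed outside Solr, e.g. by repeatedly querying subject:<node> and following the objects.

  import org.apache.solr.client.solrj.SolrServer;
  import org.apache.solr.client.solrj.beans.Field;
  import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

  public class Triple {
      @Field String id;         // e.g. "a->likes->b"
      @Field String subject;    // source vertex name
      @Field String predicate;  // edge name
      @Field String object;     // target vertex name
      @Field int level;         // optional: precomputed depth, if paths are expanded at index time

      public static void main(String[] args) throws Exception {
          SolrServer solr = new CommonsHttpSolrServer("http://localhost:8983/solr");
          Triple t = new Triple();
          t.id = "a->likes->b";
          t.subject = "a";
          t.predicate = "likes";
          t.object = "b";
          t.level = 1;
          solr.addBean(t);   // one Solr document per edge/triple
          solr.commit();
      }
  }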
Re: indexing directed graph
You can certainly index it, the problem will be being able to make the kinds of queries you want to make on it once indexed. Indexing it in a way that will let you do such queries. The kind of typical queries I'd imagine you wanting to run on such a graph -- I can't think of any way to index in Solr to support. But if you give examples of the sorts of queries you want to run, maybe someone else has an idea, or can give a definitive 'no'. On 5/16/2011 3:49 PM, dani.b.angelov wrote: Hello, is it possible to index graph - named vertices and named edges? My target is, with text search to find whether particular node is connected(direct or indirect) with another. Thank you. -- View this message in context: http://lucene.472066.n3.nabble.com/indexing-directed-graph-tp2949553p2949553.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: indexing directed graph
Absolutely you can index each point or line of the graph with it's own document in Solr, perhaps as a triple. (Sounds like you are specifically talking about RDF-type data, huh? Asking about that specifically might get you more useful ideas than asking graphs in general). But if you want to then figure out if two points are connected, or get the list of all points within X distance from a known point, or do other things you are likely to want to do it with it... Solr's not going to give you the tools to do that, indexed like that. On 5/16/2011 4:52 PM, dani.b.angelov wrote: I am wandering, whether the following idea is worth. We can describe the graph with series of triples. So can we create some bean with fields, for example: ... @Field String[] sybjects; @Field String[] predicates; @Field String[] objects; @Field int[] level; ... or other combination of metadata. We can index/search this bean. Based on the content of the found bean, we can conclude for interconnections between graph participants. What do you thing? -- View this message in context: http://lucene.472066.n3.nabble.com/indexing-directed-graph-tp2949556p2949845.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: How many UpdateHandlers can a Solr config have?
: just a very basic question, but I haven't been able to find the answer in : the Solr wiki: how many updateHandlers can one Solr config have? Just one? : Or many? There can only be one <updateHandler /> declaration in solrconfig.xml; it's responsible for owning updates to the index. But there can be any number of <requestHandler /> declarations to configure request handlers that do updates, as well as any number of <updateRequestProcessorChain /> declarations that identify the processors used for dealing with updates (which can be referred to by name from the request handlers). -Hoss
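In configuration terms the arrangement looks roughly like this. The handler and chain names are made up for illustration, and the parameter that selects a chain has been renamed across releases (update.processor in older ones, update.chain in newer ones):

  <!-- exactly one of these -->
  <updateHandler class="solr.DirectUpdateHandler2"/>

  <!-- any number of these -->
  <updateRequestProcessorChain name="dedupe">
    <processor class="solr.processor.SignatureUpdateProcessorFactory"/>
    <processor class="solr.LogUpdateProcessorFactory"/>
    <processor class="solr.RunUpdateProcessorFactory"/>
  </updateRequestProcessorChain>

  <!-- any number of these, each able to point at a chain by name -->
  <requestHandler name="/update" class="solr.XmlUpdateRequestHandler"/>
  <requestHandler name="/update/dedupe" class="solr.XmlUpdateRequestHandler">
    <lst name="defaults">
      <str name="update.chain">dedupe</str>
    </lst>
  </requestHandler>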
Re: K-Stemmer for Solr 3.1
Lucid's KStemmer is LGPL and the Solr committers have shown that they don't want LGPL libraries shipping with Solr. If you are intent on releasing your changes, I suggest attaching both the modified source and the compiled jar onto Solr's k-stemmer wiki page; and of course say that it's LGPL licensed. ~ David Smiley On May 16, 2011, at 2:24 AM, Bernd Fehling wrote: I don't know if it is allowed to modify Lucid code and add it to jira. If someone from Lucid would give me the permission and the Solr developers have nothing against it I won't mind adding the Lucid KStemmer to jira for Solr 3.x and 4.x. There are several Lucid KStemmer users which I can see from the many requests which I got. Also the Lucid KStemmer is faster than the standard KStemmer. Bernd Am 16.05.2011 06:33, schrieb Bill Bell: Did you upload the code to Jira? On 5/13/11 12:28 AM, Bernd Fehlingbernd.fehl...@uni-bielefeld.de wrote: I backported a Lucid KStemmer version from solr 4.0 which I found somewhere. Just changed from import org.apache.lucene.analysis.util.CharArraySet; // solr4.0 to import org.apache.lucene.analysis.CharArraySet; // solr3.1 Bernd Am 12.05.2011 16:32, schrieb Mark: java.lang.AbstractMethodError: org.apache.lucene.analysis.TokenStream.incrementToken()Z Would you mind explaining your modifications? Thanks On 5/11/11 11:14 PM, Bernd Fehling wrote: Am 12.05.2011 02:05, schrieb Mark: It appears that the older version of the Lucid Works KStemmer is incompatible with Solr 3.1. Has anyone been able to get this to work? If not, what are you using as an alternative? Thanks Lucid KStemmer works nice with Solr3.1 after some minor mods to KStemFilter.java and KStemFilterFactory.java. What problems do you have? Bernd -- * Bernd FehlingUniversitätsbibliothek Bielefeld Dipl.-Inform. (FH)Universitätsstr. 25 Tel. +49 521 106-4060 Fax. +49 521 106-4052 bernd.fehl...@uni-bielefeld.de33615 Bielefeld BASE - Bielefeld Academic Search Engine - www.base-search.net *
Re: Problem with custom Similarity class
: The code is here: http://pastebin.com/50ugqRfA : : and my schema.xml configuration entry for : similarity is: : <similarity class="com.umamao.solr.ShortFieldNormSimilarity"/> exactly what version of Solr are you using? what does the full field/fieldType declaration look like in your schema.xml for the field you are testing with? what does your exact query request look like? The trunk branch of lucene/solr has made some changes to how similarity works (it's now very much per field) and how you declare your similarity in schema.xml ... if I remember correctly, the syntax from 3.1 to declare a global similarity *should* still work in 4.x as a way to declare the default used by fields that don't define a similarity, but there may be a bug (or I may be remembering incorrectly ... if the syntax really is no longer used at all then we should make sure it logs a nice fat error on startup) : I added some debug information and my class is loaded, but it is not used : when queries are made. Please clarify exactly how you are testing this and what you mean by is not used when queries are made ... it's important to rule out the possibility that you are just misunderstanding how the similarity methods are used. -Hoss
RE: K-Stemmer for Solr 3.1
On 5/16/2011 at 5:33 PM, David W. Smiley wrote: Lucid's KStemmer is LGPL and the Solr committers have shown that they don't want LGPL libraries shipping with Solr. If you are intent on releasing your changes, I suggest attaching both the modified source and the compiled jar onto Solr's k-stemmer wiki page; and of course say that it's LGPL licensed. AFAICT, all Apache MoinMoin wikis (at least Lucene's and Solr's) have disabled attachments - you can't retrieve existing attachments, and you can't create new ones. (Spam, apparently, was the impetus for this change.) Steve
Re: K-Stemmer for Solr 3.1
On Mon, May 16, 2011 at 5:33 PM, Smiley, David W. dsmi...@mitre.org wrote: Lucid's KStemmer is LGPL and the Solr committers have shown that they don't want LGPL libraries shipping with Solr. If you are intent on releasing your changes, I suggest attaching both the modified source and the compiled jar onto Solr's k-stemmer wiki page; and of course say that it's LGPL licensed. ~ David Smiley Hi David, I don't know much about this stemmer but the original implementation is BSD-licensed (http://ciir.cs.umass.edu/cgi-bin/downloads/downloads.cgi)
Re: [POLL] How do you (like to) do logging with Solr
: This poll is to investigate how you currently do or would like to do : logging with Solr when deploying solr.war to a SEPARATE java application : server (such as Tomcat, Resin etc) outside of the bundled FWIW... a) the context of this poll is SOLR-2487 b) this poll seems flawed to me, as it completely sidesteps what i consider the major crux of the issue: If: You are someone who does not like (or has conflicts with) the JDK logging binding currently included in the solr.war that is built by default and included in the binary releases; Then: Do you consider building solr.war from source difficult? -Hoss
Re: [POLL] How do you (like to) do logging with Solr
My answers... : [X] I always use the JDK logging as bundled in solr.war, that's perfect : [X] I sometimes use log4j or another framework and am happy with re-packaging solr.war : [ ] Give me solr.war WITHOUT an slf4j logger binding, so I can choose at deploy time : [X] Let me choose whether to bundle a binding or not at build time, using an ANT option : [ ] What's wrong with the solr/example Jetty? I never run Solr elsewhere! : [ ] What? Solr can do logging? How cool! -Hoss
Re: Boost newer documents only if date is different from timestamp
The map function lets you replace an arbitrary range of values with a new value, so you could map any value greater than the ms that today started on to any other point in history... http://wiki.apache.org/solr/FunctionQuery#map An easier approach would probably be to apply some logic at index time: you can still index the Last-Modified date you are getting, but if you believe that date is artificial, you can index an alternate date (possibly based on some rules you know about the site, or reuse the first last-modified date you ever got for that URL, etc...) in a distinct field and use that value for date boosting. : I am trying to boost newer documents in Solr queries. The ms function : http://wiki.apache.org/solr/SolrRelevancyFAQ#How_can_I_boost_the_score_of_newer_documents : seems to be the right way to go, but I need to add an additional : condition: : I am using the last-Modified-Date from crawled web pages as the date : to consider, and that does not always provide a meaningful date. : Therefore I would like the function to only boost documents where the : date (not time) found in the last-Modified-Date is different from the : timestamp, eliminating results that just return the current date as : the last-Modified-Date. Suggestions are appreciated! : -Hoss
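A rough sketch of the flavor of both options, written as dismax bf parameters. The field names and millisecond constants are illustrative, and the exact cutoff depends on what counts as an "artificial" date in your crawl:

  # Query-time: treat anything whose age is under one day (86400000 ms) as if it
  # were thirty days old (2592000000 ms) before applying the usual recency boost.
  # map(x, min, max, target) replaces values of x falling in [min, max] with target.
  bf=recip(map(ms(NOW,last_modified), 0, 86400000, 2592000000), 3.16e-11, 1, 1)

  # Index-time alternative: write a trusted date into a separate field while
  # indexing and boost on that instead.
  bf=recip(ms(NOW,effective_date), 3.16e-11, 1, 1)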
Re: Embedded Solr Optimize under Windows
: http://code.google.com/p/solr-geonames/wiki/DeveloperInstall : It's worth noting that the build has also been run on Mac and Solaris now, : and the Solr index is about half the size. We suspect the optimize() call in : Embedded Solr is not working correctly under Windows. : : We've observed that Windows leaves lots of segments on disk and takes up : twice the volume as the other OSs. Perhaps file locking or something The problem isn't that optimize doesn't work on windows, the problem is that windows file semantics won't let files be deleted while there are open file handles -- so Lucene's Directory behavior is to leave the files on disk, and try to clean them up later. (on the next write, or next optimize call) -Hoss
Re: Embedded Solr Optimize under Windows
Thanks for the reply. I'm at home right now, or I'd try this myself, but is the suggestion that two optimize() calls in a row would resolve the issue? The process in question is a JVM devoted entirely to harvesting, calls optimize() then shuts down. The least processor intensive way of triggering this behaviour is desirable... perhaps a commit()? But I wouldn't have expected that to trigger a write. On 17 May 2011 10:20, Chris Hostetter hossman_luc...@fucit.org wrote: : http://code.google.com/p/solr-geonames/wiki/DeveloperInstall : It's worth noting that the build has also been run on Mac and Solaris now, : and the Solr index is about half the size. We suspect the optimize() call in : Embedded Solr is not working correctly under Windows. : : We've observed that Windows leaves lots of segments on disk and takes up : twice the volume as the other OSs. Perhaps file locking or something The problem isn't that optimize doesn't work on windows, the problem is that windows file semantics won't let files be deleted while there are open file handles -- so Lucene's Directory behavior is to leave the files on disk, and try to clean them up later. (on the next write, or next optimize call) -Hoss
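Untested, but the shape of what is being proposed looks like the sketch below: an extra optimize before shutdown so Lucene gets another chance to delete the segment files Windows would not release earlier. Whether the second call actually helps depends on the open file handles at that moment; the core name and setup are illustrative.

  import org.apache.solr.client.solrj.embedded.EmbeddedSolrServer;
  import org.apache.solr.core.CoreContainer;

  public class HarvestAndOptimize {
      public static void main(String[] args) throws Exception {
          CoreContainer container = new CoreContainer.Initializer().initialize();
          EmbeddedSolrServer solr = new EmbeddedSolrServer(container, "geonames");
          try {
              // ... add documents here ...
              solr.commit();
              solr.optimize();
              // On Windows the first optimize can leave old segment files behind
              // because open handles block deletion; a second call (or the next
              // commit) gives Lucene another chance to clean them up.
              solr.optimize();
          } finally {
              container.shutdown();
          }
      }
  }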
Re: Highlighting issue with Solr 3.1
(11/05/17 3:27), Nemani, Raj wrote: All, I have just installed Solr 3.1 running on Tomcat 7. I am noticing a possible issue with Highlighting. I have a field in my index called story. The solr document that I am testing with has data in the story field that starts with the following snippet (remaining data in the field is not shown to keep things simple): <p><a idref="0" /></p><p>EN AMÉRICA LATINA, When I search for america with highlighting enabled on the 'story' field, here is what I get in the highlighting section of the response. I am using the ASCIIFoldingFilterFactory to make my searches accent insensitive. <lst name="highlighting"><lst name="2011_May_13_ _1c77033a"><arr name="story"><str>&lt;p&gt;&lt;a idref=&quot;0&quot; /&gt;&lt;/p&gt;&lt;p&gt;EN <em>AM&#201;RICA</em> LATINA, SE HAN PRODUCIDO AVANCES, CON RESPECTO A LA PROTECCI&#211;N</str></arr></lst>. The problem is the encoded HTML tags before the <em> showing up as raw HTML tags (because of the encoding) on my search results page. Just to make sure, I do want the HTML to be interpreted as HTML, not as text. In this particular situation I am not worried about the dangers of allowing such behavior. The same test performed on the same data running on a 1.4.1 index does not exhibit this behavior. Any help is appreciated. Please let me know if I need to post my field type definitions (index and query) from the SolrConfig.xml for the story field. Thanks in advance Raj I bet you have an encoder setting in your solrconfig.xml: <encoder name="html" default="true" class="solr.highlight.HtmlEncoder"/> If so, try to comment it out. Koji -- http://www.rondhuit.com/en/
Structured fields and termVectors
How does MoreLikeThis use termVectors? My documents (full sample at the bottom) frequently include lines more or less like this M /trunk/home/.Aquamacs/Preferences.el I want to MoreLikeThis based on the full path, but not the M. But what I actually display as a search result should include M (should look pretty much like the sample, below). If I define a field to include that whole line, I can certainly search in ways that skip the M, but how do I control the termVector and MoreLikeThis? I think the answer is not to termVector the line as shown, but rather to index these lines twice, once whole (which is also copyFielded into the display text), and a second time with just the path (and termVectors=true). Which is OK, but since these lines will represent most of my data, double-indexing seems to double my storage, which is ... oh, well ... not entirely optimal. So is there some way I can index the full line, once, with M and path, and tell the termVector to include the whole path and nothing but the path? -==- Jack Repenning Technologist Codesion Business Unit CollabNet, Inc. 8000 Marina Boulevard, Suite 600 Brisbane, California 94005 office: +1 650.228.2562 twitter: http://twitter.com/jrep r3580 | jack | 2011-04-26 13:55:46 -0700 (Tue, 26 Apr 2011) | 1 line Changed paths: M /trunk/home/.Aquamacs M /trunk/home/.Aquamacs/Preferences.el M /trunk/www/wynton-start-page.html simplify the hijack of Aquamacs prefs storage
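One way to avoid storing the text twice: keep a single stored field for display, and copyField it into a second indexed-only (stored="false") field whose analyzer throws away the leading status letter and which carries the term vectors MoreLikeThis needs. Since the destination is not stored, the copyField adds postings and term vectors but not a second copy of the text. A sketch with illustrative names, assuming a 3.1-era schema (on older versions, a WhitespaceTokenizer plus the same charFilter works too, and the charFilter attribute names are worth double-checking against your release):

  <fieldType name="svn_path" class="solr.TextField">
    <analyzer>
      <!-- strip the leading "M ", "A ", "D " status column -->
      <charFilter class="solr.PatternReplaceCharFilterFactory"
                  pattern="^[A-Z]\s+" replacement=""/>
      <tokenizer class="solr.PathHierarchyTokenizerFactory" delimiter="/"/>
    </analyzer>
  </fieldType>

  <!-- stored once, for display -->
  <field name="changed_path_display" type="string" indexed="false" stored="true" multiValued="true"/>
  <!-- indexed only, with term vectors for MoreLikeThis -->
  <field name="changed_path" type="svn_path" indexed="true" stored="false"
         multiValued="true" termVectors="true"/>
  <copyField source="changed_path_display" dest="changed_path"/>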
Re: How to set a common field to several values types ?
I want to create a field by extracting a value from another field with some Java code (using regular expressions). How do I do this? -- View this message in context: http://lucene.472066.n3.nabble.com/How-to-set-a-common-field-to-several-values-types-tp2922192p2951036.html Sent from the Solr - User mailing list archive at Nabble.com.
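If it really has to be Java (rather than, say, a copyField plus a PatternReplaceFilterFactory), the usual hook is a custom UpdateRequestProcessor that runs on each document before it is indexed. A sketch only: the source field, target field, regex, and chain name are illustrative, and note that SolrQueryResponse lives in org.apache.solr.request on 1.4 but org.apache.solr.response on 3.x.

  import java.io.IOException;
  import java.util.regex.Matcher;
  import java.util.regex.Pattern;
  import org.apache.solr.common.SolrInputDocument;
  import org.apache.solr.request.SolrQueryRequest;
  import org.apache.solr.response.SolrQueryResponse;
  import org.apache.solr.update.AddUpdateCommand;
  import org.apache.solr.update.processor.UpdateRequestProcessor;
  import org.apache.solr.update.processor.UpdateRequestProcessorFactory;

  public class RegexExtractProcessorFactory extends UpdateRequestProcessorFactory {
      @Override
      public UpdateRequestProcessor getInstance(SolrQueryRequest req, SolrQueryResponse rsp,
                                                UpdateRequestProcessor next) {
          return new RegexExtractProcessor(next);
      }

      static class RegexExtractProcessor extends UpdateRequestProcessor {
          // illustrative: pull a four-digit year out of the "title" field
          private static final Pattern YEAR = Pattern.compile("\\b(19|20)\\d{2}\\b");

          RegexExtractProcessor(UpdateRequestProcessor next) {
              super(next);
          }

          @Override
          public void processAdd(AddUpdateCommand cmd) throws IOException {
              SolrInputDocument doc = cmd.getSolrInputDocument();
              Object source = doc.getFieldValue("title");
              if (source != null) {
                  Matcher m = YEAR.matcher(source.toString());
                  if (m.find()) {
                      doc.addField("year_extracted", m.group());
                  }
              }
              super.processAdd(cmd); // hand the document on to the rest of the chain
          }
      }
  }

  /* solrconfig.xml:
     <updateRequestProcessorChain name="extract">
       <processor class="com.example.RegexExtractProcessorFactory"/>
       <processor class="solr.RunUpdateProcessorFactory"/>
     </updateRequestProcessorChain>
  */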
Re: Want to Delete Existing Index create fresh index
I set the datadir in solrconfig.xml. Actually I'm using core-based structures. Is that creating any problem? On Sat, May 14, 2011 at 10:49 PM, Gabriele Kahlout gabri...@mysimpatico.com wrote: I guess you are having issues with the datadir. Did you set the datadir in solrconfig.xml? On Sat, May 14, 2011 at 4:10 PM, Pawan Darira pawan.dar...@gmail.com wrote: Hi I am using Solr 1.4. had changed schema already. When i created the index for the first time, the directory was automatically created and the index made perfectly fine. Now, i want to create the index from scratch, so I deleted the whole data/index directory and ran the script. Now it is only creating empty directories and NO index files inside that. Thanks Pawan On Sat, May 14, 2011 at 6:54 PM, Dmitry Kan dmitry@gmail.com wrote: Hi Pawan, Which SOLR version do you have installed? It should be absolutely normal for the data/ sub directory to create when starting up SOLR. So just go ahead and post your data into SOLR, if you have changed the schema already. -- Regards, Dmitry Kan On Sat, May 14, 2011 at 4:01 PM, Pawan Darira pawan.dar...@gmail.com wrote: I did that. Index directory is created but no contents in that 2011/5/14 François Schiettecatte fschietteca...@gmail.com You can also shut down solr/lucene, do: rm -rf /YourIndexName/data/index and restart, the index directory will be automatically recreated. François On May 14, 2011, at 1:53 AM, Gabriele Kahlout wrote: curl --fail $solrIndex/update?commit=true -d '<delete><query>*:*</query></delete>' #empty index [1 http://wiki.apache.org/nutch/Whole-Web%20Crawling%20incremental%20script ] did you try? On Sat, May 14, 2011 at 7:26 AM, Pawan Darira pawan.dar...@gmail.com wrote: Hi I had an existing index created months back. now my database schema has changed. i wanted to delete the current data/index directory and re-create the fresh index, but it is saying that the segments file is not found and it just creates a blank data/index directory. Please help -- Thanks, Pawan Darira -- Regards, K. Gabriele --- unchanged since 20/9/10 --- P.S. If the subject contains [LON] or the addressee acknowledges the receipt within 48 hours then I don't resend the email. subject(this) ∈ L(LON*) ∨ ∃x. (x ∈ MyInbox ∧ Acknowledges(x, this) ∧ time(x) < Now + 48h) ⇒ ¬resend(I, this). If an email is sent by a sender that is not a trusted contact or the email does not contain a valid code then the email is not received. A valid code starts with a hyphen and ends with X. ∀x. x ∈ MyInbox ⇒ from(x) ∈ MySafeSenderList ∨ (∃y. y ∈ subject(x) ∧ y ∈ L(-[a-z]+[0-9]X)). -- Thanks, Pawan Darira -- Thanks, Pawan Darira -- Regards, K. Gabriele --- unchanged since 20/9/10 --- P.S. If the subject contains [LON] or the addressee acknowledges the receipt within 48 hours then I don't resend the email. subject(this) ∈ L(LON*) ∨ ∃x. (x ∈ MyInbox ∧ Acknowledges(x, this) ∧ time(x) < Now + 48h) ⇒ ¬resend(I, this). If an email is sent by a sender that is not a trusted contact or the email does not contain a valid code then the email is not received. A valid code starts with a hyphen and ends with X. ∀x. x ∈ MyInbox ⇒ from(x) ∈ MySafeSenderList ∨ (∃y. y ∈ subject(x) ∧ y ∈ L(-[a-z]+[0-9]X)). -- Thanks, Pawan Darira
error while doing full import
org.apache.solr.handler.dataimport.DataImportHandlerException: Parsing failed for xml, url:http://xxx.xxx.xxx/frontend_dev.php/xxx/xxx/xxx rows processed:0 Processing Document # 1 at org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:72) at org.apache.solr.handler.dataimport.XPathEntityProcessor.initQuery(XPathEntityProcessor.java:292) at org.apache.solr.handler.dataimport.XPathEntityProcessor.fetchNextRow(XPathEntityProcessor.java:187) at org.apache.solr.handler.dataimport.XPathEntityProcessor.nextRow(XPathEntityProcessor.java:164) at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:237) at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:357) at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:383) at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:242) at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:180) at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:331) at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:389) at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:370) Caused by: java.lang.RuntimeException: com.ctc.wstx.exc.WstxParsingException: Undeclared general entity nbsp at [row,col {unknown-source}]: [170,29] at org.apache.solr.handler.dataimport.XPathRecordReader.streamRecords(XPathRecordReader.java:181) at org.apache.solr.handler.dataimport.XPathEntityProcessor.initQuery(XPathEntityProcessor.java:282) ... 10 more Caused by: com.ctc.wstx.exc.WstxParsingException: Undeclared general entity nbsp at [row,col {unknown-source}]: [170,29] at com.ctc.wstx.sr.StreamScanner.constructWfcException(StreamScanner.java:630) at com.ctc.wstx.sr.StreamScanner.throwParseError(StreamScanner.java:467) at com.ctc.wstx.sr.BasicStreamReader.handleUndeclaredEntity(BasicStreamReader.java:5431) at com.ctc.wstx.sr.StreamScanner.expandUnresolvedEntity(StreamScanner.java:1661) at com.ctc.wstx.sr.StreamScanner.expandEntity(StreamScanner.java:1555) at com.ctc.wstx.sr.StreamScanner.fullyResolveEntity(StreamScanner.java:1523) at com.ctc.wstx.sr.BasicStreamReader.nextFromTree(BasicStreamReader.java:2757) at com.ctc.wstx.sr.BasicStreamReader.next(BasicStreamReader.java:1019) at org.apache.solr.handler.dataimport.XPathRecordReader$Node.handleStartElement(XPathRecordReader.java:370) at org.apache.solr.handler.dataimport.XPathRecordReader$Node.parse(XPathRecordReader.java:304) at org.apache.solr.handler.dataimport.XPathRecordReader$Node.access$200(XPathRecordReader.java:196) at org.apache.solr.handler.dataimport.XPathRecordReader.streamRecords(XPathRecordReader.java:178) ... 
11 more May 17, 2011 10:51:51 AM org.apache.solr.handler.dataimport.DataImporter doFullImport SEVERE: Full Import failed org.apache.solr.handler.dataimport.DataImportHandlerException: Parsing failed for xml, url:http://xxx.xxx.xxx/frontend_dev.php/xxx/xxx/xxx rows processed:0 Processing Document # 1 at org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:72) at org.apache.solr.handler.dataimport.XPathEntityProcessor.initQuery(XPathEntityProcessor.java:292) at org.apache.solr.handler.dataimport.XPathEntityProcessor.fetchNextRow(XPathEntityProcessor.java:187) at org.apache.solr.handler.dataimport.XPathEntityProcessor.nextRow(XPathEntityProcessor.java:164) at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:237) at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:357) at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:383) at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:242) at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:180) at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:331) at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:389) at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:370) Caused by: java.lang.RuntimeException: com.ctc.wstx.exc.WstxParsingException: Undeclared general entity nbsp at [row,col {unknown-source}]: [170,29] at org.apache.solr.handler.dataimport.XPathRecordReader.streamRecords(XPathRecordReader.java:181) at org.apache.solr.handler.dataimport.XPathEntityProcessor.initQuery(XPathEntityProcessor.java:282) ... 10 more Caused by: com.ctc.wstx.exc.WstxParsingException: Undeclared general entity nbsp at [row,col {unknown-source}]: [170,29] at com.ctc.wstx.sr.StreamScanner.constructWfcException(StreamScanner.java:630) at com.ctc.wstx.sr.StreamScanner.throwParseError(StreamScanner.java:467) at com.ctc.wstx.sr.BasicStreamReader.handleUndeclaredEntity(BasicStreamReader.java:5431) at com.ctc.wstx.sr.StreamScanner.expandUnresolvedEntity(StreamScanner.java:1661) at