Re: ArrayIndexOutOfBoundsException heeeeeelp !?!?!?!!?! Sorting
hey. of course i mean sint, from the default/example schema.xml. after a few days, sorting by popularity works well again ... ?! i found a value of -1252810 in my popularity field ... i think this was the problem, but i don't know how the field could have ended up with this value. -- View this message in context: http://lucene.472066.n3.nabble.com/ArrayIndexOutOfBoundsException-heelp-Sorting-tp932956p943791.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Wiki Documentation of facet.sort
Oh well. Thanks for pointing that out. *sigh* Chantal

On Thu, 2010-07-01 at 04:15 +0200, Koji Sekiguchi wrote:

(10/07/01 1:12), Chantal Ackermann wrote: Hi there, in the wiki, on http://wiki.apache.org/solr/SimpleFacetParameters it says: "The default is true/count if facet.limit is greater than 0, false/index otherwise." I've just migrated to 1.4.1 (reindexed). I can't remember how it was with 1.4.0. When I specify my facet query with facet.mincount=0 (explicitly) or without mincount (the default is 0), the resulting facets are sorted by count nevertheless. Changing mincount from 0 to 1 and back actually makes no difference in the sorting. I'm fine with a constant default behaviour (always sorting by count, no matter what parameters are given). If this is intended - shall I change the wiki accordingly? Cheers, Chantal

Chantal, the wiki says facet.limit, but you are changing facet.mincount? :) Koji
Re: how to apply stemming to the index ?
thanx a lot Erick. It worked. Regards -Sarfaraz

--- On Mon, 5/7/10, Erick Erickson erickerick...@gmail.com wrote:

From: Erick Erickson erickerick...@gmail.com
Subject: Re: how to apply stemming to the index ?
To: solr-user@lucene.apache.org
Date: Monday, 5 July, 2010, 6:32 AM

I'm a little confused about what you're trying to accomplish where. The fact that you posted to the Solr users list would indicate you're using Solr, in which case all you have to do is apply the stemming in your config file. Something like:

  <filter class="solr.PorterStemFilterFactory"/>

in your schema.xml file, for both your index AND search analyzers. If you're in Lucene, you can add PorterStemFilter to a filter chain when making your own analyzer (see the synonym example in Lucene In Action, first or second edition). If this is gibberish, perhaps you could provide some more context for what you're trying to accomplish. HTH Erick

On Fri, Jul 2, 2010 at 5:08 AM, sarfaraz masood sarfarazmasood2...@yahoo.com wrote: I want to stem the terms in my index, but currently I am using StandardAnalyzer, which does not perform any kind of stemming:

  StandardAnalyzer analyzer = new StandardAnalyzer(Version.LUCENE_CURRENT);

After some searching I found code for a PorterStemAnalyzer, but it has some problems:
  import org.apache.lucene.analysis.Analyzer;
  import org.apache.lucene.analysis.TokenStream;
  import org.apache.lucene.analysis.StopFilter;
  import org.apache.lucene.analysis.LowerCaseTokenizer;
  import org.apache.lucene.analysis.PorterStemFilter;
  import java.io.Reader;
  import java.util.Hashtable;

  // PorterStemAnalyzer processes input text by stemming English words
  // to their roots. This Analyzer also converts the input to lower case
  // and removes stop words. A small set of default stop words is defined
  // in the STOP_WORDS array, but a caller can specify an alternative set
  // of stop words by calling the non-default constructor.
  public class PorterStemAnalyzer extends Analyzer {
      private static Hashtable _stopTable;

      // An array containing some common English words
      // that are usually not useful for searching.
      public static final String[] STOP_WORDS = {
          "0", "1", "2", "3", "4", "5", "6", "7", "8", "9", "000", "$",
          "about", "after", "all", "also", "an", "and", "another", "any",
          "are", "as", "at", "be", "because", "been", "before", "being",
          "between", "both", "but", "by", "came", "can", "come", "could",
          "did", "do", "does", "each", "else", "for", "from", "get", "got",
          "has", "had", "he", "have", "her", "here", "him", "himself",
          "his", "how", "if", "in", "into", "is", "it", "its", "just",
          "like", "make", "many", "me", "might", "more", "most", "much",
          "must", "my", "never", "now", "of", "on", "only", "or", "other",
          "our", "out", "over", "re", "said", "same", "see", "should",
          "since", "so", "some", "still", "such", "take", "than", "that",
          "the", "their", "them", "then", "there", "these", "they", "this",
          "those", "through", "to", "too", "under", "up", "use", "very",
          "want", "was", "way", "we", "well", "were", "what", "when",
          "where", "which", "while", "who", "will", "with", "would",
          "you", "your",
          "a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k", "l", "m",
          "n", "o", "p", "q", "r", "s", "t", "u", "v", "w", "x", "y", "z"
      };

      // Builds an analyzer.
      public PorterStemAnalyzer() {
          this(STOP_WORDS);
      }

      // Builds an analyzer with the given stop words.
      // @param stopWords a String array of stop words
      public PorterStemAnalyzer(String[] stopWords) {
          _stopTable = StopFilter.makeStopTable(stopWords);
      }

      // Processes the input by first converting it to lower case, then by
      // eliminating stop words, and finally by performing Porter stemming
      // on it.
      //
      // @param reader the Reader that provides access to the input text
      // @return an instance of TokenStream
      public final TokenStream tokenStream(Reader reader) {
          return new PorterStemFilter(
              new StopFilter(new LowerCaseTokenizer(reader), _stopTable));
      }
  }

Errors marked in bold. Please let me know if there is some alternate way to apply stemming to the index. -Sarfaraz
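If the goal is stemming in Solr rather than in raw Lucene, Erick's suggestion can be wired up entirely in schema.xml. A minimal sketch, using only stock factories; the field type name "text_stemmed" is made up for illustration:

```xml
<!-- Hypothetical field type for schema.xml: lowercases, removes stop
     words, then applies Porter stemming. Declared once inside <analyzer>,
     it applies to both index and query time. -->
<fieldType name="text_stemmed" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
</fieldType>
```

This mirrors what the hand-rolled PorterStemAnalyzer above does (lowercase, stop words, Porter stemming) without writing or compiling any Java.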
Re: Error in building Solr-Cloud (ant example)
Hi Mark, Thanks for your reply. Sorry for this stupid question, but when you say trunk, do you mean the Solr trunk as in https://svn.apache.org/repos/asf/lucene/dev/trunk ? If so, I assume I just check that out and apply the patch? Or is there a specific Solr-Cloud trunk? Thanks!!

From: Mark Miller-3 [via Lucene] ml-node+942934-1192373116-310...@n3.nabble.com
To: jayf jatf...@ymail.com
Sent: Sun, July 4, 2010 9:35:31 PM
Subject: Re: Error in building Solr-Cloud (ant example)

Hey jayf - Offhand I'm not sure why you are having these issues - last I knew, a couple of people had had success with the cloud branch. Cloud has moved on from that branch really, though - we probably should update the wiki about that. More important, though, is that I need to get Cloud committed to trunk! I've been saying it for a while, but I'm going to make a strong effort to wrap up the final unit test issue (apparently a testing issue, not a cloud issue) and get this committed for further iterations. The way to follow along with the latest work is to go to: https://issues.apache.org/jira/browse/SOLR-1873 The latest patch there should apply to recent trunk. I've scheduled a bit of time to work on getting this committed this week, fingers crossed. -- - Mark http://www.lucidimagination.com?by-user=t

On 7/4/10 3:37 PM, jayf wrote: Hi there, I'm having trouble installing Solr Cloud. I checked out the project, but when compiling (ant example on OS X) I get a compile error (cannot find symbol - pasted below). I also get a bunch of warnings: [javac] Note: Some input files use or override a deprecated API. [javac] Note: Recompile with -Xlint:deprecation for details. I have tried both Java 1.5 and 1.6. Before I got to this point, I was having problems with the included ZooKeeper jar (a Java versioning issue) - so I had to download the source and build it. Now 'ant' gets a bit further, to the stage listed above. Any idea of the problem??? THANKS!
  [javac] Compiling 438 source files to /Volumes/newpart/solrcloud/cloud/build/solr
  [javac] /Volumes/newpart/solrcloud/cloud/src/java/org/apache/solr/cloud/ZkController.java:588: cannot find symbol
  [javac] symbol  : method stringPropertyNames()
  [javac] location: class java.util.Properties
  [javac]     for (String sprop : System.getProperties().stringPropertyNames()) {
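One likely explanation for this particular error: Properties.stringPropertyNames() was only added in Java 6, so if ant is still picking up a 1.5 javac, compiling ZkController.java fails with exactly this "cannot find symbol". A minimal sketch that compiles on Java 6+ but not against a 1.5 JDK (the class and method names here are made up for illustration):

```java
import java.util.Properties;

public class StringPropertyNamesCheck {
    // Returns true when the running JVM exposes the Java 6 API that the
    // SolrCloud build trips over when compiled with a 1.5 JDK.
    static boolean hasJavaVersionProperty() {
        Properties props = System.getProperties();
        // stringPropertyNames() was added in Java 6; a 1.5 javac reports
        // "cannot find symbol" on this exact call, as in the build output.
        for (String name : props.stringPropertyNames()) {
            if (name.equals("java.version")) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println("stringPropertyNames available: "
                + hasJavaVersionProperty());
    }
}
```

If this compiles and runs in your environment, the build failure points at ant resolving a different (older) JDK than the one on your PATH.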
Unicode processing - Issue with CharStreamAwareWhitespaceTokenizerFactory
Hi, I'm using Solr 1.4 and I need to use a Latin accent filter. In the Solr wiki (http://wiki.apache.org/solr/SchemaDesign), it's recommended to use MappingCharFilterFactory instead of ISOLatin1AccentFilterFactory. Could someone tell me the reason for choosing the first filter instead of the second one? In the same wiki, they say that CharStreamAwareWhitespaceTokenizerFactory must be used with MappingCharFilterFactory. But when I use this tokenizer and char filter together, I get a severe error saying that the field type containing them is unknown. However, it works when I use this filter with StandardTokenizerFactory or WhitespaceTokenizerFactory! I saw on the Web that others have faced this problem, but I didn't see any solution. Does someone have any idea how to fix this issue? Thanks, -Saïd
Re: Modifications to AbstractSubTypeFieldType
On 3 Jul 2010, at 1:50 am, Chris Hostetter wrote:

: The changes to AbstractSubTypeFieldType do not have any adverse effects on the
: solr.PointType class, so I'd quite like to suggest it gets included in the
: main solr source code. Where can I send a patch for someone to evaluate or
: should I just attach it to the issue in JIRA and see what happens?
: https://issues.apache.org/jira/browse/SOLR-1131

please open a new Jira issue.

OK, done. https://issues.apache.org/jira/browse/SOLR-1986

I'm not too familiar with AbstractSubTypeFieldType, but your improvement sounds pretty good to me on the surface ... i'm just wondering if we should have a simpler way of specifying the suffix when dimension is really large.

Yes, I wondered that myself but wasn't sure which way to go. I thought about something like this:

  <fieldType name="temporal" class="uk.ac.edina.solr.schema.TemporalCoverage" dimension="3">
    <subFieldSuffix>_ti</subFieldSuffix>
    <subFieldSuffix>_ti</subFieldSuffix>
    <subFieldSuffix>_s</subFieldSuffix>
  </fieldType>

but it doesn't really seem to help much. If anything, it probably makes it *less* readable. Mark -- The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336.
meaning of underscore in prefix search.
Hello. I use facet.prefix and terms.prefix for my search. What is the meaning of the underscore _ in the results? When does solr change some string into an underscore? Sometimes it makes no sense to suggest this to the client ...

  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
    <filter class="solr.TrimFilterFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>

thx !
TransformerException under Jetty 7.0.1
I'm attempting to set up Solr 1.4.1 under Jetty 7.0.1, and I'm getting a TransformerException when I start things off. I've used the same solrconfig.xml in an embedded Solr under Jetty 7 and it works fine. But this is the first time I've tried to get things going via Jetty's XML configuration syntax, so I may be doing something wrong. I've noticed that Solr doesn't create the data directory, even if I specify it by full path in solrconfig.xml. Any ideas? Here's the output from the error log:

  INFO: Solr home set to '/solr/home/'
  2010/07/05 19:54:38 org.apache.solr.common.SolrException log
  Fatal Error: javax.xml.transform.TransformerException: Unable to evaluate expression using this context
      at com.sun.org.apache.xpath.internal.XPath.execute(XPath.java:363)
      at com.sun.org.apache.xpath.internal.jaxp.XPathImpl.eval(XPathImpl.java:213)
      at com.sun.org.apache.xpath.internal.jaxp.XPathImpl.evaluate(XPathImpl.java:275)
      at org.apache.solr.core.CoreContainer.readProperties(CoreContainer.java:303)
      at org.apache.solr.core.CoreContainer.load(CoreContainer.java:242)
      at org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:117)
      at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:83)
      at org.eclipse.jetty.servlet.FilterHolder.doStart(FilterHolder.java:74)
      at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:55)
      at org.eclipse.jetty.servlet.ServletHandler.initialize(ServletHandler.java:668)
      at org.eclipse.jetty.servlet.ServletContextHandler.startContext(ServletContextHandler.java:204)
      at org.eclipse.jetty.webapp.WebAppContext.startContext(WebAppContext.java:995)
      at org.eclipse.jetty.server.handler.ContextHandler.doStart(ContextHandler.java:588)
      at org.eclipse.jetty.webapp.WebAppContext.doStart(WebAppContext.java:381)
      at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:55)
      at org.eclipse.jetty.server.handler.HandlerWrapper.doStart(HandlerWrapper.java:92)
      at org.eclipse.jetty.server.Server.doStart(Server.java:228)
      at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:55)
      at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
      at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
      at java.lang.reflect.Method.invoke(Method.java:597)
      at org.jruby.javasupport.JavaMethod.invokeDirectWithExceptionHandling(JavaMethod.java:433)
      at org.jruby.javasupport.JavaMethod.invokeDirect(JavaMethod.java:297)
      at org.jruby.java.invokers.InstanceMethodInvoker.call(InstanceMethodInvoker.java:41)
      at org.jruby.runtime.callsite.CachingCallSite.cacheAndCall(CachingCallSite.java:290)
      at org.jruby.runtime.callsite.CachingCallSite.call(CachingCallSite.java:109)
      at $_dot_.solr.__file__(solr.rb:53)
      at $_dot_.solr.load(solr.rb)
      at org.jruby.Ruby.runScript(Ruby.java:628)
      at org.jruby.Ruby.runNormally(Ruby.java:550)
      at org.jruby.Ruby.runFromMain(Ruby.java:396)
      at org.jruby.Main.run(Main.java:272)
      at org.jruby.Main.run(Main.java:117)
      at org.jruby.Main.main(Main.java:97)
  Caused by: java.lang.RuntimeException: Unable to evaluate expression using this context
      at com.sun.org.apache.xpath.internal.axes.NodeSequence.setRoot(NodeSequence.java:212)
      at com.sun.org.apache.xpath.internal.axes.LocPathIterator.execute(LocPathIterator.java:210)
      at com.sun.org.apache.xpath.internal.XPath.execute(XPath.java:335)
      ... 34 more
Not split a field on whitespaces?
Hey there, I might just be too blind to see this, but isn't it possible to have a solr.TextField not get filtered in any way? That means the input "Michael Jackson" should just stay that way and not get split on whitespace. How do I implement that? Thanks for any help, Sebastian
Re: Not split a field on whitespaces?
Use solr.StrField or solr.KeywordTokenizerFactory instead. simon

On Mon, Jul 5, 2010 at 2:47 PM, Sebastian Funk qbasti.f...@googlemail.com wrote: Hey there, I might just be too blind to see this, but isn't it possible to have a solr.TextField not get filtered in any way? That means the input "Michael Jackson" should just stay that way and not get split on whitespace. How do I implement that? Thanks for any help, Sebastian
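Both variants Simon mentions can be sketched in schema.xml; the field and type names below are made up for illustration:

```xml
<!-- Option 1: a plain string field - indexed and stored verbatim,
     no analysis at all ("string" being a solr.StrField type). -->
<fieldType name="string" class="solr.StrField" sortMissingLast="true"/>
<field name="artist_exact" type="string" indexed="true" stored="true"/>

<!-- Option 2: a TextField whose tokenizer emits the entire input as a
     single token, so "Michael Jackson" stays one term. Unlike StrField,
     filters (e.g. LowerCaseFilterFactory) could still be added here if
     case-insensitive matching were wanted. -->
<fieldType name="text_keyword" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
  </analyzer>
</fieldType>
```

The practical difference: StrField is completely unanalyzed, while KeywordTokenizerFactory keeps the input whole but still lets filters run on it.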
Re: FastVectorHighlighter and SynonymFilter
I think the cause of the problem is that the combination of query-time expansion and an N-gram tokenizer generates a MultiPhraseQuery, but FVH doesn't support MPQ. Sekiguchi-san, I tried the following test: index-time filtering, with SynonymFilter expand=true. The query result met my expectations (correct snippet). I guess this problem is related to LUCENE-1889. https://issues.apache.org/jira/browse/LUCENE-1889 Thanks for your reply.
Facet Search is too slow ! Optimize suggestions ?
Hello. I use facet search for my autosuggestion. The results are okay, but sometimes it's too slow. We have 4.2 million documents, and each day we get more and more ... I tried out the cache settings with this parameter, for every cache:

  <filterCache class="solr.FastLRUCache" size="200" initialSize="100" autowarmCount="80"/>

How can I optimize the facet search? The server is an 8 dual core with ... I think 12 GB RAM ... so I thought that's enough for Solr facet search. When a new search looks for something like "rotw" and nothing is cached, solr sometimes needs a long time for a response ... =( =( Anything over two seconds is not good for autosuggestion ... thxxx
Re: Facet Search is too slow ! Optimize suggestions ?
On Jul 5, 2010, at 10:08 AM, stockii wrote:

Hello. I use facet search for my autosuggestion. The results are okay, but sometimes it's too slow. We have 4.2 million documents, and each day we get more and more ... I tried out the cache settings with this parameter, for every cache:

  <filterCache class="solr.FastLRUCache" size="200" initialSize="100" autowarmCount="80"/>

How can I optimize the facet search? The server is an 8 dual core with ... I think 12 GB RAM ... so I thought that's enough for Solr facet search.

My initial thought is that the cache size is way off, but before we get into that, can you tell us more about your app? What are you faceting on? How are you faceting on it? How many unique terms?

When a new search looks for something like "rotw" and nothing is cached, solr sometimes needs a long time for a response ... =( =( Anything over two seconds is not good for autosuggestion ... thxxx
Re: Facet Search is too slow ! Optimize suggestions ?
okay. How many unique terms? - Docs: 3911249 and distinct: 2302852. App? - an iPhone app for product search. Faceting on? - I'm faceting on the product names, with shingles (maxShingleSize: 5 ?? too big?). How am I faceting? - search over all: .../?q=*:*&facet.prefix=string&rows=0 and search with a shop filter: .../?q=*:*&facet.prefix=string&rows=0&fq=shop_id:54. I don't know the best way to configure it.
Re: Unicode processing - Issue with CharStreamAwareWhitespaceTokenizerFactory
In the same wiki, they say that CharStreamAwareWhitespaceTokenizerFactory must be used with MappingCharFilterFactory. But when I use this tokenizer and char filter together, I get a severe error saying that the field type containing them is unknown. However, it works when I use this filter with StandardTokenizerFactory or WhitespaceTokenizerFactory!

The wiki is not correct today. Before Lucene 2.9 (and Solr 1.4), Tokenizers could only take a Reader argument in the constructor. But since then, because they can take a CharStream argument in the constructor, the *CharStreamAware* Tokenizers are no longer needed (all Tokenizers are aware of CharStream). I'll update the wiki. Koji -- http://www.rondhuit.com/en/
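In other words, on Solr 1.4 the mapping char filter can sit in front of any ordinary tokenizer. A minimal sketch of such a field type (the type name "text_folded" is made up here):

```xml
<!-- Hypothetical field type: the charFilter rewrites characters (per
     mapping-ISOLatin1Accent.txt) in the raw character stream before any
     tokenizer runs - here the plain WhitespaceTokenizerFactory, with no
     CharStreamAware variant needed. -->
<fieldType name="text_folded" class="solr.TextField">
  <analyzer>
    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

Note that char filters by design run on the character stream before tokenization, which is why they always appear to apply "before the tokenizer and all the other filters" regardless of where they are declared.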
Re: solr with hadoop
I need to revive this discussion... If you do distributed indexing correctly, what about updating the documents, and what about replicating them correctly? Does this work? Or wasn't this an issue? Kind regards - Mitch
Re: Unicode processing - Issue with CharStreamAwareWhitespaceTokenizerFactory
Thanks Koji for the reply and for updating the wiki. As it's written now in the wiki, it sounds (at least to me) like MappingCharFilterFactory works only with WhitespaceTokenizerFactory. Did you really mean that? Because this filter works with other tokenizers too. For instance, in my text type, I'm using StandardTokenizerFactory for document processing and WhitespaceTokenizerFactory for query processing. I also noticed that, in whatever order you put this filter in the definition of a field type, it's always applied (during text processing) before the tokenizer and all the other filters. Is there a reason for that? Is there a possibility to force the filter to be applied at a certain position among the other filters? Thanks, -S

On Jul 5, 2010, at 4:28 PM, Koji Sekiguchi wrote: In the same wiki, they say that CharStreamAwareWhitespaceTokenizerFactory must be used with MappingCharFilterFactory. But when I use this tokenizer and char filter together, I get a severe error saying that the field type containing them is unknown. However, it works when I use this filter with StandardTokenizerFactory or WhitespaceTokenizerFactory! The wiki is not correct today. Before Lucene 2.9 (and Solr 1.4), Tokenizers could only take a Reader argument in the constructor. But since then, because they can take a CharStream argument in the constructor, the *CharStreamAware* Tokenizers are no longer needed (all Tokenizers are aware of CharStream). I'll update the wiki. Koji -- http://www.rondhuit.com/en/
Re: Unicode processing - Issue with CharStreamAwareWhitespaceTokenizerFactory
No, all tokenizers can be used with MappingCharFilter. Koji Sekiguchi, from mobile

On 2010/07/06, at 0:32, Saïd Radhouani r.steve@gmail.com wrote: Thanks Koji for the reply and for updating the wiki. As it's written now in the wiki, it sounds (at least to me) like MappingCharFilterFactory works only with WhitespaceTokenizerFactory. Did you really mean that? Because this filter works with other tokenizers too. For instance, in my text type, I'm using StandardTokenizerFactory for document processing and WhitespaceTokenizerFactory for query processing. I also noticed that, in whatever order you put this filter in the definition of a field type, it's always applied (during text processing) before the tokenizer and all the other filters. Is there a reason for that? Is there a possibility to force the filter to be applied at a certain position among the other filters? Thanks, -S
search multiple default fields
hi there, is it possible to define multiple default search fields in solrconfig.xml? At the moment I am using a query filter programmatically, but I want to be able to configure things such that my query will be processed as: defaultfield:myquery OR field2:myquery OR field3:myquery ... Basically I want my query to match any of my named fields, not only the default field... at the moment I have one default field + a query filter, which is not returning the desired results. thanks
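One common way to get "match across several fields" in Solr 1.4 without a programmatic filter is the DisMax query handler, whose qf parameter lists the fields (with optional boosts) that the plain user query is matched against. A sketch for solrconfig.xml; the handler name and field names are made up to match the example query above:

```xml
<!-- Hypothetical handler: q=myquery is searched across every field in qf,
     so there is no single "default field" anymore. field3^2.0 gives that
     field twice the weight in scoring. -->
<requestHandler name="/searchall" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">dismax</str>
    <str name="qf">defaultfield field2 field3^2.0</str>
  </lst>
</requestHandler>
```

A request like /searchall?q=myquery would then behave roughly like the defaultfield:myquery OR field2:myquery OR field3:myquery query described above, with dismax's disjunction-max scoring instead of a plain boolean OR.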