Re: Solr index searcher to lucene index searcher
Hi, thanks Chris. I had been using Nutch 1.1. The Nutch IndexSearcher used to call the Lucene IndexSearcher. As the documents are collected in TopDocs in Lucene, before that is passed back to Nutch I used to look into the top K matching documents, consult an external repository, further score the top K documents, and reorder them in the TopDocs array. The reordered TopDocs is then passed to Nutch. All of this reordering code was implemented by extending the Lucene IndexSearcher class.

The Lucene core that comes with Solr is a bit different from the one that used to come with Nutch 1.1, so implementing the same thing is not straightforward. Moreover, I cannot figure out at which point exactly the SolrIndexSearcher interacts directly with the Lucene IndexSearcher. With FunctionQuery I lose the flexibility of looking into the documents before passing them to the final result set. Now I am using Solr 3.4 and I would like to implement the same thing between Solr and Lucene.

Thanks, Pom

On Wed, Apr 24, 2013 at 3:05 AM, Chris Hostetter hossman_luc...@fucit.org wrote:

: ... For any query it passes through the search handler and Solr finally
: directs it to the Lucene IndexSearcher. As results are matched and collected
: as TopDocs in Lucene, I want to inspect the top K docs, reorder them by
: some logic, and pass the final TopDocs to Solr, which Solr may send as a
: response.

Can you elaborate on what exactly your "some logic" involves? Instead of writing a custom collector, using a function query may be the best solution.

https://people.apache.org/~hossman/#xyproblem
XY Problem

Your question appears to be an "XY Problem" ... that is: you are dealing with "X", you are assuming "Y" will help you, and you are asking about "Y" without giving more details about the "X" so that we can understand the full issue. Perhaps the best solution doesn't involve "Y" at all? See Also: http://www.perlmonks.org/index.pl?node_id=542341

-Hoss
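For what it's worth, the reranking step Pom describes (rescore the top K hits against an external repository, then reorder) can be sketched without any Lucene types at all; in a real IndexSearcher subclass the same loop would operate on TopDocs.scoreDocs. The ScoredDoc class and the externalBoost map below are hypothetical stand-ins, not Nutch or Solr APIs:

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.Map;

// Minimal, library-free sketch of top-K reranking. ScoredDoc mimics
// Lucene's ScoreDoc (docId + mutable score); externalBoost stands in
// for whatever the external repository returns per document.
public class Rerank {
    static final class ScoredDoc {
        final int docId;
        float score;
        ScoredDoc(int docId, float score) { this.docId = docId; this.score = score; }
    }

    // Rescore the first k entries in place, then sort that prefix by the new score.
    static void rerankTopK(ScoredDoc[] docs, int k, Map<Integer, Float> externalBoost) {
        int limit = Math.min(k, docs.length);
        for (int i = 0; i < limit; i++) {
            docs[i].score *= externalBoost.getOrDefault(docs[i].docId, 1.0f);
        }
        Arrays.sort(docs, 0, limit,
                Comparator.comparingDouble((ScoredDoc d) -> d.score).reversed());
    }

    public static void main(String[] args) {
        ScoredDoc[] top = {
            new ScoredDoc(7, 3.0f), new ScoredDoc(2, 2.5f), new ScoredDoc(9, 2.0f)
        };
        rerankTopK(top, 3, Map.of(9, 2.0f)); // doc 9 becomes 4.0 and moves to the front
        for (ScoredDoc d : top) System.out.println(d.docId + " " + d.score);
    }
}
```

In a Lucene subclass the array would come from the collected TopDocs before it is returned to the caller, which is exactly the interception point the thread is asking about.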
Re: Update on shards
We are using Tomcat so we'll just wait. Hopefully it's fixed in 4.3, but we have a workaround for now so... What exactly is the difference between Jetty and Tomcat? We are using Tomcat because we've read somewhere that it should be more robust in heavily loaded production environments. Arkadi

On 04/23/2013 06:14 PM, Mark Miller wrote:
If you use Jetty - which you should :) It's what we test with; Tomcat only gets user testing. If you use Tomcat, this won't work in 4.2 or 4.2.1, but probably will in 4.3 (we are voting on 4.3 now). No clue on other containers. - Mark

On Apr 23, 2013, at 10:59 AM, Michael Della Bitta michael.della.bi...@appinions.com wrote:
I believe as of 4.2 you can talk to any host in the cloud. Michael Della Bitta Appinions 18 East 41st Street, 2nd Floor New York, NY 10017-6271 www.appinions.com Where Influence Isn’t a Game

On Tue, Apr 23, 2013 at 10:45 AM, Arkadi Colson ark...@smartbit.be wrote:
Hi, is it correct that when inserting or updating a document into Solr you have to talk to a Solr host where at least one shard of that collection is stored? For select you can talk to any host within the collection.configName? BR, Arkadi
JVM Parameters to Startup Solr?
The Lucidworks Solr Guide says: "If you are using Sun's JVM, add the -server command-line option when you start Solr. This tells the JVM that it should optimize for a long-running server process. If the Java runtime on your system is a JRE, rather than a full JDK distribution (including javac and other development tools), then it is possible that it may not support the -server JVM option."

Do any of you use the -server parameter? Also, what parameters are you using to start up Solr? I mean parallel garbage collector vs. the alternatives?
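One way to check whether -server actually took effect, assuming a Sun/Oracle HotSpot JVM, is to look at the java.vm.name system property, which typically contains "Server VM" or "Client VM" (this is a quick sanity check, not an authoritative test for every JVM vendor):

```java
// Print which HotSpot VM this process is running on. Run it once with
// -server and once without to see whether the flag changes anything on
// your particular JRE/JDK.
public class CheckVm {
    public static void main(String[] args) {
        String vm = System.getProperty("java.vm.name");
        System.out.println("Running on: " + vm);
        System.out.println("Server VM? " + (vm != null && vm.contains("Server")));
    }
}
```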
Luke misreporting index-time boosts?
Hello, all. I have recently been attempting to apply index-time boosts to fields using the following syntax:

<add>
  <doc>
    <field name="important_field" boost="5">bleah bleah bleah</field>
    <field name="standard_field" boost="2">content here</field>
    <field name="trivial_field">content here</field>
  </doc>
  <doc>
    <field name="important_field" boost="5">content here</field>
    <field name="standard_field" boost="2">bleah bleah bleah</field>
    <field name="trivial_field">content here</field>
  </doc>
</add>

The intention is that matches on important_field should count more toward the score than matches on trivial_field (so that a search across all fields for the term 'content' would return the second document above the first), while still being able to use the standard query parser. Looking at output from Luke, however, all fields are reported as having a boost of 1.0. The following possibilities occur to me:
(1) The entire index-time-boosting approach is misconceived.
(2) Luke is misreporting, because index-time boosting alters more fundamental aspects of scoring (tf-idf calculations, I suppose), and the index-time boost is thus invisible to it.
(3) Some combination of (1) and (2).
Can anyone help illuminate the situation for me? Documentation for these questions seems patchy. Thanks, Tim
Facets with OR clause
Hi, my request contains the following terms. There are 3 facets: groups, locations, categories. When I select some items I see this syntax in my request: fq=groups:group1&fq=locations:location1. Is it possible to add an OR clause between facet items in the query? -- View this message in context: http://lucene.472066.n3.nabble.com/Facets-with-OR-clause-tp4058553.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Facets with OR clause
Try fq=(groups:group1 OR locations:location1)

Am 24.04.2013 um 12:39 schrieb vsl:
Hi, my request contains the following terms. There are 3 facets: groups, locations, categories. When I select some items I see this syntax in my request: fq=groups:group1&fq=locations:location1. Is it possible to add an OR clause between facet items in the query?
-- View this message in context: http://lucene.472066.n3.nabble.com/Facets-with-OR-clause-tp4058553.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Listing Priority
Hi, check out the new RegexpBoostProcessor https://lucene.apache.org/solr/4_2_0/solr-core/org/apache/solr/update/processor/RegexpBoostProcessor.html which does exactly this based on a config file.

-- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com Solr Training - www.solrtraining.com

24. apr. 2013 kl. 00:22 skrev Furkan KAMACI furkankam...@gmail.com:
Let's assume that I have written an update processor and extracted the domain and checked it against my predefined list. What should I do at indexing time and at select time?

2013/4/15 Alexandre Rafalovitch arafa...@gmail.com:
You may find the work and code contributions by Jan Høydahl quite relevant. See the presentation from 2 years ago: http://www.slideshare.net/lucenerevolution/jan-hoydahl-improving-solrs-update-chain-eurocon2011 One of the things he/they contributed is the URLClassify Update Processor; it might be quite relevant. https://lucene.apache.org/solr/4_1_0/solr-core/org/apache/solr/update/processor/URLClassifyProcessor.html Regards, Alex.
Personal blog: http://blog.outerthoughts.com/ LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch - Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book)

On Sun, Apr 14, 2013 at 4:59 PM, Furkan KAMACI furkankam...@gmail.com wrote:
I have crawled some internet pages and indexed them at Solr. When I list my results via Solr I want this: if a page has a URL (my schema includes a field for URL) that ends with .edu, .edu.az or .co.uk, I will give more priority to it. How can I do this efficiently in Solr?
Re: How to let Solr load libs from within my JAR?
Hi, the Java class loader does not support a JAR within a JAR. You'll have to unpack both JARs and then JAR them together as one. Or simply give several JARs to Solr; that's the easiest.

-- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com Solr Training - www.solrtraining.com

24. apr. 2013 kl. 03:37 skrev Xi Shen davidshe...@gmail.com:
Hi, I developed a data import handler that has some dependent libraries. I deployed them in a folder parallel to my JAR and included the path in solrconfig.xml. It works fine. But I was thinking maybe I could pack those JAR libs within my JAR; however, I got a NoClassDefFoundError exception when executing my DIH. Is it possible for Solr to load JAR libs packed inside my JAR? How can I do that?
-- Regards, David Shen http://about.me/davidshen https://twitter.com/#!/davidshen84
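For reference, the several-JARs route Jan recommends only needs lib directives in solrconfig.xml pointing at the directory holding the dependencies. The directory names and the regex below are illustrative, not taken from the original mails:

```xml
<config>
  <!-- Load every JAR in a directory (path is relative to the core's instance dir;
       "dih-deps" is a hypothetical name) -->
  <lib dir="../../lib/dih-deps" />
  <!-- Or pick out specific JARs by regex -->
  <lib dir="../../lib" regex="my-dih-.*\.jar" />
</config>
```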
Solr - WordDelimiterFactory with Custom Tokenizer to split only on Boundires
I have configured WordDelimiterFilterFactory with custom tokenization for '' and '-', and for a few characters (like . _ :) we need to split on boundaries only, e.g.:

test.com (should tokenize to test.com)
newyear. (should tokenize to newyear)
new_car (should tokenize to new_car)

Below is the definition for the text field:

<fieldType name="text_general_preserved" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="false"/>
    <filter class="solr.WordDelimiterFilterFactory" splitOnCaseChange="0" splitOnNumerics="0" stemEnglishPossessive="0" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" preserveOriginal="0" protected="protwords_general.txt" types="wdfftypes_general.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="false"/>
    <filter class="solr.WordDelimiterFilterFactory" splitOnCaseChange="0" splitOnNumerics="0" stemEnglishPossessive="0" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" preserveOriginal="0" protected="protwords_general.txt" types="wdfftypes_general.txt"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

Below is the wdfftypes_general.txt content:

= ALPHA
- = ALPHA
_ = SUBWORD_DELIM
: = SUBWORD_DELIM
. = SUBWORD_DELIM

The types that can be used in the word delimiter are LOWER, UPPER, ALPHA, DIGIT, ALPHANUM, SUBWORD_DELIM. There's no description available for the use of each type. Going by the name, I thought the SUBWORD_DELIM type might fulfill my need, but it doesn't seem to work. Can anybody suggest how I can configure WordDelimiterFilterFactory to fulfill my requirement? Thanks.
-- View this message in context: http://lucene.472066.n3.nabble.com/Solr-WordDelimiterFactory-with-Custom-Tokenizer-to-split-only-on-Boundires-tp4058557.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: SOLR 4.3
As you can see on the issue, it is already fixed for 4.3 -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com Solr Training - www.solrtraining.com 24. apr. 2013 kl. 07:02 skrev William Bell billnb...@gmail.com: Can we get this in please to 4.3? https://issues.apache.org/jira/browse/SOLR-4746 -- Bill Bell billnb...@gmail.com cell 720-256-8076
Re: Bug? JSON output changes when switching to solr cloud
Note 4.3 is being cut right now, it will probably be out next week barring unforeseen problems. Best Erick On Mon, Apr 22, 2013 at 9:11 PM, David Parks davidpark...@yahoo.com wrote: Thanks Yonik! That was fast! We switched over to XML for the moment and will switch back to JSON when 4.3 comes out. Dave -Original Message- From: ysee...@gmail.com [mailto:ysee...@gmail.com] On Behalf Of Yonik Seeley Sent: Monday, April 22, 2013 8:18 PM To: solr-user@lucene.apache.org Subject: Re: Bug? JSON output changes when switching to solr cloud Thanks David, I've confirmed this is still a problem in trunk and opened https://issues.apache.org/jira/browse/SOLR-4746 -Yonik http://lucidworks.com On Sun, Apr 21, 2013 at 11:16 PM, David Parks davidpark...@yahoo.com wrote: We just took an installation of 4.1 which was working fine and changed it to run as solr cloud. We encountered the most incredibly bizarre apparent bug: In the JSON output, a colon ':' changed to a comma ',', which of course broke the JSON parser. I'm guessing I should file this as a bug, but it was so odd I thought I'd post here before doing so. 
Demo below. Here is a query on our previous single-server instance:

Query:
http://10.1.3.28:8081/solr/select?q=book&fl=score%2Cid%2Cunique_catalog_name&start=0&rows=50&wt=json&group=true&group.field=unique_catalog_name&group.limit=50

Response:
{responseHeader:{status:0,QTime:15714,params:{fl:score,id,unique_catalog_name,start:0,q:book,group.limit:50,group.field:unique_catalog_name,group:true,wt:json,rows:50}},grouped:{unique_catalog_name:{matches:106711214,groups:[{groupValue:ls:2653,doclist:{numFound:103981882,start:0,maxScore:4.7039795,docs:[{id:1005502088784,score:4.7039795},{id:1005500291075,score:4.7039795},{id:1000810546074,score:4.7039795},{id:1000611003270,score:4.7039795},

Note this part: {unique_catalog_name:{matches:

Now we run that same query on a server derived from the same build, with only configuration changes to run it in distributed Solr Cloud mode.

Query:
http://10.1.3.18:8081/solr/select?q=book&fl=score%2Cid%2Cunique_catalog_name&start=0&rows=50&wt=json&group=true&group.field=unique_catalog_name&group.limit=50

Response:
{responseHeader:{status:0,QTime:8855,params:{fl:score,id,unique_catalog_name,start:0,q:book,group.limit:50,group.field:unique_catalog_name,group:true,wt:json,rows:50}},grouped:[unique_catalog_name,{matches:106711214,groups:[{groupValue:ls:2653,doclist:{numFound:103981882,start:0,maxScore:4.7042913,docs:[{id:1005502088784,score:4.7042913},{id:1000611003270,score:4.7042913},{id:1005500291075,score:4.703668},{id:1000810546074,score:4.703668},

Note how it's changed: unique_catalog_name,{matches:
Re: Fields issue 4.2.1
Hi, have you tried fl=*_user ? I think fl may try to interpret the number as a function.

-- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com Solr Training - www.solrtraining.com

24. apr. 2013 kl. 07:16 skrev William Bell billnb...@gmail.com:
I am getting no results when using a dynamic field whose name begins with numbers. This was okay in 3.6 but does not work in 4.2.
dynamic name: 1234566_user
fl=1234566_user
If I change it to name: user_1234566 it works. This appears to be a bug.
-- Bill Bell billnb...@gmail.com cell 720-256-8076
Solr consultant recommendation
Hi, we have some detailed Solr setup issues we would like to discuss with a Solr expert (certified or self-declared), but we are having some difficulties getting in contact with anyone here in Copenhagen, Denmark. Therefore I would like to hear if anybody out there can drop me some names of Solr experts to contact who are available in Denmark.

We have issues regarding hardware setup (storage, RAM, cores per instance, instances per machine), Solr Cloud vs. classic master/slave, shard size, to store or not to store, automated deployment of (more) shards, cache optimization, garbage collection issues, field collapsing, PERFORMANCE. You name it and we probably have it as an issue to discuss. We are currently running a setup of ~450 million documents, receiving over 1 million/day. An interesting challenge, if you ask me… If YOU are the one, then please get in contact.

Med venlig hilsen / Best Regards

Christian von Wendt-Jensen IT Team Lead, Customer Solutions

Infopaq International A/S Kgs. Nytorv 22 DK-1050 København K Phone +45 36 99 00 00 Mobile +45 31 17 10 07 Email christian.sonne.jen...@infopaq.com Web www.infopaq.com

DISCLAIMER: This e-mail and accompanying documents contain privileged confidential information. The information is intended only for the recipient(s) named. Any unauthorised disclosure, copying, distribution, exploitation or the taking of any action in reliance on the content of this e-mail is strictly prohibited. If you have received this e-mail in error we would be obliged if you would delete the e-mail and attachments and notify the dispatcher by return e-mail or at +45 36 99 00 00. Please consider the environment before printing this mail note.
Re: Solr consultant recommendation
On 24 April 2013 16:28, Christian von Wendt-Jensen christian.vonwendt-jen...@infopaq.com wrote: Hi We have some detailed Solr setup issues we would like to discuss with a Solr Expert (certified or self-declared), but we are having some difficulties getting in contact with anyone here in Copenhagen, Denmark. Therefore I would like to hear if anybody out there can drop me some names of Solr Experts to contact, available in Denmark? [...] Have you looked at http://wiki.apache.org/solr/Support ? Regards, Gora
Re: Solr consultant recommendation
Actually no, I didn't. But I can see that I should have. Thanks!

Med venlig hilsen / Best Regards

Christian von Wendt-Jensen

From: Gora Mohanty g...@mimirtech.com
Reply-To: solr-user@lucene.apache.org
Date: Wed, 24 Apr 2013 13:02:03 +0200
To: solr-user@lucene.apache.org
Subject: Re: Solr consultant recommendation

On 24 April 2013 16:28, Christian von Wendt-Jensen christian.vonwendt-jen...@infopaq.com wrote:
Hi We have some detailed Solr setup issues we would like to discuss with a Solr Expert (certified or self-declared), but we are having some difficulties getting in contact with anyone here in Copenhagen, Denmark. Therefore I would like to hear if anybody out there can drop me some names of Solr Experts to contact, available in Denmark? [...]

Have you looked at http://wiki.apache.org/solr/Support ?

Regards, Gora
Re: Too many unique terms
Even if you could know ahead of time, 7M stop words is a lot to maintain. But assuming that your index is really pretty static, you could consider building it once, then creating the stopword file from the unique terms and re-indexing. You could consider cleaning them on the input side, or creating a custom filter that, say, checked against a dictionary (that you'd have to find). There's nothing that I know of that'll allow you to delete unique terms from a static index.

About a regex: you could use PatternReplaceCharFilterFactory to remove them from your input stream, but the trick is defining "useless". Part numbers are really useful in some situations, for instance. There's nothing standard because there's no standard. You haven't, for instance, provided any criteria for what "useless" is. Do you care about e-mails? What about accents? Unicode? The list gets pretty endless. You should be able to write a regex that removes everything non-alphanumeric or some such, although even that is a problem if you're indexing anything but plain-vanilla English. The Java pre-defined '\w', for instance, refers to [a-zA-Z_0-9]. Nary an accented character in sight.

Best Erick

On Tue, Apr 23, 2013 at 3:53 PM, Manuel Le Normand manuel.lenorm...@gmail.com wrote:
Hi there, looking at one of my shards (about 1M docs) I see a lot of unique terms, more than 8M, which is a significant part of my total term count. These are very likely useless terms, binaries or other meaningless numbers that come with a few of my docs. I am totally fine with deleting them so that these terms become unsearchable. Thinking about it, I note that:
1. It is impossible to know a priori whether a term is unique, so I cannot add them to my stop words.
2. I see a performance decrease because my cached chunks contain useless data, and I'm short on memory.
Assuming a constant index, is there a way of deleting all terms that are unique from at least the dictionary tim and tip files? Will I get a significant query-time performance increase? Does anybody know a class of regex that identifies meaningless terms that I could add to my updateProcessor?
Thanks, Manu
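As a rough illustration of the regex approach Erick cautions about: a check that drops tokens containing no letter at all, using the Unicode-aware \p{L} class rather than \w so accented text survives. The junk criterion and the sample tokens here are purely illustrative; as Erick says, "useless" is application-specific:

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Illustrative "useless term" test: a token is considered junk if it
// contains no Unicode letter at all (so pure numbers, punctuation runs,
// and hex-ish IDs without letters are dropped, while accented words pass).
public class JunkTerms {
    private static final Pattern HAS_LETTER = Pattern.compile("\\p{L}");

    static boolean looksLikeJunk(String term) {
        return !HAS_LETTER.matcher(term).find();
    }

    public static void main(String[] args) {
        List<String> tokens = List.of("café", "Solr", "4711-0038", "§§§", "c3po");
        List<String> kept = tokens.stream()
                .filter(t -> !looksLikeJunk(t))
                .collect(Collectors.toList());
        System.out.println(kept); // café, Solr and c3po survive
    }
}
```

The same predicate could back a custom TokenFilter or an update processor, but note it happily drops part numbers, which may or may not be what you want.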
Fwd: [solr 3.4] anomaly during distributed facet query with 102 shards
Hello list, we are dealing with an anomaly when doing a distributed facet query against 102 shards. The problem manifests itself in both the frontend Solr (router) and a shard. Each time the request is executed, a different shard is affected (at random, hence the anomaly). The query is:

http://router_host:router_port/solr/select?q=test&facet=true&facet.field=field_of_type_long&facet.limit=1330&facet.mincount=1&rows=1&facet.sort=index&facet.zeros=false&facet.offset=0

I have omitted the shards parameter. The router log:

request: http://10.155.244.181:9150/solr/select
  at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:430)
  at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:244)
  at org.apache.solr.handler.component.HttpCommComponent$1.call(SearchHandler.java:421)
  at org.apache.solr.handler.component.HttpCommComponent$1.call(SearchHandler.java:393)
  at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
  at java.util.concurrent.FutureTask.run(FutureTask.java:166)
  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
  at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
  at java.util.concurrent.FutureTask.run(FutureTask.java:166)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
  at java.lang.Thread.run(Thread.java:722)

Notice the port of the shard that is affected. That port changes all the time, even for the same request. The log entry is prepended with these lines (they are not in the pastebin link):

SEVERE: org.apache.solr.common.SolrException: Internal Server Error
Internal Server Error

The shard log:

Apr 24, 2013 11:08:49 AM org.apache.solr.common.SolrException log
SEVERE: java.lang.NullPointerException
  at java.io.StringReader.<init>(StringReader.java:50)
  at org.apache.lucene.queryParser.QueryParser.parse(QueryParser.java:203)
  at org.apache.solr.search.LuceneQParser.parse(LuceneQParserPlugin.java:80)
  at org.apache.solr.search.QParser.getQuery(QParser.java:142)
  at org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:81)
  at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:173)
  at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
  at org.apache.solr.core.SolrCore.execute(SolrCore.java:1368)
  at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356)
  at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252)
  at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
  at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
  at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
  at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
  at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
  at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
  at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
  at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293)
  at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:849)
  at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
  at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:454)
  at java.lang.Thread.run(Thread.java:722)

Apr 24, 2013 11:08:49 AM org.apache.solr.core.SolrCore execute
INFO: [] webapp=/solr path=/select params={} status=500 QTime=2

Apr 24, 2013 11:08:49 AM org.apache.solr.common.SolrException log
SEVERE: java.lang.NullPointerException
  at java.io.StringReader.<init>(StringReader.java:50)
  at org.apache.lucene.queryParser.QueryParser.parse(QueryParser.java:203)
  at org.apache.solr.search.LuceneQParser.parse(LuceneQParserPlugin.java:80)
  at org.apache.solr.search.QParser.getQuery(QParser.java:142)
  at org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:81)
  at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:173)
  at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
  at org.apache.solr.core.SolrCore.execute(SolrCore.java:1368)
  at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356)
  at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252)
  at
Re: Solr consultant recommendation
On 24/04/2013 11:58, Christian von Wendt-Jensen wrote:
Hi We have some detailed Solr setup issues we would like to discuss with a Solr Expert (certified or self-declared), but we are having some difficulties getting in contact with anyone here in Copenhagen, Denmark. Therefore I would like to hear if anybody out there can drop me some names of Solr Experts to contact, available in Denmark? We have issues regarding hardware setup (storage, RAM, cores pr instance, instances per machine), Solr Cloud vs Classic Master/Slave, shard size, to store or not to store, automated deployment of (more) shards, cache optimization, garbage collection issues, field collapsing, PERFORMANCE. You name it and we probably have it as an issue to discuss. We are currently running a setup of ~450 mio documents, receiving +1mio/day. Interesting challenge, if you ask me… If YOU are the one, then please get in contact.

Hi Christian, we are based in the UK but have worked for a client in Copenhagen with a large Solr index - in fact I was there last week visiting another potential client. You can find out more about us from www.flax.co.uk - generally we work remotely, but the flight from our local airport is only 1hr20m. Do get in touch if I can tell you more. Cheers Charlie

-- Charlie Hull Flax - Open Source Enterprise Search tel/fax: +44 (0)8700 118334 mobile: +44 (0)7767 825828 web: www.flax.co.uk
Re: Querying only for + character causes org.apache.lucene.queryParser.ParseException
One more thing: what about the hack you mentioned when the query is a combination of reserved query operators, such as +-, +, --++--+%, etc.? In those cases the application has to deal with all of them too. Greetings!

- Mensaje original -
De: Jérôme Étévé jerome.et...@gmail.com
Para: solr-user@lucene.apache.org
Enviados: Martes, 23 de Abril 2013 10:44:39
Asunto: Re: Querying only for + character causes org.apache.lucene.queryParser.ParseException

If you want to allow your users to search for '+', you can also define your '+' as being a regular ALPHA character. In config, delimiter_types.txt:

# We let +, # and * be part of normal words.
# This is to let c++, c#, c* and RD as words.
+ = ALPHA
# = ALPHA
* = ALPHA
= ALPHA
@ = ALPHA

Then in your solr.WordDelimiterFilterFactory, use types=delimiter_types.txt. You'll then be able to let your users search for + as part of a word. If you want to allow them to search for just '+', a little hacking is necessary in your client code. Personally, I just double quote the query if it's only one char long. It can't be harmful, and as it will turn your single + into "+", it will be considered a token (rather than part of the query syntax) by the parser. Providing you're using the edismax parser, it should be just fine for any other queries, like '+ foo', 'foo +', '++' ... J.

On 23 April 2013 15:09, Jorge Luis Betancourt Gonzalez jlbetanco...@uci.cu wrote:
Hi Kai: thanks for your reply. From what I've understood, this logic must be included in my application. Would it be possible to, for instance, use some regular expression at query time in my schema to avoid a query that contains only these characters? For instance + and + would be good catches to avoid. Thanks in advance!

- Mensaje original -
De: Kai Becker m...@kai-becker.com
Para: solr-user@lucene.apache.org
Enviados: Martes, 23 de Abril 2013 9:48:26
Asunto: Re: Querying only for + character causes org.apache.lucene.queryParser.ParseException

Hi, you need to escape that char in search terms. Special chars are + - ! ( ) { } [ ] ^ ~ * ? : \ / at the moment. The %2B is just the URL encoding, but it will still be a + for Solr, so just put a \ in front of the chars I mentioned. Cheers, Kai

Am 23.04.2013 um 15:41 schrieb Jorge Luis Betancourt Gonzalez:
Hi! Currently I'm working on a basic search engine. The main problem is that during some tests a problem was detected: if a user searches for only the + or - term, or the + string, it causes an exception in my application. The problem is caused by an org.apache.lucene.queryParser.ParseException in Solr. I get the same response if, from the Solr admin interface, I search for the + term. From what I've seen, the + character gets encoded into %2B, which causes the exception. Is there any way of escaping this character so it behaves like any other character? Or at least get no response in these cases? I'm using Solr 3.6.2, deployed in Tomcat 7. Greetings!

http://www.uci.cu

-- Jerome Eteve +44(0)7738864546 http://www.eteve.net/
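For completeness, the client-side escaping Kai describes can be sketched as below. The character set follows his list and should be checked against your parser version, since the exact set varies; if the client already uses SolrJ, ClientUtils.escapeQueryChars serves the same purpose:

```java
// Sketch of client-side query escaping: prefix each query-parser special
// character with a backslash. SPECIAL follows the list from Kai's mail
// (plus the double quote); adjust for your Solr/Lucene version.
public class QueryEscape {
    private static final String SPECIAL = "+-!(){}[]^~*?:\\/\"";

    static String escape(String userInput) {
        StringBuilder sb = new StringBuilder(userInput.length() * 2);
        for (char c : userInput.toCharArray()) {
            if (SPECIAL.indexOf(c) >= 0) sb.append('\\');
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("+"));   // prints \+
        System.out.println(escape("c++")); // prints c\+\+
    }
}
```

With this in place a bare "+" reaches Solr as "\+" and is parsed as a term rather than as query syntax.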
Solr as a jar file with Embedded Jetty
Hi; I am new to Solr. I was using Solr as a war file and deploying it into Tomcat. However, I have decided to use Solr as a jar file with embedded Jetty. Previously I did it like this: when I ran dist with ant I got a .war file of Solr and deployed it to Tomcat. Now I want to use it as a jar file, like the start.jar under the example folder. What should I do, and what is solr-core-4.2.1-SNAPSHOT? When you change code and want to use Solr in a production environment, what do you do? Should I use that start.jar, and how do I compile it?
solr.StopFilterFactory doesn't work with wildcard
Good day! I have a problem with solr.StopFilterFactory and wildcard text search. For a query like 'hp* pavilion* series* d4*', where 'series' is a stop word, I receive the error: 'analyzer returned no terms for multiTerm term: series'. But for a query like 'hp* pavilion* series d4*', I receive the expected results. Could you help me? I have the field type for search as below:

<fieldType name="search_string" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory" />
    <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
  </analyzer>
  <analyzer type="multiterm">
    <tokenizer class="solr.WhitespaceTokenizerFactory" />
    <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
  </analyzer>
</fieldType>

Solr version: solr-spec 4.0.0.2012.10.06.03.04.33 solr-impl 4.0.0 1394950 - rmuir - 2012-10-06 03:04:33 lucene-spec 4.0.0 lucene-impl 4.0.0 1394950 - rmuir - 2012-10-06 03:00:40

-- View this message in context: http://lucene.472066.n3.nabble.com/solr-StopFilterFactory-doesn-t-work-with-wildcard-tp4058581.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr as a jar file with Embedded Jetty
I'm not following exactly what you want, but the recommendation you'll get from the majority of folks is to simply use Solr's example/ directory as a starting point. start.jar is Jetty and it's how most of us deploy Solr, and I'll recommend going that route. Solr in Jetty is a .war file. If you want to diverge from that path, you're in unrecommended territory. Erik On Apr 24, 2013, at 09:41 , Furkan KAMACI wrote: Hi; I am new to Solr and I was using Solr as war file and deploying it into Tomcat. However I decided to use Solr as jar file with Embedded Jetty. I was doing like that: when I run dist at ant I get .war file of Solr and used to deploy to Tomcat. I want to use it as a jar file as like start.jar under example folder. What should I do, what is that solr-core-4.2.1-SNAPSHOT? When you change code and want to use Solr in a production environment what do you do. Should I use that start.jar, how to compile it.
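For concreteness, the route Erik describes is simply running Jetty from the example directory of a stock Solr download (paths assume the standard 3.x/4.x layout; the -Dsolr.solr.home flag is only needed if your config lives elsewhere):

```
cd example
java -jar start.jar
# or, pointing at a non-default solr home:
java -Dsolr.solr.home=/path/to/solr/home -jar start.jar
```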
Re: Autocommit and replication have been slowing down
Hi Shawn, Thanks for the lesson! I really appreciate your help. I'll figure out a way to use that knowledge to solve my problem. Best Regards -- View this message in context: http://lucene.472066.n3.nabble.com/Autocommit-and-replication-have-been-slowing-down-tp4058361p4058584.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr 3.6.1: changing a field from stored to not stored
I would create a new core as a slave of the existing configuration, without replicating the core schema and configuration. This way I can get the information from one index to the other while saving space, as fields in the new schema are mostly not stored. After the replication I would swap the cores so that the online core points to the right index dir and conf, i.e. the one with fewer stored fields. Maj On 24 April 2013 01:48, Petersen, Robert robert.peter...@mail.rakuten.comwrote: Hey I just want to verify one thing before I start doing this: function queries only require fields to be indexed but don't require them to be stored, right? -Original Message- From: Petersen, Robert [mailto:robert.peter...@mail.rakuten.com] Sent: Tuesday, April 23, 2013 4:39 PM To: solr-user@lucene.apache.org Subject: RE: Solr 3.6.1: changing a field from stored to not stored Good info, Thanks Hoss! I was going to add a more specific fl= parameter to my queries at the same time. Currently I am doing fl=*,score so that will have to be changed. -Original Message- From: Chris Hostetter [mailto:hossman_luc...@fucit.org] Sent: Tuesday, April 23, 2013 4:18 PM To: solr-user@lucene.apache.org Subject: Re: Solr 3.6.1: changing a field from stored to not stored : index? I noticed I am unnecessarily storing some fields in my index and : I'd like to stop storing them without having to 'reindex the world' and : let the changes just naturally percolate into my index as updates come : in the normal course of things. Do you guys think I could get away with : this? Yes, you can easily get away with this type of change w/o re-indexing, however you won't gain any immediate index size savings until each and every existing doc has been reindexed and the old copies expunged from the index via segment merges. The one hiccup that can affect people when doing this is what happens if you use something like fl=* (and likely hl=* as well) ... 
many places in Solr will try to avoid failure if a stored field is found in the index which isn't defined in the schema, and treat that stored value as a string (legacy behavior designed to make it easier for people to point Solr at old lucene indexes built w/o using Solr) ... so if these stored values are not strings, you might get some weird data in your response for these documents. -Hoss
RE: ranking score by fields
Highlighter doesn't help me. It marks terms but not the search text. E.g. I have a doc with field1=apache lucene, field2=apache solr. I search for apache solr with the AND default option. I find this doc with highlighted field1=emapache/em lucene. This is a bad result for me. Look, I want to do something like this: Search text: apache solr RESULT: Found in field1 Doc1 Doc2 Doc3 ... Found in field2 Doc101 Doc102 Doc103 ... The search result has two (or more) parts. Each part is sorted by another field, e.g. by field date desc. It means I need to get the sort right and I need some flag to insert the Found in field2 text. I try q=apache solr fl=field1, field2, score, val1:$q1, val2:$q2 defType=dismax qf=field1^1000 field2^1 q1={!dismax qf=field1 v='apache solr'} q2={!dismax qf=field2 v='apache solr'} Now I have flags: val1 > 0 means found in field1. But now I have a problem with sort: I can't use val1, val2 in sort :(. And now my questions: 1. Can I use my custom fields val1, val2 in sort? With a formula. Or the params $q1, $q2? 2. Maybe I can set the score by a formula at query time? 3. Your variant? Thanks. Alex. -Original Message- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: Tuesday, April 23, 2013 1:53 AM To: solr-user@lucene.apache.org Subject: Re: ranking score by fields You can sometimes use the highlighter component to do this, but it's a little tricky... But note your syntax isn't doing what you expect. (field1:apache solr) parses as field1:apache defaultfield:solr. You want field1:(apache solr) debug=all is your friend for these kinds of things, especially the parsed query section Best Erick On Mon, Apr 22, 2013 at 4:44 AM, Каскевич Александр akaskev...@prontosoft.by wrote: Hi. I want to make the subject but don't know exactly how I can do it. Example. I have an index with field1, field2, field3. I make a query like: (field1:apache solr) OR (field2:apache solr) OR (field3:apache solr) And I want to know: was this doc found by field1 or by field2 or by field3? 
I tried to make it like this: (field1:apache solr)^100 OR (field2:apache solr)^10 OR (field3:apache solr)^1 But the problem is that I don't know the range, minimum and maximum value of the score for each field. With other types of similarities (BM25 or others) it is the same situation. I can't find information about this in the manual. Also, I tried to use relevance functions, e.g. termfreq, but they work only with terms, not with phrases like apache solr. Maybe I missed something, or you have another idea to do this? And one more thing: I am not a Java programmer, and the best way for me is not to write any plugins for Solr. Thanks. Alex.
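On question 1: Solr (3.1 and later) can sort by a function query, so the `$q1`/`$q2` sub-query scores can drive the sort directly via the `query()` function. A sketch of the full request, reusing Alex's parameter names (not verified against his schema, and the `date` tiebreaker is an assumption from his description):

```
q=apache solr
defType=dismax
qf=field1^1000 field2^1
q1={!dismax qf=field1 v='apache solr'}
q2={!dismax qf=field2 v='apache solr'}
fl=field1,field2,score,val1:$q1,val2:$q2
sort=query($q1) desc, date desc
```

Documents matching field1 would then sort ahead of the rest without any plugin code, which fits the "no Java" constraint.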
Re: Indexing PDF Files
Have you tried using absolute paths in the relevant lib directives? That will cleanly split the problem into 'still not working' and 'wrong relative path'. Regards, Alex. On Wed, Apr 24, 2013 at 9:02 AM, Furkan KAMACI furkankam...@gmail.com wrote: <lib dir="../../../contrib/extraction/lib" regex=".*\.jar" /> <lib dir="../../../dist/" regex="solr-cell-\d.*\.jar" /> Personal blog: http://blog.outerthoughts.com/ LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch - Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book)
Re: Indexing PDF Files
Also, at Solr startup time it logs what it loads from those lib elements, so you can see whether it is loading the files you intend to or not. Erik On Apr 24, 2013, at 10:05 , Alexandre Rafalovitch wrote: Have you tried using absolute paths in the relevant lib directives? That will cleanly split the problem into 'still not working' and 'wrong relative path'. Regards, Alex. On Wed, Apr 24, 2013 at 9:02 AM, Furkan KAMACI furkankam...@gmail.com wrote: <lib dir="../../../contrib/extraction/lib" regex=".*\.jar" /> <lib dir="../../../dist/" regex="solr-cell-\d.*\.jar" /> Personal blog: http://blog.outerthoughts.com/ LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch - Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book)
Re: Fields issue 4.2.1
Field names don't absolutely have to follow Java naming conventions, but if they don't then they are not GUARANTEED to work in all contexts in Solr. The fl parameter is one of those contexts. You can work around it by using a function query: field(1234566_user) -- Jack Krupansky -Original Message- From: William Bell Sent: Wednesday, April 24, 2013 1:16 AM To: solr-user@lucene.apache.org Subject: Fields issue 4.2.1 I am getting no results when using dynamic field, and the name begins with numbers. This is okay on 3.6, but does not work in 4.2. dynamic name: 1234566_user fl=1234566_user If I change it to name: user_1234566 it works. This appears to be a bug. -- Bill Bell billnb...@gmail.com cell 720-256-8076
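Jack's workaround in request form might look like the following (the field name is the one from Bill's report; in Solr 4.x the `field(...)` function can be given an output alias in `fl`, which restores a readable key in the response):

```
fl=score,user_val:field(1234566_user)
```

This sidesteps the parsing of the raw field name in `fl` while still returning the stored value.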
Re: Book text with chapter line number
Chapter seems too broad and line seems too narrow -- have you thought about paragraph level? Something like: docID, book fields (title, author, publisher, etc), chapter fields (#, title, pages, etc), section fields (title, #, etc), sub-sectionN fields, paragraph text, lines Seems like line #'s would only be useful for display so just store the lines the paragraph covers. On Tue, Apr 23, 2013 at 7:51 PM, Walter Underwood wun...@wunderwood.org wrote: If you can represent your books in XML, then MarkLogic could do the job very cleanly. It isn't free, but it is very good. wunder On Apr 23, 2013, at 6:47 PM, Jason Funk wrote: Is there a better tool than Solr to use for my situation? On Apr 23, 2013, at 5:04 PM, Jack Krupansky j...@basetechnology.com wrote: There is no simple, obvious, and direct approach, right out of the box. Sure, you can highlight passages of raw text, right out of the box, but that won't give you chapters, pages, and line numbers. To do all of that, you would have to either: 1. Add chapter, page, and line number as part of the payload for each word. And add some custom document transformers to access the information. or 2. Index each line as a separate Solr document, with fields for book, chapter, page, and line number. -- Jack Krupansky -Original Message- From: Jason Funk Sent: Tuesday, April 23, 2013 5:02 PM To: solr-user@lucene.apache.org Subject: Book text with chapter line number Hello. I'm trying to figure out if Solr is going to work for a new project that I am wanting to build. At its heart it's a book text searching application. Each book is broken into chapters and each chapter is broken into lines. I want to be able to search these books and return relevant sections of the book and display the results with chapter and line number. I'm not sure how I would structure my data so that it's efficient and functional. 
I could simply treat each line of text as a document, which would provide some of the functionality, but what if the search query spanned two lines? Then it seems the passage the user was searching for wouldn't be returned. I could treat each book as a document and use highlighting to find the context, but that seems to limit weighting/results for best matches, as well as causing difficulty in finding chapter/line numbers. What is the best way to do this with Solr? Is there a better tool to use to solve my problem? -- Walter Underwood wun...@wunderwood.org
Re: Book text with chapter line number
It's easy to then store a map of term position to line-number and page-number along with each paragraph, or? Paul On 24 avr. 2013, at 16:24, Timothy Potter wrote: Chapter seems too broad and line seems too narrow -- have you thought about paragraph level? Something like: docID, book fields (title, author, publisher, etc), chapter fields (#, title, pages, etc), section fields (title, #, etc), sub-sectionN fields, paragraph text, lines Seems like line #'s would only be useful for display so just store the lines the paragraph covers. On Tue, Apr 23, 2013 at 7:51 PM, Walter Underwood wun...@wunderwood.org wrote: If you can represent your books in XML, then MarkLogic could do the job very cleanly. It isn't free, but it is very good. wunder On Apr 23, 2013, at 6:47 PM, Jason Funk wrote: Is there a better tool than Solr to use for my situation? On Apr 23, 2013, at 5:04 PM, Jack Krupansky j...@basetechnology.com wrote: There is no simple, obvious, and direct approach, right out of the box. Sure, you can highlight passages of raw text, right out of the box, but that won't give you chapters, pages, and line numbers. To do all of that, you would have to either: 1. Add chapter, page, and line number as part of the payload for each word. And add some custom document transformers to access the information. or 2. Index each line as a separate Solr document, with fields for book, chapter, page, and line number. -- Jack Krupansky -Original Message- From: Jason Funk Sent: Tuesday, April 23, 2013 5:02 PM To: solr-user@lucene.apache.org Subject: Book text with chapter line number Hello. I'm trying to figure out if Solr is going to work for a new project that I am wanting to build. At it's heart it's a book text searching application. Each book is broken into chapters and each chapter is broken into lines. I want to be able to search these books and return relevant sections of the book and display the results with chapter and line number. 
I'm not sure how I would structure my data so that it's efficient and functional. I could simply treat each line of text as a document which would provide some of the functionality but what if the search query spanned two lines? Then it seems the passage the user was searching for wouldn't be returned. I could treat each book as a document and use highlighting to find the context but that seems to limit weighting/results for best matches as well as difficultly in finding chapter/line numbers. What is the best way to do this with Solr? Is there a better tool to use to solve my problem? -- Walter Underwood wun...@wunderwood.org
Re: solr.StopFilterFactory doesn't work with wildcard
Well, what is happening is that the query parser detects a prefix query (series*) and then does term analysis on the prefix alone (series), which you probably have in your stop words list, which causes the analyzer to return... nothing, which is what the error is complaining about. You can work around this by querying for serie* (as long as serie is not also a stop word). In any case, technically, the stop filter is doing exactly what it is supposed to do. In all honesty, I can't imagine a context in which a noun such as series would be on a stop word list. What's your thinking on why it is there?? -- Jack Krupansky -Original Message- From: Dmitry Baranov Sent: Wednesday, April 24, 2013 9:43 AM To: solr-user@lucene.apache.org Subject: solr.StopFilterFactory doesn't work with wildcard Good day! I have a problem with the solr.StopFilterFactory and wildcard text search. For a query like 'hp* pavilion* series* d4*', where 'series' is a stop word, I receive the error: 'analyzer returned no terms for multiTerm term: series'. But for a query like 'hp* pavilion* series d4*', I receive the expected results. Could you help me? I have the field type for search as below:
<fieldType name="search_string" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
  </analyzer>
  <analyzer type="multiterm">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
  </analyzer>
</fieldType>
Solr version: solr-spec 4.0.0.2012.10.06.03.04.33 solr-impl 4.0.0 1394950 - rmuir - 2012-10-06 03:04:33 lucene-spec 4.0.0 lucene-impl 4.0.0 1394950 - rmuir - 2012-10-06 03:00:40 -- View this message in context: http://lucene.472066.n3.nabble.com/solr-StopFilterFactory-doesn-t-work-with-wildcard-tp4058581.html Sent from the Solr - User mailing list archive at Nabble.com.
Solr faceted search UI
Hi, I am working on a POC where I have to display faceted search results on a web page. Can anybody please suggest what setup I need in order to display them? I would prefer Java technologies. Just to mention, I have Solr Cloud running on a remote server. I would like to know: 1. Should I use an MVC framework? 2. How will my local application interact with the remote Solr server? 3. How will I send queries through Java code, and what technology should I use to display the faceted search results? Please help me with this. Thanks, -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-faceted-search-UI-tp4058598.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Update on shards
On 4/24/2013 12:49 AM, Arkadi Colson wrote: We are using tomcat so we'll just wait. Hopefully it's fixed in 4.3 but we have a work around for now so... What exactly is the difference between jetty and tomcat. We are using tomcat because we've read somewhere that it should be more robust in heavily loaded production environments. Both are servlet containers - a Java executable server program that can run other programs written using the Java Servlet API. The servlet API was invented by Sun, who also invented Java itself. For comparison purposes, first think about Apache's HTTPD, which is a web server designed to serve files. Through its rich modular capability, it does have the ability to run web applications, but the core HTTPD is designed to grab a file off the hard drive and send it to a user. A servlet container is different. You can think of a servlet container as a smart web server designed from the ground up to run web applications. http://en.wikipedia.org/wiki/Java_Servlet Solr is a servlet. By itself, Solr can't run. It requires a servlet container. Here is what wikipedia has to say about the histories of the two projects: http://en.wikipedia.org/wiki/Apache_Tomcat#History http://en.wikipedia.org/wiki/Jetty_%28Web_server%29#History If you google difference between jetty and tomcat you'll find a lot of links. The one written by Jetty folks is particularly detailed, but has an obvious bias. With emphasis on tuning, you can probably get good performance out of either container. Jetty is smaller with a default configuration, but as others have pointed out, most of the resource utilization will be done by Solr, not the container. There is one major reason that I chose to use Jetty. It was already there in the Solr download. The reasons that I have stuck with it even after having time to research: It works well, and it is extensively tested every time anyone runs the tests that come with the Solr build system. Thanks, Shawn
Re: Update on shards
Thx! On 04/24/2013 04:46 PM, Shawn Heisey wrote: On 4/24/2013 12:49 AM, Arkadi Colson wrote: We are using tomcat so we'll just wait. Hopefully it's fixed in 4.3 but we have a work around for now so... What exactly is the difference between jetty and tomcat. We are using tomcat because we've read somewhere that it should be more robust in heavily loaded production environments. Both are servlet containers - a Java executable server program that can run other programs written using the Java Servlet API. The servlet API was invented by Sun, who also invented Java itself. For comparison purposes, first think about Apache's HTTPD, which is a web server designed to serve files. Through its rich modular capability, it does have the ability to run web applications, but the core HTTPD is designed to grab a file off the hard drive and send it to a user. A servlet container is different. You can think of a servlet container as a smart web server designed from the ground up to run web applications. http://en.wikipedia.org/wiki/Java_Servlet Solr is a servlet. By itself, Solr can't run. It requires a servlet container. Here is what wikipedia has to say about the histories of the two projects: http://en.wikipedia.org/wiki/Apache_Tomcat#History http://en.wikipedia.org/wiki/Jetty_%28Web_server%29#History If you google difference between jetty and tomcat you'll find a lot of links. The one written by Jetty folks is particularly detailed, but has an obvious bias. With emphasis on tuning, you can probably get good performance out of either container. Jetty is smaller with a default configuration, but as others have pointed out, most of the resource utilization will be done by Solr, not the container. There is one major reason that I chose to use Jetty. It was already there in the Solr download. The reasons that I have stuck with it even after having time to research: It works well, and it is extensively tested every time anyone runs the tests that come with the Solr build system. 
Thanks, Shawn
Re: JVM Parameters to Startup Solr?
On 4/24/2013 2:02 AM, Furkan KAMACI wrote: Lucidworks Solr Guide says that: If you are using Sun's JVM, add the -server command-line option when you start Solr. This tells the JVM that it should optimize for a long running, server process. If the Java runtime on your system is a JRE, rather than a full JDK distribution (including javac and other development tools), then it is possible that it may not support the -server JVM option Does any folks using -server parameter? Also what parameters you are using to start up Solr? I mean parallel garbage collector vs.? The answers to your questions are hotly debated in Java communities. This is treading on religious ground. :) I never actually use the -server parameter. When java runs on my multiprocessor 64-bit Linux machines, it already knows it should be in server mode. If you run on a platform that Java decides is a client machine, you might need the -server parameter. Most people agree that you should use the CMS collector. You won't find much agreement about anything else on the startup commandline. I can tell you what I use. It may work for you, apart from the specific value of the -Xmx parameter. These parameters result in fairly low GC pause times for me. I can tell you that I have arrived at these parameters through testing that wasn't very methodical, so they are probably not the optimal settings: -Xmx6144M -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=75 -XX:NewRatio=3 -XX:MaxTenuringThreshold=8 -XX:+CMSParallelRemarkEnabled -XX:+ParallelRefProcEnabled -XX:+UseLargePages -XX:+AggressiveOpts The G1 collector is supposed to work for all situations without tuning, but it didn't work for me. GC pause times were just as long as when I had a badly tuned CMS setup. Thanks, Shawn
Noob question: why doesn't this query work?
So, I'm executing the following query: id:6178dB=@Fm AND i_0:613OFS AND (i_3:6111 OR i_3:1yyy\~) AND (NOT id:6178ZwWj5m OR numfields:[* TO 6114] OR d_4:false OR NOT i_4:6142E=m) It's machine generated, which explains the redundancies. The problem is that the query returns no results, but there is a document that should match: it has an id of 6178dB=@Fm, an i_0 field of 613OFS, an i_3 field of 6111, a numfields of 611A, a d_4 of true (but this shouldn't matter), and an i_4 of 6142F1S. The problem seems to be with the negations. I did try to replace the NOTs with -'s, so, for example, NOT id:6178ZwWj5m would become -id:6178ZwWj5m, and this didn't seem to work. Help? What's wrong with the query? Thanks. Brian
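One likely culprit, assuming the standard Lucene query parser: a purely negative clause like NOT id:6178ZwWj5m inside an OR group matches nothing on its own, because negation only excludes documents from a set of positive matches. A common rewrite (a sketch, not verified against this index) anchors each negation to the match-all query `*:*`:

```
id:6178dB=@Fm AND i_0:613OFS AND (i_3:6111 OR i_3:1yyy\~)
AND ((*:* -id:6178ZwWj5m) OR numfields:[* TO 6114] OR d_4:false OR (*:* -i_4:6142E=m))
```

Running both forms through debugQuery=true and comparing the parsed queries would confirm whether the negations are the issue.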
Re: Solr faceted search UI
It's a pretty subjective and opinionated kinda thing here, as UIs are built with all sorts of technologies and even though I'm quite opinionated about how *I* would build something I work with a lot of folks that have their own preferences or organizational standards/constraints on what they can use. Pragmatically speaking, it's best to use what you or your team are familiar with. That being said... if this is strictly for a PoC and not something you need to put into production as-is, you can leverage the /browse feature powered by Solr's VelocityResponseWriter (wt=velocity) that is in Solr's example configuration. I'm not aware of any Java-based framework out there for Solr - there's so many choices (Struts? Tapestry? JSPs? etc) that any single one of them would be off-putting to others. In Java, the SolrJ library is what you want to use for remote access to Solr. You'll get back a Java response object that you can navigate to pull out the facet information to hand to your view tier. If you're ok with something not Java (but can be deployed in a Java container and can interact with Java) then give projectblacklight.org a try - it's a Ruby on Rails full featured front-end to Solr. There's also solrstrap that looks like a fun place to do some lightweight PoC development. Erik On Apr 24, 2013, at 10:43 , richa wrote: Hi, I am working on a POC, where I have to display faceted search result on web page. can anybody please help me to suggest what all set up I need to configure to display. I would prefer java technologies. Just to mention, I have solr cloud running on remote server. I would like to know: 1. Should I use MVC framework? 2. How will my local interact with remote solr server? 3. How will I send query through java code and what technology I should use to display faceted search result? Please help me on this. 
Thanks, -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-faceted-search-UI-tp4058598.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr faceted search UI
Hi richa, You can use SolrJ (http://wiki.apache.org/solr/Solrj#Reading_Data_from_Solr) to query your Solr index. On the wiki page indicated, you will see an example of faceted search using SolrJ. A 2009 article by Yonik available on searchhub http://searchhub.org/2009/09/02/faceted-search-with-solr/ is a good tutorial on faceted search. Whether you go for an MVC framework or not is up to you. It is recommended, though, to develop search engine applications in a Service Oriented Architecture. Regards, Maj On 24 April 2013 16:43, richa striketheg...@gmail.com wrote: Hi, I am working on a POC, where I have to display faceted search result on web page. can anybody please help me to suggest what all set up I need to configure to display. I would prefer java technologies. Just to mention, I have solr cloud running on remote server. I would like to know: 1. Should I use MVC framework? 2. How will my local interact with remote solr server? 3. How will I send query through java code and what technology I should use to display faceted search result? Please help me on this. Thanks, -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-faceted-search-UI-tp4058598.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Book text with chapter line number
It seems that the normal use case is line=document with some exception for cross-line indexing. The edge case could be solved by either indexing additional 'two-line' documents with lower boost or to have 'context' field with line before/after where applicable (e.g. within same para). Then there might also be some trick around using highlighter to figure out whether the match came from the 'line' field or from 'context' field. I also like payload idea, though there does not seem to be too much information around on using that. Regards, Alex. Personal blog: http://blog.outerthoughts.com/ LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch - Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book) On Wed, Apr 24, 2013 at 10:28 AM, Paul Libbrecht p...@hoplahup.net wrote: It's easy to then store a map of term position to line-number and page-number along with each paragraph, or? Paul On 24 avr. 2013, at 16:24, Timothy Potter wrote: Chapter seems too broad and line seems too narrow -- have you thought about paragraph level? Something like: docID, book fields (title, author, publisher, etc), chapter fields (#, title, pages, etc), section fields (title, #, etc), sub-sectionN fields, paragraph text, lines Seems like line #'s would only be useful for display so just store the lines the paragraph covers. On Tue, Apr 23, 2013 at 7:51 PM, Walter Underwood wun...@wunderwood.org wrote: If you can represent your books in XML, then MarkLogic could do the job very cleanly. It isn't free, but it is very good. wunder On Apr 23, 2013, at 6:47 PM, Jason Funk wrote: Is there a better tool than Solr to use for my situation? On Apr 23, 2013, at 5:04 PM, Jack Krupansky j...@basetechnology.com wrote: There is no simple, obvious, and direct approach, right out of the box. Sure, you can highlight passages of raw text, right out of the box, but that won't give you chapters, pages, and line numbers. 
To do all of that, you would have to either: 1. Add chapter, page, and line number as part of the payload for each word. And add some custom document transformers to access the information. or 2. Index each line as a separate Solr document, with fields for book, chapter, page, and line number. -- Jack Krupansky -Original Message- From: Jason Funk Sent: Tuesday, April 23, 2013 5:02 PM To: solr-user@lucene.apache.org Subject: Book text with chapter line number Hello. I'm trying to figure out if Solr is going to work for a new project that I am wanting to build. At it's heart it's a book text searching application. Each book is broken into chapters and each chapter is broken into lines. I want to be able to search these books and return relevant sections of the book and display the results with chapter and line number. I'm not sure how I would structure my data so that it's efficient and functional. I could simply treat each line of text as a document which would provide some of the functionality but what if the search query spanned two lines? Then it seems the passage the user was searching for wouldn't be returned. I could treat each book as a document and use highlighting to find the context but that seems to limit weighting/results for best matches as well as difficultly in finding chapter/line numbers. What is the best way to do this with Solr? Is there a better tool to use to solve my problem? -- Walter Underwood wun...@wunderwood.org
Re: Re: Support of field variants in solr
You can certainly specify all your aliases in the request. The request handler is just there to simplify the client by allowing it to specify a different URL, with everything else mapped on the server. And, of course, with a request handler you can lock the parameters to force them. Regarding language detection during indexing, there is a module for that: http://wiki.apache.org/solr/LanguageDetection . Hopefully that will be sufficient. Regards, Alex. Personal blog: http://blog.outerthoughts.com/ LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch - Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book) On Tue, Apr 23, 2013 at 4:45 PM, Timo Schmidt timo-schm...@gmx.net wrote: Ok, thanks for this hint. I have two further questions to understand it completely. Setting up a custom request handler makes it easier to avoid all the mapping parameters in the query, but it would also be possible with one request handler and all the mapping in the request arguments, right? What about indexing: is there also a mechanism like this, or should the application decide which target field to use? Sent: Tuesday, 23 April 2013 at 02:32 From: Alexandre Rafalovitch arafa...@gmail.com To: solr-user@lucene.apache.org Subject: Re: Support of field variants in solr To route different languages, you could use different request handlers and do different alias mapping. There are two alias mappings: On the way in, for eDisMax: https://wiki.apache.org/solr/ExtendedDisMax#Field_aliasing_.2BAC8_renaming On the way out: https://wiki.apache.org/solr/CommonQueryParameters#Field_alias Between the two, you can make sure that all searches to /searchES map the 'content' field to 'content_es' and for /searchDE map 'content' to 'content_de'. Hope this helps, Alex. 
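For indexing-time detection, the module Alex links is wired up as an update request processor chain in solrconfig.xml. A minimal sketch following the wiki's pattern (the field names `content` and `language_s` are assumptions for illustration):

```
<updateRequestProcessorChain name="langid">
  <processor class="org.apache.solr.update.processor.LangDetectLanguageIdentifierUpdateProcessorFactory">
    <str name="langid.fl">content</str>
    <str name="langid.langField">language_s</str>
    <str name="langid.fallback">en</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
```

The chain is then selected per update request with update.chain=langid, and the detected language lands in the configured field, from which the application (or a further processor) can route content to per-language fields.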
Personal blog: http://blog.outerthoughts.com/ LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch - Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book) On Mon, Apr 22, 2013 at 2:31 PM, Timo Schmidt timo-schm...@gmx.net wrote: Hi together, I am Timo and work for a Solr implementation company. During the last projects we came to realize that we need to be able to generate different variants of a document. Example 1 (Language): To handle all documents in one Solr core, we need a field variant for each language. Content for the Spanish content field: <field name="content" type="text_es" indexed="true" stored="true" variant="es" /> Content for the German content field: <field name="content" type="text_de" indexed="true" stored="true" variant="de" /> Each of these fields can be configured in the Solr schema to act optimally for the specific target language. Example 2 (Stores): We have customers who want to sell the same product in different stores for different prices. Price in Frankfurt: <field name="price" type="sfloat" indexed="true" stored="true" variant="fr" /> Price in Paris: <field name="price" type="sfloat" indexed="true" stored="true" variant="pr" /> To solve this in an optimal way it would be nice if this worked completely transparently inside Solr by defining a "variantQuery". A select query could look like this: select?variantQuery=fr&qf=price,content Additionally, the following should be possible: if no variant is present, the behavior should be as before, so the field is relevant for all queries. The setting variant="*" would mean that there can be several wildcard variants defined in a committed document. This makes sense when the data type is the same for all variants and you have many variants (like in the price example). The same as at query time should be possible at indexing time. 
I know that we can do something like this with dynamic fields too, but then we need to resolve the concrete fields during index and query time at the application level. That is possible, but it would be nicer to have a concept like this in Solr; also, working with facets is easier with this approach, since the concrete field name does not need to be known in the application. So my questions are: What do you think about this approach? Is it better to work with dynamic fields? Is it reasonable when you have 200 or more variants of a document? What needs to be done in Solr to have something like this variant attribute for fields? Do you have other approaches?
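A sketch of the per-language request handlers Alexandre describes earlier in the thread, combining the query-side (eDisMax f.alias.qf) and response-side (fl alias) mappings. Handler and field names are illustrative, and the fl alias syntax requires a Solr version that supports field aliasing in fl:

```xml
<!-- hypothetical handler: queries and responses both speak of 'content' -->
<requestHandler name="/searchES" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <!-- query-side alias: searches against 'content' hit content_es -->
    <str name="qf">content</str>
    <str name="f.content.qf">content_es</str>
    <!-- response-side alias: return content_es under the name 'content' -->
    <str name="fl">id,content:content_es</str>
  </lst>
</requestHandler>
```

A /searchDE handler would look the same with content_de substituted, so the client never needs to know the language-specific field names.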
Re: ranking score by fields
Hi Alex, Back to your original requirement, I think you can do the job on the client side. As Erick noted, the highlighter component can help. You are right that it marks terms but not the search text. But analyzing the search text with the appropriate analyzer will give you the terms of your text as used by the highlighter component. Hope this helps. Cheers, Maj On 24 April 2013 16:02, Каскевич Александр akaskev...@prontosoft.by wrote: Highlighter doesn't help me. It marks terms but not the search text. E.g. I have a doc with field1=apache lucene, field2=apache solr. I search for apache solr with the AND default option. I find this doc with highlighted field1=<em>apache</em> lucene. This is a bad result for me. Look, I want to do something like this: Search text: apache solr RESULT: Found in field1 Doc1 Doc2 Doc3 ... Found in field2 Doc101 Doc102 Doc103 ... The search result has two (or more) parts. Each part is sorted by another field, e.g. by field date desc. That means I need the right sort, and I need some flag to insert the Found in field2 text. I tried q=apache solr&fl=field1,field2,score,val1:$q1,val2:$q2&defType=dismax&qf=field1^1000 field2^1&q1={!dismax qf=field1 v='apache solr'}&q2={!dismax qf=field2 v='apache solr'} Now I have flags: val1>0 means found in field1. But now I have a problem with sort: I can't use val1, val2 in sort :(. And now my questions: 1. Can I use my custom fields val1, val2 in sort? With a formula. Or the params $q1, $q2? 2. Maybe I can set the score by a formula at query time? 3. Your variant? Thanks. Alex. -Original Message- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: Tuesday, April 23, 2013 1:53 AM To: solr-user@lucene.apache.org Subject: Re: ranking score by fields You can sometimes use the highlighter component to do this, but it's a little tricky... But note your syntax isn't doing what you expect. (field1:apache solr) parses as field1:apache defaultfield:solr.
You want field1:(apache solr). debug=all is your friend for these kinds of things, especially the parsed query section. Best, Erick On Mon, Apr 22, 2013 at 4:44 AM, Каскевич Александр akaskev...@prontosoft.by wrote: Hi. I want to do what the subject says but don't know exactly how. Example: I have an index with field1, field2, field3. I make a query like: (field1:apache solr) OR (field2:apache solr) OR (field3:apache solr) And I want to know: was this doc found by field1, by field2 or by field3? I tried to make it like this: (field1:apache solr)^100 OR (field2:apache solr)^10 OR (field3:apache solr)^1 But the problem is that I don't know the range, minimum and maximum value of the score for each field. With other types of similarities (BM25 or others) it is the same situation. I can't find information about this in the manual. Also, I tried to use relevance functions, e.g. termfreq, but they work only with terms, not with phrases like apache solr. Maybe I missed something, or do you have another idea how to do this? And also, I am not a Java programmer, and the best way for me is to not write any plugins for Solr. Thanks. Alex.
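If the Solr version in use supports sorting by function queries, one option worth trying for the "found in field1 first" grouping is to sort on the embedded queries directly instead of on the val1/val2 pseudo-fields. A sketch (untested against this schema; q1/q2 are the parameters from the message above, and the date field is assumed to exist):

```
q=apache solr&defType=dismax&qf=field1^1000 field2
  &q1={!dismax qf=field1 v='apache solr'}
  &q2={!dismax qf=field2 v='apache solr'}
  &fl=field1,field2,score,val1:$q1,val2:$q2
  &sort=query($q1,0) desc, date desc
```

Here query($q1,0) evaluates to the subquery score for documents matching field1 and to 0 otherwise, so field1 matches sort ahead of the rest, with date desc as the tie-breaker; whether this works depends on the Solr version, since sorting by function queries is a relatively recent feature.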
Re: JVM Parameters to Startup Solr?
On Apr 24, 2013, at 4:02 AM, Furkan KAMACI furkankam...@gmail.com wrote: Lucidworks Solr Guide says that: If you are using Sun's JVM, add the -server command-line option when you start Solr. This tells the JVM that it should optimize for a long running, server process. If the Java runtime on your system is a JRE, rather than a full JDK distribution (including javac and other development tools), then it is possible that it may not support the -server JVM option Does any folks using -server parameter? Also what parameters you are using to start up Solr? I mean parallel garbage collector vs.? Unless you are using 32-bit Windows, you are probably getting the server JVM. It's not a bad idea to use -server to be sure - it's certainly preferable to -client for Solr. You should generally use the concurrent low pause garbage collector with Solr. - Mark
Re: Solr - WordDelimiterFactory with Custom Tokenizer to split only on Boundires
The WDF types will treat a character the same regardless of where it appears. For something conditional, like a dot between letters vs. a dot not preceded and followed by a letter, you either have to have a custom tokenizer or a character filter. Interesting that although the standard tokenizer messes up embedded hyphens, it does handle the embedded dot vs. trailing dot case as you wish (but messes up U.S.A. by stripping the trailing dot) - but that doesn't help your case. A character filter like the following might help your case:

<fieldType name="text_ws_dot" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <charFilter class="solr.PatternReplaceCharFilterFactory"
                pattern="([\w\d])[\._&amp;]+($|[^\w\d])" replacement="$1 $2" />
    <charFilter class="solr.PatternReplaceCharFilterFactory"
                pattern="(^|[^\w\d])[\._&amp;]+($|[^\w\d])" replacement="$1 $2" />
    <charFilter class="solr.PatternReplaceCharFilterFactory"
                pattern="(^|[^\w\d])[\._&amp;]+([\w\d])" replacement="$1 $2" />
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  </analyzer>
</fieldType>

I'm not a regular expression expert, so I'm not sure whether/how those patterns could be combined. Also, that doesn't allow the case of a single ., &, or _ as a word - but you didn't specify how that case should be handled. -- Jack Krupansky -Original Message- From: meghana Sent: Wednesday, April 24, 2013 6:49 AM To: solr-user@lucene.apache.org Subject: Solr - WordDelimiterFactory with Custom Tokenizer to split only on Boundires I have configured WordDelimiterFilterFactory with custom types for '&' and '-', and for a few delimiters (like . _ :) we need to split on boundaries only. e.g. test.com (should be tokenized to test.com) newyear. (should be tokenized to newyear) new_car (should be tokenized to new_car) .. ..
Below is the definition for the text field:

<fieldType name="text_general_preserved" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="false" />
    <filter class="solr.WordDelimiterFilterFactory" splitOnCaseChange="0" splitOnNumerics="0" stemEnglishPossessive="0" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" preserveOriginal="0" protected="protwords_general.txt" types="wdfftypes_general.txt" />
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="false" />
    <filter class="solr.WordDelimiterFilterFactory" splitOnCaseChange="0" splitOnNumerics="0" stemEnglishPossessive="0" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" preserveOriginal="0" protected="protwords_general.txt" types="wdfftypes_general.txt" />
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

Below is the wdfftypes_general.txt content:

& => ALPHA
- => ALPHA
_ => SUBWORD_DELIM
: => SUBWORD_DELIM
. => SUBWORD_DELIM

The types that can be used in WordDelimiterFilter are LOWER, UPPER, ALPHA, DIGIT, ALPHANUM, SUBWORD_DELIM. There is no description available for the use of each type. Going by the name, I thought the type SUBWORD_DELIM might fulfil my need, but it doesn't seem to work. Can anybody suggest how I can configure the WordDelimiterFilterFactory to fulfil my requirement? Thanks. -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-WordDelimiterFactory-with-Custom-Tokenizer-to-split-only-on-Boundires-tp4058557.html Sent from the Solr - User mailing list archive at Nabble.com.
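The intent of Jack's character-filter patterns can be sanity-checked outside Solr. Below is a small Python sketch that applies equivalent substitutions (Python's re is close enough to Java's regex for these constructs; the '&' in the character class assumes the mangled 'amp;' in the message was the XML entity '&amp;') and then splits on whitespace, as WhitespaceTokenizerFactory would:

```python
import re

# The three charFilter patterns from the message, in the same order.
patterns = [
    (r'([\w\d])[\._&]+($|[^\w\d])', r'\1 \2'),   # word char + delims + boundary
    (r'(^|[^\w\d])[\._&]+($|[^\w\d])', r'\1 \2'),# delims surrounded by boundaries
    (r'(^|[^\w\d])[\._&]+([\w\d])', r'\1 \2'),   # boundary + delims + word char
]

def char_filter(text):
    # Apply each substitution in turn, mimicking the chained charFilters.
    for pat, repl in patterns:
        text = re.sub(pat, repl, text)
    return text

# .split() stands in for the whitespace tokenizer.
print(char_filter('newyear.').split())  # ['newyear']   - trailing dot stripped
print(char_filter('test.com').split())  # ['test.com']  - embedded dot kept
print(char_filter('new_car').split())   # ['new_car']   - embedded underscore kept
```

Note this reproduces the U.S.A. caveat Jack mentions: the trailing dot is stripped there as well.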
Re: Solr faceted search UI
Thank you very much for your suggestion. This is only for a PoC. As you suggested Blacklight: can I run this on Windows, and to build the PoC do I have to have Ruby on Rails knowledge? Irrespective of the technology, and considering the fact that in the past I have worked on Java and J2EE, what would you suggest, or how would you have proceeded with this? Blacklight seems to be a good option, but without prior knowledge of Ruby on Rails, will I be able to present it in a short period of time? Any suggestion on this? -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-faceted-search-UI-tp4058598p4058617.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Noob question: why doesn't this query work?
On 4/24/2013 8:59 AM, Brian Hurt wrote: So, I'm executing the following query: id:6178dB=@Fm AND i_0:613OFS AND (i_3:6111 OR i_3:1yyy\~) AND (NOT id:6178ZwWj5m OR numfields:[* TO 6114] OR d_4:false OR NOT i_4:6142E=m) It's machine generated, which explains the redundancies. The problem is that the query returns no results- but there is a document that should match- it has an id of 6178dB=@Fm, an i_0 field of 613OFS, an i_3 field of 6111, a numfields of 611A, a d_4 of true (but this shouldn't matter), and an i_4 of 6142F1S. The problem seems to be with the negations. I did try to replace the NOT's with -'s, so, for example, NOT id:6178ZwWj5m would become -id:6178ZwWj5m, and this didn't seem to work. Help? What's wrong with the query? Thanks. It looks like you might have meant to negate all of the query clauses inside the last set of parentheses. That's not what your actual query says. If you change your negation so that the NOT is outside the parentheses, so that it reads AND NOT (... OR ...), that should fix that part of it. If the boolean layout you have is really what you want, then you need to change the negation queries to (*:* -query) instead, because pure negative queries are not supported. That syntax says all documents except those that match the query. For simple negation queries, Solr can figure out that it needs to add the *:* internally, but this query is more complex. A few other possible problems: A backslash is a special character used to escape other special characters, so you *might* need two of them - one to say 'the next character is literal' and one to actually be the backslash. If you follow the advice in the next paragraph, I can guarantee this will be the case. For that reason, you might want to keep the quotes on fields that might contain characters that have special meaning to the Solr query parser. Don't use quotes unless you really are after phrase queries or you can't escape special characters. 
You might actually need phrase queries for some of this, but I would try simple one-field queries without the quotes to see whether you need them. I have no idea what happens if you include quotes inside a range query (the 6114), but it might not do what you expect. I would definitely remove the quotes from that part of the query. Thanks, Shawn
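Shawn's (*:* -query) suggestion, applied to the group in question, would look like this (a sketch only; the quoting/escaping of the machine-generated values is left exactly as in the original query):

```
original:
  ... AND (NOT id:6178ZwWj5m OR numfields:[* TO 6114]
           OR d_4:false OR NOT i_4:6142E=m)

same boolean layout, with each pure-negative clause made matchable:
  ... AND ((*:* -id:6178ZwWj5m) OR numfields:[* TO 6114]
           OR d_4:false OR (*:* -i_4:6142E=m))
```

Each (*:* -x) clause reads as "all documents except those matching x", which is the form Solr can evaluate inside a larger OR.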
Re: Solr faceted search UI
I tried previous version of blacklight (on a Mac) and was able to get it to the demo stage without much RoR knowledge. The facet field declarations were all in the config files. You should be able to get a go/nogo decision in under four hours. Regards, Alex. Personal blog: http://blog.outerthoughts.com/ LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch - Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book) On Wed, Apr 24, 2013 at 11:23 AM, richa striketheg...@gmail.com wrote: Thank you very much for your suggestion. This is only for PoC. As you suggested about blacklight, can I run this on windows and to build PoC do I have to have ruby on rails knowledge? Irrespective of any technology and considering the fact that in past I had worked on java, j2ee what would you suggest or how would you have proceeded for this? Blacklight seems to be a good option, not sure without prior knowledge of ruby on rails, will I be able to present in short period of time? any suggestion on this? -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-faceted-search-UI-tp4058598p4058617.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: JVM Parameters to Startup Solr?
Just verifying that it is also recommended to use the JVM options to kill on OOM? I vaguely recall a message from Mark about this sometime ago: -XX:OnOutOfMemoryError=kill -9 %p -XX:+HeapDumpOnOutOfMemoryError On Wed, Apr 24, 2013 at 9:13 AM, Mark Miller markrmil...@gmail.com wrote: On Apr 24, 2013, at 4:02 AM, Furkan KAMACI furkankam...@gmail.com wrote: Lucidworks Solr Guide says that: If you are using Sun's JVM, add the -server command-line option when you start Solr. This tells the JVM that it should optimize for a long running, server process. If the Java runtime on your system is a JRE, rather than a full JDK distribution (including javac and other development tools), then it is possible that it may not support the -server JVM option Does any folks using -server parameter? Also what parameters you are using to start up Solr? I mean parallel garbage collector vs.? Unless you are using 32-bit Windows, you are probably getting the server JVM. It's not a bad idea to use -server to be sure - it's certainly preferable to -client for Solr. You should generally use the concurrent low pause garbage collector with Solr. - Mark
Re: Solr 3.6.1: changing a field from stored to not stored
I would create a new core as slave of the existing configuration without replicating the core schema and configuration. This way I can get the This won't work, as master/slave replication copies the index files as-is. You should re-index all your data. You don't need to take down the cluster to do that, just re-index on top of what's there already, and your index will become smaller and smaller as merging kicks out the old data :) -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com Solr Training - www.solrtraining.com 24. apr. 2013 kl. 15:59 skrev Majirus FANSI majirus@gmail.com: I would create a new core as slave of the existing configuration without replicating the core schema and configuration. This way I can get the information from one index to the other while saving the space as fields in the new schema are mainly not stored. After the replication I would swap the cores for the online core to point to the right index dir and conf. i.e. the one with less stored fields. Maj On 24 April 2013 01:48, Petersen, Robert robert.peter...@mail.rakuten.comwrote: Hey I just want to verify one thing before I start doing this: function queries only require fields to be indexed but don't require them to be stored right? -Original Message- From: Petersen, Robert [mailto:robert.peter...@mail.rakuten.com] Sent: Tuesday, April 23, 2013 4:39 PM To: solr-user@lucene.apache.org Subject: RE: Solr 3.6.1: changing a field from stored to not stored Good info, Thanks Hoss! I was going to add a more specific fl= parameter to my queries at the same time. Currently I am doing fl=*,score so that will have to be changed. -Original Message- From: Chris Hostetter [mailto:hossman_luc...@fucit.org] Sent: Tuesday, April 23, 2013 4:18 PM To: solr-user@lucene.apache.org Subject: Re: Solr 3.6.1: changing a field from stored to not stored : index? 
I noticed I am unnecessarily storing some fields in my index and : I'd like to stop storing them without having to 'reindex the world' and : let the changes just naturally percolate into my index as updates come : in the normal course of things. Do you guys think I could get away with : this? Yes, you can easily get away with this type of change w/o re-indexing, however you won't gain any immediate index size savings until each and every existing doc has been reindexed and the old copies expunged from the index via segment merges. The one hiccup that can affect people when doing this is what happens if you use something like fl=* (and likely hl=* as well) ... many places in Solr will try to avoid failure if a stored field is found in the index which isn't defined in the schema, and treat that stored value as a string (legacy behavior designed to make it easier for people to point Solr at old lucene indexes built w/o using Solr) ... so if these stored values are not strings, you might get some weird data in your response for these documents. -Hoss
Re: Solr faceted search UI
Hi Maj, Thanks for your suggestion. Tell me one thing: do you have any example of SolrJ? Suppose I decide to use SolrJ in a simple web application to display faceted search on a web page. Where will this fit in? What will the flow be? Please suggest. Thanks On Wed, Apr 24, 2013 at 11:01 AM, Majirus FANSI [via Lucene] wrote: Hi richa, You can use SolrJ ( http://wiki.apache.org/solr/Solrj#Reading_Data_from_Solr ) to query your Solr index. On the wiki page indicated, you will see an example of faceted search using SolrJ. The 2009 article by Yonik available on searchhub ( http://searchhub.org/2009/09/02/faceted-search-with-solr/ ) is a good tutorial on faceted search. Whether you go for an MVC framework or not is up to you. It is recommended though to develop a search engine application in a Service Oriented Architecture. Regards, Maj On 24 April 2013 16:43, richa [hidden email] wrote: Hi, I am working on a POC where I have to display faceted search results on a web page. Can anybody please help me with what setup I need to configure to display them? I would prefer Java technologies. Just to mention, I have SolrCloud running on a remote server. I would like to know: 1. Should I use an MVC framework? 2. How will my local machine interact with the remote Solr server? 3. How will I send queries through Java code, and what technology should I use to display the faceted search results? Please help me on this. Thanks, -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-faceted-search-UI-tp4058598.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: JVM Parameters to Startup Solr?
Yes, I recommend this. You can't predict a JVM that has had an OOM - so it's best to neutralize it. We have seen cases where the node was messed up but still advertised as active and good in zk due to OOM's. Behavior after an OOM is undefined. I was actually going to ask if you were positive you had restarted that node in the other OOM thread, because that sounded similar. Just a straw to grasp for, as I'd guess you are sure you did restart it. - Mark On Apr 24, 2013, at 11:37 AM, Timothy Potter thelabd...@gmail.com wrote: Just verifying that it is also recommended to use the JVM options to kill on OOM? I vaguely recall a message from Mark about this sometime ago: -XX:OnOutOfMemoryError=kill -9 %p -XX:+HeapDumpOnOutOfMemoryError On Wed, Apr 24, 2013 at 9:13 AM, Mark Miller markrmil...@gmail.com wrote: On Apr 24, 2013, at 4:02 AM, Furkan KAMACI furkankam...@gmail.com wrote: Lucidworks Solr Guide says that: If you are using Sun's JVM, add the -server command-line option when you start Solr. This tells the JVM that it should optimize for a long running, server process. If the Java runtime on your system is a JRE, rather than a full JDK distribution (including javac and other development tools), then it is possible that it may not support the -server JVM option Does any folks using -server parameter? Also what parameters you are using to start up Solr? I mean parallel garbage collector vs.? Unless you are using 32-bit Windows, you are probably getting the server JVM. It's not a bad idea to use -server to be sure - it's certainly preferable to -client for Solr. You should generally use the concurrent low pause garbage collector with Solr. - Mark
Re: JVM Parameters to Startup Solr?
On Apr 24, 2013, at 12:00 PM, Mark Miller markrmil...@gmail.com wrote: -XX:OnOutOfMemoryError=kill -9 %p -XX:+HeapDumpOnOutOfMemoryError The way I like to handle this is to have the OOM trigger a little script or set of cmds that logs the issue and kills the process. Then if you have the process supervised (via runit or something), it will just start back up (what else do you do after an OOM?), but you will have logged something, triggered a notification, whatever. - Mark
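Collecting the recommendations from this thread into one startup-parameter sketch (the paths and the handler script name are hypothetical; %p expands to the JVM's pid):

```
-server
-XX:+UseConcMarkSweepGC
-XX:+HeapDumpOnOutOfMemoryError
-XX:HeapDumpPath=/var/solr/logs
-XX:OnOutOfMemoryError="/opt/solr/bin/oom_handler.sh %p"
```

Here oom_handler.sh would log a timestamped OOM event and then kill the process, so a supervisor such as runit can restart it cleanly, as Mark describes above.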
Solr indeing Partially working
Hi, I am using Solr 4.2.0 and extension 2.8.2 with TYPO3. Whenever I try to index pages and news pages, only 3.29% gets indexed. I checked the developer log and found an error in solrservice.php. And in the Solr admin it says Dups is not defined, please add it. What should I do in this case? If possible, please send me the settings of schema.xml and solrconfig.xml. I am new to both TYPO3 and Solr. -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-indeing-Partially-working-tp4058623.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: JVM Parameters to Startup Solr?
I like the idea of running a script vs. kill -9 ;-) Right now when a node fails, we have monitors for whether a node is up and serving queries. If not, that triggers some manual investigation and restart process. Part of the process was to capture the logs and heap dump file. What happened previously is that the log capture part wasn't scripted into the restart process and so the logs got wiped out when the restart happened :-( One question about this - when you say logs the issue from your script - what type of things do you log? I've been relying on the timestamp of the heap dump (hprof) as a way to trace back into our log files. Thanks. Tim On Wed, Apr 24, 2013 at 10:03 AM, Mark Miller markrmil...@gmail.com wrote: On Apr 24, 2013, at 12:00 PM, Mark Miller markrmil...@gmail.com wrote: -XX:OnOutOfMemoryError=kill -9 %p -XX:+HeapDumpOnOutOfMemoryError The way I like to handle this is to have the OOM trigger a little script or set of cmds that logs the issue and kills the process. Then if you have the process supervised (via runit or something), it will just start back up (what else do you do after an OOM?), but you will have logged something, triggered a notification, whatever. - Mark
Deletes and inserts
We are using a Solr collection to serve auto-complete suggestions. We'd like for the update to happen without any noticeable delay for the users. I've been looking at adding new cores, loading them with the new data and then swapping them with the current ones, but I don't see how that would work in a cloud installation. It seems that when I create a new core it is part of the collection and the old data will start replicating to it. Is that correct? I've also looked at standing up a new collection and then adding an alias for it, but that's not well documented. If the alias already exists and I add it to another collection, is it removed from the first collection? I'm open to any suggestions. -- To *know* is one thing, and to know for certain *that* we know is another. --William James
Re: JVM Parameters to Startup Solr?
On Apr 24, 2013, at 12:22 PM, Timothy Potter thelabd...@gmail.com wrote: I like the idea of running a script vs. kill -9 ;-) Right now when a node fails, we have monitors for whether a node is up and serving queries. If not, that triggers some manual investigation and restart process. Part of the process was to capture the logs and heap dump file. What happened previously is that the log capture part wasn't scripted into the restart process and so the logs got wiped out when the restart happened :-( One question about this - when you say logs the issue from your script - what type of things do you log? I've been relying on the timestamp of the heap dump (hprof) as a way to trace back into our log files. Yeah, that's pretty much it - the time of the event and the fact that an OOM occurred. If you are dropping a heap dump, that has the same info, but a log is just a nice compact little history of events. - Mark Thanks. Tim On Wed, Apr 24, 2013 at 10:03 AM, Mark Miller markrmil...@gmail.com wrote: On Apr 24, 2013, at 12:00 PM, Mark Miller markrmil...@gmail.com wrote: -XX:OnOutOfMemoryError=kill -9 %p -XX:+HeapDumpOnOutOfMemoryError The way I like to handle this is to have the OOM trigger a little script or set of cmds that logs the issue and kills the process. Then if you have the process supervised (via runit or something), it will just start back up (what else do you do after an OOM?), but you will have logged something, triggered a notification, whatever. - Mark
Re: How to let Solr load libs from within my JAR?
If you want to pack JARs inside JARs, you can use something that does classloader magic like One-JAR, but it's usually good to avoid things like that unless you really need them. Alternatively, you could look at something that unpacks jars and reassembles them into a new JAR, like the Maven Assembly or Shade plugins. But usually moving a few extra JARs isn't too difficult. Michael Della Bitta Appinions 18 East 41st Street, 2nd Floor New York, NY 10017-6271 www.appinions.com Where Influence Isn’t a Game On Tue, Apr 23, 2013 at 9:37 PM, Xi Shen davidshe...@gmail.com wrote: Hi, I developed a data import handler, it has some dependent libraries. I deployed them in a parallel folder with my JAR and included the path in solrconfig.xml. It works fine. But I am thinking maybe I can pack those JAR libs within my JAR, but I got NoClassDefFoundError exception when executing my DIH. Is it possible Solr can load JAR libs packed in my JAR? How can I do that. -- Regards, David Shen http://about.me/davidshen https://twitter.com/#!/davidshen84
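For the common case of keeping the dependencies as separate JARs next to the plugin, solrconfig.xml can load them with <lib> directives (the paths below are hypothetical):

```xml
<!-- load every jar in the plugin folder, dependencies included -->
<lib dir="/opt/solr/plugins/my-dih/" regex=".*\.jar" />
<!-- or point at a single jar explicitly -->
<lib path="/opt/solr/plugins/my-dih/my-dih-handler.jar" />
```

This avoids the classloader tricks entirely: the DIH plugin and its libraries all end up on the core's classloader.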
Re: Deletes and inserts
We're using aliases to control visibility of collections we rebuild from scratch nightly. It works pretty well. If you run CREATEALIAS again, it'll switch to a new one, not augment the old one. If for some reason, you want to bridge more than one collection, you can add more than one collection to the alias at creation time, but then it becomes read-only. Michael Della Bitta Appinions 18 East 41st Street, 2nd Floor New York, NY 10017-6271 www.appinions.com Where Influence Isn’t a Game On Wed, Apr 24, 2013 at 12:26 PM, Jon Strayer j...@strayer.org wrote: We are using a Solr collection to serve auto complete suggestions. We'd like for the update to be without any noticeable delay for the users. I've been looking at adding new cores, loading them with the new data and then swapping them with the current ones, but but I don't see how that would work in a cloud installation. It seems that when I create a new core it is part of the collection and the old data will start replicating to it. Is that correct? I've also looked at standing up a new collection and then adding an alias for it, but that's not well documented. If the alias already exists and I add to to another collection is it removed from the first collection? I'm open to any suggestions. -- To *know* is one thing, and to know for certain *that* we know is another. --William James
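A sketch of the rebuild-and-swap flow Michael describes, using the Collections API (host, shard/replica counts, and collection names are illustrative; parameter availability depends on the Solr version):

```
# 1. create tonight's collection and index into it
http://host:8983/solr/admin/collections?action=CREATE&name=suggest_20130425&numShards=2

# 2. when indexing is done, atomically repoint the alias the application queries
http://host:8983/solr/admin/collections?action=CREATEALIAS&name=suggest&collections=suggest_20130425

# 3. later, drop the previous night's collection
http://host:8983/solr/admin/collections?action=DELETE&name=suggest_20130424
```

The application only ever queries the alias name (suggest here), so the switch in step 2 is invisible to users.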
SOLR Install
I'm trying to use Solr as part of another Maven-based web application. I'm not sure how to wire the two war files. Any help please? I found this documentation in Solr but am unsure how to go about it:

<!-- If you are wiring Solr into a larger web application which controls the web context root, you will probably want to mount Solr under a path prefix (app.war with /app/solr mounted into it, for example). You will need to put this prefix in front of the SolrDispatchFilter url-pattern mapping too (/solr/*), and also on any paths for legacy Solr servlet mappings you may be using. For the Admin UI to work properly in a path-prefixed configuration, the admin folder containing the resources needs to be under the app context root named to match the path-prefix. For example: .war xxx js main.js -->

<!--
<init-param>
  <param-name>path-prefix</param-name>
  <param-value>/xxx</param-value>
</init-param>
-->

Thank you, Peri Subrahmanya On 4/24/13 12:52 PM, Michael Della Bitta michael.della.bi...@appinions.com wrote: solrservice.php and the text of that error both sound like parts of Typo3... they're definitely not part of Solr. You should ask on a list devoted to Typo3 to figure out what to do in this situation. It likely won't involve reconfiguring Solr. Michael Della Bitta Appinions 18 East 41st Street, 2nd Floor New York, NY 10017-6271 www.appinions.com Where Influence Isn't a Game On Wed, Apr 24, 2013 at 11:53 AM, vishal gupta vishalgup...@yahoo.co.in wrote: Hi i am using Solr 4.2.0 and extension 2.8.2 with Typo3. Whever I try to do indexing pages and news pages It gets only 3.29% indexed. I checked a developer log and found error in solrservice.php. And in solr admin it is giving Dups is not defined please add it. What should i do in this case? If possible please send me the settings of schema.xml and solrconfig.xml .i am new to typo3 and solr both.
-- View this message in context: http://lucene.472066.n3.nabble.com/Solr-indeing-Partially-working-tp4058623.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr indeing Partially working
On 24 April 2013 22:22, Michael Della Bitta michael.della.bi...@appinions.com wrote: solrservice.php and the text of that error both sound like parts of Typo3... they're definitely not part of Solr. You should ask on a list devoted to Typo3 to figure out what to do in this situation. It likely won't involve reconfiguring Solr. You would definitely have better luck asking on a TYPO3 list. Also, I would check the version of Solr supported by the extension: 4.2.0 is pretty new, and might not be supported. Regards, Gora
RE: DirectSolrSpellChecker : vastly varying spellcheck QTime times.
One of our main concerns is that Solr returns the best match based on what it thinks is best. It uses Levenshtein distance to determine the best suggestions. Can we tune this to put more weight on the number of frequency/hits vs. the number of edits? If we can tune this, suggestions would seem more relevant when corrected. Also, if we can do this while keeping maxCollations = 1 and maxCollationTries = some reasonable number so that QTime does not go out of control, that would be great! Any insights into this would be appreciated. Thanks for your help. Regards, -- Sandeep -- View this message in context: http://lucene.472066.n3.nabble.com/DirectSolrSpellChecker-vastly-varying-spellcheck-QTime-times-tp4057176p4058655.html Sent from the Solr - User mailing list archive at Nabble.com.
RE: DirectSolrSpellChecker : vastly varying spellcheck QTime times.
When getting collations there are two steps. First, the spellchecker gets individual word choices for each misspelled word. By default, these are sorted by string distance first, then document frequency second. You can override this by specifying str name=comparatorClassfreq/str in your spellchecker component configuration in solrconfig.xml. The example provided in the distribution has a commented-out section explaining this. In the second step, one correction is taken off each list and checked against the index to see if it is a valid collation. To be valid, it needs to return at least 1 hit. The order in which word combinations are tried is dictated by the first step. Once it runs out of tries, runs out of suggestions, or has enough valid collations, it stops. You cannot configure this to try a bunch and sort by # hits or anything like that. You would have to specify a large # of collations to be returned and do this in your application. But this runs the risk of high QTimes. So you can sort by frequency, but not by hits. Sorting by hits would mean trying a lot of collations and that is probably too expensive. One caveat is that sorting by frequency could result in far-afield results being returned to the user. You might find that lower-frequency, smaller-edit-distance suggestions are going to give the user what they want more than higher-edit-distance, higher-frequency suggestions. Just because a word is very common doesn't mean it is the right word. This is why distance is the default and not freq. James Dyer Ingram Content Group (615) 213-4311 -Original Message- From: SandeepM [mailto:skmi...@hotmail.com] Sent: Wednesday, April 24, 2013 12:13 PM To: solr-user@lucene.apache.org Subject: RE: DirectSolrSpellChecker : vastly varying spellcheck QTime times. One of our main concerns is that Solr returns the best match based on what it thinks is best. It uses Levenshtein's distance metric to determine the best suggestions.
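As a concrete sketch of the comparatorClass override James describes (the component layout and field name here follow the stock example config, not Sandeep's actual setup):

```xml
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <lst name="spellchecker">
    <str name="name">default</str>
    <!-- assumed field name; substitute the field your spellchecker reads -->
    <str name="field">spell</str>
    <str name="classname">solr.DirectSolrSpellChecker</str>
    <!-- sort individual suggestions by document frequency before string distance -->
    <str name="comparatorClass">freq</str>
  </lst>
</searchComponent>
```

With this in place, the first step's per-word suggestion lists come back frequency-ordered, which in turn dictates the order in which collations are tried.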
Re: Noob question: why doesn't this query work?
Thanks for your response. You've given me some solid leads. On Wed, Apr 24, 2013 at 11:25 AM, Shawn Heisey s...@elyograg.org wrote: On 4/24/2013 8:59 AM, Brian Hurt wrote: So, I'm executing the following query: id:6178dB=@Fm AND i_0:613OFS AND (i_3:6111 OR i_3:1yyy\~) AND (NOT id:6178ZwWj5m OR numfields:[* TO 6114] OR d_4:false OR NOT i_4:6142E=m) It's machine generated, which explains the redundancies. The problem is that the query returns no results- but there is a document that should match- it has an id of 6178dB=@Fm, an i_0 field of 613OFS, an i_3 field of 6111, a numfields of 611A, a d_4 of true (but this shouldn't matter), and an i_4 of 6142F1S. The problem seems to be with the negations. I did try to replace the NOT's with -'s, so, for example, NOT id:6178ZwWj5m would become -id:6178ZwWj5m, and this didn't seem to work. Help? What's wrong with the query? Thanks. It looks like you might have meant to negate all of the query clauses inside the last set of parentheses. That's not what your actual query says. If you change your negation so that the NOT is outside the parentheses, so that it reads AND NOT (... OR ...), that should fix that part of it. No, I meant the NOT to only bind to the next id. So the query I wanted was: id:6178dB=@Fm AND i_0:613OFS AND (i_3:6111 OR i_3:1yyy\~) AND ((NOT id:6178ZwWj5m) OR numfields:[* TO 6114] OR d_4:false OR (NOT i_4:6142E=m)) If the boolean layout you have is really what you want, then you need to change the negation queries to (*:* -query) instead, because pure negative queries are not supported. That syntax says all documents except those that match the query. For simple negation queries, Solr can figure out that it needs to add the *:* internally, but this query is more complex. This could be the problem. This query is machine generated, so I don't care how ugly it is. Does this apply even to inner queries? I.e., should that last clause be (*:* -i_4:6142E=m) instead of (NOT I-4:6142E=m)?
A few other possible problems: A backslash is a special character used to escape other special characters, so you *might* need two of them - one to say 'the next character is literal' and one to actually be the backslash. If you follow the advice in the next paragraph, I can guarantee this will be the case. For that reason, you might want to keep the quotes on fields that might contain characters that have special meaning to the Solr query parser. I wash all strings through ClientUtils.escapeQueryChars always, so this isn't a problem. That string should just be 1yyy~, the ~ was getting escaped. Don't use quotes unless you really are after phrase queries or you can't escape special characters. You might actually need phrase queries for some of this, but I would try simple one-field queries without the quotes to see whether you need them. I have no idea what happens if you include quotes inside a range query (the 6114), but it might not do what you expect. I would definitely remove the quotes from that part of the query. This is another solid possibility, although it might raise some difficulties for me- I need to be able to support literal string comparisons, so I'm not sure how well this would support the query s_7 = some string with spaces sorts of queries. But some experimentation here is definitely in order. Thanks, Shawn
Re: Noob question: why doesn't this query work?
: This could be the problem. This is query is machine generated, so I don't : care how ugly it is. Does this apply even to inner queries? I.e., should : that last clause be (*:* -i_4:6142E=m) instead of (NOT I-4:6142E=m)? yes -- you can't exclude 6142E=m w/o defining what set (ie: the set of all documents: *:*) you are excluding it from. Related reading about building up nested queries with parens and using the AND/OR/NOT syntax... http://www.lucidimagination.com/blog/2011/12/28/why-not-and-or-and-not/ -Hoss
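Applied to the query from earlier in the thread, this advice turns each inner NOT clause into the (*:* -...) form, roughly (a rewrite of Brian's query, not a tested example):

```
id:6178dB=@Fm AND i_0:613OFS AND (i_3:6111 OR i_3:1yyy~)
  AND ((*:* -id:6178ZwWj5m) OR numfields:[* TO 6114] OR d_4:false
       OR (*:* -i_4:6142E=m))
```

Each parenthesized negation now defines the set it excludes from (all documents), so it is a valid clause inside the larger OR.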
Re: solr.StopFilterFactory doesn't work with wildcard
: In any case, technically, the stop filter is doing exactly what it is supposed : to do. Jack has kind of glossed over some key questions here... 1) why are you using StopFilterFactory in your multiterm analyzer like this? 2) what do you expect it to do if series is in your stopwords and someone queries for series* : fieldType name=search_string class=solr.TextField : positionIncrementGap=100 : analyzer type=query : tokenizer class=solr.WhitespaceTokenizerFactory / : filter class=solr.StopFilterFactory words=stopwords.txt : ignoreCase=true/ : /analyzer : analyzer type=multiterm : tokenizer class=solr.WhitespaceTokenizerFactory / : filter class=solr.StopFilterFactory words=stopwords.txt : ignoreCase=true/ : /analyzer : /fieldType -Hoss
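If the answer to the first question is simply "there's no reason," one likely-looking change (a sketch, not a confirmed fix; Hoss's questions come first) is to keep the stop filter in the regular query analyzer but drop it from the multiterm one, so a wildcard term such as series* is not discarded before expansion:

```xml
<fieldType name="search_string" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
  </analyzer>
  <analyzer type="multiterm">
    <!-- no StopFilterFactory here: stopwords inside wildcard terms pass through -->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  </analyzer>
</fieldType>
```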
Re: Noob question: why doesn't this query work?
On 4/24/2013 12:13 PM, Brian Hurt wrote: If the boolean layout you have is really what you want, then you need to change the negation queries to (*:* -query) instead, because pure negative queries are not supported. That syntax says all documents except those that match the query. For simple negation queries, Solr can figure out that it needs to add the *:* internally, but this query is more complex. This could be the problem. This is query is machine generated, so I don't care how ugly it is. Does this apply even to inner queries? I.e., should that last clause be (*:* -i_4:6142E=m) instead of (NOT I-4:6142E=m)? Exactly right. I wash all strings through ClientUtils.escapeQueryChars always, so this isn't a problem. That string should just be 1yyy~, the ~ was getting escaped. A quick check with debugQuery seems to confirm my thoughts on this - if you have the quotes, the escaping isn't necessary, although including it appears to be working correctly too. Depending on exactly what field type you have, you might be good there. Don't use quotes unless you really are after phrase queries or you can't escape special characters. You might actually need phrase queries for some of this, but I would try simple one-field queries without the quotes to see whether you need them. I have no idea what happens if you include quotes inside a range query (the 6114), but it might not do what you expect. I would definitely remove the quotes from that part of the query. This is another solid possibility, although it might raise some difficulties for me- I need to be able to support literal string comparisons, so I'm not sure how well this would support the query s_7 = some string with spaces sorts of queries. But some experimentation here is definitely in order. Due to the query parser trying to be smart, quotes appear to be necessary if spaces are part of your indexed values and your query. 
Since I now know that you don't want to negate the range query, it makes sense for me to tell you that a value of 611A is outside the range [* TO 6114], because numbers are lower than letters when doing string comparisons. This was why I thought you might be trying to negate the entire query clause - it's the only way that particular piece would match. Thanks, Shawn
Re: Update on shards
Sorry - need to correct myself - updates worked the same as read requests - they also needed to hit a SolrCore in order to get forwarded to the right node. I was not thinking clearly when I said this applied to just reads and not writes. Both needed a SolrCore to do their work - with the request proxying, this is no longer the case, so you can hit Solr instances with no SolrCores or with SolrCores that are not part of the collection you are working with, and both read and write side requests are now proxied to a suitable node that has a SolrCore that can do the search or forward the update (or accept the update). - Mark On Apr 23, 2013, at 3:38 PM, Mark Miller markrmil...@gmail.com wrote: We have a 3rd release candidate for 4.3 being voted on now. I have never tested this feature with Tomcat - only Jetty. Users have reported it does not work with Tomcat. That leads one to think it may have a problem in other containers as well. A previous contributor donated a patch that explicitly flushes a stream in our proxy code - he says this allows the feature to work with Tomcat. I committed this feature - the flush can't hurt, and given the previous contributions of this individual, I'm fairly confident the fix makes things work in Tomcat. I have no first hand knowledge that it does work though. You might take the RC for a spin and test it out yourself: http://people.apache.org/~simonw/staging_area/lucene-solr-4.3.0-RC3-rev1470846/ - Mark On Apr 23, 2013, at 3:20 PM, Furkan KAMACI furkankam...@gmail.com wrote: Hi Mark; All in all you say that when 4.3 is tagged at repository (I mean when it is ready) this feature will work for Tomcat too at a stable version? 2013/4/23 Mark Miller markrmil...@gmail.com On Apr 23, 2013, at 2:49 PM, Shawn Heisey s...@elyograg.org wrote: What exactly is the 'request proxying' thing that doesn't work on tomcat?
Is this something different from basic SolrCloud operation where you send any kind of request to any server and they get directed where they need to go? I haven't heard of that not working on tomcat before. Before 4.2, if you made a read request to a node that didn't contain part of the collection you were searching, it would return 404. Write requests would be forwarded to where they belong no matter what node you sent them to, but read requests required that node have a part of the collection you were accessing. In 4.2 we added request proxying for this read side case. If a piece of the collection you are querying is not found on the node you hit, a simple proxy of the request is done to a node that does contain a piece of the collection. - Mark
How do I set compression on stored fields in SOLR 4.2.1
https://issues.apache.org/jira/browse/LUCENE-4226 It mentions that we can set the compression mode: FAST, HIGH_COMPRESSION, FAST_DECOMPRESSION. -- Bill Bell billnb...@gmail.com cell 720-256-8076
Re: SOLR 4.3
OK I did not see it on the latest 4.3 RC3. On Wed, Apr 24, 2013 at 4:52 AM, Jan Høydahl jan@cominvent.com wrote: As you can see on the issue, it is already fixed for 4.3 -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com Solr Training - www.solrtraining.com 24. apr. 2013 kl. 07:02 skrev William Bell billnb...@gmail.com: Can we get this in please to 4.3? https://issues.apache.org/jira/browse/SOLR-4746 -- Bill Bell billnb...@gmail.com cell 720-256-8076 -- Bill Bell billnb...@gmail.com cell 720-256-8076
full-import takes 4 days(48 hours) to complete where main db table size 700k only
Hi, Environment is Solr 3.6.1. The database has enough indexes. The box has enough memory. The DB performance is good. Auto commit is enabled for every 1 minute. Please see the following entity. The full-import of this entity is taking over 48 hours to complete on the production environment. The number of records in the main table is around 700,000 only. I tried a materialized view, but that view has duplicate records, so I can't go with a materialized view for all these queries. Can someone please suggest how to improve the performance of full-import? entity name=oracle-article dataSource=oracle pk=VCMID preImportDeleteQuery=content_type:article AND repository:oracleqa query=select ID as VCMID from tab_story2 order by published_date desc deltaImportQuery=select '${dataimporter.delta.VCMID}' as VCMID from dual deltaQuery=select s2.ID as VCMID from tab_story2 s2, gnasmomap mms2, gnasmometadata mmd where s2.id = mms2.keystring1 and mms2.recordid = mmd.contentmgmtid and mmd.lastpublishdate ((CAST(SYS_EXTRACT_UTC(TIMESTAMP '${dataimporter.oracle-article.last_index_time}') AS DATE) - TO_DATE('01/01/1970 00:00:00', 'MM-DD- HH24:MI:SS')) * 24 * 60 * 60 * 1000)-30 entity name=recordid dataSource=oracle transformer=TemplateTransformer query=select RECORDID from gnasmomap where keystring1 = '${oracle-article.VCMID}' field column=content_type template=article/ field column=RECORDID name=native_id/ field column=repository template=oracleqa/ /entity entity name=article_details dataSource=oracle transformer=ClobTransformer,RegexTransformer,script:trimTicker,script:hasBody,script:hasDeck query=select STORY_TITLE, STORY_HEADLINE, SOURCE, DECK, regexp_replace(body, '\p\\[(pullquote|summary)\]\/p\|\[video [0-9]+?\]|\[youtube .+?\]', '') as BODY, PUBLISHED_DATE, MODIFIED_DATE, DATELINE, REPORTER_NAME, TICKER_CODES,ADVERTORIAL_CONTENT from tab_story2 where id = '${oracle-article.VCMID}' field column=STORY_TITLE name=title/ field
column=DECK name=description clob=true/ field column=PUBLISHED_DATE name=date/ field column=MODIFIED_DATE name=last_modified_date/ field column=BODY name=body clob=true/ field column=SOURCE name=source/ field column=DATELINE name=dateline/ field column=STORY_HEADLINE name=export_headline/ field column=ticker splitBy=, sourceColName=TICKER_CODES/ field column=ADVERTORIAL_CONTENT name=advertorial_content/ field column=has_body sourceColName=body/ field column=has_description sourceColName=description/ /entity entity name=site dataSource=oracle query=select CASE WHEN site.name='fq2' THEN 'fqn' WHEN site.name='fq' THEN 'sbc' WHEN site.name='fq-lat' THEN 'latino' ELSE 'gc' END SITE, CASE WHEN site.name='fq2' THEN 'v8-qa.tabbusiness.com' WHEN site.name='fb' THEN 'v8-qa.smallbusiness.tabbusiness.com' WHEN site.name='qn-latino' THEN 'v8-qa.latino.tabdays.com' ELSE 'v8-qa.tabdays.com' END SERVER from gnasmomap mm, gnaschannelfileassociation cfa, gnaschannel ch, gnassite site where mm.keystring1 = '${oracle-article.VCMID}' and mm.recordid = cfa.vcmobjectid and cfa.channelid = ch.id and ch.siteid = site.id and rownum = 1 field column=SITE name=site/ entity name=url dataSource=oracle query=select 'http://' || '${site.SERVER}' || furl as URL from tab_furl where parent_id = '${oracle-article.VCMID}' field column=URL name=url/ /entity entity name=image dataSource=oracle transformer=script:hasImageURL query=select distinct('http://qa.global.fqstatic.com' || sourcepath) as IMAGE_URL from ( select mc.sourcepath from tab_rel_content rc, tab_story2 st, gnasmomap mm, dsx_media_common mc where rc.parent_id = '${oracle-article.VCMID}' and rc.parent_id = st.id and (st.NO_FEATURED_MEDIA != 'yes' OR st.NO_FEATURED_MEDIA is null) and rc.ref_id = mm.recordid and mm.keystring1 = mc.mediaid and rc.rank = 1 union all select mc.sourcepath from tab_rel_content arm, tab_story2 st, gnasmomap cmm, tab_rel_media crm, gnasmomap mmm, dsx_media_common mc where arm.parent_id = '${oracle-article.VCMID}' 
and arm.parent_id = st.id and (st.NO_FEATURED_MEDIA !='yes' OR st.NO_FEATURED_MEDIA is null) and arm.ref_id = cmm.recordid and cmm.keystring1 = crm.parent_id and crm.rank = 1 and crm.ref_id = mmm.recordid and mmm.keystring1 = mc.mediaid and arm.rank = 1) field column=IMAGE_URL name=image_url/ field column=has_image_url sourceColName=IMAGE_URL/ /entity /entity entity name=taxonomy dataSource=oracle query=select tc.PATH from gnasmomap mm, gndaassociation ass, gndataxonomycategory tc where mm.recordid = ass.cmsobjectid and ass.categoryid = tc.id and mm.keystring1 = '${oracle-article.VCMID}' field column=PATH name=taxonomy_path/ /entity entity name=keyword dataSource=oracle transformer=RegexTransformer,script:trimKeyword query=select KEYWORDS from tab_rel_metadata where parent_id = '${oracle-article.VCMID}' field column=keyword splitBy=, sourceColName=KEYWORDS/ /entity entity name=author dataSource=oracle query=select pmm.recordid as author_id, trim(trim(trailing ',' from
Re: full-import takes 4 days(48 hours) to complete where main db table size 700k only
1) You may have a small primary table but for each ID in it, you seem to be calling another 6 tables with nested SQL queries. Perhaps you need to cache those calls: http://wiki.apache.org/solr/DataImportHandler#CachedSqlEntityProcessor 2) You seem to be double-dipping into the main table tab_story2 in the nested entity, perhaps there is a way to avoid that 3) You are sorting the main table in the outside query. Why? You are going to process every record anyway. 4) Auto-commit is probably way too expensive here. Try setting it to every 2 minutes without changing anything else and see how many more entities you process in the same X minutes. In Solr 4+, there are better options for commit. Regards, Alex On Wed, Apr 24, 2013 at 3:25 PM, srinalluri nallurisr...@yahoo.com wrote: Hi, Environment is Solr 3.6.1. The database is having enough indexes. The box is having enough memory. The DB is performance is good. Auto commit is enabled for every 1 minute. Please see the following entity. The full-import of this entity is taking over 48 hours to complete on production environment. The number records in the main table is around 700,000 only. I tried materialized view, but that view is having duplicate records. So I can't go with materialized view for all these queries. Can someone please suggest how to improve the performance for full-import? entity name=oracle-article dataSource=oracle pk=VCMID preImportDeleteQuery=content_type:article AND repository:oracleqa Personal blog: http://blog.outerthoughts.com/ LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch - Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book)
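For point 1, the simplest of the child entities could be cached roughly like this (a sketch following the CachedSqlEntityProcessor wiki page; the column-to-variable mapping in where is an assumption about this schema):

```xml
<entity name="recordid" dataSource="oracle"
        processor="CachedSqlEntityProcessor"
        transformer="TemplateTransformer"
        query="select KEYSTRING1, RECORDID from gnasmomap"
        where="KEYSTRING1=oracle-article.VCMID">
  <field column="content_type" template="article"/>
  <field column="RECORDID" name="native_id"/>
  <field column="repository" template="oracleqa"/>
</entity>
```

The unrestricted query runs once and is held in memory, so each parent row does an in-memory lookup instead of a database round trip.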
RE: Error while starting Solr on Websphere
I'm having this same issue (versions and all). Was there ever a response to this question? I can't seem to find one. Thanks in advance! From divz80 divya.god...@gmail.com Subject Error while starting Solr on Websphere Date Wed, 20 Mar 2013 23:13:07 GMT Hi, i'm attempting to setup Solr 4.2.0 on IBM Websphere 8.5. I've deployed the solr.war and when I try to access the admin page, I get this error. Error 503: Server is shutting down The log files has this error: [3/20/13 18:56:33:564 EDT] 0061 HttpClientUti I org.apache.solr.client.solrj.impl.HttpClientUtil createClient Creating new http client, config:maxConnectionsPerHost=20maxConnections=1socketTimeout=0connTimeout=0retry=false [3/20/13 18:56:33:592 EDT] 0061 SolrDispatchF E org.apache.solr.servlet.SolrDispatchFilter init Could not start Solr. Check solr/home property and the logs [3/20/13 18:56:33:639 EDT] 0061 SolrCore E org.apache.solr.common.SolrException log null:java.lang.NoSuchMethodError: org/apache/http/conn/scheme/Scheme.init(Ljava/lang/String;ILorg/apache/http/conn/scheme/SchemeSocketFactory;)V I verified that the latest jar, httpclient-4.2.3.jar is in the lib folder, there are no older versions of the jar. is there any other configuration step I'm missing or a property I need to set?
Re: full-import takes 4 days(48 hours) to complete where main db table size 700k only
How long do those queries take to execute and return all their rows outside of DataImportHandler? I'd bring those queries into SQL Developer and get an explain plan on them to find out if any of them are much slower than the others. You might have only 700k documents for your index, but you're issuing a separate query for every entity for every document. Multiply that number of queries times the average round-trip latency to your database and that's the amount of time your app server and database server spend sitting around doing nothing, waiting for messages to arrive. If you can remove any of those entities in favor of joins, you'll be doing yourself a favor. Michael Della Bitta Appinions 18 East 41st Street, 2nd Floor New York, NY 10017-6271 www.appinions.com Where Influence Isn’t a Game On Wed, Apr 24, 2013 at 3:25 PM, srinalluri nallurisr...@yahoo.com wrote: Hi, Environment is Solr 3.6.1. The database is having enough indexes. The box is having enough memory. The DB is performance is good. Auto commit is enabled for every 1 minute. Please see the following entity. The full-import of this entity is taking over 48 hours to complete on production environment. The number records in the main table is around 700,000 only. I tried materialized view, but that view is having duplicate records. So I can't go with materialized view for all these queries. Can someone please suggest how to improve the performance for full-import?
entity name=oracle-article dataSource=oracle pk=VCMID preImportDeleteQuery=content_type:article AND repository:oracleqa query=select ID as VCMID from tab_story2 order by published_date desc deltaImportQuery=select '${dataimporter.delta.VCMID}' as VCMID from dual deltaQuery=select s2.ID as VCMID from tab_story2 s2, gnasmomap mms2, gnasmometadata mmd where s2.id = mms2.keystring1 and mms2.recordid = mmd.contentmgmtid and mmd.lastpublishdate ((CAST(SYS_EXTRACT_UTC(TIMESTAMP '${dataimporter.oracle-article.last_index_time}') AS DATE) - TO_DATE('01/01/1970 00:00:00', 'MM-DD- HH24:MI:SS')) * 24 * 60 * 60 * 1000)-30 entity name=recordid dataSource=oracle transformer=TemplateTransformer query=select RECORDID from gnasmomap where keystring1 = '${oracle-article.VCMID}' field column=content_type template=article/ field column=RECORDID name=native_id/ field column=repository template=oracleqa/ /entity entity name=article_details dataSource=oracle transformer=ClobTransformer,RegexTransformer,script:trimTicker,script:hasBody,script:hasDeck query=select STORY_TITLE, STORY_HEADLINE, SOURCE, DECK, regexp_replace(body, '\p\\[(pullquote|summary)\]\/p\|\[video [0-9]+?\]|\[youtube .+?\]', '') as BODY, PUBLISHED_DATE, MODIFIED_DATE, DATELINE, REPORTER_NAME, TICKER_CODES,ADVERTORIAL_CONTENT from tab_story2 where id = '${oracle-article.VCMID}' field column=STORY_TITLE name=title/ field column=DECK name=description clob=true/ field column=PUBLISHED_DATE name=date/ field column=MODIFIED_DATE name=last_modified_date/ field column=BODY name=body clob=true/ field column=SOURCE name=source/ field column=DATELINE name=dateline/ field column=STORY_HEADLINE name=export_headline/ field column=ticker splitBy=, sourceColName=TICKER_CODES/ field column=ADVERTORIAL_CONTENT name=advertorial_content/ field column=has_body sourceColName=body/ field column=has_description sourceColName=description/ /entity entity name=site dataSource=oracle query=select CASE WHEN site.name='fq2' THEN 'fqn' WHEN 
site.name='fq' THEN 'sbc' WHEN site.name='fq-lat' THEN 'latino' ELSE 'gc' END SITE, CASE WHEN site.name='fq2' THEN 'v8-qa.tabbusiness.com' WHEN site.name='fb' THEN 'v8-qa.smallbusiness.tabbusiness.com' WHEN site.name='qn-latino' THEN 'v8-qa.latino.tabdays.com' ELSE 'v8-qa.tabdays.com' END SERVER from gnasmomap mm, gnaschannelfileassociation cfa, gnaschannel ch, gnassite site where mm.keystring1 = '${oracle-article.VCMID}' and mm.recordid = cfa.vcmobjectid and cfa.channelid = ch.id and ch.siteid = site.id and rownum = 1 field column=SITE name=site/ entity name=url dataSource=oracle query=select 'http://' || '${site.SERVER}' || furl as URL from tab_furl where parent_id = '${oracle-article.VCMID}' field column=URL name=url/ /entity entity name=image dataSource=oracle transformer=script:hasImageURL query=select distinct('http://qa.global.fqstatic.com' || sourcepath) as IMAGE_URL from ( select mc.sourcepath from tab_rel_content rc, tab_story2 st, gnasmomap mm, dsx_media_common mc where rc.parent_id = '${oracle-article.VCMID}' and rc.parent_id = st.id and (st.NO_FEATURED_MEDIA != 'yes' OR st.NO_FEATURED_MEDIA is null) and rc.ref_id = mm.recordid and mm.keystring1 = mc.mediaid and rc.rank = 1 union all select mc.sourcepath from tab_rel_content arm, tab_story2 st, gnasmomap cmm, tab_rel_media crm, gnasmomap mmm, dsx_media_common mc where arm.parent_id = '${oracle-article.VCMID}' and arm.parent_id = st.id and
Re: Error while starting Solr on Websphere
On 25 April 2013 01:42, Van Tassell, Kristian kristian.vantass...@siemens.com wrote: I'm having this same issue (versions and all). Was there ever a response to this question? I can't seem to find one. Thanks in advance! [...] As the error message says, my first guess would be that solr/home is not set properly. Please see: http://wiki.apache.org/solr/SolrInstall#Setup and also http://wiki.apache.org/solr/SolrWebSphere You might also want to first try to get Solr working with the embedded Jetty as that is the most straightforward way to get started, and is fine for production use also. Regards, Gora
RE: Error while starting Solr on Websphere
I never got it to work on Websphere 8.5. We are using Websphere 7 in production, so I deployed the same app (no changes) and it worked on Websphere 7.
RE: Error while starting Solr on Websphere
Thanks for the reply. I have setup Solr on Jetty, Tomcat, JBoss and WebLogic (we have to be able to deploy to multiple server types for our doc). On this particular machine, I've set up WebLogic as well with the same Solr home (although WebLogic is stopped at the moment so they don't compete over the same index). For the WebSphere instance, I have the Solr home defined by the startup script (defined by JAVA_OPT, essentially, as -Dsolr.solr.home=D:/solr - which is very similar to how WebLogic, JBoss and Tomcat are set up). Anyways, perhaps I'll set up a separate solr home entirely from the WebLogic instance (just to be sure). -Original Message- From: Gora Mohanty [mailto:g...@mimirtech.com] Sent: Wednesday, April 24, 2013 3:31 PM To: solr-user@lucene.apache.org Subject: Re: Error while starting Solr on Websphere On 25 April 2013 01:42, Van Tassell, Kristian kristian.vantass...@siemens.com wrote: I'm having this same issue (versions and all). Was there ever a response to this question? I can't seem to find one. Thanks in advance! [...] As the error message says, my first guess would be that solr/home is not set properly. Please see: http://wiki.apache.org/solr/SolrInstall#Setup and also http://wiki.apache.org/solr/SolrWebSphere You might also want to first try to get Solr working with the embedded Jetty as that is the most straightforward way to get started, and is fine for production use also. Regards, Gora
Re: Indexing PDF Files
I have added that fields: field name=text type=text_general indexed=true stored=true/ dynamicField name=attr_* type=text_general indexed=true stored=true multiValued=true/ dynamicField name=ignored_* type=ignored/ and I have that definition: fieldtype name=ignored stored=false indexed=false multiValued=true class=solr.StrField / here is my error: ?xml version=1.0 encoding=UTF-8? response lst name=responseHeader int name=status400/int int name=QTime4154/int /lst lst name=error str name=msgERROR: [doc=1] unknown field 'ignored_meta'/str int name=code400/int /lst /response What should I do more? 2013/4/24 Erik Hatcher erik.hatc...@gmail.com Also, at Solr startup time it logs what it loads from those lib elements, so you can see whether it is loading the files you intend to or not. Erik On Apr 24, 2013, at 10:05 , Alexandre Rafalovitch wrote: Have you tried using absolute path to the relevant urls? That will cleanly split the problem into 'still not working' and 'wrong relative path'. Regards, Alex. On Wed, Apr 24, 2013 at 9:02 AM, Furkan KAMACI furkankam...@gmail.com wrote: lib dir=../../../contrib/extraction/lib regex=.*\.jar / lib dir=../../../dist/ regex=solr-cell-\d.*\.jar / Personal blog: http://blog.outerthoughts.com/ LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch - Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book)
filter before facet
We're testing SolrCloud 4.1 for NRT search over hundreds of millions of documents. I've been really impressed. The query performance is so much better than we were getting out of our database. With filter queries, we're able to get query times of less than 100ms under moderate load. That's amazing. My question today is on faceting. Let me give some examples to help make my point. *fq=state:California* numFound = 92193 QTime = *80* *fq=state:Calforni* numFound = 0 QTime = *8* *fq=state:Californiafacet=truefacet.field=city* numFound = 92193 QTime = *1316* *fq=city:San Franciscofacet=truefacet.field=city* numFound = 1961 QTime = *1477* *fq=state:Californifacet=truefacet.field=city* numFound = 0 QTime = *1380* So filtering is fast and faceting is slow, which is understandable. But why is it slow to generate facets on a result set of 0? Furthermore, why does it take the same amount of time to generate facets on a result set of 2000 as 100,000 documents? This leads me to believe that the FQ is being applied AFTER the facets are calculated on the whole data set. For my use case it would make a ton of sense to apply the FQ first and then facet. Is it possible to specify this behavior or do I need to get into the code and get my hands dirty? Best Regards, Daniel
Re: Indexing PDF Files
Wrong case for fieldType? Though I would have thought Solr would complain about that when it hits a dynamicField with an unknown type. Regards, Alex. Personal blog: http://blog.outerthoughts.com/ LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch - Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book) On Wed, Apr 24, 2013 at 4:59 PM, Furkan KAMACI furkankam...@gmail.com wrote: I have added that fields: field name=text type=text_general indexed=true stored=true/ dynamicField name=attr_* type=text_general indexed=true stored=true multiValued=true/ dynamicField name=ignored_* type=ignored/ and I have that definition: fieldtype name=ignored stored=false indexed=false multiValued=true class=solr.StrField / here is my error: ?xml version=1.0 encoding=UTF-8? response lst name=responseHeader int name=status400/int int name=QTime4154/int /lst lst name=error str name=msgERROR: [doc=1] unknown field 'ignored_meta'/str int name=code400/int /lst /response What should I do more? 2013/4/24 Erik Hatcher erik.hatc...@gmail.com Also, at Solr startup time it logs what it loads from those lib elements, so you can see whether it is loading the files you intend to or not. Erik On Apr 24, 2013, at 10:05 , Alexandre Rafalovitch wrote: Have you tried using absolute path to the relevant urls? That will cleanly split the problem into 'still not working' and 'wrong relative path'. Regards, Alex. On Wed, Apr 24, 2013 at 9:02 AM, Furkan KAMACI furkankam...@gmail.com wrote: lib dir=../../../contrib/extraction/lib regex=.*\.jar / lib dir=../../../dist/ regex=solr-cell-\d.*\.jar / Personal blog: http://blog.outerthoughts.com/ LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch - Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book)
Re: Indexing PDF Files
Did you restart after adding those fields and types? On Apr 24, 2013, at 16:59, Furkan KAMACI furkankam...@gmail.com wrote: [Furkan's message with the field definitions and the "unknown field 'ignored_meta'" error, quoted in full in the previous message, snipped]
Re: filter before facet
What's your facet.method? Have you tried setting it both ways? http://wiki.apache.org/solr/SimpleFacetParameters#facet.method Regards, Alex. Personal blog: http://blog.outerthoughts.com/ LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch - Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book) On Wed, Apr 24, 2013 at 5:10 PM, Daniel Tyreus dan...@webshots.com wrote: [Daniel's original question, quoted in full above, snipped]
***Immediate requirement for Java Solr search consultant at Bothell, WA***
Hello Professionals, This is DWAYNE from KRG Technologies; KRG is headquartered in Valencia, CA. Incorporated in 2003, we currently have over 200 consultants. We specialize in providing Staffing Services Solutions in the Americas. We are a Tier 1 vendor providing Professional Services on diversified IT skills for many customers across the country. We are looking for a *Java Solr search consultant* for the job description below. Kindly forward me your consultant's resume, rate and contact details for further processing. I also kindly request you to forward this opportunity to your friends or colleagues, so that we can help someone who may be in search of a job or looking for a change. Location - Bothell, WA Duration - 6+ Months Job Description: 1. Experience with Apache Solr Search is required. Other search products such as the Endeca search engine can also be considered. 2. Should be familiar with search concepts, solutions and terminologies. 3. Should possess very good knowledge of Java/J2EE/JSP/JQuery/Ajax/JS/XML/XSL/JSON/HTML technologies. 4. Prior experience in Agile Methodologies, Webtrends Reporting, and J2EE Design Patterns will be an advantage. Thanks & Regards, Dwayne 25000 Avenue Stanford, # 243 Valencia, CA 91355 Direct Phone: (661) 310 1677 | Fax: (661) 257-9968 Email: dwa...@krgtech.com | URL: www.krgtech.com -- View this message in context: http://lucene.472066.n3.nabble.com/Immediate-requirement-for-Java-Solr-search-consultant-at-Bothell-WA-tp4058711.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr consultant recommendation
: Subject: Solr consultant recommendation : In-Reply-To: e8a79384-5570-4777-b90c-c59c89cf4...@cominvent.com https://people.apache.org/~hossman/#threadhijack Thread Hijacking on Mailing Lists When starting a new discussion on a mailing list, please do not reply to an existing message, instead start a fresh email. Even if you change the subject line of your email, other mail headers still track which thread you replied to and your question is hidden in that thread and gets less attention. It makes following discussions in the mailing list archives particularly difficult. -Hoss
Re: Too many unique terms
Hey Erick, thanks for the interesting reply. Indexing unicode characters is not a problem as I see it, nor is indexing mails. I'm alright with defining as useless a word that is unique through all my index. I will try the reindexing strategy you proposed, though, as you said, having a few million stop words will not be an easy task to maintain. On top of that I will reduce the memory chunks that get saved to RAM, as most of it is trash. As my problem seems to be very specific I think I'll turn to the code to check how I can do it on my own. Hope this adventure will go well. Cheers, Manu On Wed, Apr 24, 2013 at 2:10 PM, Erick Erickson erickerick...@gmail.com wrote: Even if you could know ahead of time, 7M stop words is a lot to maintain. But assuming that your index is really pretty static, you could consider building it once, then creating the stopword file from unique terms and re-indexing. You could consider cleaning them on the input side or creating a custom filter that, say, checked against a dictionary (that you'd have to find). There's nothing that I know of that'll allow you to delete unique terms from a static index. About a regex, you could use PatternReplaceCharFilterFactory to remove them from your input stream, but the trick is defining useless. Part numbers are really useful in some situations, for instance. There's nothing standard because there's no standard. You haven't, for instance, provided any criteria for what useless is. Do you care about e-mails? What about accents? Unicode? The list gets pretty endless. You should be able to write a regex that removes everything non-alphanumeric or some such, for instance, although even that is a problem if you're indexing anything but plain-vanilla English. The Java pre-defined '\w', for instance, refers to [a-zA-Z_0-9]. Nary an accented character in sight. 
Best, Erick On Tue, Apr 23, 2013 at 3:53 PM, Manuel Le Normand manuel.lenorm...@gmail.com wrote: Hi there, Looking at one of my shards (about 1M docs) I see a lot of unique terms, more than 8M, which is a significant part of my total term count. These are very likely useless terms, binaries or other meaningless numbers that come with a few of my docs. I am totally fine with deleting them so that these terms would be unsearchable. Thinking about it I get that: 1. It is impossible to know a priori whether a term is unique or not, so I cannot add them to my stop words. 2. I have a performance decrease because my cached chunks contain useless data, and I'm short on memory. Assuming a constant index, is there a way of deleting all terms that are unique from at least the dictionary (tim and tip) files? Will I get a significant query-time performance increase? Does anybody know a class of regex that identifies meaningless terms that I can add to my updateProcessor? Thanks, Manu
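As a rough illustration of the regex-based cleanup Erick describes, here is a sketch of a keep/drop token heuristic. The specific rules (length caps, digit runs) are invented examples, not a recommendation; as Erick says, what counts as "useless" is application-specific, and part numbers or e-mail addresses may matter in your data.

```python
import re

# Java's default \w is [a-zA-Z_0-9]; Python 3's \w is Unicode-aware,
# so accented words like "café" survive this filter.
WORD = re.compile(r"^\w+$")

def keep_token(tok, max_len=40):
    """Drop tokens that look like binary junk or meaningless numbers."""
    if len(tok) > max_len:                # long base64-ish blobs
        return False
    if tok.isdigit() and len(tok) > 8:    # long opaque numbers
        return False
    return bool(WORD.match(tok))          # reject punctuation-laden tokens

tokens = ["café", "hello", "1234567890123", "x" * 50, "AB-12"]
assert [t for t in tokens if keep_token(t)] == ["café", "hello"]
```

In Solr terms the same idea would live in a PatternReplaceCharFilterFactory or a custom TokenFilter applied at index time, so the junk never reaches the term dictionary.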
how to get display Jessionid with solr results
Hi, We are using Jetty as a container for Solr 3.6. We have two slave servers to serve user queries, and queries are distributed to either slave through a load balancer. When a user sends a first search request, say it goes to slave1; when that user queries again we want to send the query to the same server with the help of the JSESSIONID. How can we achieve this? How can we get that JSESSIONID with the Solr search results? Please provide your suggestions. Thanks. -- View this message in context: http://lucene.472066.n3.nabble.com/how-to-get-display-Jessionid-with-solr-results-tp4058751.html Sent from the Solr - User mailing list archive at Nabble.com.
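Note that Solr does not put the session id in its response body; if a JSESSIONID exists at all it travels as a cookie set by Jetty, and session affinity is normally configured at the load balancer (sticky sessions keyed on that cookie) rather than in Solr. As a hedged sketch of the routing idea, with hypothetical slave URLs:

```python
import hashlib

# Hypothetical slave hosts; the affinity logic would normally live in
# the load balancer, not in application code.
SLAVES = ["http://slave1:8983/solr", "http://slave2:8983/solr"]

def pick_slave(jsessionid: str) -> str:
    """Route the same session id to the same slave every time."""
    digest = hashlib.md5(jsessionid.encode()).digest()
    return SLAVES[digest[0] % len(SLAVES)]

# The same session always lands on the same server:
assert pick_slave("abc123") == pick_slave("abc123")
```

If the goal is only cache locality (both slaves serve identical indexes), hashing any stable client identifier works the same way; the session id is just a convenient one.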
Re: Indexing PDF Files
I just want to search on rich documents but I still get the same error. I have copied the example folder somewhere else on my computer. I have copied the dist and contrib folders from my build folder into that copy of the example folder (because solr-cell etc. are within those folders). However I still get the same error. If any of you could help me you are welcome. Here is my schema:

<?xml version="1.0" encoding="UTF-8" ?>
<!--
 Licensed to the Apache Software Foundation (ASF) under one or more
 contributor license agreements. See the NOTICE file distributed with
 this work for additional information regarding copyright ownership.
 The ASF licenses this file to You under the Apache License, Version 2.0
 (the "License"); you may not use this file except in compliance with
 the License. You may obtain a copy of the License at
 http://www.apache.org/licenses/LICENSE-2.0
 Unless required by applicable law or agreed to in writing, software
 distributed under the License is distributed on an "AS IS" BASIS,
 WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 See the License for the specific language governing permissions and
 limitations under the License.
-->
<!--
 Description: This document contains Solr 4.x schema definition to be
 used with Solr integration currently build into Nutch. This schema is
 not minimal, there are some useful field type definitions left, and the
 set of fields and their flags (indexed/stored/term vectors) can be
 further optimized depending on needs. See
 http://svn.apache.org/viewvc/lucene/dev/trunk/solr/example/solr/conf/schema.xml?view=markup
 for more info.
-->
<schema name="nutch" version="1.5">
 <types>
  <!-- The StrField type is not analyzed, but indexed/stored verbatim. -->
  <fieldType name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>

  <!-- Default numeric field types. For faster range queries, consider
       the tint/tfloat/tlong/tdouble types. -->
  <fieldType name="int" class="solr.TrieIntField" precisionStep="0" omitNorms="true" positionIncrementGap="0"/>
  <fieldType name="float" class="solr.TrieFloatField" precisionStep="0" omitNorms="true" positionIncrementGap="0"/>
  <fieldType name="long" class="solr.TrieLongField" precisionStep="0" omitNorms="true" positionIncrementGap="0"/>
  <fieldType name="double" class="solr.TrieDoubleField" precisionStep="0" omitNorms="true" positionIncrementGap="0"/>

  <!-- Numeric field types that index each value at various levels of
       precision to accelerate range queries when the number of values
       between the range endpoints is large. See the javadoc for
       NumericRangeQuery for internal implementation details. Smaller
       precisionStep values (specified in bits) will lead to more tokens
       indexed per value, slightly larger index size, and faster range
       queries. A precisionStep of 0 disables indexing at different
       precision levels. -->
  <fieldType name="tint" class="solr.TrieIntField" precisionStep="8" omitNorms="true" positionIncrementGap="0"/>
  <fieldType name="tfloat" class="solr.TrieFloatField" precisionStep="8" omitNorms="true" positionIncrementGap="0"/>
  <fieldType name="tlong" class="solr.TrieLongField" precisionStep="8" omitNorms="true" positionIncrementGap="0"/>
  <fieldType name="tdouble" class="solr.TrieDoubleField" precisionStep="8" omitNorms="true" positionIncrementGap="0"/>

  <!-- The format for this date field is of the form 1995-12-31T23:59:59Z,
       and is a more restricted form of the canonical representation of
       dateTime http://www.w3.org/TR/xmlschema-2/#dateTime
       The trailing "Z" designates UTC time and is mandatory.
       Optional fractional seconds are allowed: 1995-12-31T23:59:59.999Z
       All other components are mandatory.
       Expressions can also be used to denote calculations that should be
       performed relative to "NOW" to determine the value, ie...
         NOW/HOUR ... Round to the start of the current hour
         NOW-1DAY ... Exactly 1 day prior to now
         NOW/DAY+6MONTHS+3DAYS ... 6 months and 3 days in the future from
         the start of the current day
       Consult the DateField javadocs for more information.
       Note: For faster range queries, consider the tdate type -->
  <fieldType name="date" class="solr.TrieDateField" omitNorms="true" precisionStep="0" positionIncrementGap="0"/>

  <!-- A Trie based date field for faster date range queries and date faceting. -->
  <fieldType name="tdate" class="solr.TrieDateField" omitNorms="true" precisionStep="6" positionIncrementGap="0"/>

  <!-- solr.TextField allows the specification of custom text analyzers
       specified as a tokenizer and a list of token filters. Different
       analyzers may be specified for indexing and querying. The optional
       positionIncrementGap puts space between multiple fields of this
       type on the same document, with the purpose of preventing false
       phrase matching across fields. For more info on customizing your
       analyzer chain, please see
       http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters -->

  <!-- A general text field that has reasonable, generic cross-language
       defaults: it tokenizes with StandardTokenizer, removes stop words
       from case-insensitive "stopwords.txt" (empty by default), and down
       cases. At query time only, it also applies synonyms. -->
  <fieldType name="text_general"
Re: Indexing PDF Files
Here is my definition for the handler:

<requestHandler name="/update/extract" class="solr.extraction.ExtractingRequestHandler">
  <lst name="defaults">
    <str name="fmap.content">text</str>
    <str name="lowernames">true</str>
    <str name="uprefix">attr_</str>
    <str name="captureAttr">true</str>
  </lst>
</requestHandler>

2013/4/25 Furkan KAMACI furkankam...@gmail.com I just want to search on rich documents but I still get the same error. I have copied the example folder somewhere else on my computer. I have copied the dist and contrib folders from my build folder into that copy of the example folder (because solr-cell etc. are within those folders). However I still get the same error. If any of you could help me you are welcome. Here is my schema: [the schema quoted in full in the previous message, snipped]
Fwd: SolrException: copyField dest :'author_s' is not an explicit field and doesn't match a dynamicField.
When I try to configure schema.xml within the example folder and start the jar file I get this error: org.apache.solr.common.SolrException: copyField dest :'author_s' is not an explicit field and doesn't match a dynamicField. There is nothing about it in the example schema.xml file?
Solr 4.3: Too late to improve error messages?
Hello, I am testing 4.3rc3. It looks OK, but I notice that some log messages could be more informative. For example: 680 [coreLoadExecutor-3-thread-3] WARN org.apache.solr.schema.IndexSchema – schema has no name! It would be _very nice_ to know which core this is complaining about. Later, once the core is loaded, the core name shows up in the logs, but it would be nice to have it earlier without having to triangulate it through 'Loading core' messages. Is it too late for 4.3? I know somebody was looking at logging, so maybe there is a chance. Regards, Alex. Personal blog: http://blog.outerthoughts.com/ LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch - Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book)
Re: filter before facet
I'm actually using one not listed in that doc (I suspect it's new). At least with 3 or more facet fields, the FCS method is by far the best. Here are some representative numbers with everything the same except for the facet.method:

facet.method = fc    QTime = 3168
facet.method = enum  QTime = 309
facet.method = fcs   QTime = 19

On Wed, Apr 24, 2013 at 2:19 PM, Alexandre Rafalovitch arafa...@gmail.com wrote: What's your facet.method? Have you tried setting it both ways? http://wiki.apache.org/solr/SimpleFacetParameters#facet.method Regards, Alex. [the rest of the thread, quoted in full above, snipped] Best Regards, Daniel
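For reference, a small sketch of how the winning request might be assembled. The host and handler path are assumptions (a stock /select handler on localhost); facet.method=fcs is the per-segment field-cache method that did best here with 3+ facet fields.

```python
from urllib.parse import urlencode

# Build the query string for the fastest variant above; urlencode
# handles the ':' in the fq value and joins parameters with '&'.
params = urlencode({
    "q": "*:*",
    "fq": "state:California",
    "facet": "true",
    "facet.field": "city",
    "facet.method": "fcs",
})
url = "http://localhost:8983/solr/select?" + params  # assumed endpoint
assert "facet.method=fcs" in params
```

Whether fcs wins depends on index shape (segment count, field cardinality, single- vs multi-valued fields), so it is worth benchmarking all three methods on your own data as Daniel did.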
Re: Indexing PDF Files
You still seem to have 'fieldtype' with the wrong case. Can you try that simple thing before doing other complicated steps? And yes, restart Solr after you change schema.xml. Regards, Alex. Personal blog: http://blog.outerthoughts.com/ LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch - Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book) On Wed, Apr 24, 2013 at 6:50 PM, Furkan KAMACI furkankam...@gmail.com wrote: [Furkan's requestHandler definition and quoted schema, which appear in full earlier in this thread, snipped]
Re: SolrException: copyField dest :'author_s' is not an explicit field and doesn't match a dynamicField.
You are running 4.2, right? If you searched the mailing list, you would probably have found that this is a regression: https://issues.apache.org/jira/browse/SOLR-4567 It should be fixed in 4.3 (I reported this originally and it works in 4.3rc3). Regards, Alex. Personal blog: http://blog.outerthoughts.com/ LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch - Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book) On Wed, Apr 24, 2013 at 7:02 PM, Furkan KAMACI furkankam...@gmail.com wrote: When I try to configure schema.xml within the example folder and start the jar file I get this error: org.apache.solr.common.SolrException: copyField dest :'author_s' is not an explicit field and doesn't match a dynamicField. There is nothing about it in the example schema.xml file?
Re: Indexing PDF Files
Hi Alex; What do you mean by the wrong case? Could you tell me what I should do? 2013/4/25 Alexandre Rafalovitch arafa...@gmail.com You still seem to have 'fieldtype' with the wrong case. Can you try that simple thing before doing other complicated steps? And yes, restart Solr after you change schema.xml. Regards, Alex. Personal blog: http://blog.outerthoughts.com/ LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch - Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book) [the rest of the quoted thread, including the schema, snipped]
Re: SolrException: copyField dest :'author_s' is not an explicit field and doesn't match a dynamicField.
Yes, I use 4.2.1, thanks.

2013/4/25 Alexandre Rafalovitch arafa...@gmail.com

You are running 4.2, right? If you searched the mailing list, you would probably find that this is a regression: https://issues.apache.org/jira/browse/SOLR-4567 Should be fixed in 4.3 (I reported this originally and it works in 4.3rc3). Regards, Alex.
Personal blog: http://blog.outerthoughts.com/ LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch - Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book)

On Wed, Apr 24, 2013 at 7:02 PM, Furkan KAMACI furkankam...@gmail.com wrote:

When I try to configure schema.xml within the example folder and start the jar file, I get this error: org.apache.solr.common.SolrException: copyField dest :'author_s' is not an explicit field and doesn't match a dynamicField. There is nothing about it in the example schema.xml file.
Re: Indexing PDF Files
In your schema you have written

  <fieldtype name="ignored" stored="false" indexed="false" multiValued="true" class="solr.StrField" />

Note that XML tag and param names are case sensitive, so instead of fieldtype you should use fieldType. I see that you have the same error for several fieldTypes in your schema, probably resulting in other similar errors too.

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Solr Training - www.solrtraining.com

25. apr. 2013 kl. 01:10 skrev Furkan KAMACI furkankam...@gmail.com:

Hi Alex; What do you mean by wrong case? Could you tell me what I should do?

2013/4/25 Alexandre Rafalovitch arafa...@gmail.com

You still seem to have 'fieldtype' with the wrong case. Can you try that simple thing before doing other complicated steps? And yes, restart Solr after you change schema.xml. Regards, Alex.
Personal blog: http://blog.outerthoughts.com/ LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch - Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book)

On Wed, Apr 24, 2013 at 6:50 PM, Furkan KAMACI furkankam...@gmail.com wrote:

Here is my definition for the handler:

<requestHandler name="/update/extract" class="solr.extraction.ExtractingRequestHandler">
  <lst name="defaults">
    <str name="fmap.content">text</str>
    <str name="lowernames">true</str>
    <str name="uprefix">attr_</str>
    <str name="captureAttr">true</str>
  </lst>
</requestHandler>

2013/4/25 Furkan KAMACI furkankam...@gmail.com

I just want to search on rich documents but I still get the same error. I have copied the example folder somewhere else on my computer, and I have copied the dist and contrib folders from my build folder into that copy of the example folder (because solr-cell etc. are within those folders). However, I still get the same error. If any of you could help me, you are welcome. Here is my schema:

<?xml version="1.0" encoding="UTF-8" ?>
<!-- Licensed to the Apache Software Foundation (ASF) under one or more contributor license agreements. See the NOTICE file distributed with this work for additional information regarding copyright ownership. The ASF licenses this file to You under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License. -->
<!-- Description: This document contains a Solr 4.x schema definition to be used with the Solr integration currently built into Nutch. This schema is not minimal: some useful field type definitions are left in, and the set of fields and their flags (indexed/stored/term vectors) can be further optimized depending on needs. See http://svn.apache.org/viewvc/lucene/dev/trunk/solr/example/solr/conf/schema.xml?view=markup for more info. -->
<schema name="nutch" version="1.5">
  <types>
    <!-- The StrField type is not analyzed, but indexed/stored verbatim. -->
    <fieldType name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>
    <!-- Default numeric field types. For faster range queries, consider the tint/tfloat/tlong/tdouble types. -->
    <fieldType name="int" class="solr.TrieIntField" precisionStep="0" omitNorms="true" positionIncrementGap="0"/>
    <fieldType name="float" class="solr.TrieFloatField" precisionStep="0" omitNorms="true" positionIncrementGap="0"/>
    <fieldType name="long" class="solr.TrieLongField" precisionStep="0" omitNorms="true" positionIncrementGap="0"/>
    <fieldType name="double" class="solr.TrieDoubleField" precisionStep="0" omitNorms="true" positionIncrementGap="0"/>
    <!-- Numeric field types that index each value at various levels of precision to accelerate range queries when the number of values between the range endpoints is large. See the javadoc for NumericRangeQuery for internal implementation details. Smaller precisionStep values (specified in bits) will lead to more tokens indexed per value, slightly larger index size, and faster range queries. A precisionStep of 0 disables indexing at different precision levels. -->
    <fieldType name="tint" class="solr.TrieIntField" precisionStep="8" omitNorms="true" positionIncrementGap="0"/>
    <fieldType name="tfloat" class="solr.TrieFloatField" precisionStep="8" omitNorms="true" positionIncrementGap="0"/>
    <fieldType name="tlong" class="solr.TrieLongField" precisionStep="8" omitNorms="true" positionIncrementGap="0"/>
    <fieldType name="tdouble" class="solr.TrieDoubleField" precisionStep="8" omitNorms="true" positionIncrementGap="0"/>
    <!-- The format for this date field is of the form 1995-12-31T23:59:59Z, and is a more restricted form of the canonical representation of dateTime http://www.w3.org/TR/xmlschema-2/#dateTime
Re: Solr 4.3: Too late to improve error messages?
On 4/24/2013 5:02 PM, Alexandre Rafalovitch wrote:

I am testing 4.3rc3. It looks ok, but I notice that some log messages could be more informative. For example:

680 [coreLoadExecutor-3-thread-3] WARN org.apache.solr.schema.IndexSchema – schema has no name!

Would be _very nice_ to know which core this is complaining about. Later, once the core is loaded, the core name shows up in the logs, but it would be nice to have it earlier without having to triangulate it through 'Loading core' messages. Is that too late for 4.3? I know somebody was looking at logging, so maybe there is a chance.

I haven't been around as long as the guys who make the decisions, but I am fairly sure that there won't be a new release candidate for a cosmetic issue. Make sure the issue is filed in Jira so that it can be fixed in 4.4. This is something that should definitely be fixed, but from what I have seen, only serious bugs will trigger a new RC, and that's only if they don't have a viable workaround and can be fixed quickly.

Thanks, Shawn
Re: SolrException: copyField dest :'author_s' is not an explicit field and doesn't match a dynamicField.
Alexandre, Furkan reports an error about a copyField *dest* - SOLR-4567 was about the copyField *source*, and the fix was included in 4.2.1. In order for copyFields to work, the dest *must* match a field or dynamicField declaration in the schema - otherwise there's no way to know what type the destination field is. Furkan, can you give the parts of your schema that are involved here? Maybe you just need to add a *_s dynamicField with type=string? Steve On Apr 24, 2013, at 7:08 PM, Alexandre Rafalovitch arafa...@gmail.com wrote: You are running 4.2, right? If you searched mailing list, you would probably find that this is a regression: https://issues.apache.org/jira/browse/SOLR-4567 Should be fixed in 4.3 (I reported this originally and it works in 4.3rc3). Regards, Alex. Personal blog: http://blog.outerthoughts.com/ LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch - Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book) On Wed, Apr 24, 2013 at 7:02 PM, Furkan KAMACI furkankam...@gmail.com wrote: When I try to configure schema.xml within example folder and start jar file I get that error: org.apache.solr.common.SolrException: copyField dest :'author_s' is not an explicit field and doesn't match a dynamicField. There is nothing about it at example schema.xml file?
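Steve's suggestion, written out as a schema fragment (placement inside the schema's fields section and the indexed/stored flags are assumptions; adjust to taste):

```xml
<!-- catch-all destination for copyField targets ending in _s;
     string type means the value is indexed/stored verbatim -->
<dynamicField name="*_s" type="string" indexed="true" stored="true"/>
```

With this declaration in place, a copyField whose dest is author_s matches the *_s pattern and the original error should no longer be raised.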
Re: Pushing a whole set of pdf-files to solr
I am still struggling with this. I have solr 4.2.1.2013.03.26.08.26.55 installed. So are you telling me that I should somehow install the older version of that tool that comes with Solr 3.x? Because with the newer version I get the errors I already mentioned. Now I suppose I may be an atypical user, as I am running all of this under Windows and really just want to find an easy way to get a whole bunch of files from a local folder (on my hard drive) into my local version of Solr. But is there really no easier way of doing this? -Stephan -- View this message in context: http://lucene.472066.n3.nabble.com/Pushing-a-whole-set-of-pdf-files-to-solr-tp4025256p4058776.html Sent from the Solr - User mailing list archive at Nabble.com.
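For what it's worth, pushing a folder of PDFs to the extract handler can be scripted in a few lines; a stdlib-only Python sketch (the base URL, the id-from-filename scheme, and the folder path are assumptions, not something from this thread):

```python
import os
import urllib.parse
import urllib.request

SOLR_EXTRACT = "http://localhost:8983/solr/update/extract"  # assumed endpoint

def extract_url(base, doc_id, commit=False):
    """Build the /update/extract URL for one document id."""
    params = {"literal.id": doc_id}
    if commit:
        params["commit"] = "true"
    return base + "?" + urllib.parse.urlencode(params)

def iter_pdfs(root):
    """Yield paths of all *.pdf files under root (case-insensitive)."""
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            if name.lower().endswith(".pdf"):
                yield os.path.join(dirpath, name)

def post_pdf(path):
    # id derived from the filename; a real deployment needs a stabler scheme
    doc_id = os.path.splitext(os.path.basename(path))[0]
    with open(path, "rb") as f:
        req = urllib.request.Request(
            extract_url(SOLR_EXTRACT, doc_id, commit=True),
            data=f.read(),
            headers={"Content-Type": "application/pdf"},
        )
        urllib.request.urlopen(req)

# Usage (hypothetical Windows folder):
# for pdf in iter_pdfs(r"C:\my\pdf\folder"):
#     post_pdf(pdf)
```

Committing per document is slow for large folders; issuing one commit at the end (or relying on autoCommit) would be the usual choice.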
Re: Solr 4.3: Too late to improve error messages?
Thanks Shawn, I will create a JIRA. I just wasn't sure whether there was another RC afterwards and whether it could fit in there. Not very familiar with the process yet. Regards, Alex.
Personal blog: http://blog.outerthoughts.com/ LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch - Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book)

On Wed, Apr 24, 2013 at 7:27 PM, Shawn Heisey s...@elyograg.org wrote: [quoted message snipped - see Shawn's reply earlier in this thread]
Re: Indexing PDF Files
Does the stock Solr example work for document import? Here's a sample command that I use:

curl "http://localhost:8983/solr/update/extract?literal.id=doc1&fmap.content=features&commit=true" -F myfile=@myfile.PDF

That works with the stock Solr example, without any changes. At least get that working before moving on to the challenge of Solr under Tomcat. Note: The text field is not stored, so you can't retrieve the content/body of a document from that field.

If the stock Solr example works for you, then you just need to consult the Tomcat documentation as to how to configure lib/jars for an app. Also, go into solrconfig and look for:

<lib dir="../../../contrib/extraction/lib" regex=".*\.jar" />
<lib dir="../../../dist/" regex="solr-cell-\d.*\.jar" />

Those lines work for the stock Solr example because you cd to the example directory, but they won't work for Tomcat since the cwd is somewhere else. My vague recollection is that if you let Tomcat expand the war file, then you can go into the directory containing the expanded files and edit/move as you want. Like, put the necessary directory names in the above two lines. When Solr starts, it should display log messages about which directories are being used for these lib elements (INFO org.apache.solr.core.SolrConfig – Adding specified lib dirs to ClassLoader).

-- Jack Krupansky

-----Original Message----- From: Furkan KAMACI Sent: Wednesday, April 24, 2013 6:50 PM To: solr-user@lucene.apache.org Subject: Re: Indexing PDF Files

Here is my definition for the handler:

<requestHandler name="/update/extract" class="solr.extraction.ExtractingRequestHandler">
  <lst name="defaults">
    <str name="fmap.content">text</str>
    <str name="lowernames">true</str>
    <str name="uprefix">attr_</str>
    <str name="captureAttr">true</str>
  </lst>
</requestHandler>

2013/4/25 Furkan KAMACI furkankam...@gmail.com

I just want to search on rich documents but I still get the same error. I have copied the example folder somewhere else on my computer, and I have copied the dist and contrib folders from my build folder into that copy of the example folder (because solr-cell etc. are within those folders). However, I still get the same error. If any of you could help me, you are welcome. Here is my schema: [schema snipped - same Nutch schema.xml as quoted earlier in this thread]
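Jack's point about the relative lib paths can be handled by making them absolute in solrconfig.xml, so resolution no longer depends on Tomcat's working directory; a sketch (the /opt/solr paths are placeholders for wherever the Solr distribution actually lives):

```xml
<!-- absolute paths: resolved the same way regardless of the servlet container's cwd -->
<lib dir="/opt/solr/contrib/extraction/lib" regex=".*\.jar" />
<lib dir="/opt/solr/dist/" regex="solr-cell-\d.*\.jar" />
```

The startup log lines Jack mentions ("Adding specified lib dirs to ClassLoader") are the quickest way to confirm the directories were actually picked up.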
Re: Pushing a whole set of pdf-files to solr
(Just documenting my experiences). I stopped and restarted solr in the tomcat web application manager. Everything seems fine http://lucene.472066.n3.nabble.com/file/n4058786/4-25-2013_2-38-43_AM.png And yet I still get that same error message. -- View this message in context: http://lucene.472066.n3.nabble.com/Pushing-a-whole-set-of-pdf-files-to-solr-tp4025256p4058786.html Sent from the Solr - User mailing list archive at Nabble.com.