Logic behind Solr creating files in .../data/index path.
All, when we post data to Solr, the data gets stored under the .../data/index path in multiple files with different file extensions. Leaving the extensions aside, I want to know how this number of files is determined. Does anyone know the logic by which these multiple index files are created under data/index? If we run an optimize, the number of files is reduced; otherwise, some N number of files are created. Based on what parameter are they created, and what determines how the file sizes vary? I hope my question is clear.
Re: Logic behind Solr creating files in .../data/index path.
Check: http://lucene.apache.org/java/3_0_2/fileformats.html On Tue, Sep 7, 2010 at 3:16 AM, rajini maski rajinima...@gmail.com wrote: [...]
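To add some background on why the file count fluctuates: each commit can write a new Lucene segment (a group of files sharing one numeric prefix), and segments are merged based on the merge settings in solrconfig.xml. A sketch of the relevant settings, with illustrative values (check your own solrconfig.xml for the actual ones):

```xml
<!-- solrconfig.xml (Solr 1.4-era <mainIndex> section; values are illustrative) -->
<mainIndex>
  <!-- roughly: how many similar-sized segments may accumulate before being merged into one -->
  <mergeFactor>10</mergeFactor>
  <!-- buffered docs are flushed to a new segment when this limit is hit -->
  <maxBufferedDocs>1000</maxBufferedDocs>
  <!-- true packs each segment's many files into a single .cfs file -->
  <useCompoundFile>false</useCompoundFile>
</mainIndex>
```

An optimize forces everything down to a single segment, which is why the file count drops afterwards.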
Nutch/Solr
I tried to combine Nutch and Solr and want to ask something. After crawling, Nutch has certain fields such as content, tstamp, and title. How can I map the content field after crawling? Do I have to change the Lucene code (e.g. add an extra field), or can this be handled on the Solr side? Any suggestion? Thx. -- Yavuz Selim YILMAZ
Re: Nutch/Solr
Depends on your version of Nutch. At least trunk and 1.1 obey the solrmapping.xml file in Nutch's configuration directory. I'd suggest you start with that mapping file and the Solr schema.xml file shipped with Nutch, as it exactly matches the mapping file. Just restart Solr with the new schema (or change the mapping), then crawl, fetch, parse, and update your DBs, and then push the index from Nutch to your Solr instance. On Tuesday 07 September 2010 10:00:47 Yavuz Selim YILMAZ wrote: [...] Markus Jelsma - Technisch Architect - Buyways BV http://www.linkedin.com/in/markus17 050-8536620 / 06-50258350
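For readers new to this setup: the mapping file mentioned above pairs Nutch field names with Solr field names. A hypothetical fragment (the field names here are assumptions; compare against the mapping file actually shipped with your Nutch version):

```xml
<!-- Nutch's Solr mapping file (illustrative sketch; see your Nutch conf/ directory) -->
<mapping>
  <fields>
    <field dest="title" source="title"/>
    <field dest="content" source="content"/>
    <field dest="tstamp" source="tstamp"/>
  </fields>
  <uniqueKey>id</uniqueKey>
</mapping>
```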
Re: Null pointer exception when mixing highlighter shards q.alt
I noticed that long ago. Fixed it doing in HighlightComponent finishStage:

@Override
public void finishStage(ResponseBuilder rb) {
  boolean hasHighlighting = true;
  if (rb.doHighlights && rb.stage == ResponseBuilder.STAGE_GET_FIELDS) {
    Map.Entry<String, Object>[] arr = new NamedList.NamedListEntry[rb.resultIds.size()];
    // TODO: make a generic routine to do automatic merging of id keyed data
    for (ShardRequest sreq : rb.finished) {
      if ((sreq.purpose & ShardRequest.PURPOSE_GET_HIGHLIGHTS) == 0) continue;
      for (ShardResponse srsp : sreq.responses) {
        NamedList hl = (NamedList) srsp.getSolrResponse().getResponse().get("highlighting");
        // patch bug
        if (hl != null) {
          for (int i = 0; i < hl.size(); i++) {
            String id = hl.getName(i);
            ShardDoc sdoc = rb.resultIds.get(id);
            int idx = sdoc.positionInResponse;
            arr[idx] = new NamedList.NamedListEntry(id, hl.getVal(i));
          }
        } else {
          hasHighlighting = false;
        }
      }
    }
    // remove nulls in case not all docs were able to be retrieved
    // patch bug
    if (hasHighlighting) {
      rb.rsp.add("highlighting", removeNulls(new SimpleOrderedMap(arr)));
    }
  }
}

-- View this message in context: http://lucene.472066.n3.nabble.com/Null-pointer-exception-when-mixing-highlighter-shards-q-alt-tp1430353p1431253.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Nutch/Solr
In fact, I used the Nutch 0.9 version, but I am thinking of moving to the new version. If anybody did something like that, I want to learn from their experience. If indexing an XML file, there are specific fields and all of them are dependent on each other, so duplicates don't happen. I want to extract specific fields from the content field. After such extraction, the new fields should be indexed as well, but it seems to me that the content would then be indexed twice for every new field. By the way, any details about how to get new fields from the content will be helpful. -- Yavuz Selim YILMAZ 2010/9/7 Markus Jelsma markus.jel...@buyways.nl [...]
Re: Nutch/Solr
You should: - definitely upgrade to 1.1 (1.2 is on the way), and - subscribe to the Nutch mailing list for Nutch-specific questions. On Tuesday 07 September 2010 10:36:58 Yavuz Selim YILMAZ wrote: [...] Markus Jelsma - Technisch Architect - Buyways BV http://www.linkedin.com/in/markus17 050-8536620 / 06-50258350
Query result ranking - Score independent
Hi all, I need to retrieve query results with a ranking independent of each result's default Lucene score, which means assigning the same score to each query result. I tried to use a zero boost factor ( ^0 ) to reset each query result's score to zero. This strategy seems to work within the example Solr instance, but in my Solr instance, using a zero boost factor causes a Buffer exception:

HTTP Status 500 - null java.lang.IllegalArgumentException
  at java.nio.Buffer.limit(Buffer.java:249)
  at org.apache.lucene.store.NIOFSDirectory$NIOFSIndexInput.readInternal(NIOFSDirectory.java:123)
  at org.apache.lucene.store.BufferedIndexInput.refill(BufferedIndexInput.java:157)
  at org.apache.lucene.store.BufferedIndexInput.readByte(BufferedIndexInput.java:38)
  at org.apache.lucene.store.IndexInput.readInt(IndexInput.java:70)
  at org.apache.lucene.store.IndexInput.readLong(IndexInput.java:93)
  at org.apache.lucene.index.FieldsReader.doc(FieldsReader.java:210)
  at org.apache.lucene.index.SegmentReader.document(SegmentReader.java:948)
  at org.apache.lucene.index.DirectoryReader.document(DirectoryReader.java:506)
  at org.apache.lucene.index.IndexReader.document(IndexReader.java:947)

Do you know any other technique to reset all the query results' scores to some fixed constant value? Each query result should obtain the same score. Any suggestion? Thx -- -- Benedetti Alessandro Personal Page: http://tigerbolt.altervista.org Tyger, tyger burning bright In the forests of the night, What immortal hand or eye Could frame thy fearful symmetry? William Blake - Songs of Experience -1794 England
Re: Alphanumeric wildcard search problem
Thanks for letting us know. What was the magic? I'm still unclear on what was different between my tests and your implementation; mysteries like this make me nervous <G>... Thanks Erick On Mon, Sep 6, 2010 at 5:45 PM, Hasnain hasn...@hotmail.com wrote: Finally got it working, thanks for your help and support -- View this message in context: http://lucene.472066.n3.nabble.com/Alphanumeric-wildcard-search-problem-tp1393332p1429315.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Expanded Synonyms + phrase search
Did you check the ../admin/analysis.jsp page to see how the index and query analyzers behaved? Usually, when you add "parti socialiste" to synonyms-fr.txt, it responds correctly to both "PS et" and "parti socialiste" queries. On Mon, Aug 30, 2010 at 4:55 PM, Xavier Schepler xavier.schep...@sciences-po.fr wrote: Hi, several documents from my index contain the phrase "PS et". However, PS is expanded to "parti socialiste", and a phrase search for "PS et" fails. A phrase search for "parti socialiste et" succeeds. Can I have both queries working? Here's the field type:

<fieldtype name="SyFR" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <!-- Synonyms -->
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms-fr.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
  </analyzer>
</fieldtype>
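One thing worth double-checking in the field type above: Solr applies <charFilter> elements to the raw character stream before the tokenizer runs, regardless of where they appear in the XML, so listing them after the filters is at best misleading. A cleaned-up sketch of the same index analyzer with the conventional ordering (same components, merely reordered; treat this as an untested suggestion, not a verified fix for the phrase-search problem):

```xml
<fieldtype name="SyFR" class="solr.TextField">
  <analyzer type="index">
    <!-- char filters run first, on the raw character stream -->
    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms-fr.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldtype>
```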
Re: Null pointer exception when mixing highlighter shards q.alt
Marc Sturlese wrote: I noticed that long ago. Fixed it doing in HighlightComponent finishStage: ... public void finishStage(ResponseBuilder rb) { ... } Thanks! I'll try that. I also seem to have a similar problem with shards + facets -- in particular, it seems like the error occurs when some of the shards have no values for some of the facets. Any chance you (or anyone else) have a fix for that one too? Here's the backtrace I'm getting from a few-days-old svn trunk.

Sep 7, 2010 6:03:58 AM org.apache.solr.common.SolrException log
SEVERE: java.lang.NullPointerException
  at org.apache.solr.handler.component.FacetComponent.refineFacets(FacetComponent.java:340)
  at org.apache.solr.handler.component.FacetComponent.handleResponses(FacetComponent.java:232)
  at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:301)
  at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
  at org.apache.solr.core.SolrCore.execute(SolrCore.java:1323)
  at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:337)
  at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:240)
  at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1157)
  at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:388)
  at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
  at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
  at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:765)
  at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:418)
  at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
  at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
  at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
  at org.mortbay.jetty.Server.handle(Server.java:326)
  at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
  at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:923)
  at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:547)
  at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
  at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
  at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228)
  at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
Re: Implementing synonym NewBie
If you expect to improve your synonyms file over time, I would recommend query-time synonyms. That way, you don't have to re-index when you need to add something more. On Sat, Aug 28, 2010 at 10:01 AM, Jonty Rhods jonty.rh...@gmail.com wrote: Hi All, I want to use synonyms for my search. I am still in the learning phase of Solr, so please help me implement synonyms in my search. According to the wiki, synonyms can be implemented in two ways: 1) at index time, 2) at search time. I have a combination of 10 phrases for synonyms, so which will be better in my case? Something like: live show in new york = live show in california = live show = live show in DC = live show in USA. Will the synonyms affect my original search? thanks with regards Jonty
How to extend IndexSchema and SchemaField
Hi, I would like to extend the field node in schema.xml by adding new attributes. For example, I would like to be able to write:

<field type="myField" myattribute="myvalue"/>

And be able to access myattribute directly from the IndexSchema and SchemaField objects. However, these two classes are final and also not very easy to extend. Are there any other solutions? thanks, -- Renaud Delbru
Advice requested. How to map 1:M or M:M relationships with support for facets
Hi guys, Question: What is the best way to create a Solr schema which supports a 'multivalue' where the value is a two-item array of an event category and a date? I want to have faceted searches, counts, and date-range ability on both the category and the dates. Details: This is a person database where a Person can have details about them (like address), and a Person has many Events. Events have a category (type of event) and a Date for when that event occurred. At the bottom you will see a simple diagram showing the relationship. Briefly, a Person has many Events, and an Event has a single category and a single person. What I would like to be able to do is: have a facet which shows all of the event categories, with a 'sub-facet' that shows Category + date. For example, if a Category was "Attended Conference" and the date was 2008-09-08, I'd be able to show a count of all "Attended Conference", then have a tree-type control and show the years (for example):

+ Attended Conference (1038)
  +--- 2010 (100)
  +--- 2009 (134)
  +--- 2008 (234)
+ Another Event Category (23432)
  +--- 2010 (234)
  +--- 2009 (245)

Etc. For scale, I expect to have 100 Event Categories and a million person_event records on 250,000 persons. I don't care very much about disk space, so whether it's 1 GB or 100 GB due to indexing, that's okay if the solution works (and it's fast! :-)) Solutions I looked at: * I looked at poly fields, but they seem to be fixed-length and appeared to be of the same type; the typical use case was latitude/longitude. I don't think this will work because there is a variable number of events attached to a person. * I looked at multiValued fields, but they didn't seem to permit two fields having a relationship, i.e. Event Category + Event Date. It seemed to me that they need to be broken out. That's not necessarily a bad thing, but it didn't seem ideal. * I thought about concatenating category + date to create a fake field strictly for faceting purposes, but I believe that will break date ranges. E.g. EventCategoryId + "|" + Date = "1|2009" as a facet would allow me to show counts for that event type. Seems a bit unwieldy to me... What's the group's advice for handling this situation in the best way? Thanks in advance; as always, sorry if this question has been asked and answered a few times already. I googled for a few hours before writing this... but things change so fast with Solr that any article older than a year was suspect to me; also there are so many patches that provide additional functionality... Tim Schema:
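(Tim's actual schema didn't survive in the archive; the fragment below is a separate illustrative sketch, not his.) One common way to model this kind of 1:M relationship is to flatten to one Solr document per person_event row, so that category and date stay correlated and both remain facetable. Field names here are made up for illustration:

```xml
<!-- schema.xml fragment: one Solr document per person_event row (hypothetical names) -->
<field name="person_id"      type="string" indexed="true" stored="true"/>
<field name="event_category" type="string" indexed="true" stored="true"/>
<field name="event_date"     type="date"   indexed="true" stored="true"/>
```

You could then facet on event_category and, within a filter on one category, run date faceting on event_date to get the per-year counts for the tree control. The trade-off is that person-level attributes get duplicated across each person's event documents.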
Re: How to give path in SCRIPT tag?
ankita, your question seems to be somewhat unrelated to Solr / Lucene and should be asked somewhere else, not on this list. Please try to keep the focus of your questions on Solr-related topics, or use java-user@ for Lucene-related topics. Thanks, Simon On Tue, Sep 7, 2010 at 3:46 PM, ankita shinde ankitashinde...@gmail.com wrote: How do I give the path of a folder stored on my local machine in the script tag's 'src' attribute in an HTML file's head tag? Is this correct? <script type="text/javascript" src="C:/evol/core/AbstractManager.js"></script>
RE: solr user
You probably need to use the file:// moniker - if using Firefox, install Firebug and use the Net panel to see if the includes load. -----Original Message----- From: ankita shinde [mailto:ankitashinde...@gmail.com] Sent: 07 September 2010 18:22 To: solr-user@lucene.apache.org Subject: solr user hello all, I am working with AJAX Solr. I'm trying to send a request to Solr to retrieve all XML documents. I have created one folder named source, which is in the C drive. The source folder contains all the .js files. I have tried the following code, but it gives an error: "AjaxSolr is not defined". Can anyone please guide me?

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
  <title>AJAX Solr</title>
  <link rel="stylesheet" type="text/css" href="css/reuters.css" media="screen" />
  <script type="text/javascript" src="C:/source/AbstractManager.js"></script>
  <script type="text/javascript" src="C:/source/Manager.jquery.js"></script>
  <script type="text/javascript" src="C:/source/Parameter.js"></script>
  <script type="text/javascript" src="C:/source/ParameterStore.js"></script>
  <script type="text/javascript" src="C:/source/AbstractWidget.js"></script>
  <script type="text/javascript" src="C:/source/ResultWidget.2.js"></script>
  <script type="text/javascript" src="thm.2.js"></script>
  <script type="text/javascript" src="jquery.min.js"></script>
  <script type="text/javascript" src="retuers.js"></script>
  <script type="text/javascript" src="C:/source/Core.js"></script>
</head>
<body>
  <div id="wrap">
    <div id="header">
      <h1>AJAX Solr Demonstration</h1>
      <h2>Browse Reuters business news from 1987</h2>
    </div>
    <div class="right">
      <div id="result">
        <div id="navigation">
          <ul id="pager"></ul>
          <div id="pager-header"></div>
        </div>
        <div id="docs"></div>
      </div>
    </div>
    <div class="left">
      <h2>Current Selection</h2>
      <ul id="selection"></ul>
      <h2>Search</h2>
      <span id="search_help">(press ESC to close suggestions)</span>
      <ul id="search">
        <input type="text" id="query" name="query"/>
      </ul>
      <h2>Top Topics</h2>
      <div class="tagcloud" id="topics"></div>
      <h2>Top Organisations</h2>
      <div class="tagcloud" id="organisations"></div>
      <h2>Top Exchanges</h2>
      <div class="tagcloud" id="exchanges"></div>
      <h2>By Country</h2>
      <div id="countries"></div>
      <div id="preview"></div>
      <h2>By Date</h2>
      <div id="calendar"></div>
      <div class="clear"></div>
    </div>
    <div class="clear"></div>
  </div>
</body>
</html>
Re: Query result ranking - Score independent
On Sep 7, 2010, at 7:08 AM, Alessandro Benedetti wrote: Hi all, I need to retrieve query results with a ranking independent from each query result's default Lucene score, which means assigning the same score to each query result. I tried to use a zero boost factor ( ^0 ) to reset each query result's score to zero. This strategy seems to work within the example Solr instance, but in my Solr instance, using a zero boost factor causes a Buffer exception [...] Hmm, that stack trace doesn't align with the boost factor. What was your request? I think there might be something else wrong here. Do you know any other technique to reset all the query results' scores to some fixed constant value? Each query result should obtain the same score. Any suggestion? The ConstantScoreQuery or a Filter should do this. You could do something like: q=*:*&fq=the real query, as in q=*:*&fq=field:foo -Grant -- Grant Ingersoll http://lucenerevolution.org Apache Lucene/Solr Conference, Boston Oct 7-8
Re: SolrCloud distributed indexing (Re: anyone use hadoop+solr?)
What if we do not care about the version of a document at index time? When it comes to distributed search, we currently decide how to aggregate documents based on their uniqueKey. But what if we additionally decided on uniqueKey plus indexing date, so that we only aggregate the last indexed version of a document? The concept could look like this: when Solr aggregates the documents for a response, it could record which shard responded with an older version of document x. Now a crawler can crawl through our SolrCloud, asking each shard whether it noticed something like a "shard y got an older version of doc x" case. The crawler aggregates that information. After it finishes crawling, it sends delete-by-query requests to those shards which have older versions of documents than they should have. I will call these stored document versions that are older than the newest version ODVs (Old Document Versions) for better understanding. So, what can happen: before the crawler can visit shard A - which noticed that shard y stores an ODV of doc x - shard A can go down. That's okay, because either another shard noticed the same, or shard A will be available later on. If that information is stored on disk, it will also be available; if it was stored in RAM, the information is lost... however, you could replicate that information over more than one shard, right? :-) Another case: shard y can go down - so someone has to take care of storing the noticed ODV information, so that one can delete the document when shard y comes back. Pros: - You can do something like consistent hashing in connection with a concept where each node has to care for its neighbour nodes. This is because only the neighbour nodes can store ODVs. - Using the described concept, you can do nightly batches, looking for ODVs in the neighbour nodes. - ODVs will be found at request time, so we can avoid returning ODVs over newer versions. Cons: - We are wasting disk space.
- This works only for smaller clusters, not for large ones where the number of machines changes very frequently... This is just another idea - and it is very, very lazy. I must emphasize that I assume neighbour machines do not go down very frequently. Of course, it is not a question of whether a machine crashes but of when it crashes - but I assume that the same server does not crash every hour. :-) Thoughts? Kind regards Andrzej Bialecki wrote: On 2010-09-06 16:41, Yonik Seeley wrote: On Mon, Sep 6, 2010 at 10:18 AM, MitchK mitc...@web.de wrote: [...consistent hashing...] But it doesn't solve the problem at all; correct me if I am wrong, but: if you add a new server, let's call it IP3-1, and IP3-1 is nearer to the current resource X, then doc x will be indexed at IP3-1 - even if IP2-1 holds the older version. Am I right? Right. You still need code to handle migration. Consistent hashing is a way for everyone to be able to agree on the mapping, and for the mapping to change incrementally. I.e., you add a node and it only changes the docid-to-node mapping of a limited percentage of the mappings, rather than changing the mappings of potentially everything, as a simple MOD would do. Another strategy to avoid excessive reindexing is to keep splitting the largest shards, and then your mapping becomes a regular MOD plus a list of these additional splits. Really, there's an infinite number of ways you could implement this... For SolrCloud, I don't think we'll end up using consistent hashing - we don't need it (although some of the concepts may still be useful). I imagine there could be situations where a simple MOD won't do ;) so I think it would be good to hide this strategy behind an interface/abstract class. It costs nothing, and gives you flexibility in how you implement this mapping. -- Best regards, Andrzej Bialecki - Information Retrieval, Semantic Web; Embedded Unix, System Integration - http://www.sigram.com Contact: info at sigram dot com -- View this message in context: http://lucene.472066.n3.nabble.com/anyone-use-hadoop-solr-tp485333p1434329.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: SolrCloud distributed indexing (Re: anyone use hadoop+solr?)
I must add something to my last post: when saying it could be used together with techniques like consistent hashing, I mean it could be used at indexing time for indexing documents, since I assumed that the number of shards does not change frequently and therefore an ODV case becomes relatively infrequent. Furthermore, the overhead of searching for and removing those ODV documents is relatively low. -- View this message in context: http://lucene.472066.n3.nabble.com/anyone-use-hadoop-solr-tp485333p1434364.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Search Results optimization
: also my request handler looks like this:
: <requestHandler name="mb_artists" class="solr.SearchHandler">
:   <lst name="defaults">
:     <str name="defType">dismax</str>
:     <str name="qf">name^2.4</str>
:     <str name="tie">0.1</str>
:   </lst>
: </requestHandler>
that request handler doesn't match up with the output you posted in your previous message -- according to it, you were using qt=standard1 (not qt=mb_artists). The output you posted shows you using a query parser that searched for each word in the "text" field, not the "name" field; it also didn't appear to be the dismax parser at all. Since there seems to be some confusion about what handler/parser you are actually searching with, I suggest getting to the bottom of that; it might explain a lot about the results you are getting. : I really need some help on this, : again, what I want is... if I search for "swingline red stapler", in the results, : docs that have all three keywords should come on top, then docs that have : any 2 keywords, and then docs with 1 keyword - I mean in my sorted order. : thanks Because of the disconnects mentioned above, I didn't look too closely at the score explanations you posted (it's hard to make sense of them since they search a field named "text" and you only posted info about the "name" field), but if, as you mentioned, you're already omitting norms and term freq / positions for the name field, then for the most part this is the sort order you should be getting (if/when you search against the name field instead of the text field). The biggest thing you'll probably have to watch out for with the dismax parser (if/when you use it) is to explicitly set the 'mm' param to something like 0, otherwise documents will be excluded if they only match a small number of the terms. -Hoss -- http://lucenerevolution.org/ ... October 7-8, Boston http://bit.ly/stump-hoss ... Stump The Chump!
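To make Hoss's last point concrete, here is a hypothetical version of such a request handler with 'mm' pinned at 0 so partial matches still rank (the full minimum-should-match syntax is documented on the DisMaxRequestHandler wiki page; treat this as a sketch, not the poster's actual config):

```xml
<requestHandler name="mb_artists" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">dismax</str>
    <str name="qf">name^2.4</str>
    <str name="tie">0.1</str>
    <!-- minimum-should-match: 0 keeps docs matching only some terms, just ranked lower -->
    <str name="mm">0</str>
  </lst>
</requestHandler>
```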
Is there a way to fetch the complete list of data from a particular column in SOLR document?
Hi, I am trying to get the complete list of unique document IDs and compare it with the back end to make sure that both the back end and the SOLR documents are in sync. Is there a way to fetch the complete list of data from a particular column in a SOLR document? Once I get the list, I can easily compare it against the DB and delete the orphan documents. Please let me know if there are any other ideas / suggestions to implement this. Thanks, Barani -- View this message in context: http://lucene.472066.n3.nabble.com/Is-there-a-way-to-fetch-the-complete-list-of-data-from-a-particular-column-in-SOLR-document-tp1435586p1435586.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: shingles work in analyzer but not real data
: Hi Robert, thanks for the response. I've looked into the query parsers a : bit, and I did find that using the raw parser on a matching multi-word : keyword works correctly. I need to have shingling, though, in order to : support query phrases. It seems odd to have the query parser emitting The FieldQParser should work for this -- unlike the raw QParser, it uses the Analyzer for the specified field, but has no metacharacters of its own. -Hoss -- http://lucenerevolution.org/ ... October 7-8, Boston http://bit.ly/stump-hoss ... Stump The Chump!
RE: Is there a way to fetch the complete list of data from a particular column in SOLR document?
q=*:*&fl=id_FIELD&rows=NUM_DOCS ? -Original message- From: bbarani bbar...@gmail.com Sent: Tue 07-09-2010 23:09 To: solr-user@lucene.apache.org; Subject: Is there a way to fetch the complete list of data from a particular column in SOLR document? [...]
Re: FieldCache.DEFAULT.getInts vs FieldCache.DEFAULT.getStringIndex. Memory usage
: I need to load a FieldCache for a field which is a solr integer type and has : as maximum 3 digits. Let's say my index has 10M docs. : I am wondering what is more optimal and less memory consuming, to load a : FieldCache.DEFAULT.getInts or a FieldCache.DEFAULT.getStringIndex. By itself, getInts always uses less memory than getStringIndex. No matter what your data looks like, getStringIndex can never use less memory than getInts. The question however is if any other code is going to use getStringIndex on the same field, defeating any memory savings you have -- you said integer but you didn't say what FieldType class that was mapped to. In the 1.4 example schema, int is mapped to a TrieField which will use getInts() for the field cache. In Solr 1.3's example schema the integer type was mapped to IntField which also uses getInts() for the field cache. But we have no idea what your schema is using. If it uses SortableIntField, that's when the Solr code under the covers is going to use getStringIndex(), so you might as well use it also. (You can verify this in Solr 1.4 by looking at the stats for the fieldCache - it tells you exactly what is in use at any moment) -Hoss -- http://lucenerevolution.org/ ... October 7-8, Boston http://bit.ly/stump-hoss ... Stump The Chump!
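A back-of-envelope comparison for the case above (10M docs, values of at most 3 digits). This is only an estimate under stated assumptions: the Lucene-2.9-era FieldCache layout of one int per document, and a guessed per-String overhead; real JVM numbers will differ:

```python
max_doc = 10_000_000
unique_values = 1_000  # a value of at most 3 digits has <= 1000 distinct forms

# getInts: one Java int per document.
get_ints_bytes = max_doc * 4

# getStringIndex: an int order[] per document PLUS the distinct values
# themselves (~3 UTF-16 chars each; the 40 bytes per-String overhead is a guess).
string_index_bytes = max_doc * 4 + unique_values * (2 * 3 + 40)

assert string_index_bytes >= get_ints_bytes  # StringIndex can never be smaller
```

With so few distinct values the difference is tiny, which is why the deciding factor is whether getStringIndex is going to be loaded for the field anyway.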
Re: Is there a way to fetch the complete list of data from a particular column in SOLR document?
Please let me know if there are any other ideas / suggestions to implement this. Your indexing program should really take care of this IMHO. Each time your indexer inserts a document into Solr, flag the corresponding entity in your RDBMS; each time you delete, remove the flag. You should implement this as a transaction to make sure all is still fine in the unlikely event of a crash midway. 2010/9/7 bbarani bbar...@gmail.com Hi, I am trying to get complete list of unique document ID and compare it with that of back end to make sure that both back end and SOLR documents are in sync. Is there a way to fetch the complete list of data from a particular column in SOLR document? Once I get the list, I can easily compare it against the DB and delete the orphan documents. Please let me know if there are any other ideas / suggestions to implement this. Thanks, Barani
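The flag-on-index bookkeeping suggested above might look like this. Purely illustrative: the table and column names are made up, and an in-memory SQLite database stands in for the real RDBMS:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE entity (id TEXT PRIMARY KEY, indexed INTEGER DEFAULT 0)")
db.executemany("INSERT INTO entity (id) VALUES (?)", [("1",), ("2",), ("3",)])

def mark_indexed(doc_id):
    # In real code this update belongs in the same transaction as the Solr add,
    # so a crash midway leaves DB and index consistent.
    with db:
        db.execute("UPDATE entity SET indexed = 1 WHERE id = ?", (doc_id,))

def mark_deleted(doc_id):
    with db:
        db.execute("UPDATE entity SET indexed = 0 WHERE id = ?", (doc_id,))

mark_indexed("1")
mark_indexed("2")
mark_deleted("2")  # doc "2" was deleted from the index again

# Anything still unflagged never made it to Solr (or was deleted there).
unsynced = sorted(row[0] for row in db.execute("SELECT id FROM entity WHERE indexed = 0"))
```

The flag makes the full index-vs-DB comparison unnecessary: the unflagged rows are exactly the out-of-sync entities.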
Re: Download document from solr
: Subject: Download document from solr : References: aanlkti=ajq4qpifn2r0dyz=s9hv1i=pc-nqnxp3hw...@mail.gmail.com : In-Reply-To: aanlkti=ajq4qpifn2r0dyz=s9hv1i=pc-nqnxp3hw...@mail.gmail.com http://people.apache.org/~hossman/#threadhijack Thread Hijacking on Mailing Lists When starting a new discussion on a mailing list, please do not reply to an existing message, instead start a fresh email. Even if you change the subject line of your email, other mail headers still track which thread you replied to and your question is hidden in that thread and gets less attention. It makes following discussions in the mailing list archives particularly difficult. See Also: http://en.wikipedia.org/wiki/User:DonDiego/Thread_hijacking -Hoss -- http://lucenerevolution.org/ ... October 7-8, Boston http://bit.ly/stump-hoss ... Stump The Chump!
Re: MoreLikethis and fq not giving exact results ?
: But when I enable mlt inside the query it returns the results for jp_ as : well, because job_title also exists in job posting ( though jp_ or cp_ : already differentiating to both of this ?) I don't believe the MLT Component has any way of filtering like this. In your case you want the fq params to apply to the MLT results as well as the main results, but in other cases people want the fq to apply to the main result set and let the MLT be per individual doc with no other filters -- no one has implemented a configurable way to say when/if certain fqs should apply in the way you describe. -Hoss -- http://lucenerevolution.org/ ... October 7-8, Boston http://bit.ly/stump-hoss ... Stump The Chump!
Re: Deploying Solr 1.4.1 in JbossAs 6
: 1-extract the solr.war : 2-edit the web.xml for setting solr/home param : 3-create the solr.war : 4-setup solr home directory : 5-copy the solr.war to JBossAs 6 deploy directory : 7-start the jboss server I don't know a lot about JBoss, but from what I understand there really shouldn't be any need to customize the solr.war. You should be able to use JNDI to set the solr home dir, just like with tomcat... http://docs.jboss.org/jbossweb/latest/jndi-resources-howto.html -Hoss -- http://lucenerevolution.org/ ... October 7-8, Boston http://bit.ly/stump-hoss ... Stump The Chump!
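For reference, the JNDI route usually amounts to a Tomcat-style context fragment rather than editing the war. A hypothetical example (the paths are placeholders; check the JBoss docs linked above for where such a fragment lives in your JBoss version):

```xml
<!-- Hypothetical context fragment; docBase and the solr home path are placeholders. -->
<Context docBase="/path/to/solr.war" crossContext="true">
  <Environment name="solr/home" type="java.lang.String"
               value="/path/to/solr/home" override="true"/>
</Context>
```

This leaves the stock solr.war untouched, so upgrading Solr is just a matter of swapping the war.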
Re: Solr, c/s type ?
: Subject: Solr, c/s type ? : : i'm wondering c/s type is possible (not http web type). : if possible, could i get the material about it? You're going to need to provide more info explaining what it is you are asking about -- I don't know about anyone else, but I honestly have absolutely no idea what you might possibly mean by c/s type is possible (not http web type) -Hoss -- http://lucenerevolution.org/ ... October 7-8, Boston http://bit.ly/stump-hoss ... Stump The Chump!
RE: Re: MoreLikethis and fq not giving exact results ?
I can think of two useful cases for a feature that limits the MLT results for each document, based on an optional mlt.fq parameter: 1. prevent irrelevant docs when in a deep faceted navigation 2. general search results with MLT where you need to distinguish between collections when there are many different collections sharing the same index -Original message- From: Chris Hostetter hossman_luc...@fucit.org Sent: Tue 07-09-2010 23:32 To: solr-user@lucene.apache.org; Subject: Re: MoreLikethis and fq not giving exact results ? I don't believe the MLT Component has any way of filtering like this. In your case you want the fq params to apply to the MLT results as well as the main results, but in other cases people want the fq to apply to the main result set and let the MLT be per individual doc with no other filters -- no one has implemented a configurable way to say when/if certain fqs should apply in the way you describe. -Hoss -- http://lucenerevolution.org/ ... October 7-8, Boston http://bit.ly/stump-hoss ... Stump The Chump!
Re: Is semicolon a character that needs escaping?
: Subject: Is semicolon a character that needs escaping? ... : From this I conclude that there is a bug either in the docs or in the : query parser or I missed something. What is wrong here? Back in Solr 1.1, the standard query parser treated ; as a special character and looked for sort instructions after it. Starting in Solr 1.2 (released in 2007) a sort param was added, and semicolon was only considered a special character if you did not explicitly mention a sort param (for back compatibility). Starting with Solr 1.4, the default was changed so that semicolon wasn't considered a meta-character even if you didn't have a sort param -- you have to explicitly select the lucenePlusSort QParser to get this behavior. I can only assume that if you are seeing this behavior, you are either using a very old version of Solr, or you have explicitly selected the lucenePlusSort parser somewhere in your params/config. This was heavily documented in CHANGES.txt for Solr 1.4 (you can find mention of it when searching for either ; or semicolon) -Hoss -- http://lucenerevolution.org/ ... October 7-8, Boston http://bit.ly/stump-hoss ... Stump The Chump!
RE: Re: MoreLikethis and fq not giving exact results ?
: I can think of two useful cases for a feature that limits the MLT results : for each document, based on an optional mlt.fq parameter: I don't disagree with you -- I was just commenting that it doesn't work that way at the moment, because it was designed with different use cases in mind (returning docs related to the result docs, independent of how you found those result docs) -Hoss -- http://lucenerevolution.org/ ... October 7-8, Boston http://bit.ly/stump-hoss ... Stump The Chump!
RE: Re: MoreLikethis and fq not giving exact results ?
I know =) I was just polling votes for a feature request - there is no such issue filed for this component. Perhaps there should be? -Original message- From: Chris Hostetter hossman_luc...@fucit.org Sent: Wed 08-09-2010 00:13 To: solr-user@lucene.apache.org; Subject: RE: Re: MoreLikethis and fq not giving exact results ? I don't disagree with you -- I was just commenting that it doesn't work that way at the moment, because it was designed with different use cases in mind (returning docs related to the result docs, independent of how you found those result docs) -Hoss -- http://lucenerevolution.org/ ... October 7-8, Boston http://bit.ly/stump-hoss ... Stump The Chump!
Re: stream.url
: I used escape characters and made it... It is not problem for : a single file of 'solr apache' but it shows the same problem for the files : like Wireless lan.ppt, Tom info.pdf. Since you haven't told us what the original URL is that you are trying to pass as a value for the stream.url value, it's impossible for us to guess whether your URL escaping is working properly. Bear in mind that you need to escape url metacharacters *twice* for this type of thing -- once to encode the URL in a way that the final server will recognize it, and once again to pass it as a value in a URL to Solr. Since you explicitly mention having problems with white space, but I don't see any %25 or %2B sequences in your URL, I'm going to guess that the problem is you are not double escaping the white space properly -- the first time you escape it it should either be + or %20, which means the second time it should either be %2B or %2520 -Hoss -- http://lucenerevolution.org/ ... October 7-8, Boston http://bit.ly/stump-hoss ... Stump The Chump!
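The two rounds of escaping can be demonstrated with Python's standard library (the file name is taken from the question; the file server URL is made up):

```python
from urllib.parse import quote

# Round 1: escape the file name for the remote server's URL (space -> %20).
remote = "http://fileserver/docs/" + quote("Wireless lan.ppt")

# Round 2: escape that whole URL again to embed it as Solr's stream.url
# parameter value (%20 -> %2520, : -> %3A, / -> %2F).
stream_url_value = quote(remote, safe="")
```

`remote` ends up as `http://fileserver/docs/Wireless%20lan.ppt`, and `stream_url_value` contains `%2520` where the space was -- exactly the double escaping described above.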
Help with partial term highlighting
Hello Everyone, Thanks for taking time to read through this. I'm using a checkout from the solr 3.x branch. My problem is with the highlighter and wildcards, and is exactly the same as this guy's, but I can't find a reply to his problem: http://search-lucene.com/m/EARFMs6eR4/partial+highlight+wildcard&subj=Re+old+wildcard+highlighting+behaviour I can get the highlighter to work with wildcards just fine; the problem is that solr is returning the term matched, when what I want it to do is highlight the chars in the term that were matched. Example: http://192.168.1.75:8983/solr/music/select?indent=on&q=name_title:wel*&qt=beyond&hl=true&hl.fl=name_title&f.name_title.hl.usePhraseHighlighter=true&f.name_title.hl.highlightMultiTerm=true The results that come back look like this: <em>Welcome</em> to the Jungle What I want them to look like is this: <em>Wel</em>come to the Jungle From what I gathered by searching the archives, Solr 1.1 used to do this... Is there any way to get what I want without customizing the highlighting feature? Thanks!
Null Pointer Exception with shards & facets where some shards have no values for some facets.
Short summary: * Mixing Facets and Shards gives me a NullPointerException when not all docs have all facets. * Attached patch improves the failure mode, but still spews errors in the log file * Suggestions how to fix that would be appreciated. In my system, I tried separating out a couple similar but different types of documents into a couple different shards. Both shards have the identical schema, with the facets defined as a dynamicField: <dynamicField name="*_facet" type="string" indexed="true" stored="false" multiValued="true"/> Some facets only have documents with a value for them in the first shard; other facets only have documents with a value for them in the second shard. When I try to do a query that asks for a facet.field that only has values in the first shard, and for a different facet.field that only has values in the second shard, I'm getting this exception: Sep 7, 2010 4:55:38 PM org.apache.solr.common.SolrException log SEVERE: java.lang.NullPointerException at org.apache.solr.handler.component.FacetComponent.refineFacets(FacetComponent.java:340) at org.apache.solr.handler.component.FacetComponent.handleResponses(FacetComponent.java:232) at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:301) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1323) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:337) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:240) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1157) at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:388) at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216) at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182) at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:765) at 
org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:418) at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230) at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114) at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152) at org.mortbay.jetty.Server.handle(Server.java:326) at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542) at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:923) at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:547) at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212) at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404) at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228) at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582) I don't have a real simple test case yet, but could work on one if it'd make it easier to track down. Also, I could post the schema and solrconfig if that'd help. 
The attached patch seems to mostly work for me; in that it's returning valid search results and at least some facet information, but with that patch I'm then getting this exception showing up: Sep 7, 2010 5:28:30 PM org.apache.solr.common.SolrException log SEVERE: Exception during facet counts:org.apache.lucene.queryParser.ParseException: Expected identifier at pos 20 str='{!terms=$involvement/race_facet__terms}involvement/race_facet' at org.apache.solr.search.QueryParsing$StrParser.getId(QueryParsing.java:718) at org.apache.solr.search.QueryParsing.parseLocalParams(QueryParsing.java:165) at org.apache.solr.search.QueryParsing.getLocalParams(QueryParsing.java:221) at org.apache.solr.request.SimpleFacets.parseParams(SimpleFacets.java:102) at org.apache.solr.request.SimpleFacets.getFacetFieldCounts(SimpleFacets.java:327) at org.apache.solr.request.SimpleFacets.getFacetCounts(SimpleFacets.java:188) at org.apache.solr.handler.component.FacetComponent.process(FacetComponent.java:72) at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:206) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1323) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:337) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:240) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1157) at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:388)
Re: Null Pointer Exception with shards & facets where some shards have no values for some facets.
Thanks for the report Ron, can you open a JIRA issue? What version of Solr is this? -Yonik http://lucenerevolution.org Lucene/Solr Conference, Boston Oct 7-8 On Tue, Sep 7, 2010 at 8:31 PM, Ron Mayer r...@0ape.com wrote: Short summary: * Mixing Facets and Shards give me a NullPointerException when not all docs have all facets. * Attached patch improves the failure mode, but still spews errors in the log file * Suggestions how to fix that would be appreciated. [...]
Re: Null Pointer Exception with shards & facets where some shards have no values for some facets.
Yonik Seeley wrote: Thanks for the report Ron, can you open a JIRA issue? Sure. I'll do it at work tomorrow morning, hopefully after I try to verify with a standalone test case. What version of Solr is this? This is trunk as of a few days ago. I can update to the latest trunk and check there too. -Yonik http://lucenerevolution.org Lucene/Solr Conference, Boston Oct 7-8 [...]
How to use TermsComponent when I need a filter
Hi, I have a solr index, which for simplicity is just a list of names, and a list of associations. (either a multivalue field e.g. {A1, A2, A3, A6} or a string concatenation list e.g. A1 A2 A3 A6) I want to be able to provide autocomplete but with a specific association. E.g. Names beginning with Bob in association A5. Is this possible? I would prefer not to have to have one index per association, since the number of associations is pretty large Cheers, David
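TermsComponent has no filtering hook, but a commonly suggested alternative (not confirmed in this thread) is field faceting, which does respect fq filters and supports prefix matching. A sketch of the request parameters, with illustrative field names:

```python
from urllib.parse import urlencode

params = {
    "q": "*:*",
    "fq": "association:A5",  # only documents in association A5
    "rows": 0,               # we only want the facet counts, not documents
    "facet": "true",
    "facet.field": "name",
    "facet.prefix": "Bob",   # names beginning with "Bob"
}
query_string = urlencode(params)  # append to .../select? on the Solr host
```

Because facets are computed over the filtered result set, the returned facet values are exactly the names beginning with "Bob" within association A5 -- no per-association index needed.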
Batch update, order of evaluation
Does anyone know with certainty how (or even if) order is evaluated when updates are performed in batch? Our application internally buffers Solr documents for speed of ingest before sending them to the server in chunks. The XML documents sent to the Solr server contain all documents in the order they arrived, without any settings changed from the defaults (so overwrite = true). We are careful to avoid things like HashMaps on our side since they'd lose the order, but I can't be certain what occurs inside Solr. Sometimes, if an object has been indexed twice for various reasons, it could appear twice in the buffer, but the most up-to-date version is always last. I have however observed instances where the first copy of the document is indexed and differences in the second copy are missing. Does this sound likely? And if so, are there any obvious settings I can play with to get the behavior I desire? I looked at http://wiki.apache.org/solr/UpdateXmlMessages but there is no mention of order, just the overwrite flag (which I'm unsure how it is applied internally to an update message) and the deprecated duplicates flag (which I have no idea about). Would switching to SolrInputDocuments on a CommonsHttpSolrServer help, as per http://wiki.apache.org/solr/Solrj? There is no mention of order there either, however. Thanks to anyone who took the time to read this. Ta, Greg
list of filters/factories/Input handlers/blah blah
Is there a definitive list of: filters inputHandlers and other 'code fragments' that do I/O processing for Solr/Lucene? Dennis Gearon Signature Warning EARTH has a Right To Life, otherwise we all die. Read 'Hot, Flat, and Crowded' Laugh at http://www.yert.com/film.php
Re: Advice requested. How to map 1:M or M:M relationships with support for facets
These days the best practice for a 'drill-down' facet in a UI is to encode both the unique value of the facet and the displayable string into one facet value. In the UI, you unpack and show the display string, and search with the full facet string. If you want to also do date ranges, make a separate matching 'date' field. This will store the date twice. Solr schema design is all about denormalizing. Tim Gilbert wrote: Hi guys, *Question:* What is the best way to create a solr schema which supports a 'multivalue' where the value is a two-item array of an event category and a date? I want to have faceted searches, counts and Date Range ability on both the category and the dates. *Details:* This is a person database where a Person can have details about them (like address) and a Person has many "Events". Events have a category (type of event) and a Date for when that event occurred. At the bottom you will see a simple diagram showing the relationship. Briefly, a Person has many Events and Events have a single category and a single person. What I would like to be able to do is: Have a facet which shows all of the event categories, with a 'sub-facet' that shows Category + date. For example, if a Category was "Attended Conference" and the date was 2008-09-08, I'd be able to show a count of all "Attended Conference", then have a tree type control and show the years (for example): Eg. + Attended Conference (1038) | + 2010 (100) +--- 2009 (134) +--- 2008 (234) | + Another Event Category (23432) | +-2010 (234) +2009 (245) Etc. For scale, I expect to have 100 "Event Categories" and a million person_event records on 250,000 persons. I don't care very much about disk space, so if it's 1 GB or 100 GB due to indexing, that's okay if the solution works (and it's fast!) *Solutions I looked at:* * I looked at poly but they seem to be a fixed length and appeared to be the same type. Typical use case was latitude/longitude. 
I don't think this will work because there are a variable number of events attached to a person. * I looked at multiValued but it didn't seem to permit two fields having a relationship, i.e. Event Category and Event Date. It seemed to me that they need to be broken out. That's not necessarily a bad thing, but it didn't seem ideal. * I thought about concatenating category and date to create fake fields strictly for faceting purposes, but I believe that will break date ranges. E.g. EventCategoryId + "|" + Date = 1|2009 as a facet would allow me to show counts for that event type. Seems a bit unwieldy to me... What's the group's advice for handling this situation in the best way? Thanks in advance; as always, sorry if this question has been asked and answered a few times already. I googled for a few hours before writing this... but things change so fast with Solr that any article older than a year was suspect to me; also there are so many patches that provide additional functionality... Tim Schema:
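The "encode both the unique value and the displayable string into one facet value" suggestion from the reply above can be sketched like this (the separator and field layout are made up, reusing the 1|2009 example from the question):

```python
SEP = "|"

def pack_facet(category_id, year, display):
    """Build the single stored facet value."""
    return SEP.join([str(category_id), str(year), display])

def unpack_facet(value):
    """Recover id, year, and display string in the UI."""
    category_id, year, display = value.split(SEP, 2)
    return category_id, int(year), display

packed = pack_facet(1, 2009, "Attended Conference")
```

The maxsplit of 2 in unpack_facet keeps a display string containing "|" intact; the UI shows the display part and filters on the whole packed value.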
Re: Deploying Solr 1.4.1 in JbossAs 6
Does JBoss still use Tomcat? Tomcat has an external file to configure war files in Catalina/localhost. If JBoss is not Tomcat any more, it must have a directory and file format somewhere for an external configuration of a servlet war. Lance Chris Hostetter wrote: : 1-extract the solr.war : 2-edit the web.xml for setting solr/home param : 3-create the solr.war : 4-setup solr home directory : 5-copy the solr.war to JBossAs 6 deploy directory : 7-start the jboss server I don't know a lot about JBoss, but from what I understand there really shouldn't be any need to customize the solr.war. You should be able to use JNDI to set the solr home dir, just like with tomcat... http://docs.jboss.org/jbossweb/latest/jndi-resources-howto.html -Hoss -- http://lucenerevolution.org/ ... October 7-8, Boston http://bit.ly/stump-hoss ... Stump The Chump!
RE: list of filters/factories/Input handlers/blah blah
Not necessarily definitive, but filters and tokenizers can be found here: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters Not sure if that's all of the analyzers (which I think is the generic name for both tokenizers and filters) that come with Solr, but I believe it's at least most of them. It's of course possible to write your own analyzers or use third-party analyzers too; if there's a list of such available, I don't know about it, but it sure would be handy. Some query parsers, which I _think_ is the right term for things you can pass as defType=something or {!type=something}, or one or two other things with different key names I forget, can be found here: http://wiki.apache.org/solr/SolrQuerySyntax#Other_built-in_useful_query_parsers Along with lucene and dismax, also mentioned on that page, I _think_ that's the complete list of query parsers included with Solr 1.4, but someone PLEASE correct me if I'm wrong. It is indeed difficult to get a handle on this stuff for me too. Other than query parsers and analyzers, I'm not entirely certain what else falls in the category of I/O components. I don't know anything about input handlers, myself. Jonathan From: Dennis Gearon [gear...@sbcglobal.net] Sent: Tuesday, September 07, 2010 10:41 PM To: solr-user@lucene.apache.org Subject: list of filters/factories/Input handlers/blah blah Is there a definitive list of: filters inputHandlers and other 'code fragments' that do I/O processing for Solr/Lucene? Dennis Gearon