Re: Is semicolon a character that needs escaping?
On 08.09.2010 00:05 Chris Hostetter wrote:
: Subject: Is semicolon a character that needs escaping? ...
: From this I conclude that there is a bug either in the docs or in the
: query parser, or I missed something. What is wrong here?

Back in Solr 1.1, the standard query parser treated ; as a special character and looked for sort instructions after it. Starting in Solr 1.2 (released in 2007) a sort param was added, and semicolon was only considered a special character if you did not explicitly mention a sort param (for back compatibility). Starting with Solr 1.4, the default was changed so that semicolon isn't considered a meta-character even if you don't have a sort param -- you have to explicitly select the lucenePlusSort QParser to get this behavior.

I can only assume that if you are seeing this behavior, you are either using a very old version of Solr, or you have explicitly selected the lucenePlusSort parser somewhere in your params/config. This was heavily documented in CHANGES.txt for Solr 1.4 (you can find mention of it when searching for either ; or semicolon).

I am using 1.3 without a sort param, which explains it, I think. It would be nice to update to 1.4, but we try to avoid such actions on a production server as long as everything runs fine (the semicolon issue was only reported recently). Many thanks for your detailed explanation!
-Michael
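To make the 1.3 behavior concrete: with the old lucenePlusSort syntax, everything after an unescaped semicolon is parsed as sort instructions. A sketch of the failure mode and the interim workarounds (the query terms here are hypothetical examples, not from the thread):

```text
# Solr 1.3, no explicit sort param: semicolon acts as a sort separator
q=foo;bar                   -> query "foo", then attempt to sort on field "bar"

# Workarounds until an upgrade to 1.4:
q=foo\;bar                  -> escape the semicolon inside the query
q=foo;bar&sort=score desc   -> or pass an explicit sort param, which
                               disables the old semicolon syntax in 1.2/1.3
```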
Re: Distance sorting with spatial filtering
I get the error on all functions.

GET 'http://localhost:8983/solr/select?q=*:*&sort=sum(1)+asc'
Error 400 can not sort on unindexed field: sum(1)

I tried another nightly build from today, Sep 7th, with the same results. I attached the schema.xml. Thanks for the help!
Scott

On Wed, Sep 1, 2010 at 18:43, Lance Norskog goks...@gmail.com wrote:
Post your schema.

On Mon, Aug 30, 2010 at 2:04 PM, Scott K s...@skister.com wrote:
The new spatial filtering (SOLR-1586) works great and is much faster than fq={!frange. However, I am having problems sorting by distance. If I try
GET 'http://localhost:8983/solr/select/?q=*:*&sort=dist(2,latitude,longitude,0,0)+asc'
I get an error:
Error 400 can not sort on unindexed field: dist(2,latitude,longitude,0,0)
I was able to work around this with
GET 'http://localhost:8983/solr/select/?q=*:* AND _val_:recip(dist(2,latitude,longitude,0,0),1,1,1)&fl=*,score'
But why isn't sorting by functions working? I get this error with any function I try to sort on. This is a nightly trunk build from Aug 25th. I see SOLR-1297 was reopened, but that seems to be for edge cases.

Second question: I am using the LatLonType from the Spatial Filtering wiki, http://wiki.apache.org/solr/SpatialSearch
Are there any distance sorting functions that use this field, or do I need to have three indexed fields, store_lat_lon, latitude, and longitude, if I want both filtering and sorting by distance?
Thanks, Scott

--
Lance Norskog
goks...@gmail.com

<?xml version="1.0" encoding="UTF-8" ?>
<!-- PERFORMANCE NOTE: this schema includes many optional features and should
     not be used for benchmarking. To improve performance one could
     - set stored="false" for all fields possible (esp large fields) when you
       only need to search on the field but don't need to return the original
       value.
     - set indexed="false" if you don't need to search on the field, but only
       return the field as a result of searching on other indexed fields.
     - remove all unneeded copyField statements
     - for best index size and searching performance, set "index" to false for
       all general text fields, use copyField to copy them to the catchall
       "text" field, and use that for searching.
     - For maximum indexing performance, use the StreamingUpdateSolrServer
       java client.
     - Remember to run the JVM in server mode, and use a higher logging level
       that avoids logging every request
-->
<schema name="schema" version="1.2">
  <types>
    <fieldType name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>
    <!-- boolean type: "true" or "false" -->
    <fieldType name="boolean" class="solr.BoolField" sortMissingLast="true" omitNorms="true"/>
    <!-- Binary data type. The data should be sent/retrieved in as Base64 encoded Strings -->
    <fieldtype name="binary" class="solr.BinaryField"/>
    <!-- Default numeric field types. For faster range queries, consider the tint/tfloat/tlong/tdouble types. -->
    <fieldType name="int" class="solr.TrieIntField" precisionStep="0" omitNorms="true" positionIncrementGap="0"/>
    <fieldType name="float" class="solr.TrieFloatField" precisionStep="0" omitNorms="true" positionIncrementGap="0"/>
    <fieldType name="long" class="solr.TrieLongField" precisionStep="0" omitNorms="true" positionIncrementGap="0"/>
    <fieldType name="double" class="solr.TrieDoubleField" precisionStep="0" omitNorms="true" positionIncrementGap="0"/>
    <fieldType name="tint" class="solr.TrieIntField" precisionStep="8" omitNorms="true" positionIncrementGap="0"/>
    <fieldType name="tfloat" class="solr.TrieFloatField" precisionStep="8" omitNorms="true" positionIncrementGap="0"/>
    <fieldType name="tlong" class="solr.TrieLongField" precisionStep="8" omitNorms="true" positionIncrementGap="0"/>
    <fieldType name="tdouble" class="solr.TrieDoubleField" precisionStep="8" omitNorms="true" positionIncrementGap="0"/>
    <fieldType name="date" class="solr.TrieDateField" omitNorms="true" precisionStep="0" positionIncrementGap="0"/>
    <!-- A Trie based date field for faster date range queries and date faceting. -->
    <fieldType name="tdate" class="solr.TrieDateField" omitNorms="true" precisionStep="6" positionIncrementGap="0"/>
    <fieldType name="random" class="solr.RandomSortField" indexed="true"/>
    <fieldType name="location" class="solr.LatLonType" subFieldType="double"/>
    <!-- A text field that only splits on whitespace for exact matching of words -->
    <fieldType name="text_ws" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      </analyzer>
    </fieldType>
    <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
        <filter
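Until sort-by-function (SOLR-1297) is complete, the _val_ workaround above can be folded into a single ranked request by sorting on score; this is only a sketch assuming the same host and field names as in the thread, with URL-escaping of the quoted function omitted for readability:

```text
# recip() gives nearer documents a higher score, so score desc = nearest first
http://localhost:8983/solr/select?q=*:* AND _val_:"recip(dist(2,latitude,longitude,0,0),1,1,1)"&fl=*,score&sort=score desc
```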
Re: Alphanumeric wildcard search problem
The real problem was this tag:
<!-- field for the QueryParser to use when an explicit fieldname is absent -->
<defaultSearchField>text</defaultSearchField>
and I was querying like this: q=r-1* instead of q=mat_nr:r-1*. So whatever fieldType I used for mat_nr, it was using the text fieldType, which had WordDelimiterFilterFactory; hence I had to put a space in order to get it running.
--
View this message in context: http://lucene.472066.n3.nabble.com/Alphanumeric-wildcard-search-problem-tp1393332p1437584.html
Sent from the Solr - User mailing list archive at Nabble.com.
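In other words, with that defaultSearchField an unqualified query is analyzed by the text type, not by mat_nr's type; a sketch of the two behaviors:

```text
<defaultSearchField>text</defaultSearchField>

q=r-1*           -> searched against the "text" field, so
                    WordDelimiterFilterFactory splits "r-1" apart
q=mat_nr:r-1*    -> searched against mat_nr with its own fieldType,
                    as intended
```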
Re: DataImportHandlerException for custom DIH Transformer
Resurrecting an old thread. I faced the exact same problem as Tommy, and the jar was in {solr.home}/lib as Noble had suggested. My custom transformer overrides the following method, as per the specification of the Transformer class:

public Object transformRow(Map<String, Object> row, Context context);

But in the code (EntityProcessorWrapper.java) I see the following line:

final Method meth = clazz.getMethod(TRANSFORM_ROW, Map.class);

This doesn't match the method signature in Transformer. I think it should be:

final Method meth = clazz.getMethod(TRANSFORM_ROW, Map.class, Context.class);

I have verified that adding a method transformRow(Map<String, Object> row) works. Am I missing something?
--shashi

2010/2/8 Noble Paul നോബിള് नोब्ळ् noble.p...@corp.aol.com
On Mon, Feb 8, 2010 at 9:13 AM, Tommy Chheng tommy.chh...@gmail.com wrote:
I'm having trouble making a custom DIH transformer in solr 1.4. I compiled the General TrimTransformer into a jar (just copy/paste of the sample code from http://wiki.apache.org/solr/DIHCustomTransformer). I placed the jar along with the dataimporthandler jar in solr/lib (same directory as the jetty jar).

do not keep it in solr/lib, it won't work. keep it in {solr.home}/lib

Then I added to my DIH data-config.xml file: transformer="DateFormatTransformer, RegexTransformer, com.chheng.dih.transformers.TrimTransformer". Now I get this exception when I try running the import:

org.apache.solr.handler.dataimport.DataImportHandlerException: java.lang.NoSuchMethodException: com.chheng.dih.transformers.TrimTransformer.transformRow(java.util.Map)
at org.apache.solr.handler.dataimport.EntityProcessorWrapper.loadTransformers(EntityProcessorWrapper.java:120)

I noticed the exception lists TrimTransformer.transformRow(java.util.Map), but the abstract Transformer class defines a two-parameter method: transformRow(Map<String, Object> row, Context context)?
-- Tommy Chheng Programmer and UC Irvine Graduate Student Twitter @tommychheng http://tommy.chheng.com -- - Noble Paul | Systems Architect| AOL | http://aol.com
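The reflection mismatch Shashi describes can be reproduced outside Solr. This standalone sketch uses a stand-in Context class (not the real DIH one) to show why a one-parameter getMethod lookup misses a two-parameter transformRow:

```java
import java.util.Map;

public class TransformRowLookup {
    // stand-in for org.apache.solr.handler.dataimport.Context
    static class Context {}

    static class MyTransformer {
        // signature per the Transformer specification: (Map, Context)
        public Object transformRow(Map<String, Object> row, Context context) {
            return row;
        }
    }

    public static void main(String[] args) {
        Class<?> clazz = MyTransformer.class;
        try {
            // the lookup EntityProcessorWrapper performs: one parameter only
            clazz.getMethod("transformRow", Map.class);
            System.out.println("one-arg lookup: found");
        } catch (NoSuchMethodException e) {
            System.out.println("one-arg lookup: NoSuchMethodException");
        }
        try {
            // the lookup matching the documented two-parameter signature
            clazz.getMethod("transformRow", Map.class, Context.class);
            System.out.println("two-arg lookup: found");
        } catch (NoSuchMethodException e) {
            System.out.println("two-arg lookup: NoSuchMethodException");
        }
    }
}
```

This also explains why adding a one-parameter transformRow(Map<String, Object> row) makes the error go away: it gives the reflective lookup exactly the signature it asks for.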
Re: stream.url
Hi Hoss,
Thanks for the reply, and it's working now. The reason was, as you said, that I was not double escaping; I used %2520 for whitespace and it works now.
Thanks,
satya
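The double-escaping point generalizes: a stream.url value ends up percent-decoded twice on its way to being fetched, so the percent signs themselves must be encoded. A small standalone sketch (the file name is a made-up example) of why %2520 is needed for a space:

```java
import java.net.URLDecoder;

public class DoubleEscape {
    public static void main(String[] args) throws Exception {
        String param = "file%2520name.pdf";               // %25 is an escaped '%'
        String once  = URLDecoder.decode(param, "UTF-8"); // first decode
        String twice = URLDecoder.decode(once, "UTF-8");  // second decode
        System.out.println(once);
        System.out.println(twice);
    }
}
```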
Re: Query result ranking - Score independent
My request was very simple: q=astronomy^0, and Solr returned the exception. Maybe the zero boost factor is not causing the exception?
1) We indexed n documents with a schema.xml.
2) Then we changed some field types in the schema.xml.
3) Then we indexed another m documents.
Maybe this could cause the exception?

2010/9/7 Grant Ingersoll gsing...@apache.org
On Sep 7, 2010, at 7:08 AM, Alessandro Benedetti wrote:
Hi all, I need to retrieve query results with a ranking independent from each query result's default Lucene score, which means assigning the same score to each query result. I tried to use a zero boost factor ( ^0 ) to reset each query result's score to zero. This strategy seems to work within the example Solr instance, but in my Solr instance, using a zero boost factor causes a Buffer exception:

HTTP Status 500 - null java.lang.IllegalArgumentException
at java.nio.Buffer.limit(Buffer.java:249)
at org.apache.lucene.store.NIOFSDirectory$NIOFSIndexInput.readInternal(NIOFSDirectory.java:123)
at org.apache.lucene.store.BufferedIndexInput.refill(BufferedIndexInput.java:157)
at org.apache.lucene.store.BufferedIndexInput.readByte(BufferedIndexInput.java:38)
at org.apache.lucene.store.IndexInput.readInt(IndexInput.java:70)
at org.apache.lucene.store.IndexInput.readLong(IndexInput.java:93)
at org.apache.lucene.index.FieldsReader.doc(FieldsReader.java:210)
at org.apache.lucene.index.SegmentReader.document(SegmentReader.java:948)
at org.apache.lucene.index.DirectoryReader.document(DirectoryReader.java:506)
at org.apache.lucene.index.IndexReader.document(IndexReader.java:947)

Hmm, that stack trace doesn't align w/ the boost factor. What was your request? I think there might be something else wrong here.

Do you know any other technique to reset all the query results' scores to some fixed constant value? Each query result should obtain the same score. Any suggestion?

The ConstantScoreQuery or a Filter should do this. You could do something like q=*:*&fq=the real query, as in q=*:*&fq=field:foo

-Grant
--
Grant Ingersoll
http://lucenerevolution.org Apache Lucene/Solr Conference, Boston Oct 7-8

--
Benedetti Alessandro
Personal Page: http://tigerbolt.altervista.org
Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?
William Blake - Songs of Experience - 1794 England
Phrase search + multi-word index time expanded synonym
Hello,
well, first, here's the field type that is searched:

<fieldtype name="SyFR" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <!-- Synonyms -->
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms-fr.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
  </analyzer>
</fieldtype>

here's the synonym from the synonyms-fr.txt file:
...
PS,Parti socialiste
...

and here's the query: "PS et". It returns no result, whereas "Parti socialiste et" returns the results. How can I have both queries working? I'm thinking about different configurations, but I haven't found a solution yet.
Thanks for reading,
Xavier Schepler
Creating a sub-index from another
Hej,
I have a Solr index with several million documents. I need to implement some text mining processes, and I would like to create a one-million-document index from the original for some tests. How can I do it?
Thanks in advance
--
View this message in context: http://lucene.472066.n3.nabble.com/Creating-a-sub-index-from-another-tp1438386p1438386.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr searching harri finds harry
You have not provided much detail about the analysis of that field, but I am sure the problem is because of stemming. You can see it on the analysis page or with the debugQuery=on parameter. To prevent stemming, you have to put the words you do not want stemmed in protwords.txt.
-
Grijesh
--
View this message in context: http://lucene.472066.n3.nabble.com/Solr-searching-harri-finds-harry-tp1438486p1438637.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Phrase search + multi-word index time expanded synonym
On 08/09/2010 12:21, Grijesh.singh wrote:
See analysis.jsp with verbose debug to see what happens at index time and search time during analysis with your data. Also, you can use debugQuery=on to see what the parsed query actually is.
-
Grijesh

I've found a first solution by myself, using the query analyzer, that works for pairs of synonyms. I still have to test it with rows of 3 or 4 equivalent synonyms. I used analysis.jsp. The query-time analyzer became:

<analyzer type="query">
  <tokenizer class="solr.StandardTokenizerFactory"/>
  <filter class="solr.StandardFilterFactory"/>
  <filter class="solr.SynonymFilterFactory" synonyms="synonyms2-fr.txt" ignoreCase="true" expand="true"/>
  <filter class="solr.LowerCaseFilterFactory"/>
  <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
</analyzer>

And synonyms2-fr.txt contains:
PS => Parti socialiste

Thanks for your reply.
Re: Creating a sub-index from another
You need a separate Solr core for that, and you have to write a processor which reads your original index, generates the XML data, and pushes it to the new core. That is the simple way that I have used many times.
-
Grijesh
--
View this message in context: http://lucene.472066.n3.nabble.com/Creating-a-sub-index-from-another-tp1438386p1438673.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr searching harri finds harry
I have harry as a protected word in protwords.txt. Here is the xml definition for my text column:

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>

On 08/09/10 11:29, Grijesh.singh wrote:
You have not provided much detail about the analysis of that field, but I am sure the problem is because of stemming. You can see it on the analysis page or with the debugQuery=on parameter. To prevent stemming, put the words you do not want stemmed in protwords.txt.
-
Grijesh
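For reference, the stemmer behavior behind this thread: the Porter stemmer in EnglishPorterFilterFactory reduces "harry" to "harri" at index time (and does the same to the query term with the analyzer above), which is why the two match; terms listed in the protected-words file bypass the stemmer entirely. A minimal sketch of the file, assuming the term this user wants protected:

```text
# protwords.txt -- one term per line; listed terms are never stemmed
harry
```

As noted later in the thread, changing this file only affects newly analyzed tokens, so existing documents must be reindexed.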
RE: Advice requested. How to map 1:M or M:M relationships with support for facets
Thank you for your advice.
Tim

-----Original Message-----
From: Lance Norskog [mailto:goks...@gmail.com]
Sent: Tuesday, September 07, 2010 11:01 PM
To: solr-user@lucene.apache.org
Subject: Re: Advice requested. How to map 1:M or M:M relationships with support for facets

These days the best practice for a 'drill-down' facet in a UI is to encode both the unique value of the facet and the displayable string into one facet value. In the UI, you unpack and show the display string, and search with the full facet string. If you want to also do date ranges, make a separate matching 'date' field. This will store the date twice. Solr schema design is all about denormalizing.

Tim Gilbert wrote:
Hi guys,

Question: What is the best way to create a Solr schema which supports a 'multivalue' where the value is a two-item array of event category and date? I want to have faceted searches, counts, and date-range ability on both the category and the dates.

Details: This is a person database where a Person can have details about them (like address), and a Person has many Events. Events have a category (type of event) and a date for when that event occurred. At the bottom you will see a simple diagram showing the relationship. Briefly, a Person has many Events, and an Event has a single category and a single person. What I would like to be able to do is have a facet which shows all of the event categories, with a 'sub-facet' that shows category + date. For example, if a category was "Attended Conference" and the date was 2008-09-08, I'd be able to show a count of all "Attended Conference", then have a tree-type control and show the years (for example):

+ Attended Conference (1038)
|
+--- 2010 (100)
+--- 2009 (134)
+--- 2008 (234)
|
+ Another Event Category (23432)
|
+--- 2010 (234)
+--- 2009 (245)

Etc. For scale, I expect to have 100 event categories and a million person_event records on 250,000 persons. I don't care very much about disk space, so if it's 1 GB or 100 GB due to indexing, that's okay if the solution works (and it's fast!).

Solutions I looked at:
* I looked at poly fields, but they seem to be fixed-length and appear to require the same type; the typical use case was latitude/longitude. I don't think this will work because there are a variable number of events attached to a person.
* I looked at multiValued fields, but they didn't seem to permit two fields having a relationship, i.e. Event Category and Event Date. It seemed to me that they need to be broken out. That's not necessarily a bad thing, but it didn't seem ideal.
* I thought about concatenating category and date to create fake fields strictly for faceting purposes, but I believe that will break date ranges. E.g. EventCategoryId + "|" + Date = "1|2009" as a facet would allow me to show counts for that event type. Seems a bit unwieldy to me...

What's the group's advice for handling this situation in the best way? Thanks in advance; as always, sorry if this question has been asked and answered a few times already. I googled for a few hours before writing this... but things change so fast with Solr that any article older than a year was suspect to me, and there are also many patches that provide additional functionality...

Tim

Schema:
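Lance's "encode both the unique value and the displayable string into one facet value" suggestion can be sketched like this; the delimiter, helper names, and sample category are my own illustration, not part of any Solr API:

```java
public class FacetValue {
    private static final char SEP = '|';

    // build the value that gets indexed into the facet field
    static String pack(int categoryId, String display) {
        return categoryId + String.valueOf(SEP) + display;
    }

    // recover the id for the drill-down filter query
    static int id(String packed) {
        return Integer.parseInt(packed.substring(0, packed.indexOf(SEP)));
    }

    // recover the string shown in the UI
    static String display(String packed) {
        return packed.substring(packed.indexOf(SEP) + 1);
    }

    public static void main(String[] args) {
        String v = pack(17, "Attended Conference");
        System.out.println(v);
        System.out.println(id(v));
        System.out.println(display(v));
    }
}
```

When filtering, the application searches on the full packed string, so display names never need to be unique on their own.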
Re: Multi core schema file
solr.xml allows you to specify other properties as well, like instanceDir, config, and schema, in the <core> tag under <cores>. So sharing the entire conf dir may not be possible, but it is possible to share solrconfig.xml and schema.xml. You can see the detailed parameters on the wiki page http://wiki.apache.org/solr/CoreAdmin
-
Grijesh
--
View this message in context: http://lucene.472066.n3.nabble.com/Multi-core-schema-file-tp1438460p1438720.html
Sent from the Solr - User mailing list archive at Nabble.com.
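A sketch of what that looks like in solr.xml; the core names and relative paths here are hypothetical, but the schema/config attributes are the ones described on the CoreAdmin wiki page:

```xml
<solr persistent="true">
  <cores adminPath="/admin/cores">
    <core name="core0" instanceDir="core0"
          schema="../shared/schema.xml"
          config="../shared/solrconfig.xml"/>
    <core name="core1" instanceDir="core1"
          schema="../shared/schema.xml"
          config="../shared/solrconfig.xml"/>
  </cores>
</solr>
```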
Re: Solr searching harri finds harry
Have you restarted Solr after adding words to protwords.txt, and reindexed the data?
-
Grijesh
--
View this message in context: http://lucene.472066.n3.nabble.com/Solr-searching-harri-finds-harry-tp1438486p1438735.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr searching harri finds harry
Yes to restart, no to re-index. I was hoping that wouldn't be necessary. I'll do that now.

On 08/09/10 11:48, Grijesh.singh wrote:
Have you restarted Solr after adding words to protwords.txt, and reindexed the data?
-
Grijesh
Re: Solr searching harri finds harry
Yes, reindexing is necessary after protwords or synonym updates.
-
Grijesh
--
View this message in context: http://lucene.472066.n3.nabble.com/Solr-searching-harri-finds-harry-tp1438486p1438802.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Batch update, order of evaluation
This would be surprising behavior; if you can reliably reproduce it, it's worth a JIRA. But (and I'm stretching a bit here) are you sure you're committing at the end of the batch AND are you sure you're looking after the commit? Here's the scenario: your updated document is at position 1 and 100 in your batch. Somewhere around SOLR processing document 50, an autocommit occurs, and you're looking at your results before SOLR gets around to committing document 100. Like I said, it's a stretch. To test this, you need to be absolutely sure of two things before you search:
1) the batch is finished processing
2) you've issued a commit after the last document in the batch.
If you're sure of the above and still see the problem, please let us know...

HTH
Erick

On Tue, Sep 7, 2010 at 10:32 PM, Greg Pendlebury greg.pendleb...@gmail.com wrote:
Does anyone know with certainty how (or even if) order is evaluated when updates are performed by batch? Our application internally buffers solr documents for speed of ingest before sending them to the server in chunks. The XML documents sent to the solr server contain all documents in the order they arrived, without any settings changed from the defaults (so overwrite = true). We are careful to avoid things like HashMaps on our side, since they'd lose the order, but I can't be certain what occurs inside Solr. Sometimes, if an object has been indexed twice for various reasons, it can appear twice in the buffer, but the most up-to-date version is always last. I have however observed instances where the first copy of the document is indexed and differences in the second copy are missing. Does this sound likely? And if so, are there any obvious settings I can play with to get the behavior I desire? I looked at http://wiki.apache.org/solr/UpdateXmlMessages but there is no mention of order, just the overwrite flag (which I'm unsure how it is applied internally to an update message) and the deprecated duplicates flag (which I have no idea about). Would switching to SolrInputDocuments on a CommonsHttpSolrServer help, as per http://wiki.apache.org/solr/Solrj? There is no mention of order there either, however. Thanks to anyone who took the time to read this.
Ta, Greg
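Independently of how Solr orders things, the client-side buffer Greg describes can be made order-safe by keying it on the uniqueKey field so that only the latest copy of each document is ever sent. A sketch (field names "id"/"rev" are hypothetical) using a LinkedHashMap, which keeps arrival order while deduplicating:

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

public class DocBuffer {
    // insertion-ordered map: one entry per uniqueKey, latest version wins
    private final LinkedHashMap<String, Map<String, Object>> buffer = new LinkedHashMap<>();

    void add(Map<String, Object> doc) {
        String id = (String) doc.get("id");
        buffer.remove(id);   // drop any stale copy so the re-insert moves to the end
        buffer.put(id, doc);
    }

    Collection<Map<String, Object>> drain() {
        return buffer.values();
    }

    private static Map<String, Object> doc(String id, int rev) {
        Map<String, Object> m = new HashMap<>();
        m.put("id", id);
        m.put("rev", rev);
        return m;
    }

    public static void main(String[] args) {
        DocBuffer b = new DocBuffer();
        b.add(doc("a", 1));
        b.add(doc("b", 1));
        b.add(doc("a", 2));  // an updated copy of "a" arrives later
        for (Map<String, Object> d : b.drain()) {
            System.out.println(d.get("id") + ":" + d.get("rev"));
        }
    }
}
```

With this in place, each batch contains at most one version of any document, so the question of how Solr orders duplicates within a batch never arises.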
Re: list of filters/factories/Input handlers/blah blah
See the javadocs at:
http://lucene.apache.org/solr/api/org/apache/solr/analysis/package-summary.html
Also see:
http://wiki.apache.org/solr/LanguageAnalysis
Both of these are linked from the page Jonathan referenced. The JavaDocs will be the most up to date...

Best
Erick

On Tue, Sep 7, 2010 at 11:56 PM, Jonathan Rochkind rochk...@jhu.edu wrote:
Not necessarily definitive, but filters and tokenizers can be found here: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters
Not sure if that's all of the analyzers (which I think is the generic name for both tokenizers and filters) that come with Solr, but I believe it's at least most of them. It's of course possible to write your own analyzers or use third-party analyzers too; if there's a list of such available, I don't know about it, but it sure would be handy.
Some query parsers, which I _think_ is the right term for things you can pass as defType=something or {!type=something}, or one or two other things with different key names I forget, can be found here: http://wiki.apache.org/solr/SolrQuerySyntax#Other_built-in_useful_query_parsers
Along with lucene and dismax, also mentioned on that page, I _think_ that's the complete list of query parsers included with Solr 1.4, but someone PLEASE correct me if I'm wrong. It is indeed difficult to get a handle on this stuff for me too. Other than query parsers and analyzers, I'm not entirely certain what else falls in the category of I/O components. I don't know anything about input handlers, myself.
Jonathan

From: Dennis Gearon [gear...@sbcglobal.net]
Sent: Tuesday, September 07, 2010 10:41 PM
To: solr-user@lucene.apache.org
Subject: list of filters/factories/Input handlers/blah blah

Is there a definitive list of: filters, inputHandlers, and other 'code fragments' that do I/O processing for Solr/Lucene?
Dennis Gearon Signature Warning EARTH has a Right To Life, otherwise we all die. Read 'Hot, Flat, and Crowded' Laugh at http://www.yert.com/film.php
Re: Alphanumeric wildcard search problem
Ah, thanks. That reconciles our differing results.
Best
Erick

On Wed, Sep 8, 2010 at 2:58 AM, Hasnain hasn...@hotmail.com wrote:
The real problem was this tag:
<!-- field for the QueryParser to use when an explicit fieldname is absent -->
<defaultSearchField>text</defaultSearchField>
and I was querying like this: q=r-1* instead of q=mat_nr:r-1*. So whatever fieldType I used for mat_nr, it was using the text fieldType, which had WordDelimiterFilterFactory; hence I had to put a space in order to get it running.
--
View this message in context: http://lucene.472066.n3.nabble.com/Alphanumeric-wildcard-search-problem-tp1393332p1437584.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Batch update, order of evaluation
Thanks, I'll create a deliberate test tomorrow and feed some random data through it several times to see what happens. I'm also working on simply improving the buffer to handle the situation internally, but a few hours of testing isn't a big deal.
Ta, Greg

On 8 September 2010 21:41, Erick Erickson erickerick...@gmail.com wrote:
This would be surprising behavior; if you can reliably reproduce it, it's worth a JIRA. But (and I'm stretching a bit here) are you sure you're committing at the end of the batch AND are you sure you're looking after the commit? Here's the scenario: your updated document is at position 1 and 100 in your batch. Somewhere around SOLR processing document 50, an autocommit occurs, and you're looking at your results before SOLR gets around to committing document 100. Like I said, it's a stretch. To test this, you need to be absolutely sure of two things before you search:
1) the batch is finished processing
2) you've issued a commit after the last document in the batch.
If you're sure of the above and still see the problem, please let us know...
HTH
Erick

On Tue, Sep 7, 2010 at 10:32 PM, Greg Pendlebury greg.pendleb...@gmail.com wrote:
Does anyone know with certainty how (or even if) order is evaluated when updates are performed by batch? Our application internally buffers solr documents for speed of ingest before sending them to the server in chunks. The XML documents sent to the solr server contain all documents in the order they arrived, without any settings changed from the defaults (so overwrite = true). We are careful to avoid things like HashMaps on our side, since they'd lose the order, but I can't be certain what occurs inside Solr. Sometimes, if an object has been indexed twice for various reasons, it can appear twice in the buffer, but the most up-to-date version is always last. I have however observed instances where the first copy of the document is indexed and differences in the second copy are missing. Does this sound likely?
And if so, are there any obvious settings I can play with to get the behavior I desire? I looked at http://wiki.apache.org/solr/UpdateXmlMessages but there is no mention of order, just the overwrite flag (which I'm unsure how it is applied internally to an update message) and the deprecated duplicates flag (which I have no idea about). Would switching to SolrInputDocuments on a CommonsHttpSolrServer help, as per http://wiki.apache.org/solr/Solrj? There is no mention of order there either, however. Thanks to anyone who took the time to read this.
Ta, Greg
Re: Query result ranking - Score independent
The change in the schema shouldn't matter (emphasis on the should). What version of SOLR are you using? I tried this query and it works just fine for me, I'm using 1.4.1 Best Erick On Wed, Sep 8, 2010 at 4:38 AM, Alessandro Benedetti benedetti.ale...@gmail.com wrote: My request was very simple: q= astronomy^0 And Solr returned the exception. Maybe the zero boost factor is not causing the exception? 1) We indexed n documents with a Schema.xml. 2)Then we changed some field type in the Schema.xml 3)Then we indexed other m documents Maybe this could cause the exception? 2010/9/7 Grant Ingersoll gsing...@apache.org On Sep 7, 2010, at 7:08 AM, Alessandro Benedetti wrote: Hi all, I need to retrieve query-results with a ranking independent from each query-result's default lucene score, which means assigning the same score to each query result. I tried to use a zero boost factor ( ^0 ) to reset to zero each query-result's score. This strategy seems to work within the example solr instance, but in my Solr instance, using a zero boost factor causes a Buffer Exception ( HTTP Status 500 - null java.lang.IllegalArgumentException at java.nio.Buffer.limit(Buffer.java:249) at org.apache.lucene.store.NIOFSDirectory$NIOFSIndexInput.readInternal(NIOFSDirectory.java:123) at org.apache.lucene.store.BufferedIndexInput.refill(BufferedIndexInput.java:157) at org.apache.lucene.store.BufferedIndexInput.readByte(BufferedIndexInput.java:38) at org.apache.lucene.store.IndexInput.readInt(IndexInput.java:70) at org.apache.lucene.store.IndexInput.readLong(IndexInput.java:93) at org.apache.lucene.index.FieldsReader.doc(FieldsReader.java:210) at org.apache.lucene.index.SegmentReader.document(SegmentReader.java:948) at org.apache.lucene.index.DirectoryReader.document(DirectoryReader.java:506) at org.apache.lucene.index.IndexReader.document(IndexReader.java:947) ) Hmm, that stack trace doesn't align w/ the boost factor. What was your request? I think there might be something else wrong here. 
Do you know any other technique to reset to some fixed constant value, all the query-result's scores? Each query result should obtain the same score. Any suggestion? The ConstantScoreQuery or a Filter should do this. You could do something like: q=*:*&fq=the real query, as in q=*:*&fq=field:foo -Grant -- Grant Ingersoll http://lucenerevolution.org Apache Lucene/Solr Conference, Boston Oct 7-8 -- -- Benedetti Alessandro Personal Page: http://tigerbolt.altervista.org Tyger, tyger burning bright In the forests of the night, What immortal hand or eye Could frame thy fearful symmetry? William Blake - Songs of Experience - 1794 England
Re: Query result ranking - Score independent
Ooops, hit send too quickly. Could you show us the entire URL you send that produces the error? Erick On Wed, Sep 8, 2010 at 7:58 AM, Erick Erickson erickerick...@gmail.comwrote: The change in the schema shouldn't matter (emphasis on the should). What version of SOLR are you using? I tried this query and it works just fine for me, I'm using 1.4.1 Best Erick On Wed, Sep 8, 2010 at 4:38 AM, Alessandro Benedetti benedetti.ale...@gmail.com wrote: My request was very simple: q= astronomy^0 And Solr returned the exception. Maybe the zero boost factor is not causing the exception? 1) We indexed n documents with a Schema.xml. 2)Then we changed some field type in the Schema.xml 3)Then we indexed other m documents Maybe this could cause the exception? 2010/9/7 Grant Ingersoll gsing...@apache.org On Sep 7, 2010, at 7:08 AM, Alessandro Benedetti wrote: Hi all, I need to retrieve query-results with a ranking independent from each query-result's default lucene score, which means assigning the same score to each query result. I tried to use a zero boost factor ( ^0 ) to reset to zero each query-result's score. 
This strategy seems to work within the example solr instance, but in my Solr instance, using a zero boost factor causes a Buffer Exception ( HTTP Status 500 - null java.lang.IllegalArgumentException at java.nio.Buffer.limit(Buffer.java:249) at org.apache.lucene.store.NIOFSDirectory$NIOFSIndexInput.readInternal(NIOFSDirectory.java:123) at org.apache.lucene.store.BufferedIndexInput.refill(BufferedIndexInput.java:157) at org.apache.lucene.store.BufferedIndexInput.readByte(BufferedIndexInput.java:38) at org.apache.lucene.store.IndexInput.readInt(IndexInput.java:70) at org.apache.lucene.store.IndexInput.readLong(IndexInput.java:93) at org.apache.lucene.index.FieldsReader.doc(FieldsReader.java:210) at org.apache.lucene.index.SegmentReader.document(SegmentReader.java:948) at org.apache.lucene.index.DirectoryReader.document(DirectoryReader.java:506) at org.apache.lucene.index.IndexReader.document(IndexReader.java:947) ) Hmm, that stack trace doesn't align w/ the boost factor. What was your request? I think there might be something else wrong here. Do you know any other technique to reset to some fixed constant value, all the query-result's scores? Each query result should obtain the same score. Any suggestion? The ConstantScoreQuery or a Filter should do this. You could do something like: q=*:*&fq=the real query, as in q=*:*&fq=field:foo -Grant -- Grant Ingersoll http://lucenerevolution.org Apache Lucene/Solr Conference, Boston Oct 7-8 -- -- Benedetti Alessandro Personal Page: http://tigerbolt.altervista.org Tyger, tyger burning bright In the forests of the night, What immortal hand or eye Could frame thy fearful symmetry? William Blake - Songs of Experience -1794 England
Re: Solr, c/s type ?
I'll guess he means client/server. On Tue, Sep 7, 2010 at 5:52 PM, Chris Hostetter hossman_luc...@fucit.org wrote: : Subject: Solr, c/s type ? : : i'm wondering c/s type is possible (not http web type). : if possible, could i get the material about it? You're going to need to provide more info explaining what it is you are asking about -- i don't know about anyone else, but i honestly have absolutely no idea what you might possibly mean by c/s type is possible (not http web type) -Hoss -- http://lucenerevolution.org/ ... October 7-8, Boston http://bit.ly/stump-hoss ... Stump The Chump!
RE: Solr, c/s type ?
I'll guess he means client/server. HTTP is a client/server protocol, isn't it?
Re: Null Pointer Exception with shards & facets where some shards have no values for some facets.
On Tue, Sep 7, 2010 at 8:31 PM, Ron Mayer r...@0ape.com wrote: Short summary: * Mixing Facets and Shards give me a NullPointerException when not all docs have all facets. https://issues.apache.org/jira/browse/SOLR-2110 I believe the underlying real issue stemmed from your use of a complex key involvement/race_facet. -Yonik http://lucenerevolution.org Lucene/Solr Conference, Boston Oct 7-8
How to import data with a different date format
Hi, I am attempting to import some of our data into SOLR. I did it the quickest way I know because I literally only have 2 days to import the data and do some queries for a proof-of-concept. So I have this data in XML format and I wrote a short XSLT script to convert it to the format in solr/example/exampledocs (except I retained the element names, so I had to modify schema.xml in the conf directory). So far so good -- the import works and I can search the data. One of my immediate problems is that there is a date field with the format MM/DD/YYYY. Looking at schema.xml, it seems SOLR accepts only full date fields -- everything seems to be mandatory including the Z for Zulu/UTC time according to the doc. Is there a way to specify the date format? Thanks very much. Rico
Invariants on a specific fq value
Hi, I have an index with several collections. Every document has a collection field that specifies the collection it belongs to. To make querying easier (and restrict exposed parameters) i have a request handler for each collection. The request handlers are largely the same and preset all parameters using invariants. Well, this is all very nice. But there is a catch, i cannot make an invariant of the fq parameter because it's being used (from the outside) to navigate through the facets. This means that the outside world can specify any value for the fq parameter. With the fq parameter being exposed, it is possible for request handler X to query documents that belong to collection Y and vice versa. But, as you might guess by now, request handler X should only be allowed to retrieve documents that belong to collection X. I know there are some discussions on how to restrict users to certain documents but i'd like to know if it is doable to patch the request handler logic to add an invariant-like directive that allows me to restrict a certain value for a certain parameter, but allow different values for that parameter. To give an example:

<requestHandler name="collection_x">
  <lst name="invariants">
    <str name="defType">dismax</str>
    ... More invariants here
  </lst>
  <lst name="what_should_we_call_this?">
    <str name="fq">fieldName:collection_x</str>
  </lst>
</requestHandler>

The above configuration won't allow changing the defType and won't allow a value to be specified for fieldName through the fq parameter. It will allow the outside world to specify a value on another field through the fq parameter, such as fq=anotherField:someValue. Any ideas? Cheers, Markus Jelsma - Technisch Architect - Buyways BV http://www.linkedin.com/in/markus17 050-8536620 / 06-50258350
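The invariant-like restriction described above could be sketched as simple parameter filtering in handler logic: keep client-supplied fq values, drop any that touch the protected field, and always append the handler's own collection filter. A minimal Python sketch, using the hypothetical field and collection names from the example:

```python
def restrict_fq(client_fqs, protected_field="fieldName",
                forced_fq="fieldName:collection_x"):
    """Drop any client fq on the protected field, keep the rest,
    and always enforce the handler's own collection filter."""
    allowed = [fq for fq in client_fqs
               if not fq.strip().startswith(protected_field + ":")]
    return allowed + [forced_fq]

# A client may still facet on other fields, but cannot reach collection_y:
result = restrict_fq(["anotherField:someValue", "fieldName:collection_y"])
print(result)
```

The client's facet-navigation filters pass through untouched, while the attempt to query another collection is silently replaced by the enforced filter.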
RE: How to import data with a different date format
No. The Datefield [1] will not accept it any other way. You could, however, fool your boss and dump your dates in an ordinary string field. But then you cannot use some of the nice date features. [1]: http://lucene.apache.org/solr/api/org/apache/solr/schema/DateField.html -Original message- From: Rico Lelina rlel...@yahoo.com Sent: Wed 08-09-2010 17:36 To: solr-user@lucene.apache.org; Subject: How to import data with a different date format Hi, I am attempting to import some of our data into SOLR. I did it the quickest way I know because I literally only have 2 days to import the data and do some queries for a proof-of-concept. So I have this data in XML format and I wrote a short XSLT script to convert it to the format in solr/example/exampledocs (except I retained the element names so I had to modify schema.xml in the conf directory. So far so good -- the import works and I can search the data. One of my immediate problems is that there is a date field with the format MM/DD/. Looking at schema.xml, it seems SOLR accepts only full date fields -- everything seems to be mandatory including the Z for Zulu/UTC time according to the doc. Is there a way to specify the date format? Thanks very much. Rico
Re: How to import data with a different date format
That was my first thought :-) But it would be nice to be able to do date queries. I guess when I export the data I can just add 00:00:00Z. Thanks. - Original Message From: Markus Jelsma markus.jel...@buyways.nl To: solr-user@lucene.apache.org Sent: Wed, September 8, 2010 11:34:32 AM Subject: RE: How to import data with a different date format No. The Datefield [1] will not accept it any other way. You could, however, fool your boss and dump your dates in an ordinary string field. But then you cannot use some of the nice date features. [1]: http://lucene.apache.org/solr/api/org/apache/solr/schema/DateField.html -Original message- From: Rico Lelina rlel...@yahoo.com Sent: Wed 08-09-2010 17:36 To: solr-user@lucene.apache.org; Subject: How to import data with a different date format Hi, I am attempting to import some of our data into SOLR. I did it the quickest way I know because I literally only have 2 days to import the data and do some queries for a proof-of-concept. So I have this data in XML format and I wrote a short XSLT script to convert it to the format in solr/example/exampledocs (except I retained the element names so I had to modify schema.xml in the conf directory. So far so good -- the import works and I can search the data. One of my immediate problems is that there is a date field with the format MM/DD/. Looking at schema.xml, it seems SOLR accepts only full date fields -- everything seems to be mandatory including the Z for Zulu/UTC time according to the doc. Is there a way to specify the date format? Thanks very much. Rico
RE: Re: How to import data with a different date format
Your format (MM/DD/YYYY) is not compatible. -Original message- From: Rico Lelina rlel...@yahoo.com Sent: Wed 08-09-2010 19:03 To: solr-user@lucene.apache.org; Subject: Re: How to import data with a different date format That was my first thought :-) But it would be nice to be able to do date queries. I guess when I export the data I can just add 00:00:00Z. Thanks. - Original Message From: Markus Jelsma markus.jel...@buyways.nl To: solr-user@lucene.apache.org Sent: Wed, September 8, 2010 11:34:32 AM Subject: RE: How to import data with a different date format No. The Datefield [1] will not accept it any other way. You could, however, fool your boss and dump your dates in an ordinary string field. But then you cannot use some of the nice date features. [1]: http://lucene.apache.org/solr/api/org/apache/solr/schema/DateField.html -Original message- From: Rico Lelina rlel...@yahoo.com Sent: Wed 08-09-2010 17:36 To: solr-user@lucene.apache.org; Subject: How to import data with a different date format Hi, I am attempting to import some of our data into SOLR. I did it the quickest way I know because I literally only have 2 days to import the data and do some queries for a proof-of-concept. So I have this data in XML format and I wrote a short XSLT script to convert it to the format in solr/example/exampledocs (except I retained the element names so I had to modify schema.xml in the conf directory. So far so good -- the import works and I can search the data. One of my immediate problems is that there is a date field with the format MM/DD/YYYY.
Re: How to import data with a different date format
I think Markus is spot-on given the fact that you have 2 days. Using a string field is quickest. However, if you absolutely MUST have functioning dates, there are three options I can think of: 1) can you make your XSLT transform the dates? Confession; I'm XSLT-ignorant 2) use DIH and its DateFormatTransformer, see: http://wiki.apache.org/solr/DataImportHandler#DateFormatTransformer (you can walk a directory importing all the XML files with FileDataSource). 3) you could write a program to do this manually. But given the time constraints, I suspect your time would be better spent doing the other stuff and just using string as per Markus. I have no clue how SOLR-savvy you are, so pardon if this is something you already know. But lots of people trip up over the string field type, which is NOT tokenized. You usually want text unless it's some sort of ID So it might be worth it to do some searching earlier rather than later G Best Erick On Wed, Sep 8, 2010 at 12:34 PM, Markus Jelsma markus.jel...@buyways.nl wrote: No. The Datefield [1] will not accept it any other way. You could, however, fool your boss and dump your dates in an ordinary string field. But then you cannot use some of the nice date features. [1]: http://lucene.apache.org/solr/api/org/apache/solr/schema/DateField.html -Original message- From: Rico Lelina rlel...@yahoo.com Sent: Wed 08-09-2010 17:36 To: solr-user@lucene.apache.org; Subject: How to import data with a different date format Hi, I am attempting to import some of our data into SOLR. I did it the quickest way I know because I literally only have 2 days to import the data and do some queries for a proof-of-concept. So I have this data in XML format and I wrote a short XSLT script to convert it to the format in solr/example/exampledocs (except I retained the element names so I had to modify schema.xml in the conf directory. So far so good -- the import works and I can search the data.
One of my immediate problems is that there is a date field with the format MM/DD/YYYY. Looking at schema.xml, it seems SOLR accepts only full date fields -- everything seems to be mandatory including the Z for Zulu/UTC time according to the doc. Is there a way to specify the date format? Thanks very much. Rico
Re: Invariants on a specific fq value
I just found out about 'invariants', and I found out about another thing too: appends. (I don't think either of these are actually documented anywhere?). I think maybe appends rather than invariants, with your fq you want always to be there, might be exactly what you want? I actually forget whether it's append or appends, and am not sure if it's documented anywhere, try both I guess. But apparently it does exist in 1.4. Jonathan Markus Jelsma wrote: Hi, I have an index with several collections. Every document has a collection field that specifies the collection it belongs to. To make querying easier (and restrict exposed parameters) i have a request handler for each collection. The request handlers are largely the same and preset all parameters using invariants. Well, this is all very nice. But there is a catch, i cannot make an invariant of the fq parameter because it's being used (from the outside) to navigate through the facets. This means that the outside world can specify any value for the fq parameter. With the fq parameter being exposed, it is possible for request handler X to query documents that belong to collection Y and vice versa. But, as you might guess by now, request handler X should only be allowed to retrieve documents that belong to collection X. I know there are some discussions on how to restrict users to certain documents but i'd like to know if it is doable to patch the request handler logic to add an invariant-like directive that allows me to restrict a certain value for a certain parameter, but allow different values for that parameter. To give an example:

<requestHandler name="collection_x">
  <lst name="invariants">
    <str name="defType">dismax</str>
    ... More invariants here
  </lst>
  <lst name="what_should_we_call_this?">
    <str name="fq">fieldName:collection_x</str>
  </lst>
</requestHandler>

The above configuration won't allow changing the defType and won't allow a value to be specified for fieldName through the fq parameter.
It will allow the outside world to specify a value on another field through the fq parameter, such as fq=anotherField:someValue. Any ideas? Cheers, Markus Jelsma - Technisch Architect - Buyways BV http://www.linkedin.com/in/markus17 050-8536620 / 06-50258350
Re: Re: How to import data with a different date format
It will work. The original data is in XML format. I have an XSLT that transforms the data into the same format as that in exampledocs: <add><doc><field name="...">...</field></doc></add>. - Original Message From: Markus Jelsma markus.jel...@buyways.nl To: solr-user@lucene.apache.org Sent: Wed, September 8, 2010 12:06:39 PM Subject: RE: Re: How to import data with a different date format Your format (MM/DD/YYYY) is not compatible. -Original message- From: Rico Lelina rlel...@yahoo.com Sent: Wed 08-09-2010 19:03 To: solr-user@lucene.apache.org; Subject: Re: How to import data with a different date format That was my first thought :-) But it would be nice to be able to do date queries. I guess when I export the data I can just add 00:00:00Z. Thanks. - Original Message From: Markus Jelsma markus.jel...@buyways.nl To: solr-user@lucene.apache.org Sent: Wed, September 8, 2010 11:34:32 AM Subject: RE: How to import data with a different date format No. The Datefield [1] will not accept it any other way. You could, however, fool your boss and dump your dates in an ordinary string field. But then you cannot use some of the nice date features. [1]: http://lucene.apache.org/solr/api/org/apache/solr/schema/DateField.html -Original message- From: Rico Lelina rlel...@yahoo.com Sent: Wed 08-09-2010 17:36 To: solr-user@lucene.apache.org; Subject: How to import data with a different date format Hi, I am attempting to import some of our data into SOLR. I did it the quickest way I know because I literally only have 2 days to import the data and do some queries for a proof-of-concept. So I have this data in XML format and I wrote a short XSLT script to convert it to the format in solr/example/exampledocs (except I retained the element names so I had to modify schema.xml in the conf directory. So far so good -- the import works and I can search the data. One of my immediate problems is that there is a date field with the format MM/DD/YYYY.
Looking at schema.xml, it seems SOLR accepts only full date fields -- everything seems to be mandatory including the Z for Zulu/UTC time according to the doc. Is there a way to specify the date format? Thanks very much. Rico
Re: How to import data with a different date format
I'm going with option 1, converting MM/DD/YYYY to YYYY-MM-DD (which is fairly easy in XSLT) and then adding T00:00:00Z to it. Thanks. - Original Message From: Erick Erickson erickerick...@gmail.com To: solr-user@lucene.apache.org Sent: Wed, September 8, 2010 12:09:55 PM Subject: Re: How to import data with a different date format I think Markus is spot-on given the fact that you have 2 days. Using a string field is quickest. However, if you absolutely MUST have functioning dates, there are three options I can think of: 1) can you make your XSLT transform the dates? Confession; I'm XSLT-ignorant 2) use DIH and its DateFormatTransformer, see: http://wiki.apache.org/solr/DataImportHandler#DateFormatTransformer (you can walk a directory importing all the XML files with FileDataSource). 3) you could write a program to do this manually. But given the time constraints, I suspect your time would be better spent doing the other stuff and just using string as per Markus. I have no clue how SOLR-savvy you are, so pardon if this is something you already know. But lots of people trip up over the string field type, which is NOT tokenized. You usually want text unless it's some sort of ID So it might be worth it to do some searching earlier rather than later G Best Erick On Wed, Sep 8, 2010 at 12:34 PM, Markus Jelsma markus.jel...@buyways.nl wrote: No. The Datefield [1] will not accept it any other way. You could, however, fool your boss and dump your dates in an ordinary string field. But then you cannot use some of the nice date features. [1]: http://lucene.apache.org/solr/api/org/apache/solr/schema/DateField.html -Original message- From: Rico Lelina rlel...@yahoo.com Sent: Wed 08-09-2010 17:36 To: solr-user@lucene.apache.org; Subject: How to import data with a different date format Hi, I am attempting to import some of our data into SOLR.
I did it the quickest way I know because I literally only have 2 days to import the data and do some queries for a proof-of-concept. So I have this data in XML format and I wrote a short XSLT script to convert it to the format in solr/example/exampledocs (except I retained the element names so I had to modify schema.xml in the conf directory. So far so good -- the import works and I can search the data. One of my immediate problems is that there is a date field with the format MM/DD/. Looking at schema.xml, it seems SOLR accepts only full date fields -- everything seems to be mandatory including the Z for Zulu/UTC time according to the doc. Is there a way to specify the date format? Thanks very much. Rico
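The conversion discussed in this thread (a US-style month/day/year date into Solr's full DateField form, with a midnight UTC time appended) is easy to sanity-check outside XSLT. A minimal Python sketch of the same transformation:

```python
from datetime import datetime

def to_solr_date(us_date: str) -> str:
    """Convert MM/DD/YYYY into Solr's DateField format,
    YYYY-MM-DDThh:mm:ssZ, fixing the time to midnight UTC."""
    parsed = datetime.strptime(us_date, "%m/%d/%Y")
    return parsed.strftime("%Y-%m-%dT00:00:00Z")

print(to_solr_date("09/08/2010"))  # -> 2010-09-08T00:00:00Z
```

strptime also validates the input, so malformed dates fail loudly at import time instead of producing bad index values.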
RE: Re: How to import data with a different date format
Ah, that answers Erick's question. And mine ;) -Original message- From: Rico Lelina rlel...@yahoo.com Sent: Wed 08-09-2010 19:25 To: solr-user@lucene.apache.org; Subject: Re: How to import data with a different date format I'm going with option 1, converting MM/DD/ to -MM-DD (which is fairly easy in XSLT) and then adding T00:00:00Z to it. Thanks. - Original Message From: Erick Erickson erickerick...@gmail.com To: solr-user@lucene.apache.org Sent: Wed, September 8, 2010 12:09:55 PM Subject: Re: How to import data with a different date format I think Markus is spot-on given the fact that you have 2 days. Using a string field is quickest. However, if you absolutely MUST have functioning dates, there are three options I can think of: 1 can you make your XSLT transform the dates? Confession; I'm XSLT-ignorant 2 use DIH and DateTransformer, see: http://wiki.apache.org/solr/DataImportHandler#DateFormatTransformer you can walk a directory importing all the XML files with FileDataSource. http://wiki.apache.org/solr/DataImportHandler#DateFormatTransformer3 you could write a program to do this manually. But given the time constraints, I suspect your time would be better spent doing the other stuff and just using string as per Markus. I have no clue how SOLR-savvy you are, so pardon if this is something you already know. But lots of people trip up over the string field type, which is NOT tokenized. You usually want text unless it's some sort of ID So it might be worth it to do some searching earlier rather than later G Best Erick On Wed, Sep 8, 2010 at 12:34 PM, Markus Jelsma markus.jel...@buyways.nlwrote: No. The Datefield [1] will not accept it any other way. You could, however, fool your boss and dump your dates in an ordinary string field. But then you cannot use some of the nice date features. 
[1]: http://lucene.apache.org/solr/api/org/apache/solr/schema/DateField.html -Original message- From: Rico Lelina rlel...@yahoo.com Sent: Wed 08-09-2010 17:36 To: solr-user@lucene.apache.org; Subject: How to import data with a different date format Hi, I am attempting to import some of our data into SOLR. I did it the quickest way I know because I literally only have 2 days to import the data and do some queries for a proof-of-concept. So I have this data in XML format and I wrote a short XSLT script to convert it to the format in solr/example/exampledocs (except I retained the element names so I had to modify schema.xml in the conf directory. So far so good -- the import works and I can search the data. One of my immediate problems is that there is a date field with the format MM/DD/. Looking at schema.xml, it seems SOLR accepts only full date fields -- everything seems to be mandatory including the Z for Zulu/UTC time according to the doc. Is there a way to specify the date format? Thanks very much. Rico
Solr Highlighting Question
Thanks for taking time to read through this. I'm using a checkout from the solr 3.x branch. My problem is with the highlighter and wildcards. I can get the highlighter to work with wildcards just fine; the problem is that solr is returning the term matched, when what I want it to do is highlight the chars in the term that were matched. Example: http://192.168.1.75:8983/solr/music/select?indent=on&q=name_title:wel*&qt=beyond&hl=true&hl.fl=name_title&f.name_title.hl.usePhraseHighlighter=true&f.name_title.hl.highlightMultiTerm=true The results that come back look like this: <em>Welcome</em> to the Jungle What I want them to look like is this: <em>Wel</em>come to the Jungle From what I gathered by searching the archives is that solr 1.1 used to do this... Is there a way to get that functionality? Thanks!
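Since the highlighter here returns whole terms, one possible client-side workaround (a post-processing sketch, not a Solr option) is to re-wrap only the wildcard prefix in the returned snippet:

```python
import re

def highlight_prefix(text: str, prefix: str) -> str:
    """Wrap only the characters matching a wildcard prefix (e.g. the
    'wel' of 'wel*') in <em> tags, at word boundaries, case-insensitively."""
    pattern = re.compile(r"\b(" + re.escape(prefix) + r")", re.IGNORECASE)
    return pattern.sub(r"<em>\1</em>", text)

print(highlight_prefix("Welcome to the Jungle", "wel"))
# -> <em>Wel</em>come to the Jungle
```

This only covers simple trailing-wildcard queries; anything fancier (leading wildcards, fuzzy terms) would need the real matched-character offsets from the search engine.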
Re: How to import data with a different date format
Just throwing it out there, I'd consider a different approach for an actual real app, although it might not be easier to get up quickly. (For quickly, yeah, I'd just store it as a string, more on that at bottom). If none of your dates have times, they're all just full days, I'm not sure you really need the date type at all. Convert the date to a number-of-days-since-epoch integer. (Most languages will have a way to do this, but I don't know about pure XSLT). Store _that_ in a 1.4 'int' field. On top of that, make it a tint (precision non-zero) for faster range queries. But now your actual interface will have to convert from number of days since epoch to a displayable date. (And if you allow user input, convert the input to number-of-days-since-epoch before making a range query or fq, but you'd have to do that anyway even with solr dates, users aren't going to be entering W3CDate raw, I don't think). That is probably the most efficient way to have solr handle it -- using an actual date field type gives you a lot more precision than you need, which is going to hurt performance on range queries. Which you can compensate for with trie date sure, but if you don't really need that precision to begin with, why use it? Also the extra precision can end up doing unexpected things and making it easier to have bugs (range queries on that high precision stuff, you need to make sure your start date has 00:00:00 set and your end date has 23:59:59 set, to do what you probably expect). If you aren't going to use the extra precision, it makes everything a lot simpler to not use a date field. Alternately, for your get-this-done-quick method, yeah, I'd just store it as a string. With a string exactly as you've specified, sorting and range queries won't work how you'd want. But if you can make it a string of the format yyyy/mm/dd instead (always two-digit month and day), then you can even sort and do range queries on your string dates. For the quick and dirty prototype, I'd just do that.
In fact, while this might make range queries and sorting _slightly_ slower than if you use an int or a tint, this might really be good enough even for a real app (hey, it's what lots of people did before the trie-based fields existed). Jonathan Erick Erickson wrote: I think Markus is spot-on given the fact that you have 2 days. Using a string field is quickest. However, if you absolutely MUST have functioning dates, there are three options I can think of: 1 can you make your XSLT transform the dates? Confession; I'm XSLT-ignorant 2 use DIH and DateTransformer, see: http://wiki.apache.org/solr/DataImportHandler#DateFormatTransformer you can walk a directory importing all the XML files with FileDataSource. http://wiki.apache.org/solr/DataImportHandler#DateFormatTransformer3 you could write a program to do this manually. But given the time constraints, I suspect your time would be better spent doing the other stuff and just using string as per Markus. I have no clue how SOLR-savvy you are, so pardon if this is something you already know. But lots of people trip up over the string field type, which is NOT tokenized. You usually want text unless it's some sort of ID So it might be worth it to do some searching earlier rather than later G Best Erick On Wed, Sep 8, 2010 at 12:34 PM, Markus Jelsma markus.jel...@buyways.nlwrote: No. The Datefield [1] will not accept it any other way. You could, however, fool your boss and dump your dates in an ordinary string field. But then you cannot use some of the nice date features. [1]: http://lucene.apache.org/solr/api/org/apache/solr/schema/DateField.html -Original message- From: Rico Lelina rlel...@yahoo.com Sent: Wed 08-09-2010 17:36 To: solr-user@lucene.apache.org; Subject: How to import data with a different date format Hi, I am attempting to import some of our data into SOLR. I did it the quickest way I know because I literally only have 2 days to import the data and do some queries for a proof-of-concept. 
So I have this data in XML format and I wrote a short XSLT script to convert it to the format in solr/example/exampledocs (except I retained the element names so I had to modify schema.xml in the conf directory. So far so good -- the import works and I can search the data. One of my immediate problems is that there is a date field with the format MM/DD/. Looking at schema.xml, it seems SOLR accepts only full date fields -- everything seems to be mandatory including the Z for Zulu/UTC time according to the doc. Is there a way to specify the date format? Thanks very much. Rico
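Jonathan's days-since-epoch idea above can be sketched as a round-trip conversion; the Solr tint field it would feed is not shown, and the helper names are ours:

```python
from datetime import date

EPOCH = date(1970, 1, 1)

def days_since_epoch(d: date) -> int:
    """Whole days from the Unix epoch; this integer is what you'd index."""
    return (d - EPOCH).days

def from_days(days: int) -> date:
    """Inverse conversion, for turning the indexed integer back into a
    displayable date in the UI layer."""
    return date.fromordinal(EPOCH.toordinal() + days)

d = date(2010, 9, 8)
n = days_since_epoch(d)
print(n, from_days(n))
```

Range queries then become plain integer comparisons on the indexed value, which is exactly the reduced-precision behavior the post argues for.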
Re: How to import data with a different date format
I'm really thinking, once you convert to YYYY-MM-DD anyway, you might be better off just sticking this in a string field, rather than using a date field at all. The extra precision in the date field is going to make things confusing later, I predict. Especially for a quick and dirty prototype, I'd just use a string. Solr is not an rdbms; our learned behavior to always try and normalize everything and define the field 'right' often is not the right way to go with solr/lucene. Jonathan Rico Lelina wrote: I'm going with option 1, converting MM/DD/YYYY to YYYY-MM-DD (which is fairly easy in XSLT) and then adding T00:00:00Z to it. Thanks. - Original Message From: Erick Erickson erickerick...@gmail.com To: solr-user@lucene.apache.org Sent: Wed, September 8, 2010 12:09:55 PM Subject: Re: How to import data with a different date format I think Markus is spot-on given the fact that you have 2 days. Using a string field is quickest. However, if you absolutely MUST have functioning dates, there are three options I can think of: 1) can you make your XSLT transform the dates? Confession; I'm XSLT-ignorant 2) use DIH and its DateFormatTransformer, see: http://wiki.apache.org/solr/DataImportHandler#DateFormatTransformer (you can walk a directory importing all the XML files with FileDataSource). 3) you could write a program to do this manually. But given the time constraints, I suspect your time would be better spent doing the other stuff and just using string as per Markus. I have no clue how SOLR-savvy you are, so pardon if this is something you already know. But lots of people trip up over the string field type, which is NOT tokenized. You usually want text unless it's some sort of ID So it might be worth it to do some searching earlier rather than later G Best Erick On Wed, Sep 8, 2010 at 12:34 PM, Markus Jelsma markus.jel...@buyways.nl wrote: No. The Datefield [1] will not accept it any other way.
You could, however, fool your boss and dump your dates in an ordinary string field. But then you cannot use some of the nice date features. [1]: http://lucene.apache.org/solr/api/org/apache/solr/schema/DateField.html -Original message- From: Rico Lelina rlel...@yahoo.com Sent: Wed 08-09-2010 17:36 To: solr-user@lucene.apache.org; Subject: How to import data with a different date format Hi, I am attempting to import some of our data into SOLR. I did it the quickest way I know because I literally only have 2 days to import the data and do some queries for a proof-of-concept. So I have this data in XML format and I wrote a short XSLT script to convert it to the format in solr/example/exampledocs (except I retained the element names so I had to modify schema.xml in the conf directory. So far so good -- the import works and I can search the data. One of my immediate problems is that there is a date field with the format MM/DD/. Looking at schema.xml, it seems SOLR accepts only full date fields -- everything seems to be mandatory including the Z for Zulu/UTC time according to the doc. Is there a way to specify the date format? Thanks very much. Rico
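The reason a zero-padded date string still sorts and range-queries correctly, as Jonathan argues in this thread, is that lexicographic order coincides with chronological order when every component is fixed-width. A quick check:

```python
# Zero-padded ISO-style dates: string sort order == chronological order.
dates = ["2010-09-08", "2009-12-31", "2010-01-02"]
assert sorted(dates) == ["2009-12-31", "2010-01-02", "2010-09-08"]

# The same holds for a zero-padded slash format (yyyy/mm/dd),
# which is why string range queries on such a field behave as expected.
slash_dates = ["2010/09/08", "2009/12/31"]
assert min(slash_dates) == "2009/12/31"

print("lexicographic order matches chronological order for zero-padded dates")
```

This is also why the unpadded or month-first original format would break sorting: "9/8/2010" sorts after "10/1/2010" as a string.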
RE: Re: Invariants on a specific fq value
Interesting! I haven't met the appends method before and I'll be sure to give it a try tomorrow. Though, the wiki [1] is not very clear on what it really does. More suggestions before tomorrow? [1]: http://wiki.apache.org/solr/SolrSecurity#Path_Based_Authentication -Original message- From: Jonathan Rochkind rochk...@jhu.edu Sent: Wed 08-09-2010 19:19 To: solr-user@lucene.apache.org; markus.jel...@buyways.nl; Subject: Re: Invariants on a specific fq value I just found out about 'invariants', and I found out about another thing too: appends. (I don't think either of these are actually documented anywhere?) I think maybe appends, rather than invariants, with the fq you want always to be there, might be exactly what you want? I actually forget whether it's append or appends, and am not sure if it's documented anywhere, try both I guess. But apparently it does exist in 1.4. Jonathan Markus Jelsma wrote: Hi, I have an index with several collections. Every document has a collection field that specifies the collection it belongs to. To make querying easier (and restrict exposed parameters) I have a request handler for each collection. The request handlers are largely the same and preset all parameters using invariants. Well, this is all very nice. But there is a catch: I cannot make an invariant of the fq parameter because it's being used (from the outside) to navigate through the facets. This means that the outside world can specify any value for the fq parameter. With the fq parameter being exposed, it is possible for request handler X to query documents that belong to collection Y and vice versa.
I know there are some discussions on how to restrict users to certain documents, but I'd like to know if it is doable to patch the request handler logic to add an invariant-like directive that allows me to restrict a certain value for a certain parameter, but allow different values for that parameter. To give an example:

<requestHandler name="collection_x">
  <lst name="invariants">
    <str name="defType">dismax</str>
    <!-- ... more invariants here ... -->
  </lst>
  <lst name="what_should_we_call_this?">
    <str name="fq">fieldName:collection_x</str>
  </lst>
</requestHandler>

The above configuration won't allow changing the defType and won't allow a value to be specified for fieldName through the fq parameter. It will allow the outside world to specify a value on another field through the fq parameter, such as fq=anotherField:someValue. Any ideas? Cheers, Markus Jelsma - Technisch Architect - Buyways BV http://www.linkedin.com/in/markus17 050-8536620 / 06-50258350
Re: Invariants on a specific fq value
Ah, I NEVER would have thought to look for these defaults/invariants/appends stuff under 'security', that's why I never found it! I can see now why it's sort of a security issue, but I, like you, use them just for convenience instead, and think of defaults, invariants, and appends as all in the same family, with different logic choices. Markus Jelsma wrote: Interesting! I haven't met the appends method before and i'll be sure to give it a try tomorrow. Try, the wiki [1] is not very clear on what it really does. More suggestions before tomorrow? [1]: http://wiki.apache.org/solr/SolrSecurity#Path_Based_Authentication -Original message- From: Jonathan Rochkind rochk...@jhu.edu Sent: Wed 08-09-2010 19:19 To: solr-user@lucene.apache.org; markus.jel...@buyways.nl; Subject: Re: Invariants on a specific fq value I just found out about 'invariants', and I found out about another thing too: appends. (I don't think either of these are actually documented anywhere?). I think maybe appends rather than invariants, with your fq you want always to be there might be exactly what you want? I actually forget whether it's append or appends, and am not sure if it's documented anywhere, try both I guess. But apparently it does exist in 1.4. Jonathan Markus Jelsma wrote: Hi, I have an index with several collections. Every document has a collection field that specifies the collection it belongs to. To make querying easier (and restrict exposed parameters) i have a request handler for each collection. The request handlers are largely the same and preset all parameters using invariants. Well, this is all very nice. But there is a catch, i cannot make an invariant of the fq parameter because it's being used (from the outside) to navigate through the facets. This means that the outside world can specify any value for the fq parameter. With the fq parameter being exposed, it is possible for request handler X to query documents that belong to collection Y and vice versa. 
But, as you might guess by now, request handler X should only be allowed to retrieve documents that belong to collection X. I know there are some discussions on how to restrict users to certain documents, but I'd like to know if it is doable to patch the request handler logic to add an invariant-like directive that allows me to restrict a certain value for a certain parameter, but allow different values for that parameter. To give an example:

<requestHandler name="collection_x">
  <lst name="invariants">
    <str name="defType">dismax</str>
    <!-- ... more invariants here ... -->
  </lst>
  <lst name="what_should_we_call_this?">
    <str name="fq">fieldName:collection_x</str>
  </lst>
</requestHandler>

The above configuration won't allow changing the defType and won't allow a value to be specified for fieldName through the fq parameter. It will allow the outside world to specify a value on another field through the fq parameter, such as fq=anotherField:someValue. Any ideas? Cheers, Markus Jelsma - Technisch Architect - Buyways BV http://www.linkedin.com/in/markus17 050-8536620 / 06-50258350
Re: How to import data with a different date format
how SOLR-savvy you are, so pardon if this is something you already know. But lots of people trip up over the string field type, which is NOT tokenized. You usually want text unless it's some sort of ID. So it might be worth it to do some searching earlier rather than later. G Why would you want to tokenize a yyyy-mm-dd value? I'm liking the 'string' type. If you do yyyy-mm-dd, then you can even sort properly, and range query with endpoints also specified as yyyy-mm-dd, no? Okay, I'll stop spamming the thread now, heh. Jonathan
Re: Re: Invariants on a specific fq value
2010 at 1:32 PM, Markus Jelsma markus.jel...@buyways.nl wrote: Interesting! I haven't met the appends method before and I'll be sure to give it a try tomorrow. Though, the wiki [1] is not very clear on what it really does. Here's a comment from the example solrconfig.xml: <!-- In addition to defaults, appends params can be specified to identify values which should be appended to the list of multi-val params from the query (or the existing defaults). In this example, the param fq=instock:true will be appended to any query-time fq params the user may specify, as a mechanism for partitioning the index, independent of any user-selected filtering that may also be desired (perhaps as a result of faceted searching). NOTE: there is *absolutely* nothing a client can do to prevent these appends values from being used, so don't use this mechanism unless you are sure you always want it. --> -Yonik http://lucenerevolution.org Lucene/Solr Conference, Boston Oct 7-8
RE: Re: Re: Invariants on a specific fq value
Sounds great! I'll be very sure to put it to the test tomorrow and perhaps add documentation on these types to the solrconfigxml wiki page for reference. -Original message- From: Yonik Seeley yo...@lucidimagination.com Sent: Wed 08-09-2010 19:38 To: solr-user@lucene.apache.org; Subject: Re: Re: Invariants on a specific fq value 2010 at 1:32 PM, Markus Jelsma markus.jel...@buyways.nl wrote: Interesting! I haven't met the appends method before and i'll be sure to give it a try tomorrow. Try, the wiki [1] is not very clear on what it really does. Here's a comment from the example solrconfig.xml: !-- In addition to defaults, appends params can be specified to identify values which should be appended to the list of multi-val params from the query (or the existing defaults). In this example, the param fq=instock:true will be appended to any query time fq params the user may specify, as a mechanism for partitioning the index, independent of any user selected filtering that may also be desired (perhaps as a result of faceted searching). NOTE: there is *absolutely* nothing a client can do to prevent these appends values from being used, so don't use this mechanism unless you are sure you always want it. -- -Yonik http://lucenerevolution.org Lucene/Solr Conference, Boston Oct 7-8
Re: How to import data with a different date format
I'm doing something similar for dates/times/timestamps. I'm actually trying to do, 'now' is within the range of what appointments(date/time from and to combos, i.e. timestamps). Fairly simple search of: What items have a start time BEFORE now, and an end time AFTER now? My thoughts were to store: unix time stamp BIGINTS (64 bit) ISO_DATE ISO_TIME strings Which is going to be faster: 1/ Indexing? 2/ Searching? How does the 'tint' field mentioned below apply? Dennis Gearon Signature Warning EARTH has a Right To Life, otherwise we all die. Read 'Hot, Flat, and Crowded' Laugh at http://www.yert.com/film.php --- On Wed, 9/8/10, Jonathan Rochkind rochk...@jhu.edu wrote: From: Jonathan Rochkind rochk...@jhu.edu Subject: Re: How to import data with a different date format To: solr-user@lucene.apache.org solr-user@lucene.apache.org Date: Wednesday, September 8, 2010, 10:27 AM Just throwing it out there, I'd consider a different approach for an actual real app, although it might not be easier to get up quickly. (For quickly, yeah, I'd just store it as a string, more on that at bottom). If none of your dates have times, they're all just full days, I'm not sure you really need the date type at all. Convert the date to number-of-days since epoch integer. (Most languages will have a way to do this, but I don't know about pure XSLT). Store _that_ in a 1.4 'int' field. On top of that, make it a tint (precision non-zero) for faster range queries. But now your actual interface will have to convert from number of days since epoch to a displayable date. (And if you allow user input, convert the input to number-of-days-since-epoch before making a range query or fq, but you'd have to do that anyway even with solr dates, users aren't going to be entering W3CDate raw, I don't think). That is probably the most efficient way to have solr handle it -- using an actual date field type gives you a lot more precision than you need, which is going to hurt performance on range queries. 
Which you can compensate for with trie date, sure, but if you don't really need that precision to begin with, why use it? Also the extra precision can end up doing unexpected things and making it easier to have bugs (range queries on that high-precision stuff, you need to make sure your start date has 00:00:00 set and your end date has 23:59:59 set, to do what you probably expect). If you aren't going to use the extra precision, it makes everything a lot simpler to not use a date field. Alternately, for your get-this-done-quick method, yeah, I'd just store it as a string. With a string exactly as you've specified, sorting and range queries won't work how you'd want. But if you can make it a string of the format yyyy/mm/dd instead (always two-digit month and day), then you can even sort and do range queries on your string dates. For the quick and dirty prototype, I'd just do that. In fact, while this might make range queries and sorting _slightly_ slower than if you use an int or a tint, this might really be good enough even for a real app (hey, it's what lots of people did before the trie-based fields existed). Jonathan Erick Erickson wrote: I think Markus is spot-on given the fact that you have 2 days. Using a string field is quickest. However, if you absolutely MUST have functioning dates, there are three options I can think of: 1) can you make your XSLT transform the dates? Confession: I'm XSLT-ignorant. 2) use DIH and DateFormatTransformer, see: http://wiki.apache.org/solr/DataImportHandler#DateFormatTransformer (you can walk a directory importing all the XML files with FileDataSource). 3) you could write a program to do this manually. But given the time constraints, I suspect your time would be better spent doing the other stuff and just using string as per Markus. I have no clue how SOLR-savvy you are, so pardon if this is something you already know.
But lots of people trip up over the string field type, which is NOT tokenized. You usually want text unless it's some sort of ID So it might be worth it to do some searching earlier rather than later G Best Erick On Wed, Sep 8, 2010 at 12:34 PM, Markus Jelsma markus.jel...@buyways.nlwrote: No. The Datefield [1] will not accept it any other way. You could, however, fool your boss and dump your dates in an ordinary string field. But then you cannot use some of the nice date features. [1]: http://lucene.apache.org/solr/api/org/apache/solr/schema/DateField.html -Original message- From: Rico Lelina rlel...@yahoo.com Sent: Wed 08-09-2010 17:36 To: solr-user@lucene.apache.org; Subject: How to import data with a different date format Hi, I am attempting to import some of our data into SOLR. I did it the
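Jonathan's alternative above -- convert each day-granularity date to a number-of-days-since-epoch integer and store it in a 1.4 'int' (or tint) field -- can be sketched like this; the helper names are mine, not part of Solr or the thread:

```python
from datetime import date, timedelta

EPOCH = date(1970, 1, 1)

def days_since_epoch(d):
    """Encode a day-granularity date as a small int for an int/tint field."""
    return (d - EPOCH).days

def from_days(n):
    """Decode the stored int back to a date for display in the UI."""
    return EPOCH + timedelta(days=n)

print(days_since_epoch(date(2010, 9, 8)))  # 14860
```

The application then does range queries and sorting on the int field, and converts back to a displayable date only at render time, exactly the trade-off described in the email.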
Re: Invariants on a specific fq value
If there is no default or request-provided value, will the appends still be used? I suspect so, but let us know, perhaps by adding it to the wiki page! Markus Jelsma wrote: Sounds great! I'll be very sure to put it to the test tomorrow and perhaps add documentation on these types to the solrconfigxml wiki page for reference. -Original message- From: Yonik Seeley yo...@lucidimagination.com Sent: Wed 08-09-2010 19:38 To: solr-user@lucene.apache.org; Subject: Re: Re: Invariants on a specific fq value 2010 at 1:32 PM, Markus Jelsma markus.jel...@buyways.nl wrote: Interesting! I haven't met the appends method before and i'll be sure to give it a try tomorrow. Try, the wiki [1] is not very clear on what it really does. Here's a comment from the example solrconfig.xml: !-- In addition to defaults, appends params can be specified to identify values which should be appended to the list of multi-val params from the query (or the existing defaults). In this example, the param fq=instock:true will be appended to any query time fq params the user may specify, as a mechanism for partitioning the index, independent of any user selected filtering that may also be desired (perhaps as a result of faceted searching). NOTE: there is *absolutely* nothing a client can do to prevent these appends values from being used, so don't use this mechanism unless you are sure you always want it. -- -Yonik http://lucenerevolution.org Lucene/Solr Conference, Boston Oct 7-8
Re: How to import data with a different date format
That was a general comment on SOLR string types. Mostly I wanted to prompt Rico to try some searching before getting too hung up on indexing refinements. I'd far rather demo a prototype being able to say "Dates don't work yet, but you can search" than "searching is broken to pieces, but dates work fine!". FWIW Erick On Wed, Sep 8, 2010 at 1:33 PM, Jonathan Rochkind rochk...@jhu.edu wrote: how SOLR-savvy you are, so pardon if this is something you already know. But lots of people trip up over the string field type, which is NOT tokenized. You usually want text unless it's some sort of ID. So it might be worth it to do some searching earlier rather than later. G Why would you want to tokenize a yyyy-mm-dd value? I'm liking the 'string' type. If you do yyyy-mm-dd, then you can even sort properly, and range query with endpoints also specified as yyyy-mm-dd, no? Okay, I'll stop spamming the thread now, heh. Jonathan
Re: How to import data with a different date format
So the standard 'int' field in Solr 1.4 is a trie-based field, although the example int type in the default schema.xml has a precision set to 0, which means it's not really doing trie things. If you set the precision to something greater than 0, as in the default example tint type, then it's really using 'trie' functionality. 'trie' functionality speeds up range queries by putting each value into 'buckets' (my own term), per the precision specified, so Solr has to do less to grab all values within a certain range. That's all tint/non-zero-precision trie does: speed up range queries. Your use case involves range queries though, so it's worth investigating. If you use a string or other textual type for sorting or range queries, you need to make sure your values sort the way you want them to as strings. But yyyy-mm-dd will. More on trie: http://www.lucidimagination.com/blog/2009/05/13/exploring-lucene-and-solrs-trierange-capabilities/ I think there probably won't be much of a difference at query time between non-trie int and string, although I'm not sure, and it may depend on the nature of your data and queries. Using a trie int will be faster for (and only for) range queries, if you have a lot of data. (There are some cases, depending on the data and the nature of your queries, where the overhead of a non-zero-precision trie may outweigh the hypothetical gain, but generally it's faster.) I don't think there should be any appreciable difference between how long a non-trie int or a string will take to index -- at least as far as Solr is concerned; if your app preparing the documents for Solr takes longer to prepare one than another, that's another story. An actual trie (non-zero-precision) theoretically has indexing-time overhead, but I doubt it would be noticeable, unless you have a really really lean mean indexing setup where every microsecond counts. Jonathan Dennis Gearon wrote: I'm doing something similar for dates/times/timestamps.
I'm actually trying to do, 'now' is within the range of what appointments(date/time from and to combos, i.e. timestamps). Fairly simple search of: What items have a start time BEFORE now, and an end time AFTER now? My thoughts were to store: unix time stamp BIGINTS (64 bit) ISO_DATE ISO_TIME strings Which is going to be faster: 1/ Indexing? 2/ Searching? How does the 'tint' field mentioned below apply? Dennis Gearon Signature Warning EARTH has a Right To Life, otherwise we all die. Read 'Hot, Flat, and Crowded' Laugh at http://www.yert.com/film.php --- On Wed, 9/8/10, Jonathan Rochkind rochk...@jhu.edu wrote: From: Jonathan Rochkind rochk...@jhu.edu Subject: Re: How to import data with a different date format To: solr-user@lucene.apache.org solr-user@lucene.apache.org Date: Wednesday, September 8, 2010, 10:27 AM Just throwing it out there, I'd consider a different approach for an actual real app, although it might not be easier to get up quickly. (For quickly, yeah, I'd just store it as a string, more on that at bottom). If none of your dates have times, they're all just full days, I'm not sure you really need the date type at all. Convert the date to number-of-days since epoch integer. (Most languages will have a way to do this, but I don't know about pure XSLT). Store _that_ in a 1.4 'int' field. On top of that, make it a tint (precision non-zero) for faster range queries. But now your actual interface will have to convert from number of days since epoch to a displayable date. (And if you allow user input, convert the input to number-of-days-since-epoch before making a range query or fq, but you'd have to do that anyway even with solr dates, users aren't going to be entering W3CDate raw, I don't think). That is probably the most efficient way to have solr handle it -- using an actual date field type gives you a lot more precision than you need, which is going to hurt performance on range queries. 
Which you can compensate for with trie date, sure, but if you don't really need that precision to begin with, why use it? Also the extra precision can end up doing unexpected things and making it easier to have bugs (range queries on that high-precision stuff, you need to make sure your start date has 00:00:00 set and your end date has 23:59:59 set, to do what you probably expect). If you aren't going to use the extra precision, it makes everything a lot simpler to not use a date field. Alternately, for your get-this-done-quick method, yeah, I'd just store it as a string. With a string exactly as you've specified, sorting and range queries won't work how you'd want. But if you can make it a string of the format yyyy/mm/dd instead (always two-digit month and day), then you can even sort and do range queries on your string dates. For the quick and dirty prototype, I'd just do that. In fact, while this might make range queries and sorting _slightly_ slower than if you use an int or a
Re: How to import data with a different date format
So now, vs when 'trie' came out, Solr has an INT field that IS 'trie', right? And nothing date/timestamp related has come out since, making 'trie'/INT the field of choice for timestamps, right? Seems like the fastest choice. I will have to read up on it. Seems like my original choice to use unix timestamp as storage in my SQL database, vs native Postgres timestamp, will make everything easier between: PHP Symfony Postgres Solr It's probably going to be a good idea to store two other columns in the search index for display, 'date', 'time'. That is, unless I force the user's javascript to generate the time and date from the unix timestamp. hmm. Dennis Gearon Signature Warning EARTH has a Right To Life, otherwise we all die. Read 'Hot, Flat, and Crowded' Laugh at http://www.yert.com/film.php --- On Wed, 9/8/10, Jonathan Rochkind rochk...@jhu.edu wrote: From: Jonathan Rochkind rochk...@jhu.edu Subject: Re: How to import data with a different date format To: solr-user@lucene.apache.org solr-user@lucene.apache.org Date: Wednesday, September 8, 2010, 11:35 AM So the standard 'int' field in Solr 1.4 is a trie-based field, although the example int type in the default schema.xml has a precision set to 0, which means it's not really doing trie things. If you set the precision to something greater than 0, as in the default example tint type, then it's really using 'trie' functionality. 'trie' functionality speeds up range queries by putting each value into 'buckets' (my own term), per the precision specified, so Solr has to do less to grab all values within a certain range. That's all tint/non-zero-precision trie does: speed up range queries. Your use case involves range queries though, so it's worth investigating. If you use a string or other textual type for sorting or range queries, you need to make sure your values sort the way you want them to as strings. But yyyy-mm-dd will.
More on trie: http://www.lucidimagination.com/blog/2009/05/13/exploring-lucene-and-solrs-trierange-capabilities/ I think there probably won't be much of a difference at query time between non-trie int and string, although I'm not sure, and it may depend on the nature of your data and queries. Using a trie int will be faster for (and only for) range queries, if you have a lot of data. (There are some cases, depending on the data and the nature of your queries, where the overhead of a non-zero-precision trie may outweigh the hypothetical gain, but generally it's faster). I don't think there should be any appreciable difference between how long a non-trie int or a string will take to index -- at least as far as solr is concerned, if your app preparing the documents for solr takes longer to prepare one than another, that's another story. An actual trie (non-zero-precision) theoretically has indexing-time overhead, but I doubt it would be noticeable, unless you have a really really lean mean indexing setup where ever microsecond counts. Jonathan Dennis Gearon wrote: I'm doing something similar for dates/times/timestamps. I'm actually trying to do, 'now' is within the range of what appointments(date/time from and to combos, i.e. timestamps). Fairly simple search of: What items have a start time BEFORE now, and an end time AFTER now? My thoughts were to store: unix time stamp BIGINTS (64 bit) ISO_DATE ISO_TIME strings Which is going to be faster: 1/ Indexing? 2/ Searching? How does the 'tint' field mentioned below apply? Dennis Gearon Signature Warning EARTH has a Right To Life, otherwise we all die. 
Read 'Hot, Flat, and Crowded' Laugh at http://www.yert.com/film.php --- On Wed, 9/8/10, Jonathan Rochkind rochk...@jhu.edu wrote: From: Jonathan Rochkind rochk...@jhu.edu Subject: Re: How to import data with a different date format To: solr-user@lucene.apache.org solr-user@lucene.apache.org Date: Wednesday, September 8, 2010, 10:27 AM Just throwing it out there, I'd consider a different approach for an actual real app, although it might not be easier to get up quickly. (For quickly, yeah, I'd just store it as a string, more on that at bottom). If none of your dates have times, they're all just full days, I'm not sure you really need the date type at all. Convert the date to number-of-days since epoch integer. (Most languages will have a way to do this, but I don't know about pure XSLT). Store _that_ in a 1.4 'int' field. On top of that, make it a tint (precision non-zero) for faster range queries. But now your actual interface will have to convert from number of days since epoch to a displayable date. (And if you allow user input, convert the input to number-of-days-since-epoch before making a range query or fq, but you'd have to do that anyway even with
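The 'buckets' idea described in this thread can be illustrated roughly in a few lines. This is a deliberate simplification of what Lucene's trie encoding actually does (the real term format and precisionStep handling are more involved), but it shows the principle of indexing each value at several coarser precisions so range queries can match whole buckets:

```python
def trie_terms(value, precision_step=8, bits=32):
    """Emit right-shifted copies of the value at progressively coarser
    precisions. A range query can then match whole coarse 'buckets'
    instead of enumerating every distinct value inside the range."""
    return [(shift, value >> shift) for shift in range(0, bits, precision_step)]

# 1234 indexed at four precisions: exact, /256, /65536, /16777216
print(trie_terms(1234))  # [(0, 1234), (8, 4), (16, 0), (24, 0)]
```

With precision_step=0 (the 'int' example type as described above), only the exact term exists, so no bucketing happens and range queries gain nothing.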
RE: Re: Re: Invariants on a specific fq value
: Sounds great! I'll be very sure to put it to the test tomorrow and : perhaps add documentation on these types to the solrconfigxml wiki page : for reference. SolrConfigXml wouldn't really be an appropriate place to document this -- it's not a general config item, it's a feature of the SearchHandler... http://wiki.apache.org/solr/SearchHandler That wiki page already documented defaults; I've updated it to add details on appends and invariants. -Hoss -- http://lucenerevolution.org/ ... October 7-8, Boston http://bit.ly/stump-hoss ... Stump The Chump!
RE: Re: Re: Invariants on a specific fq value
Excellent! You already made my day for tomorrow! I'll check its behavior with fq parameters specifying a filter on the same field! -Original message- From: Chris Hostetter hossman_luc...@fucit.org Sent: Wed 08-09-2010 21:04 To: solr-user@lucene.apache.org; Subject: RE: Re: Re: Invariants on a specific fq value : Sounds great! I'll be very sure to put it to the test tomorrow and : perhaps add documentation on these types to the solrconfigxml wiki page : for reference. SolrConfigXml wouldn't really be an appropriate place to document this -- it's not a general config item, it's a feature of the SearchHandler... http://wiki.apache.org/solr/SearchHandler That wiki page already documented defaults; I've updated it to add details on appends and invariants. -Hoss -- http://lucenerevolution.org/ ... October 7-8, Boston http://bit.ly/stump-hoss ... Stump The Chump!
Re: How to import data with a different date format
: If none of your dates have times, they're all just full days, I'm not sure you : really need the date type at all. : : Convert the date to number-of-days since epoch integer. (Most languages will : have a way to do this, but I don't know about pure XSLT). Store _that_ in a : 1.4 'int' field. On top of that, make it a tint (precision non-zero) for : faster range queries. There's really no advantage to doing this over using the TrieDateField (available in Solr 1.4). It's essentially how it's implemented under the covers (you can pick the precision just like TrieInt), except that: 1) it uses a long instead of an int 2) it supports DateMath expressions 3) it supports Date Faceting -Hoss -- http://lucenerevolution.org/ ... October 7-8, Boston http://bit.ly/stump-hoss ... Stump The Chump!
Randomly slow response times for range queries
Hello all, I am running two range queries on a double value as filter queries using Solr 1.4, and for the most part am getting great performance (qTime < 100ms). However, at certain QPS, I start getting very slow queries (2000+ms). I've tried this using the new trie fields, and using standard sdouble fields, and have had similar results. Is there a known issue with randomly slow queries when doing range searches with Solr? Thanks for any support you can offer. -- View this message in context: http://lucene.472066.n3.nabble.com/Randomly-slow-response-times-for-range-queries-tp1441724p1441724.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Is semicolon a character that needs escaping?
: I am using 1.3 without a sort param which explains it, I think. It would : be nice to update to 1.4 but we try to avoid such actions on a : production server as long as everything runs fine (the semicolon thing : was only reported recently). If you don't currently use sort at all, then by adding a default sort param of "score desc" to your Solr config for that handler, you shouldn't have to ever worry about semicolons again. (I'm fairly certain Solr 1.3 supported defaults -- I may be wrong ... you might have to add that hardcoded sort param in your client.) -Hoss -- http://lucenerevolution.org/ ... October 7-8, Boston http://bit.ly/stump-hoss ... Stump The Chump!
Re: How to use TermsComponent when I need a filter
: Subject: How to use TermsComponent when I need a filter : In-Reply-To: 8ffbbf6788bd5842b5a7274ef0f6837e01c3d...@msex85.morningstar.com : References: 8ffbbf6788bd5842b5a7274ef0f6837e01c3d...@msex85.morningstar.com http://people.apache.org/~hossman/#threadhijack Thread Hijacking on Mailing Lists When starting a new discussion on a mailing list, please do not reply to an existing message, instead start a fresh email. Even if you change the subject line of your email, other mail headers still track which thread you replied to and your question is hidden in that thread and gets less attention. It makes following discussions in the mailing list archives particularly difficult. See Also: http://en.wikipedia.org/wiki/User:DonDiego/Thread_hijacking -Hoss -- http://lucenerevolution.org/ ... October 7-8, Boston http://bit.ly/stump-hoss ... Stump The Chump!
How to use TermsComponent when I need a filter
Hi, I have a solr index, which for simplicity is just a list of names, and a list of associations. (either a multivalue field e.g. {A1, A2, A3, A6} or a string concatenation list e.g. A1 A2 A3 A6) I want to be able to provide autocomplete but with a specific association. E.g. Names beginning with Bob in association A5. Is this possible? I would prefer not to have to have one index per association, since the number of associations is pretty large Cheers, David
Re: Randomly slow response times for range queries
Well, throw enough queries at any server and it'll slow right down, so how many are we talking here? But no, there're no SOLR issues like this that I know of. That said, you could be getting cache thrashing. You could be getting garbage collection by the JVM. You could be executing commits somehow (are you updating?) and causing your caches to be refilled. You could be... The admin/stats.jsp page (also linked from the admin page) can give you some clues; look particularly for evictions most of the way down the page. Best Erick On Wed, Sep 8, 2010 at 3:11 PM, oleg.gnatovskiy crooke...@gmail.com wrote: Hello all, I am running two range queries on a double value as filter queries using Solr 1.4, and for the most part am getting great performance (qTime < 100ms). However, at certain QPS, I start getting very slow queries (2000+ms). I've tried this using the new trie fields, and using standard sdouble fields, and have had similar results. Is there a known issue with randomly slow queries when doing range searches with Solr? Thanks for any support you can offer. -- View this message in context: http://lucene.472066.n3.nabble.com/Randomly-slow-response-times-for-range-queries-tp1441724p1441724.html Sent from the Solr - User mailing list archive at Nabble.com.
RE: DataImportHandlerException for custom DIH Transformer
I am experiencing a similar situation. Any comments? -Original Message- From: Shashikant Kore [mailto:shashik...@gmail.com] Sent: Wednesday, September 08, 2010 2:54 AM To: solr-user@lucene.apache.org Subject: Re: DataImportHandlerException for custom DIH Transformer Resurrecting an old thread. I faced the exact problem as Tommy, and the jar was in {solr.home}/lib as Noble had suggested. My custom transformer overrides the following method as per the specification of the Transformer class: public Object transformRow(Map<String, Object> row, Context context); But in the code (EntityProcessorWrapper.java), I see the following line: final Method meth = clazz.getMethod(TRANSFORM_ROW, Map.class); This doesn't match the method signature in Transformer. I think this should be: final Method meth = clazz.getMethod(TRANSFORM_ROW, Map.class, Context.class); I have verified that adding a method transformRow(Map<String, Object> row) works. Am I missing something? --shashi 2010/2/8 Noble Paul നോബിള് नोब्ळ् noble.p...@corp.aol.com On Mon, Feb 8, 2010 at 9:13 AM, Tommy Chheng tommy.chh...@gmail.com wrote: I'm having trouble making a custom DIH transformer in Solr 1.4. I compiled the general TrimTransformer into a jar (just copy/paste sample code from http://wiki.apache.org/solr/DIHCustomTransformer). I placed the jar along with the dataimporthandler jar in solr/lib (same directory as the jetty jar). Do not keep it in solr/lib, it won't work; keep it in {solr.home}/lib. Then I added to my DIH data-config.xml file: transformer="DateFormatTransformer, RegexTransformer, com.chheng.dih.transformers.TrimTransformer" Now I get this exception when I try running the import.
org.apache.solr.handler.dataimport.DataImportHandlerException: java.lang.NoSuchMethodException: com.chheng.dih.transformers.TrimTransformer.transformRow(java.util.Map) at org.apache.solr.handler.dataimport.EntityProcessorWrapper.loadTransformers(EntityProcessorWrapper.java:120) I noticed the exception lists TrimTransformer.transformRow(java.util.Map), but the abstract Transformer class defines a two-parameter method: transformRow(Map<String, Object> row, Context context)? -- Tommy Chheng Programmer and UC Irvine Graduate Student Twitter @tommychheng http://tommy.chheng.com -- - Noble Paul | Systems Architect| AOL | http://aol.com
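The mismatch Shashikant describes is easy to reproduce with plain reflection: Class.getMethod only matches an exact parameter list, so a one-argument lookup can never find a two-argument method. A minimal stand-alone sketch, using Object as a stand-in for Solr's Context class:

```java
import java.lang.reflect.Method;
import java.util.Map;

public class ReflectDemo {
    // Hypothetical transformer mirroring the Transformer contract, with
    // Object standing in for org.apache.solr.handler.dataimport.Context.
    public static class MyTransformer {
        public Object transformRow(Map<String, Object> row, Object context) {
            return row;
        }
    }

    public static void main(String[] args) throws Exception {
        Class<?> clazz = MyTransformer.class;
        // One-argument lookup, as in EntityProcessorWrapper.loadTransformers:
        try {
            clazz.getMethod("transformRow", Map.class);
            System.out.println("one-arg lookup: found");
        } catch (NoSuchMethodException e) {
            System.out.println("one-arg lookup: NoSuchMethodException");
        }
        // Two-argument lookup matching the declared signature succeeds.
        Method m = clazz.getMethod("transformRow", Map.class, Object.class);
        System.out.println("two-arg lookup: " + m.getName());
    }
}
```

The one-argument lookup throws, which is exactly why a transformer that only declares the two-argument method trips the exception, and why adding a one-argument overload works around it.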
Re: Implementing synonym NewBie
I believe the synonym filter does not find phrases, only individual words. It is possible that you could use the shingle tools to create terms that are word pairs, but this would be very inefficient. On Tue, Sep 7, 2010 at 6:23 AM, Jak Akdemir jakde...@gmail.com wrote: If you think you will improve your synonyms file over time, I would recommend query-time synonyms. That way you don't have to re-index when you need to add something more. On Sat, Aug 28, 2010 at 10:01 AM, Jonty Rhods jonty.rh...@gmail.com wrote: Hi All, I want to use synonyms for my search. I am still in the learning phase of Solr, so please help me implement synonyms in my search. According to the wiki, synonyms can be implemented in two ways: 1) at index time 2) at search time. I have about 10 combinations of phrases for synonyms, so which will be better in my case? Something like: live show in new york = live show in california = live show = live show in DC = live show in USA. Will synonyms affect my original search? thanks with regards Jonty -- Lance Norskog goks...@gmail.com
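For reference, a sketch of how groups like Jonty's could be wired up with SynonymFilterFactory (field setup and file names are the stock examples, not from this thread). Note the caveat implied above: multi-word entries behave most reliably at index time, since a query parser typically splits the query on whitespace before the filter ever sees a phrase.

```xml
<!-- in the field type's <analyzer type="index"> chain, schema.xml -->
<filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
        ignoreCase="true" expand="true"/>
```

with a synonyms.txt line such as:

```
# comma-separated equivalents, one group per line (expand="true")
live show in new york, live show in california, live show in DC, live show in USA, live show
```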
Re: How to retrieve the full corpus
If you want to do a mass scan of an index, the most scalable way is to make a variation of the Lucene CheckIndex program. Unfortunately, CheckIndex does not know any of the Solr types. But first, you should try the above techniques because they are much, much easier. On Mon, Sep 6, 2010 at 7:59 AM, Markus Jelsma markus.jel...@buyways.nl wrote: You can use Luke to inspect a Lucene index. Check the schema browser in your Solr admin interface for an example. On Monday 06 September 2010 16:52:03 Roland Villemoes wrote: Hi All, How can I retrieve all words from a Solr core? I need a list of all the words and how often they occur in the index. med venlig hilsen/best regards Roland Villemoes Tel: (+45) 22 69 59 62 E-Mail: mailto:r...@alpha-solutions.dk Alpha Solutions A/S Borgergade 2, 3.sal, 1300 København K Tel: (+45) 70 20 65 38 Web: http://www.alpha-solutions.dk Markus Jelsma - Technisch Architect - Buyways BV http://www.linkedin.com/in/markus17 050-8536620 / 06-50258350 -- Lance Norskog goks...@gmail.com
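Besides Luke and CheckIndex, Solr 1.4's TermsComponent can list every indexed term of a field with its document frequency, via a request like /terms?terms.fl=text&terms.limit=-1&wt=json (field name hypothetical). A sketch of consuming such a payload, assuming the default flat named-list layout where each field's terms arrive as [term, freq, term, freq, ...]:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class TermsParse {
    // Pair up a flat [term, freq, term, freq, ...] list, which is how a
    // hypothetical TermsComponent JSON response lays out each field's terms.
    public static Map<String, Integer> pair(Object[] flat) {
        Map<String, Integer> freqs = new LinkedHashMap<>();
        for (int i = 0; i < flat.length; i += 2) {
            freqs.put((String) flat[i], (Integer) flat[i + 1]);
        }
        return freqs;
    }

    public static void main(String[] args) {
        // Sample values standing in for a real response body.
        Object[] flat = {"solr", 42, "lucene", 17, "search", 9};
        System.out.println(pair(flat));
    }
}
```

For a complete word list, page through with terms.lower set to the last term of the previous batch rather than asking for everything at once.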
Re: How to import data with a different date format
Solr 1.4 was the first tagged release with trie fields. And Solr 1.4+ also includes a 'date' field based on 'trie' just for dates. If your dates are actually going to include hour/minute/second, not just calendar day-of-month, then I'd definitely use the built in solr trie date field, that's what it's for, will do the translation from calendar date-time to integer for you (in both directions), and add trie buckets for fast range querying too. I was suggesting that just using 'int' might be simpler if you don't need hour/minute/second precision, but are just storing year-month-day. If you've got hour-minute-second too, no reason not to use Solr's date type, and lots of reasons to do so. Jonathan Dennis Gearon wrote: So now, vs when 'trie' came out, Solr has an INT field that IS 'trie', right? And nothing date/timestamp related has come out since, making 'trie'/INT the field of choice for timestamps, right? Seems like the fastest choice. I will have to read up on it. Seems like my original choice to use unix timestamp as storage in my SQL database, vs native Postgres timestamp, will make everything easier between: PHP Symfony Postgres Solr It's probably going to be a good idea to store two other columns in the search index for display, 'date', 'time'. That is, unless I force the user's javascript to generate the time and date from the unix timestamp. hmm. Dennis Gearon Signature Warning EARTH has a Right To Life, otherwise we all die. Read 'Hot, Flat, and Crowded' Laugh at http://www.yert.com/film.php --- On Wed, 9/8/10, Jonathan Rochkind rochk...@jhu.edu wrote: From: Jonathan Rochkind rochk...@jhu.edu Subject: Re: How to import data with a different date format To: solr-user@lucene.apache.org solr-user@lucene.apache.org Date: Wednesday, September 8, 2010, 11:35 AM So the standard 'int' field in Solr 1.4 is a trie based field, although the example int type in the default solrconfig.xml has a precision set to 0, which means it's not really doing trie things. 
If you set the precision to something greater than 0, as in the default example tint type, then it's really using 'trie' functionality. 'trie' functionality speeds up range queries by putting each value into 'buckets' (my own term), per the precision specified, so Solr has to do less to grab all values within a certain range. That's all tint/non-zero-precision-trie does: speed up range queries. Your use case involves range queries though, so it's worth investigating. If you use a string or other textual type for sorting or range queries, you need to make sure your values sort the way you want them to as strings. But yyyy-mm-dd will. More on trie: http://www.lucidimagination.com/blog/2009/05/13/exploring-lucene-and-solrs-trierange-capabilities/ I think there probably won't be much of a difference at query time between non-trie int and string, although I'm not sure, and it may depend on the nature of your data and queries. Using a trie int will be faster for (and only for) range queries, if you have a lot of data. (There are some cases, depending on the data and the nature of your queries, where the overhead of a non-zero-precision trie may outweigh the hypothetical gain, but generally it's faster). I don't think there should be any appreciable difference between how long a non-trie int or a string will take to index -- at least as far as Solr is concerned; if your app preparing the documents for Solr takes longer to prepare one than another, that's another story. An actual trie (non-zero-precision) theoretically has indexing-time overhead, but I doubt it would be noticeable, unless you have a really really lean mean indexing setup where every microsecond counts. Jonathan Dennis Gearon wrote: I'm doing something similar for dates/times/timestamps. I'm actually trying to do: 'now' is within the range of appointments (date/time from and to combos, i.e. timestamps). Fairly simple search of: What items have a start time BEFORE now, and an end time AFTER now?
My thoughts were to store: unix time stamp BIGINTS (64 bit) ISO_DATE ISO_TIME strings Which is going to be faster: 1/ Indexing? 2/ Searching? How does the 'tint' field mentioned below apply? Dennis Gearon Signature Warning EARTH has a Right To Life, otherwise we all die. Read 'Hot, Flat, and Crowded' Laugh at http://www.yert.com/film.php --- On Wed, 9/8/10, Jonathan Rochkind rochk...@jhu.edu wrote: From: Jonathan Rochkind rochk...@jhu.edu Subject: Re: How to import data with a different date format To: solr-user@lucene.apache.org solr-user@lucene.apache.org Date: Wednesday, September 8, 2010, 10:27 AM Just throwing it out there, I'd consider a different approach for an actual real app, although it might not be easier to get up quickly. (For quickly, yeah,
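The int/tint distinction discussed above comes down to one attribute in schema.xml. A sketch using definitions in the style of the Solr 1.4 example schema (type names as in the stock example; treat the exact precisionStep values as defaults to tune, not gospel):

```xml
<!-- precisionStep="0" disables the extra trie buckets: a plain int -->
<fieldType name="int"   class="solr.TrieIntField"  precisionStep="0"
           omitNorms="true" positionIncrementGap="0"/>
<!-- precisionStep="8" indexes extra lower-precision terms to speed range queries -->
<fieldType name="tint"  class="solr.TrieIntField"  precisionStep="8"
           omitNorms="true" positionIncrementGap="0"/>
<!-- the trie-based date type: same bucketing idea applied to timestamps -->
<fieldType name="tdate" class="solr.TrieDateField" precisionStep="6"
           omitNorms="true" positionIncrementGap="0"/>
```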
Delta Import with something other than Date
Hi, I have a table that I want to index, and the table has no datetime stamp. However, the table is append-only, so the primary key can only go up. Is it possible to store the last primary key and use some delta query like: select id where id > ${last_id_value} Cheers, David
Re: Null Pointer Exception with shardsfacets where some shards have no values for some facets.
Yonik Seeley wrote: On Tue, Sep 7, 2010 at 8:31 PM, Ron Mayer r...@0ape.com wrote: Short summary: * Mixing Facets and Shards gives me a NullPointerException when not all docs have all facets. https://issues.apache.org/jira/browse/SOLR-2110 I believe the underlying real issue stemmed from your use of a complex key involvement/race_facet. Thanks! Yes - that looks like the actual reason, rather than what I was guessing. I spent a while this morning trying to reproduce the problem with a simpler example, and wasn't able to - probably because I overlooked that part. I see changes have been made (based on comments in SOLR-2110 and SOLR-2111), so I'll try with the current trunk. [trying now with trunk as of a few minutes ago] Looking much better. I'm seeing this in the log files: SEVERE: Exception during facet.field of {!terms=$involvement/gender_facet__terms}involvement/gender_facet: org.apache.lucene.queryParser.ParseException: Expected identifier at pos 20 str='{!terms=$involvement/gender_facet__terms}involvement/gender_facet' at org.apache.solr.search.QueryParsing$StrParser.getId(QueryParsing.java:718) at org.apache.solr.search.QueryParsing.parseLocalParams(QueryParsing.java:165) ... but at least I'm getting results, and results that look right for both the body of the document and for most of the facets. Perhaps the next thing I try will be simplifying my keys, for my own sanity as much as for Solr's.
Re: Delta Import with something other than Date
Of course you can store whatever you want in a Solr index. And if you store an integer as a Solr 1.4 int type, you can certainly query for all documents that have greater than some specified integer in a field. You can't use SQL to query Solr, though. I'm not sure what you're really asking? Jonathan David Yang wrote: Hi, I have a table that I want to index, and the table has no datetime stamp. However, the table is append-only, so the primary key can only go up. Is it possible to store the last primary key and use some delta query like: select id where id > ${last_id_value} Cheers, David
RE: Delta Import with something other than Date
Currently DIH delta import uses a SQL query of the form: select id from item where last_modified > ${dataimporter.last_index_time} What I need is some field like ${dataimporter.last_primary_key} wiki.apache.org/solr/DataImportHandler I am thinking of storing the last primary key externally and calling the delta-import with a parameter, using ${dataimporter.request.last_primary_key}, but that seems like a very brittle approach. Cheers, David -Original Message- From: Jonathan Rochkind [mailto:rochk...@jhu.edu] Sent: Wednesday, September 08, 2010 6:38 PM To: solr-user@lucene.apache.org Subject: Re: Delta Import with something other than Date Of course you can store whatever you want in a Solr index. And if you store an integer as a Solr 1.4 int type, you can certainly query for all documents that have greater than some specified integer in a field. You can't use SQL to query Solr, though. I'm not sure what you're really asking? Jonathan David Yang wrote: Hi, I have a table that I want to index, and the table has no datetime stamp. However, the table is append-only, so the primary key can only go up. Is it possible to store the last primary key and use some delta query like: select id where id > ${last_id_value} Cheers, David
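The request-parameter idea David mentions can at least be written down concretely. A sketch of a data-config.xml entity (table and column names hypothetical), invoked with command=delta-import&last_primary_key=12345:

```xml
<entity name="item" pk="id"
        query="select * from item"
        deltaQuery="select id from item
                    where id &gt; '${dataimporter.request.last_primary_key}'"
        deltaImportQuery="select * from item
                          where id = '${dataimporter.delta.id}'"/>
```

It stays brittle exactly as noted: the caller has to track and pass the high-water mark itself on every run.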
Need Advice for Finding Freelance Solr Expert
Hi, We need someone who knows Solr to help us prepare and index some data. Any advice on where to find people who know Solr? Thanks, John
Re: Randomly slow response times for range queries
Well I am only sending about 50 QPS at it at the time that it temporarily slows down, and then it's able to get all the way up to 100 QPS+ with no problems (until the next random queries). I suppose it could be the garbage collection. Is there a good way to limit this? -- View this message in context: http://lucene.472066.n3.nabble.com/Randomly-slow-response-times-for-range-queries-tp1441724p1443086.html Sent from the Solr - User mailing list archive at Nabble.com.
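GC pauses can't be eliminated outright, but on JVMs of this era the usual first step is to cap the heap sensibly, switch to the concurrent collector, and log GC activity to confirm the diagnosis. A sketch of container startup flags (heap sizes are placeholders to tune, not recommendations):

```shell
# Hypothetical Jetty start line for Solr: CMS collector plus GC logging
java -Xms2g -Xmx2g \
     -XX:+UseConcMarkSweepGC -XX:+UseParNewGC \
     -verbose:gc -XX:+PrintGCDetails -Xloggc:gc.log \
     -jar start.jar
```

If gc.log shows long full collections lining up with the slow queries, the heap and collector settings are the place to work.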
Re: Randomly slow response times for range queries
Also, does anyone know the best precisionStep to use on a trie field (float) definition to achieve optimal performance? -- View this message in context: http://lucene.472066.n3.nabble.com/Randomly-slow-response-times-for-range-queries-tp1441724p1443096.html Sent from the Solr - User mailing list archive at Nabble.com.
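precisionStep trades index size for range-query speed: a smaller step indexes more terms per value but visits fewer terms per range query, and 0 turns bucketing off entirely. There is no single best value; the stock example schema uses 8 for its t-types. A sketch of a trie float type (name as in the example schema):

```xml
<fieldType name="tfloat" class="solr.TrieFloatField"
           precisionStep="8" omitNorms="true" positionIncrementGap="0"/>
```

Benchmarking a few values (e.g. 4, 6, 8) against your own query mix is the only reliable way to pick.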
Re: Need Advice for Finding Freelance Solr Expert
There's a page on the Solr/Lucene site for this. I myself will be in the market for one late this year. Dennis Gearon Signature Warning EARTH has a Right To Life, otherwise we all die. Read 'Hot, Flat, and Crowded' Laugh at http://www.yert.com/film.php --- On Wed, 9/8/10, John Roberts jsro...@hotmail.com wrote: From: John Roberts jsro...@hotmail.com Subject: Need Advice for Finding Freelance Solr Expert To: solr-user@lucene.apache.org Date: Wednesday, September 8, 2010, 3:50 PM Hi, We need someone who knows Solr to help us prepare and index some data. Any advice on where to find people who know Solr? Thanks, John
Re: How to import data with a different date format
I already have the issue of how to store between different databases, languages, platforms, and frameworks. Settling on LONGINT/unix timestamp solves the problem on all fronts. I may even send them to the browser and have the JScript convert them to date/times (maybe ;-) So, it's *nix timestamp or bust! Dennis Gearon Signature Warning EARTH has a Right To Life, otherwise we all die. Read 'Hot, Flat, and Crowded' Laugh at http://www.yert.com/film.php --- On Wed, 9/8/10, Jonathan Rochkind rochk...@jhu.edu wrote: From: Jonathan Rochkind rochk...@jhu.edu Subject: Re: How to import data with a different date format To: solr-user@lucene.apache.org solr-user@lucene.apache.org Date: Wednesday, September 8, 2010, 3:07 PM [quoted message trimmed]
Re: Null Pointer Exception with shardsfacets where some shards have no values for some facets.
I just checked in the last part of those changes that should eliminate any restriction on keys. But that last part dealt with escaping keys that contained whitespace or }. Your example really should have worked after my previous 2 commits. Perhaps not all of the servers got successfully upgraded? Can you try trunk again now? -Yonik http://lucenerevolution.org Lucene/Solr Conference, Boston Oct 7-8 On Wed, Sep 8, 2010 at 6:28 PM, Ron Mayer r...@0ape.com wrote: [quoted message trimmed]
Re: Null Pointer Exception with shardsfacets where some shards have no values for some facets.
Yonik Seeley wrote: I just checked in the last part of those changes that should eliminate any restriction on key. But, that last part dealt with escaping keys that contained whitespace or } Your example really should have worked after my previous 2 commits. Perhaps not all of the servers got successfully upgraded? Yes, quite possible. Can you try trunk again now? Will check sometime tomorrow.
Re: Creating a sub-index from another
: I have a Solr Index with several million documents. I need to implement some : text mining processes and I would like to create a million-document index : from the original for some tests. Which million documents do you want? If you're just looking for a one-time kind of experimental test index... 1) take a snapshot of your live index 2) copy it onto your dev machine 3) load it into Solr 4) execute delete commands (according to some criteria you choose) until you only have 1 million documents left. -Hoss -- http://lucenerevolution.org/ ... October 7-8, Boston http://bit.ly/stump-hoss ... Stump The Chump!
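Step 4's delete commands are ordinary update messages posted to the /update handler, followed by a commit. A sketch (the field and range are hypothetical; everything matching the query is removed):

```xml
<delete><query>timestamp:[* TO 2009-01-01T00:00:00Z]</query></delete>
<commit/>
<optimize/>
```

Running optimize afterward compacts away the deleted documents, so the trimmed copy behaves like a realistic test index rather than one full of deletion markers.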
Re: Solr Highlighting Question
Anybody? On 09/08/2010 11:26 AM, Jed Glazner wrote: Thanks for taking time to read through this. I'm using a checkout from the solr 3.x branch. My problem is with the highlighter and wildcards. I can get the highlighter to work with wildcards just fine; the problem is that Solr is returning the whole term matched, when what I want it to do is highlight only the chars in the term that were matched. Example: http://192.168.1.75:8983/solr/music/select?indent=on&q=name_title:wel*&qt=beyond&hl=true&hl.fl=name_title&f.name_title.hl.usePhraseHighlighter=true&f.name_title.hl.highlightMultiTerm=true The results that come back look like this: <em>Welcome</em> to the Jungle What I want them to look like is this: <em>Wel</em>come to the Jungle From what I gathered by searching the archives, solr 1.1 used to do this... Is there a way to get that functionality? Thanks!
[ANN] Webinar, Sep 15: Mastering the Power of Faceted Search
Folks, here's an upcoming Solr webinar sponsored by my employer. It's Hoss on faceting, so it should be good! -Yonik http://lucenerevolution.org Lucene/Solr Conference, Boston Oct 7-8 --- Webinar Details Join us for a free webcast Mastering the Power of Faceted Search with Chris Hostetter Wednesday, September 15, 2010 9:00 AM PST / 12:00 PM EST / 17:00 GMT Click here to sign up http://www.eventsvc.com/lucidimagination/event/f5d87726f8ab4ed4911aad605f94f455?trk=AP Few search features have contributed as much to findability and user search experience as Facets. By organizing and classifying underlying information into an intuitive method for filtering information, faceted searching gives users a powerful tool for navigation and discovery. Once the province of costly proprietary commercial systems, the Faceted Searching capabilities of the Apache Solr Open Source Search have led developers around the world to build this popular feature into their search apps. Yet many of its more powerful features are not as widely known and used, and offer yet more powerful improvements to the search experience. Join Apache Lucene/Solr committer Chris Hostetter of Lucid Imagination for an in depth technical workshop on the what, why and how of faceting with Solr, the Lucene Search Server. This presentation will cover: * the different types of facets that Solr supports * techniques for dealing with complex faceting use cases * performance factors to be aware of * new faceting features on the horizon About the presenter: Chris "Hossman" Hostetter is a Member of the Apache Software Foundation, and serves on the Lucene Project Management Committee. Prior to joining Lucid Imagination in 2010 to work full time on Solr development, he spent 11 years as Principal Software Engineer for CNET Networks, thinking about searching structured data that was never as structured as it should have been.
Re: Multi core schema file
A demonstration of this feature would be a good addition to the example/multicore directory. On Wed, Sep 8, 2010 at 3:45 AM, Grijesh.singh pintu.grij...@gmail.com wrote: solr.xml allows you to mention other properties as well, like instanceDir, config, and schema, in the <core> tag. So sharing the entire conf dir may not be possible, but it is possible to share solrconfig.xml and schema.xml. You can see the detailed parameters at the wiki page http://wiki.apache.org/solr/CoreAdmin - Grijesh -- View this message in context: http://lucene.472066.n3.nabble.com/Multi-core-schema-file-tp1438460p1438720.html Sent from the Solr - User mailing list archive at Nabble.com. -- Lance Norskog goks...@gmail.com
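A sketch of a solr.xml that points two cores at one shared pair of config files (paths are hypothetical and resolved relative to each core's instanceDir):

```xml
<solr persistent="false">
  <cores adminPath="/admin/cores">
    <core name="core0" instanceDir="core0"
          config="../shared/solrconfig.xml" schema="../shared/schema.xml"/>
    <core name="core1" instanceDir="core1"
          config="../shared/solrconfig.xml" schema="../shared/schema.xml"/>
  </cores>
</solr>
```

Each core keeps its own data directory while loading the same solrconfig.xml and schema.xml.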
Re: Query result ranking - Score independent
Generally speaking it is a bad idea to change the schema without reindexing. I found several little things that could go wrong back when I had a huge index and could not reindex. On Wed, Sep 8, 2010 at 4:58 AM, Erick Erickson erickerick...@gmail.com wrote: Ooops, hit send too quickly. Could you show us the entire URL you send that produces the error? Erick On Wed, Sep 8, 2010 at 7:58 AM, Erick Erickson erickerick...@gmail.com wrote: The change in the schema shouldn't matter (emphasis on the should). What version of SOLR are you using? I tried this query and it works just fine for me; I'm using 1.4.1 Best Erick On Wed, Sep 8, 2010 at 4:38 AM, Alessandro Benedetti benedetti.ale...@gmail.com wrote: My request was very simple: q=astronomy^0 And Solr returned the exception. Maybe the zero boost factor is not causing the exception? 1) We indexed n documents with a schema.xml. 2) Then we changed some field types in the schema.xml. 3) Then we indexed other m documents. Maybe this could cause the exception? 2010/9/7 Grant Ingersoll gsing...@apache.org On Sep 7, 2010, at 7:08 AM, Alessandro Benedetti wrote: Hi all, I need to retrieve query results with a ranking independent from each query result's default Lucene score, which means assigning the same score to each query result. I tried to use a zero boost factor ( ^0 ) to reset each query result's score to zero.
This strategy seems to work within the example Solr instance, but in my Solr instance, using a zero boost factor causes a Buffer Exception ( HTTP Status 500 - null java.lang.IllegalArgumentException at java.nio.Buffer.limit(Buffer.java:249) at org.apache.lucene.store.NIOFSDirectory$NIOFSIndexInput.readInternal(NIOFSDirectory.java:123) at org.apache.lucene.store.BufferedIndexInput.refill(BufferedIndexInput.java:157) at org.apache.lucene.store.BufferedIndexInput.readByte(BufferedIndexInput.java:38) at org.apache.lucene.store.IndexInput.readInt(IndexInput.java:70) at org.apache.lucene.store.IndexInput.readLong(IndexInput.java:93) at org.apache.lucene.index.FieldsReader.doc(FieldsReader.java:210) at org.apache.lucene.index.SegmentReader.document(SegmentReader.java:948) at org.apache.lucene.index.DirectoryReader.document(DirectoryReader.java:506) at org.apache.lucene.index.IndexReader.document(IndexReader.java:947) ) Hmm, that stack trace doesn't align w/ the boost factor. What was your request? I think there might be something else wrong here. Do you know any other technique to reset all the query results' scores to some fixed constant value? Each query result should obtain the same score. Any suggestion? The ConstantScoreQuery or a Filter should do this. You could do something like: q=*:*&fq=the real query, as in q=*:*&fq=field:foo -Grant -- Grant Ingersoll http://lucenerevolution.org Apache Lucene/Solr Conference, Boston Oct 7-8 -- -- Benedetti Alessandro Personal Page: http://tigerbolt.altervista.org Tyger, tyger burning bright In the forests of the night, What immortal hand or eye Could frame thy fearful symmetry? William Blake - Songs of Experience -1794 England -- Lance Norskog goks...@gmail.com
Re: Solr Highlighting Question
(10/09/09 2:26), Jed Glazner wrote: Thanks for taking time to read through this. I'm using a checkout from the solr 3.x branch. My problem is with the highlighter and wildcards. I can get the highlighter to work with wildcards just fine; the problem is that Solr is returning the whole term matched, when what I want it to do is highlight only the chars in the term that were matched. Example: http://192.168.1.75:8983/solr/music/select?indent=on&q=name_title:wel*&qt=beyond&hl=true&hl.fl=name_title&f.name_title.hl.usePhraseHighlighter=true&f.name_title.hl.highlightMultiTerm=true The results that come back look like this: <em>Welcome</em> to the Jungle What I want them to look like is this: <em>Wel</em>come to the Jungle From what I gathered by searching the archives, solr 1.1 used to do this... Is there a way to get that functionality? Thanks! Try to use FastVectorHighlighter on an n-gram field for the highlighting problem... But FVH cannot process wildcard queries, so you would query wel instead of wel*. Then this gets you unwanted hits like vo<em>wel</em>. I don't think there is a solution for both of them with OOTB today. There is a JIRA issue, but no patches there: https://issues.apache.org/jira/browse/SOLR-1926 Koji -- http://www.rondhuit.com/en/
Re: Solr, c/s type ?
I'd just like to use Solr in-house, not as a web application. But I don't know how I should do it. Thanks, -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-c-s-type-tp1392952p1444175.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Delta Import with something other than Date
https://issues.apache.org/jira/browse/SOLR-1499 This is a patch (not committed) that queries a Solr instance and returns the values as a DIH document. This allows you to do a sort query to Solr, ask for the first result, and continue indexing after that. Scary, but it works. Lance David Yang wrote: Currently DIH delta import uses an SQL query of the type select id from item where last_modified > ${dataimporter.last_index_time} What I need is some field like ${dataimporter.last_primary_key} wiki.apache.org/solr/DataImportHandler I am thinking of storing the last primary key externally, calling the delta-import with a parameter, and using ${dataimporter.request.last_primary_key}, but that seems like a very brittle approach Cheers, David -Original Message- From: Jonathan Rochkind [mailto:rochk...@jhu.edu] Sent: Wednesday, September 08, 2010 6:38 PM To: solr-user@lucene.apache.org Subject: Re: Delta Import with something other than Date Of course you can store whatever you want in a Solr index. And if you store an integer as a Solr 1.4 int type, you can certainly query for all documents that have greater than some specified integer in a field. You can't use SQL to query Solr though. I'm not sure what you're really asking? Jonathan David Yang wrote: Hi, I have a table that I want to index, and the table has no datetime stamp. However, the table is append only so the primary key can only go up. Is it possible to store the last primary key, and use some delta query=select id where id > ${last_id_value} Cheers, David
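David's "store the last primary key externally" approach can be sketched in a few lines of Python — the state-file name, host, and the `last_primary_key` parameter name are all illustrative placeholders; the DIH config would then reference the parameter as `${dataimporter.request.last_primary_key}`:

```python
import os
from urllib.parse import urlencode

STATE_FILE = "last_pk.txt"  # illustrative location for the externally stored key

def read_last_pk(path=STATE_FILE):
    """Return the last indexed primary key, or 0 on first run."""
    if not os.path.exists(path):
        return 0
    with open(path) as f:
        return int(f.read().strip())

def write_last_pk(pk, path=STATE_FILE):
    with open(path, "w") as f:
        f.write(str(pk))

def delta_import_url(base, last_pk):
    """Build a delta-import request passing the key as a request parameter."""
    params = {"command": "delta-import", "last_primary_key": last_pk}
    return base + "/dataimport?" + urlencode(params)

write_last_pk(42)  # normally updated after each successful import
print(delta_import_url("http://localhost:8983/solr", read_last_pk()))
```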
Re: Solr, c/s type ?
You would set up a Java server (container) and run Solr/Lucene. Not sure how to do the following, but then you block the standard port for Solr/Lucene on that machine from being accessible except locally. In whatever code/application that you are working with, on that machine, you then use its libraries to access 'the web', but only actually the 'localhost' 127.0.0.1, usually, @ the port for Solr/Lucene. Learn, learn, learn, and study some more about using/modifying data importers, indexes, putting in filters, stemmers, shinglers, carpenters (joke), blah, blah, blah, and last but not least, the almighty QUERY to access the index, filters, etc. Then you will have a local search engine on whatever data you had put into it. There is also the 'embedded server' which I have only heard about. Anybody else on this list other than me is much more experienced in general, and can advise you better on those. Dennis Gearon Signature Warning EARTH has a Right To Life, otherwise we all die. Read 'Hot, Flat, and Crowded' Laugh at http://www.yert.com/film.php --- On Wed, 9/8/10, Jason, Kim hialo...@gmail.com wrote: From: Jason, Kim hialo...@gmail.com Subject: Re: Solr, c/s type ? To: solr-user@lucene.apache.org Date: Wednesday, September 8, 2010, 9:32 PM I'd just like to use solr for in-house which is not web application. But I don't know how should i do? Thanks, -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-c-s-type-tp1392952p1444175.html Sent from the Solr - User mailing list archive at Nabble.com.
RE: Solr, c/s type ?
You _could_ use SolrJ with EmbeddedSolrServer. But personally I wouldn't unless there's a reason to. There's no automatic reason not to use the ordinary Solr HTTP api, even for an in-house application which is not a web application. Unless you have a real reason to use embedded solr, I'd use the HTTP api, possibly via SolrJ if your local application is Java. http://wiki.apache.org/solr/Solrj In my (very limited, so if someone else knows better and has something to say, listen to them) experience, using EmbeddedSolrServer ends up biting you down the line, it doesn't work _quite_ like ordinary/typical Solr, and some things end up not working. And you're going to be mostly on your own for scaling/concurrency issues. Why re-invent the wheel when ordinary HTTP solr already works so well? But EmbeddedSolrServer is there, if you actually have a need for it. But there's no reason you can't use Solr's HTTP api for a non-web application; the fact that your application talks to Solr over HTTP does not mean your application has to talk to its users over HTTP, two different things. Incidentally, using EmbeddedSolrServer would in fact _not_ be a client/server setup between your app and solr, per your original question. HTTP is a client/server protocol, so using the ordinary Solr HTTP api is the way to set up a client/server relationship between your app and Solr. Jonathan From: Jason, Kim [hialo...@gmail.com] Sent: Thursday, September 09, 2010 12:32 AM To: solr-user@lucene.apache.org Subject: Re: Solr, c/s type ? I'd just like to use solr for in-house which is not web application. But I don't know how should i do? Thanks, -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-c-s-type-tp1392952p1444175.html Sent from the Solr - User mailing list archive at Nabble.com.
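Jonathan's point — a non-web application can still be a plain HTTP client of Solr — can be sketched with nothing but a standard library; here a minimal Python sketch (the host, core path, and query are placeholders, and the actual request of course needs a running Solr):

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

def build_search_url(query, base="http://localhost:8983/solr", rows=10):
    """Build an ordinary HTTP select URL; no servlet container needed on the client."""
    return base + "/select?" + urlencode({"q": query, "rows": rows, "wt": "json"})

def solr_search(query, **kw):
    """Plain client/server HTTP round trip from any local (non-web) program."""
    with urlopen(build_search_url(query, **kw)) as resp:
        return json.load(resp)["response"]["docs"]

# In an in-house desktop or batch program you would simply call:
# docs = solr_search("name_title:jungle")
print(build_search_url("name_title:jungle"))
```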
Re: Distance sorting with spatial filtering
It says that the field sum(1) is not indexed. You don't have a field called 'sum(1)'. I know there have been a lot of changes in query parsing, and sorting by functions may be on the list. But the _val_ trick is the older one and, as you noted, still works. The _val_ trick sets the ranking value to the output of the function, thus indirectly doing what sort= does. Lance Scott K wrote: I get the error on all functions. GET 'http://localhost:8983/solr/select?q=*:*&sort=sum(1)+asc' Error 400 can not sort on unindexed field: sum(1) I tried another nightly build from today, Sep 7th, with the same results. I attached the schema.xml Thanks for the help! Scott On Wed, Sep 1, 2010 at 18:43, Lance Norskog goks...@gmail.com wrote: Post your schema. On Mon, Aug 30, 2010 at 2:04 PM, Scott K s...@skister.com wrote: The new spatial filtering (SOLR-1586) works great and is much faster than fq={!frange. However, I am having problems sorting by distance. If I try GET 'http://localhost:8983/solr/select/?q=*:*&sort=dist(2,latitude,longitude,0,0)+asc' I get an error: Error 400 can not sort on unindexed field: dist(2,latitude,longitude,0,0) I was able to work around this with GET 'http://localhost:8983/solr/select/?q=*:* AND _val_:recip(dist(2, latitude, longitude, 0,0),1,1,1)&fl=*,score' But why isn't sorting by functions working? I get this error with any function I try to sort on. This is a nightly trunk build from Aug 25th. I see SOLR-1297 was reopened, but that seems to be for edge cases. Second question: I am using the LatLonType from the Spatial Filtering wiki, http://wiki.apache.org/solr/SpatialSearch Are there any distance sorting functions that use this field, or do I need to have three indexed fields, store_lat_lon, latitude, and longitude, if I want both filtering and sorting by distance. Thanks, Scott -- Lance Norskog goks...@gmail.com
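The _val_ workaround ranks nearer documents first because recip is monotonically decreasing in its input; a quick check of the arithmetic (Solr's recip(x,m,a,b) computes a/(m*x+b), so recip(dist,1,1,1) is 1/(dist+1)):

```python
def recip(x, m, a, b):
    """Solr's recip function query: a / (m*x + b)."""
    return a / (m * x + b)

# recip(dist, 1, 1, 1) = 1 / (dist + 1): the score shrinks as distance grows,
# so boosting by it via _val_ puts nearer documents first.
for dist in (0.0, 1.0, 4.0):
    print(dist, recip(dist, 1, 1, 1))
```

Printed scores: 1.0 at distance 0, 0.5 at distance 1, 0.2 at distance 4 — exactly the descending order sort=dist(...)+asc would have produced.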
Re: Delta Import with something other than Date
On 09.09.2010, at 00:44, David Yang wrote: Currently DIH delta import uses an SQL query of the type select id from item where last_modified > ${dataimporter.last_index_time} What I need is some field like ${dataimporter.last_primary_key} wiki.apache.org/solr/DataImportHandler I am thinking of storing the last primary key externally and calling the delta-import with a parameter and using ${dataimporter.request.last_primary_key} but that seems like a very brittle approach I am also using request parameters in my DIH import. We are not yet in production, but in our tests it worked fine. regards, Lukas Kahwe Smith m...@pooteeweet.org