Re: catchall field minus one field
Thanks a lot for your advice, I'll try that.

Best regards,
Elisabeth

2012/1/11 Erick Erickson erickerick...@gmail.com

> Hmmm. Once the data is included in the catch-all, it's indistinguishable from all the rest of the data, so I don't see how you could do this. A clause like:
>   -excludeField:[* TO *]
> would exclude all documents that had any data in the field, so that's probably not what you want.
>
> Could you approach it the other way? Do NOT put the special field in the catch-all field in the first place, but massage the input to add a clause there? I.e. your usual case would have
>   catchall:<all your terms> exclude_field:<all your terms>
> but your special one would just be
>   catchall:<all your terms>
>
> You could set up request handlers to do this under the covers, so your queries would really be
>   ...solr/usual?q=<all your terms>
>   ...solr/special?q=<all your terms>
> and two different request handlers (edismax-style, I'm thinking) would differ only by the qf parameter containing or not containing your special field.
>
> The other way, of course, would be to have a second catch-all field that didn't have your special field, then use one or the other depending, but as you say that would increase the size of your index...
>
> Best
> Erick
>
> On Wed, Jan 11, 2012 at 9:47 AM, elisabeth benoit elisaelisael...@gmail.com wrote:
>
> > Hello,
> >
> > I have a catch-all field, and I need to run some queries against all the fields in that catch-all field, minus one. To avoid duplicating my index, I'd like to know if there is a way to use my catch-all field while excluding that one field.
> >
> > Thanks,
> > Elisabeth
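
Erick's two-handler suggestion comes down to sending the same terms with a different qf list. A minimal sketch in Python of what the two requests might carry if qf were passed per request instead of being fixed in solrconfig.xml. The base URL, the /select path and the helper name are assumptions for illustration; catchall and exclude_field are the field names used in the thread.

```python
from urllib.parse import urlencode

# Field names taken from the thread; everything else here is hypothetical.
USUAL_QF = ["catchall", "exclude_field"]
SPECIAL_QF = ["catchall"]


def build_query_url(base_url, terms, include_special_field=True):
    """Return a select URL whose qf covers the catch-all field and,
    optionally, the field that was deliberately kept out of it."""
    qf = USUAL_QF if include_special_field else SPECIAL_QF
    params = {
        "defType": "edismax",
        "q": terms,
        "qf": " ".join(qf),
    }
    return base_url + "/select?" + urlencode(params)


if __name__ == "__main__":
    base = "http://localhost:8983/solr"  # assumed host and port
    print(build_query_url(base, "red bicycle"))
    print(build_query_url(base, "red bicycle", include_special_field=False))
```

With two dedicated request handlers, the qf values would live in each handler's defaults and the client would only choose which handler to call.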
Restricting access to shards / collections with SolrCloud
Hi.

We're currently looking at SolrCloud to improve management of our Solr cluster. There is one use case for which I am wondering whether SolrCloud provides any support out of the box, or whether our best bet is to stick with our current solution.

The use case is: we have a large number of shards using the same schema, so they are perfect for SolrCloud. Some of these shards should have restricted access, meaning only customers with certain privileges will be able to query them.

The way we solve this today is to maintain a database listing the users who have access to these restricted shards. When building the shards parameter for querying Solr, we use this database to append the URLs of the restricted shards ONLY if the user has access to them.

With SolrCloud it would be great to be able to use the distrib=true parameter, but that would override the approach we're currently using. My questions are:

1. Would it be an idea to create a separate collection for the shards that are restricted? If so, is there currently any support for specifying which collections to search, so that we could implement the solution outlined above, but for collections rather than shards?
2. If #1 is a no-go, are we better off sticking with our current approach and not using distrib=true, which would query all shards?

Any input appreciated!

Best,
Jaran

--
Jaran Nilsen
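
The current approach described above (append restricted shard URLs only for users who may see them) can be sketched as follows. This is an illustration in Python, not Jaran's actual code: the function name, the shape of the access list and the shard URLs are all assumptions, with a plain dict standing in for the database lookup.

```python
def build_shards_param(user, public_shards, restricted_shards, access_list):
    """Return the value for Solr's `shards` parameter for this user.

    access_list maps a restricted shard URL to the set of users allowed
    to query it (a stand-in for the database lookup described above).
    """
    allowed = list(public_shards)
    for shard in restricted_shards:
        if user in access_list.get(shard, set()):
            allowed.append(shard)
    return ",".join(allowed)


if __name__ == "__main__":
    public = ["host1:8983/solr/shard1", "host2:8983/solr/shard2"]   # hypothetical
    restricted = ["host3:8983/solr/secret1"]                         # hypothetical
    acl = {"host3:8983/solr/secret1": {"alice"}}
    print(build_shards_param("alice", public, restricted, acl))
    print(build_shards_param("bob", public, restricted, acl))
```

The same filtering would apply unchanged if the entries were collection names instead of shard URLs, which is the variant asked about in question 1.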
Re: Large data set or data corpus
http://www.data.gov/ has lots of datasets available for free -- View this message in context: http://lucene.472066.n3.nabble.com/Large-data-set-or-data-corpus-tp3650316p3653154.html Sent from the Solr - User mailing list archive at Nabble.com.
Not able to see output in XML output
Hi,

I have written a query-based data-config for Solr and completed the steps below, but I am not able to see any output.

1) Registered the Data Import request handler in solrconfig.xml.
2) Modified data-config.xml with the appropriate query to import the data, using the jTDS driver for SQL Server.
3) Modified solrconfig.xml to register db-data-config.xml in the request handler item.
4) Modified schema.xml for the output result. This is where we are facing issues. I am including the 2 files: 1) schema.xml 2) db-data-config.xml.

schema.xml

<?xml version="1.0" encoding="UTF-8" ?>
<schema name="example" version="1.2">
  <types>
    <fieldType name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true" />
    <fieldType name="int" class="solr.TrieIntField" precisionStep="0" omitNorms="true" positionIncrementGap="0" />
    <fieldType name="date" class="solr.TrieDateField" omitNorms="true" precisionStep="0" positionIncrementGap="0" />
    <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory" />
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1" />
        <filter class="solr.LowerCaseFilterFactory" />
        <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt" />
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory" />
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true" />
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1" />
        <filter class="solr.LowerCaseFilterFactory" />
        <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt" />
      </analyzer>
    </fieldType>
  </types>
  <fields>
    <field name="FileId" type="string" indexed="true" stored="true" required="true" />
    <field name="Title" type="string" indexed="true" stored="true" required="true" />
  </fields>
  <uniqueKey>FileId</uniqueKey>
  <defaultSearchField>FileId</defaultSearchField>
  <solrQueryParser defaultOperator="AND" />
</schema>

db-data-config.xml

<dataConfig>
  <dataSource type="JdbcDataSource" driver="net.sourceforge.jtds.jdbc.Driver" url="jdbc:jtds:sqlserver://17.30.199.667:1433;databaseName=" user="" password="XXX" />
  <document>
    <entity name="Files" query="Select FileID,Title from files">
      <field column="FileID" name="FileID" />
      <field column="Title" name="Title" />
    </entity>
  </document>
</dataConfig>

5) Made a full-import HTTP request to get the data indexed into the Solr server. Even though I see that all the rows are indexed, I am not able to find any results when I click search on the admin page.
6) Am I missing a step to configure the output? I have changed the db-data-config.xml, schema.xml and solrconfig.xml files. Do I need to change any other files for the output?

Thanks,
Raj Deep
Re: Relevancy and random sorting
Erick,

This document already has a field that indicates the source (site). The issue we are trying to solve arises when we list all documents without any specific criteria. Since we bring up the most recent ones and the ones that contain images, we end up with a lot of listings from a single site, because the documents are indexed in batches from the same site. At some point we have several documents from the same site with the same date/time and with images. I'm trying to give some random aspect to this search so other documents can also appear in between that big dataset from the same source. Does grouping help to achieve this?

Alexandre

On Thu, Jan 12, 2012 at 12:31 AM, Erick Erickson erickerick...@gmail.com wrote:

> Alexandre:
>
> Have you thought about grouping? If you can analyze the incoming documents and include a field such that similar documents map to the same value, then group on that value, you'll get output that isn't dominated by repeated copies of the similar documents. It depends, though, on being able to do a suitable mapping. In your case, could the mapping just be the site from which you got the data?
>
> Best
> Erick
>
> On Wed, Jan 11, 2012 at 1:58 PM, Alexandre Rocco alel...@gmail.com wrote:
>
> > Erick,
> >
> > I probably wrote something silly. You are right: it is either sorting by field or ranking. I just need to change the ranking to shift things around, as you said.
> >
> > To clarify the use case: we have a listing aggregator that gets product listings from a lot of different sites, and since they are added in batches, sometimes you see a lot of pages from the same source (site). We are working on some changes to shift things around and reduce this blocking effect, so we can present mixed sources on the result pages.
> >
> > I guess I will start with the random field on the document and later try to develop a custom plugin to make things better.
> >
> > Thanks for the pointers.
> >
> > Regards,
> > Alexandre
> >
> > On Wed, Jan 11, 2012 at 1:58 PM, Erick Erickson erickerick...@gmail.com wrote:
> >
> > > I really don't understand what this means: "random sorting for the records but also preserving the ranking". Either you're sorting on rank or you're not. If you mean you're trying to shift things around just a little bit, *mostly* respecting relevance, then I guess you can do what you're thinking.
> > >
> > > You could create your own function query to do the boosting, see: http://wiki.apache.org/solr/SolrPlugins#ValueSourceParser which would keep you from having to re-index your data to get a different randomness.
> > >
> > > You could also consider external file fields, but I think your own function query would be cleaner. I don't think math.random is a supported function OOB.
> > >
> > > Best
> > > Erick
> > >
> > > On Wed, Jan 11, 2012 at 8:29 AM, Alexandre Rocco alel...@gmail.com wrote:
> > >
> > > > Hello all,
> > > >
> > > > Recently I've been trying to tweak some aspects of relevancy in one listing project. I need to give a higher score to newer documents and also boost the document based on a boolean field that indicates the listing has pictures. On top of that, in some situations we need a random sorting for the records but also preserving the ranking.
> > > >
> > > > I tried to combine some techniques described in the Solr Relevancy FAQ wiki, but when I add the random sorting, the ranking gets messy (as expected).
> > > >
> > > > This works well:
> > > > http://localhost:18979/solr/select/?start=0&rows=15&q={!boost%20b=recip(ms(NOW/HOUR,date_updated),3.16e-11,1,1)}active%3a%22true%22+AND+featured%3a%22false%22+_val_:%22haspicture%22&fl=*,score
> > > >
> > > > This does not work; it gives a random order on what is already ranked:
> > > > http://localhost:18979/solr/select/?start=0&rows=15&q={!boost%20b=recip(ms(NOW/HOUR,date_updated),3.16e-11,1,1)}active%3a%22true%22+AND+featured%3a%22false%22+_val_:%22haspicture%22&fl=*,score&sort=random_1+desc
> > > >
> > > > The only way I see is to create another field in the schema containing a random value and use it to boost the document the same way that was done with the boolean field.
> > > >
> > > > Has anyone tried something like this before and knows some way to get it working?
> > > >
> > > > Thanks,
> > > > Alexandre
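
The "random value used as a boost" idea from this exchange can be pictured as multiplying each relevance score by a small random factor, so that the ranking is mostly respected while near-ties get shuffled. A minimal sketch in plain Python, outside Solr; the function name, the jitter knob and the sample scores are assumptions for illustration, not anything Solr provides.

```python
import random


def jitter_rank(scored_docs, jitter=0.1, seed=None):
    """Re-rank (doc_id, score) pairs after scaling each score by a random
    factor in [1, 1 + jitter]. jitter=0 leaves the original ranking intact."""
    rng = random.Random(seed)
    jittered = [
        (doc_id, score * (1.0 + jitter * rng.random()))
        for doc_id, score in scored_docs
    ]
    return sorted(jittered, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    docs = [("siteA-1", 2.00), ("siteA-2", 1.99), ("siteB-1", 1.98), ("old-1", 0.50)]
    for doc_id, score in jitter_rank(docs, jitter=0.05):
        print(doc_id, round(score, 3))
```

In this sketch, documents whose scores differ by less than the jitter can swap places, while a much weaker document such as old-1 stays at the bottom. That is the difference from sorting on the random field alone.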
Re: Relevancy and random sorting
Does the random sort function help you here? http://lucene.apache.org/solr/api/org/apache/solr/schema/RandomSortField.html

However, you will then get some very old listings as well, if that's okay for you.

-Kuli

On 12.01.2012 14:38, Alexandre Rocco wrote:

> I'm trying to give some random aspect to this search so other documents can also appear in between that big dataset from the same source. Does the grouping help to achieve this?
Re: Highlighting issue with PlainTextEntityProcessor.
Hi Erik,

Thanks for your reply. Yes, the data was in the index. I found the problem, and it was not PlainTextEntityProcessor. Highlighting was being returned for the multivalued field, while the non-multivalued field showed fewer highlights, so I thought the problem might be in PlainTextEntityProcessor. The actual problem was that my search field is very big. I increased hl.maxAnalyzedChars and it now works.
FacetComponent: suppress original query
Hello list,

I need to split the incoming original facet query into a list of sub-queries. The logic is done, and each sub-query gets added to the outgoing queue with rb.addRequest(), where rb is an instance of ResponseBuilder. In the logs I see that the original query gets submitted along with the sub-queries. Is there a way of suppressing the original query?

--
Regards,
Dmitry Kan
Re: Relevancy and random sorting
Michael,

We are using random sorting in combination with date and other fields, but I am trying to change this so that it affects the ranking instead of sorting directly. That way we can also apply other useful tweaks to the rank itself.

Alexandre

On Thu, Jan 12, 2012 at 11:46 AM, Michael Kuhlmann k...@solarier.de wrote:

> Does the random sort function help you here? http://lucene.apache.org/solr/api/org/apache/solr/schema/RandomSortField.html
>
> However, you will get some very old listings then, if it's okay for you.
>
> -Kuli
Re: Relevancy and random sorting
> This document already has a field that indicates the source (site). The issue we are trying to solve is when we list all documents without any specific criteria. Since we bring the most recent ones and the ones that contains images, we end up having a lot of listings from a single site, since the documents are indexed in batches from the same site. At some point we have several documents from the same site in the same date/time and having images. I'm trying to give some random aspect to this search so other documents can also appear in between that big dataset from the same source. Does the grouping help to achieve this?

Yes, see http://wiki.apache.org/solr/FieldCollapsing

You would display at most 3 documents from a single site, and add a link saying "there are xxx more documents from site yyy, click here to see all of them".
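
The effect described in this reply (at most 3 documents per site, plus a "more from this site" count) can be sketched in plain Python. This only illustrates the outcome; Solr's grouping returns a grouped response rather than a flat list, and the function name and the site key are assumptions.

```python
from collections import defaultdict


def collapse_by_site(ranked_docs, limit=3):
    """Keep at most `limit` documents per site, preserving rank order.

    Returns (kept_docs, hidden_counts) where hidden_counts[site] is the
    number of documents that were left out for that site.
    """
    seen = defaultdict(int)
    hidden = defaultdict(int)
    kept = []
    for doc in ranked_docs:
        site = doc["site"]
        if seen[site] < limit:
            seen[site] += 1
            kept.append(doc)
        else:
            hidden[site] += 1
    return kept, dict(hidden)


if __name__ == "__main__":
    docs = [{"id": i, "site": "yyy"} for i in range(5)] + [{"id": 9, "site": "zzz"}]
    kept, hidden = collapse_by_site(docs)
    print([d["id"] for d in kept])
    for site, count in hidden.items():
        print("there are %d more documents from site %s" % (count, site))
```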
Re: Search Specific Boosting
Hi Erick,

Yeah, I've reviewed the debug output and can't make sense of why they are scoring the same. I have double-checked that they are being indexed with different boost values for the search field. I've also increased the factors to try to make them more granular, so instead of boosting 1,2,3,4,5 I did 100,200,300,400,500... Same result.

Here's an example of the debug output with two documents that have different field boost values but receive the same score. Does anything stick out? Any other ideas on how to get the results I am looking for?

69.694855 = (MATCH) product of:
  104.54228 = (MATCH) sum of:
    0.08869071 = (MATCH) MatchAllDocsQuery, product of:
      0.08869071 = queryNorm
    104.45359 = (MATCH) weight(searchfe2684d248eab25404c3668711d4642e_boost:true in 4016) [DefaultSimilarity], result of:
      104.45359 = score(doc=4016,freq=1.0 = termFreq=1), product of:
        0.48125002 = queryWeight, product of:
          5.4261603 = idf(docFreq=81, maxDocs=6856)
          0.08869071 = queryNorm
        217.04642 = fieldWeight in 4016, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1
          5.4261603 = idf(docFreq=81, maxDocs=6856)
          40.0 = fieldNorm(doc=4016)
  0.667 = coord(2/3)

69.694855 = (MATCH) product of:
  104.54228 = (MATCH) sum of:
    0.08869071 = (MATCH) MatchAllDocsQuery, product of:
      0.08869071 = queryNorm
    104.45359 = (MATCH) weight(searchfe2684d248eab25404c3668711d4642e_boost:true in 4106) [DefaultSimilarity], result of:
      104.45359 = score(doc=4106,freq=1.0 = termFreq=1), product of:
        0.48125002 = queryWeight, product of:
          5.4261603 = idf(docFreq=81, maxDocs=6856)
          0.08869071 = queryNorm
        217.04642 = fieldWeight in 4106, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1
          5.4261603 = idf(docFreq=81, maxDocs=6856)
          40.0 = fieldNorm(doc=4106)
  0.667 = coord(2/3)

On 1/11/2012 9:46 PM, Erick Erickson wrote:

> Boosts are fairly coarse-grained. I suspect your boost factors are just being rounded into the same buckets. Attaching &debugQuery=on and looking at how the scores were calculated should help you figure out if this is the case.
>
> Best
> Erick
>
> On Wed, Jan 11, 2012 at 7:57 PM, Brett br...@chopshop.org wrote:
>
> > I'm implementing a feature where admins have the ability to control the order of the results by adding a boost to any specific search. The search is a faceted interface (no text input); we take a hash of the search parameters (to form a unique search id) and then boost that field for the document.
> >
> > The field is a wildcard field, so it might look like this:
> >
> > <field name="search395eff966b26a91d82935c8e1197330c_boost" boost="90">true</field>
> >
> > The problem I am seeing in these search results is that my results are being grouped, and the individual boost values are not having the granular effect I am looking for.
> >
> > Say, on a result set of 75 documents, I see results with search boosts of 60-70 receiving the same score even though they were indexed with different boost values. There is always more than one group.
> >
> > Does anyone know what might be causing this? Is there a better way to do what I am looking for?
> >
> > Thanks,
> > Brett
> >
> > Field Definition:
> >
> > <fieldType name="boost" class="solr.TextField" sortMissingLast="true" omitNorms="false" omitTermFreqAndPositions="true">
> >   <analyzer type="index">
> >     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> >     <filter class="solr.LowerCaseFilterFactory"/>
> >   </analyzer>
> >   <analyzer type="query">
> >     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> >     <filter class="solr.LowerCaseFilterFactory"/>
> >   </analyzer>
> > </fieldType>
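
Erick's "rounded into the same buckets" explanation can be checked offline. Lucene releases of this era store each field norm (index-time boost multiplied by the length norm) in a single byte, so only a handful of values per power of two survive. The sketch below is a Python port, written from memory, of the SmallFloat.floatToByte315 / byte315ToFloat pair used for norms; treat both the port and its match with your Solr version as assumptions to verify.

```python
import struct


def float_to_byte315(f):
    """Encode a float into one byte the way Lucene's norm encoding does."""
    bits = struct.unpack(">i", struct.pack(">f", f))[0]
    small = bits >> (24 - 3)
    if small <= ((63 - 15) << 3):
        return 0 if bits <= 0 else 1      # zero/negative, or underflow
    if small >= ((63 - 15) << 3) + 0x100:
        return 255                        # overflow: largest encodable value
    return small - ((63 - 15) << 3)


def byte315_to_float(b):
    """Decode a norm byte back into the float that scoring will see."""
    if b == 0:
        return 0.0
    bits = (b & 0xFF) << (24 - 3)
    bits += (63 - 15) << 24
    return struct.unpack(">f", struct.pack(">i", bits))[0]


def stored_norm(boost):
    """Value a boost turns into after one encode/decode round trip."""
    return byte315_to_float(float_to_byte315(boost))


if __name__ == "__main__":
    for boost in (40, 45, 47, 48, 60, 64, 70, 80):
        print(boost, "->", stored_norm(float(boost)))
```

Under this encoding, boosts from 40 up to just below 48 all decode to 40.0 and boosts from 64 up to just below 80 all decode to 64.0. That is consistent with the identical fieldNorm of 40.0 in both explain outputs above, and with boosts in the 60-70 range collapsing into a few groups.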
SolrException: Invalid Date String:'oracle.sql.TIMESTAMP
Hi,

I am trying to use a DataImportHandler to import data from an Oracle DB. It works for non-date fields but throws an exception once I include the MODIFIEDDATE field (an Oracle timestamp field). Can anyone see what I'm doing wrong here? Thanks.

schema.xml

<field name="catModifiedDate" type="date" indexed="true" stored="true" />

db-data-config.xml

<entity name="category" datasource="jdbc" query="SELECT ID,PARENTID,ICONID,SORTORDER,MODIFIEDDATE FROM CATEGORY">
  <field column="ID" name="masterId" />
  <field column="PARENTID" name="catParentId" />
  <field column="ICONID" name="catIconId" />
  <field column="SORTORDER" name="catSortOrder" />
  <field column="MODIFIEDDATE" name="catModifiedDate"/>
</entity>

WARNING: Error creating document : SolrInputDocument[{catModifiedDate=catModifiedDate(1.0)={oracle.sql.TIMESTAMP@1e58565}, masterId=masterId(1.0)={124}, catParentId=catParentId(1.0)={118}, catIconId=catIconId(1.0)={304856}}]
org.apache.solr.common.SolrException: ERROR: [doc=124] Error adding field 'catModifiedDate'='oracle.sql.TIMESTAMP@1e58565'
    at org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:324)
    at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:60)
    at org.apache.solr.update.processor.LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:115)
    at org.apache.solr.handler.dataimport.SolrWriter.upload(SolrWriter.java:73)
    at org.apache.solr.handler.dataimport.DataImportHandler$1.upload(DataImportHandler.java:293)
    at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:636)
    at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:268)
    at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:187)
    at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:359)
    at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:427)
    at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:408)
Caused by: org.apache.solr.common.SolrException: Invalid Date String:'oracle.sql.TIMESTAMP@1e58565'
    at org.apache.solr.schema.DateField.parseMath(DateField.java:165)
    at org.apache.solr.schema.TrieField.createField(TrieField.java:421)
    at org.apache.solr.schema.TrieDateField.createField(TrieDateField.java:120)
    at org.apache.solr.schema.SchemaField.createField(SchemaField.java:104)
    at org.apache.solr.update.DocumentBuilder.addField(DocumentBuilder.java:203)
    at org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:281)
RE: SolrException: Invalid Date String:'oracle.sql.TIMESTAMP
Hi,

It looks like a date formatting issue. The Solr date field expects something like 1995-12-31T23:59:59.999Z. See http://lucene.apache.org/solr/api/org/apache/solr/schema/DateField.html

The data import handler does have a date transformer to convert dates: http://wiki.apache.org/solr/DataImportHandler#DateFormatTransformer

Colin.

-----Original Message-----
From: Joey Grimm [mailto:jgr...@rim.com]
Sent: Thursday, January 12, 2012 1:05 PM
To: solr-user@lucene.apache.org
Subject: SolrException: Invalid Date String:'oracle.sql.TIMESTAMP

> Hi, I am trying to use a dataImportHandler to import data from an oracle DB. It works for non-date fields but is throwing an exception once I included the MODIFIEDDATE field (oracle.timestamp field). Can anyone see what I'm doing wrong here? Thanks.
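
To make the target format concrete, here is a small sketch in Python of turning a timestamp into the string shape Colin quotes (1995-12-31T23:59:59.999Z). It only illustrates the format a Solr date field accepts and is not a DataImportHandler configuration. The function name is hypothetical, and treating naive datetimes as UTC is an assumption.

```python
from datetime import datetime, timezone


def to_solr_date(dt):
    """Format a datetime as the UTC ISO-8601 string Solr date fields accept,
    e.g. 1995-12-31T23:59:59.999Z. Naive datetimes are assumed to be UTC."""
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)
    dt = dt.astimezone(timezone.utc)
    millis = dt.microsecond // 1000
    return dt.strftime("%Y-%m-%dT%H:%M:%S") + ".%03dZ" % millis


if __name__ == "__main__":
    print(to_solr_date(datetime(1995, 12, 31, 23, 59, 59, 999000)))
```

The error in the stack trace shows the opposite situation: the value reaching Solr is the default string form of an oracle.sql.TIMESTAMP object ('oracle.sql.TIMESTAMP@1e58565'), which cannot be parsed as a date.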
RE: Determining which shard is failing using partialResults / some other technique?
Hi all,

Is there at least a way to print out in the logging which shard is being called, and maybe to log a failure?

INFO: [master] webapp=/solr path=/select params={facet=true&facet.mincount=1&facet.sort=count&q=(content_1500_chars:((allied+irish+banks+OR++aib+)+AND+NOT+(bluray+OR+RAR++OR+mega+pack))+OR+title:((allied+irish+banks+OR++aib+)+AND+NOT+(bluray+OR+RAR++OR+mega+pack)))+&facet.limit=10&facet.shard.limit=300&distrib=true&facet.field=organisation&wt=javabin&fq=harvest_time_long:[131077440+TO+132641279]&rows=0&version=2} status=0 QTime=16192

Regards,
Gilles

From: Gilles Comeau [mailto:gilles.com...@polecat.co]
Sent: 12 January 2012 07:02
To: 'solr-user@lucene.apache.org'
Subject: Determining which shard is failing using partialResults / some other technique?

> Hi Solr Users,
>
> Does anyone happen to know if the keyword partialResults can be used in a Solr HTTP request? (partialResults is turned off at the .xml level.) Something like:
>
> http://server:8080/solr/master/select?distrib=true&rows=500&fl=*,score&start=0&partialResults=true&q=my+and+query&fq=harvest_time_long:[132537600+TO+132537600]
>
> We have a Solr instance that is periodically failing on distributed requests, and I am trying to narrow down which one of the shards is causing the failure.
>
> If the above doesn't work, can someone point me to a resource or give advice on how to find out which node might be causing the issue?
>
> Regards,
> Gilles
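
One way to narrow down a failing shard without relying on partialResults is to send the same query to each shard directly and note which ones fail. The sketch below, in Python, shows only the bookkeeping. The probe callable is a placeholder for whatever HTTP call you would make against a single shard, and the shard URLs are hypothetical.

```python
def find_failing_shards(shard_urls, probe):
    """Call probe(url) for each shard and collect the ones that fail.

    probe is any callable that raises an exception (or returns False)
    when the shard does not answer the query correctly.
    """
    failures = {}
    for url in shard_urls:
        try:
            ok = probe(url)
        except Exception as exc:  # record the reason instead of stopping
            failures[url] = repr(exc)
        else:
            if ok is False:
                failures[url] = "probe returned False"
    return failures


if __name__ == "__main__":
    def fake_probe(url):
        # Stand-in for a real non-distributed request to one shard.
        if "shard2" in url:
            raise TimeoutError("no response")
        return True

    shards = ["http://server:8080/solr/shard1", "http://server:8080/solr/shard2"]
    print(find_failing_shards(shards, fake_probe))
```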
a way to marshall xml doc into a SolrInputDocument
If I have individual files in the expected Solr format (having just ONE doc per file):

    <add>
      <doc>
        <field name="id">GB18030TEST</field>
        <field name="name">Test with some GB18030 encoded characters</field>
        <field name="features">No accents here</field>
        <field name="features">这是一个功能</field>
        <field name="price">0</field>
      </doc>
    </add>

Isn't there a way to easily marshal that file into a SolrInputDocument? Do I have to do the parsing myself? I need them as Java POJOs because I want to modify some fields before indexing. I would think that is possible with built-in methods in Solr, but I cannot find a way.

thanks
Re: a way to marshall xml doc into a SolrInputDocument
Can those modifications be made on the server side? If so, you could create an UpdateRequestProcessor. See http://wiki.apache.org/solr/UpdateRequestProcessor

On Thu, Jan 12, 2012 at 5:19 PM, jmuguruza jmugur...@gmail.com wrote:
> If I have individual files in the expected Solr format (having just ONE doc per file):
>
>     <add>
>       <doc>
>         <field name="id">GB18030TEST</field>
>         <field name="name">Test with some GB18030 encoded characters</field>
>         <field name="features">No accents here</field>
>         <field name="features">这是一个功能</field>
>         <field name="price">0</field>
>       </doc>
>     </add>
>
> Is not there a way to easily marshal that file into a SolrInputDocument? Do I have to do the parsing myself? I need them in java pojo cause I want to modify some fields before indexing. I would think that is possible with built in methods in Solr but cannot find a way.
>
> thanks
Re: a way to marshall xml doc into a SolrInputDocument
Even if they could (I'm not sure they could be done there, as they involve properly formatting some fields so that dates are in the correct format etc., and maybe the format is checked first), I would prefer to do it on the SolrJ side, as the code will be much simpler for me.

thanks
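For what it's worth, the parsing step discussed in this thread is small. The sketch below is in Python rather than SolrJ and only shows the shape of it: read the <add>/<doc>/<field> structure into a name-to-values mapping, change a field, and hand the result on. The sample document is a shortened, made-up variant of the one posted; in SolrJ each name/value pair would presumably go into SolrInputDocument.addField(name, value).

```python
import xml.etree.ElementTree as ET

# Made-up sample in the Solr update XML format (one doc per file).
SAMPLE = """<add>
  <doc>
    <field name="id">GB18030TEST</field>
    <field name="name">Test with some GB18030 encoded characters</field>
    <field name="features">No accents here</field>
    <field name="features">Second feature</field>
    <field name="price">0</field>
  </doc>
</add>"""


def parse_add_xml(text):
    """Return one dict per <doc>, mapping field name -> list of values."""
    docs = []
    for doc_el in ET.fromstring(text).iter("doc"):
        fields = {}
        for field_el in doc_el.findall("field"):
            # Repeated field names (multi-valued fields) accumulate in a list.
            fields.setdefault(field_el.get("name"), []).append(field_el.text or "")
        docs.append(fields)
    return docs


docs = parse_add_xml(SAMPLE)
doc = docs[0]
doc["price"] = ["9.99"]  # modify a field before indexing
print(doc)
```

Keeping every field as a list avoids special-casing multi-valued fields such as features.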
Re: SpatialSearch, geofilt and documents missing a value in sfield
Hi Tanguy,

On Jan 11, 2012, at 6:14 AM, Tanguy Moal wrote:

> Dear ML,
>
> I'm performing some developments relying on spatial capabilities of solr. I'm using Solr 3.5, have been reading http://wiki.apache.org/solr/SpatialSearch#Spatial_Query_Parameters and have the basic behaviours I wanted working. I use geofilt on a latlong field, with geodist() in the bf parameter.
>
> When I do q=*:*&fq={!geofilt pt=x,y d=r unit=km sfield=coordinates}&defType=edismax everything works fine.
>
> But in some cases, documents don't have coordinates. For example, some of them refer to a city, so they have coordinates, while others are not so precisely geolocated and simply refer to a broader area, a region or a state, if you will.

You've seen this; right? http://wiki.apache.org/solr/SpatialSearch#How_to_combine_with_a_sub-query_to_expand_results

> I tried with different queries:
>
> - Include results from a broader area: q=*:*&fq=(state:FL OR _query_:{!geofilt ...})
> => That works fine (i.e. results showing up), but not as expected: this only returns documents having FL as value in the state field AND some value in the coordinates field *or* documents around my point, but not documents without a value in the coordinates field…

Your explanation of what happens is not consistent with what this query does. The filter query is OR, not AND. The XML example docs that come with Solr don't all include a value in the store LatLonType field, so if what you claim is true, you should be able to prove it with a query against that data set we all have. Please try to do so; I think you are mistaken.

> - Include results from a broader area, feeling lucky: q=*:*&fq=((state:FL%20AND%20-coordinates:[*%20TO%20*])%20OR%20_query_:{!geofilt%20pt=x,y%20d=r%20unit=km%20sfield=coordinates})
> => which does what is asked to... Return both the results with FL in the state field and no value in the coordinates field *plus* results within a radius around a point, *but* the problem is that in that case, the solr search layer dies unconditionally with the following stack:
>
>     Problem accessing /solr/geo_xpe/select. Reason: null
>     java.lang.NullPointerException
>         at org.apache.lucene.spatial.DistanceUtils.parsePoint(DistanceUtils.java:351)
>         at org.apache.solr.schema.LatLonType.getRangeQuery(LatLonType.java:95)
>         at org.apache.solr.search.SolrQueryParser.getRangeQuery(SolrQueryParser.java:165)
>         ...
>
> Of course, it doesn't make sense to expect the distance computation to work with documents lacking value in the coordinate field!

Arguably this is a bug. LatLonType doesn't handle open-ended range queries, and it didn't check for a null argument defensively either. This will happen whether there is indexed data or not.

[* TO *] queries are slow, particularly when there are many values -- like at least a thousand. If you want to perform this type of query, instead index a boolean field, corresponding to the other field, that indicates whether that field has a value. This would be a good use of an UpdateRequestProcessor, but you can just as well do it elsewhere.

> From a user perspective, having the possibility to define a default distance to be returned for documents missing a value in the coordinate field could be helpful... if something like sortMissingFirst or sortMissingLast is specified on the field.
>
> * sortMissingLast=true could be obtained with a +Inf distance returned if no value in the field
> * sortMissingFirst=true could be obtained with a 0 distance returned if no value in the field
>
> I may be misunderstanding concepts, but those sorting attributes seem to only apply for sorting and not to the documents selection process (geofilt)..? I know that since solr3.5, it's possible to define sortMissing(Last|First) on trie-based fields, but I don't know what happens for fields defined that way:
>
>     ... <types> ...
>     <fieldType name="double" class="solr.TrieDoubleField" precisionStep="0" omitNorms="true" positionIncrementGap="0"/>
>     <fieldType name="latlong" class="solr.LatLonType" indexed="true" sortMissingLast="true" omitNorms="true" subFieldType="double" />
>     ... </types> ...
>     <fields> ...
>     <field name="coordinates" type="latlong" indexed="true" stored="true" multiValued="false"/>
>     ... </fields> ...
>
> Help is welcome!

Indeed, sortMissing, etc. are used in sorting and play no part in whether a document matches or not. And for LatLonType, they won't do anything. LatLonType uses a pair of double fields under the hood, as seen in your schema excerpt. You could put those attributes there, but I don't think that would work.

I was playing around with blank values yesterday and I found that blank values result in a distance away from the query point that is very large… I forget what value it was, but you can try it yourself.

~ David Smiley
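The boolean-flag suggestion above can be sketched roughly as follows: set a flag at index time that records whether the coordinates field has a value, then filter on the flag instead of a [* TO *] range. This is an illustration only; the has_coordinates field name and both helper functions are made up for the example, and the quoting of the nested _query_ clause follows the pattern on the SpatialSearch wiki page rather than anything tested in this thread.

```python
def add_has_coordinates(doc, source="coordinates", flag="has_coordinates"):
    """Return a copy of doc with a boolean flag telling whether `source` is set."""
    value = doc.get(source)
    flagged = dict(doc)
    # Missing, empty and whitespace-only values all count as "no coordinates".
    flagged[flag] = bool(value and str(value).strip())
    return flagged


def broader_area_filter(state, pt, d, sfield="coordinates", flag="has_coordinates"):
    """Build an fq matching docs in `state` without coordinates, or docs near `pt`."""
    geo = "{!geofilt pt=%s d=%s sfield=%s}" % (pt, d, sfield)
    return '(state:%s AND %s:false) OR _query_:"%s"' % (state, flag, geo)


city_doc = add_has_coordinates({"id": "1", "state": "FL", "coordinates": "30.33,-81.65"})
region_doc = add_has_coordinates({"id": "2", "state": "FL"})
fq = broader_area_filter("FL", "30.33,-81.65", 50)
print(city_doc["has_coordinates"], region_doc["has_coordinates"])
print(fq)
```

The same flag could just as well be set in an UpdateRequestProcessor, as suggested in the message; the point is only that the filter then becomes a cheap term match.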
replication failure, logs or notice?
I think maybe my Solr 1.4 replications have been failing for quite some time, without me realizing it, possibly due to lack of disk space to replicate some large segments.

Where would I look to see if a replication failed? Just the standard Solr log? What would I look for?

There's no facility to have, like, an email sent if replication fails or anything, is there?

I realize that Solr/Java logging is something that still confuses me; I've done whatever was easiest. But I'm vaguely remembering now that by picking the right logging framework and configuring it properly, maybe you can send different types of events to different logs, like maybe replication events to their own log? Is this a thing?

Thanks for any ideas,

Jonathan
can solr automatically search for different punctuation of a word
Hello,

I would like to know if Solr has a functionality to automatically search for a different punctuation of a word. For example, if a user searches for the word Uber and the stemmer is German, then Solr looks for both Uber and Über, like in synonyms.

Is it possible to give Solr a file with a list of possible substitutions of letters and have it search for all possible punctuations?

Thanks.
Alex.
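The letter-substitution idea in this question amounts to folding accented letters to their base form on both the indexed text and the query, so that Uber and Über end up as the same term. The sketch below shows that folding on plain strings using Unicode decomposition; it is not Solr configuration, and how to wire equivalent folding into an analyzer chain is not covered in this thread.

```python
import unicodedata


def fold_accents(text):
    """Strip combining marks so that accented letters match their base letters."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


for word in ["Uber", "Über", "café"]:
    print(word, "->", fold_accents(word))
```

Applying the same function at index time and at query time is what makes the two spellings match; folding only one side would not.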
Re: SolrException: Invalid Date String:'oracle.sql.TIMESTAMP
You have probably run into a mismatch between the date value format in your Oracle DB and the one the Solr field expects. Solr only expects XML date values in UTC format - http://lucene.apache.org/solr/api/org/apache/solr/schema/DateField.html. You might need to consider DateFormatTransformer - http://wiki.apache.org/solr/DataImportHandler#DateFormatTransformer

Yunfei

On Thu, Jan 12, 2012 at 10:05 AM, Joey Grimm jgr...@rim.com wrote:
> Hi,
>
> I am trying to use a dataImportHandler to import data from an oracle DB. It works for non-date fields but is throwing an exception once I included the MODIFIEDDATE field (oracle.timestamp field). Can anyone see what I'm doing wrong here? Thanks.
>
> schema.xml
>
>     <field name="catModifiedDate" type="date" indexed="true" stored="true" />
>
> db-data-config.xml
>
>     <entity name="category" datasource="jdbc" query="SELECT ID,PARENTID,ICONID,SORTORDER,MODIFIEDDATE FROM CATEGORY">
>       <field column="ID" name="masterId" />
>       <field column="PARENTID" name="catParentId" />
>       <field column="ICONID" name="catIconId" />
>       <field column="SORTORDER" name="catSortOrder" />
>       <field column="MODIFIEDDATE" name="catModifiedDate"/>
>
> WARNING: Error creating document : SolrInputDocument[{catModifiedDate=catModifiedDate(1.0)={oracle.sql.TIMESTAMP@1e58565}, masterId=masterId(1.0)={124}, catParentId=catParentId(1.0)={118}, catIconId=catIconId(1.0)={304856}}]
> org.apache.solr.common.SolrException: ERROR: [doc=124] Error adding field 'catModifiedDate'='oracle.sql.TIMESTAMP@1e58565'
>     at org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:324)
>     at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:60)
>     at org.apache.solr.update.processor.LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:115)
>     at org.apache.solr.handler.dataimport.SolrWriter.upload(SolrWriter.java:73)
>     at org.apache.solr.handler.dataimport.DataImportHandler$1.upload(DataImportHandler.java:293)
>     at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:636)
>     at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:268)
>     at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:187)
>     at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:359)
>     at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:427)
>     at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:408)
> Caused by: org.apache.solr.common.SolrException: Invalid Date String:'oracle.sql.TIMESTAMP@1e58565'
>     at org.apache.solr.schema.DateField.parseMath(DateField.java:165)
>     at org.apache.solr.schema.TrieField.createField(TrieField.java:421)
>     at org.apache.solr.schema.TrieDateField.createField(TrieDateField.java:120)
>     at org.apache.solr.schema.SchemaField.createField(SchemaField.java:104)
>     at org.apache.solr.update.DocumentBuilder.addField(DocumentBuilder.java:203)
>     at org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:281)
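The "Invalid Date String" in the trace is the default string form of a Java object (oracle.sql.TIMESTAMP@1e58565), not a date. Solr's date fields expect a UTC timestamp of the form 1995-12-31T23:59:59Z. The sketch below only illustrates that target format on a plain timestamp value; it is not DataImportHandler configuration, and treating values without a time zone as UTC is an assumption made for the example.

```python
from datetime import datetime, timedelta, timezone


def to_solr_date(dt):
    """Format a datetime as a Solr UTC date string, e.g. 1995-12-31T23:59:59Z."""
    if dt.tzinfo is None:
        # Assumption: values without a time zone are already in UTC.
        dt = dt.replace(tzinfo=timezone.utc)
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


naive = datetime(2012, 1, 12, 10, 5, 0)
plus_two = datetime(2012, 1, 12, 12, 5, 0, tzinfo=timezone(timedelta(hours=2)))
print(to_solr_date(naive))
print(to_solr_date(plus_two))
```

Both calls yield the same instant, which is the point of normalising to UTC before the value reaches the date field.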
Re: Relevancy and random sorting
: We have a listing aggregator that gets product listings from a lot of
: different sites and since they are added in batches, sometimes you see a
: lot of pages from the same source (site). We are working on some changes to
: shift things around and reduce this blocking effect, so we can present
: mixed sources on the result pages.

if the problem you are seeing is strings of docs all in a clump because they have the same *score* then just add a secondary sort on your random field - in the example you posted, you completely replace the sort by score with sort by random...

    sort = score desc, random_1 desc

but that will only help differentiate when the scores are identical.

alternatively: you could probably use a random field in your biasing function, although you should probably use something like the map or scale functions to keep it from having too much of a profound impact on the final score. maybe something like...

    q={!boost b=product(scale(random_1,1,5),recip(ms(NOW/HOUR,date_updated),3.16e-11,1,1))} active:true AND featured:false +_val_:haspicture

-Hoss
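To see what the secondary sort buys, here is a small stand-alone sketch of "score desc, random desc" on in-memory records. It is not Solr code: the document dictionaries, the seed argument and the function name are all made up, and a seeded generator stands in for the random_1 field.

```python
import random


def sort_with_random_tiebreak(docs, seed):
    """Order docs by score (descending), breaking ties with a seeded random value."""
    rng = random.Random(seed)
    keyed = [(doc, rng.random()) for doc in docs]
    # Primary key: score, descending. Secondary key: random value, descending.
    keyed.sort(key=lambda pair: (-pair[0]["score"], -pair[1]))
    return [doc for doc, _ in keyed]


docs = [
    {"id": "a1", "site": "A", "score": 1.0},
    {"id": "a2", "site": "A", "score": 1.0},
    {"id": "b1", "site": "B", "score": 1.0},
    {"id": "c1", "site": "C", "score": 2.0},
]
ranked = sort_with_random_tiebreak(docs, seed=42)
print([d["id"] for d in ranked])
```

The higher-scoring document always stays first; only the order inside the group of equal scores changes with the seed, which matches the caveat in the message that this helps only when scores are identical.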
Re: Question about updating index with custom field types
Hi Sylvain,

I'm very sorry that I can't help you, as I'm also working on a purely English project...

Erick,

Thanks for your approach, I'll try it.

Luo Sai

On Wed, Jan 11, 2012 at 10:08 PM, Erick Erickson erickerick...@gmail.com wrote:
> I'm not sure what custom field types have to do with XML here. Somewhere, you have to have defined a *field* in your schema.xml that references your custom type, something like:
>
>     <field name="the_offer" type="offer" ... />
>
> then the XML is just like any other field:
>
>     <doc>
>       <field name="the_offer" attr1="val1">56.75</field>
>     </doc>
>
> WARNING: I don't quite know how to access the attributes down in your special code. I haven't had the occasion to actually do that, so I don't know whether the attributes are carried down through the document parsing.
>
> Best
> Erick
>
> On Tue, Jan 10, 2012 at 4:20 AM, 罗赛 seraph@gmail.com wrote:
>> Hello everyone,
>>
>> I have a question on how to update the index using XML messages when there are some complex custom field types in my index, like:
>>
>>     <fieldtype name="offer" class="com.xxx.OfferField"/>
>>
>> And the field offer has some attributes in it...
>>
>> I've read the page http://wiki.apache.org/solr/UpdateXmlMessages and the example shows that the XML should be like:
>>
>>     <add>
>>       <doc>
>>         <field name="employeeId">05991</field>
>>         <field name="office">Bridgewater</field>
>>         <field name="skills">Perl</field>
>>         <field name="skills">Java</field>
>>       </doc>
>>       [<doc> ... </doc>[<doc> ... </doc>]]
>>     </add>
>>
>> So, could you tell me how to write the XML, or is there any other method to update the index with custom field types?
>>
>> Thanks,
>>
>> --
>> Best wishes
>> Sai

--
Best wishes
罗赛
Re: Solr 3.3 crashes after ~18 hours?
I believe this issue is related to this Jetty bug report: https://bugs.eclipse.org/bugs/show_bug.cgi?id=357318

Gili
Re: Stemming numbers
: We've had some issues with people searching for a document with the
: search term '200 movies'. The document is actually titled 'two hundred
: movies'.
:
: Do we need to add every number to our synonyms dictionary to
: accomplish this? Is it best done at index or search time?

if all you care about is english, there's actually an English.longToEnglish method in the lucene test-framework that was used to generate test corpuses back in the Lucene 1.x days .. i don't actually think it's used in any Lucene tests anymore at all.

could probably whip up a filter using that in about a dozen lines of code ... but it still wouldn't handle things like "dozen" (or "half dozen" or "gross"), but it's there if you want to try.

-Hoss
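As a rough picture of what such a number-to-words step would produce, here is a small stand-alone sketch. It is not Lucene's English.longToEnglish, whose exact output wording may differ; the function names and the comma-separated synonym line layout are assumptions made for the example, and it only covers 0 to 999,999.

```python
ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
        "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
        "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]


def to_english(n):
    """Spell out an integer in the range 0..999999 as English words."""
    if n < 0 or n >= 1000000:
        raise ValueError("only 0..999999 is supported in this sketch")
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        return TENS[tens] + (" " + ONES[ones] if ones else "")
    if n < 1000:
        hundreds, rest = divmod(n, 100)
        return ONES[hundreds] + " hundred" + (" " + to_english(rest) if rest else "")
    thousands, rest = divmod(n, 1000)
    return to_english(thousands) + " thousand" + (" " + to_english(rest) if rest else "")


def synonym_line(n):
    """One comma-separated line pairing the digits with the spelled-out form."""
    return "%d, %s" % (n, to_english(n))


for number in (12, 200, 1999):
    print(synonym_line(number))
```

As noted in the reply, words like "dozen" or "gross" fall outside anything generated this way and would still need hand-written entries.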