Re: Dismax + Dynamic fields
Norberto Meijome wrote:
> Thanks Yonik. OK, that matches what I've seen - if I know the actual
> name of the field I'm after, I can use it in a query, but I can't use
> the dynamic_field_name_* (with wildcard) in the config.
>
> Is adding support for this something that is desirable / needed
> (doable??), and is it being worked on?

You can use a wildcard with copyField to copy the dynamic fields that match the pattern into another field that you can then query on. It seems like that would cover your needs, no?

Daniel
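For reference, a minimal schema.xml sketch of that approach (the pattern and field names here are made up for illustration):

```xml
<!-- the dynamic fields matching a wildcard pattern (hypothetical names) -->
<dynamicField name="attr_*" type="text" indexed="true" stored="true"/>

<!-- a concrete field that collects everything matching the pattern -->
<field name="attr_all" type="text" indexed="true" stored="true" multiValued="true"/>

<!-- copyField accepts a wildcard source -->
<copyField source="attr_*" dest="attr_all"/>
```

Since dismax's qf parameter wants concrete field names, attr_all is something you can actually list in qf, which is what the wildcard itself couldn't give you.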
Re: expression in an fq parameter fails
Ezra Epstein wrote:
> storeAvailableDate:[* TO NOW]
> storeExpirationDate:[NOW TO *]
> ...
> This works perfectly. The only trouble is that the two date fields may
> actually be empty, in which case this filters out such records, and we
> want to include them.

I think the easiest thing to do would be one of the following: use a zero-date for storeAvailableDate and an infinity-date for storeExpirationDate, instead of leaving them empty, for things you want to be always available or never expiring (if I've understood your problem); or add another field, alwaysAvailable or neverExpiring, and then do an OR off of that. Maybe that's cheating?

HTH,
Daniel
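If reindexing with sentinel dates isn't attractive, another option is to match documents where the field is simply missing. In Solr's Lucene query syntax the usual idiom for "field is empty" is a negative clause anchored by *:* (a sketch, using the field names from the post above):

```
fq=(storeAvailableDate:[* TO NOW] OR (*:* -storeAvailableDate:[* TO *]))
fq=(storeExpirationDate:[NOW TO *] OR (*:* -storeExpirationDate:[* TO *]))
```

One caveat: NOW changes every millisecond, so filters like these won't be reused from the filter cache; rounding (e.g. NOW/DAY) can help if your granularity allows it.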
Re: Fwd: Grouping products
Vender Livre wrote:
> But it can find the most probable product, can't it? Is there a library
> or tool that does something like that? Someone told me SOLR would solve
> this problem.

I wouldn't say Solr would solve this problem... sounds like someone sold you snake oil! If you want to use Solr, I think your best bet is to use a nightly build and run a MoreLikeThis query - http://wiki.apache.org/solr/MoreLikeThis - but whether that's going to work well for you with so few terms, I have no idea.

Good luck!
Daniel
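To give a feel for it, a MoreLikeThis request looks roughly like this (field names and document ID are hypothetical; see the wiki page above for the real parameter list):

```
http://localhost:8983/solr/select?q=id:1234&mlt=true&mlt.fl=name,description&mlt.mintf=1&mlt.mindf=1
```

With very few terms per document, lowering mlt.mintf and mlt.mindf from their defaults is usually necessary, or nothing will qualify as an "interesting" term.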
Re: Extending XmlRequestHandler
Alexander Ramos Jardim wrote:
> Ok, thanks for the advice! I got the XmlRequestHandler code. I see it
> uses StAX right on the XML it gets. There isn't anything to plug in or
> out to get an easy way to change the XML format.

To maybe save you from reinventing the wheel: when I asked a similar question a couple of weeks back, hossman pointed me towards SOLR-285 and SOLR-370. 285 does XSLT, 370 does STX.

Daniel
Re: SOLR-470 & default value in schema with NOW (update)
Chris Hostetter wrote:
> The two exceptions you cited both indicate there was at least one date
> instance with no millis included -- NOW can't do that. it always
> includes millis (even though it shouldn't).

I've seen people suggest, for performance reasons, reducing the granularity of the timestamps they store down to what they need - i.e. minute, hour, or day, instead of millisecond. But it seems that functionality will break if you don't store the millis. I'm just trying to make sure I'm reconciling these: is the goal of reducing the granularity simply to reduce the cardinality of the indexed date terms? If so, when you don't need significance beyond the date, is the best practice just to fill the rest of the date with zeros and index, say, 2008-07-05T00:00:00.000Z?

(Hope this doesn't count as a threadjack!)

Daniel
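For what it's worth, Solr's date math syntax can do the rounding on both sides, and NOW/DAY rounds down to midnight (with .000 millis), so zero-filled indexed values and rounded query endpoints line up. A sketch, with a hypothetical field name:

```
at index time, send the value already rounded (millis included):
  <field name="timestamp">2008-07-05T00:00:00.000Z</field>

at query time, round NOW the same way:
  timestamp:[NOW/DAY-7DAYS TO NOW/DAY]
```

Rounding at query time has the side benefit that the filter is identical for a whole day and can be reused from the filter cache.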
Re: XSLT transform before update?
Shalin Shekhar Mangar wrote:
> Hi Daniel,
>
> Maybe if you can give us a sample of what your XML looks like, we can
> suggest how to use SOLR-469 (Data Import Handler) to index it. Most of
> the use-cases we have encountered so far are solvable using the
> XPathEntityProcessor in DataImportHandler without using XSLT; for
> details look at
> http://wiki.apache.org/solr/DataImportHandler#head-e68aa93c9ca7b8d261cede2bf1d6110ab1725476

I think even if it is possible to use SOLR-469 for my needs, I'd still prefer the XSLT approach, because it's going to be a bit of configuration either way, and I'd rather it be an XSLT stylesheet than solrconfig.xml. In addition, I haven't yet decided whether I want to apply any patches to the version that we will deploy. If I go down the route of the XSLT transform patch and end up having to back it out, the work to do the transform at the XML source instead would be negligible, whereas going from using the DataImportHandler to not using it at all would leave quite a bit of work ahead of me.

Because both the Solr instance and the XML source are in house, I have the ability to apply the XSLT at the source instead of at Solr. However, different teams of people control the XML source and Solr, so it would require a bit more office coordination to do it on the backend.

The data is a FileMaker XML export (DTD fmresultset), and it looks roughly like this (markup omitted):

    125 / Ford Foundation
        Y5-A / John Smith
        Y5-B / Jane Doe

I'm taking the product of the resultset and the relatedset, using both IDs concatenated as a unique identifier, like so:

    125Y5-A / Ford Foundation / John Smith
    125Y5-B / Ford Foundation / Jane Doe

I can do the transform pretty simply with XSLT. I suppose it is possible to get the DataImportHandler to do this, but I'm not yet convinced that it's easier.

Daniel
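For the archives, a sketch of the kind of stylesheet this transform involves. It assumes fmresultset-style record/field/data markup with hypothetical field names, and it omits the fmresultset namespace handling for brevity, so it is not a drop-in transform:

```xml
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <add>
      <!-- one Solr doc per (parent record, related record) pair -->
      <xsl:for-each select="//resultset/record">
        <xsl:variable name="pid"   select="field[@name='id']/data"/>
        <xsl:variable name="pname" select="field[@name='name']/data"/>
        <xsl:for-each select="relatedset/record">
          <doc>
            <!-- concatenated IDs as the unique key -->
            <field name="id">
              <xsl:value-of select="concat($pid, field[@name='code']/data)"/>
            </field>
            <field name="name"><xsl:value-of select="$pname"/></field>
            <field name="person">
              <xsl:value-of select="field[@name='person']/data"/>
            </field>
          </doc>
        </xsl:for-each>
      </xsl:for-each>
    </add>
  </xsl:template>
</xsl:stylesheet>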
XSLT transform before update?
Hey everyone,

I'm experimenting with updating Solr from a remote XML source, using an XSLT transform to get it into the Solr XML syntax, to let me maintain an index. (Yes, I've looked into SOLR-469, but disregarded it, as I need to do quite a bit with XSLT to get the data into something I can index.) I'm looking at using stream.url, but I need to do the XSLT at some point in there, and I would prefer to do it on the client (Solr) side of the transfer, for various reasons.

Is there a way to implement a custom request handler or similar to get Solr to apply an XSLT transform to the content stream before it attempts to parse it? If it's not possible out of the box, where would be the right place to add said functionality?

Thanks much for your help,
Daniel
Re: how to suppress result
Evgeniy Strokin wrote:
> I'm sorry, I didn't explain my case clearly. My index base should stay
> the same. Each time a user runs a query, he wants to suppress his own
> IDs. An example would be a merchant who sells books. He sells only
> fantasy books, and he wants to see all fantasy books in the
> wholesaler's stock except the books he already has in his own stack. So
> he provides a list of books he already has and wants them excluded from
> his search result. So suppression is actually per query (it would be
> better to say per user's session, but since Solr has no sessions, I'd
> say per query). Obviously another book shop has its own book list and
> its own query, and it wants to search and suppress from the same index
> base of the wholesaler.

What I would do is index book-merchant pairs, instead of books and merchants separately. Each document would have the merchant's ID in it, so you can just add an fq clause to exclude the current merchant. It's a far cry from normalized data, but this is an index, not an RDBMS. Denormalize the data into documents, and index that.

Daniel
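To make that concrete, a sketch of the denormalized documents (all field names and IDs here are invented for illustration):

```xml
<add>
  <!-- one document per (book, merchant-who-already-has-it) pair -->
  <doc>
    <field name="id">book42_m7</field>
    <field name="title">Some Fantasy Title</field>
    <field name="genre">fantasy</field>
    <field name="merchant_id">m7</field>
  </doc>
</add>
```

Then a merchant's search just filters himself out, e.g. q=genre:fantasy&fq=-merchant_id:m7, and each merchant gets a different filter against the same index.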
Re: matching exact/whole phrase
Sandeep Shetty wrote:
> Hi people,
>
> I am looking to provide exact phrase matching, along with full text
> search, with Solr. I want to achieve the same effect in Solr rather
> than use a separate SQL query. As an example: the indexed field has the
> text "car repair" (without the double quotes) for a document, and I
> want this document to come up in the search results only if someone
> searches for "car repair". The document should not show up for
> "repair" or "car" searches. Is it possible to do this type of exact
> phrase matching with Solr itself?

It sounds like you want to do an exact string match, not a text match, so I don't think there's anything complex you'd need to do: just store the field as type="string" and do all of the literal searches you want. But if you are working off a field that contains something beyond the exact value you want to search for, you'll need to define a new field type that uses only the analysis filters you need, and you'll have to think more about what that should be.

Daniel
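A sketch of the schema side, with a made-up field name. The string type is indexed verbatim (no tokenization), which is what gives the all-or-nothing matching:

```xml
<!-- "string" indexes the whole value as a single term -->
<field name="title_exact" type="string" indexed="true" stored="true"/>
```

With that, title_exact:"car repair" matches the document, while title_exact:car and title_exact:repair do not, because neither equals the full indexed value.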
Re: Multiple schemas?
tim robertson wrote:
> Hi,
>
> Would I be correct in thinking that for each schema I want, I need a
> new SOLR instance running?

Hey Tim,

Documents aren't required to have all of the fields (it's not a database), so what I would do is just put all of the field definitions in a single schema.xml file. That approach would only be a problem if you needed a field name to mean one thing some of the time and something else at other times. I'd suggest using consistent naming, so that fields named the same way are treated the same way, and then using a single Solr instance.

Daniel
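In other words, something like this (field names invented for the example), where each document simply omits the fields that don't apply to it:

```xml
<!-- one schema.xml serving two kinds of documents -->
<field name="id" type="string" indexed="true" stored="true"/>
<field name="doctype" type="string" indexed="true" stored="true"/>

<!-- only article documents populate this -->
<field name="article_body" type="text" indexed="true" stored="true"/>

<!-- only event documents populate this -->
<field name="event_date" type="date" indexed="true" stored="true"/>
```

A doctype field like the one above also lets you scope queries to one kind of document with a filter such as fq=doctype:article.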
Re: Update schema.xml without restarting Solr?
[EMAIL PROTECTED] wrote:
> Quoting Daniel Papasian <[EMAIL PROTECTED]>:
>> Or if you're adding a new field to the schema (perhaps the most common
>> need for editing schema.xml), you don't need to reindex any documents
>> at all, right? Unless I'm missing something?
>
> Well, it all depends on whether that "field" (not a solr/lucene field)
> exists in the already indexed material but was never indexed. Let's say
> we have a bunch of articles with a field "author" that someone decided
> didn't need to be in the index. But then later he changes his mind and
> adds the author field to the schema. In this case all articles that
> have a populated author field should now be reindexed.

Yeah, I guess the use case I was thinking of was someone who had multiple different types of content in their index (say, articles, events, organizations). When they add a new content type (book review) and find the need to add a new field for that content type (say, publisher) that is only relevant to that type, then, since they're adding it before any data that would have it was indexed, I believe they'd be fine making that schema change without reindexing anything.

I suppose if you add a new dynamic field specification that conflicts with existing fields, reindexing is probably a good idea, but if you're doing that... well, I probably don't want to know.

> I must say that I'm a bit confused by these dynamic fields. Can someone
> tell me if there is any reasonable use of dynamic fields without having
> the "variable type" (for example i for int/sint) in the name?

Well, perhaps this is fulfilling your requirement on a technicality, but there's always higher order types... Offhand, I can think of cases where you might want to define a dynamic field like *_propername or *_cost, and then you'd be able to use fields like author_propername and editor_propername, or book_cost and volume_cost, or what have you.

Daniel
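A sketch of what those "higher order" dynamic fields would look like in schema.xml (assuming a schema that defines the usual string and sfloat types from the example schema):

```xml
<!-- the suffix carries meaning, not a primitive type -->
<dynamicField name="*_propername" type="string" indexed="true" stored="true"/>
<dynamicField name="*_cost" type="sfloat" indexed="true" stored="true"/>
```

Any document field ending in _propername or _cost then gets the matching type and analysis, without each field being declared individually.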
Re: Update schema.xml without restarting Solr?
[EMAIL PROTECTED] wrote:
> Quoting Jeryl Cook <[EMAIL PROTECTED]>:
>
>> 2. Make the "schema.xml" configurable at runtime. Not really sure of
>> the best way to address this, because changing the schema would
>> require "re-indexing" the documents.
>
> Isn't the best way to address this just to leave it to the people who
> integrate Solr into their system? I mean, if a change in the schema
> only affects 1% of all documents, then it's a bad idea to reindex them
> all (at least if the dataset is big).

Or if you're adding a new field to the schema (perhaps the most common need for editing schema.xml), you don't need to reindex any documents at all, right? Unless I'm missing something?

I suppose if you add a new dynamic field specification that conflicts with existing fields, reindexing is probably a good idea, but if you're doing that... well, I probably don't want to know.

Daniel