Getting the offset of search keyword in a document
Hello, I am new to Solr/Lucene and I am evaluating whether they suit my needs and could replace our in-house system. Our requirements: 1. I have many documents (about 1M). 2. Each document contains text ranging from a few KB to a few MB. 3. I want to search for a keyword across all these documents and get back the matching document(s), AND ALSO the offset of that keyword inside each document. Is requirement 3 possible?
Re: Tree Faceting in Solr 1.4
Hi Geert-Jan, what did you mean by this?
> Also, just a suggestion, consider using id's instead of names for filtering;
Thanks, -S
Re: a bug of solr distributed search
Okay, but then LiLi did something wrong, right? I mean, if the document exists on only one shard, it should get the same score whenever one requests it, no? Of course, this only applies if nothing changes between the requests. The only remaining problem here would be that you need distributed IDF (as in the mentioned JIRA issue) to normalize your results' scoring. But the problem described in this mailing-list posting has nothing to do with that... Regards - Mitch -- View this message in context: http://lucene.472066.n3.nabble.com/a-bug-of-solr-distributed-search-tp983533p991907.html Sent from the Solr - User mailing list archive at Nabble.com.
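To illustrate the distributed-IDF point, here is a minimal sketch. It assumes the classic Lucene TF-IDF similarity, whose inverse document frequency is commonly documented as 1 + ln(numDocs / (docFreq + 1)). The shard names and statistics are made up for illustration and are not taken from the thread.

```python
import math

def idf(num_docs: int, doc_freq: int) -> float:
    """Classic Lucene-style IDF (assumed formula): 1 + ln(numDocs / (docFreq + 1))."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))

# Hypothetical per-shard statistics for one term:
# (documents in the shard, documents in the shard containing the term)
shards = {"shard_a": (1000, 10), "shard_b": (1000, 200)}

# Each shard scoring on its own uses only its local statistics.
local_idf = {name: idf(n, df) for name, (n, df) in shards.items()}

# A distributed IDF would combine the statistics before scoring.
total_docs = sum(n for n, _ in shards.values())
total_df = sum(df for _, df in shards.values())
global_idf = idf(total_docs, total_df)

for name, value in local_idf.items():
    print(f"{name}: local idf = {value:.3f}")
print(f"global idf = {global_idf:.3f}")
```

Under these assumptions the same term is weighted differently on each shard, so scores from different shards are not directly comparable. A document that lives on a single shard still gets the same score on every request as long as that shard's index is unchanged.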
SolrCloud in production?
Is anyone using ZooKeeper-based Solr Cloud in production yet? Any war stories? Any problematic missing features? Thanks, Andrew.
Re: Tree Faceting in Solr 1.4
Perhaps completely unnecessary when you have a controlled domain, but I meant using ids for places instead of names, because names quickly become ambiguous, e.g. there are numerous different places around the world called Washington.
2010/7/24 SR r.steve@gmail.com
> Hi Geert-Jan, What did you mean by this: Also, just a suggestion, consider using id's instead of names for filtering; Thanks, -S
RE: Novice seeking help to change filters to search without diacritics
Hi HSingh, Usually people set up two fields, one with diacritics and one without. Searches then run against both fields. If you think a match against the field with diacritics is more valuable, you can give that field a boost. Steve
-----Original Message----- From: HSingh [mailto:hsin...@gmail.com] Sent: Friday, July 23, 2010 5:20 PM To: solr-user@lucene.apache.org Subject: RE: Novice seeking help to change filters to search without diacritics
> Hi Steve, This is extremely helpful! What is the best way to also preserve/append the diacritics in the index in case someone searches using them? I deeply appreciate your help!
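The two-field idea can be sketched outside Solr as follows. This is only an illustration in Python, not Solr configuration: in Solr itself the two fields would be declared in schema.xml, with the folding done by an analysis filter. The field names `text_exact` and `text_folded`, the boost value, and the scoring function are all assumptions made for this example.

```python
import unicodedata

def nfc(text: str) -> str:
    """Normalize to composed form so that equal-looking strings compare equal."""
    return unicodedata.normalize("NFC", text)

def fold_diacritics(text: str) -> str:
    """Decompose characters and drop the combining marks (accents, umlauts)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def index_document(text: str) -> dict:
    """Keep two token lists: one preserving diacritics, one with them removed."""
    lowered = nfc(text).lower()
    return {
        "text_exact": lowered.split(),
        "text_folded": fold_diacritics(lowered).split(),
    }

def score(query: str, doc: dict, exact_boost: float = 2.0) -> float:
    """Search both fields; a hit on the diacritics-preserving field counts more."""
    term = nfc(query).lower()
    total = 0.0
    if term in doc["text_exact"]:
        total += exact_boost
    if fold_diacritics(term) in doc["text_folded"]:
        total += 1.0
    return total

doc = index_document("Caf\u00e9 M\u00fcller")
print(score("caf\u00e9", doc), score("cafe", doc))
```

A query typed with diacritics matches both fields and ranks higher, while a query typed without them still finds the document through the folded field.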
Re: Performance issues when querying on large documents
Are you storing the full 1,000 pages in the index? If so, that is probably not helping either.
On 7/23/10, ahammad ahmed.ham...@gmail.com wrote:
> Hello, I have an index with lots of different types of documents. One of those types basically contains extracts of PDF docs. Some of those PDFs can have 1000+ pages, so there would be a lot of stuff to search through. I am experiencing really terrible performance when querying. My whole index has about 270k documents, but fewer than 1000 of those are the PDF extracts. The slow querying occurs when I search only on those PDF extracts (by specifying filters) and return 100 results. The 100 results definitely add to the issue, but even cutting that down can be slow. Is there a way to improve querying with such large results? To give an idea, querying for a single word can take a little over a minute, which isn't really viable for an application that revolves around searching. For now, I have limited the results to 20, which makes the query execute in roughly 10-15 seconds. However, I would like to have the option of returning 100 results. Thanks a lot.
RE: Tree Faceting in Solr 1.4
> Perhaps completely unnecessary when you have a controlled domain, but I meant to use ids for places instead of names, because names will quickly become ambiguous, e.g.: there are numerous different places over the world called washington, etc.
This is related to something I've been thinking about. Okay, say you use IDs instead of names. Now you've got to translate those IDs to names before you display them, of course. One way to do that would be to keep the id-to-name lookup in some non-Solr store (an RDBMS or a non-SQL store). Is that what you'd do? Is there any non-crazy way to do that without an external store, just with Solr? Any way to do it with term payloads? Anything else? Jonathan
Re: Tree Faceting in Solr 1.4
Hi Jonathan, I too am using IDs instead of names. One reason is that the URLs are easier to read and safer, because special characters in names could break them. I am keeping the id-to-name lookups in Solr, though: I use lookup fields where I put the id and name into one field, separated by a fixed delimiter, e.g. 134982__Some name I am going to lookup later. The separator here is two underscores (__). So I can query for that lookup field, extract the id and name, and store them in an array or something similar to look them up in my (PHP) frontend. If you don't have too many different values, you could also map id-to-name in a simple text file (as suggested in the Solr book, for example). Cheers, Stefan
-- Stefan Moises, Senior Software Developer, shoptimax GmbH, Guntherstraße 45 a, 90461 Nürnberg, Amtsgericht Nürnberg HRB 21703, GF Friedrich Schreieck, Tel.: 0911/25566-25, Fax: 0911/25566-29, moi...@shoptimax.de, http://www.shoptimax.de
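The delimiter scheme described above could be handled on the client side roughly like this. It is a sketch in Python rather than the PHP frontend mentioned in the message, and the function names are hypothetical. It assumes the id is numeric and never contains the delimiter, while the name may.

```python
from typing import Dict, Iterable, Tuple

DELIMITER = "__"  # two underscores, as in the example value

def encode_lookup(item_id: int, name: str) -> str:
    """Build the combined field value, e.g. '134982__Some name'."""
    return f"{item_id}{DELIMITER}{name}"

def decode_lookup(value: str) -> Tuple[int, str]:
    """Split on the first delimiter only, so names may contain '__' themselves."""
    raw_id, name = value.split(DELIMITER, 1)
    return int(raw_id), name

def build_lookup(values: Iterable[str]) -> Dict[int, str]:
    """Turn a list of combined field values into an id -> name mapping."""
    return dict(decode_lookup(v) for v in values)

lookup = build_lookup(["134982__Some name I am going to lookup later"])
print(lookup[134982])
```

Splitting only once keeps the scheme robust when a name happens to contain the delimiter, at the cost of requiring that ids never do.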
RE: Tree Faceting in Solr 1.4
> I am keeping the id-to-name lookups in SOLR though, I just use some lookup fields where I put id and name into one field, separated by some fixed delimiter, e.g. 134982__Some name I am going to lookup later The separator here would be two underscores (__). So I can query for that lookup field, extract id and name and store them into an array or something to look them up in my (PHP) frontend.

Interesting, thanks. Do you use a prefix query, then, to find that value? I'm still confused about how this would work. Does each of your documents have only one ID? In my case, it's more like the geographic hierarchical stuff this thread began with. The ID is not the document's ID; it's essentially the ID of a facet value, which can be multi-valued (or maybe even hierarchical).

Document X:
* United States
* China

Document Y:
* China
* Russia

If we turn those actual values into ID__label strings, it gets confusing how to query for them, ESPECIALLY if we try to introduce the hierarchy into it.

Document X:
* 1234__United States/677_Michigan/987_Detroit

I think actually trying to store things like that would break either of the techniques on the wiki page about hierarchical faceting. Maybe an external store is really the only way to go that doesn't turn into a mess.
Re: Tree Faceting in Solr 1.4
I believe we use an in-process WeakHashMap to store the id-name relationship. It's not as if we're talking about billions of values here. For anything more memory-intensive we use a NoSQL store (Tokyo Tyrant through the memcached protocol at the moment).
2010/7/24 Jonathan Rochkind rochk...@jhu.edu
> One way to do that would be to keep the id-to-name lookup in some non-solr store (rdbms, or non-sql store) Is that what you'd do? Is there any non-crazy way to do that without an external store, just with solr? Any way to do it with term payloads? Anything else? Jonathan
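An in-process id-to-name cache of the kind described here might look like the sketch below. It is written in Python with a plain dictionary, so unlike a Java WeakHashMap it never evicts entries. The class name, the loader callback, and the sample ids are hypothetical. The loader stands in for whatever external store holds the authoritative mapping.

```python
from typing import Callable, Dict, List, Tuple

class NameCache:
    """Caches id -> name lookups in process, falling back to a loader on a miss."""

    def __init__(self, loader: Callable[[str], str]) -> None:
        self._loader = loader              # hypothetical call into the external store
        self._names: Dict[str, str] = {}

    def name_for(self, facet_id: str) -> str:
        if facet_id not in self._names:
            self._names[facet_id] = self._loader(facet_id)
        return self._names[facet_id]

def label_facets(counts: List[Tuple[str, int]], cache: NameCache) -> List[Tuple[str, int]]:
    """Replace facet ids with display names just before rendering."""
    return [(cache.name_for(facet_id), count) for facet_id, count in counts]

# Stand-in for the external store, with made-up ids.
store = {"1234": "United States", "5678": "China"}
cache = NameCache(store.__getitem__)
print(label_facets([("1234", 2), ("5678", 1)], cache))
```

Because the translation happens only at display time, the index can keep filtering and faceting on ids, including multi-valued ones.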
Re: SolrCloud in production?
Boy, if it does what it says it does, it's really a powerful tool. How is such a thing hosted, I wonder? Dennis Gearon
--- On Sat, 7/24/10, Andrew Clegg andrew.cl...@gmail.com wrote:
> Is anyone using ZooKeeper-based Solr Cloud in production yet? Any war stories? Any problematic missing features? Thanks, Andrew.
Which is a good XPath generator?
Hi, I am looking for an XPath generator that can produce an XPath expression when I pick a specific tag inside an HTML document. Do you know a good one? A free XPath generator would be ideal. Thanks.
Re: Performance issues when querying on large documents
What are you returning? I'd be quite surprised if it was the search itself, so first I'd look elsewhere. In particular, are you returning all 1,000 pages? What happens if you specify returning only a small field (the fl= parameter)? Also, look at the debug output of the query. It breaks down the various phases of the query processing, and that might give you a hint. If none of that does the trick, please post the query and the relevant parts of your schema as well as the debug output... Best Erick
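A request along the lines suggested above could be assembled as in this sketch. The parameters `q`, `fq`, `fl`, `rows`, and `debugQuery` are standard Solr query parameters. The base URL and the field names `doctype`, `id`, and `title` are assumptions for illustration and would need to match the actual schema.

```python
from urllib.parse import urlencode

def build_query_url(base_url: str, keyword: str, rows: int = 20) -> str:
    """Build a Solr select URL that returns only small fields plus debug timings."""
    params = [
        ("q", keyword),
        ("fq", "doctype:pdf_extract"),  # hypothetical filter for the PDF extracts
        ("fl", "id,title"),             # return small fields, not the full text
        ("rows", rows),
        ("debugQuery", "on"),           # ask Solr for the per-phase breakdown
    ]
    return f"{base_url}/select?{urlencode(params)}"

# Hypothetical local Solr instance.
print(build_query_url("http://localhost:8983/solr", "contract"))
```

Comparing the response time of this request with one that returns the large stored field should show whether the time goes into searching or into retrieving and transferring the stored content.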
RE: Novice seeking help to change filters to search without diacritics
> Usually people set up two fields, one with diacritics and one without. Then searches are against both fields. If you think a match against the field with diacritics is more valuable, you can give that field a boost.
Hi Steve, where can one set up these two fields? Thank you for your kind assistance!