Re: solrj returning no results but curl can get them
It was pilot error. I just reviewed my servlet and noticed a parameter in web.xml that was pointing at the production index, which doesn't have the data for the new product yet, while my curl command was running against the staging index. I rebuilt the servlet with the fixed parameter and life is now good.
RE: Does DocValues improve Grouping performance ?
Hi Shamik, We use DocValues for grouping, and although I have nothing to compare it to (we started with DocValues), we are also seeing similarly poor results: easily 60% overhead compared to non-grouped queries. Looking around for a solution, no quick fix is presenting itself, unfortunately. CollapsingQParserPlugin is also too limited for our needs.

-Original Message-
From: Shamik Bandopadhyay [mailto:sham...@gmail.com]
Sent: Thursday, January 15, 2015 6:02 PM
To: solr-user@lucene.apache.org
Subject: Does DocValues improve Grouping performance ?

Hi, Does the use of DocValues provide any performance improvement for grouping? I've looked into the blog post which mentions improving grouping performance through DocValues: https://lucidworks.com/blog/fun-with-docvalues-in-solr-4-2/ Right now, group-by queries (which I sadly can't avoid) have become a huge bottleneck. They carry an overhead of 60-70% compared to the same query sans group-by. Unfortunately, I'm not able to use CollapsingQParserPlugin as it doesn't have support similar to the group.facet feature. My understanding of DocValues is that it's intended for faceting and sorting. Just wondering if anyone has tried DocValues for grouping and seen any improvements? -Thanks, Shamik
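For reference, turning on DocValues for a grouping field is a one-attribute schema change along these lines (a minimal sketch; the field name and type are placeholders, and the collection must be fully re-indexed after the change before grouping on it):

<!-- hypothetical grouping field: single-valued, string-based, with docValues enabled -->
<field name="group_id" type="string" indexed="true" stored="false" docValues="true" multiValued="false" />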
Re: AW: AW: AW: CoreContainer#createAndLoad, existing cores not loaded
On 1/29/2015 11:37 PM, Clemens Wyss DEV wrote:
>> The recommendation these days is to NOT use the embedded server
> We would love to, as it is clear that this is not the Solr way to go. The reason for us building upon EmbeddedSolrServer is that we have more than 150 sites, each with its own index (core). If we went client/server, we could not easily update the Solr server(s) without also updating all clients (i.e. the 150 sites) at the same time. And having a dedicated Solr server for every client/site is not really an option, is it? Or can, for example, a 4.10.3 client talk to a Solr 5/6 server? Also, when updating the Solr server, doesn't that also require a re-index of all data, as the Lucene storage format might have changed?

Cross-version compatibility between SolrJ and Solr is very high, as long as you're not running SolrCloud. SolrCloud is *incredibly* awesome, but it's not for everyone. Without SolrCloud, the communication is HTTP only, using very stable APIs that have been around since pretty much the beginning of Solr. In the 1.x and 3.x days, there were occasional code tweaks required for cross-version compatibility, but the API has been extremely stable since early 4.x -- for a couple of years now.

SolrCloud is much more recent and far more complex, so problems or deficiencies are sometimes found with the API. Fixing those bugs sometimes requires changes that are incompatible with other versions of the Java client. The SolrJ Java client is an integral part of Solr itself, so SolrCloud functionality in the client is tightly coupled to specifics of the API that are undergoing rapid change from version to version.

I don't think that SolrCloud is even possible with the embedded server, because it requires HTTP for inter-server communication, and the embedded server doesn't listen for HTTP.

Thanks,
Shawn
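To illustrate the point about the stable HTTP API: moving off the embedded server is mostly a construction change on the SolrJ side. A minimal sketch, assuming a standalone 4.x Solr with a core named "site1" on localhost (URL and core name are placeholders):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class HttpClientSketch {
    public static void main(String[] args) throws Exception {
        // Same parent class (SolrServer) as EmbeddedSolrServer, so the
        // query/add/commit calls stay the same; only the construction changes.
        SolrServer solr = new HttpSolrServer("http://localhost:8983/solr/site1");

        QueryResponse rsp = solr.query(new SolrQuery("*:*"));
        System.out.println("numFound: " + rsp.getResults().getNumFound());

        solr.shutdown();
    }
}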
Removing a stored field from solrcloud 4.4
Hello, I have a field which is indexed and stored in the Solr schema (4.4, SolrCloud). This field is relatively large, and I plan to only index the field and not store it. Is there a need to re-index the documents once this change is made? Thanks, Nishanth
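For reference, the schema change being described is just flipping the stored attribute on the field definition (the field name and type below are placeholders):

<!-- before: indexed and stored -->
<field name="big_text" type="text_general" indexed="true" stored="true" />
<!-- after: indexed only; already-indexed documents keep their stored data until they are re-indexed and merged away -->
<field name="big_text" type="text_general" indexed="true" stored="false" />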
Calling custom request handler with data import
Hi, I am using the data import handler to import data from MySQL, and I want to identify named entities in it, so I am following this example (http://www.searchbox.com/named-entity-recognition-ner-in-solr/), which uses Stanford NER to identify named entities. I am using the following request handler for importing data from MySQL:

<requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">data-import.xml</str>
  </lst>
</requestHandler>

and the following for identifying named entities:

<requestHandler name="/ner" class="com.searchbox.ner.NerHandler" />

<updateRequestProcessorChain name="mychain">
  <processor class="com.searchbox.ner.NerProcessorFactory">
    <lst name="queryFields">
      <str name="queryField">content</str>
    </lst>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>

<requestHandler name="/update" class="solr.UpdateRequestHandler">
  <lst name="defaults">
    <str name="update.chain">mychain</str>
  </lst>
</requestHandler>

The NER request handler identifies named entities in the content field and stores the extracted entities in Solr fields. It was working when I was using Nutch with Solr, but when I import data from MySQL the NER handler is not invoked, so entities are not stored in Solr for the imported documents. Can anybody tell me how to call a custom request handler from the data import handler? Alternatively, if I can invoke the NER request handler externally so that it indexes person, organization and location in Solr for imported documents, that is also fine. Any suggestions are welcome. Thanks, Vineet Yadav
timestamp field and atomic updates
I have a timestamp field in my schema to track when each doc was indexed:

<field name="timestamp" type="date" indexed="true" stored="true" default="NOW" multiValued="false" />

Recently, we have switched over to using atomic updates instead of re-indexing when we need to update a doc in the index. It looks to me like the timestamp field is not updated during an atomic update. I have also looked into TimestampUpdateProcessorFactory, and it looks to me like that won't help in my case. Is there anything within Solr that I can use to update the timestamp during an atomic update, or do I have to explicitly include the timestamp field as part of the atomic update? Bill
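One of the two options Bill mentions, explicitly including the timestamp in the atomic update, could look roughly like this in SolrJ. A minimal sketch assuming a non-cloud 4.x setup, with the core URL, document id, and field names as placeholders:

import java.util.Collections;
import java.util.Date;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class AtomicUpdateWithTimestamp {
    public static void main(String[] args) throws Exception {
        SolrServer solr = new HttpSolrServer("http://localhost:8983/solr/collection1");

        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "doc-42");
        // atomic "set" on the field actually being changed
        doc.addField("title", Collections.singletonMap("set", "New title"));
        // refresh the timestamp in the same atomic update, since (per the
        // observation above) the schema default is not re-applied here
        doc.addField("timestamp", Collections.singletonMap("set", new Date()));

        solr.add(doc);
        solr.commit();
        solr.shutdown();
    }
}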
Re: solrj returning no results but curl can get them
Hi Dmitri, I do have a question mark in my search. I see that I dropped it accidentally when I was copying/pasting/formatting the details. My curl command is curl http://myserver/myapp/myproduct?fl=*,... and it works fine whether I have .../myproduct/?fl=*, or leave out the / before ?fl=*. The curl command works perfectly with any of the four request handlers, so I believe the data to be correct, and my solrj code works perfectly with three out of four of the request handlers, so I believe the code to be correct as well. Thanks. Sol
Hit Highlighting and More Like This
Hi all, I'm fairly new to Solr. It seems like it should be possible to enable the hit highlighting feature and more like this feature at the same time, with the key words from the MLT query being the terms highlighted. Is this possible? I am trying right now to do this, but I am not having any snippets returned to me. Thanks!
Re: Removing a stored field from solrcloud 4.4
Yes and no. Solr should continue to work fine; it's just that all new documents won't have the stored field to return to clients. As you re-index docs, subsequent merges will purge the stored data _for the docs you've re-indexed_. But I would re-index just to get my system into a consistent state. Best, Erick
Re: Suggesting broken words with solr.WordBreakSolrSpellChecker
Nice! It works indeed! Sorry I didn't notice that before. But what if I want the same for the iPhone? I mean suggesting "I phone" for users who searched for "iphone". A minBreakLength of 1 is just too small, isn't it?

On Saturday, 31 January 2015, Dyer, James-2 [via Lucene] ml-node+s472066n4183176...@n3.nabble.com wrote:
> You need to decrease this to at least 2, because the length of "go" is only 2.
>
> <int name="minBreakLength">3</int>
>
> James Dyer
> Ingram Content Group

--
Fabio Bozzo
SW Engineer, 3W s.r.l.
Via Luisetti 7, 13900 Biella (BI)
Tel. 015.84.97.804 / 015.89.76.350 - Fax 015.84.70.450
Replication in SolrCloud
Hi, We have 4 servers in SolrCloud with one shard. 2 of the servers are not in sync with the other two. We would like to force replication manually to keep all the servers in sync. Is there a command to force replication (other than a Solr restart)? Thanks.
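For what it's worth, two commands that exist in 4.x and are often used for this are sketched below; whether they are the right fix for replicas that have drifted in SolrCloud depends on why they drifted. Host, port, and core names are placeholders:

# Ask a core to pull the index from its leader/master via the standard ReplicationHandler
curl "http://localhost:8983/solr/collection1_shard1_replica2/replication?command=fetchindex"

# Or, in SolrCloud, ask a core to go through recovery via the Core Admin API
curl "http://localhost:8983/solr/admin/cores?action=REQUESTRECOVERY&core=collection1_shard1_replica2"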
New UI for SOLR-based projects
Hi everybody, There exists a new open-source implementation of a search interface for SOLR. It is written in Javascript (using Backbone), currently in version v1.0.19 - but new features are constantly coming. Rather than describing it in words, please see it in action for yourself at http://ui.adslabs.org - I'd recommend exploring facets, the query form, and visualizations. The code lives at: http://github.com/adsabs/bumblebee Best, Roman
RE: Suggesting broken words with solr.WordBreakSolrSpellChecker
You need to decrease this to at least 2, because the length of "go" is only 2.

<int name="minBreakLength">3</int>

James Dyer
Ingram Content Group

-Original Message-
From: fabio.bozzo [mailto:f.bo...@3-w.it]
Sent: Wednesday, January 28, 2015 4:55 PM
To: solr-user@lucene.apache.org
Subject: RE: Suggesting broken words with solr.WordBreakSolrSpellChecker

I tried increasing my alternativeTermCount to 5 and enabling extended results. I also added a filter (fq) parameter to clarify what I mean:

*Querying for "go pro" is good:*

{ "responseHeader": { "status": 0, "QTime": 2, "params": { "q": "go pro", "indent": "true", "fq": "marchio:\"GO PRO\"", "rows": "1", "wt": "json", "spellcheck.extendedResults": "true", "_": "1422485581792" } },
  "response": { "numFound": 27, "start": 0, "docs": [ { "codice_produttore_s": "DK00150020", "codice_s": "5.BAT.27407", "id": "27407", "marchio": "GO PRO", "barcode_interno_s": "185323000958", "prezzo_acquisto_d": 16.12, "data_aggiornamento_dt": "2012-06-21T00:00:00Z", "descrizione": "BATTERIA GO PRO HERO", "prezzo_vendita_d": 39.9, "categoria": "Batterie", "_version_": 1491583424191791000 } ] },
  "spellcheck": { "suggestions": [ "go pro", { "numFound": 1, "startOffset": 0, "endOffset": 6, "origFreq": 433, "suggestion": [ { "word": "gopro", "freq": 2 } ] }, "correctlySpelled", false, "collation", [ "collationQuery", "gopro", "hits", 3, "misspellingsAndCorrections", [ "go pro", "gopro" ] ] ] } }

While querying for "gopro" is not:

{ "responseHeader": { "status": 0, "QTime": 6, "params": { "q": "gopro", "indent": "true", "fq": "marchio:\"GO PRO\"", "rows": "1", "wt": "json", "spellcheck.extendedResults": "true", "_": "1422485629480" } },
  "response": { "numFound": 3, "start": 0, "docs": [ { "codice_produttore_s": "DK0030010", "codice_s": "5.VID.39163", "id": "38814", "marchio": "GO PRO", "barcode_interno_s": "818279012477", "prezzo_acquisto_d": 150.84, "data_aggiornamento_dt": "2014-12-24T00:00:00Z", "descrizione": "VIDEOCAMERA GO-PRO HERO 3 WHITE NUOVO SLIM", "prezzo_vendita_d": 219, "categoria": "Fotografia", "_version_": 1491583425479442400 } ] },
  "spellcheck": { "suggestions": [ "gopro", { "numFound": 1, "startOffset": 0, "endOffset": 5, "origFreq": 2, "suggestion": [ { "word": "giro", "freq": 6 } ] }, "correctlySpelled", false ] } }

I'd like "go pro" as a suggestion for "gopro" too.
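For context, minBreakLength is set where the word-break spellchecker is declared in solrconfig.xml. A sketch of that block is below; only minBreakLength comes from this thread, and the field name and remaining parameters are assumptions:

<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <lst name="spellchecker">
    <str name="name">wordbreak</str>
    <str name="classname">solr.WordBreakSolrSpellChecker</str>
    <str name="field">descrizione</str>
    <str name="combineWords">true</str>
    <str name="breakWords">true</str>
    <int name="maxChanges">10</int>
    <!-- lowering this from 3 to 2 allows two-character parts such as "go" -->
    <int name="minBreakLength">2</int>
  </lst>
</searchComponent>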
Re: New UI for SOLR-based projects
I have no wish to trivialize the work you've done. I haven't looked into the code, but a high-level glance at the documentation suggests that you've put a lot of work into it. I do, however, have a strong caveat for your users. I'm the guy holding the big sign that says "the end is near" to anyone who will listen!

By itself, this is an awesome tool for prototyping, but without some additional expertise and work, there are severe security implications. If this gets used for a public Internet-facing service, the Solr server must be accessible from the end user's machine, which might mean that it must be available to the entire Internet. If the Solr server is not sitting behind some kind of intelligent proxy that can detect and deny attempts to access certain parts of the Solr API, then Solr will be wide open to attack. A knowledgeable user that has unfiltered access to a Solr server will be able to completely delete the index, change any piece of information in the index, or send denial-of-service queries that will make it unable to respond to legitimate traffic.

Setting up such a proxy is not a trivial task. I know that some people have done it, but so far I have not seen anyone share those configurations. Even with such a proxy, it might still be possible to easily send denial-of-service queries.

I cannot find any information in your README or the documentation links that mentions any of these concerns. I suspect that many who incorporate this client into their websites will be unaware that their setup may be insecure, or how to protect it.

Thanks,
Shawn
Re: Calling custom request handler with data import
The Data Import Handler isn't pushing data into the /update request handler. However, the Data Import Handler can be extended with transformers. Two such transformers are the TemplateTransformer and the ScriptTransformer. It may be possible to get a script function to load your custom Java code. You could also just write a StanfordNerTransformer. Hope this helps, Dan
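A minimal sketch of the custom-transformer route Dan mentions, assuming the NER code can be called from Java; the class name, the MyNer helper, and the row/field names are hypothetical:

import java.util.Map;
import org.apache.solr.handler.dataimport.Context;
import org.apache.solr.handler.dataimport.Transformer;

// Hypothetical DIH transformer: run NER over the "content" column of each row
// and add the extracted entities as extra fields before the document is indexed.
public class StanfordNerTransformer extends Transformer {

    @Override
    public Object transformRow(Map<String, Object> row, Context context) {
        Object content = row.get("content");
        if (content != null) {
            String text = content.toString();
            row.put("person",       MyNer.extract(text, "PERSON"));
            row.put("organization", MyNer.extract(text, "ORGANIZATION"));
            row.put("location",     MyNer.extract(text, "LOCATION"));
        }
        return row;
    }

    // Placeholder for whatever NER integration is actually used
    static class MyNer {
        static String extract(String text, String entityType) {
            return ""; // a real implementation would call the NER library here
        }
    }
}

The transformer would then be attached to the entity in data-import.xml with transformer="com.example.StanfordNerTransformer", alongside any other transformers already in use.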
Re: New UI for SOLR-based projects
I gather from your comment that I should update the README, because there could be people who would be inclined to use the bumblebee development server in production: beware those who enter through this gate! :-) Your point that so far you haven't seen anybody share their middle layer can be addressed by pointing to the following projects:

https://github.com/adsabs/solr-service
https://github.com/adsabs/adsws

These are also open source, we use them in production, and they have OAuth, microservices, REST, and rate limits. We know it is not perfect, but what is? ;-) Pull requests welcome!

Thanks,
Roman
Re: Calling custom request handler with data import
You know, another thing you can do is just write some Java/Perl/whatever to pull data out of your database and push it to Solr. Not as convenient for development perhaps, but it has more legs in the long run. The Data Import Handler does not easily multi-thread.
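A bare-bones sketch of that approach, pulling rows over JDBC and pushing them to Solr with SolrJ; the JDBC URL, query, core URL, and field names are placeholders, and any NER enrichment would happen inside the loop:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class MysqlToSolr {
    public static void main(String[] args) throws Exception {
        SolrServer solr = new HttpSolrServer("http://localhost:8983/solr/collection1");

        try (Connection con = DriverManager.getConnection(
                     "jdbc:mysql://localhost/mydb", "user", "password");
             Statement st = con.createStatement();
             ResultSet rs = st.executeQuery("SELECT id, content FROM articles")) {

            while (rs.next()) {
                SolrInputDocument doc = new SolrInputDocument();
                doc.addField("id", rs.getString("id"));
                doc.addField("content", rs.getString("content"));
                // enrichment (e.g. NER fields) would be added to doc here
                solr.add(doc);
            }
        }

        solr.commit();
        solr.shutdown();
    }
}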
role of the wiki and cwiki
I've been thinking of https://wiki.apache.org/solr/ as the Old Wiki and https://cwiki.apache.org/confluence/display/solr as the New Wiki. I guess that's the wrong way to think about it - Confluence is being used for the Solr Reference Guide, and MoinMoin is being used as a wiki. Is this the correct understanding?
Re: role of the wiki and cwiki
Yes, your understanding is correct. Because the Solr Reference Guide is released as official documentation in PDF form shortly after each new minor Solr version, only committers have the ability to edit the Confluence wiki. Anyone can comment on it, so we do have a feedback mechanism.

Anyone can edit the MoinMoin wiki, after they ask for edit rights and provide their username for the Solr portion of that wiki. Asking for edit permission is typically done via this mailing list or the IRC channel.

Because they have different potential authors, the two systems now serve different purposes. There are still some pages on the MoinMoin wiki that contain documentation that should be in the reference guide, but isn't. The MoinMoin wiki is still useful as a place where users can collect information that is useful to others but doesn't qualify as official documentation, or perhaps simply hasn't been verified. I believe this means that a lot of the information which has been migrated into the reference guide will eventually be removed from MoinMoin.

Thanks,
Shawn
Re: role of the wiki and cwiki
Hi Dan, I would say that the wiki is old and dated, and that gap is only increasing. I would highly recommend everyone use the Reference Guide instead of the wiki, unless there's something they can't find. In case you are unable to find something, it'd be good to comment on Confluence about the missing content; better still, contribute :-).

Now, about the reference guide: the link you've shared above is always the next version of the ref guide; e.g. right now, all the content there is w.r.t. 5.0 and is unreleased. The best way to use the reference guide is to download the ref guide for the version you're using.

--
Anshum Gupta
http://about.me/anshumgupta
Re: New UI for SOLR-based projects
Nice work, Roman! Lukas
Re: Does DocValues improve Grouping performance ?
A few questions so we can better understand the scale of grouping you're trying to accomplish:

How many distinct groups do you typically have in a search result?
How many distinct groups are there in the field you are grouping on?
How many results are you trying to group in a query?

Joel Bernstein
Search Engineer at Heliosearch
AW: AW: AW: CoreContainer#createAndLoad, existing cores not loaded
I looked into the sources of CoreAdminHandler#handleCreateAction ...

SolrCore core = coreContainer.create(dcore);
// only write out the descriptor if the core is successfully created
coreContainer.getCoresLocator().create(coreContainer, dcore);

... I was missing the coreContainer.getCoresLocator().create(coreContainer, dcore). When doing the two calls: a) core.properties is being created AND b) the cores are being loaded upon container startup ;) :-)

-Original Message-
From: Clemens Wyss DEV [mailto:clemens...@mysign.ch]
Sent: Friday, 30 January 2015 07:38
To: solr-user@lucene.apache.org
Subject: AW: AW: AW: CoreContainer#createAndLoad, existing cores not loaded

> The recommendation these days is to NOT use the embedded server
We would love to, as it is clear that this is not the Solr way to go. The reason for us building upon EmbeddedSolrServer is that we have more than 150 sites, each with its own index (core). If we went client/server, we could not easily update the Solr server(s) without also updating all clients (i.e. the 150 sites) at the same time. And having a dedicated Solr server for every client/site is not really an option, is it? Or can, for example, a 4.10.3 client talk to a Solr 5/6 server? Also, when updating the Solr server, doesn't that also require a re-index of all data, as the Lucene storage format might have changed?

-Original Message-
From: Shawn Heisey [mailto:apa...@elyograg.org]
Sent: Thursday, 29 January 2015 20:30
To: solr-user@lucene.apache.org
Subject: Re: AW: AW: CoreContainer#createAndLoad, existing cores not loaded

On 1/29/2015 10:15 AM, Clemens Wyss DEV wrote:
>> to put your solr home inside the extracted WAR
> We are NOT using war's
>> coreRootDirectory
> I don't have this property in my solr.xml
>> If there will only be core.properties files in that cores directory
> Again, I see no core.properties file. I am creating my cores through CoreContainer.createCore(CoreDescriptor). The folder(s) are created but no core.properties file

I am pretty clueless when it comes to the embedded server, but if you are creating the cores in the Java code every time you create the container, I bet what I'm telling you doesn't apply at all. The solr.xml file may not even be used.

The recommendation these days is to NOT use the embedded server. There are too many limitations and it doesn't receive as much user testing as the webapp. Start Solr as a separate process and access it over HTTP. The overhead of HTTP on a LAN is minimal, and over localhost it's almost nothing. To do that, you would just need to change your code to use one of the client objects. That would probably be HttpSolrServer, which is renamed to HttpSolrClient in 5.0. They share the same parent object as EmbeddedSolrServer. Most of the relevant methods used come from the parent class, so you would need very few code changes.

Thanks,
Shawn
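Putting Clemens' two calls together, a core created programmatically in an embedded setup presumably ends up looking roughly like this; a sketch only, with paths and core names as placeholders and the 4.x constructor signatures assumed:

import java.io.File;
import org.apache.solr.client.solrj.embedded.EmbeddedSolrServer;
import org.apache.solr.core.CoreContainer;
import org.apache.solr.core.CoreDescriptor;
import org.apache.solr.core.SolrCore;

public class CreateEmbeddedCore {
    public static void main(String[] args) {
        CoreContainer container = CoreContainer.createAndLoad(
                "/path/to/solr-home", new File("/path/to/solr-home/solr.xml"));

        CoreDescriptor dcore = new CoreDescriptor(container, "site1", "site1");
        SolrCore core = container.create(dcore);
        // without this call no core.properties is written, so the core would
        // not be discovered again on the next container start-up
        container.getCoresLocator().create(container, dcore);

        EmbeddedSolrServer server = new EmbeddedSolrServer(container, "site1");
        // ... index / query via server ...
        server.shutdown();
    }
}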
Re: solrj returning no results but curl can get them
Hi, Some sanity checking: does the Solr server base URL in the code match the one you use with curl? What if you curl against http://myserver/myapp/myproduct ?

On Fri, Jan 30, 2015 at 5:58 AM, S L sol.leder...@gmail.com wrote:
> I'm stumped. I've got some solrj 3.6.1 code that works fine against three of my request handlers but not the fourth. The very odd thing is that I have no trouble retrieving results with curl against all of the request handlers.
>
> My solrj code sets some parameters:
>
> ModifiableSolrParams params = new ModifiableSolrParams();
> params.set("fl", "*,score");
> params.set("rows", 500);
> params.set("qt", "/" + product);
> params.set("hl", "on");
> params.set("hl.fl", "title snippet");
> params.set("hl.fragsize", 50);
> params.set("hl.simple.pre", "<span class=\"hlt\">");
> params.set("hl.simple.post", "</span>");
> queryString = "(" + queryString + s[s.length-1] + ")";
>
> I have various request handlers that key off of the product value. I'll call the one that doesn't work "myproduct". I send the parameter string to catalina.out for debugging:
>
> System.out.println(params.toString());
>
> I get this:
>
> fl=*%2Cscore&rows=500&qt=%2Fmyproduct&hl=on&hl.fl=title+snippet&hl.fragsize=50&hl.simple.pre=%3Cspan+class%3D%22hlt%22%3E&hl.simple.post=%3C%2Fspan%3E&q=title%3A%28brain%29+OR+snippet%3A%28brain%29
>
> I get no results when I let the solrj code do the search, although the code works fine with the other three products. To convince myself that there is nothing wrong with the data, I unencode the parameter string and run this command:
>
> curl "http://myserver/myapp/myproduct?fl=*,score&rows=500&qt=/myproduct&hl=on&hl.fl=title+snippet&hl.fragsize=50&hl.simple.pre=<span+class=\"hlt\">&hl.simple.post=</span>&q=title:brain%20OR%20snippet:brain"
>
> It runs just fine. How can I debug this? Thanks very much.

--
Dmitry Kan
Luke Toolbox: http://github.com/DmitryKey/luke
Blog: http://dmitrykan.blogspot.com
Twitter: http://twitter.com/dmitrykan
SemanticAnalyzer: www.semanticanalyzer.info
Re: WordDelimiterFilterFactory and position increment.
Hi, Do you use WordDelimiterFilter on the query side as well?

--
Dmitry Kan
Luke Toolbox: http://github.com/DmitryKey/luke
Blog: http://dmitrykan.blogspot.com
Twitter: http://twitter.com/dmitrykan
SemanticAnalyzer: www.semanticanalyzer.info
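For context, Dmitry's question is about whether the index-time and query-time analyzers agree. A sketch of the kind of field type this usually involves is below; the tokenizer choice and every WordDelimiterFilterFactory parameter here are assumptions, not the original poster's configuration:

<fieldType name="text_wdf" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1"
            catenateWords="1" catenateNumbers="1"
            preserveOriginal="1" splitOnNumerics="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1"
            catenateWords="0" catenateNumbers="0"
            preserveOriginal="1" splitOnNumerics="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

If the two sides split and position tokens differently (or the query side lacks the filter entirely), a phrase query such as "3d image" looks for positions the index side never produced in that shape, which is one common way these phrase mismatches arise.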
Re: WordDelimiterFilterFactory and position increment.
Hi, An insight into the behavior of WordDelimiterFilter will be very helpful. Please share your inputs. Thanks, Modassar

On Thu, Jan 22, 2015 at 2:54 PM, Modassar Ather modather1...@gmail.com wrote:
> Hi, I am using WordDelimiterFilter while indexing. The parser used is edismax. Phrase search is failing for terms like "3d image". On the analysis page it shows the following four tokens for *3d* and their positions:
>
> token   position
> 3d      1
> 3       1
> 3d      1
> d       2
> image   3
>
> Here the token "d" is at position 2, which per my understanding causes the phrase search "3d image" to fail. "3d image"~1 works fine. The same behavior is present for "wi-fi device" and a few other queries starting with a token which is tokenized as shown above in the table. Kindly help me understand the behavior and let me know how phrase search is possible in such cases without the slop. Thanks, Modassar