multivalued field using DIH
I am using Solr 4.7 and am importing data directly from a MySQL database table using the DIH. I have a column that looks similar to this, in that it has multiple values in the database:

material
cotton "polyester blend" rayon

I would like the data to look like the following when imported:

<str name="material">cotton</str>
<str name="material">polyester blend</str>
<str name="material">rayon</str>

In other words: if there are multiple data points for a particular column and the mapped field is multivalued, create multiple str fields. If there are quotes around multiple words, treat them as one token. Is this possible? -- View this message in context: http://lucene.472066.n3.nabble.com/multivalued-field-using-DIH-tp4127297.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr 4.3.1 memory swapping
On 3/26/2014 10:26 PM, Darrell Burgan wrote:

Okay well it didn't take long for the swapping to start happening on one of our nodes. Here is a screen shot of the Solr console: https://s3-us-west-2.amazonaws.com/panswers-darrell/solr.png And here is a shot of top, with processes sorted by VIRT: https://s3-us-west-2.amazonaws.com/panswers-darrell/top.png As shown, we have used up more than 25% of the swap space, over 1GB, even though there is 16GB of OS RAM available, and the Solr JVM has been allocated only 10GB. Further, we're only consuming 1.5/4GB of the 10GB of JVM heap. Top shows that the Solr process 21582 is using 2.4GB resident but has a virtual size of 82.4GB. Presumably that virtual size is due to the memory mapped file. The other Java process 27619 is Zookeeper. So my question remains - why did we use any swap space at all? Doesn't seem like we're experiencing memory pressure at the moment ... I'm confused. :-)

The virtual memory value is indeed that large because of the mmapped file. There is definitely something wrong here. I don't know whether it's Java, RHEL, or something strange with the S3 virtual machine, possibly a bad interaction with the older kernel. With your -Xmx value, Java should never use more than about 10.5GB of physical memory, and the top output indicates that it's only using 2.4GB of memory; 13GB is used by the OS disk cache.

You might notice that I'm not mentioning Solr in the list of possible problems. This is because an unmodified Solr install only utilizes the Java heap, so it's Java that is in charge of allocating memory from the operating system.

Here is a script that will tell you what's using swap and how much. This will let you be absolutely sure about whether or not Java is the problem child: http://stackoverflow.com/a/7180078/2665648 There are instructions in the comments of the script for sorting the output.
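For reference, the linked script essentially walks /proc/<pid>/status and sums the VmSwap lines. A rough stand-alone sketch of the same idea (Linux-only; the helper names are my own illustration, not part of the linked script):

```python
import glob
import re

def vmswap_kb(status_text):
    """Extract the VmSwap value (in kB) from a /proc/<pid>/status blob."""
    m = re.search(r'^VmSwap:\s+(\d+)\s+kB', status_text, re.MULTILINE)
    return int(m.group(1)) if m else 0

def swap_by_process():
    """Return {pid: swap_kb} for processes actually using swap."""
    usage = {}
    for path in glob.glob('/proc/[0-9]*/status'):
        try:
            with open(path) as f:
                kb = vmswap_kb(f.read())
        except OSError:
            continue  # process exited while we were scanning
        if kb > 0:
            usage[path.split('/')[2]] = kb
    return usage

# Parsing demo on a fabricated status excerpt:
sample = "Name:\tjava\nVmRSS:\t 2516480 kB\nVmSwap:\t 1048576 kB\n"
print(vmswap_kb(sample))  # → 1048576
```

Sorting `swap_by_process().items()` by value descending gives the same "who is the problem child" view the script produces.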
The only major thing I saw in your JVM config (aside from perhaps reducing the max heap) that I would change is the garbage collector tuning. I'm the original author mentioned in this wiki page: http://wiki.apache.org/solr/SolrPerformanceProblems#GC_pause_problems Here's a screenshot from my dev solr server, where you can see that there is zero swap usage: https://www.dropbox.com/s/mftgi3q2hn7w9qp/solr-centos6-top.png This is a baremetal server with 16GB of RAM, running CentOS 6.5 and a pre-release snapshot of Solr 4.7.1. With an Intel Xeon X3430, I'm pretty sure the processor architecture is NUMA, but the motherboard only has one CPU slot, so it's only got one NUMA node. As you can see by my virtual memory value, I have a lot more index data on this machine than you have on yours. My heap is 7GB. The other three java processes that you can see running are in-house software related to Solr. Performance is fairly slow with that much index and so little disk cache, but it's a dev server. The production environment has plenty of RAM to cache the entire index. Thanks, Shawn
Re: multivalued field using DIH
On 3/27/2014 12:49 AM, scallawa wrote: [snip] In a direct manner, I do not think so. If the input data were simply space-separated and didn't have the quoted string that includes a space, you could use the RegexTransformer in DIH and do a simple 'splitBy' on the field. If you know how to write a regex that would only match the spaces outside of the quotes, you could still use that method; I have no idea how to do that. Alternatively, you can write a custom update processor for Solr that knows how to break up the input, remove the original field, and reinsert it with the multiple values. Custom update processors are not very difficult if you already know how to write a program, but they're not trivial either. If the database actually has multiple values in a table rather than the space separation, there are two possibilities: 1) Use nested DIH entities, which makes a query to the database for every document. 2) Use a JOIN with GROUP_CONCAT to construct a value with a delimiter other than space - something that won't ever show up in the actual data. You can then use the splitBy method that I already mentioned. You'd need to consult a database expert for help with JOIN and GROUP_CONCAT. Thanks, Shawn
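A regex that only matches spaces outside quotes does exist; here is a minimal Python sketch (the pattern and helper are my own illustration, not anything DIH ships; the same pattern should also be valid as a Java regex for splitBy, though the surrounding quotes would then still need stripping in a transformer):

```python
import re

# Split on whitespace only when an even number of double quotes follows,
# i.e. when the whitespace sits outside any quoted phrase.
SPLIT_OUTSIDE_QUOTES = r'\s+(?=(?:[^"]*"[^"]*")*[^"]*$)'

def split_values(raw):
    parts = re.split(SPLIT_OUTSIDE_QUOTES, raw.strip())
    # Drop the surrounding quotes so "polyester blend" becomes one clean token.
    return [p.strip('"') for p in parts]

print(split_values('cotton "polyester blend" rayon'))
# → ['cotton', 'polyester blend', 'rayon']
```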
Re: FE Integration with JSON
On 3/27/2014 2:11 AM, Bernhard Prange wrote: I am looking for a simple solution to construct a frontend search. The search provider just gave me a JSON Url. Anybody has a simple guide or some snippets for that? There are no details here. What specifically do you need help with? Presumably you want help with Solr because you're on the solr-user mailing list, but the only technology you've mentioned is JSON. Let's say that you are wanting to add search to System X. The first question that comes to mind is: What programming language is System X written in? The answer will make a big difference in where the discussion goes. Thanks, Shawn
AW: Indexing parts of an HTML file differently
Thanks for your answer Jack.

@Gora: How are you fetching the HTML content, and indexing it into Solr?

We are using Solr with the OpenText Delivery Server. The Delivery Server generates HTML representations of the published pages and writes them to the directory which Solr uses to get the data content.

It is probably best to handle this requirement at that point. Haven't used Nutch (http://nutch.apache.org/) recently, but you might be able to use it for this.

Do you mean the web crawler approach? At first glance it does not fit us very well. In that case we would need to implement the OpenText search layer ourselves. Theoretically, we could try to teach the Delivery Server to understand external indexes. But crawling itself is not the preferred solution: it is not as responsive as the DS approach, and with the existing authorization restrictions we would need separate crawler users for every role, etc.

-----Original Message-----
From: Gora Mohanty [mailto:g...@mimirtech.com]
Sent: Tuesday, 25 March 2014 11:32
To: solr-user@lucene.apache.org
Subject: Re: Indexing parts of an HTML file differently

On 25 March 2014 15:59, Michael Clivot cli...@netmedia.de wrote: Hello, I have the following issue and need help: One HTML file has different parts for different countries. For example:

<!-- Country: FR, BE --> Address for France and Benelux <!-- Country End -->
<!-- Country: CH --> Address for Switzerland <!-- Country End -->

Depending on a parameter, I show or hide the parts on the website. Logically, all parts are in the index and therefore all items are found by Solr. My question is: how can I have only the items for the current country in my result list? How are you fetching the HTML content, and indexing it into Solr? It is probably best to handle this requirement at that point. Haven't used Nutch (http://nutch.apache.org/) recently, but you might be able to use it for this. Regards, Gora
Re: FE Integration with JSON
right :) Thanks Shawn. It is the frontend of a webpage (HTML5). The search provider offers me a URL where I get a Solr query result (in JSON). That's what I have. What I need is a how-to for the UI rendering of this result (and the search query functionality). The Solr server is at a remote location. On 27.03.2014 09:25, Shawn Heisey wrote: [snip]
Re: FE Integration with JSON
Still not enough details. But let me try to understand: there is a third-party provider. They are exposing Solr directly to the internet, and you have a particular query that returns Solr results in JSON form. You want to know if there are libraries/components that will know how to parse that Solr JSON result and present it on a screen. Is that about right? If so, there is one big issue to resolve before wasting time on anything else. Specifically, Solr should not be exposed directly to the web, as it is not built for security. Unless this third-party provider is specifically building some sort of hardened hosted-Solr service, in which case I am very curious to know who they are. Usually there is a middleware implementation that talks to Solr (like to a database) and then sends domain-specific results to the client. There is also the question of which features you are using. E.g. facets? Folding? Auto-complete? Etc. Regards, Alex. Personal website: http://www.outerthoughts.com/ Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency On Thu, Mar 27, 2014 at 3:39 PM, Bernhard Prange m...@bernhard-prange.de wrote: [snip]
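On the rendering half of the original question: the JSON that a Solr select handler returns has a predictable response/docs shape, so a first cut of UI code can be sketched quickly. A minimal Python illustration (the sample response is hand-written, and the field names `title` and `url` are assumptions about the provider's schema; in practice the JSON would be fetched through middleware rather than directly from Solr, per the security caveat above):

```python
import html

def render_results(response, fields=('title', 'url')):
    """Turn a parsed Solr JSON response into a minimal HTML result list."""
    docs = response['response']['docs']
    total = response['response']['numFound']
    items = []
    for doc in docs:
        cells = ' - '.join(html.escape(str(doc.get(f, ''))) for f in fields)
        items.append('<li>%s</li>' % cells)
    return '<p>%d results</p><ul>%s</ul>' % (total, ''.join(items))

sample = {'response': {'numFound': 1,
                       'docs': [{'title': 'Hello', 'url': 'http://example.com'}]}}
print(render_results(sample))
```

The same loop translates almost one-for-one into browser-side JavaScript if the page consumes the JSON URL directly.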
Re: Indexing parts of an HTML file differently
Can you get Delivery Server to generate Solr-style XML or JSON update file? Might be easier than generating and then re-parsing HTML? Regards, Alex. On Thu, Mar 27, 2014 at 3:28 PM, Michael Clivot cli...@netmedia.de wrote: [snip]
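One way to act on the advice above at indexing time is to split the page on those country comment markers before building the Solr documents. A rough sketch (the marker format follows the example in the thread; the parsing helper is my own illustration):

```python
import re

# Matches <!-- Country: FR, BE --> ... <!-- Country End --> sections.
SECTION = re.compile(
    r'<!--\s*Country:\s*([A-Z, ]+)\s*-->(.*?)<!--\s*Country End\s*-->',
    re.DOTALL)

def country_sections(html_text):
    """Yield (country_codes, section_text) pairs from the marked-up HTML."""
    for m in SECTION.finditer(html_text):
        codes = [c.strip() for c in m.group(1).split(',')]
        yield codes, m.group(2).strip()

page = ('<!-- Country: FR, BE --> Address for France and Benelux <!-- Country End -->'
        '<!-- Country: CH --> Address for Switzerland <!-- Country End -->')
for codes, text in country_sections(page):
    print(codes, text)
```

Each section could then be indexed as its own document with a multivalued country field, so the result list can be restricted with something like fq=country:FR.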
dih data-config.xml onImportEnd event
i would like to call a url after the import is finished, with the event <document onImportEnd="">. how can i do this?
Re: Facetting by field then query
I don't think you can do it, as pivot faceting (http://wiki.apache.org/solr/SimpleFacetParameters#Pivot_.28ie_Decision_Tree.29_Faceting) doesn't let you use facet queries. The closest query I can imagine is: - q=sentence:bar OR sentence:foo - facet=true - facet.pivot=media_id,sentence At least the q will make faceting consider only those documents containing foo and bar, but depending on the size of the sentence field you can get a huge response. Hope it helps. On Wed, Mar 26, 2014 at 11:12 PM, David Larochelle dlaroche...@cyber.law.harvard.edu wrote: I have the following schema:

<field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false" />
<field name="media_id" type="int" indexed="true" stored="true" required="false" multiValued="false" />
<field name="sentence" type="text_general" indexed="true" stored="true" termVectors="true" termPositions="true" termOffsets="true" />

I'd like to be able to facet by a field and then by queries, i.e.:

facet_fields: {media_id: [1:{ sentence:foo: 102410, sentence:bar: 29710} 2: { sentence:foo: 600, sentence:bar: 220} 3: { sentence:foo: 80, sentence:bar: 2330}]}

However, when I try: http://localhost:8983/solr/collection1/select?q=*:*&wt=json&indent=true&facet=true&facet.query=sentence%3Afoo&facet.query=sentence%3Abar&facet.field=media_id the facet counts for the queries and media_id are listed separately rather than hierarchically. I realize that I could use 2 separate requests and programmatically combine the results but would much prefer to use a single Solr request. Is there any way to do this in Solr? Thanks in advance, David
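Absent a single-request solution, the separate-requests approach is easy to combine client-side: issue one request per term (e.g. q=sentence:foo&facet=true&facet.field=media_id) and merge the facet_fields counts. A rough sketch (the responses below are hand-written in Solr's flat facet_fields shape, not captured from a live index):

```python
def facet_counts(response, field):
    """Convert Solr's flat [value, count, value, count, ...] list to a dict."""
    flat = response['facet_counts']['facet_fields'][field]
    return dict(zip(flat[::2], flat[1::2]))

def merge_by_media(per_term_responses, field='media_id'):
    """Combine one response per facet.query term into {media_id: {term: count}}."""
    merged = {}
    for term, resp in per_term_responses.items():
        for media_id, count in facet_counts(resp, field).items():
            merged.setdefault(media_id, {})[term] = count
    return merged

# One hand-written response per term, using the counts from the question:
foo = {'facet_counts': {'facet_fields': {'media_id': ['1', 102410, '2', 600, '3', 80]}}}
bar = {'facet_counts': {'facet_fields': {'media_id': ['1', 29710, '2', 220, '3', 2330]}}}
print(merge_by_media({'sentence:foo': foo, 'sentence:bar': bar}))
```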
Re: dih data-config.xml onImportEnd event
I don't think there is one like that. But you might be able to use a custom UpdateRequestProcessor? Or a postCommit hook in solrconfig.xml. Regards, Alex. On Thu, Mar 27, 2014 at 3:58 PM, Andreas Owen a.o...@gmx.net wrote: [snip]
Re: dih data-config.xml onImportEnd event
Hi Andreas, Here is a snippet you can use as a starting point:

import org.apache.solr.handler.dataimport.Context;
import org.apache.solr.handler.dataimport.EventListener;

public class MyEventListener implements EventListener {
    public void onEvent(Context ctx) {
        if (Context.DELTA_DUMP.equals(ctx.currentProcess())) {
            // do something, e.g. call a URL
        }
    }
}

http://wiki.apache.org/solr/DataImportHandler#EventListeners Ahmet On Thursday, March 27, 2014 11:08 AM, Alexandre Rafalovitch arafa...@gmail.com wrote: [snip]
Re: dih data-config.xml onImportEnd event
Oops. Ignore my email. I learnt something today that I have not seen anybody else use. Are there live open-source examples of the DIH EventListeners? Regards, Alex. On Thu, Mar 27, 2014 at 4:11 PM, Ahmet Arslan iori...@yahoo.com wrote: [snip]
Re: MergingSolrIndexes not supported by SolrCloud?why?
I tested with HDFS; it does not work. I tried: (1) **/indexDir=hdfs://ip/solr/sample/data/index (2) **/indexDir=/solr/sample/data/index - neither works well. I also tried: (3) **/srcCore=sample - does not work well either. Can you give me a working example? Thanks! When I insert data, index files appear in HDFS, so that part is OK, but mergeindexes does not work. Solr 4.4 and Cloudera HDFS.
facet doesnt display all possibilities after selecting one
when i select a facet in thema_f all the others in the group disappear, but the other facets keep their original counts. it seems like it should work. maybe the underscore is the wrong character for the separator?

example documents in index:

<doc>
  <arr name="thema_f"><str>1_Produkte</str></arr>
  <str name="id">dms:381</str>
</doc>
<doc>
  <arr name="thema_f"><str>1_Beratung</str><str>1_Beratung_Beratungsportal PK</str></arr>
  <str name="id">dms:2679</str>
</doc>
<doc>
  <arr name="thema_f"><str>1_Beratung</str><str>1_Beratung_Beratungsportal PK</str></arr>
  <str name="id">dms:190</str>
</doc>

solrconfig.xml:

<requestHandler name="/select2" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <int name="rows">10</int>
    <str name="defType">synonym_edismax</str>
    <str name="synonyms">true</str>
    <str name="qf">plain_text^10 editorschoice^200 title^20 h_*^14 tags^10 thema^15 inhaltstyp^6 breadcrumb^6 doctype^10 productsegment^5 productgroup^5 contentmanager^5 links^5 last_modified^5 url^5</str>
    <str name="bq">(expiration:[NOW TO *] OR (*:* -expiration:*))^6</str>
    <str name="bf">div(clicks,max(displays,1))^8</str> <!-- tested -->
    <str name="df">text</str>
    <str name="fl">*,path,score</str>
    <str name="wt">json</str>
    <str name="q.op">AND</str>
    <!-- Highlighting defaults -->
    <str name="hl">on</str>
    <str name="hl.fl">plain_text,title</str>
    <str name="hl.fragSize">200</str>
    <str name="hl.simple.pre">&lt;b&gt;</str>
    <str name="hl.simple.post">&lt;/b&gt;</str>
    <!-- <lst name="invariants"> -->
    <str name="facet">on</str>
    <str name="facet.mincount">1</str>
    <str name="facet.missing">false</str>
    <str name="facet.field">{!ex=inhaltstyp_s}inhaltstyp_s</str>
    <str name="f.inhaltstyp_s.facet.sort">index</str>
    <str name="facet.field">{!ex=doctype}doctype</str>
    <str name="f.doctype.facet.sort">index</str>
    <str name="facet.field">{!ex=thema_f}thema_f</str>
    <str name="f.thema_f.facet.sort">index</str>
    <str name="facet.field">{!ex=productsegment_f}productsegment_f</str>
    <str name="f.productsegment_f.facet.sort">index</str>
    <str name="facet.field">{!ex=productgroup_f}productgroup_f</str>
    <str name="f.productgroup_f.facet.sort">index</str>
    <str name="facet.field">{!ex=author_s}author_s</str>
    <str name="f.author_s.facet.sort">index</str>
    <str name="facet.field">{!ex=sachverstaendiger_s}sachverstaendiger_s</str>
    <str name="f.sachverstaendiger_s.facet.sort">index</str>
    <str name="facet.field">{!ex=veranstaltung_s}veranstaltung_s</str>
    <str name="f.veranstaltung_s.facet.sort">index</str>
    <str name="facet.field">{!ex=kundensegment_aktive_beratung}kundensegment_aktive_beratung</str>
    <str name="f.kundensegment_aktive_beratung.facet.sort">index</str>
    <str name="facet.date">{!ex=last_modified}last_modified</str>
    <str name="facet.date.gap">+1MONTH</str>
    <str name="facet.date.end">NOW/MONTH+1MONTH</str>
    <str name="facet.date.start">NOW/MONTH-36MONTHS</str>
    <str name="facet.date.other">after</str>
  </lst>
</requestHandler>

schema.xml:

<fieldType name="text_thema" class="solr.TextField" positionIncrementGap="100">
  <!-- <analyzer><tokenizer class="solr.PatternTokenizerFactory" pattern="_"/></analyzer> -->
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
  </analyzer>
</fieldType>
dih data-config.xml onImportEnd event
i would like to call a url after the import is finished, with the event <document onImportEnd="">. how can i do this?
Re: Facetting by field then query
For pivot facets in SolrCloud, see https://issues.apache.org/jira/browse/SOLR-2894 - Resolution: Unresolved, Fix Version/s: 4.8. I am waiting patiently ... On 03/27/2014 05:04 AM, Alvaro Cabrerizo wrote: [snip]
Re: dih data-config.xml onImportEnd event
I would suggest you read the replies to your last mail (containing the very same question) first? -Stefan On Thursday, March 27, 2014 at 1:56 PM, Andreas Owen wrote: [snip]
Re: facet doesnt display all possibilities after selecting one
On Thu, Mar 27, 2014 at 8:56 AM, Andreas Owen ao...@swissonline.ch wrote:

when i select a facet in thema_f all the others in the group disappear

OK, I see you're excluding filters tagged with thema_f when faceting on the thema_f field:

<str name="facet.field">{!ex=thema_f}thema_f</str>

Now all you should need to do is tag the right filter with that when you select the facet:

fq={!tag=thema_f}thema_f:1_Beratung

http://wiki.apache.org/solr/SimpleFacetParameters#Multi-Select_Faceting_and_LocalParams

-Yonik http://heliosearch.org - solve Solr GC pauses with off-heap filters and fieldcache

[rest of quoted message snipped]
Block until replication finishes
Hi, we are moving to native replication with Solr 3.5.1. Because we want to control the replication from another program (a cron job), we decided to curl the slave to issue a fetchIndex command. The problem we have is that the curl returns immediately, while the replication still goes on in the background. We need to know when the replication is done, and then resume the cron job. Is there a way to block on the replication call until it's done, similar to waitSearcher=true when committing? If not, what other possibilities do we have? Just in case, here is the solrconfig part in the slave (we pass masterUrl in the curl url):

<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <str name="masterUrl"></str>
  </lst>
</requestHandler>

Many thanks in advance -- Fermin Silva
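One common workaround is to poll the slave's replication details until it reports that no fetch is in progress. A hedged sketch (the JSON paths follow the usual /replication?command=details response shape, which can differ across Solr versions, so treat the field names as assumptions):

```python
import json
import time
import urllib.request

def is_replicating(details):
    """Inspect a parsed command=details response; field names are assumptions."""
    slave = details.get('details', {}).get('slave', {})
    return slave.get('isReplicating', 'false') == 'true'

def wait_for_replication(base_url, interval=5, timeout=600):
    """Poll the slave until the fetchIndex finishes or we give up."""
    deadline = time.time() + timeout
    url = base_url + '/replication?command=details&wt=json'
    while time.time() < deadline:
        with urllib.request.urlopen(url) as resp:
            details = json.load(resp)
        if not is_replicating(details):
            return True
        time.sleep(interval)
    return False

# Parsing demo on a fabricated response:
sample = {'details': {'slave': {'isReplicating': 'true'}}}
print(is_replicating(sample))  # → True
```

The cron job would call wait_for_replication right after issuing the fetchIndex curl, and only resume once it returns True.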
Please remove this thread.
Hello Admin, Can you please remove this thread http://comments.gmane.org/gmane.comp.jakarta.lucene.solr.user/93279 There is no reason to have this thread live. Please and thank you. Baruch!
Logging which client connected to Solr
Hello, I’m investigating the possibility of logging the username of the client who did the search on Solr along with the normal logging information. The username is in the basic auth headers of the request, and the access control is managed by an Apache instance proxying to Solr. Is there a way to append that information to the Solr query log, so that the log would look like this: INFO - 2014-03-27 11:16:24.000; org.apache.solr.core.SolrCore; [generic] webapp=/solr path=/select params={lots of params} hits=0 status=0 QTime=49 username=juha I need to log both username and the query, and if I do it directly in Apache then I lose the information about amount of hits and the query time. If I log it with Solr then I get query time and hits, but no username. Username logging is higher priority requirement than the hits and query time, but I’m looking for solution that covers both cases. Has anyone implemented this kind of logging scheme, and how would I accomplish this? I couldn’t find this as a configuration option. Regards, Juha
Re: Logging which client connected to Solr
We do something similar and include the server's hostname in solr's response. To accomplish this you'll have to write a class that extends org.apache.solr.servlet.SolrDispatchFilter and put your custom class in place as the SolrRequestFilter in solr's web.xml. Thanks, Greg On Mar 27, 2014, at 8:59 AM, Juha Haaga juha.ha...@codenomicon.com wrote: [snip]
[ANN] Solr in Action book release (Solr 4.7)
I'm excited to announce the final print release of *Solr in Action*, the newest Solr book by Manning Publications, covering through Solr 4.7 (the current version). The book is available for immediate purchase in print and ebook formats, and the outline, some free chapters, and the full source code are also available at http://solrinaction.com. I would love it if you would check the book out, and I would also appreciate your feedback on it, especially if you find the book to be a useful guide as you are working with Solr! Timothy Potter and I (Trey Grainger) worked tirelessly on the book for nearly 2 years to bring you a thorough (664 pg.) and fantastic example-driven guide to the best Solr has to offer. *Solr in Action* is intentionally designed to be a learning guide as opposed to a reference manual. It builds from an initial introduction to Solr all the way to advanced topics such as implementing a predictive search experience, writing your own Solr plugins for function queries and multilingual text analysis, using Solr for big data analytics, and even building your own Solr-based recommendation engine. The book uses fun real-world examples, including analyzing the text of tweets, searching and faceting on restaurants, grouping similar items in an ecommerce application, highlighting interesting keywords in UFO sighting reports, and even building a personalized job search experience. For a more detailed write-up about the book and its contents, you can also visit the Solr homepage at https://lucene.apache.org/solr/books.html#solr-in-action. Thanks in advance for checking it out, and I really hope many of you find the book to be personally useful! All the best, Trey Grainger, Co-author, *Solr in Action*, Director of Engineering, Search Analytics @CareerBuilder
Re: Logging which client connected to Solr
You could always just pass the username as part of the GET params for the query. Solr will faithfully ignore and log any parameters it doesn’t recognize, so it’d show up in your {lots of params}. That means your log parser would need more intelligence, and your client would have to pass in the data, but it would save any custom work on the server side. On 3/27/14, 7:07 AM, Greg Walters greg.walt...@answers.com wrote: We do something similar and include the server's hostname in solr's response. To accomplish this you'll have to write a class that extends org.apache.solr.servlet.SolrDispatchFilter and put your custom class in place as the SolrRequestFilter in solr's web.xml. Thanks, Greg On Mar 27, 2014, at 8:59 AM, Juha Haaga juha.ha...@codenomicon.com wrote: Hello, I’m investigating the possibility of logging the username of the client who did the search on Solr along with the normal logging information. The username is in the basic auth headers of the request, and the access control is managed by an Apache instance proxying to Solr. Is there a way to append that information to the Solr query log, so that the log would look like this: INFO - 2014-03-27 11:16:24.000; org.apache.solr.core.SolrCore; [generic] webapp=/solr path=/select params={lots of params} hits=0 status=0 QTime=49 username=juha I need to log both the username and the query, and if I do it directly in Apache then I lose the information about the number of hits and the query time. If I log it with Solr then I get the query time and hits, but no username. Username logging is a higher-priority requirement than the hits and query time, but I’m looking for a solution that covers both cases. Has anyone implemented this kind of logging scheme, and how would I accomplish this? I couldn’t find this as a configuration option. Regards, Juha
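If the username is echoed back inside params={...} as suggested above, the extra log-parser intelligence is small. A minimal sketch in Python (the log-line format mirrors the example in this thread; the regex and function names are my own, not anything shipped with Solr):

```python
import re

# Pull the echoed "username" parameter, hit count, and query time back
# out of a Solr query log line of the form shown in the thread.
LOG_RE = re.compile(
    r"params=\{(?P<params>[^}]*)\}.*?hits=(?P<hits>\d+).*?QTime=(?P<qtime>\d+)"
)

def parse_query_log(line):
    m = LOG_RE.search(line)
    if not m:
        return None
    # params is a &-separated key=value list inside the braces
    params = dict(p.split("=", 1) for p in m.group("params").split("&") if "=" in p)
    return {
        "username": params.get("username"),
        "hits": int(m.group("hits")),
        "qtime": int(m.group("qtime")),
    }

line = ("INFO - 2014-03-27 11:16:24.000; org.apache.solr.core.SolrCore; "
        "[generic] webapp=/solr path=/select params={q=*:*&username=juha} "
        "hits=0 status=0 QTime=49")
# parse_query_log(line) -> {'username': 'juha', 'hits': 0, 'qtime': 49}
```

This keeps the Apache side untouched: the client adds `username=...` to the query string, and a small offline script correlates it with hits and QTime.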
Timeout when deleting collections or aliases in Solr 4.6.1
I'm trying to delete some data on a 12 node Solr cloud environment. The cluster is running Solr 4.6.1. When I try to delete an alias the collections api returns: org.apache.solr.common.SolrException: deletealias the collection time out:60s at org.apache.solr.handler.admin.CollectionsHandler.handleResponse(CollectionsHandler.java:204) at org.apache.solr.handler.admin.CollectionsHandler.handleResponse(CollectionsHandler.java:185) at org.apache.solr.handler.admin.CollectionsHandler.handleDeleteAliasAction(CollectionsHandler.java:274) at org.apache.solr.handler.admin.CollectionsHandler.handleRequestBody(CollectionsHandler.java:154) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135) at org.apache.solr.servlet.SolrDispatchFilter.handleAdminRequest(SolrDispatchFilter.java:673) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:261) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:201) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419) at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137) at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:533) at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231) at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075) at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384) at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193) at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135) at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255) at 
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154) at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116) at org.eclipse.jetty.server.Server.handle(Server.java:368) at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489) at org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53) at org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:942) at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1004) at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:640) at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235) at org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72) at org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264) at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608) at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543) at java.lang.Thread.run(Thread.java:744) It doesn't seem to matter which server in the cluster I run this on. There's no data being inserted and no queries being performed. I have the same problem whether I attempt to remove a collection or a collection alias. How do I find the source of this problem? Many Thanks, -D
Re: [ANN] Solr in Action book release (Solr 4.7)
Nice, Congrats! -- Mark Miller about.me/markrmiller On March 27, 2014 at 11:17:49 AM, Trey Grainger (solrt...@gmail.com) wrote: I'm excited to announce the final print release of *Solr in Action*, the newest Solr book by Manning Publications, covering through Solr 4.7 (the current version). The book is available for immediate purchase in print and ebook formats, and the *outline*, some *free chapters* as well as the *full source code are also available* at http://solrinaction.com. I would love it if you would check the book out, and I would also appreciate your feedback on it, especially if you find the book to be a useful guide as you are working with Solr! Timothy Potter and I (Trey Grainger) worked tirelessly on the book for nearly 2 years to bring you a thorough (664 pg.) and fantastic example-driven guide to the best Solr has to offer. *Solr in Action* is intentionally designed to be a learning guide as opposed to a reference manual. It builds from an initial introduction to Solr all the way to advanced topics such as implementing a predictive search experience, writing your own Solr plugins for function queries and multilingual text analysis, using Solr for big data analytics, and even building your own Solr-based recommendation engine. The book uses fun real-world examples, including analyzing the text of tweets, searching and faceting on restaurants, grouping similar items in an ecommerce application, highlighting interesting keywords in UFO sighting reports, and even building a personalized job search experience. For a more detailed write-up about the book and its contents, you can also visit the Solr homepage at https://lucene.apache.org/solr/books.html#solr-in-action. Thanks in advance for checking it out, and I really hope many of you find the book to be personally useful! All the best, Trey Grainger Co-author, *Solr in Action* / Director of Engineering, Search Analytics @CareerBuilder
Re: Please remove this thread.
On 3/27/2014 7:37 AM, Baruch wrote: Can you please remove this thread http://comments.gmane.org/gmane.comp.jakarta.lucene.solr.user/93279 There is no reason to have this thread live. This is an Apache mailing list. Apache almost never honors requests to remove anything from its mailing list archive. http://www.apache.org/foundation/public-archives.html The URL that you used to indicate what to remove illustrates one of the main reasons why Apache's policy exists: You linked to gmane.org, a site that Apache does not control. This list is also mirrored in the nabble.com forums, and several other places. Apache cannot make changes to any of them except its own archive at http://mail-archives.apache.org/ . Thanks, Shawn
WordDelimiterFilterFactory splits up hyphenated terms although splitOnNumerics, generateWordParts and generateNumberParts are set to 0 (false)
I am using Solr 4.7 and have got a serious problem with WordDelimiterFilterFactory. WordDelimiterFilterFactory behaves differently on hyphenated terms depending on whether they contain only characters (a-Z) or characters AND numbers. Splitting up hyphenated terms is deactivated in my configuration. *This is the fieldType setup from my schema:* {code}
<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_de.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory" stemEnglishPossessive="0" generateWordParts="0" generateNumberParts="0" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="0" splitOnNumerics="0" preserveOriginal="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="lang/synonyms_de.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_de.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="0" generateNumberParts="0" catenateWords="1" catenateNumbers="0" catenateAll="0" splitOnCaseChange="0" splitOnNumerics="0" preserveOriginal="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
{code} The given search term is: *X-002-99-495* WordDelimiterFilterFactory indexes the following word parts: * X-002-99-495 * X (shouldn't be there) * 00299495 (shouldn't be there) * X00299495 But the 'X' should not be indexed or queried as a single term. You can see that splitting is completely deactivated in the schema.
I can move the character part around in the search term: Searching for *002-abc-99-495* gives me * 002-abc-99-495 * 002 (shouldn't be there) * abc (shouldn't be there) * 99495 (shouldn't be there) * 002abc99495 Searching for *002-99-495* (no character part) gives me * 002-99-495 * 00299495 This result is what I would expect. Any ideas?
Re: Logging which client connected to Solr
I assume you are passing extra info to Solr. Then you can write a servlet filter to put it in the NDC or MDC, which can then be picked up by the log4j config pattern. This approach is not Solr specific. Just usual servlet/log stuff. Regards, Alex On 27/03/2014 9:00 pm, Juha Haaga juha.ha...@codenomicon.com wrote: Hello, I’m investigating the possibility of logging the username of the client who did the search on Solr along with the normal logging information. The username is in the basic auth headers of the request, and the access control is managed by an Apache instance proxying to Solr. Is there a way to append that information to the Solr query log, so that the log would look like this: INFO - 2014-03-27 11:16:24.000; org.apache.solr.core.SolrCore; [generic] webapp=/solr path=/select params={lots of params} hits=0 status=0 QTime=49 username=juha I need to log both the username and the query, and if I do it directly in Apache then I lose the information about the number of hits and the query time. If I log it with Solr then I get the query time and hits, but no username. Username logging is a higher-priority requirement than the hits and query time, but I’m looking for a solution that covers both cases. Has anyone implemented this kind of logging scheme, and how would I accomplish this? I couldn’t find this as a configuration option. Regards, Juha
timeAllowed query parameter not working?
Hi Solr users, currently I have some really long-running, user-entered pure wildcard queries (like *??); these are hogging the CPU for several minutes. So what I tried is setting the timeAllowed query parameter via the search handler in solrconfig.xml. But without any luck, the parameter does not seem to be working. Here is my search handler definition:
<requestHandler name="/select" class="solr.SearchHandler" default="true">
  <lst name="defaults">
    <int name="rows">10</int>
    <str name="df">TEXT</str>
    <int name="timeAllowed">1</int>
  </lst>
</requestHandler>
Thanks for your help! Leander
Re: Block until replication finishes
Hi You can use the details command to check the status of replication. http://localhost:8983/solr/core_name/replication?command=details The command returns an XML output; look out for the isReplicating field in the output. Keep running the command in a loop until the flag becomes false. That's when you know it's done. I would also recommend checking the # of docs in the output at source/destination after the replication, to be sure. HTH On Thu, Mar 27, 2014 at 6:35 AM, Fermin Silva ferm...@olx.com wrote: Hi, we are moving to native replication with SOLR 3.5.1. Because we want to control the replication from another program (a cron job), we decided to curl the slave to issue a fetchIndex command. The problem we have is that the curl returns immediately, while the replication still goes on in the background. We need to know when the replication is done, and then resume the cron job. Is there a way to block on the replication call until it's done, similar to waitForSearcher=true when committing? If not, what other possibilities do we have? Just in case, here is the solrconfig part in the slave (we pass masterUrl in the curl url):
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <str name="masterUrl"></str>
  </lst>
</requestHandler>
Many thanks in advance -- Fermin Silva -- Best -- C
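The poll-until-false loop described above can be sketched as follows (Python used for illustration; the details URL and the isReplicating field come from the reply, everything else is an assumption). The HTTP fetch is passed in as a callable so the loop can be exercised without a live Solr:

```python
import time
import xml.etree.ElementTree as ET

def wait_for_replication(fetch_details, poll_seconds=5, timeout=3600):
    """Poll the /replication?command=details response until the
    isReplicating flag goes false. `fetch_details` is any callable
    returning the XML body, e.g. a urllib call against
    http://slave:8983/solr/core_name/replication?command=details.
    Returns True when replication finished, False on timeout."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        root = ET.fromstring(fetch_details())
        flag = root.find(".//str[@name='isReplicating']")
        # missing flag is treated as "not replicating"
        if flag is None or flag.text.lower() == "false":
            return True
        time.sleep(poll_seconds)
    return False
```

In the cron job, `fetch_details` would be something like `lambda: urllib.request.urlopen(details_url).read()`, and the job resumes only after the function returns True.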
Re: New to Solr can someone help me to know if Solr fits my use case
Can anyone help me please? Hi All, I am new to Solr and from initial reading I am quite convinced Solr will be of great help. Can anyone help in making that decision. Use case: 1. I will have PDF/Word docs generated daily/weekly (a lot of them) which kind of get overwritten frequently. 2. I have a dictionary kind of thing (a list of which words/small sentences should be part of the above docs, words which cannot be, and alternatives for some). 3. Now I want Solr to search my docs produced in step 1 for the words/small sentences from step 2 and give me the doc name/line no in which they exist. Will Solr be a good help to me? If anybody can help by giving some examples that will be great. Appreciate your help and patience. Thanks Saurabh
Re: [ANN] Solr in Action book release (Solr 4.7)
Hi Philippe, Yes if you've purchased the eBook then the PDF is available now and the other formats (ePub and Kindle) are supposed to be available for download on April 8th. It's also worth mentioning that the eBook formats are all available for free with the purchase of the print book. Best regards, Trey Grainger Co-author, Solr in Action Director of Engineering, Search Analytics @CareerBuilder On Thu, Mar 27, 2014 at 12:04 PM, Philippe Soares soa...@genomequest.com wrote: Thanks Trey ! I just tried to download my copy from my manning account, and this final version appears only in PDF format. Any idea about when they'll release the other formats ?
Re: Searching multivalue fields.
Sounds good... for Lucene users, but for Solr users... sounds like a Jira is needed. -- Jack Krupansky -Original Message- From: Ahmet Arslan Sent: Wednesday, March 26, 2014 4:54 PM To: solr-user@lucene.apache.org ; kokatnur.vi...@gmail.com Subject: Re: Searching multivalue fields. Hi Vijay, After reading the documentation it seems that the following query is what you are after. It will return OrderId:345 without matching OrderId:123:
SpanQuery q1 = new SpanTermQuery(new Term("BookingRecordId", "234"));
SpanQuery q2 = new SpanTermQuery(new Term("OrderLineType", "11"));
SpanQuery q2m = new FieldMaskingSpanQuery(q2, "BookingRecordId");
Query q = new SpanNearQuery(new SpanQuery[]{q1, q2m}, -1, false);
Ahmet On Wednesday, March 26, 2014 10:39 PM, Ahmet Arslan iori...@yahoo.com wrote: Hi Vijay, I personally don't understand joins very well. Just a guess: maybe FieldMaskingSpanQuery could be used? http://blog.griddynamics.com/2011/07/solr-experience-search-parent-child.html Ahmet On Wednesday, March 26, 2014 9:46 PM, Vijay Kokatnur kokatnur.vi...@gmail.com wrote: Hi, I am bumping this thread again one last time to see if anyone has a solution. In its current state, our application is storing child items as multivalue fields. Consider some orders, for example - { OrderId:123 BookingRecordId : [145, 987, *234*] OrderLineType : [11, 12, *13*] . } { OrderId:345 BookingRecordId : [945, 882, *234*] OrderLineType : [1, 12, *11*] . } { OrderId:678 BookingRecordId : [444] OrderLineType : [11] . } Here, if you look up an order with BookingRecordId:234 AND OrderLineType:11, you will get two orders, with orderId 123 and 345, which is correct. You have two arrays in both the orders that satisfy this condition. However, for OrderId:123, the value at the 3rd index of the OrderLineType array is 13 and not 11 (it is 11 for OrderId:345). So orderId 123 should be excluded. This is what I am trying to achieve.
I got some suggestions from a solr-user to use FieldCollapsing, Join, Block-join or string concatenation. None of these approaches can be used without schema changes and re-indexing. Has anyone found a non-invasive solution for this? Thanks, -Vijay
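For anyone stuck without the option to re-index, the alignment semantics Vijay describes can at least be verified client-side by post-filtering the documents Solr returns. A rough sketch (Python; the data is copied from the orders in the thread, the function name is mine):

```python
def aligned_match(order, criteria):
    """Return True only if some single position i satisfies every
    field=value pair at once -- the positional alignment across
    parallel multivalued fields that a plain AND over multivalued
    fields does not enforce. This is client-side post-filtering,
    not a Solr feature."""
    fields = list(criteria)
    positions = range(len(order[fields[0]]))
    return any(
        all(order[f][i] == v for f, v in criteria.items())
        for i in positions
    )

order_123 = {"BookingRecordId": [145, 987, 234], "OrderLineType": [11, 12, 13]}
order_345 = {"BookingRecordId": [945, 882, 234], "OrderLineType": [1, 12, 11]}

# aligned_match(order_123, {"BookingRecordId": 234, "OrderLineType": 11}) -> False
# aligned_match(order_345, {"BookingRecordId": 234, "OrderLineType": 11}) -> True
```

Query Solr with the plain AND (which over-matches), then drop the hits where the per-index check fails; orderId 123 is correctly excluded.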
What are my options?
We have a collection named items. These are simply products that we sell. A large part of our scoring involves boosting on certain metrics for each product (amount sold, total GMS, ratings, etc). Some of these metrics are actually split across multiple tables. We are currently re-indexing the complete document anytime any of these values changes. I'm wondering if there is a better way? Some ideas: 1) Partial update the document. Is this even possible? 2) Add a parent-child relationship on Item and its metrics? 3) Dump all metrics to a file and use that as it changes throughout the day? I forgot the actual component that does it. Either way, can it handle multiple values? 4) Something else? I appreciate any feedback. Thanks
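On option 1: partial (atomic) updates are possible in Solr 4.x, with the caveats that the update log must be enabled and all fields must be stored, because Solr reconstructs the unchanged fields internally. A sketch of building the request body (Python for illustration; field names like amount_sold and rating are hypothetical stand-ins for the metrics described above):

```python
import json

def atomic_update_payload(doc_id, **changes):
    """Build the JSON body for a Solr atomic ("partial") update that
    sets only the given fields, leaving the rest of the stored
    document untouched. Requires <updateLog/> in solrconfig.xml and
    stored="true" on all fields."""
    doc = {"id": doc_id}
    for field, value in changes.items():
        doc[field] = {"set": value}  # "set" replaces; "inc" would increment
    return json.dumps([doc])

payload = atomic_update_payload("item-42", amount_sold=1042, rating=4.8)
# POST payload to /solr/<collection>/update with
# Content-Type: application/json
```

This avoids re-fetching all the metric tables just to rebuild the whole document every time one value changes.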
Re: What are my options?
Consider DataStax Enterprise - a true real-time database with rich search (Cassandra plus Solr). -- Jack Krupansky -Original Message- From: Software Dev Sent: Thursday, March 27, 2014 1:11 PM To: solr-user@lucene.apache.org Subject: What are my options? We have a collection named items. These are simply products that we sell. A large part of our scoring involves boosting on certain metrics for each product (amount sold, total GMS, ratings, etc). Some of these metrics are actually split across multiple tables. We are currently re-indexing the complete document anytime any of these values changes. I'm wondering if there is a better way? Some ideas: 1) Partial update the document. Is this even possible? 2) Add a parent-child relationship on Item and its metrics? 3) Dump all metrics to a file and use that as it changes throughout the day? I forgot the actual component that does it. Either way, can it handle multiple values? 4) Something else? I appreciate any feedback. Thanks
Re: stored=true vs stored=false, in terms of storage
You can consider DocValues as well. There you can control whether they ever use heap memory or only file space. See: https://cwiki.apache.org/confluence/display/solr/DocValues -- Jack Krupansky -Original Message- From: Pramod Negi Sent: Wednesday, March 26, 2014 1:27 PM To: solr-user@lucene.apache.org Subject: stored=true vs stored=false, in terms of storage Hi, I am using Solr and I have one doubt. If any field has stored=false, does it mean that this field is stored on disk and not in main memory, and will be loaded whenever asked? The scenario I would like to handle: in my case there is a lot of information which I need to show when debugQuery=true, so I can take the latency hit on debugQuery=true. Can I save all the information in a field with indexed=false and stored=true? And how is debug information normally saved? Regards, Pramod Negi
RE: Solr 4.3.1 memory swapping
Thanks for the advice Shawn - gives me a direction to head. My next step is probably to update the operating system and the JVM to see if the behavior changes. If not, I'll pull in Red Hat support. Thanks, Darrell -Original Message- From: Shawn Heisey [mailto:s...@elyograg.org] Sent: Thursday, March 27, 2014 2:59 AM To: solr-user@lucene.apache.org Subject: Re: Solr 4.3.1 memory swapping On 3/26/2014 10:26 PM, Darrell Burgan wrote: Okay well it didn't take long for the swapping to start happening on one of our nodes. Here is a screen shot of the Solr console: https://s3-us-west-2.amazonaws.com/panswers-darrell/solr.png And here is a shot of top, with processes sorted by VIRT: https://s3-us-west-2.amazonaws.com/panswers-darrell/top.png As shown, we have used up more than 25% of the swap space, over 1GB, even though there is 16GB of OS RAM available, and the Solr JVM has been allocated only 10GB. Further, we're only consuming 1.5/4GB of the 10GB of JVM heap. Top shows that the Solr process 21582 is using 2.4GB resident but has a virtual size of 82.4GB. Presumably that virtual size is due to the memory mapped file. The other Java process 27619 is Zookeeper. So my question remains - why did we use any swap space at all? Doesn't seem like we're experiencing memory pressure at the moment ... I'm confused. :-) The virtual memory value is indeed that large because of the mmapped file. There is definitely something wrong here. I don't know whether it's Java, RHEL, or something strange with the S3 virtual machine, possibly a bad interaction with the older kernel. With your -Xmx value, Java should never use more than about 10.5 GB of physical memory, and the top output indicates that it's only using 2.4GB of memory. 13GB is used by the OS disk cache. You might notice that I'm not mentioning Solr in the list of possible problems. This is because an unmodified Solr install only utilizes the Java heap, so it's Java that is in charge of allocating memory from the operating system. 
Here is a script that will tell you what's using swap and how much. This will let you be absolutely sure about whether or not Java is the problem child: http://stackoverflow.com/a/7180078/2665648 There are instructions in the comments of the script for sorting the output. The only major thing I saw in your JVM config (aside from perhaps reducing the max heap) that I would change is the garbage collector tuning. I'm the original author mentioned in this wiki page: http://wiki.apache.org/solr/SolrPerformanceProblems#GC_pause_problems Here's a screenshot from my dev solr server, where you can see that there is zero swap usage: https://www.dropbox.com/s/mftgi3q2hn7w9qp/solr-centos6-top.png This is a baremetal server with 16GB of RAM, running CentOS 6.5 and a pre-release snapshot of Solr 4.7.1. With an Intel Xeon X3430, I'm pretty sure the processor architecture is NUMA, but the motherboard only has one CPU slot, so it's only got one NUMA node. As you can see by my virtual memory value, I have a lot more index data on this machine than you have on yours. My heap is 7GB. The other three java processes that you can see running are in-house software related to Solr. Performance is fairly slow with that much index and so little disk cache, but it's a dev server. The production environment has plenty of RAM to cache the entire index. Thanks, Shawn
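The linked script essentially walks /proc and totals per-process swap; the core of it is a one-line parse of each process's status file. A sketch of that piece (Python; Linux-specific, and the VmSwap field is only reported by reasonably recent kernels — the PID below is just the one from the thread):

```python
import re

def vmswap_kb(status_text):
    """Parse the VmSwap line out of the contents of a Linux
    /proc/<pid>/status file. Returns kilobytes of swap used by that
    process, or 0 if the kernel doesn't report the field."""
    m = re.search(r"^VmSwap:\s*(\d+)\s*kB", status_text, re.MULTILINE)
    return int(m.group(1)) if m else 0

sample = "Name:\tjava\nPid:\t21582\nVmSwap:\t1048576 kB\n"
# vmswap_kb(sample) -> 1048576  (i.e. 1 GB of swap for this process)
```

Reading /proc/<pid>/status for every PID and sorting by this value answers the "who owns the swap" question directly, without guessing from top's VIRT column.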
Re: dih data-config.xml onImportEnd event
Sorry, the previous conversation was started from a wrong email address. On Thu, 27 Mar 2014 14:06:57 +0100, Stefan Matheis matheis.ste...@gmail.com wrote: I would suggest you read the replies to your last mail (containing the very same question) first? -Stefan On Thursday, March 27, 2014 at 1:56 PM, Andreas Owen wrote: I would like to call a URL after the import is finished, with the event attribute <document onImportEnd="">. How can I do this? -- Using Opera's mail client: http://www.opera.com/mail/
RE: timeAllowed query parameter not working?
Unfortunately the timeAllowed parameter doesn't apply to the part of the processing that makes wildcard queries so slow. It only applies to a later part of the processing, when the matching documents are being collected. There's some discussion in the original ticket that implemented this (https://issues.apache.org/jira/browse/SOLR-502). I'm not sure if there's a newer ticket for implementing an end-to-end timeout. -Michael -Original Message- From: Mario-Leander Reimer [mailto:mario-leander.rei...@qaware.de] Sent: Thursday, March 27, 2014 12:15 PM To: solr-user@lucene.apache.org Subject: timeAllowed query parameter not working? Hi Solr users, currently I have some really long-running, user-entered pure wildcard queries (like *??); these are hogging the CPU for several minutes. So what I tried is setting the timeAllowed query parameter via the search handler in solrconfig.xml. But without any luck, the parameter does not seem to be working. Here is my search handler definition:
<requestHandler name="/select" class="solr.SearchHandler" default="true">
  <lst name="defaults">
    <int name="rows">10</int>
    <str name="df">TEXT</str>
    <int name="timeAllowed">1</int>
  </lst>
</requestHandler>
Thanks for your help! Leander
Stats Filter Exclusion Throwing Error
I'm using the latest nightly build of 4.8 and testing this patch: https://issues.apache.org/jira/browse/SOLR-3177 using this set of fq / stats.field query params: fq={!tag=INTEGER_4}INTEGER_4:(2)&stats.field={!ex=INTEGER_4}INTEGER_4 with Solr throwing the following error: ERROR - 2014-03-27 16:13:12.164; org.apache.solr.common.SolrException; org.apache.solr.common.SolrException: undefined field: {!ex=INTEGER_4}INTEGER_4 at org.apache.solr.schema.IndexSchema.getField(IndexSchema.java:1172) at org.apache.solr.handler.component.StatsInfo.parse(StatsComponent.java:190) at org.apache.solr.handler.component.StatsComponent.modifyRequest(StatsComponent.java:97) at org.apache.solr.handler.component.ResponseBuilder.addRequest(ResponseBuilder.java:147) at org.apache.solr.handler.component.QueryComponent.createMainQuery(QueryComponent.java:816) at org.apache.solr.handler.component.QueryComponent.regularDistributedProcess(QueryComponent.java:649) at org.apache.solr.handler.component.QueryComponent.distributedProcess(QueryComponent.java:602) at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:253) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1939) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:780) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:427) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:217) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210) at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:224) at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:169) at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:168) at 
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:98) at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118) at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:407) at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:987) at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:579) at org.apache.tomcat.util.net.AprEndpoint$SocketProcessor.run(AprEndpoint.java:1805) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:724) Have I got the syntax wrong?
SOLR Cloud 4.6 - PERFORMANCE WARNING: Overlapping onDeckSearchers=2
All, I am running SOLR Cloud 4.6, and everything looks OK except for this warn message constantly in the logs. 2014-03-27 17:09:03,982 WARN [commitScheduler-15-thread-1] [] SolrCore - [index_shard16_replica1] PERFORMANCE WARNING: Overlapping onDeckSearchers=2 2014-03-27 17:09:05,517 WARN [commitScheduler-15-thread-1] [] SolrCore - [index_shard16_replica1] PERFORMANCE WARNING: Overlapping onDeckSearchers=2 2014-03-27 17:09:06,774 WARN [commitScheduler-15-thread-1] [] SolrCore - [index_shard16_replica1] PERFORMANCE WARNING: Overlapping onDeckSearchers=2 2014-03-27 17:09:08,085 WARN [commitScheduler-15-thread-1] [] SolrCore - [index_shard16_replica1] PERFORMANCE WARNING: Overlapping onDeckSearchers=2 2014-03-27 17:09:09,114 WARN [commitScheduler-15-thread-1] [] SolrCore - [index_shard16_replica1] PERFORMANCE WARNING: Overlapping onDeckSearchers=2 2014-03-27 17:09:10,238 WARN [commitScheduler-15-thread-1] [] SolrCore - [index_shard16_replica1] PERFORMANCE WARNING: Overlapping onDeckSearchers=2 Searched around a bit; it looks like my solrconfig.xml is configured fine, and I verified there are no explicit commits sent by our clients. My solrconfig.xml:
<autoCommit>
  <maxDocs>1</maxDocs>
  <maxTime>6</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>
<autoSoftCommit>
  <maxTime>1000</maxTime>
</autoSoftCommit>
Any idea why it's warning every second? The only config that has 1 second is the soft commit. Thanks, Rishi.
Re: DIH dataimport.properties Zulu time
Thank you for the response. This works if I invoke start.jar with java. In my use case, however, I need to invoke start.jar directly (a consoleless service so that the user cannot close it accidentally). It doesn't pick up the user.timezone property when done this way. Is it possible to do this using the tag below somehow? I tried setting locale=UTC and it didn't work.
<propertyWriter dateFormat="-MM-dd HH:mm:ss" type="SimplePropertiesWriter" directory="data" filename="my_dih.properties" locale="en_US" />
On Tue, Mar 25, 2014 at 7:45 PM, Gora Mohanty g...@mimirtech.com wrote: On 26 March 2014 02:44, Kiran J kiranjuni...@gmail.com wrote: Hi Is it possible to set up the data import handler so that it keeps track of the last imported time in Zulu time and not local time? [...] Start your JVM with the desired timezone, e.g., java -Duser.timezone=UTC -jar start.jar Regards, Gora
Re: [ANN] Solr in Action book release (Solr 4.7)
Many congrats! 600+ pages give a feel for the tireless two years of hard work behind it. On Fri, Mar 28, 2014 at 4:04 AM, Trey Grainger solrt...@gmail.com wrote: Hi Philippe, Yes, if you've purchased the eBook then the PDF is available now, and the other formats (ePub and Kindle) are supposed to be available for download on April 8th. It's also worth mentioning that the eBook formats are all available for free with the purchase of the print book. Best regards, Trey Grainger Co-author, Solr in Action Director of Engineering, Search Analytics @CareerBuilder On Thu, Mar 27, 2014 at 12:04 PM, Philippe Soares soa...@genomequest.com wrote: Thanks Trey ! I just tried to download my copy from my manning account, and this final version appears only in PDF format. Any idea about when they'll release the other formats ?
Re: Multiple Languages in Same Core
In addition to the two approaches Liu Bo mentioned (separate core per language and separate field per language), it is also possible to put multiple languages in a single field. This saves you the overhead of multiple cores and of having to search across multiple fields at query time. The idea here is that you can run multiple analyzers (i.e. one for German, one for English, one for Chinese, etc.) and stack the outputted TokenStreams for each of these within a single field. It is also possible to swap out the languages you want to use on a case-by-case basis (i.e. per-document, per field, or even per word) if you really need to for advanced use cases. All three of these methods, including code examples and the pros and cons of each, are discussed in the Multilingual Search chapter of Solr in Action, which Alexandre referenced. If you don't have the book, you can also just download and run the code examples for free, though they may be harder to follow without the context from the book. Thanks, Trey Grainger Co-author, Solr in Action Director of Engineering, Search Analytics @CareerBuilder On Wed, Mar 26, 2014 at 4:34 AM, Liu Bo diabl...@gmail.com wrote: Hi Jeremy There are a lot of multi-language discussions, with two main approaches: 1. like yours, one core per language 2. all in one core, where each language has its own field. We have multi-language support in a single core; each multilingual field has its own suffix such as name_en_US. We customized the query handler to hide the query details from the client. The main reason we want to do this is NRT index and search. Take product for example: a product has price and quantity, which are common fields used for filtering and sorting, while name and description are multi-language fields. If we split products into different cores, updating a common field may end up as an update in all of the multi-language cores.
As to scalability, we don't change Solr cores/collections when a new language is added, but we probably need to update our customized indexing process and run a full re-index. This approach suits our requirements for now, but you may have your own concerns. We have a similar suggest-filtering problem to yours: we want to return suggest results filtered by store. I can't find a way to build the dictionary with a query in my version of Solr (4.6). What I do is run a query on an N-Gram-analyzed field with filter queries on the store_id field, so the suggestion is actually a query. It may not perform as well as the suggester, but it does the trick. You could try building an additional N-Gram field for suggestions only and searching on it with an fq on your locale field. All the best Liu Bo On 25 March 2014 09:15, Alexandre Rafalovitch arafa...@gmail.com wrote: Solr in Action has a significant discussion of the multilingual approach. They also have some code samples out there. Might be worth a look. Regards, Alex. Personal website: http://www.outerthoughts.com/ LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch - Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book) On Tue, Mar 25, 2014 at 4:43 AM, Jeremy Thomerson jer...@thomersonfamily.com wrote: I recently deployed Solr to back the site search feature of a site I work on. The site itself is available in hundreds of languages. With the initial release of site search we have enabled the feature for ten of those languages. This is distributed across eight cores, with two Chinese languages plus Korean combined into one CJK core and each of the other seven languages in its own individual core. The reason for splitting these into separate cores was so that we could have the same field names across all cores but different configuration for analyzers, etc., per core. Now I have some questions on this approach.
1) Scalability: Considering I need to scale this to many dozens more languages, perhaps hundreds more, is there a better way so that I don't end up needing dozens or hundreds of cores? My initial plan was that many languages that didn't have special support within Solr would simply get lumped into a single default core that has some default analyzers that are applicable to the majority of languages. 1b) Related to this: is there a practical limit to the number of cores that can be run on one instance of Lucene? 2) Auto Suggest: In phase two I intend to add auto-suggestions as a user types a query. In reviewing how this is implemented and how the suggestion dictionary is built I have concerns. If I have more than one language in a single core (and I keep the same field name for suggestions on all languages within a core) then it seems that I could get suggestions from another language returned with a suggest query. Is there a way to build a separate dictionary for
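The field-per-language scheme discussed in this thread (e.g. a name_en_US field per supported locale, with unsupported languages lumped into a default field) amounts to a simple field-name mapping on the client side. A minimal sketch, where the locale list and the "_default" fallback suffix are invented for illustration:

```java
import java.util.Arrays;
import java.util.List;

public class MultilingualFields {
    // Locales that have a dedicated per-language field in the schema
    // (illustrative list, not from the thread verbatim)
    static final List<String> SUPPORTED = Arrays.asList("en_US", "de_DE", "zh_CN");

    // Map a base field plus UI locale to the actual indexed field name,
    // falling back to a catch-all field for unsupported languages
    static String fieldFor(String base, String locale) {
        return SUPPORTED.contains(locale) ? base + "_" + locale : base + "_default";
    }

    public static void main(String[] args) {
        System.out.println(fieldFor("name", "en_US"));  // dedicated field
        System.out.println(fieldFor("name", "sw_KE"));  // falls back
    }
}
```

A customized query handler, as Liu Bo describes, would apply this mapping server-side so clients never see the suffixes.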
Re: DIH dataimport.properties Zulu time
I figured it out. I use SQL Server, so this is my solution:

<propertyWriter dateFormat="yyyy-MM-dd'T'HH:mm:ssXXX" type="SimplePropertiesWriter" />

In T-SQL, this can be converted to a UTC datetime using:

CONVERT(datetimeoffset, '${dih.last_index_time}', 127)

Refs: http://docs.oracle.com/javase/7/docs/api/java/text/SimpleDateFormat.html http://msdn.microsoft.com/en-us/library/ms187928.aspx On Thu, Mar 27, 2014 at 2:17 PM, Kiran J kiranjuni...@gmail.com wrote: Thank you for the response. This works if I invoke start.jar with java. In my use case, however, I need to invoke start.jar directly (a consoleless service, so that the user cannot close it accidentally), and it doesn't pick up the user.timezone property when done this way. Is it possible to do this using the tag below somehow? I tried setting locale=UTC and it didn't work.

<propertyWriter dateFormat="yyyy-MM-dd HH:mm:ss" type="SimplePropertiesWriter" directory="data" filename="my_dih.properties" locale="en_US" />

On Tue, Mar 25, 2014 at 7:45 PM, Gora Mohanty g...@mimirtech.com wrote: On 26 March 2014 02:44, Kiran J kiranjuni...@gmail.com wrote: Hi Is it possible to set up the data import handler so that it keeps track of the last imported time in Zulu time and not local time? [...] Start your JVM with the desired timezone, e.g., java -Duser.timezone=UTC -jar start.jar Regards, Gora
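The format the propertyWriter's dateFormat pattern produces can be sanity-checked with a small standalone snippet. This uses only the JDK's SimpleDateFormat (the same class DIH uses for this attribute); the fixed instant below is just an illustration:

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

public class DihUtcFormat {
    public static void main(String[] args) {
        // Same pattern as the propertyWriter's dateFormat attribute;
        // XXX emits an ISO 8601 offset ("Z" for UTC), Java 7+
        SimpleDateFormat fmt = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ssXXX");
        // Force UTC so the timestamp is Zulu time regardless of the
        // JVM's default timezone
        fmt.setTimeZone(TimeZone.getTimeZone("UTC"));
        // An arbitrary fixed instant: 2014-03-27 18:17:00 UTC
        Date d = new Date(1395944220000L);
        System.out.println(fmt.format(d));
    }
}
```

The resulting string (e.g. 2014-03-27T18:17:00Z) is exactly the ISO 8601 style 127 that the CONVERT call above expects.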
Re: String Cast Error
: I have a search that sorts on a boolean field. This search is pulling : the following error: java.lang.String cannot be cast to : org.apache.lucene.util.BytesRef. This is almost certainly another manifestation of SOLR-5920... https://issues.apache.org/jira/browse/SOLR-5920 -Hoss http://www.lucidworks.com/
Re: New to Solr can someone help me to know if Solr fits my use case
This feels somewhat backwards. It's very hard to extract line-number information out of MS Word, and next to impossible from PDF. So the question is not whether Solr is a good fit here; it's that your whole architecture may have a major issue. Can you do what you want by hand at least once, down to the precision you want? If you can, then yes, you can probably automate the searching with Solr, though you will still have serious issues (sentences crossing line boundaries, etc.). But I suspect your whole approach will change once you try to do this manually. Regards, Alex. Personal website: http://www.outerthoughts.com/ Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency On Thu, Mar 27, 2014 at 11:46 PM, Saurabh Agarwal sagarwal1...@gmail.com wrote: Can anyone help me please. Hi All, I am new to Solr and from initial reading I am quite convinced Solr will be of great help. Can anyone help in making that decision? Use case: 1. I will have PDF/Word docs generated daily/weekly (a lot of them) which kind of get overwritten frequently. 2. I have a dictionary kind of thing (a list of which words/small sentences should be part of the above docs, words which cannot be, and alternatives for some). 3. Now I want Solr to search my docs produced in step 1 for the words/small sentences from step 2 and give me the doc name/line no. in which they exist. Will Solr be a good help to me? If anybody can help by giving some examples that will be great. Appreciate your help and patience. Thanks Saurabh
Re: document level security filter solution for Solr
Yonik, your reply was incredibly helpful. Thank you very much! The join approach to document security you explained is somewhat similar to what I called Option 2 (ACL PostFilter), since permissions are stored in each document, but it's much simpler in that I'm not required to write, compile, and distribute my own QParserPlugin. In addition, by using dynamic fields (for now anyway), I don't even have to distribute a new schema.xml. It just works! (Once you re-index.) At least it seems to work. I'm declaring this a new option, Option 5. :) The crux of the solution is creating a new document type to join on, a new "group" type. For me, this new group type sits alongside some other document types I had defined already (dataverses, datasets, and files in my case). Each of my older types, my existing documents, now gets tagged with the id of one or more of the new group documents. It's like saying, "This document can be seen by these groups I'm tagging it with." To make this more concrete, I thought I'd post some curl output showing how I'm now tagging my existing dataverse documents with new permissions such as group_2 and group_public, which represent actual groups, as well as what I'll call User Private Groups (UPG*), which is one group per user with the user's name. (Unlike your example, where user joe is part of a group called joe, I'm putting user1 in the name of the group, such as group_user1. But that's still the "joe" group that only joe is a part of.) At runtime, I'll check to see which groups a user is part of and then run one or more joins (separated by OR's) for each group. Anonymous users only get to see documents tagged with the group called public, as you had illustrated. If you're part of a lot of groups, I guess there will be a lot of OR's in the filter query. Output from curl is below. Comments are welcome! (Any objections to this approach?) Thanks again!
Phil

Existing dataverse documents, now tagged with various groups under the perms_ss field, and two example joins, separated by an OR:

[pdurbin@localhost ~]$ curl -s --globoff 'http://localhost:8983/solr/collection1/select?rows=100&wt=json&indent=true&sort=id+desc&q=*&fq=({!join+from=groups_s+to=perms_ss}id:group_public+OR+{!join+from=groups_s+to=perms_ss}id:group_user1)' | jq '.response.docs[] | {id,perms_ss,dvtype}' | head -17
{
  "dvtype": "dataverses",
  "perms_ss": [
    "group_user1",
    "group_user5",
    "group_2"
  ],
  "id": "dataverse_9"
}
{
  "dvtype": "dataverses",
  "perms_ss": [
    "group_public",
    "group_2"
  ],
  "id": "dataverse_7"
}

New "groups" documents that are used in the join:

[pdurbin@localhost ~]$ curl -s 'http://localhost:8983/solr/collection1/select?rows=100&wt=json&indent=true&sort=id+asc&q=id:group*' | jq '.response.docs[] | {id,groups_s,dvtype}' | grep group_public -B7 -A6
{
  "dvtype": "groups",
  "groups_s": "group_4",
  "id": "group_4"
}
{
  "dvtype": "groups",
  "groups_s": "group_public",
  "id": "group_public"
}
{
  "dvtype": "groups",
  "groups_s": "group_user1",
  "id": "group_user1"
}

* User Private Groups (UPG) is what Red Hat calls them: "Red Hat Enterprise Linux uses a user private group (UPG) scheme, which makes UNIX groups easier to manage. A user private group is created whenever a new user is added to the system. It has the same name as the user for which it was created and that user is the only member of the user private group." -- https://access.redhat.com/site/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Deployment_Guide/ch-Managing_Users_and_Groups.html#s2-users-groups-private-groups On Tue, Mar 25, 2014 at 3:40 PM, Yonik Seeley yo...@heliosearch.com wrote: Depending on requirements, another option for simple security is to store the security info in the index and utilize a join. This really only works when you have a single shard, since joins aren't distributed.

# the documents, with permissions
id:doc1, perms:public, ...
id:doc2, perms:group1 group2 joe, ...
id:doc3, perms:group3, ...
# documents modeling users and what groups they belong to
id:joe, groups:joe public group3
id:mark, groups:mark public group1 group2

And then if joe does a query, you add a filter query like the following:

fq={!join from=groups to=perms v='id:joe'}

The user documents can either be in the same collection, or in a separate core as long as it's co-located in the same JVM (core container) and you can do a cross-core join. -Yonik http://heliosearch.org - solve Solr GC pauses with off-heap filters and fieldcache On Tue, Mar 25, 2014 at 3:06 PM, Philip Durbin philip_dur...@harvard.edu wrote: I'm new to Solr and I'm looking for a document-level security filter solution. Anonymous users searching my application should be able to find public data. Logged-in users should be able to find public data and private data they have access to. Earlier today I wrote about shards as a possible solution. I got a great reply from Shalin Shekhar Mangar of LucidWorks explaining how to achieve something technical but I'd like to back up a minute and consider
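The per-user filter query described in this thread (one {!join} clause per group the user belongs to, OR'ed together) can be built in client code. A minimal sketch following Phil's field names (groups_s/perms_ss); the helper itself is hypothetical, not a Solr API:

```java
import java.util.Arrays;
import java.util.List;
import java.util.StringJoiner;

public class GroupFilterQuery {
    // Build one fq with a {!join} clause per group, OR'ed together,
    // matching the perms_ss tagging scheme described above
    static String buildFq(List<String> groups) {
        StringJoiner or = new StringJoiner(" OR ", "(", ")");
        for (String g : groups) {
            or.add("{!join from=groups_s to=perms_ss}id:" + g);
        }
        return or.toString();
    }

    public static void main(String[] args) {
        // An anonymous user sees only the public group; a logged-in
        // user gets their private group plus any real groups
        System.out.println(buildFq(Arrays.asList("group_public")));
        System.out.println(buildFq(Arrays.asList("group_public", "group_user1", "group_2")));
    }
}
```

As Phil notes, a user in many groups produces many OR'ed joins, which is worth benchmarking before relying on this at scale.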
Re: New to Solr can someone help me to know if Solr fits my use case
Thanks a lot Alex for your reply, appreciate the same. So if I leave out the line-number part: 1. I guess putting PDF/Word in Solr for search can be done; these documents will go in Solr. 2. For search, is there any automatic way to give an Excel sheet or a large set of search keywords to search for? I.e., I have 1000s of words that I want to search for in docs; can I do it collectively, or do I send search queries one by one? Thanks Saurabh
Re: New to Solr can someone help me to know if Solr fits my use case
1. You don't actually put PDF/Word into Solr. Instead, each document is run through a content- and metadata-extraction process, and you index that. This is important because a computer does not understand what you are looking for when you open a PDF; it only understands whatever text it is possible to extract, which in the case of PDF is often not much at all, unless the file was generated with an accessibility layer in place. You can experiment with what you can extract by downloading a standalone Apache Tika install, which has a command-line version, or by using Solr's extractOnly flag. Solr internally uses Tika, so the results should be the same. 2. When you do a search you can do field:(Keyword1 Keyword2 Keyword3 Keyword4) and you get as results any document that matches one of those. Not sure about 1000 of them in one go, but certainly a large number. On the other hand, if you have the same keywords all the time and you are trying to match documents against them, you might be more interested in Elasticsearch's percolator (http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-percolate.html) or in Luwak (https://github.com/flaxsearch/luwak). Regards, Alex. Personal website: http://www.outerthoughts.com/ Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency
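For the 1000-keyword case, one common workaround is to batch the dictionary into a handful of field:(kw1 kw2 ...) queries rather than sending one request per word. A rough sketch; the batch size is an assumption to tune, keeping each query under Solr's maxBooleanClauses limit (1024 by default):

```java
import java.util.ArrayList;
import java.util.List;

public class BatchKeywordQueries {
    // Split a long keyword list into field:(kw1 kw2 ...) clauses of a
    // bounded size, so thousands of dictionary terms can be searched
    // in a few requests instead of one per word
    static List<String> toQueries(String field, List<String> keywords, int batchSize) {
        List<String> queries = new ArrayList<>();
        for (int i = 0; i < keywords.size(); i += batchSize) {
            List<String> batch = keywords.subList(i, Math.min(i + batchSize, keywords.size()));
            queries.add(field + ":(" + String.join(" ", batch) + ")");
        }
        return queries;
    }

    public static void main(String[] args) {
        List<String> words = new ArrayList<>();
        for (int i = 1; i <= 5; i++) words.add("kw" + i);
        for (String q : toQueries("content", words, 2)) System.out.println(q);
    }
}
```

Note that keywords containing special query characters (quotes, colons, etc.) would need escaping before being interpolated this way.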
[RE-BALANCE of Collection] Re-balancing of collection after adding nodes to cluster
Hi, I found the email addresses from a slide-share @ http://www.slideshare.net/thelabdude/tjp-solr-webinar. It's very useful. We are developing SOLR search using CDH4 Cloudera and embedded SOLR 4.4.0-search-1.1.0. We created a collection when the cluster had 2 slave nodes. Then two extra nodes were added. The SOLR service runs on those extra nodes, but the ZooKeeper service does not; ZooKeeper runs only on the earlier nodes. When the cluster had 2 nodes the indexing tool ran successfully, but after adding the two nodes the indexing tool throws an error: *no active slice servicing hashcode*. It seems that re-balancing of the collection didn't happen after adding the extra SOLR nodes, so when the indexing tool runs it tries to shard/distribute the indexing information onto the extra node(s), which are not aware of that collection, and it throws an error. Number of shards: 2. The composite routing policy is used. My question is: is it possible to re-balance the collection information after adding new SOLR nodes? In your slide share it's written that re-balancing is available in SOLR-5025; what's SOLR-5025? Thanks Regards Debasis
Re: Question on highlighting edgegrams
Certainly I am not the only user experiencing this? On Wed, Mar 26, 2014 at 1:11 PM, Software Dev static.void@gmail.com wrote: Is this a known bug? On Tue, Mar 25, 2014 at 1:12 PM, Software Dev static.void@gmail.com wrote: Same problem here: http://lucene.472066.n3.nabble.com/Solr-4-x-EdgeNGramFilterFactory-and-highlighting-td4114748.html On Tue, Mar 25, 2014 at 9:39 AM, Software Dev static.void@gmail.com wrote: Bump On Mon, Mar 24, 2014 at 3:00 PM, Software Dev static.void@gmail.com wrote: In 3.5.0 we have the following:

<fieldType name="autocomplete" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="30"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

If we searched for "c" with highlighting enabled we would get back results such as:

<em>c</em>at
<em>c</em>rocodile
<em>c</em>ool beans

But in the latest Solr (4.7) we get the full words highlighted back. Did something change between these versions with regards to highlighting? Thanks
Re: Question on highlighting edgegrams
Yes, there are known bugs with the EdgeNGram filters. I think they are fixed in 4.4. See https://issues.apache.org/jira/browse/LUCENE-3907 On Fri, Mar 28, 2014 at 10:17 AM, Software Dev static.void@gmail.com wrote: Certainly I am not the only user experiencing this? -- Regards, Shalin Shekhar Mangar.
Product index schema for solr
-------- Original Message --------
Subject: Product index schema for solr
Date: Fri, 28 Mar 2014 10:46:20 +0530
From: Ajay Patel apa...@officebeacon.com
To: solr-user-ow...@lucene.apache.org

Hi Solr user developers. I am new to the world of the Solr search engine. I have a complex product database structure in Postgres. A product has many product_quantity_price attributes in ranges. For example, the price ranges of product ID 1 are stored in the product_quantity_price table in the following manner:

min_qty  max_qty  price_per_qty
1        50       4
51       100      3.5
101      150      3
151      200      2.5

The ranges are not fixed for any product; they can be different for different products. Now my question is: how can I save this data in Solr in an optimized way so that I can create facets on qty and prices? Thanks in advance. Ajay Patel.
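One possible way to index such tiers is to flatten each row into a value of a multivalued field at import time, which faceting can then group on. A minimal sketch; the "min-max:price" string encoding is purely illustrative, not something Solr prescribes:

```java
import java.util.ArrayList;
import java.util.List;

public class PriceTierFlattener {
    // One price-tier row from the product_quantity_price table
    static class Tier {
        final int minQty, maxQty;
        final double price;
        Tier(int minQty, int maxQty, double price) {
            this.minQty = minQty; this.maxQty = maxQty; this.price = price;
        }
    }

    // Flatten tiers into values for a multivalued string field,
    // e.g. qty_price_ss, one "min-max:price" token per tier
    static List<String> toFieldValues(List<Tier> tiers) {
        List<String> values = new ArrayList<>();
        for (Tier t : tiers) {
            values.add(t.minQty + "-" + t.maxQty + ":" + t.price);
        }
        return values;
    }

    public static void main(String[] args) {
        List<Tier> tiers = new ArrayList<>();
        tiers.add(new Tier(1, 50, 4));
        tiers.add(new Tier(51, 100, 3.5));
        System.out.println(toFieldValues(tiers));
    }
}
```

If numeric range filtering on price is needed (rather than just discrete facet buckets), separate numeric dynamic fields per tier, or one document per tier joined to the product, would be alternatives worth weighing.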