Re: No Live server exception: Solr Cloud 6.6.6
I think you are going in the wrong direction in your upgrade path…. While it may *seem* simpler to go from master/slave 6.6.6 to SolrCloud 6.6.6, you are much better off going straight from master/slave 6.6.6 to SolrCloud on 8.7 (or whatever is the latest). SolrCloud has evolved by two MAJOR versions since Solr 6, and is much more robust, with many fixes. Today, I suspect very few folks who know the innards of Solr are still familiar with the 6.x line! This is also a really good opportunity to revisit your schema and make sure you are using all the features in the best way possible. > On Jan 6, 2021, at 1:40 AM, Ritvik Sharma wrote: > > Hi Guys, > > Any update? > > On Tue, 5 Jan 2021 at 18:06, Ritvik Sharma wrote: > >> Hi Guys >> >> Happy New Year. >> >> We are trying to move to solr cloud 6.6.6 as we are using the same version in a >> master-slave arch. >> >> solr cloud: 6.6.6 >> zk: 3.4.10 >> >> We are facing a few errors: >> 1. Every time we upload a model-store using a curl -XPUT command, it shows up at that time, but after reloading the collection it is removed >> automatically. 
>> >> 2.While querying the data, we are getting below exception, >> >> "msg": "org.apache.solr.client.solrj.SolrServerException: No live >> SolrServers available to handle this request:[ >> http://x.x.x.x:8983/solr/solrcollection_shard1_replica2, >> http://x.x.x.y:8983/solr/solrcollection_shard1_replica1]","trace": >> "org.apache.solr.common.SolrException: >> org.apache.solr.client.solrj.SolrServerException: No live SolrServers >> available to handle this request:[ >> http://x.x.x.x:8983/solr/solrcollection_shard1_replica2, >> http://x.x.x.y:8983/solr/solrcollection_shard1_replica1]\n\tat >> org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:416)\n\tat >> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:173)\n\tat >> org.apache.solr.core.SolrCore.execute(SolrCore.java:2477)\n\tat >> org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:724)\n\tat >> org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:530)\n\tat >> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:361)\n\tat >> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:305)\n\tat >> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1691)\n\tat >> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582)\n\tat >> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)\n\tat >> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)\n\tat >> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)\n\tat >> >> >> >> >> ___ Eric Pugh | Founder & CEO | OpenSource Connections, LLC | 434.466.1467 | http://www.opensourceconnections.com <http://www.opensourceconnections.com/> | My Free/Busy <http://tinyurl.com/eric-cal> Co-Author: Apache Solr Enterprise Search Server, 3rd Ed 
<https://www.packtpub.com/big-data-and-business-intelligence/apache-solr-enterprise-search-server-third-edition-raw> This e-mail and all contents, including attachments, is considered to be Company Confidential unless explicitly stated otherwise, regardless of whether attachments are marked as such.
Re: data import handler deprecated?
You don’t need to abandon DIH right now…. You can just use the Github hosted version…. The more people who use it, the better the community that will form around it! It’s a bit chicken-and-egg: since no one is actively discussing it, submitting PRs, etc., it may languish. If you use it, test it, and support other community folks using it, then it will continue on! > On Nov 29, 2020, at 12:12 PM, Dmitri Maziuk wrote: > > On 11/29/2020 10:32 AM, Erick Erickson wrote: > >> And I absolutely agree with Walter that the DB is often where >> the bottleneck lies. You might be able to >> use multiple threads and/or processes to query the >> DB if that’s the case and you can find some kind of partition >> key. > > IME the difficult part has always been dealing with incremental updates; if > we were to roll our own, my vote would be for a database trigger that does a > POST in whichever language the DBMS likes. > > But this has not been a part of our "solr 6.5 update" project until now. > > Thanks everyone, > Dima
Re: Solr 8.6.2 - Admin UI Issue
I’ve seen this behavior as well jumping between versions of Solr. Typically in the browser console I see some sort of very opaque Javascript error. > On Oct 8, 2020, at 5:54 AM, Colvin Cowie wrote: > > Images won't be included on the mailing list. You need to put them > somewhere else and link to them. > > With that said, if you're switching between versions, maybe your browser > has the old UI cached? Try clearing the cache / viewing it in a private > window and see if it's any different. > > On Wed, 7 Oct 2020 at 11:22, Vinay Rajput <mailto:vinayrajput4...@gmail.com>> wrote: > >> Hi All, >> >> We are currently using Solr 7.3.1 in cloud mode and planning to upgrade. >> When I bootstrapped Solr 8.6.2 on my local machine and uploaded all >> necessary configs, I noticed one issue in the admin UI. >> >> If I select a collection and go to Files, it shows the content tree with >> all files and folders present in that collection. In Solr 8.6.2, it is >> somehow not showing the folders correctly. In my screenshot, you can see >> that velocity and xslt are folders, and we have some config files inside >> these two folders. Because of this issue, I can't click on folder nodes and >> see children nodes. I checked the network calls, and it looks like we are >> getting the correct data from Solr. So it looks like an Admin UI issue to >> me. >> >> Does anyone know if this is a *known issue* or am I missing something >> here? Has anyone noticed a similar issue? I can confirm that it works >> fine with Solr 7.3.1. 
>> >> [image: image.png][image: image.png] >> >> Left image is for 8.6.2 and right image is for 7.3.1 >> >> Thanks, >> Vinay
Re: Master/Slave
nd the deprecation of CDCR. >>>> >>>> So we are left with the question whether we should expect Master/Slave >>>> replication also to be deprecated; and if so, with what is it expected to >>>> be replaced (since not with CDCR)? Or is it now sufficiently safe to assume >>>> that Master/Slave replication will continue to be supported after all >>>> (since the assertion that it would be replaced by CDCR has been >>>> discredited)? In either case, are there other suggested implementations of >>>> having a read-only SolrCloud receive data from a read/write SolrCloud? >>>> >>>> >>>> Thanks >>>> >>>> -Original Message- >>>> From: Shawn Heisey >>>> Sent: Tuesday, May 21, 2019 11:15 AM >>>> To: solr-user@lucene.apache.org >>>> Subject: Re: SolrCloud (7.3) and Legacy replication slaves >>>> >>>> On 5/21/2019 8:48 AM, Michael Tracey wrote: >>>>> Is it possible to set up an existing SolrCloud cluster as the master for >>>>> legacy replication to a slave server or two? It looks like another >>>> option >>>>> is to use uni-directional CDCR, but not sure what is the best option in >>>> this >>>>> case. >>>> >>>> You're asking for problems if you try to combine legacy replication with >>>> SolrCloud. The two features are not guaranteed to work together. >>>> >>>> CDCR is your best bet. This replicates from one SolrCloud cluster to >>>> another. >>>> >>>> Thanks, >>>> Shawn >>>> >> >
Re: Solr client
Konstantinos, have you seen https://solr.cool/? It’s an aggregation site for all the extensions to Solr. You can add your project there, and that should get it some more awareness! > On Sep 2, 2020, at 2:21 AM, Konstantinos Koukouvis > wrote: > > Hi everybody, sorry in advance if I’m using the mailing list wrong, this is > the first time I’m attempting such a thing. > > To all you gophers out there: we at Mecenat have been working on a new solr > client wrapper with a focus on single solr instance usage, that supports the > search API, schema API and core admin API. With this email I’m trying to > raise awareness in the community and get some feedback by having more people > test every nook and cranny of it, so that we can improve our solution and > hopefully help you find that client that makes using solr in go more > intuitive and simple. > > Here’s the link, and thank you all for your time: > https://github.com/mecenat/solr > > With regards, > Konstantinos >
Loading JSON docs into Solr with Streaming Expressions?
Hey all, I wanted to load some JSON docs into Solr and, as I load them, do some manipulations to the documents as they go in. I looked at https://lucene.apache.org/solr/guide/8_6/transforming-and-indexing-custom-json.html <https://lucene.apache.org/solr/guide/8_6/transforming-and-indexing-custom-json.html>, however I also wanted to see if Streaming would help. I’ve used the combination of the cat and parseCSV streaming functions successfully to load data into Solr, so I looked a bit at what we could do with JSON as the source format. I didn’t see an obvious path for taking a .json file and loading it, so I played around and made this JSON Lines (JSONL) formatted file streaming expression: https://github.com/epugh/playing-with-solr-streaming-expressions/pull/3 <https://github.com/epugh/playing-with-solr-streaming-expressions/pull/3> The expression looks like commit(icecat, update(icecat, parseJSONL( cat('two_docs.jsonl') ) ) ) I was curious what other folks have done? I saw that there is a JSONTupleStream, but it didn’t quite seem to fit the need. Eric
Checking my understanding of SOLR_HOME
I am struggling with using the zkHost and the JDBC endpoint (https://lucene.apache.org/solr/guide/6_6/parallel-sql-interface.html#jdbc-driver), and I believe it’s because when I deploy, the node gets an IP address that is only reachable inside the network, while externally it is accessible via a DNS name: http://quepid-solr.dev.o19s.com:8985/solr/#/~cloud?view=tree I’m also using Docker, so the internal :8983 gets mapped to the external :8985 port. I *think* what I need to do is: 1) Use the SOLR_HOST parameter to make sure the hostname is “quepid-solr.dev.o19s.com” in my startup script. 2) Set the environment variable SOLR_PORT to be 8985 instead of using the Docker mapping of ports. If this is a correct understanding, then I think adding a bit more documentation to https://lucene.apache.org/solr/guide/8_4/taking-solr-to-production.html#solr-hostname would be useful, and I'm happy to add a documentation PR as it’s not super clear to me. Eric
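For anyone hitting the same thing, here is a sketch of what I believe those two settings look like in the startup environment (the hostname and port come from my setup above; treat the exact file location as illustrative):

```shell
# solr.in.sh (or exported before bin/solr start) -- a sketch:
# SOLR_HOST is the hostname Solr registers in ZooKeeper, so clients that
# read cluster state get an address they can actually reach.
SOLR_HOST="quepid-solr.dev.o19s.com"
# SOLR_PORT makes Solr listen on (and advertise) 8985 directly, so the
# Docker port mapping can be 1:1, e.g.: docker run -p 8985:8985 ...
SOLR_PORT="8985"
```

With the port mapped 1:1, the address Solr advertises in live_nodes matches what external JDBC clients can actually connect to.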
Re: How do I add my own Streaming Expressions?
The documentation in the StreamHandler suggests adding some streamFunctions into solrconfig.xml, naming classes such as:

  org.apache.solr.client.solrj.io.stream.ReducerStream
  org.apache.solr.client.solrj.io.stream.RecordCountStream

See https://github.com/apache/lucene-solr/blob/master/solr/core/src/java/org/apache/solr/handler/StreamHandler.java#L114 What the StreamHandler documentation describes doesn’t seem to be working; however, in the similar GraphHandler, there is a call to “streamFunctions”: https://github.com/apache/lucene-solr/blob/master/solr/core/src/java/org/apache/solr/handler/GraphHandler.java#L90 I’m still debugging this… Eric > On Nov 15, 2019, at 9:43 PM, Eric Pugh > wrote: > > What is the process for adding new Streaming Expressions? > > It appears that the org.apache.solr.client.solrj.io.Lang method statically > loads all the streaming expressions? > > Eric > >
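For concreteness, the registration the StreamHandler documentation seems to be describing would look roughly like this in solrconfig.xml (a sketch: the function name myFunction and class com.example.MyCustomStream are hypothetical, and per the above I haven't actually gotten StreamHandler to honor it):

```xml
<requestHandler name="/stream" class="solr.StreamHandler">
  <lst name="streamFunctions">
    <!-- maps an expression name to a TupleStream implementation -->
    <str name="myFunction">com.example.MyCustomStream</str>
  </lst>
</requestHandler>
```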
How do I add my own Streaming Expressions?
What is the process for adding new Streaming Expressions? It appears that the org.apache.solr.client.solrj.io.Lang method statically loads all the streaming expressions? Eric
Re: regarding Extracting text from Images
Just to stir the pot on this topic, here is an article about why and how to use Tika inside of Solr: https://opensourceconnections.com/blog/2019/10/24/it-s-okay-to-run-tika-inside-of-solr-if-and-only-if/ > On Oct 23, 2019, at 7:21 PM, Erick Erickson wrote: > > Here’s a blog about why and how to use Tika outside Solr (and an RDBMS too, > but you can pull that part out pretty easily): > https://lucidworks.com/post/indexing-with-solrj/ > > > >> On Oct 23, 2019, at 7:16 PM, Alexandre Rafalovitch >> wrote: >> >> Again, I think you are best to do it outside of Solr. >> >> But even if you want to get it to work in Solr, I think you start by >> getting it to work directly in Tika. Then, get the missing libraries and >> configuration into Solr. >> >> Regards, >> Alex >> >> On Wed, Oct 23, 2019, 7:08 PM suresh pendap, wrote: >> >>> Hi Alex, >>> Thanks for your reply. How do we integrate tesseract with Solr? Do we have >>> to implement a custom update processor or extend the >>> ExtractingRequestProcessor? >>> >>> Regards >>> Suresh >>> >>> On Wed, Oct 23, 2019 at 11:21 AM Alexandre Rafalovitch >>> >>> wrote: >>> >>>> I believe the Tika that powers this can do so with extra libraries >>> (tesseract?), >>>> but Solr does not bundle those extras. >>>> >>>> In any case, you may want to run Tika externally to avoid the >>>> conversion/extraction process being a burden to Solr itself. >>>> >>>> Regards, >>>>Alex >>>> >>>> On Wed, Oct 23, 2019, 1:58 PM suresh pendap, >>>> wrote: >>>> >>>>> Hello, >>>>> I am reading the Solr documentation about integration with Tika and the >>> Solr >>>>> Cell framework over here >>>>> >>>>> >>>> >>> https://lucene.apache.org/solr/guide/6_6/uploading-data-with-solr-cell-using-apache-tika.html >>>>> >>>>> I would like to know if the Solr Cell framework can also be used to >>>> extract >>>>> text from image files? 
>>>>> >>>>> Regards >>>>> Suresh >>>>> >>>> >>> >
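For context, the Solr Cell endpoint being discussed is registered in solrconfig.xml along these lines, based on the stock sample configs (a sketch; the fmap.content target field is an assumption about your schema):

```xml
<requestHandler name="/update/extract" startup="lazy"
                class="solr.extraction.ExtractingRequestHandler">
  <lst name="defaults">
    <!-- map the body text Tika extracts into a catch-all search field -->
    <str name="fmap.content">_text_</str>
    <str name="lowernames">true</str>
  </lst>
</requestHandler>
```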
Re: Solr Payload example
Have you checked out https://github.com/o19s/payload-component? On Mon, Oct 21, 2019 at 2:47 PM Erik Hatcher wrote: > How about a single field, with terms like: > > store1_USD|125.0 store2_EUR|220.0 store3_GBP|225.0 > > Would that do the trick? > > And yeah, payload decoding is currently limited to float and int with the > built-in payload() function. We'd need a new way to pull out > textual/bytes payloads - like maybe a DocTransformer? > > Erik > > > > On Oct 21, 2019, at 9:59 AM, Vincenzo D'Amore > wrote: > > > > Hi Erick, > > > > thanks for getting back to me. We started to use payloads because we have > > the classical per-store pricing problem. > > Thousands of stores and different prices. > > Then we found the payloads very useful and started to use them for many > reasons, > > like enabling/disabling the product for a given store, saving the stock > > availability, or saving other info like buy/sell price, discount rates, > > and so on. > > All of that information is numeric, but stores can also be in different > > countries; I mean, it would be useful to also have the currency and other > > attributes related to the store. > > > > Thinking about an alternative to payloads, maybe I could use dynamic > > fields; well, I know it is ugly. > > > > Consider this hypothetical case where I have two payload fields: > > > > payloadPrice: [ > > "store1|125.0", > > "store2|220.0", > > "store3|225.0" > > ] > > > > payloadCurrency: [ > > "store1|USD", > > "store2|EUR", > > "store3|GBP" > > ] > > > > with dynamic fields I could have different fields for each document. > > > > currency_store1_s: "USD" > > currency_store2_s: "EUR" > > currency_store3_s: "GBP" > > > > But how many dynamic fields like this can I have? More than thousands? > > > > Again, I've just started to look at the solr-ocrhighlighting github project you > > suggested. > > They seem to have written their own payload object type to store OCR > > highlighting information. 
> > It seems interesting, I'll take a look immediately. > > > > Thanks again for your time. > > > > Best regards, > > Vincenzo > > > > > > On Mon, Oct 21, 2019 at 2:55 PM Erick Erickson > > wrote: > > > >> This is one of those situations where I know a client did it, but didn’t > >> see the code myself. > >> > >> So I can’t help much. > >> > >> Perhaps a good question at this point, though, is “why do you want to add > >> string payloads anyway”? > >> > >> This isn’t the client, but it might give you some pointers: > >> > >> > >> > https://github.com/dbmdz/solr-ocrpayload-plugin/blob/master/src/main/java/de/digitalcollections/solr/plugin/components/ocrhighlighting/OcrHighlighting.java > >> > >> Best, > >> Erick > >> > >>> On Oct 21, 2019, at 6:37 AM, Vincenzo D'Amore > >> wrote: > >>> > >>> Hi Erick, > >>> > >>> It seems I've reached a dead end, or at least, looking at the code, it seems I can't easily add a custom decoder. > >>> > >>> Looking at the PayloadUtils class, there is a getPayloadDecoder method invoked to return the PayloadDecoder:
> >>>
> >>>   public static PayloadDecoder getPayloadDecoder(FieldType fieldType) {
> >>>     PayloadDecoder decoder = null;
> >>>
> >>>     String encoder = getPayloadEncoder(fieldType);
> >>>
> >>>     if ("integer".equals(encoder)) {
> >>>       decoder = (BytesRef payload) -> payload == null ? 1 :
> >>>           PayloadHelper.decodeInt(payload.bytes, payload.offset);
> >>>     }
> >>>     if ("float".equals(encoder)) {
> >>>       decoder = (BytesRef payload) -> payload == null ? 1 :
> >>>           PayloadHelper.decodeFloat(payload.bytes, payload.offset);
> >>>     }
> >>>     // encoder could be "identity" at this point, in the case of
> >>>     // DelimitedTokenFilterFactory encoder="identity"
> >>>
> >>>     // TODO: support pluggable payload decoders?
> >>>
> >>>     return decoder;
> >>>   }
> >>>
> >>> Any advice to work around this situation? 
> >>> > >>> > >>> On Mon, Oct 21, 2019 at 1:51 AM Erick Erickson < erickerick...@gmail.com> > >>> wrote: > >>> > You’d need to write one. Payloads are generally intended to hold > >> numerics > you can then use in a function query to factor into the score… > > Best, > Erick > > > On Oct 20, 2019, at 4:57 PM, Vincenzo D'Amore > wrote: > > > > Sorry, I just realized that I was wrong in how I'm using the payload > > function. > > Given that the payload function only handles a numeric (integer or > >> float) > > payload, could you suggest an alternative function that handles > strings? > > If not, should I write one? > > > > On Sun, Oct 20, 2019 at 10:43 PM Vincenzo D'Amore < > v.dam...@gmail.com> > > wrote: > > > >> Hi all, > >> > >> I'm trying to understand what I did wrong with a payload query that > >> returns > >> > >> error: { > >> metadata: [ "error-class", "org.apache.solr.common.SolrException", > >> "root-error-class", "org.apache.solr.common.SolrException" ], > >> msg: "No
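For readers following this thread: the payload fields being discussed are typically backed by a field type like the one below, per the stock Solr configsets (a sketch; as the decoder code above shows, only the "integer" and "float" encoders currently get a built-in decoder, while "identity" does not):

```xml
<fieldType name="delimited_payloads_float" class="solr.TextField"
           indexed="true" stored="false">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- turns "store1|125.0" into token "store1" carrying payload 125.0 -->
    <filter class="solr.DelimitedPayloadTokenFilterFactory" encoder="float"/>
  </analyzer>
</fieldType>
```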
Custom Jars for a config in the Solr Cloud world..
I've got a Solr instance with a number of cores that are each configured by uploading the configuration information to ZooKeeper. The newest index needs the UIMA jars. Normally I would put them in the core's /lib directory, but since I am only accessing my server via ZooKeeper, I don't have that directory as an option. I know I could manually upload the jars onto the server and then put some sort of path to them, but I'm hoping to manage all uploading of core-specific configurations (and jars) via ZooKeeper. I'm wondering if I am missing something in this new ZooKeeper-enabled world? Just for fun, I'm going to try and put the ~ 2 MB worth of jars inside my /conf/ directory and then upload through ZooKeeper to see what happens. Eric - Eric Pugh | Principal | OpenSource Connections, LLC | 434.466.1467 | http://www.opensourceconnections.com Co-Author: Apache Solr 3 Enterprise Search Server available from http://www.packtpub.com/apache-solr-3-enterprise-search-server/book This e-mail and all contents, including attachments, is considered to be Company Confidential unless explicitly stated otherwise, regardless of whether attachments are marked as such.
Re: Custom Jars for a config in the Solr Cloud world..
And I can now confirm that yes, ZooKeeper blew up when I attempted to add all the UIMA and content extraction jars to my conf/ directory in ZooKeeper! A couple of small jars did upload, and then it started sending back java.io.IOException: Broken pipe errors. So any thoughts on the best way to manage jars that seem like they should be part of your config? Small jars I think will work, and maybe I just need to tweak my lib/ definitions in my solrconfig.xml to look for all the places that jars may exist, even though on my local box it's different than on my integration Solr box. Just seems a bit messy ;-) Eric On Aug 14, 2012, at 4:40 PM, Jack Krupansky wrote: Dear Eric The Brave, As per the wiki: znodes are limited in the amount of data that they can have. ZooKeeper was designed to store coordination data: status information, configuration, location information, etc. This kind of meta-information is usually measured in kilobytes, if not bytes. ZooKeeper has a built-in sanity check of 1M, to prevent it from being used as a large data store, but in general it is used to store much smaller pieces of data. See: https://cwiki.apache.org/confluence/display/ZOOKEEPER/ProjectDescription Also: jute.maxbuffer: (Java system property: jute.maxbuffer) This option can only be set as a Java system property. There is no zookeeper prefix on it. It specifies the maximum size of the data that can be stored in a znode. The default is 0xfffff, or just under 1M. If this option is changed, the system property must be set on all servers and clients, otherwise problems will arise. This is really a sanity check. ZooKeeper is designed to store data on the order of kilobytes in size. See: http://zookeeper.apache.org/doc/trunk/zookeeperAdmin.html -- Jack Krupansky -Original Message- From: Eric Pugh Sent: Tuesday, August 14, 2012 4:11 PM To: solr-user@lucene.apache.org Subject: Custom Jars for a config in the Solr Cloud world.. 
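If someone really did want to push larger blobs through ZooKeeper, the jute.maxbuffer sanity check Jack quotes can be raised, but the same system property has to be set on every ZooKeeper server and every client JVM. A sketch (the 10 MB value is arbitrary, and doing this at all is generally discouraged):

```shell
# On each ZooKeeper server (e.g. appended to JVMFLAGS in conf/java.env):
JVMFLAGS="$JVMFLAGS -Djute.maxbuffer=10485760"
# And the same -D flag on every client JVM, i.e. each Solr instance:
#   java -Djute.maxbuffer=10485760 ... -jar start.jar
```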
Staggering Replication start times
I am playing with an index that is sharded many times, between 64 and 128 shards. One thing I noticed is that with replication set to happen every 5 minutes, each slave hits the master at the same moment asking for updates: :00:00, :05:00, :10:00, :15:00, etc. Replication takes very little time, so it seems like I'm flooding the network with a burst of traffic that then goes away. I tweaked the replication start time code to instead just start 5 minutes after a shard starts up, which means that instead of all of the slaves hitting at the same moment, they are a bit staggered: :00:00, :00:01, :00:02, :00:04, etcetera. Which presumably will use my network pipe more efficiently. Any thoughts on this? I know it means the slaves are more likely to be slightly out of sync, but over a 5 minute range they will get back in sync. Eric
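For reference, the 5-minute interval in question is the slave-side pollInterval on the ReplicationHandler, configured roughly like this (a sketch; the master URL is hypothetical):

```xml
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <str name="masterUrl">http://master.example.com:8983/solr/replication</str>
    <!-- interval between polls, HH:mm:ss -->
    <str name="pollInterval">00:05:00</str>
  </lst>
</requestHandler>
```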
Re: [POLL] Where do you get Lucene/Solr from? Maven? ASF Mirrors?
Depending on the project, I either pull from ASF Mirrors or Source. However, I do reference Maven repository when writing Java code that is built by Maven. And it's often a pain getting it to work! On Jan 18, 2011, at 4:23 PM, Ryan Aylward wrote: [X] ASF Mirrors (linked in our release announcements or via the Lucene website) [X] Maven repository (whether you use Maven, Ant+Ivy, Buildr, etc.) [X] I/we build them from source via an SVN/Git checkout. [] Other (someone in your company mirrors them internally or via a downstream project) - Eric Pugh | Principal | OpenSource Connections, LLC | 434.466.1467 | http://www.opensourceconnections.com Co-Author: Solr 1.4 Enterprise Search Server available from http://www.packtpub.com/solr-1-4-enterprise-search-server Free/Busy: http://tinyurl.com/eric-cal
Re: What is the maximum number of documents that can be indexed ?
I would recommend looking at the work the HathiTrust has done. They have published some really great blog articles about the work they have done scaling Solr, and have put in huge amounts of data. The good news is that there isn't an exact number, because it depends. The bad news is that there isn't an exact number because it depends! Eric On Oct 13, 2010, at 8:58 PM, Otis Gospodnetic wrote: Marco (use the solr-u...@lucene list to follow up, please), There are no precise answers to such questions. Solr can keep indexing. The limit is, I think, the available disk space. I've never pushed Solr or Lucene to the point where Lucene index segments would become a serious pain, but even that can be controlled. Same thing with number of open files, large file support, etc. Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/ From: Marco Ciaramella ciaramellama...@gmail.com To: d...@lucene.apache.org Sent: Wed, October 13, 2010 6:19:15 PM Subject: What is the maximum number of documents that can be indexed ? Hi all, I am working on a performance specification document for a Solr/Lucene-based application; this document is intended for the final customer. My question is: what is the maximum number of documents I can index, assuming 10 or 20 kbytes for each document? I could not find a precise answer to this question, and I tend to consider that the Solr index can be virtually limited only by the JVM, the Operating System (limits to large file support), or by hardware constraints (mainly RAM, etc.). Thanks Marco
Re: Many Tomcat Processes on Server ?!?!?
My guess would be that commons-daemon somehow thinks that Tomcat has gone down and has started up multiple copies... You only need one Tomcat process for your 4-core Solr instance! You may have many other WAR applications hosted in Tomcat; I know a lot of places would have a one-Tomcat-per-deployed-WAR pattern. On Jun 2, 2010, at 9:59 AM, stockii wrote: Hello. Our Server is an 8-Core Server with 12 GB RAM. Solr is running with 4 Cores. 55 Tomcat 5.5 processes are running. is this normal??? htop shows me a list of these processes on the server, and Tomcat has about 55. every process is using: /usr/share/java/commons-daemon.jar:/usr/share/tomcat5.5/bin/bootstrap.jar. is this normal? -- View this message in context: http://lucene.472066.n3.nabble.com/Many-Tomcat-Processes-on-Server-tp864732p864732.html Sent from the Solr - User mailing list archive at Nabble.com. - Eric Pugh | Principal | OpenSource Connections, LLC | 434.466.1467 | http://www.opensourceconnections.com Co-Author: Solr 1.4 Enterprise Search Server available from http://www.packtpub.com/solr-1-4-enterprise-search-server Free/Busy: http://tinyurl.com/eric-cal
Re: RIA sample and minimal JARs required to embed Solr
Glad to hear someone is looking at Solr not just as a web-enabled search engine, but as a simpler/more powerful interface to Lucene! When you download the source code, look at the Chapter 8 Crawler project, specifically Indexer.java; it demonstrates both how to index into a traditional separate Solr process and how to fire up an embedded Solr. It is remarkably easy to interact with an embedded Solr! In terms of minimal dependencies, what you need for a standalone Solr (outside of a servlet container like Tomcat/Jetty) is what you need for an embedded Solr. Eric On May 29, 2010, at 9:32 PM, Thomas J. Buhr wrote: Solr, The Solr 1.4 EES book arrived yesterday and I'm very much enjoying it. I was glad to see that rich clients are one case for embedding Solr, as this is the case for my application. Multi Cores will also be important for my RIA. The book covers a lot and makes it clear that Solr has extensive abilities. There is however no clean and simple sample of embedding Solr in a RIA in the book, only a few alternate language usage samples. Is there a link to a Java sample that simply embeds Solr for local indexing and searching using Multi Cores? Also, what kind of memory footprint am I looking at for embedding Solr? What are the minimal dependencies? Thom - Eric Pugh | Principal | OpenSource Connections, LLC | 434.466.1467 | http://www.opensourceconnections.com Co-Author: Solr 1.4 Enterprise Search Server available from http://www.packtpub.com/solr-1-4-enterprise-search-server Free/Busy: http://tinyurl.com/eric-cal
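For the archive, here is roughly what firing up an embedded Solr looked like with the 1.3/1.4-era SolrJ API. This is a sketch, not a runnable standalone program: it assumes the solr-core and solrj jars on the classpath plus a solr home directory with solrconfig.xml and schema.xml, and the path, core name, and field names below are placeholders.

```java
// Sketch: embedded Solr in the 1.3/1.4 API line (verify names against your release).
System.setProperty("solr.solr.home", "/path/to/solr/home"); // hypothetical path
CoreContainer.Initializer initializer = new CoreContainer.Initializer();
CoreContainer container = initializer.initialize();
SolrServer solr = new EmbeddedSolrServer(container, "core0");

SolrInputDocument doc = new SolrInputDocument();
doc.addField("id", "1");
doc.addField("title_t", "embedded example");
solr.add(doc);
solr.commit();

container.shutdown(); // release index locks when done
```

From there the SolrServer API is identical to the HTTP client, which is what makes embedding so painless.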
Re: Tomcat 5.5 Security Constraint
I've had the exact same frustration with Multicore and Solr... You need to explicitly lay out each pattern with the core name in it. On Fri, Mar 26, 2010 at 8:35 AM, stockii st...@shopgate.com wrote: Heya hey. i have a little trouble with my tomcat and my security-constraint. i have 4 cores, and in these cores everything should be protected via username and pwd, but not the select! my cores are: .../solr/search/admin/ .../solr/suggest/admin/ .../solr/searchpg/admin/ .../solr/suggestpg/admin/ this is my security-constraint:

<security-constraint>
  <web-resource-collection>
    <web-resource-name>solr</web-resource-name>
    <url-pattern>/solr/*/admin/*</url-pattern>
    <http-method>GET</http-method>
    <http-method>POST</http-method>
  </web-resource-collection>
</security-constraint>
<login-config>
  <auth-method>BASIC</auth-method>
  <realm-name>*</realm-name>
</login-config>

only the pattern /solr/*/admin/* should be closed, but no url-pattern is working, only /*. can any help me? thx !! -- View this message in context: http://n3.nabble.com/Tomcat-5-5-Security-Constraint-tp676516p676516.html Sent from the Solr - User mailing list archive at Nabble.com.
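Spelling out Eric's suggestion: the servlet spec only allows a single wildcard at the very start or very end of a url-pattern, so a pattern like /solr/*/admin/* never matches, and you have to enumerate the cores. A sketch of the web.xml fragment, using the core names from the post (patterns are relative to the webapp context, so they drop the /solr prefix; the role and realm names here are made up):

```xml
<security-constraint>
  <web-resource-collection>
    <web-resource-name>solr admin</web-resource-name>
    <!-- one pattern per core: servlet url-patterns allow only one
         leading or trailing wildcard, never one in the middle -->
    <url-pattern>/search/admin/*</url-pattern>
    <url-pattern>/suggest/admin/*</url-pattern>
    <url-pattern>/searchpg/admin/*</url-pattern>
    <url-pattern>/suggestpg/admin/*</url-pattern>
    <http-method>GET</http-method>
    <http-method>POST</http-method>
  </web-resource-collection>
  <auth-constraint>
    <role-name>solr-admin</role-name>
  </auth-constraint>
</security-constraint>
<login-config>
  <auth-method>BASIC</auth-method>
  <realm-name>solr</realm-name>
</login-config>
```

Since /select is not listed, queries stay open while every admin path requires BASIC auth.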
Re: Updating FAQ for International Characters?
So I am using Sunspot to post over, which means an extra layer of indirection between me and my XML! I will look tomorrow. On Mar 10, 2010, at 7:21 PM, Chris Hostetter wrote: : Any time a character like that was indexed, Solr threw an unknown entity error. : But if converted to &#192; or &Agrave; then everything works great. : : I tried out using Tomcat versus Jetty and got the same results. Before I edit Uh, you mean like the characters in exampledocs/utf8-example.xml ? it contains literal utf8 characters, and it works fine. Based on your &#192; comment I assume you are posting XML ... are you sure you are using the utf8 charset? -Hoss - Eric Pugh | Principal | OpenSource Connections, LLC | 434.466.1467 | http://www.opensourceconnections.com Co-Author: Solr 1.4 Enterprise Search Server available from http://www.packtpub.com/solr-1-4-enterprise-search-server Free/Busy: http://tinyurl.com/eric-cal
Updating FAQ for International Characters?
Hi all, On the wiki page http://wiki.apache.org/solr/FAQ under the section Why don't International Characters Work? there are a number of options specified for dealing with a character like À (an A with a grave accent). Any time a character like that was indexed, Solr threw an unknown entity error. But if converted to &#192; or &Agrave; then everything worked great. I tried out using Tomcat versus Jetty and got the same results. Before I edit the FAQ, I wanted to confirm that others also haven't been able to fully index documents with characters like À. Eric - Eric Pugh | Principal | OpenSource Connections, LLC | 434.466.1467 | http://www.opensourceconnections.com Co-Author: Solr 1.4 Enterprise Search Server available from http://www.packtpub.com/solr-1-4-enterprise-search-server Free/Busy: http://tinyurl.com/eric-cal
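For what it's worth, the "unknown entity" symptom usually means the bytes on the wire were not the UTF-8 the XML parser expected. A small stdlib-only Java check (nothing Solr-specific) showing why the raw character and the numeric entity behave differently:

```java
import java.nio.charset.StandardCharsets;

public class CharsetCheck {
    public static void main(String[] args) {
        String s = "À"; // U+00C0; as a numeric entity this is &#192;
        // Sent as UTF-8, the character is two bytes (0xC3 0x80); sent as
        // Latin-1 it is one byte (0xC0), which is malformed as UTF-8 and
        // makes an XML parser expecting UTF-8 choke.
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        byte[] latin1 = s.getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(s.codePointAt(0)); // 192
        System.out.println(utf8.length);      // 2
        System.out.println(latin1.length);    // 1
    }
}
```

The entity form &#192; sidesteps the charset question entirely because it is plain ASCII, which is why it "works" even when the request charset is wrong.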
Re: Solr YUI autocomplete
It does, have you looked at http://wiki.apache.org/solr/SolJSON?highlight=%28json%29#Using_Solr.27s_JSON_output_for_AJAX? Also, in my book on Solr there is an example, but using the jQuery autocomplete, which I think was answered earlier on the thread! Hope that helps. ANKITBHATNAGAR wrote: Does Solr support JSONP (JSON with Padding) in the response? -Ankit -Original Message- From: Ankit Bhatnagar [mailto:abhatna...@vantage.com] Sent: Friday, October 30, 2009 10:27 AM To: 'solr-user@lucene.apache.org' Subject: Solr YUI autocomplete Hi Guys, I have a question regarding how to specify the callback. I am using the YUI autocomplete widget and it expects a JSONP response. http://localhost:8983/solr/select/?q=monitor&version=2.2&start=0&rows=10&indent=on&wt=json&json.wrf= I am not sure how I should specify the json.wrf=function Thanks Ankit -- View this message in context: http://old.nabble.com/JQuery-and-autosuggest-tp26130209p26157130.html Sent from the Solr - User mailing list archive at Nabble.com.
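To make the json.wrf behavior concrete: with wt=json&json.wrf=callbackName, Solr returns the JSON body wrapped in a call to callbackName, which is exactly the shape a JSONP widget wants. A stdlib-only sketch; the URL parameters are from the thread, but the response body below is made up, not a real Solr payload:

```java
public class JsonpSketch {
    // Mimics what json.wrf does on the server side: wrap the JSON in a callback.
    static String wrap(String callback, String json) {
        return callback + "(" + json + ")";
    }
    public static void main(String[] args) {
        String url = "http://localhost:8983/solr/select/?q=monitor&wt=json"
                   + "&json.wrf=handleResults"; // YUI supplies this callback name
        String body = wrap("handleResults", "{\"response\":{\"numFound\":0}}");
        System.out.println(url);
        System.out.println(body); // handleResults({"response":{"numFound":0}})
    }
}
```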
Re: Solr 1.4 schedule?
Very soon, I think, is the answer. As well as when it's ready. Solr 1.4 is waiting for the next release of Lucene, which is very soon. Once Lucene comes out, Solr will follow in a week or two, barring release issues. Also, if you look at JIRA: http://issues.apache.org/jira/secure/IssueNavigator.jspa?reset=true&mode=hide&sorter/order=DESC&sorter/field=priority&resolution=-1&pid=12310230&fixfor=12313351 you can see that there are 34 open issues still assigned to 1.4 Eric On Tue, Aug 4, 2009 at 8:08 AM, Robert Young r...@roryoung.co.uk wrote: Hi, When is Solr 1.4 scheduled for release? Is there any ballpark date yet? Thanks Rob
Re: Merging SOLR Documents
What you are talking about is federated search, and it is beyond the scope of Solr. However, maybe you can merge the two indexes into one index, and then distribute over multiple servers to get the performance you are looking for? http://wiki.apache.org/solr/DistributedSearch Eric On Jul 3, 2009, at 7:24 AM, Amandeep Singh09 wrote: Hi list, I am new to this list and just starting solr. My question is how can we merge the results of two different searches. I mean if we have a function that has two threads, it has to go to two different solr servers to get the result. Is there any way to merge the result using solr and solrj or do we have to do it in java only? Thanks Amandeep Singh - Eric Pugh | Principal | OpenSource Connections, LLC | 434.466.1467 | http://www.opensourceconnections.com Free/Busy: http://tinyurl.com/eric-cal
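Eric's merge-then-distribute suggestion relies on Solr's distributed search, where one query fans out to the servers listed in the shards parameter (see the DistributedSearch wiki page above). A stdlib-only sketch of building such a request URL; the host names are made up:

```java
public class ShardsSketch {
    // Build the shards= parameter from a list of host:port/solr entries.
    static String shardsParam(String... shards) {
        return "shards=" + String.join(",", shards);
    }
    public static void main(String[] args) {
        String url = "http://solr1:8983/solr/select?q=ipod&"
                   + shardsParam("solr1:8983/solr", "solr2:8983/solr");
        System.out.println(url);
        // http://solr1:8983/solr/select?q=ipod&shards=solr1:8983/solr,solr2:8983/solr
    }
}
```

Note the shard entries deliberately omit the http:// prefix; that is the convention the distributed search syntax uses.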
Re: Upgrade to solr 1.4
Solr in general is fairly stable in trunk. That isn't to say that a critical error can't get through, because that does happen, but the test suite is pretty comprehensive. With Solr 1.4 getting closer and closer, I think you'll see the pace of change dropping off. I think it's one of those things that you have to judge for yourself: are the features/fixes/enhancements in 1.4 trunk worth a potential risk? I assume that as part of deployment into production you have some sort of defined criteria that says Solr can be added? Testing of server capacity/performance etc? Those might tell you if there are any issues with Solr 1.4 trunk that would need to delay your deployment. Eric On Jun 26, 2009, at 10:58 AM, Julian Davchev wrote: David Baker wrote: Hi, I need to upgrade from solr 1.3 to solr 1.4. I was wondering if there is a particular revision of 1.4 that I should use that is considered very stable for a production environment? Well, if it's not pronounced stable and given on the download page, I don't think you can rely on it being very stable for a production environment. - Eric Pugh | Principal | OpenSource Connections, LLC | 434.466.1467 | http://www.opensourceconnections.com Free/Busy: http://tinyurl.com/eric-cal
Re: building custom RequestHandlers
Are you using the JavaScript interface to Solr? http://wiki.apache.org/solr/SolrJS It may provide much of what you are looking for! Eric On Jun 23, 2009, at 10:27 AM, Julian Davchev wrote: I am using solr and php quite nicely. Currently the workflow includes some manipulation on the php side so I correctly format the query string and pass it to tomcat/solr. I somehow want to build my own request handler in java so I skip the whole apache/php request that is just for formatting. This will save me tons of requests to apache since I use solr directly from javascript. Would like to ask if there is something ready that I can use and adjust. I am kinda new in Java but once I get the pointers I think I should be able to pull it out. Thanks, JD - Eric Pugh | Principal | OpenSource Connections, LLC | 434.466.1467 | http://www.opensourceconnections.com Free/Busy: http://tinyurl.com/eric-cal
Re: building custom RequestHandlers
Like most things JavaScript, I found that I had to just dig through it and play with it. However, the Reuters demo site was very easy to customize to interact with my own Solr instance, and I went from there. On Jun 23, 2009, at 11:30 AM, Julian Davchev wrote: Never used it.. I am just looking in the docs for how I can extend solr, but no luck so far :( Hoping for some docs or a real extension example. Eric Pugh wrote: Are you using the JavaScript interface to Solr? http://wiki.apache.org/solr/SolrJS It may provide much of what you are looking for! Eric On Jun 23, 2009, at 10:27 AM, Julian Davchev wrote: I am using solr and php quite nicely. Currently the workflow includes some manipulation on the php side so I correctly format the query string and pass it to tomcat/solr. I somehow want to build my own request handler in java so I skip the whole apache/php request that is just for formatting. This will save me tons of requests to apache since I use solr directly from javascript. Would like to ask if there is something ready that I can use and adjust. I am kinda new in Java but once I get the pointers I think I should be able to pull it out. Thanks, JD - Eric Pugh | Principal | OpenSource Connections, LLC | 434.466.1467 | http://www.opensourceconnections.com Free/Busy: http://tinyurl.com/eric-cal - Eric Pugh | Principal | OpenSource Connections, LLC | 434.466.1467 | http://www.opensourceconnections.com Free/Busy: http://tinyurl.com/eric-cal
Re: Possible Containers
Can you highlight what problems you've had? Solr doesn't have any really odd aspects about it that would prevent it from running in any kind of servlet container. Eric On Jun 15, 2009, at 6:18 PM, John Martyniak wrote: I have been using jetty and have been really happy with the ease of use and performance. -John On Jun 15, 2009, at 3:41 PM, Andrew Oliver wrote: I've had it running in Jetty and Tomcat. Tomcat 6 + JDK6 have some nice performance semantics especially with non-blocking IO, persistent connections, etc. It is likely that it will run in Resin, though I haven't tried it. It will also likely run in any of the Tomcat-based stuff (i.e. TC Server from Spring Source, JBossAS from Red Hat) -Andy On Mon, Jun 15, 2009 at 2:25 PM, Mukerjee, Neiloy (Neil)neil.muker...@alcatel-lucent.com wrote: Having tried Tomcat and not come to much success upon the realization that I'm using Tomcat 5.5 for other projects I'm working on and that I would be best off using Tomcat 6 for Solr v1.3.0, I am in search of another possible container. What have people used successfully that would be a good starting point for me to try out? John Martyniak President/CEO Before Dawn Solutions, Inc. 9457 S. University Blvd #266 Highlands Ranch, CO 80126 o: 877-499-1562 c: 303-522-1756 e: j...@beforedawnsoutions.com w: http://www.beforedawnsolutions.com - Eric Pugh | Principal | OpenSource Connections, LLC | 434.466.1467 | http://www.opensourceconnections.com Free/Busy: http://tinyurl.com/eric-cal
Re: user feedback in solr
You can look at the HTTP server logs output by Jetty (or whatever server you have); they provide a lot of visibility into what people are looking for. However, there isn't, that I know of, a ready-to-roll analytics package for Solr. It would be cool though! Eric On Jun 10, 2009, at 8:28 AM, Pooja Verlani wrote: Hi all, I wanted to know if there is any provision to accommodate user feedback in the form of query logs and click logs, to improve the search relevance and ranking. Also, is there a possibility of it being included in the next version? Thank you, Regards, Pooja - Eric Pugh | Principal | OpenSource Connections, LLC | 434.466.1467 | http://www.opensourceconnections.com Free/Busy: http://tinyurl.com/eric-cal
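Mining the request log for query terms is mostly a one-liner. A stdlib-only Java sketch; the log line below is a made-up NCSA-style example, so adjust the regex to whatever format your container actually writes:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QueryLogMiner {
    // Pulls the q= parameter value out of a request-log line, if present.
    static String extractQuery(String logLine) {
        Matcher m = Pattern.compile("[?&]q=([^&\\s\"]+)").matcher(logLine);
        return m.find() ? m.group(1) : null;
    }
    public static void main(String[] args) {
        String line = "127.0.0.1 - - [10/Jun/2009:08:28:00 +0000] "
                    + "\"GET /solr/select?q=ipod&rows=10 HTTP/1.1\" 200 1234";
        System.out.println(extractQuery(line)); // ipod
    }
}
```

Feed every line of the access log through this and count the results, and you have a crude query-popularity report; click logs would have to come from your front end, since Solr itself never sees the clicks.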
Re: How to disable posting updates from a remote server
Take a look at the security section in the wiki; you could do this with firewall rules or password access. On Thursday, June 4, 2009, ashokc ash...@qualcomm.com wrote: Hi, I find that I am freely able to post to my production SOLR server from any other host that can run the post command. So somebody can wipe out the whole index by posting a delete query. Is there a way SOLR can be configured so that it will take updates ONLY from the server on which it is running? Thanks - ashok -- View this message in context: http://www.nabble.com/How-to-disable-posting-updates-from-a-remote-server-tp23876170p23876170.html Sent from the Solr - User mailing list archive at Nabble.com.
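To illustrate the firewall-rule option, here is a sketch of what that could look like with iptables. The port, source address, and rule order are assumptions; in a real ruleset, placement relative to your existing rules matters:

```shell
# Allow traffic to Solr's port only from localhost; drop it from
# everywhere else. Assumes Solr listens on 8983 and iptables is in use.
iptables -A INPUT -p tcp --dport 8983 -s 127.0.0.1 -j ACCEPT
iptables -A INPUT -p tcp --dport 8983 -j DROP
```

This blocks queries from other hosts too; if remote searching must stay open, a fronting proxy that forwards /select but not /update is the usual alternative.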
Re: How to get number of optimizes
Not sure if it's simpler, but the JMX interface is more structured. I think that just grabbing the page and parsing out the content with your favorite tool (Ruby's Hpricot, say) is pretty simple. Eric On Jun 1, 2009, at 1:17 PM, iamithink wrote: Hello, I'm looking for a simple way to automate (in a shell script) a request for the number of times an index has been optimized (since the Solr webapp last started). I know that this information is available on the Solr stats page (http://host:port/solr/admin/stats.jsp) under Update Handlers/stats/optimizes, but I'm looking for a simpler way than to retrieve the page using wget or similar and parse the HTML. More generally, is there a convenient way to get at the other data presented on the Stats page? I'm currently using Solr 1.2 but will be migrating to 1.3 soon, in case that makes a difference. Thanks... -- View this message in context: http://www.nabble.com/How-to-get-number-of-optimizes-tp23818563p23818563.html Sent from the Solr - User mailing list archive at Nabble.com. - Eric Pugh | Principal | OpenSource Connections, LLC | 434.466.1467 | http://www.opensourceconnections.com Free/Busy: http://tinyurl.com/eric-cal
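A stdlib-only Java sketch of the scrape-and-parse approach, to mirror the Hpricot suggestion. The XML fragment here is made up to look like the stats data; for the real thing, fetch http://host:port/solr/admin/stats.jsp and adjust the element names to what your Solr version actually emits:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class StatsScraper {
    // Returns the optimize count found in the XML, or -1 if missing/unparsable.
    static int optimizes(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return Integer.parseInt(
                doc.getElementsByTagName("optimizes").item(0).getTextContent().trim());
        } catch (Exception e) {
            return -1;
        }
    }
    public static void main(String[] args) {
        // Hypothetical fragment shaped like the stats page data.
        String sample = "<stats><updateHandler><optimizes>3</optimizes></updateHandler></stats>";
        System.out.println(optimizes(sample)); // 3
    }
}
```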
Map tika attribute to be the id in Solr Cell
Hi all, I want to use the Tika attribute stream_name as my unique key, which I can do if I specify <uniqueKey>stream_name</uniqueKey> and run curl: curl "http://localhost:8983/solr/karaoke/update/extract?ext.def.fl=text&ext.capture=stream_name&ext.map.stream_name=stream_name" -F fi...@angeleyes.kar However, this means that I can't use the ext.metadata.prefix to capture the other metadata fields via: curl "http://localhost:8983/solr/karaoke/update/extract?ext.def.fl=text&ext.metadata.prefix=metadata_&ext.capture=stream_name&ext.map.stream_name=stream_name" -F fi...@angeleyes.kar If I do, it seems like stream_name is lost because it is now metadata_stream_name, but I can't use that name in my ext.capture and ext.map: curl "http://localhost:8983/solr/karaoke/update/extract?ext.def.fl=text&ext.metadata.prefix=metadata_&ext.capture=metadata_stream_name&ext.map.metadata_stream_name=stream_name" -F fi...@angeleyes.kar Any ideas? Currently seems like an either/or, but I'd like both! Eric - Eric Pugh | Principal | OpenSource Connections, LLC | 434.466.1467 | http://www.opensourceconnections.com Free/Busy: http://tinyurl.com/eric-cal
Re: Map tika attribute to be the id in Solr Cell
Grant, I went back and tried to recreate my bug using the example app. Indexing example/site/tutorial.pdf, I get the error with this command: budapest:site epugh$ curl "http://localhost:8983/solr/update/extract?ext.def.fl=text&ext.metadata.prefix=metadata_&ext.map.stream_name=id" -F fi...@tutorial.pdf If I remove the ext.metadata.prefix, then I am okay, but then I can't use dynamic fields for indexing metadata fields. So this works, but I have to manually create all my fields: budapest:site epugh$ curl "http://localhost:8983/solr/update/extract?ext.def.fl=text&ext.map.stream_name=id" -F fi...@tutorial.pdf Eric On May 28, 2009, at 8:28 PM, Grant Ingersoll wrote: On May 28, 2009, at 11:29 AM, Eric Pugh wrote: Hi all, I want to use the Tika attribute stream_name as my unique key, which I can do if I specify <uniqueKey>stream_name</uniqueKey> and run curl: curl "http://localhost:8983/solr/karaoke/update/extract?ext.def.fl=text&ext.capture=stream_name&ext.map.stream_name=stream_name" -F fi...@angeleyes.kar Why do you need to have the ext.capture and why do you need to map stream_name to stream_name? If the name in the tika metadata is a field name, you don't need to map. Also, I assume I'm missing something here because why can't you just pass in id=name of the stream since presumably, in your examples anyway, you have this info, right? If not, I don't know where else you are getting it from, b/c it is a Solr thing, not a Tika thing. In fact, that reminds me, I should document those values that the ERH adds to the Metadata.
However, this means that I can't use the ext.metadata.prefix to capture the other metadata fields via: curl "http://localhost:8983/solr/karaoke/update/extract?ext.def.fl=text&ext.metadata.prefix=metadata_&ext.capture=stream_name&ext.map.stream_name=stream_name" -F fi...@angeleyes.kar If I do, it seems like stream_name is lost because it is now metadata_stream_name, but I can't use that name in my ext.capture and ext.map: curl "http://localhost:8983/solr/karaoke/update/extract?ext.def.fl=text&ext.metadata.prefix=metadata_&ext.capture=metadata_stream_name&ext.map.metadata_stream_name=stream_name" -F fi...@angeleyes.kar Any ideas? Currently seems like an either/or, but I'd like both! Eric - Eric Pugh | Principal | OpenSource Connections, LLC | 434.466.1467 | http://www.opensourceconnections.com Free/Busy: http://tinyurl.com/eric-cal -- Grant Ingersoll http://www.lucidimagination.com/ Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using Solr/Lucene: http://www.lucidimagination.com/search - Eric Pugh | Principal | OpenSource Connections, LLC | 434.466.1467 | http://www.opensourceconnections.com Free/Busy: http://tinyurl.com/eric-cal
Re: Map tika attribute to be the id in Solr Cell
Updating to the latest and greatest added that data, thank you for the pointer. Too many copies of Solr 1.4 trunk, and I'd neglected to update. However, the issue with the mapping not working with the ext.metadata.prefix seems to remain: budapest:site epugh$ curl "http://localhost:8983/solr/update/extract?ext.def.fl=text&ext.map.stream_name=id&ext.metadata.prefix=metadata_" -F fi...@tutorial.pdf <body><h2>HTTP ERROR: 500</h2><pre>org.apache.solr.common.SolrException: Document [null] missing required field: id</pre></body> Eric On May 28, 2009, at 8:56 PM, Grant Ingersoll wrote: On May 28, 2009, at 8:47 PM, Eric Pugh wrote: Grant, you are quite right! I was too far down in the weeds, and didn't need to be doing all that craziness. And I don't actually see the metadata fields. I would expect to, however! What revision are you running? The following was added to ERH on 4/24/09, r768281, (see SOLR-1128) to solve this exact problem: String[] names = metadata.names(); NamedList metadataNL = new NamedList(); for (int i = 0; i < names.length; i++) { String[] vals = metadata.getValues(names[i]); metadataNL.add(names[i], vals); } rsp.add(stream.getName() + "_metadata", metadataNL); - Eric Pugh | Principal | OpenSource Connections, LLC | 434.466.1467 | http://www.opensourceconnections.com Free/Busy: http://tinyurl.com/eric-cal
Re: DIH uses == instead of = in SQL
Argh... I learned a lesson (yet again!)... I spent an hour setting up detailed logging and digging around in lots of DIH source, with no real luck finding the offending == versus =. I mentioned my frustration to a colleague and he pointed out, right where I had checked multiple times, that I had typed == instead of = in my SQL statement! Eric On May 23, 2009, at 12:02 AM, Noble Paul നോബിള് नोब्ळ् wrote: are you using delta-import w/o a deltaImportQuery? pls paste the relevant portion of data-config.xml On Sat, May 23, 2009 at 12:13 AM, Eric Pugh ep...@opensourceconnections.com wrote: I am getting this error: Caused by: com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '=='1433'' at line 1 at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) during a select for a specific institution: org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to execute query: select institution_id, name, acronym as i_acronym from institutions where institution_id=='1433' Processing Document # 1 at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.<init>(JdbcDataSource.java:248) at org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:205) at org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:38) at org.apache.solr.handler.dataimport.SqlEntityProcessor.initQuery(SqlEntityProcessor.java:58) at org.apache.solr.handler.dataimport.SqlEntityProcessor.nextRow(SqlEntityProcessor.java:71) I just switched to using the paired deltaImportQuery and deltaQuery approach. I am using the latest from trunk. Any ideas?
Eric - Eric Pugh | Principal | OpenSource Connections, LLC | 434.466.1467 | http://www.opensourceconnections.com Free/Busy: http://tinyurl.com/eric-cal -- - Noble Paul | Principal Engineer| AOL | http://aol.com - Eric Pugh | Principal | OpenSource Connections, LLC | 434.466.1467 | http://www.opensourceconnections.com Free/Busy: http://tinyurl.com/eric-cal
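For reference, the shape of the paired queries involved in this thread. This is a sketch of a data-config.xml entity: the table and column names are taken from the error message above, the last_modified column is an assumption, and the delta variable syntax is from the DataImportHandler wiki of that era. Note the single = in deltaImportQuery, which is where the typo crept in:

```xml
<entity name="institution"
        query="select institution_id, name, acronym as i_acronym from institutions"
        deltaQuery="select institution_id from institutions
                    where last_modified &gt; '${dataimporter.last_index_time}'"
        deltaImportQuery="select institution_id, name, acronym as i_acronym
                          from institutions
                          where institution_id='${dataimporter.delta.institution_id}'"/>
```

DIH substitutes each id returned by deltaQuery into deltaImportQuery, so a syntax slip there only surfaces at delta-import time, exactly as in the stack trace above.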
DIH uses == instead of = in SQL
I am getting this error: Caused by: com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '=='1433'' at line 1 at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) during a select for a specific institution: org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to execute query: select institution_id, name, acronym as i_acronym from institutions where institution_id=='1433' Processing Document # 1 at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.<init>(JdbcDataSource.java:248) at org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:205) at org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:38) at org.apache.solr.handler.dataimport.SqlEntityProcessor.initQuery(SqlEntityProcessor.java:58) at org.apache.solr.handler.dataimport.SqlEntityProcessor.nextRow(SqlEntityProcessor.java:71) I just switched to using the paired deltaImportQuery and deltaQuery approach. I am using the latest from trunk. Any ideas? Eric - Eric Pugh | Principal | OpenSource Connections, LLC | 434.466.1467 | http://www.opensourceconnections.com Free/Busy: http://tinyurl.com/eric-cal
Re: Cleanly shutting down Solr/Jetty on Windows
Wouldn't you want to run it as a Windows service and use net start / net stop? If you download and install Jetty, it comes with the appropriate scripts to be installed as a service. Eric On May 20, 2009, at 12:39 PM, Chris Harris wrote: I'm running Solr with the default Jetty setup on Windows. If I start solr with java -jar start.jar from a command window, then I can cleanly shut down Solr/Jetty by hitting Control-C. In particular, this causes the shutdown hook to execute, which appears to be important. However, I don't especially want to run Solr from a command window. Instead, I want to launch it from a scheduled task, which does the java -jar start.jar in a non-interactive way and which does not bring up a command window. If I were on unix I could use the kill command to send an appropriate signal to the JVM, but I gather this doesn't work on Windows. As such, what is the proper way to cleanly shut down Solr/Jetty on Windows, if they are not running in a command window? The main way I know how to kill Solr right now if it's running outside a command window is to go to the Windows task manager and kill the java.exe process there. But this seems to kill java immediately, so I'm doubtful that the shutdown hook is getting executed. I found a couple of threads through Google suggesting that Jetty now has a stop.jar script that's capable of stopping Jetty in a clean way across platforms. Is this maybe the best option? If so, would it be possible to include stop.jar in the Solr example/ directory? - Eric Pugh | Principal | OpenSource Connections, LLC | 434.466.1467 | http://www.opensourceconnections.com Free/Busy: http://tinyurl.com/eric-cal
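On the stop.jar question: the Jetty start.jar itself can also stop a running instance over a local socket, if the instance was started with a stop port and key configured. A sketch; the flag names below are from the Jetty 6 line that Solr's example bundled at the time, so verify against your Jetty version, and the port and key values are placeholders:

```shell
# Start Solr's example Jetty with a stop port configured
java -DSTOP.PORT=8079 -DSTOP.KEY=secret -jar start.jar

# Later, from another (non-interactive) process, request a clean shutdown;
# this goes through Jetty's normal shutdown path, so shutdown hooks run.
java -DSTOP.PORT=8079 -DSTOP.KEY=secret -jar start.jar --stop
```

Because the stop command is just another java invocation, it works fine from a scheduled task with no console window, which was the original constraint.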
Re: Shutting down an instance of EmbeddedSolrServer
I created ticket SOLR-1178 for the small tweak. https://issues.apache.org/jira/browse/SOLR-1178 Eric On May 5, 2009, at 12:26 AM, Noble Paul നോബിള് नोब्ळ् wrote: hi Eric, there should be a getter for CoreContainer in EmbeddedSolrServer. Open an issue --Noble On Tue, May 5, 2009 at 12:17 AM, Eric Pugh ep...@opensourceconnections.com wrote: Hi all, I notice that when I use EmbeddedSolrServer I have to use Control-C to stop the process. I think the way to shut it down is by calling coreContainer.shutdown(). However, is it possible to get the coreContainer from a SolrServer object? Right now it is defined as protected final CoreContainer coreContainer;. I wanted to do: ((EmbeddedSolrServer) solr).getCoreContainer().shutdown(); But it seems I need to keep my own reference to the coreContainer? Is changing this worth a patch? Eric - Eric Pugh | Principal | OpenSource Connections, LLC | 434.466.1467 | http://www.opensourceconnections.com Free/Busy: http://tinyurl.com/eric-cal -- --Noble Paul - Eric Pugh | Principal | OpenSource Connections, LLC | 434.466.1467 | http://www.opensourceconnections.com Free/Busy: http://tinyurl.com/eric-cal
XPath query support in Solr Cell
So I am trying to filter down what I am indexing, and the basic XPath queries don't work. For example, working with tutorial.pdf this indexes all the <div/> elements: curl "http://localhost:8983/solr/update/extract?ext.idx.attr=true&ext.def.fl=text&ext.map.div=foo_t&ext.capture=div&ext.literal.id=126&ext.xpath=/xhtml:html/xhtml:body/descendant::node()" -F tutori...@tutorial.pdf However, if I want to only index the first div, I expect to do this: budapest:site epugh$ curl "http://localhost:8983/solr/update/extract?ext.idx.attr=true&ext.def.fl=text&ext.map.div=foo_t&ext.capture=div&ext.literal.id=126&ext.xpath=/xhtml:html/xhtml:body/xhtml:div[1]" -F tutori...@tutorial.pdf But I keep getting back an issue from curl. My attempts to escape the [1] have failed. Any suggestions? curl: (3) [globbing] error: bad range specification after pos 174 Eric PS: this site seems to be okay as a place to upload your html and practice XPath: http://www.whitebeam.org/library/guide/TechNotes/xpathtestbed.rhtm I did have to strip out the namespace stuff though. - Eric Pugh | Principal | OpenSource Connections, LLC | 434.466.1467 | http://www.opensourceconnections.com Free/Busy: http://tinyurl.com/eric-cal
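Two ways around that curl error, for what it's worth: curl's -g / --globoff flag turns off its [] range-globbing entirely, or you can percent-encode the brackets so they never reach the globber. A stdlib-only Java sketch of the encoding, using the XPath string from the message above:

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class XPathParamEncoder {
    public static void main(String[] args) {
        String xpath = "/xhtml:html/xhtml:body/xhtml:div[1]";
        // URLEncoder does form-encoding, so it also escapes '/' and ':',
        // which is fine for a query-parameter *value*. (Charset overload
        // requires Java 10+.)
        String encoded = URLEncoder.encode(xpath, StandardCharsets.UTF_8);
        System.out.println("ext.xpath=" + encoded);
        // the brackets come out as %5B1%5D, which curl's globbing ignores
    }
}
```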
Re: Solr vs Sphinx
Something that would be interesting is to share Solr configs for various types of indexing tasks: from a Solr configuration aimed at indexing web pages, to one handling large amounts of text, to one that indexes specific structured data. I could see those being posted on the wiki and helping folks who say "I want to do X, is there an example?". I think most folks start with the example Solr install and tweak from there, which probably isn't the best path... Eric On May 15, 2009, at 8:09 AM, Mark Miller wrote: In the spirit of good defaults: I think we should change the Solr highlighter to highlight phrase queries by default, as well as prefix, range, and wildcard constant-score queries. It's awkward to have to tell people you have to turn those on. I'd certainly prefer to have to turn them off if I have some limitation rather than on. - Mark - Eric Pugh | Principal | OpenSource Connections, LLC | 434.466.1467 | http://www.opensourceconnections.com Free/Busy: http://tinyurl.com/eric-cal
Re: CommonsHttpSolrServer vs EmbeddedSolrServer
CommonsHttpSolrServer is how you access Solr from a Java client via HTTP; you can connect to a Solr running anywhere. EmbeddedSolrServer starts up Solr internally and connects directly, all in a single JVM... Embedded may be faster, the jury is out, but you have to have your Solr server and your Solr client on the same box... Unless you really need it, I would start with CommonsHttpSolrServer; it's easier to configure and get going with, and more flexible. Eric On May 14, 2009, at 1:30 PM, sachin78 wrote: What is the difference between EmbeddedSolrServer and CommonsHttpSolrServer? Which is the preferred server to use? In some blog I read that EmbeddedSolrServer is 50% faster than CommonsHttpSolrServer, so why do we need to use CommonsHttpSolrServer? Can anyone please guide me on the right path/way, so that I pick the right implementation. Thanks in advance. --Sachin -- View this message in context: http://www.nabble.com/CommonsHttpSolrServer-vs-EmbeddedSolrServer-tp23545281p23545281.html Sent from the Solr - User mailing list archive at Nabble.com. - Eric Pugh | Principal | OpenSource Connections, LLC | 434.466.1467 | http://www.opensourceconnections.com Free/Busy: http://tinyurl.com/eric-cal
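A side-by-side sketch of the two, in the Solr 1.3/1.4-era SolrJ API. This assumes the solrj jar (plus solr-core for the embedded case, and a configured solr home on disk) on the classpath, so it is an illustration rather than a runnable standalone program; the URL, path, and core name are placeholders:

```java
// Remote: talks HTTP to any running Solr instance
// (the constructor throws MalformedURLException).
SolrServer remote = new CommonsHttpSolrServer("http://localhost:8983/solr");

// Embedded: boots Solr inside this JVM from a solr home on disk.
System.setProperty("solr.solr.home", "/path/to/solr/home"); // hypothetical path
CoreContainer.Initializer init = new CoreContainer.Initializer();
CoreContainer container = init.initialize();
SolrServer embedded = new EmbeddedSolrServer(container, "core0");

// Either way, the client API is the same from here on:
QueryResponse rsp = remote.query(new SolrQuery("ipod"));
```

The shared SolrServer interface is the point: code written against CommonsHttpSolrServer can usually switch to embedded later without changes beyond construction.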
Re: StatsComponent and 1.3
I'm guessing that manipulating the client end, acts_as_solr, is an easier approach than backporting server-side functionality, especially as you will have to forward-migrate at some point. Out of curiosity, which version of acts_as_solr are you using? The plugin has moved homes a couple of times, and I have heard and found that the version by Mathias Meyer at http://github.com/mattmatt/acts_as_solr/tree/master is the best. I've used it with 1.4 trunk with no issues, and Mathias has been very responsive. Eric On May 7, 2009, at 10:25 PM, David Shettler wrote: Foreword: I'm not a java developer :) OSVDB.org and datalossdb.org make use of solr pretty extensively via acts_as_solr. I found myself with a real need for some of the StatsComponent stuff (mainly the sum feature), so I pulled down a nightly build and played with it. StatsComponent proved perfect, but... the nightly build output seems to be different, and thus incompatible with acts_as_solr. Now, I realize this is more or less an acts_as_solr issue, but... Is it possible, with some degree of effort (obviously) for me to essentially port some of the functionality of StatsComponent to 1.3 myself? It's that, or waiting for 1.4 to come out and someone developing support for it into acts_as_solr, or myself fixing what I have for acts_as_solr to work with the output. I'm just trying to gauge the easiest solution :) Any feedback or suggestions would be grand. Thanks, Dave Open Security Foundation - Eric Pugh | Principal | OpenSource Connections, LLC | 434.466.1467 | http://www.opensourceconnections.com Free/Busy: http://tinyurl.com/eric-cal
Re: How to index the documents in Apache Solr
I would also recommend starting with the out-of-the-box Jetty. Otherwise you are trying to learn both the basics of Solr and how to stand it up in Tomcat. It's not hard, but learn Solr basics first, then move to more advanced topics. Eric On May 6, 2009, at 9:57 AM, Erik Hatcher wrote: On May 6, 2009, at 5:11 AM, uday kumar maddigatla wrote: The link which shows the things in Jetty. But i'm using Tomcat. hi, If i run the command which is given in the link, it is trying to post the indexes at port number 8983. But in my case my tomcat is running on 8080. Where to change the port. ~/dev/solr/example/exampledocs: java -jar post.jar -help SimplePostTool: version 1.2 This is a simple command line tool for POSTing raw XML to a Solr port. XML data can be read from files specified as commandline args; as raw commandline arg strings; or via STDIN. Examples: java -Ddata=files -jar post.jar *.xml java -Ddata=args -jar post.jar '<delete><id>42</id></delete>' java -Ddata=stdin -jar post.jar hd.xml Other options controlled by System Properties include the Solr URL to POST to, and whether a commit should be executed. These are the defaults for all System Properties: -Ddata=files -Durl=http://localhost:8983/solr/update -Dcommit=yes Erik - Eric Pugh | Principal | OpenSource Connections, LLC | 434.466.1467 | http://www.opensourceconnections.com Free/Busy: http://tinyurl.com/eric-cal
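The help output quoted above contains the answer to the port question: post.jar's target is the `url` system property, defaulting to http://localhost:8983/solr/update. A minimal sketch of pointing it at a Tomcat instance on port 8080 (assuming Solr is deployed under the usual /solr context path; adjust if yours differs):

```shell
# post.jar defaults to -Durl=http://localhost:8983/solr/update;
# override the property to target Tomcat on 8080 instead.
PORT=8080
URL="http://localhost:${PORT}/solr/update"
# The command to run from example/exampledocs:
echo "java -Durl=${URL} -jar post.jar *.xml"
```

No Solr-side change is needed; the tool simply POSTs to whatever URL the property names.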
Shutting down an instance of EmbeddedSolrServer
Hi all, I notice that when I use EmbeddedSolrServer I have to use Control-C to stop the process. I think the way to shut it down is by calling coreContainer.shutdown(). However, is it possible to get the coreContainer from a SolrServer object? Right now it is defined as protected final CoreContainer coreContainer;. I wanted to do: ((EmbeddedSolrServer)solr).getCoreContainer().shutdown(); But it seems I need to keep my own reference to the coreContainer? Is changing this worth a patch? Eric - Eric Pugh | Principal | OpenSource Connections, LLC | 434.466.1467 | http://www.opensourceconnections.com Free/Busy: http://tinyurl.com/eric-cal
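Until such an accessor exists, the workaround is to construct the container yourself and keep the reference, rather than trying to pull it back out of the server. A dependency-free sketch of that pattern (Container and Server here are hypothetical stand-ins for CoreContainer and EmbeddedSolrServer, not Solr classes):

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Stand-in for CoreContainer: a resource you must shut down yourself.
class Container implements AutoCloseable {
    final AtomicBoolean open = new AtomicBoolean(true);
    @Override public void close() { open.set(false); } // i.e. shutdown()
}

// Stand-in for EmbeddedSolrServer: wraps the container, but keeps it
// as a protected field with no public accessor.
class Server {
    protected final Container coreContainer;
    Server(Container c) { this.coreContainer = c; }
}

public class ShutdownSketch {
    public static void main(String[] args) {
        // Construct the container yourself, hand it to the server,
        // and keep your own reference so you can shut down cleanly.
        Container core = new Container();
        Server solr = new Server(core);
        core.close(); // the equivalent of coreContainer.shutdown()
        System.out.println(core.open.get()); // false
    }
}
```

Keeping the reference at construction time sidesteps the cast-and-accessor question entirely.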
Re: How to submit code improvement/suggestions
Yup! One thing though is that if you see some big changes you want to make, you should probably join the solr-dev list and broach the topic there first to make sure you are headed on the right path. The committers typically don't want to introduce change for change's sake, but cleanup and better code docs is always welcome on open source projects, and a great way to learn the code and the community. Eric On Apr 30, 2009, at 1:27 PM, Amit Nithian wrote: My apologies if this sounds like a silly question but for this project, how do I go about submitting code suggestions/improvements? They aren't necessarily bugs as such but rather just cleaning up some perceived strangeness (or even suggesting a package change). Would I need to create a JIRA ticket and submit a patch? Thanks Amit - Eric Pugh | Principal | OpenSource Connections, LLC | 434.466.1467 | http://www.opensourceconnections.com Free/Busy: http://tinyurl.com/eric-cal
Re: Is solr right for this scenario?
It seems like you have three components to your system: 1) Data indexing from multiple sources 2) Search for specific words in documents 3) Preserve rating and search term. I think that Solr comes into play on #1 and #2. You can index content in any number of approaches, either via the new DataImportHandler architecture, or the more traditional write-a-loader-script that puts the documents in Solr. You can store in Solr when a document was indexed, and use that to check against the original documents to see if they changed: check a last-published tag on an RSS feed, or the last updated time on a physical file. This is a very common use case for Solr. For #2, you could have users issue queries and make them favorites, storing them in the DB. Assuming they like the results, they mark the documents with the ratings, which you could store in Solr, but I would put in a DB; it's easier to manage "User A says 1, User B says 0". Then for the UI, just issue the search based on queries stored in the DB, and match the ids up with the rankings in the DB. Simple! As far as the last part, Solr works best on the filesystem; that is part of why it is so fast, no clunky SQL. There are scripts for backing up and restoring indexes that you can use, check the wiki http://wiki.apache.org/solr/SolrOperationsTools . Eric On Apr 24, 2009, at 6:18 AM, Developer In London wrote: Hi All, I am new to the whole Solr/Lucene community. But I think this might be the solution to what I am looking to do. I would appreciate any feedback on how I can go about doing this with Solr: I am looking to make a system where - a) mainly lots of different blog sites, web journals, and articles are indexed on a regular basis. Data that has already been indexed needs to be revisited to see if there are any changes. b) The end users have very fixed search terms, e.g. 'Lloyds TSB' and 'Corporate Banking'. All the documents that are found matching this are presented to a human to analyse.
c) Once the human analyses the document he gives it a rating of 1, 0 or -1. This rating needs to be saved somewhere and be linked with the specific document and also with the search term (e.g. 'Lloyds TSB' 'Corporate Banking' in this case). d) End users can then see these documents with the ratings next to them. What would be the best approach to this? Should I set up a different database to save the rating and relevant mappings, or is there any way to put it into Solr? My 2nd question is, can the Solr index be saved in a database in any way? What's the backup and recovery method on Solr? Thanks in advance. Nayeem - Eric Pugh | Principal | OpenSource Connections, LLC | 434.466.1467 | http://www.opensourceconnections.com Free/Busy: http://tinyurl.com/eric-cal
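The "match the ids up with the rankings in the DB" step from the reply above is plain bookkeeping, and a small sketch may make it concrete. Everything here is illustrative: the ids stand in for what Solr returns in relevance order, and the map stands in for a ratings table keyed by document id.

```java
import java.util.*;

public class MergeRatings {
    // Sketch: Solr supplies the matching document ids in relevance
    // order; the human-assigned ratings (1, 0, -1) live in the DB.
    static List<String> annotate(List<String> solrIds, Map<String, Integer> dbRatings) {
        List<String> out = new ArrayList<>();
        for (String id : solrIds) {
            // Documents nobody has rated yet simply have no rating.
            Integer rating = dbRatings.get(id);
            out.add(id + (rating == null ? " (unrated)" : " (rating " + rating + ")"));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> ids = List.of("doc1", "doc2", "doc3");   // from Solr
        Map<String, Integer> ratings = Map.of("doc1", 1, "doc3", -1); // from DB
        System.out.println(annotate(ids, ratings));
        // [doc1 (rating 1), doc2 (unrated), doc3 (rating -1)]
    }
}
```

The join happens in the application layer, so Solr stays purely a search index and the DB remains the system of record for ratings.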
Re: Is solr right for this scenario?
On Apr 24, 2009, at 7:54 AM, Developer In London wrote: Thanks for the fast reply. Wow, this seems a very active community. I have a few more questions in that case: 1) If Solr is going to be file-based, is it then preferable to run multiple Solrs with shards? How can I determine what capacity one Solr can cope with? It depends! Solr can manage up to X records easily in a single index, however your mileage may vary. One of the nice things about Solr is it is very scalable, and offers you many options. I would go with the most simple setup for Solr for now, and then as your development progresses and you load data, investigate sharding etc. Solr, properly managed, won't be your bottleneck; it'll be your data loading scripts or elsewhere. 2) I am presuming there are already tokenizers for hypertext and xml in Solr so that it can extract the right information out? There are a number of different options available out there for indexing content. 3) I need to also get the 'author' information out for things like blogs. I guess there's no universal way of doing it and I have to have someone manually go through the documents and feed the solr index with the author information? Your loading script will be bespoke to your situation, however any competent developer can put together scripts to load from your various data sources. When you mention 'write a loader script...', do you mean I should incorporate the date checking in the loader script? Solr has no internal way of checking the timestamp in a document and updating? Solr makes no assumptions about your data sources; it isn't a document management system, it is just a search engine. Well, that isn't totally true: the new DataImportHandler architecture does allow you to preserve some information about "when did I last run an update, what has been updated since", however it's pretty new stuff.
Eric Thanks, Nayeem 2009/4/24 Eric Pugh ep...@opensourceconnections.com -- cashflowclublondon.co.uk
Re: Adding text document
I would work through this tutorial and then ask specific questions: http://lucene.apache.org/solr/tutorial.html Alternatively there are some commercial support options: http://wiki.apache.org/solr/Support Eric On Mar 30, 2009, at 6:36 PM, nga pham wrote: Hi All, I am new to Solr. Can you please tell me, how can I add a text document? Thank you, Nga - Eric Pugh | Principal | OpenSource Connections, LLC | 434.466.1467 | http://www.opensourceconnections.com Free/Busy: http://tinyurl.com/eric-cal
Re: Best way to unit test solr integration
So my first thought is that "unit test" + "solr integration" is an oxymoron, in the sense that unit test implies the smallest functional unit, and Solr integration implies multiple units working together. It sounds like you have two different tasks. The code that generates queries you can test without Solr. If you need to parse some sort of Solr document to generate a query based on it, then mock up the query. A lot of folks will just use Solr to build a result set, and then save that on the filesystem as my_big_result1.xml, read it in, and feed it to your code. On the other hand, for your code testing indexing and retrieval, again, you can use the same approach to decouple what Solr does from your code. Unless you've patched Solr, you shouldn't need to unit test Solr; Solr has very nice unit testing built in. On the other hand, if you are doing integration testing, where you want a more end-to-end view of your application, then you probably already have a test Solr setup in your environment somewhere that you can rely on. Spinning up and shutting down Solr for tests can be done, and I can think of use cases for why you might want to do it, but it does incur a penalty of being more work. And you still need to validate that your embedded/unit test Solr works the same as your integration/test environment Solr. Eric On Mar 27, 2009, at 11:59 AM, Joe Pollard wrote: Hello, On our project, we have quite a bit of code used to generate Solr queries, and I need to create some unit tests to ensure that these continue to work. In addition, I need to generate some unit tests that will test indexing and retrieval of certain documents, based on our current schema and the application logic that generates the indexable documents as well as generates the Solr queries. My question is - what's the best way for me to unit test our Solr integration?
I'd like to be able to spin up an embedded/in-memory Solr, or failing that, just start one up as part of my test case setup, fill it with interesting documents, and do some queries, comparing the results to expected results. Are there wiki pages or other documented examples of doing this? It seems rather straight-forward, but who knows, it may be dead simple with some unknown feature. Thanks! -Joe - Eric Pugh | Principal | OpenSource Connections, LLC | 434.466.1467 | http://www.opensourceconnections.com Free/Busy: http://tinyurl.com/eric-cal
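The canned-result approach described above needs no Solr at test time: save a response like my_big_result1.xml once, then parse it in the test and feed it to your code. A sketch using only the JDK's DOM parser; the XML string is a trimmed, illustrative Solr 1.x-style response, not captured output.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import java.io.ByteArrayInputStream;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class CannedResponse {
    // A minimal stand-in for a saved Solr query response on disk.
    static final String CANNED =
        "<response>" +
        "  <result name=\"response\" numFound=\"2\" start=\"0\">" +
        "    <doc><str name=\"id\">SOLR-1</str></doc>" +
        "    <doc><str name=\"id\">SOLR-2</str></doc>" +
        "  </result>" +
        "</response>";

    // Parse the canned response and pull out numFound, the kind of
    // value your query-handling code would consume.
    static long numFound(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(new ByteArrayInputStream(xml.getBytes("UTF-8")));
        Element result = (Element) doc.getElementsByTagName("result").item(0);
        return Long.parseLong(result.getAttribute("numFound"));
    }

    public static void main(String[] args) throws Exception {
        System.out.println(numFound(CANNED)); // 2
    }
}
```

In a real test you would read the saved file instead of an inline string, but the decoupling is the same: the test exercises your parsing and query-generation code, not Solr.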
Re: Best way to unit test solr integration
So in the building block story you talked about, that sounds like an integration (functional? user acceptance?) test. And I would treat Solr the same way you treat the database that you are storing model objects in: if in your tests you bring up a fresh version of the db, populate it with tables etc., and put in sample data, then you should do the same with Solr. My guess is that you have a test database running, and therefore you need a live, supported test Solr. And the same processes you use so that two functional tests don't step on each other's data in the database can be applied to Solr! You can think of tweaking Solr config changes as similar to tweaking indexes in your db: both require configuration management to track those changes, ensure they are deployed, and don't regress anything. Let us know how you get on! Eric On Mar 27, 2009, at 12:50 PM, Joe Pollard wrote: Thanks for the tips, I like the suggestion of testing the document and query generation without having Solr involved. That seems like a more bite-sized unit; I think I'll do that. However, here's the test case that I'm considering where I'd like to have a live solr instance: During an exercise of optimizing our schema, I'm going to be making wholesale changes that I'd like to ensure don't break some portion of our app. It seems like a good method for this would be to write a test with the following steps: (arguably not a unit test, but a very valuable test indeed in our application) * take some defined model object generated at test time, store it in db * run it through our document creation code * submit it into solr * generate a query using our custom criteria-based generation code * ensure that the query returns the results as expected * flesh out the new model objects from the db using only the id fields returned from Solr * In the end, it would be expected to have model objects retrieved from the db that match model objects at the beginning of the test.
These building blocks could be stacked in numerous ways to test almost all the different scenarios in which we use Solr. Also, when/if we start making solr config changes, I can ensure that they change nothing from my app's functional point of view (with the exception of ridding us of dreaded OOMs). Thanks, -Joe -----Original Message----- From: Eric Pugh [mailto:ep...@opensourceconnections.com] Sent: Friday, March 27, 2009 11:27 AM To: solr-user@lucene.apache.org Subject: Re: Best way to unit test solr integration
Re: How do I accomplish this (semi-)complicated setup?
You could index the user name or ID, and then in your application add the username as a filter as you pass the query to Solr. Maybe have an access_type that is Public or Private, and then for public searches only include the ones that have an access_type of Public. Eric On Mar 25, 2009, at 12:52 PM, Jesper Nøhr wrote: Hi list, I've finally settled on Solr, seeing as it has almost everything I could want out of the box. My setup is a complicated one. It will serve as the search backend on Bitbucket.org, a mercurial hosting site. We have literally thousands of code repositories, as well as users and other data. All this needs to be indexed. The complication comes in when we have private repositories. Only select users have access to these, but we still need to index them. How would I go about accomplishing this? I can't think of a clean way to do it. Any pointers much appreciated. Jesper - Eric Pugh | Principal | OpenSource Connections, LLC | 434.466.1467 | http://www.opensourceconnections.com Free/Busy: http://tinyurl.com/eric-cal
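The filter suggested above reduces to a small piece of string-building. A sketch under stated assumptions: `access_type` and `owner` are illustrative field names, not from any real schema here, and production code would also escape user-supplied values before putting them in a query.

```java
import java.net.URLEncoder;

public class PrivateFilter {
    // Restrict results to public repositories plus the requesting
    // user's own private ones, expressed as a Solr filter clause.
    static String filterFor(String user) {
        return user == null
            ? "access_type:Public"                     // anonymous search
            : "access_type:Public OR owner:" + user;   // logged-in user
    }

    public static void main(String[] args) throws Exception {
        String fq = filterFor("jesper");
        // URL-encode the clause before appending it as an fq parameter.
        System.out.println("q=mercurial&fq=" + URLEncoder.encode(fq, "UTF-8"));
        // q=mercurial&fq=access_type%3APublic+OR+owner%3Ajesper
    }
}
```

Because the filter is applied server-side on every query, private documents never appear in results the requesting user isn't entitled to see.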
Re: Indexing the directory
Victor, I'd recommend looking at the tutorial at http://lucene.apache.org/solr/tutorial.html and using the list for more specific questions. Also, there's a list of companies (including mine!) that do Solr support at http://wiki.apache.org/solr/Support that eTrade can contract with to provide in-depth support. Eric Pugh On Mar 16, 2009, at 6:25 PM, Huang, Zijian(Victor) wrote: Hi, all: I am new to SOLR, can anyone please tell me what do I do to index some text files in a local directory? Thanks Victor - Eric Pugh | Principal | OpenSource Connections, LLC | 434.466.1467 | http://www.opensourceconnections.com Free/Busy: http://tinyurl.com/eric-cal
Is wiki page still accurate
Folks, Is the section titled "Full Import Example" on http://wiki.apache.org/solr/DataImportHandler still accurate? The steps referring to the example-solr-home.jar and the SOLR-469 patch seem out of date with where 1.4 is today. Seems like the example-DIH stuff is a simpler/more direct example? Eric - Eric Pugh | Principal | OpenSource Connections, LLC | 434.466.1467 | http://www.opensourceconnections.com Free/Busy: http://tinyurl.com/eric-cal
Re: Organizing POJO's in a heirarchy in Solr
Solr really isn't organized for tree structures of data. I think you might do better using a database with a tree structure: pojo would be a table of serialized POJOs, and the parent_id could point to another structure that builds the tree. Can you flesh out your use case more, as to why they need to be in a tree structure? Eric On Mar 11, 2009, at 8:29 AM, PKJ wrote: Is there anyone who has any idea how to solve this issue? Please give your thoughts. Regards, Praveen PKJ wrote: Hi Eric, Thanks for your response. Yes you are right! Am trying to place POJOs into Solr directly and this is working fine. I want to search them based on the object properties, and need to organize them in a hierarchy, but not by package names. Something like:

/Repository
|_ Folder1
   |_ POJO 1

It must store the object in this hierarchy. I might be asking for something which is not at all supported by Solr. Please give your valuable inputs. Regards, Praveen Eric Pugh-4 wrote: Are you trying to store Java objects in Solr in order for them to be searchable? How about just dumping them as text using POJO-to-text formats such as JSON or Betwixt (http://commons.apache.org/betwixt/). Then you can just search on the package structure... ?q=com.abc.lucene.* to return everything under that structure? Eric On Mar 10, 2009, at 7:13 AM, Praveen_Kumar_J wrote: Someone please throw some light on this post. Thanks in advance. Praveen_Kumar_J wrote: Hi I just upload simple POJOs into Solr by creating custom types and dynamic fields in the Solr schema as shown below, ... <fieldType name="TestType" class="com.abc.lucene.TestType" sortMissingLast="true" omitNorms="true"/> <dynamicField name="*_i_i_s_m" type="integer" indexed="true" stored="true" multiValued="true"/> <dynamicField name="*_i_i_s_nm" type="integer" indexed="true" stored="true" multiValued="false"/> <dynamicField name="*_i_i_ns_m" type="integer" indexed="true" stored="false" multiValued="true"/> But I need to organize these POJOs in a hierarchy which can be navigated easily (something like explorer).
Am not sure whether this feature is supported by Solr. But still planning to implement it somehow (with the help of a DB).

/Root
|_ POJO Type1
|  |_ POJO Type1_1
|_ POJO Type2
   |_ POJO Type2_1

I need to organize the POJOs as shown above. Is there any way to achieve this requirement? Regards, Praveen -- View this message in context: http://www.nabble.com/Organizing-POJO%27s-in-a-heirarchy-in-Solr-tp22427900p22432121.html Sent from the Solr - User mailing list archive at Nabble.com. - Eric Pugh | Principal | OpenSource Connections, LLC | 434.466.1467 | http://www.opensourceconnections.com Free/Busy: http://tinyurl.com/eric-cal -- View this message in context: http://www.nabble.com/Organizing-POJO%27s-in-a-heirarchy-in-Solr-tp22427900p22454101.html Sent from the Solr - User mailing list archive at Nabble.com. - Eric Pugh | Principal | OpenSource Connections, LLC | 434.466.1467 | http://www.opensourceconnections.com Free/Busy: http://tinyurl.com/eric-cal
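The DB-backed hierarchy Eric suggests is a classic adjacency list: each row stores (id, parent_id), Solr holds the searchable POJO fields, and the explorer-style tree is rebuilt from the rows. A dependency-free sketch (names like Type1 mirror the diagram above and are illustrative only):

```java
import java.util.*;

public class PojoTree {
    // Build a child-list view of an adjacency-list table: each row is
    // {id, parentId}, with a null parentId marking a root node.
    static Map<String, List<String>> childrenByParent(String[][] rows) {
        Map<String, List<String>> tree = new LinkedHashMap<>();
        for (String[] row : rows) {
            tree.computeIfAbsent(row[1], k -> new ArrayList<>()).add(row[0]);
        }
        return tree;
    }

    public static void main(String[] args) {
        String[][] rows = {
            {"Type1", null}, {"Type1_1", "Type1"},
            {"Type2", null}, {"Type2_1", "Type2"},
        };
        Map<String, List<String>> tree = childrenByParent(rows);
        System.out.println(tree.get(null));     // [Type1, Type2]
        System.out.println(tree.get("Type1"));  // [Type1_1]
    }
}
```

Search still goes through Solr on object properties; only the navigation structure lives in the database, which is exactly the split the reply recommends.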
Re: Organizing POJO's in a heirarchy in Solr
Are you trying to store Java objects in Solr in order for them to be searchable? How about just dumping them as text using POJO-to-text formats such as JSON or Betwixt (http://commons.apache.org/betwixt/). Then you can just search on the package structure... ?q=com.abc.lucene.* to return everything under that structure? Eric On Mar 10, 2009, at 7:13 AM, Praveen_Kumar_J wrote: Someone please throw some light on this post. Thanks in advance. Praveen_Kumar_J wrote: Hi I just upload simple POJOs into Solr by creating custom types and dynamic fields in the Solr schema as shown below, ... <fieldType name="TestType" class="com.abc.lucene.TestType" sortMissingLast="true" omitNorms="true"/> <dynamicField name="*_i_i_s_m" type="integer" indexed="true" stored="true" multiValued="true"/> <dynamicField name="*_i_i_s_nm" type="integer" indexed="true" stored="true" multiValued="false"/> <dynamicField name="*_i_i_ns_m" type="integer" indexed="true" stored="false" multiValued="true"/> But I need to organize these POJOs in a hierarchy which can be navigated easily (something like explorer). Am not sure whether this feature is supported by Solr. But still planning to implement it somehow (with the help of a DB).

/Root
|_ POJO Type1
|  |_ POJO Type1_1
|_ POJO Type2
   |_ POJO Type2_1

I need to organize the POJOs as shown above. Is there any way to achieve this requirement? Regards, Praveen -- View this message in context: http://www.nabble.com/Organizing-POJO%27s-in-a-heirarchy-in-Solr-tp22427900p22432121.html Sent from the Solr - User mailing list archive at Nabble.com. - Eric Pugh | Principal | OpenSource Connections, LLC | 434.466.1467 | http://www.opensourceconnections.com Free/Busy: http://tinyurl.com/eric-cal