DIH caching URLDataSource/XPath entity (not root)
Hi there,

my index is created from XML files that are downloaded on the fly. This also includes downloading a mapping file that is used to resolve IDs in the main file (root entity) and map them onto names. The basic functionality works: the supplier_name is set for each document. However, the mapping file is downloaded with every iteration of the root entity. In order to avoid this, and to have it downloaded only once and the mapping cached, I have set the cacheKey and cacheLookup properties, but the file is still requested over and over again.

Has anyone worked with multiple different XML files with mappings loaded via different DIH entities? I'd appreciate any samples or hints. Or maybe someone is able to spot the error in the following configuration? (The custom DataSource is a subclass of URLDataSource and handles Basic Auth as well as decompression.)
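[The configuration referenced above did not survive archiving. As a point of comparison, here is a minimal sketch of a cached XPath mapping entity; all entity, field, and URL names are hypothetical:]

<entity name="doc" processor="XPathEntityProcessor"
        url="${dataimporter.request.importfile}" forEach="/docs/doc">
  <field column="supplier_id" xpath="/docs/doc/supplier_id"/>
  <!-- child entity: fetched once on the first row, then served from the cache -->
  <entity name="supplier" processor="XPathEntityProcessor"
          dataSource="mappingSource"
          url="http://example.com/supplier-mapping.xml"
          forEach="/suppliers/supplier"
          cacheImpl="SortedMapBackedCache"
          cacheKey="id"
          cacheLookup="doc.supplier_id">
    <field column="id" xpath="/suppliers/supplier/id"/>
    <field column="supplier_name" xpath="/suppliers/supplier/name"/>
  </entity>
</entity>

Note that from Solr 3.6 on, entity caching is enabled by the cacheImpl attribute; cacheKey and cacheLookup alone do not switch the cache on, so checking for a missing cacheImpl is a good first step when the mapping URL is fetched per row.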
Re: Nested JSON Facets (Subfacets)
> Interesting. I don't recall a bug like that being fixed.
> Anyway, glad it works for you now!
> -Yonik

Then it's probably because it's Christmas time! :-)
Re: Nested JSON Facets (Subfacets)
Hi Yonik,

after upgrading to Solr 6.3.0, the nested function works as expected! (Both with and without docValues.)

"facets":{
  "count":3179500,
  "all_pop":1.5901646171168616E8,
  "shop_cat":{
    "buckets":[{
        "val":"Kontaktlinsen > Torische Linsen",
        "count":75168,
        "cat_sum":3752665.0497611803},

Thanks,
Chantal

> On 15.12.2016 at 16:00, Chantal Ackermann <c.ackerm...@it-agenten.com> wrote:
>
> Hi Yonik,
>
> are you certain that nesting a function works as documented on
> http://yonik.com/solr-subfacets/?
>
> top_authors:{
>   type: terms,
>   field: author,
>   limit: 7,
>   sort: "revenue desc",
>   facet:{
>     revenue: "sum(sales)"
>   }
> }
>
> I'm getting the feeling that the function is never really executed because,
> for my index, calling sum() with a non-number field (e.g. a multi-valued
> string field) throws an error when *not nested* but does *not throw an error*
> when nested:
>
> json.facet={all_pop: "sum(gtin)"}
>
> "error":{
>   "trace":"java.lang.UnsupportedOperationException
>     at org.apache.lucene.queries.function.FunctionValues.doubleVal(FunctionValues.java:47)
>
> And the following does not throw an error but definitely should if the
> function were executed:
>
> json.facet={all_pop:"sum(popularity)",shop_cat: {type:terms,
>   field:shop_cat, facet: {cat_pop:"sum(gtin)"}}}
>
> returns:
>
> "facets":{
>   "count":2815500,
>   "all_pop":1.4065865823321116E8,
>   "shop_cat":{
>     "buckets":[{
>         "val":"Kontaktlinsen > Torische Linsen",
>         "count":75168,
>         "cat_pop":0.0},
>       {
>         "val":"Damen-Mode/Inspirationen",
>         "count":47053,
>         "cat_pop":0.0},
>
> For completeness: here is the field directive for "gtin" with
> "text_noleadzero" based on "solr.TextField":
>
> <field name="gtin" type="text_noleadzero" required="false" multiValued="true"/>
>
> Is this a bug, or is the documentation a glimpse of the future? I will try
> version 6.3.0 now. I was still on 6.1.0 for the above tests.
> (I have also tried the "avg" function, just to make sure, but same result there.)
>
> Cheers,
> Chantal
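[A request of the shape below would produce that output; this is a hedged reconstruction, since the archived mail no longer contains the exact call, and the summed field is assumed to be popularity as in the earlier mails:]

json.facet={all_pop:"sum(popularity)", shop_cat:{type:terms, field:shop_cat, facet:{cat_sum:"sum(popularity)"}}}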
Re: Nested JSON Facets (Subfacets)
Hi Yonik,

are you certain that nesting a function works as documented on http://yonik.com/solr-subfacets/?

top_authors:{
  type: terms,
  field: author,
  limit: 7,
  sort: "revenue desc",
  facet:{
    revenue: "sum(sales)"
  }
}

I'm getting the feeling that the function is never really executed because, for my index, calling sum() with a non-number field (e.g. a multi-valued string field) throws an error when *not nested* but does *not throw an error* when nested:

json.facet={all_pop: "sum(gtin)"}

"error":{
  "trace":"java.lang.UnsupportedOperationException
    at org.apache.lucene.queries.function.FunctionValues.doubleVal(FunctionValues.java:47)

And the following does not throw an error but definitely should if the function were executed:

json.facet={all_pop:"sum(popularity)",shop_cat: {type:terms, field:shop_cat, facet: {cat_pop:"sum(gtin)"}}}

returns:

"facets":{
  "count":2815500,
  "all_pop":1.4065865823321116E8,
  "shop_cat":{
    "buckets":[{
        "val":"Kontaktlinsen > Torische Linsen",
        "count":75168,
        "cat_pop":0.0},
      {
        "val":"Damen-Mode/Inspirationen",
        "count":47053,
        "cat_pop":0.0},

For completeness: here is the field directive for "gtin" with "text_noleadzero" based on "solr.TextField":

<field name="gtin" type="text_noleadzero" required="false" multiValued="false" docValues="true"/>

> I have also re-indexed (removed data/ and indexed from scratch). The
> popularity field is populated with random values (as I don't have the real
> values from production), meaning that all documents have values > 0.
>
> Here is the statistics output:
>
> "stats":{
>   "stats_fields":{
>     "popularity":{
>       "min":7.952374289743602E-5,
>       "max":99.3896484375,
>       "count":1687500,
>       "missing":0,
>       "sum":8.436878611434968E7,
>       "sumOfSquares":5.626142812197906E9,
>       "mean":49.9963176973924,
>       "stddev":28.885623872869992},
>
> And this is the relevant facet output from calling
>
> /solr//query?json.facet={
>   num_pop:{query: "popularity[* TO *]"},
>   all_pop: "sum(popularity)",
>   shop_cat: {type:terms, field:shop_cat, facet: {cat_pop: "sum(popularity)"}}}&q=*:*&rows=1&stats.field=popularity&wt=json
>
> "facets":{
>   "count":1687500,
>   "all_pop":1.5893775613258794E8,
>   "num_pop":{
>     "count":1687500},
>   "shop_cat":{
>     "buckets":[{
>         "val":"Kontaktlinsen > Torische Linsen",
>         "count":75168,
>         "cat_pop":0.0},
>       {
>         "val":"Neu",
>         "count":31772,
>         "cat_pop":0.0},
>       {
>         "val":"Gesundheit & Schönheit > Gesundheitspflege",
>         "count":20281,
>         "cat_pop":0.0},
> [… more facets omitted]
>
> The /query handler is an edismax configuration, though I don't think this
> matters as long as the results include documents with popularity > 0, which
> is the case as seen in the facet output (and sum() works in general for all
> of the documents, just not for the buckets, as seen in "all_pop").
>
> I will try to explicitly turn off the docValues and add stored="true" just
> to try out more. If someone has any other suggestions that I should try, I
> would be glad to hear them. If it is not possible to retrieve the sum in
> this way, I would have to fetch each sum separately, which would be a huge
> performance impact.
>
> Thanks!
> Chantal
>
>> On 15.12.2016 at 10:16, CA wrote:
>>
>>> num_pop:{query:"popularity:[* TO *]"}
Re: Nested JSON Facets (Subfacets)
Hi Yonik,

here is an update on what I've tried so far, unfortunately without any more luck. The field directive is (I should have included this when asking the question):

[…]

/query?json.facet={
  num_pop:{query: "popularity[* TO *]"},
  all_pop: "sum(popularity)",
  shop_cat: {type:terms, field:shop_cat, facet: {cat_pop: "sum(popularity)"}}}&q=*:*&rows=1&stats.field=popularity&wt=json

"facets":{
  "count":1687500,
  "all_pop":1.5893775613258794E8,
  "num_pop":{
    "count":1687500},
  "shop_cat":{
    "buckets":[{
        "val":"Kontaktlinsen > Torische Linsen",
        "count":75168,
        "cat_pop":0.0},
      {
        "val":"Neu",
        "count":31772,
        "cat_pop":0.0},
      {
        "val":"Gesundheit & Schönheit > Gesundheitspflege",
        "count":20281,
        "cat_pop":0.0},
[… more facets omitted]

The /query handler is an edismax configuration, though I don't think this matters as long as the results include documents with popularity > 0, which is the case as seen in the facet output (and sum() works in general for all of the documents, just not for the buckets, as seen in "all_pop").

I will try to explicitly turn off the docValues and add stored="true" just to try out more. If someone has any other suggestions that I should try, I would be glad to hear them. If it is not possible to retrieve the sum in this way, I would have to fetch each sum separately, which would be a huge performance impact.

Thanks!
Chantal

> On 15.12.2016 at 10:16, CA wrote:
>
>> num_pop:{query:"popularity:[* TO *]"}
Re: Blog Post: Integration Testing SOLR Index with Maven
Hi,

@Lance - thanks, it's a pleasure to give something back to the community, even if it is comparatively small. :-)

@Paul - it's definitely not 15 min but rather 2 min. Actually, the testing part of this setup is very regular compared to other Maven projects. The copying of the WAR file and repackaging is not that time consuming. (This is still Maven - widely used and proven - it wouldn't be if it weren't practical, would it?)

Cheers,
Chantal
Blog Post: Integration Testing SOLR Index with Maven
Hi all,

this is not a question. I just wanted to announce that I've written a blog post on how to set up Maven for packaging and automatic testing of a SOLR index configuration:

http://blog.it-agenten.com/2013/03/integration-testing-your-solr-index-with-maven/

Feedback or comments appreciated! And again, thanks for that great piece of software.

Chantal
Re: Blog Post: Integration Testing SOLR Index with Maven
Hi Paul,

I'm sorry I cannot provide you with any numbers. I also doubt it would be wise to post any, as I think the speed depends highly on what you are doing in your integration tests.

Say you have several request handlers that you want to test (on different cores), and some more complex use cases, like using output from one request handler as input to others. You would also import test data that is representative enough to exercise these request handlers and use cases. The requests themselves, of course, only take as long as SolrJ takes to run and SOLR takes to answer them. In addition, there is the overhead of Maven starting up, running all the plugins, importing the data, and executing the tests. Well, Maven is certainly not the fastest tool to start up and get going…

If you are asking because you want to run rather a lot of requests and test their output - JMeter might be preferable?

Hope that was not too vague an answer,
Chantal

On 14.03.2013 at 09:51, Paul Libbrecht wrote:

> Nice, Chantal,
>
> can you indicate there or here what kind of speed for integration tests you've reached with this, from a bare source to a successfully tested application? (e.g. with 100 documents)
>
> thanks in advance
>
> Paul
>
> On 14 March 2013, at 09:29, Chantal Ackermann wrote:
>
>> Hi all,
>> this is not a question. I just wanted to announce that I've written a blog post on how to set up Maven for packaging and automatic testing of a SOLR index configuration.
>> http://blog.it-agenten.com/2013/03/integration-testing-your-solr-index-with-maven/
>> Feedback or comments appreciated! And again, thanks for that great piece of software.
>> Chantal
Re: Antwort: Re: Antwort: Re: Query during a query
Hi Johannes,

in production, SOLR is best run as a backend service to your actual web application, very much like a database:

Client (Browser) --- Web App --- Solr Server

The processes are implemented in your Web App, and when they require results from Solr for whatever reason, they simply query it.

Chantal

On 03.09.2012 at 06:48, johannes.schwendin...@blum.com wrote:

> The problem is that I don't know how to do this. :P
>
> My sequence: the user enters his search words. This is sent to Solr. There I need to make another query first, to get metadata from the index. With this metadata I have to connect to an external source to get some information about the user. With this information and the first search words I then query the Solr index to get the search result.
>
> I hope it's clear now where my problem is and what I want to do.
>
> Regards,
> Johannes
>
> From: Jack Krupansky j...@basetechnology.com
> To: solr-user@lucene.apache.org
> Date: 31.08.2012 15:03
> Subject: Re: Antwort: Re: Query during a query
>
> So, just do another query before doing the main query. What's the problem? Be more specific. Walk us through the sequence of processing that you need.
>
> -- Jack Krupansky
>
> -----Original Message-----
> From: johannes.schwendin...@blum.com
> Sent: Friday, August 31, 2012 1:52 AM
> To: solr-user@lucene.apache.org
> Subject: Antwort: Re: Query during a query
>
> Thanks for the answer, but I want to know how I can do a separate query before the main query. And I only want this data in my program. The user won't see it. I need the values from one field to get some information from an external source while the main query is executed.
>
> pravesh suyalprav...@yahoo.com wrote on 31.08.2012 07:42:48:
>
>> Did you check SOLR Field Collapsing/Grouping?
>> http://wiki.apache.org/solr/FieldCollapsing
>> If this is what you are looking for.
>>
>> Thanx
>> Pravesh
Re: Configure logging with Solr 4 on Tomcat 7
Drop the logging.properties file into the solr.war at WEB-INF/classes. See here:

http://lucidworks.lucidimagination.com/display/solr/Configuring+Logging

Section "Tomcat Logging Settings".

Cheers,
Chantal

On 27.08.2012 at 16:43, Nicholas Ding wrote:

> Hello,
>
> I've deployed Solr 4 on Tomcat 7. It is a multicore configuration, and everything seems to work fine, but I can't see any logs. How do I enable logging?
>
> Thanks
> Nicholas
Re: Solr Index problem
> Are you committing? You have to commit for them to be actually added…

If DIH says it did not add any documents ("added 0 documents"), committing won't help. Likely, there is a problem with the mapping between DIH and the schema, so that none of the fields makes it into the index. We would need the DIH and the schema file, as Andy pointed out already.

Cheers,
Chantal

> -----Original Message-----
> From: ranmatrix S [mailto:ranmat...@gmail.com]
> Sent: Thursday, August 23, 2012 5:46 PM
> To: solr-user@lucene.apache.org
> Subject: Solr Index problem
>
> Hi,
>
> I have set up Solr to index data from an Oracle DB through the DIH handler. Through the Solr admin I can see that the DB connection is successful and data is retrieved from the DB into Solr, but it is not added into the index. The message is that 0 documents were added, even though I am able to see that 9 records are returned.
>
> The schema and the fields in db-data-config.xml are one and the same.
>
> Please suggest anything I should look for.
>
> --
> Regards,
> Ran...
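[A quick way to rule the mapping out is to declare it explicitly. A minimal sketch, with a hypothetical table holding ID and TITLE columns; note that Oracle typically reports column names in uppercase, so the column attribute must match what the driver returns:]

<entity name="item" query="select id, title from items">
  <field column="ID" name="id"/>
  <field column="TITLE" name="title"/>
</entity>

with the corresponding fields in schema.xml:

<field name="id" type="string" indexed="true" stored="true" required="true"/>
<field name="title" type="text_general" indexed="true" stored="true"/>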
Re: Debugging DIH
> I don't see that you have anything in the DIH that tells what columns
> from the query go into which fields in the index. You need something
> like
>
> <field name="location" column="location" />
> <field name="amount" column="amount" />
> <field name="when" column="when" />

That is not completely true. If the columns have the same names as the fields, the mapping is redundant.

Nevertheless, it might be the problem. What I've experienced with Oracle, at least, is that the columns would be returned in uppercase even if my alias were in lowercase. You might force it by adding quotes, though. Or try adding:

<field name="location" column="LOCATION" />
<field name="amount" column="AMOUNT" />
<field name="when" column="WHEN" />

You might check in your preferred SQL client how the column names are returned. It might be an indicator. (At least, in my case they would be uppercase in SQL Developer.)

Cheers,
Chantal
Re: Solr contribs build and jar-of-jars
Hi Lance,

does this do what you want?

http://maven.apache.org/plugins/maven-assembly-plugin/descriptor-refs.html#jar-with-dependencies

It's Maven, but that would be an advantage, I'd say… ;-)

Chantal

On 05.08.2012 at 01:25, Lance Norskog wrote:

> Has anybody tried packaging the contrib distribution jars in the jar-of-jars format? Or merging all included jars into one super-jar? The OpenNLP contrib has a Lucene analyzer, 3 external jars, and Solr classes. Packaging this sucker is proving painful in the extreme. UIMA has the same problem.
>
> 'ant' has a task for generating the manifest class path for a jar-of-jars, and the technique actually works:
> http://ant.apache.org/manual/Tasks/manifestclasspath.html
> http://stackoverflow.com/questions/858766/generate-manifest-class-path-from-classpath-in-ant
> http://grokbase.com/t/ant/user/0213wdmn51/building-a-fileset-dynamically#20020103j47ufvwooklrovrjfdvirgohe4
>
> If this works completely, it seems like the right way to build the dist/ jars for the contribs.
>
> --
> Lance Norskog
> goks...@gmail.com
Re: How to update a core using SolrJ
Hi Roy,

the example URL is correct if your core is available under that name (configured in solr.xml) and has started without errors. I think I observed that it makes a difference whether there is a trailing slash or not (but that was a while ago, so maybe that has changed).

If you can reach that URL via browser but SolrJ with exactly the same URL cannot, then:
- maybe the SolrJ application is running in a different environment?
- there is authentication set up and you are authenticated via browser, but SolrJ does not know of it
- ...?

Some log output would definitely be helpful.

Cheers,
Chantal

On 02.08.2012 at 22:42, Benjamin, Roy wrote:

> I'm using SolrJ and CommonsHttpSolrServer. Before moving to a multi-core configuration I constructed CommonsHttpSolrServer from "http://localhost:8080/solr"; this worked fine.
>
> Now I have two cores. I have tried constructing CommonsHttpSolrServer from "http://localhost:8080/solr/core0", but this does not work. The resource is not found when I try to add docs.
>
> How do I update Solr using SolrJ in a multi-core configuration? What is the correct form for the CommonsHttpSolrServer URL?
>
> Thanks!
> Roy
Re: matching with whole field
Hi Elisabeth,

try adding the same tokenizer chain for query as well, or simply remove the type="index" from the analyzer element. Your chain is analyzing the input of the indexer, removing diacritics and lowercasing. With your current setup, the input to the search is not analyzed likewise, so inputs that are not lowercased or that contain diacritics will not match.

You might want to use the analysis frontend in the Admin UI to see how input to the indexer and the searcher is transformed and matched.

Cheers,
Chantal

On 02.08.2012 at 09:56, elisabeth benoit wrote:

> Hello,
>
> I am using Solr 3.4. I'm trying to define a type that can only be matched if the request contains exactly the same words. Let's say I have two different values for ONLY_EXACT_MATCH_FIELD:
>
> ONLY_EXACT_MATCH_FIELD: salon de coiffure
> ONLY_EXACT_MATCH_FIELD: salon de coiffure pour femmes
>
> I would like to match only the first one when requesting Solr with
>
> fq=ONLY_EXACT_MATCH_FIELD:(salon de coiffure)
>
> As far as I understood, the solution is to not tokenize on white spaces and to use solr.KeywordTokenizerFactory instead. My actual type is defined as follows in schema.xml:
>
> <fieldType name="ONLY_EXACT_MATCH_FIELD" class="solr.TextField" omitNorms="true" positionIncrementGap="100">
>   <analyzer type="index">
>     <tokenizer class="solr.KeywordTokenizerFactory"/>
>     <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
>     <filter class="solr.ISOLatin1AccentFilterFactory"/>
>     <filter class="solr.StandardFilterFactory"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <filter class="solr.LengthFilterFactory" min="1" max="100" />
>   </analyzer>
> </fieldType>
>
> But matching fields with more than one word doesn't work. Does someone have a clue what I am doing wrong?
>
> Thanks,
> Elisabeth
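[A sketch of the corrected type, applying one analyzer to both index and query time, under the assumption that the same normalization is wanted on both sides:]

<fieldType name="ONLY_EXACT_MATCH_FIELD" class="solr.TextField" omitNorms="true" positionIncrementGap="100">
  <!-- no type attribute: this analyzer is used for indexing AND querying -->
  <analyzer>
    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.TrimFilterFactory"/>
  </analyzer>
</fieldType>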
Re: Solr upgrade from 1.4 to 3.6
Hi Kalyan,

that is because SolrJ uses javabin as its format, which has class version numbers in the serialized objects that do not match. Set the format to XML (wt parameter) and it will work (maybe JSON would as well).

Chantal

On 31.07.2012 at 20:50, Manepalli, Kalyan wrote:

> Hi all,
> We are trying to upgrade our Solr instance from 1.4 to 3.6. We use the SolrJ API to fetch the data from the index. We see that the SolrJ 3.6 version is not compatible with an index generated with 1.4. Is this a known issue, and is there a workaround for it?
>
> Thanks,
> Kalyan Manepalli
Re: Rebuild index after database change
Hi Rodrigo,

the data will only show in SOLR if the index is built after the data has been committed to the database it reads from. If the data does not show up in the index, there could be several reasons:

a) different database
b) permissions prevent the data from being visible (I think this unlikely, as it does not seem from your description that the problem is restricted to certain tables/columns that are not seen at all)
c) the data inserts and updates have not been committed when the data is being requested by SolrJ
d) SolrJ requests the data before the new data has been inserted/updated and committed

(well, c) and d) are quite similar, but in essence could be different)

It might be difficult to start SolrJ with a cron job if the database is updated at irregular times. Better might be to trigger the indexer (the SolrJ job that updates the SOLR index) either:

- from the database: you would have to check how the indexer is started, and call that script from the database via trigger or callback or similar. This obviously depends on the possibilities your DB offers, and it might also not be the best option if there are several DB instances and no defined master.
- via polling: an often-running cron job that checks whether new data has been imported and, if so, starts the indexer.

Hope this was of some help. If not, you might have to provide more details on how the indexer is started at the moment.

Cheers,
Chantal

On 31.07.2012 at 04:18, Rodrigo P. Bregalanti wrote:

> Hello,
>
> I am working on a data warehouse project which is making huge modifications directly at the database level. It is working fine, but there is a third-party application reading data from one of these databases. This application is using Solrj (with embedded server), and this results in a big issue: the new data inserted directly into the database is not being shown by this application.
>
> I have researched a lot around that, but didn't find any way to make this new data available to this particular third-party application. Is that something possible to do? Has someone faced this kind of issue before?
>
> Please let me know if I should add some additional detail. Thanks in advance.
>
> Best regards.
Re: Rebuild index after database change
Hi Rodrigo,

as I understand it, you know where the index is located. If you can also find out where the configuration files are - if they are simply accessible on the file system - you could start a regular SOLR server that uses that config directory as SOLR_HOME (it will automatically use the correct data dir, or you can make sure it does by providing that data directory when you start the server).

To find the config files, you would be looking for these files, ideally in the following directory structure:

single index:
- conf/schema.xml
- conf/solrconfig.xml

multiple indexes (cores):
- solr.xml
- core1/
  - conf/schema.xml
  - conf/solrconfig.xml
- more core dirs …

There is no dedicated wiki page describing the structure, but this one might help: http://wiki.apache.org/solr/CoreAdmin

Also, the examples in the sources mostly reflect the conventional structure, except that the directory "collection1" would be omitted:
https://builds.apache.org/job/Solr-trunk/ws/checkout/solr/example/solr/collection1/conf/
SOLR_HOME in this case is solr/, which contains conf/.

If they are structured in that way, you could simply start a regular solr.war as described in the SOLR documentation (e.g. via Jetty: http://wiki.apache.org/solr/SolrJetty), pointing it to that directory as SOLR_HOME. You would then have a running SOLR server that you could update the way you suggested in your previous response.

When in doubt about the configuration, include the content of the dataDir element in solrconfig.xml in your next posting. Also, describe the structure of the config directory and which files it includes; it might help. Or describe how the application initializes the embedded server.

If the files are not located on the file system but somewhere inside the application where you cannot easily reference them when starting Jetty, or if the server is configured programmatically, you would have to create your own configuration directory by copying/creating the files in the expected structure. The dataDir entry in solrconfig.xml has to point to the same data directory as the embedded Solr server uses.

Phew, I hope I was clear enough. Please ask if not.

Cheers,
Chantal

On 31.07.2012 at 11:23, Rodrigo P. Bregalanti wrote:

> Hi Chantal, thanks for replying. It is very helpful, and I think I am on the right path.
>
> As the database is not changed during the night, my idea is to add a cron job to re-index at that time. The main problem is that there is no separate service indexing the data. The application is using Java+Grails and a Grails plugin for Solr, which integrates with the Grails domain classes. When a domain class is saved through the application, this plugin adds/removes it from the index automatically.
>
> After some research I noticed that if there were a separate Solr server, I could post a call over HTTP to the Solr server, but the application is using an embedded server. I have found in the file system where the Solr data and indexes are being saved, but I do not know whether Solrj has some utility class that could be called from the command line using those files, in a cron job schedule. Do you know if that is possible?
>
> Best regards.
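[A minimal sketch of the two files involved, assuming legacy-style (3.x) configuration, a single core named "core1", and an illustrative data path:]

solr.xml:

<solr persistent="true">
  <cores adminPath="/admin/cores">
    <core name="core1" instanceDir="core1" />
  </cores>
</solr>

core1/conf/solrconfig.xml, pointing at the same index files the embedded server writes:

<dataDir>/var/app/solr-data/core1</dataDir>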
Re: Skip first word
Hi Simone,

no, I meant that you populate the two fields with the same input - best done via a copyField directive. The first field will contain ngrams of size 1 and 2. The other field will contain ngrams of size 3 and longer (you might want to set a decent max size there).

The query for the autocomplete list uses the first field when the input (typed in by the user) is one or two characters long. Your example was: "D", "G", or "Do", "Ga". Such a query would search only the single-token field, which for the input "Dolce & Gabbana" contains only the ngrams "D" and "Do". So, only the input "D" or "Do" would result in a hit on "Dolce & Gabbana".

Once the user has typed in the third letter - "Dol" or "Gab" - you query the second, more tokenized field, which for "Dolce & Gabbana" contains the ngrams "Dol", "Dolc", "Dolce", "Gab", "Gabb", "Gabba", etc. Both inputs "Gab" and "Dol" would then return "Dolce & Gabbana".

1. First field type:

<tokenizer class="solr.KeywordTokenizerFactory"/>
<filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="2" side="front"/>

2. Second field type:

<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<!-- maybe add WordDelimiter etc. -->
<filter class="solr.EdgeNGramFilterFactory" minGramSize="3" maxGramSize="10" side="front"/>

3. Field declarations:

<field name="short_prefix" type="short_ngram" … />
<field name="long_prefix" type="long_ngram" … />
<copyField source="short_prefix" dest="long_prefix" />

Chantal

On 27.07.2012 at 11:05, Finotti Simone wrote:

> Hi Chantal,
>
> if I understand correctly, this implies that I have to populate different fields according to their length. Since I'm not aware of any logical condition you can apply to a copyField directive, this means that the logic has to be implemented by the process that populates the Solr core. Is this assumption correct?
>
> That's kind of bad, because I'd like to have this kind of rule in the Solr configuration. Of course, if that's the only way... :)
>
> Thank you
>
> From: Chantal Ackermann [c.ackerm...@it-agenten.com]
> Sent: Thursday, 26 July 2012 18:32
> To: solr-user@lucene.apache.org
> Subject: Re: Skip first word
>
> Hi,
>
> use two fields:
> 1. KeywordTokenizer (= single token) with ngram minsize=1 and maxsize=2 for inputs of length < 3,
> 2. the other one tokenized as appropriate, with minsize=3 and longer, for all longer inputs.
>
> Cheers,
> Chantal
>
> On 26.07.2012 at 09:05, Finotti Simone wrote:
>
>> Hi Ahmet,
>>
>> business asked me to apply EdgeNGram with minGramSize=1 on the first term and with minGramSize=3 on the later terms.
>>
>> We are developing a search suggestion mechanism. The idea is that if the user types "D", the engine should suggest "Dolce & Gabbana", but if they type "G", it should suggest other brands. Only if the user types "Gab" should it suggest "Dolce & Gabbana".
>>
>> Thanks
>> S
>
>> From: Ahmet Arslan [iori...@yahoo.com]
>> Sent: Wednesday, 25 July 2012 18:10
>> To: solr-user@lucene.apache.org
>> Subject: Re: Skip first word
>>
>>> is there a tokenizer and/or a combination of filters to remove the first term from a field? For example:
>>> "The quick brown fox" should be tokenized as: "quick", "brown", "fox"
>>
>> There is no such filter that I know of. Though, you could implement one by modifying the source code of LengthFilterFactory or StopFilterFactory. They both remove tokens.
>>
>> Out of curiosity, what is the use case for this?
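[The client-side rule tying the two fields together is then just a length switch; a sketch using the field names from the mail above:]

input of length 1-2 (e.g. "Do"):   q=short_prefix:Do
input of length >= 3 (e.g. "Gab"): q=long_prefix:Gab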
Re: Skip first word
You're welcome :-)

C
Re: querying using filter query and lots of possible values
Hi Daniel,

index the id into a field of type tint or tlong and use a range query (http://wiki.apache.org/solr/SolrQuerySyntax?highlight=%28rangequery%29):

fq=id:[200 TO 2000]

If you want to exclude certain ids, it might be wiser to simply add an exclusion query in addition to the range query instead of listing all the single values. You will run into problems with too-long request URLs. If you cannot avoid long URLs, you might want to increase maxBooleanClauses (see http://wiki.apache.org/solr/SolrConfigXml/#The_Query_Section).

Cheers,
Chantal

On 26.07.2012 at 18:01, Daniel Brügge wrote:

> Hi,
>
> I am facing the following issue: I have a couple of million documents which have a field called "source_id". My problem is that I want to retrieve all the documents which have a source_id in a specific range of values. This range can be pretty big, for example a list of 200 to 2000 source ids.
>
> I was thinking that a filter query can be used, like
>
> fq=source_id:(1 2 3 4 5 6 …)
>
> but this reminds me of SQL's WHERE IN (...), which was always a bit slow for a huge number of values.
>
> Another solution that came to my mind was to assign all the documents I want to retrieve a new kind of filter id. So all the documents which I want to analyse get a new id. But I would need to update all the millions of documents for this and assign them a new id. This could take some time.
>
> Can you think of a nicer way to solve this issue?
>
> Greetings,
> Daniel
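[For reference, the two pieces mentioned above; the excluded ids are of course only illustrative:]

fq=id:[200 TO 2000] -id:(250 317 512)

and, in the query section of solrconfig.xml:

<maxBooleanClauses>4096</maxBooleanClauses>

(1024 is the default; every id listed explicitly in a boolean OR query counts against this limit.)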
Re: Skip first word
Hi,

use two fields:

1. KeywordTokenizer (= single token) with ngram minsize=1 and maxsize=2 for inputs of length < 3,
2. the other one tokenized as appropriate, with minsize=3 and longer, for all longer inputs.

Cheers,
Chantal

On 26.07.2012 at 09:05, Finotti Simone wrote:

> Hi Ahmet,
>
> business asked me to apply EdgeNGram with minGramSize=1 on the first term and with minGramSize=3 on the later terms.
>
> We are developing a search suggestion mechanism. The idea is that if the user types "D", the engine should suggest "Dolce & Gabbana", but if they type "G", it should suggest other brands. Only if the user types "Gab" should it suggest "Dolce & Gabbana".
>
> Thanks
> S
>
> From: Ahmet Arslan [iori...@yahoo.com]
> Sent: Wednesday, 25 July 2012 18:10
> To: solr-user@lucene.apache.org
> Subject: Re: Skip first word
>
>> is there a tokenizer and/or a combination of filters to remove the first term from a field? For example:
>> "The quick brown fox" should be tokenized as: "quick", "brown", "fox"
>
> There is no such filter that I know of. Though, you could implement one by modifying the source code of LengthFilterFactory or StopFilterFactory. They both remove tokens.
>
> Out of curiosity, what is the use case for this?
Re: querying using filter query and lots of possible values
Hi Daniel,

depending on how you decide on the list of ids in the first place, you could also create a new index (core) and populate it with DIH, selecting only documents from your main index (core) in this range of ids. When updating, you could try a delta import.

Of course, this is only worth the effort if that core will exist for some time - but you've written that the subset of ids is constant for a longer time.

Just another idea on top ;-)

Chantal
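[A sketch of such a core-to-core import using DIH's SolrEntityProcessor, available from Solr 3.6 on; the core URL and id range are hypothetical:]

<dataConfig>
  <document>
    <entity name="subset"
            processor="SolrEntityProcessor"
            url="http://localhost:8983/solr/main"
            query="id:[200 TO 2000]"
            rows="500"/>
  </document>
</dataConfig>

Stored fields returned by the query are mapped onto same-named fields of the target core's schema.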
Re: SOLR 4.0-ALPHA : DIH : Indexed and Committed Successfully but Index is empty
Hi Hoss,

> Did you perhaps forget to include RunUpdateProcessorFactory at the end?

What is that? ;-) I had copied the config from http://wiki.apache.org/solr/UpdateRequestProcessor but removed the lines I thought I did not need. :-(

I've changed my configuration, and this is now WORKING (4.0-ALPHA):

<updateRequestProcessorChain name="emptyFieldChain">
  <processor class="solr.RemoveBlankFieldUpdateProcessorFactory" />
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>

<requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="update.chain">emptyFieldChain</str>
    <str name="config">data-config.xml</str>
    <str name="clean">true</str>
    <str name="commit">true</str>
    <str name="optimize">true</str>
  </lst>
</requestHandler>

LogUpdateProcessorFactory and RunUpdateProcessorFactory were missing. Once those are in place, DIH does commit and optimize, and the documents are immediately visible to the web GUI and the searchers.

It would be nice to have those two factories mentioned as required (or when you would want them in the config) in the example solrconfig.xml and the wiki:
https://builds.apache.org/job/Solr-trunk/ws/checkout/solr/example/solr/collection1/conf/solrconfig.xml
and
http://wiki.apache.org/solr/UpdateRequestProcessor

The Javadoc of RunUpdateProcessorFactory doesn't tell me that much. I can have a look at the source and try to write something wise on the wiki about it. :-D

Thanks, Hoss!
Chantal

I'm going to update the other thread with how to handle empty number fields with either 3.6.1 or 4.0:
http://lucene.472066.n3.nabble.com/NumberFormatException-while-indexing-TextField-with-LengthFilter-and-then-copying-to-tfloat-td3996250.html
Re: Invalid or unreadable WAR file : .../solr.war when starting solr 3.6.1 app on Tomcat 7 (upgrade to 7.0.29/upstream)?
Hi,

I haven't been following from the beginning but am still curious: is the war file on a shared FS?

See also:
http://www.mail-archive.com/users@tomcat.apache.org/msg79555.html
http://stackoverflow.com/questions/5493931/java-lang-illegalargumentexception-invalid-or-unreadable-war-file-error-in-op

If you have installed Tomcat via package manager, you might want to install it directly by simply unpacking the apache-tomcat-{version}.tar.gz and copying the solr.war file into the webapps/ subdirectory.

What the answers in the stackoverflow thread suggest is packaging something into the solr.war. You could add the logging.properties file (JULI config) under WEB-INF/classes/ - I would recommend that anyway. (I never had problems with a clean solr.war in Tomcat (5, 6, 7), though.)

Chantal

On 24.07.2012 at 19:50, Chris Hostetter wrote:

> : I removed distro packaged Tomcat from the equation, ...
> : replacing it with an upstream instance ...
> : Repeating the process, at attempt to 'start' the /solr webapp, there's
> : no change. I still get ...
> : java.lang.IllegalArgumentException: Invalid or unreadable WAR
> : file : /srv/solr_home/solr.war
>
> Are you sure you didn't accidentally corrupt the war file in some way? What is the md5 or sha1sum of the war file you have? Does "jar tf solr.war" give you any errors?
>
> I just used the following steps (basically the same as yours, just different paths) and got Solr running in Tomcat 7.0.29 with no problems:
>
> hossman@frisbee:/var/tmp$ ls -al
> total 110188
> drwxrwxrwt  2 root    root         4096 Jul 24 10:37 .
> drwxr-xr-x 13 root    root         4096 Jul 18 09:34 ..
> -rw-rw-r--  1 hossman hossman 105132366 Jul 24 10:29 apache-solr-4.0.0-ALPHA.tgz
> -rw-rw-r--  1 hossman hossman   7679160 Jul  3 04:25 apache-tomcat-7.0.29.tar.gz
> -rw-rw-r--  1 hossman hossman       183 Jul 24 10:29 solr-context-file.xml
> hossman@frisbee:/var/tmp$ tar -xzf apache-solr-4.0.0-ALPHA.tgz
> hossman@frisbee:/var/tmp$ tar -xzf apache-tomcat-7.0.29.tar.gz
> hossman@frisbee:/var/tmp$ cp -r apache-solr-4.0.0-ALPHA/example/solr solr-home
> hossman@frisbee:/var/tmp$ cp apache-solr-4.0.0-ALPHA/dist/apache-solr-4.0.0-ALPHA.war solr.war
> hossman@frisbee:/var/tmp$ sha1sum solr.war
> 51c9e4bf6f022ea3873ee315eb08a96687e71964  solr.war
> hossman@frisbee:/var/tmp$ md5sum solr.war
> a154197657bb5cb9cee13fb5cfca931b  solr.war
> hossman@frisbee:/var/tmp$ cat solr-context-file.xml
> <Context docBase="/var/tmp/solr.war" debug="0" crossContext="true">
>   <Environment name="solr/home" type="java.lang.String" value="/var/tmp/solr-home" override="true" />
> </Context>
> hossman@frisbee:/var/tmp$ mkdir -p apache-tomcat-7.0.29/conf/Catalina/localhost/
> hossman@frisbee:/var/tmp$ cp solr-context-file.xml apache-tomcat-7.0.29/conf/Catalina/localhost/solr.xml
> hossman@frisbee:/var/tmp$ ./apache-tomcat-7.0.29/bin/catalina.sh start
> Using CATALINA_BASE:   /var/tmp/apache-tomcat-7.0.29
> Using CATALINA_HOME:   /var/tmp/apache-tomcat-7.0.29
> Using CATALINA_TMPDIR: /var/tmp/apache-tomcat-7.0.29/temp
> Using JRE_HOME:        /usr/lib/jvm/java-6-openjdk-amd64/
> Using CLASSPATH:       /var/tmp/apache-tomcat-7.0.29/bin/bootstrap.jar:/var/tmp/apache-tomcat-7.0.29/bin/tomcat-juli.jar
> hossman@frisbee:/var/tmp$
>
> ...and now Solr is up and running on http://localhost:8080/solr/ and there are no errors in the logs.
>
> -Hoss
Re: NumberFormatException while indexing TextField with LengthFilter and then copying to tfloat
Here are the working solutions.

For 3.6.1 (or lower, probably), via a ScriptTransformer in data-config.xml:

function prepareData(row) {
    var cols = new java.util.ArrayList();
    cols.add("spent_hours");
    cols.add("estimated_hours");
    cols.add("story_points");
    cols.add("pos");
    for (var i = 0; i < cols.size(); i++) {
        var no = row.get(cols.get(i));
        if (no != null && no.trim().length() == 0) {
            row.remove(cols.get(i));
        }
    }
    return row;
}

In the XPathEntityProcessor, add the ScriptTransformer:

transformer="script:prepareData,…"

XPaths:

<field column="spent_hours" xpath="/issues/issue/spent_hours" />
<field column="estimated_hours" xpath="/issues/issue/estimated_hours" />
<field column="story_points" xpath="/issues/issue/story_points" />
<field column="pos" xpath="/issues/issue/position" />

All of these fields are of type tfloat, required="false". They will only get a value if it is not empty or null.

For 4.0-ALPHA, no ScriptTransformer is required: XPaths as above, same field type, required="false". In the DataImportHandler configuration section in solrconfig.xml, specify:

<updateRequestProcessorChain name="emptyFieldChain">
  <processor class="solr.RemoveBlankFieldUpdateProcessorFactory" />
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>

<requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="update.chain">emptyFieldChain</str>
    <str name="config">data-config.xml</str>
    <str name="clean">true</str>
    <str name="commit">true</str>
    <str name="optimize">true</str>
  </lst>
</requestHandler>
Re: Autocomplete terms from the middle of name/description of a Doc
Hi Ugo,

you can use facet.prefix on a tokenized field instead of a string field. Example:

<field name="product" type="string" … />
<field name="product_tokens" type="text_split" … />
<!-- use e.g. WhitespaceTokenizer or WordDelimiter and others; see the example schema.xml that comes with SOLR -->

facet.prefix on "product" will only return hits that match the start of the single token stored in that field. As "product_tokens" contains the value of "product" tokenized in a fashion that suits you, it can contain multiple tokens. facet.prefix on "product_tokens" will return hits that match *any* of these tokens - which is what you want.

Chantal

On 25.07.2012 at 15:29, Ugo Matrangolo wrote:

> Hi,
>
> I'm working on making our autocomplete engine a bit smarter. The current impl is a basic facet-based autocompletion as described in the 'SOLR 3 Enterprise Search' book: we use all the typed tokens except the last one to build a facet.prefix query on an autocomplete facet field we built at index time.
>
> This allows us to have something like:
>
> 'espress' --> '*espress*o machine', '*espress*o maker', etc.
>
> We want something like:
>
> 'espress' --> '*espress*o machine', '*espress*o maker', 'kMix *espress*o maker'
>
> Note that the last suggested term cannot be obtained by querying on the facet prefix as we do now. What we need is a way to find the 'espress' string in the middle of the name/description of our products.
>
> Any suggestions?
>
> Cheers,
> Ugo
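[A sketch of the request this setup implies, with the field names from the example above:]

/select?q=*:*&rows=0&facet=true&facet.field=product_tokens&facet.prefix=espresso&facet.mincount=1&facet.limit=10

Note that facet.prefix matches against the indexed tokens as-is (it is not analyzed), so the prefix should be normalized (e.g. lowercased) the same way the field's analyzer normalizes its tokens.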
Re: Autocomplete terms from the middle of name/description of a Doc
> Suppose I have a product with a title='kMix Espresso maker'. If I tokenize this and put the result in product_tokens, I should get '[kMix][Espresso][maker]'. If I now try to search with facet.field='product_tokens' and facet.prefix='espresso', I should get only 'espresso' while I want 'kMix Espresso maker'.

Yes, you are probably right. I did use this approach at some point. Your remark has made me check my code again: I was using ngrams in the end. (facet.prefix on tokenized fields might work in certain circumstances where you can get the actual value from the string field (or its facet) in parallel.)

This is the jQuery autocomplete plugin instantiation:

$(function() {
  $("#qterm").autocomplete({
    minLength: 1,
    source: function(request, response) {
      jQuery.ajax({
        url: "/solr/select",
        dataType: "json",
        data: {
          q: "title_ngrams:\"" + request.term + "\"",
          rows: 0,
          facet: true,
          "facet.field": "title",
          "facet.mincount": 1,
          "facet.sort": "index",
          "facet.limit": 10,
          fq: "end_date:[NOW TO *]",
          wt: "json"
        },
        success: function(data) {
          /* var result = jQuery.map(data.facet_counts.facet_fields.title,
               function(item, index) {
                 if (index % 2) return null;
                 else return { //label: item,
                   value: item };
               }); */
          var result = [];
          var facets = data.facet_counts.facet_fields.title;
          var j = 0;
          for (var i = 0; i < facets.length; i = i + 2) {
            result[j] = facets[i];
            j = j + 1;
          }
          response(result);
        }
      });
    }
  });
});

And here is the field type "ngram" for title_ngrams ("title" is a string type field):

<!-- NGram configuration for searching for word parts without the use of wildcards.
     This is for suggesting search terms, e.g. sourcing an autocomplete widget. -->
<fieldType name="ngram" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory" />
    <filter class="solr.LengthFilterFactory" min="1" max="500" />
    <filter class="solr.TrimFilterFactory" />
    <filter class="solr.ISOLatin1AccentFilterFactory" />
    <filter class="solr.WordDelimiterFilterFactory" splitOnCaseChange="1" splitOnNumerics="1" stemEnglishPossessive="1" generateWordParts="1" generateNumberParts="1" catenateAll="1" preserveOriginal="1" />
    <filter class="solr.LowerCaseFilterFactory" />
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="15" side="front"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory" />
    <filter class="solr.TrimFilterFactory" />
    <filter class="solr.ISOLatin1AccentFilterFactory" />
    <filter class="solr.WordDelimiterFilterFactory" splitOnCaseChange="1" splitOnNumerics="1" stemEnglishPossessive="1" generateWordParts="1" generateNumberParts="1" catenateAll="0" preserveOriginal="1" />
    <filter class="solr.LowerCaseFilterFactory" />
  </analyzer>
</fieldType>

Hope this one gets you going…
Chantal
Re: NumberFormatException while indexing TextField with LengthFilter and then copying to tfloat
Hi Hoss,

thank you for the quick response and the explanations!

> My suggestion would be to modify the XPath expression you are using to pull
> data out of your original XML files and ignore <estimated_hours/>

I don't think this is possible. That would require text() in the XPath, which is not handled by the XPathRecordReader. I've checked the code as well, and the JavaDoc does not list this possibility. I've tried these patterns:

/issues/issue/estimated_hours[text()]
/issues/issue/estimated_hours/text()

No value at all will be added for that field for any of the documents (including those that do have a value in the XML).

> Alternatively: there are some new UpdateProcessors available in 4.0 that let
> you easily prune field values based on various criteria (update processors
> happen well before copyField)...
> http://lucene.apache.org/solr/api-4_0_0-ALPHA/org/apache/solr/update/processor/RemoveBlankFieldUpdateProcessorFactory.html

Thanks for pointing me to it. I've switched to 4.0.0-ALPHA (hoping the ALPHA doesn't show itself too often ;-) ). For anyone interested, my DataImportHandler setup in solrconfig.xml now reads:

<updateRequestProcessorChain name="emptyFieldChain">
  <processor class="solr.RemoveBlankFieldUpdateProcessorFactory" />
</updateRequestProcessorChain>

<requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="update.chain">emptyFieldChain</str>
    <str name="config">data-config.xml</str>
    <str name="clean">true</str>
    <str name="commit">true</str>
    <str name="optimize">true</str>
  </lst>
</requestHandler>

Works as expected! And kudos to those working on the admin frontend as well - the new admin is indeed slick!

> But i can certainly understand the confusion, i've opened SOLR-3657 to try
> and improve on this. Ideally the error message should make it clear that the
> value from the "source" field was copied to the "dest" field, which then
> encountered an error.

Thank you! Good exception messages are certainly helpful!

Chantal
SOLR 4.0-ALPHA : DIH : Indexed and Committed Successfully but Index is empty
Hi there,

sorry for the length - it is mostly (really) log output. The basic issue is reflected in the subject: DIH runs fine, but even with an extra optimize on top (which should not be necessary given my DIH config), the index remains empty.

(I have changed from 3.6.1 to 4.0-ALPHA because of Hoss' answer to my question "NumberFormatException while indexing TextField with LengthFilter" on this same list. I had an index set up with 4.0-ALPHA today, and I could verify that Hoss' suggestion works. But now I seem not to be able to get that index filled yet another time. SOLR runs inside Jetty, which is started via "mvn jetty:run-war". SOLR_HOME is set to a subdirectory of Maven's target dir. I have been using this setup successfully with SOLR 3.* for some time now. While configuring the index, I often do a "mvn clean; mvn jetty:run-war", so SOLR_HOME including the index is completely removed and recreated from scratch.)

After running a full import of DIH on core "issues" using:

http://localhost:9090/solr/issues/dataimport?command=full-import&importfile=/absolute/path/to/issues.xml

I get the response:

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">1</int>
  </lst>
  <lst name="initArgs">
    <lst name="defaults">
      <str name="update.chain">emptyFieldChain</str>
      <str name="config">data-config.xml</str>
      <str name="clean">true</str>
      <str name="commit">true</str>
      <str name="optimize">true</str>
    </lst>
  </lst>
  <str name="status">idle</str>
  <str name="importResponse"/>
  <lst name="statusMessages">
    <str name="Total Requests made to DataSource">0</str>
    <str name="Total Rows Fetched">294</str>
    <str name="Total Documents Skipped">0</str>
    <str name="Full Dump Started">2012-07-24 15:46:27</str>
    <str name="">Indexing completed. Added/Updated: 294 documents. Deleted 0 documents.</str>
    <str name="Committed">2012-07-24 15:46:28</str>
    <str name="Optimized">2012-07-24 15:46:28</str>
    <str name="Total Documents Processed">294</str>
    <str name="Time taken">0:0:0.605</str>
  </lst>
  <str name="WARNING">This response format is experimental. It is likely to change in the future.</str>
</response>

This means everything went fine, including commit and optimize, and the index should now contain 294 documents. Well, it doesn't.

Trying to get it working again, I have now replaced large parts of my solrconfig.xml with the new parts taken from the current 4.0-ALPHA (https://builds.apache.org/job/Solr-trunk/ws/checkout/), but this doesn't change a thing. The schema version is set to 1.5.

When starting the server, it outputs:

24.07.2012 16:00:16 org.apache.solr.core.SolrCore init
INFO: [issues] Opening new SolrCore at target/classes/core_issues/, dataDir=target/classes/core_issues/data/
…
24.07.2012 16:00:16 org.apache.solr.core.SolrCore getNewIndexDir
WARNUNG: New index directory detected: old=null new=target/classes/core_issues/data/index/
24.07.2012 16:00:16 org.apache.solr.core.SolrCore initIndex
WARNUNG: [issues] Solr index directory 'target/classes/core_issues/data/index' doesn't exist. Creating new index...
24.07.2012 16:00:16 org.apache.solr.core.SolrDeletionPolicy onCommit
INFO: SolrDeletionPolicy.onCommit: commits:num=1
  commit{dir=/path/to/maven-project/target/classes/core_issues/data/index,segFN=segments_1,generation=1,filenames=[segments_1]
24.07.2012 16:00:16 org.apache.solr.core.SolrDeletionPolicy updateCommits
INFO: newest commit = 1
…
24.07.2012 16:00:16 org.apache.solr.search.SolrIndexSearcher init
INFO: Opening Searcher@920ab60 main
24.07.2012 16:00:16 org.apache.solr.core.SolrCore registerSearcher
INFO: [issues] Registered new searcher Searcher@920ab60 main{StandardDirectoryReader(segments_1:1)}
24.07.2012 16:00:16 org.apache.solr.update.CommitTracker init
INFO: Hard AutoCommit: if uncommited for 15000ms;
24.07.2012 16:00:16 org.apache.solr.update.CommitTracker init
INFO: Soft AutoCommit: disabled
24.07.2012 16:00:16 org.apache.solr.handler.dataimport.DataImportHandler processConfiguration
INFO: Processing configuration from solrconfig.xml: {update.chain=emptyFieldChain,config=data-config.xml,clean=true,commit=true,optimize=true}
24.07.2012 16:00:16 org.apache.solr.handler.dataimport.DataImporter loadDataConfig
INFO: Data Configuration loaded successfully
24.07.2012 16:00:16 org.apache.solr.core.QuerySenderListener newSearcher
INFO: QuerySenderListener sending requests to Searcher@920ab60 main{StandardDirectoryReader(segments_1:1)}
24.07.2012 16:00:16 org.apache.solr.core.CoreContainer register
INFO: registering core: issues

When running the DIH full import, the log output is:

24.07.2012 16:00:31 org.apache.solr.handler.dataimport.DataImporter doFullImport
INFO: Starting Full Import
24.07.2012 16:00:31 org.apache.solr.core.SolrCore execute
INFO: [issues] webapp=/solr path=/dataimport params={command=full-import&importfile=/path/to/maven-project/src/test/resources/issues.xml} status=0 QTime=4
24.07.2012 16:00:31 org.apache.solr.handler.dataimport.SimplePropertiesWriter readIndexerProperties
WARNUNG: Unable to read: dataimport.properties
24.07.2012 16:00:32 org.apache.solr.handler.dataimport.DocBuilder
NumberFormatException while indexing TextField with LengthFilter and then copying to tfloat
Hi all,

I'm trying to index float values that are not required; input is an XML file. I have problems avoiding the NFE. I'm using SOLR 3.6.

Index input: XML using DataImportHandler with the XPathEntityProcessor.
Data: optional, float, CDATA, like:

<estimated_hours>2.0</estimated_hours>
or
<estimated_hours/>

Original problem: empty values would cause a NumberFormatException when being loaded directly into a tfloat type field.

Processing chain (to avoid the NFE): via XPath, loaded into a field of type text with a trim and a length filter, then via copyField directive into the tfloat type field.

data-config.xml:

<field column="s_estimated_hours" xpath="/issues/issue/estimated_hours" />

schema.xml:

<types>
  ...
  <fieldtype name="text_not_empty" class="solr.TextField">
    <analyzer>
      <tokenizer class="solr.KeywordTokenizerFactory" />
      <filter class="solr.TrimFilterFactory" />
      <filter class="solr.LengthFilterFactory" min="1" max="20" />
    </analyzer>
  </fieldtype>
</types>

<fields>
  ...
  <field name="estimated_hours" type="tfloat" indexed="true" stored="true" required="false" />
  <field name="s_estimated_hours" type="text_not_empty" indexed="false" stored="false" />
</fields>

<copyField source="s_estimated_hours" dest="estimated_hours" />

Problem: well, yet another NFE. But this time reported on the text field s_estimated_hours:

WARNUNG: Error creating document : SolrInputDocument[{id=id(1.0)={2930}, s_estimated_hours=s_estimated_hours(1.0)={}}]
org.apache.solr.common.SolrException: ERROR: [doc=2930] Error adding field 's_estimated_hours'=''
    at org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:333)
    at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:60)
    at org.apache.solr.update.processor.LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:115)
    at org.apache.solr.handler.dataimport.SolrWriter.upload(SolrWriter.java:66)
    at org.apache.solr.handler.dataimport.DataImportHandler$1.upload(DataImportHandler.java:293)
    at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:723)
    at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:619)
    at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:327)
    at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:225)
    at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:375)
    at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:445)
    at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:426)
Caused by: java.lang.NumberFormatException: empty String
    at sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:992)
    at java.lang.Float.parseFloat(Float.java:422)
    at org.apache.solr.schema.TrieField.createField(TrieField.java:410)
    at org.apache.solr.schema.FieldType.createFields(FieldType.java:289)
    at org.apache.solr.schema.SchemaField.createFields(SchemaField.java:107)
    at org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:312)
    ... 11 more

It is as if the empty value - which must not make it through the LengthFilter of s_estimated_hours - were copied to the tfloat field estimated_hours anyway.

How can I avoid this? Or is there any other way to make the indexer ignore the empty values when creating the tfloat fields? If it could at least create the document and enter the other values… (onError="continue" is not helping, as this is only a warning (I've tried).)

BTW: I did try the XPath that should only select those nodes with text:

/issues/issue/estimated_hours[text()]

The result was that no values would make it into the tfloat fields, while all documents would be indexed without warnings or errors. (I discarded this option, thinking that the XPath was not correctly evaluated.)

Thank you for any suggestions!
Chantal
Re: Velocity substring issue
Hi Henri,

you have not provided very much information, so here comes a guess: try ${bdte1} instead of $bdte1 - maybe Velocity resolves $bdte and concatenates "1", instead of trying the longer value as a variable first.

Chantal

On Wed, 2012-03-28 at 12:04 +0200, henri.gour...@laposte.net wrote:

> The following code fails on the $bdte1 substring. Both $bdte and $bdte1 appear to be identical!
>
> […]
>
> It triggers the following error message:
>
> […]
>
> The problem persists with various values of the indices. Am I missing something?
Re: Problem witch adding classpath
Hi,

I put all those jars into SOLR_HOME/lib. I do not specify them explicitly in solrconfig.xml, and they are all found all right. Would that be an option for you?

Chantal

On Thu, 2012-03-15 at 17:43 +0100, ViruS wrote:

> Hello,
>
> I just tried to switch from 3.4.0 to 3.5.0... I made a new instance, and when I try to use the same config for adding libraries I get an error:
>
> SEVERE: java.lang.NoClassDefFoundError: org/apache/lucene/analysis/TokenStream
>
> This error only shows when I use the Polish stempel. In the config I have set (solr/vrs/conf/solrconfig.xml):
>
> <lib path="../../../dist/lucene-stempel-3.5.0.jar" />
> <lib path="../../../dist/apache-solr-analysis-extras-3.5.0.jar" />
>
> When I start Solr, it is adding the paths:
>
> INFO: Adding specified lib dirs to ClassLoader
> 2012-03-15 17:35:51 org.apache.solr.core.SolrResourceLoader replaceClassLoader
> INFO: Adding 'file:/home/virus/appl/apache-solr-3.5.0/dist/lucene-stempel-3.5.0.jar' to classloader
> 2012-03-15 17:35:51 org.apache.solr.core.SolrResourceLoader replaceClassLoader
> INFO: Adding 'file:/home/virus/appl/apache-solr-3.5.0/dist/apache-solr-analysis-extras-3.5.0.jar' to classloader
>
> I have the same problem with Velocity. In the config (solr/ac/conf/solrconfig.xml):
>
> <lib dir="../../../contrib/velocity/lib" />
> ...
> <queryResponseWriter name="velocity" class="solr.VelocityResponseWriter" enable="true"/>
>
> When I start, I get this error:
>
> SEVERE: org.apache.solr.common.SolrException: Error Instantiating QueryResponseWriter, solr.VelocityResponseWriter is not a org.apache.solr.response.QueryResponseWriter
>
> INFO: Adding specified lib dirs to ClassLoader
> 2012-03-15 17:40:17 org.apache.solr.core.SolrResourceLoader replaceClassLoader
> INFO: Adding 'file:/home/virus/appl/apache-solr-3.5.0/contrib/velocity/lib/velocity-tools-2.0.jar' to classloader
>
> Full start log here: http://piotrsikora.pl/solr.log
>
> Thanks in advance!
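[For the multicore case there is also a declarative alternative: the sharedLib attribute in solr.xml. A minimal sketch; the directory name "lib" is resolved relative to SOLR_HOME, and the core name is hypothetical:]

<solr persistent="true" sharedLib="lib">
  <cores adminPath="/admin/cores">
    <core name="core0" instanceDir="core0" />
  </cores>
</solr>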
Re: 400 Error adding field 'tags'='[a,b,c]'
Hi Alp,

if you have not changed how SOLR logs in general, you should find the log output in the regular server logfile. For Tomcat, you can find this in TOMCAT_HOME/logs/catalina.out (or search for that name).

If there is a problem with your schema, SOLR should complain about it during application/server startup. It would definitely print something if a field is declared in your schema but cannot be initialized for some reason.

I don't think that the names of the fields themselves are the problem. I never had an issue with the field name 'name'.

Cheers,
Chantal

On Wed, 2012-03-14 at 02:53 +0100, jlark wrote:

> Interestingly, I'm getting this on other fields now.
>
> I have the field <field name="name" type="text_general" indexed="true" stored="true" />, which is copied to text:
>
> <copyField source="name" dest="text"/>
>
> and my text field is simply <field name="text" type="text_general" indexed="true" stored="true" />.
>
> I'm feeding my test document
>
> {url : "TestDoc2", title : "another test", ptag:["a","b"], name:"foo bar"},
>
> and when I try to feed it I get:
>
> HTTP request sent, awaiting response... 400 ERROR: [doc=TestDoc2] Error adding field 'name'='foo bar'
>
> If I remove the field from the document, though, it works fine. I'm wondering if there is a set of reserved names that I'm using at this point. Just wish there was a way to get more helpful error messages.
>
> Thanks for the help.
> Alp
RE: sun-java6 alternatives for Solr 3.5
You can download Oracle's Java (which was Sun's) from Oracle directly. You will have to create an account with them; you can use the same account for reading the Java forums and for downloading other software like their famous DB. Simply download it. JDK 6 is still a self-extracting binary, as all Sun packages were before: do a chmod +x and run it. You have to accept the license, and then it unpacks itself in that same directory - no root privileges required. As of JDK 7 you can download tar.gz packages. http://www.oracle.com/technetwork/java/javase/downloads/index.html Actually, you're better off downloading and installing it yourself, because you can have several different versions in parallel, and the automatic updates do not override your installed version. That comes in handy if you are a Java developer, at least... Cheers, Chantal

On Mon, 2012-02-27 at 21:38 +0100, Demian Katz wrote: For what it's worth, I run Solr 3.5 on Ubuntu using the OpenJDK packages and I haven't run into any problems. I do realize that sometimes the Sun JDK has features that are missing from other Java implementations, but so far it hasn't affected my use of Solr. - Demian

-----Original Message----- From: ku3ia [mailto:dem...@gmail.com] Sent: Monday, February 27, 2012 2:25 PM To: solr-user@lucene.apache.org Subject: sun-java6 alternatives for Solr 3.5 Hi all! I have installed Ubuntu 10.04 LTS. I added the 'partner' repository to my sources list and updated it, but I can't find a sun-java6-* package: root@ubuntu:~# apt-cache search java6 default-jdk - Standard Java or Java compatible Development Kit default-jre - Standard Java or Java compatible Runtime default-jre-headless - Standard Java or Java compatible Runtime (headless) openjdk-6-jdk - OpenJDK Development Kit (JDK) openjdk-6-jre - OpenJDK Java runtime, using Hotspot JIT openjdk-6-jre-headless - OpenJDK Java runtime, using Hotspot JIT (headless) Then I googled and found this article: https://lists.ubuntu.com/archives/ubuntu-security-announce/2011-December/001528.html I'm using Solr 3.5 and Apache Tomcat 6.0.32. Please advise me what to do in this situation, because I have always used the sun-java6-* packages for Tomcat and Solr and it worked fine. Thanks!
Re: Can this type of sorting/boosting be done by solr
Hi Ritesh, you could add another field that contains the size of the list in the AREFS field. This way you'd simply sort by that field in descending order. Should you update AREFS dynamically, you'd have to update the field with the size as well, of course. Chantal

On Thu, 2012-02-23 at 11:27 +0100, rks_lucene wrote: Hi, I have a journal article citation schema like this: { AT - article_title AID - article_id (unique id) AREFS - article_references_list (list of article ids referred to/cited in this article; multi-valued) AA - Article Abstract --- other_article_stuff ... } So for example, in order to search for all those articles that refer to (cite) article id 51643, I simply need to search for AREFS:51643 and it will give me the list of articles that have 51643 listed in AREFS. Now, I want to be able to search in the text of articles and sort the results by most-referred articles. How can I do this? Say my search query is q=AT:metal and it gives me 1700 results. How can I sort these 1700 results by those that have received the maximum number of citations by others? I have been researching function queries to solve this but have been unable to do so. Thanks in advance. Ritesh
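A minimal sketch of that approach; the field name is an assumption (any integer type from the example schema will do): declare

<field name="arefs_count" type="int" indexed="true" stored="true"/>

fill it at index time with the size of the AREFS list, and then sort the search with

q=AT:metal&sort=arefs_count desc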
Re: Can this type of sorting/boosting be done by solr
Sorry to have misunderstood. It seems the new Relevance Functions in Solr 4.0 might help - unless you are restricted to an official release. http://wiki.apache.org/solr/FunctionQuery#Relevance_Functions

On Thu, 2012-02-23 at 13:04 +0100, rks_lucene wrote: Dear Chantal, Thanks for your reply, but that's not what I was asking. Let me explain. The size of the list in AREFS would give me how many records are *referred by* an article and NOT how many records *refer to* an article. Say an article with id 51463 was published in 2002 and refers to 10 articles dating from 1990-2002. Then the count of AREFS would be 10, which is static once the journal has been published. However, if the same article is being *referred to* by 20 articles published from 2003-2012, then I am talking about this count of 20. This count is dynamic, and as we keep adding records to the index there are more articles that will refer to article 51463 in their AREFS field in the future. (Obviously, when we are adding article 51463 to the index we have no clue who will be referring to it in the future, so we can't have another field in it for this, nor can we update 51463 every time someone refers to it.) So today, if I want to know who is referring to 51463, I actually search for this id in the AREFS field. The query is as simple as q=AREFS:51463; it will give the list of articles from 2003 to 2012, and the result count would be 20. So back to the question: say my search query is q=AT:metal and it gives me 1700 results. How can I sort these 1700 results by those that have received the maximum number of citations (to date) by others (i.e., that have the maximum number of results if I individually search their ids in the AREFS field)? Hope this makes it clear. I feel this is a sort/boost-by-function-query candidate, but I am not able to figure it out. Thanks Ritesh
Re: How to loop through the DataImportHandler query results?
If your script turns out too complex to maintain, and you are developing in Java anyway, you could extend EntityProcessor and handle the data in a custom way. I've done that to transform a datamart-like data structure back into a row-based one. Basically you override the method that gets the data in a Map and transform it into a different Map which contains the fields as understood by your schema. Chantal

On Thu, 2012-02-16 at 14:59 +0100, Mikhail Khludnev wrote: Hi Baranee, Some time ago I played with http://wiki.apache.org/solr/DataImportHandler#ScriptTransformer - it was pretty good stuff. Regards

On Thu, Feb 16, 2012 at 3:53 PM, K, Baraneetharan baraneethara...@hp.com wrote: To avoid that, we don't want to mention the column names in the field tag, but want to write a query that maps all the fields in the table to Solr fields, even if we don't know how many columns there are in the table. I need a kind of loop which runs through all the query results and maps them to Solr fields.
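Plugging such a processor in is only a matter of naming its fully qualified class in the DIH config - a sketch, with a made-up class name:

<entity name="rows" processor="com.example.dih.ColumnMappingEntityProcessor" query="select * from mytable"/>

The class has to be on Solr's classpath (e.g. a jar in SOLR_HOME/lib); in the processor you typically override nextRow() to return the transformed Map.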
Re: Frequent garbage collections after a day of operation
Make sure your Tomcat instances are each started with a max heap size such that the sizes add up to something well below the total RAM of your system. Frequent garbage collection means that your application requests more memory but the Java VM has no free heap left, so it has to run the garbage collector to free memory before the requested new objects can be created. It's not indicating a memory leak - unless you are running a custom EntityProcessor in DIH that runs into an infinite loop and creates huge amounts of schema fields. ;-) Also - if you are doing hot deploys on Tomcat, you will have to restart the Tomcat instance on a regular basis, as hot deploys DO leak memory after a while. (You might be seeing class undeploy messages in catalina.out and, later on, OutOfMemory error messages.) If this is not of any help, you will probably have to provide a bit more information on your Tomcat and SOLR configuration setup. Chantal

On Thu, 2012-02-16 at 16:22 +0100, Matthias Käppler wrote: Hey everyone, we're running into some operational problems with our SOLR production setup here and were wondering if anyone else is affected or has even solved these problems before. We're running a vanilla SOLR 3.4.0 in several Tomcat 6 instances, so nothing out of the ordinary, but after a day or so of operation we see increased response times from SOLR, up to 3x increases on average. During this time we see increased CPU load due to heavy garbage collection in the JVM, which bogs down the whole system, so throughput decreases, naturally. When restarting the slaves, everything goes back to normal, but that's more like a brute force solution. The thing is, we don't know what's causing this and we don't have that much experience with Java stacks, since we're for the most part a Rails company. Are Tomcat 6 or SOLR known to leak memory? Is anyone else seeing this, or can you think of a reason for this? Most of our queries to SOLR involve the DismaxHandler and the spatial search query components. We don't use any custom request handlers so far. Thanks in advance, -Matthias
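For example (the values are assumptions and depend on your hardware), the heap bounds can be pinned per instance in Tomcat's bin/setenv.sh:

JAVA_OPTS="$JAVA_OPTS -Xms512m -Xmx1024m"

With an explicit -Xmx for every instance you can verify that the sum leaves enough room for the OS and its file system cache.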
Re: MoreLikeThis Question
Hi, you would not want to include the unique ID and similar fields, though? No idea whether it would impact the number of hits, but it would most probably influence the scoring, if nothing else. E.g. if you compare by certain fields, I would expect that a score of 1.0 indicates a match on all of those fields (haven't tested that explicitly, though). If the unique ID is included you could never reach that score. Just my 2 cents... Chantal

On Wed, 2012-02-15 at 07:27 +0100, Jamie Johnson wrote: Is there any way with MLT to say get similar based on all fields, or is it always a requirement to specify the fields?
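For reference, restricting MLT to meaningful fields is a single parameter - a sketch, assuming a MoreLikeThisHandler registered at /mlt and made-up field names:

http://localhost:8983/solr/mlt?q=id:1234&mlt.fl=title,description&mlt.mintf=1&mlt.mindf=1

i.e. the unique key simply stays out of mlt.fl.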
Re: Solr as part of an API to unburden databases
Does anyone on the mailing list use Solr as an API to avoid database queries? [...] Like in a... cache? Why not use a cache then? (memcached, for example, but there are more.) Good point. A cache only offers lookup by one kind of cache key, while SOLR provides lookup by... well... any search configuration that your index setup (mainly the schema) supports. If the database queries always do a find by unique id, then use a cache. Otherwise using SOLR is a valid option. Chantal
Re: Error Indexing in solr 3.5
Hi, I've got these errors when my client used a different SolrJ version than the SOLR server it connected to: SERVER 3.5 responding --- CLIENT some other version. You haven't provided any information on your client, though. Chantal

On Wed, 2012-02-15 at 13:09 +0100, mechravi25 wrote: Hi, When I tried to index in Solr 3.5 I got the following exception:

org.apache.solr.client.solrj.SolrServerException: Error executing query
at org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:95)
at com.quartz.test.FullImport.callIndex(FullImport.java:80)
at com.quartz.test.GetObjectTypes.checkObjectTypeProp(GetObjectTypes.java:245)
at com.quartz.test.GetObjectTypes.execute(GetObjectTypes.java:640)
at com.quartz.test.QuartzSchedMain.main(QuartzSchedMain.java:55)
Caused by: java.lang.RuntimeException: Invalid version or the data in not in 'javabin' format
at org.apache.solr.common.util.JavaBinCodec.unmarshal(JavaBinCodec.java:99)
at org.apache.solr.client.solrj.impl.BinaryResponseParser.processResponse(BinaryResponseParser.java:39)
at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:466)
at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:243)
at org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:89)

I placed the latest SolrJ 3.5 jar in the example/solr/lib directory and then restarted, but I am still getting the above exception. Please let me know if I am missing anything.
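If the client build happens to be Maven-based (an assumption), pinning the client to the server's version looks like this:

<dependency>
  <groupId>org.apache.solr</groupId>
  <artifactId>solr-solrj</artifactId>
  <version>3.5.0</version>
</dependency>

Note that the jar has to be on the client's classpath - dropping it into example/solr/lib on the server has no effect on which version the client uses.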
Re: Facet on TrieDateField field without including date
I've done something like that by calculating the hours at indexing time (in the script part of the DIH config, using java.util.Calendar, which gives you all those field values without effort). I've also extracted information on which weekday it is (using the integer constants of Calendar). If you need this only for one timezone it is straightforward, but if the queries come from different time zones you'll have to shift appropriately. I found that pre-calculating has the advantage that you end up with very simple data: plain integers. And it makes it quite easy to build more complex queries on that. For example, I have created a grid (built from facets) where the columns are the weekdays and the rows are the hours of the day. The facets are created using a field containing the combination of weekday and hour of day. Chantal

On Wed, 2012-02-15 at 15:49 +0100, Yonik Seeley wrote: On Wed, Feb 15, 2012 at 9:30 AM, Jamie Johnson jej2...@gmail.com wrote: I think it would if I indexed the time information separately. Which was my original thought, but I was hoping to store this in one field instead of 2. So my idea was I'd store the time portion as a number (an int might suffice, from 0 to 24, since I only need that level of granularity), then do range queries over that. I couldn't think of a way to do this using the date field, though, because it would give me bins broken up by hours in a particular day, something like

2012-01-01-00:00:00 - 2012-01-01-01:00:00 10
2012-01-01-01:00:00 - 2012-01-01-02:00:00 20
2012-01-01-02:00:00 - 2012-01-01-03:00:00 5

But what I really want is just the time portion across all days:

00:00:00 - 01:00:00 10
01:00:00 - 02:00:00 20
02:00:00 - 03:00:00 5

I would then use the date field to limit the time range in which the facet was operating. Does that make sense? Is there a more efficient way of doing this? Hmm, no, there's no way to do this. Even if you were to write a custom faceting component, it seems like it would still be very expensive to derive the hour of the day from ms for every doc. -Yonik lucidimagination.com

On Wed, Feb 15, 2012 at 9:16 AM, Yonik Seeley yo...@lucidimagination.com wrote: On Wed, Feb 15, 2012 at 8:58 AM, Jamie Johnson jej2...@gmail.com wrote: I would like to be able to facet based on the time of day items are purchased across a date span. I was hoping that I could do a query of something like date:[NOW-1WEEK TO NOW] and then specify I wanted the facet broken into hourly bins. Is this possible? Will range faceting do everything you need? http://wiki.apache.org/solr/SimpleFacetParameters#Facet_by_Range -Yonik lucidimagination.com
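A minimal sketch of the script part of such a DIH config; the entity, column and function names are assumptions, and the date column is expected to arrive as a java.util.Date (or java.sql.Timestamp):

<script><![CDATA[
  function addTimeFields(row) {
    // derive weekday/hour once at index time using java.util.Calendar
    var cal = java.util.Calendar.getInstance();
    cal.setTime(row.get('purchase_date'));
    var day = cal.get(java.util.Calendar.DAY_OF_WEEK);
    var hour = cal.get(java.util.Calendar.HOUR_OF_DAY);
    row.put('weekday', day);
    row.put('hour_of_day', hour);
    // combination field for the weekday/hour grid facet
    row.put('weekday_hour', day + '_' + hour);
    return row;
  }
]]></script>
...
<entity name="purchase" transformer="script:addTimeFields" query="select ...">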
Re: Stemming and accents (HunspellStemFilterFactory)
Hi Bráulio, I don't know about HunspellStemFilterFactory specifically, but concerning accents: there are several accent filters that will remove accents from your tokens. If the Hunspell filter factory requires the accents, then simply add the accent filter after Hunspell in your index and query filter chains. You would then have Hunspell produce the tokens as the result of the stemming, and only afterwards would the accents be removed (in your example: 'forum' instead of 'fórum'). Do the same on the query side in case someone inputs accents. Accent filters are: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.ICUTokenizerFactory (lowercases, as well!) http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.ASCIIFoldingFilterFactory and others on that page. Chantal

On Tue, 2012-02-14 at 14:48 +0100, Bráulio Bhavamitra wrote: Hello all, I'm evaluating the HunspellStemFilterFactory. I found it works with a pt_PT dictionary. For example, if I search for 'fóruns' it stems it to 'fórum' and then finds 'fórum' references. But if I search for 'foruns' (without accent), then HunspellStemFilterFactory cannot stem the word, as it does not exist in its dictionary. Is there any way to make HunspellStemFilterFactory work without accent differences? best, bráulio
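A sketch of such a chain; the dictionary/affix file names and the ignoreCase flag are assumptions - adjust them to your pt_PT files:

<fieldType name="text_pt" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.HunspellStemFilterFactory" dictionary="pt_PT.dic" affix="pt_PT.aff" ignoreCase="true"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
  </analyzer>
</fieldType>

The same analyzer serves index and query time here, so both sides end up with accent-free tokens after stemming.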
Re: indexing with DIH (and with problems)
On Thu, 2012-02-09 at 23:45 +0100, alessio crisantemi wrote: hi all, I would like to index in Solr my pdf files, which are located in the directory c:\myfile\. So I added to my solr/conf directory the file data-config.xml, like the following:

<dataConfig>
  <dataSource type="BinFileDataSource" />
  <document>
    <entity name="f" dataSource="null" rootEntity="false"

Why do you set rootEntity=false on the root entity? This looks odd to me - but I can be wrong, of course. If DIH shows this:

<str name="Total Requests made to DataSource">0</str>

then DIH hasn't even retrieved any data from your data source. Check that the call you have configured really returns any documents. Chantal

            processor="FileListEntityProcessor" baseDir="c:\myfile\" fileName="*.pdf" recursive="true">
      <entity name="tika-test" processor="TikaEntityProcessor" url="${f.fileAbsolutePath}" format="text">
        <field column="author" name="author" meta="true"/>
        <field column="title" name="title" meta="true"/>
        <field column="content_type" name="content_type" meta="true"/>
      </entity>
    </entity>
  </document>
</dataConfig>

Before that, I added this part to solrconfig.xml:

<requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">c:\solr\conf\data-config.xml</str>
  </lst>
</requestHandler>

but this is the result:

<str name="command">delta-import</str>
<str name="status">idle</str>
<str name="importResponse"/>
http://pc-alessio:8983/solr/select?clean=false&commit=true&qt=%2Fdataimport&command=delta-import
<lst name="statusMessages">
  <str name="Time Elapsed">0:0:2.512</str>
  <str name="Total Requests made to DataSource">0</str>
  <str name="Total Rows Fetched">0</str>
  <str name="Total Documents Processed">0</str>
  <str name="Total Documents Skipped">0</str>
  <str name="Full Dump Started">2012-02-09 23:37:07</str>
  <str name="">Indexing failed. Rolled back all changes.</str>
  <str name="Rolledback">2012-02-09 23:37:07</str>
</lst>
<str name="WARNING">This response format is experimental. It is likely to change in the future.</str>

suggestions? thanks alessio
Re: can solr automatically search for different punctuation of a word
Hi Alex, the <dependency> tag is used in the Maven project file (pom.xml). If you are not using Maven to build your project then simply skip that part. The important thing is that the ICU jar (lucene-icu) and the analysis extras jar (solr-analysis-extras) are on your classpath. See also Erick's answer in response to your question. The folder for additional jar files in Solr is: ${SOLR_HOME}/lib/ Cheers, Chantal

On Tue, 2012-01-31 at 04:38 +0100, alx...@aim.com wrote: Hi Chantal, In the readme file at solr/contrib/analysis-extras/README.txt it says to add the ICU library (in lib/). Do I also need to add the dependency... and where? Thanks. Alex.
Re: Parameter for database host in DIH?
Hi wunder, for us it works with internal dots when specifying the properties in $SOLR_HOME/[core]/conf/solrcore.properties, like this:

db.url=xxx
db.user=yyy
db.passwd=zzz

and in $SOLR_HOME/[core]/conf/data-config.xml:

<dataSource type="JdbcDataSource" driver="oracle.jdbc.driver.OracleDriver" url="${db.url}" user="${db.user}" password="${db.passwd}" batchSize="1000" />

Cheers, Chantal

On Sat, 2012-01-21 at 01:01 +0100, Walter Underwood wrote: Weird. I can make it work with a request parameter and $dataimporter.request.dbhost: http://localhost:8983/solr/textbooks/dataimport?command=full-import&dbhost=mydbhost Or I can make it work with a Java system property with no dots. But when I use a Java system property with internal dots, it doesn't work. wunder

On Jan 20, 2012, at 3:53 PM, Walter Underwood wrote: On Jan 20, 2012, at 3:34 PM, Shawn Heisey wrote: On 1/20/2012 3:48 PM, Walter Underwood wrote: Is there a way to parameterize the JDBC URL in the data import handler? I tried this, but it did not insert the value of the property. I'm running Solr 3.3.0.

<dataSource driver="com.mysql.jdbc.Driver" url="jdbc:mysql://${com.chegg.dbhost}/product"

Here's what I've got in mine. I pass in dbHost and dbSchema parameters (along with a bunch of others that get used in the entity SQL statements) when starting DIH:

url="jdbc:mysql://${dataimporter.request.dbHost}:3306/${dataimporter.request.dbSchema}?zeroDateTimeBehavior=convertToNull"

Are those Java system properties? I didn't get a substitution when I ran: java -Dcom.chegg.dbhost=mydbhost The resulting JDBC URL was jdbc:mysql:///product, so it replaced the variable with an empty string. Odd. wunder -- Walter Underwood wun...@wunderwood.org
Re: Validating solr user query
Hi Dipti, just to make sure: are you aware of http://wiki.apache.org/solr/DisMaxQParserPlugin ? It handles user input in a very forgiving and user-friendly way; you just have to specify which fields you want it to search. With the 'mm' parameter you have a powerful option to specify how much of a search query has to match (more flexible than defining a default operator). Cheers, Chantal

On Fri, 2012-01-20 at 23:52 +0100, Dipti Srivastava wrote: Hi All, I am using HTTP/JSON to search my documents in Solr. The client provides the query on which the search is based. What is a good way to validate the query string provided by the user? On the other hand, if I want the user to build this query using some Solr API instead of preparing a Lucene query string, which API can I use for this? I looked into SolrQuery in SolrJ but it does not appear to have a way to specify more complex queries with boolean operators and operators such as ~, +, - etc. Basically, I am trying to avoid running into bad query strings built by the caller. Thanks! Dipti
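A minimal sketch of such a handler configuration; the handler name, field names and the mm value are assumptions:

<requestHandler name="/search" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">dismax</str>
    <str name="qf">title^2.0 body</str>
    <str name="mm">75%</str>
  </lst>
</requestHandler>

Raw user input then goes in as plain q=..., and the dismax parser treats most Lucene syntax characters as literal text, so no separate validation step is needed.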
Re: can solr automatically search for different punctuation of a word
Hi Alex, for me, ICUFoldingFilterFactory works very well. It does lowercasing and removes diacritics (this is what umlauts and accented letters are called - punctuation means commas, periods etc.). It will work for any language, not only German. And it will also handle apostrophes, as in C'est bien. ICU requires additional libraries in the classpath. For a built-in Solr solution have a look at ASCIIFoldingFilterFactory. http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.ASCIIFoldingFilterFactory http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.ICUFoldingFilterFactory

Example configuration:

<fieldType name="text_sort" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory" />
    <filter class="solr.ICUFoldingFilterFactory" />
  </analyzer>
</fieldType>

And dependencies (example for Maven) in addition to solr-core:

<dependency>
  <groupId>org.apache.lucene</groupId>
  <artifactId>lucene-icu</artifactId>
  <version>${solr.version}</version>
  <scope>runtime</scope>
</dependency>
<dependency>
  <groupId>org.apache.solr</groupId>
  <artifactId>solr-analysis-extras</artifactId>
  <version>${solr.version}</version>
  <scope>runtime</scope>
</dependency>

Cheers, Chantal

On Fri, 2012-01-13 at 00:09 +0100, alx...@aim.com wrote: Hello, I would like to know if Solr has functionality to automatically search for a different spelling of a word. For example, if a user searches for the word Uber and the stemmer is for the German language, then Solr looks for both Uber and Über, like with synonyms. Is it possible to give Solr a file with a list of possible substitutions of letters and have it search for all possible spellings? Thanks. Alex.
Re: Solr, SQL Server's LIKE
Thanks, Erick! That sounds great. I really do have to upgrade. Chantal On Sun, 2012-01-01 at 16:42 +0100, Erick Erickson wrote: Chantal: bq: The problem with the wildcard searches is that the input is not analyzed. As of 3.6/4.0, this is no longer entirely true. Some analysis is performed for wildcard searches by default and you can specify most anything you want if you really need to see: https://issues.apache.org/jira/browse/SOLR-2438 and http://wiki.apache.org/solr/MultitermQueryAnalysis Best Erick
RE: Solr, SQL Server's LIKE
The problem with wildcard searches is that the input is not analyzed. For English this might not be such a problem (except if you expect case-insensitive search). But then again, you don't get that with LIKE, either. Ngrams give you that and more. What I think is often forgotten when comparing 'LIKE' and Solr search is: Solr's analyzers allow not only for case-insensitive search but also for other normalization such as removing diacritics, and this is also applied when sorting (you'd have to create a separate index in the DB as well if you want that). Say you have the following names: 'Van Hinden', 'van Hinden', 'Música', 'Musil'.

like 'mu%'  - no hits
like 'Mu%'  - 1 hit
like 'van%' - 1 hit
like 'hin%' - no hits

With Solr, using a whitespace or standard tokenizer, ngrams, and a diacritics plus lowercase filter (no wildcard search):

'mu'/'Mu' - 2 hits, sorted ignoring case and diacritics
'van'     - 2 hits
'hin'     - 2 hits

(This is written down from experience. I haven't checked those examples explicitly.) Cheers, Chantal

On Fri, 2011-12-30 at 02:00 +0100, Chris Hostetter wrote: : Thanks. I know I'll be able to utilize some of Solr's free text : searching capabilities in other search types in this project. The : product manager wants this particular search to exactly mimic LIKE%. ... : Ex: If I search Albatross I want Albert to be excluded completely, : rather than having a low score. Please be specific about the types of queries you want, i.e. we need more than one example of the type of input you want to provide, the type of matches you want to see for that input, and the type of matches you want to get back. In your first message you said you need to match company titles "pretty exactly", but then you seem to contradict yourself by saying SQL's LIKE command fits the bill -- even though the SQL LIKE command exists specifically for in-exact matches on field values. Based on your one example above of Albatross, you don't need anything special: don't use ngrams, don't use stemming, don't use fuzzy anything -- just search for Albatross and it will match Albatross but not Albert. If you want Albatross to match Albatross Road, use some basic tokenization. If all you really care about is prefix searching (which seems suggested by your LIKE% comment above, which I'm guessing is shorthand for something similar to LIKE 'ABC%'), so that queries like abc and abcd both match abcdef and abcd but neither of them matches ab, then just use prefix queries (i.e. abcd*) -- they should be plenty efficient for your purposes. You only need to worry about ngrams when you want to efficiently match in the middle of a string (i.e. TITLE LIKE '%ABC%'). -Hoss
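For completeness, a prefix-search field of the kind described - a sketch, assuming Solr 3.x factory names:

<fieldType name="text_prefix" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.ICUFoldingFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="15"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.ICUFoldingFilterFactory"/>
  </analyzer>
</fieldType>

The query side only tokenizes and folds; its tokens then match the indexed edge grams directly, so 'mu', 'van' and 'hin' all hit without any wildcard syntax.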
Re: Update schema.xml using solrj APIs
Hi Ahmed, if you have a multi-core setup, you could change the file programmatically (e.g. via an XML parser), copy the new file over the existing one (programmatically, of course), then reload the core. I haven't reloaded a core programmatically yet, but that should be doable via SolrJ. Or - if you are not using Java - call the core admin URL from your programme. You will have to re-index after changing the schema.xml. Chantal

On Thu, 2011-12-22 at 04:34 +0100, Otis Gospodnetic wrote: Ahmed, at this point in time - no. You need to edit it manually and restart Solr to see the changes. This will change in the future. Otis Performance Monitoring SaaS for Solr - http://sematext.com/spm/solr-performance-monitoring/index.html

From: Ahmed Abdeen Hamed ahmed.elma...@gmail.com To: solr-user@lucene.apache.org Sent: Wednesday, December 21, 2011 4:12 PM Subject: Update schema.xml using solrj APIs Hello friends, I am new to SolrJ and I am wondering if there is a way to update the schema.xml file via the APIs. I would appreciate any help. Thanks very much, -Ahmed
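For reference, the reload itself is a plain HTTP call to the CoreAdminHandler (host, port and core name are placeholders):

http://localhost:8983/solr/admin/cores?action=RELOAD&core=mycore

assuming the admin handler is enabled at the default adminPath in solr.xml.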
Re: Exception using SolrJ
Hi Shawn, maybe the requests that fail have a certain pattern - for example that they are longer than all the others. Chantal
Re: full-data import suddenly stopped working. Total Rows Fetched remains 0
DIH does not simply fail. Without more information, it's hard to do more than guess. As you're using MS SQL Server, maybe you ran into this? http://blogs.msdn.com/b/jdbcteam/archive/2011/11/07/supported-java-versions-november-2011.aspx That would be a problem caused by certain Java versions. Have you turned the DEBUG level on for DIH and Solr in general? Chantal

On Mon, 2011-12-19 at 18:55 +0100, PeterKerk wrote: Hi Chantal, I reduced my data-config.xml to a bare minimum:

<dataConfig>
  <dataSource driver="com.microsoft.sqlserver.jdbc.SQLServerDriver" url="jdbc:sqlserver://localhost:1433;databaseName=tt" user="sa" password="dfgjLJSFSD" />
  <document name="weddinglocations">
    <entity name="location" query="select * from locations WHERE isapproved='true'">
      <field name="id" column="ID" />
      <field name="title" column="TITLE" />
    </entity>
  </document>
</dataConfig>

I ran reload-config successfully, but the same behavior still occurs. Oh, and the query select * from locations WHERE isapproved='true' returns a lot of results when run directly against my DB. What else can it be?
Re: Exception using SolrJ
Hi Shawn, the exception indicates that the connection was lost. I'm sure you figured that out for yourself. Questions: - is that specific server instance really running? That is, can you reach it via a browser? - If yes: how is your connection pool configured, and how do you initialize it? More specifically: from what I know, CommonsHttp is already multi-threaded, so your initializing code should not be using multiple threads to access it. I'm not completely sure about that in combination with SolrJ, though; I just had that issue when using CommonsHttp directly in the wrong way. I have been using SolrJ with a CommonsHttp pool for some time now, and it all works very reliably. I've encountered those Connection reset exceptions as well, but they were always caused by the server not being reachable. Chantal

From your pastebin:

Caused by: org.apache.solr.client.solrj.SolrServerException: java.net.SocketException: Connection reset
at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:480)

On Tue, 2011-12-20 at 01:11 +0100, Shawn Heisey wrote: On 12/16/2011 12:44 AM, Shawn Heisey wrote: I am seeing exceptions from some code I have written using SolrJ. I have placed it into a pastebin: http://pastebin.com/XnB83Jay No reply in three days, does nobody have any ideas for me? Thanks, Shawn
Re: multiple temporary indexes
You could also create a single index and use a field 'user' to filter results for a single user. This would also allow for statistics over the complete base. Chantal

On Tue, 2011-12-20 at 12:43 +0100, graham wrote: Hi, I'm a complete newbie and currently at the stage of wondering whether Solr might be suitable for what I want. I need to take search results collected by another system in response to user requests and allow each user to view their set of results in different ways: sorting into a different order, filtering by facets, etc. I am wondering whether it would be practical to do this by creating a Solr index for each result set on the fly. Two particular questions are: 1. Is it even practical to do this in real time? Assuming that each set of results contains low hundreds of elements (each a bibliographic record), and that the users' patience is not unlimited. 2. What would be the best way to manage a separate index for each query, given that the main constraint is time, and that the number of indexes needed simultaneously is not known in advance? Create a separate core for each query, or use a single index with a query id as one of the keys, or..? Thanks for any advice (or pointers to existing systems which work like this). Graham
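A sketch of that pattern (the field name and value are assumptions): add an owner field to the schema,

<field name="user" type="string" indexed="true" stored="true"/>

index every result document with the user (or query id) it belongs to, and restrict each request with a filter query:

q=...&fq=user:graham42

Filter queries are cached separately from the main query, so repeated requests for the same user's result set stay cheap.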
Re: full-data import suddenly stopped working. Total Rows Fetched remains 0
Never would have thought that MS could help me earn such honours... ;D On Tue, 2011-12-20 at 12:57 +0100, PeterKerk wrote: Chantal...you are the queen! :p That was it, I downgraded to 6.27 and now it works again...thank god! -- View this message in context: http://lucene.472066.n3.nabble.com/full-data-import-suddenly-stopped-working-Total-Rows-Fetched-remains-0-tp3599004p3601013.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: full-data import suddenly stopped working. Total Rows Fetched remains 0
Hi Peter, the most probable cause is that your database query returns no results. Have you run the query that DIH is using directly on your database? In the output you can see that DIH has fetched 0 rows from the DB. Maybe your query contains a restriction that suddenly had this effect - like a restriction on a modification time or similar. Cheers, Chantal

On Mon, 2011-12-19 at 18:21 +0100, PeterKerk wrote:

<str name="Total Rows Fetched">0</str>
<str name="Total Documents Processed">0</str>
<str name="Total Documents Skipped">0</str>
Re: Solr Best Practice Configuration
Hi Ben, what I understand from your post is: Advertiser (1) - (*) Advert (one-to-many, where there can be 50,000 Adverts per single Advertiser). Your index entity is based on Advert, which means that there can be 50,000 documents in the index that need to be changed if a field of an Advertiser is updated in the database. I am using multi-core setups with differently structured indexes for these needs. This means that some more complex lookups require queries on several cores. This has not been a problem so far. Our indexes, however, hold rather little data (ranging from a few hundred thousand entries to some millions; rather a lot of fields with short texts) and are highly dynamic (rebuilt several times a day, fully rebuilt, no increments). Moving the Advertiser data out of the Advert index means: (1) on updates of the Advertiser fields you don't need to change the Advert index, (2) the Advert index might be a bit smaller (if that matters), (3) the statistics on the Advertiser data will be in relation to the Advertisers and not in relation to the Adverts, while the statistics on the Adverts won't contain any Advertiser data anymore. (This list might not be complete.) What does (3) imply? You will not be able to facet, sort or group Adverts using any of the Advertiser fields (as they reside in a different index core). If you need faceting or similar, then consider first testing the performance of a massive update, or of rebuilding your index, before starting to change to multiple cores. Maybe the performance is better than you fear it to be and no change is required. Cheers, Chantal

On Fri, 2011-12-09 at 10:46 +0100, BenMccarthy wrote: Good morning. I have now been through the various Solr tutorials and read the Solr 3 Enterprise Server book. I'm now at the point of figuring out if Solr can help us with a scaling problem. I'm looking for advice on the following scenario; any pointers or references would be great. I have two sets of distinct data: Advert and Advertiser. An Advertiser has many Adverts; in the db it looks like

Advert { id, field a, field b, advertiser_id }
Advertiser { id, field c, field d, lat, long }

So I've followed some docs and created a DIH config which pulls all this into one Solr index. Which is great. The problem I'm looking at is that we have massive churn on Advertiser updates, and with the one index I don't think it will scale (correct me if I'm wrong). Would it be possible to have two separate cores, each with its own index, and when issuing queries have the results returned as they are in a single-core setup? I'm basically looking for some pointers telling me whether I'm going in the right direction. I don't want to have to update 5 adverts when an advertiser simply updated field c. This is a problem we have with our current search. Thanks, Ben
Re: solr - http error 404 when requesting solrconfig.xml or schema.xml
Hi Torsten, some more information would help us help you: - does calling /apps/solrslave/admin/ return the admin homepage? - what is the path to your SOLR_HOME? - where in the filesystem are solrconfig.xml and schema.xml (even if this sounds redundant - maybe they are just misplaced)? - what are their read permissions (can the server access them)? - where is the server looking for them (the value of the JNDI SOLR_HOME, the output of the logfile telling you which location is actually being used as SOLR_HOME, and whether this is where you want it to be)? Cheers, Chantal

On Tue, 2011-11-29 at 10:50 +0100, Torsten Krah wrote: Hi, I've got an interesting problem and don't know how to debug it further. I am using an external Solr home configured via JNDI. I deployed my war file (context is /apps/solrslave/), and if I want to look at the schema: /apps/solrslave/admin/file/?contentType=text/xml;charset=utf-8&file=schema.xml the response is 404. It doesn't matter whether I use Jetty 7.x, 8.x or Tomcat 6.0.33 - 404 is the answer. Anyone an idea where to look? regards Torsten
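For comparison, the JNDI setup the admin file handler depends on boils down to one entry in the webapp context; a sketch for Tomcat (paths are placeholders), e.g. in conf/Catalina/localhost/apps#solrslave.xml:

<Context docBase="/path/to/solrslave.war">
  <Environment name="solr/home" type="java.lang.String" value="/path/to/solr-home" override="true"/>
</Context>

with solrconfig.xml and schema.xml sitting in /path/to/solr-home/conf/ (or under each core's instanceDir in a multi-core setup).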
Re: DIH Strange Problem
Hi Yavar, my experience with similar problems was that there was something wrong with the database connection or the database itself. Chantal

On Wed, 2011-11-23 at 11:57 +0100, Husain, Yavar wrote: I am using Solr 1.4.1 on Windows/MS SQL Server and am using DIH for importing data. Indexing and everything was working perfectly fine. However, today when I started a full indexing again, Solr halts/gets stuck at the line "Creating a connection for entity". There are no further messages after that. I can see that DIH is busy; on the DIH console I can see "A command is still running", total rows fetched = 0, total requests made to datasource = 1, and the time keeps increasing, but it is not doing anything. This is the exact configuration that worked for me before. I am not really able to understand the problem here. Also, in the index directory where I am storing the index there are just 3 files: 2 segment files + 1 lucene*-write.lock file. ... data-config.xml:

<dataSource type="JdbcDataSource" driver="com.microsoft.sqlserver.jdbc.SQLServerDriver" url="jdbc:sqlserver://127.0.0.1:1433;databaseName=SampleOrders" user="testUser" password="password"/>
<document>
. .

Logs:

INFO: Server startup in 2016 ms
Nov 23, 2011 4:11:27 PM org.apache.solr.handler.dataimport.DataImporter doFullImport
INFO: Starting Full Import
Nov 23, 2011 4:11:27 PM org.apache.solr.core.SolrCore execute
INFO: [] webapp=/solr path=/dataimport params={command=full-import} status=0 QTime=11
Nov 23, 2011 4:11:27 PM org.apache.solr.handler.dataimport.SolrWriter readIndexerProperties
INFO: Read dataimport.properties
Nov 23, 2011 4:11:27 PM org.apache.solr.update.DirectUpdateHandler2 deleteAll
INFO: [] REMOVING ALL DOCUMENTS FROM INDEX
Nov 23, 2011 4:11:27 PM org.apache.solr.core.SolrDeletionPolicy onInit
INFO: SolrDeletionPolicy.onInit: commits:num=1 commit{dir=C:\solrindexes\index,segFN=segments_6,version=1322041133719,generation=6,filenames=[segments_6]
Nov 23, 2011 4:11:27 PM org.apache.solr.core.SolrDeletionPolicy updateCommits
INFO: newest commit = 1322041133719
Nov 23, 2011 4:11:27 PM org.apache.solr.handler.dataimport.JdbcDataSource$1 call
INFO: Creating a connection for entity SampleText with URL: jdbc:sqlserver://127.0.0.1:1433;databaseName=SampleOrders
Re: XSLT caching mechanism
In solrconfig.xml, change the xsltCacheLifetimeSeconds property of the XSLTResponseWriter to the desired value (in this example 6000 seconds):

<queryResponseWriter name="xslt" class="solr.XSLTResponseWriter">
  <int name="xsltCacheLifetimeSeconds">6000</int>
</queryResponseWriter>

On Mon, 2011-11-14 at 15:31 +0100, vrpar...@gmail.com wrote: Hello all, I am using XSLT to transform the Solr XML response. When I search, I get the warning below: WARNING [org.apache.solr.util.xslt.TransformerProvider] The TransformerProvider's simplistic XSLT caching mechanism is not appropriate for high load scenarios, unless a single XSLT transform is used and xsltCacheLifetimeSeconds is set to a sufficiently high value. How can I apply effective XSLT caching for Solr? Thanks, Vishal Parekh
Re: Xsl for query output
Hi Jeremy, the xsl files go into the subdirectory xslt/ (you have to create it) in the conf/ directory of the core that should return the transformed results. So, if you have a core 'myCore' that you want to return transformed results, you need to put example.xsl into: $SOLR_HOME/myCore/conf/xslt/example.xsl and in $SOLR_HOME/myCore/conf/solrconfig.xml you add (change the cache value to whatever is appropriate):

<queryResponseWriter name="xslt" class="solr.XSLTResponseWriter">
  <int name="xsltCacheLifetimeSeconds">6000</int>
</queryResponseWriter>

Call this in a query: http://mysolrserver/solr/myCore/select?q=id:id&wt=xslt&tr=example.xsl Chantal

On Fri, 2011-10-14 at 07:22 +0200, Jeremy Cunningham wrote: Thanks for the response, but I have seen this page and I had a few questions. 1. Since I am using Tomcat, I had to move the example directory into the Tomcat directory structure. In the multicore setup there is no example.xsl. Where do I need to put it? Also, how do I send docs for indexing when running Solr under Tomcat? Thanks, Jeremy

On 10/13/11 3:46 PM, Lance Norskog goks...@gmail.com wrote: http://wiki.apache.org/solr/XsltResponseWriter This is for the single-core example. It is easiest to just go to solr/example, run java -jar start.jar, and hit the URL in the above wiki page. Then poke around in solr/example/solr/conf/xslt. There is no solrconfig.xml change needed. It is generally easiest to use the solr/example 'java -jar start.jar' example to test out features. It is easy to break configuration linkages. Lance

On Thu, Oct 13, 2011 at 12:42 PM, Jeremy Cunningham jeremy.cunningham.h...@statefarm.com wrote: I am new to Solr and not a web developer. I am a data warehouse guy trying to use Solr for the first time. I am familiar with XSL but I can't figure out how to get the example.xsl applied to my XML results. I am running Tomcat and have Solr working. I copied the Solr multiple-core example to the conf directory on my Tomcat server. I also added the war file, and search works fine. I just can't seem to figure out what I need to add to the solrconfig.xml (or wherever) so that example.xsl is used. Basically, can someone tell me where to put the xsl and where to configure its usage? Thanks
Re: Interesting DIH challenge
Hi there, I have been using cores to build up new cores (for various reasons). (I am not using SOLR as data storage; the cores are re-indexed frequently.) This solution works for releases 1.4 and 3, as it does not use the SolrEntityProcessor. To load data from another SOLR core and populate part of the new document I use:

(1) in the target data-config.xml:

<entity name="content" dataSource="sourceCore"
        url="solr/gmaContent/select?q=contentid:${targetDoc.ID}&amp;wt=xslt&amp;tr=response-to-update.xsl"
        processor="my.custom.handler.dataimport.CachingXPathEntityProcessor"
        cacheKey="${targetDoc.ID}" useSolrAddSchema="true"/>

(2) sourceCore's solrconfig.xml needs an entry (uncomment) for the XSLT response writer:

<!-- XSLT response writer transforms the XML output by any xslt file found in Solr's conf/xslt directory. Changes to xslt files are checked for every xsltCacheLifetimeSeconds. -->
<queryResponseWriter name="xslt" class="solr.XSLTResponseWriter">
  <int name="xsltCacheLifetimeSeconds">6000</int>
</queryResponseWriter>

(3) response-to-update.xsl (this goes into $SOLR_HOME/sourceCore/conf/xslt/):

<?xml version='1.0' encoding='UTF-8'?>
<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>
  <xsl:output method="xml" media-type="text/xml;charset=utf-8" indent="yes" encoding="UTF-8" omit-xml-declaration="no" />
  <xsl:template match='/'>
    <add>
      <xsl:apply-templates select="/response/result/doc" />
    </add>
  </xsl:template>
  <xsl:template match="doc">
    <doc>
      <xsl:choose>
        <xsl:when test="doc/*[name()='arr']">
          <xsl:apply-templates select="//arr" />
        </xsl:when>
        <xsl:otherwise>
          <xsl:apply-templates select="child::node()" />
        </xsl:otherwise>
      </xsl:choose>
    </doc>
  </xsl:template>
  <xsl:template match="//arr">
    <xsl:for-each select="child::node()">
      <xsl:element name="field">
        <xsl:attribute name="name"><xsl:value-of select="../@name"/></xsl:attribute>
        <xsl:value-of select="." />
      </xsl:element>
    </xsl:for-each>
  </xsl:template>
  <xsl:template match="child::node()">
    <xsl:element name="field">
      <xsl:attribute name="name"><xsl:value-of select="@name"/></xsl:attribute>
      <xsl:value-of select="." />
    </xsl:element>
  </xsl:template>
</xsl:stylesheet>

Cheers, Chantal

On Mon, 2011-10-10 at 06:26 +0200, Gora Mohanty wrote: On Mon, Oct 10, 2011 at 6:30 AM, Pulkit Singhal pulkitsing...@gmail.com wrote: @Gora Thank you! I know that Solr accepts XML with Solr-specific elements that are commands only it understands... such as <add/>, <commit/> etc. Question: Is there some way to ask Solr to dump out whatever it has in its index already... as a Solr XML document? As far as I know, there is no way to do that out of the box. One would get the contents of each record with a normal Solr query, massage that into a Solr XML document, and use that to rebuild the index. I have not tried this, but it should be possible to get the desired output format with the XsltResponseWriter: http://wiki.apache.org/solr/XsltResponseWriter . All in all, it seems easier to me to just reindex from the base source, unless that is not possible for some reason. Plan: I intend to massage that XML dump (add the field + value that I need in every doc's XML element), and then I should be able to push this dump back to Solr to get the data indexed again, I hope. Yes, that should be the general idea. Regards, Gora
Property undefined in Schema Browser (Solr Admin)
Hi all, the Schema Browser in the SOLR Admin shows me the following information:

Field: title
Field Type: string
Properties: Indexed, Stored, Multivalued, Omit Norms, undefined, Sort Missing Last
Schema: Indexed, Stored, Multivalued, Omit Norms, undefined, Sort Missing Last
Index: Indexed, Stored, Omit Norms

I was wondering where this "undefined" property comes from. I had a look at http://wiki.apache.org/solr/LukeRequestHandler and at schema.jsp, but to no avail so far. Could someone give me a hint? I'm just wondering whether I am missing some problem with my field declaration, which is:

<field name="title" type="string" indexed="true" stored="true" required="true" multiValued="true"/>

Thanks a lot! Chantal
Re: Property undefined in Schema Browser (Solr Admin)
Hi Stefan, thanks for your time! There is a capital F which is not listed as a key? But this is also the case in your example, so probably I'm confusing something. Anyway, the respective output of /admin/luke?fl=title is:

<lst name="title">
  <str name="type">string</str>
  <str name="schema">I-SM---OF---l</str>
  <str name="index">I-SO</str>
  <int name="docs">16697</int>
  <int name="distinct">8476</int>
  <lst name="topTerms">...</lst>
  <lst name="histogram">...</lst>
</lst>
<lst name="info">
  <lst name="key">
    <str name="I">Indexed</str>
    <str name="T">Tokenized</str>
    <str name="S">Stored</str>
    <str name="M">Multivalued</str>
    <str name="V">TermVector Stored</str>
    <str name="o">Store Offset With TermVector</str>
    <str name="p">Store Position With TermVector</str>
    <str name="O">Omit Norms</str>
    <str name="L">Lazy</str>
    <str name="B">Binary</str>
    <str name="f">Sort Missing First</str>
    <str name="l">Sort Missing Last</str>
  </lst>

Cheers, Chantal

On Wed, 2011-08-24 at 11:44 +0200, Stefan Matheis wrote: Hi Chantal, what does your luke output look like? What the Schema Browser does is take the schema and index elements:

<str name="schema">I-SOF---l</str>
<str name="index">I-SO</str>

and do a lookup for every mentioned character in the key hash:

<lst name="key">
  <str name="I">Indexed</str>
  <str name="T">Tokenized</str>
  <str name="S">Stored</str>
  <str name="M">Multivalued</str>
  <str name="V">TermVector Stored</str>
  <str name="o">Store Offset With TermVector</str>
  <str name="p">Store Position With TermVector</str>
  <str name="O">Omit Norms</str>
  <str name="L">Lazy</str>
  <str name="B">Binary</str>
  <str name="f">Sort Missing First</str>
  <str name="l">Sort Missing Last</str>
</lst>

so I guess there is something in your output that could not be mapped :/ I just checked this with the example schema... so there may be cases which are not correct. Regards Stefan
Re: Property undefined in Schema Browser (Solr Admin)
Hi Stefan, I'm using Firefox 3.6.20 and Chromium 12.0.742.112 (90304) on Ubuntu 10.10. The "undefined" appears in both of them. Chantal

On Wed, 2011-08-24 at 14:09 +0200, Stefan Matheis wrote: Hi Chantal, On Wed, Aug 24, 2011 at 1:43 PM, Chantal Ackermann chantal.ackerm...@btelligent.de wrote: There is a capital F which is not listed as a key? But this is also the case in your example, so probably I'm confusing something. There's a quick hack in place which tries the character, the lowercase character and the uppercase character - so there should be at least one correlation. But I'll add an additional check to the code, so that 'undefined' values will be skipped for the list. Just to check that: which browser are you using? The UI was developed using Firefox 4 and Chrome 12+ and is not fully tested on other browsers :/ Regards Stefan
Re: How to copy and extract information from a multi-line text before the tokenizer
Hi Michael, have you considered the DataImportHandler? You could use the LineEntityProcessor to create fields per line and then copyField to collect everything for the AllData field. http://wiki.apache.org/solr/DataImportHandler#LineEntityProcessor Chantal

On Tue, 2011-08-23 at 12:28 +0200, Michael Kliewe wrote: Hello all, I have a custom schema which has a few fields, and I would like to create a new field in the schema that only has one special line of another field indexed. Let's use this example: field AllData (TextField) has for example this data:

Title: exampleTitle of the book
Author: Example Author
Date: 01.01.1980

Each line is separated by a line break. I now need a new field named OnlyAuthor which only has the Author information in it, so I can search and facet on specific author information. I added this to my schema:

<fieldType name="authorField" class="solr.TextField">
  <analyzer type="index">
    <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="^.*\nAuthor: (.*?)\n.*$" replacement="$1" replace="all" />
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.TrimFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="^.*\nAuthor: (.*?)\n.*$" replacement="$1" replace="all" />
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.TrimFilterFactory"/>
  </analyzer>
</fieldType>

<field name="OnlyAuthor" type="authorField" indexed="true" stored="true" />
<copyField source="AllData" dest="OnlyAuthor"/>

But this is not working: the new OnlyAuthor field contains all the data, because the regex didn't match. But I need "Example Author" in that field (I think) to be able to search and facet only on author information. I don't know where the problem is; perhaps someone can give me a hint, or a totally different method to achieve my goal of extracting a single line from this multi-line text. Kind regards and thanks for any help, Michael
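A sketch of the DIH wiring meant above; file locations, names and the regex are assumptions. LineEntityProcessor emits each line of the file in a column called rawLine, which a RegexTransformer can then pick apart:

<dataConfig>
  <dataSource type="FileDataSource" name="fds"/>
  <document>
    <entity name="files" processor="FileListEntityProcessor" baseDir="/data/books" fileName=".*\.txt" rootEntity="false" dataSource="null">
      <entity name="lines" processor="LineEntityProcessor" url="${files.fileAbsolutePath}" dataSource="fds" transformer="RegexTransformer">
        <field column="OnlyAuthor" regex="^Author: (.*)$" sourceColName="rawLine"/>
      </entity>
    </entity>
  </document>
</dataConfig>

Note that LineEntityProcessor produces one row per line, so collapsing the rows back into one document per file needs extra handling; a copyField in schema.xml can still gather everything into AllData.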
Re: Store complete XML record (DIH XPathEntityProcessor)
Hi g, OK, I understand your problem now. (Sorry for answering that late.) I don't think PlainTextEntityProcessor can help you - it does not take a regex. LineEntityProcessor does, but your record elements probably do not each come on their own line, and you wouldn't want to depend on that anyway. I guess you would be best off writing your own entity processor - maybe by extending the XPath EP, if that gives you some advantage. You can of course also implement your own importer using SolrJ and your favourite XML parser framework - or any other programming language. If you are looking for a config-only solution - I'm not sure that there is one. Someone else might be able to comment on that? Cheers, Chantal

On Thu, 2011-07-28 at 19:17 +0200, solruser@9913 wrote: Thanks Chantal. I am OK with the second call and I already tried using that. Unfortunately it reads the whole file into a field. My file is as below:

<xml>
  <record> ... </record>
  <record> ... </record>
  <record> ... </record>
</xml>

Now the XPATH does the 'for each /record' part. For each record I also need to store the raw log in there. If I use the PlainTextEntityProcessor then it gives me the whole file (from <xml> to </xml>) and not each of the <record> ... </record> elements. Am I using the PlainTextEntityProcessor wrong? Thanks, g
Re: Store complete XML record (DIH XPathEntityProcessor)
Hi g, have a look at the PlainTextEntityProcessor: http://wiki.apache.org/solr/DataImportHandler#PlainTextEntityProcessor You will have to call the URL twice that way, but I don't think you can get the complete document (the root element with all structure) via XPath - so the XPathEntityProcessor cannot help you with that part. If calling the URL twice slows your indexer down in unacceptable ways, you can always subclass XPathEntityProcessor (knowing Java is helpful, though...). There surely is a way to make it return what you need. Or maybe an entity processor that caches the content and uses the XPath EP and PlainText EP to accomplish your needs (not sure whether the API allows for that). Cheers, Chantal

On Thu, 2011-07-28 at 05:53 +0200, solruser@9913 wrote: I am trying to use DIH to import an XML-based file with multiple XML records in it. Each record corresponds to one document in Lucene. I am using the DIH FileListEntityProcessor (to get the file list) followed by the XPathEntityProcessor to create the entities. It works perfectly and I am able to map XML elements to fields. However, I also need to store the entire XML record as a separate 'full text' field. Is there any way the XPathEntityProcessor provides a variable like 'rawLine' or 'plainText' that I can map to a field? I tried to use the plain text processor after this - but that does not recognize the XML boundaries and just gives the whole XML file.

<entity name="x" rootEntity="true" dataSource="logfilereader"
        processor="XPathEntityProcessor" url="${logfile.fileAbsolutePath}"
        stream="false" forEach="/xml/myrecord" transformer="">
  <field column="mycol1" xpath="/xml/myrecord/@something" />
  ... and so on ...

This works perfectly. However, I also need something like:

<field column="fullxmlrecord" name="plainText" />

Any help is much appreciated. I am a newbie and may be missing something obvious here. -g
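For illustration, the call-twice layout could look like this - a sketch only, with made-up names, and with the caveat from the thread that the PlainText entity stores the whole file per document, not the single <record>:

<entity name="f" processor="FileListEntityProcessor" baseDir="/logs" fileName=".*\.xml" rootEntity="false" dataSource="null">
  <entity name="x" processor="XPathEntityProcessor" url="${f.fileAbsolutePath}" forEach="/xml/myrecord" dataSource="logfilereader">
    <field column="mycol1" xpath="/xml/myrecord/@something"/>
    <entity name="raw" processor="PlainTextEntityProcessor" url="${f.fileAbsolutePath}" dataSource="logfilereader">
      <field column="plainText" name="fullxmlrecord"/>
    </entity>
  </entity>
</entity>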
Re: [POLL] How do you (like to) do logging with Solr
Please tick one of the options below with an [X]:

[ ] I always use the JDK logging as bundled in solr.war, that's perfect
[X] I sometimes use log4j or another framework and am happy with re-packaging solr.war
    (Actually: not so happy, because our operations team has to repackage it. But there is no option for "[X] add the logger configuration to the server's classpath, no repackaging!")
[ ] Give me solr.war WITHOUT an slf4j logger binding, so I can choose at deploy time
[ ] Let me choose whether to bundle a binding or not at build time, using an ANT option
[ ] What's wrong with the solr/example Jetty? I never run Solr elsewhere!
[ ] What? Solr can do logging? How cool!
Maven : Specifying SNAPSHOT Artifacts and the Hudson Repository
Hi all,

does anyone have a successful setup (= pom.xml) that specifies the Hudson snapshot repository https://builds.apache.org/hudson/job/Lucene-Solr-Maven-3.x/lastStableBuild/artifact/maven_artifacts (or that for trunk) and entries for any Solr snapshot artifacts which are then found by Maven in this repository?

I have specified the repository in my pom.xml as:

<repositories>
  <repository>
    <id>solr-snapshot-3.x</id>
    <url>https://builds.apache.org/hudson/job/Lucene-Solr-Maven-3.x/lastSuccessfulBuild/artifact/maven_artifacts</url>
    <releases>
      <enabled>false</enabled>
    </releases>
    <snapshots>
      <enabled>true</enabled>
    </snapshots>
  </repository>
</repositories>

And the dependencies:

<dependency>
  <groupId>org.apache.solr</groupId>
  <artifactId>solr-core</artifactId>
  <version>3.2-SNAPSHOT</version>
</dependency>
<dependency>
  <groupId>org.apache.solr</groupId>
  <artifactId>solr-dataimporthandler</artifactId>
  <version>3.2-SNAPSHOT</version>
</dependency>

Maven's output is (for solr-core):

Downloading: http://192.168.2.40:8081/nexus/content/groups/public/org/apache/solr/solr-core/3.2-SNAPSHOT/solr-core-3.2-SNAPSHOT.jar
[INFO] Unable to find resource 'org.apache.solr:solr-core:jar:3.2-SNAPSHOT' in repository solr-snapshot-3.x (https://builds.apache.org/hudson/job/Lucene-Solr-Maven-3.x/lastSuccessfulBuild/artifact/maven_artifacts)

I'm also experimenting with specifying the exact name of the jar, but no success so far, and it also seems wrong as it will be constantly changing. Also, searching hasn't returned anything helpful so far. I'd really appreciate it if someone could point me in the right direction!

Thanks! Chantal
DIH : modify document in sibling entity of root entity
Dear all,

in DIH, is it possible to have two sibling entities where:
- the first one is the root entity that creates the documents by iterating over a table that has one row per document, and
- the second one is executed after the completion of the first entity's iteration, and provides more data that is added to the newly created documents?

I've set up such a DIH configuration, and the second entity is executed, but no data is written into the index apart from the data extracted by the root entity (= no document is modified?). Documents are identified by the unique key 'id', which is defined by pk="id" on both entities.

Is this supposed to work at all? I haven't found anything so far on the net, but I could have used the wrong keywords for searching, of course. As an answer to the maybe obvious question why I'm not using a subentity: I thought that this solution might be faster because it iterates over the second data source instead of hitting it with a query per document. Anyway, the main reason I tried this is because I want to know whether it works. I'm still not sure whether it should work and I'm just doing something wrong, or whether it's not supposed to work at all.

Thanks! Chantal
Re: DIH : modify document in sibling entity of root entity
Hi Stefan,

thanks for your time! No, the second entity is not reusing values from the previous one. It just provides more fields for it and, of course, the unique identifier - which in the case of the second entity is not unique:

<document name="contributor">
  <entity name="contributor" pk="id" rootEntity="true"
          query="select CONTRIBUTOR_ID as id, CONTRIBUTOR_NAME as name, EXT_ID as extid from DIM_CONTRIBUTOR">
  </entity>
  <entity name="appearance" pk="id" rootEntity="false" transformer="RegexTransformer"
          query="select CONTENTID as contentid, SUBVALUE from CONTENT_VALUE where ID_ATTRIBUTE=170">
    <field column="ignore" sourceColName="SUBVALUE" groupNames="id,type,pos,character"
           regex="(\d+);(\d+);(\d+);([^;]*);\d*;[A-Z0-9]*;\d*" />
  </entity>
</document>

and here are the fields:

<field name="id" type="slong" indexed="true" stored="true" required="true" />
<field name="name" type="string" indexed="true" stored="true" required="true" termVectors="true" />
<field name="contentid" type="slong" indexed="true" stored="true" multiValued="true" />
<field name="character" type="string" indexed="true" stored="true" multiValued="true" termVectors="true" />
<field name="type" type="sint" indexed="true" stored="true" multiValued="true" />

(For the sake of simplicity I've removed some fields that would be created using copyField instructions and transformers.)

I'm currently trying to run this using a subentity with the SQL restriction SUBVALUE like '${contributor.id};%', but this takes ages... The other one finished in under a minute (and it did actually process the second entity, I think, it just didn't modify the index). The current one has been running for about 30 min and has only processed 22,000 documents out of more than 390,000. (Of course, there is probably no index on that column.)

Thanks for any suggestions! Chantal

On Thu, 2011-03-10 at 17:13 +0100, Stefan Matheis wrote:
Hi Chantal, i'm not sure if i understood you correctly (if at all)? Two entities, not arranged as sub-entity, but using values from the previous entity? Could you paste your dataimport config and the relevant part of the logging output? Regards Stefan

On Thu, Mar 10, 2011 at 4:12 PM, Chantal Ackermann chantal.ackerm...@btelligent.de wrote:
Dear all, in DIH, is it possible to have two sibling entities where:
- the first one is the root entity that creates the documents by iterating over a table that has one row per document, and
- the second one is executed after the completion of the first entity's iteration, and provides more data that is added to the newly created documents?
I've set up such a DIH configuration, and the second entity is executed, but no data is written into the index apart from the data extracted by the root entity (= no document is modified?). Documents are identified by the unique key 'id', which is defined by pk="id" on both entities. Is this supposed to work at all? I haven't found anything so far on the net, but I could have used the wrong keywords for searching, of course. As an answer to the maybe obvious question why I'm not using a subentity: I thought that this solution might be faster because it iterates over the second data source instead of hitting it with a query per document. Anyway, the main reason I tried this is because I want to know whether it works. Thanks! Chantal
Re: DIH : modify document in sibling entity of root entity
Hi Gora,

thanks for making me read this part of the documentation again! This processor probably cannot do what I need out of the box, but I will try to extend it to allow specifying a regular expression in its where attribute.

Thanks! Chantal

On Thu, 2011-03-10 at 17:39 +0100, Gora Mohanty wrote:
On Thu, Mar 10, 2011 at 8:42 PM, Chantal Ackermann chantal.ackerm...@btelligent.de wrote:
[...] Is this supposed to work at all? I haven't found anything so far on the net, but I could have used the wrong keywords for searching, of course. As an answer to the maybe obvious question why I'm not using a subentity: I thought that this solution might be faster because it iterates over the second data source instead of hitting it with a query per document. [...]

I think that what you are after can be handled by Solr's CachedSqlEntityProcessor: http://wiki.apache.org/solr/DataImportHandler#CachedSqlEntityProcessor

Two major caveats here:
* I am not 100% sure that I have understood your requirements.
* The documentation for CachedSqlEntityProcessor needs to be improved. Will see if I can test it, and come up with a better example. As I have not actually used this, it could be that I have misunderstood its purpose.

Regards, Gora
Re: solrj http client 4
SOLR-2020 addresses upgrading to HttpComponents (from HttpClient). I have had no time to work more on it yet, though. I also don't have that much experience with the new version, so any help is much appreciated.

Cheers, Chantal

On Tue, 2010-12-07 at 18:35 +0100, Yonik Seeley wrote:
On Tue, Dec 7, 2010 at 12:32 PM, Stevo Slavić ssla...@gmail.com wrote:
Hello solr users and developers, are there any plans to upgrade the http client dependency in solrj from 3.x to 4.x?

I'd certainly be for moving to 4.x (and I think everyone else would too). The issue is that it's not a drop-in replacement, so someone needs to do the work.
-Yonik
http://www.lucidimagination.com

Found this https://issues.apache.org/jira/browse/SOLR-861 ticket - judging by comments in it, the upgrade might help fix the issue. I have a project in jar hell, getting different versions of http client as a transitive dependency... Regards, Stevo.
Re: XML to solr
Hi Jörg,

you could use the DataImportHandler's XPathEntityProcessor. There you can specify for each Solr field the XPath at which its value is stored in the original file (your first example snippet). The value of the field FILE_ITEMS_DATEINAME, for example, would have the XPath //field[@name='DATEINAME']. (http://zvon.org/xxl/XPathTutorial/General_ger/examples.html has a very simple and good reference for XPath patterns.) Have a look at the DataImportHandler wiki page on how to call the XPathEntityProcessor.

Cheers, Chantal

On Mon, 2010-11-15 at 09:22 +0100, Jörg Agatz wrote:
Hi users, I have a question: I have a lot of XML to index. At the moment I have two XML files, one original, and one for Solr (search_xml). For example:

<add>
  <doc>
    <SECTION type="FILE_ITEMS">
      <field name="MD5SUM">6483030ed18d8b7a58a701c8bb638d20</field>
      <field name="DATEINAME">0012_20101105111938206.pdf</field>
      <field name="FILE_TYPE">PDM</field>
    </SECTION>
    <SECTION type="ERP">
      <SECTION type="ERP_FILE_ITEMS">
        <field name="ID">xx</field>
      </SECTION>
      <SECTION type="ERP_FILE_CONTENT">
        <field name="VORGANGSART">EK-Anfrage</field>
      </SECTION>
    </SECTION>
  </doc>
</add>

search_xml:

<add>
  <doc>
    <field name="FILE_ITEMS_MD5SUM">6483030ed18d8b7a58a701c8bb638d20</field>
    <field name="FILE_ITEMS_DATEINAME">0012_20101105111938206.pdf</field>
    <field name="FILE_ITEMS_FILE_TYPE">PDM</field>
    <field name="ERP_ERP_FILE_ITEMS_ID">xx</field>
    <field name="ERP_ERP_FILE_CONTENT_VORGANSART">EK-Anfrage</field>
  </doc>
</add>

My question is now: (how) can I index the original XML without moving the XML to a special search XML?
Output Search Result in ADD-XML-Format
Dear all,

my use case is: creating an index using DIH where the sub-entity is querying another SOLR index for more fields. As there is a very convenient attribute useSolrAddSchema that would spare me from listing all the fields I want to add from the other index, I'm looking for a way to get the search results in the ADD format directly. Before starting on the XSLT file that would transform the regular SOLR result into a SOLR update XML, I just wanted to ask whether there already exists a solution for this. Maybe I missed some request handler that already returns the result in update format?

Thanks! Chantal
RE: Output Search Result in ADD-XML-Format
Thank you, James. I was looking for something like that (and I remember having stumbled over it in the past, now that you mention it).

I've created an XSLT file that transforms the regular result into an update XML document. Seeing that the SolrEntityProcessor is still in development, I will stick to the XSLT solution while we are still using 1.4, but I will add a note that with the new release we should try this SolrEntityProcessor. (Reading through the JIRA issue, I'm not sure whether I can simply get all fields from the other index and dump them into the index which is being built. With the XSLT + useSolrAddSchema solution this works just fine without the need to list all the fields. I should try that before the next Solr release to be able to give some feedback.)

Thanks! Chantal

On Wed, 2010-11-10 at 15:13 +0100, Dyer, James wrote:
I'm not sure, but SOLR-1499 might have what you want. https://issues.apache.org/jira/browse/SOLR-1499

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311

-----Original Message-----
From: Chantal Ackermann [mailto:chantal.ackerm...@btelligent.de]
Sent: Wednesday, November 10, 2010 5:59 AM
To: solr-user@lucene.apache.org
Subject: Output Search Result in ADD-XML-Format

Dear all, my use case is: creating an index using DIH where the sub-entity is querying another SOLR index for more fields. As there is a very convenient attribute useSolrAddSchema that would spare me from listing all the fields I want to add from the other index, I'm looking for a way to get the search results in the ADD format directly. Before starting on the XSLT file that would transform the regular SOLR result into a SOLR update XML, I just wanted to ask whether there already exists a solution for this. Maybe I missed some request handler that already returns the result in update format? Thanks! Chantal
Re: Missing facet values for zero counts
Hi Allistair, On Wed, 2010-09-29 at 15:37 +0200, Allistair Crossley wrote: Hello list, I am implementing a directory using Solr. The user is able to search with a free-text query or 2 filters (provided as pick-lists) for country. A directory entry only has one country. I am using Solr facets for country and I use the facet counts generated initially by a *:* search to generate my pick-list. This is working fairly well but there are a couple of issues I am facing. Specifically the countries pick-list does not contain ALL possible countries. It only contains those that have been indexed against a document. I have looked at facet.missing but I cannot see how this will work - if no documents have a country of Sweden, then how would Solr know to generate a missing total of zero for Sweden - it's never heard of it. I feel I am missing something - is there a way by which you tell Solr all possible countries rather than relying on counts generated from the index? I don't think you are missing anything. Instead, you've described it very well: how should SOLR know of something that never made it into the index? Why not just state in the interface that for all missing countries (and deduce that from the facets and the list retrieved from the database), there are no hits. You can list those countries separately (or even add them to the facets after processing solr's result). If you do want to have them in the index, you'd have to add them by adding empty documents. But you might get into trouble with required fields etc. And you will change the statistics of the fields. Chantal
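If you go the post-processing route, merging the facet counts into the full country list is only a few lines with SolrJ (sketch; assumes the complete country list comes from your database or a static resource):

import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import org.apache.solr.client.solrj.response.FacetField;
import org.apache.solr.client.solrj.response.QueryResponse;

public class CountryFacets {
    /** Returns every known country, with 0 for those Solr has never seen. */
    public static Map<String, Long> withZeroCounts(QueryResponse rsp, List<String> allCountries) {
        Map<String, Long> counts = new LinkedHashMap<String, Long>();
        for (String country : allCountries) {
            counts.put(country, 0L); // default: no hits
        }
        FacetField ff = rsp.getFacetField("country");
        if (ff != null && ff.getValues() != null) {
            for (FacetField.Count c : ff.getValues()) {
                counts.put(c.getName(), c.getCount());
            }
        }
        return counts;
    }
}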
Re: Autocomplete: match words anywhere in the token
On Wed, 2010-09-22 at 20:14 +0200, Arunkumar Ayyavu wrote:
Thanks for the responses. Now I included the EdgeNGramFilter. But I get the following results when I search for "canon pixma":
Canon PIXMA MP500 All-In-One Photo Printer
Canon PowerShot SD500
As you can guess, I'm not expecting the 2nd result entry. Though I understand why I'm getting the 2nd entry, I don't know how to ask Solr to exclude it (I could filter it in my application though). :-( Looks like I should study more of Solr's capabilities to get the solution.

This has not so much to do with autosuggest anymore. You put those quotes in to denote the search input, not to say that the search input was a phrase, I suppose. Searching for the phrase (quoted), only the first line should have been found. If you want hits returned that include most of the searched terms - and, in the case of only two input terms, both of them - you can configure such sophisticated rules with the http://wiki.apache.org/solr/DisMaxQParserPlugin

Have a look at the mm parameter (Minimum Should Match).

Chantal
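For illustration, a dismax query with mm could be set up like this in SolrJ (sketch; the qf field and the mm value are only examples - see the wiki page for the full mm syntax):

import org.apache.solr.client.solrj.SolrQuery;

public class DismaxAutocompleteQuery {
    public static SolrQuery build(String userInput) {
        SolrQuery q = new SolrQuery(userInput); // e.g. "canon pixma"
        q.set("qt", "dismax");  // route to a dismax handler
        q.set("qf", "name");    // example query field
        q.set("mm", "100%");    // require all input terms to match
        return q;
    }
}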
Re: Restrict possible results based on relational information
Hi Stefan,

> users can send private messages, the selection of recipients is done via
> auto-complete. therefore we need to restrict the possible results based
> on the user's confirmed contacts - but i have absolutely no idea how to
> do that :/
> Add all confirmed contacts to the index, and use it like a type of
> relation? pass the list of confirmed contacts together with the query?

This does not sound like a search query, because:
1. you know the user
2. you know his/her list of confirmed contacts

If both statements are true, the list of confirmed contacts should be accessible via a JSON-URL call so that you can load it into an autocomplete dropdown. SOLR need not be involved in this case (but you can of course store the list of confirmed contacts in a multivalued field per user if you need it for other searches or facetting).

Cheers, Chantal
RE: Simple Filter Query (fq) Use Case Question
Hi Andre,

changing the entity in your index from donor to gift changes, of course, the scope of your search results. I found it helpful to re-think such a change from that other side (the result side). If the users of your search application look for individual gifts, in the end, then changing the index to gift is for the better. If they are searching for donors, then I would rethink the change but not discard it completely: you can still get the list of distinct donors by facetting over donors. You can show the users that list of donors (the facets), and they can choose from it and get all information on that donor (restricted to the original query, of course). The information would include the actual search result of a list of gifts that passed the query.

Cheers, Chantal

On Wed, 2010-09-15 at 21:49 +0200, Andre Bickford wrote:
Thanks for the response Erick. I did actually try exactly what you suggested. I flipped the index over so that a gift is the document. This solution certainly solves the previous problem, but introduces a new issue where the search results show duplicate donors. If a donor gave 12 times in a year, and we offer full years as facet ranges, my understanding is that you'd see that donor 12 times in the search results, once for each gift document. Obviously I could do some client-side filtering to list only distinct donors, but I was hoping to avoid that. If I've simply stumbled into the basic tradeoffs of denormalization, I can live with client-side de-duplication, but if you have any further suggestions I'm all eyes.

As for sizing, we have some huge charities as clients. However, right now I'm testing on a copy of prod data from a smaller client with ~350,000 donors and ~8,000,000 gift records. So, when I flipped the index around as you suggested, it went from 350,000 documents to 8,000,000 documents. No issues with performance at all.

Thanks again, Andre

-----Original Message-----
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: Wednesday, September 15, 2010 3:09 PM
To: solr-user@lucene.apache.org
Subject: Re: Simple Filter Query (fq) Use Case Question

One strategy is to denormalize all the way. That is, each Solr document is a single gift (Id, Name, Address, Gift Amount, Gift Date); Gift Amount and Gift Date would not be multiValued. You'd create a different document for each gift, so you'd have multiple documents with the same Id, Name, and Address. Be careful, though: if you've defined Id as a UniqueKey, you'd only have one record/donor. You can handle this easily enough by making a composite key of Id+Gift Date (assuming no donor made more than one gift on exactly the same date). I know this goes completely against all the reflexes you've built up working with DBs, but...

Can you give us a clue how many donations we're talking about here? You'd have to be working with a really big nonprofit to get enough documents to have to start worrying about making your index smaller.

HTH
Erick

On Wed, Sep 15, 2010 at 1:41 PM, Andre Bickford abickf...@softrek.com wrote:
I'm working on creating a solr index search for a charitable organization. The solr index stores documents of donors. Each donor document has the following fields:

Id
Name
Address
Gift Amount (multiValued)
Gift Date (multiValued)

In our relational database, there is a one-to-many relationship between the DONOR table and the GIFT table. One donor can of course give many gifts over time. Consequently, I created the Gift Amount and Gift Date fields to be multiValued.
Now, consider the following query filtered for gifts last month between $0 and $100: q=name:Jones fq=giftDate:[NOW/MONTH-1 TO NOW/MONTH] fq=giftAmount:[0 TO 100] The results show me donors who donated ANY amount in the past month and donors who had EVER in the past given a gift between $0 and $100. I was hoping to only see donors who had given a gift between $0 and $100 in the past month exclusively. I believe the problem is that I neglected to consider that for two multiValued fields, while the values might align index wise, there is really no other association between the two fields, so the filter query intersection isn't really behaving as I expected. I think this is a fundamental question of one-to-many denormalization, but obviously I'm not yet experienced enough with Lucene/Solr to find a solution. As to why not just keep using a relational database, it's because I'm trying to provide a faceting solution to drill down to donors. The aforementioned fq parameters would come from faceting. Oh, that and Oracle Text indexes are a PITA. :-) Thanks for any help you can provide. André Bickford Software Engineering Team Leader SofTrek Corporation 30 Bryant Woods North Amherst, NY 14228 716.691.2800 x154 800.442.9211 Fax: 716.691.2828 abickf...@softrek.com www.softrek.com
Re: Boosting specific field value
Hi Ravi,

with dismax, use the parameter q.alt, which expects standard lucene syntax (instead of q). If q.alt is present in the query, q is not required. Add the parameter qt=dismax.

Chantal

On Thu, 2010-09-16 at 06:22 +0200, Ravi Kiran wrote:
Hello Mr. Rochkind, I am using StandardRequestHandler, so I presume I cannot use the bq param, right?? Is there a way we can mix dismax and standardhandler, i.e. use lucene syntax for the query and use dismax style for bq using localparams/nested queries? I remember seeing your post related to localparams and nested queries and got thoroughly confused.

On Wed, Sep 15, 2010 at 10:28 PM, Jonathan Rochkind rochk...@jhu.edu wrote:
Maybe you are looking for the 'bq' (boost query) parameter in dismax? http://wiki.apache.org/solr/DisMaxQParserPlugin#bq_.28Boost_Query.29

From: Ravi Kiran [ravi.bhas...@gmail.com]
Sent: Wednesday, September 15, 2010 10:02 PM
To: solr-user@lucene.apache.org
Subject: Re: Boosting specific field value

Erick, I'm afraid you misinterpreted my issue. If I query like you said, i.e. q=source:(bbc OR associated press)^10, I will ONLY get documents with source BBC or Associated Press... what I am asking is: if my query does not deal with source at all but uses some other field - since the field source will be in the result, is there a way to still boost such a document?

To re-iterate, if my query is as follows

q=primarysection:(Politics* OR Nation*)&fq=contenttype:(Blog OR Photo Gallery) pubdatetime:[NOW-3MONTHS TO NOW]

and say the resulting docs have a source field, is there any way I can boost the resulting doc/docs that have either BBC/Associated Press as the value in the source field to be on top?

Can a filter query (fq) have a boost? If yes, then probably I could rewrite the query as follows, in a roundabout way:

q=primarysection:(Politics* OR Nation*)&fq=contenttype:(Blog OR Photo Gallery) pubdatetime:[NOW-3MONTHS TO NOW] (source:(BBC OR Associated Press)^10 OR -source:(BBC OR Associated Press)^5)

Theoretically, I have to write source in the fq 2 times, as I need docs that have other source values too - just that they will have a lower boost.

Thanks, Ravi Kiran Bhaskar

On Wed, Sep 15, 2010 at 1:34 PM, Erick Erickson erickerick...@gmail.com wrote:
This seems like a simple query-time boost, although I may not be understanding your problem well. That is, q=source:(bbc OR associated press)^10. As for boosting more recent documents, see: http://wiki.apache.org/solr/SolrRelevancyFAQ#How_can_I_boost_the_score_of_newer_documents

HTH
Erick

On Wed, Sep 15, 2010 at 12:44 PM, Ravi Kiran ravi.bhas...@gmail.com wrote:
Hello, I am currently querying solr for a *primarysection* which will return documents like - *q=primarysection:(Politics* OR Nation*)&fq=contenttype:(Blog OR Photo Gallery) pubdatetime:[NOW-3MONTHS TO NOW]*. Each document has several fields, of which I am most interested in a single-valued field called *source*... I want to boost documents which contain a *source* value of, say, Associated Press OR BBC, and also newer documents. The returned documents may have several other source values other than BBC or Associated Press. Since I specifically don't query on these source values, I am not sure how I can boost them. I am using *StandardRequestHandler*.
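Put together with SolrJ, that would look something like this (untested sketch using the fields from this thread):

import org.apache.solr.client.solrj.SolrQuery;

public class BoostBySource {
    public static SolrQuery build() {
        SolrQuery q = new SolrQuery();
        q.set("qt", "dismax");
        // lucene syntax goes into q.alt; q itself can be left out
        q.set("q.alt", "primarysection:(Politics* OR Nation*)");
        // boost documents from the preferred sources without filtering the rest out
        q.set("bq", "source:(BBC OR \"Associated Press\")^10");
        q.addFilterQuery("contenttype:(Blog OR \"Photo Gallery\")");
        q.addFilterQuery("pubdatetime:[NOW-3MONTHS TO NOW]");
        return q;
    }
}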
Re: SolrJ and Multi Core Set up
Hi Shaun,

you create the SolrServer using multicore by just adding the core to the URL. You don't need to add anything with SolrQuery.

URL url = new URL(new URL(solrBaseUrl), coreName);
CommonsHttpSolrServer server = new CommonsHttpSolrServer(url);

Concerning the default core thing - I wouldn't know about that.

Cheers, Chantal

On Fri, 2010-09-03 at 12:03 +0200, Shaun Campbell wrote:
I'm writing a client using SolrJ and was wondering how to handle a multi-core installation. We want to use the facility to rebuild the index on one of the cores at a scheduled time and then use the SWAP facility to switch the live core to the newly rebuilt core. I think I can do the SWAP with CoreAdminRequest.setAction() with a suitable parameter.

First of all, does Solr have some concept of a default core? If I have core0 as my live core and core1 which I rebuild, then after the swap I expect core0 to now contain my rebuilt index and core1 to contain the old live core data. My application should then need to keep referring to core0 as normal with no change. Do I have to refer to core0 programmatically?

I've currently got working client code to index and to query my Solr data, but I was wondering whether or how I set the core when I move to multi-core? There are examples showing it set as part of the URL, so my guess is it's done by using something like setParam on SolrQuery. Has anyone got any advice or examples of using SolrJ in a multi-core installation?

Regards
Shaun
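And for the SWAP part of the question, a sketch with CoreAdminRequest (untested; base URL and core names are examples):

import java.net.URL;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.request.CoreAdminRequest;
import org.apache.solr.common.params.CoreAdminParams;

public class SwapCores {
    public static void main(String[] args) throws Exception {
        String solrBaseUrl = "http://localhost:8983/solr/";
        // per-core server for indexing/querying
        CommonsHttpSolrServer core0 =
                new CommonsHttpSolrServer(new URL(new URL(solrBaseUrl), "core0"));
        // core admin requests go against the base URL, not against a core
        CommonsHttpSolrServer admin = new CommonsHttpSolrServer(solrBaseUrl);
        CoreAdminRequest swap = new CoreAdminRequest();
        swap.setAction(CoreAdminParams.CoreAdminAction.SWAP);
        swap.setCoreName("core0");      // the live core
        swap.setOtherCoreName("core1"); // the freshly rebuilt core
        swap.process(admin);
        // core0 now serves the rebuilt index; keep querying core0 as before
    }
}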
Re: advice on creating a solr index when data source is from many unrelated db tables
Hi Ahmed,

fields that are empty do not impact the index. It's different from a database. I have text fields for different languages, and per document only one of the languages is set (the text fields for the other languages are empty/not set). It all works very well and fast.

I wonder more about what you describe as unrelated data - why would you want to put unrelated data into a single index? If you want to search on all the data and return mixed results, there surely must be some kind of relation between the documents?

Chantal

On Thu, 2010-07-29 at 21:33 +0200, S Ahmed wrote:
I understand (and it's straightforward) when you want to create an index for something simple like Products. But how do you go about creating a Solr index when you have data coming from 10-15 database tables, and the tables have unrelated data? The issue is then you would have many 'columns' in your index, and they will be NULL for much of the data, since you are trying to shove 15 db tables into a single Solr/Lucene index. This must be a common problem; what are the potential solutions?
Re: Implementing lookups while importing data
Hi Gora,

your suggestion is good. Two thoughts:

1. If both of the tables you are joining are in the same database under the same user, you might want to check why the join is so slow. Maybe you just need to add an index on a column that is used in your WHERE clauses. Joins should not be slow.

2. If the tables are in different databases and you are joining them via DIH, I tend to agree that this can get too slow (I think the connections might not get pooled and the jdbc driver adds too much overhead - ATTENTION, ASSUMPTION). If it's not a possibility for you to create a temporary table that aggregates the required data before indexing, then your proposal is indeed a good solution.

Another way I can think of right now, which would only reduce your coding effort and change it to a configuration task - in your indexing procedure do:

a) create a temporary solr core on your solr server (see the page on core admin in the wiki)
b) index this tmp core with the text data
c) index your main core with the data by joining it to the already existing solr index in the tmp core (this is fast, I can assure you; use URLDataSource with XPathEntityProcessor if you are on 1.4)
d) delete the tmp core (well, or keep it for next time)

Chantal

On Thu, 2010-07-29 at 11:51 +0200, Gora Mohanty wrote:
Hi, we have a database that has numeric values for some columns, which correspond to text values in drop-downs on a website. We need to index both the numeric and text equivalents into Solr, and can do that via a lookup on a different table from the one holding the main data. We are currently doing this via a JOIN on the numeric field, between the main data table and the lookup table, but this dramatically slows down indexing. We could try using the CachedSqlEntity processor, but there are some issues in doing that, as the data import handler is quite complicated. As the lookups need to be done only once, I was planning the following:
(a) Do the lookups in a custom data source that extends JDBCDataSource, and store them in arrays.
(b) Implement a custom transformer that uses the array data to convert numeric values read from the database to text.
Comments on this approach, or suggestions for simpler ones, would be much appreciated. Regards, Gora
Re: Excluding large tokens from indexing
This is probably what you want? http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.LengthFilterFactory On Thu, 2010-07-29 at 15:44 +0200, Paul Dlug wrote: Is there a filter available that will remove large tokens from the token stream? Ideally something configurable to a character limit? I have a noisy data set that has some large tokens (in this case more than 50 characters) that I'd like to just strip. They're unlikely to ever match a user query and will just take up space since there are a large number of them that are not distinct. --Paul
Re: Indexing Problem: Where's my data?
Make sure to set stored="true" on every field you expect to be returned in your results for later display. Chantal
Re: DIH : SQL query (sub-entity) is executed although variable is not set (null or empty list)
Hi Lance!

On Wed, 2010-07-28 at 02:31 +0200, Lance Norskog wrote:
Should this go into the trunk, or does it only solve problems unique to your use case?

The solution is generic but is an extension of XPathEntityProcessor, because I didn't want to touch the solr.war. This way I can deploy the extension into SOLR_HOME/lib. The problem that it solves is not one with XPathEntityProcessor but more general.

What it does: it adds an attribute to the entity that I called skipIfEmpty, which takes the variable (it could even take more variables separated by whitespace). On entityProcessor.init() - which is called for sub-entities per row of the root entity (:= before every new request to the data source) - the value of the attribute is resolved, and if it is null or empty (after trimming), the entity is not processed further. This attribute is only allowed on sub-entities.

It would probably be nicer to put that somewhere higher up in the class hierarchy so that all entity processors could make use of it. But I don't know how common the use case is - all examples I found were more or less joins on primary keys.

Cheers, Chantal

Here comes the code ==

import static org.apache.solr.handler.dataimport.DataImportHandlerException.SEVERE;

import java.util.Map;
import java.util.logging.Logger;

import org.apache.solr.handler.dataimport.Context;
import org.apache.solr.handler.dataimport.DataImportHandlerException;
import org.apache.solr.handler.dataimport.XPathEntityProcessor;

public class OptionalXPathEntityProcessor extends XPathEntityProcessor {
    private Logger log =
            Logger.getLogger(OptionalXPathEntityProcessor.class.getName());
    private static final String SKIP_IF_EMPTY = "skipIfEmpty";
    private boolean skip = false;

    @Override
    protected void firstInit(Context context) {
        if (context.isRootEntity()) {
            throw new DataImportHandlerException(SEVERE,
                    "OptionalXPathEntityProcessor not allowed for root entities.");
        }
        super.firstInit(context);
    }

    @Override
    public void init(Context context) {
        String value = context.getResolvedEntityAttribute(SKIP_IF_EMPTY);
        if (value == null || value.trim().isEmpty()) {
            skip = true;
        } else {
            super.init(context);
            skip = false;
        }
    }

    @Override
    public Map<String, Object> nextRow() {
        if (skip) return null;
        return super.nextRow();
    }
}
Re: SolrJ Response + JSON
You could use org.apache.solr.handler.JsonLoader. That one uses org.apache.noggit.JSONParser internally. I've used the JacksonParser with Spring. http://json.org/ lists parsers for different programming languages.

Cheers, Chantal

On Wed, 2010-07-28 at 15:08 +0200, MitchK wrote:
Hello, second try to send a mail to the mailing list... I need to translate SolrJ's response into a JSON response. I cannot query Solr directly, because I need to do some math with the response data before I show the results to the client. Any experience with translating SolrJ's response into JSON without writing your own JSON writer? Thank you. - Mitch
Re: SolrJ Response + JSON
Hi Mitch,

On Wed, 2010-07-28 at 16:38 +0200, MitchK wrote:
Thank you, Chantal. I have looked at this one: http://www.json.org/java/index.html - this seems to be an easy-to-understand implementation. However, I am wondering how to determine whether a SolrDocument's field is multiValued or not. The JSONResponseWriter of Solr looks at the schema configuration. However, the client shouldn't do that. How did you solve that problem?

I didn't. I'm not recreating JSON from the SolrJ results. I would try to use the same classes that SolrJ uses, actually. (Writing that without having had a further look at the code.) I would avoid recreating existing code as much as possible.

About multivalued fields: you need instanceof checks, I guess. The field only contains a list if there really are multiple values. (That's what works for my ScriptTransformer.)

Are you sure that you cannot change the SOLR results at query time according to your needs? Maybe you should ask for that first (ask for X instead of Y...).

Cheers, Chantal

Thanks for sharing ideas. - Mitch

Am 28.07.2010 15:35, schrieb Chantal Ackermann:
You could use org.apache.solr.handler.JsonLoader. That one uses org.apache.noggit.JSONParser internally. I've used the JacksonParser with Spring. http://json.org/ lists parsers for different programming languages. Cheers, Chantal

On Wed, 2010-07-28 at 15:08 +0200, MitchK wrote:
Hello, second try to send a mail to the mailing list... I need to translate SolrJ's response into a JSON response. I cannot query Solr directly, because I need to do some math with the response data before I show the results to the client. Any experience with translating SolrJ's response into JSON without writing your own JSON writer? Thank you. - Mitch
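To illustrate the instanceof approach with the json.org classes (sketch; in SolrJ a multi-valued field comes back as a Collection, a single value as the bare object):

import java.util.Collection;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.SolrDocumentList;
import org.json.JSONArray;
import org.json.JSONException;
import org.json.JSONObject;

public class SolrJsonWriter {
    public static JSONArray toJson(SolrDocumentList docs) throws JSONException {
        JSONArray result = new JSONArray();
        for (SolrDocument doc : docs) {
            JSONObject json = new JSONObject();
            for (String field : doc.getFieldNames()) {
                Object value = doc.getFieldValue(field);
                if (value instanceof Collection<?>) {
                    json.put(field, new JSONArray((Collection<?>) value)); // multi-valued
                } else {
                    json.put(field, value); // single-valued
                }
            }
            result.put(json);
        }
        return result;
    }
}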
Re: Design questions/Schema Help
Hi,

IMHO you can do this with date range queries and (date) facets. The DateMathParser will allow you to normalize dates on minutes/hours/days. If you hit a limit there, then just add a field with an integer for either minute/hour/day. This way you'll lose the month information - which is sometimes what you want.

You probably want the document entity to be a query, with fields:
- query
- user (id? if you have that)
- sessionid
- date

The most popular query within a date range is the query that was logged most times. Do a search on the date range, q=date:[start TO end], with a facet on the query field, which gives you the count - similar to the group-by-count aggregation functionality in an RDBMS. You can do multiple facets at the same time, but be careful what you are querying for - it will impact the facet count. You can use functions to change the base of each facet. http://wiki.apache.org/solr/SimpleFacetParameters

Cheers, Chantal

On Tue, 2010-07-27 at 01:43 +0200, Mark wrote:
We are thinking about using Cassandra to store our search logs. Can someone point me in the right direction/lend some guidance on design? I am new to Cassandra and I am having trouble wrapping my head around some of these new concepts. My brain keeps wanting to go back to a RDBMS design. We will be storing the user query, # of hits returned and their session id. We would like to be able to answer the following questions:
- What are the n most popular queries and their counts within the last x (mins/hours/days/etc)? Basically the most popular searches within a given time range.
- What is the most popular query within the last x where hits = 0? Same as above but with an extra where clause.
- For session id x, give me all their other queries.
- What are all the session ids that searched for 'foos'?
We accomplish the above functionality w/ MySQL using 2 tables, one for the raw search log information and the other to keep the aggregate/running counts of queries. Would this sort of ad-hoc querying be better implemented using Hadoop + Hive? If so, should I be storing all this information in Cassandra and then using Hadoop to retrieve it? Thanks for your suggestions.
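A sketch of the "top n queries in the last x days" search with SolrJ (field names as above; untested):

import org.apache.solr.client.solrj.SolrQuery;

public class TopQueries {
    public static SolrQuery lastDays(int days, int topN) {
        SolrQuery q = new SolrQuery("date:[NOW/DAY-" + days + "DAYS TO NOW]");
        q.setRows(0);              // only the facet counts are needed
        q.setFacet(true);
        q.addFacetField("query");  // the logged query string
        q.setFacetLimit(topN);
        q.setFacetMinCount(1);
        // for the "hits = 0" variant: q.addFilterQuery("hits:0");
        return q;
    }
}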
Re: DIH : SQL query (sub-entity) is executed although variable is not set (null or empty list)
Hi Mitch,

thanks for that suggestion. I wasn't aware of that. I've already added a temporary field in my ScriptTransformer that does basically the same. However, with this approach indexing time went up from 20 min to more than 5 hours.

The new approach is to query the Solr index for that other database that I've already set up. This is only a bit slower than the original query (20 min). (I'm using URLDataSource to be 1.4.1 conform.) As with the db entity before, for every document a request is sent to the Solr core, even if it is useless because the input variable is empty. It seems that once an entity processor kicks in, you cannot avoid the initial request to its data source?

Thanks, Chantal

On Mon, 2010-07-26 at 16:22 +0200, MitchK wrote:
Hi Chantal, did you try to write a custom DIH function (http://wiki.apache.org/solr/DIHCustomFunctions)? If not, I think this will be a solution. Just check whether ${prog.vip} is an empty string or null. If so, you need to replace it with a value that can never match anything, so the vip field will always be empty for such queries. Maybe that helps? Hopefully, the variable resolver is able to resolve something like ${dih.functions.getReplacementIfNeeded(prog.vip)}.

Kind regards, - Mitch

Chantal Ackermann wrote:
Hi, my use case is the following: in a sub-entity I request rows from a database for an input list of strings:

<entity name="prog" ...>
  <field name="vip" ... />  <!-- multivalued, not required -->
  <entity name="ssc_entry" dataSource="ssc" onError="continue"
          query="select SSC_VALUE from SSC_VALUE where SSC_ATTRIBUTE_ID=1 and SSC_VALUE in (${prog.vip})">
    <field column="SSC_VALUE" name="vip_ssc" />
  </entity>
</entity>

The root entity is prog, and it has an optional multivalued field called vip. When the list of vip values is empty, the SQL for the sub-entity above throws an SQLException. (Working with Oracle, which does not allow an empty expression in the in-clause.)

Two things:
(A) Best would be not to run the query whenever ${prog.vip} is null or empty.
(B) From the documentation, it is not clear that onError is only checked in the transformer runs but not checked when the SQL for the entity throws an exception. (Trunk version JdbcDataSource, lines 250pp.)

IMHO, (A) is the better fix, and if so, (B) is the right decision. (If (A) is not easily fixable, making (B) work would be helpful.) Looking through the code, I've realized that the replacement of the variables is done in a very generic way. I've not yet seen an appropriate way to check on those variables in order to stop the processing of the entity if the variable is empty. Is there a way to do this? Or maybe there is a completely different way to get my use case working. Any help most appreciated! Thanks, Chantal
Re: DIH : SQL query (sub-entity) is executed although variable is not set (null or empty list)
Hi Mitch,

> New idea: create a method which returns the query string:
>
> returnString(theVIP) {
>     if (theVIP != null && theVIP != "") {
>         return a query-string to find the vip
>     } else {
>         return "SELECT 1" // you need to modify this so that it matches your field definition
>     }
> }
>
> The main idea is to perform a blazing-fast query instead of a complex
> IN-clause query. Does this sound like a solution?

I was using "in" because it's a multivalued input that results in multivalued output (not necessarily, but it's most probable - it's either empty or multiple values). I don't understand how I can make your solution work with multivalued input/output?

> > The new approach is to query the solr index for that other database
> > that I've already set up. This is only a bit slower than the original
> > query (20 min). (I'm using URLDataSource to be 1.4.1 conform.)
> Unfortunately I cannot follow you. You are querying a solr-index for a
> database?

Yes, because I've already put one up (a second core) and used SolrJ to get what I want later on, but it would be better to compute the relation between the two indexes at index time instead of at query time. (If it had worked with the db entity, the second index wouldn't have been required anymore.) But now that it works well with the url entity, I'm fine with maintaining that second index. It's not that much effort.

I've subclassed URLDataSource to add a check whether the list of input values is empty, and to only proceed when this is not the case. I realized that I have to throw an exception and add the onError attribute to the entity to make that work.

Thanks! Chantal
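For illustration, a simplified version of that subclass (untested sketch; the actual empty-check depends on how the variable ends up in the resolved URL):

import java.io.Reader;
import org.apache.solr.handler.dataimport.DataImportHandlerException;
import org.apache.solr.handler.dataimport.URLDataSource;

public class OptionalURLDataSource extends URLDataSource {
    @Override
    public Reader getData(String query) {
        // "()" stands in for whatever an empty input list looks like in your URL
        if (query == null || query.contains("()")) {
            // caught by DIH; with onError="continue" on the entity, indexing goes on
            throw new DataImportHandlerException(DataImportHandlerException.SEVERE,
                    "empty input list - skipping request");
        }
        return super.getData(query);
    }
}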
Re: DIH : SQL query (sub-entity) is executed although variable is not set (null or empty list)
Hi Mitch,

thanks for the code. Currently I've got a different solution running, but it's always good to have examples.

> > I realized that I have to throw an exception and add the onError
> > attribute to the entity to make that work.
> I am curious: Can you show how to make a method throw an exception that
> is accepted by the onError-attribute?

The catch clause looks for Exception, so it's actually easy. :-D

Anyway, I've found a cleaner way. It is better to subclass the XPathEntityProcessor and put it in a state that prevents it from calling initQuery, which triggers the dataSource.getData() call. I have overridden the initContext() method, setting a go/no-go flag that I use in the overridden nextRow() to find out whether to delegate to the superclass or not. This way I can also avoid the code that fills the tmp field with an empty value when there is no value to query on.

Cheers, Chantal
Re: help with a schema design problem
Hi,

I haven't read everything thoroughly, but have you considered creating fields for each of your (I think what you call) party values? So that you can query like client:Pramod. You would then be able to facet on client and supplier.

Cheers, Chantal

On Fri, 2010-07-23 at 23:23 +0200, Geert-Jan Brits wrote:
Multiple rows in the OP's example are combined to form 1 solr-document (e.g. rows 1 and 2 both have documentid=1). Because of this combination, it would match p_value from row 1 with p_type from row 2 (or vice versa).

2010/7/23 Nagelberg, Kallin knagelb...@globeandmail.com:
"When I search p_value:Pramod AND p_type:Supplier it would give me result as document 1. Which is incorrect, since in document 1 Pramod is a Client and not a Supplier."
Would it? I would expect it to give you nothing. -Kal

-----Original Message-----
From: Geert-Jan Brits [mailto:gbr...@gmail.com]
Sent: Friday, July 23, 2010 5:05 PM
To: solr-user@lucene.apache.org
Subject: Re: help with a schema design problem

"Is there any way in solr to say p_value[someIndex]=pramod And p_type[someIndex]=client?"
No, I'm 99% sure there is not.
"One way would be to define a single field in the schema as p_value_type = client pramod, i.e. combine the values from both fields and store them in a single field."
Yep, for the use-case you mentioned that would definitely work. Multivalued of course, so it can contain Supplier Raj as well.

2010/7/23 Pramod Goyal pramod.go...@gmail.com:
In my case the document id is the unique key (each row is not a unique document). So a single document has multiple Party Value and Party Type. Hence I need to define both Party Value and Party Type as multi-valued. Is there any way in solr to say p_value[someIndex]=pramod And p_type[someIndex]=client? Is there any other way I can design my schema? I have some solutions but none seems to be a good solution. One way would be to define a single field in the schema as p_value_type = client pramod, i.e. combine the values from both fields and store them in a single field.

On Sat, Jul 24, 2010 at 12:18 AM, Geert-Jan Brits gbr...@gmail.com wrote:
With the use-case you specified, it should work to just index each row, as you described in your initial post, as a separate document. This way p_value and p_type all get single-valued and you get a correct combination of p_value and p_type. However, this may not go so well with other use-cases you have in mind, e.g. requiring that no multiple results are returned with the same document id.

2010/7/23 Pramod Goyal pramod.go...@gmail.com:
I want to do that. But if I understand correctly, in solr it would store the field like this: p_value: Pramod Raj; p_type: Client Supplier. When I search p_value:Pramod AND p_type:Supplier it would give me result as document 1. Which is incorrect, since in document 1 Pramod is a Client and not a Supplier.

On Fri, Jul 23, 2010 at 11:52 PM, Nagelberg, Kallin knagelb...@globeandmail.com wrote:
I think you just want something like: p_value:Pramod AND p_type:Supplier, no? -Kallin Nagelberg

-----Original Message-----
From: Pramod Goyal [mailto:pramod.go...@gmail.com]
Sent: Friday, July 23, 2010 2:17 PM
To: solr-user@lucene.apache.org
Subject: help with a schema design problem

Hi, let's say I have a table with 3 columns: document id, Party Value and Party Type. In this table I have 3 rows:
1st row: Document id: 1, Party Value: Pramod, Party Type: Client.
2nd row: Document id: 1, Party Value: Raj, Party Type: Supplier.
3rd row: Document id: 2, Party Value: Pramod, Party Type: Supplier.
Now, in this table, if I use SQL it's easy for me to find all documents with Party Value as Pramod and Party Type as Client. I need to design a Solr schema so that I can do the same in Solr. If I create 2 fields in the Solr schema, Party Value and Party Type, both of them multi-valued, and try to query +Pramod +Supplier, then Solr will return me the first document, even though in the first document Pramod is a client and not a supplier.

Thanks, Pramod Goyal
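The indexing side of the combined-field suggestion is then just string concatenation (SolrJ sketch; field name from this thread):

import org.apache.solr.common.SolrInputDocument;

public class CombinedPartyField {
    /** One combined token per source row, e.g. "Client Pramod", "Supplier Raj". */
    public static void addParty(SolrInputDocument doc, String partyType, String partyValue) {
        doc.addField("p_value_type", partyType + " " + partyValue); // multiValued field
    }
}

Querying then uses a phrase on the combined field, e.g. p_value_type:"Supplier Pramod", which (given a positionIncrementGap on the field type) can no longer match values coming from different rows.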
DIH : SQL query (sub-entity) is executed although variable is not set (null or empty list)
Hi, my use case is the following: in a sub-entity I request rows from a database for an input list of strings:

<entity name="prog" ...>
  <field name="vip" ... />  <!-- multivalued, not required -->
  <entity name="ssc_entry" dataSource="ssc" onError="continue"
          query="select SSC_VALUE from SSC_VALUE where SSC_ATTRIBUTE_ID=1 and SSC_VALUE in (${prog.vip})">
    <field column="SSC_VALUE" name="vip_ssc" />
  </entity>
</entity>

The root entity is prog, and it has an optional multivalued field called vip. When the list of vip values is empty, the SQL for the sub-entity above throws an SQLException. (Working with Oracle, which does not allow an empty expression in the in-clause.)

Two things:
(A) Best would be not to run the query whenever ${prog.vip} is null or empty.
(B) From the documentation, it is not clear that onError is only checked in the transformer runs but not checked when the SQL for the entity throws an exception. (Trunk version JdbcDataSource, lines 250pp.)

IMHO, (A) is the better fix, and if so, (B) is the right decision. (If (A) is not easily fixable, making (B) work would be helpful.)

Looking through the code, I've realized that the replacement of the variables is done in a very generic way. I've not yet seen an appropriate way to check on those variables in order to stop the processing of the entity if the variable is empty. Is there a way to do this? Or maybe there is a completely different way to get my use case working. Any help most appreciated!

Thanks, Chantal
Re: Problem with parsing date
On Mon, 2010-07-26 at 14:46 +0200, Rafal Bluszcz Zawadzki wrote:
> EEE, d MMM HH:mm:ss z

Not sure, but you might want to try an uppercase 'Z' for the timezone (surrounded by single quotes, alternatively). The rest of your pattern looks fine. But if you still run into problems, try different versions, like putting the comma in quotes etc.

Cheers, Chantal
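A quick way to test pattern variants offline (plain Java; the sample value is made up):

import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Locale;

public class DatePatternCheck {
    public static void main(String[] args) {
        String sample = "Mon, 26 Jul 14:46:00 +0200"; // replace with a real input value
        // 'z' = general time zone, 'Z' = RFC 822 offset
        String[] patterns = { "EEE, d MMM HH:mm:ss z", "EEE, d MMM HH:mm:ss Z" };
        for (String p : patterns) {
            try {
                System.out.println(p + " -> "
                        + new SimpleDateFormat(p, Locale.ENGLISH).parse(sample));
            } catch (ParseException e) {
                System.out.println(p + " failed: " + e.getMessage());
            }
        }
    }
}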
Re: Solr on iPad?
Hi,

unfortunately for iPad developers, it seems that it is not possible to use the Spotlight engine through the SDK: http://stackoverflow.com/questions/3133678/spotlight-search-in-the-application

Chantal

On Fri, 2010-07-23 at 10:16 +0200, Mark Allan wrote:
Hi Stephan, on the iPad, as with the iPhone, I'm afraid you're stuck with using SQLite if you want any form of database in your app. I suppose if you wanted to get really ambitious and had a lot of time on your hands, you could use Xcode to try and compile one of the open-source C-based DBs/indexers, but as with most things in OS X and iOS development, if you're bending over backwards trying to implement something, you're probably doing it wrong! Also, I wouldn't put it past the AppStore guardians to reject your app purely on the basis of having used something other than SQLite! Apple's cocoa-dev mailing list is very active if you have problems, but do your homework before asking questions or you'll get short shrift. http://lists.apple.com/cocoa-dev
Mark

On 22 Jul 2010, at 6:12 pm, Stephan Schwab wrote:
Dear Solr community, does anyone know whether it may be possible or has already been done to bring Solr to the Apple iPad so that applications may use a local search engine? Greetings, Stephan