Re: {!term f)xy OR device:0 in fq has strange results
Combining the syntax of two (or more) query parsers in a single query is called nested queries. This requires two elements: 1) the use of the magic field _query_ to embed or nest a query in a larger query, and 2) enclosing the nested query in quotes, since it is likely to have reserved characters that would otherwise interfere with the parsing of the enclosing query syntax.

So, your example of:

  fq=device:0 OR {!term f=model}Vivid(PH39100)

should be written as:

  fq=device:0 OR _query_:"{!term f=model}Vivid(PH39100)"

See: http://wiki.apache.org/solr/SolrQuerySyntax

-- Jack Krupansky

-----Original Message-----
From: abhayd
Sent: Friday, May 11, 2012 10:56 AM
To: solr-user@lucene.apache.org
Subject: Re: {!term f)xy OR device:0 in fq has strange results

reformatted the same

hi
I am having some issues in using {!term} in fq with OR.

The following query returns 6 results and is working as expected:

  q=navigation&fq={!term f=model}Vivid(PH39100)

And the debug output is also as expected:

  QParser: LuceneQParser
  filter_queries: [{!term f=model}Vivid(PH39100)]
  parsed_filter_queries: [model:Vivid(PH39100)]

Now I want to add OR to the fq and it is not working as expected at all:

  q=navigation&fq=device:0 OR {!term f=model}Vivid(PH39100)

This returns only 3 results. I don't understand the parsed_filter_queries output here - why is it doing +text:vivid?

  QParser: LuceneQParser
  filter_queries: [device:0 OR {!term f=model}Vivid(PH39100)]
  parsed_filter_queries: [device:0 text:{!term TO f=model} +text:vivid +MultiPhraseQuery(text:"ph (39100 ph39100)")]

How do I fix this issue?

thanks
abhay
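For anyone building this request from SolrJ rather than a raw URL, a minimal sketch of the corrected filter query might look like the following. The Solr URL (http://localhost:8983/solr) is an assumption; the field names come from the thread. The nested {!term} query is quoted so the enclosing lucene parser treats it as a single clause.

  import org.apache.solr.client.solrj.SolrQuery;
  import org.apache.solr.client.solrj.SolrServer;
  import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

  public class NestedFilterQueryExample {
      public static void main(String[] args) throws Exception {
          // assumed Solr URL; adjust for your installation
          SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
          SolrQuery q = new SolrQuery("navigation");
          // quote the nested {!term} query and hang it off the magic _query_ field
          q.addFilterQuery("device:0 OR _query_:\"{!term f=model}Vivid(PH39100)\"");
          System.out.println(server.query(q).getResults().getNumFound());
      }
  }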
Re: {!term f)xy OR device:0 in fq has strange results
hi jack,

Thanks, that worked. Also, another option that worked was nested queries.

It was nice to see you at the Lucene conference.

abhay
Re: Indexing to add to a field, not replace
No. Lucene and Solr commits replace the entire document.

--wunder

On May 12, 2012, at 10:00 AM, Mark Laurent wrote:

  Hello,

  Is it possible to perform an index commit where Solr would add the incoming value to an existing field's value? I have for example:

    <fields>
      <field name="book_title" type="string" indexed="true" stored="true" required="true"/>
      <field name="count" type="int" indexed="true" stored="true" required="true"/>
    </fields>

  Currently, I am doing a search to retrieve the current value of count, then doing the addition on my processing server, then committing the new count value. Is it possible to use a copyField or a custom field type that would perform this addition for me so I can skip the search query?

  Thanks in advance for your time!

  - Mark
Re: Indexing to add to a field, not replace
People are working on field update, but that feature is not currently available in a release of Solr. You can read about the current status of that work here:

  https://issues.apache.org/jira/browse/SOLR-139

It may or may not be usable for you today - in trunk.

But if field update is important to you and you don't want to use that patch, you could implement a custom update processor that performs the read/increment/write sequence as you need it. That would require all of your fields to be stored, which is probably what you have if you are doing the update remotely.

Take a look at SOLR-139 first.

-- Jack Krupansky

-----Original Message-----
From: Mark Laurent
Sent: Saturday, May 12, 2012 1:00 PM
To: solr-user@lucene.apache.org
Subject: Indexing to add to a field, not replace

Hello,

Is it possible to perform an index commit where Solr would add the incoming value to an existing field's value? I have for example:

  <fields>
    <field name="book_title" type="string" indexed="true" stored="true" required="true"/>
    <field name="count" type="int" indexed="true" stored="true" required="true"/>
  </fields>

Currently, I am doing a search to retrieve the current value of count, then doing the addition on my processing server, then committing the new count value. Is it possible to use a copyField or a custom field type that would perform this addition for me so I can skip the search query?

Thanks in advance for your time!

- Mark
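Pending field update, a minimal SolrJ sketch of the read/increment/write sequence described above might look like this. It assumes all fields are stored, a Solr instance at http://localhost:8983/solr, and the book_title/count fields from the quoted schema; "Some Book" is a hypothetical lookup value, and a real version would have to copy every stored field, not just the two shown, because re-adding replaces the whole document.

  import org.apache.solr.client.solrj.SolrQuery;
  import org.apache.solr.client.solrj.SolrServer;
  import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
  import org.apache.solr.common.SolrDocument;
  import org.apache.solr.common.SolrInputDocument;

  public class IncrementCountExample {
      public static void main(String[] args) throws Exception {
          SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");

          // 1. read the current document (hypothetical title used as the lookup)
          SolrDocument current =
              server.query(new SolrQuery("book_title:\"Some Book\"")).getResults().get(0);

          // 2. rebuild the whole document with the incremented count
          SolrInputDocument updated = new SolrInputDocument();
          updated.addField("book_title", current.getFieldValue("book_title"));
          updated.addField("count", ((Integer) current.getFieldValue("count")) + 1);

          // 3. re-adding replaces the entire document; then commit
          server.add(updated);
          server.commit();
      }
  }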
Re: Indexing to add to a field, not replace
Mark,

That sounds like a use case for ExternalFileField, doesn't it? You'll find more info about that here:

  http://lucene.apache.org/solr/api/org/apache/solr/schema/ExternalFileField.html

Stefan

On Saturday, May 12, 2012 at 7:00 PM, Mark Laurent wrote:

  Hello,

  Is it possible to perform an index commit where Solr would add the incoming value to an existing field's value? I have for example:

    <fields>
      <field name="book_title" type="string" indexed="true" stored="true" required="true"/>
      <field name="count" type="int" indexed="true" stored="true" required="true"/>
    </fields>

  Currently, I am doing a search to retrieve the current value of count, then doing the addition on my processing server, then committing the new count value. Is it possible to use a copyField or a custom field type that would perform this addition for me so I can skip the search query?

  Thanks in advance for your time!

  - Mark
Re: getTransformer error
Hi,

I am facing the same problem with my XSLT after upgrading to Solr 3.6 from Solr 1.4. I was wondering if you have found the solution? Can you please share your solution if you have found one? It would be helpful to others who are struggling with this as well.

thanks
--Pramila
Re: Solr Shards multi core slower than single big core
My query is:

  SolrQuery sQuery = new SolrQuery(query.getQueryStr());
  sQuery.setQueryType("dismax");
  sQuery.setRows(100);
  if (!query.isSearchOnDefaultField()) {
      sQuery.setParam("qf", queryFields.toArray(new String[queryFields.size()]));
  }
  sQuery.setFields(visibleFields.toArray(new String[visibleFields.size()]));
  if (query.isORQuery()) {
      sQuery.setParam("mm", "1");
  }

My search handler is:

  <requestHandler name="dismax" class="solr.SearchHandler">
    <lst name="defaults">
      <str name="defType">dismax</str>
      <str name="echoParams">explicit</str>
      <float name="tie">0.01</float>
      <str name="shards">localhost:9090/solr/book1,localhost:9090/solr/book2,localhost:9090/solr/book3,localhost:9090/solr/book4,localhost:9090/solr/book5,localhost:9090/solr/book6</str>
      <str name="qf">text^2.0</str>
      <str name="fl">title item_id author titleMinusAuthor</str>
      <int name="ps">4</int>
      <str name="q.alt">*:*</str>
      <str name="hl.fl">text features name</str>
      <str name="f.name.hl.fragsize">0</str>
      <str name="f.name.hl.alternateField">name</str>
      <str name="f.text.hl.fragmenter">regex</str>
    </lst>
  </requestHandler>
Re: question about updating SOLR fieldset
Did you shut down and restart or "reload" Solr? A core restart/reload is needed for Solr to "see" schema changes.

See: http://wiki.apache.org/solr/CoreAdmin#RELOAD

Or, if you did try a reload, maybe there were errors which prevented the new core initialization from completing, which leaves the old instance with the old schema still running.

-- Jack Krupansky

From: dominique schoonbrood
Sent: Friday, May 11, 2012 11:22 AM
To: solr-user@lucene.apache.org
Subject: question about updating SOLR fieldset

Hi there,

I have a question about updating the field set I can work with. The first version was made by an ex-employee of our company. These are the current fields in my PHP file:

  $fields = array('size', 'pubdate', 'type', 'section', 'translation', 'groups');

I've managed to put an extra field into the right schema.xml file. However, if I try to work with the new field 'taal' and then try to save/commit it while indexing, it says it cannot find this new field:

  $idoc->addField('type', 'pdf');
  $idoc->addField('section', 'files');
  $idoc->addField('taal', 'nl');

Also, my new field doesn't show up in this list:

What should be my next step after changing the schema.xml? The core method for doing it dynamically is not installed, and I have no idea how to do that, but based on the current field set, my predecessor did it without the core method. I guess I just don't know which commands I should use to re-index it.

Thanks in advance,

--
Kind regards,
Dominique Schoonbrood
developer - Sanmax
Weg naar Zwartberg 18 | 3660 Opglabbeek
T. 070 250 236 | F. 089 856 929 | www.sanmax.be
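For completeness, the reload can also be triggered programmatically. A minimal sketch with SolrJ's CoreAdminRequest follows, assuming the container runs at http://localhost:8983/solr and the core is named core0 (substitute your own core name); this should be the same operation as the CoreAdmin RELOAD URL from the wiki page above.

  import org.apache.solr.client.solrj.SolrServer;
  import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
  import org.apache.solr.client.solrj.request.CoreAdminRequest;

  public class ReloadCoreExample {
      public static void main(String[] args) throws Exception {
          // CoreAdmin requests go to the container root, not to an individual core
          SolrServer admin = new CommonsHttpSolrServer("http://localhost:8983/solr");
          // roughly equivalent to .../solr/admin/cores?action=RELOAD&core=core0
          CoreAdminRequest.reloadCore("core0", admin);
      }
  }

Note that a reload picks up the new schema, but documents indexed before the change still have to be re-indexed to get the new field populated.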
Re: Partition Question
No, this isn't what sharding is all about. Sharding is taking a single logical index and splitting it up amongst a number of physical units, often on individual machines. Loading and unloading partitions dynamically doesn't make any sense when talking about shards.

So let's back up. You could create your own _cores_ that you load/unload, and take over the distribution of the incoming queries manually. By that I mean, for your once-in-10,000-queries case, you go ahead and send your queries to the older cores and then unload them when you're done. You could even fire off a query to one core, unload it, fire off the query to the next core, unload it, etc. Of course your query would be very slow, but in such a rare case this may be acceptable (a sketch of this core-by-core querying follows after this message).

Or you could get some more memory/machines and just have some unused resources.

Best
Erick

On Wed, May 9, 2012 at 5:08 AM, Yuval Dotan yuvaldo...@gmail.com wrote:

  Thanks Lance

  There is already a clear partition - as you assumed, by date. My requirement is for the best setup for:
  1. A *single machine*
  2. A quickly changing index - so I need to have the option to load and unload partitions dynamically

  Do you think that the sharding model that Solr offers is the most suitable for this setup? What about the Solr multi-core model?

  On Wed, May 9, 2012 at 12:23 AM, Lance Norskog goks...@gmail.com wrote:

    Lucene does not support more than 2^32 unique documents, so you need to partition. In Solr this is done with Distributed Search:
    http://www.lucidimagination.com/search/link?url=http://wiki.apache.org/solr/DistributedSearch

    First, you have to decide a policy for which documents go to which 'shard'. It is common to make a hash code as the unique id, then distribute the documents modulo this value. This gives a roughly equal distribution of documents. If there is already a clear partition, like the date of the document (like newspaper articles), you could use that also.

    You have new documents and existing documents. For new documents you need code for this policy to get all new documents to the right index. This could be one master program that passes them out, or each indexer could know which documents it gets.

    If you want to split up your current index, that's different. I have done this: for each shard, make a copy of the full index, delete-by-query all of the documents that are NOT in that shard, and optimize. We had to do this in sequence so it took a few days :) You don't need a full optimize. Use 'maxSegments=50' or '100' to suppress that last final giant merge.

    On Tue, May 8, 2012 at 12:02 AM, Yuval Dotan yuvaldo...@gmail.com wrote:

      Hi
      Can someone please guide me to the right way to partition the Solr index?

      On Mon, May 7, 2012 at 11:41 AM, Yuval Dotan yuvaldo...@gmail.com wrote:

        Hi All
        Jan, thanks for the reply - answers to your questions are located below.
        Please update me if you have ideas that can solve my problems.

        First, some corrections to my previous mail:

        Hi All
        We have an index of ~2,000,000,000 documents and the query and facet times are too slow for us
        - our index in fact will be much larger

        Most of our queries will be limited by time, hence we want to partition the data by date/time
        - even when unlimited - which is mostly what will happen, we have results in the recent records and querying the whole dataset is redundant

        We want to partition the data because the index size is too big and doesn't fit into memory (80 GB)
        - our data actually grows continuously over time; it will never fit into memory, but has to be available for queries in case results are found in older records or a full facet is required

        1. Is multi-core the best way to implement my requirement?
        2. I noticed there are LOAD / UNLOAD actions on a core - should I use these actions when managing my cores? If so, how can I LOAD a core that I have unloaded?
           For example: I have 7 partitions / cores - one for each day of the week - we might have 2000 per day. In most cases I will search for documents only on the last day's core. Once every 10,000 queries I need documents from all cores.
           Question: Do I need to unload all of the old cores and then load them on demand (when I see I need data from these cores)?
        3. If the answer to the last question is no, how do I ensure that only the cores that are loaded into memory are the ones I want?

        Thanks
        Yuval

        *Answers to Jan:*

        Hi,
        First you need to investigate WHY faceting and querying is too slow. What exactly do you mean by slow? Can you please tell us more about your setup?

        * How large documents and how many fields?
          small records, ~200 bytes, 20 fields avg, most of them not stored - attached schema and config file
        * What kind of queries? How many hits? How many facets? Have you studied debugQuery=true output?
          problem is not with queries being slow per se, it is with getting 50
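To make the rare all-cores search Erick describes concrete, a minimal SolrJ sketch of querying one core after another might look like the following. The per-day core names (day1..day7) and the host URL are hypothetical; unloading each core after it has been searched would be an extra CoreAdmin call, not shown here.

  import org.apache.solr.client.solrj.SolrQuery;
  import org.apache.solr.client.solrj.SolrServer;
  import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

  public class QueryAllCoresExample {
      public static void main(String[] args) throws Exception {
          // hypothetical per-day cores; only the newest is queried in the common case
          String[] cores = {"day1", "day2", "day3", "day4", "day5", "day6", "day7"};
          SolrQuery q = new SolrQuery("the rare query that needs all partitions");
          long total = 0;
          for (String core : cores) {
              SolrServer server =
                  new CommonsHttpSolrServer("http://localhost:8983/solr/" + core);
              total += server.query(q).getResults().getNumFound();
          }
          System.out.println("hits across all cores: " + total);
      }
  }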
Re: Solr Filter for matching non-accented characters to their accented equivalents
Your field needs to use a field type which has a character folding/mapping filter, such as:

  <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>

Such as in:

  <fieldType name="text_char_norm" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    </analyzer>
  </fieldType>

See the example schema. In older releases of Solr there was an ISO Latin-1 filter and later the ASCII Folding Filter, but in 3.6 and trunk the MappingCharFilterFactory char filter above is used.

-- Jack Krupansky

-----Original Message-----
From: Chiniga
Sent: Friday, May 11, 2012 6:41 AM
To: solr-user@lucene.apache.org
Subject: Solr Filter for matching non-accented characters to their accented equivalents

Hello,

Our company maintains a Vietnamese website and here is the problem: our keyboards do not contain accented characters, and when we search for a product, our non-accented-character searches return nothing. We need Solr to match our non-accented characters to their accented equivalents.

For example: searching for "Tre Trung" should return results containing the words "Trẻ trung".

Really hope someone can help. Thanks!
Re: Solritas in production
Solritas was never intended to be used for production situations at all. Its reason for existing is to provide:

1) a way to show something prettier than the XML (or JSON or ...) responses to Solr queries for people just getting started.
2) a way to provide a very quick proof-of-concept/prototyping framework for _internal_ use. You can very rapidly produce reasonable-looking results pages for your business folks, product managers, etc.

It was never intended to be a customer-facing front end for Solr...

FWIW
Erick

On Wed, May 9, 2012 at 6:39 AM, Marcelo Carvalho Fernandes mcf2...@gmail.com wrote:

  Now that you gave us more info about your project's requirements, it's clear to me that Solritas should not be used in your project.

  - Marcelo Carvalho Fernandes

  On 9 May 2012 04:56, András Bártházi and...@barthazi.hu wrote:

    Hi,

    > Solritas (Velocity Response Writer) is NOT intended for production use. The simple reason, apart from that it is not production grade quality, is that it requires direct access to the Solr instance, as it is simply a response writer. You MUST use a separate front end layer above Solr and never expose Solr directly to the world. So you should feel totally comfortable continuing to use Solr over HTTP from PHP!

    Thanks for the response, we agree. Here is our situation, a bit more detailed.

    I see only one reason to use Solritas:
    - with a Solritas-only solution the architecture is simpler, and a bit faster, as Symfony has 50-100 ms of overhead

    But I see a lot of reasons for NOT using it:
    - very basic (and a bit outdated) documentation
    - it's 3-4 year old technology, but there is no mention of it on the net
    - I've found about 3 mentions of using it for prototyping; nobody seems to be using it, nobody has questions about it
    - I have found no public site using it
    - it's a template-engine-based solution with no framework around it (unlike Symfony - I miss a lot of features); it's a view, and implementing logic in a view layer is not a best practice (some logic can be implemented using Java code, but it will still belong to the view layer, practically)
    - it's a MODEL (a search query)-VIEW solution with almost no controller; Symfony is a full MVC framework
    - there can be only one search query on a page, and as we have pages with 3 different queries, and these pages are SEO related, it seems impossible to solve with Solritas (using AJAX might be a solution if SEO were not important)
    - we have a lot of PHP-based logic (some basic examples: processing URLs, generating titles), some of it needing database access as well; porting it to Velocity seems to be a huge task, if it's possible at all
    - some parts, like our autosuggest solution, may be ported to Solritas easily, but it may require changing our quite complex client-side JavaScript code, and seems to be a less maintainable situation for our programmers, who have direct access to the PHP code only
    - we have tasks that need the search engine but have no frontend at all, like sending emails with search results based on saved queries and previously sent results (do not send the same results again)

    I may not have listed all my points, but these should show my point of view.

    Bye,
      Andras
Re: facet range query question
There's nothing that I know of that does what you want. Your problem is that you want some intelligence built into the faceting. It'd be difficult, since Solr couldn't know what a reasonable number of buckets was until it had found the entire result set, so you'd have to do some kind of two-pass solution. So you really have to either create a custom plugin or some such.

I suppose you could specify a range with relatively small increments and then combine them in the application. So imagine you have a gap of 10. Look at the facet results and notice that there are 200 facets returned. Have your app re-apportion them more reasonably, creating the appropriate links. I suspect that your products will actually not return that many gaps, as prices tend to cluster, but you'd have to experiment.

Best
Erick

On Wed, May 9, 2012 at 11:58 PM, andy yhl...@sohu.com wrote:

  I am in an e-commerce project right now, and I have a requirement like this:

  I have a lot of commodities in my Solr indexes, and each commodity has a price field. Now I want to do a facet range query. I referred to the Solr wiki; a facet range query needs to specify *facet.range.gap* or *facet.range.spec=*,10,50,100,250,**.

  Because different commodities in different categories vary hugely in price - a laptop may be $500 or more but a pen maybe $10 - it's hard to specify the facet range gap.

  My preferred result is to just specify the *count* of buckets, so that the whole search result is divided into that many parts according to price.

  For instance: I search the keyword *phone* and specify the count 5, and the facet ranges are returned automatically like this:

    10 100 100 100 10

  And I search the keyword *laptop* and specify the count 5, and the facet ranges are returned automatically like this:

    10 200 200 100 10

  Does anyone know something like this, or other functions that can implement my requirement? Please give me a favor. Thank You

  Andy
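A rough SolrJ sketch of the approach Erick suggests - ask Solr for deliberately fine-grained range facets, then re-apportion the buckets in the application - might look like the following. The price field name, the 0-2000 range, the gap of 10, and the target of 5 buckets are all assumptions for illustration, as is the Solr URL.

  import java.util.List;

  import org.apache.solr.client.solrj.SolrQuery;
  import org.apache.solr.client.solrj.SolrServer;
  import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
  import org.apache.solr.client.solrj.response.QueryResponse;
  import org.apache.solr.client.solrj.response.RangeFacet;

  public class PriceBucketExample {
      public static void main(String[] args) throws Exception {
          SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");

          SolrQuery q = new SolrQuery("phone");
          q.setFacet(true);
          q.set("facet.range", "price");
          q.set("facet.range.start", "0");
          q.set("facet.range.end", "2000");
          q.set("facet.range.gap", "10");   // deliberately small gap
          QueryResponse rsp = server.query(q);

          // second pass happens in the application: merge the fine-grained
          // buckets into roughly 5 ranges of similar document counts
          List<RangeFacet.Count> counts = rsp.getFacetRanges().get(0).getCounts();
          long total = 0;
          for (RangeFacet.Count c : counts) {
              total += c.getCount();
          }
          long perBucket = Math.max(1, total / 5);
          long running = 0;
          for (RangeFacet.Count c : counts) {
              running += c.getCount();
              if (running >= perBucket) {
                  System.out.println("bucket boundary near price " + c.getValue());
                  running = 0;
              }
          }
      }
  }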
Re: How can I combine two columns of one table to use as a unique key?
The DIH TemplateTransformer can do this, such as in:

  <entity name="e" transformer="TemplateTransformer" ..>
    <field column="namedesc" template="hello${e.name},${eparent.surname}" />
    ...
  </entity>

You can combine input column values as well as literal strings.

See: http://wiki.apache.org/solr/DataImportHandler#TemplateTransformer

-- Jack Krupansky

-----Original Message-----
From: divya
Sent: Friday, May 11, 2012 2:55 PM
To: solr-user@lucene.apache.org
Subject: How can I combine two columns of one table to use as a unique key?

I want to combine two columns of a table to use as a unique key, say a customer table. I want to make (customerID, customerOrganizationID) the uniqueKey for my indexing document. How can I achieve this?
Re: Newbie Tries to make a Schema.xml
You're on the right track. In the default schemas it's kind of tricky. You see the bit of the location definition as:

  subFieldSuffix="_coordinate"

And later, you see:

  <dynamicField name="*_coordinate" type="tdouble" indexed="true" stored="false"/>

So the latlng_0/latlng_1 _coordinate fields are created by this dynamic field mapping. You can either leave the dynamic field stuff there, or I suppose create your own. I'd use tdouble though; presumably the folks who created the example did it for a reason.

Best
Erick

On Thu, May 10, 2012 at 5:01 AM, Spadez james_will...@hotmail.com wrote:

  Right, for Long/Lat I found this information:

  -Long / Lat Field Type-

    <fieldType name="location" class="solr.LatLonType" subFieldSuffix="_coordinate"/>

  -Fields-

    <field name="latlng" type="location" indexed="true" stored="true"/>
    <field name="latlng_0_coordinate" type="double" indexed="true" stored="true"/>
    <field name="latlng_1_coordinate" type="double" indexed="true" stored="true"/>

  Does this look more logical?
Re: Problems with field names in solr functions
I know there are edge cases where odd field naming causes problems; field names are not well-defined/enforced with Solr. Rather than banging my head against the wall and finding these cases at inopportune moments, I'd confine myself to lower-case letters and underscores. Other stuff _may_ work, like capital letters or '-'. But '-' is part of the Solr query syntax and has a chance of getting confused by the query parser.

Really, why add to your headaches by insisting on using some dangerous characters? Up to you, of course (a sketch using safer field names follows after this message).

Best
Erick

On Thu, May 10, 2012 at 11:28 AM, Iker Huerga iker.hue...@gmail.com wrote:

  Hi all,

  I am having problems when sorting Solr documents using Solr functions, due to the field names. Imagine we want to sort the Solr documents based on the sum of the scores of the matching fields. These fields are created as follows:

    <dynamicField name="foo/bar-*" type="float" indexed="true" stored="true"/>

  The idea is that these fields store float values, as in this example:

    <field name="foo/bar-1234">50.45</field>

  The examples below illustrate the issue. This query

    http://URL/solr/select/?q=(foo/bar-1234:*)+AND+(foo/bar-2345:*)&version=2.2&start=0&rows=10&indent=on&sort=sum(foo/bar-1234,foo/bar-2345)+desc&wt=json

  gives me the following exception:

    The request sent by the client was syntactically incorrect (sort param could not be parsed as a query, and is not a field that exists in the index: sum(foo/bar-1234,foo/bar-2345)).

  Whereas if I rename the fields, removing the "/" and "-", the following query will work:

    http://URL/solr/select/?q=(bar1234:*)+AND+(bar2345:*)&version=2.2&start=0&rows=10&indent=on&sort=sum(bar1234,bar2345)+desc&wt=json

    response:{numFound:2,start:0,docs:[
      { primaryDescRes:DescRes2, bar1234:45.54, bar2345:100.0},
      { primaryDescRes:DescRes1, bar1234:100.5, bar2345:25.22}]
    }}

  I tried escaping the character as indicated in the Solr documentation [1], i.e. foo%2Fbar-12345 instead of foo/bar-12345, without success.

  Could this be caused by the query parser? I would be extremely grateful if you could let me know any workaround for this.

  Best
  Iker

  [1] http://wiki.apache.org/solr/SolrQuerySyntax#NOTE:_URL_Escaping_Special_Characters

  --
  Iker Huerga
  http://www.ikerhuerga.com/
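A small SolrJ sketch of the same sort once the fields follow the lower-case-and-underscores convention Erick suggests. The names bar_1234 and bar_2345 are hypothetical renamings of foo/bar-1234 and foo/bar-2345, and the Solr URL is an assumption.

  import org.apache.solr.client.solrj.SolrQuery;
  import org.apache.solr.client.solrj.SolrServer;
  import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

  public class SortBySumExample {
      public static void main(String[] args) throws Exception {
          SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
          // field names without '/' or '-' parse cleanly inside function queries
          SolrQuery q = new SolrQuery("bar_1234:[* TO *] AND bar_2345:[* TO *]");
          q.set("sort", "sum(bar_1234,bar_2345) desc");
          System.out.println(server.query(q).getResults());
      }
  }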
Re: fq syntax question
No. fq queries are standard-syntax queries. But they can be arbitrarily complex, e.g.:

  fq=model:(member OR new_member)

Best
Erick

On Thu, May 10, 2012 at 2:38 PM, anarchos78 rigasathanasio...@hotmail.com wrote:

  Hello,

  Solr accepts the fq parameter like:

    localhost:8080/solr/select/?q=blah+blah&fq=model:member+model:new_member

  Is it possible to pass the fq parameter with an alternative syntax, like:

    fq=model=member&model=new_member

  or in some other way?

  Thank you,
  Tom
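For anyone building the same filter in code, a minimal SolrJ sketch (only the query object is constructed here; the field and values come from the thread):

  import org.apache.solr.client.solrj.SolrQuery;

  public class ComplexFqExample {
      public static void main(String[] args) {
          SolrQuery q = new SolrQuery("blah blah");
          // one fq parameter can hold an arbitrarily complex standard-syntax query
          q.addFilterQuery("model:(member OR new_member)");
          System.out.println(q);   // prints the encoded request parameters
      }
  }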
Re: Solr On Fly Field creation from full text for N-Gram Indexing
A faster way to do a regex transform is to use the 'PatternReplace' tokenizer or filter. These are inside the schema processing tree, not in the DIH tree. You would use copyField to get the data from your input field to a copy whose field type uses the pattern-replace analyzer. Look in schema.xml for an example of using the Pattern tools.

On Thu, May 10, 2012 at 4:54 AM, Husain, Yavar yhus...@firstam.com wrote:

  Thanks Jack. I tried it (Regex Transformer) out and the indexing has gone really slow. Is it (RegexTransformer) slower than N-Gram indexing? They may be apples and oranges, but what I mean is: after extracting the field I want to N-Gram index it. So it seems going for N-Gram indexing of the full text (i.e. without extracting what I need using RegexTransformer) is a better solution, ignoring space complexity?? Any views? THANKS!!

  -----Original Message-----
  From: Jack Krupansky [mailto:j...@basetechnology.com]
  Sent: Thursday, May 10, 2012 4:09 PM
  To: solr-user@lucene.apache.org
  Subject: Re: Solr On Fly Field creation from full text for N-Gram Indexing

  You can use the Regex Transformer to extract from a source field.

  See: http://wiki.apache.org/solr/DataImportHandler#RegexTransformer

  -- Jack Krupansky

  -----Original Message-----
  From: Husain, Yavar
  Sent: Thursday, May 10, 2012 6:04 AM
  To: solr-user@lucene.apache.org
  Subject: Solr On Fly Field creation from full text for N-Gram Indexing

  I have full text in my database and I am indexing it using Solr. Now at runtime, i.e. while the indexing is going on, can I extract certain parameters based on a regex and create another field/column on the fly using Solr for that extracted text?

  For example, my DB has just 2 columns (DocId, FullText):

    DocId  FullText
    1      My name is Avi. RoleId: GYUIOP-MN-1087456. ...

  Now, while indexing, I want to extract the RoleId, place it in another column created on the fly, and index that column using N-Gram indexing. I don't want to N-Gram the full text as that would be too time-expensive.

  Thanks!! Any clues would be appreciated.

-- Lance Norskog goks...@gmail.com
Re: Replication issues after machine failure
I have not tried to reproduce it as of yet but hope to do so Monday. The machine that had the issue was a VM out of my control, so I'm not certain how it was restored. I am using a fairly recent nightly build, from within the last few weeks.

On Friday, May 11, 2012, Mark Miller markrmil...@gmail.com wrote:

  So it's easy to reproduce? What do you mean "restored from a prior state"?

  What snapshot are you on these days, for future ref?

  You have double checked to make sure that shard is listed as ACTIVE, right?

  On May 11, 2012, at 4:55 PM, Jamie Johnson wrote:

    I've had a few instances where a machine has needed to be restored from a prior state. After doing so and firing up Solr again, I've had instances where replication doesn't seem to be working properly. I have not seen any failures in logs (will have to keep a closer eye on this), but when this happens and I execute a query against each with distrib=false I am seeing the following counts:

      Shard @ host1(shard1) returned 95150
      Shard @ host2(shard1) returned 95150
      Shard @ host2(shard4) returned 94311
      Shard @ host3(shard4) returned 8468
      Shard @ host3(shard5) returned 8303
      Shard @ host1(shard5) returned 96054
      Shard @ host1(shard2) returned 95620
      Shard @ host2(shard2) returned 95620
      Shard @ host2(shard3) returned 93195
      Shard @ host3(shard3) returned 8336
      Shard @ host3(shard6) returned 8309
      Shard @ host1(shard6) returned 96036

    In this case host3 is what failed, and as you can see everything on host3 is significantly less than what the leader has. Has anyone else experienced this?

  - Mark Miller
  lucidimagination.com
Re: Replication issues after machine failure
Sorry, hit send too fast. The shards were listed as active. Also, the Solr instances were still running, but the file system they wrote to had become read-only. I thought that would make replication fail, and that when the issue was fixed and Solr restarted, replication would then succeed. Am I hitting some fringe case?

On Saturday, May 12, 2012, Jamie Johnson jej2...@gmail.com wrote:

  I have not tried to reproduce it as of yet but hope to do so Monday. The machine that had the issue was a VM out of my control, so I'm not certain how it was restored. I am using a fairly recent nightly build, from within the last few weeks.

  On Friday, May 11, 2012, Mark Miller markrmil...@gmail.com wrote:

    So it's easy to reproduce? What do you mean "restored from a prior state"?

    What snapshot are you on these days, for future ref?

    You have double checked to make sure that shard is listed as ACTIVE, right?

    On May 11, 2012, at 4:55 PM, Jamie Johnson wrote:

      I've had a few instances where a machine has needed to be restored from a prior state. After doing so and firing up Solr again, I've had instances where replication doesn't seem to be working properly. I have not seen any failures in logs (will have to keep a closer eye on this), but when this happens and I execute a query against each with distrib=false I am seeing the following counts:

        Shard @ host1(shard1) returned 95150
        Shard @ host2(shard1) returned 95150
        Shard @ host2(shard4) returned 94311
        Shard @ host3(shard4) returned 8468
        Shard @ host3(shard5) returned 8303
        Shard @ host1(shard5) returned 96054
        Shard @ host1(shard2) returned 95620
        Shard @ host2(shard2) returned 95620
        Shard @ host2(shard3) returned 93195
        Shard @ host3(shard3) returned 8336
        Shard @ host3(shard6) returned 8309
        Shard @ host1(shard6) returned 96036

      In this case host3 is what failed, and as you can see everything on host3 is significantly less than what the leader has. Has anyone else experienced this?

    - Mark Miller
    lucidimagination.com