Solr training course in Norway June 1-3

2010-05-11 Thread Jan Høydahl / Cominvent
Hi all, Announcing another Solr training course in Oslo, Norway, June 1st-3rd. This is the 3-day Lucid Imagination course "Developing Search Applications with Solr". The training will be conducted in Norwegian. For more information and sign-up, see www.solrtraining.com -- Jan Høydahl, search

Re: Dynamic analyzers

2010-05-26 Thread Jan Høydahl / Cominvent
You'll have a hard time supporting stemming etc. with this approach. Perhaps a hybrid solution, querying across the all-languages field and a few selected language-specific fields which receive proper linguistic treatment? qf=text_all text_en^2.0 text_de^1.5
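The following is a minimal sketch of the hybrid request in Python. It assumes a dismax-enabled handler and the field names text_all, text_en and text_de from the example above. The helper name multilingual_params is hypothetical, and the code only builds the query string; it does not contact Solr.

```python
from urllib.parse import urlencode


def multilingual_params(user_query):
    """Hypothetical helper: dismax parameters searching the shared field
    plus boosted language-specific fields (names from the example above)."""
    return {
        "q": user_query,
        "defType": "dismax",
        # text_all: shared field without language-specific stemming;
        # text_en / text_de: fields with proper linguistic treatment
        "qf": "text_all text_en^2.0 text_de^1.5",
    }


query_string = urlencode(multilingual_params("red foxes"))
print(query_string)
```

urlencode() takes care of escaping the spaces and ^ characters in qf, so the string can be appended to a select URL as-is.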

MappingCharFilterFactory equivalent for use after tokenizer?

2010-06-18 Thread Jan Høydahl / Cominvent
Hi, Is there a token filter which does the same job as MappingCharFilterFactory, but after the tokenizer, reading the same config file? -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com Training in Europe - www.solrtraining.com

Re: dismax and AND as the default operator

2010-06-18 Thread Jan Høydahl / Cominvent
Standard DisMax does not fully support explicit AND/OR. You can prove that by trying q=fuel+OR+cell and seeing that the score stays the same (given mm=100%). It appears that DisMax does SOME intelligent handling of AND/OR/NOT, because it adds a + on the AND and a - on the NOT. But adding a

Re: ranking question

2010-06-18 Thread Jan Høydahl / Cominvent
Consider upgrading to the 3.1 branch, which gives you true sort by function: http://wiki.apache.org/solr/FunctionQuery#Sort_By_Function

Re: finding out why a document is in the result

2010-06-18 Thread Jan Høydahl / Cominvent
Do you want to do this on every single user query, and present to the end user which words matched where? In that case debugQuery may be too much, and I would look into creating a custom debugComponent optimized to output only the core parts of the explain section that you need. If

Re: MappingCharFilterFactory equivalent for use after tokenizer?

2010-06-18 Thread Jan Høydahl / Cominvent
It would be nice to have, because sometimes you want to normalize accents and other characters but want to wait until other filters have run, especially if those filters are dictionary-based and therefore need the original word form. Do you have any idea how different a CharFilter is from a

Re: SolrJ: Setting multiple parameters

2010-06-20 Thread Jan Høydahl / Cominvent
Or simply use add(), because setParam() overrides the existing HashMap key: solrQuery.setParam("stats.facet", "fieldA"); solrQuery.add("stats.facet", "fieldB"); solrQuery.add("stats.facet", "fieldC");
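The difference between setParam() and add() can be illustrated without SolrJ. The Params class below is a hypothetical Python stand-in, not the SolrJ API. It shows why a repeated parameter such as stats.facet needs add(): set() replaces the stored values, while add() appends another value.

```python
from urllib.parse import urlencode


class Params:
    """Hypothetical multi-valued parameter map (not the SolrJ API)."""

    def __init__(self):
        self._values = {}

    def set(self, name, value):
        # Replaces any values already stored under this name
        self._values[name] = [value]

    def add(self, name, value):
        # Appends, so the parameter is sent once per value
        self._values.setdefault(name, []).append(value)

    def to_query_string(self):
        pairs = [(k, v) for k, vs in self._values.items() for v in vs]
        return urlencode(pairs)


p = Params()
p.set("stats.facet", "fieldA")
p.add("stats.facet", "fieldB")
p.add("stats.facet", "fieldC")
print(p.to_query_string())
```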

Re: Configuring RequestHandler in solrconfig.xml OR in the Servlet code using SolrJ

2010-06-22 Thread Jan Høydahl / Cominvent
Hi, Sometimes I do both. I put the defaults in solrconfig.xml and thus have one place to define all kinds of low-level default settings. But I also make it possible, in the application layer, to add or override any parameter. This gives you great flexibility to let server administrators

Re: preside != president

2010-06-28 Thread Jan Høydahl / Cominvent
Hi, You might also want to check out the new Lucene-Hunspell stemmer at http://code.google.com/p/lucene-hunspell/. It uses OpenOffice dictionaries with known stems in combination with a large set of language-specific rules. It handles your example, but it is an early release, so test it

Re: optional vs. prohibited aka standard vs. dismax handler

2010-06-29 Thread Jan Høydahl / Cominvent
Hi, In DisMax the mm parameter controls whether terms are required or optional. The default is 100%, which means all terms are required, i.e. you do not need to add +. You can change to mm=0 and you will get the same behaviour as the standard parser, i.e. an OR behaviour, where the + would say that a

Re: use copyField to gather and then split

2010-06-29 Thread Jan Høydahl / Cominvent
Hi pal :) Unfortunately copyField works only BEFORE analysis, and you cannot chain them... The simplest solution would be to duplicate your copyFields: <copyField source="title" dest="textanayzemethod2" /> <copyField source="body" dest="textanayzemethod2" /> <copyField source="title" dest="textanayzemethod1"

Re: optional vs. prohibited aka standard vs. dismax handler

2010-06-29 Thread Jan Høydahl / Cominvent
On 29 June 2010, at 14:02, Lukas Kahwe Smith wrote: On 29.06.2010, at 13:38, Lukas Kahwe Smith wrote: On 29.06.2010, at 13:24, Jan Høydahl / Cominvent wrote: Hi, In DisMax the mm parameter controls whether terms are required

Re: Is there a way to delete multiple documents using wildcard?

2010-06-30 Thread Jan Høydahl / Cominvent
Hi, I believe you need to use HTTP POST in order to send those parameters. Try with curl: curl "http://localhost:8983/solr/update?commit=true" -H "Content-Type: text/xml" --data-binary "<delete><query>uid:6-HOST*</query></delete>"
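The same POST can be prepared with Python's standard library. This sketch assumes the default example URL from the curl command and uses a hypothetical helper, delete_by_query_request. It builds the request without sending it and XML-escapes the query before placing it in the body.

```python
import urllib.request
from xml.sax.saxutils import escape


def delete_by_query_request(update_url, query):
    """Prepare (but do not send) an HTTP POST carrying an XML delete-by-query."""
    body = "<delete><query>{}</query></delete>".format(escape(query)).encode("utf-8")
    return urllib.request.Request(
        update_url,
        data=body,
        headers={"Content-Type": "text/xml"},
        method="POST",
    )


req = delete_by_query_request("http://localhost:8983/solr/update?commit=true", "uid:6-HOST*")
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
# urllib.request.urlopen(req) would actually send it to a running Solr
```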

Re: Is there a way to delete multiple documents using wildcard?

2010-06-30 Thread Jan Høydahl / Cominvent
Hmm, nice one - I was not aware of that trick. On 30 June 2010, at 18:41, bbarani wrote: Hi, I was able to successfully delete multiple documents using the below URL

Re: Dilemma - Very Frequent Synonym updates for Huge Index

2010-07-01 Thread Jan Høydahl / Cominvent
Hi, I think I would look at a hybrid approach, where you keep adding new synonyms to a query-side synonym dictionary for immediate effect. Then, every now and then (or every Nth night), you move those synonyms over to the index-side dictionary and trigger a full reindex. A nice side effect of

Re: Very basic questions: Faceted front-end?

2010-07-01 Thread Jan Høydahl / Cominvent
Have you had a look at www.twigkit.com? Could be worth the bucks... On 1 July 2010, at 00:59, Peter Spam wrote: Wow, thanks Lance - it's really fast now! The last piece of

Re: Multilingual - Search against the appropriate field

2010-07-01 Thread Jan Høydahl / Cominvent
Hi, I have chosen the same approach as you, indexing content into text_<language> fields with custom analysis, and it works great. Solr has no overhead from this, even if there are hundreds of languages, thanks to the schema-less nature of Lucene. And if you know which language is being

SolrJ: BinaryRequestWriter with StreamingUpdateSolrServer

2010-07-01 Thread Jan Høydahl / Cominvent
Hi, I had the impression that the StreamingUpdateSolrServer in SolrJ would automatically use the /update/javabin UpdateRequestHandler. Is this not true? Do we need to call server.setRequestWriter(new BinaryRequestWriter()) for it to transmit content with the binary protocol?

Re: DisMax, multi fields, and phrase fields

2010-07-01 Thread Jan Høydahl / Cominvent
Hi, Check out the new eDisMax handler (src) and the new pf2 parameter, also available as patch SOLR-1553. Another option to avoid a match for doc2 is to add application-specific logic in your frontend which detects car brands and years and rewrites the query into a phrase or a filter.

Re: Use free text to search against boolean fields?

2010-07-02 Thread Jan Høydahl / Cominvent
Hi, I would rather go for the boolean variant and spend some time writing a query parser which tries to understand all kinds of input people may enter, mapping it into boolean filters. This way you can support both navigation and search, and keep both in sync whatever people prefer to start

Re: SolrJ-1.4.0 client needs slf4j-jdk14-1.5.5 library on J2SE 1.5 Update 21

2010-07-03 Thread Jan Høydahl / Cominvent
Hi, SolrJ uses slf4j logging. As you can read on the wiki (http://wiki.apache.org/solr/Solrj#Solr_1.4), you need to provide the slf4j-jdk14 binding (or a binding for any other log framework you wish to use) yourself and add the jar to your classpath.

Re: Unicode processing - Issue with CharStreamAwareWhitespaceTokenizerFactory

2010-07-06 Thread Jan Høydahl / Cominvent
CharFilters MUST come before the Tokenizer, because they process the character stream, not the tokens. If you need to apply the accent normalization later in the analysis chain, either use ISOLatin1AccentFilterFactory or help with the implementation of SOLR-1978.

Re: stemming the index

2010-08-06 Thread Jan Høydahl / Cominvent
Check out slides 36-38 in this presentation for some hints on a possible solution: http://www.slideshare.net/janhoy/migrating-fast-to-solr-jan-hydahl-cominvent-as-euro-con

Re: Deleting old index data from solr. But HDD spaces doesn`t free.

2010-08-06 Thread Jan Høydahl / Cominvent
What you are missing is a final server.optimize(). Deleting a document only marks it as deleted in the index until an optimize (or until its segment is merged). If disk space is a real problem in your case because you e.g. update all docs in the index frequently, you can trigger an optimize(), say, nightly.

Re: how to create a custom type in Solr

2010-08-06 Thread Jan Høydahl / Cominvent
Your use case can be solved by splitting the range into two ints: Document: {title: "My document", from: 8000, to: 9000} Query: q=title:My AND (from:[* TO 8500] AND to:[8500 TO *])
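Here is a small Python check of the same logic, assuming integer from/to fields as in the example document. covers() mirrors the two range clauses: a document matches when from <= value <= to. range_query() is a hypothetical helper that produces the query string.

```python
def covers(doc, value):
    """True if value lies inside [from, to]; same condition as
    from:[* TO value] AND to:[value TO *]."""
    return doc["from"] <= value <= doc["to"]


def range_query(title_term, value):
    # Hypothetical helper producing the query string from the example
    return f"title:{title_term} AND (from:[* TO {value}] AND to:[{value} TO *])"


doc = {"title": "My document", "from": 8000, "to": 9000}
print(covers(doc, 8500), covers(doc, 9500))
print(range_query("My", 8500))
```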

Re: SOLR QUERY

2010-08-06 Thread Jan Høydahl / Cominvent
Another way is to use the DisMax parser and give it a qf=field1 field2 field3... parameter, and it will automatically search in all the fields specified. It is more powerful than having one default field, and saves that disk space. But you sacrifice some extra resources during querying.

Re: Indexing fieldvalues with dashes and spaces

2010-08-10 Thread Jan Høydahl / Cominvent
Hi, Try solr.KeywordTokenizerFactory. However, in your case it looks as if you have certain requirements for searching that require tokenization. So you should leave the WhitespaceTokenizer as is and create a separate field specifically for faceting, with indexed=true, stored=false and

Re: how to support implicit trailing wildcards

2010-08-10 Thread Jan Høydahl / Cominvent
Hi, You don't need to duplicate the content into two fields to achieve this. Try this: q=mount OR mount* The exact match will always get a higher score than the wildcard match, because wildcard matches use a constant score. Making this work for multi-term queries is a bit trickier, but something

Re: solr query result not read the latest xml file

2010-08-10 Thread Jan Høydahl / Cominvent
Hi, Beware that post.jar is just an example tool for playing with the default example index located at the /solr/ namespace. It is very limited, and you should look elsewhere for a more production-ready and robust tool. However, it does let you specify a custom URL. Please try: java -jar post.jar

Re: delete Problem..

2010-08-10 Thread Jan Høydahl / Cominvent
Hi, Since EMAIL_HEADER_FROM is a String type, you need to specify the whole field value every time. Wildcards could also work, but you'll run into problems with leading wildcards. The solution would be to change the fieldType into a text type using e.g. StandardTokenizerFactory - if this does not break

Re: solr query result not read the latest xml file

2010-08-11 Thread Jan Høydahl / Cominvent
Hi, Yes, this is normal behavior. This is because Solr is *document* based; it does not know about *files*. What happens here is that your source database (or whatever) has had deletions within this category in addition to updates, and you need to relay those to Solr. The best way to

Re: timestamp field

2010-08-11 Thread Jan Høydahl / Cominvent
Hi, Which time zone are you located in? Do you have DST? Solr uses UTC internally for dates, which means that NOW will be the current UTC time (London time outside British Summer Time) :) Does that appear to be right for you? Also see this thread: http://search-lucene.com/m/hqBed2jhu2e2/
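When you feed dates, converting them to UTC on the client side avoids surprises. This sketch assumes the client has timezone-aware datetimes. It formats them as ISO 8601 with a trailing Z, which is the form Solr date fields accept. to_solr_date is a hypothetical helper name.

```python
from datetime import datetime, timedelta, timezone


def to_solr_date(dt):
    """Format a timezone-aware datetime as UTC with a trailing Z."""
    if dt.tzinfo is None:
        raise ValueError("use a timezone-aware datetime to avoid silent offsets")
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


cest = timezone(timedelta(hours=2))  # e.g. Oslo in August (UTC+2 with DST)
print(to_solr_date(datetime(2010, 8, 11, 14, 0, tzinfo=cest)))
```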

Re: Delta-import with solrj client

2010-08-11 Thread Jan Høydahl / Cominvent
Hi, Make sure you use a proper ID field, which does *not* change even if the content in the database changes. That way, when your delta-import fetches changed rows to index, they will overwrite the existing documents in your index.

Re: how to support implicit trailing wildcards

2010-08-11 Thread Jan Høydahl / Cominvent
q=mount OR mount* have different sorting order from q=mount for those documents including mount. Change to q=mount^100 OR (mount?* -mount)^1.0, and test well. Thanks very much! 2010/8/10 Jan Høydahl / Cominvent jan@cominvent.com Hi, You don't need to duplicate the content into two
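For a single bare term, the suggested query can be generated as shown below. implicit_prefix_query is a hypothetical helper. It does no escaping of query syntax and does not handle multi-term input, which the thread notes is trickier.

```python
def implicit_prefix_query(term, exact_boost=100):
    """Single-term version of the suggested query: boost the exact term and
    give prefix-only matches (term?* minus the exact term) a boost of 1.0."""
    return f"{term}^{exact_boost} OR ({term}?* -{term})^1.0"


print(implicit_prefix_query("mount"))
```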

Re: Analysing SOLR logfiles

2010-08-11 Thread Jan Høydahl / Cominvent
Have a look at www.splunk.com. On 11 Aug 2010, at 19:34, Jay Flattery wrote: Hi there, Just wondering what tools people use to analyse SOLR log files. We're looking to

Re: bug or feature???

2010-08-11 Thread Jan Høydahl / Cominvent
Your syntax looks a bit funny. Which version of Solr are you using? Pure negative queries are not supported; try q=(*:* -title:janitor) instead. Also, for debugging what's going on, please add debugQuery=true and share the parsed query for both cases with us.
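The following is a naive client-side rewrite along those lines. fix_pure_negative is a hypothetical helper. It assumes clauses are whitespace-separated and that no phrases, parentheses or NOT keywords are involved. A real query parser would be needed beyond that.

```python
def fix_pure_negative(query):
    """Wrap a purely negative query as (*:* ...) so it has something to subtract from.
    Naive: splits on whitespace and ignores phrases, parentheses and NOT."""
    clauses = query.split()
    if clauses and all(c.startswith("-") for c in clauses):
        return f"(*:* {query})"
    return query


print(fix_pure_negative("-title:janitor"))
print(fix_pure_negative("title:janitor"))
```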

Re: Indexing and ExtractingRequestHandler

2010-08-11 Thread Jan Høydahl / Cominvent
Hi, You can try the Tika command line to parse your Excel file; then you will see the exact textual output from it, which is what gets indexed into Solr, and can inspect whether something is missing. Are you sure you are using a version of Luke which supports your version of Lucene?

Re: Wiki documentation Packaged as single HTML or PDF

2010-08-16 Thread Jan Høydahl / Cominvent
Use a tool to download a site to local disk, and ship the resulting HTML as a folder or ZIP. If that is not good enough, consider shipping the Reference Guide by Lucid Imagination. It is one PDF and contains most of what you need. The customer may be confused by LucidWorks-specific chapters but

Re: Function query to boost scores by a constant if all terms are present

2010-08-18 Thread Jan Høydahl / Cominvent
You can use the map() function for this, see http://wiki.apache.org/solr/FunctionQuery#map q=a fox&defType=dismax&qf=allfields&bf=map(query($qq),0,0,0,100.0)&qq=allfields:(quick AND brown AND fence) This adds a constant boost of 100.0 if the $qq query returns a non-zero score, which it does
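To make the map() arithmetic explicit, here is a Python mirror of its semantics alongside the request parameters from the example. solr_map is a hypothetical name, not a Solr API. The sketch assumes that query($qq) yields 0 for documents that do not match $qq.

```python
from urllib.parse import urlencode


def solr_map(x, lo, hi, target, default=None):
    """Python mirror of map(x,min,max,target[,default]): values in [lo, hi]
    become target; others become default if given, else stay unchanged."""
    if lo <= x <= hi:
        return target
    return x if default is None else default


params = {
    "q": "a fox",
    "defType": "dismax",
    "qf": "allfields",
    "bf": "map(query($qq),0,0,0,100.0)",
    "qq": "allfields:(quick AND brown AND fence)",
}

print(solr_map(0.0, 0, 0, 0, 100.0))   # $qq did not match: no boost
print(solr_map(0.37, 0, 0, 0, 100.0))  # $qq matched with any positive score: 100.0
print(urlencode(params))
```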

Re: Solr data type for date faceting

2010-08-18 Thread Jan Høydahl / Cominvent
If you want to change the schema on the live index, make sure the change is compatible, as Solr does not do any type checking or schema change validation. I would ADD a field with another name for the tint field. Unfortunately you have to re-index to have an index built on this field. May I

Re: Missing tokens

2010-08-18 Thread Jan Høydahl / Cominvent
Hi, Can you share with us what your schema looks like for this field? What FieldType? What tokenizer and analyser? How do you parse the PDF document? Before submitting to Solr? With what tool? How do you do the query? Do you get the same results when doing the query from a browser instead of SolrJ?

Re: improving search response time

2010-08-18 Thread Jan Høydahl / Cominvent
Some questions: a) What operating system? b) What Java container (Tomcat/Jetty)? c) What JAVA_OPTIONS, i.e. memory, garbage collection etc.? d) Example queries, i.e. what features, how many facets, sort fields etc.? e) How do you load balance queries between the slaves? f) What is your search latency

Re: Solr's Index Live Updates

2010-08-18 Thread Jan Høydahl / Cominvent
Hi, I'm afraid you'll have to post the full document again, then do a commit. But it WILL be lightning fast, as only the updated document is indexed; all the other existing documents will not be re-indexed.

Re: Solr data type for date faceting

2010-08-19 Thread Jan Høydahl / Cominvent
Yes, I forgot that strings support alphanumeric ranges. However, they will potentially be very memory-intensive, since you don't get the trie optimization and since strings take up more space than ints. The only way to know is to try it out.

Re: Missing tokens

2010-08-19 Thread Jan Høydahl / Cominvent
Høydahl / Cominvent jan@cominvent.com To: solr-user@lucene.apache.org Date: 18/08/2010 23:16 Subject: Re: Missing tokens Cannot see anything obvious... Try http://localhost/solr/select?q=contents:OB10* http://localhost/solr/select?q=contents:OB 10 http://localhost/solr

Re: improving search response time

2010-08-19 Thread Jan Høydahl / Cominvent
It is crucial to MEASURE your system to confirm your bottleneck. I agree that you are very likely to be disk I/O bound with so little memory left for the OS, a large index and many terms in each query. Have your IT guys do some monitoring on your disks and log this while under load. Then you

Re: Basic conceptual questions about solr

2010-08-19 Thread Jan Høydahl / Cominvent
Hi, You can place Solr wherever you want, but if your data is very large, you'd want a dedicated box. Have a look at DIH (http://wiki.apache.org/solr/DataImportHandler). It can both crawl a file share periodically, indexing only files changed since a timestamp (can be e.g. NOW-1HOUR) and

Re: How to get most indexed keyword from SOLR

2010-08-20 Thread Jan Høydahl / Cominvent
Check out the Luke request handler: http://localhost:8983/solr/admin/luke?fl=my_ad_field&numTerms=100 - you'll find topTerms for the specified fields.

Re: Scoring of documents, boost partial and exact hits in one field

2010-08-22 Thread Jan Høydahl / Cominvent
Hi, Try adding a wildcard term with a lower score: q=title:work AND title:work*&debugQuery=true You will now see from the debug printout that you get an extra boost for workload.

Re: how to deal with virtual collection in solr?

2010-08-25 Thread Jan Høydahl / Cominvent
1. Currently we use Verity and have more than 20 collections; each collection has an index for public items and an index for private items. So there are virtual collections which point to each collection and a virtual collection which points to all. For example, we have AA and BB collections.

Re: how to deal with virtual collection in solr?

2010-08-27 Thread Jan Høydahl / Cominvent
(CompositeParser.java:119) ... 24 more RequestURI=/solr/lhcpdf/update/extract Powered by Jetty:// *** -Original Message- From: Jan Høydahl / Cominvent [mailto:jan@cominvent.com

Re: Creating new Solr cores using relative paths

2010-08-27 Thread Jan Høydahl / Cominvent
Yes, this is really a pain sometimes. I'd prefer a well-defined base path, which could be assumed everywhere unless otherwise documented. SolrHome is one natural choice. For backward compatibility we could add a config option in solr(config).xml to easily switch to the old behaviour. Also, it makes sense to

Re: how to deal with virtual collection in solr?

2010-08-31 Thread Jan Høydahl / Cominvent
?literal.collection=aaprivate&literal.id=doc1&commit=true; -F fi...@myfile.xml Thanks so much as always! Xiaohui -Original Message- From: Jan Høydahl / Cominvent [mailto:jan@cominvent.com] Sent: Friday, August 27, 2010 7:42 AM To: solr-user@lucene.apache.org Subject: Re: how to deal

Re: how to deal with virtual collection in solr?

2010-09-03 Thread Jan Høydahl / Cominvent
) * -Original Message- From: Jan Høydahl / Cominvent [mailto:jan@cominvent.com] Sent: Tuesday, August 31, 2010 2:15 PM To: solr-user@lucene.apache.org Subject: Re: how to deal with virtual collection in solr? Hi, If you have multiple cores defined in your solr.xml you need to issue

Re: In Need of Direction; Phrase-Context Tracking / Injection (Child Indexes) / Dismissal

2010-09-03 Thread Jan Høydahl / Cominvent
Hi, This smells like a job for Hadoop and perhaps Mahout, unless your use cases are totally ad-hoc research. After Nutch has fetched the sites, kick off some MapReduce jobs for each case you wish to study: 1. Extract phrases/contexts 2. For each context, perform detection and whitelisting 3. In

Re: In Need of Direction; Phrase-Context Tracking / Injection (Child Indexes) / Dismissal

2010-09-06 Thread Jan Høydahl / Cominvent
there's a lull. Thank you, - Scott On Fri, Sep 3, 2010 at 1:19 AM, Jan Høydahl / Cominvent jan@cominvent.com wrote: Hi, This smells like a job for Hadoop and perhaps Mahout, unless your use cases are totally ad-hoc research. After Nutch has fetched the sites, kick off some MapReduce

Re: Date faceting +1MONTH problem

2010-09-10 Thread Jan Høydahl / Cominvent
I just attended a talk at JavaZone (www.javazone.no) by Stephen Colebourne about JSR-310, which will make these kinds of operations easier in a future JDK, and about how Joda-Time goes a long way towards enabling it today. I'm not saying it would fix your GAP issue, as it's all about what definition of month

Re: Sorting not working on a string field

2010-09-13 Thread Jan Høydahl / Cominvent
Hi, Can you show us what result you actually get? Wouldn't it make more sense to choose a numeric fieldtype? To get a proper sort order of numbers in a string field, all numbers need to be exactly the same length, since the order will be lexicographical, i.e. 10 will come before 2, but after 02.
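The ordering problem is easy to reproduce with plain string sorting in Python. Zero-padding every value to a fixed width restores numeric order; the width of 5 digits used here is an arbitrary assumption.

```python
values = ["2", "10", "02"]

# String field: lexicographic order, so "10" sorts before "2" but after "02"
print(sorted(values))            # ['02', '10', '2']

# Left-padding every number to the same width restores numeric order
padded = [v.zfill(5) for v in ["2", "10", "7"]]
print(sorted(padded))            # ['00002', '00007', '00010']
```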

Re: mm=0?

2010-09-13 Thread Jan Høydahl / Cominvent
As Erick points out, you don't want a random doc as a response! What you're looking at is how to avoid the 0-hits problem. You could look into one of these: * Introduce autosuggest to avoid many 0-hits cases * Introduce spellchecking * Re-run the failed query with fuzzy turned on (e.g. alpha~) *
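The fuzzy re-run idea can be sketched as follows. fuzzify() and search_with_fuzzy_fallback() are hypothetical helpers, and the search function is injected so that any client can be plugged in. The term handling is naive and ignores phrases and operators. fake_search stands in for a real Solr call.

```python
def fuzzify(query):
    """Append ~ to each whitespace-separated term (ignores phrases and operators)."""
    return " ".join(term + "~" for term in query.split())


def search_with_fuzzy_fallback(search, query):
    """Run the query; if it returns no hits, re-run it with fuzzy terms."""
    hits = search(query)
    if not hits:
        hits = search(fuzzify(query))
    return hits


# Fake search function standing in for a real Solr call
def fake_search(q):
    return ["doc1"] if q == "alpah~" else []


print(search_with_fuzzy_fallback(fake_search, "alpah"))
```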

Re: Solr UIMA integration

2010-09-20 Thread Jan Høydahl / Cominvent
Hi Tommaso, Really cool what you've done. Looking forward to testing it, and I'm sure it's a welcome contribution to Solr. You can easily contribute your code by opening a JIRA issue and attaching a patch file. BTW, have you considered making the output field names configurable on a per

Re: Restrict possible results based on relational information

2010-09-20 Thread Jan Høydahl / Cominvent
Hi, You could simply create an autocomplete Solr core with a simple schema consisting of id, from and to. Let the fieldType of from be String, and in the fieldType of to you can use StandardTokenizer, WordDelimiterFilter and EdgeNGramFilter. <add><doc><field
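To see what the "to" field would end up containing, this sketch emulates front-edge n-grams in Python. It illustrates the kind of prefixes an EdgeNGramFilter emits, not its implementation. The gram sizes, the helper name edge_ngrams and the whitespace split (a crude stand-in for the tokenizer and WordDelimiterFilter steps) are assumptions.

```python
def edge_ngrams(token, min_gram=1, max_gram=10):
    """Front-edge n-grams of a token, i.e. its prefixes of length min_gram..max_gram."""
    return [token[:n] for n in range(min_gram, min(max_gram, len(token)) + 1)]


# Crude whitespace split as a stand-in for the tokenizer/WordDelimiterFilter steps
for token in "Oslo Airport".split():
    print(token, edge_ngrams(token))
```

Because every prefix is indexed as a term, a user typing "os" matches the document with a plain term query, with no wildcard needed at query time.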

Re: Calculating distances in Solr using longitude latitude

2010-09-22 Thread Jan Høydahl / Cominvent
:-) Also, that Wiki page clearly states in the very first line that it describes uncommitted Solr 4.0 functionality. I think that is pretty clear. On 22 Sep 2010, at 03:31, Lance Norskog wrote: Developers, like marketers,

Re: Different analyzers for different documents in different languages?

2010-09-22 Thread Jan Høydahl / Cominvent
See this thread: http://search-lucene.com/m/FgbDS1JL3J1 Basically, what we normally do is rename the fields with a language suffix, so if you have language=en and text="A red fox", then you would index it as text_en="A red fox". You would either have to do this outside Solr or write an
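If the renaming is done outside Solr, it can be a simple pre-processing step before feeding. This sketch assumes documents are dicts with a language field and a text field, as in the example. add_language_suffix is a hypothetical helper. Documents without a language value are left unchanged.

```python
def add_language_suffix(doc, fields=("text",), lang_field="language"):
    """Rename each field in `fields` to <name>_<lang>, using the document's language."""
    lang = doc.get(lang_field)
    renamed = {}
    for name, value in doc.items():
        if lang and name in fields:
            renamed[f"{name}_{lang}"] = value
        else:
            renamed[name] = value
    return renamed


print(add_language_suffix({"id": "1", "language": "en", "text": "A red fox"}))
```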

Re: Is Solr right for our project?

2010-09-27 Thread Jan Høydahl / Cominvent
Solr will match this in version 3.1, which is the next major release. Read this page for feature descriptions: http://wiki.apache.org/solr/SolrCloud Coming to a trunk near you; see https://issues.apache.org/jira/browse/SOLR-1873

Re: Is Solr right for our project?

2010-09-28 Thread Jan Høydahl / Cominvent
. 2010, at 10.44, Mike Thomsen wrote: Interesting. So what you are saying, though, is that at the moment it is NOT there? On Mon, Sep 27, 2010 at 9:06 PM, Jan Høydahl / Cominvent jan@cominvent.com wrote: Solr will match this in version 3.1 which is the next major release. Read this page

Conditional Function Queries

2010-09-28 Thread Jan Høydahl / Cominvent
Hi, Has anyone written any conditional functions yet for use in Function Queries? I see the use for a function which can run different sub-functions depending on the value of a field. Say you have three documents: A: title=Sports car, color=red B: title=Boring car, color=green C: title=Big

Re: Conditional Function Queries

2010-09-28 Thread Jan Høydahl / Cominvent
OK, I created the issues: IF function: SOLR-2136; AND, OR, NOT: SOLR-2137. On 28 Sep 2010, at 19:36, Yonik Seeley wrote: On Tue, Sep 28, 2010 at 11:33 AM, Jan Høydahl / Cominvent jan@cominvent.com wrote: Have

Re: Solr usage with Auctions/Classifieds?

2010-01-30 Thread Jan Høydahl / Cominvent
A follow-up on the auction use case. How do you handle the need for frequent updates of only one field, such as the last bid field (needed for sort on price, facets or range)? For high-traffic sites, the document update rate becomes very high if you re-send the whole document every time the bid

Re: Field highlighting

2010-01-31 Thread Jan Høydahl / Cominvent
Did you solve this? If yes, what was wrong? If not, can you give one concrete example document and a matching query which fails to highlight? On 7 Jan 2010, at 15:23, Xavier Schepler wrote: Erick Erickson wrote: It's

Re: Collating results from multiple indexes

2010-02-08 Thread Jan Høydahl / Cominvent
Hi, There is no JOIN functionality in Solr. The common solution is either to accept the high-volume update churn, or to add client-side code to build a join layer on top of the two indices. I know that Attivio (www.attivio.com) has built some kind of JOIN functionality on top of Solr in their
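A client-side join layer can start as a dictionary lookup over the two result sets. This sketch assumes both sides are lists of dicts that share a key field, and it performs an inner join. client_side_join and the sample field names are hypothetical. In practice you would also have to deal with paging through both result sets.

```python
def client_side_join(left, right, key):
    """Inner join of two result lists on `key`; fields from `right` are merged in.
    If `right` has duplicate keys, the last one wins."""
    by_key = {doc[key]: doc for doc in right}
    joined = []
    for doc in left:
        match = by_key.get(doc.get(key))
        if match is not None:
            merged = dict(doc)
            merged.update({k: v for k, v in match.items() if k != key})
            joined.append(merged)
    return joined


# Hypothetical results from two indexes
products = [{"sku": "a1", "title": "Red car"}, {"sku": "b2", "title": "Blue car"}]
prices = [{"sku": "a1", "price": 100}]
print(client_side_join(products, prices, "sku"))
```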

Re: Solr usage with Auctions/Classifieds?

2010-02-09 Thread Jan Høydahl / Cominvent
value for a field. You can only use functions on it. On Sat, Jan 30, 2010 at 7:05 AM, Jan Høydahl / Cominvent jan@cominvent.com wrote: A follow-up on the auction use case. How do you handle the need for frequent updates of only one field, such as the last bid field (needed for sort

Re: Faceting

2010-02-09 Thread Jan Høydahl / Cominvent
NOTE: Please start a new email thread for a new topic (see http://en.wikipedia.org/wiki/User:DonDiego/Thread_hijacking). Your strategy could work. You might want to look into dedicated entity extraction frameworks like http://opennlp.sourceforge.net/

Re: Is it possible to exclude results from other languages?

2010-02-09 Thread Jan Høydahl / Cominvent
It is much more efficient to tag documents with their language at index time. Look for language identification tools such as http://www.sematext.com/products/language-identifier/index.html or http://ngramj.sourceforge.net/ or

Re: joining two field for query

2010-02-09 Thread Jan Høydahl / Cominvent
You may also want to play with other highlighting parameters to control how much text to highlight, how many fragments etc. See http://wiki.apache.org/solr/HighlightingParameters

Re: Replication and querying

2010-02-09 Thread Jan Høydahl / Cominvent
Hi, Index replication in Solr makes an exact copy of the original index. Is it not possible to add the 6 extra fields to both instances? An alternative to replication is to feed two independent Solr instances - full control :) Please elaborate on your specific use case if this is not useful

Re: Question on Tokenizing email address

2010-02-09 Thread Jan Høydahl / Cominvent
Hi, To match 1, 2, 3, 4 below you could use a fieldtype based on TextField, with just a simple WordDelimiterFilterFactory. However, this would also match abc-def, def.alpha, xyz-com and a...@def, because all punctuation is treated the same. To avoid this, you could do some custom handling of -, .

How to add SpellCheckResponse to Solritas?

2010-02-09 Thread Jan Høydahl / Cominvent
Hi, I'm using the /itas requestHandler, and would like to add spell-check suggestions to the output. I have spell-check configured and working with the XML response writer, but nothing is output in Velocity. Debugging the JSON $response object, I cannot find any representation of spellcheck

Re: Copying dynamic fields into default text field messing up fieldNorm?

2010-02-11 Thread Jan Høydahl / Cominvent
This sounds like an ideal use case for payloads. You could attach a boost value to each term in your keywords field. See http://www.lucidimagination.com/blog/2009/08/05/getting-started-with-payloads/ Another common workaround is to create, say, 8 multi-valued fields with boosts 0.5, 1.0, 1.5,
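The multi-field workaround amounts to routing each keyword into the field whose boost is closest to the keyword's weight, then querying all of those fields with their boosts. This sketch assumes 8 buckets from 0.5 to 4.0 in steps of 0.5. The field names keywords_b05 through keywords_b40 and the helpers field_for() and route() are hypothetical.

```python
BOOSTS = [0.5 * i for i in range(1, 9)]  # 0.5, 1.0, ..., 4.0: 8 buckets


def field_for(weight):
    """Pick the bucket field whose boost is closest to the keyword weight."""
    nearest = min(BOOSTS, key=lambda b: abs(b - weight))
    return f"keywords_b{int(nearest * 10):02d}"


def route(keywords):
    """Group {keyword: weight} into {field_name: [keywords]}."""
    fields = {}
    for word, weight in keywords.items():
        fields.setdefault(field_for(weight), []).append(word)
    return fields


# qf listing every bucket field with its boost
QF = " ".join(f"keywords_b{int(b * 10):02d}^{b}" for b in BOOSTS)

print(route({"solr": 3.9, "search": 1.1}))
print(QF)
```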

Re: Question on Tokenizing email address

2010-02-11 Thread Jan Høydahl / Cominvent
My point is that I WANT the AT and DOT to be indexed, to avoid these being treated the same: foo-...@brown.fox and foo-bar.brown.fox. By using the LowerCaseFilterFactory before the replacements, you actually ensure that a search for email:at will not give a match, because the query will be
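Here is a Python illustration of why explicit AT/DOT tokens help. The addresses are made-up examples, since the archive obscures the real one. The regex split is only a rough stand-in for treating all punctuation alike; it is not WordDelimiterFilter itself.

```python
import re


def punctuation_tokens(email):
    # All punctuation treated alike: split on -, . and @
    return [t for t in re.split(r"[-.@]", email) if t]


def at_dot_tokens(email):
    # Keep the structure by turning @ and . into explicit AT / DOT tokens
    return email.replace("@", " AT ").replace(".", " DOT ").split()


a, b = "foo-bar@brown.fox", "foo-bar.brown.fox"  # made-up addresses
print(punctuation_tokens(a) == punctuation_tokens(b))  # True: indistinguishable
print(at_dot_tokens(a))
print(at_dot_tokens(b))
```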

Re: Replication and querying

2010-02-11 Thread Jan Høydahl / Cominvent
On 09.02.2010 at 16:53, Jan Høydahl / Cominvent wrote: Hi, Index replication in Solr makes an exact copy of the original index. Is it not possible to add the 6 extra fields to both instances? An alternative to replication is to feed two independent Solr instances - full control :) Please elaborate

Re: spellcheck

2010-02-11 Thread Jan Høydahl / Cominvent
Can you show us how you configured spell check? On 10 Feb 2010, at 11:48, michaelnazaruk wrote: Hello,all! I have some problem with spellcheck! I download,build and connect dictionary(~500 000 words)!It work fine! But i

Re: Getting max/min dates from solr index

2010-02-11 Thread Jan Høydahl / Cominvent
How about a field indextime_dt filled with NOW? Then do a date facet query to get the monthly stats for the last 12 months: http://localhost:8983/solr/select/?q=*:*&rows=0&facet=true&facet.date=indextime_dt&facet.date.start=NOW/MONTH-12MONTHS&facet.date.end=NOW/MONTH%2B1MONTH&facet.date.gap=%2B1MONTH To get min
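The same facet request can be built with urlencode(), which takes care of encoding + as %2B. An unencoded + would be decoded as a space. The parameters are the ones from the URL above, and the base URL assumes the default example setup.

```python
from urllib.parse import parse_qs, urlencode

params = [
    ("q", "*:*"),
    ("rows", "0"),
    ("facet", "true"),
    ("facet.date", "indextime_dt"),
    ("facet.date.start", "NOW/MONTH-12MONTHS"),
    ("facet.date.end", "NOW/MONTH+1MONTH"),
    ("facet.date.gap", "+1MONTH"),
]
qs = urlencode(params)
print("http://localhost:8983/solr/select/?" + qs)
```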

Re: Faceting

2010-02-11 Thread Jan Høydahl / Cominvent
Regarding hi-jacking, that was a false alarm. Apple Mail fooled me into believing it was part of another thread. Sorry, Jose. I think the properties field approach is clean. It relies on index-time classification, which is where such heavy lifting should preferably be done. Faceting on a

Re: Posting Concurrently to Solr

2010-02-11 Thread Jan Høydahl / Cominvent
You did not say how frequently you need to update the index, whether this is a batch type of operation, or whether you also have some real-time requirements after the initial load. Your ETL could use SolrJ and the StreamingUpdateSolrServer for high throughput. You could try multiple threads pushing in

Re: How to add SpellCheckResponse to Solritas?

2010-02-11 Thread Jan Høydahl / Cominvent
, just to be clear, isn't really JSON, it's a toString() that looks similar though. Or did you convert it to JSON in some other fashion? /itas?q=mispeled&wt=json should also show the spelling suggestions. Erik On Feb 9, 2010, at 7:30 PM, Jan Høydahl / Cominvent wrote: Hi, I'm

Re: help with facets and searchable fields

2010-02-11 Thread Jan Høydahl / Cominvent
Can you show us your field definitions and the exact query string you are using, and what you expect to see? -- Jan Høydahl - search architect Cominvent AS - www.cominvent.com On 11. feb. 2010, at 15.31, adeelmahmood wrote: hi there i am trying to get familiar with solr while setting it

Re: Collating results from multiple indexes

2010-02-12 Thread Jan Høydahl / Cominvent
- Nutch Hadoop ecosystem search :: http://search-hadoop.com/ - Original Message From: Jan Høydahl / Cominvent jan@cominvent.com To: solr-user@lucene.apache.org Sent: Mon, February 8, 2010 3:33:41 PM Subject: Re: Collating results from multiple indexes Hi

Re: Removing single-term results / reordering

2010-02-13 Thread Jan Høydahl / Cominvent
Hi, This is probably due to length normalization. Normally this is desirable, since you want to penalize a partial match relative to a more exact match. Try specifying omitNorms=true on your field. You should ask yourself what kind of relevancy or sorting you really need in your project. If you search short
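To see why length normalization favors short fields, consider the length norm of Lucene's classic DefaultSimilarity, which is 1/sqrt(number of terms in the field). The sketch below computes that factor for some hypothetical field values. It ignores field boosts and the lossy one-byte encoding Lucene applies when it stores norms.

```python
import math


def length_norm(num_terms: int) -> float:
    """Classic Lucene DefaultSimilarity length norm: 1 / sqrt(numTerms)."""
    return 1.0 / math.sqrt(num_terms)


# Hypothetical field values of increasing length.
for value in ["york", "new york", "new york yankees"]:
    n = len(value.split())
    print(f"{value!r}: {n} terms, lengthNorm = {length_norm(n):.3f}")
```

A one-term field gets the full factor of 1.0, while a three-term field gets about 0.577, so a single-term match ranks higher. With omitNorms=true, no norm is stored for the field, and every document is treated as if its norm were 1.0.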

Re: Realtime search and facets with very frequent commits

2010-02-17 Thread Jan Høydahl / Cominvent
Hi, Have you tried playing with mergeFactor or even mergePolicy? -- Jan Høydahl - search architect Cominvent AS - www.cominvent.com On 16. feb. 2010, at 08.26, Janne Majaranta wrote: Hey Dipti, Basically query optimizations + setting cache sizes to a very high level. Other than that, the

Re: Discovering Slaves

2010-02-17 Thread Jan Høydahl / Cominvent
After ZooKeeper is integrated (1.5?) there will be a way to get info about all nodes in your cluster, including their roles, status etc. Perhaps you want to coordinate your dashboard effort with this version, although it is still very early in development? See http://wiki.apache.org/solr/SolrCloud --

Re: Collating results from multiple indexes

2010-02-17 Thread Jan Høydahl / Cominvent
have much to do with Lucene/SOLR except where they integrate with the query execution. If you want to learn more feel free to check out www.attivio.com. - w...@attivio.com On Fri, Feb 12, 2010 at 10:35 AM, Jan Høydahl / Cominvent jan@cominvent.com wrote: Really

Re: Need feedback on solr security

2010-02-22 Thread Jan Høydahl / Cominvent
Hi, Does "open for public" mean end users through a browser, or web sites through an API? In either case you should have a front end proxying the traffic through to Solr, which explicitly allows only the parameters you permit. -- Jan Høydahl - search architect Cominvent AS - www.cominvent.com On 17.
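The core of such a front end is an allow-list applied to the incoming query string before anything is forwarded to Solr. This is a minimal sketch of that filtering step only. The set in `ALLOWED_PARAMS` is a hypothetical example, and a real proxy would also validate values (for instance, capping rows) and handle the actual HTTP forwarding.

```python
from urllib.parse import parse_qsl, urlencode

# Hypothetical allow-list: only these request parameters reach Solr.
ALLOWED_PARAMS = {"q", "start", "rows", "fq"}


def filter_params(query_string: str) -> str:
    """Keep allowed parameters, silently dropping everything else (qt, stream.body, ...)."""
    pairs = parse_qsl(query_string, keep_blank_values=True)
    kept = [(k, v) for k, v in pairs if k in ALLOWED_PARAMS]
    return urlencode(kept)


print(filter_params("q=ipod&rows=10&qt=/update&stream.body=x"))
```

An allow-list is preferable to a block-list here because new or unfamiliar Solr parameters are rejected by default.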

Re: If you could have one feature in Solr...

2010-02-24 Thread Jan Høydahl / Cominvent
A mature document processing pipeline, perhaps integration of www.openpipeline.org, which is Apache 2.0 licensed

Re: Question on Facets and Multiple values (confusion from the Wiki)

2010-02-26 Thread Jan Høydahl / Cominvent
Hi Mark, If (a) is wanted behaviour, i.e. have a business show up in facets for all ZIPs, you should define a multi-valued ZIP field. Since a ZIP is a number, I don't see any reason for any analysis on it, a String or a lightly normalized field type would do the job both for search and facets.
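With a multi-valued ZIP field, a facet counts each document once under every value it holds, which gives behaviour (a). The sketch below simulates that counting in plain Python for hypothetical businesses. It illustrates the counting semantics only and is not how Solr computes facets internally.

```python
from collections import Counter

# Hypothetical businesses, each with a multi-valued "zip" field.
businesses = [
    {"name": "Acme Plumbing", "zip": ["10001", "10002"]},
    {"name": "Bob's Bakery", "zip": ["10002"]},
    {"name": "City Cleaners", "zip": ["10001", "10003"]},
]


def facet_counts(docs: list, field: str) -> Counter:
    """Count each document once under every distinct value of a multi-valued field."""
    counts = Counter()
    for doc in docs:
        counts.update(set(doc.get(field, [])))  # set(): at most once per value per doc
    return counts


print(facet_counts(businesses, "zip"))
```

Acme Plumbing appears under both 10001 and 10002, so the per-ZIP counts add up to more than the number of businesses. That is expected for multi-valued facets.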

Re: Dynamic Solr indexing

2010-03-01 Thread Jan Høydahl / Cominvent
Hi, In the current version you need to handle the cluster layout yourself, on both the indexing and the search side, i.e. route documents to shards as you please, and know which shards to search. We try to address how to make this easier in http://wiki.apache.org/solr/SolrCloud - have a look at it. The
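"Route documents to shards as you please" is often done with a stable hash of the unique key, and a distributed search then lists every shard in the shards request parameter. A minimal sketch, assuming two hypothetical shard addresses and hash-mod routing (one possible scheme, not something Solr does for you in this version):

```python
import zlib

# Hypothetical shard addresses.
SHARDS = ["localhost:8983/solr", "localhost:7574/solr"]


def shard_for(doc_id: str, shards: list = SHARDS) -> str:
    """Pick the indexing shard from a stable CRC32 hash of the document's unique key."""
    return shards[zlib.crc32(doc_id.encode("utf-8")) % len(shards)]


def shards_param(shards: list = SHARDS) -> str:
    """Comma-separated value for the 'shards' parameter of a distributed search."""
    return ",".join(shards)


for doc_id in ["doc-1", "doc-2", "doc-3"]:
    print(doc_id, "->", shard_for(doc_id))
print("shards=" + shards_param())
```

CRC32 is used instead of Python's built-in hash(), because string hashing is randomized per process and would send the same id to different shards across runs. Note that changing the number of shards with hash-mod routing moves most documents, so reindexing is needed.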

Re: Dynamic Solr indexing

2010-03-10 Thread Jan Høydahl / Cominvent
Hi, Yes, it will be a really nice package. I think the aim is to keep the ZK stuff optional, which can be nice for small installs or for upgrading without embracing the ZK parts. All of this is still at the beginning of development. Much of the cloud stuff is aimed at 1.5 but there are as usual no

Re: Issue on stopword list

2010-03-10 Thread Jan Høydahl / Cominvent
Also, eDisMax query parser will be a welcome tool for these kinds of requirements: https://issues.apache.org/jira/browse/SOLR-1553 From the feature list: advanced stopword handling... stopwords are not required in the mandatory part of the query but are still used (if indexed) in the proximity

Re: get english spell dictionary

2010-03-10 Thread Jan Høydahl / Cominvent
You probably don't want to include words in your dictionary which are not in your index. Have you tried Solr's feature to generate spellcheck dictionary from one or more of your index fields? -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com Training in Europe -

Boundary match as part of query language?

2010-03-10 Thread Jan Høydahl / Cominvent
Hi, Sometimes you need to anchor your search to start/end of field. Example: 1. title=New York Yankees 2. title=New York 3. title=York If I search title:New York, or title:York I would get a match, but I'd like to anchor my search to beginning and/or end of the field, e.g. with regex syntax,
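The requested semantics can be pinned down with ordinary regular expressions over the three example titles. This sketch uses Python's re module on raw strings purely to illustrate the desired matching behaviour. It is not Solr query syntax, and it says nothing about how Solr would implement anchoring.

```python
import re

titles = ["New York Yankees", "New York", "York"]


def matches(pattern: str) -> list[str]:
    """Return the example titles the regex matches anywhere in the field."""
    return [t for t in titles if re.search(pattern, t)]


print(matches("York"))         # unanchored: all three titles
print(matches("^New York"))    # anchored at the start of the field
print(matches("York$"))        # anchored at the end of the field
print(matches("^New York$"))   # anchored at both ends: exact field match
```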
