Re: Transparent redundancy in Solr

2010-12-17 Thread Jan Høydahl / Cominvent
Hi, I believe the way to go is through ZooKeeper[1], not property files or local hacks. We've already started on this route and it makes sense to let ZK do what it is designed for, such as leader election. When a node starts up, it asks ZK what role it should have and fetches corresponding

Re: Changing the default Fuzzy minSimilarity?

2010-12-15 Thread Jan Høydahl / Cominvent
A fuzzy query foo~ defaults to a similarity of 0.5, i.e. equal to foo~0.5. Just as an FYI, this isn't true in trunk (4.0) any more. The defaults are changed so that it never enumerates the entire dictionary (slow) like before, see: https://issues.apache.org/jira/browse/LUCENE-2667 so,

Omitting tf but not positions

2010-12-15 Thread Jan Høydahl / Cominvent
Hi, I have a case where I use DisMax pf to boost on phrase match in a field. I use omitNorms=true to keep length normalization from messing with my scores. However, for some documents, the phrase foo bar occurs more than once in the same field, and I get an unintended TF boost for one of them

Changing the default Fuzzy minSimilarity?

2010-12-14 Thread Jan Høydahl / Cominvent
Hi, A fuzzy query foo~ defaults to a similarity of 0.5, i.e. equal to foo~0.5. I want to set the default to 0.8 so that if a user enters the query foo~ it equals foo~0.8. I have not seen a way to do this in Solr. A param fuzzy.minSim=0.8 would do the trick. Anything like this, or shall I open
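One possible client-side workaround is to rewrite the query string before it is sent to Solr. The following is only a minimal sketch: the helper name add_default_fuzzy is hypothetical, it is not a Solr feature, and it assumes simple query strings without escaped tildes.

```python
import re

# A "~" that is not followed by a digit or a dot, i.e. a fuzzy operator
# without an explicit similarity value.
_BARE_TILDE = re.compile(r"~(?![\d.])")


def add_default_fuzzy(query: str, min_sim: float = 0.8) -> str:
    """Append a default similarity to every bare '~' in the query.

    Hypothetical helper: it does not understand escaping (e.g. '\\~'),
    so it is only suitable for simple user input.
    """
    return _BARE_TILDE.sub(f"~{min_sim}", query)


if __name__ == "__main__":
    print(add_default_fuzzy("foo~"))
    print(add_default_fuzzy("foo~0.6 bar~"))
```

Terms that already carry an explicit similarity, such as foo~0.6, are left untouched.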

Re: Joining Fields in and Index

2010-12-03 Thread Jan Høydahl / Cominvent
Hi, I made a MappingUpdateRequestHandler which lets you map country codes to full country names with a config file. See https://issues.apache.org/jira/browse/SOLR-2151 -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com On 3. des. 2010, at 00.03, Adam Estrada wrote:

Re: Is this sort order possible in a single query?

2010-11-28 Thread Jan Høydahl / Cominvent
I can't see a way to do it without functionqueries at the moment, which doesn't mean there isn't any. If you want to use the suggested sort method, you could probably sort first by score: sort=score desc, num_copies desc, num_comments desc To let the score be influenced by exact author match

Re: Logging queries and hit count

2010-11-28 Thread Jan Høydahl / Cominvent
You can also configure your logging framework to output the relevant logs to a separate file: log4j.logger.org.apache.solr.core.SolrCore=INFO, A1 This way you'll avoid too much noise from other components, but you'll get all update and admin requests as well, so you'll have to filter on core

Re: Need Middleware between search client and solr?

2010-11-23 Thread Jan Høydahl / Cominvent
Check out for instance www.twigkit.com which is a light-weight middleware (as well as GUI framework) for Solr. It could speed up development time considerably for your project. It has hooks to transform queries before they are sent to Solr and process responses before displaying, if needed. --

Re: Need Middleware between search client and solr?

2010-11-23 Thread Jan Høydahl / Cominvent
interesting. Regards, Lukas On Tue, Nov 23, 2010 at 2:13 PM, Jan Høydahl / Cominvent jan@cominvent.com wrote: Check out for instance www.twigkit.com which is a light-weight middleware (as well as GUI framework) for Solr. It could speed up development time considerably for your project

Re: Boosting on a document value

2010-11-16 Thread Jan Høydahl / Cominvent
Also this http://search-lucene.com/m/hBnHH1Q4NVb2 -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com On 15. nov. 2010, at 23.21, Ahmet Arslan wrote: I've got a document with a type field. If the type is 1, I want to boost the document's relevancy, but type=1 is not a

Re: Link to download solr4.0 is not working?

2010-11-15 Thread Jan Høydahl / Cominvent
Fixed the Wiki. -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com On 12. nov. 2010, at 03.44, Deche Pangestu wrote: Hello, Does anyone know where to download solr4.0 source? I tried downloading from this page: http://wiki.apache.org/solr/FrontPage#solr_development

Re: Link to download solr4.0 is not working?

2010-11-15 Thread Jan Høydahl / Cominvent
Hi, Added a link to the wiki to the latest stable 1.4 branch that will become 1.4.2. You should checkout and build this branch if you have a requirement to use only a released version. 1.4.2 only contains critical bug fixes over 1.4.1 and is considered stable. See here for a clarification of

Re: Link to download solr4.0 is not working?

2010-11-15 Thread Jan Høydahl / Cominvent
Yes, the project is not good enough at communicating the roadmap clearly. We often hide behind the fact that nobody knows since it's open source, but I think the PMC would benefit from trying to maintain some sort of no-guarantee roadmap clarifying to all what most people think will happen

Re: Using Multiple Cores for Multiple Users

2010-11-10 Thread Jan Høydahl / Cominvent
Hi, If your index is supposed to handle only public information, i.e. public RSS feeds, then I don't see a need for multiple cores. I would probably try to handle this on the query side only. Imagine this scenario: User A registers RSS-X and RSS-Y (the application starts pulling and indexing

Re: Replication and ignored fields

2010-11-09 Thread Jan Høydahl / Cominvent
, and you also get the benefit of transferring a smaller index across the network. -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com On 8. nov. 2010, at 23.50, Shalin Shekhar Mangar wrote: On Fri, Nov 5, 2010 at 2:30 PM, Jan Høydahl / Cominvent jan@cominvent.com

Re: Replication and ignored fields

2010-11-09 Thread Jan Høydahl / Cominvent
Cool, thanks for the clarification, Shalin. -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com On 9. nov. 2010, at 15.12, Shalin Shekhar Mangar wrote: On Tue, Nov 9, 2010 at 12:33 AM, Jan Høydahl / Cominvent jan@cominvent.com wrote: Not sure about that. I have

Re: How to Facet on a price range

2010-11-05 Thread Jan Høydahl / Cominvent
Note that using many facet.query= parameters may be expensive. Another way to solve this is to pre-compute the ranges as plain strings in another field during indexing. This can be done in your app prior to indexing or by creating a new FieldType for your range. Here's a field type that computes
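The "in your app prior to indexing" variant could look roughly like this. It is a sketch under assumptions: the bucket boundaries, the label format and the function name price_range_label are made up for illustration, and prices are assumed to be non-negative.

```python
from bisect import bisect_right

# Hypothetical bucket boundaries; pick whatever suits your catalogue.
BOUNDS = [10, 50, 100, 500]


def price_range_label(price: float) -> str:
    """Return a plain-string range label to index next to the price."""
    i = bisect_right(BOUNDS, price)
    if i == 0:
        return f"0-{BOUNDS[0]}"
    if i == len(BOUNDS):
        return f"{BOUNDS[-1]}+"
    return f"{BOUNDS[i - 1]}-{BOUNDS[i]}"


if __name__ == "__main__":
    doc = {"id": "book-1", "price": 75.0}
    # Store the label in a separate string field and facet on that field.
    doc["price_range"] = price_range_label(doc["price"])
    print(doc)
```

Faceting on the precomputed string field then needs a single facet.field parameter instead of one facet.query per range.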

Re: Replication and ignored fields

2010-11-05 Thread Jan Høydahl / Cominvent
How about hooking in Andrzej's pruning tool at the postCommit event, literally removing unused fields. I believe a commit is fired on the slave by itself after every successful replication, to put the index live. You could execute a script which prunes away the dead meat and then call a new

Re: blacklist docs by uniqueKey

2010-11-03 Thread Jan Høydahl / Cominvent
How does the exclude=true option in elevate.xml perform with large number of excludes? Then you could have a separate elevate config for that client. -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com On 3. nov. 2010, at 20.11, Yonik Seeley wrote: On Wed, Nov 3, 2010

Re: Reverse range search

2010-11-01 Thread Jan Høydahl / Cominvent
Hi, I think I have seen a comment on the list from someone with the same need a few months ago. He planned to make a new fieldType to support this, e.g. MinMaxRangeFieldType which would be a polyField type holding both a min and max value, and then you could query it q=myminmaxfield:123 I did

Re: Use SolrCloud (SOLR-1873) on trunk, or with 1.4.1?

2010-10-28 Thread Jan Høydahl / Cominvent
Hi, I would aim for reindexing on branch3_x, which will be the 3.1 release soon. I don't know if SOLR-1873 applies cleanly to 3_x now, but it would surely be less effort to have it apply to 3_x than to 1.4. Perhaps you can help backport the patch to 3_x? -- Jan Høydahl, search solution

Re: Need help for solr searching case insensative item

2010-10-26 Thread Jan Høydahl / Cominvent
Hi, You need to share relevant parts of your schema for us to be able to see what's going on. Try using fieldType=text. Basically, you need a fieldType which has the lowercaseFilter included. -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com On 25. okt. 2010, at

Re: How to index on basis of a condition?

2010-10-25 Thread Jan Høydahl / Cominvent
Do you want to use a field's content to decide whether the document should be indexed or not? You could write an UpdateProcessor for that, simply aborting the chain for the docs that don't pass your test. @Override public void processAdd(AddUpdateCommand cmd) throws IOException {

Re: pf parameter in edismax (SOLR-1553)

2010-10-23 Thread Jan Høydahl / Cominvent
Answering my own question: The pf feature only kicks in with multi term q param. In my case I used a field tokenized by KeywordTokenizer, hence pf never kicked in. -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com On 14. okt. 2010, at 13.29, Jan Høydahl / Cominvent

Re: Commits on service after shutdown

2010-10-19 Thread Jan Høydahl / Cominvent
You never get full control of commits, as Solr will auto-commit anyway whenever the (configurable) input buffer is full. With the current architecture you cannot really trust adds or commits to be 100% certain to succeed, because the server may have been restarted between an add and commit()

Re: Spanning an index across multiple volumes

2010-10-17 Thread Jan Høydahl / Cominvent
Juggling disk volumes does not sound like a logical responsibility for Solr to me. Solr/Lucene expects to have enough room to live in. Better to push this down to the OS level. There are all kinds of logical volume managers around which let you add new disks to the same logical volume,

Re: Quick question on indexing an existing index

2010-10-15 Thread Jan Høydahl / Cominvent
Why don't you simply index the source content which you used to build index2 into index1, i.e. have your tool index to both? You won't save anything on trying to extract that content from an existing index. But of course, you COULD write yourself a tool which extracts all stored fields for all

Possible to sort by explicit docid order?

2010-10-15 Thread Jan Høydahl / Cominvent
Hi, In an online bookstore project I'm working on, most frontend widgets are search driven. Most often they query with some filters and a sort order, such as availabledate desc or simply by score. However, to allow editorial control, some widgets will display a fixed list of books, defined as

pf parameter in edismax (SOLR-1553)

2010-10-14 Thread Jan Høydahl / Cominvent
Hi, Have applied SOLR-1553 to 1.4.2 and it works great. However, I can't get the pf param to work. Example: q=foo bar&qf=title^2.0 body^0.5&pf=title^50.0 Shouldn't I see the phrase query boost in debugQuery? Currently I see no trace of pf being used. -- Jan Høydahl, search solution architect

Re: LuceneRevolution - NoSQL: A comparison

2010-10-13 Thread Jan Høydahl / Cominvent
On Tue, Oct 12, 2010 at 12:11 PM, Jan Høydahl / Cominvent jan@cominvent.com wrote: I'm pretty sure the 2nd phase to fetch doc-summaries goes directly to same server as first phase. But what if you stick a LB in between? A related point - the load balancing implementation that's part

Re: LuceneRevolution - NoSQL: A comparison

2010-10-12 Thread Jan Høydahl / Cominvent
This is what FAST does in ESP. When a new version of a partition is built, it is staged in its own process and co-exists alongside the old one. The query-dispatcher sees both and routes traffic based on requested generation id. Should probably not invest in such a feature until there's a clear

Re: LuceneRevolution - NoSQL: A comparison

2010-10-12 Thread Jan Høydahl / Cominvent
This is a different issue. You are seeing the latency between master index update and replication to slave(s). Solve this by pointing your monitoring script directly to slave instead of master. What this thread is about is a potential difference in state during the execution of a single

Re: Webservice for push indexing

2010-10-12 Thread Jan Høydahl / Cominvent
Hi, I would advise you to get involved in the SolrCloud initiative (see http://wiki.apache.org/solr/SolrCloud) and start designing a native indexing distributor component. I envision something like an integration in UpdateHandler which knows about all collections and shards from ZK config,

Re: having problem about Solr Date Field.

2010-10-08 Thread Jan Høydahl / Cominvent
Correct. You get back what you push in. Of course if your index is for users in one time zone only, you may insert the local time to Solr, and everything will work well. However, if you operate an index with international users, you'd want to make sure you convert to/from UTC in your
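A minimal sketch of the "convert to UTC before indexing" step, assuming the application works with timezone-aware datetimes. The function name to_solr_utc is hypothetical; the output format is the ISO 8601 form with a trailing Z that Solr date fields expect.

```python
from datetime import datetime, timedelta, timezone


def to_solr_utc(local_dt: datetime) -> str:
    """Convert a timezone-aware datetime to a UTC date string for Solr."""
    if local_dt.tzinfo is None:
        # A naive datetime carries no offset, so the conversion would be a guess.
        raise ValueError("expected a timezone-aware datetime")
    return local_dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


if __name__ == "__main__":
    # Fixed +02:00 offset used purely as an example.
    plus_two = timezone(timedelta(hours=2))
    print(to_solr_utc(datetime(2010, 10, 8, 14, 19, 0, tzinfo=plus_two)))
```

The reverse conversion for display is the same astimezone() call with the user's zone.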

Re: Getting an ngram fieldtype to work

2010-10-08 Thread Jan Høydahl / Cominvent
Hi, The first thing I would try is to go to the analysis page, enter your test data, and report back what each analysis stage prints out: http://localhost:8983/solr/admin/analysis.jsp -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com On 8. okt. 2010, at 14.19,

Re: Begins with and ends with word

2010-10-05 Thread Jan Høydahl / Cominvent
There is a ticket for this request: https://issues.apache.org/jira/browse/SOLR-1980 -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com On 5. okt. 2010, at 08.39, Maddy.Jsh wrote: Hi, I have 2 documents with following values. Doc1 Subject: Weekly transport

Re: can i have more update processors with solr

2010-10-01 Thread Jan Høydahl / Cominvent
I think the parameter name is confusing. I have proposed renaming it to processor.chain: https://issues.apache.org/jira/browse/SOLR-2105 -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com On 30. sep. 2010, at 22.25, Markus Jelsma wrote: Almost, you can define a

Re: Is Solr right for our project?

2010-09-28 Thread Jan Høydahl / Cominvent
. 2010, at 10.44, Mike Thomsen wrote: Interesting. So what you are saying, though, is that at the moment it is NOT there? On Mon, Sep 27, 2010 at 9:06 PM, Jan Høydahl / Cominvent jan@cominvent.com wrote: Solr will match this in version 3.1 which is the next major release. Read this page

Conditional Function Queries

2010-09-28 Thread Jan Høydahl / Cominvent
Hi, Has anyone written any conditional functions yet for use in Function Queries? I see the use for a function which can run different sub functions depending on the value of a field. Say you have three documents: A: title=Sports car, color=red B: title=Boring car, color=green C: title=Big

Re: Conditional Function Queries

2010-09-28 Thread Jan Høydahl / Cominvent
Ok, I created the issues: IF function: SOLR-2136 AND, OR, NOT: SOLR-2137 -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com On 28. sep. 2010, at 19.36, Yonik Seeley wrote: On Tue, Sep 28, 2010 at 11:33 AM, Jan Høydahl / Cominvent jan@cominvent.com wrote: Have

Re: Is Solr right for our project?

2010-09-27 Thread Jan Høydahl / Cominvent
Solr will match this in version 3.1 which is the next major release. Read this page: http://wiki.apache.org/solr/SolrCloud for feature descriptions Coming to a trunk near you - see https://issues.apache.org/jira/browse/SOLR-1873 -- Jan Høydahl, search solution architect Cominvent AS -

Re: Calculating distances in Solr using longitude latitude

2010-09-22 Thread Jan Høydahl / Cominvent
:-) Also, that Wiki page clearly states in the very first line that it talks about uncommitted stuff Solr4.0. I think that is pretty clear. -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com On 22. sep. 2010, at 03.31, Lance Norskog wrote: Developers, like marketers,

Re: Different analyzers for dfferent documents in different languages?

2010-09-22 Thread Jan Høydahl / Cominvent
See this thread: http://search-lucene.com/m/FgbDS1JL3J1 Basically, what we normally do is to rename the fields with a language suffix, so if you have language=en and text=A red fox, then you would index it as text_en=A red fox. You would either have to do this outside Solr or write an
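Done outside Solr, the renaming step might look like the sketch below. The helper name add_language_suffix and the default field names ("language", "text") are assumptions taken from the example above, not an existing API.

```python
def add_language_suffix(doc: dict, text_fields=("text",), lang_field="language") -> dict:
    """Return a copy of doc where text fields are renamed to <field>_<language>."""
    lang = doc.get(lang_field)
    if not lang:
        # No language known: leave the document unchanged.
        return dict(doc)
    renamed = {}
    for name, value in doc.items():
        if name in text_fields:
            renamed[f"{name}_{lang}"] = value
        else:
            renamed[name] = value
    return renamed


if __name__ == "__main__":
    print(add_language_suffix({"language": "en", "text": "A red fox"}))
```

The schema then only needs one field (or dynamic field) per language suffix, each with its own analysis chain.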

Re: Solr UIMA integration

2010-09-20 Thread Jan Høydahl / Cominvent
Hi Tommaso, Really cool what you've done. Looking forward to testing it, and I'm sure it's a welcome contribution to Solr. You can easily contribute your code by opening a JIRA issue and attaching a patch file. BTW Have you considered making the output field names configurable on a per

Re: Restrict possible results based on relational information

2010-09-20 Thread Jan Høydahl / Cominvent
Hi, You could simply create an autocomplete Solr Core with a simple schema consisting of id, from, to: Let the fieldType of from be String, and in the fieldType of to you can use StandardTokenizer, WordDelimiterFilter and EdgeNGramFilter. add doc field

Re: Sorting not working on a string field

2010-09-13 Thread Jan Høydahl / Cominvent
Hi, Could you show us what result you actually get? Wouldn't it make more sense to choose a numeric fieldtype? To get proper sort order of numbers in a string field, all numbers need to be exactly the same length since the order will be lexicographical, i.e. 10 will come before 2, but after 02. -- Jan
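The effect is easy to reproduce outside Solr. This small sketch uses plain Python string sorting as a stand-in for sorting on a string field; the padding width of 4 is an arbitrary assumption.

```python
def zero_pad(value: str, width: int = 4) -> str:
    """Left-pad a numeric string with zeros so all values have equal length."""
    return value.zfill(width)


# Strings are compared character by character, not by numeric value.
lexicographic = sorted(["10", "2", "02"])

# With equal-length values, string order and numeric order agree.
padded = sorted(zero_pad(v) for v in ["10", "2", "33"])

if __name__ == "__main__":
    print(lexicographic)  # ['02', '10', '2']
    print(padded)         # ['0002', '0010', '0033']
```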

Re: mm=0?

2010-09-13 Thread Jan Høydahl / Cominvent
As Erick points out, you don't want a random doc as response! What you're looking at is how to avoid the 0 hits problem. You could look into one of these: * Introduce autosuggest to avoid many 0-hits cases * Introduce spellchecking * Re-run the failed query with fuzzy turned on (e.g. alpha~) *

Re: Date faceting +1MONTH problem

2010-09-10 Thread Jan Høydahl / Cominvent
Just attended a talk at JavaZone (www.javazone.no) by Stephen Colebourne about JSR-310 which will make these kinds of operations easier in a future JDK, and how Joda-Time goes a long way towards enabling it today. I'm not saying it would fix your GAP issue, as it's all about what definition of month

Re: In Need of Direction; Phrase-Context Tracking / Injection (Child Indexes) / Dismissal

2010-09-06 Thread Jan Høydahl / Cominvent
there's a lull. Thank you, - Scott On Fri, Sep 3, 2010 at 1:19 AM, Jan Høydahl / Cominvent jan@cominvent.com wrote: Hi, This smells like a job for Hadoop and perhaps Mahout, unless your use cases are totally ad-hoc research. After Nutch has fetched the sites, kick off some MapReduce

Re: how to deal with virtual collection in solr?

2010-09-03 Thread Jan Høydahl / Cominvent
) * -Original Message- From: Jan Høydahl / Cominvent [mailto:jan@cominvent.com] Sent: Tuesday, August 31, 2010 2:15 PM To: solr-user@lucene.apache.org Subject: Re: how to deal with virtual collection in solr? Hi, If you have multiple cores defined in your solr.xml you need to issue

Re: In Need of Direction; Phrase-Context Tracking / Injection (Child Indexes) / Dismissal

2010-09-03 Thread Jan Høydahl / Cominvent
Hi, This smells like a job for Hadoop and perhaps Mahout, unless your use cases are totally ad-hoc research. After Nutch has fetched the sites, kick off some MapReduce jobs for each case you wish to study: 1. Extract phrases/contexts 2. For each context, perform detection and whitelisting 3. In

Re: how to deal with virtual collection in solr?

2010-08-31 Thread Jan Høydahl / Cominvent
?literal.collection=aaprivate&literal.id=doc1&commit=true; -F fi...@myfile.xml Thanks so much as always! Xiaohui -Original Message- From: Jan Høydahl / Cominvent [mailto:jan@cominvent.com] Sent: Friday, August 27, 2010 7:42 AM To: solr-user@lucene.apache.org Subject: Re: how to deal

Re: how to deal with virtual collection in solr?

2010-08-27 Thread Jan Høydahl / Cominvent
(CompositeParser.java:119) ... 24 more RequestURI=/solr/lhcpdf/update/extract (Powered by Jetty://, http://jetty.mortbay.org/) *** -Original Message- From: Jan Høydahl / Cominvent [mailto:jan@cominvent.com

Re: Creating new Solr cores using relative paths

2010-08-27 Thread Jan Høydahl / Cominvent
Yes, this is really a pain sometimes. I'd prefer a well defined base path, which could be assumed everywhere unless otherwise documented. SolrHome is one natural choice. For backward compat we could add a config in solr(config).xml to easily switch to old behaviour. Also, it makes sense to

Re: how to deal with virtual collection in solr?

2010-08-25 Thread Jan Høydahl / Cominvent
1. Currently we use Verity and have more than 20 collections, each collection has an index for public items and an index for private items. So there are virtual collections which point to each collection and a virtual collection which points to all. For example, we have AA and BB collections.

Re: Scoring of documents, boost partial and exact hits in one field

2010-08-22 Thread Jan Høydahl / Cominvent
Hi, Try a wildcard term with lower score: q=title:work AND title:work*&debugQuery=true You will now see from the debug printout that you get an extra boost for workload. -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com Training in Europe - www.solrtraining.com On 22.

Re: How to get most indexed keyword from SOLR

2010-08-20 Thread Jan Høydahl / Cominvent
Check out the luke request handler: http://localhost:8983/solr/admin/luke?fl=my_ad_field&numTerms=100 - you'll find topTerms for the fields specified -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com Training in Europe - www.solrtraining.com On 20. aug. 2010, at 11.39,

Re: Solr data type for date faceting

2010-08-19 Thread Jan Høydahl / Cominvent
Yes, I forgot that strings support alphanumeric ranges. However, they will potentially be very memory intensive since you don't get the trie-optimization and since strings take up more space than ints. Only way is to try it out. -- Jan Høydahl, search solution architect Cominvent AS -

Re: Missing tokens

2010-08-19 Thread Jan Høydahl / Cominvent
Høydahl / Cominvent jan@cominvent.com To: solr-user@lucene.apache.org Date: 18/08/2010 23:16 Subject: Re: Missing tokens Cannot see anything obvious... Try http://localhost/solr/select?q=contents:OB10* http://localhost/solr/select?q=contents:OB 10 http://localhost/solr

Re: improving search response time

2010-08-19 Thread Jan Høydahl / Cominvent
It is crucial to MEASURE your system to confirm your bottleneck. I agree that you are very likely to be disk I/O bound with so little memory left for the OS, a large index and many terms in each query. Have your IT guys do some monitoring on your disks and log this while under load. Then you

Re: Basic conceptual questions about solr

2010-08-19 Thread Jan Høydahl / Cominvent
Hi, You can place Solr wherever you want, but if your data is very large, you'd want a dedicated box. Have a look at DIH (http://wiki.apache.org/solr/DataImportHandler). It can both crawl a file share periodically, indexing only files changed since a timestamp (can be e.g. NOW-1HOUR) and

Re: Function query to boost scores by a constant if all terms are present

2010-08-18 Thread Jan Høydahl / Cominvent
You can use the map() function for this, see http://wiki.apache.org/solr/FunctionQuery#map q=a fox&defType=dismax&qf=allfields&bf=map(query($qq),0,0,0,100.0)&qq=allfields:(quick AND brown AND fence) This adds a constant boost of 100.0 if the $qq query returns a non-zero score, which it does
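To make the arithmetic concrete, here is a rough Python model of what map(x,min,max,target,default) computes. It is only an illustration of the semantics as described on the FunctionQuery wiki page, not Solr code; solr_map and constant_boost are hypothetical names.

```python
def solr_map(x, lo, hi, target, default=None):
    """Model of Solr's map(): values within [lo, hi] become target.

    With the five-argument form, everything outside the range becomes
    default; with four arguments the value passes through unchanged.
    """
    if lo <= x <= hi:
        return target
    return x if default is None else default


def constant_boost(qq_score: float) -> float:
    """Model of bf=map(query($qq),0,0,0,100.0)."""
    return solr_map(qq_score, 0, 0, 0, 100.0)


if __name__ == "__main__":
    print(constant_boost(0.0))   # document does not match $qq
    print(constant_boost(1.73))  # document matches $qq
```

A score of exactly 0 stays 0, and any other score is replaced by the constant 100.0, regardless of how well the document matched $qq.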

Re: Solr data type for date faceting

2010-08-18 Thread Jan Høydahl / Cominvent
If you want to change the schema on the live index, make sure you do a compatible change, as Solr does not do any type checking or schema change validation. I would ADD a field with another name for the tint field. Unfortunately you have to re-index to have an index built on this field. May I

Re: Missing tokens

2010-08-18 Thread Jan Høydahl / Cominvent
Hi, Can you share with us how your schema looks for this field? What FieldType? What tokenizer and analyser? How do you parse the PDF document? Before submitting to Solr? With what tool? How do you do the query? Do you get the same results when doing the query from a browser, not SolrJ? -- Jan

Re: improving search response time

2010-08-18 Thread Jan Høydahl / Cominvent
Some questions: a) What operating system? b) What Java container (Tomcat/Jetty) c) What JAVA_OPTIONS? I.e. memory, garbage collection etc. d) Example queries? I.e. what features, how many facets, sort fields etc e) How do you load balance queries between the slaves? f) What is your search latency

Re: Solr's Index Live Updates

2010-08-18 Thread Jan Høydahl / Cominvent
Hi, I'm afraid you'll have to post the full document again, then do a commit. But it WILL be lightning fast, as it is only the updated document which is indexed, all the other existing documents will not be re-indexed. -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com

Re: Wiki documentation Packaged as single HTML or PDF

2010-08-16 Thread Jan Høydahl / Cominvent
Use a tool to download a site to local disk, and ship the resulting HTML as a folder or ZIP. If that is not good enough, consider shipping the Reference Guide by LucidImagination. It is one PDF and contains most of what you need. The customer may be confused by LucidWorks specific chapters but

Re: solr query result not read the latest xml file

2010-08-11 Thread Jan Høydahl / Cominvent
Hi, Yes, this is normal behavior. This is because Solr is *document* based, it does not know about *files*. What happens here is that your source database (or whatever) has had deletions within this category in addition to updates, and you need to relay those to Solr. The best way to

Re: timestamp field

2010-08-11 Thread Jan Høydahl / Cominvent
Hi, Which time zone are you located in? Do you have DST? Solr uses UTC internally for dates, which means that NOW will be the time in London right now :) Does that appear to be right 4 u? Also see this thread: http://search-lucene.com/m/hqBed2jhu2e2/ -- Jan Høydahl, search solution architect

Re: Delta-import with solrj client

2010-08-11 Thread Jan Høydahl / Cominvent
Hi, Make sure you use a proper ID field, which does *not* change even if the content in the database changes. In this way, when your delta-import fetches changed rows to index, they will update the existing rows in your index. -- Jan Høydahl, search solution architect Cominvent AS -

Re: how to support implicit trailing wildcards

2010-08-11 Thread Jan Høydahl / Cominvent
=mount OR mount* have different sorting order with q=mount for those documents including mount. Change to q=mount^100 OR (mount?* -mount)^1.0, and test well. Thanks very much! 2010/8/10 Jan Høydahl / Cominvent jan@cominvent.com Hi, You don't need to duplicate the content into two

Re: Analysing SOLR logfiles

2010-08-11 Thread Jan Høydahl / Cominvent
Have a look at www.splunk.com -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com Training in Europe - www.solrtraining.com On 11. aug. 2010, at 19.34, Jay Flattery wrote: Hi there, Just wondering what tools people use to analyse SOLR log files. We're looking to

Re: bug or feature???

2010-08-11 Thread Jan Høydahl / Cominvent
Your syntax looks a bit funny. Which version of Solr are you using? Pure negative queries are not supported, try q=(*:* -title:janitor) instead. Also, for debugging what's going on, please add debugQuery=true and share the parsed query for both cases with us. -- Jan Høydahl, search solution

Re: Indexing and ExtractingRequestHandler

2010-08-11 Thread Jan Høydahl / Cominvent
Hi, You can try Tika command line to parse your Excel file, then you will see the exact textual output from it, which will be indexed into Solr, and thus inspect whether something is missing. Are you sure you use a version of Luke which supports your version of Lucene? -- Jan Høydahl, search

Re: Indexing fieldvalues with dashes and spaces

2010-08-10 Thread Jan Høydahl / Cominvent
Hi, Try solr.KeywordTokenizerFactory. However, in your case it looks as if you have certain requirements for searching that require tokenization. So you should leave the WhitespaceTokenizer as is and create a separate field specially for the faceting, with indexed=true, stored=false and

Re: how to support implicit trailing wildcards

2010-08-10 Thread Jan Høydahl / Cominvent
Hi, You don't need to duplicate the content into two fields to achieve this. Try this: q=mount OR mount* The exact match will always get a higher score than the wildcard match because wildcard matches use constant scoring. Making this work for multi term queries is a bit trickier, but something

Re: solr query result not read the latest xml file

2010-08-10 Thread Jan Høydahl / Cominvent
Hi, Beware that post.jar is just an example tool to play with the default example index located at /solr/ namespace. It is very limited and you should look elsewhere for a more production ready and robust tool. However, it has the ability to specify a custom url. Please try: java -jar post.jar

Re: delete Problem..

2010-08-10 Thread Jan Høydahl / Cominvent
Hi, Since EMAIL_HEADER_FROM is a String type, you need to specify the whole field every time. Wildcards could also work, but you'll get a problem with leading wildcards. The solution would be to change the fieldType into a text type using e.g. StandardTokenizerFactory - if this does not break

Re: stemming the index

2010-08-06 Thread Jan Høydahl / Cominvent
Check out slides 36-38 in this presentation for some hints on a possible solution: http://www.slideshare.net/janhoy/migrating-fast-to-solr-jan-hydahl-cominvent-as-euro-con -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com Training in Europe - www.solrtraining.com On 7.

Re: Deleting old index data from solr. But HDD spaces doesn`t free.

2010-08-06 Thread Jan Høydahl / Cominvent
What you are missing is a final server.optimize(); Deleting a document will only mark it as deleted in the index until an optimize. If disk space is a real problem in your case because you e.g. update all docs in the index frequently, you can trigger an optimize(), say nightly. -- Jan Høydahl,

Re: how to create a custom type in Solr

2010-08-06 Thread Jan Høydahl / Cominvent
Your use case can be solved by splitting the range into two int's: Document: {title: My document, from: 8000, to: 9000} Query: q=title:My AND (from:[* TO 8500] AND to:[8500 TO *]) -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com Training in Europe - www.solrtraining.com
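A small sketch of the same idea in Python, assuming the two field names from the example (from, to). contains_query builds the Solr range clause, and matches shows the equivalent check on a single document; both names are hypothetical.

```python
def contains_query(value: int, from_field: str = "from", to_field: str = "to") -> str:
    """Build the clause that finds documents whose range contains value."""
    return f"{from_field}:[* TO {value}] AND {to_field}:[{value} TO *]"


def matches(doc: dict, value: int) -> bool:
    """The same condition evaluated on one document, for illustration."""
    return doc["from"] <= value <= doc["to"]


if __name__ == "__main__":
    doc = {"title": "My document", "from": 8000, "to": 9000}
    print(contains_query(8500))
    print(matches(doc, 8500), matches(doc, 9500))
```

The value lies inside the document's range exactly when from is at most the value and to is at least the value, which is what the two range clauses express.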

Re: SOLR QUERY

2010-08-06 Thread Jan Høydahl / Cominvent
Another way is to use DisMax parser, and give it a qf=field1 field2 field3... parameter, and it will automatically search in all fields specified. It is more powerful than having one default field, and saves that disk space. But you sacrifice some extra resources during querying. -- Jan

Re: Unicode processing - Issue with CharStreamAwareWhitespaceTokenizerFactory

2010-07-06 Thread Jan Høydahl / Cominvent
The Char-filters MUST come before the Tokenizer, due to their nature of processing the character-stream and not the tokens. If you need to apply the accent normalization later in the analysis chain, either use ISOLatin1AccentFilterFactory or help with the implementation of SOLR-1978. -- Jan

Re: SolrJ-1.4.0 client needs slf4j-jdk14-1.5.5 library on J2SE 1.5 Update 21

2010-07-03 Thread Jan Høydahl / Cominvent
Hi, SolrJ uses slf4j logging. As you can read on the wiki http://wiki.apache.org/solr/Solrj#Solr_1.4 you need to provide the slf4j-jdk14 binding (or any other log framework you wish to bind to) yourself and add the jar to your classpath. -- Jan Høydahl, search solution architect Cominvent AS

Re: Use free text to search against boolean fields?

2010-07-02 Thread Jan Høydahl / Cominvent
Hi, I would rather go for the boolean variant and spend some time writing a query parser which tries to understand all kinds of input people may make, mapping it into boolean filters. In this way you can support both navigation and search and keep both in sync whatever people prefer to start

Re: Dilemma - Very Frequent Synonym updates for Huge Index

2010-07-01 Thread Jan Høydahl / Cominvent
Hi, I think I would look at a hybrid approach, where you keep adding new synonyms to a query-side synonym dictionary for immediate effect. And then every now and then or every Nth night you move those synonyms over to the index-side dictionary and trigger a full reindex. A nice side effect of

Re: Very basic questions: Faceted front-end?

2010-07-01 Thread Jan Høydahl / Cominvent
Have you had a look at www.twigkit.com ? Could be worth the bucks... -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com Training in Europe - www.solrtraining.com On 1. juli 2010, at 00.59, Peter Spam wrote: Wow, thanks Lance - it's really fast now! The last piece of

Re: Multilingual - Search against the appropriate field

2010-07-01 Thread Jan Høydahl / Cominvent
Hi, I have chosen the same approach as you, indexing content into text_language fields with custom analysis, and it works great. Solr does not have any overhead with this even if there are hundreds of languages, due to the schema-less nature of Lucene. And if you know which language is being

SolrJ: BinaryRequestWriter with StreamingUpdateSolrServer

2010-07-01 Thread Jan Høydahl / Cominvent
Hi, I had the impression that the StreamingUpdateSolrServer in SolrJ would automatically use the /update/javabin UpdateRequestHandler. Is this not true? Do we need to call server.setRequestWriter(new BinaryRequestWriter()) for it to transmit content with the binary protocol? -- Jan Høydahl,

Re: DisMax, multi fields, and phrase fields

2010-07-01 Thread Jan Høydahl / Cominvent
Hi, Check out the new eDisMax handler (src) and the new pf2 parameter. Also available as patch SOLR-1553. Another option to avoid a match for doc2 is to add application-specific logic in your frontend which detects car brands and years and rewrites the query into a phrase or a filter. -- Jan

Re: Is there a way to delete multiple documents using wildcard?

2010-06-30 Thread Jan Høydahl / Cominvent
Hi, You need to use HTTP POST in order to send those parameters I believe. Try with curl: curl "http://localhost:8983/solr/update?commit=true" -H "Content-Type: text/xml" --data-binary "<delete><query>uid:6-HOST*</query></delete>" -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com
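The same delete-by-query POST can be sketched in plain Java with only the standard library. The class and method names are hypothetical, and the update URL is whatever your installation uses (the message uses http://localhost:8983/solr/update?commit=true). The query text is XML-escaped before it is embedded in the body.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;

public class DeleteByQuery {

    // Wraps a query in the <delete><query>...</query></delete> update message,
    // escaping the characters that are special in XML text.
    static String deleteXml(String query) {
        String escaped = query.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;");
        return "<delete><query>" + escaped + "</query></delete>";
    }

    // POSTs the XML body to the given update URL and returns the HTTP status code.
    static int post(String updateUrl, String xml) throws IOException {
        HttpURLConnection conn =
                (HttpURLConnection) URI.create(updateUrl).toURL().openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", "text/xml; charset=UTF-8");
        try (OutputStream out = conn.getOutputStream()) {
            out.write(xml.getBytes(StandardCharsets.UTF_8));
        }
        return conn.getResponseCode();
    }

    public static void main(String[] args) throws IOException {
        String xml = deleteXml("uid:6-HOST*");
        System.out.println(xml);
        // Only contact a server when an update URL is passed on the command line.
        if (args.length > 0) {
            System.out.println("HTTP " + post(args[0], xml));
        }
    }
}
```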

Re: Is there a way to delete multiple documents using wildcard?

2010-06-30 Thread Jan Høydahl / Cominvent
Hmm, nice one - I was not aware of that trick. -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com Training in Europe - www.solrtraining.com On 30. juni 2010, at 18.41, bbarani wrote: Hi, I was able to successfully delete multiple documents using the below URL

Re: optional vs. probhibited aka standard vs. dismax handler

2010-06-29 Thread Jan Høydahl / Cominvent
Hi, In DisMax the mm parameter controls whether terms are required or optional. The default is 100% which means all terms required, i.e. you do not need to add +. You can change to mm=0 and you will get the same behaviour as standard parser, i.e. an OR behaviour, where the + would say that a

Re: use copyField to gather and then split

2010-06-29 Thread Jan Høydahl / Cominvent
Hi pal :) Unfortunately copyField works only BEFORE analysis and you cannot chain them... The simplest solution would be to duplicate your copyFields: <copyField source="title" dest="textanayzemethod2"/> <copyField source="body" dest="textanayzemethod2"/> <copyField source="title" dest="textanayzemethod1"

Re: optional vs. probhibited aka standard vs. dismax handler

2010-06-29 Thread Jan Høydahl / Cominvent
AS - www.cominvent.com Training in Europe - www.solrtraining.com On 29. juni 2010, at 14.02, Lukas Kahwe Smith wrote: On 29.06.2010, at 13:38, Lukas Kahwe Smith wrote: On 29.06.2010, at 13:24, Jan Høydahl / Cominvent wrote: Hi, In DisMax the mm parameter controls whether terms are required

Re: preside != president

2010-06-28 Thread Jan Høydahl / Cominvent
Hi, You might also want to check out the new Lucene-Hunspell stemmer at http://code.google.com/p/lucene-hunspell/ It uses OpenOffice dictionaries with known stems in combination with a large set of language specific rules. It handles your example, but it is an early release, so test it

Re: Configuring RequestHandler in solrconfig.xml OR in the Servlet code using SolrJ

2010-06-22 Thread Jan Høydahl / Cominvent
Hi, Sometimes I do both. I put the defaults in solrconfig.xml and thus have one place to define all kinds of low-level default settings. But then I also make it possible in the application space to add/override any parameters. This gives you great flexibility to let server administrators

Re: SolrJ: Setting multiple parameters

2010-06-20 Thread Jan Høydahl / Cominvent
Or simply use add(), because setParam overrides existing hashMap key: solrQuery.setParam("stats.facet", "fieldA"); solrQuery.add("stats.facet", "fieldB"); solrQuery.add("stats.facet", "fieldC"); -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com Training in Europe -
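The difference between replacing and appending can be illustrated with a small stand-alone model. This is not SolrJ code: the class below is a hypothetical multi-valued parameter map written only to show why a second set-style call loses the first value while add keeps both.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class MultiValueParams {
    private final Map<String, List<String>> params = new LinkedHashMap<>();

    // Replaces whatever is stored under the key, like an ordinary map put.
    void set(String name, String value) {
        List<String> values = new ArrayList<>();
        values.add(value);
        params.put(name, values);
    }

    // Appends to the values already stored under the key.
    void add(String name, String value) {
        params.computeIfAbsent(name, k -> new ArrayList<>()).add(value);
    }

    List<String> get(String name) {
        return params.getOrDefault(name, List.of());
    }

    public static void main(String[] args) {
        MultiValueParams p = new MultiValueParams();
        p.set("stats.facet", "fieldA");
        p.set("stats.facet", "fieldB"); // fieldA is replaced here
        p.add("stats.facet", "fieldC"); // fieldC is appended after fieldB
        System.out.println(p.get("stats.facet"));
    }
}
```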

MappingCharFilterFactory equivalent for use after tokenizer?

2010-06-18 Thread Jan Høydahl / Cominvent
Hi, Is there a token filter which does the same job as MappingCharFilterFactory but after the tokenizer, reading the same config file? -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com Training in Europe - www.solrtraining.com

Re: dismax and AND as the default operator

2010-06-18 Thread Jan Høydahl / Cominvent
Standard DisMax does not fully support explicit AND/OR. You can prove that by trying q=fuel+OR+cell and seeing that the score stays the same (given mm=100%). It appears that DisMax does SOME intelligent handling of AND/OR/NOT, because it adds the + on the AND and a - on the NOT. But adding a
