Re: Solr contribs build and jar-of-jars

2012-08-20 Thread Chantal Ackermann
Hi Lance, does this do what you want? http://maven.apache.org/plugins/maven-assembly-plugin/descriptor-refs.html#jar-with-dependencies It's Maven, but that would be an advantage I'd say… ;-) Chantal On 05.08.2012 at 01:25, Lance Norskog wrote: Has anybody tried packaging the contrib

Re: Diversifying Search Results - Custom Collector

2012-08-20 Thread Mikhail Khludnev
Hello, I've got the problem description below. Can you explain the expected user experience, and/or solution approach before diving into the algorithm design? Thanks On Sat, Aug 18, 2012 at 2:50 AM, Karthick Duraisamy Soundararaj karthick.soundara...@gmail.com wrote: My problem is that when

Upgrade solr 3.4 to solr 3.6.1 without rebuilding the existing index ?

2012-08-20 Thread Dominique Bejean
Hi, I think the response is yes, but I need to check. Is it possible to upgrade from solr 3.4 to solr 3.6.1 without rebuilding the existing index ? Thank you. Dominique

Re: how to retrieve total token count per collection/index

2012-08-20 Thread tech.vronk
On 09.08.2012 at 18:02, Robert Muir wrote: On Thu, Aug 9, 2012 at 10:20 AM, tech.vronk t...@vronk.net wrote: Hello, I wonder how to figure out the total token count in a collection (per index), i.e. the size of a corpus/collection measured in tokens. You want to use this statistic, which

RE: Get results only from the last hour

2012-08-20 Thread Markus Jelsma
Date queries are described here: http://wiki.apache.org/solr/SolrQuerySyntax You must first make sure your dates end up in a Date fieldType and are in the proper format. -Original message- From:Dotan Cohen dotanco...@gmail.com Sent: Mon 20-Aug-2012 13:57 To:
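As a rough illustration of the date-math range syntax that wiki page covers, a filter for the last hour could be built as below. This is only a sketch: the field name `timestamp`, the host, and the `/solr/select` path are assumptions, not details from the thread, and the field must be a Date fieldType as noted above.

```python
from urllib.parse import urlencode


def last_hour_params(date_field="timestamp", query="*:*"):
    """Build request parameters that restrict results to the last hour.

    `date_field` is a hypothetical field name; substitute your own Date field.
    """
    # NOW-1HOUR is Solr date math: the current time minus one hour.
    return {"q": query, "fq": f"{date_field}:[NOW-1HOUR TO NOW]"}


params = last_hour_params()
# Hypothetical local Solr URL, used only to show the encoded request.
url = "http://localhost:8983/solr/select?" + urlencode(params)
print(url)
```

Putting the range in `fq` rather than `q` keeps it out of relevance scoring; whether that is appropriate depends on the application.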

Re: Get results only from the last hour

2012-08-20 Thread Dotan Cohen
On Mon, Aug 20, 2012 at 3:00 PM, Markus Jelsma markus.jel...@openindex.io wrote: Date queries are described here: http://wiki.apache.org/solr/SolrQuerySyntax Terrific, thank you! You must first make sure your dates end up in a Date fieldType and are in the proper format. Thanks. --

Re: scanned pdf with solr cell

2012-08-20 Thread Michael Della Bitta
It's pretty easy to accidentally run into the AWT stuff if you're doing anything that involves image processing, which I would expect a generic RTF parser might do. Michael Della Bitta Appinions | 18 East 41st St., Suite 1806 | New York, NY 10017

Re: How to index multivalued field tokens by their attached metadata?

2012-08-20 Thread Fuu
After pondering it for a while I decided to take the advice and write the processing as a separate program. It will probably be easier to pre-format the data with a scripting language anyways. Thank you for taking your time to reply. :) - Fuu -- View this message in context:

Re: Diversifying Search Results - Custom Collector

2012-08-20 Thread Karthick Duraisamy Soundararaj
Hello Mikhail, Thank you for the reply. In terms of user experience, I want to spread out the products from the same brand farther from each other, *at least* in the first 50-100 results we display. I am thinking about two different approaches as a solution.
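To make the stated goal concrete, here is one possible client-side sketch: a round-robin re-ordering of the top results by brand. It is not either of the approaches the poster has in mind (the preview is cut off before they are described) and it is not a custom Collector; the function name, the `brand_of` accessor, and the `window` default are all hypothetical.

```python
from collections import defaultdict, deque


def spread_by_brand(results, brand_of, window=100):
    """Re-order the first `window` results so same-brand items are spaced apart.

    Brands are visited round-robin in order of first appearance; within a
    brand the original (relevance) order is preserved. Results beyond the
    window are left untouched.
    """
    head, tail = results[:window], results[window:]
    buckets = defaultdict(deque)
    order = []  # brands in order of first appearance
    for doc in head:
        brand = brand_of(doc)
        if brand not in buckets:
            order.append(brand)
        buckets[brand].append(doc)

    spread = []
    while any(buckets[b] for b in order):
        for b in order:
            if buckets[b]:
                spread.append(buckets[b].popleft())
    return spread + tail


docs = [("p1", "A"), ("p2", "A"), ("p3", "B"), ("p4", "A"), ("p5", "C")]
print(spread_by_brand(docs, brand_of=lambda d: d[1]))
```

Note the limitation: once the other brands run out, the remaining items of a dominant brand end up adjacent at the end of the window, and relevance order is only preserved within each brand.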

Re: Upgrade solr 3.4 to solr 3.6.1 without rebuilding the existing index ?

2012-08-20 Thread Jack Krupansky
At the Lucene level the index should be 100% compatible, but I don't know with 100% certainty whether there may be subtle changes in any field type analyzers or token filters, such as in the example schema. You might want to read SOLR-2519 and see whether your fields and field types may be

solr finds allways all documents

2012-08-20 Thread robert rottermann
Hi there, I am new to Solr et al. Besides, I am a Java noob. What I am doing: I want to do full-text retrieval on office documents. The metadata of these documents is maintained in PostgreSQL. So the only information I need to get out of Solr is a document ID. My problem now is that my index

Re: solr finds allways all documents

2012-08-20 Thread Sven Maurmann
Dear Robert, could you give me a little more information about your setting? For example the complete solrconfig.xml and the complete schema.xml would definitely help. Best, Sven -- kippdata informationstechnologie GmbH Sven Maurmann Tel: 0228 98549 -12 Bornheimer Str. 33a

Re: Upgrade solr 3.4 to solr 3.6.1 without rebuilding the existing index ?

2012-08-20 Thread Erick Erickson
The CHANGES.txt file (make sure to look in the Lucene version as well as Solr) will have, for each new version, a section about upgrading from that should answer for you... Best Erick On Mon, Aug 20, 2012 at 3:13 AM, Dominique Bejean dominique.bej...@eolya.fr wrote: Hi, I think the

Re: Diversifying Search Results - Custom Collector

2012-08-20 Thread Karthick Duraisamy Soundararaj
Tanguy, Your idea is perfect for cases where there are too many documents, with 80-90% of the documents having the same value for a particular field. As an example, your idea is ideal when, let's say, we have 10 documents in total like this: doc1 : merchantName Kellog's /merchantName doc2 :

Re: solr finds allways all documents

2012-08-20 Thread Jack Krupansky
How are you ingesting the office documents? SolrCell, or some other method? Do you have CopyFields? What fields are you querying on? What does your text field type look like? -- Jack Krupansky -Original Message- From: robert rottermann Sent: Monday, August 20, 2012 10:39 AM To:

Solr Custom Filter Factory - How to pass parameters?

2012-08-20 Thread ksu wildcats
We are using SOLR and are in the process of adding custom filter factory to handle the processing of words/tokens to suit our needs. Here is what our custom filter factory does 1) Reads the tokens and does some analysis and writes the result of analysis to database. We are using Embedded Solr

Re: Solr Custom Filter Factory - How to pass parameters?

2012-08-20 Thread Jack Krupansky
First, the obvious question: What kind of information? Be specific. Second, you can pass parameters to your filter factory in your field type definitions. You could have separate schemas or separate field types for the different indexes. Is there anything this doesn't cover? You can also

Re: Solr Custom Filter Factory - How to pass parameters?

2012-08-20 Thread ksu wildcats
Thanks Jack. The information I want to pass is the databasename into which the analyzed data needs to be inserted. As i was saying earlier, the set up we have is 1) we use embedded solr server with multi cores - embedded into our webapp 2) support one index for each client - each client has a

RE: Solr Custom Filter Factory - How to pass parameters?

2012-08-20 Thread Markus Jelsma
-Original message- From:ksu wildcats ksu.wildc...@gmail.com Sent: Mon 20-Aug-2012 20:28 To: solr-user@lucene.apache.org Subject: Re: Solr Custom Filter Factory - How to pass parameters? Thanks Jack. The information I want to pass is the databasename into which the analyzed

Re: Upgrade solr 3.4 to solr 3.6.1 without rebuilding the existing index ?

2012-08-20 Thread Dominique Bejean
Thank you to both of you. On 20/08/12 at 17:28, Erick Erickson wrote: The CHANGES.txt file (make sure to look in the Lucene version as well as Solr) will have, for each new version, a section about upgrading from that should answer for you... Best Erick On Mon, Aug 20, 2012 at 3:13 AM,

Re: Diversifying Search Results - Custom Collector

2012-08-20 Thread Mikhail Khludnev
Hello, I don't believe your task can be solved by playing with scoring/collector or shuffling. For me it's absolutely a Grouping use case (although I don't really know this feature well). Grouping cannot solve the problem because I don't want to limit the number of results shown based on the

RE: Solr Custom Filter Factory - How to pass parameters?

2012-08-20 Thread ksu wildcats
Thanks Markus. Links are helpful. I will give it a try and see if that solves my problem. -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-Custom-Filter-Factory-How-to-pass-parameters-tp4002217p4002248.html Sent from the Solr - User mailing list archive at Nabble.com.

Re: Near Real Time + Facets + Hierarchical Faceting (Pivot Table) with Date Range: huge data set

2012-08-20 Thread Fuad Efendi
NRT does not work because the index updates hundreds of times per second vs. a cache warm-up time of a few minutes… and we are in a loop… allowing you to query your huge index in ms. Solr also allows you to query in ms. What is the difference? No one can sort 1,000,000 terms in descending counts order faster

Shingle and PositionFilterFactory question

2012-08-20 Thread Carrie Coy
I am trying to use shingles and position filter to make a query for foot print, for example, match either foot print or footprint. From the docs: using the PositionFilter http://wiki.apache.org/solr/PositionFilter in combination makes it possible to make all shingles synonyms of each other.

UnInvertedField limitations

2012-08-20 Thread Fuad Efendi
Hi All, I have a problem… (Yonik, please!) help me, what are the term count limits? I possibly have 256,000,000 different terms in a field… or 16,000,000? Can I temporarily disable the feature? Thanks! 2012-08-20 16:20:19,262 ERROR [solr.core.SolrCore] - [pool-1-thread-1] - :

Grammar for ComplexPhraseQueryParser

2012-08-20 Thread vempap
Hello, Does anyone have the grammar file (.jj file) for the complex phrase query parser. The patch from https://issues.apache.org/jira/browse/SOLR-1604 does not have the grammar file as part of it. Thanks, Phani. -- View this message in context:

Re: Diversifying Search Results - Custom Collector

2012-08-20 Thread Karthick Duraisamy Soundararaj
Hi Mikhail, You are correct. [+] show 6 result.. will work but it wouldn't suit my requirements. This is a question of user experience, right? Imagine if the product manager comes to you and says I don't want to see [+] show 6 result.. and I want the results to be diverse but

Many fields versus join

2012-08-20 Thread Steven Livingstone Pérez
Hi folks. I read some posts in the past about this subject but nothing that definitively answers my question. I am trying to understand the trade-off when you use a large number of fields (not sure what a quantitative value of large is in Solr .. say 200 fields) versus a join - and even a multi

UnInvertedField limitations

2012-08-20 Thread Fuad Efendi
Hi All, I have a problem… (Yonik, please!) help me, what are the term count limits? I possibly have 256,000,000 different terms in a field… or 16,000,000? Thanks! 2012-08-20 16:20:19,262 ERROR [solr.core.SolrCore] - [pool-1-thread-1] - : org.apache.solr.common.SolrException: Too many values for

Re: Switch from Sphinx to Solr - some basics please

2012-08-20 Thread Lance Norskog
I have for example jobs from country A, jobs from country B and so on until 100 countries. I need to have for each country a separate index, because if someone searches for jobs in country A I need to query only the index for country A. How to solve this problem? Ah! Will the text be in different

Re: Atomic Multicore Operations - E.G. Move Docs

2012-08-20 Thread Nicholas Ball
hi lance, how would that work? generation is essentially versioning right? i also don't see why you need to use zk to do this as it's all on a single machine, was hoping for a simpler solution :) On Sun, 19 Aug 2012 19:26:41 -0700, Lance Norskog goks...@gmail.com wrote: I would use generation

Re: UnInvertedField limitations

2012-08-20 Thread Jack Krupansky
It appears that there is a hard limit of 24 bits, or 16M, for the number of bytes used to reference the terms in a single field of a single document. It takes 1, 2, 3, 4, or 5 bytes to reference a term. If it took 4 bytes, that would allow 16/4 or 4 million unique terms - per document. Do you have
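Taking the figures in this message at face value (a 24-bit byte budget and 1 to 5 bytes per term reference), the arithmetic can be worked through as below. This only restates the numbers quoted above; it is not derived from the UnInvertedField source, and the variable names are illustrative.

```python
LIMIT_BITS = 24
limit_bytes = 2 ** LIMIT_BITS  # 16,777,216 bytes, the "16M" in the message

# How many term references fit in the budget for each reference size.
for bytes_per_ref in range(1, 6):
    max_refs = limit_bytes // bytes_per_ref
    print(f"{bytes_per_ref} byte(s) per reference -> up to {max_refs:,} references")
```

At 4 bytes per reference this gives roughly the 4 million figure mentioned in the message.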

Re: UnInvertedField limitations

2012-08-20 Thread Lance Norskog
Is this required by your application? Is there any way to reduce the number of terms? A work around is to use shards. If your terms follow Zipf's Law each shard will have fewer than the complete number of terms. For N shards, each shard will have ~1/N of the singleton terms. For 2-count terms,

Re: Atomic Multicore Operations - E.G. Move Docs

2012-08-20 Thread Lance Norskog
Yes, by generations I meant versioning. The problem is that you have to have a central holder of the current generation number. ZK does this very well. It is a distributed synchronized file system for very small files. If you have a more natural place to store the current generation number, that's

Re: Many fields versus join

2012-08-20 Thread Erick Erickson
Join works best with a small number of unique values. Unfortunately, people often want to join on uniqueKey, which is by definition unique per document. The usual advice is to first try to flatten your data as much as possible. There's also some ongoing work on block joins that you may want to

Solr Score threshold 'reasonably', independent of results returned

2012-08-20 Thread Ramzi Alqrainy
Usually, search results are sorted by their score (how well the document matched the query), but it is common to need to support the sorting of supplied data too. Boosting affects the scores of matching documents in order to affect ranking in score-sorted search results. Providing a boost value,

mergeindex: what happens if there is deletion during index merging

2012-08-20 Thread Yandong Yao
Hi guys, http://wiki.apache.org/solr/MergingSolrIndexes says 'Using srcCore, care is taken to ensure that the merged index is not corrupted even if writes are happening in parallel on the source index'. What does this mean? If there are deletion requests during merging, will this