Re: Changing the default Fuzzy minSimilarity?

2010-12-15 Thread Jan Høydahl / Cominvent
A fuzzy query foo~ defaults to a similarity of 0.5, i.e. equal to foo~0.5. Just as an FYI, this isn't true in trunk (4.0) any more. The defaults were changed so that it never enumerates the entire dictionary (slow) like before; see https://issues.apache.org/jira/browse/LUCENE-2667 so,

Omitting tf but not positions

2010-12-15 Thread Jan Høydahl / Cominvent
Hi, I have a case where I use DisMax pf to boost on phrase match in a field. I use omitNorms=true to keep length normalization from messing with my scores. However, for some documents, the phrase foo bar occurs more than once in the same field, and I get an unintended TF boost for one of them

Re: [DIH] Example for SQL Server

2010-12-15 Thread Savvas-Andreas Moysidis
Hi Adam, we are using DIH to index off an SQL Server database (the freebie SQLExpress one ;) ). We have defined the following in our %TOMCAT_HOME%\solr\conf\data-config.xml: <dataConfig> <dataSource type="JdbcDataSource" name="mssqlDatasource"

Problem using curl in PHP to get Solr results

2010-12-15 Thread Dennis Gearon
I finally figured out how to use curl to GET results, i.e. just turn all spaces into '%20' in my type of queries. I'm using Solr spatial, and then searching in both the default text field and a couple of columns. Works fine in the browser. But if I query for it using curl in PHP, there's
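
Editor's sketch of the encoding step described above, in Python rather than the poster's PHP. The endpoint URL and the query parameters are assumptions made up for illustration; only the idea of percent-encoding spaces as '%20' comes from the message.

```python
from urllib.parse import quote, urlencode

# Hypothetical Solr endpoint, for illustration only.
SOLR_SELECT_URL = "http://localhost:8983/solr/select"


def build_select_url(base_url, params):
    """Return base_url with params percent-encoded (spaces become %20)."""
    # quote_via=quote encodes a space as %20 instead of the default '+'.
    return base_url + "?" + urlencode(params, quote_via=quote)


url = build_select_url(SOLR_SELECT_URL, {"q": "coffee shop", "wt": "json"})
print(url)
```

Encoding the whole parameter set, rather than replacing spaces by hand, also takes care of other reserved characters such as '&' or ':' inside a query value.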

Re: Problem using curl in PHP to get Solr results

2010-12-15 Thread pankaj bhatt
Hi, On Wed, Dec 15, 2010 at 2:52 PM, Dennis Gearon gear...@sbcglobal.net wrote: I finally figured out how to use curl to GET results, i.e. just turn all spaces into '%20' in my type of queries. I'm using Solr spatial, and then searching in both the default text field and a couple of

Re: Problem using curl in PHP to get Solr results

2010-12-15 Thread Stephen Weiss
Forgive me if this seems like a dumb question but have you tried the Apache_Solr_Service class? http://www.ibm.com/developerworks/library/os-php-apachesolr/index.html It's really quite good at handling the nuts and bolts of making the HTTP requests and decoding the responses for PHP. I almost

Re: Omitting tf but not positions

2010-12-15 Thread Robert Muir
On Wed, Dec 15, 2010 at 3:09 AM, Jan Høydahl / Cominvent jan@cominvent.com wrote: Any way to disable TF/IDF normalization without also disabling positions? See Similarity.tf(float) and Similarity.tf(int). If you want to change this for both terms and phrases, just override

French stemming / size of synonyms file

2010-12-15 Thread Emmanuel Bégué
Hello, According to the wiki http://wiki.apache.org/solr/LanguageAnalysis, the light stemmers for French (solr.FrenchLightStemFilterFactory and solr.FrenchMinimalStemFilterFactory) are only available for SOLR 3.1. Is there a way to make them work with 1.4.1? - - - Additionally, there is an

Re: Search with facet.pivot

2010-12-15 Thread Erik Hatcher
One oddity is the duplicated sections: <arr name="facet.pivot"> <str>root_category_name,parent_category_name,category</str> <str>root_category_id,parent_category_id,category_id</str> </arr> That's in your responseHeader twice. Perhaps something fishy caused by that? Is this hardcoded in your

R: limit the search results to one category

2010-12-15 Thread Andrea Gazzarini
Did you try with a filter query? Andrea Gazzarini -Original Message- From: sara motahari saramotah...@yahoo.com Date: Tue, 14 Dec 2010 17:34:52 To: solr-user@lucene.apache.org Reply-To: solr-user@lucene.apache.org Subject: limit the search results to one category Hi all, I am using a

Re: French stemming / size of synonyms file

2010-12-15 Thread Robert Muir
2010/12/15 Emmanuel Bégué medu...@gmail.com: Hello, According to the wiki http://wiki.apache.org/solr/LanguageAnalysis, the light stemmers for French (solr.FrenchLightStemFilterFactory and solr.FrenchMinimalStemFilterFactory) are only available for SOLR 3.1. Is there a way to make them work

Dataimport performance

2010-12-15 Thread Robert Gründler
Hi, we're looking for some comparison benchmarks for importing large tables from a MySQL database (full import). Currently, a full import of ~8 million rows from a MySQL database takes around 3 hours, on a quad-core machine with 16 GB of RAM and a RAID 10 storage setup. Solr is running on a

Re: [DIH] Example for SQL Server

2010-12-15 Thread Adam Estrada
Thanks all, testing here shortly and will report back ASAP. w/r, Adam On Wed, Dec 15, 2010 at 4:10 AM, Savvas-Andreas Moysidis savvas.andreas.moysi...@googlemail.com wrote: Hi Adam, we are using DIH to index off an SQL Server database (the freebie SQLExpress one ;) ). We have defined the

Problem with multicore

2010-12-15 Thread Jörg Agatz
Hello users, I have a problem with Solr 1.4.1 on Ubuntu 10.10. I downloaded the new version and extracted it, then copied the solr.xml from example/multicore/solr.xml to /examples/solr/solr.xml: <?xml version="1.0" encoding="UTF-8" ?> <!-- Licensed to the Apache Software Foundation (ASF) under

Re: Dataimport performance

2010-12-15 Thread Adam Estrada
What version of Solr are you using? Adam 2010/12/15 Robert Gründler rob...@dubture.com Hi, we're looking for some comparison-benchmarks for importing large tables from a mysql database (full import). Currently, a full-import of ~ 8 Million rows from a MySQL database takes around 3 hours,

Re: Dataimport performance

2010-12-15 Thread Erick Erickson
You're adding on the order of 750 rows (docs)/second, which isn't bad... Have you profiled the machine as this runs? Even just with top (assuming Unix)... because the very first question is always what takes the time: getting the data from MySQL, indexing, or I/O? If you aren't maxing out your
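
Editor's note: the "on the order of 750 rows/second" figure follows from the numbers in the original question (about 8 million rows in about 3 hours). A small worked check, where the function name is only an illustrative choice:

```python
def docs_per_second(doc_count, hours=0, minutes=0, seconds=0.0):
    """Average indexing throughput over the given elapsed time."""
    elapsed_seconds = hours * 3600 + minutes * 60 + seconds
    return doc_count / elapsed_seconds


# ~8 million rows in ~3 hours, as reported in the original post.
rate = docs_per_second(8_000_000, hours=3)
print(round(rate))
```

Since both inputs are approximate, the result is only good to about two significant figures.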

Re: Dataimport performance

2010-12-15 Thread Robert Gründler
What version of Solr are you using? Solr Specification Version: 1.4.1 Solr Implementation Version: 1.4.1 955763M - mark - 2010-06-17 18:06:42 Lucene Specification Version: 2.9.3 Lucene Implementation Version: 2.9.3 951790 - 2010-06-06 01:30:55 -robert Adam 2010/12/15 Robert Gründler

Re: Dataimport performance

2010-12-15 Thread Bernd Fehling
We are currently running Solr 4.x from trunk. -d64 -Xms10240M -Xmx10240M Total Rows Fetched: 24935988 Total Documents Skipped: 0 Total Documents Processed: 24568997 Time Taken: 5:55:19.104 24.5 million docs as XML from the filesystem in less than 6 hours. Maybe your MySQL is the bottleneck?

Re: Dataimport performance

2010-12-15 Thread Tim Heckman
2010/12/15 Robert Gründler rob...@dubture.com: The data-config.xml looks like this (only 1 entity): <entity name="track" query="select t.id as id, t.title as title, l.title as label from track t left join label l on (l.id = t.label_id) where t.deleted = 0" transformer="TemplateTransformer"

Re: Problem with multicore

2010-12-15 Thread Tommaso Teofili
Hi Jörg, I think the first thing you should check is your Ubuntu's encoding, the second one is file permissions (BTW why are you sudoing?). Did you try using the bash script under example/exampledocs named post.sh (use it like this: 'sh post.sh *.xml')? Cheers, Tommaso 2010/12/15 Jörg Agatz

Re: Dataimport performance

2010-12-15 Thread Robert Gründler
I've benchmarked the import already with 500k records, one time without the artists subquery, and one time without the join in the main query. Without subquery: 500k in 3 min 30 sec. Without join and without subquery: 500k in 2 min 30 sec. With subquery and with left join: 320k in 6 min 30 sec, so
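
Editor's sketch: converting the three timings above into docs/second makes the cost of the subquery and the join easier to compare. The figures are taken from the message; the labels and helper name are illustrative.

```python
def rate(docs, minutes, seconds):
    """Documents indexed per second for one benchmark run."""
    return docs / (minutes * 60 + seconds)


# Timings as reported in the message above.
runs = {
    "without subquery": rate(500_000, 3, 30),
    "without join and without subquery": rate(500_000, 2, 30),
    "with subquery and with left join": rate(320_000, 6, 30),
}

fastest = max(runs.values())
for name, docs_per_sec in runs.items():
    share = docs_per_sec / fastest
    print(f"{name}: {docs_per_sec:.0f} docs/s ({share:.0%} of fastest)")
```

On these numbers the full configuration runs at roughly a quarter of the speed of the plain query, which suggests most of the time goes to the subquery rather than to indexing itself.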

Lower level filtering

2010-12-15 Thread Michael Owen
Hi all, I'm currently using Solr and I've got a question about filtering on a lower level than filter queries. We want to be able to restrict the documents that can possibly be returned to a user's query. From another system we'll get a list of document unique ids for the user which is all the

Re: Lower level filtering

2010-12-15 Thread Stephen Green
On Wed, Dec 15, 2010 at 9:49 AM, Michael Owen michaelowe...@hotmail.com wrote: I'm currently using Solr and I've got a question about filtering on a lower level than filter queries. We want to be able to restrict the documents that can possibly be returned to a user's query. From another

Re: Lower level filtering

2010-12-15 Thread Savvas-Andreas Moysidis
It might not be practical in your case, but is it possible to get from that other system a list of ids the user is *not* allowed to see and somehow invert the logic in the filter? Regards, -- Savvas. On 15 December 2010 14:49, Michael Owen michaelowe...@hotmail.com wrote: Hi all, I'm
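
Editor's sketch of the "invert the logic" idea: build the filter from whichever list is shorter, the allowed ids or the denied ids. This is only an illustration under assumptions: the field name `id` is hypothetical, ids are assumed to be simple tokens that need no escaping, and the resulting string is meant to be passed as a Solr fq value.

```python
def build_filter(allowed, denied, field="id"):
    """Return a filter-query string using the shorter of the two id lists.

    allowed and denied are assumed to partition the id space for one user.
    """
    if not denied:
        return "*:*"    # nothing is hidden from this user
    if not allowed:
        return "-*:*"   # nothing is visible to this user
    if len(denied) < len(allowed):
        # Negative filter: exclude the few ids the user may not see.
        return "-{}:({})".format(field, " OR ".join(sorted(denied)))
    # Positive filter: include only the few ids the user may see.
    return "{}:({})".format(field, " OR ".join(sorted(allowed)))


print(build_filter({"a", "b", "c"}, {"d"}))
print(build_filter({"a"}, {"b", "c"}))
```

As a later reply in this thread points out, this does not help in the middle ground where both lists run to thousands of ids.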

Re: Dataimport performance

2010-12-15 Thread Tim Heckman
The custom import I wrote is a Java application that uses the SolrJ library. Basically, where I had sub-entities in the DIH config I did the mappings inside my Java code. 1. Identify a subset or chunk of the primary ids to work on (so I don't have to load everything into memory at once) and put
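
Editor's sketch of step 1 only (the remaining steps are cut off in this archive snippet). It is written in Python rather than the poster's Java/SolrJ, and the helper name and chunk size are assumptions chosen for illustration.

```python
from typing import Iterator, List, Sequence


def chunked(ids: Sequence[int], size: int) -> Iterator[List[int]]:
    """Yield consecutive slices of ids, each holding at most size entries."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(ids), size):
        yield list(ids[start:start + size])


# Ten primary ids processed four at a time, so only one chunk's rows
# would need to be held in memory at once.
batches = list(chunked(list(range(1, 11)), 4))
print(batches)
```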

Custom scoring for searching geographic objects

2010-12-15 Thread Pavel Minchenkov
Hi, Please give me advice on how to create custom scoring. I need the resulting documents to be ordered by how popular each term in the document is (popular = how many times it appears in the index) and by the length of the document (fewer terms = higher in the search results). For example, index

RE: Lower level filtering

2010-12-15 Thread Michael Owen
That was a quick response Steve! Sounds all great! Much appreciated. Definitely think specifying a bit filter is something that many people may find useful. I'll have a look at SOLR-2052 too. Thanks again, Mike Date: Wed, 15 Dec 2010 09:57:54 -0500 Subject: Re: Lower level filtering From:

RE: Lower level filtering

2010-12-15 Thread Michael Owen
Good point - though the inverse could be true where only a few documents are allowed and then a big list still exists. Even in the middle ground, it's still going to be a long list of thousands. Thanks Mike Date: Wed, 15 Dec 2010 14:58:33 + Subject: Re: Lower level filtering From:

Re: Lower level filtering

2010-12-15 Thread Erick Erickson
Here's the problem with what you're outlining: Solr/Lucene doc ids are NOT invariant, so the doc IDs you get from the other system will not be directly usable in the filter. But assuming the other system stores what you've defined as uniqueKey you could walk the index and get the doc IDs from

Copying the index from one solr instance to another

2010-12-15 Thread Robert Gründler
Hi again, let's say you have 2 solr Instances, which have both exactly the same configuration (schema, solrconfig, etc). Could it cause any troubles if we import an index from a SQL database on solr instance A, and copy the whole index to the datadir of solr instance B (both solr instances run

Re: Copying the index from one solr instance to another

2010-12-15 Thread Shawn Heisey
On 12/15/2010 10:05 AM, Robert Gründler wrote: Hi again, let's say you have 2 solr Instances, which have both exactly the same configuration (schema, solrconfig, etc). Could it cause any troubles if we import an index from a SQL database on solr instance A, and copy the whole index to the

Re: Copying the index from one solr instance to another

2010-12-15 Thread Robert Gründler
thanks for your feedback. we can shutdown both solr servers for the time of the copy-process, and both solr instances run the same version, so we should be ok. i'll let you know if we encounter any troubles. -robert On Dec 15, 2010, at 18:11 , Shawn Heisey wrote: On 12/15/2010 10:05 AM,

Parenthesis in query string

2010-12-15 Thread Tommaso Teofili
Hi all, I've just noticed a strange behavior (or, at least, I didn't expect that) when adding redundant parentheses to a query. Using the lucene query parser in Solr I get no results with the query: * ((( NOT (text:something))) AND date = 2010-12-15) * while I get the expected results when the

Re: Problem using curl in PHP to get Solr results

2010-12-15 Thread Dennis Gearon
I want to just pass the JSON through after qualifying the user's access to the site. Didn't want to spend the horsepower to receive it as PHP array syntax, run the risk of someone putting bad stuff in the contents and running 'exec()' on it, and then spend the extra horsepower to output

Dates BC

2010-12-15 Thread Agethle, Matthias
Hi everyone, does the solr.TrieDateField support dates BC? I indexed negative dates and I'm able to query them, but if I store them, they show up as positive dates. Thanks Matthias

Re: Copying the index from one solr instance to another

2010-12-15 Thread Rob Casson
just making sure that you're aware of the built-in replication: http://wiki.apache.org/solr/SolrReplication can pull the indexes, along with config files. cheers, rob 2010/12/15 Robert Gründler rob...@dubture.com: Hi again, let's say you have 2 solr Instances, which have both exactly

Re: facet.pivot for date fields

2010-12-15 Thread Adeel Qureshi
Thanks Pankaj - that was useful to know. I haven't used the query stuff before for facets, so that was good to know, but the problem is still there because I want the hierarchical counts, which is exactly what facet.pivot does. So e.g. I want to count for fieldC within fieldB and even fieldB

Re: Problem using curl in PHP to get Solr results

2010-12-15 Thread Andrew McCombe
Hi, you could use Solr's PHP serialized object output (wt=phps) and then convert it to JSON in your PHP: <?php echo json_encode(unserialize($results_from_solr)); ?> Regards, Andrew McCombe On 15 December 2010 17:49, Dennis Gearon gear...@sbcglobal.net wrote: I want to just pass the JSON through

Re: Problem using curl in PHP to get Solr results

2010-12-15 Thread Markus Jelsma
The GeoDistanceComponent triggers the problem. It may be an issue in the component but it could very well be a Solr issue. It seems you missed a very recent thread on this one. https://issues.apache.org/jira/browse/SOLR-2278 I finally figured out how to use curl to GET results, i.e. just turn

Re: Exceptions in Embedded Solr

2010-12-15 Thread Antoniya Statelova
I experienced this on an EmbeddedSolrServer which was running behind a tomcat process. After restarting the tomcat process 2-3 times (implying this also recreates the SolrServer every time as well) this issue went away but I don't know why it ever started. It looked like the searcher shutdown was

[Adding] Entities when indexing a DB

2010-12-15 Thread Adam Estrada
All, I have successfully indexed a single entity, but when I try multiple entities the second is skipped altogether. Is there something wrong with my config file? <?xml version="1.0" encoding="utf-8" ?> <dataConfig> <dataSource type="JdbcDataSource"

Re: [Adding] Entities when indexing a DB

2010-12-15 Thread Allistair Crossley
If mission.id and event.id have the same value, one document will overwrite the other in the index. Your ids need to be unique across all documents. I usually have a field id_original that I map the table id to, and then for id per entity I usually prefix it with the entity name in the value mapped to the
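
Editor's sketch of why the prefix matters, using a plain dictionary as a stand-in for an index keyed on the unique id. The entity names mission and event and the field id_original come from the message; the row data, the separator, and the helper name are assumptions for illustration.

```python
def make_doc(entity, row):
    """Build a document whose id is unique across entities."""
    return {
        "id": "{}-{}".format(entity, row["id"]),  # e.g. "mission-1"
        "id_original": row["id"],                 # the table's own id
        "title": row["title"],
    }


mission_row = {"id": 1, "title": "first mission"}
event_row = {"id": 1, "title": "first event"}

# Keyed on the raw table id, the second row replaces the first.
raw_index = {}
for row in (mission_row, event_row):
    raw_index[row["id"]] = row

# Keyed on the prefixed id, both documents survive.
prefixed_index = {}
for entity, row in (("mission", mission_row), ("event", event_row)):
    doc = make_doc(entity, row)
    prefixed_index[doc["id"]] = doc

print(len(raw_index), len(prefixed_index))
```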

Re: Dates BC

2010-12-15 Thread Chris Hostetter
: does the solr.TrieDateField support dates BC? : I indexed negative dates and I'm able to query them, : but if I store them, they show up as positive dates. Hmm... definitely seems to be a bug. I *think* this is another manifestation of SOLR-1899 (because of how the hokey formatting code

Re: Custom scoring for searching geographic objects

2010-12-15 Thread Grant Ingersoll
Have a look at http://lucene.apache.org/java/3_0_2/scoring.html on how Lucene's scoring works. You can override the Similarity class in Solr as well via the schema.xml file. On Dec 15, 2010, at 10:28 AM, Pavel Minchenkov wrote: Hi, Please give me advice on how to create custom scoring. I

[ANN] General Availability of LucidWorks Enterprise

2010-12-15 Thread Grant Ingersoll
Lucid Imagination is pleased to announce the general availability of our Apache Solr/Lucene powered LucidWorks Enterprise (LWE). LWE is designed to make it easier for people to get up to speed on search by providing easier management, integration with libraries commonly used in building search

Re: Viewing query debug explanation with dismax and multicore

2010-12-15 Thread Chris Hostetter
: I am trying to debug my queries and see how scoring is done. I have 6 cores and : send the query to 6 shards and its dismax handler (with search on various : fields with different boostings). I enable debug, and view source but I'm unable : to see the explanations. I'm returning ID and

Re: limit the search results to one category

2010-12-15 Thread Chris Hostetter
: Subject: limit the search results to one category : References: 427522.34555...@web52907.mail.re2.yahoo.com : 930238.38683...@web51308.mail.re2.yahoo.com : In-Reply-To: 930238.38683...@web51308.mail.re2.yahoo.com http://people.apache.org/~hossman/#threadhijack Thread Hijacking on Mailing

Re: Problem with multicore

2010-12-15 Thread Chris Hostetter
: SimplePostTool: FATAL: Solr returned an error: : Unexpected_character_m_code_109_in_prolog_expected___at_rowcol_unknownsource_11 if you look at your solr log (or the HTTP response body, SimplePostTool only gives you the status line) you'll see the more human readable form of that error

Re: [ANN] General Availability of LucidWorks Enterprise

2010-12-15 Thread Andy
Congrats! A couple questions: 1) Which version of Solr is this based on? 2) How is LWE different from standard Solr? How should one choose between the two? Thanks. --- On Wed, 12/15/10, Grant Ingersoll gsing...@apache.org wrote: From: Grant Ingersoll gsing...@apache.org Subject: [ANN]

Re: [Adding] Entities when indexing a DB

2010-12-15 Thread Adam Estrada
Ahhh... I found that I did not set a dataSource name, and when I did that and then referred each entity to that dataSource, all went according to plan ;-) <?xml version="1.0" encoding="utf-8" ?> <dataConfig> <dataSource type="JdbcDataSource" name="bleh"

Re: nexus of synonyms and stemming, take 2

2010-12-15 Thread Chris Hostetter
: This is a fairly basic synonyms question: how does synonyms handle stemming? It's all a question of how your analysis chain is configured for the field type. If you have your stemming filter before your synonyms filter, then the synonyms.txt file needs to map the *stems* of the synonyms.
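
Editor's sketch of the ordering issue described above. This is a toy model, not Solr's analysis code: the stem table is a hypothetical lookup standing in for a real stemmer, and the synonym step replaces a token instead of expanding it.

```python
# Hypothetical stem table standing in for a real stemming filter.
TOY_STEMS = {"ponies": "poni", "pony": "poni", "horses": "hors", "horse": "hors"}


def stem(token):
    """Return the toy stem of a token, or the token itself if unknown."""
    return TOY_STEMS.get(token, token)


def analyze(tokens, synonyms):
    """Run the chain in the order discussed: stem first, then synonyms."""
    stemmed = [stem(token) for token in tokens]
    return [synonyms.get(token, token) for token in stemmed]


surface_synonyms = {"pony": "horse"}   # keyed on surface forms
stem_synonyms = {"poni": "hors"}       # keyed on stems

# By the time the synonym step runs, the token is already "poni", so a
# map keyed on surface forms never matches, while the stem map does.
print(analyze(["ponies"], surface_synonyms))
print(analyze(["ponies"], stem_synonyms))
```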

Re: can solrj swap cores?

2010-12-15 Thread Chris Hostetter
: One of our developers had initially tried swapping solr cores (e.g. core0 : and core1) using the solrj api, but it failed. (don't have the exact error) : He subsequently replaced the call with straight http (i.e. http client). : : Unfortunately I don't have the exact error in front of me...

Re: Problem using curl in PHP to get Solr results

2010-12-15 Thread Dennis Gearon
I will look into the security and processor power implications of that. Good idea, thx. Dennis Gearon

Re: can solrj swap cores?

2010-12-15 Thread Tim Heckman
It's been working for me. One thing to look out for might be the url you're using in SolrUtil.getSolrServer()? The url you use for reindexing won't be the same as the one you use to swap cores. Make sure it's using admin/cores and not production/admin/cores or reindex/admin/cores. Sorry if this

Memory use during merges (OOM)

2010-12-15 Thread Burton-West, Tom
Hello all, Are there any general guidelines for determining the main factors in memory use during merges? We recently changed our indexing configuration to speed up indexing but in the process of doing a very large merge we are running out of memory. Below is a list of the changes and part of

Re: Problem using curl in PHP to get Solr results

2010-12-15 Thread Dennis Gearon
Well, it was three problems: 1/ I was saving the file as a 'complete web page', unknowingly, from Firefox. 2/ I had a small message for troubleshooting being spit out after the JSON. 3/ My partner had output all the spatial Solr 'tiers' information, and there's a binary value in there that stops

Re: Next Word - Any Suggestions?

2010-12-15 Thread Sean O'Connor
Hi Christopher, One option comes to mind: shingles? I have not done anything with them yet, but that is on my radar for sometime about a month out. Speaking unencumbered by experience or substantial understanding, my guess is that shingles would be great for you if you can select

Thank you!

2010-12-15 Thread Adam Estrada
I just want to say that this listserv has been invaluable to a newbie like me ;-) I posted a question earlier today and literally 10 minutes later I got an answer that helped me solve my problem. This is proof that there is an experienced and energetic community behind this FOSS group of projects

Re: Dataimport performance

2010-12-15 Thread Lance Norskog
Can you do just one join in the top-level query? The DIH does not have a batching mechanism for these joins, but your database does. On Wed, Dec 15, 2010 at 7:11 AM, Tim Heckman theck...@gmail.com wrote: The custom import I wrote is a java application that uses the SolrJ library. Basically,