Search Regression Testing

2011-04-06 Thread Mark Mandel
Hey guys, I'm wondering how people are managing regression testing, in particular with things like text based search. I.e. if you change how fields are indexed or change boosts in dismax, ensuring that doesn't mean that critical queries are showing bad data. The obvious answer to me was using

Re: how to start GarbageCollector

2011-04-06 Thread stockii
Why does Solr copy my complete index somewhere when I start a delta-import? I copied one core, started a full-import of 35 million docs, and then started a delta-import of the last hour (~2000 docs). DIH/Solr starts by copying the whole index... why? I think it is copying the index, because my

very slow commit. copy of index ?

2011-04-06 Thread stockii
Hello again ;-) After a full-import of 36M docs, my delta-import doesn't work well. If I start my delta (which runs very fast on another core), the commit takes very long. I think that Solr copies the whole index, commits the new documents to the index, and then reduces the index size after

Re: command is still running ? delta-import?

2011-04-06 Thread stockii
I have the same problem. Any resolutions? --- System: one server, 12 GB RAM, 2 Solr instances, 7 cores, 1 core with 31 million documents, other cores 100.000 - Solr1 for search requests - commit every minute - 5 GB Xmx

Re: Search Regression Testing

2011-04-06 Thread Colin Vipurs
Hi Mark, What we're doing is using a bunch of acceptance tests with JBehave to drive our testing. We run this in a clean room environment, clearing out the indexes before a test run and inserting the data we're interested in. As well as tests to ensure things just work we have a bunch of tests

solr faceted search performance reason

2011-04-06 Thread Robin Palotai
Hello List, Please see my question at http://stackoverflow.com/questions/5552919/how-does-lucene-solr-achieve-high-performance-in-multi-field-faceted-search, I would be interested to know some details. Thank you, Robin

Re: Search Regression Testing

2011-04-06 Thread Paul Libbrecht
Mark, In one project, with Lucene not Solr, I also use a smallish unit test sample and apply some queries there. It is very limited but is automatable. I find a better way is to have precision and recall measures of real users run release after release. I could never fully apply this yet on
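
The precision and recall measures mentioned above can be turned into an automatable regression check. This is a minimal sketch, assuming you keep a hand-judged set of relevant document IDs for each critical query; the function name and the sample IDs are hypothetical, not something from the thread:

```python
def precision_recall(returned_ids, relevant_ids):
    """Return (precision, recall) for one query's result list."""
    returned = set(returned_ids)
    relevant = set(relevant_ids)
    hits = len(returned & relevant)
    # Guard against empty sets so the check never divides by zero.
    precision = hits / len(returned) if returned else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical judged query: 3 of the 4 returned docs are relevant,
# and 3 of the 5 relevant docs were found.
p, r = precision_recall(["d1", "d2", "d3", "d9"],
                        ["d1", "d2", "d3", "d4", "d5"])
print(p, r)
```

Running this per query before and after an index or boost change, and failing the build when either number drops below a chosen threshold, gives a release-over-release comparison.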

Re: help with Jetty log message

2011-04-06 Thread Matthieu Huin
As far as I am aware, licensing issues make that impossible for us ... On 04/05/2011 07:29 PM, Kaufman Ng wrote: Looks like you are using openjdk. Can you try using Sun jdk? On Mon, Apr 4, 2011 at 6:53 AM, Upayavira u...@odoko.co.uk wrote: This is not Solr crashing, per se, it is your

How to avoid Lock file generation - solr 1.4.1

2011-04-06 Thread rajini maski
I am using Solr 1.4.1 (Windows OS) and below are the settings in my solr config file: <writeLockTimeout>1000</writeLockTimeout> <commitLockTimeout>1</commitLockTimeout> <ramBufferSizeMB>32</ramBufferSizeMB> <maxMergeDocs>1</maxMergeDocs> <lockType>native</lockType> While writing the index, I am

Re: Script to remove all index.* leftovers

2011-04-06 Thread Markus Jelsma
Yes my mistake, you're right about #1. On Wednesday 06 April 2011 05:25:50 William Bell wrote: Thank you for pointing out #2. The commitsToKeep is interesting, but I thought each commit would create a segment (before optimized) and be self contained in the index.* directory? I would only

Re: how to reset the index in solr

2011-04-06 Thread Gabriele Kahlout
Hi Marcus, Your curl cmds don't work in that format on my unix. I converted them as follows, and they still don't work: $ curl --fail $solrIndex/update?commit=true -d '*:*' $ curl --fail $solrIndex/update -d '' From the browser:

Solr architecture diagram

2011-04-06 Thread Jan Høydahl
Hi, At Cominvent we've often had the need to visualize the internal architecture of Apache Solr in order to explain both the relationships of the components as well as the flow of data and queries. The result is a conceptual architecture diagram, clearly showing how Solr relates to the

Re: how to reset the index in solr

2011-04-06 Thread Gabriele Kahlout
Solved. The correct translation of Marcus' cmd: $ curl http://localhost:8080/solr/update?commit=true -H "Content-Type: text/xml" --data-binary '<delete><query>*:*</query></delete>' http://stackoverflow.com/questions/2358476/solr-delete-not-working-for-some-reason NB: the response is still not what I'd
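
The delete-everything request above can also be built from a script. This is a minimal sketch using only the Python standard library; the function name is hypothetical, and the host and port are copied from the command in the message. It builds the request without sending it, so it is safe to run against nothing:

```python
import urllib.request


def build_delete_all_request(update_url):
    """Build (but do not send) a POST that deletes every document and commits."""
    body = b"<delete><query>*:*</query></delete>"
    return urllib.request.Request(
        update_url + "?commit=true",
        data=body,
        headers={"Content-Type": "text/xml"},
        method="POST",
    )


req = build_delete_all_request("http://localhost:8080/solr/update")
print(req.get_method(), req.full_url)
# To actually send it: urllib.request.urlopen(req)
```

The explicit Content-Type header matters for the same reason as in the curl version: without it the body is sent as form data and the update handler does not read it as XML.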

Re: Solr architecture diagram

2011-04-06 Thread Stevo Slavić
Nice, thank you! Wish there was something similar or extra to this one depicting where SolrJ's CommonsHttpSolrServer and EmbeddedSolrServer fit in. Regards, Stevo. On Wed, Apr 6, 2011 at 11:44 AM, Jan Høydahl jan@cominvent.com wrote: Hi, At Cominvent we've often had the need to

sort by function problem

2011-04-06 Thread ramzesua
I try to use sort by function in a new release of SOLR 3.1, but I have some problems, for example: http://localhost:8983/new_search/select?q=mothers day&indent=true&fl=templateSetId,score,templateSetPopularity&sort=product(templateSetPopularity,query(mothers day)) desc templateSetPopularity - my

RE: FW: Very very large scale Solr Deployment = how to do (Expert Question)?

2011-04-06 Thread Ephraim Ofir
Hi all, I'd love to share the diagram, just not sure how to do that on the list (it's a word document I tried to send as attachment). Jens, to answer your questions: 1. Correct, in our setup the source of the data is a DB from which we pull the data using DIH (search the list for my previous post

solr-2351 patch

2011-04-06 Thread Isha Garg
Hi, Please tell me which Solr version the patch file SOLR-2351 (https://issues.apache.org/jira/secure/attachment/12470560/mlt.patch) is intended for. Regards! Isha

RE: Embedded Solr constructor not returning

2011-04-06 Thread Steven A Rowe
Hi Greg, I need the servlet API in my app for it to work, despite being command line. So adding this to the maven POM fixed everything: <dependency> <groupId>javax.servlet</groupId> <artifactId>servlet-api</artifactId> <version>2.5</version>

Re: Solrj and display which Solr version is used

2011-04-06 Thread Erick Erickson
The only way I know of (and it's a little, well, a lot arcane) is to ping the admin/system handler. As it happens, I just had to do something like this. This uses Apache Commons HttpClient 3.x, NOT the most recent FWIW... The URL can be admin/see solrconfig.xml I'd really like to find out that
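
The idea of asking the admin/system handler for the version can be sketched outside SolrJ as well. The following is an illustration only: the helper names are hypothetical, and both the handler path under the base URL and the `lucene` / `solr-spec-version` keys in the JSON response are assumptions that should be checked against your own solrconfig.xml and a real response:

```python
import json


def system_info_url(solr_base_url):
    """Assumed location of the system info handler, asking for JSON output."""
    return solr_base_url.rstrip("/") + "/admin/system?wt=json"


def extract_solr_version(response_text):
    """Pull the version out of a system-info response; None if it is absent."""
    info = json.loads(response_text)
    return info.get("lucene", {}).get("solr-spec-version")


# Hypothetical response body, trimmed to the one field used here.
sample = '{"lucene": {"solr-spec-version": "3.1.0"}}'
print(system_info_url("http://localhost:8983/solr/"))
print(extract_solr_version(sample))
```

Keeping the URL construction and the parsing separate means the parsing half can be tested without a running server.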

Re: what happens to docsPending if stop solr before commit

2011-04-06 Thread Erick Erickson
They're lost, never to be seen again. You'll have to reindex them. Best Erick On Tue, Apr 5, 2011 at 4:25 PM, Robert Petersen rober...@buy.com wrote: Hello fellow enthusiastic solr users, I tried to find the answer to this simple question online, but failed. I was wondering about this,

Re: Synonym-time Reindexing Issues

2011-04-06 Thread Erick Erickson
Hmmm, this should work just fine. Here are my questions. 1) Are you absolutely sure that the new synonym file is available when reindexing? 2) Does the sunspot program do anything wonky with the ids? The documents will only be replaced if the IDs are identical. 3) Are you sure that a

Re: solr faceted search performance reason

2011-04-06 Thread Erick Erickson
Please re-post the question here so others can see the discussion without going to another list. Best Erick On Wed, Apr 6, 2011 at 4:09 AM, Robin Palotai m.palotai.ro...@gmail.com wrote: Hello List, Please see my question at

Re: sort by function problem

2011-04-06 Thread Yonik Seeley
The problem is query(mothers day) See http://wiki.apache.org/solr/FunctionQuery#query You can't directly include query syntax because the function parser wouldn't know how to get to the end of that syntax. You could either do query($qq) and then add a qq=mothers day to the request Or if you
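
The `query($qq)` suggestion above can be sketched as a request builder. This is a minimal illustration with the Python standard library; the function name is hypothetical, and the URL and field names are taken from the original question in this thread:

```python
from urllib.parse import urlencode


def build_sort_by_function_url(base_url, user_query):
    """Pass the user query once as q and once as qq, then dereference $qq in sort."""
    params = {
        "q": user_query,
        "qq": user_query,
        "fl": "templateSetId,score,templateSetPopularity",
        # query($qq) avoids embedding raw query syntax inside the function.
        "sort": "product(templateSetPopularity,query($qq)) desc",
        "indent": "true",
    }
    return base_url + "?" + urlencode(params)


url = build_sort_by_function_url(
    "http://localhost:8983/new_search/select", "mothers day")
print(url)
```

Because the query text travels in its own parameter, spaces and punctuation in it are URL-encoded once and never have to be escaped inside the function expression.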

Migrating from solr 1.4.1 to 3.1.0

2011-04-06 Thread Isan Fulia
Hi all, Solr 3.1.0 uses a different javabin format from 1.4.1. So if I use the Solrj 1.4.1 jar, I get a javabin error while saving to 3.1.0, and if I use the Solrj 3.1.0 jar, I get a javabin error while reading the document from Solr 1.4.1. How should I go about reindexing in this situation? -- Thanks

Solr: Images, Docs and Binary data

2011-04-06 Thread Ezequiel Calderara
Hello everyone, I need to know if someone has used Solr for indexing and storing images (up to 16MB) or binary docs. How does Solr behave with this type of doc? How does it affect performance? Thanks Everyone -- __ Ezequiel. Http://www.ironicnet.com

dataimporhandler

2011-04-06 Thread Gastone Penzo
Hello, I have a problem with dataimporthandler. I want to index many products directly from the db with this component. I want to index the products little by little, and every time I finish a piece I want to be sure that the indexes are committed before going on with the next piece. I see that I can

Re: Solrj and display which Solr version is used

2011-04-06 Thread Marc SCHNEIDER
Ok thanks, that's an idea :-) Maybe we should suggest having a method in CommonsHttpSolrServer that returns Solr's version... Marc. On Wed, Apr 6, 2011 at 2:58 PM, Erick Erickson erickerick...@gmail.com wrote: The only way I know of (and it's a little, well, a lot arcane) is to ping the

Re: solr faceted search performance reason

2011-04-06 Thread Robin Palotai
Carbon copied: *Context* This is a question mainly about Lucene (or possibly Solr) internals. The main topic is *faceted search*, in which search can happen along multiple independent dimensions (facets) of objects (for example size, speed, price of a car). When implemented with relational

Re: dismax boost query not useful?

2011-04-06 Thread Shawn Heisey
On 4/5/2011 1:17 PM, Chris Hostetter wrote: the boost param of edismax is probably a lot better choice than either bq/bf -- but it really depends on whether you want an additive boost or a multiplicative one (of course with the function query syntax add(), product() and query() can be combined

Re: dismax boost query not useful?

2011-04-06 Thread Yonik Seeley
On Wed, Apr 6, 2011 at 12:00 PM, Shawn Heisey s...@elyograg.org wrote: We aren't yet using dismax in production, but I've had it in my config for a while now.  I've changed it to edismax in the 3.1 setup I'm putting together now.  It has the following in the bf parameter:

RE: Using MLT feature

2011-04-06 Thread Frederico Azeiteiro
Yes, I had already checked the code for it and used it to compile a C# method that returns the same signature. But I have a strange issue: For instance, using MinTokenLenght=2 and default QUANT_RATE, passing the text frederico (simple text no big deal here): 1. using my C# app returns

Re: dataimporhandler

2011-04-06 Thread Erick Erickson
There's not much to go on here, can you provide details on how you check that you've committed? How are you configuring DIH? etc. It might be helpful to review: http://wiki.apache.org/solr/UsingMailingLists Best Erick On Wed, Apr 6, 2011 at 10:11 AM, Gastone Penzo gastone.pe...@gmail.com wrote:

Re: dismax boost query not useful?

2011-04-06 Thread Smiley, David W.
On Apr 5, 2011, at 3:17 PM, Chris Hostetter wrote: one of the original use cases for bq was for artificial keyword boosting, in which case it still comes in handy... bq=meta:promote^100 text:new^10 category:featured^100 (*:* -category:accessories)^10 Yeah I thought of this specific

Re: Solr: Images, Docs and Binary data

2011-04-06 Thread Ezequiel Calderara
Another question that may be easier to answer: how can I store binary data? Any example schema? 2011/4/6 Ezequiel Calderara ezech...@gmail.com Hello everyone, I need to know if someone has used Solr for indexing and storing images (up to 16MB) or binary docs. How does Solr behave with this

Re: solr faceted search performance reason

2011-04-06 Thread Jonathan Rochkind
On 4/6/2011 10:55 AM, Robin Palotai wrote: Therefore, Lucene supposedly has some advanced technique for multi-field queries other than just taking the intersection of matching documents based on the inverted index. I don't think so, necessarily. It's just that Lucene's algorithms for doing
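
The baseline approach named in the question, intersecting sets of matching documents taken from an inverted index, can be sketched in a few lines. This is a toy illustration only, not how Solr or Lucene actually implements faceting; the index layout, field names, and document IDs are all hypothetical:

```python
def facet_counts(inverted_index, matching_docs, facet_field):
    """Count, per facet value, how many of the matching docs carry that value."""
    counts = {}
    for value, doc_ids in inverted_index[facet_field].items():
        n = len(doc_ids & matching_docs)  # set intersection
        if n:
            counts[value] = n
    return counts


# Toy inverted index: field -> value -> set of document IDs.
index = {
    "color": {"red": {1, 2, 5}, "blue": {3, 4}},
    "size": {"small": {1, 3}, "large": {2, 4, 5}},
}

# Documents matching the filter size:large, then faceted by color.
matching = index["size"]["large"]
print(facet_counts(index, matching, "color"))
```

Each facet value costs one intersection against the current result set, which is why the size of the per-value document sets and how they are cached dominate the cost in this simplified model.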

Re: solr faceted search performance reason

2011-04-06 Thread Jonathan Rochkind
PS: If you want to see how Solr actually computes faceting (the faceting code lives in the 'Solr' codebase, not in the lower level lucene codebase), here's the file to look at, this web snapshot is from 1.4.1, don't know if it's been changed more recently, but I don't think majorly:

Re: Solr: Images, Docs and Binary data

2011-04-06 Thread Ryan McKinley
You can store binary data using a binary field type -- then you need to send the data base64 encoded. I would strongly recommend against storing large binary files in solr -- unless you really don't care about performance -- the file system is a good option that springs to mind. ryan
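
Sending base64-encoded data to a binary field can be sketched as follows. This is a minimal illustration, assuming a schema with a binary-typed field; the field name `image_data`, the document ID, and the helper name are hypothetical:

```python
import base64
from xml.sax.saxutils import escape


def build_add_doc(doc_id, payload, binary_field="image_data"):
    """Build an XML add message with the payload base64-encoded."""
    encoded = base64.b64encode(payload).decode("ascii")
    return (
        "<add><doc>"
        f'<field name="id">{escape(doc_id)}</field>'
        f'<field name="{binary_field}">{encoded}</field>'
        "</doc></add>"
    )


# First six bytes of a PNG header as a stand-in for real file content.
xml_doc = build_add_doc("img-1", b"\x89PNG\r\n")
print(xml_doc)
```

Base64 output contains only letters, digits, `+`, `/`, and `=`, so it needs no XML escaping, but it is roughly a third larger than the raw bytes, which adds to the performance concern raised above for large files.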

Re: Solr: Images, Docs and Binary data

2011-04-06 Thread Jonathan Rochkind
I put binary data in an ordinary Solr stored field, don't need any special schema. I have run into trouble making sure the data is not corrupted on the way in during indexing, depending on exactly what form of communication is being used to index (SolrJ, SolrJ with EmbeddedSolr, DIH, etc.),

Re: Solr: Images, Docs and Binary data

2011-04-06 Thread Jonathan Rochkind
Ha, there's a binary field type?! I've stored binary data in an ordinary String field type, and it's worked. But there were some headaches to get it to work, might have been smoother if I had realized there was actually a binary field type. But wait I'm talking about Solr 'stored field',

Re: Solr: Images, Docs and Binary data

2011-04-06 Thread Ezequiel Calderara
Hi, your answers were really helpful. I was thinking of putting the base64 encoded file into a string field, but was a little worried about Solr trying to stem it or vectorize it or that sort of thing. Seen in the example schema.xml: <!--Binary data type. The data should be sent/retrieved in as

Re: Solr: Images, Docs and Binary data

2011-04-06 Thread Markus Jelsma
Ha, there's a binary field type?! I've stored binary data in an ordinary String field type, and it's worked. But there were some headaches to get it to work, might have been smoother if I had realized there was actually a binary field type. How, you can't just embed control characters in

Re: Solr: Images, Docs and Binary data

2011-04-06 Thread Markus Jelsma
Hi, your answers were really helpful. I was thinking of putting the base64 encoded file into a string field, but was a little worried about Solr trying to stem it or vectorize it or that sort of thing. String field types are not analyzed. So it doesn't brutalize your data. Better use BinaryField.

Re: Solr: Images, Docs and Binary data

2011-04-06 Thread Jonathan Rochkind
On 4/6/2011 2:39 PM, Markus Jelsma wrote: Ha, there's a binary field type?! I've stored binary data in an ordinary String field type, and it's worked. But there were some headaches to get it to work, might have been smoother if I had realized there was actually a binary field type. How, you

Re: Solr: Images, Docs and Binary data

2011-04-06 Thread Adam Estrada
Well...by default there is a pretty decent schema that you can use as a template in the example project that builds with Solr. Tika is the library that does the actual content extraction so it would be a good idea to try the example project out first. Adam 2011/4/6 Ezequiel Calderara

Re: Concatenate multivalued DIH fields

2011-04-06 Thread alexei
Hi Everyone, I am having an identical problem with concatenating author's first and last names stored in an xml blob. Because this field is multivalued copyfield does not work. Does anyone have a solution? Regards, Alexei -- View this message in context:

Re: Solr: Images, Docs and Binary data

2011-04-06 Thread Stefan Matheis
Ezequiel, On 06.04.2011 20:38, Ezequiel Calderara wrote: Does anyone know of any storage for images that performs well, other than the FS? You may want to have a look at http://www.danga.com/mogilefs/ ? :) Regards Stefan

Re: DIH: Indexing multiple datasources with the same schema

2011-04-06 Thread alexei
Sorry about bringing an old thread back, I thought my solution could be useful. I also had to deal with multiple data sources. If the data source number could be queried for in one of your parent entities then you could get it using a variable as follows: <entity name="ChildEntity"

Re: Solr: Images, Docs and Binary data

2011-04-06 Thread Ezequiel Calderara
On Wed, Apr 6, 2011 at 15:31 PM, Adam Estrada estrada.adam.gro...@gmail.com wrote: Well...by default there is a pretty decent schema that you can use as a template in the example project that builds with Solr. Tika is the library that does the actual content extraction so it would be a good

unindexible Chars?

2011-04-06 Thread Charles Wardell
Once in a while, my post.jar seems to fail on commit. During the commit process, I have gotten a few errors. One is that EOF character found, and another is that semicolon expected after the. I also have come across a was expected. So my question is what characters do I need to strip out of

Re: Solr: Images, Docs and Binary data

2011-04-06 Thread Markus Jelsma
On Wed, Apr 6, 2011 at 15:31 PM, Adam Estrada estrada.adam.gro...@gmail.com I wanted to know how large field's size affects performance. If you use replication then it's a huge impact on performance as the data gets sent over the network. It's also a memory hog so there's less memory and

Re: unindexible Chars?

2011-04-06 Thread Markus Jelsma
Once in a while, my post.jar seems to fail on commit. During the commit process, I have gotten a few errors. One is that EOF character found, and another is that semicolon expected after the. I also have come across a was expected. So my question is what characters do I need to strip out

ClobTransformer Issues

2011-04-06 Thread Stephen Garvey
Hi All, I'm hoping someone can give me some pointers. I've got Solr 1.4.1 and am using DIH to import a table from an Ingres database. The table contains a column which is a CLOB type. I've tried to use a CLOB transformer to transform the CLOB to a String but the index only contains something

Re: Eclipse: Invalid character constant

2011-04-06 Thread Eric Grobler
Hi Stefan, Thanks, my eclipse is now perfectly configured. It makes it very easy for amateurs like me! For other amateurs the steps are: 1. checkout the sources: *svn checkout https://svn.apache.org/repos/asf/lucene/dev/branches/lucene_solr_3_1/* 2. the root folder (lucene_solr_3_1 in this

where is INFOSTREAM.txt located?

2011-04-06 Thread Tirthankar Chatterjee

Re: Synonym-time Reindexing Issues

2011-04-06 Thread Preston Marshall
Reply Inline: On Apr 6, 2011, at 8:12 AM, Erick Erickson wrote: Hmmm, this should work just fine. Here are my questions. 1) Are you absolutely sure that the new synonym file is available when reindexing? Not sure what you mean here, solr is running as root, and the file is never moved

RE: what happens to docsPending if stop solr before commit

2011-04-06 Thread Robert Petersen
Oh woe is me... lol NP good to know. I'll get them on the next go 'round. :) Thanks for the answer! -Original Message- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: Wednesday, April 06, 2011 6:05 AM To: solr-user@lucene.apache.org Subject: Re: what happens to

RE: where is INFOSTREAM.txt located?

2011-04-06 Thread Tirthankar Chatterjee
Thanks All, I figured it out. http://lucene.472066.n3.nabble.com/general-debugging-techniques-td868300.html See the last line on this page. -Original Message- From: Tirthankar Chatterjee [mailto:tchatter...@commvault.com] Sent: Wednesday, April 06, 2011 6:15 PM To:

Re: Embedded Solr constructor not returning

2011-04-06 Thread Greg Pendlebury
Sounds good. Please go ahead and make this change yourself. Done. Ta, Greg On 6 April 2011 22:52, Steven A Rowe sar...@syr.edu wrote: Hi Greg, I need the servlet API in my app for it to work, despite being command line. So adding this to the maven POM fixed everything:

Re: what happens to docsPending if stop solr before commit

2011-04-06 Thread Koji Sekiguchi
(11/04/06 5:25), Robert Petersen wrote: I tried to find the answer to this simple question online, but failed. I was wondering about this, what happens to uncommitted docsPending if I stop solr and then restart solr? Are they lost? Are they still there but still uncommitted? Do they get

RE: what happens to docsPending if stop solr before commit

2011-04-06 Thread Robert Petersen
Really? Great! I was wondering if there was some cleanup cycle like that which would occur upon shutdown. That sounds like much more logical behavior! -Original Message- From: Koji Sekiguchi [mailto:k...@r.email.ne.jp] Sent: Wednesday, April 06, 2011 4:03 PM To:

Shared conf

2011-04-06 Thread Mark
Is there a configuration value I can specify for multiple cores to use the same conf directory? Thanks

difference between geospatial search from database angle and from solr angle

2011-04-06 Thread Sean Bigdatafun
I understand Solr can do pretty powerful geospatial search (http://www.ibm.com/developerworks/java/library/j-spatial/). But I also understand that lots of DB researchers have done lots of geospatial-related work. Can someone give an overview of the

Re: Very very large scale Solr Deployment = how to do (Expert Question)?

2011-04-06 Thread Lance Norskog
I would not use replication. LinkedIn consumer search is a flat system where one process indexes new entries and does queries simultaneously. It's a custom Lucene app called Zoie. Their stuff is on GitHub. I would get documents to indexers via a multicast IP-based queueing system. This scales

Re: Using MLT feature

2011-04-06 Thread Lance Norskog
A fuzzy signature system will not work here. You are right, you want to try MLT instead. Lance On Wed, Apr 6, 2011 at 9:47 AM, Frederico Azeiteiro frederico.azeite...@cision.com wrote: Yes, I had already checked the code for it and used it to compile a C# method that returns the same signature.

Re: SOLR - problems with non-english symbols when extracting HTML

2011-04-06 Thread Lance Norskog
Tomcat has to be configured to use UTF-8. http://wiki.apache.org/solr/SolrTomcat?highlight=%28tomcat%29#URI_Charset_Config On Fri, Mar 25, 2011 at 6:58 PM, kushti sandyl...@gmail.com wrote: Grijesh wrote: Try to send HTML data using format CDATA . Doesn't work with $content = ; And

Re: Very very large scale Solr Deployment = how to do (Expert Question)?

2011-04-06 Thread Walter Underwood
The bigger answer is that you cannot get to this size by just configuring Solr. You may have to invent a lot of stuff. Like all of Google. Where did you get these numbers? The proposed query rate is twice as big as Google (Feb 2010 estimate, 34K qps). I work at MarkLogic, and we scale to 100's

Re: difference between geospatial search from database angle and from solr angle

2011-04-06 Thread David Smiley (@MITRE.org)
Sean, Geospatial search in Lucene/Solr is of course implemented based on Lucene's underlying index technology. That technology was originally just for text but it's been adapted very successfully for numerics and querying ranges too. The only mature geospatial field type in Solr 3.1 is

Tips for getting unique results?

2011-04-06 Thread Peter Spam
Hi, I have documents with a field that has 1A2B3C alphanumeric characters. I can query for * and sort results based on this field, however I'd like to uniq these results (remove duplicates) so that I can get the 5 largest unique values. I can't use the StatsComponent because my values have

Re: Tips for getting unique results?

2011-04-06 Thread Otis Gospodnetic
Hi, I think you are saying dupes are the main problem? If so, http://wiki.apache.org/solr/Deduplication ? Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/ - Original Message From: Peter Spam ps...@mac.com To:

Re: Very very large scale Solr Deployment = how to do (Expert Question)?

2011-04-06 Thread Jens Mueller
Hello Ephraim, hello Lance, hello Walter, thanks for your replies: Ephraim, thanks very much for the further detailed explanation. I will try to set up a demo system in the next few days and use your advice. Load balancers are an important aspect of your design. Can you recommend one LB

Re: Very very large scale Solr Deployment = how to do (Expert Question)?

2011-04-06 Thread Otis Gospodnetic
Just a quick comment re LinkedIn's stuff. You can look at Zoie (also covered in Lucene in Action 2), but you may be more interested in Sensei. And yes, big systems like that need sharding and replication, multiple master and lots of slaves. Otis Sematext :: http://sematext.com/ :: Solr