Re: Can I set up a config-based distributed search

2011-04-11 Thread Ran Peled
Thanks, Ludovic and Jonathan. Yes, this configuration default is exactly what I was looking for. Ran On Mon, Apr 11, 2011 at 7:12 PM, Jonathan Rochkind wrote: > I have not worked with shards/distributed, but I think you can probably > specify them as defaults in your requesthandler in your so

Re: when to change rows param?

2011-04-11 Thread Paul Libbrecht
Hoss, as of now I managed to adjust this in the client code before it touches the server so it is not urgent at all anymore. I wanted to avoid touching the client code (which is giving, oh great fun, MSIE concurrency miseries) hence I wanted a server-side rewrite of the maximum number of hits

Re: Clarifying "fetchindex" command

2011-04-11 Thread Mark Miller
Looking at the code, issuing a fetchindex will cause the fetch to occur right away, with no respect for polling. - Mark On Apr 11, 2011, at 12:37 PM, Otis Gospodnetic wrote: > Hi, > > Can one actually *force* replication of the index from the master without a > commit being issued on the mast
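The URL form quoted from the wiki in the original question can be assembled programmatically. A minimal Python sketch, assuming a slave reachable at the placeholder address `slave_host:8983` (host and port are illustrative, not taken from the thread); it only builds the URL and does not send a request:

```python
from urllib.parse import urlencode, urlunsplit


def fetchindex_url(host, port):
    """Build the replication URL that asks a slave to fetch the index now."""
    query = urlencode({"command": "fetchindex"})
    # Path follows the wiki example: /solr/replication?command=fetchindex
    return urlunsplit(("http", f"{host}:{port}", "/solr/replication", query, ""))


url = fetchindex_url("slave_host", 8983)  # placeholder host and port
print(url)
```

Per Mark's reading of the code, requesting this URL would start the fetch immediately rather than waiting for the next poll.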

Indexing Flickr and Panoramio

2011-04-11 Thread Estrada Groups
Has anyone tried doing this? Got any tips for someone getting started? Thanks, Adam

Re: Solr 3.1 performance compared to 1.4.1

2011-04-11 Thread Lance Norskog
Marius: "I have copied the configuration from 1.4.1 to the 3.1." Does the Directory implementation show up in the JMX beans? In admin/statistics.jsp ? Or the Solr startup logs? (Sorry, don't have a Solr available.) Yonik: > What platform are you on? I believe the Lucene Directory > implementatio

Re: Tika, Solr running under Tomcat 6 on Debian

2011-04-11 Thread Lance Norskog
Ah! Did you set the UTF-8 parameter in Tomcat? On Mon, Apr 11, 2011 at 2:49 AM, Mike wrote: > Hi Roy, > > Thank you for the quick reply. When i tried to index the PDF file i was able > to see the response: > > > 0 > 479 > > > > Query: > http://localhost:8080/solr/update/extract?stream.file=D:\mik

Re: Indexing Best Practice

2011-04-11 Thread Lance Norskog
SOLR-1499 is a plug-in for the DIH that uses Solr as a DataSource. This means that you can read the database and PDFs separately. You could index all of the PDF content in one DIH script. Then, when there's a database update, you have a separate DIH scripts that reads the old row from Solr, and pul

Re: Solr under Tomcat

2011-04-11 Thread Lance Norskog
Hi Mike- Please start a new thread for this. On Mon, Apr 11, 2011 at 2:47 AM, Mike wrote: > Hi All, > > I have installed solr instance on tomcat6. When i tried to index the PDF > file i was able to see the response: > > > 0 > 479 > > > Query: > http://localhost:8080/solr/update/extract?stream.fi

Re: DIH: Enhance XPathRecordReader to deal with //body(FLATTEN=true) and //body/h1

2011-04-11 Thread Lance Norskog
The DIH has multi-threading. You can have one thread fetching files and then give them to different threads. On Mon, Apr 11, 2011 at 11:40 AM, wrote: > Hi Lance, > > I used XPathEntityProcessor with the attribute "xsl" and generated an XML file "in > the form of the standard Solr update schema". > I l

Re: MoreLikeThis match

2011-04-11 Thread Mike Mattozzi
Match is the document that's the top result of the query (q param) that you specify. Response is the list of documents that are similar to the 'match' document. -Mike On Mon, Apr 11, 2011 at 4:55 PM, Brian Lamb wrote: > Does anyone have any thoughts on this one? > > On Fri, Apr 8, 2011 at 9:26

RE: Exact match on a field with stemming

2011-04-11 Thread Jean-Sebastien Vachon
Thanks for the clarification. This makes sense. -Original Message- From: Jonathan Rochkind [mailto:rochk...@jhu.edu] Sent: April-11-11 7:54 PM To: solr-user@lucene.apache.org Subject: FW: Exact match on a field with stemming > I'm curious to know why Solr is not respecting the phrase. >

Re: Extracting contents of zipped files with Tika and Solr 1.4.1

2011-04-11 Thread Joey Hanzel
Awesome. Thanks Jayendra. I hadn't caught these patches yet. I applied SOLR-2416 patch to the solr-3.1 release tag. This resolved the problem of archive files not being unpacked and indexed with Solr CELL. Thanks for the FYI. https://issues.apache.org/jira/browse/SOLR-2416 On Mon, Apr 11, 2011 a

Re: XML not coming through from nabble to Gmail

2011-04-11 Thread Chris Hostetter
: I see the same problem (missing markup) in Thunderbird. Seems like Nabble : might be the culprit? If someone can cite some specific examples (by email message-id, or subject, or date+sender, or url from nabble, or url from any public archive, or anything more specific than "posts from nabble

Re: does overwrite=false work with json

2011-04-11 Thread Chris Hostetter
: I tried it with the example json documents, and even if I add : overwrite=false to the URL, it still overwrites. : : Do this twice: : curl 'http://localhost:8983/solr/update/json?commit=true&overwrite=false' --data-binary @books.json -H 'Content-type:application/json' ...the JSON Update Reque

Re: partial optimize does not reduce the segment number to maxNumSegments

2011-04-11 Thread Chris Hostetter
: I have a core with 120+ segment files and I tried partial optimize specify : maxNumSegments=10, after the optimize the segment files reduced to 64 files; a) the option you want to specify is "maxSegments" .. not "maxNumSegments" http://wiki.apache.org/solr/UpdateXmlMessages#Optional_attributes
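To make the attribute name concrete, here is a small Python sketch that builds an optimize message with the `maxSegments` attribute Chris names. It only constructs the XML; sending it to a Solr update handler is left out, and the value 10 is simply the number from the question:

```python
import xml.etree.ElementTree as ET


def optimize_message(max_segments):
    """Return an XML update message requesting a partial optimize."""
    # The attribute is "maxSegments", not "maxNumSegments".
    element = ET.Element("optimize", {"maxSegments": str(max_segments)})
    return ET.tostring(element, encoding="unicode")


message = optimize_message(10)
print(message)
```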

Re: Solr 1.4.1 compatible with Lucene 3.0.1?

2011-04-11 Thread Otis Gospodnetic
Hi, I only read the short story. :) Note that you should post questions like this on solr-user@lucene list, which is where I'm replying now. Since you are just starting with Solr, why not grab the recently released 3.1? That way you'll get the latest Lucene and the latest Solr. Otis Sem

Re: XML not coming through from nabble to Gmail

2011-04-11 Thread Michael Sokolov
I see the same problem (missing markup) in Thunderbird. Seems like Nabble might be the culprit? -Mike On 4/11/2011 8:13 AM, Erick Erickson wrote: All: Lately I've been seeing a lot of posts where people paste in parts of their schema.xml or solrconfig.xml and the results are...er...disappoint

Re: Deduplication questions

2011-04-11 Thread Chris Hostetter
: Q1. Is it possible to pass *analyzed* content to the : : public abstract class Signature { No, analysis happens as the documents are being written to the Lucene index, well after the UpdateProcessors have had a chance to interact with the values. : Q2. Method calculate() is using concatenat

Re: when to change rows param?

2011-04-11 Thread Chris Hostetter
Paul: can you elaborate a little bit on what exactly your problem is? - what is the full component list you are using? - how are you changing the param value (ie: what does the code look like) - what isn't working the way you expect? : I've been using my own QueryComponent (that extends the s

FW: Exact match on a field with stemming

2011-04-11 Thread Jonathan Rochkind
> I'm curious to know why Solr is not respecting the phrase. > If it considers "manager" as a phrase... shouldn't it return only documents > containing that phrase? A phrase means to Solr (or rather to the Lucene and dismax query parsers, which are what understand double-quoted phrases) "these t

RE: Exact match on a field with stemming

2011-04-11 Thread Jean-Sebastien Vachon
I'm curious to know why Solr is not respecting the phrase. If it considers "manager" as a phrase... shouldn't it return only documents containing that phrase? -Original Message- From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com] Sent: April-11-11 3:42 PM To: solr-user@lucene.apache

Too many open files exception related to solrj getServer too often?

2011-04-11 Thread cyang2010
Hi, I get this SolrJ error in a development environment. org.apache.solr.client.solrj.SolrServerException: java.net.SocketException: Too many open files At the time there was no reindexing or any write to the index. There were only different queries generated using SolrJ to hit the Solr server:

Re: MoreLikeThis match

2011-04-11 Thread Brian Lamb
Does anyone have any thoughts on this one? On Fri, Apr 8, 2011 at 9:26 AM, Brian Lamb wrote: > I've looked at both wiki pages and none really clarify the difference > between these two. If I copy and paste an existing index value for field and > do an mlt search, it shows up under match but not r

Re: Mongo REST interface and full data import

2011-04-11 Thread andrew_s
Thank you guys for your answers. I didn't realise it would be so easy, and the example from http://wiki.apache.org/solr/UpdateJSON#Example works perfectly for me. Regards, Andrew

Re: Question on Dismax plugin

2011-04-11 Thread Otis Gospodnetic
Hi Raj, I'm guessing your slug field is much shorter and thus a match in that field has more weight than a match in a much longer story field. If you omit norms for those fields in the schema (and reindex), I believe you will see File 4 drop to position #4. Otis Sematext :: http://semate
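The length effect Otis describes can be put into numbers. A rough Python sketch, assuming the classic default length normalization of 1 / sqrt(number of terms) and ignoring boosts and the lossy encoding of stored norms; the field lengths are made up for illustration:

```python
import math


def length_norm(num_terms):
    """Classic Lucene-style length normalization: 1 / sqrt(number of terms)."""
    return 1.0 / math.sqrt(num_terms)


slug_norm = length_norm(4)     # hypothetical 4-term slug field
story_norm = length_norm(400)  # hypothetical 400-term story field
print(slug_norm, story_norm)   # the shorter field gets the larger factor
```

With norms omitted, this factor no longer separates the two fields, which is why a match in the short slug field would stop outweighing a match in the long story field.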

Question on Dismax plugin

2011-04-11 Thread Nemani, Raj
All, I have a question on the Dismax plugin for the search handler. I have two test instances of Solr. In one I am using the default search handler. In this case, the fields that I am working with (slug and story) are indexed via the all_text field and the searches are done on the all_text fiel

Re: Will Slaves Pileup Replication Requests?

2011-04-11 Thread Parker Johnson
Thanks Larry. -Parker On 4/11/11 12:14 PM, "Green, Larry (CMG - Digital)" wrote: >Yes. It will wait whatever the replication interval is after the most >recent replication completes before attempting again. > >On Apr 11, 2011, at 2:42 PM, Parker Johnson wrote: > >> >> What is the slave replic

Re: Exact match on a field with stemming

2011-04-11 Thread Otis Gospodnetic
Hi, Using quotes means "use this as a phrase", not "use this as a literal". :) I think copying to an unstemmed field is the only/common workaround. Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/ - Original Message
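A toy Python model of why a quoted term still matches stemmed variants, and why an unstemmed copy of the field behaves differently. The `toy_stem` function is a deliberately crude stand-in (it strips a trailing "s") and is not the stemmer Solr uses; the document text is invented:

```python
def toy_stem(token):
    """Crude stand-in for a stemmer: strip a trailing 's' (not Solr's algorithm)."""
    return token[:-1] if token.endswith("s") and len(token) > 3 else token


def phrase_matches(doc_tokens, phrase_tokens):
    """True if phrase_tokens occur consecutively, in order, in doc_tokens."""
    n = len(phrase_tokens)
    return any(doc_tokens[i:i + n] == phrase_tokens
               for i in range(len(doc_tokens) - n + 1))


doc = "senior managers wanted".split()
query = ["manager"]

stemmed_doc = [toy_stem(t) for t in doc]      # what a stemmed field would index
stemmed_query = [toy_stem(t) for t in query]  # same analysis applied to the query

print(phrase_matches(stemmed_doc, stemmed_query))  # stemmed field
print(phrase_matches(doc, query))                  # unstemmed copy of the field
```

In the stemmed field the one-word phrase matches because both sides are reduced to the same token; in the unstemmed copy only the literal token would match.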

Re: Will Slaves Pileup Replication Requests?

2011-04-11 Thread Green, Larry (CMG - Digital)
Yes. It will wait whatever the replication interval is after the most recent replication completes before attempting again. On Apr 11, 2011, at 2:42 PM, Parker Johnson wrote: > > What is the slave replication behavior if a replication request to pull > indexes takes longer than the replication
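The scheduling Larry describes can be sketched as a toy timeline in Python. This models only the stated rule (the next attempt starts one interval after the previous replication completes); it is not Solr code, and the durations are made-up numbers:

```python
def poll_start_times(interval, fetch_durations):
    """Start time of each replication attempt when the next one is scheduled
    `interval` seconds after the previous one completes (toy model)."""
    starts = []
    now = 0.0
    for duration in fetch_durations:
        starts.append(now)
        finished = now + duration
        now = finished + interval  # wait a full interval after completion
    return starts


# 30-second interval, three fetches that each take 45 seconds
print(poll_start_times(30.0, [45.0, 45.0, 45.0]))
```

Under this rule slow transfers push later attempts back instead of stacking up concurrent requests.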

Will Slaves Pileup Replication Requests?

2011-04-11 Thread Parker Johnson
What is the slave replication behavior if a replication request to pull indexes takes longer than the replication interval itself? In other words, if my replication interval is set to be every 30 seconds, and my indexes are large enough to take longer than 30 seconds to transfer, is t

Re: DIH: Enhance XPathRecordReader to deal with //body(FLATTEN=true) and //body/h1

2011-04-11 Thread karsten-solr
Hi Lance, I used XPathEntityProcessor with the attribute "xsl" and generated an XML file "in the form of the standard Solr update schema". I lost a lot of performance; it is a pity that XPathEntityProcessor only uses one thread. My tests with a collection of 350T documents: 1. use of XPathRecordRea

Lucene Revolution 2011 - Early Bird Ends April 18

2011-04-11 Thread Michael Bohlig
A quick reminder that there's one week left on special pricing for Lucene Revolution 2011. Sign up this week and save some serious cash: - Conference Registration, now $545, a savings of $180 over the $725 late registration price - Training Package with 2-day Training plus Conference Re

RE: ArrayIndexOutOfBoundsException with facet query

2011-04-11 Thread Burton-West, Tom
Thanks Mike, With the unpatched version, the first time I run the facet query on topicStr it works fine, but the second time I get the ArrayIndexOutOfBoundsException. If I try different facets such as language, I don't see the same symptoms. Maybe the number of facet values needs to exceed s

Re: ArrayIndexOutOfBoundsException with facet query

2011-04-11 Thread Michael McCandless
Right, it's the total number of terms across all fields... unfortunately. This class is used to enroll a term into the terms cache that wraps the terms dictionary, so in theory you could also hit this issue during normal searching when a term is looked up once, and then looked up again (the 2nd t

RE: Problems indexing very large set of documents

2011-04-11 Thread Brandon Waterloo
I found a simpler command-line method to update the PDF files. On some documents it does so perfectly: the result is a pixel-for-pixel match and none of the OCR text (which is what all these PDFs are, newspaper articles that have been passed through OCR) is lost. However, on other documents the

RE: ArrayIndexOutOfBoundsException with facet query

2011-04-11 Thread Burton-West, Tom
Thanks Mike, At first I thought this couldn't be related to the 2.1 Billion terms issue since the only place we have tons of terms is in the OCR field and this is not the OCR field. But then I remembered that the total number of terms in all fields is what matters. We've had no problems with re

Clarifying "fetchindex" command

2011-04-11 Thread Otis Gospodnetic
Hi, Can one actually *force* replication of the index from the master without a commit being issued on the master since the last replication? I do see "Force a fetchindex on slave from master command: http://slave_host:port/solr/replication?command=fetchindex" on http://wiki.apache.org/solr/S

Re: Performance with search terms starting and ending with wildcards

2011-04-11 Thread Otis Gospodnetic
Hi, Perhaps you should give Lucene/Solr trunk a try and compare! The Wildcard query in trunk should be much faster. Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/ - Original Message > From: Ueland > To: solr

Re: Can I set up a config-based distributed search

2011-04-11 Thread Jonathan Rochkind
I have not worked with shards/distributed, but I think you can probably specify them as defaults in your requesthandler in your solrconfig.xml instead. Somewhere there is (or was) a wiki page on this I can't find right now. There's a way to specify (for a particular request handler) a default
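A sketch of what such a request-handler default could look like, generated with Python's ElementTree so the structure can be checked. The handler name `/distrib` and the shard addresses are hypothetical; the layout assumes the usual solrconfig.xml convention of a `requestHandler` element containing a `lst name="defaults"` block:

```python
import xml.etree.ElementTree as ET


def handler_with_default_shards(name, shards):
    """Build a <requestHandler> element whose defaults include a shards list."""
    handler = ET.Element("requestHandler", {"name": name, "class": "solr.SearchHandler"})
    defaults = ET.SubElement(handler, "lst", {"name": "defaults"})
    shards_param = ET.SubElement(defaults, "str", {"name": "shards"})
    shards_param.text = ",".join(shards)  # comma-separated, as in the shards request parameter
    return handler


handler = handler_with_default_shards("/distrib", ["host1:8983/solr", "host2:8983/solr"])
print(ET.tostring(handler, encoding="unicode"))
```

With a default like this in place, clients could query the handler without passing `shards` on every request, which is the behavior Ran was asking for.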

Reloading synonyms.txt without downtime

2011-04-11 Thread Otis Gospodnetic
Hi, Apparently, when one RELOADs a core, the synonyms file is not reloaded. Is this the expected behaviour? Is it the desired behaviour? Here's the use-case: When one is doing purely query-time synonym expansion, ideally one would be able to edit synonyms.txt and get it reloaded, so that

Re: Indexing Best Practice

2011-04-11 Thread Shaun Campbell
If it's of any help I've split the processing of PDF files from the indexing. I put the PDF content into a text file (but I guess you could load it into a database) and use that as part of the indexing. My processing of the PDF files also compares timestamps on the document and the text file so th

Re: Is there a way to create multiple using DIH and access the data pertaining to a particular ?

2011-04-11 Thread Mike
Hi All, I am new to Solr. I want to implement Solr search. I have to implement two search buttons (1. books and 2. computers; both are in the same datasource) which are completely different, with no relation to each other. Could you please let me know how to define the entities in data-con

Re: Can I set up a config-based distributed search

2011-04-11 Thread lboutros
You can add to your search handler the "shards" parameter : host1/solr, host2/solr Is this what you are looking for? Ludovic. 2011/4/11 Ran Peled [via Lucene] < ml-node+2806331-346788257-383...@n3.nabble.com> > In the Distributed Search page ( > http://wiki.apache.org/solr/DistributedSearc

Re: ArrayIndexOutOfBoundsException with facet query

2011-04-11 Thread Michael McCandless
Tom, I think I see where this may be -- it looks like another > 2B terms bug in Lucene (we are using an int instead of a long in the TermInfoAndOrd class inside TermInfosReader.java), only present in 3.1. I'm also mad that Test2BTerms fails to catch this!! I will go fix that test and confirm it
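The general failure mode of counting past two billion in a 32-bit signed integer can be shown in a few lines of Python. This illustrates the class of bug only; it is not the actual TermInfosReader code:

```python
import ctypes

INT_MAX = 2**31 - 1  # largest value a 32-bit signed int can hold


def as_int32(value):
    """Interpret value as a 32-bit signed integer, wrapping like Java's int."""
    return ctypes.c_int32(value).value


term_ord = INT_MAX + 1      # one past the limit: term number 2,147,483,648
print(as_int32(INT_MAX))    # still fits
print(as_int32(term_ord))   # wraps around to a negative number
```

A wrapped, negative value used as an array index is one plausible way to end up with an ArrayIndexOutOfBoundsException, and a 64-bit long avoids the wraparound.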

Can I set up a config-based distributed search

2011-04-11 Thread Ran Peled
In the Distributed Search page ( http://wiki.apache.org/solr/DistributedSearch), it is documented that in order to perform a distributed search over a sharded index, I should use the "shards" request parameter, listing the shards to participate in the search (e.g. ?shards=localhost:8983/solr,localh

XML not coming through from nabble to Gmail

2011-04-11 Thread Erick Erickson
All: Lately I've been seeing a lot of posts where people paste in parts of their schema.xml or solrconfig.xml and the results are...er...disappointing. None of the less-than or greater-than symbols show and the formatting is all over the map. Since some mails would come through with the XML forma

Re: Spellchecker with synonyms

2011-04-11 Thread royr
Yes, it looks like this: will work on query and index time i think.

Re: Spellchecker with synonyms

2011-04-11 Thread lboutros
Did you configure synonyms for your field at query time ? Ludovic. 2011/4/11 royr [via Lucene] > Hello, > > I have some synonyms for city names. Sometimes there are multiple names for > one city, example:. > > newyork, newyork city, big apple > > I search for "big apple" and get results with ne

Spellchecker with synonyms

2011-04-11 Thread royr
Hello, I have some synonyms for city names. Sometimes there are multiple names for one city, for example: newyork, newyork city, big apple I search for "big apple" and get results with new york (synonym). If somebody searches for "big aple" I want a spelling suggestion like: big apple. How can I fix th

Re: Extracting contents of zipped files with Tika and Solr 1.4.1

2011-04-11 Thread Gary Taylor
Jayendra, Thanks for the info - been keeping an eye on this list in case this topic cropped up again. It's currently a background task for me, so I'll try and take a look at the patches and re-test soon. Joey - glad you brought this issue up again. I haven't progressed any further with it.

Re: Tika, Solr running under Tomcat 6 on Debian

2011-04-11 Thread Mike
Hi Roy, Thank you for the quick reply. When I tried to index the PDF file I was able to see the response: 0 479 Query: http://localhost:8080/solr/update/extract?stream.file=D:\mike\lucene\apache-solr-1.4.1\example\exampledocs\Struts%202%20Design%20and%20Programming1.pdf&stream.contentType=app

RE: Solr under Tomcat

2011-04-11 Thread Mike
Hi All, I have installed a Solr instance on tomcat6. When I tried to index the PDF file I was able to see the response: 0 479 Query: http://localhost:8080/solr/update/extract?stream.file=D:\mike\lucene\apache-solr-1.4.1\example\exampledocs\Struts%202%20Design%20and%20Programming1.pdf&stream.cont

Re: DIH: Enhance XPathRecordReader to deal with //body(FLATTEN=true) and //body/h1

2011-04-11 Thread karsten-solr
Hi Lance, you are right: XPathEntityProcessor has the attribute "xsl", so I can use XSLT to generate an XML file "in the form of the standard Solr update schema". I will check the performance of this. Best regards Karsten btw. "flatten" is an attribute of the "field"-Tag, not of XPathEntityP

Re: Tika, Solr running under Tomcat 6 on Debian

2011-04-11 Thread Roy Liu
\apache-solr-3.1.0\contrib\extraction\lib\tika*.jar -- Best Regards, Roy Liu On Mon, Apr 11, 2011 at 3:10 PM, Mike wrote: > Hi All, > > I have the same issue. I have installed solr instance on tomcat6. When try > to index pdf I am running into the below exception: > > 11 Apr, 2011 12:11:55 PM

Re: How to index PDF file stored in SQL Server 2008

2011-04-11 Thread Roy Liu
I changed data-config-sql.xml to There are no errors, but the indexed PDF is converted to numbers: 200 1 202 1 203 1 212 1 222 1 236 1 242 1 244 1 254 1 255 -- Best Regards, Roy Liu On Mon, Apr 11, 2011 at 2:02 PM, Roy Liu wrote: >

Re: Solr 3.1 performance compared to 1.4.1

2011-04-11 Thread Marius van Zwijndregt
Hi Yonik ! Thanks for your reply. I decided to switch to 3.1 and see if the performance would settle down after building up a proper index. Looking at the average response time from both installations i can see that 3.1 is now actually performing much better than 1.4.1 (1.4.1 shows an average of

Re: Tika, Solr running under Tomcat 6 on Debian

2011-04-11 Thread Mike
Hi All, I have the same issue. I have installed a Solr instance on tomcat6. When I try to index a PDF I run into the exception below: 11 Apr, 2011 12:11:55 PM org.apache.solr.common.SolrException log SEVERE: java.lang.NoClassDefFoundError: org/apache/tika/exception/TikaException at java.