Re: Problem with suggest search

2010-03-16 Thread David Rühr
Thank you. This works well as a workaround. Yesterday I got the tip to check for a wrong solrconfig.xml, and that was right. When we uploaded our files, the solrconfig.xml was LOST ;-) Is it possible to start Java in debug mode for more info? David On 16.03.2010 02:02, Tom Hill wrote: You need a

Re: AutoSuggest

2010-03-16 Thread Suram
Shalin Shekhar Mangar wrote: On Sat, Mar 13, 2010 at 9:30 AM, Suram reactive...@yahoo.com wrote: Erick Erickson wrote: Did you commit your changes? Erick On Fri, Mar 12, 2010 at 7:38 AM, Suram reactive...@yahoo.com wrote: Can set my index fields for auto Suggestion,

Re: How to get Term Positions?

2010-03-16 Thread Grant Ingersoll
If you're going to spend time mucking w/ TermPositions, you should just spend your time working with SpanQuery, as that is what I understand you to be asking about. AIUI, you want to be able to get at the positions in the document where the query matched. This is exactly what a SpanQuery and

Re: Spatial search in Solr 1.5

2010-03-16 Thread Grant Ingersoll
On Mar 15, 2010, at 11:36 AM, Jean-Sebastien Vachon wrote: Hi All, I'm trying to figure out how to perform spatial searches using Solr 1.5 (from the trunk). Is the support for spatial search built-in? Almost. Main thing missing right now is filtering. There are still ways to do

solr.WordDelimiterFilterFactory problem with hyphenated terms?

2010-03-16 Thread Demian Katz
This is my first post on this list -- apologies if this has been discussed before; I didn't come upon anything exactly equivalent in searching the archives via Google. I'm using Solr 1.4 as part of the VuFind application, and I just noticed that searches for hyphenated terms are failing in

DIH request parameters

2010-03-16 Thread Lukas Kahwe Smith
Hi, According to the wiki it's possible to pass parameters to the DIH: http://wiki.apache.org/solr/DataImportHandler#Accessing_request_parameters I assume they are just being replaced via simple string replacements, which is exactly what I need. Can they also be in all places, even attributes

SQL and $deleteDocById

2010-03-16 Thread Lukas Kahwe Smith
Hi, I am trying to use $deleteDocById to delete rows based on an SQL query in my db-data-config.xml. The following tag is a top-level tag in the document tag. <entity name="company_del" query="SELECT e.id AS `$deleteDocById` FROM deletedentity AS e"/> However, it seems like it's only fetching

Re: PDF extraction leads to reversed words

2010-03-16 Thread Abdelhamid ABID
Hi again, I just tried version 1.5-dev from Solr trunk. After applying the patch you provided and adding icu4j-3_8_1 to the classpath, the results are quite different than before. Now words and texts are not reversed and are displayed correctly, except some PDF files' text parts

Switching data dir on the fly

2010-03-16 Thread schmax
I generate a Solr index on a Hadoop cluster and I want to copy it from HDFS to a server running Solr. I wish to copy the index to a different disk than the one the Solr instance is using, then tell the Solr server to switch from the current data dir to the location where I copied the Hadoop

Stemming suggestions

2010-03-16 Thread blargy
Most of our documents will be in English, but not all, and we are certainly in the process of acquiring more international content. Does anyone have any experience using all of the different stemmers for languages of unknown origin? Which ones perform the best? Give the most relevant results? What

Re: LucidWorks Solr

2010-03-16 Thread Kevin Osborn
I used it mostly for KStemmer, but I also liked the fact that it included about a dozen or so stable patches since Solr 1.4 was released. We just use the included WAR in our project however. We don't use the installer or anything like that. From: blargy

Re: LucidWorks Solr

2010-03-16 Thread AJ Chen
I'm trying it out right now. I hope it will work well out-of-box for indexing/searching a set of documents with frequent update. -aj On Tue, Mar 16, 2010 at 11:52 AM, blargy zman...@hotmail.com wrote: Has anyone used this?: http://www.lucidimagination.com/Downloads/LucidWorks-for-Solr Other

Re: Stemming suggestions

2010-03-16 Thread Erick Erickson
If you search the mail archive, you'll find many discussions of multilingual indexing/searching that'll provide you a plethora of information. But the synopsis as I remember is that using a single stemmer for multiple languages is generally a bad idea. Best, Erick On Tue, Mar 16, 2010 at

Moving From Oracle Text Search To Solr

2010-03-16 Thread Neil Chaudhuri
I am working on an application that currently hits a database containing millions of very large documents. I use Oracle Text Search at the moment, and things work fine. However, there is a request for faceting capability, and Solr seems like a technology I should look at. Suffice to say I am

Re: LucidWorks Solr

2010-03-16 Thread blargy
Kevin, When you say you just included the war you mean the /packs/solr.war correct? I see that the KStemmer is nicely packed in there but I don't see LucidGaze anywhere. Have you had any experience using this? So I'm guessing you would suggest using the LucidWorks solr.war over the

Re: Moving From Oracle Text Search To Solr

2010-03-16 Thread Erick Erickson
Why do you think you'd hit OOM errors? How big is very large? I've indexed, as a single document, a 26-volume encyclopedia of Civil War records. Although as much as I like the technology, if I could get away without using two technologies, I would. Are you completely sure you can't get what

XML data in solr field

2010-03-16 Thread Nair, Manas
Hello Experts, I need help with this issue of mine. I am unsure if this scenario is possible. I have a field in my Solr document named inputxml, the value of which is an XML string as below. This XML structure is within the inputxml field value. I need help searching this XML structure, i.e.

Re: LucidWorks Solr

2010-03-16 Thread Kevin Osborn
For my purposes, the Porter analyzer was overly aggressive with stemming. So, we then moved to KStem. It looks like this is no longer being maintained and Lucid claimed much better performance with theirs, so I gave that a try and it seems to be working fine. I didn't do any benchmarks though.

Re: Moving From Oracle Text Search To Solr

2010-03-16 Thread Glen Newton
I've also indexed a concatenation of 50k journal articles (making a single document of several hundred MB of text) and it did not give me an OOM. -glen On 16 March 2010 15:57, Erick Erickson erickerick...@gmail.com wrote: Why do you think you'd hit OOM errors? How big is very large? I've

PDFBox/Tika Performance Issues

2010-03-16 Thread Giovanni Fernandez-Kincade
I've been trying to bulk index about 11 million PDFs, and while profiling our Solr instance, I noticed that all of the threads that are processing indexing requests are constantly blocking each other during this call: http-8080-Processor39 [BLOCKED] CPU time: 9:35

Re: Moving From Oracle Text Search To Solr

2010-03-16 Thread Smiley, David W.
If you do stay with Oracle, please report back to the list how that went. In order to get decent filtering and faceting performance, I believe you will need to use bitmapped indexes which Oracle and some other databases support. You may want to check out my article on this subject:

Re: XML data in solr field

2010-03-16 Thread Tommy Chheng
Do you have the option of just importing each xml node as a field/value when you add the document? That'll let you do the search easily. If you need to store the raw XML, you can use an extra field. Tommy Chheng Programmer and UC Irvine Graduate Student Twitter @tommychheng
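
The suggestion above can be sketched roughly as follows. This is only an illustration: the original XML structure is not shown in the archive, so SAMPLE_XML and its tag names are hypothetical, and only direct children of the root element are turned into fields. The raw XML is kept in an extra field named inputxml, as in the question.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; the real structure is not shown in the archive.
SAMPLE_XML = (
    "<record><title>Solr in practice</title>"
    "<author>Jane Doe</author><year>2010</year></record>"
)

def xml_to_fields(xml_string, raw_field="inputxml"):
    """Flatten the direct children of the root element into field/value lists."""
    root = ET.fromstring(xml_string)
    fields = {}
    for child in root:
        text = (child.text or "").strip()
        if text:
            # Repeated tags become multi-valued fields.
            fields.setdefault(child.tag, []).append(text)
    # Keep the raw XML in an extra field in case it must be stored as-is.
    fields[raw_field] = [xml_string]
    return fields

doc = xml_to_fields(SAMPLE_XML)
```

The resulting dictionary could then be sent to Solr as one document with whatever client is in use; each tag name would need a matching field in the schema.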

Solr RAM Requirements

2010-03-16 Thread KaktuChakarabati
Hey, I am trying to understand what kind of calculation I should do in order to come up with reasonable RAM size for a given solr machine. Suppose the index size is at 16GB. The Max heap allocated to JVM is about 12GB. The machine I'm trying now has 24GB. When the machine is running for a while
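
As a rough back-of-the-envelope sketch of the numbers in this question: whatever RAM is not given to the JVM heap is the most that the operating system could use for caching index files. The function below is a hypothetical helper, and it deliberately ignores memory used by the OS itself, other processes, and JVM non-heap overhead, so the real figure would be lower.

```python
def page_cache_budget_gb(total_ram_gb, jvm_heap_gb, index_size_gb):
    """Return (RAM left outside the heap, fraction of the index that could fit in it)."""
    left_over = total_ram_gb - jvm_heap_gb
    # Cap at 1.0: more spare RAM than index means the whole index could be cached.
    fraction = min(1.0, left_over / index_size_gb)
    return left_over, fraction

# Figures from the question: 24 GB machine, 12 GB max heap, 16 GB index.
left_over, fraction = page_cache_budget_gb(24, 12, 16)
```

With these figures, at most 12 GB remains outside the heap, which is about three quarters of the 16 GB index.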

Re: PDFBox/Tika Performance Issues

2010-03-16 Thread Grant Ingersoll
Hmm, that is an ugly thing in PDFBox. We should probably take this over to the PDFBox project. How many threads are you indexing with? FWIW, for that many documents, I might consider using Tika on the client side to save on a lot of network traffic. -Grant On Mar 16, 2010, at 4:37 PM,

RE: Moving From Oracle Text Search To Solr

2010-03-16 Thread Neil Chaudhuri
That is a great article, David. For the moment, I am trying an all-Solr approach, but I have run into a small problem. The documents are stored as XML CLOB's using Oracle's OPAQUE object. Is there any facility to unpack this into the actual text? Or must I execute that in the SQL query?

RE: PDFBox/Tika Performance Issues

2010-03-16 Thread Giovanni Fernandez-Kincade
Originally 16 (the number of CPUs on the machine), but even with 5 threads it's not looking so hot. -Original Message- From: Grant Ingersoll [mailto:gsi...@gmail.com] On Behalf Of Grant Ingersoll Sent: Tuesday, March 16, 2010 5:15 PM To: solr-user@lucene.apache.org Subject: Re:

Re: PDFBox/Tika Performance Issues

2010-03-16 Thread Mattmann, Chris A (388J)
Guys, I think this is an issue with PDFBOX and the version that Tika 0.6 depends on. Tika 0.7-trunk upgraded to PDFBox 1.0.0 (see [1]), so it may include a fix for the problem you're seeing. See this discussion [2] on how to patch Tika to use the new PDFBox if you can't wait for the 0.7

Re: Trouble Implementing Extracting Request Handler

2010-03-16 Thread Lance Norskog
NoClassDefFoundError usually means that the class was found, but it needs other classes and those were not found. That is, Solr finds the ExtractingRequestHandler jar but cannot find the Tika jars. In example/solr/conf/solrconfig.xml, there are several <lib dir="path"/> elements. These give

Re: DIH request parameters

2010-03-16 Thread Lance Norskog
They are a namespace like other namespaces and are useable in attributes, just like in the DB query string examples. As to defaults, you can declare those in the requestHandler declarations in solrconfig.xml. Examples of this (search for defaults) in the wiki page. On Tue, Mar 16, 2010 at 7:05

RE: PDFBox/Tika Performance Issues

2010-03-16 Thread Giovanni Fernandez-Kincade
I'm pretty unclear on how to patch the Tika 0.7-trunk on our Solr instance. This is what I've tried so far (which was really just me guessing): 1. Got the latest version of the trunk code from http://svn.apache.org/repos/asf/lucene/tika/trunk 2. Built this using Maven (mvn install)

Undefined field price on Dismax query

2010-03-16 Thread Alex Thurlow
Hi guys, Based on some suggestions, I'm trying to use the dismax query type. I'm getting a weird error though that I think is related to the default test data set. From the query tool (/solr/admin/form.jsp), I put in this: Statement: artist:test title:test +type:video query type: dismax

Re: Moving From Oracle Text Search To Solr

2010-03-16 Thread Lance Norskog
The DataImportHandler has tools for this. It will fetch rows from Oracle and allow you to unpack columns as XML with Xpaths. http://wiki.apache.org/solr/DataImportHandler http://wiki.apache.org/solr/DataImportHandler#Usage_with_RDBMS

Indexing CLOB Column in Oracle

2010-03-16 Thread Neil Chaudhuri
Since my original thread was straying to a new topic, I thought it made sense to create a new thread of discussion. I am using the DataImportHandler to index 3 fields in a table: an id, a date, and the text of a document. This is an Oracle database, and the document is an XML document stored

Re: Trouble Implementing Extracting Request Handler

2010-03-16 Thread Steve Reichgut
Lance, I tried that but no luck. Just in case the relative paths were causing a problem, I also tried using absolute paths but neither seemed to help. First, I tried adding <lib dir="/path/to/example/solr/lib" /> as the full directory so it would hopefully include everything. When that didn't

Re: Indexing CLOB Column in Oracle

2010-03-16 Thread Shawn Heisey
Disclaimer: My Oracle experience is minuscule at best. I am also a beginner at Solr, so grab yourself the proverbial grain of salt. I googled a bit on CLOB. One page I found mentioned setting up a view to return the data type you want. Can you use the functions described on these pages in

Re: Solr RAM Requirements

2010-03-16 Thread Peter Sturge
On Tue, Mar 16, 2010 at 9:08 PM, KaktuChakarabati jimmoe...@gmail.com wrote: Hey, I am trying to understand what kind of calculation I should do in order to come up with reasonable RAM size for a given solr machine. Suppose the index size is at 16GB. The Max heap allocated to JVM is about

Re: Undefined field price on Dismax query

2010-03-16 Thread Erick Erickson
I suspect your problem is that you still have price defined in solrconfig.xml for the dismax handler. Look for the section <requestHandler name="dismax" ...>. You'll see price defined as one of the default fields for fl and bf. HTH Erick On Tue, Mar 16, 2010 at 6:55 PM, Alex Thurlow

Re: Moving From Oracle Text Search To Solr

2010-03-16 Thread Erick Erickson
Besides the other notes here, I agree you'll hit OOM if you try to read all the rows into memory at once, but I'm absolutely sure you can read them N at a time instead. Not that I could tell you how, mind you. You're on your way... Erick On Tue, Mar 16, 2010 at 4:13 PM, Neil Chaudhuri
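
One generic way to read rows N at a time is the fetchmany call that Python's DB-API cursors provide. The sketch below is not from the thread and is not Oracle-specific: it uses an in-memory SQLite table as a stand-in, and the table name, column names, and batch size are all made up for illustration.

```python
import sqlite3

def iter_batches(cursor, batch_size):
    """Yield lists of at most batch_size rows until the result set is exhausted."""
    while True:
        rows = cursor.fetchmany(batch_size)
        if not rows:
            break
        yield rows

# Stand-in database: ten small rows in an in-memory SQLite table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, body TEXT)")
conn.executemany(
    "INSERT INTO docs (body) VALUES (?)",
    [(f"document {i}",) for i in range(10)],
)

cursor = conn.execute("SELECT id, body FROM docs ORDER BY id")
# Only one batch of rows is held in memory at a time.
batch_sizes = [len(batch) for batch in iter_batches(cursor, 4)]
conn.close()
```

Each batch could be indexed and discarded before the next one is fetched, so memory use would depend on the batch size rather than on the table size.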

Re: Undefined field price on Dismax query

2010-03-16 Thread Alex Thurlow
Aha. That appears to be the issue. I hadn't realized that the query handler had all of those definitions there. -Alex On 3/16/2010 6:56 PM, Erick Erickson wrote: I suspect your problem is that you still have price defined in solrconfig.xml for the dismax handler. Look for the section

Solr query parser doesn't invoke analyzer for simple term query?

2010-03-16 Thread Teruhiko Kurosaka
It seems that Solr's query parser doesn't pass a single term query to the Analyzer for the field. For example, if I give it 2001年 (year 2001 in Japanese), the searcher returns 0 hits but if I quote them with double-quotes, it returns hits. In this experiment, I configured schema.xml so that the

problem during benchmarking solr query

2010-03-16 Thread KshamaPai
Hi, I am using autobench to benchmark Solr with the query http://localhost:8983/solr/select/?q=body:hotel AND _val_:recip(hsin(0.7113258,-1.291311553,lat_rad,lng_rad,30),1,1,0)^100 But if I specify the same in the autobench command as autobench --file bar1.tsv --high_rate 100 --low_rate 20

Re: Solr RAM Requirements

2010-03-16 Thread Peter Sturge
There are certainly a number of widely varying opinions on the use of RAM directory. Basically, though, if you need the index to be persistent at some point (i.e. saved across reboots, crashes etc.), you'll need to write to a disk, so RAM directory becomes somewhat superfluous in this case.

Stopwords

2010-03-16 Thread blargy
I was reading Scaling Lucene and Solr (http://www.lucidimagination.com/Community/Hear-from-the-Experts/Articles/Scaling-Lucene-and-Solr/) and I came across the section StopWords. In there it mentioned that it's not recommended to remove stop words at index time. Why is this the case? Don't all

APR setup

2010-03-16 Thread blargy
[java] INFO: The APR based Apache Tomcat Native library which allows optimal performance in production environments was not found on the java.library.path: .:/Library/Java/Extensions:/System/Library/Java/Extensions:/usr/lib/java What the heck is this and why is it recommended for production

Re: Trouble Implementing Extracting Request Handler

2010-03-16 Thread Lance Norskog
org/apache/solr/util/plugin/SolrCoreAware in the stack trace refers to an interface in the main Solr jar. I think this means that putting all of the libs in apache-tomcat-6.0.20/lib is a mistake: the classloader finds ExtractingRequestHandler in

spanish solr tutorial

2010-03-16 Thread Juan Pedro Danculovic
Hi all, we translated the Solr tutorial to Spanish due to a client's request. For all you Spanish speakers/readers out there, you can have a look at it: http://www.linebee.com/?p=155 We hope this can expand the usage of the project and lower the language barrier for non-English speakers. Thanks

Re: APR setup

2010-03-16 Thread Lance Norskog
That would be a Tomcat question :) On Tue, Mar 16, 2010 at 8:36 PM, blargy zman...@hotmail.com wrote: [java] INFO: The APR based Apache Tomcat Native library which allows optimal performance in production environments was not found on the java.library.path:

Re: problem during benchmarking solr query

2010-03-16 Thread Lance Norskog
Use a + sign or %20 for the space. The URL standard uses a plus to mean a space. On Tue, Mar 16, 2010 at 6:06 PM, KshamaPai kshamapai2...@gmail.com wrote: Hi, Am using autobench to benchmark solr with the query http://localhost:8983/solr/select/?q=body:hotel AND
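
The advice above can be sketched with Python's standard urllib.parse module, applied to the query string from the original question. The build_url helper is a hypothetical name; note that this version also percent-encodes the colons, parentheses, commas, and caret, not just the spaces.

```python
from urllib.parse import quote, quote_plus, unquote_plus

BASE_URL = "http://localhost:8983/solr/select/"
RAW_QUERY = (
    "body:hotel AND _val_:recip(hsin(0.7113258,-1.291311553,"
    "lat_rad,lng_rad,30),1,1,0)^100"
)

def build_url(query, plus_for_space=True):
    """Encode the q parameter, writing each space as '+' or as '%20'."""
    encode = quote_plus if plus_for_space else quote
    # safe="" makes both functions escape every reserved character.
    return f"{BASE_URL}?q={encode(query, safe='')}"

url_with_plus = build_url(RAW_QUERY)
url_with_percent = build_url(RAW_QUERY, plus_for_space=False)
```

Either resulting URL contains no literal spaces, so it should be safe to place in a benchmark input file as a single token.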

Re: PDFBox/Tika Performance Issues

2010-03-16 Thread Mattmann, Chris A (388J)
Hi Giovanni, Comments below: I'm pretty unclear on how to patch the Tika 0.7-trunk on our Solr instance. This is what I've tried so far (which was really just me guessing): 1. Got the latest version of the trunk code from http://svn.apache.org/repos/asf/lucene/tika/trunk 2.

Re: field length normalization

2010-03-16 Thread Lance Norskog
You need to change your similarity object to be more sensitive at the short end. This is a patch about how to do this: http://issues.apache.org/jira/browse/LUCENE-2187 It involves Lucene coding. On Fri, Mar 12, 2010 at 3:19 AM, muneeb muneeba...@hotmail.com wrote:  Ah I see. Thanks very much

Issue in search

2010-03-16 Thread Suram
In Solr, how can I perform AND, OR, and NOT searches while querying the data?

Re: Solr RAM Requirements

2010-03-16 Thread Dennis Gearon
Just turn your entire disk to RAM http://www.hyperossystems.co.uk/ 800X faster. Who cares if it swaps to 'disk' then :-) Dennis Gearon