Re: DataimportHandler development issue

2011-01-14 Thread Gora Mohanty
On Fri, Jan 14, 2011 at 12:17 AM, Derek Werthmuller dwert...@ctg.albany.edu wrote: It's not clear why it's not working. Advice? Also, is this the best way to load data? We intend to load several thousand DocBook documents once we understand how this all works. We stuck with the rss/atom

Re: Improving Solr performance

2011-01-14 Thread supersoft
The tests are performed with a self-made program. The arguments are the number of threads and the path to a file which contains available queries (in the last test only one). When each thread is created, it gets the current date (in millisecs), and when it gets the response from the query, the
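The program itself isn't shown, so here is only a minimal Python sketch of the measurement described: N threads, each sending one query from the file and recording its latency. `run_query` is a hypothetical placeholder for whatever client call the real program makes. One design point worth checking in any such tool: take the start time immediately before sending the request rather than when the thread is created, otherwise thread start-up and scheduling delays get counted as query time.

```python
import threading
import time
from typing import Callable, List


def run_load_test(run_query: Callable[[str], object],
                  queries: List[str],
                  num_threads: int) -> List[float]:
    """Start num_threads threads; each sends one query and records how long
    the call took, in milliseconds. Returns all recorded latencies."""
    latencies: List[float] = []
    lock = threading.Lock()  # guard the shared list explicitly

    def worker(i: int) -> None:
        query = queries[i % len(queries)]  # cycle through the queries from the file
        # Start timing right before the request, not at thread creation.
        start = time.perf_counter()  # monotonic clock, unaffected by wall-clock changes
        run_query(query)
        elapsed_ms = (time.perf_counter() - start) * 1000.0
        with lock:
            latencies.append(elapsed_ms)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # wait until every thread has recorded its latency
    return latencies
```

For example, `run_load_test(lambda q: time.sleep(0.05), ["title:solr"], 8)` returns eight latencies of roughly 50 ms each; swapping in a real HTTP call gives per-request numbers that exclude thread start-up cost.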

Re: Solr 4.0 = Spatial Search - How to

2011-01-14 Thread Stefan Matheis
caman, how did you try to concat them? Perhaps some typecasting would do the trick? Stefan On Fri, Jan 14, 2011 at 7:20 AM, caman aboxfortheotherst...@gmail.com wrote: Thanks. Here was the issue: concatenating 2 floats (lat, lng) at the MySQL end converted it to a BLOB. Indexing would fail in

Re: Dismax, Sharding and Elevation

2011-01-14 Thread Oliver Marahrens
Hi, thank you for your reply, Grijesh. But Elevation in general works with sharding - if I used the Standard Request Handler instead of Dismax. I just wonder how (or if) it could work also with dismax. I think it's not a problem of distributed search, but one of dismax (perhaps combined with

Solr and Ping PHP

2011-01-14 Thread stockii
Hello. I am using NRT, and for each search request, update request and commit request (on the search instance) I start a ping to Solr with an httpRequest. But sometimes the ping isn't OK, even though Solr is available. Why can't Solr answer the ping when it is doing something like a commit on my searcher, or when a
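The original client is PHP; the following is a Python sketch for illustration only. It assumes the ping handler is mapped at `/admin/ping` (the path used in Solr's example configuration) and that `wt=json` is available. One plausible reason for failed pings while Solr is otherwise up is a client timeout shorter than the time the ping query takes during a commit or searcher warm-up; that is a guess, not a diagnosis, so the sketch uses a generous timeout and a few retries before reporting Solr as down.

```python
import http.client
import json
import time
import urllib.request


def parse_ping(body: bytes) -> bool:
    """Return True if a ping response body (requested with wt=json) says status OK."""
    try:
        data = json.loads(body.decode("utf-8"))
    except ValueError:  # covers JSONDecodeError and UnicodeDecodeError
        return False  # e.g. an HTML error page instead of JSON
    return isinstance(data, dict) and data.get("status") == "OK"


def ping_solr(base_url: str, timeout: float = 10.0, retries: int = 3,
              pause: float = 1.0) -> bool:
    """Call /admin/ping, retrying a few times before reporting Solr as down."""
    url = base_url.rstrip("/") + "/admin/ping?wt=json"
    for attempt in range(retries):
        try:
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                if parse_ping(resp.read()):
                    return True
        except (OSError, http.client.HTTPException):
            # URLError, HTTPError and socket timeouts are OSError subclasses;
            # a ping sent during a commit or warm-up may simply be slow.
            pass
        if attempt < retries - 1:
            time.sleep(pause)  # give a busy instance a moment before retrying
    return False
```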

Re: Searchers and Warmups

2011-01-14 Thread Savvas-Andreas Moysidis
Hi David, maybe the wiki page on caching could be helpful: http://wiki.apache.org/solr/SolrCaching#newSearcher_and_firstSearcher_Event_Listeners Regards, - Savvas On 14 January 2011 00:08, David Cramer

Re: Searchers and Warmups

2011-01-14 Thread Tommaso Teofili
Hi David, The idea is that you can define some listeners which send a list of queries to an IndexSearcher. In particular, the firstSearcher event is related to the very first IndexSearcher being created inside the Solr instance, while the newSearcher is the event related to the creation of a new
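In solrconfig.xml these lists are configured as `<listener event="firstSearcher" class="solr.QuerySenderListener">` and `<listener event="newSearcher" class="solr.QuerySenderListener">` entries. The toy Python model below is not Solr code; it only illustrates *when* each list of warming queries runs, and the query strings are hypothetical.

```python
from typing import List, Tuple


class ToySearcherManager:
    """Toy model of Solr's two warming events:
    firstSearcher queries run only when the very first searcher is opened
    (e.g. at startup, when there is no old searcher to warm from);
    newSearcher queries run each time a later searcher replaces the current
    one (e.g. after a commit)."""

    def __init__(self, first_searcher_queries: List[str],
                 new_searcher_queries: List[str]) -> None:
        self.first_searcher_queries = first_searcher_queries
        self.new_searcher_queries = new_searcher_queries
        self.current = 0  # generation of the searcher serving requests; 0 = none yet
        self.warm_log: List[Tuple[int, str]] = []  # (searcher generation, query)

    def open_searcher(self) -> int:
        gen = self.current + 1
        event_queries = (self.first_searcher_queries if self.current == 0
                         else self.new_searcher_queries)
        for q in event_queries:
            # In Solr, a QuerySenderListener sends these to the new searcher to fill its caches.
            self.warm_log.append((gen, q))
        self.current = gen  # the new searcher takes over only after warming
        return gen
```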

Re: Solr 4.0 = Spatial Search - How to

2011-01-14 Thread Stefan Matheis
Absolutely no idea why it is a BLOB... but the following works as expected: CAST( CONCAT( lat, ',', lng ) AS CHAR ) HTH Stefan On Fri, Jan 14, 2011 at 9:31 AM, caman aboxfortheotherst...@gmail.com wrote: CONCAT(CAST(lat as CHAR),',',CAST(lng as CHAR))
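If the value is built in application code instead of in the MySQL query, the same idea (emit plain character text, not a binary value) looks like this Python sketch. It assumes the target is a LatLonType-style location field that parses "lat,lon" text; the six-digit precision is an arbitrary choice.

```python
def to_solr_latlon(lat: float, lng: float, digits: int = 6) -> str:
    """Format a coordinate pair as the plain "lat,lon" text a LatLonType field
    parses, similar in effect to CAST(CONCAT(lat, ',', lng) AS CHAR)."""
    if not -90.0 <= lat <= 90.0:
        raise ValueError(f"latitude out of range: {lat}")
    if not -180.0 <= lng <= 180.0:
        raise ValueError(f"longitude out of range: {lng}")
    # Fixed-point formatting avoids exponent notation such as 1e-05.
    return f"{lat:.{digits}f},{lng:.{digits}f}"
```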

Schema design FAQs/questions

2011-01-14 Thread Matthias Pigulla
Dear Solr-users, is there a compilation of FAQs particularly targeting schema design? I have two questions that have probably been asked before: - I have to map different kinds of documents into my schema. Some of these documents have one or more times/dates that might be relevant for

solr speed issues..

2011-01-14 Thread saureen
I am working on an application that requires fetching results from Solr based on a date parameter. Earlier I was using sharding to fetch the results, but that was making things too slow, so instead of sharding I queried three different cores with the same parameters and merged the results. Still
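The merge step isn't shown. A minimal Python sketch, assuming each core was queried with the same sort (date descending) and returned at least `rows` hits where it had that many, and that the field is named `date` (hypothetical): a k-way merge keeps the overall order without re-sorting everything. This is roughly what distributed search does when combining shard responses, so it does not remove the cost of querying each core.

```python
import heapq
import itertools
from typing import Any, Dict, List


def merge_core_results(per_core: List[List[Dict[str, Any]]], rows: int,
                       date_field: str = "date") -> List[Dict[str, Any]]:
    """Merge per-core result lists into one newest-first page of `rows` docs.

    Every input list must already be sorted newest-first on date_field, and
    must hold at least `rows` docs if that core has that many matches;
    otherwise the merged page can silently miss documents."""
    merged = heapq.merge(*per_core, key=lambda doc: doc[date_field], reverse=True)
    return list(itertools.islice(merged, rows))
```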

Query : FAQ? Forum?

2011-01-14 Thread Cathy Hemsley
Hi, I am trying to get Solr installed and working, and have some queries: is there a FAQ or a forum? How do I search to see whether someone has already asked my question and answered it? Regards Cathy

Re: Query : FAQ? Forum?

2011-01-14 Thread Stefan Matheis
What about http://search.lucidimagination.com/search/#/p:solr ? :) On Fri, Jan 14, 2011 at 12:45 PM, Cathy Hemsley cathy.hems...@converteam.com wrote: Hi, I am trying to get Solr installed and working: and have some queries: is there a FAQ or a Forum? How do I search to see whether

boilerpipe solr tika howto please

2011-01-14 Thread arnaud gaudinat
Hello, I would like to use BoilerPipe (a very good program which cleans HTML content of surplus clutter). I saw that BoilerPipe is inside Tika 0.8 and so should be accessible from Solr, am I right? How can I activate BoilerPipe in Solr? Do I need to change solrconfig.xml (with

Re: Problem with Tika and ExtractingRequestHandler (How to from lucidimagination)

2011-01-14 Thread Jörg Agatz
OK, now on the 4th test it works? OK... I don't know... it works. But now I have another problem: I can't send content to the server. When I send content to Solr I get: <html> <head> <meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1"/> <title>Error 400 </title> </head> <body><h2>HTTP

Re: segment gets corrupted (after background merge ?)

2011-01-14 Thread Michael McCandless
Right, but removing a segment out from under a live IW (when you run CheckIndex with -fix) is deadly, because that other IW doesn't know you've removed the segment, and will later commit a new segment infos still referencing that segment. The nature of this particular exception from CheckIndex is

Solr: using to index large folders recursively containing lots of different documents, and querying over the web

2011-01-14 Thread Cathy Hemsley
Hi Solr users, I hope you can help. We are migrating our intranet web site management system to Windows 2008 and need a replacement for Index Server to do the text searching. I am trying to establish if Lucene and Solr is a feasible replacement, but I cannot find the answers to these questions:

Re: Problem with Tika and ExtractingRequestHandler (How to from lucidimagination)

2011-01-14 Thread Stefan Matheis
Pass a value for your id field, as you already do for all the other fields? http://search.lucidimagination.com/search/document/ca95d06e700322ed/missing_required_field_id_using_extractingrequesthandler On Fri, Jan 14, 2011 at 12:59 PM, Jörg Agatz joerg.ag...@googlemail.com wrote: ok, now in
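In concrete terms: ExtractingRequestHandler lets you set a field with a `literal.<fieldname>` request parameter, so the required id can be passed as `literal.id`. A Python sketch, assuming the uniqueKey field is named `id` and the handler is registered at `/update/extract`; it only builds the URL, and the file still has to be sent as the request body.

```python
from urllib.parse import urlencode


def extract_url(base_url: str, doc_id: str, commit: bool = False) -> str:
    """Build an /update/extract URL that sets the uniqueKey field via literal.id.

    Only the URL is built here; the document itself still has to be sent
    as the request body (e.g. a multipart/form-data POST)."""
    params = [("literal.id", doc_id)]
    if commit:
        params.append(("commit", "true"))
    return base_url.rstrip("/") + "/update/extract?" + urlencode(params)
```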

Re: Solr: using to index large folders recursively containing lots of different documents, and querying over the web

2011-01-14 Thread Markus Jelsma
Please visit the Nutch project. It is a powerful crawler and can integrate with Solr. http://nutch.apache.org/ Hi Solr users, I hope you can help. We are migrating our intranet web site management system to Windows 2008 and need a replacement for Index Server to do the text searching. I

Re: Solr: using to index large folders recursively containing lots of different documents, and querying over the web

2011-01-14 Thread Toke Eskildsen
On Fri, 2011-01-14 at 13:05 +0100, Cathy Hemsley wrote: I hope you can help. We are migrating our intranet web site management system to Windows 2008 and need a replacement for Index Server to do the text searching. I am trying to establish if Lucene and Solr is a feasible replacement, but I

Re: Adding a new site to existing solr configuration

2011-01-14 Thread PeterKerk
Awesome! thx! :)

Re: Solr: using to index large folders recursively containing lots of different documents, and querying over the web

2011-01-14 Thread Markus Jelsma
Nutch can crawl the file system as well. Nutch 1.x can also provide search but this is delegated to Solr in Nutch 2.x. Solr can provide the search and Nutch can provide Solr with content from your intranet. On Friday 14 January 2011 13:17:52 Cathy Hemsley wrote: Hi, Thanks for suggesting

Is deduplication possible during Tika extract?

2011-01-14 Thread arnaud gaudinat
Hello, here is an excerpt of my solrconfig.xml: <requestHandler name="/update/extract" class="org.apache.solr.handler.extraction.ExtractingRequestHandler" startup="lazy"> <lst name="defaults"> <str name="update.processor">dedupe</str> <!-- All the main content goes into text... if you need to return
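Solr's dedupe chain is built on SignatureUpdateProcessorFactory, which hashes configured fields into a signature field (MD5Signature is one of the available implementations). The toy Python model below is not Solr code and the field names are hypothetical; it only illustrates signature-based deduplication, with a later duplicate replacing an earlier one in the spirit of `overwriteDupes=true`.

```python
import hashlib
from typing import Dict, Iterable, List


def signature(doc: Dict[str, str], fields: List[str]) -> str:
    """MD5 over the chosen fields, in order, as one hex signature."""
    h = hashlib.md5()
    for name in fields:
        h.update(doc.get(name, "").encode("utf-8"))
        h.update(b"\x00")  # separator, so ("ab", "c") and ("a", "bc") hash differently
    return h.hexdigest()


def dedupe(docs: Iterable[Dict[str, str]], fields: List[str]) -> List[Dict[str, str]]:
    """Keep one document per signature; a later duplicate replaces the earlier
    one (the replaced entry keeps its original position in the output)."""
    by_sig: Dict[str, Dict[str, str]] = {}
    for doc in docs:
        by_sig[signature(doc, fields)] = doc
    return list(by_sig.values())
```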

Re: segment gets corrupted (after background merge ?)

2011-01-14 Thread Stéphane Delprat
So I ran CheckIndex (without -fix) 5 times in a row: Solr was running, but no client was connected to it (just the slave, which was synchronizing every 5 minutes). Summary: 1: all good 2: 2 errors: (seg 1 2) terms, freq, prox...ERROR [term blog_id:104150: doc 324697 <= lastDoc 324697] terms,

LukeRequestHandler histogram?

2011-01-14 Thread Bernd Fehling
Dear list, what is the LukeRequestHandler histogram telling me? Couldn't find any explanation and would be pleased to have it explained. Many thanks in advance, Bernd

Re: LukeRequestHandler histogram?

2011-01-14 Thread Stefan Matheis
Hi Bernd, there is an explanation from Hoss: http://search.lucidimagination.com/search/document/149e7d25415c0a36/some_kind_of_crazy_histogram#b22563120f1ec32b HTH Stefan On Fri, Jan 14, 2011 at 3:15 PM, Bernd Fehling bernd.fehl...@uni-bielefeld.de wrote: Dear list, what is the

Re: LukeRequestHandler histogram?

2011-01-14 Thread Bernd Fehling
Hi Stefan, thanks a lot. Regards, Bernd On 14.01.2011 15:25, Stefan Matheis wrote: Hi Bernd, there is an explanation from Hoss: http://search.lucidimagination.com/search/document/149e7d25415c0a36/some_kind_of_crazy_histogram#b22563120f1ec32b HTH Stefan On Fri, Jan 14, 2011 at

Re: Query : FAQ? Forum?

2011-01-14 Thread kenf_nc
http://wiki.apache.org/solr/FrontPage (Solr Wiki), http://wiki.apache.org/solr/FAQ (Solr FAQ), http://www.amazon.com/Solr-1-4-Enterprise-Search-Server/dp/1847195881/ref=sr_1_1?ie=UTF8&qid=1295018231&sr=8-1 (a good book on Solr), and this forum you posted to

Re: boilerpipe solr tika howto please

2011-01-14 Thread Adam Estrada
Is there a drastic difference between this and TagSoup, which is already included in Solr? On Fri, Jan 14, 2011 at 6:57 AM, arnaud gaudinat arnaud.gaudi...@gmail.com wrote: Hello, I would like to use BoilerPipe (a very good program which cleans the html content from surplus clutter). I saw

Re: boilerpipe solr tika howto please

2011-01-14 Thread arnaud gaudinat
I just looked at TagSoup, and it seems to clean up bad HTML tags to produce a well-formed HTML file. What BoilerPipe does is try to eliminate HTML content that is not part of the useful content for a human reader (i.e. navigation, ads, comments...). Take a look here: http://boilerpipe-web.appspot.com/

Re: Improving Solr performance

2011-01-14 Thread Gora Mohanty
On Fri, Jan 14, 2011 at 1:56 PM, supersoft elarab...@gmail.com wrote: The tests are performed with a selfmade program. [...] May I ask what language the program is written in? The reason I ask is to eliminate the possibility that there is an issue with the threading model, e.g., if you

Re: boilerpipe solr tika howto please

2011-01-14 Thread Ken Krugler
Hi Arno, On Jan 14, 2011, at 3:57am, arnaud gaudinat wrote: Hello, I would like to use BoilerPipe (a very good program which cleans the html content from surplus clutter). I saw that BoilerPipe is inside Tika 0.8 and so should be accessible from solr, am I right? How I can Activate

Re: Variable datasources

2011-01-14 Thread tjpoe
I was actually able to figure this out using a slightly different method. Since the databases exist on the same server, I simply made a single datasource with no database selected: <datasource url="jdbc:mysql://localhost/" name="content" /> Then in the queries, I qualify using the full database
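The same approach in miniature: one connection with no default database, and every query naming `database.table`. The sketch swaps in SQLite's `ATTACH DATABASE` so it runs without a MySQL server; the database and table names are hypothetical. In the MySQL/DIH setup the equivalent is a single `dataSource` plus entity queries such as `SELECT ... FROM db_name.table_name`.

```python
import sqlite3
from typing import List, Tuple


def build_demo() -> sqlite3.Connection:
    """One connection stands in for one datasource with no database selected;
    two attached databases stand in for two MySQL databases on one server."""
    conn = sqlite3.connect(":memory:")
    for db in ("site_a", "site_b"):  # hypothetical database names
        conn.execute(f"ATTACH DATABASE ':memory:' AS {db}")
        conn.execute(f"CREATE TABLE {db}.items (id INTEGER PRIMARY KEY, title TEXT)")
    conn.execute("INSERT INTO site_a.items (id, title) VALUES (1, 'alpha')")
    conn.execute("INSERT INTO site_b.items (id, title) VALUES (1, 'beta')")
    return conn


def fetch_items(conn: sqlite3.Connection) -> List[Tuple[str, int, str]]:
    # Every table is qualified with its database name, like db.table in MySQL.
    sql = """
        SELECT 'site_a' AS src, id, title FROM site_a.items
        UNION ALL
        SELECT 'site_b' AS src, id, title FROM site_b.items
        ORDER BY src, id
    """
    return conn.execute(sql).fetchall()
```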

No system property or default value specified for...

2011-01-14 Thread Tanner Postert
I'm trying to dynamically add a core to a multi-core system using the following command: http://localhost:8983/solr/admin/cores?action=CREATE&name=items&instanceDir=items&config=data-config.xml&schema=schema.xml&dataDir=data&persist=true The data-config.xml looks like this: <dataConfig> <dataSource
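A Python sketch that builds the CoreAdmin CREATE request from its parameters and lets `urlencode` insert the `&` separators. Note that `config` names the core's solrconfig file (solrconfig.xml by default), while a DIH data-config.xml is normally referenced from the DataImportHandler definition inside solrconfig.xml, so the value passed for `config` may be worth double-checking.

```python
from urllib.parse import urlencode


def core_create_url(base_url: str, name: str, instance_dir: str, config: str,
                    schema: str, data_dir: str, persist: bool = True) -> str:
    """Build a CoreAdmin CREATE URL; urlencode inserts the & separators."""
    params = [
        ("action", "CREATE"),
        ("name", name),
        ("instanceDir", instance_dir),
        ("config", config),    # the solrconfig file for the new core
        ("schema", schema),
        ("dataDir", data_dir),
        ("persist", "true" if persist else "false"),
    ]
    return base_url.rstrip("/") + "/admin/cores?" + urlencode(params)
```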

Re: segment gets corrupted (after background merge ?)

2011-01-14 Thread Michael McCandless
OK given that you're seeing non-deterministic results on the same index... I think this is likely a hardware issue or a JRE bug? If you move that index over to another env and run CheckIndex, is it consistent? Mike On Fri, Jan 14, 2011 at 9:00 AM, Stéphane Delprat

DataImportHandler: full import of a single entity

2011-01-14 Thread Jon Drukman
I've got a DataImportHandler set up with 5 entities. I would like to do a full import on just one entity. Is that possible? I worked around it temporarily by hand editing the dataimport.properties file and deleting the delta line for that one entity, and kicking off a delta. But for
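DataImportHandler accepts an `entity` request parameter naming a top-level entity, so a full import can be limited to that entity. Because `clean` defaults to true for full-import (the existing index is deleted first), `clean=false` is probably wanted so the other entities' documents survive. A Python sketch, assuming the handler is registered at `/dataimport` and using a hypothetical entity name.

```python
from urllib.parse import urlencode


def single_entity_full_import_url(base_url: str, entity: str,
                                  handler: str = "/dataimport") -> str:
    """full-import restricted to one entity; clean=false keeps the documents
    the other entities already indexed (clean defaults to true for full-import)."""
    params = [
        ("command", "full-import"),
        ("entity", entity),  # may be repeated to run several entities
        ("clean", "false"),
        ("commit", "true"),
    ]
    return base_url.rstrip("/") + handler + "?" + urlencode(params)
```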

MaxRows and disabling sort

2011-01-14 Thread Salman Akram
Hi, I want to limit my Solr results so that it stops further searching once it finds a certain number of records (just like 'limit' in MySQL). I know it has a timeAllowed property, but is there anything like MaxRows? I am NOT talking about the 'rows' parameter, which returns a specific number of rows to

Re: Multi-word exact keyword case-insensitive search suggestions

2011-01-14 Thread Erick Erickson
This might work: Define your field to use WhitespaceTokenizer and LowerCaseFilterFactory. Use a filter query referencing this field. If you wanted the words to appear in their exact order, you could just define the pf field in your dismax. Best Erick On Thu, Jan 13, 2011 at 8:01 PM, Estrada
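A toy Python approximation (not Lucene code; function names are hypothetical) of what that field definition does: the text is split on whitespace and lowercased, so matching becomes case-insensitive per word. The filter-query check assumes all terms are required (e.g. via `q.op=AND` or `+term` clauses); the phrase check corresponds to what a `pf` boost rewards: the same terms, contiguous and in order.

```python
from typing import List


def analyze(text: str) -> List[str]:
    """Approximate WhitespaceTokenizer + LowerCaseFilterFactory."""
    return [tok.lower() for tok in text.split()]


def matches_all_terms(field_value: str, query: str) -> bool:
    """Filter-query style check: every analyzed query term is in the field."""
    field_terms = set(analyze(field_value))
    return all(term in field_terms for term in analyze(query))


def matches_phrase(field_value: str, query: str) -> bool:
    """Phrase check: query terms appear contiguously and in order."""
    f, q = analyze(field_value), analyze(query)
    n = len(q)
    return n > 0 and any(f[i:i + n] == q for i in range(len(f) - n + 1))
```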

Re: solr speed issues..

2011-01-14 Thread Erick Erickson
You haven't given us much information here, it might help to review: http://wiki.apache.org/solr/UsingMailingLists In addition to Kenf_nc's comments, your sorting may be an issue, especially if you're measuring the first query times. What does debugQuery=on show? How many docs in your index?

Re: MaxRows and disabling sort

2011-01-14 Thread Erick Erickson
Why do you want to do this? That is, what problem do you think would be solved by this? Because there are other problems if you're trying to, say, return all rows that match. But no, there's nothing that I know of that would do what you want (of course that doesn't mean there isn't). Best

Re: MaxRows and disabling sort

2011-01-14 Thread Salman Akram
In some cases my search takes too long. Now I want to show the user partial matches if it's taking too long. The problem with timeAllowed is that, let's say I set its value to 10 secs, then for some queries it would be fine and would at least return a few hundred rows, but in the worst scenarios it might
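With `timeAllowed` set, Solr stops collecting hits once the limit is reached and flags the response header with `partialResults=true`. A small Python sketch of a client that sends the parameter and checks that flag; it only detects a cut-off search and does not make Solr return a fixed number of rows. The URL and query are hypothetical.

```python
import json
from urllib.parse import urlencode


def search_url(base_url: str, q: str, time_allowed_ms: int, rows: int = 100) -> str:
    """Build a /select URL with a time limit (milliseconds) and JSON output."""
    params = [("q", q), ("rows", rows), ("timeAllowed", time_allowed_ms), ("wt", "json")]
    return base_url.rstrip("/") + "/select?" + urlencode(params)


def is_partial(response_body: str) -> bool:
    """True if Solr flagged the result as cut short by timeAllowed."""
    header = json.loads(response_body).get("responseHeader", {})
    return bool(header.get("partialResults", False))
```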

Re: MaxRows and disabling sort

2011-01-14 Thread Chris Hostetter
: Also I guess default sorting is on scoring, and sorting can only be done once it has the scores of all matches, so then limiting it to the max rows becomes useless. So is there a way to disable sorting? E.g. it returns the rows as it finds them, without any order?

Re: Multi-word exact keyword case-insensitive search suggestions

2011-01-14 Thread Chamnap Chhorn
Ahh, thanks guys for helping me! Adam's solution doesn't work for me. Here is my field, fieldType, and Solr query: <fieldType name="text_keyword" class="solr.TextField" positionIncrementGap="100"> <analyzer> <tokenizer class="solr.KeywordTokenizerFactory" /> <filter