Re: HOWTO get a working copy of SOLR?

2010-06-16 Thread Bernd Fehling
Sixten Otto wrote: On Tue, Jun 15, 2010 at 12:58 AM, Bernd Fehling bernd.fehl...@uni-bielefeld.de wrote: - changed to SOLR branch_3x. Installs fine, runs fine, luke works fine but the extraction with /update/extract (ExtractingRequestHandler) only replies the metadata but not the content.

Re: Field Collapsing SOLR-236

2010-06-16 Thread Rakhi Khatwani
Hi, I wanted to try out field collapsing for a requirement. I went through the wiki and SOLR-236, but there are a lot of patch files, and the comments below left me confused. I tried applying the patch file on the 1.4.0 release but ended up with many compile errors. I even downloaded the latest

Solr: query in admin and where is my data?

2010-06-16 Thread cstc
Dear Solr gurus, I am currently still running a script which says that the Solr software is still committing the data: == INFO: [] Registered new searcher searc...@3b48a17a main Jun 16, 2010 12:56:58 PM org.apache.solr.search.SolrIndexSearcher close INFO: Closing

TermsComponent Reverse !?

2010-06-16 Thread stockii
Hello again Nabble :D TermsComponent works fine so far, but how can I get the same result for these inputs: harry pot -> harry potter AND potter harr -> harry potter? I tried ReversedWildcardFilterFactory, but I don't want the reversed word. I want the reversed sentence. ^^ Thanks -- View this

access term vectors in lucene

2010-06-16 Thread sarfaraz masood
Hello all, I want to know how we can access term vectors in Lucene. I am working on a project where I need the tf-idf values of all the terms in the documents, but I am unable to find any reference, e.g. one that shows how to use these term vectors to get the tf-idf values of ALL the terms in

Re: Solr: query in admin and where is my data?

2010-06-16 Thread Otis Gospodnetic
Hello, Yes, this looks like it's working correctly - it looks like the docs are getting committed. You should see some logging messages about the searcher being reloaded after the commit. When that happens you will see your changes in the index. Otis Sematext :: http://sematext.com/

Re: HOWTO get a working copy of SOLR?

2010-06-16 Thread Otis Gospodnetic
Bernd, Not everything has to be bundled in one package. :) Luke is a separate project. It also depends on some software that makes it unsuitable for bundling with Lucene/Solr for licensing reasons. Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search ::

Re: SolrEventListener

2010-06-16 Thread Otis Gospodnetic
Hi, Look at https://issues.apache.org/jira/browse/SOLR-795?focusedCommentId=12642870&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12642870 Look for the buildOn string in various example solrconfigs. Otis Sematext :: http://sematext.com/ :: Solr - Lucene -

Re: Spellchecker index cannot be optimized

2010-06-16 Thread Otis Gospodnetic
Lutz, Look at this: http://wiki.apache.org/solr/UpdateXmlMessages#Passing_commit_parameters_as_part_of_the_URL You should be able to do this with your spellchecker index, too. Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search ::

Re: Master master?

2010-06-16 Thread Otis Gospodnetic
Hello, The closest thing to this with Solr is a Repeater: http://wiki.apache.org/solr/SolrReplication#Setting_up_a_Repeater A Repeater is an instance that acts as both the master and slave at the same time. Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem

Re: Field Collapsing SOLR-236

2010-06-16 Thread Moazzam Khan
Hi Rakhi, You are supposed to get the code for Solr 1.4 from SVN here: http://svn.apache.org/repos/asf/lucene/solr/tags/ Then apply the patch to it and compile. It should work. However, you will probably get an error at run time saying some Java class is missing. I haven't been able to figure

Re: Some basics

2010-06-16 Thread Otis Gospodnetic
Frank, Is the following what you are after: Here is a query for my last name, but misspelled: http://search-lucene.com/?q=gospodneticc But if you look above the results, you will see this text: Search results for gospodnetic : ... and the search results are indeed for the auto-corrected

Re: Solr: query in admin and where is my data?

2010-06-16 Thread cstc
Hello, Thank you for your help. Q1: Do I have to wait until all the data is fully committed before querying? Q2: I put '*:*' (without quotes) in the admin query box. Is that the correct syntax for a search? Q3: Why did it come back with no results if the data is being committed? Again, any help

Re: Solr: query in admin and where is my data?

2010-06-16 Thread Otis Gospodnetic
Hello, A1: you can search and commit concurrently. Sounds like you are using 1 box with Solr to both index and search. Solr is typically deployed as a multi-node cluster where 1 of those nodes is a master that does all indexing, and 1 or more slaves perform searches. A2: *:* is correct A3:
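As a rough illustration of the `*:*` query confirmed in A2, the sketch below builds a match-all request URL and reads the hit count from a JSON response. The base URL `http://localhost:8983/solr` and the helper names `match_all_url` and `num_found` are assumptions made for this example, not something taken from the thread; the sketch only builds and parses strings and does not contact a server.

```python
import json
from urllib.parse import urlencode


def match_all_url(base_url):
    """Build a select URL for the match-all query *:* (base_url is an assumed example)."""
    # rows=0 asks only for the hit count; wt=json requests a JSON response.
    return base_url + "/select?" + urlencode({"q": "*:*", "wt": "json", "rows": 0})


def num_found(response_text):
    """Extract the total hit count from a Solr JSON response body."""
    return json.loads(response_text)["response"]["numFound"]
```

If `num_found` reports zero for `*:*`, no committed documents are visible to the current searcher yet, which would match the behaviour described in the question.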

Re: Issue with response header in SOLR running on Linux instance

2010-06-16 Thread bbarani
Hi, Thanks a lot for your response. My issue now is that the response header is not at all consistent. Sometimes the response header is in this format, <responseHeader> <status>0</status> <QTime></QTime> <lst name="params"> <str name="q">credit</str> </lst> </responseHeader> sometimes it's in

Is there a way to set the default response handler version to 2.2

2010-06-16 Thread bbarani
Hi, I am facing issues in getting back the response header correctly in different OS (Windows / Linux). I could see that in windows OS, the version of response handler is set to 2.2 by default and even without specifying the version in the query I am getting the proper response header. In

Re: Is there a way to set the default response handler version to 2.2

2010-06-16 Thread Otis Gospodnetic
BB, You can set the version param default in solrconfig.xml. Here is a snippet: <requestHandler name="standard" class="solr.SearchHandler" default="true"> <!-- default values for query parameters --> <lst name="defaults"> <str name="echoParams">explicit</str> <int name="rows">10</int>

Re: Solr 1.4 and Nutch 1.0 Integration

2010-06-16 Thread Otis Gospodnetic
Dean, In general, you'll get more help about Nutch with Solr on the Nutch list than on the Solr one. Here is the info: http://wiki.apache.org/nutch/RunningNutchAndSolr Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/

Solr and Nutch/Droids - to use or not to use?

2010-06-16 Thread MitchK
Hello community, from several discussions about Solr and Nutch, I got some questions for a virtual web-search-engine. I know I've posted this message to the mailing list a few days ago, but the thread got injected and at least I did not get any more postings about the topic and so I try to

Re: Solr and Nutch/Droids - to use or not to use?

2010-06-16 Thread Otis Gospodnetic
My quick feedback would be: Try using Nutch first, because it is a more complete platform. From what I know, Droids is just the crawler with an in-memory queue + link extractor. We did use it for crawling Lucene project sites (for the index on http://search-lucene.com/ ), but that is because

Re: Solr 1.4 and Nutch 1.0 Integration

2010-06-16 Thread Dean Del Ponte
Thanks! On Wed, Jun 16, 2010 at 10:24 AM, Otis Gospodnetic otis_gospodne...@yahoo.com wrote: Dean, In general, you'll get more help about Nutch with Solr on the Nutch list than on the Solr one. Here is the info: http://wiki.apache.org/nutch/RunningNutchAndSolr Otis Sematext ::

Re: Field Collapsing SOLR-236

2010-06-16 Thread Moazzam Khan
Actually I take that back. I am just as lost as you. I wish there was a tutorial on how to do this (although I get the feeling that once I know how to do it I will go ohh... I can't believe I couldn't figure that out) - Moazzam On Wed, Jun 16, 2010 at 8:25 AM, Moazzam Khan moazz...@gmail.com

Re: Field Collapsing SOLR-236

2010-06-16 Thread Eric Caron
I've had the best luck checking out the newest Solr/Lucene (so the 1.5-line) from SVN, then just doing patch -p0 < SOLR-236-trunk.patch from inside the trunk directory. I just did it against the newest checkout and it works fine still. On Wed, Jun 16, 2010 at 11:35 AM, Moazzam Khan

Re: Solr and Nutch/Droids - to use or not to use?

2010-06-16 Thread MitchK
Thank you for the feedback, Otis. Yes, I thought that such an approach is useful if the number of pages to crawl is relatively low. However, what about using Solr + Nutch? Is there still the problem that this would not scale if the index becomes too large? What about extending Nutch with

Re: Solr and Nutch/Droids - to use or not to use?

2010-06-16 Thread Otis Gospodnetic
Mitch, I think you really have 2 distinct questions there: One question is Nutch vs. Droids. The other one is Solr vs. Nutch for search. My suggestions: * Use Nutch, not Droids, if scaling is important * Use Solr, not Nutch's search webapp Otis Sematext :: http://sematext.com/ :: Solr -

Re: Some questions about ability of solr.

2010-06-16 Thread Otis Gospodnetic
Vitaliy: Check http://blog.sematext.com/2010/06/01/hbase-digest-may-2010/ and http://twitter.com/otisg/status/16320594923 Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/ - Original Message From: Vitaliy

Re: Solr and Nutch/Droids - to use or not to use?

2010-06-16 Thread MitchK
Thanks, that really helps to find the right beginning for such a journey. :-) * Use Solr, not Nutch's search webapp As far as I have read, Solr can't scale, if the index gets too large for one Server The setup explained here has one significant caveat you also need to keep in mind:

Re: access term vectors in lucene

2010-06-16 Thread Grant Ingersoll
See http://wiki.apache.org/solr/TermVectorComponent You might also be interested in the TermsComponent: http://wiki.apache.org/solr/TermsComponent On Jun 16, 2010, at 8:47 AM, sarfaraz masood wrote: hello all, I wanna know that how can we access terms vectors in lucene.. actually i
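For readers who want to see what "tf-idf values of all the terms" amounts to, here is a minimal sketch of the textbook weighting tf * log(N / df), computed over plain token lists. It is an illustration only: it does not call Lucene or the TermVectorComponent, it is not Lucene's own scoring formula, and the function name `tf_idf` and the sample tokens are assumptions made for this example.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document, using tf * log(N / df).

    docs is a list of token lists; tokenization is assumed to be done already.
    """
    n_docs = len(docs)
    # Document frequency: the number of documents each term appears in.
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))
    weights = []
    for tokens in docs:
        tf = Counter(tokens)  # raw term frequency within this document
        weights.append({t: count * math.log(n_docs / df[t]) for t, count in tf.items()})
    return weights
```

With this weighting, a term that occurs in every document gets weight 0, while a term unique to one document gets the largest weight for its frequency.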

Re: Solr and Nutch/Droids - to use or not to use?

2010-06-16 Thread Otis Gospodnetic
Hi Mitch, Solr can do distributed search, so it can definitely handle indices that can't fit on a single server without sharding. What I think *might* be the case is that the Nutch indexer that sends docs to Solr might not be capable of sending documents to multiple Solr cores/shards. If that

LocalParams, quotes, bug?

2010-06-16 Thread Jonathan Rochkind
So using LocalParams with dollar-sign references to other parameters. In LocalParams in general, you can use single-quotes for values that have spaces in them: {!dismax qf='field^5 field2^10'}= no problem And even if the value does not have spaces, you can use single quotes too, why

RE: Re: Solr and Nutch/Droids - to use or not to use?

2010-06-16 Thread Markus Jelsma
Nutch does not, at this moment, support some form of consistent hashing to select an appropriate shard. It would be nice if someone could file an issue in Nutch's Jira to add sharding support to it, perhaps someone with a better understanding and more experience with Solr's distributed search

Re: Re: Solr and Nutch/Droids - to use or not to use?

2010-06-16 Thread Otis Gospodnetic
Well, it's not that Nutch doesn't support it. Solr itself doesn't support it. Indexing applications need to know which shard they want to send documents to. This may be a good case for a new wish issue in Solr JIRA? Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene

RE: Re: Re: Solr and Nutch/Droids - to use or not to use?

2010-06-16 Thread Markus Jelsma
You're right. Currently clients need to take care of this. In this case, Nutch would be the client, but it cannot be configured as such. It would, indeed, be more appropriate for Solr to take care of this. We can already query any server with a set of shard hosts specified, so it would make
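The client-side routing discussed in this sub-thread can be sketched roughly as follows. This is an assumption-laden illustration: it uses simple hash-modulo selection rather than the consistent hashing mentioned above, the shard URLs are hypothetical, and the helper names `pick_shard` and `shards_param` are invented for the example. The only Solr feature it relies on is the `shards` request parameter used for distributed search.

```python
import hashlib

# Hypothetical shard base URLs, chosen for illustration only.
SHARDS = [
    "http://host1:8983/solr",
    "http://host2:8983/solr",
    "http://host3:8983/solr",
]


def pick_shard(doc_id, shards):
    """Choose the shard that should index doc_id (hash modulo shard count)."""
    digest = hashlib.md5(doc_id.encode("utf-8")).hexdigest()
    return shards[int(digest, 16) % len(shards)]


def shards_param(shards):
    """Build the value of the `shards` query parameter: host:port/path, comma separated."""
    return ",".join(url.split("://", 1)[1] for url in shards)
```

An indexing client would post each document to `pick_shard(doc_id, SHARDS)` and query any one server with `shards=` set to `shards_param(SHARDS)`. Note that with plain modulo hashing, changing the number of shards remaps most documents, which is the problem consistent hashing is meant to reduce.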

SOLR search performance - Linux vs Windows servers

2010-06-16 Thread bbarani
Hi, I have SOLR instances running on both Linux and Windows servers (same version / same index data). Search performance is good on the Windows box compared to the Linux box. Some queries take more than 10 seconds on the Linux box but take just a second on the Windows box. Has anyone encountered this kind of

Re: Reindexing only occurs after bouncing app

2010-06-16 Thread John Ament
So just to throw the idea out there, what would happen if I shut down and created a new solrServer on reindex? We only reindex daily. Will that force the reread of all Lucene files? John On Tue, Jun 15, 2010 at 4:47 PM, John Ament my.repr...@gmail.com wrote: Hi all I wrote a small app using

Re: Field Collapsing SOLR-236

2010-06-16 Thread Moazzam Khan
I did the same thing. And, the code compiles without the patch but when I apply the patch I get these errors: [javac] C:\svn\solr\src\java\org\apache\solr\search\fieldcollapse\collector\FieldValueCountCollapseCollectorFactory.java:127: class, interface, or enum expected [javac] import

Re: SOLR search performance - Linux vs Windows servers

2010-06-16 Thread Otis Gospodnetic
BB, Could it be that you are comparing apples and oranges? * Is the hardware identical? * Are indices identical? * Are JVM versions the same? * Are JVM arguments identical? * Are the two boxes equally idle when Solr is not running? * etc. In general, no, there is no reason why Windows would

Re: SOLR search performance - Linux vs Windows servers

2010-06-16 Thread Israel Ekpo
That's a good note. I get this kind of question a lot. Most of the time, the reason is that there are database servers (MySQL), web servers (Apache) and other processes running on the Linux box. Try to verify that the load, number of processors/cores as well as other environment settings

Re: Field Collapsing SOLR-236

2010-06-16 Thread Moazzam Khan
I got the code from trunk again and now I get this error: [javac] symbol : class StringIndex [javac] location: interface org.apache.lucene.search.FieldCache [javac] private final Map<String, FieldCache.StringIndex> fieldCaches = new HashMap<String, FieldCache.StringIndex>();

Re: Issue with response header in SOLR running on Linux instance

2010-06-16 Thread Chris Hostetter
: My issue now is that the response header is not at all consistent. : : Sometimes the response header is in this format, ... : sometimes its in this format (for same query) same query different solr instance, or same query same server? please be specific .. show us URLs, show us

Re: SolrCoreAware

2010-06-16 Thread Chris Hostetter
: Can someone please explain what the inform method should accomplish? Thanks whatever you want it to accomplish ... it's just a hook that (some types of) plugins can use to finish initializing themselves after init() has been called on the SolrCore and all of the other plugins. (it's a

Re: SolrEventListener

2010-06-16 Thread Chris Hostetter
: Can someone explain how to register a SolrEventListener? *typically* it's done using the listener syntax as noted on the wiki page you linked to However... : I am actually interested in using the SpellCheckerListener and it appears ...that listener is designed to be registered for you

Re: Field Collapsing SOLR-236

2010-06-16 Thread Mark Diggory
Blargy? I produced a patched version of Solr 1.4 and released it into the maven central repository under our DSpace groupid as a dependency for our applications. You're welcome to test it out and use our code for examples. Although it is not the most recent patch of Field Collapsing, it has

Re: LocalParams, quotes, bug?

2010-06-16 Thread Yonik Seeley
On Wed, Jun 16, 2010 at 3:27 PM, Jonathan Rochkind rochk...@jhu.edu wrote: {!dismax qf=$some_qf}   = no problem, and debugQuery reveals it is indeed using the qf I desire. {!dismax qf='$some_qf'}  = Solr throws undefined field $some_qf. Is this a bug in Solr? Nope, it's by design. Parameter
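The working form from this exchange, an unquoted `$some_qf` reference inside the LocalParams block, can be sketched as a request built on the client side. The helper name `dismax_query` is an assumption made for this example, and the field boosts are borrowed from the earlier message in the thread; the code only assembles a query string and does not contact Solr.

```python
from urllib.parse import urlencode


def dismax_query(user_query, qf_value):
    """Build a query string that passes qf indirectly via a $some_qf reference."""
    params = {
        # Unquoted $some_qf is dereferenced by Solr; the report above says the
        # quoted form '$some_qf' is taken literally as a field name instead.
        "q": "{!dismax qf=$some_qf}" + user_query,
        # The referenced parameter carries the value with spaces, so no quoting
        # is needed inside the LocalParams block itself.
        "some_qf": qf_value,
    }
    return urlencode(params)
```

Because the value containing spaces travels in its own request parameter, the question of quoting it inside `{!...}` does not arise.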

Re: Indexing HTML files in SOLR

2010-06-16 Thread Lance Norskog
This is the tool in Solr for indexing various kinds of content. After you learn the basics of indexing (see solr/example/exampledocs for samples), the ExtractingRequestHandler will make sense: http://wiki.apache.org/solr/ExtractingRequestHandler On Tue, Jun 15, 2010 at 12:35 AM, seesiddharth

how to index the words of a lecture transcript, and the timecodes for each word?

2010-06-16 Thread Peter Wilkins
I have lecture transcripts with start and stop times for each word. The time codes allow us to search the transcripts, and show the part of the lecture video that contains the search results. I want to structure the index so that I can search the transcripts for phrases, and have the search

MailEntityProcessor class cast exception

2010-06-16 Thread Max Lynch
With last night's build of solr, I am trying to use the MailEntityProcessor to index an email account. However, when I call my dataimport url, I receive a class cast exception: INFO: [] webapp=/solr path=/dataimport params={command=full-import} status=0 QTime=44 Jun 16, 2010 8:16:03 PM

SpellCheckComponent questions

2010-06-16 Thread Blargy
Is it generally wiser to build the dictionary from the existing index? Search log? Other? For "Did you mean", does one usually just use collate=true and then return that string? Should I be using a separate spellchecker handler or should I just always include spellcheck=true in my original search

Re: Solr DataConfig / DIH Question

2010-06-16 Thread Alexey Serba
There is a 1-[0,1] relationship between Person and Address with address_id being the nullable foreign key. I think you should be good with single query/entity then (no need for nested entities) <entity name="person" query="select person.id, person.name, person.address_id, address.zipcode from

Re: SpellCheckComponent questions

2010-06-16 Thread Blargy
Follow-up question. How can I influence the scoring of results that come back, either through term frequency (if I build off an index) or through # of search results returned (if using a search log)? Thanks -- View this message in context:

Re: how to apply patch SOLR-1316

2010-06-16 Thread Blargy
I'm trying to apply this via the command line patch -p0 < SOLR-1316.patch. When patching against trunk I get the following errors. ~/workspace $ patch -p0 < SOLR-1316.patch patching file dev/trunk/solr/src/java/org/apache/solr/handler/component/SpellCheckComponent.java Hunk #2 succeeded at 575

RE: Re: Re: Solr and Nutch/Droids - to use or not to use?

2010-06-16 Thread MitchK
Good morning! Great feedback from you all. This really helped a lot to get an impression of what is possible and what is not. What is interesting to me are some detail questions. Let's assume Solr is able to handle distributed indexing on its own, so that the client does not need to