RE: How can i make a distribute search on Solr?

2007-09-19 Thread Jarvis
Thanks for your reply. I need the Federated Search. You mean this is not yet supported out of the box. So my question is: in this situation, what can Collection Distribution be used for? Jarvis -Original Message- From: Ryan McKinley [mailto:[EMAIL PROTECTED] Sent: Wednesday,

Solr Index - no segments* file found in org.apache.lucene.store.FSDirectory

2007-09-19 Thread Venkatraman S
Hi, Product : Solr (Embedded) Version : 1.2 Problem Description : While trying to add and search over the index, we are stumbling on this error again and again. Do note that the SolrCore is committed and closed suitably in our Embedded Solr. Error (StackTrace) : Sep 19, 2007 9:41:41 AM

multithread update client causes exceptions and dropped documents

2007-09-19 Thread Will Johnson
TestJettyLargeVolume.java Description: Binary data we were doing some performance testing for the updating aspects of solr and ran into what seems to be a large problem.  we're creating small documents with an id and one field of 1 term only submitting them in batches of 200 with commits every

Index/Update Problems with Solrj/Tomcat and Larger Files

2007-09-19 Thread Daley, Kristopher M.
I am using Tomcat 6 and Solr 1.2 on a Windows 2003 server using the following java code. I am trying to index pdf files, and I'm constantly getting errors on larger files (the same ones). SolrServer server = new CommonsHttpSolrServer(solrPostUrl); SolrInputDocument addDoc =

Re: Solr Index - no segments* file found in org.apache.lucene.store.FSDirectory

2007-09-19 Thread Bill Au
What files are there in your /data/pub/index directory? Bill On 9/19/07, Venkatraman S [EMAIL PROTECTED] wrote: Hi, Product : Solr (Embedded) Version : 1.2 Problem Description : While trying to add and search over the index, we are stumbling on this error again and again. Do note

Re: Solr Index - no segments* file found in org.apache.lucene.store.FSDirectory

2007-09-19 Thread Venkatraman S
Quite interesting actually (this is for 5 documents that were indexed) : _0.fdt _0.prx _1.fnm _1.tis _2.nrm _3.fdx _3.tii _4.frq segments.gen _0.fdx _0.tii _1.frq _2.fdt _2.prx _3.fnm _3.tis _4.nrm segments_6 _0.fnm _0.tis _1.nrm _2.fdx _2.tii _3.frq _4.fdt _4.prx _0.frq

Re: multithread update client causes exceptions and dropped documents

2007-09-19 Thread Will Johnson
one other note. the errors pop up when running against the 1.3 trunk but do not appear to happen when run against 1.2. - will On 9/19/07, Will Johnson [EMAIL PROTECTED] wrote: we were doing some performance testing for the updating aspects of solr and ran into what seems to be a large

Re: multithread update client causes exceptions and dropped documents

2007-09-19 Thread Ryan McKinley
Can you start a JIRA issue and attach the patch? I have not seen this happen, but I bet it is caused by something from: https://issues.apache.org/jira/browse/SOLR-215?page=com.atlassian.jira.plugin.ext.subversion:subversion-commits-tabpanel Can we add that test to trunk? By default it does not

Re: How can i make a distribute search on Solr?

2007-09-19 Thread Norberto Meijome
On Wed, 19 Sep 2007 01:46:53 -0400 Ryan McKinley [EMAIL PROTECTED] wrote: Stu is referring to Federated Search - where each index has some of the data and results are combined before they are returned. This is not yet supported out of the box Maybe this is related. How does this compare to

Re: How can i make a distribute search on Solr?

2007-09-19 Thread Yonik Seeley
On 9/19/07, Norberto Meijome [EMAIL PROTECTED] wrote: On Wed, 19 Sep 2007 01:46:53 -0400 Ryan McKinley [EMAIL PROTECTED] wrote: Stu is referring to Federated Search - where each index has some of the It really should be Distributed Search I think (my mistake... I started out calling it

Re: Index/Update Problems with Solrj/Tomcat and Larger Files

2007-09-19 Thread Ryan McKinley
I have had this and other files index correctly using a different combination version of Tomcat/Solr without any problem (using similar code, I re-wrote it because I thought it would be better to use Solrj). I get the same error whether I use a simple StringBuilder to create the add manually or

Re: How can i make a distribute search on Solr?

2007-09-19 Thread Stu Hood
Nutch implements federated search separately from their index generation. My understanding is that MapReduce jobs generate the indexes (Nutch calls them segments) from raw data that has been downloaded, and then makes them available to be searched via remote procedure calls. Queries never pass

Re: Index/Update Problems with Solrj/Tomcat and Larger Files

2007-09-19 Thread Ryan McKinley
Daley, Kristopher M. wrote: I have tried changing those settings, for example, as: SolrServer server = new CommonsHttpSolrServer(solrPostUrl); ((CommonsHttpSolrServer)server).setConnectionTimeout(60); ((CommonsHttpSolrServer)server).setDefaultMaxConnectionsPerHost(100);

Re: Index/Update Problems with Solrj/Tomcat and Larger Files

2007-09-19 Thread Ryan McKinley
I'm stabbing in the dark here, but try fiddling with some of the other connection settings: getConnectionManager().getParams().setSendBufferSize( big ); getConnectionManager().getParams().setReceiveBufferSize( big );

RE: Index/Update Problems with Solrj/Tomcat and Larger Files

2007-09-19 Thread Daley, Kristopher M.
Ok, I'll try to play with those. Any suggestion on the size? Something else that is very interesting is that I just tried to do an aggregate add of a bunch of docs, including the one that always returned the error. I called a function to create a SolrInputDocument and return it. I then did the

Getting only size of getFacetCounts , to simulate count(group by( a field) ) using facets

2007-09-19 Thread Laurent Hoss
Hi We want to (mis)use facet search to get the number of (unique) field values appearing in a document resultset. I thought facet search perfect for this, because it already gives me all the (unique) field values. But for us to be used for this special problem, we don't want all the values

Re: Select distinct in Solr

2007-09-19 Thread Ryan McKinley
Lance Norskog wrote: I believe I saw in the Javadocs for Lucene that there is the ability to return the unique values for one field for a search, rather than each record. Is it possible to add this feature to Solr? It is the equivalent of 'select distinct' in SQL. Look into faceting:
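
The quoted reply is cut off after "Look into faceting:", so its own example is not shown here. As a rough sketch only, a request that lists the distinct values of one field across a result set could be built along these lines. The field name `category`, the base URL, and the class and method names are hypothetical; `facet`, `facet.field`, `facet.mincount`, `facet.limit`, and `rows` are standard Solr request parameters.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Sketch: build a Solr select URL that returns no documents (rows=0)
// but facets on one field, which yields that field's distinct values
// (with counts) for the documents matching the query.
final class DistinctValuesQuery {

    // baseUrl and field are illustrative inputs, not taken from the thread.
    static String facetUrl(String baseUrl, String query, String field) {
        return baseUrl + "/select"
                + "?q=" + encode(query)
                + "&rows=0"                 // skip the document list itself
                + "&facet=true"             // turn faceting on
                + "&facet.field=" + encode(field)
                + "&facet.mincount=1"       // omit values with zero matches
                + "&facet.limit=-1";        // negative limit: return all values
    }

    // URL-encode a parameter value as UTF-8.
    static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(facetUrl("http://localhost:8983/solr", "ipod", "category"));
    }
}
```

Each value listed under the facet field in the response corresponds to one row of a SQL `select distinct`; the counts can simply be ignored if only the values are needed.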

useColdSearcher = false... not working in 1.2?

2007-09-19 Thread Adam Goldband
Anyone else using this, and finding it not working in Solr 1.2? Since we've got an automated release process, I really need to be able to have the appserver not see itself as done warming up until the firstSearcher is ready to go... but with 1.2 this no longer seems to be the case. adam

Re: useColdSearcher = false... not working in 1.2?

2007-09-19 Thread Yonik Seeley
On 9/19/07, Adam Goldband [EMAIL PROTECTED] wrote: Anyone else using this, and finding it not working in Solr 1.2? Since we've got an automated release process, I really need to be able to have the appserver not see itself as done warming up until the firstSearcher is ready to go... but with

Re: Getting only size of getFacetCounts , to simulate count(group by( a field) ) using facets

2007-09-19 Thread Yonik Seeley
On 9/19/07, Laurent Hoss [EMAIL PROTECTED] wrote: We want to (mis)use facet search to get the number of (unique) field values appearing in a document resultset. We have paging of facets, so just like normal search results, it does make sense to list the total number of facets matching. The

Exact phrase highlighting

2007-09-19 Thread Marc Bechler
Hi out of there, I just walked through the mailing list archive, but I did not find an appropriate answer for phrase highlighting. I do not have any highlighting section (and no dismax handler definition) in solrconfig.xml. This way (AFAIK :-)), the standard lucene query syntax should be

Re: Solr Index - no segments* file found in org.apache.lucene.store.FSDirectory

2007-09-19 Thread Chris Hostetter
: Product : Solr (Embedded) Version : 1.2 : java.io.FileNotFoundException: no segments* file found in : org.apache.lucene.store.FSDirectory@/data/pub/index: files: According to that, the FSDirectory was empty when it was opened (a file list is supposed to come after that files: part) you

Re: Exact phrase highlighting

2007-09-19 Thread Mike Klaas
On 19-Sep-07, at 1:12 PM, Marc Bechler wrote: Hi out of there, I just walked through the mailing list archive, but I did not find an appropriate answer for phrase highlighting. I do not have any highlighting section (and no dismax handler definition) in solrconfig.xml. This way (AFAIK

Re: DisMax queries referencing undefined fields

2007-09-19 Thread Chris Hostetter
: I noticed that the field list (fl) parameter ignores field names that it : cannot locate, while the query fields (qf) parameter throws an exception : when fields cannot be located. Is there any way to override this behavior and : have qf also ignore fields it cannot find? Those parameters are

Re: Exact phrase highlighting

2007-09-19 Thread Mike Klaas
On 19-Sep-07, at 2:39 PM, Marc Bechler wrote: Hi Mike, thanks for the quick response. It would make a great project to get one's hands dirty contributing, though :) ... sounds like giving a broad hint ;-) Sounds challenging... I'm not sure about that--it is supposed to be a drop-in

Re: Getting only size of getFacetCounts , to simulate count(group by( a field) ) using facets

2007-09-19 Thread Chris Hostetter
: The main problem with implementing this is trying to figure out where : to put the info in a backward compatible manner. Here is how the info 1) this seems like the kind of thing that would only be returned if requested -- so we probably don't have to be overly concerned about backwards

RE: Triggering snapshooter through web admin interface

2007-09-19 Thread Chris Hostetter
lance: since the topic you are describing is not directly related to triggering a snapshot from the web interface can you please start a new thread with a unique subject describing in more detail exactly what it was you were doing and the problem you encountered? this will make it easier for

rsync start and enable for multiple solr instances within one tomcat

2007-09-19 Thread Yu-Hui Jin
Hi, there, So we are using the Tomcat's JNDI method to set up multiple solr instances within a tomcat server. Each instance has a solr home directory. Now we want to set up collection distribution for all these solr home indexes. My understanding is: 1. we only need to run rsync-start once use

Re: Index/Update Problems with Solrj/Tomcat and Larger Files

2007-09-19 Thread Ryan McKinley
However, if I go to the tomcat server and restart it after I have issued the process command, the program returns and the documents are all posted correctly! Very strange behavior... am I somehow not closing the connection properly? What version is the solr you are connecting to? 1.2 or

setting absolute path for snapshooter in solrconfig.xml doesn't work

2007-09-19 Thread Yu-Hui Jin
Hi, there, I used an absolute path for the dir param in the solrconfig.xml as below: <listener event="postCommit" class="solr.RunExecutableListener"> <str name="exe">snapshooter</str> <str name="dir">/var/SolrHome/solr/bin</str> <bool name="wait">true</bool> <arr name="args"> <str>arg1</str>

RE: How can i make a distribute search on Solr?

2007-09-19 Thread Jarvis
Nutch has two ways to make a distributed query: through HDFS (the Hadoop file system) or through the RPC call in the org.apache.nutch.searcher.DistributedSearch class. But I think neither is good enough. If we use HDFS to serve the user's query, stability is a problem. We must all do the crawl ,

Filter by Group

2007-09-19 Thread mark angelillo
Hey all, Let's say I have an index of one hundred documents, and these documents are grouped into 4 groups A, B, C, and D. The groups do in fact overlap. What would people recommend as the best way to apply a search query and return only the documents that are in group A? Also, how about

Re: How can i make a distribute search on Solr?

2007-09-19 Thread Norberto Meijome
On Wed, 19 Sep 2007 10:29:54 -0400 Yonik Seeley [EMAIL PROTECTED] wrote: Maybe this is related. How does this compare to the map-reduce functionality in Nutch/Hadoop ? map-reduce is more for batch jobs. Nutch only uses map-reduce for parallel indexing, not searching. I see... so in

Term extraction

2007-09-19 Thread Pieter Berkel
I'm currently looking at methods of term extraction and automatic keyword generation from indexed documents. I've been experimenting with MoreLikeThis and values returned by the mlt.interestingTerms parameter and so far this approach has worked well. However, I'd like to be able to analyze

RE: How can i make a distribute search on Solr?

2007-09-19 Thread Jarvis
I think the index data stored in HDFS and generated by the map-reduce function is used for searching in NUTCH-0.9. You can see the code in the org.apache.nutch.searcher.NutchBean class. :) Jarvis -Original Message- From: Norberto Meijome [mailto:[EMAIL PROTECTED] Sent: Thursday, September

Re: How can i make a distribute search on Solr?

2007-09-19 Thread Norberto Meijome
On Thu, 20 Sep 2007 09:37:51 +0800 Jarvis [EMAIL PROTECTED] wrote: If we use the RPC call in nutch . Hi, I wasn't suggesting to use nutch in solr...I'm only a young grasshopper in this league to be suggesting architecture stuff :) but i imagine there's nothing wrong with using what they've built

Re: Term extraction

2007-09-19 Thread Brian Whitman
On Sep 19, 2007, at 9:58 PM, Pieter Berkel wrote: I'm currently looking at methods of term extraction and automatic keyword generation from indexed documents. We do it manually (not in solr, but we put the results in solr.) We do it the usual way - chunk (into n-grams, named entities

Re: Filter by Group

2007-09-19 Thread Pieter Berkel
Sounds like you're on the right track. If your groups overlap (i.e. a document can be in group A and B), then you should ensure your groups field is multivalued. If you are searching for foo in documents contained in group A, then it might be more efficient to use a filter query (fq) like:
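
The archived preview ends before the reply's own example, so what follows is only an illustrative sketch of the idea: search for `foo` and restrict the results to group A with a filter query. The field name `group`, the base URL, and the class and method names are assumptions made for this sketch; `q` and `fq` are standard Solr request parameters.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Sketch: build a Solr select URL whose main query (q) is the user's
// search term and whose filter query (fq) restricts matches to one group.
final class GroupFilterQuery {

    // "group" as the field name is hypothetical; use the schema's real field.
    static String selectUrl(String baseUrl, String query, String groupValue) {
        String filter = "group:" + groupValue;
        return baseUrl + "/select"
                + "?q=" + encode(query)
                + "&fq=" + encode(filter);
    }

    // URL-encode a parameter value as UTF-8.
    static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(selectUrl("http://localhost:8983/solr", "foo", "A"));
    }
}
```

Because a filter query is applied separately from the main query, Solr can cache the set of documents in group A and reuse it across different searches, which is the efficiency the reply alludes to.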

RE: How can i make a distribute search on Solr?

2007-09-19 Thread Jarvis
Hi, what you describe is done by Hadoop, which supports hardware failure handling, data replication, and more. If we want to implement such a good system ourselves with Solr but without HDFS, I think it's very complex work. :) I just want to know whether there is a component

Re: Term extraction

2007-09-19 Thread Pieter Berkel
Thanks Brian, I think the smart approaches you refer to might be outside the scope of my current project. The documents I am indexing already have manually-generated keyword data, moving forward I'd like to have these keywords automatically generated, selected from a pre-defined list of keywords

Re: How can i make a distribute search on Solr?

2007-09-19 Thread Mike Klaas
On 19-Sep-07, at 7:21 PM, Jarvis wrote: Hi, what you describe is done by Hadoop, which supports hardware failure handling, data replication, and more. If we want to implement such a good system ourselves with Solr but without HDFS, I think it's very complex work. :) I just want

Re: setting absolute path for snapshooter in solrconfig.xml doesn't work

2007-09-19 Thread Yu-Hui Jin
Hi, Pieter, Thanks! Now the exception is gone. However, There's no snapshot file created in the data directory. Strangely, the snapshooter.log seems to complete successfully. Any idea what else I'm missing? $ cat var/SolrHome/solr/logs/snapshooter.log 2007/09/19 20:16:17 started by solruser

Re: setting absolute path for snapshooter in solrconfig.xml doesn't work

2007-09-19 Thread Pieter Berkel
If you don't need to pass any command line arguments to snapshooter, remove (or comment out) this line from solrconfig.xml: <arr name="args"> <str>arg1</str> <str>arg2</str> </arr> By the same token, if you're not setting environment variables either, remove the following line as well: <arr name="env">

Re: How can i make a distribute search on Solr?

2007-09-19 Thread Norberto Meijome
On Thu, 20 Sep 2007 10:02:08 +0800 Jarvis [EMAIL PROTECTED] wrote: You can see the code in org.apache.nutch.searcher.NutchBean class . :) thx for the pointer.

Re: How can i make a distribute search on Solr?

2007-09-19 Thread Norberto Meijome
On Thu, 20 Sep 2007 10:21:39 +0800 Jarvis [EMAIL PROTECTED] wrote: What you describe is done by Hadoop, which supports hardware failure handling, data replication, and more. If we want to implement such a good system ourselves with Solr but without HDFS, I think it's very complex work.

Re: How can i make a distribute search on Solr?

2007-09-19 Thread Venkatraman S
Along similar lines : assuming that i have 2 indexes in the same box , say at : /home/abc/data/index1 and /home/abc/data/index2, and i want the results from both the indexes when i do a search - then how should this be 'optimally' designed - basically these are different Solr homes and i want

Re: Solr Index - no segments* file found in org.apache.lucene.store.FSDirectory

2007-09-19 Thread Venkatraman S
On 9/20/07, Chris Hostetter [EMAIL PROTECTED] wrote: you imply that you are building your index using embedded solr, but based on your stack trace it seems you are using Solr in a servlet container ... i assume to search the index you've already built? I have a jsp that routes the info