Re: Near Duplicate Documents

2007-11-20 Thread Rishabh Joshi
Otis, Thanks for your response. I took a quick look at the Nutch forum and found that there is an implementation for de-duplicating documents/pages but none for near-duplicate documents. Can you guide me a little further as to where exactly under Nutch I should be concentrating, regardin

Re: two solr instances - index and commit

2007-11-20 Thread Otis Gospodnetic
Uh, avoid NFS and Lucene/Solr, unless you really really don't care about performance. We recently benchmarked Lucene indexing+searching+... on 1) local disk, 2) SAN, and 3) NFS. You have the right to a single guess - which of the three was the slowest? Otis -- Sematext -- http://se

Re: Any tips for indexing large amounts of data?

2007-11-20 Thread Eswar K
That's great. At what size of the index do you think we should look at partitioning the index file? Eswar On Nov 21, 2007 12:57 PM, Otis Gospodnetic <[EMAIL PROTECTED]> wrote: > Just tried a search for "web" on this index - 1.1 seconds. This matches > about 1MM of about 20MM docs. Redo the sea

Re: Performance question: Solr 64 bit java vs 32 bit mode.

2007-11-20 Thread Otis Gospodnetic
Solr runs equally well on both 64-bit and 32-bit systems. Your 15 second problem could be caused by IO bottleneck (not likely if your index is small and fits in RAM), could be concurrency (esp. if you are using compound index format), could be something else on production killing your CPU, coul

Re: Any tips for indexing large amounts of data?

2007-11-20 Thread Otis Gospodnetic
Just tried a search for "web" on this index - 1.1 seconds. This matches about 1MM of about 20MM docs. Redo the search, and it's 1 ms (cached). This is without any load nor serious benchmarking, clearly. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Messag

Re: Performance of Solr on different Platforms

2007-11-20 Thread Otis Gospodnetic
Most of Sematext's customers seem to be RH fans. I've seen some Ubuntu, some Debian, and some SuSe users. RH feels "safe". :) Some use Solaris. Some are going crazy with Xen, putting everything in VMs. RAM - as much as you can afford, as usual. CPU - AMD Opterons performed the best last time

Re: Near Duplicate Documents

2007-11-20 Thread Otis Gospodnetic
To whomever started this thread: look at Nutch. I believe something related to this already exists in Nutch for near-duplicate detection. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: Mike Klaas <[EMAIL PROTECTED]> To: solr-user@lucene.apach

Re: Any tips for indexing large amounts of data?

2007-11-20 Thread Eswar K
Hi Otis, I understand that this is a slightly off-track question, but I am just curious to know the performance of search on a 20 GB index file. What has been your observation? Regards, Eswar On Nov 21, 2007 12:33 PM, Otis Gospodnetic <[EMAIL PROTECTED]> wrote: > Mike is right about the occasional slo

Re: Any tips for indexing large amounts of data?

2007-11-20 Thread Otis Gospodnetic
Mike is right about the occasional slow-down, which appears as a pause and is due to large Lucene index segment merging. This should go away with newer versions of Lucene where this is happening in the background. That said, we just indexed about 20MM documents on a single 8-core machine with

Re: Help with Debian solr/jetty install?

2007-11-20 Thread Otis Gospodnetic
Phillip, I won't go into details, but I'll point out that the Java compiler is called javac and if memory serves me well, it is defined in one of Jetty's XML config files in its etc/ dir. The java compiler is used to compile JSPs that Solr uses for the admin UI. So, make sure you have javac a

Finding the right place to start ...

2007-11-20 Thread Tracy Flynn
I'm trying to find the right place to start in this community. I recently posted a question in the thread on SOLR-236. In that posting I mentioned that I was hoping to persuade my management to move from a FAST installation to a SOLR-based one. The changeover was approved in principle to

Re: Problems with Basic Install (newbie question)

2007-11-20 Thread Chris Hostetter
: As far as I know, I do have a full JDK. I'm on OS X and it should come with : a full JDK: : http://developer.apple.com/java/ well, 1) it depends on which version of "OS X" you are running (10.1, 10.2?, 10.3?, 10.4?, 10.5?) but i don't think that's your problem ... you said you could see th

Re: facet - associated fields

2007-11-20 Thread Norberto Meijome
On Tue, 20 Nov 2007 17:39:58 -0500 "Jae Joo" <[EMAIL PROTECTED]> wrote: > Hi, > Can anyone help me how to facet and/or search for associated fields? - http://wiki.apache.org/solr/SimpleFacetParameters _ {Beto|Norberto|Numard} Meijome Fear not the path of truth for the

Re: Solr cluster topology.

2007-11-20 Thread Norberto Meijome
On Tue, 20 Nov 2007 16:26:27 -0600 Alexander Wallace <[EMAIL PROTECTED]> wrote: > Interesting, this ALL MASTERS mode... I guess you don't do any > replication then... correct > In the single master, several slaves mode, I'm assuming the client > still writes to one and reads from the others.

Help with Debian solr/jetty install?

2007-11-20 Thread Phillip Farber
Hi, I've successfully run as far as the example admin page on Debian linux 2.6. So I installed the solr-jetty packaged for Debian testing which gives me Jetty 5.1.14-1 and Solr 1.2.0+ds1-1. Jetty starts fine and so does the Solr home page at http://localhost:8280/solr But I get an error wh

facet - associated fields

2007-11-20 Thread Jae Joo
Hi, Can anyone help me how to facet and/or search for associated fields? - 1234 Baseball hall of Fame opens Jackie Robinson exhibit Description about the new JR hall of fame exhibit. 20071114 200711 0 press Sports Baseball Major League Baseball Arts and Culture C

Re: Solr cluster topology.

2007-11-20 Thread Alexander Wallace
Thanks for the response! Interesting, this ALL MASTERS mode... I guess you don't do any replication then... In the single master, several slaves mode, I'm assuming the client still writes to one and reads from the others... right? On Nov 20, 2007, at 12:54 PM, Matthew Runo wrote: Yes. T

RE: Weird memory error.

2007-11-20 Thread Norskog, Lance
AppPerfect has a free-for-noncommercial-use version of their tools. I've used them before and was very impressed. http://www.appperfect.com/products/devtest.html#versions -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Yonik Seeley Sent: Tuesday, Novem

RE: Solr cluster topology.

2007-11-20 Thread Norskog, Lance
http://wiki.apache.org/solr/CollectionDistribution http://wiki.apache.org/solr/SolrCollectionDistributionScripts http://wiki.apache.org/solr/SolrCollectionDistributionStatusStats http://wiki.apache.org/solr/SolrOperationsTools http://wiki.apache.org/solr/SolrCollectionDistributionOperationsOutl

Re: Weird memory error.

2007-11-20 Thread Mike Klaas
On 20-Nov-07, at 8:16 AM, Brian Carmalt wrote: Hello all, I started looking into the scalability of solr, and have started getting weird results. I am getting the following error: Exception in thread "btpool0-3" java.lang.OutOfMemoryError: unable to create new native thread at ja

Re: Solr cluster topology.

2007-11-20 Thread Matthew Runo
Yes. The clients will always be a minute or two behind the master. I like the way some people are doing it - make them all masters! Just post your updates to each of them - you lose a bit of performance perhaps, but it doesn't matter if a server bombs out or you have to upgrade them, since

Re: Payloads, Tokenizers, and Filters. Oh My!

2007-11-20 Thread Chris Hostetter
: I apologize for cross-posting but I believe both Solr and Lucene users and : developers should be concerned with this. I am not aware of a better way to : reach both communities. some of these questions strike me as being largely unrelated. if anyone wishes to followup on them further, let'

BooleanQuery exception

2007-11-20 Thread Cody Caughlan
I am trying to run a very simple query via the Admin interface and receive the exception below. The query is: description_t:guard AND title_t:help I am using dynamic fields (hence the underscored suffix). Any ideas? Thanks in advance /cody Nov 19, 2007 3:01:31 PM org.apache.solr.core.SolrE

OR-ing together filter queries

2007-11-20 Thread Arnone, Anthony
Hello all, I am writing my own handler, and I would like to pre-filter the results based on a field. I’m calling searcher.getDocList() with a custom constructed query and filters list, but the filters always seem to AND together. My question is this: how can I construct the List of filters to m

Re: Weird memory error.

2007-11-20 Thread Yonik Seeley
On Nov 20, 2007 11:29 AM, Brian Carmalt <[EMAIL PROTECTED]> wrote: > Can you recommend one? I am not familiar with how to profile under Java. Netbeans has one for free: http://www.netbeans.org/products/profiler/ -Yonik

Re: Weird memory error.

2007-11-20 Thread Simon Willnauer
I'm using the Eclipse TPTP platform and I'm very happy with it. You will also find good howto or tutorial pages on the web. - simon On Nov 20, 2007 5:29 PM, Brian Carmalt <[EMAIL PROTECTED]> wrote: > Can you recommend one? I am not familiar with how to profile under Java. > > Yonik Seeley wrote

Re: Weird memory error.

2007-11-20 Thread Brian Carmalt
Can you recommend one? I am not familiar with how to profile under Java. Yonik Seeley wrote: Can you try a profiler to see where the memory is being used? -Yonik On Nov 20, 2007 11:16 AM, Brian Carmalt <[EMAIL PROTECTED]> wrote: Hello all, I started looking into the scalability of solr, a

Re: Weird memory error.

2007-11-20 Thread Yonik Seeley
Can you try a profiler to see where the memory is being used? -Yonik On Nov 20, 2007 11:16 AM, Brian Carmalt <[EMAIL PROTECTED]> wrote: > Hello all, > > I started looking into the scalability of solr, and have started getting > weird results. > I am getting the following error: > > Exception in t

Weird memory error.

2007-11-20 Thread Brian Carmalt
Hello all, I started looking into the scalability of solr, and have started getting weird results. I am getting the following error: Exception in thread "btpool0-3" java.lang.OutOfMemoryError: unable to create new native thread at java.lang.Thread.start0(Native Method) at java

Re: rows=VERY_LARGE_VALUE throws exception, and error in some cases

2007-11-20 Thread Yonik Seeley
I recently fixed this in the trunk. -Yonik On Nov 20, 2007 10:31 AM, Rishabh Joshi <[EMAIL PROTECTED]> wrote: > Hi, > > We are using Solr 1.2 for our project and have come across the following > exception and error: > > Exception: > SEVERE: java.lang.OutOfMemoryError: Java heap space > at org.

Re: Invalid value 'explicit' for echoParams parameter

2007-11-20 Thread Chris Hostetter
: I'm confident that /trunk accepts any case: : : v = v.toUpperCase(); that's in Solr 1.2 as well hmmm Ahmet: what is the default Locale of your JVM? String.toUpperCase() does use the default Locale ... i guess maybe we should start being more strict about using "compareToIgnoreCase"

Solr cluster topology.

2007-11-20 Thread Alexander Wallace
Hi All! I just started reading about Solr a couple of days ago (not full time of course) and it looks like a pretty impressive set of technologies... I still have a few questions I have not found clear answers to: Q: On a cluster, as I understand it, one and only one machine is a master, and N ser

Re: Pagination with Solr

2007-11-20 Thread Chris Hostetter
: What I'm trying is to parse the response for "numFound:" : and if this number is greater than the "rows" parameter, I send another : search request to Solr with a new "start" parameter. Is there a better : way to do this? Specifically, is there another way to obtain the : "numFound" rather
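
The paging approach described in the quoted question (read "numFound", then re-request with a new "start") comes down to simple offset arithmetic. A minimal sketch in Python, assuming numFound and rows are already known from the first response; the function name page_starts is hypothetical and not part of Solr or any client library:

```python
def page_starts(num_found: int, rows: int) -> list[int]:
    """Return every 'start' offset needed to walk through num_found hits.

    num_found is the total reported by the first response and rows is the
    page size sent with each request (both names mirror the Solr parameters
    mentioned in the message; the helper itself is illustrative only).
    """
    if rows <= 0:
        raise ValueError("rows must be a positive page size")
    # One offset per page: 0, rows, 2*rows, ... up to the last partial page.
    return list(range(0, num_found, rows))


if __name__ == "__main__":
    # The first request uses start=0; the remaining offsets are follow-ups.
    for start in page_starts(25, 10):
        print(f"start={start}&rows=10")
```

The first offset is the request that has already been made, so only the remaining offsets need follow-up requests.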

rows=VERY_LARGE_VALUE throws exception, and error in some cases

2007-11-20 Thread Rishabh Joshi
Hi, We are using Solr 1.2 for our project and have come across the following exception and error: Exception: SEVERE: java.lang.OutOfMemoryError: Java heap space at org.apache.lucene.util.PriorityQueue.initialize (PriorityQueue.java :36) Steps to reproduce: 1. Restart your Web Server. 2. Ente

SolrJ "commit" problem

2007-11-20 Thread Traut
Hi I've got a problem with solrj from nightly build (from 2007-11-12). I have this code: solrClient = new CommonsHttpSolrServer(new URL(indexServerUrl)); and after "add" operation firing solrClient.commit(true, true); But commit operation is not processing in Solr as I can see in log files

Re: Invalid value 'explicit' for echoParams parameter

2007-11-20 Thread Ryan McKinley
The URL is http://localhost:8983/solr/select/?q=solr&version=2.2&start=0&rows=10&indent=on When I added &echoParams=explicit to the query, nothing changed. But when I found the word 'explicit' in solrconfig.xml and replaced it with uppercase 'EXPLICIT', it worked. The problem is solved. Thank
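
For reference, a minimal sketch in Python of how the select URL quoted in this message could be assembled with echoParams appended. The base URL and parameter values are taken from the message; the helper name build_select_url is hypothetical, and the sketch only builds the string without contacting a server:

```python
from urllib.parse import urlencode


def build_select_url(base: str, query: str, echo_params: str = "explicit",
                     start: int = 0, rows: int = 10) -> str:
    """Build a select URL like the one quoted in the message.

    Only the parameters that appear in the message are included; the helper
    name and its defaults are assumptions made for illustration.
    """
    params = {
        "q": query,
        "version": "2.2",
        "start": start,
        "rows": rows,
        "indent": "on",
        "echoParams": echo_params,
    }
    # urlencode keeps dict insertion order and escapes values as needed.
    return base + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_select_url("http://localhost:8983/solr/select/", "solr"))
```

Passing the value per request this way makes it easy to compare 'explicit' against 'EXPLICIT' without editing solrconfig.xml.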

Re: Invalid value 'explicit' for echoParams parameter

2007-11-20 Thread AHMET ARSLAN
-Original e-mail message- From: Ryan McKinley [EMAIL PROTECTED] Date: Tue, 20 Nov 2007 07:16:53 +0200 To: solr-user@lucene.apache.org Subject: Re: Invalid value 'explicit' for echoParams parameter > AHMET ARSLAN wrote: > > I am a newbie at solr. I have done everything in the solr tutoria

Re: Performance of Solr on different Platforms

2007-11-20 Thread Rishabh Joshi
Eswar, This link would give you a fair idea of how Solr is used by some of the sites/companies - http://wiki.apache.org/solr/SolrPerformanceData Rishabh On Nov 20, 2007 10:49 AM, Eswar K <[EMAIL PROTECTED]> wrote: > In our case, the load is kind of distributed. On an average, the QPS could > be

Re: Solr PHP client

2007-11-20 Thread Nick Jenkin
You can use curl (www.php.net/curl) to interface with solr, it's a piece of cake! -Nick On 11/20/07, SDIS M. Beauchamp <[EMAIL PROTECTED]> wrote: > I use the php and php serialized writer to query Solr from php > > It's very easy to use > > But it's not so easy to update solr from php ( that's why

Re: Solr on Windows / Linux

2007-11-20 Thread Norberto Meijome
On Tue, 20 Nov 2007 10:55:04 +0530 "Eswar K" <[EMAIL PROTECTED]> wrote: > Is there any difference in the way any of the Solr's features work on > Windows/Linux. Hi Eswar, I am developing on FreeBSD 6.2 and 7, testing on a VM with Windows 2003 Server, and deploying for now, on Win32 too. We wil

indexing excel file

2007-11-20 Thread crazy
Hi, I want to index an Excel file and I get the following error: http://dev.torrez.us/public/2006/pundit/java/src/plugin/parse-msexcel/sample/test.xls: failed(2,0): Can't be handled as Microsoft document. java.lang.ArrayIndexOutOfBoundsException: No cell at position col1, row 0. I already add m