Re: Indexing very large files.

2007-09-06 Thread Brian Carmalt
Yonik Seeley wrote: On 9/5/07, Brian Carmalt [EMAIL PROTECTED] wrote: I've been trying to index a 300MB file to Solr 1.2. I keep getting out-of-memory heap errors. 300MB of what... a single 300MB document? Or does that file represent multiple documents in XML or CSV format? -Yonik

Re: Indexing very large files.

2007-09-06 Thread Brian Carmalt
Hello again, I run Solr on Tomcat under Windows and use the Tomcat monitor to start the service. I have set the minimum heap size to 512MB and the maximum to 1024MB. The system has 2 GB of RAM. The error that I get after sending approximately 300 MB is: java.lang.OutOfMemoryError:

Re: Indexing very large files.

2007-09-06 Thread Thorsten Scherler
On Thu, 2007-09-06 at 08:55 +0200, Brian Carmalt wrote: Hello again, I run Solr on Tomcat under Windows and use the Tomcat monitor to start the service. I have set the minimum heap size to 512MB and the maximum to 1024MB. The system has 2 GB of RAM. The error that I get after

Tagging using SOLR

2007-09-06 Thread Doss
Dear all, We are running an application built using Solr, and we are now trying to build a tagging system using the existing indexed Solr field called tag_keywords. This field holds different keywords separated by commas. Please suggest how we can build a tagging system using this field.

Re: Indexing very large files.

2007-09-06 Thread Brian Carmalt
Hi Thorsten, I am using Solr 1.2.0. I'll try the svn version out and see if that helps. Thanks, Brian Which version of Solr do you use? http://svn.apache.org/viewvc/lucene/solr/trunk/src/java/org/apache/solr/handler/XmlUpdateRequestHandler.java?view=markup The trunk version of the

solr.py problems with german Umlaute

2007-09-06 Thread Christian Klinger
Hi all, I am trying to add/update documents with the Python solr.py API. Everything works fine so far, but if I try to add a document which contains German umlauts (ö, ä, ü, ...) I get errors. Maybe someone has an idea how I could convert my data? Should I post this to JIRA? Thanks for the help. Btw: I

Re: Indexing very large files.

2007-09-06 Thread Thorsten Scherler
On Thu, 2007-09-06 at 11:26 +0200, Brian Carmalt wrote: Hello again, I checked out the Solr source and built the 1.3-dev version, and then I tried to index the same file to the new server. I do get a different exception trace, but the result is the same. java.lang.OutOfMemoryError: Java

Re: Tagging using SOLR

2007-09-06 Thread Erik Hatcher
On Sep 6, 2007, at 3:29 AM, Doss wrote: We are running an application built using Solr, and we are now trying to build a tagging system using the existing indexed Solr field called tag_keywords. This field holds different keywords separated by commas. Please suggest how we can build

Re: Replication broken.. no helpful errors?

2007-09-06 Thread Bill Au
The snapinstaller script opens a new searcher by calling commit. From the attached debug output it looks like that actually worked: + /opt/solr/bin/commit + [[ 0 != 0 ]] + logExit ended 0 Try running the /opt/solr/bin/commit directly with the -V option. Bill On 9/5/07, Matthew Runo [EMAIL

RSS syndication Plugin

2007-09-06 Thread Thorsten Scherler
Hi all, I am curious whether somebody has written an RSS plugin for Solr. The idea is to provide an RSS syndication link for the current search. It should be really easy to implement, since it would just be a transformation solrXml -> RSS, which can easily be done with a simple XSL. Has somebody

Re: RSS syndication Plugin

2007-09-06 Thread Ryan McKinley
Perhaps: https://issues.apache.org/jira/browse/SOLR-208 In http://svn.apache.org/repos/asf/lucene/solr/trunk/example/solr/conf/xslt/ check: example_atom.xsl example_rss.xsl Thorsten Scherler wrote: Hi all, I am curious whether somebody has written an RSS plugin for Solr. The idea is to

Re: Distribution Information?

2007-09-06 Thread Bill Au
That is very strange. Even if there is something wrong with the config or code, the static HTML contained in distributiondump.jsp should show up. Are you using the latest version of the JSP? There has been a recent fix: http://issues.apache.org/jira/browse/SOLR-333 Bill On 9/5/07, Matthew

update servlet not working

2007-09-06 Thread Benjamin Li
Hi, We have the example Solr installed with Jetty. We are able to navigate to the solr/admin page, but when we try to POST an XML document via the command line, there is a fatal error. It seems that the solr/update servlet isn't running, giving an HTTP 400 error. Does anyone have any clue what is

Re: update servlet not working

2007-09-06 Thread Chris Hostetter
: We are able to navigate to the solr/admin page, but when we try to : POST an XML document via the command line, there is a fatal error. It : seems that the solr/update servlet isn't running, giving an HTTP 400 : error. A 400 could mean a lot of things ... what is the full HTTP response you get

Re: Distribution Information?

2007-09-06 Thread Matthew Runo
Well, I do get... Distribution Info Master Server No distribution info present ... But there appears to be no information filled in.

RE: Indexing very large files.

2007-09-06 Thread Lance Norskog
Now I'm curious: what is the use case for documents this large? Thanks, Lance Norskog

Re: Replication broken.. no helpful errors?

2007-09-06 Thread Matthew Runo
The thing is that a new searcher is not opened if I look in the stats.jsp page. The index version never changes. When I run.. sudo /opt/solr/bin/commit -V -u tomcat5 ..I get a new searcher opened, but even though it (in theory) installed the new index, I see no docs in there. During the

Re: updates on the server

2007-09-06 Thread Matthew Runo
On a related note, it'd be great if we could set up a series of transformations to be done on data when it comes into the index, before being indexed. I guess a custom tokenizer might be the best way to do this though..? ie: -Post -Data is cleaned up, properly escaped, etc -Then data is

RE: solr.py problems with german Umlaute

2007-09-06 Thread Lance Norskog
I researched this problem before. The problem I found is that Python strings are not Unicode by default. You have to do something to make them Unicode. Here are the links I found: http://www.reportlab.com/i18n/python_unicode_tutorial.html http://evanjones.ca/python-utf8.html

Re: solr.py problems with german Umlaute

2007-09-06 Thread Yonik Seeley
On 9/6/07, Brian Carmalt [EMAIL PROTECTED] wrote: Try it with title.encode('utf-8'). As in: kw = {'id':'12','title':title.encode('utf-8'),'system':'plone','url':'http://www.google.de'} It seems like the client library should be responsible for encoding, not the user. So try changing
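
The encoding step suggested above can be sketched as follows. This is a minimal illustration under assumptions, not solr.py's own code: it is written for Python 3, where `str` is already Unicode, and the `title` value is made up; only the field names come from the message.

```python
# Illustrative only: the field names mirror the message above,
# the title value ("Müller", written with an escape) is made up.
title = "M\u00fcller"

# encode() turns the Unicode text into the UTF-8 bytes sent over the wire.
encoded = title.encode("utf-8")

kw = {"id": "12", "title": encoded, "system": "plone", "url": "http://www.google.de"}

# The umlaut takes two bytes in UTF-8, so the byte string is one longer than the text.
print(len(title), len(encoded))

# Decoding with the same codec restores the original text.
print(encoded.decode("utf-8") == title)
```

Whether the caller or the client library performs this step is the design question raised in the thread; the sketch only shows what the step does.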

solr/home

2007-09-06 Thread Matt Mitchell
Hi, I recently upgraded to Solr 1.2. I've set it up through Tomcat using context fragment files. I deploy using the Tomcat web manager. In the context fragment I set the environment variable solr/home. This used to work as expected. The solr/home value pointed to the directory where data,

Re: update servlet not working

2007-09-06 Thread Tom Hill
I don't use the Java client, but when I switched to 1.2, I'd get that message when I forgot to add the content type header, as described in CHANGES.txt: 9. The example solrconfig.xml maps /update to XmlUpdateRequestHandler using the new request dispatcher (SOLR-104). This requires posted
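
A rough sketch of a POST that carries an explicit content type header, in Python. The URL, the header value `text/xml; charset=utf-8`, and the helper name `build_update_request` are assumptions for illustration; the request is only built here, not sent.

```python
import urllib.request

# Hypothetical endpoint: host, port, and path depend on the installation.
UPDATE_URL = "http://localhost:8983/solr/update"


def build_update_request(xml_body: str) -> urllib.request.Request:
    """Build (but do not send) a POST to the update handler with an explicit content type."""
    return urllib.request.Request(
        UPDATE_URL,
        data=xml_body.encode("utf-8"),
        # Assumed header value; adjust it to whatever your server expects.
        headers={"Content-Type": "text/xml; charset=utf-8"},
        method="POST",
    )


req = build_update_request('<add><doc><field name="id">12</field></doc></add>')
print(req.get_method(), req.get_header("Content-type"))
```

Passing `req` to `urllib.request.urlopen` would send it; checking the full HTTP response body, as asked earlier in the thread, is still the quickest way to see why a 400 is returned.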

Re: Replication broken.. no helpful errors?

2007-09-06 Thread Yonik Seeley
On 9/6/07, Matthew Runo [EMAIL PROTECTED] wrote: The thing is that a new searcher is not opened if I look in the stats.jsp page. The index version never changes. The index version is read from the index... hence if the Lucene index doesn't change (even if a new snapshot was taken), the version

Re: solr/home

2007-09-06 Thread Matt Mitchell
Here you go: <Context docBase="/usr/local/lib/solr.war" debug="0" crossContext="true"> <Environment name="solr/home" type="java.lang.String" value="/usr/local/projects/my_app/current/solr-home" /> </Context> This is the same file I'm putting into the Tomcat manager XML Configuration file URL form

Re: solr.py problems with german Umlaute

2007-09-06 Thread Mike Klaas
On 6-Sep-07, at 12:13 PM, Yonik Seeley wrote: On 9/6/07, Brian Carmalt [EMAIL PROTECTED] wrote: Try it with title.encode('utf-8'). As in: kw = {'id':'12','title':title.encode('utf-8'),'system':'plone','url':'http://www.google.de'} It seems like the client library should be responsible for

searching where a value is not null?

2007-09-06 Thread David Whalen
Hi all. I'm trying to construct a query that in pseudo-code would read like this: field != '' I'm finding it difficult to write this as a solr query, though. Stuff like: NOT field:() doesn't seem to do the trick. any ideas? dw

Re: searching where a value is not null?

2007-09-06 Thread Yonik Seeley
On 9/6/07, David Whalen [EMAIL PROTECTED] wrote: Hi all. I'm trying to construct a query that in pseudo-code would read like this: field != '' I'm finding it difficult to write this as a solr query, though. Stuff like: NOT field:() doesn't seem to do the trick. any ideas? perhaps
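
One commonly used idiom for "field has some value" in Lucene-style query syntax is an open-ended range query, `field:[* TO *]`. Whether that is what the reply above goes on to suggest is not visible in this excerpt, so treat the following as an independent sketch; the field name `title` and the helper `has_value_query` are hypothetical.

```python
from urllib.parse import urlencode


def has_value_query(field: str) -> str:
    """Return a query string matching documents that have any value in `field`."""
    return f"{field}:[* TO *]"


# URL-encode the query so it can be appended to a select URL.
params = urlencode({"q": has_value_query("title")})
print(params)
```

Prefixing the same clause with `-` asks for the opposite case, documents with no value in the field, though a purely negative query may need a positive clause alongside it depending on the parser.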

Slow response

2007-09-06 Thread Aaron Hammond
I am pretty new to Solr and this is my first post to this list so please forgive me if I make any glaring errors. Here's my problem. When I do a search using the Solr admin interface for a term that I know does not exist in my index the QTime is about 1ms. However, if I add facets to the

Re: Slow response

2007-09-06 Thread Yonik Seeley
On 9/6/07, Aaron Hammond [EMAIL PROTECTED] wrote: I am pretty new to Solr and this is my first post to this list so please forgive me if I make any glaring errors. Here's my problem. When I do a search using the Solr admin interface for a term that I know does not exist in my index the QTime

Non-HTTP Indexing

2007-09-06 Thread Renaud Waldura
Dear Solr Users: Is it possible to index documents directly without going through any XML/HTTP bridge? I have a large collection (10^7 documents, some very large) and indexing speed is a concern. Thanks! --Renaud

RE: Non-HTTP Indexing

2007-09-06 Thread Wu, Daniel
There are a couple of choices, see: http://wiki.apache.org/solr/SolJava - Daniel -Original Message- From: Renaud Waldura [mailto:[EMAIL PROTECTED] Sent: Thursday, September 06, 2007 2:21 PM To: solr-user@lucene.apache.org Subject: Non-HTTP Indexing Dear Solr Users: Is it

RE: Slow response

2007-09-06 Thread Aaron Hammond
Thank you for your response, this does shed some light on the subject. Our basic question was why we were seeing slower responses the smaller our result set got. Currently we are searching about 1.2 million documents with the source document about 2KB, but we do duplicate some of the data. I

Re: Slow response

2007-09-06 Thread Mike Klaas
On 6-Sep-07, at 3:16 PM, Aaron Hammond wrote: Thank you for your response, this does shed some light on the subject. Our basic question was why we were seeing slower responses the smaller our result set got. Currently we are searching about 1.2 million documents with the source document about

Re: Slow response

2007-09-06 Thread Mike Klaas
On 6-Sep-07, at 3:25 PM, Mike Klaas wrote: There are essentially two facet computation strategies: 1. cached bitsets: a bitset for each term is generated and intersected with the query result bitset. This is more general and performs well up to a few thousand terms. 2. field
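
The first strategy, cached bitsets, can be sketched in a few lines of Python. This is a toy model under assumptions, not Solr's implementation: plain integers stand in for bitsets, and the terms and document ids are invented.

```python
def to_bitset(doc_ids):
    """Pack an iterable of document ids into an int used as a bitset."""
    bits = 0
    for doc_id in doc_ids:
        bits |= 1 << doc_id
    return bits


# Hypothetical cached per-term bitsets for a "state" field.
term_bitsets = {
    "CA": to_bitset([0, 2, 5]),
    "NY": to_bitset([1, 3]),
    "TX": to_bitset([4]),
}


def facet_counts(query_bits, cached):
    """Intersect each cached term bitset with the query result bitset and count the hits."""
    return {term: bin(bits & query_bits).count("1") for term, bits in cached.items()}


# Documents 0, 1, 2 and 4 match the (made-up) query.
query_bits = to_bitset([0, 1, 2, 4])
counts = facet_counts(query_bits, term_bitsets)
print(counts)  # {'CA': 2, 'NY': 1, 'TX': 1}
```

The cost is one intersection per term regardless of how many documents matched, which is consistent with the remark that this approach works well up to a few thousand terms.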

caching query result

2007-09-06 Thread Jae Joo
Hi, I am wondering whether there is any way to cache faceted search results. I have 13 million documents and facet by state (50 values). If there is a mechanism to cache them, I may get results back faster. Thanks, Jae

removing a field from the relevance calculation

2007-09-06 Thread Bart Smyth
Hi, I'm having trouble getting a field of type SortableFloatField to not weigh into the relevancy score returned for a document. <fieldtype name="sfloat" class="solr.SortableFloatField" sortMissingLast="true" omitNorms="true"/> So far I've tried boosting the field to 0.0 at index time using this field

Re: updates on the server

2007-09-06 Thread Erik Hatcher
On Sep 6, 2007, at 2:56 PM, Matthew Runo wrote: On a related note, it'd be great if we could set up a series of transformations to be done on data when it comes into the index, before being indexed. I guess a custom tokenizer might be the best way to do this though..? ie: -Post -Data is

Re: caching query result

2007-09-06 Thread Yonik Seeley
On 9/6/07, Jae Joo [EMAIL PROTECTED] wrote: I have 13 million documents and facet by state (50 values). If there is a mechanism to cache them, I may get results back faster. How fast are you getting results back with standard field faceting (facet.field=state)?
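
For readers unfamiliar with the parameter mentioned above, here is a small Python sketch of how a field-faceting request might be assembled. Only `facet.field=state` comes from the message; the query term and the `rows` and `facet` values are illustrative assumptions.

```python
from urllib.parse import urlencode

# Only facet.field=state comes from the thread; the other values are placeholders.
params = urlencode({
    "q": "solr",             # hypothetical query term
    "rows": 0,               # ask for facet counts only, no documents
    "facet": "true",         # switch faceting on
    "facet.field": "state",  # facet on the state field
})
print(params)
```

Timing this request with and without the facet parameters is a simple way to answer the question of how much of the response time faceting accounts for.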

Question on using a wildcard in the field name at query time

2007-09-06 Thread Toru Matsuzawa
Hi all. A dynamic field lets me store fields in the index using a wildcard pattern, but a wildcard cannot be used in the field name when querying. I want to use a wildcard to specify the field name in a query. Does anyone have a good idea? Something like the following: --document add