specifying dataDir on launch of jetty

2007-01-08 Thread Brian Whitman
I would like to specify the solr dataDir on launch of jetty via java -jar start.jar instead of editing the solrconfig.xml before launching. I've tried java -Dsolr.dataDir=/x/y/z -jar start.jar but it seems to have no effect -- it starts with the solrconfig.xml default. Use case is that I

MoreLikeThis similarity-type queries in Solr

2007-01-31 Thread Brian Whitman
Does Solr have support for the Lucene query-contrib MoreLikeThis query type or anything like it? I know that you can specify your own similarity scorer but we're looking for a way to do direct query by document number type queries in Solr. -Brian

JOIN in Solr (was: convert custom facets to Solr facets...)

2007-02-03 Thread Brian Whitman
On Feb 2, 2007, at 4:46 PM, Ryan McKinley wrote: I would LOVE to see a JOIN in SOLR. I have an index of artists, albums, and songs. The artists have lots of metadata and the songs very little. I'd love to be able to search for songs using the artist metadata. Right now, I have to add all

Re: retrieve document boost

2007-02-20 Thread Brian Whitman
On Feb 20, 2007, at 2:59 PM, Chris Hostetter wrote: In Lucene, Document boosts aren't stored in the docs for later recovery - the getBoost method is meaningless from a Document returned by a search (or retrieved from an IndexReader). Boosts are folded into the fieldNorm - doc boosts are

Re: retrieve document boost

2007-02-20 Thread Brian Whitman
On Feb 20, 2007, at 4:09 PM, Mike Klaas wrote: Sorry, I inverted the logic in my head. You're right. You're not alone, I do it all the time too. I believe that docBoost does not translate into a fieldBoost display (which is only the query-time boost), but is factored into the fieldNorm.

internal field max length?

2007-02-21 Thread Brian Whitman
I am sending Solr stored fields of sizes in the 10-50K range. My maxFieldLength is 5, and the field in question is a solr.TextField. I am finding that fields that have more than a few K of text come back clipped: if I try to index the field with 40K of text, the search result will show

Re: internal field max length?

2007-02-21 Thread Brian Whitman
Ouch... sounds serious (assuming you aren't talking about highlighting). Could you open a JIRA issue and describe or attach a test that can reproduce it? I'll try to reproduce this myself in the meantime. Not highlighting, no. I'll try to make a test case. I am using the SOLR-20 client

Re: internal field max length?

2007-02-21 Thread Brian Whitman
On Feb 21, 2007, at 5:10 PM, Yonik Seeley wrote: So far so good for me. I started with example/exampledocs/solr.xml and added an additional field value for features of size 500K It starts with this is the first line, then repeats the ASL over and over, then ends with this is the last line. I

Re: lots of inserts very fast, out of heap or file descs

2007-02-23 Thread Brian Whitman
Try not committing so often (perhaps until you are done). Don't use post.sh, or modify it to remove the commit. OK, I modified it to not commit after and I also realized I had SOLR-126 (autocommit) on, which I disabled. Is there a rule of thumb on when to commit / optimize? Part of

Re: lots of inserts very fast, out of heap or file descs

2007-02-23 Thread Brian Whitman
On Feb 23, 2007, at 8:31 PM, Yonik Seeley wrote: -- it does not go down until I restart solr. This would be the cause of my too many files open problem. Turning off autocommit / not committing after every add keeps this count steady at 100-200. The files are all of type: [...] Bug or feature?

Re: lots of inserts very fast, out of heap or file descs

2007-02-24 Thread Brian Whitman
On Feb 24, 2007, at 1:16 AM, Chris Hostetter wrote: Based on Brian's email, it sounds like it didn't work in *exactly* the same way, because it caused some file descriptor leaks (and possibly some memory leaks) Hopefully Ryan will be a rock star and spot the problem immediately --

logging off

2007-03-03 Thread Brian Whitman
I'm trying to disable all logging from Solr, or at least re-route it to a file. I was finally able to disable Jetty logging through a custom org.mortbay.log.Logger class, but I am still seeing the Solr logs, which seem to come from java.util.logging.Logger. Is there a thing I can do in

Re: logging off

2007-03-03 Thread Brian Whitman
On Mar 3, 2007, at 12:56 PM, Brian Whitman wrote: I'm trying to disable all logging from Solr, or at least re-route it to a file. Hi Brian, all you have to do is create a logging.properties file and call this before starting up solr: System.setProperty("java.util.logging.config.file

Re: making an in-order query

2007-03-07 Thread Brian Whitman
id:A id:B id:C id:D *usually* works, but I have seen D appear first in the results for certain queries. Is there a query I can do or a better way to accomplish this? It's a bit of a hack, but you could use boosts to order the docs: id:A^4 id:B^3 id:C^2 id:D^1 Gorgeous! Does the job
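
For anyone wanting to generate that boosted query from an ordered list of ids, here is a minimal sketch. The class and method names are made up for illustration, and it assumes the ids contain no query-syntax characters (they are not escaped here).

```java
import java.util.List;

final class BoostOrderQuery {
    // Builds "field:A^n field:B^(n-1) ... field:Z^1" so that earlier ids
    // receive larger boosts and therefore tend to score higher.
    static String build(String field, List<String> idsInOrder) {
        StringBuilder sb = new StringBuilder();
        int boost = idsInOrder.size();
        for (String id : idsInOrder) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(field).append(':').append(id).append('^').append(boost--);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // prints: id:A^4 id:B^3 id:C^2 id:D^1
        System.out.println(build("id", List.of("A", "B", "C", "D")));
    }
}
```

As the thread says, this is a bit of a hack: it nudges scores rather than guaranteeing an order, so it is worth checking the results on real data.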

XSLTResponseWriter and xslt 2.0

2007-03-15 Thread Brian Whitman
There's a lot of good stuff in XSLT 2.0, specifically for Solr users -- like grouping, time and date, and uri encoding. It's my understanding that the javax.xml.transform used by the XSLTResponseWriter is 1.0 only-- at least, it does not understand any of the 2.0 stuff I've thrown at it. I

Tiny term boost with an e in it

2007-03-21 Thread Brian Whitman
In a function that eventually becomes a Solr query, I create a few TermQuery clauses that go into a BooleanQuery. For each TermQuery, I do tq.setBoost( score ); where score is a float my app generates. This usually works except when the numbers get real small, like 2.712607e-4 that I just

How to wildcard search with colons?

2007-03-26 Thread Brian Whitman
The field is called trackURL and has a URL in it, the type is string. I want to be able to search for http://host* q=trackURL:http* -- works q=trackURL:http://host* -- doesn't work, the query parser removes the : and everything after it q=trackURL:http%3A//host* -- doesn't work, same as

Re: How to wildcard search with colons?

2007-03-26 Thread Brian Whitman
On Mar 26, 2007, at 2:03 PM, Mike Klaas wrote: Have you tried: trackURL:http\://host* Obviously not :) Thanks for the help, that did it. Brian
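
The backslash fix can be applied in client code before the query is sent. The sketch below is only an illustration: the class name and the exact set of escaped characters are assumptions (a subset of the Lucene query-syntax characters), and it deliberately leaves * and ? alone so wildcard searches keep working.

```java
final class QueryEscaper {
    // Assumed set of query-syntax characters to escape; '*' and '?' are
    // intentionally left out so they still act as wildcards.
    private static final String SPECIAL = "\\+-!(){}[]^\"~:";

    static String escapeKeepingWildcards(String term) {
        StringBuilder sb = new StringBuilder(term.length() + 8);
        for (char c : term.toCharArray()) {
            if (SPECIAL.indexOf(c) >= 0) {
                sb.append('\\'); // prefix each special character with a backslash
            }
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // prints: trackURL:http\://host*
        System.out.println("trackURL:" + escapeKeepingWildcards("http://host*"));
    }
}
```

The result still has to be URL-encoded when it goes into an HTTP request.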

solr - xsl - rss

2007-04-11 Thread Brian Whitman
Anyone out there done solr - xslt - rss? I'm about to embark on it but don't want to reinvent any wheels.

Re: Specifying no-ops...

2007-05-01 Thread Brian Whitman
When we use solr in a javascript / ajax.request context we often want to 'tag' requests with the user id or item number or something that will not normally appear in the solr results. Because in an asynchronous request handler, you won't know who or what the query is about. To do this, we

Re: Solr Update Handler Failes with Some Doc Characters

2007-05-09 Thread Brian Whitman
I see that the update handler fails even if the character is NOT right next to XML closing tag. If the character is anywhere in any of the XML tags, the update handler fails to parse the XML. Does posting the utf8-example in the exampledocs directory work?

dates times

2007-05-10 Thread Brian Whitman
After writing my 3rd parser in my third scripting language in as many months to go from unix timestamps to Solr Time (ISO 8601) I have to ask: shouldn't the date/time field type be more resilient? I assume there's a good reason that it's 8601 internally, but certainly it would be excellent for
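
For the Java side at least, the conversion from a unix timestamp needs no hand-written parser. A minimal sketch, assuming the timestamp is in whole seconds and a modern JDK with java.time (the class name is made up):

```java
import java.time.Instant;

final class SolrDates {
    // Formats seconds-since-epoch as an ISO 8601 UTC instant,
    // e.g. 2001-09-09T01:46:40Z.
    static String fromUnixSeconds(long epochSeconds) {
        return Instant.ofEpochSecond(epochSeconds).toString();
    }

    public static void main(String[] args) {
        // prints: 2001-09-09T01:46:40Z
        System.out.println(fromUnixSeconds(1_000_000_000L));
    }
}
```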

Re: dates times

2007-05-10 Thread Brian Whitman
You can get at some of this functionality in the built-in xslt 1.0 engine (Xalan) by using the e-xslt date-time extensions: see http://exslt.org/date/index.html, and for Xalan's implementation see http://xml.apache.org/xalan-j/extensionslib.html#exslt . The exslt stuff looks good, thanks! I'll

Re: dates times

2007-05-10 Thread Brian Whitman
On May 10, 2007, at 2:30 PM, Chris Hostetter wrote: Questions like these are why I'm glad Solr currently keeps it simple and makes people deal in absolutes .. less room for confusion :) I get all that, thanks for the great explanation. I imagine most of my problems can be solved with a

Re: Does Solr XSL writer work with Arabic text?

2007-05-10 Thread Brian Whitman
In example.xsl change the output type <xsl:output media-type="text/html"/> to <xsl:output media-type="text/html; charset=UTF-8" encoding="UTF-8"/> and see if that helps. I had the same problem (different language.) If this works we should file a JIRA to fix it up in trunk. On May 10, 2007,

Re: Crawler for solr

2007-05-11 Thread Brian Whitman
On May 11, 2007, at 7:32 AM, David Xiao wrote: Hello, I am using crawler to index and search some intranet webpages which need authorization. I wrote my own crawler for this kind of needs. But with the requirement is evolving, I need another crawler for external webpages (on internet)

Re: missing post.jar

2007-05-12 Thread Brian Whitman
i was trying out the introductory solr tutorial and i was unable to locate the post.jar mentioned in there, which shall be used for posting files to index. i then used post.sh inside cygwin, but i would still like to know, where to find this util (post.jar). If you downloaded the latest release

Re: Feature Request: Multiple default search fields

2007-05-14 Thread Brian Whitman
On May 14, 2007, at 12:38 PM, Jack L wrote: The default search field is really handy. It helps simplify the query, and thus simplify the application using solr. My understanding is that solr only allows one default search field. It would be useful to allow multiple default fields, and maybe also

Re: update no work

2007-05-15 Thread Brian Whitman
Add -H 'Content-type:text/xml; charset=utf-8' after the '<delete>...</delete>' bit. On May 15, 2007, at 12:55 PM, Alessandro Ferrucci wrote: I installed solr solr-2007-05-10.zip http://people.apache.org/builds/lucene/solr/nightly/ solr-2007-05-10.zip I ran example indexing and it indexes

Re: compile error with SOLR 69 MoreLikeThis patch

2007-05-16 Thread Brian Whitman
Change it to DEFALT or change the spelling error in the Lucene version. On May 16, 2007, at 12:13 PM, Andrew Nagy wrote: I downloaded and patched my solr source with the latest solr69 patch and whenever I run ant I get an error: [javac]

<optimize/> takes an hour

2007-05-18 Thread Brian Whitman
I have a largish solr store (2.4m documents with lots of stored text, 27GB data dir) and I ran optimize on it last night. The QTime was 3605096 ! (The commit took about a minute.) During the optimize the solr java process had 50% CPU and was using all of its max heap size. (1GB) On a

Re: <optimize/> takes an hour

2007-05-18 Thread Brian Whitman
On May 18, 2007, at 2:10 PM, Yonik Seeley wrote: What's your max heap set to? Might just want to verify that not too much time is spent in GC, which can happen when you are right at the brink. Ah.. I thought it was set to 1GB but in my upgrade to java 1.6 I guess I'm now just giving it

slow MLT, how to inject top tf-idf terms on indexing

2007-05-22 Thread Brian Whitman
We're looking at MLT queries that take 10-60 seconds on average to return, using the latest (this a.m.) SOLR-69 patch. Our data dir is 8.5G with 300K docs, but almost all of those have on average 50-200KB of stored text in thousands of fragments (multivalued field, one chunk per sentence.)

Re: Re[5]: Where are the log files...

2007-06-15 Thread Brian Whitman
On Jun 15, 2007, at 4:35 PM, Jack L wrote: Is there a way to configure solr to log to a file? ...all you have to do is create a logging.properties file and call this before starting up solr: System.setProperty("java.util.logging.config.file", home + "/conf/logging.properties"); (you can
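
When Solr runs embedded in your own JVM, a related option is to hand java.util.logging its configuration directly instead of pointing at a file. This is not the approach from the thread, just a sketch of a variant that uses the standard LogManager API; the class name is made up, and the one-line configuration (which silences everything) is only an example of what a logging.properties file could contain.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.logging.LogManager;
import java.util.logging.Logger;

final class QuietLogging {
    // Replaces the current java.util.logging configuration with one that
    // sets the root logger level to OFF and installs no handlers.
    static void disableAll() {
        String config = ".level=OFF\n"; // same syntax as a logging.properties file
        try {
            LogManager.getLogManager().readConfiguration(
                    new ByteArrayInputStream(config.getBytes(StandardCharsets.ISO_8859_1)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        disableAll();
        // Loggers without their own level inherit OFF from the root logger.
        Logger.getLogger("example").info("this message should not appear");
    }
}
```

To log to a file rather than nowhere, the same properties text could name java.util.logging.FileHandler in its handlers entry; the exact settings depend on your setup.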

Re: HTTP response code: 400 error

2007-06-19 Thread Brian Whitman
Hi List, Thanks in advance for the help. I'm new to Solr and ran across a bit of a problem. I installed Solr with the Jetty and tested the exampledocs. Everything went great. Next I tried adding one of my own documents to the collection. The XML is below: Are you running the example without

Re: HTTP response code: 400 error

2007-06-19 Thread Brian Whitman
On Jun 19, 2007, at 2:09 PM, Yonik Seeley wrote: Or a Java bug? http://www.innovation.ch/java/HTTPClient/urlcon_vs_httpclient.html I'm not sure if it's possible to get the extra info with Java's built-in HTTP client. Spencer, does post.sh give you more error info? Does for me, not very

Re: problems getting data into solr index

2007-06-20 Thread Brian Whitman
reading these two opinions as opposing each other. I'm sure I'm reading it incorrectly, but they seem to contradict each other. Are they? Brian Whitman wrote: Solr has no problems with proper utf8 and you don't need to do anything special to get it to work. Check out the newer solr.py

Re: MoreLikeThis woes

2007-06-25 Thread Brian Whitman
My MLT query uses a fieldlist of about 5 or 6 fields. There are a mix of string and text fields. They are all in a TermVector. I have played around with the mindf values. With about 90% of my mlt queries solr returns no matches and the remaining 10% get completely irrelevant

Re: solrj and appending to existing index

2007-06-26 Thread Brian Whitman
On Jun 26, 2007, at 4:11 PM, Otis Gospodnetic wrote: Hi, I took a quick look at solrj. One thing I didn't find is a way to add documents to an existing index without overwriting the index. I looked at the sources and the unit tests, but didn't spot the append modus operandi. Ryan,

Re: Pagination of results and XSLT.

2007-07-23 Thread Brian Whitman
Has anyone tried to handle pagination of results using XSLT's ? I'm not really sure it is possible to do it in pure XSLT because all the response object gives us is a total document count - paginating the results would involve more than what XSLT 1.0 could handle (I'll be very happy if

Re: boost field without dismax

2007-07-24 Thread Brian Whitman
Jul 24, 2007, at 9:42 AM, Alessandro Ferrucci wrote: is there a way to boost a field much like is done in dismax request handler? I've tried doing index-time boosting by providing the boost to the field as an attribute in the add doc but that did nothing to affect the score when I went to

Re: XML parsing error

2007-07-26 Thread Brian Whitman
On Jul 26, 2007, at 11:10 AM, Yonik Seeley wrote: If the '' truly got destroyed, it's a server (Solr or Jetty) bug. One possibility is that the '' does exist, but due to a charset mismatch, it's being slurped into a multi-byte char. Just dumped it with curl and did a hexdump: 5a0

XML parsing error

2007-07-26 Thread Brian Whitman
I ended up with this doc in solr: <?xml version="1.0" encoding="UTF-8"?> <response> <lst name="responseHeader"><int name="status">0</int><int name="QTime">1</int><lst name="params"><str name="start">7</str><str name="fl">content</str><str name="q">Pez~1</str><str name="rows">1</str></lst></lst><result name="response" numFound="5381

Re: XML parsing error

2007-07-26 Thread Brian Whitman
On Jul 26, 2007, at 11:49 AM, Yonik Seeley wrote: Could you try it with jetty to see if it's the servlet container? It should be simple to just copy the index directory into solr's example/solr/data directory. Yonik, sorry for my delay, but I did just try this in jetty -- it works (it

Re: Any clever ideas to inject into solr? Without http?

2007-08-09 Thread Brian Whitman
On Aug 9, 2007, at 11:12 AM, Kevin Holmes wrote: 2: Is there a way to inject into solr without using POST / curl / http? Check http://wiki.apache.org/solr/EmbeddedSolr There's examples in java and cocoa to use the DirectSolrConnection class, querying and updating solr w/o a web

Re: Python Utilitys for Solr

2007-08-14 Thread Brian Whitman
On Aug 14, 2007, at 5:16 AM, Christian Klinger wrote: Hi i just play a bit with: http://svn.apache.org/repos/asf/lucene/solr/trunk/client/python/solr.py Is it possible that this library is a bit out of date? If i try to get the example running. I got a parse error from the result.

Re: Indexing a URL

2007-09-05 Thread Brian Whitman
It is apparently attempting to parse en=499af384a9ebd18f in the URL. I am not clear why it would do this as I specified indexed=false. I need to store this because that is how the user gets to the original article. the ampersand is an XML reserved character. you have to escape it
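
If the update XML is being assembled by hand, the escaping can be done with a small helper before the value is placed inside a field element. A minimal sketch; the class name and the example URL are hypothetical, only the en=... parameter comes from the message above.

```java
final class XmlEscaper {
    // Replaces the five characters that are reserved in XML text and
    // attribute values with their predefined entities.
    static String escape(String s) {
        StringBuilder sb = new StringBuilder(s.length() + 16);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&':  sb.append("&amp;");  break;
                case '<':  sb.append("&lt;");   break;
                case '>':  sb.append("&gt;");   break;
                case '"':  sb.append("&quot;"); break;
                case '\'': sb.append("&apos;"); break;
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // prints: http://example.com/article?id=1&amp;en=499af384a9ebd18f
        System.out.println(escape("http://example.com/article?id=1&en=499af384a9ebd18f"));
    }
}
```

Client libraries such as solrj take care of this for you, so the helper only matters when posting raw XML.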

Re: DirectSolrConnection, write.lock and Too Many Open Files

2007-09-10 Thread Brian Whitman
On Sep 10, 2007, at 5:00 PM, Mike Klaas wrote: On 10-Sep-07, at 1:50 PM, Adrian Sutton wrote: We use DirectSolrConnection via JNI in a couple of client apps that sometimes have 100s of thousands of new docs as fast as Solr will have them. It would crash relentlessly if I didn't force all

Re: Term extraction

2007-09-19 Thread Brian Whitman
On Sep 19, 2007, at 9:58 PM, Pieter Berkel wrote: I'm currently looking at methods of term extraction and automatic keyword generation from indexed documents. We do it manually (not in solr, but we put the results in solr.) We do it the usual way - chunk (into n-grams, named entities

logging bad stuff separately in resin

2007-09-22 Thread Brian Whitman
We have a largish solr index that handles roughly 200K new docs a day and also roughly a million queries a day from other programs. It's hosted by resin. A couple of times in the past few weeks something bad has happened -- a lock error or file handle error, or maybe a required field

Re: Term extraction

2007-09-22 Thread Brian Whitman
On Sep 21, 2007, at 3:37 AM, Pieter Berkel wrote: Thanks for the response guys: Grant: I had a brief look at LingPipe, it looks quite interesting but I'm concerned that the licensing may prevent me from using it in my project. Does the opennlp license look good for you? It's LGPL. Not

Re: Nutch with SOLR

2007-09-25 Thread Brian Whitman
Sami has a patch in there which used an older version of the solr client. with the current solr client in the SVN tree, his patch becomes much easier. your job would be to upgrade the patch and mail it back to him so he can update his blog, or post it as a patch for inclusion in

Re: Nutch with SOLR

2007-09-25 Thread Brian Whitman
But we still use a version of Sami's patch that works on both trunk nutch and trunk solr (solrj.) I sent my changes to sami when we did it, if you need it let me know... I put my files up here: http://variogr.am/latest/?p=26 -b

Re: Nutch with SOLR

2007-09-26 Thread Brian Whitman
On Sep 26, 2007, at 4:04 AM, Doğacan Güney wrote: NUTCH-442 is one of the issues that I want to really see resolved. Unfortunately, I haven't received many (as in, none) comments, so I haven't made further progress on it. I am probably your target customer but to be honest all we care about

searching for non-empty fields

2007-09-26 Thread Brian Whitman
I have a large index with a field for a URL. For some reason or another, sometimes a doc will get indexed with that field blank. This is fine but I want a query to return only the set URL fields... If I do a query like: q=URL:[* TO *] I get a lot of empty fields back, like: <doc><str

Re: searching for non-empty fields

2007-09-27 Thread Brian Whitman
thanks Peter, Hoss and Ryan.. q=(URL:[* TO *] -URL:"") This gives me 400 Query parsing error: Cannot parse '(URL:[* TO *] - URL:"")': Lexical error at line 1, column 29. Encountered: "\"" (34), after : "\"" adding something like: <filter class="solr.LengthFilterFactory" min="1" max="1" /> I'll

dismax downweighting

2007-10-12 Thread Brian Whitman
i have a dismax query where I want to boost appearance of the query terms in certain fields but downboost appearance in others. The practical use is a field containing a lot of descriptive text and then a product name field where products might be named after a descriptive word. Consider

grouped clause search in dismax

2007-10-20 Thread Brian Whitman
I have a dismax handler to match product names found in free text that looks like: <!-- for thing detection --> <requestHandler name="thing" class="solr.DisMaxRequestHandler"> <lst name="defaults"> <str name="echoParams">explicit</str> <float name="tie">0.01</float> <str name="qf"> name^5

Re: How to get number of indexed documents?

2007-11-01 Thread Brian Whitman
does http://.../solr/admin/luke work for you? <lst name="index"> <int name="numDocs">601818</int> ... On Nov 1, 2007, at 10:39 PM, Papalagi Pakeha wrote: Hello, Is there any way to get XML version of statistics like how many documents are indexed etc? I have found http://.../solr/admin/properties

overlapping onDeckSearchers message

2007-11-03 Thread Brian Whitman
I have a solr index that hasn't had many problems recently but I had the logs open and noticed this a lot during indexing: [16:23:34.086] PERFORMANCE WARNING: Overlapping onDeckSearchers=2 Not sure what it means, google didn't come back with much.

Re: start.jar -Djetty.port= not working

2007-11-07 Thread Brian Whitman
On Nov 7, 2007, at 10:00 AM, Mike Davies wrote: java -Djetty.port=8521 -jar start.jar However when I run this it seems to ignore the command and still start on the default port of 8983. Any suggestions? Are you using trunk solr or 1.2? I believe 1.2 still shipped with an older version

Re: start.jar -Djetty.port= not working

2007-11-07 Thread Brian Whitman
On Nov 7, 2007, at 10:07 AM, Mike Davies wrote: I'm using 1.2, downloaded from http://apache.rediris.es/lucene/solr/ Where can i get the trunk version? svn, or http://people.apache.org/builds/lucene/solr/nightly/

Re: LSA Implementation

2007-11-26 Thread Brian Whitman
On Nov 26, 2007 6:58 AM, Grant Ingersoll [EMAIL PROTECTED] wrote: LSA (http://en.wikipedia.org/wiki/Latent_semantic_indexing) is patented, so it is not likely to happen unless the authors donate the patent to the ASF. -Grant There are many ways to catch a bird... LSA reduces to SVD on the

Re: Solr and nutch, for reading a nutch index

2007-11-27 Thread Brian Whitman
On Nov 27, 2007, at 6:08 PM, bbrown wrote: I couldn't tell if this was asked before. But I want to perform a nutch crawl without any solr plugin which will simply write to some index directory. And then ideally I would like to use solr for searching? I am assuming this is possible?

Re: Solr and nutch, for reading a nutch index

2007-11-27 Thread Brian Whitman
, November 27, 2007 8:33:18 PM Subject: Re: Solr and nutch, for reading a nutch index On Tue, 27 Nov 2007 18:12:13 -0500 Brian Whitman [EMAIL PROTECTED] wrote: On Nov 27, 2007, at 6:08 PM, bbrown wrote: I couldn't tell if this was asked before. But I want to perform a nutch crawl without

can I do *thing* substring searches at all?

2007-11-29 Thread Brian Whitman
With a fieldtype of string, can I do any sort of *thing* search? I can do thing* but not *thing or *thing*. Workarounds?

Re: Re:

2007-12-02 Thread Brian Whitman
On Dec 2, 2007, at 5:43 PM, Ryan McKinley wrote: try \& rather than %26 or just put quotes around the whole url. I think curl does the right thing here.

Re: RE: Re:

2007-12-02 Thread Brian Whitman
On Dec 2, 2007, at 6:00 PM, Andrew Nagy wrote: On Dec 2, 2007, at 5:43 PM, Ryan McKinley wrote: try \& rather than %26 or just put quotes around the whole url. I think curl does the right thing here. I tried all the methods: converting to %26, converting to \& and encapsulating the

Re: RE: Re:

2007-12-02 Thread Brian Whitman
On Dec 2, 2007, at 5:29 PM, Andrew Nagy wrote: Sorry for not explaining my self clearly: I have header=true as you can see from the curl command and there is a header line in the csv file. was this your actual curl request? curl

Re: out of heap space, every day

2007-12-04 Thread Brian Whitman
For faceting and sorting, yes. For normal search, no. Interesting you mention that, because one of the other changes since last week besides the index growing is that we added a sort to an sint field on the queries. Is it reasonable that a sint sort would require over 2.5GB of heap on

out of heap space, every day

2007-12-04 Thread Brian Whitman
This may be more of a general java q than a solr one, but I'm a bit confused. We have a largish solr index, about 8M documents, the data dir is about 70G. We're getting about 500K new docs a week, as well as about 1 query/second. Recently (when we crossed about the 6M threshold) resin has

Re: out of heap space, every day

2007-12-04 Thread Brian Whitman
int[maxDoc()] + String[nTerms()] + size_of_all_unique_terms. Then double that to allow for a warming searcher. This is great, but can you help me parse this? Assume 8M docs and I'm sorting on an int field that is unix time (seconds since epoch.) For the purposes of the experiment assume
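
To make the formula concrete, here is a rough calculator. It is only a sketch: the 4 bytes per int is standard Java, but the per-reference size and the total size of the term strings depend on the JVM and the data, so both are left as inputs, and the class name is made up.

```java
final class SortHeapEstimate {
    // int[maxDoc] + String[nTerms] + size of all unique terms,
    // then doubled to allow for a warming searcher.
    static long estimateBytes(long maxDoc, long nTerms,
                              long bytesPerReference, long totalTermBytes) {
        long ordinals = 4L * maxDoc;                  // one int per document
        long termRefs = bytesPerReference * nTerms;   // one array slot per unique term
        return 2L * (ordinals + termRefs + totalTermBytes);
    }

    public static void main(String[] args) {
        // Illustrative inputs only: 8M docs, every value unique, 8-byte
        // references and an assumed 60 bytes per term string.
        long docs = 8_000_000L;
        long bytes = estimateBytes(docs, docs, 8L, 60L * docs);
        System.out.println(bytes / (1024 * 1024) + " MB");
    }
}
```

The point of the exercise is that a field with nearly one unique value per document (such as a seconds-resolution timestamp) makes the nTerms and term-size parts as large as the document part.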

solrj - adding a SolrDocument (not a SolrInputDocument)

2007-12-06 Thread Brian Whitman
Writing a utility in java to do a copy from one solr index to another. I query for the documents I want to copy: SolrQuery q = new SolrQuery(); q.setQuery("dogs"); QueryResponse rq = source_solrserver.query(q); for( SolrDocument d : rq.getResults() ) { // now I want to add these to a new

Re: solrj - adding a SolrDocument (not a SolrInputDocument)

2007-12-06 Thread Brian Whitman
On Dec 6, 2007, at 3:07 PM, Ryan McKinley wrote: public static SolrInputDocument toSolrInputDocument( SolrDocument d ) { SolrInputDocument doc = new SolrInputDocument(); for( String name : d.getFieldNames() ) { doc.addField( name, d.getFieldValue(name), 1.0f ); }

Re: Solr and Flex

2007-12-13 Thread Brian Whitman
On Dec 13, 2007, at 10:42 AM, jenix wrote: I'm using Flex for the frontend interface and Solr on backend for the search engine. I'm new to Flex and Flash and thought someone might have some code integrating the two. We've done light stuff querying solr w/ actionscript. It is pretty

Re: debugging slowness

2007-12-20 Thread Brian Whitman
On Dec 20, 2007, at 11:02 AM, Otis Gospodnetic wrote: Sounds like GC to me. That is, the JVM not having large enough heap. Run jconsole and you'll quickly see if this guess is correct or not (kill -QUIT is also your friend, believe it or not). We recently had somebody who had a nice

Re: Status 500 - ParseError at [row,col]:[1,1] Message Content is not allowed in Prolog

2008-01-08 Thread Brian Whitman
On Jan 8, 2008, at 10:58 AM, Kirk Beers wrote: curl http://localhost:8080/solr/update -H Content-Type:text/xml -- data-binary '/add allowDups=false overwriteCommitted=true overwritePending=truedocfield name=entryId0001/ fieldfield name=titleTitle/fieldfield name=contentIt was the best

Re: Status 500 - ParseError at [row,col]:[1,1] Message Content is not allowed in Prolog

2008-01-08 Thread Brian Whitman
I found that on the Wiki at http://wiki.apache.org/solr/UpdateXmlMessages#head-3dfbf90fbc69f168ab6f3389daf68571ad614bef under the title: Updating a Data Record via curl. I removed it and now have the following: <?xml version="1.0" encoding="UTF-8"?> <response> <lst name="responseHeader"><int

index out of disk space, CorruptIndexException

2008-01-14 Thread Brian Whitman
We had an index run out of disk space. Queries work fine but commits return h1500 doc counts differ for segment _18lu: fieldsReader shows 104 but segmentInfo shows 212 org.apache.lucene.index.CorruptIndexException: doc counts differ for segment _18lu: fieldsReader shows 104 but

Re: index out of disk space, CorruptIndexException

2008-01-14 Thread Brian Whitman
On Jan 14, 2008, at 4:08 PM, Ryan McKinley wrote: ug -- maybe someone else has better ideas, but you can try: http://svn.apache.org/repos/asf/lucene/java/trunk/src/java/org/apache/lucene/index/CheckIndex.java thanks for the tip, i did run that, but I stopped it 30 minutes in, as it was

Re: Missing Content Stream

2008-01-15 Thread Brian Whitman
On Jan 15, 2008, at 1:50 PM, Ismail Siddiqui wrote: Hi Everyone, I am new to solr. I am trying to index xml using http post as follows Ismail, you seem to have a few spelling mistakes in your xml string. fiehld, nadme etc. (a) try fixing them, (b) try solrj instead, I agree w/ otis.

Re: best way to get number of documents in a Solr index

2008-01-15 Thread Brian Whitman
On Jan 15, 2008, at 3:47 PM, Maria Mosolova wrote: Hello, I am looking for the best way to get the number of documents in a Solr index. I'd like to do it from a java code using solrj. public int resultCount() { try { SolrQuery q = new SolrQuery("*:*"); QueryResponse rq =

Re: Newbie with Java + typo

2008-01-21 Thread Brian Whitman
On Jan 21, 2008, at 11:13 AM, Daniel Andersson wrote: Well, no. Immutable Page, and as far as I know (english not being my mother tongue), that means I can't edit the page You need to create an account first.

Re: SolrPhpClient with example jetty

2008-01-22 Thread Brian Whitman
$document->title = 'Some Title'; $document->content = 'Some content for this wonderful document. Blah blah blah.'; did you change the schema? There's no title or content field in the default example schema. But I believe solr does output different errors for that.

Re: Cache size clarification

2008-01-28 Thread Brian Whitman
On Jan 28, 2008, at 6:05 PM, Alex Benjamen wrote: I need some clarification on the cache size parameters in the solrconfig. Suppose I'm using these values: A lot of this is here: http://wiki.apache.org/solr/SolrCaching

Re: Converting Solr results to java query/collection/map object

2008-02-19 Thread Brian Whitman
On Feb 19, 2008, at 3:08 PM, Paul Treszczotko wrote: Hi, I'm pretty new to SOLR and I'd like to ask your opinion on the best practice for converting XML results you get from SOLR into something that is better fit to display on a webpage. I'm looking for performance and relatively small

will hardlinks work across partitions?

2008-02-23 Thread Brian Whitman
Will the hardlink snapshot scheme work across physical disk partitions? Can I snapshoot to a different partition than the one holding the live solr index?

can I form a SolrQuery and query a SolrServer in a request handler?

2008-02-25 Thread Brian Whitman
I'm in a request handler: public void handleRequestBody(SolrQueryRequest req, SolrQueryResponse rsp) { And in here i want to form a SolrQuery based on the req, query the searcher and return results. But how do I get a SolrServer out of the req? I can get a SolrIndexSearcher but that

Re: can I form a SolrQuery and query a SolrServer in a request handler?

2008-02-25 Thread Brian Whitman
Perhaps back up and see if we can do this a simpler way than a request handler... What is the query structure you are trying to generate? I have two dismax queries defined in a solrconfig. Something like <requestHandler name="q1" class="solr.DisMaxRequestHandler"> ... <str name="qf">

Re: can I form a SolrQuery and query a SolrServer in a request handler?

2008-02-25 Thread Brian Whitman
Would query ?qt=q1&q=kittens&bf=2&fl=id, then ?qt=q2&q=kittens&bf=2&fl=id. Sorry, I meant: ?qt=q1&q=kittens&bf=sortable^2&fl=id, then ?qt=q2&q=kittens&bf=sortable^2&fl=id
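
When query strings like these are built in code rather than typed, the parameters need a separator between them and the values need URL-encoding (the ^ in the boost function, for instance). A minimal sketch using only the JDK, assuming Java 10 or later for the Charset overload of URLEncoder.encode; the class name is made up.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

final class SolrQueryString {
    // Joins parameters as ?k1=v1&k2=v2..., URL-encoding keys and values.
    // A LinkedHashMap keeps the parameters in insertion order.
    static String build(Map<String, String> params) {
        StringBuilder sb = new StringBuilder("?");
        for (Map.Entry<String, String> e : params.entrySet()) {
            if (sb.length() > 1) {
                sb.append('&');
            }
            sb.append(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8))
              .append('=')
              .append(URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("qt", "q1");
        params.put("q", "kittens");
        params.put("bf", "sortable^2");
        params.put("fl", "id");
        // prints: ?qt=q1&q=kittens&bf=sortable%5E2&fl=id
        System.out.println(build(params));
    }
}
```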

invalid XML character

2008-03-01 Thread Brian Whitman
Once in a while we get this javax.xml.stream.XMLStreamException: ParseError at [row,col]:[4,790470] [14:32:21.877] Message: An invalid XML character (Unicode: 0x6) was found in the element content of the document. [14:32:21.877] at com.sun.org.apache.xerces

Re: invalid XML character

2008-03-02 Thread Brian Whitman
I'm pretty sure it's a bad idea :-) I was just explaining why it wasn't really feasible to do on the server side. This particular case came from this solr.py: https://issues.apache.org/jira/browse/SOLR-216 By the way, is that going to become the official 1.3 solr python client? It would

What can get past document uniqueness?

2008-03-13 Thread Brian Whitman
On a solr instance with <!-- field to use to determine and enforce document uniqueness. --> <uniqueKey>id</uniqueKey> This is happening: http://solr/select?q=id:abc123&fl=id <doc> <str name="id">abc123</str> </doc> <doc> <str name="id">abc123</str> </doc> Lots of weird stuff is writing to this index: solrj code,

Re: What can get past document uniqueness?

2008-03-14 Thread Brian Whitman
is uniqueness enforced? I'd like to at least put some debug checking in there. On Mar 13, 2008, at 4:47 PM, Ryan McKinley wrote: Check this thread: http://www.nabble.com/duplicate-entries-being-returned%2C-possible-caching-issue--td15237016.html perhaps it is related? Brian Whitman wrote: On a solr

Re: What can get past document uniqueness?

2008-03-14 Thread Brian Whitman
On Mar 14, 2008, at 11:37 AM, Yonik Seeley wrote: During the add (in DirectUpdateHandler2) docs are kept track of, and during a commit they are checked for dups. That code has been very well tested though, and I've only seen duplicates on a JVM crash/restart. That's because docs are added to

highlighting: requireFieldMatch not returning anything even if field matches

2008-03-19 Thread Brian Whitman
on a solr text fieldtype called content, I have text like the following: Bono (L), Irish lead singer of the band U2 and Kurt Beck, chairman of the German Social Democratic Party (SPD) address the media at the party's headquarters in Berlin May 14, 2007. A query with highlighting and

Re: highlighting pt2: returning tokens out of order from PhraseQuery

2008-03-20 Thread Brian Whitman
Unfortunately not with the current highlighter. But there has been a great deal of work towards fixing this here: http://issues.apache.org/jira/browse/LUCENE-794 ah, thanks Eric, didn't think to check w/ the lucene folks. I see they have somewhat working patches -- does this kind of

huge site powered by solr :)

2008-03-23 Thread Brian Whitman
We turned a demo for a partner into this cute little web app in about a week. It's using pretty much every facet (har) of solr we could think of. All the jams are solr documents and the search, recommendation, browsing, RSS/podcast etc are direct solr calls through a proxy. It was a nice

Re: Highlighting Quoted Phrases

2008-03-25 Thread Brian Whitman
On Mar 25, 2008, at 6:31 PM, Chris Harris wrote: working pretty well, but my testers have discovered something they find borderline unacceptable. If they search for "stock market" (with quotes), then Solr correctly returns only documents where stock and market appear as adjacent words. Two

force rsync update on a read-only index

2008-05-07 Thread Brian Whitman
We have a few slave solr servers that are just hardlinked-rsynced copies of a master server. When we do the rsync the changes don't show up immediately. The snap* scripts call a commit on the slave servers -- but since these are readonly servers we've disabled /update in the solrconfig.xml
