JEE servlet mapping, security and multiple Solr cores

2011-08-12 Thread Jaeger, Jay - DOT
This is both an FYI for the list so the issue gets documented and a suggestion for the developers. I thought about a JIRA, and would be happy to submit one, but the issue is pretty environment-specific, so I have not done so at this point. In testing Solr 3.3 under WebSphere Application

RE: filtering non english text from my results

2011-08-15 Thread Jaeger, Jay - DOT
1. Find a dictionary with the English words you find acceptable. 2. Use the KeepWordFilterFactory (documented on the AnalyzersTokenizersTokenFilters Wiki page). -Original Message- From: Omri Cohen [mailto:omri...@gmail.com] Sent: Monday, August 15, 2011 1:23 AM To:
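A rough Java sketch of what the keep-word approach amounts to: a token survives only if it appears in the accepted word list. In Solr itself the filter is configured in schema.xml rather than written by hand. The class and method names are hypothetical, and lower-casing before the lookup is an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Hypothetical illustration of keep-word filtering; not Solr's actual filter code. */
class KeepWordsSketch {

    // Return only the tokens whose lower-cased form appears in the accepted word list.
    static List<String> keepWords(List<String> tokens, Set<String> acceptedWords) {
        List<String> kept = new ArrayList<>();
        for (String token : tokens) {
            if (acceptedWords.contains(token.toLowerCase(Locale.ROOT))) {
                kept.add(token);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        Set<String> dictionary = Set.of("the", "quick", "fox");
        // Prints [The, fox]: "schnelle" is not in the English list, so it is dropped.
        System.out.println(keepWords(List.of("The", "schnelle", "fox"), dictionary));
    }
}
```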

RE: ideas for indexing large amount of pdf docs

2011-08-15 Thread Jaeger, Jay - DOT
Note on i: Solr replication provides pretty good clustering support out-of-the-box, including replication of multiple cores. Read the Wiki on replication (Google +solr +replication if you don't know where it is). In my experience, the problem with indexing PDFs is it takes a lot of CPU on

RE: Product data schema question

2011-08-16 Thread Jaeger, Jay - DOT
On the surface, you could simply add some more fields to your schema. But as far as I can tell, you would have to have a separate Solr document for each SKU/size combination, and store the rest of the information (brand, model, color, SKU) redundantly and make the unique key a combination of
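A minimal Java sketch of the one-document-per-SKU/size layout described above: brand, model and color are repeated on every document, and the unique key combines SKU and size. The field names and the underscore separator are assumptions for illustration only.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical flattening of one product into one Solr-style document per size. */
class SkuSizeDocsSketch {

    static List<Map<String, String>> flatten(String sku, String brand, String model,
                                             String color, List<String> sizes) {
        List<Map<String, String>> docs = new ArrayList<>();
        for (String size : sizes) {
            Map<String, String> doc = new LinkedHashMap<>();
            doc.put("id", sku + "_" + size); // composite unique key: SKU + size
            doc.put("sku", sku);
            doc.put("size", size);
            doc.put("brand", brand);         // stored redundantly on every size document
            doc.put("model", model);
            doc.put("color", color);
            docs.add(doc);
        }
        return docs;
    }

    public static void main(String[] args) {
        for (Map<String, String> doc : flatten("ABC123", "Acme", "Runner", "red", List.of("S", "M"))) {
            System.out.println(doc.get("id")); // ABC123_S, then ABC123_M
        }
    }
}
```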

RE: Product data schema question

2011-08-16 Thread Jaeger, Jay - DOT
on (p.sku = i.sku) On Tue, Aug 16, 2011 at 8:00 AM, Jaeger, Jay - DOT jay.jae...@dot.wi.gov wrote: On the surface, you could simply add some more fields to your schema. But as far as I can tell, you would have to have a separate Solr document for each SKU/size combination, and store the rest

RE: Product data schema question

2011-08-16 Thread Jaeger, Jay - DOT
to a reasonable solution are you interested in the details? On Tue, Aug 16, 2011 at 11:44 AM, Jaeger, Jay - DOT jay.jae...@dot.wi.gov wrote: No, I don't think so. A given core can only use one configuration and therefore only one schema, as far as I know, and a schema can only have one key. You

RE: Unable to get multicore working

2011-08-16 Thread Jaeger, Jay - DOT
Perhaps your admin doesn’t work because you don't have defaultCoreName="whatever-core-you-want-by-default" in your <cores> tag? E.g.: <cores adminPath="/admin/cores" defaultCoreName="collection1"> Perhaps this was enough to prevent it from starting any cores -- I'd expect a default to be required.

RE: Unable to get multicore working

2011-08-16 Thread Jaeger, Jay - DOT
, Jaeger, Jay - DOT wrote: Perhaps your admin doesn’t work because you don't have defaultCoreName="whatever-core-you-want-by-default" in your <cores> tag? E.g.: <cores adminPath="/admin/cores" defaultCoreName="collection1"> Perhaps this was enough to prevent it starting any cores -- I'd expect

RE: Unable to get multicore working

2011-08-16 Thread Jaeger, Jay - DOT
I tried it in my own test environment -- with the default core parameter pulled out, under Solr 3.1 I got exactly your symptom: an error 404. HTTP ERROR 404 Problem accessing /solr/admin/index.jsp. Reason: missing core name in path. The log showed:

RE: Unable to get multicore working

2011-08-16 Thread Jaeger, Jay - DOT
Whoops: That was Solr 4.0 (which pre-dates 3.1). I doubt very much that the release matters, though: I expect the behavior would be the same. -Original Message- From: Jaeger, Jay - DOT [mailto:jay.jae...@dot.wi.gov] Sent: Tuesday, August 16, 2011 4:04 PM To: solr-user

RE: Unable to get multicore working

2011-08-16 Thread Jaeger, Jay - DOT
: org.apache.solr.common.SolrException: Unknown fieldtype 'long' specified on field area_id Errr. Why would `long` be an invalid type? On Tuesday, 16 August, 2011 at 2:06 PM, Jaeger, Jay - DOT wrote: Whoops: That was Solr 4.0 (which pre-dates 3.1). I doubt very much that the release matters, though: I expect

RE: Unable to get multicore working

2011-08-17 Thread Jaeger, Jay - DOT
. You guys saved me from the insane asylum. On Tuesday, 16 August, 2011 at 2:32 PM, Jaeger, Jay - DOT wrote: That said, the logs are showing a different error now. Excellent! The site schemas are loading! Great! SEVERE: org.apache.solr.common.SolrException: Unknown fieldtype 'long

RE: master unreachable - attempting simple replication

2011-08-17 Thread Jaeger, Jay - DOT
I'd suggest looking at the logs of the master to see if the request is getting thru or not, or if there are any errors logged there. If the master has a replication config error, it might show up there. We just went thru some master/slave troubleshooting. Here are some things that you might

RE: Solr 1.4.1 vs 3.3 (Speed)

2011-08-17 Thread Jaeger, Jay - DOT
It would perhaps help if you reported what you mean by noticeably less time. What were your timings? Did you run the tests multiple times? One thing to watch for in testing: Solr performance is greatly affected by the OS file system cache. So make sure when testing that you use the same
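One way to act on that advice, sketched in Java under stated assumptions: run the same query a few times untimed so caches are warm, then time several repetitions and compare medians rather than single runs. The `Runnable` stands in for whatever actually issues the query; nothing here is Solr-specific, and the OS file system cache itself is outside the program's control.

```java
import java.util.Arrays;

/** Hypothetical timing helper: warm up first, then report the median of several timed runs. */
class QueryTimingSketch {

    static double medianMillis(Runnable query, int warmups, int runs) {
        if (runs <= 0) {
            throw new IllegalArgumentException("runs must be positive");
        }
        for (int i = 0; i < warmups; i++) {
            query.run(); // untimed: lets caches fill before measuring
        }
        long[] nanos = new long[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            query.run();
            nanos[i] = System.nanoTime() - start;
        }
        Arrays.sort(nanos);
        long median = (runs % 2 == 1)
                ? nanos[runs / 2]
                : (nanos[runs / 2 - 1] + nanos[runs / 2]) / 2;
        return median / 1_000_000.0;
    }

    public static void main(String[] args) {
        double ms = medianMillis(() -> Math.sqrt(12345.0), 3, 9);
        System.out.println("median ms: " + ms);
    }
}
```

Running the same harness against both Solr versions, on the same machine with equally warm caches, keeps the comparison fair.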

RE: Most current Tika jar files that work with Solr 1.4.1

2011-08-17 Thread Jaeger, Jay - DOT
What is the latest version of Tika that I can use with Solr 1.4.1? it comes packaged with 0.4. I tried 0.8 and it no workie. When I was testing Tika last year, I used Solr build 1271 to get the most recent Tika I could get my hands on at the time. That was before Solr 3.1, so I expect it

RE: 'Stable' 4.0 version

2011-08-17 Thread Jaeger, Jay - DOT
geospatial requirements Looking at your email address, no surprise there. 8^) What insight can you share (if any) regarding moving forward to a later nightly build? I used build 1271 (Solr 1.4.1, which seemed to be called Solr 4 at the time) during some testing, and it performed well --

RE: Synonym and Whitespaces and optional TokenizerFactory

2011-08-18 Thread Jaeger, Jay - DOT
You could presumably do it with solr.PatternTokenizerFactory, with the pattern set to .* as your tokenizer. Or maybe, if Solr allows it, you don't use any tokenizer at all? Or maybe you could use solr.WhitespaceTokenizerFactory, allowing it to split up the words, along with

RE: Solr Copyfields

2011-08-18 Thread Jaeger, Jay - DOT
I would suggest #3, unless you have some very unusual performance requirements. It has the advantage of isolating your index environment requirements from the database. -Original Message- From: Nicholas Fellows [mailto:n...@djdownload.com] Sent: Thursday, August 18, 2011 8:40 AM To:

RE: XSLT Exception

2011-08-18 Thread Jaeger, Jay - DOT
I am not an XSLT expert, but I believe that in XSLT, not is a function rather than an operator. http://www.w3.org/TR/xpath-functions/#func-not So not(contains()), rather than not contains(), should presumably do the trick. -Original Message- From: Christopher Gross
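The function-call form can be checked outside of XSLT with the JDK's XPath 1.0 API. This Java sketch (class name and sample document are made up) evaluates not(contains(...)) against a tiny document and shows that the operator-style not contains(...) is rejected when compiled. XSLT engines may word their error messages differently.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

/** Hypothetical check of XPath 1.0 "not" usage against a tiny sample document. */
class XPathNotSketch {

    private static final String SAMPLE = "<doc><str name=\"title\">solr rocks</str></doc>";

    // Evaluate a boolean XPath expression against the sample document.
    static boolean eval(String expression) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        try {
            InputSource source = new InputSource(new StringReader(SAMPLE));
            return (Boolean) xpath.evaluate(expression, source, XPathConstants.BOOLEAN);
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Bad XPath: " + expression, e);
        }
    }

    // True if the expression is syntactically valid XPath.
    static boolean compiles(String expression) {
        try {
            XPathFactory.newInstance().newXPath().compile(expression);
            return true;
        } catch (XPathExpressionException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(eval("not(contains(/doc/str, 'lucene'))"));     // true
        System.out.println(compiles("not(contains(/doc/str, 'lucene'))")); // true
        System.out.println(compiles("not contains(/doc/str, 'lucene')"));  // false
    }
}
```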

RE: how to deal with URLDatasource which needs authorization?

2011-08-24 Thread Jaeger, Jay - DOT
You could run the HTML import from Tika (see the Solr tutorial on the Solr website). The job that ran Tika would need the user/password of the site to be indexed, but Solr would not. (You might have to write a little script to get the HTML page using curl or wget or Nutch). Users could then

RE: query

2011-08-24 Thread Jaeger, Jay - DOT
One way I had thought of doing this kind of thing: include in the index an ACL of some sort. The problem I see in your case is that the list of friends can presumably change over time. So, given that, one way would be to have a little application in between. The request goes to the

RE: Best way to anchor solr searches?

2011-08-25 Thread Jaeger, Jay - DOT
I don't think it has to be quite so bleak as that, depending upon the number of queries done over a given timeframe, and the size of the result sets. Solr does cache the identifiers of documents returned by search results. See http://wiki.apache.org/solr/SolrCaching paying particular

RE: Solr in a windows shared hosting environment

2011-08-25 Thread Jaeger, Jay - DOT
Yes, but since Solr is written in Java to run in a JEE container, you would host Solr in a web application server, either Jetty (which comes packaged), or something else (say, Tomcat or WebSphere or something like that). As a result, you aren't going to find anything that says how to run Solr

RE: How to copy and extract information from a multi-line text before the tokenizer

2011-08-25 Thread Jaeger, Jay - DOT
A programmer had a problem. He tried to solve it with regular expressions. Now he has two problems :). A. That just isn't fair... 8^) (I can't think of very many things that have allowed me to perform more magic over my career than regular expressions, starting with SNOBOL. Uh oh: I

RE: Solr in a windows shared hosting environment

2011-08-25 Thread Jaeger, Jay - DOT
shared hosting environment Thank you! Since it's shared hosting, how do I install java? -Original Message- From: Jaeger, Jay - DOT [mailto:jay.jae...@dot.wi.gov] Sent: Thursday, August 25, 2011 4:34 PM To: solr-user@lucene.apache.org Subject: RE: Solr in a windows shared hosting environment

RE: SolrServer instances

2011-08-29 Thread Jaeger, Jay - DOT
It sounds like the correspondent (Jonty) is thinking just in terms of SolrJ -- wanting to share that across multiple threads in an application server. In which case the question would be whether it would be possible/safe/efficient to share a single instantiation of the SolrJ class(es) across

RE: how to deal with URLDatasource which needs authorization?

2011-08-29 Thread Jaeger, Jay - DOT
So, the question then seems to be: is there a way to place credentials in the URLDataSource. There doesn't seem to be an explicit user ID or password ( http://wiki.apache.org/solr/DataImportHandler#Configuration_of_URLDataSource_or_HttpDataSource ) but perhaps you can include them in URL

RE: Viewing the complete document from within the index

2011-08-30 Thread Jaeger, Jay - DOT
I am trying to peek into the index to see if my index-time synonym expansions are working properly or not. For this I have successfully used the analysis page of the admin application that comes out of the box. Works really well for debugging schema changes. JRJ -Original Message-

RE: add documents to the slave

2011-08-30 Thread Jaeger, Jay - DOT
Another way that occurs to me is that if you have a <security-constraint> on the update URL(s) in your web.xml, you can map them to no groups / empty groups in the JEE container. JRJ -Original Message- From: simon [mailto:mtnes...@gmail.com] Sent: Tuesday, August 30, 2011 12:21 PM To:

RE: missing field in schema browser on solr admin

2011-08-30 Thread Jaeger, Jay - DOT
Also... Did he restart either his web app server container or at least the Solr servlet inside the container? JRJ -Original Message- From: Erik Hatcher [mailto:erik.hatc...@gmail.com] Sent: Friday, August 26, 2011 5:29 AM To: solr-user@lucene.apache.org Subject: Re: missing field in

RE: core creation and instanceDir parameter

2011-08-31 Thread Jaeger, Jay - DOT
Well, if it is for creating a *new* core, Solr doesn't know it is pointing to your shared conf directory until after you create it, does it? JRJ -Original Message- From: Gérard Dupont [mailto:ger.dup...@gmail.com] Sent: Wednesday, August 31, 2011 8:17 AM To: solr-user@lucene.apache.org

RE: is it possible to do automatic indexing in solr ?

2011-09-01 Thread Jaeger, Jay - DOT
If you are indexing data, rather than documents, another possibility is to use database triggers to fire off updates. -Original Message- From: Erik Hatcher [mailto:erik.hatc...@gmail.com] Sent: Wednesday, August 31, 2011 9:13 AM To: solr-user@lucene.apache.org Subject: Re: is it

RE: Wildcard Query

2011-09-06 Thread Jaeger, Jay - DOT
I solved a similar kind of issue (where I actually needed multi-valued attributes, e.g. people with multiple or hyphenated last names) by including PositionFilterFactory in the filter list for the analyzer in such fields' fieldType, thereby setting the position of each value to 1. JRJ

RE: copying one field to another using regex

2011-09-06 Thread Jaeger, Jay - DOT
Not quite sure what you are asking. You can certainly use copyField to copy a field, and then apply regex on the destination field's fieldType. We do that. JRJ -Original Message- From: alx...@aim.com [mailto:alx...@aim.com] Sent: Thursday, September 01, 2011 4:16 PM To:

RE: how to write a script for indexing in windows to perform scheduling?

2011-09-06 Thread Jaeger, Jay - DOT
You seem to have two questions: 1) how to write a script to import data, and 2) how to schedule that in Windows. For #1, I suggest that you visit the Solr tutorials at http://lucene.apache.org/solr/tutorial.html to learn what commands might be used to import data. You might find that you need to

RE: Synonyms Not Working when using SRC DEST

2011-09-06 Thread Jaeger, Jay - DOT
It won't work given your current schema. To get the desired results, you would need to expand your synonyms at both index AND query time. Right now your schema seems to specify it only at index time. So, as the other respondent indicated, currently you replace allergy with the other list

RE: Synonyms Not Working when using SRC DEST

2011-09-07 Thread Jaeger, Jay - DOT
I have a very huge schema spanning up to 10K lines , if I use query time it will be huge hit for me because one term will be mapped to multiple terms . similar in the case of allergy I think maybe you mean synonym file, rather than the schema? I doubt that the number of lines matters all

RE: Synonyms Not Working when using SRC DEST

2011-09-07 Thread Jaeger, Jay - DOT
Also, just to make one thing a bit clearer: you can specify two different kinds of entries in synonym files. See http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters (solr.SynonymFilterFactory). One is replacement, where the words before the => are *replaced* by the right

RE: how to run solr in apache server?

2011-09-07 Thread Jaeger, Jay - DOT
That is correct. Apache is not an *application* server. It is an HTTP *web* server. On its own it does not support running Java applications written to the JEE/J2EE servlet specification - like Solr. (Apache is also not written in Java, if that was what you meant). -Original

RE: how to run solr in apache server?

2011-09-07 Thread Jaeger, Jay - DOT
Other containers that will support Solr: just about any JEE/J2EE container. We have tested under WebSphere Application Server Version 7 -- works fine. Oracle's web application server would presumably work, too -- just about anything. -Original Message- From: nagarjuna

RE: how to run solr in apache server?

2011-09-07 Thread Jaeger, Jay - DOT
Jaeger, Jay - DOT. so i can conclude that solr will run only on application servers(having servlet containers) and not in web servers am i correct? and i have one more question is it possible to add servlet container to the web servers? -- View this message in context: http

RE: running SOLR on same server as your website

2011-09-07 Thread Jaeger, Jay - DOT
You could host Solr inside the same Tomcat container, or in a different servlet container (say, a second Tomcat instance) on the same server. Be aware of your OS memory requirements, though: In my experience, Solr performs best when it has lots of OS memory to cache index files (at least, if

RE: Spellcheck

2011-09-08 Thread Jaeger, Jay - DOT
Following up from your message on the Nutch list. If q=*:* is showing you empty <doc></doc> elements, no fields are getting indexed. I don't think that is correct. I believe that the correct statement would be no fields are getting *** stored ***. If the fields were not getting indexed, they

RE: can indexing information stored in db rather than filesystem?

2011-09-08 Thread Jaeger, Jay - DOT
If you think about it, Lucene (upon which Solr is built) *is* a kind of DBMS - just not an RDBMS. After all, in the end, a DBMS stores its stuff in files, too. If you then turned around and mapped the stuff that Solr does into database tables, you would lose all of the performance advantages

RE: question about StandardAnalyzer, differences between solr 1.4 and solr 3.3

2011-09-12 Thread Jaeger, Jay - DOT
Looking at the Wiki ( http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters ), it looks like solr.StandardTokenizerFactory changed with Solr 3.1. We use solr.KeywordTokenizerFactory for our middle names (and then also throw in solr.LowerCaseFilterFactory to normalize to lower

RE: Master Slave Question

2011-09-12 Thread Jaeger, Jay - DOT
You could prevent queries to the master by limiting what IP addresses are allowed to communicate with it, or by modifying web.xml to put different security on /update vs. /select . We took a simplistic approach. We did some load testing, and discovered that we could handle our expected update

RE: How to search on specific file types?

2011-09-12 Thread Jaeger, Jay - DOT
Some possibilities: 1) Put the file extension into your index (that is what we did when we were testing indexing documents with Solr) 2) Put a mime type for the document into your index. 3) Put the whole file name / URL into your index, and match on part of the name. This will give some false

RE: can indexing information stored in db rather than filesystem?

2011-09-13 Thread Jaeger, Jay - DOT
I don't think you understand. Solr does not have the code to do that. It just isn't there, nor would I expect it would ever be there. Solr is open source though. You could look at the code and figure out how to do it (though why anyone would do that remains beyond my ability to understand).

RE: can indexing information stored in db rather than filesystem?

2011-09-13 Thread Jaeger, Jay - DOT
Nicely put. ;^) -Original Message- From: Walter Underwood [mailto:wun...@wunderwood.org] Sent: Tuesday, September 13, 2011 9:16 AM To: solr-user@lucene.apache.org Subject: Re: can indexing information stored in db rather than filesystem? On Sep 13, 2011, at 6:51 AM, kiran.bodigam

RE: Out of memory

2011-09-13 Thread Jaeger, Jay - DOT
numDocs is not the number of documents in memory. It is the number of documents currently in the index (which is kept on disk). The same goes for maxDoc, except that it is a count of all of the documents that have ever been in the index since it was created or optimized (including deleted

RE: EofException with Solr in Jetty

2011-09-14 Thread Jaeger, Jay - DOT
Looking at the source for Jetty, line 149 in Jetty's HttpOutput java file looks like this: if (_closed) throw new IOException("Closed");

RE: index not created

2011-09-14 Thread Jaeger, Jay - DOT
changed the configuration to point it to my solr dir and started it again You might look in your logs to see where Solr thinks the Solr home directory is and/or if it complains about not being able to find it. As a guess, it can't find it, perhaps because solr.solr.home does not point to the

RE: Schema fieldType y-m-d ?!?!

2011-09-14 Thread Jaeger, Jay - DOT
Just add a bogus 0 timestamp after it when you index it. That is what we did. Dates are not stored or indexed as characters, anyway, so space would not be any different one way or the other. JRJ -Original Message- From: stockii [mailto:stock.jo...@googlemail.com] Sent: Wednesday,
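A minimal Java sketch of the "bogus 0 timestamp" approach: validate the y-m-d value and append midnight UTC, giving the full ISO-8601 form (such as 2011-09-14T00:00:00Z) that Solr date fields accept. Treating every date as UTC midnight is an assumption; whatever convention you pick, apply it consistently at index and query time. The class name is hypothetical.

```java
import java.time.LocalDate;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

/** Hypothetical helper that pads a plain y-m-d date with a midnight UTC timestamp. */
class SolrDateSketch {

    static String toSolrDate(String yearMonthDay) {
        LocalDate date = LocalDate.parse(yearMonthDay); // expects yyyy-MM-dd, throws if malformed
        return date.atStartOfDay(ZoneOffset.UTC).format(DateTimeFormatter.ISO_INSTANT);
    }

    public static void main(String[] args) {
        System.out.println(toSolrDate("2011-09-14")); // 2011-09-14T00:00:00Z
    }
}
```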

RE: EofException with Solr in Jetty

2011-09-14 Thread Jaeger, Jay - DOT
I have not used SolrJ, but it probably is worth considering as a possible suspect. Also, do you have anything in between the client and the Solr server (a firewall, load balancer, etc.?) that might play games with HTTP connections? You might want to start up a network trace on the server or

RE: Performance troubles with solr

2011-09-14 Thread Jaeger, Jay - DOT
I think folks are going to need a *lot* more information. Particularly 1. Just what does your test script do? Is it doing updates, or just queries of the sort you mentioned below? 2. If the test script is doing updates, how are those updates being fed to Solr? 3. What version of Solr

RE: Performance troubles with solr

2011-09-14 Thread Jaeger, Jay - DOT
of documents (2,000,000). I was trying everything before asking here. 5. Machine characteristics, particularly operating system and physical memory on the machine. OS = Debian 6.0, Physcal Memory = 32 gb, CPU = 2x Intel Quad Core On Wed, Sep 14, 2011 at 5:38 PM, Jaeger, Jay - DOT jay.jae

RE: glassfish, solrconfig.xml and SolrException: Error loading DataImportHandler

2011-09-14 Thread Jaeger, Jay - DOT
Some things to think about: When Solr starts up, it should report the location of Solr home. Is it what you expect? Is there any security on the dist directory that would prevent Solr from accessing it? Is there a classloader policy set on GlassFish that could be getting in the way?

RE: Performance troubles with solr

2011-09-14 Thread Jaeger, Jay - DOT
much. When i send a set of random queries (10-20 queries per second) response times goes crayz ( 8 seconds to 60+ seconds). On Wed, Sep 14, 2011 at 6:07 PM, Jaeger, Jay - DOT jay.jae...@dot.wi.gov wrote: I don't have enough experience with filter queries to advise well on when to use fq vs. putting

RE: Replication and ExternalFileField

2011-09-15 Thread Jaeger, Jay - DOT
Actually, Windoze also has symbolic links. You have to manipulate them from the command line, but they do exist. http://en.wikipedia.org/wiki/NTFS_symbolic_link -Original Message- From: Per Osbeck [mailto:per.osb...@lbi.com] Sent: Thursday, September 15, 2011 7:15 AM To:

RE: SOLR Index Speed

2011-09-26 Thread Jaeger, Jay - DOT
500 / second would be 1,800,000 per hour (much more than 500K documents). 1) how big is each document? 2) how big are your index files? 3) as others have recently written, make sure you don't give your JRE so much memory that your OS is starved for memory to use for file system cache. JRJ
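The rate arithmetic as a trivial Java sketch (method names are hypothetical): 500 documents per second times 3,600 seconds is 1,800,000 per hour, so 500,000 documents at that rate would take about 1,000 seconds, a little under 17 minutes.

```java
/** Hypothetical back-of-the-envelope indexing-rate helpers. */
class IndexRateSketch {

    static long docsPerHour(long docsPerSecond) {
        return docsPerSecond * 3_600L;
    }

    static long secondsFor(long totalDocs, long docsPerSecond) {
        // Round up so a partial second still counts.
        return (totalDocs + docsPerSecond - 1) / docsPerSecond;
    }

    public static void main(String[] args) {
        System.out.println(docsPerHour(500));         // 1800000
        System.out.println(secondsFor(500_000, 500)); // 1000
    }
}
```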

RE: A fieldType for a address street

2011-09-26 Thread Jaeger, Jay - DOT
We used copyField to copy the address to two fields: 1. one which contains just the first token, up to the first whitespace; 2. one which copies all of it, but translates it to lower case. Then our users can enter either a street number, a street name, or both. We copied all of it to the second field
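In Solr this is done with copyField plus a different analyzer on each destination field; the Java sketch below only mimics what the two derived values end up looking like. The method names are hypothetical, and splitting on any run of whitespace is an assumption.

```java
import java.util.Locale;

/** Hypothetical illustration of the two values derived from one street address. */
class AddressFieldsSketch {

    // Field 1: just the first token, up to the first whitespace (usually the street number).
    static String firstToken(String address) {
        String trimmed = address.trim();
        return trimmed.isEmpty() ? "" : trimmed.split("\\s+", 2)[0];
    }

    // Field 2: the whole address, lower-cased so matching is case-insensitive.
    static String wholeLowerCased(String address) {
        return address.trim().toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        String address = "123 Main St";
        System.out.println(firstToken(address));      // 123
        System.out.println(wholeLowerCased(address)); // 123 main st
    }
}
```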

RE: strange performance issue with many shards on one server

2011-09-28 Thread Jaeger, Jay - DOT
That would still show up as the CPU being busy. -Original Message- From: Federico Fissore [mailto:feder...@fissore.org] Sent: Wednesday, September 28, 2011 6:12 AM To: solr-user@lucene.apache.org Subject: Re: strange performance issue with many shards on one server Frederik Kraus, il

RE: strange performance issue with many shards on one server

2011-09-28 Thread Jaeger, Jay - DOT
Jaeger, Jay - DOT, on 28/09/2011 18:40, wrote: That would still show up as the CPU being busy. i don't know how the program (top, htop, whatever) displays the value but when the cpu has a cache miss definitely that thread sits and waits for a number of clock cycles with 130GB of ram (per

RE: Trouble configuring multicore / accessing admin page

2011-09-28 Thread Jaeger, Jay - DOT
One time when we had that problem, it was because one or more cores had a broken XML configuration file. Another time, it was because solr/home was not set right in the servlet container. Another time it was because we had an older EAR pointing to a newer release Solr home directory. Given

RE: Trouble configuring multicore / accessing admin page

2011-09-28 Thread Jaeger, Jay - DOT
<cores adminPath="/admij/cores" Was that a cut and paste? If so, the /admij/cores is presumably incorrect, and ought to be /admin/cores. -Original Message- From: Jaeger, Jay - DOT [mailto:jay.jae...@dot.wi.gov] Sent: Wednesday, September 28, 2011 4:10 PM To: solr-user

RE: 32-bit to 64-bit

2011-09-29 Thread Jaeger, Jay - DOT
Are you changing just the host OS or the JVM, or both, from 32 bit to 64 bit? If it is just the OS, the answer is definitely no, you don't need to do anything more than copy. If the answer is the JVM, I *think* the answer is still no, but others more authoritative than I may wish to respond.

RE: About solr distributed search

2011-09-29 Thread Jaeger, Jay - DOT
I am no expert, but here is my take and our situation. Firstly, are you asking what the minimum number of documents is before it makes *any* sense at all to use a distributed search, or are you asking what the maximum number of documents is before a distributed search is essentially required?

RE: Errors in requesthandler statistics

2011-09-29 Thread Jaeger, Jay - DOT
I am not an expert, but based on my experience, the information you are looking for should indeed be in your logs. There are at least three logs you might look for / at: - An HTTP request log - The Solr log - Logging by the application server / JVM Some information is available at

RE: Errors in requesthandler statistics

2011-09-29 Thread Jaeger, Jay - DOT
If you are asking how to tell which of 94000 records failed in a SINGLE HTTP update request, I have no idea, but I suspect that you cannot necessarily tell. It might help if you copied and pasted what you find in the solr log for the failure (see my previous response for how to figure out where

RE: Weird issues when upgrading from 1.4 to 3.4

2011-10-03 Thread Jaeger, Jay - DOT
I have no idea what might be causing your memory to increase like that (we haven't run 3.4, and our index so far has been at most 28 million rows with maybe 40 fields), but just as an aside, depending upon what you meant by we drop the whole index, I'd think it might work better to do an

RE: Error loading class 'solr.extraction.ExtractingRequestHandler'

2011-10-17 Thread Jaeger, Jay - DOT
It sounds like maybe you either have not told Solr where the Solr home directory is, or, more likely, have not copied the jar files for this particular class into the right directory (typically a lib directory), so Tomcat cannot find that class. There is other correspondence on this list that

RE: Xsl for query output

2011-10-17 Thread Jaeger, Jay - DOT
It depends upon whether you want Solr to do the XSL processing, or the browser. After fussing a bit, and doing some reading and thinking, we decided it was best to let the browser do the work, at least in our case. If the browser is doing the processing, you don't need to modify solrconfig.xml

RE: how was developed solr admin page and the UI part?

2011-10-19 Thread Jaeger, Jay - DOT
I believe that if you have the Solr distribution, you have the source for the web UI already: it is just .jsp pages. They are inside the solr .war file. JRJ -Original Message- From: nagarjuna [mailto:nagarjuna.avul...@gmail.com] Sent: Wednesday, October 19, 2011 12:07 AM To:

RE: OS Cache - Solr

2011-10-19 Thread Jaeger, Jay - DOT
200 instances of what? The Solr application with lucene, etc. per usual? Solr cores? ??? Either way, 200 seems to be very very very many: unusually so. Why so many? If you have 200 instances of Solr in a 20 GB JVM, that would only be 100MB per Solr instance. If you have 200 instances of

RE: How to update document with solrj?

2011-10-19 Thread Jaeger, Jay - DOT
Solr does not have an update per se: you have to re-add the document. A document with the same value for the field defined as the uniqueKey will replace any existing document with that key (you do not have to query and explicitly delete it first). JRJ -Original Message- From: hadi
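A toy Java model of that behavior (not SolrJ, and the id key name is an assumption): the index is treated as a map from uniqueKey to document, so adding a document with an existing key replaces the whole earlier document, with no explicit delete.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of "add replaces by uniqueKey" semantics; not Solr code. */
class UniqueKeyIndexSketch {

    private final Map<String, Map<String, String>> docsByKey = new HashMap<>();

    // Adding a document whose "id" already exists overwrites the old document entirely.
    void add(Map<String, String> doc) {
        docsByKey.put(doc.get("id"), new HashMap<>(doc));
    }

    int size() {
        return docsByKey.size();
    }

    String field(String id, String name) {
        Map<String, String> doc = docsByKey.get(id);
        return doc == null ? null : doc.get(name);
    }

    public static void main(String[] args) {
        UniqueKeyIndexSketch index = new UniqueKeyIndexSketch();
        index.add(Map.of("id", "42", "title", "old title", "author", "someone"));
        index.add(Map.of("id", "42", "title", "new title"));
        System.out.println(index.size());               // 1
        System.out.println(index.field("42", "title"));  // new title
        System.out.println(index.field("42", "author")); // null: the whole document was replaced
    }
}
```

The last line is the practical consequence: any field you want to keep must be sent again with the re-added document.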

RE: add thumbnail image for search result

2011-10-19 Thread Jaeger, Jay - DOT
It won't do it for you automatically. I suppose you might create the thumbnail image beforehand, Base64 encode it, and add it as a stored, non-indexed, binary field (see schema: solr.BinaryField) when you index the document. JRJ -Original Message- From: hadi
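The Base64 step itself is a one-liner in Java 8+ via java.util.Base64. This sketch assumes the thumbnail bytes were already produced elsewhere (image scaling is not shown) and that the encoded string is the value sent for the stored binary field; the class name is hypothetical.

```java
import java.util.Arrays;
import java.util.Base64;

/** Hypothetical helper for storing thumbnail bytes as a Base64 string field value. */
class ThumbnailFieldSketch {

    static String encode(byte[] thumbnailBytes) {
        return Base64.getEncoder().encodeToString(thumbnailBytes);
    }

    static byte[] decode(String storedValue) {
        return Base64.getDecoder().decode(storedValue);
    }

    public static void main(String[] args) {
        byte[] fakeThumbnail = {1, 2, 3}; // stand-in for real image bytes
        String stored = encode(fakeThumbnail);
        System.out.println(stored);                                       // AQID
        System.out.println(Arrays.equals(fakeThumbnail, decode(stored))); // true
    }
}
```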

RE: Optimization /Commit memory

2011-10-19 Thread Jaeger, Jay - DOT
Commit does not particularly spike disk or memory usage, unless you are adding a very large number of documents between commits. A commit can cause a need to merge indexes, which can increase disk space temporarily. An optimize is *likely* to merge indexes, which will usually increase disk

RE: how was developed solr admin page and the UI part?

2011-10-20 Thread Jaeger, Jay - DOT
It certainly is possible to develop search pages, update pages, etc. in any architecture you like: I think I'd suggest looking at SolrJ if you want to do that: http://wiki.apache.org/solr/Solrj PLEASE: Go read through the documentation and tutorial and browse thru the Wiki and FAQ. It's

RE: Optimization /Commit memory

2011-10-20 Thread Jaeger, Jay - DOT
,combined Index Size is 14GB .Maximum Individual Index Size is 2.5GB .so My requirement for OS RAM is 14GB +3 * 2.5 GB ~ = 22GB. Correct? Regards Sujatha On Thu, Oct 20, 2011 at 3:45 AM, Jaeger, Jay - DOT jay.jae...@dot.wi.gov wrote: Commit does not particularly spike disk or memory usage, unless

RE: OS Cache - Solr

2011-10-20 Thread Jaeger, Jay - DOT
Instances not solr cores. We get an avg response time of below 1 sec. The number of documents is not many most of the isntances ,some of the instnaces have about 5 lac documents on average. Regards Sujahta On Thu, Oct 20, 2011 at 3:35 AM, Jaeger, Jay - DOT jay.jae...@dot.wi.gov wrote: 200 instances

RE: Optimization /Commit memory

2011-10-24 Thread Jaeger, Jay - DOT
On Thu, Oct 20, 2011 at 6:23 PM, Jaeger, Jay - DOT jay.jae...@dot.wi.gov wrote: Well, since the OS RAM includes the JVM RAM, that is part of your requirement, yes? Aside from the JVM and normal OS requirements, all you need OS RAM for is file caching. Thus, for updates, the OS RAM is not a major

RE: some basic information on Solr

2011-10-24 Thread Jaeger, Jay - DOT
1. Solr, proper, does not index files. An adjunct called Solr Cell can. See http://wiki.apache.org/solr/ExtractingRequestHandler . That article describes which kinds of files Solr Cell can handle. 2. I have no idea what you mean by incidents per year. Please explain. 3. Even though you

RE: indexing key value pair into lucene solr index

2011-10-24 Thread Jaeger, Jay - DOT
Maybe put them in a single string field (or any other field type that is not analyzed -- certainly not text) using some character separator that will connect them, but won't confuse the Solr query parser? So maybe you start out with key value pairs of Key1 value1 Key2 value2 Key3 value3
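A minimal Java sketch of that idea, assuming an underscore as the connecting character (it has no special meaning in the standard Lucene query syntax) and a hypothetical multi-valued, non-analyzed field called attrs: each pair becomes one term, and a query clause matches one exact key/value combination.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical encoding of key/value pairs as single string terms. */
class KeyValueTermsSketch {

    static final String SEPARATOR = "_"; // assumed not to occur inside keys

    // Turn alternating key, value arguments into "key_value" terms.
    static List<String> terms(String... keysAndValues) {
        if (keysAndValues.length % 2 != 0) {
            throw new IllegalArgumentException("expected key/value pairs");
        }
        List<String> out = new ArrayList<>();
        for (int i = 0; i < keysAndValues.length; i += 2) {
            out.add(keysAndValues[i] + SEPARATOR + keysAndValues[i + 1]);
        }
        return out;
    }

    // Query clause for one exact pair in the hypothetical "attrs" field
    // (values containing spaces or query syntax characters would need escaping).
    static String clause(String key, String value) {
        return "attrs:" + key + SEPARATOR + value;
    }

    public static void main(String[] args) {
        System.out.println(terms("Key1", "value1", "Key2", "value2")); // [Key1_value1, Key2_value2]
        System.out.println(clause("Key2", "value2"));                 // attrs:Key2_value2
    }
}
```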

RE: some basic information on Solr

2011-10-25 Thread Jaeger, Jay - DOT
website but found it was really technical, since we are not on the developer side and we just want some basic information or numbers about its usage. Thanks for your answer, anyway. 2011/10/24 Jaeger, Jay - DOT jay.jae...@dot.wi.gov 1. Solr, proper, does not index files. An adjunct called Solr

RE: sort non-roman character strings last

2011-10-25 Thread Jaeger, Jay - DOT
Could you replace it with something that will sort it last instead of an empty string? (Say, for example, replacement={}). This would still give something that looks empty to a person, and would sort last. BTW, it looks to me as though your pattern only requires that the input contain just
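Why { works as a sort-last placeholder, assuming the field sorts by plain character (code point) order: { is U+007B, immediately after z (U+007A), so it sorts after digits and upper- and lower-case ASCII letters, while an empty string sorts before everything. A Java sketch using String.compareTo, which orders ASCII the same way; note that non-ASCII letters would still sort after {.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical demo of a placeholder value and plain code-point sort order. */
class SortLastSketch {

    static final String SORT_LAST = "{"; // U+007B, just after 'z' (U+007A)

    static List<String> sorted(List<String> values) {
        List<String> copy = new ArrayList<>(values);
        Collections.sort(copy); // String.compareTo: UTF-16 code unit order
        return copy;
    }

    public static void main(String[] args) {
        System.out.println(sorted(List.of(SORT_LAST, "apple", "", "Zoo", "zebra", "42")));
        // [, 42, Zoo, apple, zebra, {]
    }
}
```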

RE: sort non-roman character strings last

2011-10-25 Thread Jaeger, Jay - DOT
As far as I know, in the index, a string that is zero length is still a string, and would not count as missing. The CSV importer has a way to not index empty entries, but once it is in the index, it is in the index -- as an empty string. i.e. String silly = null; Is not the same
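The same null-versus-empty distinction in Java, plus one hedged way to avoid indexing empty strings in the first place: drop zero-length values when building the document on the client side. The helper and field names are hypothetical; this is not how the CSV importer implements its option.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical client-side cleanup: leave out fields whose value is null or "". */
class EmptyFieldSketch {

    static Map<String, String> withoutEmpty(Map<String, String> fields) {
        Map<String, String> kept = new LinkedHashMap<>();
        for (Map.Entry<String, String> field : fields.entrySet()) {
            String value = field.getValue();
            if (value != null && !value.isEmpty()) {
                kept.put(field.getKey(), value);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        String silly = null; // no value at all
        String empty = "";   // a value that happens to have zero length
        System.out.println(silly == null); // true
        System.out.println(empty == null); // false
        Map<String, String> raw = new LinkedHashMap<>();
        raw.put("title", "Solr");
        raw.put("sortName", "");
        raw.put("subtitle", null);
        System.out.println(withoutEmpty(raw)); // {title=Solr}
    }
}
```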

RE: Points to processing hashtags

2011-10-25 Thread Jaeger, Jay - DOT
Sounds like a possible application of solr.PatternTokenizerFactory http://lucene.apache.org/solr/api/org/apache/solr/analysis/PatternTokenizerFactory.html You could use copyField to copy the entire string to a separate field (or set of fields) that are processed by patterns. JRJ
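A rough Java sketch of the kind of regular expression involved. PatternTokenizerFactory can be given a pattern and a capture group to emit as tokens; this plain java.util.regex version only approximates that, and the hashtag shape (# followed by word characters) plus the lower-casing are assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical hashtag extraction, similar in spirit to a pattern tokenizer with a capture group. */
class HashtagSketch {

    // Assumed hashtag shape: '#' followed by letters, digits or underscores.
    private static final Pattern HASHTAG = Pattern.compile("#(\\w+)");

    static List<String> hashtags(String text) {
        List<String> tags = new ArrayList<>();
        Matcher matcher = HASHTAG.matcher(text);
        while (matcher.find()) {
            tags.add(matcher.group(1).toLowerCase(Locale.ROOT)); // normalize case
        }
        return tags;
    }

    public static void main(String[] args) {
        System.out.println(hashtags("Loving #Solr and #lucene_4!")); // [solr, lucene_4]
    }
}
```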

RE: Replication issues with multiple Slaves

2011-10-25 Thread Jaeger, Jay - DOT
I noted that in these messages the left hand side is lower case collection, but the right hand side is upper case Collection. Assuming you did a cut/paste, could you have a core name mismatch between a master and a slave somehow? Otherwise (shudder): could you be doing a commit while the

RE: Loading data to SOLR first time ( taking too long)

2011-10-25 Thread Jaeger, Jay - DOT
My goodness. We do 4 million records in about 1/2 HOUR (7+ million in 40 minutes). First question: are you somehow forcing Solr to do a commit for each and every record? If so, that way leads to the house of PAIN. The next thing to do, I suppose, might be to try to figure out whether the issue is
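
The sketch below contrasts committing per record with committing once per load. The Indexer interface and the counting stub are hypothetical stand-ins, not SolrJ, so the example runs without a server. Against a real Solr you would make the equivalent add and commit calls of your client library. Only the shape of the loop matters here: send documents in batches, then commit once at the end.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical minimal indexing API (not SolrJ), so the sketch runs stand-alone.
interface Indexer {
    void add(List<String> docs);
    void commit();
}

// Stub that only counts calls, to make the difference visible.
final class CountingIndexer implements Indexer {
    int adds = 0;
    int commits = 0;
    public void add(List<String> docs) { adds++; }
    public void commit() { commits++; }
}

final class BatchLoader {
    // Sends documents in batches of batchSize and commits once at the end.
    static void load(List<String> docs, int batchSize, Indexer indexer) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<String> batch = new ArrayList<>(batchSize);
        for (String doc : docs) {
            batch.add(doc);
            if (batch.size() == batchSize) {
                indexer.add(new ArrayList<>(batch));
                batch.clear();
            }
        }
        if (!batch.isEmpty()) {
            indexer.add(batch); // leftover partial batch
        }
        indexer.commit(); // one commit, not one per record
    }

    public static void main(String[] args) {
        List<String> docs = new ArrayList<>();
        for (int i = 0; i < 10_000; i++) {
            docs.add("doc" + i);
        }
        CountingIndexer idx = new CountingIndexer();
        load(docs, 1000, idx);
        System.out.println(idx.adds + " adds, " + idx.commits + " commit");
        // 10 adds, 1 commit
    }
}
```

Committing after every record would instead mean 10,000 commits for the same data, each one forcing Solr to flush and reopen its searcher.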

RE: Replication issues with multiple Slaves

2011-10-26 Thread Jaeger, Jay - DOT
-Original Message- From: Jaeger, Jay - DOT [mailto:jay.jae...@dot.wi.gov] Sent: 25 October 2011 20:48 To: solr-user@lucene.apache.org Subject: RE: Replication issues with multiple Slaves I noted that in these messages the left hand side is lower case collection, but the right

RE: Loading data to SOLR first time ( taking too long)

2011-10-26 Thread Jaeger, Jay - DOT
From: Jaeger, Jay - DOT [mailto:jay.jae...@dot.wi.gov] Sent: Tuesday, October 25, 2011 4:03 PM To: 'solr-user@lucene.apache.org' Subject: RE: Loading data to SOLR first time ( taking too long) My goodness. We do 4 million in about 1/2 HOUR (7+ million in 40 minutes). First question

RE: some basic information on Solr

2011-10-26 Thread Jaeger, Jay - DOT
It didn't look like that, but maybe. Our experience has been very, very good. I don't think we have seen a crash in our prototype to date (though that prototype is also not very busy). We have had as many as four cores, with as many as 35 million documents. -Original Message- From:

RE: Difficulties Installing Solr with Jetty 7.x

2011-10-26 Thread Jaeger, Jay - DOT
From your logs, it looks like the Solr library is being found just fine, and that the servlet is initializing OK. Does your Jetty configuration specify index.jsp in a welcome list? We had that problem in WebSphere: we got 404s the same way, and the cure was to modify the Jetty web.xml to include:

RE: Difficulties Installing Solr with Jetty 7.x

2011-10-26 Thread Jaeger, Jay - DOT
ERRATA: that should be the *SOLR* web.xml (not the Jetty web.xml). Sorry for the confusion. -Original Message- From: Jaeger, Jay - DOT [mailto:jay.jae...@dot.wi.gov] Sent: Wednesday, October 26, 2011 4:02 PM To: 'solr-user@lucene.apache.org' Subject: RE: Difficulties Installing Solr

RE: Upgrading the Index from 1.4.1 to 3.4 using replication

2011-10-26 Thread Jaeger, Jay - DOT
I very much doubt that would work: different versions of Lucene are involved, and Solr replication does just a streamed file copy, nothing fancy. JRJ -Original Message- From: Nemani, Raj [mailto:raj.nem...@turner.com] Sent: Wednesday, October 26, 2011 12:55 PM To:

RE: Difficulties Installing Solr with Jetty 7.x

2011-10-27 Thread Jaeger, Jay - DOT
be messed with unless the intention is to affect global, container-wide behavior, which is not my intention. I'm only trying to get Solr running. I may want to run other apps, so I'd rather leave Jetty's config files as they are. On 10/26/2011 2:05 PM, Jaeger, Jay - DOT wrote: ERRATA, that should the the *SOLR

RE: large scale indexing issues / single threaded bottleneck

2011-11-03 Thread Jaeger, Jay - DOT
Shishir, we have 35 million documents, and should be doing about 5000-1 new documents a day, but with very small documents: 40 fields which have at most a few terms, with many being single terms. You may occasionally see some impact from top-level index merges, but those should be very

RE: change solr url

2011-11-03 Thread Jaeger, Jay - DOT
The file he refers to, web.xml, is inside the Solr WAR file, in the WEB-INF folder. That WAR file is in ...\example\webapps. You would have to uncomment the init-param section under filter-class and change the param-value to something else. But, as the comments in the filter-class section
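
A WAR is an ordinary jar (zip) archive, so the web.xml mentioned above can be read with java.util.jar before you decide what to change. The sketch assumes the standard WEB-INF/web.xml entry name, decodes the file as UTF-8 (an assumption), and takes the WAR's path as a command-line argument. The class name WarWebXml is hypothetical, and editing and repacking the archive are not shown.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

final class WarWebXml {
    static final String ENTRY = "WEB-INF/web.xml"; // standard location in a WAR

    // Returns the text of WEB-INF/web.xml from the given WAR file.
    static String read(String warPath) throws IOException {
        try (JarFile war = new JarFile(warPath)) {
            JarEntry entry = war.getJarEntry(ENTRY);
            if (entry == null) {
                throw new IOException(ENTRY + " not found in " + warPath);
            }
            try (InputStream in = war.getInputStream(entry)) {
                return new String(in.readAllBytes(), StandardCharsets.UTF_8);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java WarWebXml <path-to-war>");
            return;
        }
        // Prints the whole descriptor, including any commented-out init-param.
        System.out.println(read(args[0]));
    }
}
```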

RE: Questions about Solr's security

2011-11-03 Thread Jaeger, Jay - DOT
It seems to me that this issue needs to be addressed in the FAQ and in the tutorial, and that somewhere there should be a how-to on locking down /select. This is not obvious to many (most?) users of Solr. It certainly wasn't obvious to me before I read this. JRJ -Original Message- From: