Re: Upgrading Tika in Solr

2010-02-17 Thread Liam O'Boyle
I just copied in the newer .jars and got rid of the old ones and everything seemed to work smoothly enough. Liam On Tue, 2010-02-16 at 13:11 -0500, Grant Ingersoll wrote: I've got a task open to upgrade to 0.6. Will try to get to it this week. Upgrading is usually pretty trivial. On

Re: Tomcat vs Jetty: A Comparative Analysis?

2010-02-17 Thread Ron Chan
probably not if there is no need to embed or programmatically start and stop the server then Tomcat would be the safe choice, probably easier to get going with to start with and you'll find a lot more information about it - Original Message - From: Steve Radhouani

Re: Tomcat vs Jetty: A Comparative Analysis?

2010-02-17 Thread Steve Radhouani
Thanks a lot Ron! 2010/2/17 Ron Chan rc...@i-tao.com probably not if there is no need to embed or programmatically start and stop the server then Tomcat would be the safe choice, probably easier to get going with to start with and you'll find a lot more information about it -

Incremental Backup of Indexes

2010-02-17 Thread abhishes
Hello All, If we have a very large index, how can I back it up incrementally (one full backup followed by multiple incremental backups)? How do I take compressed backups? Do I have to roll out the backup infrastructure manually, or is there something pre-built? -- View this message in

Re: dataimporthandler and expungeDeletes=false

2010-02-17 Thread Jorg Heymans
Looking closer at the documentation, it appears that expungeDeletes in fact has nothing to do with 'removing deleted documents from the index' as i thought before: http://wiki.apache.org/solr/UpdateXmlMessages#Optional_attributes_for_.22commit.22_and_.22optimize.22 expungeDeletes = true | false

Re: Incremental Backup of Indexes

2010-02-17 Thread Jay Ess
abhishes wrote: Hello All, If we have a very large index, how can I back it up incrementally (one full backup followed by multiple incremental backups)? How do I take compressed backups? http://rsnapshot.org/

xml error when indexing

2010-02-17 Thread Jan Simon Winkelmann
Hi, I'm having a strange problem when indexing data through our application. Whenever I post something to the update resource, I get Unexpected character 'a' (code 97) in prolog; expected '<' at [row,col {unknown-source}]: [1,1], <html><head><meta http-equiv="Content-Type" content="text/html;

Need feedback on solr security

2010-02-17 Thread Vijayant Kumar
Hi Group, I need some feedback on Solr security. To make my Solr admin password protected, I used the Path Based Authentication from http://wiki.apache.org/solr/SolrSecurity. This way my admin area, search, delete, and add to index are protected. But now when I make Solr authenticated then

solr word frequency

2010-02-17 Thread michaelnazaruk
Hi all! How can I get the frequency of a word in the index? -- View this message in context: http://old.nabble.com/solr-word-frequency-tp27622615p27622615.html Sent from the Solr - User mailing list archive at Nabble.com.

Re: solr word frequency

2010-02-17 Thread Steve Radhouani
Using the Schema Browser of the Solr interface or Luke you can get the frequency of a word in a specific field, but I don't know how to get it in the entire index. A dirty solution would be to create a new field and copy all your existing fields into it (<copyField source="existingField" dest="newField"

scores are the same for many diferent documents

2010-02-17 Thread Marc Sturlese
Hey there, I see that when Solr gives me back the scores in the response, they are the same for many different documents. I have built a simple index for testing purposes with just documents with one field indexed with the standard analyzer and containing pieces of text. I have done the same with a self

Re: Need feedback on solr security

2010-02-17 Thread Xavier Schepler
Vijayant Kumar wrote: Hi Group, I need some feedback on Solr security. To make my Solr admin password protected, I used the Path Based Authentication from http://wiki.apache.org/solr/SolrSecurity. This way my admin area, search, delete, and add to index are protected. But now when I make

Re: solr word frequency

2010-02-17 Thread michaelnazaruk
Schema browser and Luke don't fit, because I need to get the frequency for a selected word in my code! Luke displays only the first 10 words! I tried to change some configs in solrconfig and in the schema, but it didn't help me! Maybe there is another way to get the frequency for a word? -- View this message in

Re: Need feedback on solr security

2010-02-17 Thread Vijayant Kumar
Hi Xavier, Thanks for your feedback. The firewall rule for trusted IPs is not feasible for us because the application is open to the public, so we cannot work through IP banning. Vijayant Kumar wrote: Hi Group, I need some feedback on Solr security. To make my Solr admin password

Re: solr word frequency

2010-02-17 Thread Steve Radhouani
in the Schema browser, you can specify the Top X Terms you want to display. Here's what you have on the browser: *Docs: * xxx *Distinct: * Top Terms Thus, you can get the frequency of a given word, even though it's not the most elegant solution. 2010/2/17 michaelnazaruk

Re: Need feedback on solr security

2010-02-17 Thread Xavier Schepler
Vijayant Kumar wrote: Hi Xavier, Thanks for your feedback. The firewall rule for trusted IPs is not feasible for us because the application is open to the public, so we cannot work through IP banning. Vijayant Kumar wrote: Hi Group, I need some feedback on Solr security. To make

Re: solr word frequency

2010-02-17 Thread michaelnazaruk
I found a more interesting way: http://localhost:8983/solr/select?q=bongo&terms=true&terms.fl=id&terms.prefix=bongo&indent=true in terms.prefix we set the value which we want to find :) I hope this example helps other people ... Thanks to all who helped me :) -- View this message in context:
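
A minimal sketch of building the request quoted above in Python, with its parameters joined by `&`. The host, port and handler path are the ones from the message; the helper name `terms_url` is hypothetical, and whether the `/select` handler actually answers `terms=true` depends on how the TermsComponent is wired up in solrconfig.xml.

```python
from urllib.parse import urlencode


def terms_url(base, field, prefix):
    """Build a terms request URL (hypothetical helper, not a Solr API)."""
    params = [
        ("q", prefix),             # the message reuses the word as the query
        ("terms", "true"),         # switch the terms output on
        ("terms.fl", field),       # field whose terms are listed
        ("terms.prefix", prefix),  # only terms starting with this value
        ("indent", "true"),        # human-readable response
    ]
    return base + "?" + urlencode(params)


print(terms_url("http://localhost:8983/solr/select", "id", "bongo"))
```

Each term in the response comes back with its document count, which is the per-word frequency asked for at the start of the thread.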

Re: xml error when indexing

2010-02-17 Thread Erick Erickson
The file looks good to me, but as I remember, the xml must be UTF-8 (but check). Is there a chance that somewhere in the chain it's not? HTH Erick 2010/2/17 Jan Simon Winkelmann winkelm...@newsfactory.de Hi, I'm having a strange problem when indexing data through our application. Whenever I

Re: Need feedback on solr security

2010-02-17 Thread Xavier Schepler
Xavier Schepler wrote: Vijayant Kumar wrote: Hi Xavier, Thanks for your feedback. The firewall rule for trusted IPs is not feasible for us because the application is open to the public, so we cannot work through IP banning. Vijayant Kumar wrote: Hi Group, I need some feedback on Solr

Re: scores are the same for many diferent documents

2010-02-17 Thread Erick Erickson
OmitNorms=false is probably what you want. Did you re-create your index for each test? Also, what does debugQuery=true show? You could get a copy of Luke (google Lucene Luke) and use that to examine your index to see how things score, which would give you some clue whether your index (and

long warmup duration

2010-02-17 Thread Stefan Neumann
Hi all, we have been facing sharply increasing warmup times over the last 15 days, which we are not able to explain, since the number of documents and their size are stable. Before the increase we could commit our changes in nearly 20 minutes; now it takes about 2 hours. We were able to identify the warmup of

RE: Need feedback on solr security

2010-02-17 Thread Fuad Efendi
You could set up a firewall that forbids any connection to your Solr server's port for everyone except the computer that hosts the application that connects to Solr. So, only your application will be able to connect to Solr. I believe firewalling is the only possible solution since SOLR doesn't

RE: Need feedback on solr security

2010-02-17 Thread Fuad Efendi
To make my Solr admin password protected, I used the Path Based Authentication from http://wiki.apache.org/solr/SolrSecurity. This way my admin area, search, delete, and add to index are protected. But now when I make Solr authenticated then for every update/delete from the front end is

Re: Need feedback on solr security

2010-02-17 Thread Gora Mohanty
On Wed, 17 Feb 2010 10:13:46 -0400 Fuad Efendi f...@efendi.ca wrote: You could set up a firewall that forbids any connection to your Solr server's port for everyone except the computer that hosts the application that connects to Solr. So, only your application will be able to connect to Solr.

Re: persistent cache

2010-02-17 Thread Toke Eskildsen
On Tue, 2010-02-16 at 10:35 +0100, Tim Terlegård wrote: I actually tried SSD yesterday. Queries which need to go to disk are much faster now. I did expect that warmup for sort fields would be much quicker as well, but that seems to be cpu bound. That and bulk I/O. The sorter imports the Terms

Re: ConstantScoreQuery and wildcards

2010-02-17 Thread TCK
Thanks, this is very helpful! -TCK On Tue, Feb 16, 2010 at 8:16 PM, Ahmet Arslan iori...@yahoo.com wrote: It seems that when I do a search with a wildcard (eg, +text:abc*) the Solr standard SearchHandler will construct a ConstantScoreQuery passing in a Filter, so all the documents in

Re: Preventing mass index delete via DataImportHandler full-import

2010-02-17 Thread Daniel Shane
Thats what I thought. I think I'll take the time to add something to the DIH to prevent such things. Maybe a parameter that will cause the import to bail out if the documents to index are less than X % of the total number of documents already in the index. There would also be a parameter to
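
The guard described above could look roughly like the sketch below. It only illustrates the idea and is not an existing DataImportHandler option: the function name `should_abort_import` and its parameters are hypothetical, and it assumes both document counts are known before the old index is cleared.

```python
def should_abort_import(docs_to_index, docs_in_index, min_percent):
    """Return True when the import is suspiciously small.

    docs_to_index: number of documents the import is about to write
    docs_in_index: number of documents currently in the index
    min_percent:   the "X %" threshold from the message
    """
    if docs_in_index == 0:
        # Nothing to protect yet, so a first import always proceeds.
        return False
    # Integer arithmetic avoids rounding surprises near the threshold.
    return docs_to_index * 100 < min_percent * docs_in_index


print(should_abort_import(10, 1000, 50))
print(should_abort_import(900, 1000, 50))
```

With a threshold of 50, an import of 10 documents against an index of 1000 would be refused, while one of 900 would go ahead.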

Re: Merge several queries into one result?

2010-02-17 Thread Daniel Shane
Yup, thats also what I was thinking. However, I do think that many real world examples cannot simply use one flat index. If you have a big index with big documents, you may want to have a separate, small index, for things that update frequently etc.. You would need to cross reference that

Re: Tomcat vs Jetty: A Comparative Analysis?

2010-02-17 Thread gary
http://www.webtide.com/choose/jetty.jsp - Original Message - From: Steve Radhouani r.steve@gmail.com To: solr-user@lucene.apache.org Sent: Tuesday, 16 February, 2010 12:38:04 PM Subject: Tomcat vs Jetty: A Comparative Analysis? Hi there, Is there any analysis out

Catching slow shards

2010-02-17 Thread Otis Gospodnetic
Hello, Does Solr have any hooks that allow one to watch out for any slaves not responding to a query request in the context of distributed search? That is, if a query is sent to shards A, B, and C, and if B doesn't actually respond (within N milliseconds), I'd like to know about it, and I'm

Re: create requesthandler with default shard parameter for different query parser, stock solr 1.4

2010-02-17 Thread Jason Venner
Anyone come up with an answer for this? I am using the blacklight ruby app and seems to require multiple handlers for different styles of queries. In particular, what I am noticing is that the facet query using q=*:* seems to produce a single shard answer. This query produces 1 result and

Re: Copying dynamic fields into default text field messing up fieldNorm?

2010-02-17 Thread Yu-Shan Fung
I'll take a stab. IMHO, it doesn't make much sense to propagate the boost, and here's why: For the typical use case, copyField is used to add other searchable fields into the default text field for Standard queries. Say we are copying the ModelNumber field into the text field, and we have a boost

AW: Performance-Issues and raising numbers of cumulative inserts

2010-02-17 Thread Bohnsack, Sven
Sorry for the chaotic posts, if anyone minds :) My colleague posted more info here: http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201002.mbox/%3c4b7bf56e.3080...@freiheit.com%3e I would be very pleased if you could respond with any ideas to his post. Regards, Sven -Ursprüngliche

Re: long warmup duration

2010-02-17 Thread Antonio Lobato
Drop those cache numbers. Way down. I warm up 30 million documents in about 2 minutes with the following configuration: <documentCache class="solr.FastLRUCache" size="128" initialSize="10" cleanupThread="true"/> <queryResultCache

Re: Tomcat vs Jetty: A Comparative Analysis?

2010-02-17 Thread Andy
This read more like a PR release or product brochure for jetty than anything else. Then I poked around the website and realized why: it was written by the creator of Jetty, and is hosted on the website of a company with the slogan The Java Experts behind Jetty --- On Wed, 2/17/10,

Reindex after changing defaultSearchField?

2010-02-17 Thread Frederico Azeiteiro
Hi, If i change the defaultSearchField in the core schema, do I need to recreate the index? Thanks, Frederico

Re: Reindex after changing defaultSearchField?

2010-02-17 Thread Joe Calderon
No, you're just changing how you're querying the index, not the actual index. You will need to restart the servlet container or reload the core for the config changes to take effect, though. On 02/17/2010 10:04 AM, Frederico Azeiteiro wrote: Hi, If i change the defaultSearchField in the core

Re: xml error when indexing

2010-02-17 Thread Chris Hostetter
: I'm having a strange problem when indexing data through our application. : Whenever I post something to the update resource, I get : : Unexpected character 'a' (code 97) in prolog; expected '<' at [row,col {unknown-source}]: [1,1], html ... : However, when I post the same data from

Re: Preventing mass index delete via DataImportHandler full-import

2010-02-17 Thread Chris Hostetter
: Thats what I thought. I think I'll take the time to add something to the : DIH to prevent such things. Maybe a parameter that will cause the import : to bail out if the documents to index are less than X % of the total : number of documents already in the index. the devils in the details

Re: Merge several queries into one result?

2010-02-17 Thread Erick Erickson
Certainly if you come up with a general solution, the whole community will be *very* interested G. On Wed, Feb 17, 2010 at 11:14 AM, Daniel Shane sha...@lexum.umontreal.ca wrote: Yup, thats also what I was thinking. However, I do think that many real world examples cannot simply use one

Re: parsing strings into phrase queries

2010-02-17 Thread Chris Hostetter
: take a look at PositionFilter Right, there was another thread recently where almost the exact same issue was discussed... http://old.nabble.com/Re%3A-Tokenizer-question-p27120836.html ..except that i was ignorant of the existence of PositionFilter when i wrote that message. -Hoss

Re: How to reindex data without restarting server

2010-02-17 Thread Chris Hostetter
: How do I SWAP the old_core with the new_core. Is it to be done manually or : does solr provide with a command for doing so. What if I don't make a new you use the SWAP command, as described in the URL that was mentioned... : : http://wiki.apache.org/solr/CoreAdmin : : For making a schema
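
For reference, a minimal sketch of composing that SWAP request from Python. It assumes a multicore setup with the CoreAdmin handler at `/solr/admin/cores` on localhost:8983 (the path is configurable in solr.xml); the helper name `swap_url` is hypothetical, and the core names are the ones from the question.

```python
from urllib.parse import urlencode


def swap_url(admin_base, core, other):
    """Build a CoreAdmin SWAP request URL (hypothetical helper)."""
    query = urlencode([("action", "SWAP"), ("core", core), ("other", other)])
    return admin_base + "?" + query


url = swap_url("http://localhost:8983/solr/admin/cores", "new_core", "old_core")
print(url)

# Sending it is a plain HTTP GET, left commented out because it needs
# a running Solr instance:
# from urllib.request import urlopen
# urlopen(url).read()
```

After the swap, requests addressed to one core name are served by what used to be the other, so clients do not need to change their URLs.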

Re: getting unexpected statscomponent values

2010-02-17 Thread Grant Ingersoll
Can you share the full output from the StatsComponent? On Feb 15, 2010, at 3:07 PM, solr-user wrote: Has anyone encountered the following issue? I wanted to understand the statscomponent better, so I setup a simple test index with a few thousand docs. In my schema I have: - an

Re: VelocityResponseWriter: Image References

2010-02-17 Thread Erik Hatcher
Unfortunately the file request handler does not support binary file types (yet). Lance's suggestion of hosting static content in another servlet container context is the best solution for now. Erik On Feb 15, 2010, at 8:47 AM, Chantal Ackermann wrote: Hi all, Google didn't

Re: Deleting spelll checker index

2010-02-17 Thread darniz
Please bear with me on the limited understanding. I deleted all documents and rebuilt my spell checker using the command spellcheck=true&spellcheck.build=true&spellcheck.dictionary=default After this I went to the schema browser and saw that mySpellText still has around 2000

Re: parsing strings into phrase queries

2010-02-17 Thread Robert Muir
i think we can improve the docs/wiki to show this example use case, i noticed the wiki explanation for this filter gives a more complex shingles example, which is interesting, but this seems to be a common problem and maybe we should add this use case. On Wed, Feb 17, 2010 at 1:54 PM, Chris

Re: Site search upsells boosting by content type

2010-02-17 Thread Chris Hostetter
: 54 results with that particular event on top. However, if I try to : boost another term, such as +(old 97's) || granada^100 - I get over : 300 results because it adds in all of the matches for the word In Solr/Lucene, the keywords of AND and OR are really just syntactic sugar for making

Re: optimize is taking too much time

2010-02-17 Thread Chris Hostetter
: in my Solr I have 1,42,45,223 records taking up some 50GB. : Now when I am loading a new record and it tries to optimize the docs, it is : taking too much memory and time. : Can anybody please tell me whether there is any property in Solr to get rid of this? Solr isn't going to optimize the index unless

Re: getting unexpected statscomponent values

2010-02-17 Thread solr-user
Grant Ingersoll-6 wrote: Can you share the full output from the StatsComponent? Sure. This is what I get. <?xml version="1.0" encoding="UTF-8"?> <response> <lst name="responseHeader"> <int name="status">0</int> <int name="QTime">62</int> <lst name="params"> <str name="facet">on</str> <str

What is largest reasonable setting for ramBufferSizeMB?

2010-02-17 Thread Burton-West, Tom
Hello all, At some point we will need to re-build an index that totals about 2 terabytes in size (split over 10 shards). At our current indexing speed we estimate that this will take about 3 weeks. We would like to reduce that time. It appears that our main bottleneck is disk I/O. We

Re: What is largest reasonable setting for ramBufferSizeMB?

2010-02-17 Thread Mark Miller
Burton-West, Tom wrote: Hello all, At some point we will need to re-build an index that totals about 2 terabytes in size (split over 10 shards). At our current indexing speed we estimate that this will take about 3 weeks. We would like to reduce that time. It appears that our main

Re: getting unexpected statscomponent values

2010-02-17 Thread Chris Hostetter
: Sure. This is what I get. That does look really weird, and definitely seems like a bug. Can you open an issue in Jira? ... ideally with a TestCase (even if it's not a JUnit test case, just having some sample docs that can be indexed against the example schema and a URL showing the problem

Re: Getting max/min dates from solr index

2010-02-17 Thread Chris Hostetter
: Is it possible to do date faceting on multiple solr shards? Distributed search doesn't currently support date faceting... http://wiki.apache.org/solr/DistributedSearch#Distributed_Searching_Limitations https://issues.apache.org/jira/browse/SOLR-1709 -Hoss

Re: getting unexpected statscomponent values

2010-02-17 Thread solr-user
hossman wrote: That does look really weird, and definitely seems like a bug. Can you open an issue in Jira? ... ideally with a TestCase (even if it's not a JUnit test case, just having some sample docs that can be indexed against the example schema and a URL showing the problem would

Re: Realtime search and facets with very frequent commits

2010-02-17 Thread Jan Høydahl / Cominvent
Hi, Have you tried playing with mergeFactor or even mergePolicy? -- Jan Høydahl - search architect Cominvent AS - www.cominvent.com On 16. feb. 2010, at 08.26, Janne Majaranta wrote: Hey Dipti, Basically query optimizations + setting cache sizes to a very high level. Other than that, the

Re: Discovering Slaves

2010-02-17 Thread Jan Høydahl / Cominvent
After ZooKeeper is integrated (1.5?) there will be a way to get info about all nodes in your cluster including their roles, status etc. Perhaps you want to coordinate your dashboard effort with this version, although still very early in development? See http://wiki.apache.org/solr/SolrCloud --

Re: Collating results from multiple indexes

2010-02-17 Thread Jan Høydahl / Cominvent
Thanks for your clarification and link, Will. Back to Aaron's question. There is some ongoing work to try to support updating single fields within documents (http://issues.apache.org/jira/browse/SOLR-139 and http://issues.apache.org/jira/browse/SOLR-828) which could perhaps be part of a future

labeling facets and highlighting question

2010-02-17 Thread adeelmahmood
simple question: I want to give a label to my facet queries instead of the name of facet field .. i found the documentation at solr site that I can do that by specifying the key local param .. syntax something like facet.field={!ex=dt%20key='By%20Owner'}owner I am just not sure what the ex=dt

Re: xml error when indexing

2010-02-17 Thread Lance Norskog
What type are you posting with? Is it expecting a multipart upload? What is the curl command and what is its mime-type for uploaded data? On Wed, Feb 17, 2010 at 10:19 AM, Chris Hostetter hossman_luc...@fucit.org wrote: : I'm having a strange problem when indexing data through our application.

Re: xml error when indexing

2010-02-17 Thread Lance Norskog
I mean: what MIME type does the POST command use? On Wed, Feb 17, 2010 at 7:09 PM, Lance Norskog goks...@gmail.com wrote: What type are you posting with? Is it expecting a multipart upload? What is the curl command and what is its mime-type for uploaded data? On Wed, Feb 17, 2010 at 10:19

Re: Deleting spelll checker index

2010-02-17 Thread Lance Norskog
This is a quirk of Lucene - when you delete a document, the indexed terms for the document are not deleted. That is, if 2 documents have the word 'frampton' in an indexed field, the term dictionary contains the entry 'frampton' and pointers to those two documents. When you delete those two

Re: parsing strings into phrase queries

2010-02-17 Thread Lance Norskog
That would be great. After reading this and the PositionFilter class I still don't know how to use it. On Wed, Feb 17, 2010 at 12:38 PM, Robert Muir rcm...@gmail.com wrote: i think we can improve the docs/wiki to show this example use case, i noticed the wiki explanation for this filter gives a

Re: labeling facets and highlighting question

2010-02-17 Thread Lance Norskog
Here's the problem: the wiki page is confusing: http://wiki.apache.org/solr/SimpleFacetParameters#Tagging_and_excluding_Filters The line: q=mainquery&fq=status:public&fq={!tag=dt}doctype:pdf&facet=on&facet.field={!ex=dt}doctype is standalone, but the later line: facet.field={!ex=dt

Re: labeling facets and highlighting question

2010-02-17 Thread adeelmahmood
okay so if I dont want to do any excludes then I am assuming I should just put in {key=label}field .. i tried that and it doesnt work .. it says undefined field {key=label}field Lance Norskog-2 wrote: Here's the problem: the wiki page is confusing:
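
Local params need the leading `!`, so the label form is `{!key=label}field` rather than `{key=label}field`. A small sketch of building such a parameter in Python, assuming a label that contains no single quotes; `labeled_facet` is a hypothetical helper, and the `owner` field and `By Owner` label are taken from the earlier message in this thread.

```python
from urllib.parse import parse_qs, urlencode


def labeled_facet(field, label):
    """Return a facet.field value that reports `field` under `label`."""
    # Quoting the label keeps the space inside a single local-param value.
    return "{!key='%s'}%s" % (label, field)


query = urlencode([
    ("q", "*:*"),
    ("facet", "on"),
    ("facet.field", labeled_facet("owner", "By Owner")),
])
print(query)

# Decoding the query string gives back the raw parameter Solr will see.
print(parse_qs(query)["facet.field"][0])
```

Letting `urlencode` escape the braces, quotes and space avoids hand-writing sequences such as `%20` in the request.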

Re: optimize is taking too much time

2010-02-17 Thread mklprasad
hossman wrote: : in my Solr I have 1,42,45,223 records taking up some 50GB. : Now when I am loading a new record and it tries to optimize the docs, it is : taking too much memory and time. : Can anybody please tell me whether there is any property in Solr to get rid of this? Solr isn't going

Re: Realtime search and facets with very frequent commits

2010-02-17 Thread Janne Majaranta
Hi, Yes, I did play with mergeFactor. I didn't play with mergePolicy. Wouldn't that affect indexing speed and possibly memory usage ? I don't have any problems with indexing speed ( 1000 - 2000 docs / sec via the standard HTTP API ). My problem is that I need very warm caches to get fast

Re: Tomcat vs Jetty: A Comparative Analysis?

2010-02-17 Thread Steve Radhouani
Totally agreed! 2010/2/17 Andy angelf...@yahoo.com This read more like a PR release or product brochure for jetty than anything else. Then I poked around the website and realized why: it was written by the creator of Jetty, and is hosted on the website of a company with the slogan The Java