Re: search for a number within a range, where range values are mentioned in documents

2010-12-16 Thread lee carroll
During data import, can you update each record with min and max fields? These would be equal in the case of a single non-range value. I know this is not a Solr solution but a data pre-processing one, but would it work? Failing the above, I've seen in the docs a reference to a compound value field (in the
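The pre-processing idea in this message can be sketched roughly as follows. This is only an illustration under assumptions: the field names `range_min` and `range_max`, the helper names, and the `"10-20"` input format (non-negative integers) are hypothetical and do not come from the thread.

```python
def to_min_max(raw):
    """Turn "10-20" or "15" into (min, max); a single value gives min == max."""
    parts = [p.strip() for p in raw.split("-")]
    if len(parts) == 1:
        low = high = int(parts[0])
    elif len(parts) == 2:
        # Sort so that a reversed range such as "20-10" still yields (10, 20).
        low, high = sorted(int(p) for p in parts)
    else:
        raise ValueError(f"unrecognized range value: {raw!r}")
    return low, high


def add_range_fields(doc, raw):
    """Return a copy of doc with the hypothetical range_min / range_max fields."""
    low, high = to_min_max(raw)
    return {**doc, "range_min": low, "range_max": high}
```

With documents prepared this way, a number such as 15 falls inside a document's range when `range_min <= 15 <= range_max`, which on the Solr side could be expressed as something like `range_min:[* TO 15] AND range_max:[15 TO *]`.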

Re: Memory use during merges (OOM)

2010-12-16 Thread Upayavira
How long does it take to reach this OOM situation? Is it possible for you to try a merge with each setting in turn, and evaluate what impact they each have? That is, indexing speed and memory consumption? It might be interesting to watch garbage collection too while it is running with jstat, as

RE: Dataimport performance

2010-12-16 Thread Ephraim Ofir
Check out http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201008.mbox/%3c9f8b39cb3b7c6d4594293ea29ccf438b01702...@icq-mail.icq.il.office.aol.com%3e This approach of not using sub entities really improved our load time. Ephraim Ofir -Original Message- From: Robert Gründler

PHPSolrClient

2010-12-16 Thread Dennis Gearon
First of all, it's a very nice piece of work. I am just getting my feet wet with Solr in general, so I am not even sure how a document is NORMALLY deleted. The library PHPDocs say 'add', 'get', 'delete'. But does anyone know about 'update'? (obviously one can read-delete-modify-create)

Re: Thank you!

2010-12-16 Thread Dennis Gearon
I feel the same way about this group and the Postgres group. VERY helpful people. All of us helping each other. Dennis Gearon Signature Warning - Original Message From: Adam Estrada estrada.a...@gmail.com Subject: Thank you! I just want to say that this list

Re: PHPSolrClient

2010-12-16 Thread Tanguy Moal
Hi Dennis, Not particular to the client you use (solr-php-client) for sending documents, think of update as an overwrite. This means that if you update a particular document, the previous version indexed is lost. Therefore, when updating a document, make sure that all the fields to be indexed
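The "update as an overwrite" behaviour described here can be illustrated with a toy model. This is not Solr or solr-php-client code; `TinyIndex` and its methods are hypothetical names that only mimic the replace-by-key semantics.

```python
class TinyIndex:
    """Toy stand-in for an index keyed on a unique key field (here "id")."""

    def __init__(self, unique_key="id"):
        self.unique_key = unique_key
        self.docs = {}

    def add(self, doc):
        # Re-adding a document with an existing key replaces it entirely;
        # fields missing from the new version are not carried over.
        self.docs[doc[self.unique_key]] = dict(doc)

    def get(self, key):
        return self.docs.get(key)


index = TinyIndex()
index.add({"id": "42", "title": "First draft", "body": "Some text"})
index.add({"id": "42", "title": "Second draft"})  # "body" was not resent
```

After the second `add`, the stored document for key `"42"` has no `body` field any more, which is why every field to be indexed has to be sent again on each update.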

indexing a lot of XML dokuments

2010-12-16 Thread Jörg Agatz
Hi users, I am looking for a way to index a lot of XML documents as fast as possible. I have more than 1 million docs on server 1 and a Solr multicore on server 2 with Tomcat. I don't know how I can do it easily and fast. I can't find an idea in the wiki; maybe you have some ideas? King

Re: Memory use during merges (OOM)

2010-12-16 Thread Michael McCandless
RAM usage for merging is tricky. First off, merging must hold open a SegmentReader for each segment being merged. However, it's not necessarily a full segment reader; for example, merging doesn't need the terms index nor norms. But it will load deleted docs. But, if you are doing deletions (or

Re: Results from More then One Cors?

2010-12-16 Thread Jörg Agatz
OK, it works great at the beginning, but now I get a big error :-( HTTP Status 500 - null java.lang.NullPointerException at org.apache.solr.handler.component.QueryComponent.mergeIds(QueryComponent.java:462) at

Determining core name from a result?

2010-12-16 Thread Mark Allan
Hi all, I've been bashing my head against the wall for a few hours now, trying to get mlt (more-like-this) queries working across multiple cores. I've since seen a JIRA issue and documentation saying that multicore doesn't yet support mlt queries. Oops! Anyway, to get around this, I was

Re: Google like search

2010-12-16 Thread satya swaroop
Hi All, Thanks for your suggestions.. I got the result of what i expected.. Cheers, Satya

Re: PHPSolrClient

2010-12-16 Thread Erick Erickson
As Tanguy says, simply re-adding a document with the same uniqueKey will automatically delete/readd the doc. But I wanted to add a caution about your phrase read-delete-modify-create You only get back what you #stored#. So generally the update is done from the original source rather than the

Re: Determining core name from a result?

2010-12-16 Thread Grant Ingersoll
How are you querying the core to begin with? On Dec 16, 2010, at 6:46 AM, Mark Allan wrote: Hi all, I've been bashing my head against the wall for a few hours now, trying to get mlt (more-like-this) queries working across multiple cores. I've since seen a JIRA issue and documentation

Re: Thank you!

2010-12-16 Thread kenf_nc
Hear hear! In the beginning of my journey with Solr/Lucene I couldn't have done it without this site. Smiley and Pugh's book was useful, but this forum was invaluable. I don't have as many questions now, but each new venture, Geospatial searching, replication and redundancy, performance tuning,

Re: Determining core name from a result?

2010-12-16 Thread Mark Allan
Hi Grant, Thanks for your reply. I'm using solrj to connect via http, which eventually sends this query

STUCK Threads at org.apache.lucene.document.CompressionTools.decompress

2010-12-16 Thread Alexander Ramos Jardim
Hello guys, I am getting threads stuck forever at * org.apache.lucene.document.CompressionTools.decompress*. I am using Weblogic 10.02, with solr deployed as ear and no work manager specifically configured for this instance. Only doing simple queries at this node (q=itemId:9 or

Why does Solr commit block indexing?

2010-12-16 Thread Renaud Delbru
Hi, See log at [1]. We are using the latest snapshot of lucene_branch3.1. We have configured Solr to use the ConcurrentMergeScheduler: <mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler"/> When a commit() runs, it blocks indexing (all incoming update requests are blocked

Re: indexing a lot of XML dokuments

2010-12-16 Thread Adam Estrada
I have been very successful in following this example: http://wiki.apache.org/solr/DataImportHandler#HttpDataSource_Example Adam On Thu, Dec 16, 2010 at 5:44 AM, Jörg Agatz joerg.ag...@googlemail.com wrote: hi, users, i serch e

Multicore Search broken

2010-12-16 Thread Jörg Agatz
Hello users, I have created a multicore Solr instance with Tomcat 6. I created two cores, mail and index2. At first, mail and index2 had the same config; after that, I changed the mail config and indexed 30 XML documents. Now when I search in each core:

Re: how to config DataImport Scheduling

2010-12-16 Thread do3do3
I also have the same problem. I configured the dataimport.properties file as shown in http://wiki.apache.org/solr/DataImportHandler#dataimport.properties_example but no change occurs. Can anyone help me?

Re: STUCK Threads at org.apache.lucene.document.CompressionTools.decompress

2010-12-16 Thread Erick Erickson
What are you trying to do? It sounds like you're storing fields compressed, is that true (i.e. defining compressed=true in your field defs)? If so, why? It may be costing you more than you benefit. A quick test would be to stop returning anything except the score by specifying fl=score. Or at

Re: Determining core name from a result?

2010-12-16 Thread Chris Hostetter
: Subject: Determining core name from a result? FYI: some people may be confused because of terminology -- I think what you are asking is how to know which *shard* a document came from when doing a distributed search. This isn't currently supported, there is an open issue tracking it...

Re: Query performance issue while using EdgeNGram

2010-12-16 Thread Erick Erickson
A couple of observations: 1 your regex at query time is interesting. You're using KeywordTokenizer, so input of search me becomes searchme before it goes through the parser. Is this your intent? 2 Why are you using EdgeNGrams for auto suggest? The TermsComponent is an easier, more

Re: Determining core name from a result?

2010-12-16 Thread Mark Allan
Oops! Sorry, I thought shard and core were one and the same and the terms could be used interchangeably - I've got a multicore setup which I'm able to search across by using the shards parameter. I think you're right, that *is* the question I was asking. Thanks for letting me know it's not

Case Insensitive sorting while preserving case during faceted search

2010-12-16 Thread shan2812
Hi, I am trying to do a facet search and sort the facet values too. First I tried with 'solr.TextField' as field type. But this does not return sorted facet values. After referring to FAQ(http://wiki.apache.org/solr/FAQ#Why_Isn.27t_Sorting_Working_on_my_Text_Fields.3F), I changed it to

Re: STUCK Threads at org.apache.lucene.document.CompressionTools.decompress

2010-12-16 Thread Alexander Ramos Jardim
2010/12/16 Erick Erickson erickerick...@gmail.com What are you trying to do? It sounds like you're storing fields compressed, is that true (i.e. defining compressed=true in your field defs)? If so, why? It may be costing you more than you benefit. No compressed fields in my schema A

Re: Multicore Search broken

2010-12-16 Thread Jörg Agatz
I have tried some things, and now I have news: when I search in:

Re: Why does Solr commit block indexing?

2010-12-16 Thread Michael McCandless
Unfortunately, (I think?) Solr currently commits by closing the IndexWriter, which must wait for any running merges to complete, and then opening a new one. This is really rather silly because IndexWriter has had its own commit method (which does not block ongoing indexing nor merging) for quite

RE: Dataimport performance

2010-12-16 Thread Dyer, James
We have ~50 long-running SQL queries that need to be joined and denormalized. Not all of the queries are to the same db, and some data comes from fixed-width data feeds. Our current search engine (that we are converting to SOLR) has a fast disk-caching mechanism that lets you cache all of

Re: PHPSolrClient

2010-12-16 Thread Dennis Gearon
So just use add and overwrite. OK, thanks Dennis Gearon Signature Warning - - Original Message From: Tanguy Moal tanguy.m...@gmail.com To: solr-user@lucene.apache.org Sent: Thu, December 16, 2010 1:33:36 AM Subject: Re: PHPSolrClient Hi Dennis, Not particular to the client you use

RE: Memory use during merges (OOM)

2010-12-16 Thread Robert Petersen
Hello we occasionally bump into the OOM issue during merging after propagation too, and from the discussion below I guess we are doing thousands of 'false deletions' by unique id to make sure certain documents are *not* in the index. Could anyone explain why that is bad? I didn't really

Re: Thank you!

2010-12-16 Thread Dennis Gearon
If I ever make it, Wikipedia, Stack Overflow, PHP, Symfony, Doctrine, Apache are all going to get donations. I already sent $20 to Wikipedia; they're hurting now. Dennis Gearon Signature Warning It is always a good idea to learn from your own mistakes. It is usually a better

Re: Memory use during merges (OOM)

2010-12-16 Thread Michael McCandless
It's not that it's bad, it's just that Lucene must do extra work to check if these deletes are real or not, and that extra work requires loading the terms index which will consume additional RAM. For most apps, though, the terms index is relatively small and so this isn't really an issue. But if

Re: bulk commits

2010-12-16 Thread Adam Estrada
What is it that you are trying to commit? a On Thu, Dec 16, 2010 at 1:03 PM, Dennis Gearon gear...@sbcglobal.net wrote: What have people found as the best way to do bulk commits either from the web or from a file on the system? Dennis Gearon Signature Warning It is

Re: bulk commits

2010-12-16 Thread Adam Estrada
This is how I import a lot of data from a CSV file. There are close to 100k records in there. Note that you can either pre-define the column names using the fieldnames param like I did here *or* include header=true which will automatically pick up the column header if your file has it. curl
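The original curl command is cut off in this archive, so the following is not a reconstruction of it. It is only a sketch of how a request URL for the CSV update handler might be assembled with the two options the message mentions (`fieldnames` and `header`). The helper name `csv_update_url`, the host, the port, and the `/update/csv` handler path are assumptions for illustration.

```python
from urllib.parse import urlencode


def csv_update_url(base_url, fieldnames=None, header=None, commit=True):
    """Build a URL for a CSV update request (illustrative helper only)."""
    params = {}
    if fieldnames is not None:
        # Column names supplied up front, as a comma-separated list.
        params["fieldnames"] = ",".join(fieldnames)
    if header is not None:
        # header=true tells the handler to read names from the first line.
        params["header"] = "true" if header else "false"
    params["commit"] = "true" if commit else "false"
    return f"{base_url}/update/csv?{urlencode(params)}"


# Hypothetical base URL and column names.
url_with_names = csv_update_url("http://localhost:8983/solr", fieldnames=["id", "name"])
url_with_header = csv_update_url("http://localhost:8983/solr", header=True)
```

The CSV file itself would then be sent as the request body (or referenced through a server-side file parameter), which this sketch leaves out.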

Query Problem

2010-12-16 Thread Ezequiel Calderara
Hi all, I have the following problem. I have this set of data (View data (Pastebin) http://pastebin.com/jKbUhjVS ). If I do a search for *SectionName:Programas_Home* I get no results: Returned Data (PasteBin) http://pastebin.com/wnPdHqBm If I do a search for *Programas_Home* I get only 1

RE: Memory use during merges (OOM)

2010-12-16 Thread Burton-West, Tom
Thanks Mike, But, if you are doing deletions (or updateDocument, which is just a delete + add under-the-hood), then this will force the terms index of the segment readers to be loaded, thus consuming more RAM. Out of 700,000 docs, by the time we get to doc 600,000, there is a good chance a few

RE: Memory use during merges (OOM)

2010-12-16 Thread Robert Petersen
Thanks Mike! When you say 'term index of the segment readers', are you referring to the term vectors? In our case our index of 8 million docs holds pretty 'skinny' docs containing searchable product titles and keywords, with the rest of the doc only holding Ids for faceting upon. Docs

Re: Memory use during merges (OOM)

2010-12-16 Thread Michael McCandless
On Thu, Dec 16, 2010 at 2:09 PM, Burton-West, Tom tburt...@umich.edu wrote: Thanks Mike, But, if you are doing deletions (or updateDocument, which is just a delete + add under-the-hood), then this will force the terms index of the segment readers to be loaded, thus consuming more RAM. Out of

Re: Memory use during merges (OOM)

2010-12-16 Thread Michael McCandless
Actually terms index is something different. If you don't use CFS, go and look at the size of *.tii in your index directory -- those are the terms index. The terms index picks a subset of the terms (by default 128) to hold in RAM (plus some metadata) in order to make seeking to a specific term
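The idea of keeping only a sample of the terms in RAM can be sketched as below. This is a simplified model and not Lucene's implementation: it assumes the "128" refers to sampling every 128th term of the sorted term list, and the class name `SampledTermsIndex` is hypothetical.

```python
from bisect import bisect_right


class SampledTermsIndex:
    """Keep every Nth term in memory; scan one block of the full list on lookup."""

    def __init__(self, sorted_terms, interval=128):
        self.terms = sorted_terms                # stands in for the on-disk term dictionary
        self.interval = interval
        self.sampled = sorted_terms[::interval]  # the part that would be held in RAM

    def seek(self, term):
        """Return the position of term in the full list, or -1 if absent."""
        # Find the last sampled term that is <= the term we are looking for.
        block = bisect_right(self.sampled, term) - 1
        if block < 0:
            return -1                            # sorts before the first term
        start = block * self.interval
        end = min(start + self.interval, len(self.terms))
        for pos in range(start, end):            # short scan within one block
            if self.terms[pos] == term:
                return pos
        return -1
```

A larger interval keeps fewer terms in memory at the cost of a longer scan per lookup, which is the trade-off the divisor settings discussed in this thread adjust.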

Re: Memory use during merges (OOM)

2010-12-16 Thread Robert Muir
On Thu, Dec 16, 2010 at 2:09 PM, Burton-West, Tom tburt...@umich.edu wrote: I always get confused about the two different divisors and their names in the solrconfig.xml file This one (for the writer) isn't configurable by Solr. Want to open an issue? We are setting termInfosIndexDivisor,

Re: Dataimport performance

2010-12-16 Thread Glen Newton
Hi, LuSqlv2 beta comes out in the next few weeks, and is designed to address this issue (among others). LuSql original (http://lab.cisti-icist.nrc-cnrc.gc.ca/cistilabswiki/index.php/LuSql now moved to: https://code.google.com/p/lusql/) is a JDBC-to-Lucene high performance loader. You may have

Re: bulk commits

2010-12-16 Thread Dennis Gearon
That easy, huh? Heck, this gets better and better. BTW, how about escaping? Dennis Gearon

Re: bulk commits

2010-12-16 Thread Yonik Seeley
On Thu, Dec 16, 2010 at 3:06 PM, Dennis Gearon gear...@sbcglobal.net wrote: That easy, huh? Heck, this gets better and better. BTW, how about escaping? The CSV escaping? It's configurable to allow for loading different CSV dialects. http://wiki.apache.org/solr/UpdateCSV By default it uses

Re: Query Problem

2010-12-16 Thread Erick Erickson
Ezequiel: Nice job of including relevant details, by the way. Unfortunately I'm puzzled too. Your SectionName is a string type, so it should be placed in the index as-is. Be a bit cautious about looking at returned results (as I see in one of your xml files) because the returned values are the

RE: Memory use during merges (OOM)

2010-12-16 Thread Burton-West, Tom
Your setting isn't being applied to the reader IW uses during merging... it's only for readers Solr opens from directories explicitly. I think you should open a jira issue! Do I understand correctly that this setting in theory could be applied to the reader IW uses during merging but is not

Re: Memory use during merges (OOM)

2010-12-16 Thread Robert Muir
On Thu, Dec 16, 2010 at 4:03 PM, Burton-West, Tom tburt...@umich.edu wrote: Your setting isn't being applied to the reader IW uses during merging... it's only for readers Solr opens from directories explicitly. I think you should open a jira issue! Do I understand correctly that this setting in

Re: Memory use during merges (OOM)

2010-12-16 Thread Yonik Seeley
On Thu, Dec 16, 2010 at 5:51 AM, Michael McCandless luc...@mikemccandless.com wrote: If you are doing false deletions (calling .updateDocument when in fact the Term you are replacing cannot exist) it'd be best if possible to change the app to not call .updateDocument if you know the Term

Re: Query Problem

2010-12-16 Thread Ezequiel Calderara
I'll check the Tokenizer to see if that's the problem. The results of Analysis Page for SectionName:Programas_Home Query Analyzer org.apache.solr.schema.FieldType$DefaultAnalyzer {} term position 1 term text Programas_Home term type word source start,end 0,14 payload So it's not having problems

Re: Query Problem

2010-12-16 Thread Erick Erickson
OK, what version of Solr are you using? I can take a quick check to see what behavior I get. Erick On Thu, Dec 16, 2010 at 4:44 PM, Ezequiel Calderara ezech...@gmail.com wrote: I'll check the Tokenizer to see if that's the problem. The results of Analysis Page for SectionName:Programas_Home

Re: Query Problem

2010-12-16 Thread Ezequiel Calderara
The jars are named like *1.4.1*, so I suppose it's version 1.4.1. Thanks! On Thu, Dec 16, 2010 at 6:54 PM, Erick Erickson erickerick...@gmail.com wrote: OK, what version of Solr are you using? I can take a quick check to see what behavior I get Erick On Thu, Dec 16, 2010 at 4:44 PM,

Re: Jquery Autocomplete Json formatting ?

2010-12-16 Thread Anurag
Installed Firebug. Now getting the following error: 4139 matches.call( document.documentElement, [test!='']:sizzle ); Though my Solr server is running on port 8983, I am not using any server to run this jQuery; it's just an HTML file in my home folder that I am opening in my Firefox browser.

Re: Faceted Search Slows Down as index gets larger

2010-12-16 Thread Furkan Kuru
I am sorry for reviving this thread after 6 months, but we still have problems with faceted search on full-text fields. We try to get the most frequent words in a text field that is created in 1 hour. The faceted search takes too much time even the matching number of documents (created_at within 1

Re: how to config DataImport Scheduling

2010-12-16 Thread Ahmet Arslan
I also have the same problem, i configure dataimport.properties file as shown in http://wiki.apache.org/solr/DataImportHandler#dataimport.properties_example but no change occur, can any one help me What version of solr are you using? This seems a new feature. So it won't work on solr

Re: Jquery Autocomplete Json formatting ?

2010-12-16 Thread lee carroll
I think this could be down to the same-server rule applied to Ajax requests. You're not allowed to display content from two different servers :-( The good news: Solr supports JSONP, which is a neat trick around this. Try this (pasted from another thread): queryString = *:* $.getJSON(
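The pasted jQuery call is cut off in this archive, so what follows is not that code. It is a small sketch of the JSONP mechanism itself: the client names a callback in the request, and the server wraps its JSON in a call to that function so the browser can load it as a script. It assumes Solr's JSON response writer with its `json.wrf` wrapper-function parameter; the helper names, host, port, and callback name are hypothetical.

```python
import json
from urllib.parse import urlencode


def jsonp_select_url(base_url, query, callback):
    """Build a select URL that asks for JSON wrapped in a callback function."""
    params = {"q": query, "wt": "json", "json.wrf": callback}
    return f"{base_url}/select?{urlencode(params)}"


def wrap_jsonp(callback, payload):
    """Show what a JSONP response body looks like: callback(<json>)."""
    return f"{callback}({json.dumps(payload)})"


# Hypothetical base URL and callback name.
example_url = jsonp_select_url("http://localhost:8983/solr", "*:*", "handleResults")
example_body = wrap_jsonp("handleResults", {"numFound": 3})
```

Because the response is executed as a script rather than read through an Ajax request, the restriction on content from a different server does not apply to it.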

Re: Query Problem

2010-12-16 Thread Erick Erickson
OK, it works perfectly for me on a 1.4.1 instance. I've looked over your files a couple of times and see nothing obvious (but you'll never find anyone better at overlooking the obvious than me!). Tokenizing and stemming are irrelevant in this case because your type is string, which is an

Re: Faceted Search Slows Down as index gets larger

2010-12-16 Thread Yonik Seeley
Another thing you can try is trunk. This specific case has been improved by an order of magnitude recently. The case that has been sped up is initial population of the filterCache, or when the filterCache can't hold all of the unique values, or when faceting is configured to not use the

Re: bulk commits

2010-12-16 Thread Adam Estrada
One very important thing I forgot to mention is that you will have to increase the Java heap size for larger data sets. Set JAVA_OPT to something acceptable. Adam On Thu, Dec 16, 2010 at 3:27 PM, Yonik Seeley yo...@lucidimagination.com wrote: On Thu, Dec 16, 2010 at 3:06 PM, Dennis Gearon

Re: facet.pivot for date fields

2010-12-16 Thread Adeel Qureshi
I guess one last call for help. I am assuming that for people who wrote or have used the pivot faceting this should be a yes/no question: are date fields supported? On Wed, Dec 15, 2010 at 12:58 PM, Adeel Qureshi adeelmahm...@gmail.com wrote: Thanks Pankaj - that was useful to know. I havent

Re: bulk commits

2010-12-16 Thread Dennis Gearon
Thanks Adam! Dennis Gearon

Got error when range query and highlight

2010-12-16 Thread Qi Ouyang
Hello all, I got an error as follows when I do a range query search ([1 TO *]) on a numeric field and highlight is set on another text field. 2010/12/15 10:58:55 org.apache.solr.common.SolrException log Fatal: org.apache.lucene.search.BooleanQuery$TooManyClauses: maxClauseCount is set to 1024

Re: Got error when range query and highlight

2010-12-16 Thread Ahmet Arslan
I got an error as follows when I do a range query search ([1 TO *]) on a numeric field and highlight is set on another text field. Are you using hl.highlightMultiTerm=true? Pasting your search URL can give more hints. Adding hl.requireFieldMatch=true should probably solve your problem.

Re: Got error when range query and highlight

2010-12-16 Thread Qi Ouyang
Thank you for reply. Are you using hl.highlightMultiTerm=true? Pasting your search URL can give more hints. Yes, I used the hl.highlightMultiTerm=true , my search query is as follows :

Re: Got error when range query and highlight

2010-12-16 Thread Ahmet Arslan
Adding hl.requireFieldMatch=true should probably solve your problem. Yes, adding hl.requireFieldMatch=true can solve my problem, but in my solution, I have a content field indexing all fields' contents to support full text search, but I also have another 2 fields title and body which

Solr (and mabye Java?) version numbering systems

2010-12-16 Thread Dennis Gearon
I've inferred from a bunch of posts that Solr 1.4 is actually the upcoming 4.x release? And the numbering systems on other Java products don't seem to match what's really out there, i.e. Eclipse and Sun Java. So what IS the Solr version numbering system? Can anyone give a (maybe possible)

A schema inside a Solr Schema (Schema in a can)

2010-12-16 Thread Dennis Gearon
Is it possible to put name value pairs of any type in a native Solr Index field type? Like JSON/XML/YML? The reason that I ask, since you asked, is I want my main index schema to be a base object, and another multivalue column to be the attributes of base object inherited descendants. Is

Re: Best practice for Delta every 2 Minutes.

2010-12-16 Thread Li Li
I think it will not, because the default configuration can only have 2 newSearcher threads, but the delay will grow longer and longer. The newer newSearcher will wait for the 2 earlier ones to finish. 2010/12/1 Jonathan Rochkind rochk...@jhu.edu: If your index warmings take longer than two minutes, but

Testing Solr

2010-12-16 Thread satya swaroop
Hi All, I built Solr successfully and I am thinking of testing it with nearly 300 PDF files, 300 docs, 300 Excel files, and so on, with nearly 300 files of each type. Is there any dummy data available for testing Solr? Otherwise I need to download each and every file individually.

Re: Best practice for Delta every 2 Minutes.

2010-12-16 Thread Li Li
We now face the same situation and want to implement it like this: we add new documents to a RAMDirectory and search two indices, the index on disk and the RAM index. Regularly (e.g. every hour) we flush the RAMDirectory to disk and make a new segment to prevent errors. Before adding to the RAMDirectory, we

Re: Best practice for Delta every 2 Minutes.

2010-12-16 Thread Dennis Gearon
BTW, what is a Delta (in this context, not an equipment line or a rocket, please :-)? Dennis Gearon

Re: Testing Solr

2010-12-16 Thread Dennis Gearon
There are websites with data sets out there. 'Data sets' may not be the right search terms, but it's something like that. Exactly what you want, I couldn't guess otherwise? Dennis Gearon