Re: Defining custom schema

2008-09-25 Thread Otis Gospodnetic
You need to create your schema manually. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: con [EMAIL PROTECTED] To: solr-user@lucene.apache.org Sent: Thursday, September 25, 2008 1:38:48 AM Subject: Re: Defining custom schema Ya I

Re: Dismax , query phrases

2008-09-25 Thread Norberto Meijome
On Wed, 24 Sep 2008 08:34:57 -0700 (PDT) Otis Gospodnetic [EMAIL PROTECTED] wrote: What happens if you change ps from 100 to 1 and comment out that ord function? Otis, I think what I am after is what Hoss described in his last paragraph in his reply to your email last year :

Shingles , min size?

2008-09-25 Thread Norberto Meijome
hi guys, I may have missed it, but is it possible to tell the solr.ShingleFilterFactory the minimum number of grams to generate per shingle? Similar to NGramTokenizerFactory's minGramSize="3" maxGramSize="3" thanks! B _ {Beto|Norberto|Numard} Meijome Ask not what's inside

Re: Searching with Wildcards

2008-09-25 Thread Brian Carmalt
Hello all, Sorry I have taken so long to get back to Erik's reply. I used the technique of inserting a ? before the * to get a prototype working. However, if 1.3 does not support this anymore, then I really need to look into alternatives. What would be the scope of the work to implement

issue with commit

2008-09-25 Thread sunnyfr
Hi, I can't find a way to sort out my issue, can somebody please help me ? My problem is all my logs files look empty and no snapshot created, but everything seems to work, except this snapshot. auto commit seems ok according to the stat page, but no log ??? and snapshot are not created except

most searched keyword in solr

2008-09-25 Thread sanraj25
hi, how will we find most searched keyword in solr? If anybody can suggest us a good solution, it would be helpful thank you with Regards, P.Parkavi -- View this message in context: http://www.nabble.com/most-searched-keyword-in-solr-tp19664387p19664387.html Sent from the Solr - User mailing

Memory error - snapshooter help

2008-09-25 Thread sunnyfr
Hi, Any idea ? Sep 25 06:50:41 solr-test jsvc.exec[23286]: Sep 25, 2008 6:50:41 AM org.apache.solr.common.SolrException log SEVERE: java.io.IOException: Cannot run program snapshooter: java.io.IOException: error=12, Cannot allocate memory My memory for java is JAVA_OPTS=-Xms6000m -Xmx6000m

Re: Not enough space

2008-09-25 Thread sunnyfr
Hi, I've obviously the same error, I just don't know how do you add swap space ? Thanks a lot, Yonik Seeley wrote: On 7/5/07, Xuesong Luo [EMAIL PROTECTED] wrote: Thanks, Chris and Yonik. You are right. I remember the heap size was over 500m when I got the Not enough space error message.

Re: Not enough space

2008-09-25 Thread Brian Carmalt
Search with Google for swap file linux linux or distro name There is tons of info out there. Am Donnerstag, den 25.09.2008, 02:07 -0700 schrieb sunnyfr: Hi, I've obviously the same error, I just don't know how do you add swap space ? Thanks a lot, Yonik Seeley wrote: On 7/5/07,

Re: most searched keyword in solr

2008-09-25 Thread Mark Miller
sanraj25 wrote: hi, how will we find most searched keyword in solr? If anybody can suggest us a good solution, it would be helpful thank you with Regards, P.Parkavi Write some code to record every query/keyword. Could be done at different places depending on how you define 'keyword'

Re: Memory error - snapshooter help

2008-09-25 Thread Bill Au
The OS is checking that there is enough memory... add swap space: http://www.nabble.com/Not-enough-space-to11423199.html#a11432978 On Thu, Sep 25, 2008 at 4:20 AM, sunnyfr [EMAIL PROTECTED] wrote: Hi, Any idea ? Sep 25 06:50:41 solr-test jsvc.exec[23286]: Sep 25, 2008 6:50:41 AM

How to select one entity at a time?

2008-09-25 Thread con
Hi I have got two entities in my data-config.xml file, entity1 and entity2. For condition-A I need to execute only entity1 and for condition-B only the entity2 needs to get executed. How can I mention it while accessing the search index in the REST way. Is there any option that i can give along

Re: Memory error - snapshooter help

2008-09-25 Thread Mark Miller
It is a mistake to give java 6gb out of 8gb available. First, when you say 6gb, that's just the heap - the java process will use memory beyond 6gb. That leaves you with hardly any RAM for any other process. And it leaves you almost *nothing* for the filesystem cache. Effectively, you are

Re: Memory error - snapshooter help

2008-09-25 Thread sunnyfr
but according to you .. How much should I increase my memory ?? 512? Bill Au wrote: The OS is checking that there is enough memory... add swap space: http://www.nabble.com/Not-enough-space-to11423199.html#a11432978 On Thu, Sep 25, 2008 at 4:20 AM, sunnyfr [EMAIL PROTECTED] wrote:

Re: Memory error - snapshooter help

2008-09-25 Thread sunnyfr
Thanks a lot Mark, I will try that. markrmiller wrote: It is a mistake to give java 6gb out of 8gb available. First, when you say 6gb, that's just the heap - the java process will use memory beyond 6gb. That leaves you with hardly any RAM for any other process. And it leaves you

solr filesystem dependencies

2008-09-25 Thread Erlend Hamnaberg
Hi list. I am using the EmbeddedSolrServer to embed solr in my application, however I have run into a snag. The only filesystem dependency that I want is the index itself. The current implementation of the SolrResource seems to suggest that i need a filesystem dependency to keep my configuration

Re: Refresh of synonyms.txt without reload

2008-09-25 Thread Batzenmann
Hi again, Walter Underwood wrote: More details on index-time vs. query-time synonyms are here: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#SynonymFilter thx, for pointing that out - That's definitely something worth revising. But imho the issue of a changing synonyms.txt

Pre-processing text in custom FilterFactory / TokenizerFactory

2008-09-25 Thread Jaco
Hello, I need to work with an external stemmer in Solr. This stemmer is accessible as a COM object (running Solr in tomcat on Windows platform). I managed to integrate this using the com4j library. I tried two scenarios: 1. Create a custom FilterFactory and Filter class for this. The external

NullPointerException

2008-09-25 Thread Dinesh Gupta
Hi All, I have attached my file. I am getting an exception. Please suggest how to sort out this issue. WARNING: Error creating document : SolrInputDocumnt[{id=id(1.0)={93146}, ttl=ttl(1.0)={Majestic from Pushpams.com}, cdt=cdt(1.0)={2001-09-04 15:40:40.0}, mdt=mdt(1.0)={2008-09-23

Re: most searched keyword in solr

2008-09-25 Thread Jon Baer
Why even do any of the work :-) I'm not sure any of the free analytic apps (ala Google) can but the paid ones do, just drop the query into one of those and let them analyze ... http://www.google.com/analytics/ Then just parse the reports. - Jon On Sep 25, 2008, at 8:39 AM, Mark Miller

Re: Standard analyzer and acronyms

2008-09-25 Thread Luca Molteni
The schema browser is a section in the admin panel of Solr. I don't know if I'm looking at the original value, I think there are only filtered values in there. Thank you for the reply. Bye L.M. 2008/9/22 Otis Gospodnetic [EMAIL PROTECTED] Hi, Are you sure you are not looking at the original

Re: most searched keyword in solr

2008-09-25 Thread Walter Underwood
I process our HTTP logs. I'm sure there are log analyzers that handle search terms, though I wrote a bit of Python to do it. If you extract the search queries to a file, then use a Unix pipe to get a list: sort queries.txt | uniq -c | sort -rn > counted-queries.txt wunder On 9/25/08 12:29 AM,
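The same count can be sketched in Python, which helps when queries need normalizing before they are tallied. This is only a minimal sketch, not the script mentioned above: the function name `count_queries` and the choice to ignore case and extra whitespace are assumptions made for illustration.

```python
from collections import Counter


def count_queries(lines):
    """Count normalized search queries, most frequent first."""
    counts = Counter()
    for line in lines:
        # Assumption: case and repeated whitespace are not significant.
        query = " ".join(line.lower().split())
        if query:  # skip blank lines
            counts[query] += 1
    return counts.most_common()


if __name__ == "__main__":
    sample = ["Solr faceting", "solr  faceting", "dismax", ""]
    for query, n in count_queries(sample):
        print(n, query)
```

Feeding it the lines of an extracted query file gives the same kind of ranked list as the shell pipeline, with the normalization step under your control.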

Re: Refresh of synonyms.txt without reload

2008-09-25 Thread Walter Underwood
First, define separate analyzer/filter chains for index and query. Do not include synonyms in the query chain. Second, use a separate indexing system and use Solr index distribution to sync the indexes to one or more query systems. This will create a new Searcher and caches on the query systems,

snappuller not fired

2008-09-25 Thread sunnyfr
Hi everybody, Any idea why, it might be the path ?? Conf file : <!-- A postCommit event is fired after every commit or optimize command --> <listener event="postCommit" class="solr.RunExecutableListener"> <str name="exe">snapshooter</str> <str name="dir"></str> <bool name="wait">true</bool>

Re: Refresh of synonyms.txt without reload

2008-09-25 Thread Batzenmann
Walter Underwood wrote: First, define separate analyzer/filter chains for index and query. Do not include synonyms in the query chain. Second, use a separate indexing system and use Solr index distribution to sync the indexes to one or more query systems. This will create a new Searcher

Re: Refresh of synonyms.txt without reload

2008-09-25 Thread Otis Gospodnetic
Depending on how often synonyms are added you may or may not have/want to make Solr reload your synonyms. If you use index-time synonyms you definitely don't want to reindex every time they change if they change more frequently than what it takes to reindex. I believe you can use new MultiCore

Re: java.io.IOException: cannot read directory org.apache.lucene.store.FSDirectory@/home/solr/src/apache-solr-nightly/example/solr/data/index: list() returned null

2008-09-25 Thread Erik Holstad
Ran some more tests and when I'm only using <autoCommit> <maxDocs>25000</maxDocs> I get

RE: Shingles , min size?

2008-09-25 Thread Steven A Rowe
Hi Norberto, ShingleMatrixFilter is capable of this, but ShingleFilter is not. It should be though - I think if ShingleFilter continues to exist, it should learn a few things from ShingleMatrixFilter's one-dimensional functionality. Steve On 09/25/2008 at 2:23 AM, Norberto Meijome wrote: hi

Best practice advice needed!

2008-09-25 Thread sundar shankar
Hi, We have an index of courses (about 4 million docs in prod) and we have a nightly that would pick up newly added courses and update the index accordingly. There is another Enterprise system that shares the same table and that could delete data from the table too. I just want to know

Re: Best practice advice needed!

2008-09-25 Thread Fuad Efendi
I am guessing your Enterprise system deletes/updates tables in RDBMS, and your SOLR indexes that data. Additionally to that, you have front-end interacting with SOLR and with RDBMS. At front-end level, in case of a search sent to SOLR returning primary keys for data, you may check your

Re: Best practice advice needed!

2008-09-25 Thread Erick Erickson
How long does it take to build the entire index? Can you just rebuild it from scratch every night? That would be the simplest. Best Erick On Thu, Sep 25, 2008 at 12:48 PM, sundar shankar [EMAIL PROTECTED] wrote: Hi, We have an index of courses (about 4 million docs in prod) and we have a

spellcheck: buildOnOptimize?

2008-09-25 Thread Jason Rennie
I see that there's an option to automatically rebuild the spelling index on a commit. That's a nice feature that we'll consider using, but we run commits every few thousand document updates, which would yield ~100 spelling index rebuilds a day. OTOH, we run an optimize about once/day which seems

Re: Best practice advice needed!

2008-09-25 Thread Walter Underwood
This will cause the result counts to be wrong and the deleted docs will stay in the search index forever. Some approaches for incremental update: * full sweep garbage collection: fetch every ID in the Solr DB and check whether that exists in the source DB, then delete the ones that don't exist.
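The full-sweep approach can be sketched as a set difference followed by one delete message per stale ID. This is a rough sketch under stated assumptions: the helper names `stale_ids` and `delete_messages` are hypothetical, fetching the two ID lists is left out, and the messages would still need to be posted to Solr's update handler and followed by a commit.

```python
from xml.sax.saxutils import escape


def stale_ids(solr_ids, source_ids):
    """Return IDs that are in the search index but no longer in the source DB."""
    return sorted(set(solr_ids) - set(source_ids))


def delete_messages(ids):
    """Build one Solr XML delete-by-ID message per stale document."""
    return ["<delete><id>%s</id></delete>" % escape(str(doc_id)) for doc_id in ids]


if __name__ == "__main__":
    in_solr = ["1", "2", "3"]   # assumption: IDs fetched from the index
    in_source = ["2"]           # assumption: IDs fetched from the source table
    for message in delete_messages(stale_ids(in_solr, in_source)):
        print(message)
```

For an index of several million documents the two ID sets fit in memory comfortably, which is what makes the full sweep practical as a nightly job.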

Re: Best practice advice needed!

2008-09-25 Thread Walter Underwood
That should be "flag it in a boolean column". --wunder On 9/25/08 11:51 AM, Walter Underwood [EMAIL PROTECTED] wrote: This will cause the result counts to be wrong and the deleted docs will stay in the search index forever. Some approaches for incremental update: * full sweep garbage

RE: Best practice advice needed!

2008-09-25 Thread sundar shankar
Great Thanks. Date: Thu, 25 Sep 2008 11:54:32 -0700 Subject: Re: Best practice advice needed! From: [EMAIL PROTECTED] To: solr-user@lucene.apache.org That should be flag it in a boolean column. --wunder On 9/25/08 11:51 AM, Walter Underwood [EMAIL PROTECTED] wrote: This will

Re: spellcheck: buildOnOptimize?

2008-09-25 Thread Grant Ingersoll
On Sep 25, 2008, at 2:17 PM, Jason Rennie wrote: I see that there's an option to automatically rebuild the spelling index on a commit. That's a nice feature that we'll consider using, but we run commits every few thousand document updates, which would yield ~100 spelling index rebuilds a

Re: spellcheck: buildOnOptimize?

2008-09-25 Thread Shalin Shekhar Mangar
On Fri, Sep 26, 2008 at 12:43 AM, Grant Ingersoll [EMAIL PROTECTED] wrote: On Sep 25, 2008, at 2:17 PM, Jason Rennie wrote: I see that there's an option to automatically rebuild the spelling index on a commit. That's a nice feature that we'll consider using, but we run commits every few

Re: How to select one entity at a time?

2008-09-25 Thread Shalin Shekhar Mangar
On Thu, Sep 25, 2008 at 6:13 PM, con [EMAIL PROTECTED] wrote: Hi I have got two entities in my data-config.xml file, entity1 and entity2. For condition-A I need to execute only entity1 and for condition-B only the entity2 needs to get executed. How can I mention it while accessing the

Re: snappuller not fired

2008-09-25 Thread Shalin Shekhar Mangar
I think you have asked the question before too. I have the same answer, try giving the full (absolute) path to snapshooter and the bin directory. Check the logs to see if there are any errors. On Thu, Sep 25, 2008 at 8:24 PM, sunnyfr [EMAIL PROTECTED] wrote: Hi everybody, Any idea why, it

RE: Best practice advice needed!

2008-09-25 Thread sundar shankar
Hi Fuad, Since I don't have too much data (4 million) I don't have a master slave setup yet. How big a change would that be? Date: Thu, 25 Sep 2008 10:08:51 -0700 From: [EMAIL PROTECTED] To: solr-user@lucene.apache.org Subject: Re: Best practice advice needed! I am guessing your

Re: NullPointerException

2008-09-25 Thread Shalin Shekhar Mangar
I'm not sure why the NullPointerException is occurring. Is that the whole stack trace? The mdt and cdt fields are dates in schema.xml but the format that is in the log is wrong. Look at the DateFormatTransformer in DataImportHandler which can format strings in your database to the correct date format
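The conversion in question, from a database-style timestamp such as the `cdt` value in the log to the ISO 8601 form Solr date fields expect, can be illustrated outside Solr. This is a minimal Python sketch of the format change only, not the DateFormatTransformer configuration itself; the function name `to_solr_date` is hypothetical and the code assumes the database values are already in UTC.

```python
from datetime import datetime


def to_solr_date(db_value):
    """Convert 'YYYY-MM-DD HH:MM:SS.f' into Solr's 'YYYY-MM-DDTHH:MM:SSZ' form."""
    parsed = datetime.strptime(db_value, "%Y-%m-%d %H:%M:%S.%f")
    # Assumption: the value is already UTC, so only the layout changes.
    return parsed.strftime("%Y-%m-%dT%H:%M:%SZ")


if __name__ == "__main__":
    print(to_solr_date("2001-09-04 15:40:40.0"))
```

Fractional seconds are dropped here for simplicity.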

Re: Best practice advice needed!

2008-09-25 Thread Fuad Efendi
About web spiders: I simply use last modified timestamp field in SOLR, and I expire items after 30 days. If item was updated (timestamp changed) - it won't be deleted. If I delete it from database - it will be deleted from SOLR within 30 days. Spiders don't need 'transactional' updates.
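An expiry rule like this one can be expressed as a delete-by-query using Solr date math. The following is a sketch under assumptions, not the poster's setup: the field name `timestamp` and the helper name `expiry_delete` are hypothetical, and the resulting message would still have to be posted to the update handler and committed.

```python
from xml.sax.saxutils import escape


def expiry_delete(field="timestamp", days=30):
    """Build a delete-by-query message for documents older than `days` days."""
    # NOW-<n>DAYS is Solr date math; the field must be a date field.
    query = "%s:[* TO NOW-%dDAYS]" % (field, days)
    return "<delete><query>%s</query></delete>" % escape(query)


if __name__ == "__main__":
    print(expiry_delete())
```

Because re-crawled items get a fresh timestamp, they fall outside the range and survive the sweep, which matches the behaviour described above.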

Bunch of questions regarding enterprise configuration

2008-09-25 Thread Dev Team
Hi everybody, I'm new to Solr, and have been reading through documentation off-and-on for days, but still have some unanswered basic/fundamental questions that have a huge impact on my implementation approach. I am thinking of moving my company's web app's main search engine over to Solr. My goal

How to get count of different groups of items in a single query

2008-09-25 Thread Choi, David
Hi everyone, I tried looking in the mailing list archive, but couldn't find a good answer for what I'm trying to do. Say I have an index of data about cars. I want to search for all red cars, so I do something like: q=colour:red. This returns 100 results, of which 40 are model:Toyota, 30

Re: How to get count of different groups of items in a single query

2008-09-25 Thread Bess Sadler
Hi, David. In this case it looks like you're looking for the faceting functionality. You can read more about this on the wiki, here: http://wiki.apache.org/solr/SimpleFacetParameters?highlight=%28facet%29 In your case, you're going to want something like:
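A faceted request for the red-cars case could be built roughly as follows. This is a hedged sketch rather than the example from the original message: the base URL and the helper name `facet_url` are assumptions, while `facet` and `facet.field` are the standard simple-facet parameters described on the wiki page.

```python
from urllib.parse import urlencode


def facet_url(base, query, facet_field):
    """Build a select URL that returns per-value counts for `facet_field`."""
    params = [
        ("q", query),
        ("facet", "true"),             # turn faceting on
        ("facet.field", facet_field),  # count matching docs per value of this field
    ]
    return base + "?" + urlencode(params)


if __name__ == "__main__":
    # Assumption: a local Solr instance on the default example port.
    print(facet_url("http://localhost:8983/solr/select", "colour:red", "model"))
```

The response then carries the matching documents plus a count per `model` value, so the per-make totals come back from the single query.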

Searching Question

2008-09-25 Thread Jake Conk
Hello, We are using Solr for our new forums search feature. If possible when searching for the word Halo we would like threads that contain the word Halo the most with the least amount of posts in that thread to have a higher score. For instance, if we have a thread with 10 posts and the word

why index auto change?

2008-09-25 Thread 李学健
hi, all recently, i encounter this problem several times. index in solr automatically changes, i never post any data today, but a part of index files as below: -rw-r--r-- 1 root root 3202872 Sep 24 22:23 _f.prx -rw-r--r-- 1 root root 14595 Sep 24 22:23 _f.tii -rw-r--r-- 1 root root 1072202 Sep 24

Re: spellcheck: buildOnOptimize?

2008-09-25 Thread Chris Hostetter
: postCommit/postOptimize callbacks happen after commit/optimize but before a : new searcher is opened. Therefore, it is not possible to re-build spellcheck : index on those events without opening an IndexReader directly on the solr FWIW: I believe it has to work that way because postCommit

Re: a question about solr queryparser

2008-09-25 Thread Chris Hostetter
: (correctly) in the solrconfig.xml. Could you paste the relevant part of : solrconfig.xml? I don't recall a bug related to this, but you could : also try Solr 1.3 if you believe you configured things correctly. also check the Analysis Tool (link from the admin page) and see what it says

Re: error when indexing null value of slong fied

2008-09-25 Thread Chris Hostetter
Missing is different than null ... in truth what i suspect you are doing is indexing something like this... <field name="pubdate"></field> ...that is an empty string (""), and the error is because an empty string can't be converted to a number : when i indexed a doc with null value of this

Re: Searching for future or null dates

2008-09-25 Thread Chris Hostetter
: I would also like to follow your advice but don't know how to do it with : defaultOperator=AND. What I am missing is the equivalent to OR: : AND: + : NOT: - : OR: ??? : I didn't find anything on the Solr or Lucene query syntax pages. If that's true, regrettably there is no prefix operator to

RE: deleting record from the index using deleteByQuery method

2008-09-25 Thread Chris Hostetter
: confused about is the field cumulative_delete. Does this have any : significance to whether the delete was a success or not? Also shouldn't cumulative_delete is just the count of all delete commands since the SolrCore was started up (as opposed to delete which is the count since the last

Re: why index auto change?

2008-09-25 Thread Otis Gospodnetic
Somebody must have run some index modifying command. I can't think of anything else that would touch the index. Have you triple-checked your logs? Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: 李学健 [EMAIL PROTECTED] To:

Re: Searching Question

2008-09-25 Thread Otis Gospodnetic
Sounds like a case for a function query where you use the field that stores the number of posts for a thread to adjust the score. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: Jake Conk [EMAIL PROTECTED] To: solr-user@lucene.apache.org

Re: Bunch of questions regarding enterprise configuration

2008-09-25 Thread Otis Gospodnetic
Hi, Your questions don't have simple answers, but here are some quick ones. - Original Message I'm new to Solr, and have been reading through documentation off-and-on for days, but still have some unanswered basic/fundamental questions that have a huge impact on my implementation