Re: cron job update index

2008-09-17 Thread sunnyfr
is it this one ? http://wiki.apache.org/solr/CollectionDistribution#head-9f393ae2a6230fe23e422f1583f31edbff7b1007 Otis Gospodnetic wrote: Hi Sunny, There is a very detailed page about this on the Wiki. Have you seen it? Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr -

Re: cron job update index

2008-09-17 Thread Shalin Shekhar Mangar
On Wed, Sep 17, 2008 at 1:37 PM, sunnyfr [EMAIL PROTECTED] wrote: is it this one ? http://wiki.apache.org/solr/CollectionDistribution#head-9f393ae2a6230fe23e422f1583f31edbff7b1007 Yes. -- Regards, Shalin Shekhar Mangar.

scripts.conf

2008-09-17 Thread sunnyfr
Hi, Just to be sure: scripts.conf is used when I run a command such as snappuller or snapinstaller and don't pass the values directly? It's like a conf file with the parameters inside... Does somebody have an example of scripts.conf and of the command line to use with parameters? Thanks, Sunny -- View this

How to set term frequency given a term and a value stating the frequency?

2008-09-17 Thread ristretto . rb
Hello, I'm looking through the wiki, so if it's there, I'll find it, and you can ignore this post. If this isn't documented, can anyone explain how to achieve this? Suppose I have two docs A and B that I want to index. I want to index these documents so that A has the equivalent of 100 copies

Re: scripts.conf

2008-09-17 Thread sunnyfr
There is also solr/conf/rsyncd.conf. How does it differ from scripts.conf? And should it be in every Solr instance, like solr/user/conf and solr/books/conf ...? Cheers, sunnyfr wrote: Hi, Just to be sure: scripts.conf is used when I run a command such as snappuller or

Re: scripts.conf

2008-09-17 Thread sunnyfr
OK, obviously rsyncd.conf is generated automatically by rsync. Does somebody have an example of scripts.conf? sunnyfr wrote: There is also solr/conf/rsyncd.conf. How does it differ from scripts.conf? And should it be in every Solr instance, like solr/user/conf and solr/books/conf

Re: scripts.conf

2008-09-17 Thread Koji Sekiguchi
OK, obviously rsyncd.conf is generated automatically by rsync. Does somebody have an example of scripts.conf? Have you read this? http://wiki.apache.org/solr/SolrCollectionDistributionScripts Koji

Can Solr be used to search public websites(Newbie).

2008-09-17 Thread convoyer
Hi all. I am quite new to Solr. I am just checking whether this tool suits my application. I am developing a search application that searches all publicly available websites and also some selected websites. Can I use Solr for this purpose? If yes, how can I get started? All the tutorials are

Re: scripts.conf

2008-09-17 Thread sunnyfr
Yes I did, it's just not clear how it works with several instances. So far, as I explained, my tree looks like solr/user/bin (snappuller rsyncd...) solr/user/conf (scripts.conf) solr/user/logs (rsyncd-enable ...) solr/books/bin (snappuller rsyncd...) solr/books/conf (scripts.conf)

Re: Can Solr be used to search public websites(Newbie).

2008-09-17 Thread Ryan McKinley
Solr only manages the indexing/search side, it does not do any crawling like nutch. For crawling a small site, you may want to check out: http://aperture.sourceforge.net/ (mature, but RDF heavy) Or Droids: http://people.apache.org/~thorsten/droids/ Droids is new, and will change a lot soon,

Re: Can Solr be used to search public websites(Newbie).

2008-09-17 Thread George Everitt
Dear Con, Searching the entire Internet is a non-trivial computer science problem. It's kind of like asking a brain surgeon the best way to remove a tumor. The answer should be "First, spend 16 years becoming a neurosurgeon." My point is, there is a whole lot you need to know beyond is

Re: scripts.conf

2008-09-17 Thread Bill Au
All the scripts dot in (.) the utility script scripts-util, which in turn dots in scripts.conf. Why are you running several instances, multiple ports, multiple webapps, or multiple cores? http://wiki.apache.org/solr/MultipleIndexes Bill On Wed, Sep 17, 2008 at 8:50 AM, sunnyfr [EMAIL

Re: help rsyncd-enable

2008-09-17 Thread Bill Au
Try the command line instead: /solr/user/bin/rsyncd-enable The scripts do not like to be bashed. Bill On Wed, Sep 17, 2008 at 9:24 AM, sunnyfr [EMAIL PROTECTED] wrote: [EMAIL PROTECTED]:/solr/user/bin# bash rsyncd-enable rsyncd-enable: line 21: cd: rsyncd-enable/..: Not a directory

Re: scripts.conf

2008-09-17 Thread sunnyfr
I created several instances for a multi-core setup to manage users and books independently. I didn't understand this: All the scripts dot in (.) the utility script scripts-util, which in turn dots in scripts.conf. Bill Au wrote: All the scripts dot in (.) the utility script scripts-util, which in turn dots

Re: help rsyncd-enable

2008-09-17 Thread sunnyfr
Thanks, my bad, it was a problem with my user. Bill Au wrote: Try the command line instead: /solr/user/bin/rsyncd-enable The scripts do not like to be bashed. Bill On Wed, Sep 17, 2008 at 9:24 AM, sunnyfr [EMAIL PROTECTED] wrote: [EMAIL PROTECTED]:/solr/user/bin# bash

Re: How to copy a solr index to another index with a different schema collapsing stored data?

2008-09-17 Thread Erick Erickson
You *might* be able to reconstruct enough of the original documents from your indexes to create another without recrawling. I know Luke can reconstruct documents from an index, but for unstored data it's slow and may be lossy. But it may suit your needs given how long it takes to make your index

Re: scripts.conf

2008-09-17 Thread Bill Au
The . (dot) command executes a shell script in the current shell environment. Do you have a separate instance directory for each instance? http://wiki.apache.org/solr/CoreAdmin Each separate instance directory will have its own conf and data directory. So each one has its own scripts.conf:

snappuller / rsync

2008-09-17 Thread sunnyfr
Hi, Sorry again. Can you please clear up a point: snappuller should be run on a slave server to check for new snapshots and pull them, and rsyncd is run from the master. I don't really get what rsyncd's role is. Thanks -- View this message in context:

Re: How to copy a solr index to another index with a different schema collapsing stored data?

2008-09-17 Thread Brian Carmalt
It wouldn't be that bad to merge the index externally and then reindex the results, if it is as simple as your example. Search for id:[1 TO *] and a fq for the category, increment the slice of the results you need to process until you have covered all of the docs in the category. Request the
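The slicing step described above could be sketched roughly as follows. This is only an illustration, not Brian's actual procedure (the message is cut off in this listing): the field name `category`, the page size, and the helper name `slice_queries` are assumptions, and `num_found` stands in for the total hit count that the first response would report.

```python
from urllib.parse import urlencode


def slice_queries(category, num_found, rows=100):
    """Yield one query string per slice until all docs in the category are covered.

    `category` is a hypothetical filter value and `num_found` is assumed to be
    the total number of matching documents reported by an earlier query.
    """
    for start in range(0, num_found, rows):
        yield urlencode({
            "q": "id:[1 TO *]",            # match every document with an id
            "fq": f"category:{category}",  # restrict to one category (assumed field name)
            "start": start,                # offset of this slice
            "rows": rows,                  # slice size
        })


if __name__ == "__main__":
    for qs in slice_queries("books", num_found=250, rows=100):
        print(qs)
```

Each generated string would be appended to the select URL of the source index; the documents returned for each slice can then be merged and posted to the new index.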

Re: admin/logging page and Effective level

2008-09-17 Thread Sean Timm
Chris-- Sorry, your e-mail got lost in the noise. You're right, there does appear to be a problem. I can reproduce this by setting the root level to OFF and then setting it back to INFO. I'll take a look into it. Have you opened a JIRA issue for this? -Sean Chris Hostetter wrote: I'm

Re: Highlighter throws StringIndexOutOfBoundsException on multivalued fields

2008-09-17 Thread dojolava
I forgot: this concerns the Solr 1.3.0 release. On Wed, Sep 17, 2008 at 4:15 PM, dojolava [EMAIL PROTECTED] wrote: Hi, if I want to highlight a multivalued field I get the following exception: String index out of range: 21 java.lang.StringIndexOutOfBoundsException: String index out of

Re: scripts.conf

2008-09-17 Thread sunnyfr
Ok it's exactly what I've done. Bill Au wrote: The . (dot) command executes a shell script in the current shell environment. Do you have a separate instance directory for each instance? http://wiki.apache.org/solr/CoreAdmin Each separate instance directory will have its own conf

Re: cron job update index

2008-09-17 Thread sunnyfr
Hi, Given that a collection is a Lucene collection, i.e. a directory of files that comprise the indexed and returnable data of a Solr search repository, I just want to be sure, because this page speaks about :

RE: snappuller / rsync

2008-09-17 Thread Kashyap, Raghu
Hi, Rsyncd is the rsync (http://samba.anu.edu.au/rsync/) daemon. You need to make sure that rsyncd is running on both the master and the slave machines. You use snapshooter on the master server to create the snapshot and run snappuller on the slave machines to pull those snapshots from the master server and

RE: snappuller / rsync

2008-09-17 Thread sunnyfr
Hi Raghu, Thanks, it's clear now. Kashyap, Raghu wrote: Hi, Rsyncd is the rsync (http://samba.anu.edu.au/rsync/) daemon. You need to make sure that rsyncd is running on both the master and the slave machines. You use snapshooter on the master server to create the snapshot and run snappuller

Re: admin/logging page and Effective level

2008-09-17 Thread Sean Timm
I didn't see a bug on this issue, so I opened SOLR-774 with a patch to fix this. -Sean Sean Timm wrote: Chris-- Sorry, your e-mail got lost in the noise. You're right, there does appear to be a problem. I can reproduce this by setting the root level to OFF and then setting it back to

Re: cron job update index

2008-09-17 Thread Shalin Shekhar Mangar
On Wed, Sep 17, 2008 at 8:12 PM, sunnyfr [EMAIL PROTECTED] wrote: Given that a collection is a Lucene collection, i.e. a directory of files that comprise the indexed and returnable data of a Solr search repository, I just want to be sure, because this page speaks about :

Re: [SPAM] Multiple Process of the SAME solr instance

2008-09-17 Thread Matthew Runo
I'm not 100% sure on what you mean, but if you're asking if you can run two or more solr webapps and use them all to build up one index, then you can't. You'll end up with a corrupted index. Only one solr.war webapp can write to an index at a time. Thanks for your time! Matthew Runo

Re: What's the bottleneck?

2008-09-17 Thread Sean Timm
The HitCollector used by the Searcher is wrapped by a TimeLimitedCollector http://hudson.zones.apache.org/hudson/job/Lucene-trunk/javadoc//org/apache/lucene/search/TimeLimitedCollector.html which times out search requests that take longer than the maximum allowed search time limit during the

Re: cron job update index

2008-09-17 Thread sunnyfr
No actually I worked as well on replication so both answers are interesting. Ok Just saw that, I've to create a cron job that uses wget to hit the delta import, every 5mn or so. Am I doing something wrong or not? Every time I start (manually) delta-import (.../dataimport?command=delta-import)
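As a rough sketch of the request such a cron job would issue: only the `dataimport?command=delta-import` path comes from the message; the base URL, the core name default, and both helper names below are assumptions for illustration.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def delta_import_url(base_url, core="books"):
    """Build the delta-import URL for one core (base_url is an assumed Solr root)."""
    query = urlencode({"command": "delta-import"})
    return f"{base_url.rstrip('/')}/{core}/dataimport?{query}"


def trigger_delta_import(base_url, core="books", timeout=30):
    """Send the request and return the HTTP status code; needs a running Solr."""
    with urlopen(delta_import_url(base_url, core), timeout=timeout) as resp:
        return resp.status


if __name__ == "__main__":
    # Hypothetical host and port; adjust to the real deployment.
    print(delta_import_url("http://localhost:8983/solr"))
```

A cron entry could then call a script like this every five minutes, which is equivalent to pointing wget at the same URL.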

Re: cron job update index

2008-09-17 Thread Shalin Shekhar Mangar
On Wed, Sep 17, 2008 at 9:14 PM, sunnyfr [EMAIL PROTECTED] wrote: Am I doing something wrong or not? Every time I start (manually) delta-import (.../dataimport?command=delta-import) and then I go back to check the status : http://.../solr/books/dataimport, it's still running like it can't

Re: cron job update index

2008-09-17 Thread sunnyfr
Thanks, it's clear now. It just means loads of documents have changed. Sorry but silly question about Then the main query is executed for each primary key identified by the deltaQuery. This main query is used to create the documents and index them. I don't see in the code the link between the

Re: Multiple Process of the SAME solr instance

2008-09-17 Thread Otis Gospodnetic
Hi Mohit, I think we'll need a bit more info before we can help. What kinds of processes do you need and what are you trying to achieve? Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: mohitranka [EMAIL PROTECTED] To:

Re: Some new SOLR features

2008-09-17 Thread Yonik Seeley
On Tue, Sep 16, 2008 at 10:12 AM, Jason Rutherglen [EMAIL PROTECTED] wrote: SQL database such as H2 Mainly to offer joins and be able to perform hierarchical queries. Can you define or give an example of what you mean by hierarchical queries? A downside of any type of cross-document queries

Re: How to set term frequency given a term and a value stating the frequency?

2008-09-17 Thread Otis Gospodnetic
There are Lucene field term Payloads that can be associated with each token, which I think you could use for this type of boosting, but there is not much built-in support for Payloads in Solr yet. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message

Re: cron job update index

2008-09-17 Thread Shalin Shekhar Mangar
On Wed, Sep 17, 2008 at 9:42 PM, sunnyfr [EMAIL PROTECTED] wrote: Sorry but silly question about Then the main query is executed for each primary key identified by the deltaQuery. This main query is used to create the documents and index them. I don't see in the code the link between the

Re: snappuller / rsync

2008-09-17 Thread Bill Au
You only need to run the rsync daemon on the master. Bill On Wed, Sep 17, 2008 at 10:54 AM, sunnyfr [EMAIL PROTECTED] wrote: Hi Raghu, Thanks, it's clear now. Kashyap, Raghu wrote: Hi, Rsyncd is the rsync (http://samba.anu.edu.au/rsync/) daemon. You need to make sure that

Re: Some new SOLR features

2008-09-17 Thread Jason Rutherglen
If the configuration code is going to be rewritten then I would like to see the ability to dynamically update the configuration and schema without needing to reboot the server. Also I would like the configuration classes to just contain data and not have so many methods that operate on the

Re: Some new SOLR features

2008-09-17 Thread Jason Rutherglen
Can you define or give an example of what you mean by hierarchical queries? Good question, I think Erik Hatcher had more ideas on that. I was imagining joins or sub queries like SQL does. Clearly they won't be efficient, but it's easier than implementing joins (or is it) in SOLR? Joins limit

Re: Some new SOLR features

2008-09-17 Thread Yonik Seeley
On Wed, Sep 17, 2008 at 1:27 PM, Jason Rutherglen [EMAIL PROTECTED] wrote: If the configuration code is going to be rewritten then I would like to see the ability to dynamically update the configuration and schema without needing to reboot the server. Exactly. Actually, multi-core allows you

Re: Multiple Process of the SAME solr instance

2008-09-17 Thread mohitranka
Thanks for your replies. Actually the Solr instance will have many indexes to be updated simultaneously (say 100). Now I want to create 10 threads/processes, so that I can process 10 indexes at a time, instead of 1. I hope I am clearer about my requirement. :-) Thanks and regards, Mohit Ranka

RE: Some new SOLR features

2008-09-17 Thread Lance Norskog
My vote is for dynamically scanning a directory of configuration files. When a new one appears, or an existing file is touched, load it. When a configuration disappears, unload it. This model works very well for servlet containers. Lance -Original Message- From: [EMAIL PROTECTED]

Re: Multiple Process of the SAME solr instance

2008-09-17 Thread Otis Gospodnetic
Mohit, it sounds like you are looking for http://wiki.apache.org/solr/MultipleIndexes Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: mohitranka [EMAIL PROTECTED] To: solr-user@lucene.apache.org Sent: Wednesday, September 17, 2008 3:14:38

Re: How to set term frequency given a term and a value stating the frequency?

2008-09-17 Thread Gene Campbell
I decided to store the word X number of times when indexing the doc. times = 5 value = times * dog # dog dog dog dog dog gets indexed, of course times is specific to each doc. thanks for the help and advice Otis!! cheers gene On Thu, Sep 18, 2008 at 4:27 AM, Otis Gospodnetic [EMAIL
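The repetition trick above could look something like this in Python. It is a sketch only: the helper name `repeat_term` is made up here, and the space-separated join assumes a whitespace-based tokenizer, so that each copy is indexed as its own token.

```python
def repeat_term(term, times):
    """Return `term` repeated `times` times, separated by single spaces."""
    if times < 0:
        raise ValueError("times must be non-negative")
    return " ".join([term] * times)


if __name__ == "__main__":
    # `times` would be chosen per document before the field value is sent to Solr.
    print(repeat_term("dog", 5))
```

Joining with spaces matters: a plain string multiplication would produce one long token ("dogdogdog...") rather than five occurrences of the same term.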

Re: Multiple Process of the SAME solr instance

2008-09-17 Thread mohitranka
Otis, Thanks for your reply. I think I misled you with my previous message. What I meant was 100 documents, which should be added to the Solr index. Sorry for the lack of clarity in the query. Thanks and regards, Mohit Ranka Otis Gospodnetic wrote: Mohit, it sounds like you are

Re: Some new SOLR features

2008-09-17 Thread Yonik Seeley
On Wed, Sep 17, 2008 at 4:50 PM, Henrib [EMAIL PROTECTED] wrote: Yonik Seeley wrote: ...multi-core allows you to instantiate a completely new core and swap it for the old one, but it's a bit of a heavyweight approach ...a schema object would not be mutable, but that one could easily

Re: Searching with Wildcards

2008-09-17 Thread dojolava
Hi, I have another question on the wildcard problem: In the previous Solr releases there was a workaround to highlight wildcard queries using the StandardRequestHandler by adding a ? in between: e.g. hou?* would highlight house. But this is not working anymore. Is there maybe another workaround?

Re: Searching with Wildcards

2008-09-17 Thread Mark Miller
Alas no, the queryparser now uses an unhighlightable constantscore query. I'd personally like to make it work at the Lucene level, but not sure how that's going to proceed. The tradeoff is that you won't have max boolean clause issues and wildcard searches should be faster. It is a bummer

Re: Multiple Process of the SAME solr instance

2008-09-17 Thread Otis Gospodnetic
Mohit, Have you tried following the Solr tutorial? Adding multiple documents to Solr is a normal Solr usage and you go through that if you follow the tutorial on the site. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: mohitranka [EMAIL

Hardware config for SOLR

2008-09-17 Thread Andrey Shulinskiy
Hello, We're planning to use SOLR for our project, got some questions. So I asked some Qs yesterday, got no answers whatsoever. Wondering if they didn't make sense, or if the e-mail was too long... :-) Anyway, I'll try to ask them again and hope for some answers this time. It's a very

Setting request method to post on SolrQuery causes ClassCastException

2008-09-17 Thread syoung
Hi, I need to have queries over a certain length done as a post instead of a get. However, when I set the method to post, I get a ClassCastException. Here is the code: public QueryResponse query(SolrQuery solrQuery) { QueryResponse response = null; try { if

how to find terms on a page?

2008-09-17 Thread ristretto . rb
Hello, I haven't heard of or found a way to find the number of times a term is found on a page. Lucene uses it in scoring, I believe, (solr scoring: http://tinyurl.com/4tb55r) Basically, for a given page, I would like a list of terms on the page and number of times the terms appear on the page?

problem index accented character with release version of solr 1.3

2008-09-17 Thread Joshua Reedy
I have been using a stable dev version of 1.3 for a few months. Today, I began testing the final release version, and I encountered a strange problem. The only thing that has changed in my setup is the solr code (I didn't make any config change or change the schema). a document has a text field

Re: problem index accented character with release version of solr 1.3

2008-09-17 Thread Ryan McKinley
My guess is it has to do with switching the StAX implementation to geronimo API and the woodstox implementation https://issues.apache.org/jira/browse/SOLR-770 I'm not sure what the solution is though... On Sep 17, 2008, at 10:02 PM, Joshua Reedy wrote: I have been using a stable dev

Special character matching 'x' ?

2008-09-17 Thread Sanjay Suri
Hi, Can someone shed some light on this? One of my field values has the name Räikkönen, which contains special characters. Strangely, as I see it anyway, it matches on the search query 'x' ? Can someone explain or point me to the solution/documentation? Any help appreciated, -Sanjay --

Re: Multiple Process of the SAME solr instance

2008-09-17 Thread mohitranka
Otis, I understand that 1 solr instance can store n documents (one-by-one). My query was how to create m such instances/processes/threads so that m documents get stored at a time, instead of 1 at a time. All the instances should read at the same port. Otis Gospodnetic wrote: Mohit, Have

Re: Multiple Process of the SAME solr instance

2008-09-17 Thread mohitranka
Will having multiple cores, instead of one, serve the purpose? mohitranka wrote: Otis, I understand that 1 solr instance can store n documents (one-by-one). My query was how to create m such instances/processes/threads so that m documents get stored at a time, instead of 1 at a time.

Field level security

2008-09-17 Thread Geoff Hopson
Hi, First post/question, so please be gentle :-) I am trying to put together a security model around fields in my index. My requirement is that a user may not have permission to view certain fields in the index when he does a search. For example, he may have permission to see the name and

Solr vs Autonomy

2008-09-17 Thread Geoff Hopson
Hi, I'm under pressure to justify the use of Solr on my project, and others are suggesting that Autonomy be used instead. Apart from price, does anyone have a list of pros/cons around Autonomy compared to Solr? Thanks geoff

Re: Special character matching 'x' ?

2008-09-17 Thread Akshay
You need to configure Tomcat appropriately for recognizing international characters in the URI. Take a look at this to see if it helps, http://wiki.apache.org/solr/SolrTomcat#head-20147ee4d9dd5ca83ed264898280ab60457847c4 On Thu, Sep 18, 2008 at 10:53 AM, Sanjay Suri [EMAIL PROTECTED] wrote: Hi,

Re: Multiple Process of the SAME solr instance

2008-09-17 Thread Shalin Shekhar Mangar
On Thu, Sep 18, 2008 at 11:03 AM, mohitranka [EMAIL PROTECTED] wrote: Otis, I understand that 1 solr instance can store n documents (one-by-one). My query was how to create m such instances/processes/threads so that m documents get stored at a time, instead of 1 at a time. All the instances