problem of solr replication's speed

2010-10-31 Thread kafka0102
It takes about one hour to replicate a 6 GB index with Solr in my environment, but my network can transfer files at about 10-20 MB/s using scp. So Solr's HTTP replication seems too slow. Is that normal, or am I doing something wrong?
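A quick back-of-the-envelope check of the figures in this message, as a minimal sketch. It assumes the index size is in GiB and the scp rate in MiB/s; the function names are illustrative only and are not part of Solr.

```python
def effective_rate_mib_s(size_gib, seconds):
    """Average transfer rate in MiB/s when size_gib is moved in `seconds`."""
    return size_gib * 1024 / seconds


def expected_minutes(size_gib, rate_mib_s):
    """Minutes needed to move size_gib at a sustained rate of rate_mib_s."""
    return size_gib * 1024 / rate_mib_s / 60


# One hour for a 6 GiB index (the replication time reported above).
replication_rate = effective_rate_mib_s(6, 3600)

# Time the same index would need at the scp rates quoted above.
slow_scp = expected_minutes(6, 10)
fast_scp = expected_minutes(6, 20)

print(f"replication: {replication_rate:.2f} MiB/s")
print(f"scp estimate: {fast_scp:.1f} to {slow_scp:.1f} minutes")
```

Under these assumptions the reported replication runs at under 2 MiB/s, while the scp rates would move the same index in roughly 5 to 10 minutes.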

Re: problem of solr replication's speed

2010-10-31 Thread Peter Karich
We have an identical-sized index and it takes ~5 minutes.
> It takes about one hour to replicate a 6 GB index with Solr in my environment, but my network can transfer files at about 10-20 MB/s using scp. So Solr's HTTP replication seems too slow. Is that normal, or am I doing something wrong?

Re: Newbie to Solr, LIKE:foo

2010-10-31 Thread Erick Erickson
Not really. The problem here is that to perform this raw, you'd need to enumerate every term in the index, which is pretty slow. One solution is to use one of the ngram tokenizers, probably the NGramFilterFactory to process the output of your tokenizers. Here's a related place to start...
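To illustrate why n-grams help here, the sketch below builds a toy n-gram index in plain Python. It is not Solr's NGramFilterFactory; the function names and the gram sizes (2 to 5) are assumptions chosen for the example. The point is only that a LIKE-style substring match becomes a direct lookup instead of an enumeration of every term.

```python
def ngrams(term, min_gram, max_gram):
    """All substrings of `term` whose length is between min_gram and max_gram."""
    grams = set()
    for size in range(min_gram, max_gram + 1):
        for start in range(len(term) - size + 1):
            grams.add(term[start:start + size])
    return grams


def build_index(terms, min_gram=2, max_gram=5):
    """Map each n-gram to the set of original terms that contain it."""
    index = {}
    for term in terms:
        for gram in ngrams(term.lower(), min_gram, max_gram):
            index.setdefault(gram, set()).add(term)
    return index


index = build_index(["foobar", "seafood", "barstool"])

# A LIKE '%foo%' style query becomes a single dictionary lookup.
matches = index.get("foo", set())
print(sorted(matches))
```

The trade-off is index size: every term is stored once per n-gram it contains, so the gram range is usually kept small.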

Re: org.tartarus package in lucene/solr?

2010-10-31 Thread Erick Erickson
In what? Where? What's the problem you're seeing? Why do you ask? Please review: http://wiki.apache.org/solr/UsingMailingLists Best Erick On Fri, Oct 29, 2010 at 4:19 AM, Tharindu Mathew mcclou...@gmail.com wrote: Hi, How come $subject is present?? -- Regards, Tharindu

Re: Ensuring stable timestamp ordering

2010-10-31 Thread Erick Erickson
Oh, I didn't realize that, thanks! Erick On Sat, Oct 30, 2010 at 10:27 PM, Lance Norskog goks...@gmail.com wrote: Hi- NOW does not get re-run for each document. If you give a large upload batch, the same NOW is given to each document. It would be handy to have an auto-incrementing date

Re: Basic Document Question

2010-10-31 Thread Erick Erickson
I guess that depends on what you mean by re-index, but here are some guesses. All of them share the assumption that you can determine *what* you want to index from the various sites. That is, you have some way of identifying the content you care about. Solr won't help you at all in identifying

RE: Ensuring stable timestamp ordering

2010-10-31 Thread Toke Eskildsen
Lance Norskog [goks...@gmail.com] wrote: It would be handy to have an auto-incrementing date field, so that each document would get a unique number and the timestamp would then be the unique ID of the document. If someone wants to implement this, I'll just note that the granularity of Solr

Re: Ensuring stable timestamp ordering

2010-10-31 Thread Michael Sokolov
Hmm - personally, I wouldn't want to rely on timestamps as a unique-id generation scheme. Might we not one day want to have distributed parallel indexing that merges lazily? Keeping timestamps unique and in sync across multiple nodes would be a tough requirement. I would be happy simply

indexing '-

2010-10-31 Thread PeterKerk
I have a city named 's-Hertogenbosch. I want it to be indexed exactly like that, so 's-Hertogenbosch (without ). But now I get: <lst name="city"> <int name="hertogenbosch">1</int> <int name="s">1</int> <int name="shertogenbosch">1</int> </lst> What filter should I add/remove from my field

Re: indexing '-

2010-10-31 Thread Ken Stanley
On Sun, Oct 31, 2010 at 12:12 PM, PeterKerk vettepa...@hotmail.com wrote: I have a city named 's-Hertogenbosch. I want it to be indexed exactly like that, so 's-Hertogenbosch (without ). But now I get: <lst name="city"> <int name="hertogenbosch">1</int> <int name="s">1</int> <int

Re: indexing '-

2010-10-31 Thread PeterKerk
I already tried the normal string type, but that doesn't work either. I now use this: <fieldType name="mytype" class="solr.TextField" sortMissingLast="true" omitNorms="true"> <analyzer> <tokenizer class="solr.KeywordTokenizerFactory"/> </analyzer> </fieldType> But that doesn't do it

Re: Modelling Access Control

2010-10-31 Thread Dennis Gearon
Ah haaa. I see now. :-) I didn't make that connection. Hopefully I would have before I ever tried to implement that :-) Kind of like user names and icons on a windows login :-) Dennis Gearon Signature Warning It is always a good idea to learn from your own mistakes. It is

Re: Consulting in Solr tuning, stop words, dictionary, etc

2010-10-31 Thread Dennis Gearon
Thanks Erick. Dennis Gearon Signature Warning It is always a good idea to learn from your own mistakes. It is usually a better idea to learn from others’ mistakes, so you do not have to make them yourself. from 'http://blogs.techrepublic.com.com/security/?p=4501tag=nl.e036'

RE: Ensuring stable timestamp ordering

2010-10-31 Thread Dennis Gearon
Even microseconds may not be enough on some really good, fast machine. Dennis Gearon

Re: indexing '-

2010-10-31 Thread Savvas-Andreas Moysidis
One way to view how your Tokenizer/Filter chain transforms your input terms is to use the analysis page of the Solr admin web application. This is very handy when troubleshooting issues related to how terms are indexed. On 31 October 2010 17:13, PeterKerk vettepa...@hotmail.com wrote: I

Re: Commit/Optimise question

2010-10-31 Thread Savvas-Andreas Moysidis
Thanks Erick. For the record, we are using 1.4.1 and SolrJ. On 31 October 2010 01:54, Erick Erickson erickerick...@gmail.com wrote: What version of Solr are you using? About committing. I'd just let the Solr defaults handle that. You configure this in the autocommit section of solrconfig.xml.

Start parameter and result grouping

2010-10-31 Thread Pavel Minchenkov
Hi, I'm trying to implement paging when grouping is on. The start parameter works, but the result contains all the documents that came before it. http://localhost:8983/solr/select?q=test&group=true&group.field=marketplaceId&group.limit=1&rows=1&start=0 (I get 1 document).
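The request in this message can be built page by page as in the sketch below. The parameter names and the base URL come from the message; the helper name and the page-to-start arithmetic are assumptions for illustration. The helper only builds the request; how the server interprets start when grouping is on depends on the Solr version in use.

```python
from urllib.parse import urlencode

BASE_URL = "http://localhost:8983/solr/select"  # taken from the message above


def grouped_page_url(query, group_field, page, page_size=1):
    """Build a select URL for one page of grouped results (pages start at 0)."""
    params = [
        ("q", query),
        ("group", "true"),
        ("group.field", group_field),
        ("group.limit", 1),
        ("rows", page_size),
        ("start", page * page_size),
    ]
    # urlencode joins the pairs with '&' and escapes values where needed.
    return BASE_URL + "?" + urlencode(params)


first_page = grouped_page_url("test", "marketplaceId", page=0)
second_page = grouped_page_url("test", "marketplaceId", page=1)
print(first_page)
print(second_page)
```

Building the query string with urlencode rather than by hand also avoids losing the & separators between parameters.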

Re: Start parameter and result grouping

2010-10-31 Thread Markus Jelsma
Ah, seems you're just one day behind. SOLR-2207, paging with field collapsing, has just been resolved: https://issues.apache.org/jira/browse/SOLR-2207
> Hi, I'm trying to implement paging when grouping is on. Start parameter works, but the result contains all the documents that were

Re: Start parameter and result grouping

2010-10-31 Thread Markus Jelsma
Oh, and see the just updated wiki page as well: http://wiki.apache.org/solr/FieldCollapsing
> Ah, seems you're just one day behind. SOLR-2207, paging with field collapsing, has just been resolved: https://issues.apache.org/jira/browse/SOLR-2207
>> Hi, I'm trying to implement paging when

RE: Ensuring stable timestamp ordering

2010-10-31 Thread Toke Eskildsen
Dennis Gearon [gear...@sbcglobal.net] wrote: Even microseconds may not be enough on some really good, fast machine. True, especially since the timer might not provide microsecond granularity although the returned value is in microseconds. However, a unique timestamp generator should keep
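One possible shape for the unique timestamp generator discussed in this thread, as a minimal sketch: remember the last value handed out and bump it by one whenever the clock has not advanced. This is an assumption about how such a generator could work, not something Solr provides; the class name is hypothetical, and it only guarantees uniqueness within a single process, so it does not address the multi-node concern raised earlier in the thread.

```python
import threading
import time


class UniqueTimestamp:
    """Hands out strictly increasing microsecond timestamps within one process."""

    def __init__(self, clock=None):
        # `clock` returns microseconds since the epoch; injectable for testing.
        self._clock = clock or (lambda: time.time_ns() // 1000)
        self._last = 0
        self._lock = threading.Lock()

    def next(self):
        with self._lock:
            now = self._clock()
            # If the clock has not advanced (or went backwards), bump by one.
            self._last = max(now, self._last + 1)
            return self._last


# A clock stuck at the same microsecond still yields distinct values.
stuck = UniqueTimestamp(clock=lambda: 1_000_000)
values = [stuck.next() for _ in range(3)]
print(values)
```

With a coarse timer the generated values drift slightly ahead of wall-clock time during a burst, then fall back in line once the real clock catches up.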

Re: indexing '-

2010-10-31 Thread Erick Erickson
Did you restart solr after the changes? Did you reindex? Because the string type should do what you want. And you've shown us fieldType definitions. What field are you using with them? Best Erick On Sun, Oct 31, 2010 at 1:13 PM, PeterKerk vettepa...@hotmail.com wrote: I already tried the

Re: Searching with wrong keyboard layout or using translit

2010-10-31 Thread Alexey Serba
Another approach for this problem is to use another Solr core for storing users' queries for auto complete functionality ( see http://www.lucidimagination.com/blog/2009/09/08/auto-suggest-from-popular-queries-using-edgengrams/ ) and index not only user_query field, but also transliterated and

Solr in virtual host as opposed to /lib

2010-10-31 Thread Eric Martin
Is there an issue running Solr in /home/lib as opposed to running it somewhere outside of the virtual hosts like /lib? Eric

Design and Usage Questions

2010-10-31 Thread getagrip
Hi, I've got some basic usage / design questions. 1. The SolrJ wiki proposes to use the same CommonsHttpSolrServer instance for all requests to avoid connection leaks. So if I create a Singleton instance upon application-startup I can securely use this instance for ALL queries/updates

Re: Solr in virtual host as opposed to /lib

2010-10-31 Thread Erick Erickson
Can you expand on your question? Are you having a problem? Is this idle curiosity? Because I have no idea how to respond when there is so little information. Best Erick On Sun, Oct 31, 2010 at 5:32 PM, Eric Martin e...@makethembite.com wrote: Is there an issue running Solr in /home/lib as

RE: Solr in virtual host as opposed to /lib

2010-10-31 Thread Eric Martin
Hi, Thank you. This is more than idle curiosity. I am trying to debug an issue I am having with my installation and this is one step in verifying that I have a setup that does not consume resources. I am trying to debunk my internal myth that having Solr and Nutch in a virtual host would be

RE: indexing '-

2010-10-31 Thread Jonathan Rochkind
What do you actually want to do? Give an example of a string that would be found in the source document (to index), and a few queries that you want to match it (and that presumably aren't matching it with the methods you've tried, since you say it doesn't work) Both a string type or a text

RE: Solr in virtual host as opposed to /lib

2010-10-31 Thread Jonathan Rochkind
What servlet container are you putting your Solr in? Jetty? Tomcat? Something else? Are you fronting it with apache on top of that? (I think maybe you are, otherwise I'm not sure how the phrase 'virtual host' applies). In general, Solr of course doesn't care what directory it's in on disk, so

RE: Solr in virtual host as opposed to /lib

2010-10-31 Thread Eric Martin
Excellent information. Thank you. Solr is acting just fine then. I can connect to it no issues, it indexes fine and there didn't seem to be any complication with it. Now I can rule it out and go about solving, what you pointed out, and I agree, to be a java/nutch issue. Nutch is a crawler I use

Re: problem of solr replication's speed

2010-10-31 Thread Lance Norskog
If you are copying from an indexer while you are indexing new content, this would cause contention for the disk head. Does indexing slow down during this period? Lance 2010/10/31 Peter Karich peat...@yahoo.de:  we have an identical-sized index and it takes ~5minutes It takes about one hour

Re: Design and Usage Questions

2010-10-31 Thread Lance Norskog
2. The SolrJ library handling of content streams is pull, not push. That is, you give it a reader and it pulls content when it feels like it. If your software to feed the connection wants to write the data, you have to either buffer the whole thing or do a dual-thread writer/reader pair. The
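The dual-thread writer/reader pair mentioned in this message can be pictured with the sketch below. It is written in Python with an OS pipe rather than SolrJ, and both function names are hypothetical stand-ins: one for a library that pulls from a reader it is given, one for application code that wants to push data by writing.

```python
import os
import threading


def pull_consumer(reader):
    """Stands in for a library that pulls content from a reader it is given."""
    return reader.read()


def push_producer(writer):
    """Stands in for application code that wants to write (push) its data."""
    with writer:  # closing the write end signals end-of-stream to the reader
        for i in range(3):
            writer.write(f"doc {i}\n")


read_fd, write_fd = os.pipe()
reader = os.fdopen(read_fd, "r")
writer = os.fdopen(write_fd, "w")

# The producer runs in its own thread so the consumer can pull concurrently.
producer = threading.Thread(target=push_producer, args=(writer,))
producer.start()
with reader:
    content = pull_consumer(reader)
producer.join()
print(content, end="")
```

The alternative named in the message, buffering the whole payload first, avoids the second thread at the cost of holding all the data in memory.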

Re: Solr in virtual host as opposed to /lib

2010-10-31 Thread Lance Norskog
With virtual hosting you can give CPU and memory quotas to your different VMs. This allows you to control the Nutch vs. the world problem. Unfortunately, you cannot allocate disk channel. With two I/O-bound apps, this is a problem. On Sun, Oct 31, 2010 at 4:38 PM, Eric Martin e...@makethembite.com wrote:

RE: Solr in virtual host as opposed to /lib

2010-10-31 Thread Eric Martin
Oh. So I should take out the installations and move them to /some_dir as opposed to inside my virtual host of /home/my solr nutch is here/www ' -Original Message- From: Lance Norskog [mailto:goks...@gmail.com] Sent: Sunday, October 31, 2010 7:26 PM To: solr-user@lucene.apache.org