It takes about one hour to replicate a 6 GB index for Solr in my environment. But
my network can transfer files at about 10-20 MB/s using scp. So Solr's HTTP replication
is too slow. Is this normal, or am I doing something wrong?
We have an identical-sized index and it takes ~5 minutes.
It takes about one hour to replicate a 6 GB index for Solr in my environment. But
my network can transfer files at about 10-20 MB/s using scp. So Solr's HTTP
replication is too slow. Is this normal, or am I doing something wrong?
Not really. The problem here is that to perform this raw, you'd need
to enumerate every term in the index, which is pretty slow.
One solution is to use one of the ngram tokenizers, probably the
NGramFilterFactory to process the output of your tokenizers. Here's a
related place to start...
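A minimal sketch of what such an analyzer chain could look like in schema.xml (the field-type name and the gram sizes here are illustrative assumptions, not recommendations):

```xml
<!-- Hypothetical field type: tokenize, lowercase, then emit n-grams -->
<fieldType name="text_ngram" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.NGramFilterFactory" minGramSize="2" maxGramSize="15"/>
  </analyzer>
</fieldType>
```

This trades index size for the ability to match substrings without enumerating every term at query time.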
In what? Where? What's the problem you're seeing? Why do you ask?
Please review: http://wiki.apache.org/solr/UsingMailingLists
Best
Erick
On Fri, Oct 29, 2010 at 4:19 AM, Tharindu Mathew mcclou...@gmail.com wrote:
Hi,
How come $subject is present??
--
Regards,
Tharindu
Oh, I didn't realize that, thanks!
Erick
On Sat, Oct 30, 2010 at 10:27 PM, Lance Norskog goks...@gmail.com wrote:
Hi-
NOW does not get re-run for each document. If you give a large upload
batch, the same NOW is given to each document.
It would be handy to have an auto-incrementing date
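For context, the single-NOW-per-batch behavior typically shows up with a date field that defaults to NOW; a sketch (the field name is illustrative):

```xml
<!-- Every document in one update batch gets the same resolved NOW value -->
<field name="timestamp" type="date" indexed="true" stored="true" default="NOW"/>
```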
I guess that depends on what you mean by re-index, but here are some
guesses.
All of them share the assumption that you can determine *what* you want to
index from the various sites. That is, you have some way of identifying
the content you care about.
Solr won't help you at all in identifying
Lance Norskog [goks...@gmail.com] wrote:
It would be handy to have an auto-incrementing date field, so that
each document would get a unique number and the timestamp would then
be the unique ID of the document.
If someone wants to implement this, I'll just note that the granularity of Solr
Hmm - personally, I wouldn't want to rely on timestamps as a unique-id
generation scheme. Might we not one day want to have distributed
parallel indexing that merges lazily? Keeping timestamps unique and in
sync across multiple nodes would be a tough requirement. I would be
happy simply
I have a city named 's-Hertogenbosch
I want it to be indexed exactly like that, so 's-Hertogenbosch (without
)
But now I get:
<lst name="city">
  <int name="hertogenbosch">1</int>
  <int name="s">1</int>
  <int name="shertogenbosch">1</int>
</lst>
What filter should I add/remove from my field
On Sun, Oct 31, 2010 at 12:12 PM, PeterKerk vettepa...@hotmail.com wrote:
I have a city named 's-Hertogenbosch
I want it to be indexed exactly like that, so 's-Hertogenbosch (without
)
But now I get:
<lst name="city">
  <int name="hertogenbosch">1</int>
  <int name="s">1</int>
  <int
I already tried the normal string type, but that doesn't work either.
I now use this:
<fieldType name="mytype" class="solr.TextField" sortMissingLast="true" omitNorms="true">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
  </analyzer>
</fieldType>
But that doesn't do it.
Ah haaa. I see now. :-)
I didn't make that connection. Hopefully I would have before I ever tried to
implement that :-)
Kind of like user names and icons on a windows login :-)
Dennis Gearon
Thanks Erick.
Dennis Gearon
Signature Warning
It is always a good idea to learn from your own mistakes. It is usually a
better idea to learn from others’ mistakes, so you do not have to make them
yourself. from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036'
Even microseconds may not be enough on some really good, fast machine.
Dennis Gearon
One way to view how your Tokenizers/Filters chain transforms your input
terms, is to use the analysis page of the Solr admin web application. This
is very handy when troubleshooting issues related to how terms are indexed.
On 31 October 2010 17:13, PeterKerk vettepa...@hotmail.com wrote:
I
Thanks Erick. For the record, we are using 1.4.1 and SolrJ.
On 31 October 2010 01:54, Erick Erickson erickerick...@gmail.com wrote:
What version of Solr are you using?
About committing. I'd just let the solr defaults handle that. You configure
this in the autocommit section of solrconfig.xml.
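A sketch of what that section of solrconfig.xml looks like (the thresholds below are illustrative, not recommendations):

```xml
<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <maxDocs>10000</maxDocs>  <!-- commit after this many queued docs -->
    <maxTime>60000</maxTime>  <!-- or after this many milliseconds -->
  </autoCommit>
</updateHandler>
```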
Hi,
I'm trying to implement paging when grouping is on.
The start parameter works, but the result contains all the documents that come
before it.
http://localhost:8983/solr/select?q=test&group=true&group.field=marketplaceId&group.limit=1&rows=1&start=0 (I get 1 document).
Ah, seems you're just one day behind. SOLR-2207, paging with field collapsing,
has just been resolved:
https://issues.apache.org/jira/browse/SOLR-2207
Hi,
I'm trying to implement paging when grouping is on.
Start parameter works, but the result contains all the documents that were
Oh, and see the just updated wiki page as well:
http://wiki.apache.org/solr/FieldCollapsing
Ah, seems you're just one day behind. SOLR-2207, paging with field
collapsing, has just been resolved:
https://issues.apache.org/jira/browse/SOLR-2207
Hi,
I'm trying to implement paging when
Dennis Gearon [gear...@sbcglobal.net] wrote:
Even microseconds may not be enough on some really good, fast machine.
True, especially since the timer might not provide microsecond granularity,
although the returned value is in microseconds. However, a unique timestamp
generator should keep
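A unique-timestamp generator along those lines could be sketched in Java, assuming a single JVM (class and method names are hypothetical): if two calls land within the same clock tick, the counter bumps the value forward so each caller still receives a distinct, strictly increasing number.

```java
import java.util.concurrent.atomic.AtomicLong;

public class UniqueTimestamp {
    private static final AtomicLong last = new AtomicLong();

    // Returns a strictly increasing "microsecond" value per JVM.
    // Note: currentTimeMillis only has millisecond resolution, so
    // bursts within one millisecond are disambiguated by incrementing.
    public static long nextMicros() {
        long now = System.currentTimeMillis() * 1000L;
        return last.updateAndGet(prev -> now > prev ? now : prev + 1);
    }
}
```

Keeping such values unique *across* nodes, as noted above, is the hard part this sketch does not solve.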
Did you restart solr after the changes? Did you reindex? Because the string
type
should do what you want.
And you've shown us fieldType definitions. What field are you using with
them?
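For reference, the fieldType alone isn't enough; a field has to reference it, roughly like this (assuming the city field from the facet output earlier in the thread):

```xml
<!-- Hypothetical declaration tying the "city" field to the custom type -->
<field name="city" type="mytype" indexed="true" stored="true"/>
```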
Best
Erick
On Sun, Oct 31, 2010 at 1:13 PM, PeterKerk vettepa...@hotmail.com wrote:
I already tried the
Another approach for this problem is to use another Solr core for
storing users queries for auto complete functionality ( see
http://www.lucidimagination.com/blog/2009/09/08/auto-suggest-from-popular-queries-using-edgengrams/
) and index not only user_query field, but also transliterated and
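A sketch of the edge-n-gram analysis such a suggest core might use, in the spirit of the linked article (the type name and gram sizes are illustrative assumptions):

```xml
<fieldType name="autocomplete" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="25"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

Prefixes are generated only at index time, so queries match by exact prefix.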
Is there an issue running Solr in /home/lib as opposed to running it
somewhere outside of the virtual hosts like /lib?
Eric
Hi,
I've got some basic usage / design questions.
1. The SolrJ wiki proposes to use the same CommonsHttpSolrServer
instance for all requests to avoid connection leaks.
So if I create a singleton instance on application startup, can I
safely use this instance for ALL queries/updates?
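The singleton described above could be sketched with the lazy holder idiom (the class name is hypothetical, and the placeholder `Object` stands in for the real `CommonsHttpSolrServer`, which needs the SolrJ jar on the classpath):

```java
// Sketch: hold one shared server instance for the whole application.
// The nested-holder idiom gives lazy, thread-safe initialization
// without explicit synchronization.
public class SolrServerHolder {
    private SolrServerHolder() {}

    private static class Holder {
        // In real code: new CommonsHttpSolrServer("http://localhost:8983/solr")
        static final Object INSTANCE = new Object();
    }

    public static Object get() {
        return Holder.INSTANCE;
    }
}
```

Every caller then uses `SolrServerHolder.get()` rather than constructing new connections.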
Can you expand on your question? Are you having a problem? Is this idle
curiosity?
Because I have no idea how to respond when there is so little information.
Best
Erick
On Sun, Oct 31, 2010 at 5:32 PM, Eric Martin e...@makethembite.com wrote:
Is there an issue running Solr in /home/lib as
Hi,
Thank you. This is more than idle curiosity. I am trying to debug an issue I
am having with my installation and this is one step in verifying that I have
a setup that does not consume resources. I am trying to debunk my internal
myth that having Solr and Nutch in a virtual host would be
What do you actually want to do? Give an example of a string that would be
found in the source document (to index), and a few queries that you want to
match it (and that presumably aren't matching it with the methods you've tried,
since you say it doesn't work)
Both a string type or a text
What servlet container are you putting your Solr in? Jetty? Tomcat? Something
else? Are you fronting it with apache on top of that? (I think maybe you are,
otherwise I'm not sure how the phrase 'virtual host' applies).
In general, Solr of course doesn't care what directory it's in on disk, so
Excellent information. Thank you. Solr is acting just fine then. I can
connect to it no issues, it indexes fine and there didn't seem to be any
complication with it. Now I can rule it out and go about solving what you
pointed out, and I agree, is likely a Java/Nutch issue.
Nutch is a crawler I use
If you are copying from an indexer while you are indexing new content,
this would cause contention for the disk head. Does indexing slow down
during this period?
Lance
2010/10/31 Peter Karich peat...@yahoo.de:
We have an identical-sized index and it takes ~5 minutes.
It takes about one hour
2.
The SolrJ library handling of content streams is pull, not push.
That is, you give it a reader and it pulls content when it feels like
it. If your software to feed the connection wants to write the data,
you have to either buffer the whole thing or do a dual-thread
writer/reader pair.
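The dual-thread writer/reader pair can be sketched with the JDK's piped streams (the class and method names are illustrative; in real use, SolrJ would be the consumer pulling from the Reader):

```java
import java.io.*;

public class PipedFeed {
    // A producer thread pushes data into a PipedWriter while the
    // consumer (standing in for SolrJ here) pulls from the connected
    // PipedReader, so neither side has to buffer the whole payload.
    public static String pump(String payload) throws Exception {
        PipedWriter writer = new PipedWriter();
        PipedReader reader = new PipedReader(writer);

        Thread producer = new Thread(() -> {
            try (PipedWriter w = writer) {
                w.write(payload); // feed data as it becomes available
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        });
        producer.start();

        // Consumer side: pull everything, as SolrJ would when it feels like it.
        StringBuilder sb = new StringBuilder();
        int c;
        while ((c = reader.read()) != -1) sb.append((char) c);
        producer.join();
        return sb.toString();
    }
}
```

The alternative mentioned above, buffering the whole thing, avoids the second thread at the cost of holding the entire payload in memory.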
The
With virtual hosting you can give CPU and memory quotas to your
different VMs. This allows you to control the Nutch vs. The World
problem. Unfortunately, you cannot allocate the disk channel. With two I/O-bound
apps, this is a problem.
On Sun, Oct 31, 2010 at 4:38 PM, Eric Martin e...@makethembite.com wrote:
Oh. So I should take out the installations and move them to /some_dir as
opposed to inside my virtual host of /home/my solr nutch is here/www
-Original Message-
From: Lance Norskog [mailto:goks...@gmail.com]
Sent: Sunday, October 31, 2010 7:26 PM
To: solr-user@lucene.apache.org