On Tuesday 17 August 2010 13:47:32 Jeroen van Vianen wrote:
>
> Yes. I have lots of similar results because of these URLs occurring many
> times for the same original URL.
You can use deduplication [1]. It generates signatures for exact or near-exact
duplicate content, depending on the configuration. It can then opt…
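To make the signature idea concrete, here is a minimal Python sketch under stated assumptions. It is not Solr's or Nutch's deduplication code. It assumes an exact-match signature (an MD5 hash of the lowercased, whitespace-collapsed page text) and keeps the first URL seen per signature. The function names and the sample page texts are hypothetical. A near-duplicate setup would swap in a fuzzier signature.

```python
import hashlib
import re
from typing import Dict, List, Tuple


def signature(text: str) -> str:
    """Hypothetical exact-duplicate signature: MD5 of normalized page text."""
    # Collapse runs of whitespace and lowercase so trivial differences do not matter.
    normalized = re.sub(r"\s+", " ", text).strip().lower()
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()


def deduplicate(pages: List[Tuple[str, str]]) -> Tuple[List[str], Dict[str, List[str]]]:
    """Keep the first URL per signature and report which URLs duplicate it.

    pages is a list of (url, extracted_text) pairs in crawl order.
    """
    kept: List[str] = []
    first_by_sig: Dict[str, str] = {}
    duplicates: Dict[str, List[str]] = {}
    for url, text in pages:
        sig = signature(text)
        if sig in first_by_sig:
            # Same signature as an earlier page: record it as a duplicate of that page.
            duplicates.setdefault(first_by_sig[sig], []).append(url)
        else:
            first_by_sig[sig] = url
            kept.append(url)
    return kept, duplicates


if __name__ == "__main__":
    # Hypothetical page texts; only the URLs come from the thread.
    pages = [
        ("http://www.company.com/directory1", "Welcome to directory 1"),
        ("http://www.company.com/directory1;if(T.getElementsByClassName(", "Welcome to  directory 1"),
        ("http://www.company.com/directory2", "Directory 2 contents"),
    ]
    kept, dups = deduplicate(pages)
    print(kept)
    print(dups)
```

Pages whose text hashes to the same value collapse onto one URL, which is the behaviour wanted when one page shows up under several garbled URLs. An exact hash misses pages that differ slightly (for example by a timestamp), and that is where a fuzzier signature would come in.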
On 17-8-2010 13:35, Markus Jelsma wrote:
> I assume it's about your Solr index again (for which you should mail to the
> Solr mailing list). It features deleteById and deleteByQuery methods but in
> your case it's going to be rather hard. Your URL field is, using the stock
> schema, analyzed and has a tokenizer…
On 17-8-2010 13:35, Alex McLintock wrote:
I happen to have accumulated a lot of URLs in my index with the following
layout:
http://www.company.com/directory1;if(T.getElementsByClassName(
http://www.company.com/directory2;this.bottomContainer.appendChild(u);break;case
Hmmm,
This may be thinkin…
On 17 August 2010 12:04, Jeroen van Vianen wrote:
> Hi,
>
> I happen to have accumulated a lot of URLs in my index with the following
> layout:
>
> http://www.company.com/directory1;if(T.getElementsByClassName(
> http://www.company.com/directory2;this.bottomContainer.appendChild(u);break;case
Hmm
Hi,
I assume it's about your Solr index again (for which you should mail to the
Solr mailing list). It features deleteById and deleteByQuery methods but in
your case it's going to be rather hard. Your URL field is, using the stock
schema, analyzed and has a tokenizer that strips characters such…
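Because the URL field is tokenized, a deleteByQuery against it may not match the raw ";if(" style fragments. One workaround, assumed here rather than taken from the thread, is to pick out the bad URLs on the client side and delete them by unique key. The Python sketch below builds a Solr XML delete message. It assumes the unique key (id) holds the page URL. The JS_FRAGMENT heuristic and both function names are hypothetical. Putting several id elements in one delete is also an assumption about the Solr version in use; send one id per message if it is rejected.

```python
import re
import xml.etree.ElementTree as ET
from typing import Iterable, List

# Hypothetical heuristic: a ';' followed later by a bracket or brace suggests a
# JavaScript fragment was glued onto the URL by faulty link extraction.
JS_FRAGMENT = re.compile(r";.*[(){}]")


def bogus_urls(urls: Iterable[str]) -> List[str]:
    """Return the URLs that look like they carry a JavaScript fragment."""
    return [u for u in urls if JS_FRAGMENT.search(u)]


def delete_by_id_message(ids: Iterable[str]) -> str:
    """Build a Solr XML update message that deletes documents by unique key.

    ElementTree escapes characters such as '&' and '<' inside the ids.
    """
    root = ET.Element("delete")
    for doc_id in ids:
        ET.SubElement(root, "id").text = doc_id
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    indexed = [
        "http://www.company.com/directory1",
        "http://www.company.com/directory1;if(T.getElementsByClassName(",
        "http://www.company.com/directory2;this.bottomContainer.appendChild(u);break;case",
    ]
    print(delete_by_id_message(bogus_urls(indexed)))
```

The resulting message would then be POSTed to the Solr update handler, followed by a <commit/> message so the deletions become visible; the HTTP part is left out here.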
Hi,
I happen to have accumulated a lot of URLs in my index with the
following layout:
http://www.company.com/directory1;if(T.getElementsByClassName(
http://www.company.com/directory2;this.bottomContainer.appendChild(u);break;case
There seem to be errors in the discovery of links from one page…
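Until the link discovery is fixed, one possible mitigation (an assumption here, not something the thread settles on) is to reject such URLs before they are fetched. Nutch's regex URL filter reads "+" and "-" rules in order, the first matching rule decides whether a URL is kept, and a URL that matches no rule is dropped. The Python sketch below only emulates that first-match logic so a candidate rule can be tried out; in a real crawl the rule would go into regex-urlfilter.txt. The ";.*[(){}]" rule, the host rule and the function names are hypothetical.

```python
import re
from typing import List, Optional, Tuple

# (accept?, compiled pattern) pairs, evaluated in file order.
Rule = Tuple[bool, re.Pattern]


def parse_rules(lines: List[str]) -> List[Rule]:
    """Parse '+regex' / '-regex' lines; blank lines and '#' comments are skipped."""
    rules: List[Rule] = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        sign, pattern = line[0], line[1:]
        if sign not in ("+", "-"):
            raise ValueError(f"rule must start with '+' or '-': {line!r}")
        rules.append((sign == "+", re.compile(pattern)))
    return rules


def filter_url(url: str, rules: List[Rule]) -> Optional[str]:
    """Return the URL if the first matching rule accepts it, else None.

    A URL that matches no rule at all is rejected.
    """
    for accept, pattern in rules:
        if pattern.search(url):
            return url if accept else None
    return None


if __name__ == "__main__":
    rules = parse_rules([
        "# hypothetical: reject URLs with a JavaScript fragment after ';'",
        "-;.*[(){}]",
        "# hypothetical: accept everything else on the example host",
        r"+^http://www\.company\.com/",
    ])
    for url in [
        "http://www.company.com/directory1",
        "http://www.company.com/directory1;if(T.getElementsByClassName(",
        "http://www.company.com/directory2;this.bottomContainer.appendChild(u);break;case",
    ]:
        print(url, "->", "kept" if filter_url(url, rules) else "rejected")
```

Rejecting every URL that merely contains ';' would be simpler, but it would also drop legitimate URLs such as ones carrying ;jsessionid=..., hence the narrower rule. The link-discovery errors themselves would still need fixing.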