Andrzej Bialecki wrote:
> On 2010-02-21 12:36, reinhard schwab wrote:
>> Andrzej Bialecki wrote:
>>> On 2010-02-20 23:32, reinhard schwab wrote:
>>>> Andrzej Bialecki wrote:
>>>>> On 2010-02-20 22:45, reinhard schwab wrote:
>>>>>> the content of one page is stored even 7 times.
>>>>>> http://www.cinema-paradiso.at/gallery2/main.php?g2_page=8
>>>>>> i believe this comes from
>>>>>> Recno:: 383
>>>>>> URL:: http://www.cinema-paradiso.at/gallery2/main.php?g2_highlightId=54519
>>>>>> Content::
>>>>>> Version: -1
>>>>>> url:
>>>>> Duplicate content is usually
I am now implementing this tool by forking SegmentMerger.
I have only added an additional filter in the map method, and I keep the
segment name.
I was then surprised that the reduce method logs the content of a crawl
datum 4 times.
Why is this?
I have then logged the content objects, and they seem to be
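One hedged explanation for the repeated values: a Nutch segment stores the data for a single URL across several part directories (e.g. crawl_generate, crawl_fetch, crawl_parse, content, parse_data, parse_text), and a SegmentMerger-style job feeds all parts into the same job, so the reducer receives one value per part for each URL key. The plain-Java sketch below only mimics that grouping with illustrative names; it is not Nutch's actual API:

```java
import java.util.*;

public class ReduceGrouping {
    // Simulate the (key, taggedValue) pairs a segment reader might emit:
    // one value per segment part directory for the same URL key.
    static Map<String, List<String>> groupByKey(List<String[]> pairs) {
        Map<String, List<String>> grouped = new LinkedHashMap<>();
        for (String[] p : pairs) {
            grouped.computeIfAbsent(p[0], k -> new ArrayList<>()).add(p[1]);
        }
        return grouped;
    }

    public static void main(String[] args) {
        List<String[]> pairs = Arrays.asList(
            new String[]{"http://example.org/a", "crawl_generate"},
            new String[]{"http://example.org/a", "crawl_fetch"},
            new String[]{"http://example.org/a", "crawl_parse"},
            new String[]{"http://example.org/a", "content"});
        // The "reduce" for this single page sees 4 values, one per part.
        System.out.println(groupByKey(pairs).get("http://example.org/a").size());
    }
}
```

If the logging sits in reduce without checking which part a value came from, the same URL will therefore appear several times even though the page was fetched once.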
I would like to have a segment filter which filters out unneeded content.
I only want to keep the content of pages which are still indexed in Solr
and which belong to this segment, when I query Solr by this segment name.
Is there any existing tool available?
SegmentMerger is a no-go for me. It
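As a minimal sketch of the filtering step described above, assuming the set of URLs still indexed in Solr for this segment has already been fetched (e.g. by querying Solr on a segment field; that query step and all names here are illustrative, not an existing Nutch tool):

```java
import java.util.*;

public class SegmentContentFilter {
    // Keep only records whose URL is still indexed in Solr for this segment.
    // 'indexedUrls' would come from a Solr query on the segment name
    // (that query is assumed, not shown here).
    static Map<String, String> filter(Map<String, String> segmentContent,
                                      Set<String> indexedUrls) {
        Map<String, String> kept = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : segmentContent.entrySet()) {
            if (indexedUrls.contains(e.getKey())) {
                kept.put(e.getKey(), e.getValue());
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        Map<String, String> content = new LinkedHashMap<>();
        content.put("http://www.cinema-paradiso.at/gallery2/main.php?g2_page=8", "...");
        content.put("http://example.org/stale", "...");
        Set<String> indexed = new HashSet<>(Collections.singleton(
            "http://www.cinema-paradiso.at/gallery2/main.php?g2_page=8"));
        System.out.println(filter(content, indexed).keySet());
    }
}
```

In a real job the same membership test would sit in the map method, so dropped records never reach the reducer at all.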